The MySQL MD5 function is used to return an MD5 128-bit checksum representation of a string. The MD5 message-digest algorithm is a widely used hash function producing a 128-bit hash value. The value returned by the MD5 function is a binary string of 32 hexadecimal digits, or NULL if the argument was NULL.
While reading the Amazon S3 API documentation to find out how Amazon S3 checks the integrity of objects (that is, verifies that the data received is the same data that was originally sent), you might come across the statement "The base64-encoded 128-bit MD5 digest of the message". It may not make much sense at first, or you may wonder how this differs from an MD5 hash you can calculate with the standard md5 command.
MD5 is a hash function. It generates a 128-bit pattern for any input, and that pattern is very likely, though not guaranteed, to be unique. This makes it suitable for verifying the integrity of data transferred between a sender and a receiver, which simply requires comparing the hashes of the data computed at each end.
It is pretty easy and relatively fast to calculate md5sum for a file.
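As a rough illustration of how little is involved, here is a minimal Python sketch that computes the MD5 of a file in chunks. The helper name `md5_of_file`, the chunk size, and the throwaway demo file are assumptions made for this example, not part of any tool mentioned in this article.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=65536):
    """Return the MD5 digest of a file as a 32-character hex string."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


# Demo on a throwaway file whose content is the five bytes b"hello".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_path = tmp.name

file_md5 = md5_of_file(demo_path)
os.remove(demo_path)
print(file_md5)  # 5d41402abc4b2a76b9719d911017c592
```

The printed value has the same 32-character hexadecimal form that command-line MD5 tools print.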
What is MD5?
The MD5 message-digest algorithm is a widely used cryptographic hash function producing a 128-bit (16-byte) hash value, typically expressed in text format as a 32-digit hexadecimal number. Regardless of the file size, an MD5 hash is always 128 bits.
The "128-bit MD5 digest" statement in the Amazon S3 documentation simply means that the MD5 digest is 128 bits long, as defined in the RFC (RFC 1321).
What is Hexadecimal?
In mathematics and computing, hexadecimal (also base 16, or hex) is a positional numeral system with a radix, or base, of 16. It uses sixteen distinct symbols, most often the symbols 0–9 to represent values zero to nine, and A, B, C, D, E, F (or alternatively a, b, c, d, e, f) to represent values ten to fifteen.
Examples for Hexadecimal (base 16) to decimal (base 10) conversion
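A few conversions can be checked quickly in Python. This is only a small sketch; the sample values are chosen for illustration and are not taken from the original chart.

```python
# Convert a few hexadecimal strings to decimal with the built-in int(text, 16).
hex_examples = ["A", "1F", "FF", "100"]
hex_to_dec = {h: int(h, 16) for h in hex_examples}

for h, d in hex_to_dec.items():
    print(f"{h} (hex) = {d} (decimal)")

# Positional expansion of FF: the first F counts sixteens, the second counts ones.
ff_expanded = 15 * 16 + 15
print(ff_expanded)  # 255
```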
MD5 Hash and Hexadecimal
An MD5 hash has 128 bits, which is 16 bytes. The biggest decimal value that 1 byte (8 bits) can hold is 255, and hexadecimal FF represents 255 as well (16 x 15 + 15).
We need a 2-digit hexadecimal number (FF) to represent the maximum value of a byte, which is 1111 1111 in binary. To represent 16 bytes (128 bits), we therefore need a 32-digit hexadecimal number. If you check the output of the md5 command, you will see it is exactly 32 digits long.
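The byte and digit counts can be confirmed with Python's hashlib. This is a small sketch; the input `b"hello"` is just a sample value.

```python
import hashlib

digest = hashlib.md5(b"hello").digest()  # the raw 128-bit value as bytes
hex_text = digest.hex()                  # two hexadecimal digits per byte

print(len(digest))    # 16
print(len(hex_text))  # 32

# The largest byte value, shown in decimal, hexadecimal and binary.
max_byte = 0xFF
print(max_byte, format(max_byte, "X"), format(max_byte, "08b"))  # 255 FF 11111111
```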
Base64 Encoding
Base64 encoding is used to represent binary data in an ASCII string. Base64 encoding is commonly used in HTTP requests and headers. See the Wikipedia page on Base64 for the interesting calculation of how groups of 6 bits are converted to individual numbers (indexes into the Base64 alphabet) and how padding is done when the number of bytes is not divisible by three.
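The padding behaviour is easy to observe with Python's standard base64 module. The sketch below encodes one, two and three bytes; the sample inputs are arbitrary.

```python
import base64

# Three input bytes (24 bits) map to exactly four 6-bit groups, so no padding
# is needed. One or two leftover bytes are padded with "==" or "=".
padding_examples = {
    raw: base64.b64encode(raw).decode("ascii") for raw in (b"M", b"Ma", b"Man")
}

for raw, encoded in padding_examples.items():
    print(raw, "->", encoded)
# b'M' -> TQ==
# b'Ma' -> TWE=
# b'Man' -> TWFu
```

A 16-byte MD5 digest is five full groups of three bytes plus one leftover byte, so its base64 form is 24 characters long and ends in "==".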
Calculating Base64-Encoded MD5
The md5 command outputs 32-digit hexadecimal text, which is ASCII encoded, and the base64 command calculates a base64-encoded string.
Check the command below and try to figure out why it is not going to produce what we are really looking for.
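The command itself is not reproduced here. As a rough Python analogue, assuming the mistake is feeding the hexadecimal text produced by md5 straight into base64, the same effect looks like this (the input `b"hello"` is a sample value):

```python
import base64
import hashlib

# hexdigest() returns 32 ASCII characters, not the 16 raw digest bytes.
hex_text = hashlib.md5(b"hello").hexdigest()

# Base64-encoding that text encodes the characters '5', 'd', '4', ...
# rather than the bytes 0x5d, 0x41, ...
wrong = base64.b64encode(hex_text.encode("ascii")).decode("ascii")

print(len(hex_text))  # 32
print(len(wrong))     # 44
```

The result is 44 characters long, whereas the base64 form of a 16-byte digest is 24 characters.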
Here is why. We need the raw binary value of the MD5 digest, but the md5 command gives us ASCII text that represents that value in hexadecimal. Remembering that base64 is used to represent binary data in ASCII, we need to find the binary value of the md5 result. Below you can find the command which will give the right base64-encoded MD5 hash.
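For comparison, the same idea in Python: take the raw digest bytes and base64-encode those. This is a minimal sketch with `b"hello"` as a sample input, not the original command.

```python
import base64
import hashlib

raw_digest = hashlib.md5(b"hello").digest()  # 16 raw bytes, not hex text
b64_md5 = base64.b64encode(raw_digest).decode("ascii")

print(b64_md5)       # XUFAKrxLKna5cZ2REBfFkg==
print(len(b64_md5))  # 24
```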
What if you already have md5sums calculated for a bunch of files and don’t want to recalculate them, but just convert them to base64?
The xxd command can convert a hexadecimal string to its binary value, and you can then base64-encode this binary value.
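In Python, the same hex-to-binary step can be done with `bytes.fromhex`. The helper name `hex_md5_to_base64` below is hypothetical, and the sample hash is the MD5 of the string "hello".

```python
import base64


def hex_md5_to_base64(hex_md5):
    """Convert an existing 32-digit hex MD5 string to its base64 form."""
    raw = bytes.fromhex(hex_md5)  # hex text -> raw bytes
    if len(raw) != 16:
        raise ValueError("an MD5 digest must be exactly 16 bytes (32 hex digits)")
    return base64.b64encode(raw).decode("ascii")


converted = hex_md5_to_base64("5d41402abc4b2a76b9719d911017c592")
print(converted)  # XUFAKrxLKna5cZ2REBfFkg==
```

No hashing takes place here, so the file contents are never read again.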
As a way to verify the output, we see that base64 encoded text above matches the one we found using openssl.
Conclusion
I hope "Base64 Encoded 128-bit MD5 Digest" is a clear statement now.
It is important to note that the AWS SDK for JavaScript has an S3 ManagedUpload API (similar to the Java TransferManager or the Go s3Manager) which calculates the base64-encoded MD5 digest and passes it in content-md5 automatically for you if you set the computeChecksums option to true. If you choose to use the PUT API, you will have to calculate the value before passing it as an option for the API call.
If you want to keep the MD5 hash value of the file in Amazon S3 along with the object, you can store it in the user metadata of the object. If you choose to do so, you don’t have to calculate the base64 value; you can pass the MD5 hash as it is if you like. Since you will be the one interpreting the user metadata, it is up to you to decide in which format or encoding to store it.
In this tutorial, we will learn to encode a string using the MD5 algorithm in Python. MD5, also known as the message-digest algorithm, is used to produce a 128-bit hash value. This hashing algorithm is a one-way cryptographic function that takes input of any size and produces an output message digest of fixed size, i.e. 128 bits.
MD5 hash using Python
Python's standard library includes hashlib, a common interface to various hashing and message digest algorithms. It includes the MD5 algorithm as well as secure hash algorithms such as SHA1, SHA224, SHA256, and SHA512. In this tutorial, we will use this library to generate the message digest for the input string.
The program in this section takes an input string and hashes it with MD5. We get the digest as raw bytes using the digest() function and finally print it.
Python program to encode a string in MD5
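A minimal sketch of such a program follows. The input string "hello" and the UTF-8 encoding are assumptions made for this example.

```python
import hashlib

text = "hello"

# hashlib operates on bytes, so the string is encoded first.
# UTF-8 is assumed here; a different encoding can produce a different hash.
result = hashlib.md5(text.encode("utf-8"))

# digest() returns the 16-byte hash as a bytes object.
byte_digest = result.digest()
print(byte_digest)
print(len(byte_digest))  # 16
```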
We can also generate the hexadecimal equivalent of the digest. For this, we use hexdigest() instead of digest().
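A corresponding sketch with hexdigest(), again using "hello" and UTF-8 as assumed sample inputs:

```python
import hashlib

text = "hello"

# hexdigest() returns the same 128-bit value as digest(),
# written as a 32-character hexadecimal string.
hex_digest = hashlib.md5(text.encode("utf-8")).hexdigest()
print(hex_digest)  # 5d41402abc4b2a76b9719d911017c592
```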