#22 Sudo Explain This: Encoding, Encryption, and Hashing for Dummies

Confusing encoding, encryption, and hashing is surprisingly common. While mix-ups can be amusing, they often lead to serious security issues.

Aug 17, 2024

Encoding, encryption, and hashing: the three musketeers of data transformation. I have lost count of how many times these digital daredevils were mistaken for one another. While each has its own purpose, they are often thrust into roles they were never meant to play.

It is like asking a fish to climb a tree – technically possible but hardly ideal.

Let’s have a look at each of them.

Encoding (The Translator)

Purpose: Converting data into a specific format for proper representation.
Key Trait: Transforms data without hiding its content.
Reversibility: Easily reversible, like flipping a pancake.
Security Level: About as secure as a cookie jar with a transparent lid.

One example of encoding is Base64. It converts binary data into text using 64 different characters: A-Z (26 characters), a-z (26 characters), 0-9 (10 characters), and + and / (2 characters). The = sign is used only as padding at the end.
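To see where those 64 characters come from, here is a minimal from-scratch encoder in Python. Every 3 bytes (24 bits) are cut into four 6-bit values, and each value picks one character from the alphabet; = fills the gap when the input length is not a multiple of 3. It is a teaching sketch: `to_base64` is a hypothetical helper name, and real code should simply call the standard library's `base64.b64encode`, which the sketch is checked against.

```python
import base64

# The 64-character alphabet: A-Z, a-z, 0-9, then "+" and "/".
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def to_base64(data: bytes) -> str:
    """Encode bytes as Base64: each 3-byte group (24 bits) becomes 4 characters (4 x 6 bits)."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack the group into a 24-bit integer, zero-filling a short final group.
        n = int.from_bytes(chunk.ljust(3, b"\x00"), "big")
        # Cut the 24 bits into four 6-bit indices into the alphabet.
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # A group of k bytes carries k + 1 meaningful characters; "=" pads the rest.
        used = len(chunk) + 1
        out.extend(chars[:used] + ["="] * (4 - used))
    return "".join(out)


print(to_base64(b"SecretText"))  # U2VjcmV0VGV4dA==

# Sanity check against the standard library for a few input lengths.
for sample in (b"", b"a", b"ab", b"abc", b"SecretText"):
    assert to_base64(sample) == base64.b64encode(sample).decode("ascii")
```

Nothing in this process involves a key: the mapping is fixed and public, which is exactly why Base64 hides nothing.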

Anyone who wants to decode the encoded text can do so without problems, because no secret is needed. That’s why it is a very bad idea to use Base64 to store passwords (I have seen it many times) or to mask sensitive data.
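A minimal Python sketch of why "Base64-protected" passwords are not protected at all. The stored value is a made-up example; reversing it takes one standard-library call and no secret of any kind.

```python
import base64

# A hypothetical password "protected" by Base64 encoding (the mistake described above).
stored = base64.b64encode(b"qwerty").decode("ascii")
print(stored)  # cXdlcnR5

# Anyone can reverse it: no key, no secret, just the standard library.
recovered = base64.b64decode(stored).decode("utf-8")
print(recovered)  # qwerty
```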

Base64 encoding is mainly used to carry binary data over text-based protocols (e.g., email attachments).

Encryption (The Locksmith)

Purpose: Protecting data from unauthorized access.
Key Trait: Uses complex algorithms and keys to scramble data.
Reversibility: Reversible with the right key, like a well-oiled lock.
Security Level: Fort Knox, if implemented correctly.

Encryption can be done using two methods:

Symmetric. This method is based on a single shared secret key. The sender uses this key to encrypt the plain text into cipher text, and the recipient uses the same key to decrypt the cipher text back into plain text. An example of a symmetric encryption algorithm is AES (Advanced Encryption Standard).
Asymmetric. This method is based on a key pair: a public key and a private key. Anyone can use the public key to encrypt plain text, but only the holder of the private key (the intended recipient) can decrypt it. This way, the sender and recipient never need to share a secret key in advance. However, asymmetric encryption is slower than symmetric encryption. An example of an asymmetric encryption algorithm is RSA (Rivest–Shamir–Adleman).
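The public/private split is easiest to see in "textbook" RSA with tiny numbers. The sketch below uses the classic toy primes 61 and 53; it is for illustration only. Real RSA uses keys of 2048 bits or more plus a padding scheme such as OAEP, and none of that is shown here. `apply_key` is a hypothetical helper name; Python 3.9+ is assumed.

```python
# Toy "textbook" RSA with tiny primes: illustration only, never for real data.
p, q = 61, 53               # two tiny primes, chosen for readability
n = p * q                   # modulus, part of both keys (3233)
phi = (p - 1) * (q - 1)     # Euler's totient of n (3120)
e = 17                      # public exponent, coprime with phi
d = pow(e, -1, phi)         # private exponent: modular inverse of e (2753)

public_key = (e, n)         # can be published to anyone
private_key = (d, n)        # kept only by the recipient


def apply_key(value: int, key: tuple[int, int]) -> int:
    """Textbook RSA: encryption and decryption are both modular exponentiation."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


message = 65                               # must be an integer smaller than n
cipher = apply_key(message, public_key)    # anyone can encrypt with the public key
print(cipher)                              # 2790
print(apply_key(cipher, private_key))      # 65: only the private key reverses it
```

Knowing the public key (17, 3233) is not enough to decrypt; you would need d, which here is easy to find only because 3233 is trivially factored. With realistic key sizes, that factoring step is what keeps the private key private.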

Let’s examine the encryption of the text “SecretText” using the symmetric method, namely the AES algorithm with a 256-bit key. The result is cipher text that looks like random bytes.
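A minimal sketch of that example in Python, assuming the third-party `cryptography` package is installed. The post does not name an AES mode; this sketch assumes AES-256 in GCM mode via `AESGCM`, which also detects tampering. The key is generated on the spot purely for the demo.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# A fresh random 256-bit key; a real application would load it from a key store.
key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)

# GCM needs a unique 96-bit nonce for every encryption under the same key.
# The nonce is not secret and is usually stored next to the cipher text.
nonce = os.urandom(12)

cipher_text = aesgcm.encrypt(nonce, b"SecretText", None)  # None: no associated data
print(cipher_text.hex())  # random-looking bytes, different on every run

# The same key (and nonce) turns the cipher text back into the plain text.
print(aesgcm.decrypt(nonce, cipher_text, None))  # b'SecretText'

# Any other key is rejected: GCM's built-in integrity check raises InvalidTag.
wrong_key = AESGCM(AESGCM.generate_key(bit_length=256))
try:
    wrong_key.decrypt(nonce, cipher_text, None)
except InvalidTag:
    print("Wrong key: the cipher text stays unreadable.")
```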

Anyone who wants to read the encrypted text must decrypt it; to do so, they need to know the key. Otherwise, the original text will not be readable.

Encryption is often used to secure conversations (e.g., in WhatsApp) or to protect an application's sensitive data.

Hashing (The Fingerprint)

Purpose: Creating fixed-size "fingerprints" of data that are, for practical purposes, unique.
Key Trait: Produces consistent output for the same input.
Reversibility: One-way street - no turning back.
Security Level: What goes in stays in.

It’s time to check what “SecretText” looks like after hashing. I will use SHA-256, which always produces a 256-bit digest (64 hexadecimal characters) that looks nothing like the input.
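A minimal sketch with Python's standard `hashlib` module, showing the three properties from the list above: the digest has a fixed size, the same input always gives the same digest, and a tiny change to the input gives a completely different one. The digests themselves are left out here; run it to see them.

```python
import hashlib

digest = hashlib.sha256(b"SecretText").hexdigest()
print(digest)  # 64 hexadecimal characters, i.e. 256 bits

# Consistent: hashing the same input again gives the same fingerprint.
print(digest == hashlib.sha256(b"SecretText").hexdigest())  # True

# A one-letter change ("S" -> "s") gives an unrelated-looking digest.
print(hashlib.sha256(b"secretText").hexdigest())

# Fixed size: even a 1 MB input yields a 64-character digest.
print(len(hashlib.sha256(b"x" * 1_000_000).hexdigest()))  # 64
```

Note what is missing: there is no key and no "unhash" function to call. The only way back is guessing inputs, which is where the attacks below come in.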

Hashing is a one-way process: there is no key or procedure that turns a hash back into its original input, and finding an input that matches a given hash is computationally infeasible. However, this doesn't mean hashed passwords are entirely secure.

While hashing is irreversible, attackers can still attempt to guess the original text through various methods:

Dictionary attacks. Attackers hash lists of common words, passwords, and phrases and look for matching hashes.
Rainbow tables. Pre-computed tables of hash values for common strings.

When a database of hashed passwords is compromised, attackers compare these hashes against their pre-computed values. If a match is found, they have discovered the user's password.
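A minimal sketch of such an attack against unsalted SHA-256 hashes. The users, passwords, and word list are made up for illustration. The attacker's pre-computed data here is a plain lookup table; a rainbow table serves the same purpose with far less storage by using chains of hashes, which this sketch does not implement.

```python
import hashlib


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical leaked database of unsalted SHA-256 password hashes.
leaked = {
    "alice": sha256_hex("qwerty"),
    "bob": sha256_hex("correct horse battery staple"),
}

# The attacker's word list (real ones contain millions of entries).
wordlist = ["123456", "password1", "admin1", "qwerty", "letmein"]

# Pre-compute hash -> word once, then look up every leaked hash.
lookup = {sha256_hex(word): word for word in wordlist}

cracked = {user: lookup[h] for user, h in leaked.items() if h in lookup}
print(cracked)  # {'alice': 'qwerty'}
```

Bob survives only because his password is not in the list; the hash function itself protected neither of them.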

Simple or common passwords (like "qwerty", "password1", or "admin1") are particularly vulnerable, as their hashes are often included in these pre-computed lists.

Moreover, if only basic hashing is used without additional security measures, finding one user's password allows attackers to quickly identify other users with the same password by looking for matching hashes.
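The same-password problem is easy to see in code. In this hypothetical unsalted database (made-up users and passwords), grouping rows by hash reveals who shares a password without cracking anything; cracking one member of a group then exposes all of them.

```python
import hashlib
from collections import defaultdict


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical unsalted database: carol and dave happen to reuse alice's password.
db = {
    "alice": sha256_hex("qwerty"),
    "bob": sha256_hex("Tr0ub4dor&3"),
    "carol": sha256_hex("qwerty"),
    "dave": sha256_hex("qwerty"),
}

# Identical passwords produce identical hashes, so group users by hash.
by_hash = defaultdict(list)
for user, h in db.items():
    by_hash[h].append(user)

# Once alice's password is known, everyone in her group is exposed too.
print(by_hash[db["alice"]])  # ['alice', 'carol', 'dave']
```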

It is common practice to enhance security by adding a "salt" to passwords before hashing. A salt is a random string concatenated with the password, creating a unique input for the hash function. Even if a user chooses a simple password like "qwerty", the resulting hash will be completely different from the hash of the plain "qwerty":

Hash (SHA-256) of plain “qwerty” is 65e84be33532fb784c48129675f9eff3a682b27168c0ea744b2cf58ee02337c5.
Hash (SHA-256) of “qwerty” concatenated with salt “98b670de7eab” is 37c54ea7d9c7f8b37655e6efa27b27b42306170908c491cbca3cd09c345f7e9a.
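A minimal Python sketch of the salted example above. It assumes the salt is appended after the password; the text says "concatenated" without specifying an order, so if the order differs, the salted value printed here will not match the one quoted. Either way, the salted hash no longer matches the plain hash an attacker has pre-computed.

```python
import hashlib


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


password = "qwerty"
salt = "98b670de7eab"  # the salt used in the example above

plain_hash = sha256_hex(password)
salted_hash = sha256_hex(password + salt)  # assumes password-then-salt order

print(plain_hash)
print(salted_hash)

# A pre-computed table of plain "qwerty" hashes no longer finds a match.
print(plain_hash == salted_hash)  # False
```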

Using salts, we can also ensure that when two users pick the same password, their hashed passwords look completely different. This is because each user gets their own unique salt.
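A minimal sketch of per-user salts in Python. One swap to name plainly: instead of a single SHA-256 call, it uses PBKDF2-HMAC-SHA256 from the standard library (`hashlib.pbkdf2_hmac`), a deliberately slow salted hash better suited to passwords. The iteration count is an assumed value, and `hash_password`/`verify_password` are hypothetical helper names. The salt is not secret; it is stored next to the hash and reused at login.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; tune to current guidance and your hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, hash) for a new password, using a fresh random 16-byte salt."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(digest, expected)


# Two users pick the same weak password...
alice_salt, alice_hash = hash_password("qwerty")
bob_salt, bob_hash = hash_password("qwerty")

# ...but their stored hashes differ because each gets a unique salt.
print(alice_hash == bob_hash)  # False

# Login still works: the stored salt is reused to recompute the hash.
print(verify_password("qwerty", alice_salt, alice_hash))     # True
print(verify_password("password1", alice_salt, alice_hash))  # False
```

A salt does not make "qwerty" a strong password: an attacker can still try it against each account. What it removes is the shortcut of one pre-computed table or one cracked hash covering many users at once.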

Do you have any funny stories about misinterpreting encoding, encryption, and hashing?

Fractional Architect
