Get Even More Visitors To Your Blog, Upgrade To A Business Listing >>

Putting the “Fun” in “Hash Function”

There are several different methods for securely hashing a password server-side for storage and future authentication. The most common one (a.k.a. the one that FIPS allows you to use, if compliance matters for you) is called PBKDF2. It stands for Password-Based Key Derivation Function #2.

Why #2? It’s got nothing to do with pencils. There was, in fact, a PBKDF1! But PBKDF1 was fatally insecure in a way I find very interesting. This StackOverflow answer is a great explainer on the difference between the two.

Very Hand-Wavy Description of a Hash Function

Let’s defined a Hash Function as any one-way transformation of some arbitrary-length string () to a fixed-length, deterministic, pseudo-random output.

Note: When in doubt, I err on the side of being easily understood by non-experts over pedantic precision. (Art by Swizz)

For example, this is a dumb hash function (uses SipHash-2-4 with a constant key):

function dumb_hash(string $arbitrary, bool $raw_binary = false): string
    $h = sodium_crypto_shorthash($arbitrary, 'SoatokDreamseekr');
    if ($raw_binary) {
        return $h;
    return sodium_bin2hex($h);

You can see the output of this function with some sample inputs here.

Properties of Hash Functions

A hash function is considered secure if it has the following properties:

  1. Pre-image resistance. Given , it should be difficult to find .
  2. Second pre-image resistance. Given , it should be difficult to find such that
  3. Collision resistance. It should be difficult to find any arbitrary pair of messages () such that

That last property, collision resistance, is guaranteed up to the Birthday Bound of the hash function. For a hash function with a 256-bit output, you will expect to need on average trial messages to find a collision.

Exploring PBKDF1’s Insecurity

If you recall, hash functions map an arbitrary-length string to a fixed-length string. If your input size is larger than your output size, collisions are inevitable (albeit computationally infeasible for hash functions such as SHA-256).

But what if your input size is equal to your output size, because you’re taking the output of a hash function and feeding it directly back into the same hash function?

Then, as explained here, you get an depletion of the possible outputs with each successive iteration.

But what does that look like?

Without running the experiments on a given hash function, there are two possibilities that come to mind:

  1. Convergence. This is when ) will, for two arbitrary messages and a sufficient number of iterations, converge on a single hash output.
  2. Cycles. This is when for some integer .

The most interesting result would be a quine, which is a cycle where for some integer (that is to say, .

The least interesting result would be for random inputs to converge into large cycles (e.g. cycles of size for a 256-bit hash function).

Conjecture: I would expect secure cryptographic hash functions in use today (e.g. SHA-256) to lean towards the least interesting output.

An Experiment Design

Since I don’t have an immense amount of cheap cloud computing at my disposal to run this experiments on a real hash function, I’m going to cheat a little and use my constant-key SipHash code from earlier in this post. In future work, cryptographers may find studying real hash functions (e.g. SHA-256) worthwhile.

Given that SipHash is a keyed pseudo-random function with an output size of 64 bits, my dumb hash function can be treated as a 64-bit hash function.

This means that you should expect your first collision (with 50% probability) after only trial hashes. This is cheap enough to try on a typical laptop.

Here’s a simple experiment for convergence:

  1. Generate two random strings .
  2. Set .
  3. Iterate until .

You can get the source code to run this trivial experiment here.

Clone the git repository, run composer install, and then php bin/experiment.php. Note that this may need to run for a very long time before you get a result.

If you get a result, you’ve found a convergence.

If the loop doesn’t terminate even after 2^64 iterations, you’ve definitely found a cycle. (Actually detecting a cycle and analyzing its length would require a lot of memory, and my simple PHP script isn’t suitable for this effort.)

“What do you mean I don’t have a petabyte of RAM at my disposal?”

What Does This Actually Tell Us?

The obvious lesson: Don’t use surjective functions when designing a key derivation function.

But beyond that, unless you can find a hash function that reliably converges or produces short cycles (, for an n-bit hash function), not much. (This is just for fun, after all!)

Definitely fun to think about though! (Art by circuitslime)

If, however, a hash function is discovered to produce interesting results, this may indicate that the chosen hash function’s internal design is exploitable in some subtle ways that, upon further study, may lead to better cryptanalysis techniques. Especially if a hash quine is discovered.

(Header art by Khia)

This post first appeared on Dhole Moments - Software, Security, Cryptography, And The Furry Fandom, please read the originial post: here

Share the post

Putting the “Fun” in “Hash Function”


Subscribe to Dhole Moments - Software, Security, Cryptography, And The Furry Fandom

Get updates delivered right to your inbox!

Thank you for your subscription