Get Even More Visitors To Your Blog, Upgrade To A Business Listing >>

Understanding HKDF

NIST opened public comments on SP 800-108 Rev. 1 (the NIST recommendations for Key Derivation Functions) last month. The main thing that’s changed from the original document published in 2009 is the inclusion of the Keccak-based KMAC alongside the incumbent algorithms.

One of the recommendations of SP 800-108 is called “KDF in Counter Mode”. A related document, SP 800-56C, suggests using a specific algorithm called Hkdf instead of the generic Counter Mode construction from SP 800-108–even though they both accomplish the same goal.

Isn’t standards compliance fun?

Interestingly, HKDF isn’t just an inconsistently NIST-recommended KDF, it’s also a common building block in a software developer’s toolkit which sees a lot of use in different protocols.

Unfortunately, the way HKDF is widely used is actually incorrect given its formal security definition. I’ll explain what I mean in a moment.

Art: Scruff

How Developers Understand and Use HKDF

If you’re a software developer working with cryptography, you’ve probably seen an API in the crypto module for your programming language that looks like this, or maybe this.

hash_hkdf(
    string $algo,
    string $key,
    int $length = 0,
    string $info = "",
    string $salt = ""
): string

Software developers that work with cryptography will typically think of the HKDF parameters like so:

  • $algo — which hash function to use
  • $key — the input key, from which multiple keys can be derived
  • $length — how many bytes to derive
  • $info — some arbitrary string used to bind a derived key to an intended context
  • $salt — some additional randomness (optional)

The most common use-case of HKDF is to implement key-splitting, where a single input key (the Initial Keying Material, or IKM) is used to derive two or more independent keys, so that you’re never using a single key for multiple algorithms.

See also: defuse/php-encryption, a popular PHP encryption library that does exactly what I just described.

At a super high level, the HKDF usage I’m describing looks like this:

class MyEncryptor {

protected function splitKeys(CryptographyKey $key, string $salt): array 
{
    $encryptKey = new CryptographyKey(hash_hkdf(
        'sha256',
        $key->getRawBytes(),
        32, 
        'encryption',
        $salt
    ));
    $authKey = new CryptographyKey(hash_hkdf(
        'sha256',
        $key->getRawBytes(),
        32,
        'message authentication',
        $salt
    ));
    return [$encryptKey, $authKey];
}

public function encryptString(string $plaintext, CryptographyKey $key): string
{
    $salt = random_bytes(32);
    [$encryptKey, $hmacKey] = $this->splitKeys($key, $salt);
    // ... encryption logic here ...
    return base64_encode($salt . $ciphertext . $mac);
}

public function decryptString(string $encrypted, CryptographyKey $key): string
{
    $decoded = base64_decode($encrypted);
    $salt = mb_substr($decoded, 0, 32, '8bit');
    [$encryptKey, $hmacKey] = $this->splitKeys($key, $salt);
    // ... decryption logic here ...
    return $plaintext;
}

// ... other method here ...

}

Unfortunately, anyone who ever does something like this just violated one of the core assumptions of the HKDF security definition and no longer gets to claim “KDF security” for their construction. Instead, your protocol merely gets to claim “PRF security”.

Art: Harubaki

KDF? PRF? OMGWTFBBQ?

Let’s take a step back and look at some basic concepts.

PRF: Pseudo-Random Functions

A pseudorandom function (PRF) is an efficient function that emulates a random oracle.

“What the hell’s a random oracle?” you ask? Well, Thomas Pornin has the best explanation for random oracles:

A random oracle is described by the following model:

  • There is a black box. In the box lives a gnome, with a big book and some dice.
  • We can input some data into the box (an arbitrary sequence of bits).
  • Given some input that he did not see beforehand, the gnome uses his dice to generate a new output, uniformly and randomly, in some conventional space (the space of oracle outputs). The gnome also writes down the input and the newly generated output in his book.
  • If given an already seen input, the gnome uses his book to recover the output he returned the last time, and returns it again.

So a random oracle is like a kind of hash function, such that we know nothing about the output we could get for a given input message m. This is a useful tool for security proofs because they allow to express the attack effort in terms of number of invocations to the oracle.

The problem with random oracles is that it turns out to be very difficult to build a really “random” oracle. First, there is no proof that a random oracle can really exist without using a gnome. Then, we can look at what we have as candidates: hash functions. A secure hash function is meant to be resilient to collisions, preimages and second preimages. These properties do not imply that the function is a random oracle.

Thomas Pornin

Alternatively, Wikipedia has a more formal definition available to the academic-inclined.

In practical terms, we can generate a strong PRF out of secure cryptographic hash functions by using a keyed construction; i.e. HMAC.

Thus, as long as your HMAC key is a secret, the output of HMAC can be generally treated as a PRF for all practical purposes. Your main security consideration (besides key management) is the collision risk if you truncate its output.

Art: LvJ

KDF: Key Derivation Functions

A key derivation function (KDF) is exactly what it says on the label: a cryptographic algorithm that derives one or more cryptographic keys from a secret input (which may be another cryptography key, a group element from a Diffie-Hellman key exchange, or a human-memorable password).

Despite what you may read online, KDFs do not need to be built upon cryptographic hash functions, specifically; but in practice, they often are. Notable counter-examples to this hash function assumption:

  • Bcrypt (Password KDF) uses Blowfish’s key schedule to derive a key, which then encrypts the string OrpheanBeholderScryDoubt with the derived key 64 times. Blowfish is a block cipher, not a hash function.
  • CMAC in Counter Mode (from NIST SP 800-108) uses AES-CMAC, which is a variable-length input variant of CBC-MAC. CBC-MAC uses a block cipher, not a hash function.

Regardless of the construction, KDFs use a PRF under the hood, and the output of a KDF is supposed to be a uniformly random bit string.

Art: LvJ

The HKDF Algorithm

HKDF is an HMAC-based KDF. Its algorithm consists of two distinct steps:

  1. HKDF-Extract uses the Initial Keying Material (IKM) and Salt to produce a Pseudo-Random Key (PRK).
  2. HKDF-Expand actually derives the keys using PRK, the info parameter, and a counter (from 0 to 255) for each hash function output needed to generate the desired output length.

If you’d like to see an implementation of this algorithm, defuse/php-encryption provides one (since it didn’t land in PHP until 7.1.0). Alternatively, there’s a Python implementation on Wikipedia that uses HMAC-SHA256.

This detail about the two steps will matter a lot in just a moment.

Art: Swizz

How HKDF Salts Are Misused

The HKDF paper, written by Hugo Krawczyk, contains the following definition (page 7).

The paper goes on to discuss the requirements for authenticating the Salt over the communication channel, lest the attacker have the ability to influence it.

A subtle detail of this definition is that the security definition says that A salt value , not Multiple salt values.

Which means: You’re not supposed to use HKDF with a constant IKM, info label, etc. but vary the salt for multiple invocations. The salt must either be a fixed random value, or NULL.

The HKDF RFC makes this distinction even less clear when it argues for random salts.

We stress, however, that the use of salt adds significantly to the strength of HKDF, ensuring independence between different uses of the hash function, supporting “source-independent” extraction, and strengthening the analytical results that back the HKDF design.

Random salt differs fundamentally from the initial keying material in two ways: it is non-secret and can be re-used. As such, salt values are available to many applications. For example, a pseudorandom number generator (PRNG) that continuously produces outputs by applying HKDF to renewable pools of entropy (e.g., sampled system events) can fix a salt value and use it for multiple applications of HKDF without having to protect the secrecy of the salt. In a different application domain, a key agreement protocol deriving cryptographic keys from a Diffie-Hellman exchange can derive a salt value from public nonces exchanged and authenticated between communicating parties as part of the key agreement (this is the approach taken in [IKEv2]).

RFC 5869, section 3.1

Okay, sure. Random salts are better than a NULL salt. And while this section alludes to “[fixing] a salt value” to “use it for multiple applications of HKDF without having to protect the secrecy of the salt”, it never explicitly states this requirement. Thus, the poor implementor is left to figure this out on their own.

Thus, because it’s not using HKDF in accordance with its security definition, many implementations (such as the PHP encryption library we’ve been studying) do not get to claim that their construction has KDF security.

Instead, they only get to claim “Strong PRF” security, which you can get from just using HMAC.

Art: LvJ

What Purpose Do HKDF Salts Actually Serve?

Recall that the HKDF algorithm uses salts in the HDKF-Extract step. Salts in this context were intended for deriving keys from a Diffie-Hellman output, or a human-memorable password.

In the case of [Elliptic Curve] Diffie-Hellman outputs, the result of the key exchange algorithm is a random group element, but not necessarily uniformly random bit string. There’s some structure to the output of these functions. This is why you always, at minimum, apply a cryptographic hash function to the output of [EC]DH before using it as a symmetric key.

HKDF uses salts as a mechanism to improve the quality of randomness when working with group elements and passwords.

Extending the nonce for a symmetric-key AEAD mode is a good idea, but using HKDF’s salt parameter specifically to accomplish this is a misuse of its intended function, and produces a weaker argument for your protocol’s security than would otherwise be possible.

How Should You Introduce Randomness into HKDF?

Just shove it in the info parameter.

Art: LvJ

It may seem weird, and defy intuition, but the correct way to introduce randomness into HKDF as most developers interact with the algorithm is to skip the salt parameter entirely (either fixing it to a specific value for domain-separation or leaving it NULL), and instead concatenate data into the info parameter.

class BetterEncryptor extends MyEncryptor {

protected function splitKeys(CryptographyKey $key, string $salt): array 
{
    $encryptKey = new CryptographyKey(hash_hkdf(
        'sha256',
        $key->getRawBytes(),
        32, 
        $salt . 'encryption',
        '' // intentionally empty
    ));
    $authKey = new CryptographyKey(hash_hkdf(
        'sha256',
        $key->getRawBytes(),
        32,
        $salt . 'message authentication',
        '' // intentionally empty
    ));
    return [$encryptKey, $authKey];
}

}

Of course, you still have to watch out for canonicalization attacks if you’re feeding multi-part messages into the info tag.

Another advantage: This also lets you optimize your HKDF calls by caching the PRK from the HKDF-Extract step and reuse it for multiple invocations of HKDF-Expand with a distinct info. This allows you to reduce the number of hash function invocations from to (since each HMAC involves two hash function invocations).

Notably, this HKDF salt usage was one of the things that was changed in V3/V4 of PASETO.

Does This Distinction Really Matter?

If it matters, your cryptographer will tell you it matters–which probably means they have a security proof that assumes the KDF security definition for a very good reason, and you’re not allowed to violate that assumption.

Otherwise, probably not. Strong PRF security is still pretty damn good for most threat models.

Art: LvJ

Closing Thoughts

If your takeaway was, “Wow, I feel stupid,” don’t, because you’re in good company.

I’ve encountered several designs in my professional life that shoved the randomness into the info parameter, and it perplexed me because there was a perfectly good salt parameter right there. It turned out, I was wrong to believe that, for all of the subtle and previously poorly documented reasons discussed above. But now we both know, and we’re all better off for it.

So don’t feel dumb for not knowing. I didn’t either, until this was pointed out to me by a very patient colleague.

“Feeling like you were stupid” just means you learned.
(Art: LvJ)

Also, someone should really get NIST to be consistent about whether you should use HKDF or “KDF in Counter Mode with HMAC” as a PRF, because SP 800-108’s new revision doesn’t concede this point at all (presumably a relic from the 2009 draft).

This concession was made separately in 2011 with SP 800-56C revision 1 (presumably in response to criticism from the 2010 HKDF paper), and the present inconsistency is somewhat vexing.

(On that note, does anyone actually use the NIST 800-108 KDFs instead of HKDF? If so, why? Please don’t say you need CMAC…)



This post first appeared on Dhole Moments - Software, Security, Cryptography, And The Furry Fandom, please read the originial post: here

Share the post

Understanding HKDF

×

Subscribe to Dhole Moments - Software, Security, Cryptography, And The Furry Fandom

Get updates delivered right to your inbox!

Thank you for your subscription

×