[remark] Misusing random oracles for practical purposes -- Volution Notes

Why this article? Because I believe that cryptographic "random oracles" are such a wonderful construct that should be known and used even outside the theoretical cryptography domain, where they are used mainly to prove the security of various cryptographic constructs "in the random oracle model".

(And because, as feedback to my article from yesterday about multi-factor encryption, I've received a few comments that stated I've made a mess of things, mixing "theoretical constructs such as oracles used in proofs" with real world encryption.)

What are cryptographic "random oracles"?

In cryptography, a random oracle is an oracle (a theoretical black box) that responds to every unique query with a (truly) random response chosen uniformly from its output domain. If a query is repeated, it responds the same way every time that query is submitted.
Stated differently, a random oracle is a mathematical function chosen uniformly at random, that is, a function mapping each possible query to a (fixed) random response from its output domain.
[Wikipedia article on "Random oracle"]

Why are they important, and how are they used?

I strongly recommend reading Matthew Green's series on the topic:

[...] Even though everyone is allowed to compute the hash function, they can't do it by themselves. Hashing is done by a new magical party called the "oracle". If any party needs to hash something, they ship the message (over an imaginary secure channel) to the oracle, who computes the hash and sends it back to them.
[...] We'll place two restrictions on the random oracle. First, it must implement a genuine random function (sampled at the beginning of the game), with the same input/output profile as a realistic hash function we might use (e.g., 256-bit output strings if we're eventually going to implement with SHA256).
Secondly, the responses given by the oracle must be consistent. In other words, just as hashing Peter Piper through SHA1 will always produce 1c6244929dc8216f5c1024eebefb5cb73c378431, regardless of who does it, sending a given string through the random oracle will produce the same result, regardless of which party asks for it.

To summarize all of the above, a cryptographic random oracle has the following properties:

it's a function that takes as input any arbitrary sequence of bytes; (of any length, because even if the function we use accepts only fixed sized byte arrays, we can always hash the actual input to that desired length;)
it's a function that returns as output an arbitrary sequence of bytes; (as with the inputs, if we require a fixed size, hashing the output can solve that;)
the returned sequence of bytes should be unguessable even if we (or the attacker) have the input; (the only way to find the output should be asking the oracle;) thus we could say that the oracle must keep its internal state or mechanism secret from everyone;
(the previous item implies that the oracle output has some minimum entropy in the cryptographic sense;)
the function has to be deterministic; (for the same input, it will always return the same output;)

!!! WARNING !!!
I am not a cryptographer, or even a security expert!, at best I am a hobbyist applied-cryptography developer, thus take everything I say with a large grain of salt and lots of circumspection.

Before I go and describe how we can use them, I want to give a few examples of real-world instantiations of such oracles:

any HMAC-like function (including the Blake3 hash function), provided that the oracle keeps the key secret; we just take the input and pass that through the HMAC function with the secret key and we return the hash as the output of the oracle;
any HKDF-like function, including scrypt and Argon2, by applying exactly the same steps as above (but ignoring the salt); as a bonus we also achieve natural rate-limiting for calling the oracle (provided it doesn't have infinite resources);
any block cipher encryption function (like AES), provided that the oracle keeps the key secret; we just take the input (hashing it if of the wrong block size), then pass it through either the encryption or decryption function with the key, and we return the resulting block as the output of the oracle; (it doesn't matter if we use the encryption or decryption function, as long as we are consistent; however it's perhaps wiser to always use the encryption function;)
any nonce-based encryption function (like the Salsa20 or ChaCha20 family), provided that the oracle keeps the key secret; we take the input, hash it to the expected nonce size, then encrypt any constant data (like all zeroes of the same size as the desired output) with the key and computed nonce, and we return the encrypted data as the output of the oracle; (in essence with this family of functions, the encryption / decryption procedure is the same, as it's usually based on XOR;)
the decryption operation of the RSA algorithm, provided that the oracle keeps the key secret; in essence the "raw RSA decryption" is used in either the "proper RSA decryption" or the "proper RSA signature", but because many implementations don't offer access to "raw decryption", or don't let us feed "garbage" into the "proper decryption", we are left using RSA signatures; we take the input, hash it to the expected size (which usually is slightly less than the size of the key), then pass that through the RSA signature with the key, and return the signature (or a hash of it) as the oracle output; (this includes any RSA USB token, the ssh-agent, and other similar agents;)
the signature operation based on Curve25519 algorithm, applying exactly the same steps as in the case of RSA; (as above, this includes many USB cryptographic tokens, including U2F-related, the ssh-agent, and other similar agents;)
the DHE construct with Curve25519 (and I assume other elliptic curve algorithms), provided that the oracle keeps both its public and private keys secret; we take the input and deterministically convert it into a Curve25519 (or similar) private key, then we execute the DHE algorithm between this input key pair and the oracle's key pair, and return the DHE shared secret as the oracle output (perhaps after hashing it together with the input to increase entropy);
we could then imagine constructs that are somewhat similar to scrypt or Argon2, but which instead of constructing the memory block based on the input, reuse existing sets of blocks, perhaps stored on physical medium;
any KMS-like implementation or service, from HashiCorp's Vault, to Linux's key management and retention facility, and including any managed offers from cloud providers; many of these either implement RSA or elliptic-curve algorithms, or have built-in support for key wrapping (in which case we wrap the hash of the input);
even HOTP-like algorithms, where the input is converted into the counter; (although these offer very weak security, as their output isn't large enough to prohibit brute-forcing;)

Once these "random oracles" have been demystified, and once we've seen that most real world instantiations are nothing more than implementations (or misusages) of existing cryptographic algorithms, we could ask ourselves why focusing on them beyond their purpose in theoretical cryptography?

Indeed, they are completely useless in practical terms (as opposed to just reusing their backing primitives), unless we take into account one key element: they are distinct entities, running outside our software, perhaps even on a remote machine or security enclave! Thus, just as many two-factor authentication mechanisms have nothing special in them, except the disconnect between them and the user's device.

Thus, we can think of such oracles as second factors that we could use in our cryptographic constructs. Ones which, if properly secured, might thwart at least brute-force attacks, or require the attacker a much wider foothold in the user's environment.

Where could we use them: for anything that accepts as input a password or shared key; for example we could use them to:

derive encryption keys;
derive strong passwords (to be used by other systems that perhaps don't implement any advanced security measures or multiple factors);
derive authentication keys, or even providing the authentication themselves (by proving that one can call such an oracle;) (basically what the whole WebAuthn thing is all about;)

For example I use them as additional encryption factors in my own z-tokens project, as currently implemented in z-tokens/exchange/crypto.rs.

How can one actually use these?

Assuming an oracle is embodied as a function oracle (bytes) -> (bytes), and assuming we have a sequence of such functions in an array (oracles), we need to:

foremost, make sure that the oracles array is in a canonical order, else just swapping the order of two oracles would yield completely different outputs; we could achieve this by sorting the oracles based on some defining factor (like for example type, identities, etc.);
we assume there is a function derive (context, [input, input, ...]) -> (hash) that is either HMAC-based, HKDF-based, or Blake3's derive_key, that takes two inputs:
- a static context (or purpose) used for (cryptographic) domain separation, that we hard-code into our software, and which should be globally unique;
- the one or more inputs we want to derive; (care must be taken to implement a canonical serialization of these into a single input; for example hashing them to equal length and then concatenating;)
- (in case of something HMAC-based, we can use the context as the key;)
- (in case of something HKDF-based, we can use the context as the key, and ignoring the salt;)
we start with
let intermediary_key = derive ("{oracle-intermediary-key-begin}", [input_key])
then, for oracle in oracles:
- let oracle_input = derive ("{oracle-input}", [intermediary_key, oracle.identifier]);
  (here we assume that each oracle has a canonical identifier or description, but this is optional and can be eliminated;)
- let oracle_output = oracle.execute (oracle_input);
- let intermediary_key = derive ("{oracle-intermediary-key-update}", [intermediary_key, oracle_output]);
  (see below why we should do this extra hashing;)
we finally
let output_key = derive ("{oracle-intermediary-key-end}", [intermediary_key]);
where all the "{...}" are the static globally unique context identifiers I've described initially;
if we want to use scrypt or Argon2 to derive any passwords, we could execute this on the output_key, so that an attacker is forced to pass through oracles every time, and thus is not able to pre-compute anything; but we need to trust the first oracle in the sequence, as he might execute brute-forcing against the initial input;

(Other constructs are possible, like for example allowing parallel execution, but I believe executing them serially yields the strongest security.)

Why do I use the derive function extensively?

for once, because I don't want to give the oracle any value that it can then use to attack me; at worst, if an oracle is compromised, it would behave just like it wouldn't have existed in the first place, and its step would be just some hard-coded unnecessary steps in the public algorithm;
then, we might reuse (or better said misuse) existing agents (like ssh-agent) and I don't want a rogue proper client of such agent to be able to trick my agent into executing a derivation; (for example, if we rely on ssh-agent, I don't want a rogue SSH server to trick my agent into executing a derivation);
and finally, because we can't trust the oracle to output truly random data, thus by hashing the output together with something that is already random, we can protect ourselves from bad oracle implementations;

What are the properties of such a construct:

the oracles can serve the role of a second (or third, fourth, etc.) factor in encryption / decryption;
thus, restricting access to these oracles provides road-blocks for those that might try to brute-force other factors used in the encryption (like passwords);
if an oracle is compromised, it doesn't break anything, in strength terms it just like it was never there to begin with;
an attacker that wants to brute-force anything, has to serially execute each oracle, and can't parallelize them;

!!! WARNING !!!
I am not a cryptographer, and I have not verified the algorithm above. I did ponder and developed this algorithm over the course of the last 5 months, and I do believe it doesn't hide anything nefarious. But I wouldn't trust this with important data before someone with experience takes a look!

In conclusion, I would have liked to discover this wonderful construct much earlier, especially since it's nothing magical about it, and also because it has many practical applications (like for example database record encryption, multi-factor encryption, etc.).

As always, any feedback (criticism or ideas alike) is welcome, and please get in touch with me as described in the feedback section.

As a bonus, here are a few open-source projects (which I haven't reviewed or endorse) that use the ssh-agent as such a random oracle: