[remark] Context binding password-based authentication -- Volution Notes

Warning
The following scheme and code should be considered highly experimental!
I have not analyzed the security strength and implications of the proposed scheme and code.
Also, I am not a cryptographer, thus I might be making huge mistakes.
Any feedback is welcome.

While working on an experimental project I wanted to try something "new" with regard to password-based authentication, something that would fulfill the following requirements (starting with the most important ones):

- the password hash must be bound to the following:
  - the user's email address (or login name);
  - the internal identity for the account (for example the table primary key);
  - (obviously, the user login password;)
- the whole scheme should be implementable fully within Postgres as a custom function (i.e. CREATE FUNCTION), leveraging only what the standard Postgres has to offer, or other extensions that are also available for most cloud offerings (like AWS RDS); (in what follows, we only need the pgcrypto and uuid-ossp extensions;)
- the password salt should be representable as a UUID (version 4 in this case, offering 122 bits of entropy);
- the password hash should also be representable as a UUID (version 5 in this case, also offering 122 bits of entropy);
- the password hashing algorithm must be suitable for the task;

Why should the password hash be bound to the user email and the internal account identity?

Because we get the following feature: someone can't just copy-paste the (hashed) password from one account to another.

For example, consider an attacker who has write access to the accounts database, but no access to the other parts of the database that hold the important user data he wants to obtain. He could create a new account (that he controls, and thus knows the password of), swap the hashed passwords of the two accounts (the fake one and the targeted one) in the database, authenticate normally via the web (or another) interface, exfiltrate the important user data, and finally swap the hashed passwords back. Binding the password hash to the internal account identity prevents this type of attack, because a hash copied to another account no longer verifies.

(Obviously, the attacker can swap the user email address and undertake a password reset attack, and this should be handled as an independent issue.)
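
The swap attack can be illustrated with a minimal sketch. It assumes a stand-in `toy_hash` built on PBKDF2 (not the Bcrypt-based scheme described in this note) and a hypothetical two-row accounts table held in a dictionary; the only point is that a hash bound to the account identity stops verifying once it is copied to another row.

```python
import hashlib
import hmac

def toy_hash(account_id: str, email: str, password: str, bind: bool) -> bytes:
    # Stand-in for a real password hash (assumption: PBKDF2 with a fixed toy salt).
    # When `bind` is true, the account identity and email are mixed into the salt.
    context = (account_id + "\0" + email).encode("utf8") if bind else b""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf8"), b"toy-salt" + context, 1000)

def login(accounts: dict, account_id: str, password: str, bind: bool) -> bool:
    row = accounts[account_id]
    expected = toy_hash(account_id, row["email"], password, bind)
    return hmac.compare_digest(expected, row["hash"])

def swap_attack_succeeds(bind: bool) -> bool:
    accounts = {
        "victim": {"email": "victim@example.com"},
        "attacker": {"email": "attacker@example.com"},
    }
    accounts["victim"]["hash"] = toy_hash("victim", "victim@example.com", "victim-secret", bind)
    accounts["attacker"]["hash"] = toy_hash("attacker", "attacker@example.com", "attacker-known", bind)
    # The attacker overwrites the victim's stored hash with his own...
    accounts["victim"]["hash"] = accounts["attacker"]["hash"]
    # ...and then tries to log in as the victim with the password he knows.
    return login(accounts, "victim", "attacker-known", bind)

print(swap_attack_succeeds(bind=False))  # True: the copied hash verifies
print(swap_attack_succeeds(bind=True))   # False: the copied hash is rejected
```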

Why store the password salt and hash as UUIDs?

Because UUIDs are ubiquitous, available in most programming languages and databases (if not available, they can be easily stored as strings), can easily be printed (as strings), and if needed, can easily be converted to raw bytes through hex encoding.

Moreover, at least in Postgres, they are efficiently represented as 16 bytes. Thus, two UUIDs have an overhead of 32 bytes, compared to a standard Bcrypt string of 60 characters (which depending on the database encoding might mean more than 60 bytes).
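
As a quick check of the storage figures, the following sketch compares the raw size of two UUIDs with the length of a standard Bcrypt string. The values are the ones from the worked example further down in this note; nothing here is specific to Postgres.

```python
import uuid

# Values taken from the worked example in this note.
nonce = uuid.UUID("94b81ffc-1803-418b-8eb4-b73243c34bfb")
password_hash = uuid.UUID("c119df3b-d187-5414-9c62-78d3ce67fcf8")
bcrypt_string = "$2a$10$fnQuhlc8k0hTs0TkCfL08O3xz8XB3LioTk8TpXk/VZWXxrVuPXfCi"

# Each UUID is 16 raw bytes, so salt plus hash take 32 bytes.
uuid_pair_size = len(nonce.bytes) + len(password_hash.bytes)
# The Bcrypt string is pure ASCII, so one byte per character in UTF-8.
bcrypt_size = len(bcrypt_string.encode("utf8"))

print(uuid_pair_size)  # 32
print(bcrypt_size)     # 60
```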

Why implement the whole scheme directly in Postgres?

Because we can move most of the "sensitive" code paths within the database, and if one needs to change the backend, most code wouldn't need to be rewritten.

Also, because one would get a fully-featured REPL for free, via the standard Postgres psql shell.

About choosing the password hashing function...

Well, it's Bcrypt, for no other reason than:

- it is still considered "safe", but marked as "obsolete" in most guides (like OWASP Password Storage Cheat Sheet);
- it is the only "safe" algorithm available in Postgres' pgcrypto extension;
- and luckily, unlike scrypt or Argon2, it has a reduced memory footprint (only 4 KiB), thus making it suitable to run inside a database;

The work factor is chosen (and hard-coded) as 10, which given that Bcrypt uses a logarithmic work factor, implies 1024 iterations. It is the current minimum recommended by OWASP. It takes on average around 100 ms per hash, thus not a large burden on the database.
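
The arithmetic behind the work factor can be sketched as follows. The helper names are made up for illustration, and the timing estimate simply scales the roughly 100 ms figure quoted above; it is not a measurement.

```python
def bcrypt_rounds(cost: int) -> int:
    # The Bcrypt cost is the base-2 logarithm of the number of key-setup rounds.
    return 2 ** cost

def estimated_ms(cost: int, baseline_cost: int = 10, baseline_ms: float = 100.0) -> float:
    # Rough estimate only: each +1 in cost doubles the work.
    # The 100 ms baseline at cost 10 is the figure quoted above, not a benchmark.
    return baseline_ms * 2 ** (cost - baseline_cost)

for cost in (10, 11, 12):
    print(cost, bcrypt_rounds(cost), estimated_ms(cost))
```

Under these assumptions, raising the work factor from 10 to 12 would quadruple the time spent per hash inside the database.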

(Hard-coding the password hash function and iterations could make future migrations to other schemes difficult, however it is not a concern, as one can just add a new column stating the scheme used.)

There are a few concerns about Bcrypt (specifically about the way it is implemented or used):

- the password size is limited to 72 bytes;
  - although, Schneier (the author of the Blowfish algorithm, an underlying component of Bcrypt) states that the key must have at most 448 bits (thus 56 bytes);
- some Bcrypt implementations don't handle `\0` bytes inside a password well;
- due to the wide deployment of password hashing schemes involving plain-hashes (i.e. MD5, SHA1, etc.), plain-Bcrypt, and Bcrypt-of-plain-hash (i.e. Bcrypt(MD5(password))), there is an attack called password shucking that we must protect against;
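
To see how easily the 72 byte limit can be reached, here is a small sketch that measures a password the way Bcrypt would see it, as UTF-8 bytes and not as characters. The helper names are hypothetical.

```python
import unicodedata

BCRYPT_MAX_BYTES = 72

def bcrypt_input_length(password: str) -> int:
    # Bcrypt counts bytes, so measure the UTF-8 encoding, not the characters.
    return len(unicodedata.normalize("NFC", password).encode("utf8"))

def exceeds_bcrypt_limit(password: str) -> bool:
    return bcrypt_input_length(password) > BCRYPT_MAX_BYTES

ascii_password = "a" * 72             # 72 characters, 72 bytes
emoji_password = "\U0001F511" * 20    # 20 characters, 4 UTF-8 bytes each

print(bcrypt_input_length(ascii_password), exceeds_bcrypt_limit(ascii_password))
print(bcrypt_input_length(emoji_password), exceeds_bcrypt_limit(emoji_password))
```

A password of only 20 characters can thus already be too long for plain Bcrypt, which is one more reason to hash the password down to a fixed size first.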

Thus, because of the previous concerns and other practicalities, the following choices are made:

- the password is first Unicode NFC normalized, then encoded as UTF-8 into a series of bytes;
- then the password bytes are passed through a hash function yielding 256 bits, by using "cryptographic domain separation"; (see the implementation below;)
- then the resulting 256 bits of hash output are encoded using standard Base64;
  - (256 bits raw, encoded as Base64, yields 44 ASCII characters, thus well under the 72/56 bytes limit of Bcrypt/Blowfish;)
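
The effect of these choices can be sketched in isolation. The `prehash` helper and the `example:purpose` key below are placeholders for illustration only (the real derivation, with its key tree, follows later); the sketch shows that two visually identical passwords typed with different Unicode forms end up as the same 44 character string.

```python
import base64
import hashlib
import hmac
import unicodedata

def prehash(password: str, purpose: bytes) -> str:
    # Normalize, encode, hash to 256 bits, then Base64 encode.
    raw = unicodedata.normalize("NFC", password).encode("utf8")
    digest = hmac.digest(purpose, raw, hashlib.sha3_256)
    return base64.standard_b64encode(digest).decode("ascii")

purpose = b"example:purpose"    # placeholder, not the constant used by the scheme

composed = "caf\u00e9"          # "é" as a single code point
decomposed = "cafe\u0301"       # "e" followed by a combining acute accent

print(composed == decomposed)                                      # False
print(prehash(composed, purpose) == prehash(decomposed, purpose))  # True
print(len(prehash(composed, purpose)))                             # 44
```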

Instead of pseudo-code, here is the working Python3 implementation, that one can easily copy-paste into a Python3 shell:

all these imports, except for bcrypt, are part of the standard Python3 library:

import bcrypt
import hashlib
import hmac
import base64
import uuid
import unicodedata

these are the inputs to our password hashing scheme:
- the handle is the internal account identifier, a UUID itself;
- the nonce is the password salt stored in the database; (we call it "nonce" because "salt" already has a meaning in the password hashing algorithm;)
- the email_input and password_input are login name and password, Unicode strings of any length; (care must be taken to validate the password strength, and various other constraints that are out of the scope of this article;)
```
_handle_input = "6a9e4086-b11e-4833-86eb-09aa2676c13f"
_nonce_input = "94b81ffc-1803-418b-8eb4-b73243c34bfb"
_email_input = "person@example.com"
_password_input = "password"
```

transform the inputs into raw bytes:

_handle_raw = bytes.fromhex (_handle_input.replace ("-", ""))
_nonce_raw = bytes.fromhex (_nonce_input.replace ("-", ""))
_email_raw = unicodedata.normalize ("NFC", _email_input) .encode ("utf8")
_password_raw = unicodedata.normalize ("NFC", _password_input) .encode ("utf8")
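
As a side note, the hex-based conversion of the UUIDs could also be done with the standard `uuid` module, which additionally rejects malformed input. A small sketch, with variable names chosen only for this example:

```python
import uuid

handle_input = "6a9e4086-b11e-4833-86eb-09aa2676c13f"

# The approach used above: drop the dashes and decode the hex digits.
via_hex = bytes.fromhex(handle_input.replace("-", ""))
# The alternative: parse as a UUID and take its 16 raw bytes.
via_uuid = uuid.UUID(handle_input).bytes

print(via_hex == via_uuid, len(via_uuid))

# bytes.fromhex accepts any even number of hex digits, uuid.UUID does not.
try:
    uuid.UUID("abcd")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```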

in what follows, ignore the .encode ("ascii") or .encode ("utf8") (and the .decode (...) variant), as these are only needed to switch between the Python string (str) and Python byte string (bytes) types that various functions expect; (the Postgres implementation is much more straightforward;)

these hard-coded strings (hashed to 256 bits) help us implement cryptographic domain separation:

_purpose_core = "skeldvakt:password-based-authentication:2024a"
_purpose_derive = hashlib.sha3_256 ((_purpose_core + ":derive") .encode ("utf8")) .digest ()
_purpose_password = hashlib.sha3_256 ((_purpose_core + ":password") .encode ("utf8")) .digest ()
_purpose_salt = hashlib.sha3_256 ((_purpose_core + ":salt") .encode ("utf8")) .digest ()
_purpose_hash = hashlib.sha3_256 ((_purpose_core + ":hash") .encode ("utf8")) .digest ()

deriving cryptographic materials actually used for password hashing:
- here we create a "cryptographic key tree", rooted in a derivation key, obtained based on the handle (i.e. the internal account identity) and the nonce (i.e. the internal password salt);
- the Bcrypt password is a HMAC keyed by the derivation key (prefixed with a specific constant purpose for domain separation) and the input password;
- the Bcrypt salt is a HMAC keyed by the derivation key (prefixed with another specific constant) and the input email;
- (a similar chain of HMAC-derived keys is used by the AWS Signature Version 4 request signing scheme;)
```
_derive_key = hmac.digest (_purpose_derive + _nonce_raw, _handle_raw, hashlib.sha3_256)
_password_key = hmac.digest (_purpose_password + _derive_key, _password_raw, hashlib.sha3_256)
_salt_key = hmac.digest (_purpose_salt + _derive_key, _email_raw, hashlib.sha3_256)
```
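
To make the shape of the key tree easier to see, the three derivations can be wrapped in a function and probed with different inputs. This is only a sketch: `derive_keys` and its helpers are names invented here, and the second handle is an arbitrary UUID.

```python
import hashlib
import hmac
import unicodedata
import uuid

PURPOSE_CORE = "skeldvakt:password-based-authentication:2024a"

def _purpose(label: str) -> bytes:
    return hashlib.sha3_256((PURPOSE_CORE + ":" + label).encode("utf8")).digest()

def _mac(key: bytes, data: bytes) -> bytes:
    return hmac.digest(key, data, hashlib.sha3_256)

def derive_keys(handle: str, nonce: str, email: str, password: str):
    handle_raw = uuid.UUID(handle).bytes
    nonce_raw = uuid.UUID(nonce).bytes
    email_raw = unicodedata.normalize("NFC", email).encode("utf8")
    password_raw = unicodedata.normalize("NFC", password).encode("utf8")
    # Root of the tree: depends only on the nonce and the handle.
    derive_key = _mac(_purpose("derive") + nonce_raw, handle_raw)
    # Two leaves, separated by their purpose constants.
    password_key = _mac(_purpose("password") + derive_key, password_raw)
    salt_key = _mac(_purpose("salt") + derive_key, email_raw)
    return derive_key, password_key, salt_key

HANDLE = "6a9e4086-b11e-4833-86eb-09aa2676c13f"
NONCE = "94b81ffc-1803-418b-8eb4-b73243c34bfb"
OTHER_HANDLE = "00000000-0000-4000-8000-000000000001"  # arbitrary, for illustration

base = derive_keys(HANDLE, NONCE, "person@example.com", "password")
other_email = derive_keys(HANDLE, NONCE, "other@example.com", "password")
other_handle = derive_keys(OTHER_HANDLE, NONCE, "person@example.com", "password")
same_text = derive_keys(HANDLE, NONCE, "same", "same")

# Changing the email only changes the salt key.
print(base[0] == other_email[0], base[1] == other_email[1], base[2] != other_email[2])
# Changing the handle changes every key.
print(all(a != b for a, b in zip(base, other_handle)))
# Identical email and password texts still give different keys (domain separation).
print(same_text[1] != same_text[2])
```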
because Bcrypt (in most C-based implementations) expects a \0 terminated string, and the raw key material may contain \0 bytes, we encode the password material as Base64:
```
_password_base64 = base64.standard_b64encode (_password_key) .decode ("ascii")
```
in a similar manner we Base64 encode the salt material, but:
- due to the fact that the standard crypt function (which exposes the Bcrypt algorithm) uses a different Base64 alphabet, we need to translate from "standard Base64" to "crypt Base64";
- Bcrypt uses a 16 byte salt, thus we need to truncate our salt material;
- Base64 encoding 16 bytes yields the Base64 trailer of ==, thus we drop that, keeping only the first 22 characters of the encoding;
```
_salt_raw = _salt_key [0:16]
_salt_base64 = base64.standard_b64encode (_salt_raw) .decode ("ascii") [0:22]
_salt_crypt = _salt_base64.translate (str.maketrans ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/", "./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"))
```
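
The alphabet translation can also be packaged as a pair of helpers and checked against the salt from the worked example. The function names are made up for this sketch; only the two alphabets come from the snippet above.

```python
import base64

STANDARD_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
CRYPT_ALPHABET = "./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

_TO_CRYPT = str.maketrans(STANDARD_ALPHABET, CRYPT_ALPHABET)
_TO_STANDARD = str.maketrans(CRYPT_ALPHABET, STANDARD_ALPHABET)

def encode_bcrypt_salt(salt_raw: bytes) -> str:
    if len(salt_raw) != 16:
        raise ValueError("Bcrypt expects a 16 byte salt")
    standard = base64.standard_b64encode(salt_raw).decode("ascii")
    # 16 bytes always encode to 22 characters followed by "==".
    return standard.rstrip("=").translate(_TO_CRYPT)

def decode_bcrypt_salt(salt_crypt: str) -> bytes:
    standard = salt_crypt.translate(_TO_STANDARD)
    return base64.standard_b64decode(standard + "==")

salt_raw = bytes.fromhex("8694b08e77be9b68d5bb6566121376f9")
salt_crypt = encode_bcrypt_salt(salt_raw)

print(salt_crypt)                                  # fnQuhlc8k0hTs0TkCfL08O
print(decode_bcrypt_salt(salt_crypt) == salt_raw)  # True
```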

we compute the standard crypt (with the Bcrypt algorithm) "salt" and resulting password "hash" (see the Bcrypt algorithm for details):

_crypt_strength = 10
_crypt_salt = ("$2a$" + ("%02d" % _crypt_strength) + "$" + _salt_crypt)
_crypt_hash = bcrypt.hashpw (_password_base64.encode ("ascii"), _crypt_salt.encode ("ascii")) .decode ("ascii")

we extract the standard crypt password "hash", convert it from "crypt Base64" to "standard Base64", and decode it to raw bytes (23 of them):

_hash_crypt = _crypt_hash [29:]
_hash_base64 = _hash_crypt.translate (str.maketrans ("./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789", "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"))
_hash_raw = base64.standard_b64decode (_hash_base64 + "=")
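
The layout of the crypt string can be double-checked without running Bcrypt at all, by taking apart the string from the worked example. The `split_bcrypt_string` helper below is a sketch written for this note, not part of any library.

```python
import base64

STANDARD_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
CRYPT_ALPHABET = "./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

def split_bcrypt_string(crypt_hash: str):
    # Layout: "$" version "$" cost "$", then 22 salt characters and 31 hash characters.
    _, version, cost, payload = crypt_hash.split("$")
    salt_crypt, hash_crypt = payload[:22], payload[22:]
    hash_base64 = hash_crypt.translate(str.maketrans(CRYPT_ALPHABET, STANDARD_ALPHABET))
    # 31 characters plus one "=" of padding decode to 23 bytes.
    hash_raw = base64.standard_b64decode(hash_base64 + "=")
    return version, int(cost), salt_crypt, hash_raw

crypt_hash = "$2a$10$fnQuhlc8k0hTs0TkCfL08O3xz8XB3LioTk8TpXk/VZWXxrVuPXfCi"
version, cost, salt_crypt, hash_raw = split_bcrypt_string(crypt_hash)

print(version, cost, salt_crypt)
print(hash_raw.hex(), len(hash_raw))
```

The 7 character prefix plus the 22 salt characters explain the offset of 29 used in the slicing above.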

if the Python Bcrypt module had offered us a lower-level construct, then the previous two snippets could have been written as follows (the function below is hypothetical and does not exist in the module):
```
_hash_raw = bcrypt.hashpw_low_level (_password_base64, _crypt_strength, _salt_raw)
```

finally we derive the hash cryptographic material (again with domain separation):

_hash_key = hmac.digest (_purpose_hash + _derive_key, _hash_raw, hashlib.sha3_256)

and because we want to get a UUID out of it, we rely on UUIDv5, which hashes (with SHA-1) a namespace UUID (all zeroes in our case) followed by a name, in our case the hex representation of the hash material:
```
_hash_hex = _hash_key.hex ()
_hash_output = uuid.uuid5 (uuid.UUID (int = 0), _hash_hex)
```
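
For those curious about what UUIDv5 does with its input, here is a sketch that rebuilds it by hand and compares the result with the standard library. The `uuid5_by_hand` name is made up; the construction (SHA-1 over the namespace bytes followed by the name, truncated to 16 bytes, with the version and variant bits overwritten) is the one defined for name-based UUIDs.

```python
import hashlib
import uuid

def uuid5_by_hand(namespace: uuid.UUID, name: str) -> uuid.UUID:
    digest = hashlib.sha1(namespace.bytes + name.encode("utf8")).digest()
    raw = bytearray(digest[:16])
    raw[6] = (raw[6] & 0x0F) | 0x50   # 4 bits overwritten with version 5
    raw[8] = (raw[8] & 0x3F) | 0x80   # 2 bits overwritten with the RFC 4122 variant
    return uuid.UUID(bytes=bytes(raw))

# The hex representation of the hash material from the worked example.
name = "cb2408885909a9b0112fd0bfa423e42547219b10fce04bf887a1bd1aea25d872"

by_hand = uuid5_by_hand(uuid.UUID(int=0), name)
by_library = uuid.uuid5(uuid.UUID(int=0), name)

print(by_hand == by_library)   # True
print(by_hand.version)         # 5
```

The 6 overwritten bits are why only 122 of the 128 bits carry hash output.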

Observations:

if one wants to implement password peppering, one could thread the pepper as part of the derive_key computation, like for example:

_pepper_input = "pepper"
_pepper_raw = unicodedata.normalize ("NFC", _pepper_input) .encode ("utf8")
_purpose_pepper = hashlib.sha3_256 ((_purpose_core + ":pepper") .encode ("utf8")) .digest ()
_pepper_key = hmac.digest (_purpose_pepper + _nonce_raw, _pepper_raw, hashlib.sha3_256)
_derive_key = hmac.digest (_purpose_derive + _pepper_key, _handle_raw, hashlib.sha3_256)

if one wants to bind the resulting password hash to more context than just the email, one could iterate hashing the derivation key, like for example:

_associated_inputs = ["associated-1", "associated-2"]
_associated_raws = [unicodedata.normalize ("NFC", _associated_input) .encode ("utf8") for _associated_input in _associated_inputs]
_purpose_associated = hashlib.sha3_256 ((_purpose_core + ":associated") .encode ("utf8")) .digest ()
for _associated_raw in _associated_raws :
    _derive_key = hmac.digest (_purpose_associated + _derive_key, _associated_raw, hashlib.sha3_256)

if one wants to derive other cryptographic key materials based on the password, one could replicate the way _hash_key is computed, like for example:

_purpose_something = hashlib.sha3_256 ((_purpose_core + ":something") .encode ("utf8")) .digest ()
_something_key = hmac.digest (_purpose_something + _derive_key, _hash_raw, hashlib.sha3_256)

Running the code above, and printing all the intermediary steps, yields:

print ("## inputs")
print ("-> handle-raw  %s (%d) | %s" % (_handle_raw.hex (), len (_handle_raw), _handle_input))
print ("-> nonce-raw   %s (%d) | %s" % (_nonce_raw.hex (), len (_nonce_raw), _nonce_input))
print ("-> email-raw   %s (%d) | %s (%d)" % (_email_raw.hex (), len (_email_raw), _email_input, len (_email_input)))
print ("-> passw-raw   %s (%d) | %s (%d)" % (_password_raw.hex (), len (_password_raw), _password_input, len (_password_input)))
print ("-> derive-key  %s (%d) | %s (%d)" % (_derive_key.hex (), len (_derive_key), _purpose_derive.hex (), len (_purpose_derive)))
print ("-> passw-key   %s (%d) | %s (%d)" % (_password_key.hex (), len (_password_key), _purpose_password.hex (), len (_purpose_password)))
print ("-> salt-key    %s (%d) | %s (%d)" % (_salt_key.hex (), len (_salt_key), _purpose_salt.hex (), len (_purpose_salt)))
print ("## crypt")
print ("-> pass-base64 %s (%d)" % (_password_base64, len (_password_base64)))
print ("-> salt-bytes  %s (%d)" % (_salt_raw.hex (), len (_salt_raw)))
print ("-> salt-base64 %s (%d)" % (_salt_base64, len (_salt_base64)))
print ("-> salt-crypt  %s (%d)" % (_salt_crypt, len (_salt_crypt)))
print ("-> crypt-salt  %s (%d)" % (_crypt_salt, len (_crypt_salt)))
print ("-> crypt-hash  %s (%d)" % (_crypt_hash, len (_crypt_hash)))
print ("-> hash-crypt  %s (%d)" % (_hash_crypt, len (_hash_crypt)))
print ("-> hash-base64 %s (%d)" % (_hash_base64, len (_hash_base64)))
print ("-> hash-raw    %s (%d)" % (_hash_raw.hex (), len (_hash_raw)))
print ("## outputs")
print ("-> hash-key    %s (%d) | %s (%d)" % (_hash_key.hex (), len (_hash_key), _purpose_hash.hex (), len (_purpose_hash)))
print ("-> hash-hex    %s (%d)" % (_hash_hex, len (_hash_hex)))
print ("-> hash-output %s" % (_hash_output))

## inputs
-> handle-raw  6a9e4086b11e483386eb09aa2676c13f (16) | 6a9e4086-b11e-4833-86eb-09aa2676c13f
-> nonce-raw   94b81ffc1803418b8eb4b73243c34bfb (16) | 94b81ffc-1803-418b-8eb4-b73243c34bfb
-> email-raw   706572736f6e406578616d706c652e636f6d (18) | person@example.com (18)
-> passw-raw   70617373776f7264 (8) | password (8)
-> derive-key  eb4c3b0b8dd6cc06310ee0d759116fbc1a1b8b706e38de8dfc293d2fcff81390 (32) | 1c666ffbc42e8225563d7f6b0988dc6c218a882d545b62935c34071c14f2bcb2 (32)
-> passw-key   71622f4d8649f3e7c73f1365e14e5fe0802dc2f0cb339d438f2357fdb63db90c (32) | 2e197c7520e2159323f97d1fa62f96a64e4c495df0a17ea05e127743bcfd35b7 (32)
-> salt-key    8694b08e77be9b68d5bb6566121376f9d515ae9a6a9d035e60557fcbc4d84d5c (32) | 66a7b84a9bb80a91b477e5746a4d29e8c674a8fe8c5f2b145db533b303f14644 (32)
## crypt
-> pass-base64 cWIvTYZJ8+fHPxNl4U5f4IAtwvDLM51DjyNX/bY9uQw= (44)
-> salt-bytes  8694b08e77be9b68d5bb6566121376f9 (16)
-> salt-base64 hpSwjne+m2jVu2VmEhN2+Q (22)
-> salt-crypt  fnQuhlc8k0hTs0TkCfL08O (22)
-> crypt-salt  $2a$10$fnQuhlc8k0hTs0TkCfL08O (29)
-> crypt-hash  $2a$10$fnQuhlc8k0hTs0TkCfL08O3xz8XB3LioTk8TpXk/VZWXxrVuPXfCi (60)
-> hash-crypt  3xz8XB3LioTk8TpXk/VZWXxrVuPXfCi (31)
-> hash-base64 5z1+ZD5NkqVm+VrZmBXbYZztXwRZhEk (31)
-> hash-raw    e73d7e643e4d92a566f95ad99815db619ced5f04598449 (23)
## outputs
-> hash-key    cb2408885909a9b0112fd0bfa423e42547219b10fce04bf887a1bd1aea25d872 (32) | 1b289a071d0c392722e415298c7b5140ce1fbabaecf2232bf7e4bce738276df2 (32)
-> hash-hex    cb2408885909a9b0112fd0bfa423e42547219b10fce04bf887a1bd1aea25d872 (64)
-> hash-output c119df3b-d187-5414-9c62-78d3ce67fcf8

And here is the corresponding Postgres function implementation:

create or replace function password_hash
  (
    in _handle uuid,
    in _nonce uuid,
    in _email text,
    in _password text
  )
  returns uuid
  returns null on null input
  immutable
  leakproof

  language 'plpgsql'

as $__plpgsql_code__$

  declare

    _handle_input constant uuid default _handle;
    _nonce_input constant uuid default _nonce;
    _email_input constant text default _email;
    _password_input constant text default _password;

    _handle_raw constant bytea default uuid_send (_handle_input);
    _nonce_raw constant bytea default uuid_send (_nonce_input);
    _email_raw constant bytea default convert_to (normalize (_email_input, nfc), 'utf8');
    _password_raw constant bytea default convert_to (normalize (_password_input, nfc), 'utf8');

    _purpose_core constant text default 'skeldvakt:password-based-authentication:2024a';
    _purpose_derive bytea default digest (_purpose_core || ':derive', 'sha3-256');
    _purpose_password bytea default digest (_purpose_core || ':password', 'sha3-256');
    _purpose_salt bytea default digest (_purpose_core || ':salt', 'sha3-256');
    _purpose_hash bytea default digest (_purpose_core || ':hash', 'sha3-256');

    _derive_key constant bytea default hmac (_handle_raw, _purpose_derive || _nonce_raw, 'sha3-256');
    _password_key constant bytea default hmac (_password_raw, _purpose_password || _derive_key, 'sha3-256');
    _salt_key constant bytea default hmac (_email_raw, _purpose_salt || _derive_key, 'sha3-256');

    _password_base64 constant text default encode (_password_key, 'base64');
    _salt_raw constant bytea default substring (_salt_key, 1, 16);
    _salt_base64 constant text default substring (encode (_salt_raw, 'base64'), 1, 22);
    _salt_crypt constant text default translate (_salt_base64, 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/', './ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789');
    _crypt_strength constant int8 default 10;
    _crypt_salt constant text default ('$2a$' || lpad (format ('%s', _crypt_strength), 2, '0') || '$' || _salt_crypt);
    _crypt_hash constant text default crypt (_password_base64, _crypt_salt);
    _hash_crypt constant text default substring (_crypt_hash, 30);
    _hash_base64 constant text default translate (_hash_crypt, './ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789', 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/');
    _hash_raw constant bytea default decode (_hash_base64 || '=', 'base64');

    _hash_key constant bytea default hmac (_hash_raw, _purpose_hash || _derive_key, 'sha3-256');
    _hash_hex constant text default encode (_hash_key, 'hex');

    _hash_output constant uuid default uuid_generate_v5 (uuid_nil (), _hash_hex);

  begin
    return _hash_output;
  end;

$__plpgsql_code__$;

> select password_hash (
        '6a9e4086-b11e-4833-86eb-09aa2676c13f',
        '94b81ffc-1803-418b-8eb4-b73243c34bfb',
        'person@example.com',
        'password'
    );

            password_hash
--------------------------------------
 c119df3b-d187-5414-9c62-78d3ce67fcf8
(1 row)