Yet another incompatible alternative
As I started discussing in yesterday's article about binary-to-text encoding, I want to sell my own snake-oil in the form of an open-source project:
=> z-tokens -- random tokens generation and related tools
This is my own take on an "all token related swiss army knife tool" that, besides generating passwords / passphrases and other tokens, has this nice "exchange armor" / "exchange dearmor" sub-command that tries to put together in a single package some of the features I've discussed in the mentioned article.
Please note that at the moment the format is experimental, and most likely prone to backward incompatible changes!
The dump that is worth a thousand words
z-tokens exchange armor < /usr/share/syslinux/mbr.bin
#01, 2ymHZ9BtGEtTd 2gUAFKhkdGG6G bpmVaU3c5LJo 3U9aUZP9CToy 42umy5e4T9XME
#02, 3TPuiaMuQdHab 3Je8iXRUn2aAD 4zg1wNnj3Mmn 2fwxaqQFaALZF 32cf2bmqKr8Ez
#03, AFk2KBm56ujk 3DsRPK3Vd8UuU 2HDGG1rFap63P 3VB96ygabKH68 XUbLGpH1ddRR
#04, 34iM3hC9S1X7o 2iDftFE4SKcJp 3641UPkGoQRtf wRuC6S1WpJR BtJJZ7KoJTZi
#05, 2NQuVKc3ikJgM 3L5MRHQqRH2WH 2YVDeALy63kUx pghKDbvw7rqE 3QZK3MSPsC4Gf
#06, 3hEWV6sm1gVXJ 2SMEeaZn3SiLz 3211KnwHHX42K GqXM56S1nLrN 2GVrZYS6z8y72
#07, DBQ6gYpd6Vu5 3mTV76GnUwLbb Mm6P1gRs1cJx 2rG2v3Shu3VNJ 2UynV4L2FxHiK
#08, THXSoaBVgKAu jFPa1tzhSYqE 3t32YALoQNC1r 2hGAP4HKJ2m2p fwqsXuAJgUfj
#09, 2wD6VybecDXwo 2Fy4aq75Q52kF 2WdmPf5UK7Gz5 22SfKnQjbAdbH 3PyUawhEQg4Br
#10, 2qwmWNGkCkD5V 3DbLXJKqfEikW 2rbAtvoFwThfR 2U9P2QgCKyQHr g5R5fvRYGiHz
#11, 2CUChvS6D2Es 2gBBynWN3Andd 1gkYuuLNJCtU wwmEVTML5Zgu 3pJS9q8jYZ1xG
#12. YY
Features and helpful traits
What does it provide:
obviously, binary-to-text encoding; :)
compression, Brotli based, taken to the 11;
checksumming, Blake3 based, providing 96 bits of cryptographic-strength hashing;
all-or-nothing processing, and this isn't something implementation-specific: the encoding / decoding technique requires that you first have the whole text available, and only if all of it is correct can you have your output; (the downside being that if you lose a small chunk of the text, nothing is recoverable!)
Besides all that, it also focuses on human friendliness:
line numbers, thus you can safely copy-paste overlapping chunks and rearrange them in any way and still be able to re-assemble them with a simple:
paste | sort -u | z-tokens exchange dearmor
(however you can't safely swap words on the same line, not without triggering validation errors;)
copy-paste and reflowing friendliness; text editors and applications see these as lines of words, and except for the "#nn," markers (that could even be forgotten) only letters and numbers are used (in fact Base58 encoding);
butter-fingers safe; I'm certainly not speaking about myself, but I've seen others forget to actually write base64 | tool before pasting the Base64 snippet... with this new format, because each line (assuming it's not reflowed) starts with "#", the shell will just ignore it and do nothing;
error hinting (although the reporting is not yet implemented); each word contains an 8-bit checksum, thus if it's misplaced or miswritten there is a 99.6% chance (that is, 1 - 1/256) that the error is correctly reported as early as possible;
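To make the line-number trick concrete, here is a minimal Python sketch of what the paste | sort -u step achieves. The function name merge_pastes is made up for illustration (it's not part of z-tokens), and unlike sort it also trims surrounding whitespace. Duplicated lines from overlapping pastes collapse into one, and because the line numbers are zero-padded, a plain lexical sort puts them back in order.

```python
def merge_pastes(*pastes: str) -> str:
    """Merge overlapping copy-pasted fragments, roughly like `sort -u`.

    Hypothetical helper, only meant to illustrate the idea.
    """
    unique_lines = set()
    for paste in pastes:
        for line in paste.splitlines():
            line = line.strip()
            if line:
                unique_lines.add(line)
    # zero-padded markers (#01, #02, ...) sort lexically in numeric order
    return "\n".join(sorted(unique_lines))


# two overlapping fragments of the dump above, pasted in the "wrong" order
second = """
#02, 3TPuiaMuQdHab 3Je8iXRUn2aAD 4zg1wNnj3Mmn 2fwxaqQFaALZF 32cf2bmqKr8Ez
#03, AFk2KBm56ujk 3DsRPK3Vd8UuU 2HDGG1rFap63P 3VB96ygabKH68 XUbLGpH1ddRR
"""
first = """
#01, 2ymHZ9BtGEtTd 2gUAFKhkdGG6G bpmVaU3c5LJo 3U9aUZP9CToy 42umy5e4T9XME
#02, 3TPuiaMuQdHab 3Je8iXRUn2aAD 4zg1wNnj3Mmn 2fwxaqQFaALZF 32cf2bmqKr8Ez
"""

merged = merge_pastes(second, first)
print(merged)
```

The result would then be fed to z-tokens exchange dearmor, which still needs every line to be present (all-or-nothing, remember?).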
Undesired features
What it doesn't provide:
streaming (i.e. compressing unbounded streams of data); the tool supports by design at most 4 GiB of binary data, and by implementation at most 128 MiB of binary data; plus everything must fit into memory;
batch processing efficiency; the tool (and algorithm) isn't meant to be run in a tight loop, encoding millions of individual chunks; the tool is meant to be used by a human, at human speed; (the main inefficiency, but a big advantage, being the compression;)
encryption; although initially the "armor" / "dearmor" sub-commands supported something resembling a PIN, that feature has since been removed (but is available as the z-tokens exchange encrypt sub-command);
diff-ing; one can easily think of an encoding format that is diff friendly, so that changing bytes here and there won't trigger a cascade effect; however this format, because it uses an AONT, isn't one: change even a single byte, and most likely every output byte will be different;
How about encoding efficiency?
A single word: meh!
Move along! Nothing to see here!
z-tokens exchange armor < /etc/hosts | wc -c
#>> 998
cat < /etc/hosts | basenc --base64 | wc -c
#>> 2088
wc -c < /etc/hosts
#>> 1543
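For the curious, the Base64 figure can be double-checked without the file itself: 1543 bytes make 515 groups of 3 bytes (the last one padded), thus 2060 characters, which wrapped at 76 characters per line (the default for basenc, and also what Python's base64.encodebytes does) adds 28 newlines. A small Python sketch, using only the numbers from the transcript above; the zero-filled payload is just a stand-in for /etc/hosts, since the Base64 length depends only on the input length:

```python
import base64

original_size = 1543   # wc -c < /etc/hosts
armored_size = 998     # z-tokens exchange armor < /etc/hosts | wc -c
base64_size = 2088     # basenc --base64 | wc -c

# any 1543 bytes encode to the same Base64 length; zeros are a stand-in
dummy = bytes(original_size)
computed_base64_size = len(base64.encodebytes(dummy))  # wraps at 76 chars

print("base64 size matches:", computed_base64_size == base64_size)
print(f"armor:  {armored_size / original_size:.1%} of the original")
print(f"base64: {base64_size / original_size:.1%} of the original")
```

So for this particular file the armored text ends up at roughly two thirds of the original, while Base64 is roughly a third larger; the difference comes from the compression step, not from Base58 being a denser encoding.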
Tipping my hat to others
Were any of these features original? Obviously not!
However, I've yet to find a tool that combines all of the above in a single package!
Here are a few places I took inspiration from:
Looking under the hood
As mentioned, the code and format are still experimental, and I would like to play with them for a while before turning them into a proper proposal.
However, here is a short overview (as of the time of publishing) of the encoding algorithm:
compress the binary data with Brotli, and append to that the length of original binary data; (if the compressed data is larger or equal in length to the original data, drop the compression and just use the original data;)
compute the Blake3 96-bit hash of the resulting data (compressed then suffixed by length); (in cryptography the hash should have been done before the compression, but I want to be sure I don't feed broken data into the Brotli decompressor;)
apply an AONT (all or nothing transform); (this doesn't provide any cryptographic strength!)
split the resulting data into 8-byte chunks (no padding is needed, smaller chunks are acceptable);
compute the CRC8 checksum of each chunk, and append it; (also mix other signals into the CRC8 input, such as the chunk offset or whether it's the last chunk);
take each resulting chunk (at most 8 bytes of data plus 1 byte of CRC8 checksum) and encode it in raw-Base58 (without their own checksumming);
join the resulting words, 5 words per line;
prefix each line with "#nn," or "#nn.", where "nn" is the line number padded with 0's to properly align all resulting lines; use "." (instead of ",") for the last line; (if there is only one line, no prefix is needed;)
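The last few steps (chunking, per-chunk checksum, Base58 words, numbered lines) can be sketched in a few lines of Python. Please take this as an illustration and not as the actual z-tokens implementation: it skips the Brotli, Blake3 and AONT steps entirely, and several details are my assumptions here, namely the CRC8 variant (polynomial 0x07, zero initial value, no extra signals mixed in), the Bitcoin-style Base58 alphabet and leading-zero handling, and the exact padding width. Thus its output will not match what the real tool produces.

```python
# Assumption: the Bitcoin Base58 alphabet (no 0, O, I, l).
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def crc8(data: bytes, poly: int = 0x07) -> int:
    """Plain bitwise CRC-8 (assumed variant: poly 0x07, init 0)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def base58_encode(data: bytes) -> str:
    """Raw Base58: the bytes as one big number, leading zero bytes as '1'."""
    number = int.from_bytes(data, "big")
    digits = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        digits = BASE58_ALPHABET[remainder] + digits
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + digits


def armor_sketch(payload: bytes, chunk_size: int = 8, words_per_line: int = 5) -> str:
    # one word per chunk: at most 8 bytes of data plus 1 byte of CRC8
    words = []
    for offset in range(0, len(payload), chunk_size):
        chunk = payload[offset:offset + chunk_size]
        words.append(base58_encode(chunk + bytes([crc8(chunk)])))

    lines = [
        " ".join(words[index:index + words_per_line])
        for index in range(0, len(words), words_per_line)
    ]
    if len(lines) <= 1:
        return "\n".join(lines)  # a single line needs no prefix

    # assumption: pad to the number of digits of the last line number
    width = len(str(len(lines)))
    numbered = []
    for number, line in enumerate(lines, start=1):
        separator = "." if number == len(lines) else ","
        numbered.append(f"#{number:0{width}d}{separator} {line}")
    return "\n".join(numbered)


print(armor_sketch(bytes(range(100))))
```

With 100 bytes of input this yields 13 words (12 full chunks plus a 4-byte one) spread over 3 lines, the last one marked with "." instead of ",".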
The decoding algorithm applies all of the above in reverse, with the following observations:
split the input text on any whitespace into individual chunks;
ignore all empty chunks;
ignore all chunks starting with "#"; thus one can add as many comments as one wants, by making sure each comment word (and thus not just each line) is prefixed by "#";
apply the rest of the encoding scheme in reverse;
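The tokenization part of the decoder is simple enough to sketch in Python. The function name extract_words is hypothetical, and the sample text is a shortened mix of words from the dump above plus a made-up comment, so it is not something that would actually decode:

```python
def extract_words(text: str) -> list[str]:
    """Keep only the Base58 words of an armored text (hypothetical helper)."""
    # str.split() without arguments splits on any run of whitespace
    # and never yields empty chunks, which covers the first two rules
    return [chunk for chunk in text.split() if not chunk.startswith("#")]


armored = """
#01, 2ymHZ9BtGEtTd 2gUAFKhkdGG6G
# #every #comment #word #needs #its #own #prefix
#02. YY
"""

print(extract_words(armored))
```

Note that the line markers are dropped just like any other comment, which is why they "could even be forgotten"; a comment word without its own "#" would be treated as data and fail validation later on.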
Not yet convinced?
curl | bash alternative
What if instead of:
curl -L https://nixos.org/nix/install \
| bash -s -- --daemon
## or taking some inspiration from RustUp
curl --proto =https --tlsv1.2 -sSLf https://nixos.org/nix/install \
| bash -s -- --daemon
We could write:
z-tokens exchange dearmor << 'EOS' | bash -s -- --daemon
#01, 4Ee97kVpkDywW tGRXPAJgEs49 2SboChQsHG284 3NVnotR6pURSR 2bFvgrwRdg8Kx
#02, 28Czbx5F2RLZP 24wZh58BW6LLn 2fcy3fG11dxJ1 2DL1gZoz7ELm2 321YaubTZXzHr
#03, xWcPmgkpRiUf mNMVv5vBmJPE 2tSPLLvbnne5N YcrZEDbzgF5Z 3sX13dAQ8GoYW
#04, 3kX66iVVn7kji vpyEMacVP2eH 4AQZ7MpuL5SYz 3GcRBqyjp8qrp 4EAhq5j1Z2e6R
...
#41, 3ybPRnG4k61hb YTJRpdYXTvjy 2JSgqRRMbFFMG 3xmLgZ7LNAvoU dfJew7e8QcfK
#42, DbTCjg56CypK mwu9jUReX6bz 32ERojR93rtdX 2RZBu52uwg2mJ 3pcJMXPVLkqgg
#43, R2P9hSwDLSE7 2Ra8xdUjd4zbD 42L3R98j5pgno 2hTsTVTHT8Ecm 2rXMxEEEBti9q
#44. mhK2L6U2GhGp 32a
EOS
It's perhaps 45 lines longer, but at least it's somewhat safer, and at least reproducible (assuming the script is deterministic) at a later time.
Printed backups
Care to double-check each and every character of your OCR?
restic init -r ./backups
cat ./backups/keys/93ff9f940dc8de0d22fd93075659c879be12d72bbfc61ffc73a30452327b9df4 \
| z-tokens exchange armor
#01, 2iiT1Jc7g73FT 2iH2VmFsattXj 26z5LgkziQgzy erwACWogRTYe Qy9CLNRvZauL
#02, 2AiayWqhpzYqR 3SHvNCvXNRY8 3NKkoG1g5ZZST 3aWYhRrSc98Zu 3ndYfK8QJj3kK
#03, 2fVWT7XKXDyrs 2kEnZLEgiHHKi fNazuikccYm3 itrzum9a9F8v 3t2WJLQ9LTZw2
#04, 3GypVk8cfHGUN 3TEnfUcbNnFfo CvedQwHDiAb1 2z6peBbQsDnse 25Nu2NmPpXWdp
#05, j5h4Vk18ymkZ 2dwDZSp46Hbmv 34173BxNoG9wK 2VgRh6LtK8RWX fPNjdNWr39Lf
#06, 3SgGGQcb2Gaok 4WfQBqFX4mBX 2PCpkWJtD9hKS 2tAkcis45is7u 27TL86niS58c4
#07, 3n263eoNNqTbw 3RSZkmHiRVj5y NdpUjxfF2Sze 2QP9y25dZopMC 4C8AG3JN2dcAv
#08, 3ckDkKtQfDjqw Qs12o8KsDAU3 2gcXM1qMiuGRm BF6pUkdKZKKh 3JenYYrMS4iBU
#09, 2HEYTVvqLNr1K 9xh6FQBuLjzP Zqb2KDcCQLtQ 2tov7UCUeeYQG 2GMLFbNcNsQ2Q
#10. 2sjpzsbCxqjPo 218Cxnojckw8Q 3kAjBJhcEYHLe 287y9yLqx
cat ./backups/keys/93ff9f940dc8de0d22fd93075659c879be12d72bbfc61ffc73a30452327b9df4 \
| base64
eyJjcmVhdGVkIjoiMjAyMy0wMi0wNVQxNDoyNzozMi4wNjgxNjUzNDcrMDI6MDAiLCJ1c2VybmFt
ZSI6ImNpcHJpYW4iLCJob3N0bmFtZSI6InhpeWlwb2tlIiwia2RmIjoic2NyeXB0IiwiTiI6MzI3
NjgsInIiOjgsInAiOjQsInNhbHQiOiJCOEZKQzk1YjE5ZUVINERWakhvbjZnU1VlUDlWRm1IeDV4
ZGRDMnQ1aXdKVUVrdUVlQ1BMdGppU3BMWEFKYmhIdzUydXlMQXRObEVpS0NvY2Jma3Fjdz09Iiwi
ZGF0YSI6ImNqemE2cHYyL3Q5b0ZsaUJoMkpBNG1PMzRsWVVrOVdxMzBPNFVRcFp0eitlT01qVWov
a2ZMTzIySFN2bTdBSm9sSXNxVktseWFaUU9DVWVEYXF3eWpMRDFYbVZ5TTBNZS92bDdyZEdXVUdH
Y1dGbWJEejgwMjZxZENKM2lOQkVhMXovWnBWRmFaMUI3Wlg5WEd0UVVpamRUYXk0dEt5Y3RhUGxJ
cjJEWmRZblZrSFhzL3JGWFQrdDdWTFlmYm5XU0x4Rm54bnN1R1FvTWlYTzdKQWlpTmc9PSJ9
Comments and feedback
If you like (or even hate) the idea, let me know via my email in the feedback section, or open a discussion or issue in the project's repository over at GitHub.