TLDR
Just read the proposed solution section, which boils down to:
Let's drop the concept of software packages, and let's adopt the self-contained binary executable as the new unit of software distribution!
If my words don't sway you, here is a list (that I'll keep updating in the future) of links to other articles touching on similar ideas:
Modern software distribution
AKA the "actually free with no-ads app store" of the Linux and BSD worlds. (Although it remains to be seen for how long the "no-ads" statement will hold...)
One of the major differentiators of Linux and the BSDs when compared to Windows (and to some extent OSX) is the package manager, which fulfills a few important tasks:
discovery -- searching for software packages (mostly open-source) that are known to work on one's machine;
fetching -- downloading (and verifying the signature of) all the files required to run the software on one's machine, usually in the form of bundled packages (thus the name "package manager"); and it does this not only for the actual software files (mainly executables, libraries, resources, or documentation), but also for dependent files (mainly other libraries or resources);
deployment -- installing, updating, and to some extent configuring (mainly through installation hooks) both the actual software and its (far too many most of the time) dependencies;
but, most importantly, solving Sudoku -- given that many package managers are SAT solvers, one could just cheat at Sudoku and let the package manager solve the puzzle, provided that one is able to express it in terms of dependency constraints;
I always remember with great dread the time when I was using Windows and I had to reinstall it: get the "install kits", both for software and drivers, which most of the time were more cherished than grandma's jewelry because they were unique, priceless executables found only on the CD that came with the expansion card; then sit through endless sessions of next-next-next wizards; rinse and repeat every 6 months, or until it crashed more than 5 times a day...
And, if it weren't for Homebrew, OSX users would suffer from the same fate on every fresh install...
Also, I was recently made aware that there is Chocolatey for Windows, which seems to have ~9K packages available.
Sustainability of the status-quo
Unfortunately, especially as of late, say for the last 4 or 5 years, things have started to regress on the package management front... (No, it isn't the Sudoku solving capability, it's still there, and it seems mandatory for any non-trivial distribution-wide upgrade...)
It's that one, especially when working in a developer capacity, can't always find the software packages one is looking for (and not only the latest version, but any version thereof).
Granted, for most distributions -- Debian, ArchLinux, Gentoo, FreeBSD, OpenBSD, etc., and to some extent OpenSUSE, Ubuntu, Fedora, etc. -- package repositories are maintained exclusively by volunteers, who, in their spare time, and without any compensation (except complaints that they don't do enough), have to keep up with the upstream release cycle: patch, build, test, sometimes document, package, release, etc.
Unfortunately, their life isn't made easier, for one due to the explosion of open-source projects that have to be integrated into their distribution, but also due to a few factors that are more in the developers' corner:
- increased usage of languages and run-times that don't fit the current build model of `./configure && make && make install`; they are written in languages like Rust / Go / NodeJS, and to some extent Python / Ruby / Haskell / Java, and even emerging ones like Zig / Nim; these new languages not only come with different tools, but also with their own language-specific dependency managers (some of which I bet are themselves Sudoku solving engines);
- some of these new projects have large numbers (sometimes in the tens, hundreds, and even thousands) of dependencies; given the current build model, each of these dependencies has to be individually packaged and released; (here is an analysis of dependencies for NodeJS and Rust;)
- on top of that, many projects have conflicting (recursive) dependency requirements (I knew that the capability of solving Sudoku would have its purpose somewhere); previously, distributions managed to provide a single version of a given library, which no longer holds; not only do projects have conflicting dependency requirements, some projects themselves use at the same time (due to deep dependency trees) different versions of the same library (e.g. Rust, NodeJS);
Here are a few more words about the subject:
- a description of these issues from a security perspective;
- suggestions about what developers can do better to help distribution volunteers;
Numbers supporting the sustainability issue
OpenSUSE packages statistics
For example, on my OpenSUSE Tumbleweed laptop (with some additional development repositories enabled) there are:
- ~56K packages in total;
- ~13K packages that actually install binaries under `/bin`, `/sbin`, `/usr/bin`, or `/usr/sbin`; (that is ~23%;)
- ~9K packages that start with `^lib`; (these are "classical" libraries;)
- ~5K packages that end with `-devel$`; (these are mostly tied with the classical libraries;)
- ~10K packages that start with `^python[0-9]*-`; (that is ~17%;) (these are for Python;) granted that some of these are in triplicate, as one library might be packaged for Python2, Python3.6, Python3.9, etc., but this is beside the point, as each is independently packaged and installed;
- ~8K packages that start with `^texlive-`; (these are for LaTeX and related;)
- ~1500 packages that start with `^perl-`; (these are for Perl;)
- ~1200 packages that start with `^ghc-`; (these are for Haskell;)
- ~700 packages that contain `-rubygem-`; (these are for Ruby;)
- ~160 packages that start with `^golang-`; (these are for Go;)
- none seem to be available for NodeJS or Java;
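The patterns above (`^lib`, `-devel$`, `^python[0-9]*-`, and so on) are plain regular expressions matched against package names; the counting boils down to something like the sketch below, here run over a tiny hypothetical sample list instead of the real ~56K one:

```shell
# Sketch: count package groups by name pattern, as done above;
# the package list here is a tiny hypothetical sample, not a real repository.
packages='libfoo2
libbar-devel
python3-requests
python39-requests
perl-JSON
some-cli-tool'

printf '%s\n' "$packages" | grep -c '^lib'             # "classical" libraries -> 2
printf '%s\n' "$packages" | grep -c -- '-devel$'       # development packages -> 1
printf '%s\n' "$packages" | grep -c '^python[0-9]*-'   # Python packages -> 2
```

On a real RPM-based system the list would come from the package manager's query tool instead of a shell variable.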
In my view these numbers, namely the ratio between packages that one can actually execute and packages that are just supporting libraries or resources, will not scale with the explosion of open-source development...
GitHub projects statistics
If one searches for all projects tagged `cli`, one can find:
- ~27K projects in total;
- ~9K projects written in JavaScript or TypeScript;
- ~5K projects written in Python;
- ~4K projects written in Go;
- ~2K projects written in Rust;
- only ~600 projects written in C; (no figures for C++;)
Based on these numbers, say that only 10% of them are worthwhile (that is ~3K), and that each of these projects has on average 2 unique dependencies (not shared with other tools) (that is an extra ~6K): we get a total of ~9K more software packages that need volunteers to patch, test, build, and release...
I think it's obvious this is not sustainable.
Some non-solutions
In fact, and this has been going on for a few years already, the cracks are starting to show given how many projects suggest alternative install methods:
- from the "security incident waiting to happen" -- `curl http://harmless.tool/promise-not-to-run-sudo-and-trash-your-system-and-steal-your-passwords-install-script | bash`; (I know, I know... it's unsafe because I'm not using `https://...`;)
- alternating with "the cloud is now your laptop" -- `docker run just-another-2g-vm-image-running-a-fully-fledged-os-with-all-the-systemd-glory-masked-as-a-container`; (you get bonus points if the software in question is actually a database engine, that certainly won't lose any data when you purge it by mistake;)
- sometimes sprinkled with the "iOS / Android apps alternative for Linux", namely one of Flatpak, AppImage, Snappy, PackageKit, 0install, etc.; (here is a nice article describing how these are not the future;)
- or the "modern alternative to `./configure && make && make install`", depending on the language:
  - `go install most-likely-will-be-quick`; (no sarcasm here!)
  - `cargo install will-not-set-the-computer-on-fire-while-building`;
  - `npm install will-not-download-the-whole-internet-for-colored-messages`;
  - `pip install hope-you-like-debugging-why-native-dependencies-fail-to-build`;
  - (I have used some Ruby, Julia, but my initial estimation puts them in the `pip install` category;)
- or the slightly "saner" (?) version of the above, but by using "virtual environments";
- to my preferred one -- no sarcasm here also! -- just grab the damn executable from the GitHub releases page, copy it somewhere on your `${PATH}`, and go do whatever you wanted to do in the first place; (granted this is not economically sustainable, because now I can't bill my client for "8 hours, provisioning the dev environment";)
A possible "worse is better" solution
Which brings me to my point...
Given today's software landscape (lots of dependencies, lots of build issues, very quick release cycles, many distributions to cover, highly demanding users), I think there is only a single way out of this:
No, it's not Nix, or Guix, or Homebrew...
they are just prolonging the illness...
Let's drop the concept of software package, and let's adopt the self-contained binary executable as the new unit of software distribution!
What would the workflow look like?
- with a package manager, just like it does now, sans the dependency hassle; (i.e. we can drop our Sudoku solvers from the package manager;)
(from here on, I assume there is no package manager;)
the packager (which most of the time is the developer himself):
- build the executable for the few platforms on which the tool works;
- upload the executable to the project's GitHub releases page (which can be automated with GitHub's `gh` tool); (or anywhere else, although certainly a centralized repository would be nice;)
the user:
- `curl -o ./some-tool https://harmless.tool/some-tool`;
- `sha256sum ./some-tool` and compare with the released checksum; (this is a subject for another discussion; in the interim see cosign;)
- `chmod a=rx ./some-tool`;
- `sudo mv ./some-tool /usr/local/bin/some-tool`;
- enjoy!
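The verify-and-install steps above can be wrapped in one small function; a minimal sketch, where the tool name and destination are hypothetical and the expected checksum is assumed to come from the project's releases page:

```shell
# Sketch: verify a downloaded executable against its published sha256
# checksum, then install it into a directory on ${PATH}.
install_tool() {
    tool="$1"        # path to the downloaded executable
    expected="$2"    # published sha256 checksum (from the releases page)
    dest="${3:-/usr/local/bin}"

    actual="$(sha256sum "$tool" | cut -d ' ' -f 1)"
    if [ "$actual" != "$expected" ]; then
        echo "checksum mismatch for $tool" >&2
        return 1
    fi

    mkdir -p "$dest"
    install -m 0755 "$tool" "$dest/$(basename "$tool")"
    echo "installed $(basename "$tool") into $dest"
}
```

For a system-wide destination like `/usr/local/bin` this still needs `sudo`; pointing the destination at `/home/user/.bin` avoids that entirely.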
Like everything in life: tradeoffs
Cons
Are there disadvantages? Yes!
Just don't tell the security people, they'll scream at you about vulnerabilities that won't be easily fixed through an upgrade.
Also, the executable size is huge, in the tens of MiB, or sometimes even in the hundreds...
Plus, what if the binary doesn't do what it says, and instead it's malicious? What if the packager replaced the tool with something else that does crypto-mining? Well, do you trust the developer (or the developer of a dependency deep in the dependency tree) not to steal from your crypto-wallet?
Here are a few more words about the disadvantages:
Pros
Are there advantages? Yes, x10!
Not only can you easily install new updates (that will certainly include those vulnerability fixes the security people yell about), but you can now:
- have multiple versions of the same tool installed and running at the same time;
- have it installed and running even when you don't have `sudo` rights; just place it in `/home/user/.bin` (and have that in your `${PATH}`;)
- have it stowed in your `.git` repository so that everyone in your team gets the same experience without onboarding scripts;
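The first two advantages need nothing but `cp` and `${PATH}`; a sketch, with two hypothetical stub executables standing in for real downloaded tools:

```shell
# Sketch: two versions of the same (hypothetical) tool, installed
# side by side in a per-user bin directory, no sudo required.
bin="$(mktemp -d)"   # stand-in for /home/user/.bin

printf '#!/bin/sh\necho some-tool v41\n' > "$bin/some-tool-v41"
printf '#!/bin/sh\necho some-tool v42\n' > "$bin/some-tool-v42"
chmod a=rx "$bin/some-tool-v41" "$bin/some-tool-v42"

export PATH="$bin:$PATH"   # usually done once, in ~/.profile

some-tool-v41   # prints: some-tool v41
some-tool-v42   # prints: some-tool v42
```

Because each version is a single self-contained file, "having multiple versions installed" is literally just two files with different names.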
Limitations
And of course, this isn't suitable for everything.
I don't think a rich GUI application like LibreOffice, Firefox, Thunderbird, Slack, Discord, Skype, GMail, etc., can be delivered as a single executable.
Why, they need resources, and libraries, and HTML, and JS, and CSS, and sounds, and half the fluffy cat pictures on the internet, all to implement "features that allow users to express their most authentic selves and to bring them joy while" doing whatever users do with these wonderful life-enriching software jewels of the modern world... (The quotation is from Firefox 94 "Colorways" release note.)
Prerequisites for single binary executables
What would be the prerequisites for something like this to work?
foremost, bundle all your code into the executable itself; obviously provided out-of-the-box for compiled languages like C / C++ / Rust / Go / Zig / Nim, but it also works for other languages like Python / Deno / Erlang / Java that can use a Zip (or alternative) file as the container for the code;
second, but just as important, bundle all your resources into the executable itself; works out-of-the-box for Rust / Go, can easily work for C / C++ / Python / Deno / Erlang / Java, not sure about others;
make the tool "install path independent"; that is, the code shouldn't care if its executable is installed in `/usr/local/bin`, `/opt/some-tool/v2/bin`, or even `/home/user/.bin`;
make the tool "install name independent"; just like the code shouldn't care where its executable is installed, it also shouldn't care what its executable file is named; it could be named `some-tool`, `some-tool-v42`, `whatever`, etc.; (if it needs to re-execute itself, just look at `argv[0]` or `/proc/self/exe` on Linux;) (if it needs to execute another tool, that in turn will execute it, just export `SOME_TOOL_EXEC` with the `argv[0]`, and document that for others to use;)
if possible, especially important for Linux with its `glibc` variation across distributions, try static linking; works flawlessly with Go (including cross-compiling), somewhat with Rust / C / C++, and most likely it's already fulfilled with interpreted languages like Python / Erlang / Deno / Java;
if possible, especially to aid static linking, but also to remove external dependencies, try to depend only on "pure libraries" written in your programming language, that is, libraries that don't depend on any OS-provided libraries; and where that doesn't work, just keep their number low;
if it's not possible to avoid depending on OS-provided libraries, as an alternative, try to use `rpath` with `$ORIGIN` and provide those libraries for download; (never rely on `LD_LIBRARY_PATH`;)
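The "install path independent" and "install name independent" points boil down to the tool resolving itself at run time; a minimal sketch in shell, using a hypothetical `some-tool` (a compiled tool would read `/proc/self/exe` on Linux instead of `$0`):

```shell
# Sketch: a tool that discovers its own name and location at run time;
# the very same file behaves correctly under any name, in any directory.
dir="$(mktemp -d)"
cat > "$dir/some-tool" <<'EOF'
#!/bin/sh
self="$(readlink -f "$0")"   # a compiled tool would use /proc/self/exe on Linux
echo "running as $(basename "$self") from $(dirname "$self")"
EOF

# A second install of the very same file, under a versioned name.
cp "$dir/some-tool" "$dir/some-tool-v42"
chmod a=rx "$dir/some-tool" "$dir/some-tool-v42"

"$dir/some-tool"       # reports itself as some-tool
"$dir/some-tool-v42"   # reports itself as some-tool-v42
```

Nothing in the tool hard-codes its own path or name, so moving or renaming the file is a complete "reinstall".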
Not a novel discovery, already in use
This is just for toy projects, you say?
Here are a few examples:
ClickHouse -- a modern SQL server for analytical use-cases; (written in C++;) (used by many companies, from its parent Yandex to CloudFlare;)
FFmpeg -- the go-to-tool for video compression and transcoding; (written in C;)
firecracker -- Amazon's lightweight VM alternative to QEMU, used in their containers and lambda functions solutions; (written in Rust;)
deno -- the Deno language runtime (saner alternative to NodeJS); (written in Rust;)
Python3 -- custom Python3 static builds for Linux (and dynamic, but without dependencies, for others); (written in C;)
bash -- custom `bash` static builds for Linux (and dynamic, but without dependencies, for others); (written in C;)
Hugo -- the popular static website generator; (written in Go;)
kubectl -- the management tool for Kubernetes; (written in Go;)
Lego -- ACME (i.e. Let's Encrypt) client; (written in Go;)
DNSControl -- StackExchange's Git friendly DNS records management tool; (written in Go;)
cloudflared -- CloudFlare's tool to establish Argo tunnels, and other related tasks; (written in Go;)
minio -- S3 compliant self-hosted server; (written in Go;)
gh -- GitHub's tool for repository related tasks; (written in Go;)
restic -- modern encrypted backups, supporting many cloud providers from AWS / Azure / Google to BackBlaze and SFTP; (written in Go;)
age -- modern encryption tool; (written in Go;)
jq -- query for JSON; (written in C;)
minify -- minifier for HTML / CSS / JS; (written in Go;)
hyperfine -- cli tool benchmarker; (written in Rust;)
rebar3 -- Erlang's package manager and build tool;
ninja -- build tool alternative to `make`; (written in C++;)
Sidenote about Go's suitability for such use-cases
Hmm... Have you noticed something?
- most are written in Go;
- few are written in Rust / C / C++;
- none are written in Python / Ruby / NodeJS / Java;
Wonder why?
A tale for another time... But until then, here's a hint...
- how easy is it to build a static binary executable in Go? just run `go build -tags netgo ./cmd/some-tool.go`;
- how easy is it to cross-compile the above? just prepend `GOOS=darwin`;
- how easy is it to do that in Rust / C / C++? let's just say that sometimes, especially if the code is small enough, it's easier to just rewrite the damn thing in Go...
(Disclaimer: I kind of hate Go for various reasons... But I do use it mostly for the reason above...)
Languages suitable for single binary executables
So, say I've convinced you about this single binary executable practice, how should you proceed?
Depending on the programming language:
Go -- you're already covered, just don't use non-pure libraries that depend on OS provided libraries;
Rust -- say goodbye to cross-compiling, but if you stay away from OS provided libraries, you are kind of covered;
C / C++ -- certainly possible, but you'll need a UNIX graybeard on retainer, or at least an `autotools`-foo master;
Python -- you're already covered, just read the zipapp documentation;
Erlang -- you're already covered, just read the escript documentation;
Deno -- you're already covered;
NodeJS -- technically, I know it's possible; but I can almost bet my right arm that given today's ecosystem it's not feasible;
Ruby -- don't know; but given Ruby's focus (web development), I would say it's not possible;
Zig / Nim -- probably (?);
Java -- possible, but you'll need a startup script to just call `java -jar some-tool.jar`; (also not a good fit for short-lived tools, mainly due to startup times;)
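For Python, the zipapp route mentioned above really is this short; a minimal sketch, with a hypothetical one-file project and tool name:

```shell
# Sketch: turn a small Python project into one self-contained executable
# file, using only the standard-library zipapp module.
tmp="$(mktemp -d)"
mkdir "$tmp/app"
cat > "$tmp/app/__main__.py" <<'EOF'
print("hello from a single-file tool")
EOF

# Bundle the directory; -p adds the shebang and marks the output executable.
python3 -m zipapp "$tmp/app" -p '/usr/bin/env python3' -o "$tmp/some-tool"

"$tmp/some-tool"   # prints: hello from a single-file tool
```

The resulting file can be copied onto `${PATH}` like any compiled binary; the catch, of course, is that it still needs a `python3` on the target machine.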
Concluding remarks
In the end, is it all worth it? Does it solve the software distribution problem?
Well, it's complicated...
Short term
I say that in the short term, it can certainly solve our immediate software distribution problems, by just having tools that are easy to install, and just work out-of-the-box.
Long term
However, in the long term, although I don't think it's the right solution, I certainly think it pushes us in the correct direction...
As a parallel I'll take `systemd`: before it was adopted, our services were initialized by poorly written and error-prone bash scripts; after it was adopted, upstream developers started paying more attention to their services' life-cycles and run-environments (other service dependencies, start, stop, restart, reload, child processes, logging, etc.).
At the same time, I don't think `systemd` (in its current form) is the right answer. On the other hand, at least it forced us to clean up this part of our software...
Now, coming back to the single binary executable, although I don't think in the long term it's the right solution, I do think it will force developers to pay more attention to their dependencies, build / test workflows, and in general improve their projects from a logistical point of view.