[remark] Linux UEFI booting simplified -- Volution Notes

For the last few days I've been playing with UEFI booting, and especially UEFI Secure Boot, mostly in relation to what I have described in a previous article about a "secure" computer: Musing about a secure computer for sensitive data

I use the term "secure" in quotes, because unfortunately in this day and age, nothing is truly "secure". Most things, and especially when it comes to IT&C, are just "secure enough", or, more accurately, "presumed secure until proven otherwise"...

So far, I've tried to stay as far away as possible from UEFI booting (especially with Linux), because I found the whole mechanism too complicated and convoluted. (And, many years ago, when I had my hands on my first hardware that happened to support UEFI booting, the fact that this particular UEFI implementation wasn't working quite properly didn't help much in building my confidence in the technology.)

However, a long time has passed since then, the technology has (hopefully?) matured somewhat in the interim, and thus I've decided it's time to give it a second look. (And, such is karma, this time around another piece of hardware I happen to own requires me to use UEFI booting, as opposed to enabling the "legacy mode" -- which in UEFI parlance is called "CSM" (Compatibility Support Module) -- otherwise I don't get integrated graphics support... For those who are wondering, Intel is the culprit here; or perhaps it's Gigabyte? Who knows...)

Update
A few days after I wrote this article, while going through my reading queue, I stumbled upon the following LWN article which describes a DevConf.cz 2024 talk that is very similar in spirit to my own article:
the LWN article: Giving bootloaders the boot with nmbl
the DevConf.cz talk: No more boot loader: Please use the kernel instead
the accompanying article: nmbl: we don't need a bootloader

Why am I writing this article?

Because this is the document I wish had existed when I was researching how to boot Linux from UEFI!

I want to document a particular aspect of UEFI booting, that not many other articles approach, or if they approach it, it requires a lot of digging to get to that point.

What is the angle I'm focusing on?

My problem with bootloaders -- and the main reason that to this day I have used almost exclusively SysLinux -- is their inherent complexity and propensity for breakage. Our operating systems (and especially Linux) are complicated enough, I don't also need the bootloader to be yet another mini-OS -- which is the reason I don't want to touch Grub...

Thus, getting back to UEFI, I want to speak about it from a perspective of simplicity, reliability, and reduced boot complexity. (For those interested in UEFI bells-and-whistles, there are countless articles out there.)

First of all, a bit of history, that would allow us to understand, contrast, and hopefully appreciate where UEFI has gotten us.

How does BIOS / MBR / CSM / legacy booting happen?

In most cases, the user connects an HDD, SSD, or USB stick to the machine he wants to boot. (CD/DVD booting is a bit different, but not by much.)

From the early firmware point of view -- be it an actual genuine BIOS firmware, or its modern reimplementation / simulation, the UEFI CSM -- that HDD / SSD / USB stick is nothing more than a block device, a "disk".

The firmware looks at the disk, reads the first sector from it, and executes the boot code held in that sector's first 440 bytes. It is worth noting that, from the firmware point of view, this code is nothing special; it could even be the whole OS if it fit in 440 bytes.

Back to booting, in many cases these first 440 bytes of a disk are not actually part of the bootloader. Instead they are a small program that expects a partition table to exist on that disk, looks at which partition is marked as bootable, reads the first 440 bytes from that partition, and executes them. Thus, from a functional point of view, this is quite similar to the previous phase. In fact, one could even dispense with the partition table, and create the file-system directly at the disk level (i.e. /dev/sda instead of /dev/sda1).

These 440 bytes of code are an early part of the bootloader, tasked with reading some more code, this time from a hard-coded offset in the partition it was loaded from, and then executing it.
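To make the "look at the partition table and find the bootable partition" step a bit more tangible, here is a minimal Python sketch of it. It assumes the classic MBR sector layout (boot code at the start, four 16-byte partition entries starting at offset 446, and the 0x55 0xAA signature in the last two bytes); the function names and the example values are made up for illustration, and a real first-stage loader does this in a handful of assembly instructions rather than in Python.

```python
import struct

SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446    # the four partition entries start here
PARTITION_ENTRY_SIZE = 16
BOOT_SIGNATURE = b"\x55\xaa"    # last two bytes of a valid MBR sector
ACTIVE_FLAG = 0x80              # status byte of a "bootable" partition


def find_bootable_partition(sector: bytes):
    """Return (index, first_lba, sector_count) of the first active entry, or None."""
    if len(sector) != SECTOR_SIZE or sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("not a valid MBR sector")
    for index in range(4):
        start = PARTITION_TABLE_OFFSET + index * PARTITION_ENTRY_SIZE
        entry = sector[start:start + PARTITION_ENTRY_SIZE]
        status = entry[0]
        # Bytes 8..15 of an entry: first sector (LBA) and number of sectors.
        first_lba, sector_count = struct.unpack_from("<II", entry, 8)
        if status == ACTIVE_FLAG and sector_count > 0:
            return index, first_lba, sector_count
    return None


def make_example_mbr() -> bytes:
    """Build a synthetic sector whose second entry is marked as bootable."""
    sector = bytearray(SECTOR_SIZE)
    # status, CHS start (ignored), type 0x0C (FAT32 LBA), CHS end (ignored),
    # first LBA 2048, 204800 sectors -- all hypothetical values.
    entry = struct.pack("<B3sB3sII", ACTIVE_FLAG, bytes(3), 0x0C, bytes(3), 2048, 204800)
    offset = PARTITION_TABLE_OFFSET + 1 * PARTITION_ENTRY_SIZE
    sector[offset:offset + PARTITION_ENTRY_SIZE] = entry
    sector[510:512] = BOOT_SIGNATURE
    return bytes(sector)


if __name__ == "__main__":
    print(find_bootable_partition(make_example_mbr()))
```

On a real disk one would feed it the first 512 bytes of the block device instead of the synthetic sector.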

Some bootloaders call the early boot part the "stage-1", and the second part the "stage-2".

This second part is the actual bootloader, which in turn looks at the file-system on the same partition it was loaded from, searches for the kernel, initramfs / initrd, loads them into memory, and passes the control to the kernel.

I forgot to mention that the actual bootloader also reads a bunch of other files that it needs for its own functionality (like libraries for displaying menus, fonts, graphics, libraries for loading and executing the kernel, etc.)

A...
One more thing, did I mention that the bootloader also needs a configuration file?
Indeed, it does need one... (In fact, many bootloaders have many-many more than one...)

How does the bootloader load all these files?
As hinted initially, it is a mini-OS, with mini-drivers to interact with the hardware, read the file-system, etc.

Why did I go through this lengthy description about the legacy of booting systems?

Because, as said, I'm interested in eliminating as much complexity as possible from the boot process.

Let's look at possible avenues for problems in this whole booting procedure -- and sometimes these are attack vectors, because in the early days, especially in the case of MS-DOS, a lot of malware persisted by embedding itself in those first 440 bytes of the disk / partition:

if something happens to the first 440 bytes of the disk, the system doesn't boot;
if something happens to the first 440 bytes of the partition, the system doesn't boot;
if something marks the boot partition as "not for booting", the system doesn't boot;
if something happens to the data stored in the hard-coded portion of the partition -- like for example removing the file, replacing it with another file (which might change its offset on the partition), sometimes even updating its contents, etc. -- the system doesn't boot;
if something happens to the partition file-system, especially if it's not FAT, and it remains in an unclean state, the system doesn't boot;
if something happens to one of the bootloader extra files, the system doesn't boot;
if something happens to the bootloader configuration file, the system doesn't boot;

Now that's a lot of "somethings" that can go wrong... (And, through personal experience, almost all of these "somethings" have happened to my systems at least once.)

Just to conclude this detour, SysLinux (my legacy bootloader of choice), to its credit, is among the simplest bootloaders out there, and still has all of these problems.

Back on track, how about UEFI booting?

There are many good information sources about UEFI booting (like for example Managing EFI Boot Loaders for Linux), thus I'll keep it brief.

From the point of view of the boot process, the UEFI firmware is the bootloader. (It's much more than that, but for the scope of this article we'll consider it plays the role of a bootloader.)

Here is how it works, or at least how I understand it works (note that I'm simplifying this to the point of silliness, just to highlight the functionality), the UEFI firmware:

looks at some UEFI variables (which are stored in non-volatile memory present on the motherboard), for a list of UEFI programs to launch (boot);
each entry in this list points to a UEFI program (that is just a simple file with the .efi extension), and states which disk and partition it should be loaded from, and what arguments it should be given at startup;
it tries to load and execute each UEFI program, one by one, moving to the next one if the previous one fails to load or to execute;
finally, one of the UEFI programs on that list will be a working OS;
if any of that fails, it looks at the HDDs / SSDs / USB sticks connected to the system, searching for partitions marked as bootable -- which in and of itself depends on the UEFI implementer, because the standard says one thing, but each UEFI implementer might have some other fallbacks -- and inside each one of these it tries to load and execute the /EFI/BOOT/BOOTx64.efi program;
if all fails, some error message or menu is presented to the user, depending on the UEFI implementer;
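The selection logic in the list above can be sketched in a few lines of Python. The first function decodes a BootOrder variable as Linux exposes it through efivarfs (a 4-byte attributes field followed by little-endian 16-bit entry numbers); the second one is only a toy model of the "try each entry, then fall back" behaviour. The entry paths, the can_start callback and the single fallback path are assumptions made for the example; real firmware scans every disk and has its own vendor-specific fallbacks.

```python
import struct

FALLBACK_PROGRAM = "/EFI/BOOT/BOOTx64.efi"


def parse_boot_order(raw: bytes) -> list:
    """Decode BootOrder as read from efivarfs into a list of entry numbers."""
    payload = raw[4:]  # skip the 4-byte attributes field added by efivarfs
    if len(payload) % 2 != 0:
        raise ValueError("BootOrder payload must be a whole number of 16-bit values")
    return list(struct.unpack("<%dH" % (len(payload) // 2), payload))


def pick_boot_program(boot_order, entries, can_start, fallback=FALLBACK_PROGRAM):
    """Toy model: return the first entry that starts, else the fallback, else None.

    boot_order -- entry numbers, in the order the firmware tries them
    entries    -- hypothetical mapping of entry number -> path of a UEFI program
    can_start  -- callback standing in for "load and execute succeeded"
    """
    for number in boot_order:
        path = entries.get(number)
        if path is not None and can_start(path):
            return path
    return fallback if can_start(fallback) else None


if __name__ == "__main__":
    raw = struct.pack("<I3H", 7, 3, 0, 1)  # attributes, then Boot0003, Boot0000, Boot0001
    entries = {0: "\\EFI\\distro\\old.efi", 1: "\\EFI\\distro\\linux.efi"}
    present = {"\\EFI\\distro\\linux.efi"}
    print(pick_boot_program(parse_boot_order(raw), entries, lambda p: p in present))
```

Note how an entry that points to a missing program (Boot0003 and Boot0000 in the example) is simply skipped, which is exactly why a stale entry in the list is not fatal.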

Now, before looking at what can go wrong, many might wonder where all the UEFI bootloaders like rEFInd, systemd-boot, Grub, SysLinux, etc. fit in.

They are actually UEFI programs, that just happen to execute other UEFI programs, or load and pass control to a Linux kernel (or other operating system), and from this perspective, one can think of UEFI like a sort of DOS-like system. These UEFI programs that present themselves as bootloaders, like their BIOS-native counterparts, have their own configuration files, libraries, etc.; thus the BIOS-native complexity is still present in their UEFI-native version.

But, strictly from a technical point of view, there is no need for such UEFI "extra" bootloaders. (Unless one wants nice menus, splash screens, etc.)

In fact, although modern Linux kernel builds can act as fully functional UEFI programs, thus capable of booting themselves, nearly all Linux distributions out there employ the "extra" UEFI bootloaders for their other nonessential functionalities (again, nice menus and splash screens, etc.)...

Back to the UEFI boot process, what can go wrong:

if something happens to your UEFI firmware, you actually need to RMA the hardware; it's broken in a very bad way, and I just wonder how you got yourself in that situation; :) (else a "reset to factory defaults" of the UEFI might help;)
if something happens to the file system, where the UEFI program is stored, the system doesn't boot;
that's it;

There might, and most likely will, be cases when the Linux kernel or initramfs / initrd are mismatched, corrupted, or simply incompatible with your hardware. However, this is out of the control of the bootloader, be it UEFI or BIOS.

What is the minimal UEFI-enabled system?

To conclude, in order to have the simplest, most straightforward, and possibly quite reliable UEFI bootup solution, one needs to:

have an HDD, SSD or USB stick,
that has a GPT partition table;
- although some UEFI implementations also support an MBR partition table;
- and in fact, some UEFI implementations work even without a partition table, having the file-system created directly at the disk level;
in that partition table have a FAT32 partition;
have that FAT32 partition marked as "bootable" (marked as ESP in case of GPT);
in that FAT32 partition have an EFI folder with a BOOT subfolder;
and in that EFI/BOOT subfolder have a BOOTx64.efi file, that is a UEFI program;
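As a quick sanity check of such a layout, the following Python sketch looks for EFI/BOOT/BOOTx64.efi under a directory where the FAT32 partition is assumed to be mounted, and verifies that the file at least starts with the "MZ" magic that every UEFI program (a PE executable) begins with. The names are matched case-insensitively, because FAT is case-insensitive while the host file-system view might not be. The function names and the mount point are hypothetical, and passing this check says nothing about whether the firmware will actually accept the program.

```python
from pathlib import Path

FALLBACK_PARTS = ("EFI", "BOOT", "BOOTx64.efi")


def find_case_insensitive(parent: Path, name: str):
    """Return the child of `parent` whose name matches `name`, ignoring case."""
    for child in parent.iterdir():
        if child.name.lower() == name.lower():
            return child
    return None


def find_fallback_loader(esp_root: Path):
    """Return the path of EFI/BOOT/BOOTx64.efi under esp_root, or None."""
    current = esp_root
    for part in FALLBACK_PARTS:
        if current is None or not current.is_dir():
            return None
        current = find_case_insensitive(current, part)
    if current is None or not current.is_file():
        return None
    # A UEFI program is a PE executable, which always starts with b"MZ".
    with current.open("rb") as stream:
        if stream.read(2) != b"MZ":
            return None
    return current


if __name__ == "__main__":
    # "/boot/efi" is only an assumed mount point; adjust it to your system.
    print(find_fallback_loader(Path("/boot/efi")))
```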

There is no need to write the first 440 bytes of the disk / partition.
There is no need for hard-coded data, at hard-coded offsets on the partition.
There is no need for other library files.
There is no need for configuration files.

It just works!

What's the catch?

A veteran Linux / BSD user would ask:
Where is the initramfs / initrd?
Where are the kernel boot arguments?

Technically the UEFI-enabled Linux kernel is able to load its own initramfs / initrd (if you pass the initrd=... boot argument).

And where do these boot arguments come from?
From the UEFI variables, where someone has previously written them.
How do the UEFI variables get written then?
Well, it's a long story...
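Without going into the long story, it may help to see where exactly the arguments live. Each boot entry is stored in a Boot#### variable, which per the UEFI specification holds an EFI_LOAD_OPTION structure: a 32-bit attributes field, a 16-bit length of the device path list, a NUL-terminated UTF-16 description, the device path list itself, and finally some optional data that is handed to the UEFI program. The sketch below decodes that structure from raw bytes (with the 4-byte efivarfs prefix already stripped). Treating the optional data as a UTF-16 command line is an assumption that matches how such entries are commonly created for the Linux kernel, not something the structure itself guarantees; the function name is made up.

```python
import struct


def parse_load_option(payload: bytes):
    """Split an EFI_LOAD_OPTION into (description, device_path_bytes, optional_data)."""
    attributes, path_list_length = struct.unpack_from("<IH", payload, 0)
    start = 6  # size of the two fixed fields above
    end = start
    # The description is UTF-16LE text terminated by a 16-bit NUL.
    while end + 2 <= len(payload) and payload[end:end + 2] != b"\x00\x00":
        end += 2
    if end + 2 > len(payload):
        raise ValueError("unterminated description")
    description = payload[start:end].decode("utf-16-le")
    path_start = end + 2
    path_end = path_start + path_list_length
    if path_end > len(payload):
        raise ValueError("device path list is truncated")
    return description, payload[path_start:path_end], payload[path_end:]


if __name__ == "__main__":
    # Synthetic entry: the device path is just an "end of path" node, and the
    # arguments are hypothetical.
    arguments = "root=/dev/sda2 initrd=\\initrd.img"
    example = (
        struct.pack("<IH", 1, 4)
        + "Linux".encode("utf-16-le") + b"\x00\x00"
        + b"\x7f\xff\x04\x00"
        + arguments.encode("utf-16-le")
    )
    description, _, optional = parse_load_option(example)
    print(description, "->", optional.decode("utf-16-le"))
```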

Or, one can just use a UEFI "extra" bootloader (such as rEFInd, systemd-boot, etc.), and now we can see what the advantage of having a mini-OS for a bootloader is: we get menus, we get configuration files, we get nice things.

Alternatively, one can embed in the kernel image, at kernel build time, the boot arguments and the initramfs / initrd image.

Finally, you can use a UKI (Unified Kernel Image), which is a UEFI program that embeds in itself the Linux kernel, initramfs / initrd, and boot arguments. (I'll write more on the topic in a later article. Until then one can look at one of the following sources: Gentoo, Arch Linux, or this article.)
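Since a UKI is just a PE executable with a few extra sections, one can peek inside it with very little code. The Python sketch below walks the PE headers and lists the section names, then checks for the .linux, .initrd and .cmdline sections, which is the naming convention used by systemd-stub based images as far as I know; other tools may name or combine things differently, so treat the section set (and the function names) as assumptions.

```python
import struct

# Section names assumed to hold the kernel, the initramfs and the boot arguments.
EXPECTED_SECTIONS = {".linux", ".initrd", ".cmdline"}


def list_pe_sections(image: bytes) -> list:
    """Return the section names of a PE image (which is what a UEFI program is)."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ magic")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4
    (section_count,) = struct.unpack_from("<H", image, coff + 2)
    (optional_header_size,) = struct.unpack_from("<H", image, coff + 16)
    # The section table follows the 20-byte COFF header and the optional header.
    table = coff + 20 + optional_header_size
    names = []
    for index in range(section_count):
        start = table + index * 40  # each section header is 40 bytes
        raw_name = image[start:start + 8]  # name is NUL-padded to 8 bytes
        names.append(raw_name.rstrip(b"\x00").decode("ascii", errors="replace"))
    return names


def embeds_kernel_initrd_cmdline(image: bytes) -> bool:
    """True if the image carries all three sections named in EXPECTED_SECTIONS."""
    return EXPECTED_SECTIONS <= set(list_pe_sections(image))
```

For example, calling embeds_kernel_initrd_cmdline() on the bytes of the BOOTx64.efi file from the minimal layout described earlier tells whether that single file is self-sufficient, or whether it still expects its initramfs and arguments from somewhere else.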

To really conclude

In order to have a fully functional UEFI bootable system, one needs a single disk, with a single partition (although in some UEFI implementations this is optional), with a single file (in case of UKI).

What can go wrong?

In reality, it turns out a lot, because you need to fuss a lot with the UKI to have it working. But this is a story for another time...

UEFI Secure Boot?

A few words about UEFI Secure Boot, and how it ties into all this. (More details about it can also be found in: The rEFInd Boot Manager: Managing Secure Boot.)

What does it do?
In simple terms, it allows a system administrator to sign a UEFI program, and have the UEFI firmware load and execute only signed UEFI programs.
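Whether the firmware is currently enforcing this can be read from the SecureBoot UEFI variable. Below is a minimal Python sketch, assuming a Linux system with efivarfs mounted in the usual place; the file contains a 4-byte attributes field followed by a single byte, which is 1 when Secure Boot is active. The function names are made up, and the sketch only reports the state, it does not verify any signature.

```python
from pathlib import Path

# The variable lives under the EFI global variable GUID.
SECURE_BOOT_VARIABLE = Path(
    "/sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c"
)


def decode_secure_boot(raw: bytes) -> bool:
    """Decode the raw efivarfs content of the SecureBoot variable."""
    if len(raw) < 5:
        raise ValueError("expected 4 attribute bytes plus 1 value byte")
    return raw[4] == 1


def secure_boot_enabled(variable: Path = SECURE_BOOT_VARIABLE):
    """Return True / False, or None if the variable cannot be read
    (for example on a system booted in legacy mode)."""
    try:
        return decode_secure_boot(variable.read_bytes())
    except OSError:
        return None


if __name__ == "__main__":
    print(secure_boot_enabled())
```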

This also ties into TPM, which in the end allows you to cryptographically bind some secrets, like disk encryption keys, to a particular piece of hardware, a particular firmware, and a particular OS.

Unfortunately in practice, due to many reasons, one of which is UEFI boot complexity, UEFI Secure Boot isn't as secure as it promises to be (there are countless articles and reports out there, just search for "UEFI vulnerability").

However, if you strip the boot complexity down to its minimum, following what I have been describing in this article, you would get a UEFI Secure Boot procedure that is about as secure as it can be.

But, this is also a topic for a later article.