GRUB, it is time we broke up. It’s not you, it’s me. Okay, it’s you.
The last 15+ years have left us with some great (read: painful) memories. But it is time to
call it quits.
GRUB was designed for a world where bootloaders had to locate a Linux kernel
on a filesystem. This meant it needed support for all the filesystems anyone
might conceivably use. It was also built for a world where dual-booting meant
having a bootloader-implemented menu to choose between operating systems.
The UEFI world we live in today looks nothing like this. UEFI requires
firmware support for a standard filesystem (FAT). This filesystem, which for
all intents and purposes duplicates the contents of /boot, is required on every
Linux system that boots via UEFI. So UEFI loads the bootloader from the UEFI
partition, and then the bootloader loads the kernel from the /boot partition.
Did you know that UEFI can just boot the kernel directly? It can!
The situation, however, is much worse than just duplicated effort. With the
exception of Apple hardware, practically all UEFI implementations ship with
Secure Boot and a TPM enabled by default. Only appropriately signed UEFI
code will be run. This means we now introduce a [shim][shim] which is signed.
This, in turn, loads GRUB from the UEFI partition.
This means that our boot process now looks like this:
- UEFI filesystem
  - shim
  - GRUB
- /boot filesystem
  - kernel
It gets worse. Microsoft OEMs are now enabling BitLocker by
default. BitLocker encrypts the Windows partition and seals the key to the TPM PCRs.
This means that if the boot process changes (and you have no backup of the
key), you can’t decrypt your data. So remember that great boot menu that GRUB
provided so we can dual-boot with Windows? It can never work.
The user experience of this process is particularly painful. Users who manage
to get Fedora installed will see a nice GRUB menu entry for Windows. But if
they select it, they are immediately greeted with a terrifying message
telling them that the boot configuration has changed and their encrypted data
is inaccessible.
To recap, where Secure Boot is enabled (pretty much all Intel hardware), we
must use the boot menu provided by UEFI. If we don’t, the PCRs of the TPM
have unknown hashes and anything sealed to the boot state will fail to decrypt.
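To make the PCR problem concrete, here is a minimal Python sketch of how a TPM-style "extend" operation accumulates measurements. It assumes SHA-256 and uses made-up byte strings as stand-ins for the real boot binaries; the component names are hypothetical, and real firmware measures far more than this. The point is only that adding shim and GRUB to the chain yields a different final value than the one the key was sealed to.

```python
import hashlib


def extend(pcr: bytes, component: bytes) -> bytes:
    """TPM-style extend: new PCR = SHA-256(old PCR || SHA-256(component))."""
    measurement = hashlib.sha256(component).digest()
    return hashlib.sha256(pcr + measurement).digest()


def measure_chain(components: list[bytes]) -> bytes:
    """Fold a sequence of boot components into a single PCR value."""
    pcr = bytes(32)  # assumption: the PCR starts as all zeros at reset
    for component in components:
        pcr = extend(pcr, component)
    return pcr


# Hypothetical stand-ins for the binaries measured during boot.
windows_direct = measure_chain([b"firmware", b"windows-boot-manager"])
windows_via_grub = measure_chain(
    [b"firmware", b"shim", b"grub", b"windows-boot-manager"]
)

# Pretend the disk key was sealed while booting Windows from the UEFI menu.
sealed_to = windows_direct

print(windows_direct == sealed_to)    # True: the key can be unsealed
print(windows_via_grub == sealed_to)  # False: the key stays sealed
```

Because each value depends on everything measured before it, there is no way for GRUB to "undo" its own presence in the chain before handing off to Windows.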
The good news is that Intel provides a reference implementation of UEFI, and
it includes pretty much everything we’d ever need. This means that most vendors
get it pretty much correct as well. OEMs are even using these facilities for
their own (hidden) recovery partitions.
So why not just have UEFI boot the kernel directly? There are still some
drawbacks to this approach.
First, it requires signing every build of the kernel. This is definitely
undesirable since kernels are updated pretty regularly.
Second, every kernel upgrade would mean a write to UEFI NVRAM. There are some
concerns about the longevity of the hardware under such frequent UEFI writes.
Third, it exposes kernels as a menu option in UEFI. This menu typically
contains operating systems, not individual kernels, which results in a poor
user experience. Most users don’t need to care about what kernel they boot.
There should be a bootloader which loads the most recently installed kernel
and falls back to older kernels if the new kernels fail to boot. All of this
can be done without a menu (unless the user presses a key).
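The selection policy described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular bootloader implements it: `KernelEntry`, `tries_left`, and `pick_default` are hypothetical names, and the sketch assumes something else decrements `tries_left` on each failed boot attempt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KernelEntry:
    version: tuple[int, ...]  # e.g. (6, 8, 5); compared element by element
    tries_left: int           # remaining boot attempts before the entry is treated as bad


def pick_default(entries: list[KernelEntry]) -> Optional[KernelEntry]:
    """Return the newest kernel that still has boot attempts left, if any."""
    usable = [entry for entry in entries if entry.tries_left > 0]
    if not usable:
        return None  # nothing bootable; a real bootloader would show a menu here
    return max(usable, key=lambda entry: entry.version)


entries = [
    KernelEntry((6, 8, 5), tries_left=0),  # newest, but it has failed to boot
    KernelEntry((6, 7, 9), tries_left=3),
    KernelEntry((6, 6, 2), tries_left=3),
]

choice = pick_default(entries)
print(choice.version)  # (6, 7, 9)
```

No menu is involved in the common case: the newest working kernel boots, and a broken update quietly falls back to the previous one.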
With systemd-boot, our boot process can look like this:
- UEFI filesystem
  - shim
  - systemd-boot
  - kernel
It would even be possible (though not necessarily desirable) to sign
systemd-boot directly and get rid of the shim.
In short, we need to stop trying to make GRUB work in our current context and
switch to something designed specifically for the needs of our modern systems.
We already ship this code in systemd. Further, systemd already ships a tool
(`bootctl`) for managing the bootloader. We just need to enable it in Anaconda
and test it.
Who’s with me!?
P.S. – It would be very helpful if we could get some good documentation on
manually migrating from GRUB to systemd-boot. This would at least enable
the testing of this setup by brave users.