Booting Linux

From Firmware to a running Linux Kernel

Linux
Published

November 5, 2015

Modified

February 14, 2024

What is Linux?

Boot Methods

  • Direct Linux kernel boot…
    • …skip the bootloader entirely …fake the bootloader-to-Linux interface
    • …load the kernel into memory and jump to its entry point
    • …set processor state (stack pointer, MMU, etc.) to prepare kernel
    • …complex to do …easier to use a real bootloader
  • Boot from disk…
    • …bootloader starts configurable second-stage (for example GRUB)
    • …requires a bootloader binary and a disk image
    • …Linux kernel & command-line parameters included in the disk image
    • …boot device selection depends on the bootloader & target system
  • Network boot…
    • …standard method for network-based booting is PXE
    • …bootloader retrieves disk image from a server on the network
    • …multiple-stage process to load more capable binaries and disk images
    • …relies on DHCP …not very good at traversing NAT routers

Pre-Boot

After hardware initialisation …the system executes the bootloader

  • …bootloader stored in FLASH or EEPROM
  • …(typically) automatically executed
  • …code located at the system reset vector

Firmware & Microcode

Firmware…

  • …held in non-volatile memory
    • …EPROM, EEPROM
    • …flash memory
  • …low-level control for a device’s specific hardware
  • …provides abstraction to higher-level software
  • Flashing …overwriting of existing firmware
  • Embedded systems typically boot from firmware to application
  • Zero-stage …hands over to a bootloader

Microcode translates machine instructions to circuit-level operations…

  • …intermediary layer between CPU and the instruction set of a computer
  • …set of hardware-level instructions …implement higher-level machine code instructions
  • …stored to a special high-speed memory
  • …lowest layer in a computer’s software stack (…basically firmware that runs on the processor)

Baseline microcode baked into ROM during CPU manufacturing…

  • …immutable …can’t be changed after the processor is built
  • …modern CPUs support applying volatile updates at initialization
  • Can be applied to the processor in one of two ways…
    • …system firmware via OEM
    • …by the operating system (OS)
    • …neither updates the microcode in the processor’s ROM

Microcode updates required to mitigate security vulnerabilities…

  • & to address stability and performance issues
  • May be delivered along with UEFI firmware (or BIOS) updates…
    • …applied during firmware initialization
    • …roll-out by motherboard vendors not always timely
    • …firmware update typically not automated

Linux Microcode Loader

Linux Microcode Loader 1 may update microcode during boot…

  • …loaded each time the CPU is initialized (during boot)
  • Three loading methods…
    • …built-in microcode …compiled into kernel …applied by early loader
    • early loading …updates microcode during boot (before initramfs)
    • late loading …updates microcode after booting
      • …may be too late …CPUs may have used related instructions already
      • …dangerous …possible unpredictable results with operational and running workloads

Verifying that microcode got updated on boot…

journalctl -k --grep=microcode
# ...or
dmesg | grep microcode

Packages on Enterprise Linux: microcode_ctl and linux-firmware

Install a suitable microcode package for late loading…

  • …ensure existence of /sys/devices/system/cpu/microcode/reload
  • …looks for microcode blobs in /lib/firmware/{intel-ucode,amd-ucode}
# ...as root ...write 1 to the reload interface to reload the microcode file
echo 1 > /sys/devices/system/cpu/microcode/reload
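To confirm that a reload changed anything, the revision reported in /proc/cpuinfo can be compared before and after. A minimal sketch: the function name ucode_revision is hypothetical, and the optional file argument exists only so the parser can be tried on a saved copy of cpuinfo.

```shell
# Print the microcode revision of the first CPU listed.
# $1: cpuinfo-style file (defaults to /proc/cpuinfo)
ucode_revision() (
    awk -F': *' '/^microcode/ { print $2; exit }' "${1:-/proc/cpuinfo}"
)

# usage (as root), left commented out because it changes system state:
#   before=$(ucode_revision)
#   echo 1 > /sys/devices/system/cpu/microcode/reload
#   after=$(ucode_revision)
#   [ "$before" = "$after" ] && echo "revision unchanged: $before"
```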

Intel

Processor Signature…

  • …number identifying the model and version of an Intel processor
  • …microcode image is named after the family/model/stepping
# ...represented as 3 fields ...family, model, and stepping
>>> grep -E '^(cpu family|model|stepping|microcode)' /proc/cpuinfo | sort -u
cpu family      : 6
microcode       : 0x71a
model           : 45
model name      : Intel(R) Xeon(R) CPU E5-1603 0 @ 2.80GHz
stepping        : 7
  • …example above family 6, model 45 (0x2d), stepping 7
  • …file name uses two-digit hexadecimal fields …corresponding microcode file would be 06-2d-07 in /lib/firmware/intel-ucode
>>> dnf install -y cpuid
>>> cpuid -1 | grep -E '(family|model|stepping)'
      family          = 0x6 (6)
      model           = 0x5 (5)
      stepping id     = 0x7 (7)
      extended family = 0x0 (0)
      extended model  = 0x5 (5)
      (family synth)  = 0x6 (6)
      (model synth)   = 0x55 (85)
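Intel names the microcode files with hexadecimal fields, so the decimal values printed by /proc/cpuinfo need converting before looking up a file. A small sketch of that conversion, where intel_ucode_file is a hypothetical helper name:

```shell
# Build the intel-ucode file name from decimal family, model and stepping.
# Each field is written as a two-digit lowercase hexadecimal number.
intel_ucode_file() (
    printf '%02x-%02x-%02x\n' "$1" "$2" "$3"
)

intel_ucode_file 6 45 7    # family 6, model 45, stepping 7  -> 06-2d-07
intel_ucode_file 6 85 4    # family 6, model 85, stepping 4  -> 06-55-04
```

The result can be checked against the directory contents, for example with ls /lib/firmware/intel-ucode/.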

Microcode published in the Intel Microcode Package for Linux repository on GitHub…

# ...microcode version number is included in the binary header
>>> od -t x4 intel-ucode/06-55-04 | head -n1
0000000 00000001 02006f05 12212022 00050654
#                |        |
#                version  date
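The same header words can be decoded with a few lines of shell. This sketch assumes the documented Intel header layout (header version, update revision, date as BCD mmddyyyy, processor signature) and a little-endian host, because od prints words in host byte order. The function name ucode_header is hypothetical.

```shell
# Decode the first four 32-bit words of an Intel microcode file.
# $1: microcode file, for example intel-ucode/06-55-04
ucode_header() (
    # after this line: $1 header version, $2 revision, $3 date, $4 signature
    set -- $(od -An -t x4 -N 16 "$1")
    date=$3                          # BCD, laid out as mmddyyyy
    month=${date%??????}
    day=${date#??}; day=${day%????}
    year=${date#????}
    printf 'revision=0x%s date=%s-%s-%s signature=0x%s\n' \
        "$2" "$year" "$month" "$day" "$4"
)
```

For the dump shown above this prints revision=0x02006f05 date=2022-12-21 signature=0x00050654.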

AMD

Different firmware per CPU family…

>>> grep -E '^(cpu family|model|stepping|microcode)' /proc/cpuinfo | sort -u
cpu family      : 23
microcode       : 0x8301034
model           : 49
model name      : AMD EPYC 7662 64-Core Processor
stepping        : 0

wrmsr & rdmsr

Intel MSR-Tools are available via RPM package…

sudo dnf install -y msr-tools
  • …rdmsr reads and wrmsr modifies an MSR (model-specific register)
  • …control registers in the x86 system architecture
  • Used for
    • …debugging
    • …program execution tracing
    • …computer performance monitoring
    • …toggling certain CPU features

Example use for the mitigation of the Zenbleed security issue on AMD CPUs…

# set the chicken bit (bit 9)
wrmsr -a 0xc0011029 $(($(rdmsr -c 0xc0011029) | (1<<9)))
# revert: clear the bit again (XOR would only toggle it, so AND-NOT is used)
wrmsr -a 0xc0011029 $(($(rdmsr -c 0xc0011029) & ~(1<<9)))
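The bit arithmetic can be tried without touching any register. Setting a bit with OR and clearing it with AND-NOT are idempotent, whereas XOR flips the bit on every call, so running an XOR-based "revert" twice sets the bit again. A sketch with hypothetical helper names, operating on plain numbers only:

```shell
# $1: current value (decimal or 0x-prefixed hex), $2: bit position
set_bit() (
    printf '0x%x\n' $(( $1 | (1 << $2) ))
)
clear_bit() (
    printf '0x%x\n' $(( $1 & ~(1 << $2) ))
)
toggle_bit() (
    printf '0x%x\n' $(( $1 ^ (1 << $2) ))
)

set_bit    0x0   9    # 0x200
clear_bit  0x200 9    # 0x0
clear_bit  0x0   9    # 0x0, clearing twice is harmless
toggle_bit 0x0   9    # 0x200, toggling a cleared bit sets it
```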

Bootloader

Responsible for booting a computer…

  • …executed after firmware
  • First-stage
    • …smaller in code size
    • …simpler implementation
    • …eventually hand over to second stage bootloader
  • Second-stage aka boot manager
    • …for dual or multi-booting
    • …configured to give the user boot choices
    • …booting into a rescue, safe mode or memory test

1st Stage Bootloader

List of commonly used bootloaders

UEFI

UEFI specification outlines…

  • …interface between the operating system and hardware
  • …several fundamental elements…
    • …system partition
    • …boot services
    • …runtime services
    • …bootloader
  • UEFI Platform Initialization (PI) specification…
    • …mainstream way to implement UEFI
    • …multiple phases…
      • Security (SEC) phase initializes the CPU and the system
      • Pre-EFI Initialization Environment (PEI)
      • Driver Execution Environment (DXE)
      • Boot Device Selection (BDS)

TianoCore community…

  • EDK II Project …EDK (EFI Development Kit)
  • …open source UEFI reference implementation (by Intel)
  • …de facto standard generic UEFI services implementation

UEFI shim loader 2

  • …EFI application to execute another binary application
  • …if Secure Boot enabled …validate binary against built-in certificate
  • …enables second-stage bootloader to perform similar binary validation
  • …if TPM enabled …PCRs extended with digests of targets

efibootmgr

Configures the EFI boot manager 3

  • …install the RPM efibootmgr package
  • …create & delete boot entries
  • …change the boot order

No arguments lists the configuration…

  • BootCurrent …entry used to start the currently running system
  • BootOrder …boot order used by the boot manager
  • BootNext …scheduled to be run on next boot
  • …boot entries
    • …number BootXXXX and name
    • …active/inactive flag (* means active)
  • …option -v …verbose
    • …GPT partition numbers
    • …UUID of the EFI system partition
    • …boot loader file
    • …PCI device addresses

Changing the configuration requires root privileges (use sudo)…

Changing Boot Order…

  • …option -o/--bootorder
    • …list of boot entry numbers
    • …separated by , comma
  • …option -n/--bootnext
# change the boot order
efibootmgr -o 5,0

# change the boot order for the next boot only
efibootmgr -n 5

# display individual UEFI boot options, from a file or a UEFI variable
efibootdump
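Since -o replaces the whole boot order, moving one entry to the front means rebuilding the comma-separated list. A sketch of that string handling, where promote_entry is a hypothetical helper and the entry numbers are examples only:

```shell
# Move one boot entry to the front of a comma-separated boot order.
# $1: entry number (for example 0000), $2: current order (for example 0005,0000)
promote_entry() (
    entry=$1
    # drop the entry from its old position, then join the rest with commas
    rest=$(printf '%s\n' "$2" | tr ',' '\n' | grep -vx "$entry" | paste -sd, -)
    printf '%s\n' "${entry}${rest:+,$rest}"
)

# usage (as root), commented out because it changes firmware variables:
#   current=$(efibootmgr | awk '/^BootOrder:/ { print $2 }')
#   efibootmgr -o "$(promote_entry 0000 "$current")"
```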

Coreboot

…open source boot firmware for various architectures

  • …aimed at replacing the proprietary firmware (BIOS/UEFI)
  • …bare minimum necessary to ensure that hardware is usable
  • …pass control to a different program called the payload
    • …user interfaces, file system drivers, various policies…
    • …boots the operating system …for example UEFI or GRUB
  • References…

2nd Stage Boot-Manager

List of commonly used second-stage bootloaders…

GRUB Bootloader

GRand Unified Bootloader …default EFI-mode boot loader for many distributions

Configuration

Scripts to generate the boot configuration in /etc/grub.d

  • …files contain GRUB code …collected into the final grub.cfg
  • …numbering scheme with prefix XX_ provides ordering
  • …should not be modified with the exception of…
    • {40,41}_custom …to generate user modifications
    • …add custom files after 10_linux (for example to boot non-Linux OSs)

Boot configuration in /boot/grub2/

  • grub.cfg …script like code
    • …list of installed kernels
    • …array ordered by sequence of installation

grub-mkconfig

/etc/default/grub controls the operation of grub-mkconfig

  • key/value pairs
  • …sourced by a shell script …must be valid POSIX shell input

Generate the boot configuration…

  • …initializes by reading /etc/default/grub
  • …generates grub.cfg reading /etc/grub.d
grub2-mkconfig > /boot/grub2/grub.cfg
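Because /etc/default/grub is sourced as shell, a syntax error in it breaks the generation step. The sketch below checks the syntax with sh -n and then prints one value. grub_default is a hypothetical helper. The file is sourced in a subshell and therefore executed, so it should only be pointed at trusted files, and the path should contain a slash so that it is not searched for in PATH.

```shell
# Print the value of one key from a grub defaults file.
# $1: key, $2: file (defaults to /etc/default/grub)
grub_default() (
    file=${2:-/etc/default/grub}
    sh -n "$file" || exit 1          # syntax check only, nothing is executed
    . "$file"                        # sourced in a subshell, caller stays clean
    eval "printf '%s\n' \"\${$1-}\"" # indirect lookup of the requested key
)

# usage:
#   grub_default GRUB_CMDLINE_LINUX
#   grub_default GRUB_TIMEOUT /etc/default/grub
```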

grubby

grubby …tool for manipulating bootloader configuration files

# ..list the installed kernels
grubby --info=ALL | grep title
title="Rocky Linux (4.18.0-372.32.1.el8_6.x86_64) 8.6 (Green Obsidian)"
title="Rocky Linux (0-rescue-f78ee9576d7c41a7beeed4e77aa8a87f) 8.6 (Green Obsidian)"

Current kernel booted by default…

>>> grubby --default-kernel
/boot/vmlinuz-6.4.6-200.fc38.x86_64

# ...currently running kernel
>>> ls -1 /boot/vmlinuz-$(uname -r)
/boot/vmlinuz-6.4.6-200.fc38.x86_64

List installed kernels…

>>> ls -1 /boot/vmlinuz-*
/boot/vmlinuz-0-rescue-cb79b5692a1f4f22bee94e24d5397acd
/boot/vmlinuz-6.3.11-200.fc38.x86_64
/boot/vmlinuz-6.3.12-200.fc38.x86_64
/boot/vmlinuz-6.4.6-200.fc38.x86_64

Set a specific kernel as default…

grubby --set-default /boot/vmlinuz-${version}.${arch}
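Picking the newest installed kernel for --set-default can be scripted by sorting the image paths as version strings. A sketch assuming a sort that supports -V (GNU coreutils and BusyBox do) and that rescue images carry -rescue- in their name, as in the listing above. newest_kernel is a hypothetical helper.

```shell
# Read kernel image paths on stdin and print the one with the highest version.
newest_kernel() (
    grep -v -- '-rescue-' | sort -V | tail -n 1
)

# usage (as root):
#   grubby --set-default "$(ls -1 /boot/vmlinuz-* | newest_kernel)"
```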

Set the default kernel for only the next reboot

>>> grubby --info ALL | grep ^id
id="cb79b5692a1f4f22bee94e24d5397acd-6.4.6-200.fc38.x86_64"
id="cb79b5692a1f4f22bee94e24d5397acd-6.3.12-200.fc38.x86_64"
id="cb79b5692a1f4f22bee94e24d5397acd-6.3.11-200.fc38.x86_64"
id="cb79b5692a1f4f22bee94e24d5397acd-0-rescue"
>>> grub2-reboot ${id}

Early User-Space

Early userspace stage …temporary mount of a root file-system (rootfs) aka initial RAM disk (initrd) …used to:

  • …load kernel modules to access real root file-system
  • …handle decryption of a file-system if required
  • …(real) root is mounted at /sysroot …then switched to
  • …start init program from the real root file system

initrd

RAM-based file-system, cf. ramfs, rootfs and initramfs
https://www.kernel.org/doc/Documentation/filesystems/ramfs-rootfs-initramfs.txt

  • ramdisk - Fixed size synthetic block device in RAM backing a file-system (requires a corresponding file-system driver).
  • ramfs - Dynamically resizable RAM file-system (without a backing block device).
  • tmpfs - Derivative of ramfs with size limits and swap support.
  • rootfs - Kernel entry point for the root file-system storage initialized as ramfs/tmpfs. During boot early user-space usually mounts a target root file-system to the kernel rootfs.

The initramfs (aka initrd, initial RAM disk) is a compressed CPIO-formatted file-system archive extracted into rootfs during kernel boot. It contains an “init” file and the early user-space tools needed to mount a target root file-system.

>>> file initrd.img
initrd.img: XZ compressed data
# extract the archive and restore the CPIO formatted file-system
>>> xz -dc < initrd.img | cpio --quiet -i --make-directories
# create a CPIO formatted file-system, and compress it
>>> find . 2>/dev/null | cpio --quiet -c -o | xz -9 --format=lzma >"new_initrd.img"
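The decompressor to use depends on how the image was packed, which the leading magic bytes reveal even when file is not installed. A sketch with a hypothetical helper initrd_format. It only inspects the first six bytes, so an image that starts with an uncompressed cpio segment (as some distributions prepend for early microcode) is reported as cpio.

```shell
# Guess the packing of an initrd image from its first bytes.
initrd_format() (
    magic=$(od -An -t x1 -N 6 "$1" | tr -d ' \n')
    case $magic in
        1f8b*)         echo gzip ;;
        fd377a585a00*) echo xz ;;
        28b52ffd*)     echo zstd ;;
        425a68*)       echo bzip2 ;;
        5d0000*)       echo lzma ;;
        303730373031*) echo cpio ;;    # ASCII "070701", uncompressed newc archive
        *)             echo unknown ;;
    esac
)

# usage:
#   initrd_format /boot/initramfs-$(uname -r).img
```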

Initramfs is loaded to (volatile) memory during Linux boot and used as intermediate root file-system, aka. early user-space:

  • Prepares device drivers required to mount the final/target root file-system (rootfs) if it is loaded:
    • …by addressing a local disk (block device) by label or UUID
    • …from the network (NFS, iSCSI, NBD)
    • …from a logical volume LVM, software (ATA)RAID dmraid, device mapper multi-pathing
    • …from an encrypted source dm-crypt
    • …live system on squashfs or iso9660
  • Provides a minimalistic rescue shell
  • Mounted by the kernel to / if present, before executing /init as the main init process (PID 1)

Enable support in the Linux kernel configuration:

>>> grep -e BLK_DEV_INITRD -e BLK_DEV_RAM -e TMPFS -e INITRAMFS $kernel/linux.config
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_INITRAMFS_COMPRESSION=".gz"
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
# CONFIG_BLK_DEV_RAM is not set
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_TMPFS_XATTR=y
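The same check can be made pass/fail for use in scripts. A sketch with a hypothetical helper kconfig_enabled. The path /boot/config-$(uname -r) in the usage line is a common distribution convention, not something every system provides.

```shell
# Succeed if every listed option is set to "y" in the given config file.
# $1: config file, remaining arguments: option names without the CONFIG_ prefix
kconfig_enabled() (
    file=$1
    shift
    for opt in "$@"; do
        if ! grep -q "^CONFIG_${opt}=y\$" "$file"; then
            echo "not enabled: CONFIG_${opt}" >&2
            exit 1
        fi
    done
)

# usage:
#   kconfig_enabled "/boot/config-$(uname -r)" BLK_DEV_INITRD TMPFS DEVTMPFS
```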

While testing a new initramfs image in a virtual machine with kvm, the following error message may indicate that there is not enough memory to uncompress the image:

...
[    0.974160] Unpacking initramfs...
[    1.230752] Initramfs unpacking failed: write error
...

Use option -m <MB> to increase the available memory.

C Program

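Simple C program executed as the initrd payload:

#include <stdio.h>
#include <unistd.h>
#include <sys/reboot.h>
int main(void) {
    printf("Hello, world!\n");
    fflush(stdout);          /* flush the message before the machine powers off */
    reboot(RB_POWER_OFF);    /* RB_POWER_OFF is the magic value 0x4321fedc */
    return 0;
}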
# compile the program
>>> gcc -o initrd/init -static init.c
# create a CPIO formatted file-system, and compress it
>>> ( cd initrd/ && find . | cpio -o -H newc ) | gzip > initrd.gz
# test in a virtual machine
>>> kvm -m 2048 -kernel ${KERNEL}/${version}/linux -initrd initrd.gz -append "debug console=ttyS0" -nographic

BusyBox

Download the latest BusyBox from https://busybox.net/downloads

### Download and extract busybox
>>> version=1.26.2 ; curl https://busybox.net/downloads/busybox-${version}.tar.bz2 | tar xjf -
### Configure and compile busybox
>>> cd busybox-${version} 
>>> make defconfig
>>> make LDFLAGS=--static -j $(nproc) 2>&1 | tee build.log
>>> make install 2>&1 | tee -a build.log 
>>> ls -1 _install
bin/
linuxrc@
sbin/
usr/

Build the initramfs file system

>>> initfs=/tmp/initramfs && mkdir -p $initfs 
>>> mkdir -pv ${initfs}/{bin,sbin,etc,proc,dev,sys,tmp,root}
>>> sudo cp -va /dev/{null,console,tty} ${initfs}/dev/
### Copy the busybox binaries into the initramfs
>>> cp -avR _install/* $initfs/
>>> fakeroot mknod $initfs/dev/ram0 b 1 0
>>> fakeroot mknod $initfs/dev/console c 5 1
### check the busybox environment
>>> fakeroot fakechroot /usr/sbin/chroot $initfs/ /bin/sh
>>> cat ${initfs}/init
#!/bin/sh
/bin/mount -t proc none /proc
/bin/mount -t sysfs none /sys
/sbin/mdev -s
echo -e "\nBoot took $(cut -d' ' -f1 /proc/uptime) seconds\n"
exec /bin/sh
>>> chmod +x ${initfs}/init
>>> find $initfs -print0 | cpio --null -ov --format=newc | gzip -9 > /tmp/initrd.gz
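The kernel only uses the unpacked archive as early user-space if it finds an executable /init in it, so a quick check of the staging directory before packing saves a boot cycle. A sketch with a hypothetical helper check_initramfs_tree. For a script it also verifies that the interpreter named in the shebang line exists inside the tree.

```shell
# Verify that a staging directory can serve as an initramfs root.
# $1: staging directory, for example /tmp/initramfs
check_initramfs_tree() (
    dir=$1
    if [ ! -x "$dir/init" ]; then
        echo "missing or not executable: $dir/init" >&2
        exit 1
    fi
    # extract the interpreter path from a "#!" first line, empty for binaries
    interp=$(head -n 1 "$dir/init" | sed -n 's/^#! *\([^ ]*\).*/\1/p')
    if [ -n "$interp" ] && [ ! -e "$dir$interp" ]; then
        echo "interpreter $interp not found in $dir" >&2
        exit 1
    fi
)

# usage:
#   check_initramfs_tree /tmp/initramfs && echo "tree looks bootable"
```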

Test using a virtual machine:

>>> cmdline='root=/dev/ram0 rootfstype=ramfs rdinit=/init debug console=ttyS0'
>>> kvm -nographic -m 2048 -append "$cmdline" -kernel ${KERNEL}/${version}/linux -initrd /tmp/initrd.gz

Systemd

Debian user-space and systemd in an initramfs:

>>> apt install -y debootstrap systemd-container
>>> export ROOTFS_PATH=/tmp/rootfs
## create the root file-system
>>> debootstrap stretch $ROOTFS_PATH
## configure the root file-system
>>> chroot $ROOTFS_PATH
>>> passwd                          # change the root password
>>> ln -s /sbin/init /init          # use systemd as /init
>>> exit
## create an initramfs image from the rootfs
>>> ( cd $ROOTFS_PATH ; find . | cpio -ov -H newc | gzip -9 ) > /tmp/initramfs.cpio.gz
## test with a virtual machine
>>> kvm -m 2048 -kernel /boot/vmlinuz-$(uname -r) -initrd /tmp/initramfs.cpio.gz

initramfs-tools

Modular initramfs generator tool chain maintained by Debian:

https://tracker.debian.org/pkg/initramfs-tools

  • Hook scripts are used to create an initramfs image.
  • Boot scripts are included into the initramfs image and executed during boot.
apt install -y initramfs-tools                       # install package
## manage initramfs images on the local file-system, utilizing mkinitramfs
man update-initramfs                                 # manual page
/etc/initramfs-tools/update-initramfs.conf           # configuration
update-initramfs -u -k $(uname -r)                   # update initramfs of the currently running kernel

Low-level tools to generate initramfs images:

man initramfs-tools                                  # introduction to writing scripts for mkinitramfs
man initramfs.conf                                   # configuration file documentation
/etc/initramfs-tools/initramfs.conf                  # global configuration
ls -1 {/etc,/usr/share}/initramfs-tools/conf.d*      # hooks overwriting the configuration file
ls -1 {/etc,/usr/share}/initramfs-tools/hooks*       # hooks executed during generation of the initramfs
ls -1 {/etc,/usr/share}/initramfs-tools/modules*     # module configuration
/usr/share/initramfs-tools/hook-functions            # helper functions used within hooks
mkinitramfs -o /tmp/initramfs.img                    # create an initramfs image for the currently running kernel
sh -x /usr/sbin/mkinitramfs -o /tmp/initramfs.img |& tee /tmp/mkinitramfs.log
                                                     # debug the image creation
lsinitramfs                                          # list content of an initramfs image
lsinitramfs /boot/initrd.img-$(uname -r)             # ^ of the currently running kernel
unmkinitramfs <image> <path>                         # extract the content of an initramfs

Use the following to build a new initramfs and to debug it with a KVM virtual machine:

  • The debug=1 argument writes a log-file to /run/initramfs/initramfs.debug
  • Use break= to spawn a shell at a chosen run-time (top, modules, premount, mount, mountroot, bottom, init)
  • The shell is a basic sh with BusyBox tools, not a full bash…
# build the image
>>> mkinitramfs -o /tmp/initramfs.img
# start kernel & initramfs with debug flags
>>> kvm -m 256 -nographic -kernel /boot/vmlinuz-$(uname -r) -initrd /tmp/initramfs.img -append 'console=ttyS0 debug=1 break=top' 
...
(initramfs) poweroff -f
...

Examine an initramfs image by extracting it into a temporary directory:

>>> cd `mktemp -d` && gzip -dc /tmp/initramfs.img | cpio -ivd
# the first program called by the Linux kernel
>>> cat init
...

Hooks

Executed during image creation to add and configure files.

The script header at the top of each hook (see the Infiniband example) is used as a skeleton:

  • PREREQ should contain a list of dependency hooks
  • Read /usr/share/initramfs-tools/hook-functions for a list of predefined helper-functions.
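The skeleton can be stamped out with a small generator. The header below is the one used by the Infiniband hook in this section. new_hook and the hook name example are hypothetical.

```shell
# Write a minimal hook skeleton to the given path and make it executable.
# $1: path of the new hook, for example /etc/initramfs-tools/hooks/example
new_hook() (
    cat > "$1" << 'EOF'
#!/bin/sh
PREREQ=""
prereqs()
{
     echo "$PREREQ"
}

case $1 in
prereqs)
     prereqs
     exit 0
     ;;
esac

. /usr/share/initramfs-tools/hook-functions

# hook body: add files and modules to ${DESTDIR} here
EOF
    chmod +x "$1"
)

# usage (as root):
#   new_hook /etc/initramfs-tools/hooks/example
```

Calling the generated script with the single argument prereqs prints the PREREQ list and exits before hook-functions is sourced.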

Infiniband

The following example loads Infiniband drivers:

# install Infiniband support
>>> apt install -y libmlx4-1 infiniband-diags ibutils
>>> cat /etc/initramfs-tools/hooks/infiniband
#!/bin/sh
PREREQ=""
prereqs()
{
     echo "$PREREQ"
}

case $1 in
prereqs)
     prereqs
     exit 0
     ;;
esac

. /usr/share/initramfs-tools/hook-functions

mkdir -p ${DESTDIR}/etc/modules-load.d

# make sure the infiniband modules get loaded
cat << EOF > ${DESTDIR}/etc/modules-load.d/infiniband.conf
mlx4_core
mlx4_ib
ib_umad
ib_ipoib
rdma_ucm
EOF

# adds kernel module (and its dependencies) to the initramfs image 
# and also unconditionally loads the module during boot
for module in $(cat ${DESTDIR}/etc/modules-load.d/infiniband.conf); do
    force_load ${module}    
done
# make the hook executable
>>> chmod +x /etc/initramfs-tools/hooks/infiniband
# build the image..
# check if required files are in the initramfs image
>>> lsinitramfs /tmp/initramfs.img  | grep infiniband

Live-Boot

The live-boot package contains a hook for the initramfs-tools that configures a live system during the boot process (early userspace):

  • Activated if boot=live was used as a kernel parameter
  • At boot time it will look for a (read-only) medium containing a “/live” directory where a root filesystem (often a compressed filesystem image like squashfs) is stored. If found, it will create a writable environment, using aufs, to boot the system from.

https://wiki.debian.org/DebianLive

apt install -y live-boot live-boot-doc
man live-boot                            # overview documentation
/usr/share/initramfs-tools/hooks/live    # initramfs-tools hook
/bin/live-boot                           # sources the config & exec. scripts
/lib/live/boot/                          # scripts

Use a kvm virtual machine with a network device and the kernel options ‘break=mountroot boot=live’:

>>> kvm -m 512 -nographic -netdev user,id=net0 -device virtio-net-pci,netdev=net0 \
    -kernel /boot/vmlinuz-$(uname -r) -initrd /tmp/initramfs.img \
    -append 'console=ttyS0 debug=1 boot=live ip=dhcp toram fetch=http://10.1.1.28/root.squashfs'
## in case things break ##
# check the live-boot log
(initramfs) cat /boot.log
# check the network configuration
(initramfs) cat /run/net*.conf

HTTP server hosting the files for network booting over PXE:

# install the Apache web-server
>>> apt install -y apache2
>>> rm /var/www/html/index.html
# publish the Linux kernel
>>> cp /boot/vmlinuz-$(uname -r) /var/www/html/vmlinuz
# create an initramfs image
>>> mkinitramfs -o /var/www/html/initramfs.img
# create a rootfs
>>> apt install -y debootstrap systemd-container squashfs-tools
>>> debootstrap stretch /tmp/rootfs
# access the root file-system
>>> chroot /tmp/rootfs
## ...set the root password ...
# start the root file-system in a container
>>> systemd-nspawn -b -D /tmp/rootfs/
## ... customize ...
# create a SquashFS
>>> mksquashfs /tmp/rootfs /var/www/html/root.squashfs
# iPXE kernel command line
>>> cat /var/www/html/menu
#!ipxe
kernel vmlinuz initrd=initramfs.img boot=live components toram fetch=http://10.1.1.28/root.squashfs
initrd initramfs.img
boot
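When the server address changes, the menu can be generated instead of edited by hand. A sketch with a hypothetical helper ipxe_menu. Unlike the file above it writes absolute URLs for the kernel and initrd, which is an assumption made here so that the script does not depend on where it was chained from.

```shell
# Print an iPXE script for the live-boot setup described above.
# $1: base URL of the HTTP server, without a trailing slash
ipxe_menu() (
    base=$1
    cat << EOF
#!ipxe
kernel ${base}/vmlinuz initrd=initramfs.img boot=live components toram fetch=${base}/root.squashfs
initrd ${base}/initramfs.img
boot
EOF
)

# usage:
#   ipxe_menu http://10.1.1.28 > /var/www/html/menu
```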

Start a virtual machine with the iPXE bootloader

>>> wget http://boot.ipxe.org/ipxe.iso
# start a virtual machine with the iPXE bootloader
>>> kvm -m 2048 ipxe.iso
## Ctrl+B to get to the iPXE prompt
iPXE> dhcp
iPXE> chain http://10.1.1.28/menu

mkosi

mkosi 4 stands for Make Operating System Image …workflow:

  1. Generate OS tree of a distribution …install packages
  2. Package OS tree in variety of output formats
  3. (Optional) Boot resulting image in qemu or systemd-nspawn.
mkosi -d rocky -p systemd -p linux --autologin qemu
# ...add --tools-tree=default on older systems

Install

# install required packages
sudo dnf -y install mkosi mkosi-initrd systemd-ukify
# run from the source code repository
>>> git clone https://github.com/systemd/mkosi
>>> alias mkosi=$PWD/mkosi/bin/mkosi
>>> which mkosi ; mkosi --version
mkosi=/tmp/foo/mkosi/bin/mkosi
mkosi 20.2

Footnotes

  1. Linux Microcode Loader
    https://docs.kernel.org/arch/x86/microcode.html↩︎

  2. UEFI Shim Loader, Red Hat Bootloader Team, GitHub
    https://github.com/rhboot/shim↩︎

  3. efibootmgr, Red Hat Bootloader Team, GitHub
    https://github.com/rhboot/efibootmgr
    https://src.fedoraproject.org/rpms/efibootmgr↩︎

  4. mkosi, GitHub
    https://github.com/systemd/mkosi↩︎