Booting Linux
From Firmware to a running Linux Kernel
What is Linux?
- Linux kernel …core OS + built-in kernel modules
- Root file-system …used by the kernel
- Kernel command-line …overrides the default kernel configuration
Boot Methods
- Direct Linux kernel boot…
- …skip the bootloader entirely …fake the bootloader-to-Linux interface
- …load the kernel into memory and jump to its entry point
- …set processor state (stack pointer, MMU, etc.) to prepare kernel
- …complex to do …easier to use a real bootloader
- Boot from disk…
- …bootloader starts configurable second-stage (for example GRUB)
- …requires a bootloader binary and a disk image
- …Linux kernel & command-line parameters included in the disk image
- …boot device selection depends on the bootloader & target system
- Network boot…
- …standard method for network-based booting is PXE
- …bootloader retrieves disk image from a server on the network
- …multiple-stage process to load more capable binaries and disk images
- …relies on DHCP …not very good at traversing NAT routers
Pre-Boot
After hardware Initialisation …executes bootloader
- …bootloader stored in FLASH or EEPROM
- …(typically) automatically executed
- …code located at the system reset vector
Firmware & Microcode
Firmware…
- …held in non-volatile memory
- …EPROM, EEPROM
- …flash memory
- …low-level control for a device’s specific hardware
- …provides abstraction to higher-level software
- Flashing …overwriting of existing firmware
- Embedded systems typically boot from firmware to application
- Zero-stage …hands over to a bootloader
Microcode translates machine instructions to circuit-level operations…
- …intermediary layer between the CPU hardware and the instruction set of a computer
- …set of hardware-level instructions …implement higher-level machine code instructions
- …stored in a special high-speed memory
- …lowest layer in a computer’s software stack (…basically firmware that runs on the processor)
Baseline microcode baked into ROM during CPU manufacturing…
- …immutable …can’t be changed after the processor is built
- …modern CPUs support applying volatile updates at initialization
- Can be applied to the processor in one of two ways…
- …by the system firmware, via the OEM
- …by the operating system (OS)
- …neither updates the microcode in the processor's ROM
Microcode updates required to mitigate security vulnerabilities…
- & to address stability and performance issues
- May be delivered along with UEFI firmware (or BIOS) updates…
- …applied during firmware initialization
- …roll-out by motherboard vendors not always timely
- …firmware update typically not automated
Linux Microcode Loader
Linux Microcode Loader [1] may update microcode during boot…
- …loaded each time the CPU is initialized (during boot)
- Three loading methods…
- …built-in microcode …compiled into kernel …applied by early loader
- …early loading …updates microcode during boot (before initramfs)
- …late loading …updates microcode after booting
- …may be too late …CPUs may have used related instruction already
- …dangerous …possible unpredictable results with operational and running workloads
Verifying that microcode got updated on boot…
journalctl -k --grep=microcode
# ...or
dmesg | grep microcode
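Besides the kernel log, the revision each core reports can be compared directly. The following is a minimal sketch, assuming a hypothetical helper named `ucode_revisions` that reads `/proc/cpuinfo`-style text on standard input; the sample line is made up for the demonstration.

```shell
#!/bin/sh
# List the distinct microcode revisions found in /proc/cpuinfo-style input.
# Hypothetical helper: more than one output line means the cores differ.
ucode_revisions() {
    # fields are separated by ':' plus optional spaces, e.g. "microcode\t: 0x71a"
    awk -F': *' '/^microcode/ { print $2 }' | sort -u
}

# demonstration with two identical sample lines instead of the real file
printf 'microcode\t: 0x71a\nmicrocode\t: 0x71a\n' | ucode_revisions
```

On a running system the helper would be fed with `ucode_revisions < /proc/cpuinfo`.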
Packages on Enterprise Linux: microcode_ctl and linux-firmware
Install a suitable microcode package for late loading…
- …ensure existence of /sys/devices/system/cpu/microcode/reload
- …looks for microcode blobs in /lib/firmware/{intel-ucode,amd-ucode}
# ...as root ...write 1 to the reload interface to reload the microcode file
echo 1 > /sys/devices/system/cpu/microcode/reload
Intel
Processor Signature…
- …number identifying the model and version of an Intel processor
- …microcode image is named after the family/model/stepping
# ...represented as 3 fields ...family, model, and stepping
>>> grep -E '^(cpu family|model|stepping|microcode)' /proc/cpuinfo | sort -u
cpu family : 6
microcode : 0x71a
model : 45
model name : Intel(R) Xeon(R) CPU E5-1603 0 @ 2.80GHz
stepping : 7
- …/proc/cpuinfo prints decimal values …file names use two-digit hexadecimal
- …example above: family 0x06, model 0x2d (45), stepping 0x07
- …corresponding microcode file would be 06-2d-07 in /lib/firmware/intel-ucode
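Intel names the microcode files after family, model and stepping in two-digit hexadecimal, while `/proc/cpuinfo` prints the same fields in decimal. The conversion can be sketched as follows, assuming a hypothetical helper named `intel_ucode_name`:

```shell
#!/bin/sh
# Build the Intel microcode file name from the decimal family/model/stepping
# values printed by /proc/cpuinfo. The function name is hypothetical.
intel_ucode_name() {
    # %02x renders each decimal argument as two lowercase hex digits
    printf '%02x-%02x-%02x\n' "$1" "$2" "$3"
}

# values of the Xeon E5-1603 example: family 6, model 45, stepping 7
intel_ucode_name 6 45 7    # prints 06-2d-07
```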
>>> dnf install -y cpuid
>>> cpuid -1 | grep -E '(family|model|stepping)'
family = 0x6 (6)
model = 0x5 (5)
stepping id = 0x7 (7)
extended family = 0x0 (0)
extended model = 0x5 (5)
(family synth) = 0x6 (6)
(model synth) = 0x55 (85)
Microcode published in the Intel Microcode Package for Linux repository on GitHub…
# ...microcode version number is included in the binary header
>>> od -t x4 intel-ucode/06-55-04 | head -n1
0000000 00000001 02006f05 12212022 00050654
#                |        |
#                version  date (MMDDYYYY)
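The header words of the dump can be split into named fields. This is a rough sketch with a hypothetical helper `decode_ucode_header`; it assumes the word order shown in the dump (header version, update revision, date as MMDDYYYY digits, processor signature) and a little-endian host, since `od -t x4` prints words in host byte order.

```shell
#!/bin/sh
# Decode one line of `od -t x4` output taken from the start of an Intel
# microcode file. Hypothetical helper, not part of any package.
decode_ucode_header() {
    # split the line into: offset, header version, revision, date, signature
    set -- $1
    rev=$3 stamp=$4 sig=$5
    mm=$(printf '%s' "$stamp" | cut -c1-2)
    dd=$(printf '%s' "$stamp" | cut -c3-4)
    yyyy=$(printf '%s' "$stamp" | cut -c5-8)
    printf 'revision=0x%s date=%s-%s-%s signature=0x%s\n' \
        "$rev" "$yyyy" "$mm" "$dd" "$sig"
}

# the line from the dump of intel-ucode/06-55-04
decode_ucode_header "0000000 00000001 02006f05 12212022 00050654"
# prints: revision=0x02006f05 date=2022-12-21 signature=0x00050654
```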
AMD
Different firmware per CPU family…
>>> grep -E '^(cpu family|model|stepping|microcode)' /proc/cpuinfo | sort -u
cpu family : 23
microcode : 0x8301034
model : 49
model name : AMD EPYC 7662 64-Core Processor
stepping : 0
- …identify the right firmware blob file from a list in the Gentoo Wiki
- …AMD published new microcode on kernel.org/…/firmware
wrmsr & rdmsr
Intel MSR-Tools are available via RPM package…
sudo dnf install -y msr-tools
- …wrmsr modifies, rdmsr reads an MSR (model-specific register)
- …control registers in the x86 system architecture
- Used for
- …debugging
- …program execution tracing
- …computer performance monitoring
- …toggling certain CPU features
Example use for the mitigation of the Zenbleed security issue on AMD CPUs…
# set the chicken bit (bit 9) ...and clear it again to revert
wrmsr -a 0xc0011029 $(($(rdmsr -c 0xc0011029) | (1<<9)))
wrmsr -a 0xc0011029 $(($(rdmsr -c 0xc0011029) & ~(1<<9)))
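Before or after writing a register it can help to check whether the bit is actually set. A minimal sketch, assuming a hypothetical helper `msr_bit_is_set`; it only does shell arithmetic on a value passed in and never touches an MSR itself.

```shell
#!/bin/sh
# Report whether a given bit is set in a register value.
# Hypothetical helper: succeeds (exit status 0) if the bit is set.
msr_bit_is_set() {
    # $1: register value (decimal or 0x-prefixed hex), $2: bit position
    [ $(( ($1 >> $2) & 1 )) -eq 1 ]
}

# 0x200 has only bit 9 set
if msr_bit_is_set 0x200 9; then echo "bit 9 set"; else echo "bit 9 clear"; fi
```

With msr-tools installed, the value would come from something like `msr_bit_is_set "$(rdmsr -c 0xc0011029)" 9`, run as root.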
Bootloader
Responsible for booting a computer…
- …executed after firmware
- First-stage…
- …smaller in code size
- …simpler implementation
- …eventually hand over to second stage bootloader
- Second-stage aka boot manager
- …for dual or multi-booting
- …configured to give the user boot choices
- …booting into a rescue, safe mode or memory test
1st Stage Bootloader
List of commonly used bootloaders
UEFI
UEFI specification outlines…
- …interface between the operating system and hardware
- …several fundamental elements…
- …system partition
- …boot services
- …runtime services
- …bootloader
- UEFI Platform Initialization (PI) specification…
- …mainstream way to implement UEFI
- …multiple phases…
- Security (SEC) phase initializes the CPU and the system
- Pre-EFI Initialization Environment (PEI)
- Driver Execution Environment (DXE)
- Boot Device Selection (BDS)
TianoCore community…
- …EDK II Project …EDK (EFI Development Kit)
- …open source UEFI reference implementation (by Intel)
- …de facto standard generic UEFI services implementation
UEFI shim loader [2]
- …EFI application to execute another binary application
- …if Secure Boot enabled …validate binary against built-in certificate
- …enables second-stage bootloader to perform similar binary validation
- …if TPM enabled …PCRs extended with digests of targets
efibootmgr
Configures the EFI boot manager [3]…
- …install the RPM efibootmgr package
- …create & delete boot entries
- …change the boot order
Without arguments, lists the configuration…
- …BootCurrent …entry used to start the currently running system
- …BootOrder …for the boot manager
- …BootNext …scheduled to be run on next boot
- …boot entries
- …number BootXXXX and name
- …active/inactive flag (* means active)
- …option -v …verbose
- …GPT partition numbers
- …UUID of the EFI system partition
- …boot loader file
- …PCI device addresses
Changing the configuration requires root privileges (use sudo)…
Changing Boot Order…
- …option -o/--bootorder
- …list of boot entry numbers
- …separated by , (comma)
- …option -n/--bootnext
# change the boot order
efibootmgr -o 5,0
# change the boot order for the next boot only
efibootmgr -n 5
# display individual UEFI boot options, from a file or a UEFI variable
efibootdump
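The list passed to `-o` is usually the current BootOrder with one entry moved to the front. A small sketch of that string manipulation, assuming a hypothetical helper `promote_boot_entry` and made-up entry numbers; it does not call efibootmgr itself.

```shell
#!/bin/sh
# Move one boot entry to the front of a comma-separated BootOrder list.
# Hypothetical helper: it only rewrites the string.
promote_boot_entry() {
    # $1: current order, e.g. "0000,0005,0003"; $2: entry to boot first
    rest=$(printf '%s\n' "$1" | tr ',' '\n' | grep -vxF "$2" | paste -sd, -)
    if [ -n "$rest" ]; then
        printf '%s,%s\n' "$2" "$rest"
    else
        printf '%s\n' "$2"
    fi
}

promote_boot_entry "0000,0005,0003" 0005    # prints 0005,0000,0003
```

The result could then be handed to `efibootmgr -o` as root.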
Coreboot
…open source boot firmware for various architectures
- …aimed at replacing the proprietary firmware (BIOS/UEFI)
- …bare minimum necessary to ensure that hardware is usable
- …pass control to a different program called the payload…
- …user interfaces, file system drivers, various policies…
- …boots the operating system …for example UEFI or GRUB
- References…
2nd Stage Boot-Manager
List of commonly used second-stage bootloaders…
- …GNU GRUB
- …systemd-boot
- …rEFInd
- …LinuxBoot
GRUB Bootloader
GRand Unified Bootloader …default EFI-mode boot loader for many distributions
- …loads the Linux kernel into memory
- …turns over execution to the kernel
- …supports multiple Linux kernels…
- …allows the user to select between them at boot time
- References…
Configuration
Configuration to generate the boot configuration in /etc/grub.d…
- …files contain GRUB code …collected into the final grub.cfg
- …numbering scheme with prefix XX_ provides ordering
- …should not be modified, with the exception of {40,41}_custom …to add user modifications
- …add custom files after 10_linux (for example to boot non-Linux OSs)
Boot configuration in /boot/grub2/…
- grub.cfg …script-like code
- …list of installed kernels
- …array ordered by sequence of installation
grub-mkconfig
/etc/default/grub controls the operation of grub-mkconfig…
- …key/value pairs
- …sourced by a shell script …must be valid POSIX shell input
Generate the boot configuration…
- …starts by reading /etc/default/grub
- …generates grub.cfg from the files in /etc/grub.d
grub2-mkconfig > /boot/grub2/grub.cfg
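Since /etc/default/grub has to be valid POSIX shell input, a syntax check before regenerating grub.cfg can catch a stray quote. A minimal sketch, assuming a hypothetical helper `check_grub_defaults`; the demonstration uses throw-away files rather than the real configuration.

```shell
#!/bin/sh
# Syntax-check a defaults file without executing it.
# Hypothetical helper; `sh -n` only parses the input.
check_grub_defaults() {
    # $1: path to the file, typically /etc/default/grub
    sh -n "$1" 2>/dev/null
}

# demonstrate with two temporary files
tmp=$(mktemp -d)
printf 'GRUB_TIMEOUT=5\nGRUB_CMDLINE_LINUX="console=ttyS0"\n' > "$tmp/good"
printf 'GRUB_CMDLINE_LINUX="console=ttyS0\n' > "$tmp/bad"   # unterminated quote
check_grub_defaults "$tmp/good" && echo "good: ok"
check_grub_defaults "$tmp/bad"  || echo "bad: syntax error"
rm -r "$tmp"
```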
grubby
grubby …tool for manipulating bootloader configuration files
# ..list the installed kernels
grubby --info=ALL | grep title
title="Rocky Linux (4.18.0-372.32.1.el8_6.x86_64) 8.6 (Green Obsidian)"
title="Rocky Linux (0-rescue-f78ee9576d7c41a7beeed4e77aa8a87f) 8.6 (Green Obsidian)"
Current kernel booted by default…
>>> grubby --default-kernel
/boot/vmlinuz-6.4.6-200.fc38.x86_64
# ...currently running kernel
>>> ls -1 /boot/vmlinuz-$(uname -r)
/boot/vmlinuz-6.4.6-200.fc38.x86_64
List installed kernels…
>>> ls -1 /boot/vmlinuz-*
/boot/vmlinuz-0-rescue-cb79b5692a1f4f22bee94e24d5397acd
/boot/vmlinuz-6.3.11-200.fc38.x86_64
/boot/vmlinuz-6.3.12-200.fc38.x86_64
/boot/vmlinuz-6.4.6-200.fc38.x86_64
Set a specific kernel as default…
grubby --set-default /boot/vmlinuz-${version}.${arch}
Set the default kernel for only the next reboot
>>> grubby --info ALL | grep ^id
id="cb79b5692a1f4f22bee94e24d5397acd-6.4.6-200.fc38.x86_64"
id="cb79b5692a1f4f22bee94e24d5397acd-6.3.12-200.fc38.x86_64"
id="cb79b5692a1f4f22bee94e24d5397acd-6.3.11-200.fc38.x86_64"
id="cb79b5692a1f4f22bee94e24d5397acd-0-rescue"
>>> grub2-reboot ${id}
Early User-Space
Early userspace stage …temporary mount of a root file-system (rootfs) aka initial RAM disk (initrd) …used to:
- …load kernel modules to access the real root file-system
- …handle decryption of a file-system if required
- …(real) root is mounted at /sysroot …then switched to
- …start init program from the real root file system
initrd
RAM-based file-system, cf. ramfs, rootfs and initramfs
https://www.kernel.org/doc/Documentation/filesystems/ramfs-rootfs-initramfs.txt
- ramdisk - Fixed size synthetic block device in RAM backing a file-system (requires a corresponding file-system driver).
- ramfs - Dynamically resizable RAM file-system (without a backing block device).
- tmpfs - Derivative of ramfs with size limits and swap support.
- rootfs - Kernel entry point for the root file-system storage initialized as ramfs/tmpfs. During boot early user-space usually mounts a target root file-system to the kernel rootfs.
The initramfs (aka. initrd (init ram-disk)) is a compressed CPIO formatted file-system archive extracted into rootfs during kernel boot. Contains an “init” file and the early user-space tools to enable the mount of a target root file-system.
>>> file initrd.img
initrd.img: XZ compressed data
# extract the archive and restore the CPIO formatted file-system
>>> xz -dc < initrd.img | cpio --quiet -i --make-directories
# create a CPIO formatted file-system, and compress it
>>> find . 2>/dev/null | cpio --quiet -c -o | xz -9 --format=lzma >"new_initrd.img"
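Which decompressor to use depends on the image, and the leading magic bytes usually tell. The following is a rough sketch with a hypothetical helper `initrd_format` that recognises a few common formats only. An image with prepended microcode would show up as `cpio`, because it starts with an uncompressed archive.

```shell
#!/bin/sh
# Guess the format of an initrd image from its first bytes.
# Hypothetical helper; unlisted formats are reported as "unknown".
initrd_format() {
    # first six bytes as a plain hex string, e.g. "1f8b08..."
    magic=$(od -An -tx1 -N6 "$1" | tr -d ' \n')
    case $magic in
        1f8b*)         echo gzip ;;
        fd377a585a00)  echo xz ;;
        28b52ffd*)     echo zstd ;;
        303730373031)  echo cpio ;;   # ASCII "070701", uncompressed newc
        *)             echo unknown ;;
    esac
}

# demonstration with a file that starts with the gzip magic bytes
tmp=$(mktemp)
printf '\037\213\010' > "$tmp"
initrd_format "$tmp"    # prints gzip
rm -f "$tmp"
```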
Initramfs is loaded to (volatile) memory during Linux boot and used as intermediate root file-system, aka. early user-space:
- Prepares device drivers required to mount the final/target root file-system (rootfs), whether it is loaded:
- …by addressing a local disk (block device) by label or UUID
- …from the network (NFS, iSCSI, NBD)
- …from a logical volume (LVM), software (ATA)RAID dmraid, device mapper multi-pathing
- …from an encrypted source dm-crypt
- …live system on squashfs or iso9660
- Provides a minimalistic rescue shell
- Mounted by the kernel to / if present, before executing /init, the main init process (PID 1)
Enable support in the Linux kernel configuration:
>>> grep -e BLK_DEV_INITRD -e BLK_DEV_RAM -e TMPFS -e INITRAMFS $kernel/linux.config
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_INITRAMFS_COMPRESSION=".gz"
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
# CONFIG_BLK_DEV_RAM is not set
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_TMPFS_XATTR=y
While testing a new initramfs image in a virtual machine with kvm, the following error message may indicate that there is not enough memory to uncompress the image:
...
[ 0.974160] Unpacking initramfs...
[ 1.230752] Initramfs unpacking failed: write error
...
Use option -m <MB> to increase the available memory.
C Program
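Simple C program executed as initrd payload:
#include <stdio.h>
#include <unistd.h>
#include <sys/reboot.h>
int main(void) {
    printf("Hello, world!\n");
    fflush(stdout);        /* reboot() does not flush stdio buffers */
    reboot(0x4321fedc);    /* LINUX_REBOOT_CMD_POWER_OFF: power off the machine */
    return 0;
}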
# compile the program
>>> gcc -o initrd/init -static init.c
# create a CPIO formatted file-system, and compress it
>>> ( cd initrd/ && find . | cpio -o -H newc ) | gzip > initrd.gz
# test in a virtual machine
>>> kvm -m 2048 -kernel ${KERNEL}/${version}/linux -initrd initrd.gz -append "debug console=ttyS0" -nographic
BusyBox
Download the latest BusyBox from https://busybox.net/downloads
### Download and extract busybox
>>> version=1.26.2 ; curl https://busybox.net/downloads/busybox-${version}.tar.bz2 | tar xjf -
### Configure and compile busybox
>>> cd busybox-${version}
>>> make defconfig
>>> make LDFLAGS=--static -j $(nproc) 2>&1 | tee build.log
>>> make install 2>&1 | tee -a build.log
>>> ls -1 _install
bin/
linuxrc@
sbin/
usr/
Build the initramfs file system
>>> initfs=/tmp/initramfs && mkdir -p $initfs
>>> mkdir -pv ${initfs}/{bin,sbin,etc,proc,dev,sys,tmp,root}
>>> sudo cp -va /dev/{null,console,tty} ${initfs}/dev/
### Copy the busybox binaries into the initramfs
>>> cp -avR _install/* $initfs/
>>> fakeroot mknod $initfs/dev/ram0 b 1 0
>>> fakeroot mknod $initfs/dev/console c 5 1
### check the busybox environment
>>> fakeroot fakechroot /usr/sbin/chroot $initfs/ /bin/sh
>>> cat ${initfs}/init
#!/bin/sh
/bin/mount -t proc none /proc
/bin/mount -t sysfs none /sys
/sbin/mdev -s
echo -e "\nBoot took $(cut -d' ' -f1 /proc/uptime) seconds\n"
exec /bin/sh
>>> chmod +x ${initfs}/init
>>> find $initfs -print0 | cpio --null -ov --format=newc | gzip -9 > /tmp/initrd.gz
Test using a virtual machine:
>>> cmdline='root=/dev/ram0 rootfstype=ramfs init=init debug console=ttyS0'
>>> kvm -nographic -m 2048 -append "$cmdline" -kernel ${KERNEL}/${version}/linux -initrd /tmp/initrd.gz
Systemd
Debian user-space and systemd in an initramfs:
>>> apt install -y debootstrap systemd-container
>>> export ROOTFS_PATH=/tmp/rootfs
## create the root file-system
>>> debootstrap stretch $ROOTFS_PATH
## configure the root file-system
>>> chroot $ROOTFS_PATH
>>> passwd # change the root password
>>> ln -s /sbin/init /init # use systemd as /init
>>> exit
## create an initramfs image from the rootfs
>>> ( cd $ROOTFS_PATH ; find . | cpio -ov -H newc | gzip -9 ) > /tmp/initramfs.cpio.gz
## test with a virtual machine
>>> kvm -m 2048 -kernel /boot/vmlinuz-$(uname -r) -initrd /tmp/initramfs.cpio.gz
initramfs-tools
Modular initramfs generator tool chain maintained by Debian:
https://tracker.debian.org/pkg/initramfs-tools
- Hook scripts are used to create an initramfs image.
- Boot scripts are included into the initramfs image and executed during boot.
apt install -y initramfs-tools # install package
## manage initramfs images on the local file-system, utilizing mkinitramfs
man update-initramfs # manual page
/etc/initramfs-tools/update-initramfs.conf # configuration
update-initramfs -u -k $(uname -r) # update initramfs of the currently running kernel
Low-level tools to generate initramfs images:
man initramfs-tools # introduction to writing scripts for mkinitramfs
man initramfs.conf # configuration file documentation
/etc/initramfs-tools/initramfs.conf # global configuration
ls -1 {/etc,/usr/share}/initramfs-tools/conf.d* # hooks overwriting the configuration file
ls -1 {/etc,/usr/share}/initramfs-tools/hooks* # hooks executed during generation of the initramfs
ls -1 {/etc,/usr/share}/initramfs-tools/modules* # module configuration
/usr/share/initramfs-tools/hook-functions # helper functions used within hooks
mkinitramfs -o /tmp/initramfs.img # create an initramfs image for the currently running kernel
sh -x /usr/sbin/mkinitramfs -o /tmp/initramfs.img |& tee /tmp/mkinitramfs.log
# debug the image creation
lsinitramfs # list content of an initramfs image
lsinitramfs /boot/initrd.img-$(uname -r) # ^ of the currently running kernel
unmkinitramfs <image> <path> # extract the content of an initramfs
Use the following to build a new initramfs and to debug it with a KVM virtual machine:
- The debug=1 argument writes a log-file to /run/initramfs/initramfs.debug
- Use break= to spawn a shell at a chosen run-time (top, modules, premount, mount, mountroot, bottom, init)
- The shell is a minimal POSIX sh with BusyBox tools, not a full bash…
# build the image
>>> mkinitramfs -o /tmp/initramfs.img
# start kernel & initramfs with debug flags
>>> kvm -m 256 -nographic -kernel /boot/vmlinuz-$(uname -r) -initrd /tmp/initramfs.img -append 'console=ttyS0 debug=1 break=top'
...
(initramfs) poweroff -f
...
Examine an initramfs image by extracting it into a temporary directory:
>>> cd `mktemp -d` && gzip -dc /tmp/initramfs.img | cpio -ivd
# the first program called by the Linux kernel
>>> cat init
...
Hooks
Executed during image creation to add and configure files.
The following script header is used as a skeleton (see the header of the Infiniband hook below):
- PREREQ should contain a list of dependency hooks
- Read /usr/share/initramfs-tools/hook-functions for a list of predefined helper-functions.
Infiniband
The following example loads Infiniband drivers:
# install Infiniband support
>>> apt install -y libmlx4-1 infiniband-diags ibutils
>>> cat /etc/initramfs-tools/hooks/infiniband
#!/bin/sh
PREREQ=""
prereqs()
{
echo "$PREREQ"
}
case $1 in
prereqs)
prereqs
exit 0
;;
esac
. /usr/share/initramfs-tools/hook-functions
mkdir -p ${DESTDIR}/etc/modules-load.d
# make sure the infiniband modules get loaded
cat << EOF > ${DESTDIR}/etc/modules-load.d/infiniband.conf
mlx4_core
mlx4_ib
ib_umad
ib_ipoib
rdma_ucm
EOF
# adds kernel module (and its dependencies) to the initramfs image
# and also unconditionally loads the module during boot
for module in $(cat ${DESTDIR}/etc/modules-load.d/infiniband.conf); do
force_load ${module}
done
# make the hook executable
>>> chmod +x /etc/initramfs-tools/hooks/infiniband
# build the image..
# check if required files are in the initramfs image
>>> lsinitramfs /tmp/initramfs.img | grep infiniband
Live-Boot
The live-boot package contains a hook for initramfs-tools that configures a live system during the boot process (early userspace):
- Activated if boot=live was used as a kernel parameter
- At boot time it will look for a (read-only) medium containing a “/live” directory where a root file-system (often a compressed file-system image like squashfs) is stored. If found, it will create a writable environment, using aufs, to boot the system from.
https://wiki.debian.org/DebianLive
apt install -y live-boot live-boot-doc
man live-boot # overview documentation
/usr/share/initramfs-tools/hooks/live # initramfs-tools hook
/bin/live-boot # sources the config & exec. scripts
/lib/live/boot/ # scripts
Use a kvm virtual machine with a network device and the kernel options ‘break=mountroot boot=live’:
>>> kvm -m 512 -nographic -netdev user,id=net0 -device virtio-net-pci,netdev=net0 \
-kernel /boot/vmlinuz-$(uname -r) -initrd /tmp/initramfs.img \
-append 'console=ttyS0 debug=1 boot=live ip=dhcp toram fetch=http://10.1.1.28/root.squashfs'
## in case things break ##
# check the live-boot log
(initramfs) cat /boot.log
# check the network configuration
(initramfs) cat /run/net*.conf
HTTP server hosting the files for network booting over PXE:
# install the Apache web-server
>>> apt install -y apache2
>>> rm /var/www/html/index.html
# publish the Linux kernel
>>> cp /boot/vmlinuz-$(uname -r) /var/www/html/vmlinuz
# create an initramfs image
>>> mkinitramfs -o /var/www/html/initramfs.img
# create a rootfs
>>> apt install -y debootstrap systemd-container squashfs-tools
>>> debootstrap stretch /tmp/rootfs
# access the root file-system
>>> chroot /tmp/rootfs
## ...set the root password ...
# start the root file-system in a container
>>> systemd-nspawn -b -D /tmp/rootfs/
## ... customize ...
# create a SquashFS
>>> mksquashfs /tmp/rootfs /var/www/html/root.squashfs
# iPXE kernel command line
>>> cat /var/www/html/menu
#!ipxe
kernel vmlinuz initrd=initramfs.img boot=live components toram fetch=http://10.1.1.28/root.squashfs
initrd initramfs.img
boot
Start a virtual machine with the iPXE bootloader
>>> wget http://boot.ipxe.org/ipxe.iso
# start a virtual machine with the iPXE bootloader
>>> kvm -m 2048 ipxe.iso
## Ctrl+B to get to the iPXE prompt
iPXE> dhcp
iPXE> chain http://10.1.1.28/menu
mkosi
mkosi [4] stands for Make Operating System Image …workflow:
- Generate OS tree of a distribution …install packages
- Package OS tree in variety of output formats
- (Optional) Boot resulting image in qemu or systemd-nspawn.
mkosi -d rocky -p systemd -p linux --autologin qemu
# ...add --tools-tree=default on older systems
Install
# install required packages
sudo dnf -y install mkosi mkosi-initrd systemd-ukify
# run from the source code repository
>>> git clone https://github.com/systemd/mkosi
>>> alias mkosi=$PWD/mkosi/bin/mkosi
>>> which mkosi ; mkosi --version
mkosi=/tmp/foo/mkosi/bin/mkosi
mkosi 20.2
Footnotes
1. Linux Microcode Loader
   https://docs.kernel.org/arch/x86/microcode.html
2. UEFI Shim Loader, Red Hat Bootloader Team, GitHub
   https://github.com/rhboot/shim
3. efibootmgr, Red Hat Bootloader Team, GitHub
   https://github.com/rhboot/efibootmgr
   https://src.fedoraproject.org/rpms/efibootmgr
4. mkosi, GitHub
   https://github.com/systemd/mkosi