After I did diskless SUSE, the obvious next step was to do the same with Red Hat.
The main idea was to mount an overlay filesystem as root. The shared, read-only root filesystem thus becomes a template for each node. Locally written files are stored in the node's memory and are lost on poweroff or reboot. The downside is that the overlay's lower (base) filesystem cannot be changed on the fly without breaking the overlay. This means that every maintenance operation on the shared root filesystem requires a reboot of all connected nodes.
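The layering described above can be sketched as a small shell function that only prints the mount command instead of running it. The helper name `overlay_mount_cmd` and the `/sysroot` target are assumptions for illustration; `lowerdir`, `upperdir` and `workdir` are the standard overlayfs mount options.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the overlay mount command
# for a read-only lower layer and a writable upper layer.
# overlay_mount_cmd is a hypothetical helper name, not part of dracut.
overlay_mount_cmd() {
    lower=$1    # read-only template, e.g. the NFS root
    upper=$2    # per-node writable layer (held in RAM when under /run)
    work=$3     # scratch directory overlayfs needs, on the same filesystem as upper
    target=$4   # where the merged tree appears
    printf 'mount -t overlay overlay -o rw,lowerdir=%s,upperdir=%s,workdir=%s %s\n' \
        "$lower" "$upper" "$work" "$target"
}

# /sysroot is assumed here as the new root directory inside the initrd.
overlay_mount_cmd /run/lower /run/upper /run/work /sysroot
```

Reads fall through to the lower layer until a file is written, at which point the modified copy lives only in the upper layer.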
A patch for dracut including the overlay solution has been added to the body of the article.
First, I deployed a minimal installation of Red Hat 8.5 and added a second NIC to it, connected to an isolated network. This network will be used for PXE boot and NFS traffic. I gave it the IP address 10.1.255.254, serving the 10.1.0.0/16 network.
# ip address
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
..
2: ens18: mtu 1500 qdisc fq_codel state UP group default qlen 1000
..
3: ens19: mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 72:ba:11:3c:df:36 brd ff:ff:ff:ff:ff:ff
# nmcli connection add con-name PXE type ethernet ifname ens19 ipv4.method manual ipv4.addresses 10.1.255.254/16 ipv6.method ignore
Connection 'PXE' (6b9a9f13-f988-48b8-bad9-b92a7dbe0b7d) successfully added.
This time we will use NFSv4 for a change.
# dnf install -y bash-completion rsync nfs-utils
# sed -e 's/.*vers2=.*/vers2=n/' \
      -e 's/.*vers3=.*/vers3=n/' \
      -e 's/.*vers4=.*/vers4=y/' \
      -e 's/.*vers4\.0=.*/vers4.0=y/' \
      -e 's/.*vers4\.1=.*/vers4.1=y/' \
      -e 's/.*vers4\.2=.*/vers4.2=y/' \
      -i /etc/nfs.conf
# systemctl mask --now rpc-statd.service rpcbind.service rpcbind.socket
# systemctl enable --now nfs-server
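To preview what the substitutions do before editing /etc/nfs.conf in place, the same expressions can be run as a filter. This is a minimal sketch: the helper name `nfs4_only` is hypothetical, and the sample input only imitates the commented defaults found in the `[nfsd]` section.

```shell
#!/bin/sh
# Sketch: apply the same substitutions to nfs.conf text on stdin,
# so the result can be inspected without touching /etc/nfs.conf.
# nfs4_only is a hypothetical helper name.
nfs4_only() {
    sed -e 's/.*vers2=.*/vers2=n/' \
        -e 's/.*vers3=.*/vers3=n/' \
        -e 's/.*vers4=.*/vers4=y/' \
        -e 's/.*vers4\.0=.*/vers4.0=y/' \
        -e 's/.*vers4\.1=.*/vers4.1=y/' \
        -e 's/.*vers4\.2=.*/vers4.2=y/'
}

# Sample input imitating commented defaults; lines without a versN= key pass through unchanged.
printf '%s\n' '[nfsd]' '# vers2=n' '# vers3=y' '# vers4=y' \
    '# vers4.0=y' '# vers4.1=y' '# vers4.2=y' | nfs4_only
```

Against the real file, `nfs4_only < /etc/nfs.conf | diff /etc/nfs.conf -` shows exactly which lines would change.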
The firewall is installed and enabled by default. You can either disable it or add a rule for NFS:
# firewall-cmd --add-service=nfs --permanent
success
# firewall-cmd --reload
success
# firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: ens18 ens19
  sources:
  services: cockpit dhcpv6-client nfs ssh
  ports:
  protocols:
  forward: no
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:
Add some NFS exports:
# mkdir -p /export/root /export/home
# echo "/export/root -sec=sys,no_root_squash,sync,fsid=1,ro 10.1.0.0/16" > /etc/exports
# echo "/export/home -sec=sys,no_root_squash,sync,fsid=2,rw 10.1.0.0/16" >> /etc/exports
# exportfs -a
# exportfs -v
/export/root  10.1.0.0/16(sync,wdelay,hide,no_subtree_check,fsid=1,sec=sys,ro,secure,no_root_squash,no_all_squash)
/export/home  10.1.0.0/16(sync,wdelay,hide,no_subtree_check,fsid=2,sec=sys,rw,secure,no_root_squash,no_all_squash)
Populate an NFS root export with some minimal data:
# dnf --installroot /export/root \
      --setopt=reposdir=/etc/yum.repos.d --config /etc/dnf/dnf.conf \
      install -y \
      shadow-utils dnf redhat-release rsyslog passwd rsync \
      nfs-utils openssh-server bash-completion vim-minimal \
      patch kernel dracut dracut-network
The command above installs into /export/root instead of the host, while using the host's dnf configuration and repositories.
You cannot install an RPM directly onto an NFS mount, because some RPMs use extended attributes that NFS does not support. Therefore, you must perform this installation on a local filesystem of a Red Hat system and then copy the result to the NFS server. In our case, copying is not necessary because the installation runs on the NFS server itself.
Create a chroot environment, enter the chroot environment, patch dracut, and create an initrd.
# cat > /export/root/command_mount_chroot << EOF
mount -o bind /proc proc
mount -o bind /sys sys
mount -o bind /dev dev
PS1='chroot# ' chroot . /bin/bash -i
umount dev
umount sys
umount proc
EOF
# cd /export/root
# sh command_mount_chroot
chroot# cd /usr/lib/dracut/modules.d/95nfs
chroot# patch -p1 << 'EOFpatch'
diff -Naur 95nfs/module-setup.sh 95nfs.patch/module-setup.sh
--- 95nfs/module-setup.sh 2022-01-11 16:21:14.000000000 +0200
+++ 95nfs.patch/module-setup.sh 2022-01-28 17:03:38.712711159 +0200
@@ -92,6 +92,7 @@
     inst_hook cmdline 90 "$moddir/parse-nfsroot.sh"
     inst_hook pre-udev 99 "$moddir/nfs-start-rpc.sh"
     inst_hook cleanup 99 "$moddir/nfsroot-cleanup.sh"
+    inst_hook pre-mount 95 "$moddir/nfs-overlay-mount.sh"
     inst "$moddir/nfsroot.sh" "/sbin/nfsroot"
     inst "$moddir/nfs-lib.sh" "/lib/nfs-lib.sh"
     mkdir -m 0755 -p "$initdir/var/lib/nfs/rpc_pipefs"
diff -Naur 95nfs/nfs-overlay-mount.sh 95nfs.patch/nfs-overlay-mount.sh
--- 95nfs/nfs-overlay-mount.sh 1970-01-01 02:00:00.000000000 +0200
+++ 95nfs.patch/nfs-overlay-mount.sh 2022-01-28 17:01:54.755365530 +0200
@@ -0,0 +1,5 @@
+#!/bin/sh
+
+mkdir -p /run/{lower,upper,work}
+nfsroot lo $netroot /run/lower
+mount -t overlay overlay -o rw,lowerdir=/run/lower,upperdir=/run/upper,workdir=/run/work $NEWROOT
EOFpatch
chroot# chmod 755 nfs-overlay-mount.sh
chroot# dracut --no-hostonly --no-hostonly-cmdline --nofscks \
      --add-drivers "virtio_net overlay" --install more -m "nfs network base" \
      --force /boot/rh8-5nfs.ird $(basename $(ls -1d /lib/modules/* | tail -1))
chroot# echo "10.1.255.254:/export/home /home nfs4 rw,sec=sys 0 0" > /etc/fstab
chroot# exit
Install packages required for PXE:
# dnf install -y dhcp-server syslinux tftp-server tftp
# ln -s /var/lib/tftpboot /
# dnf install -y syslinux-tftpboot.noarch
The last package installs its files in /tftpboot, while the TFTP server expects them in /var/lib/tftpboot. Creating the symbolic link before the installation avoids having to copy the content.
Create the DHCP server configuration file:
# /etc/dhcp/dhcpd.conf
allow booting;
allow bootp;
ddns-update-style none;
default-lease-time 14400;
deny unknown-clients;
ignore client-updates;
update-static-leases on;
get-lease-hostnames true;
use-host-decl-names on;

subnet 10.1.0.0 netmask 255.255.0.0 {
    #option domain-name "diskless.domain.com";
    #option domain-name-servers 192.168.0.1;
    #option routers 192.168.0.1;
    #option ntp-servers 192.168.0.1;
    option subnet-mask 255.255.0.0;
    filename "pxelinux.0";
    next-server 10.1.255.254;
    pool {
        range dynamic-bootp 10.1.1.0 10.1.1.255;
        host dc1 {
            hardware ethernet 26:a7:0d:a6:62:7d;
            fixed-address 10.1.1.1;
        }
    }
}
Adjust the "dc1" MAC address to match your actual client.
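When more clients are added, each one needs its own host declaration. As a rough sketch, a small function can print one in the same shape as the "dc1" entry. The helper name `dhcp_host` and the "dc2" name, MAC and IP are made-up examples, not values from this setup.

```shell
#!/bin/sh
# Sketch: print a dhcpd.conf host declaration for one diskless client.
# dhcp_host is a hypothetical helper; the sample values below are invented.
dhcp_host() {
    name=$1
    mac=$2
    ip=$3
    printf 'host %s {\n' "$name"
    printf '    hardware ethernet %s;\n' "$mac"
    printf '    fixed-address %s;\n' "$ip"
    printf '}\n'
}

dhcp_host dc2 26:a7:0d:a6:62:7e 10.1.1.2
```

After pasting the output into the configuration, `dhcpd -t -cf /etc/dhcp/dhcpd.conf` checks the syntax without starting the server.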
Add DHCP and TFTP to firewall rules and start services.
# firewall-cmd --get-services
..
# firewall-cmd --add-service=dhcp --add-service=tftp --permanent
success
# firewall-cmd --reload
success
# firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: ens18 ens19
  sources:
  services: cockpit dhcp dhcpv6-client nfs ssh tftp
  ports:
  protocols:
  forward: no
  masquerade: no
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:
# systemctl enable --now dhcpd.service tftp.socket
Create pxelinux configuration files:
# mkdir /tftpboot/pxelinux.cfg
# cat > /tftpboot/pxelinux.cfg/default << EOF
default vesamenu.c32
timeout 15

LABEL linux
  MENU LABEL NFS root
  kernel rh8-5nfs.krl
  append initrd=rh8-5nfs.ird splash=none root=nfs:10.1.255.254:/export/root:ro,vers=4.2,sec=sys,nolock
  IPAPPEND 2
EOF
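The file named "default" is only the fallback. PXELINUX first looks in pxelinux.cfg/ for a file named after the client's MAC address and then after its IP address in hexadecimal, which is handy if one node ever needs different boot options. The sketch below computes those names; both function names are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: compute the per-client file names PXELINUX looks for under
# pxelinux.cfg/ before falling back to "default".
# Both function names are hypothetical helpers.

# MAC-based name: hardware type 01 (Ethernet), then the MAC in lowercase with dashes.
pxe_mac_name() {
    printf '01-%s\n' "$(printf '%s' "$1" | tr 'A-F' 'a-f' | tr ':' '-')"
}

# IP-based name: the IPv4 address as eight uppercase hex digits.
pxe_ip_name() {
    oldifs=$IFS
    IFS=.
    set -- $1          # split the dotted address into four positional parameters
    IFS=$oldifs
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

pxe_mac_name 26:a7:0d:a6:62:7d   # prints 01-26-a7-0d-a6-62-7d
pxe_ip_name 10.1.1.1             # prints 0A010101
```

The MAC-based name has the same form as the BOOTIF value that `IPAPPEND 2` adds to the kernel command line.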
Copy the kernel and initrd to the TFTP root directory:
# cp -fLv /export/root/boot/vmlinuz-$(basename $(ls -1d /export/root/lib/modules/* | tail -1)) /tftpboot/rh8-5nfs.krl
# cp -fLv /export/root/boot/rh8-5nfs.ird /tftpboot/
# chmod 644 /tftpboot/rh8-5nfs.*
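A missing or unreadable kernel or initrd is a common reason for a PXE client to stop right after the menu, so a quick check of the TFTP root can save a boot cycle. This is a minimal sketch; `check_tftp_files` is a hypothetical helper name, and the two file names are the ones used in this article.

```shell
#!/bin/sh
# Sketch: confirm that the kernel and initrd exist in the TFTP root and
# carry the world-readable permission bit set by "chmod 644".
# check_tftp_files is a hypothetical helper name.
check_tftp_files() {
    dir=$1
    rc=0
    for f in rh8-5nfs.krl rh8-5nfs.ird; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            rc=1
        elif [ -z "$(find "$dir/$f" -perm -004)" ]; then
            # find prints the path only when the "other read" bit is set
            echo "not world-readable: $dir/$f"
            rc=1
        fi
    done
    return $rc
}
```

On the server it would be called as `check_tftp_files /tftpboot`; it prints nothing and returns 0 when both files are in place.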
You can set a password for root. This is optional, because you can set up SSH key authentication instead (see the next paragraph). If the host system uses SELinux, setting the password will not work until you switch to Permissive mode, because the SELinux labels were already set on the NFS root during the initial installation.
# setenforce Permissive
# sh command_mount_chroot
chroot# passwd
Changing password for user root.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
chroot# exit
As an alternative, configure a passwordless SSH connection:
# rsync -av /etc/ssh/ /export/root/etc/ssh/
# ssh-keygen -t rsa -b 2048
# mkdir -m700 /export/root/root/.ssh
# cat ~/.ssh/id_rsa.pub >> /export/root/root/.ssh/authorized_keys
During client boot, NetworkManager will attempt to configure the network interface that is already in use, which makes the root filesystem unavailable and hangs the client. The following file marks all devices as unmanaged by NetworkManager:
# cat > /export/root/etc/NetworkManager/conf.d/99-unmanaged-devices.conf << EOFcat
[keyfile]
unmanaged-devices=interface-name:*
EOFcat
Now you can boot your first "dc1" server from the network and check that everything works.
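One quick check on a booted client is whether / is really the overlay rather than a plain NFS mount. The sketch below reads /proc/mounts-style input and reports the filesystem type of the root mount. The helper name `root_fstype` is hypothetical, and the sample line only imitates what such a client would be expected to show.

```shell
#!/bin/sh
# Sketch: report the filesystem type mounted on / from /proc/mounts-style input.
# The last matching entry wins, because later mounts shadow earlier ones.
# root_fstype is a hypothetical helper name.
root_fstype() {
    awk '$2 == "/" { t = $3 } END { print t }'
}

# Sample line shaped like the overlay mount created by the dracut patch.
printf '%s\n' \
    'overlay / overlay rw,relatime,lowerdir=/run/lower,upperdir=/run/upper,workdir=/run/work 0 0' \
    | root_fstype
```

On the client itself, `root_fstype < /proc/mounts` should print `overlay` if the patched initrd was used.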
NOTE: As a reminder, changing the lower overlay filesystem while nodes are running will break them, so every node must be rebooted afterwards.