Diskless REDHAT (on Red Hat 8.5)

The target for this POC

After I did diskless SUSE, the obvious next step was to do the same with Red Hat.

The central idea is to mount an overlay filesystem as root. The shared read-only root filesystem then serves as a template for every node. Files written locally are stored in the node's memory and are lost on poweroff or reboot. The downside is that the overlay's lower (base) filesystem cannot be changed on the fly without breaking the overlay, so every maintenance operation on the shared root filesystem requires a reboot of all connected nodes.
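
The "template" behaviour can be pictured with a toy model of the lookup order: a path is served from the upper (node-local) layer if it exists there, otherwise from the lower (shared) layer. This is only a user-space sketch. The function name overlay_lookup and the scratch directories are made up for illustration, and the real kernel overlay also handles whiteouts and directory merging, which this ignores.

```shell
# overlay_lookup UPPER LOWER RELPATH
# Toy model only: print the path that would be visible for RELPATH.
overlay_lookup() {
    if [ -e "$1/$3" ]; then
        echo "$1/$3"        # node-local copy shadows the template
    elif [ -e "$2/$3" ]; then
        echo "$2/$3"        # fall through to the shared template
    else
        return 1            # present in neither layer
    fi
}

# Scratch layers standing in for /run/upper and /run/lower.
demo=$(mktemp -d)
mkdir -p "$demo/lower/etc" "$demo/upper/etc"
echo "template"   > "$demo/lower/etc/motd"
echo "template"   > "$demo/lower/etc/hostname"
echo "node-local" > "$demo/upper/etc/hostname"

cat "$(overlay_lookup "$demo/upper" "$demo/lower" etc/motd)"      # served from lower
cat "$(overlay_lookup "$demo/upper" "$demo/lower" etc/hostname)"  # served from upper
rm -rf "$demo"
```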

A patch for dracut including the overlay solution has been added to the body of the article.

First, I deployed a minimal installation of Red Hat 8.5 and added a second NIC to it, connected to an isolated network. This network will be used for PXE boot and NFS traffic. I gave the NIC the IP address 10.1.255.254, serving the 10.1.0.0/16 network.

# ip address
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
 ..
2: ens18:  mtu 1500 qdisc fq_codel state UP group default qlen 1000
 ..
3: ens19:  mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 72:ba:11:3c:df:36 brd ff:ff:ff:ff:ff:ff
# nmcli connection add con-name PXE type ethernet ifname ens19 ipv4.method manual ipv4.addresses 10.1.255.254/16 ipv6.method ignore
Connection 'PXE' (6b9a9f13-f988-48b8-bad9-b92a7dbe0b7d) successfully added.

NFS server

This time we will use NFSv4 for a change.

# dnf install -y bash-completion rsync nfs-utils
# sed   -e 's/.*vers2=.*/vers2=n/' \
	-e 's/.*vers3=.*/vers3=n/' \
	-e 's/.*vers4=.*/vers4=y/' \
	-e 's/.*vers4\.0=.*/vers4.0=y/' \
	-e 's/.*vers4\.1=.*/vers4.1=y/' \
	-e 's/.*vers4\.2=.*/vers4.2=y/' \
	-i /etc/nfs.conf
# systemctl mask --now rpc-statd.service rpcbind.service rpcbind.socket
# systemctl enable --now nfs-server
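
If you prefer to see what the sed expressions do before touching /etc/nfs.conf, the same edit can be previewed on a scratch file. This is a sketch: the helper name nfs4_only is made up, and the sample lines are an assumed shape of the commented defaults, not a copy of the shipped file.

```shell
# nfs4_only FILE: print FILE with the same substitutions as above,
# without modifying it (no -i).
nfs4_only() {
    sed -e 's/.*vers2=.*/vers2=n/' \
        -e 's/.*vers3=.*/vers3=n/' \
        -e 's/.*vers4=.*/vers4=y/' \
        -e 's/.*vers4\.0=.*/vers4.0=y/' \
        -e 's/.*vers4\.1=.*/vers4.1=y/' \
        -e 's/.*vers4\.2=.*/vers4.2=y/' "$1"
}

# Assumed sample content; the real file has many more keys.
sample=$(mktemp)
printf '%s\n' '[nfsd]' '# vers2=n' '# vers3=y' '# vers4=y' \
    '# vers4.0=y' '# vers4.1=y' '# vers4.2=y' > "$sample"

nfs4_only "$sample"
rm -f "$sample"
```

Note that the `vers4=` expression does not touch the `vers4.0=` style keys, because the `=` must directly follow the `4`.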

The firewall is installed and enabled by default. You can either disable it or add a new NFS rule:

# firewall-cmd --add-service=nfs --permanent
success
# firewall-cmd --reload
success
# firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: ens18 ens19
  sources: 
  services: cockpit dhcpv6-client nfs ssh
  ports: 
  protocols: 
  forward: no
  masquerade: no
  forward-ports: 
  source-ports: 
  icmp-blocks: 
  rich rules: 

Add some NFS exports:

# mkdir -p /export/root /export/home
# echo "/export/root -sec=sys,no_root_squash,sync,fsid=1,ro  10.1.0.0/16" > /etc/exports
# echo "/export/home -sec=sys,no_root_squash,sync,fsid=2,rw  10.1.0.0/16" >> /etc/exports
# exportfs -a
# exportfs -v
/export/root    10.1.0.0/16(sync,wdelay,hide,no_subtree_check,fsid=1,sec=sys,ro,secure,no_root_squash,no_all_squash)
/export/home    10.1.0.0/16(sync,wdelay,hide,no_subtree_check,fsid=2,sec=sys,rw,secure,no_root_squash,no_all_squash)

Populate an NFS root export with some minimal data:

# dnf --installroot /export/root \
      --setopt=reposdir=/etc/yum.repos.d --config /etc/dnf/dnf.conf \
	install -y \
	shadow-utils dnf redhat-release rsyslog passwd rsync \
	nfs-utils openssh-server bash-completion vim-minimal \
	patch kernel dracut dracut-network

This installs into /export/root instead of the host, while using the host's dnf configuration and repositories.

You cannot install RPMs directly onto an NFS mount because some packages use extended attributes that NFS does not support. The installation must therefore be done on a local filesystem of a Red Hat system and the result copied to the NFS server. In our case no copying is needed, because the installation runs on the NFS server itself.

Create a chroot helper script, enter the chroot, patch dracut, and build an initrd:

# cat > /export/root/command_mount_chroot << EOF
mount -o bind /proc proc
mount -o bind /sys sys
mount -o bind /dev dev
PS1='chroot# ' chroot . /bin/bash -i
umount dev
umount sys
umount proc
EOF
# cd /export/root
# sh command_mount_chroot
chroot# cd /usr/lib/dracut/modules.d/95nfs
chroot# patch -p1 << 'EOFpatch'
diff -Naur 95nfs/module-setup.sh 95nfs.patch/module-setup.sh
--- 95nfs/module-setup.sh       2022-01-11 16:21:14.000000000 +0200
+++ 95nfs.patch/module-setup.sh 2022-01-28 17:03:38.712711159 +0200
@@ -92,6 +92,7 @@
     inst_hook cmdline 90 "$moddir/parse-nfsroot.sh"
     inst_hook pre-udev 99 "$moddir/nfs-start-rpc.sh"
     inst_hook cleanup 99 "$moddir/nfsroot-cleanup.sh"
+    inst_hook pre-mount 95 "$moddir/nfs-overlay-mount.sh"
     inst "$moddir/nfsroot.sh" "/sbin/nfsroot"
     inst "$moddir/nfs-lib.sh" "/lib/nfs-lib.sh"
     mkdir -m 0755 -p "$initdir/var/lib/nfs/rpc_pipefs"
diff -Naur 95nfs/nfs-overlay-mount.sh 95nfs.patch/nfs-overlay-mount.sh
--- 95nfs/nfs-overlay-mount.sh  1970-01-01 02:00:00.000000000 +0200
+++ 95nfs.patch/nfs-overlay-mount.sh    2022-01-28 17:01:54.755365530 +0200
@@ -0,0 +1,5 @@
+#!/bin/sh
+
+mkdir -p /run/{lower,upper,work}
+nfsroot lo $netroot /run/lower
+mount -t overlay overlay -o rw,lowerdir=/run/lower,upperdir=/run/upper,workdir=/run/work $NEWROOT
EOFpatch
chroot# chmod 755 nfs-overlay-mount.sh
chroot# dracut --no-hostonly --no-hostonly-cmdline --nofscks \
	--add-drivers "virtio_net overlay" --install more -m "nfs network base" \
	--force /boot/rh8-5nfs.ird  $(basename $(ls -1d /lib/modules/* | tail -1))
chroot# echo "10.1.255.254:/export/home /home nfs4 rw,sec=sys 0 0" > /etc/fstab
chroot# exit
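
Before moving on, it can be worth confirming that the new initrd really contains the overlay hook and the overlay kernel module. The sketch below checks a file listing read from stdin. The function name check_initrd_listing is made up, and the sample listing is illustrative rather than real output. On the server you would feed it from dracut's lsinitrd tool instead.

```shell
# check_initrd_listing: read an initrd file listing on stdin and succeed
# only if both the hook script and the overlay module appear in it.
check_initrd_listing() {
    listing=$(cat)
    printf '%s\n' "$listing" | grep -q 'nfs-overlay-mount\.sh' \
        || { echo "hook script missing" >&2; return 1; }
    printf '%s\n' "$listing" | grep -q 'overlay\.ko' \
        || { echo "overlay module missing" >&2; return 1; }
    echo "initrd looks complete"
}

# Real use (on the server):
#   lsinitrd /export/root/boot/rh8-5nfs.ird | check_initrd_listing
# Illustrative listing so the sketch runs anywhere:
printf '%s\n' \
    'usr/lib/dracut/hooks/pre-mount/95-nfs-overlay-mount.sh' \
    'usr/lib/modules/KVER/kernel/fs/overlayfs/overlay.ko.xz' \
    | check_initrd_listing
```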

PXE server

Install packages required for PXE:

# dnf install -y dhcp-server syslinux tftp-server tftp
# ln -s /var/lib/tftpboot /
# dnf install -y syslinux-tftpboot.noarch

The last package installs its files in /tftpboot, while the TFTP server serves /var/lib/tftpboot. Creating the symbolic link before the installation avoids having to copy the content.

Create the DHCP configuration file /etc/dhcp/dhcpd.conf:

# /etc/dhcp/dhcpd.conf
allow booting;
allow bootp;
ddns-update-style none;
default-lease-time 14400;
deny unknown-clients;
ignore client-updates;
update-static-leases on;
get-lease-hostnames true;
use-host-decl-names on;

subnet 10.1.0.0 netmask 255.255.0.0 {
        #option domain-name "diskless.domain.com";
        #option domain-name-servers 192.168.0.1;
        #option routers          192.168.0.1;
        #option ntp-servers      192.168.0.1;
        option subnet-mask      255.255.0.0;
        filename        "pxelinux.0";
        next-server     10.1.255.254;
        pool {
                range dynamic-bootp 10.1.1.0 10.1.1.255;
                host dc1 {
                        hardware ethernet 26:a7:0d:a6:62:7d;
                        fixed-address   10.1.1.1;
                }
        }
}

Adjust the "dc1" MAC address to match the NIC of your actual client.
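
When you adapt the addresses, a small sanity check that every address you put into dhcpd.conf really lies inside the declared subnet can save a debugging round. The following is a sketch in plain POSIX shell arithmetic. The helper names ip2int and in_subnet are made up for this example.

```shell
# ip2int A.B.C.D: print the address as a 32-bit integer.
ip2int() {
    IFS=. read -r a b c d <<EOF
$1
EOF
    echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# in_subnet ADDR NETWORK PREFIXLEN: succeed if ADDR lies in NETWORK/PREFIXLEN.
in_subnet() {
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( $(ip2int "$1") & mask )) -eq $(( $(ip2int "$2") & mask )) ]
}

# Addresses taken from the configuration above.
for ip in 10.1.1.0 10.1.1.255 10.1.1.1 10.1.255.254; do
    if in_subnet "$ip" 10.1.0.0 16; then
        echo "$ip is inside 10.1.0.0/16"
    else
        echo "$ip is OUTSIDE 10.1.0.0/16"
    fi
done
```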

Add DHCP and TFTP to firewall rules and start services.

# firewall-cmd --get-services
 ..
# firewall-cmd --add-service=dhcp --add-service=tftp  --permanent
success
# firewall-cmd --reload
success
# firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: ens18 ens19
  sources: 
  services: cockpit dhcp dhcpv6-client nfs ssh tftp
  ports: 
  protocols: 
  forward: no
  masquerade: no
  forward-ports: 
  source-ports: 
  icmp-blocks: 
  rich rules:
# systemctl enable --now dhcpd.service tftp.socket

Create pxelinux configuration files:

# mkdir /tftpboot/pxelinux.cfg
# cat > /tftpboot/pxelinux.cfg/default << EOF
default vesamenu.c32
timeout 15

LABEL linux
  MENU LABEL NFS root
  kernel rh8-5nfs.krl
  append initrd=rh8-5nfs.ird splash=none root=nfs:10.1.255.254:/export/root:ro,vers=4.2,sec=sys,nolock 
  IPAPPEND 2
EOF
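
If you expect to reuse this setup with a different server address or export path, the same file can be produced from a small template function. This is a sketch: the function name pxe_entry and its argument order are made up, and the output is meant to be identical to the file above when called with the values used in this article.

```shell
# pxe_entry SERVER NFS_PATH KERNEL INITRD: print a pxelinux "default" file.
pxe_entry() {
    server=$1 nfs_path=$2 kernel=$3 initrd=$4
    cat <<EOF
default vesamenu.c32
timeout 15

LABEL linux
  MENU LABEL NFS root
  kernel $kernel
  append initrd=$initrd splash=none root=nfs:$server:$nfs_path:ro,vers=4.2,sec=sys,nolock
  IPAPPEND 2
EOF
}

# Print to stdout for review; redirect to /tftpboot/pxelinux.cfg/default when satisfied.
pxe_entry 10.1.255.254 /export/root rh8-5nfs.krl rh8-5nfs.ird
```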

Copy the kernel and initrd to the TFTP root directory:

# cp -fLv /export/root/boot/vmlinuz-$(basename $(ls -1d /export/root/lib/modules/* | tail -1)) /tftpboot/rh8-5nfs.krl
# cp -fLv /export/root/boot/rh8-5nfs.ird /tftpboot/
# chmod 644 /tftpboot/rh8-5nfs.*
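
The kernel version has to come from the NFS root tree, which may differ from the kernel installed on the host. One way to make that explicit is a helper that takes the root directory as an argument. This is a sketch: the name latest_kver is made up, and it relies on the -V (version sort) option of GNU sort so that, for example, 348 sorts after 80.

```shell
# latest_kver ROOT: print the newest kernel version found under ROOT/lib/modules.
latest_kver() {
    basename "$(ls -1d "$1"/lib/modules/* | sort -V | tail -n 1)"
}

# Demonstration on a scratch tree with two made-up version directories.
demo=$(mktemp -d)
mkdir -p "$demo/lib/modules/4.18.0-80.el8.x86_64" \
         "$demo/lib/modules/4.18.0-348.el8.x86_64"
latest_kver "$demo"
rm -rf "$demo"

# Real use on the server would look like:
#   cp -fLv /export/root/boot/vmlinuz-$(latest_kver /export/root) /tftpboot/rh8-5nfs.krl
```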

You can set a password for root. This is optional, because you can set up SSH key authentication instead (see the next paragraph). If the host system uses SELinux, setting the password will not work until you switch to Permissive mode, because SELinux labels were already set on the NFS root during the initial installation.

# setenforce Permissive
# cd /export/root
# sh command_mount_chroot
chroot# passwd
Changing password for user root.
New password: 
Retype new password: 
passwd: all authentication tokens updated successfully.
chroot# exit

As an alternative, configure a passwordless SSH connection:

# rsync -av /etc/ssh/ /export/root/etc/ssh/
# ssh-keygen -t rsa -b 2048
# mkdir -m700 /export/root/root/.ssh
# cat ~/.ssh/id_rsa.pub >> /export/root/root/.ssh/authorized_keys

During client boot, NetworkManager will attempt to reconfigure the already running network interface, which makes the root filesystem unavailable and hangs the client. The following file marks all devices as unmanaged by NetworkManager:

# cat > /export/root/etc/NetworkManager/conf.d/99-unmanaged-devices.conf << EOFcat
[keyfile]
unmanaged-devices=interface-name:*
EOFcat

Now you can boot your first "dc1" server from the network and check that everything works.
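
One quick check on the booted client is whether the root filesystem is really an overlay. The sketch below reads a mounts table in /proc/mounts format and prints the type of the last filesystem mounted on "/". The function name root_fstype is made up, and the sample table is illustrative, not captured from a real node.

```shell
# root_fstype [MOUNTS_FILE]: print the filesystem type of the last mount on "/".
# Defaults to /proc/mounts, whose third field is the filesystem type.
root_fstype() {
    awk '$2 == "/" { t = $3 } END { if (t != "") print t; else exit 1 }' \
        "${1:-/proc/mounts}"
}

# Illustrative mounts table so the sketch runs anywhere.
sample=$(mktemp)
cat > "$sample" <<'EOF'
rootfs / rootfs rw 0 0
overlay / overlay rw,lowerdir=/run/lower,upperdir=/run/upper,workdir=/run/work 0 0
proc /proc proc rw 0 0
EOF
root_fstype "$sample"
rm -f "$sample"
```

On a correctly booted node, running root_fstype without arguments should report overlay. Anything else suggests the pre-mount hook did not run.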

NOTE: As a reminder, changing the lower overlay filesystem breaks the running nodes, and every node must be rebooted afterwards.


Updated on Sun Feb 20 10:21:35 IST 2022