While setting up various clusters for SAP on SUSE, I have met customers without a redundant NFS storage solution. S/4HANA relies on a small NFS volume in its HA setup, so I was asked to provide an HA solution for the NFS service as well.
I will be using SLE15 in this installation since it is already used for the rest of the SAP installation.
Both nodes must be prepared in the same way. There are a number of important points that need to be configured and checked.
The HA solution cannot be implemented with DHCP; use fixed IP addresses. The names of the nodes and VIPs must be resolvable in DNS. In any case, add all used names and IP addresses to /etc/hosts on both nodes. This reduces the dependency on third-party services and also shortens name resolution time.
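For illustration, with the addresses used later in this article (nodes nfs1 and nfs2 plus the NFS VIP), the /etc/hosts entries on both nodes could look like the following; the VIP name "nfsvip" is just a placeholder, and you should of course adjust the addresses to your network:

192.168.120.11   nfs1
192.168.120.12   nfs2
192.168.120.10   nfsvip   # placeholder name for the NFS virtual IP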
Cluster services are closely related to time and timeouts. It is important to synchronize time with any available external time source.
The cluster setup only checks for a working chronyd, so review the contents of /etc/chrony.conf and the /etc/chrony.d/*.conf files and enable the chronyd service:
# systemctl enable --now chronyd
# chronyc sources
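If no time source is configured yet, a minimal drop-in like the one below is enough to keep both nodes in sync; the pool name is only an example, so point it at your own NTP infrastructure where available:

# /etc/chrony.d/cluster-ntp.conf (example file name)
# "pool.ntp.org" is a public placeholder; prefer your organization's NTP servers
pool pool.ntp.org iburst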
Next, copy the SSH host keys from the first node to the second so that both nodes present the same host identity. This may seem wrong, but it is a life saver when connecting to the cluster VIP over SSH: the host key no longer changes when the VIP moves to the other node.
root@nfs1:~ # rsync -av /etc/ssh/ssh_host_* root@nfs2:/etc/ssh/
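sshd reads its host keys only at startup, so restart it on the second node for the copied keys to take effect:

root@nfs2:~ # systemctl restart sshd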
Since SLE is used here, you need to activate the HA extension. It is already activated if you use SLE for SAP Applications.
# SUSEConnect -l | grep -B1 sle-ha/15.5/x86_64
    SUSE Linux Enterprise High Availability Extension 15 SP5 x86_64 (Activated)
    Deactivate with: suseconnect -d -p sle-ha/15.5/x86_64
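If the extension is not active yet (plain SLE), it can be added with SUSEConnect using the same product string as in the listing above; note that this normally requires a separate HA registration code, shown here as a placeholder:

# SUSEConnect -p sle-ha/15.5/x86_64 -r <HA-registration-code>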
Then install all the packages needed to run HA on both nodes:
# zypper in -y iputils
..
# zypper in -y -t pattern ha_sles
..
On one node, say nfs1, initialize the cluster:
root@nfs1:~ # crm cluster init -u -n NFS
INFO: Loading "default" profile from /etc/crm/profiles.yml
INFO: SSH key for root does not exist, hence generate it now
INFO: The user 'hacluster' will have the login shell configuration changed to /bin/bash
Continue (y/n)? y
INFO: SSH key for hacluster does not exist, hence generate it now
INFO: Configuring csync2
INFO: Starting csync2.socket service on nfs1
INFO: BEGIN csync2 checking files
INFO: END csync2 checking files
INFO: Configure Corosync (unicast):
  This will configure the cluster messaging layer. You will need to specify
  a network address over which to communicate (default is eth0's network,
  but you can use the network address of any active interface).

Address for ring0 [192.168.120.11]<Enter>
Port for ring0 [5405]<Enter>
INFO: Configure SBD:
  If you have shared storage, for example a SAN or iSCSI target, you can
  use it avoid split-brain scenarios by configuring SBD. This requires a
  1 MB partition, accessible to all nodes in the cluster. The device path
  must be persistent and consistent across all nodes in the cluster, so
  /dev/disk/by-id/* devices are a good choice. Note that all data on the
  partition you specify here will be destroyed.

Do you wish to use SBD (y/n)? n
WARNING: Not configuring SBD - STONITH will be disabled.
INFO: Hawk cluster interface is now running. To see cluster status, open:
INFO:   https://192.168.120.11:7630/
INFO: Log in with username 'hacluster', password 'linux'
WARNING: You should change the hacluster password to something more secure!
INFO: BEGIN Waiting for cluster
...........
INFO: END Waiting for cluster
INFO: Loading initial cluster configuration
INFO: Configure Administration IP Address:
  Optionally configure an administration virtual IP address. The purpose
  of this IP address is to provide a single IP that can be used to
  interact with the cluster, rather than using the IP address of any
  specific cluster node.

Do you wish to configure a virtual IP address (y/n)? n
INFO: Configure Qdevice/Qnetd:
  QDevice participates in quorum decisions. With the assistance of a
  third-party arbitrator Qnetd, it provides votes so that a cluster is
  able to sustain more node failures than standard quorum rules allow.
  It is recommended for clusters with an even number of nodes and highly
  recommended for 2 node clusters.

Do you want to configure QDevice (y/n)? n
INFO: Done (log saved to /var/log/crmsh/crmsh.log)
It's time to connect the second node to the cluster. On the second node, run the command:
root@nfs2:~ # crm cluster join
INFO: Join This Node to Cluster:
  You will be asked for the IP address of an existing node, from which
  configuration will be copied. If you have not already configured
  passwordless ssh between nodes, you will be prompted for the root
  password of the existing node.

IP address or hostname of existing node (e.g.: 192.168.1.1) []192.168.120.11
INFO: SSH key for root does not exist, hence generate it now
INFO: The user 'hacluster' will have the login shell configuration changed to /bin/bash
Continue (y/n)? y
INFO: SSH key for hacluster does not exist, hence generate it now
INFO: Configuring csync2
INFO: Starting csync2.socket service
INFO: BEGIN csync2 syncing files in cluster
INFO: END csync2 syncing files in cluster
INFO: Merging known_hosts
INFO: BEGIN Probing for new partitions
INFO: END Probing for new partitions
Address for ring0 [192.168.120.12]<Enter>
INFO: Hawk cluster interface is now running. To see cluster status, open:
INFO:   https://192.168.120.12:7630/
INFO: Log in with username 'hacluster', password 'linux'
WARNING: You should change the hacluster password to something more secure!
INFO: BEGIN Waiting for cluster
..
INFO: END Waiting for cluster
INFO: Set property "priority" in rsc_defaults to 1
INFO: BEGIN Reloading cluster configuration
INFO: END Reloading cluster configuration
INFO: Done (log saved to /var/log/crmsh/crmsh.log)
The prompts are self-explanatory, and the resulting cluster status looks like this:
# crm status
Status of pacemakerd: 'Pacemaker is running' (last updated 2024-10-03 15:19:15 +03:00)
Cluster Summary:
  * Stack: corosync
  * Current DC: nfs1 (version 2.1.5+20221208.a3f44794f-150500.6.17.1-2.1.5+20221208.a3f44794f) - partition with quorum
  * Last updated: Thu Oct  3 15:19:15 2024
  * Last change:  Thu Oct  3 15:16:02 2024 by root via cibadmin on nfs2
  * 2 nodes configured
  * 0 resource instances configured

Node List:
  * Online: [ nfs1 nfs2 ]

Full List of Resources:
  * No resources
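Before going any further, take the installer's warning seriously and change the default hacluster password on both nodes:

# passwd hacluster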
Setting up fencing is beyond the scope of this article, as the lab environment does not reflect the fencing used in production. You can refer to this example to set up proper fencing.
Keep in mind that without fencing a two-node cluster cannot recover safely from failures and should not be used in production.
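For reference, here is a minimal sketch of SBD-based fencing, assuming a small shared LUN visible to both nodes under a persistent /dev/disk/by-id path; the device path and delay are placeholders, not something configured in this lab:

# One-time initialization of the SBD device; also set SBD_DEVICE in
# /etc/sysconfig/sbd and enable sbd.service on both nodes
# sbd -d /dev/disk/by-id/<shared-sbd-lun> create

# crm configuration for the fencing resource
primitive stonith-sbd stonith:external/sbd \
    params pcmk_delay_max=30s
property cib-bootstrap-options: stonith-enabled=true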
We need to install the missing software on both nodes:
# zypper in -y nfs-kernel-server
If you do not have a shared disk from storage, which is possible in the case of a stretched cluster, you can use a DRBD device instead. This article describes how to configure a DRBD resource. The other resources described here should then be tied to the promoted (master) state of the DRBD resource, as sketched below.
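A minimal sketch of such constraints, assuming a promotable DRBD clone named ms-drbd-nfs (the name is hypothetical) and the g-nfs group defined later in this article:

# Run the NFS group only where DRBD is promoted, and only after the promotion
colocation col-nfs-with-drbd inf: g-nfs ms-drbd-nfs:Promoted
order ord-drbd-before-nfs Mandatory: ms-drbd-nfs:promote g-nfs:start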
We have a shared disk attached to both of our nodes. I will create an LVM VG and two LVs on it. The VG will be activated by the cluster in exclusive mode, so no VG other than "rootvg" should be activated during boot. This can be achieved by setting the volume_list parameter in the /etc/lvm/lvm.conf file as shown:
activation {
    ..
    volume_list = [ "rootvg" ]
    ..
}

where "rootvg" is the actual name of your root VG.
After correcting the /etc/lvm/lvm.conf file, the initrd must be rebuilt: the old initrd still carries the previous configuration, so other VGs could be activated during the initrd stage of boot despite the new system-wide settings.
# dracut --force
Apply these changes on both nodes.
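To double-check that the restriction really made it into the freshly built initrd, you can print the embedded copy of lvm.conf; this assumes a dracut-built initrd (the default on SLE) and that the lvm dracut module copied the configuration into the image:

# lsinitrd -f etc/lvm/lvm.conf | grep volume_list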
Let's create the required LVM structure on the shared disk, working from the first node:
root@nfs1:~ # pvcreate /dev/sda
  Physical volume "/dev/sda" successfully created.
root@nfs1:~ # vgcreate nfsvg /dev/sda
  Volume group "nfsvg" successfully created
root@nfs1:~ # vgs
  VG     #PV #LV #SN Attr   VSize  VFree
  nfsvg    1   0   0 wz--n- 10.00g 10.00g
  rootvg   1   5   0 wz--n- 60.00g 48.00g
root@nfs1:~ # lvcreate -L64m -n track nfsvg
  Volume nfsvg/track is not active locally (volume_list activation filter?).
  Aborting. Failed to wipe start of new LV.
As you can see, there is no access to "nfsvg" because of the filter we added. Let's accomplish our task using a temporary lvm.conf:
root@nfs1:~ # cp /etc/lvm/lvm.conf /tmp/
root@nfs1:~ # vi /tmp/lvm.conf   # <- Remove the volume_list = [ "rootvg" ] line !!
root@nfs1:~ # export LVM_SYSTEM_DIR=/tmp
root@nfs1:~ # vgchange -ay nfsvg
  0 logical volume(s) in volume group "nfsvg" now active
root@nfs1:~ # lvcreate -L64m -n track nfsvg
  Logical volume "track" created.
root@nfs1:~ # lvcreate -L2G -n data nfsvg
  Logical volume "data" created.
root@nfs1:~ # mkfs.ext4 -j -m0 /dev/nfsvg/track
..
root@nfs1:~ # mkfs.xfs /dev/nfsvg/data
..
root@nfs1:~ # vgchange -an nfsvg
  0 logical volume(s) in volume group "nfsvg" now active
root@nfs1:~ # unset LVM_SYSTEM_DIR
We will create the resources by adding them to a text file. This file also serves as a backup of the configuration and can be managed by any version control system, such as Git. Here is the content of the resources.txt file:
# Activate LVM VG in exclusive mode:
primitive p-vg-activate LVM \
    params volgrpname=nfsvg exclusive=true \
    meta target-role=Started \
    op start timeout=30s interval=0s \
    op stop timeout=30s interval=0s \
    op monitor timeout=30s interval=10s

# Mount /export
primitive p-fs-data Filesystem \
    params device="/dev/nfsvg/data" directory="/export" fstype=xfs \
    meta target-role=Started \
    op start timeout=60s interval=0s \
    op stop timeout=60s interval=0s \
    op monitor timeout=40s interval=20s

# Mount /var/lib/nfs/nfsdcltrack
primitive p-fs-track Filesystem \
    params device="/dev/nfsvg/track" directory="/var/lib/nfs/nfsdcltrack" fstype=ext4 \
    meta target-role=Started \
    op start timeout=60s interval=0s \
    op stop timeout=60s interval=0s \
    op monitor timeout=40s interval=20s

# Define VIP for NFS
primitive p-ip-nfsvip IPaddr2 \
    params ip=192.168.120.10 \
    meta target-role=Started \
    op start timeout=20s interval=0s \
    op stop timeout=20s interval=0s \
    op monitor timeout=20s interval=10s

# Start the NFS service
primitive p-nfsserver systemd:nfs-server \
    op monitor interval="30s"

# Export the NFS share
primitive p-nfsexport exportfs \
    params clientspec="192.168.120.0/24" directory="/export" options="sec=sys,no_root_squash,rw" fsid=10 \
    meta target-role=Started \
    op start timeout=40s interval=0s \
    op stop timeout=120s interval=0s \
    op monitor timeout=20s interval=10s

# Group everything together; the order matters
group g-nfs p-vg-activate p-fs-data p-fs-track p-ip-nfsvip p-nfsserver p-nfsexport \
    meta target-role=Started
Once the file is ready, apply it to the configuration.
# crm configure load update resources.txt
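As a quick test, mount the export through the VIP from any client inside the allowed 192.168.120.0/24 network (the mount point /mnt/test is arbitrary), and then try a controlled failover; the move/clear commands below are standard crmsh calls:

# showmount -e 192.168.120.10
# mkdir -p /mnt/test
# mount -t nfs 192.168.120.10:/export /mnt/test

# Optional failover test: push the group to the other node, then
# remove the temporary location constraint again
# crm resource move g-nfs nfs2
# crm resource clear g-nfs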