Prepare one node as described in the Mounting iSCSI LUN on RedHat 6 memo. Remove the /export entry from /etc/fstab, unmount the FS and export the VG:
# vi /etc/fstab
# umount /export
# vgchange -a n datavg
# vgexport datavg
Prepare the second node the same way as the first one. Map the same NetApp LUN to the second node too. After multipath sees the new LUN, rescan the PVs and try to mount the FS:
# vgscan
# vgimport datavg
# vgchange -a y datavg
# mkdir /export ; mount -o discard /dev/datavg/export /export
# vi /etc/fstab
Test a server restart to see that everything starts automatically. Then remove the /export entry from /etc/fstab, unmount the FS and export the VG exactly as was done on the first node.
Prepare LVM for the HA configuration. Edit /etc/lvm/lvm.conf with the following changes.
Filter to see only the relevant (multipathed) devices; /dev/sdb is my "rootvg" (see the HOWTO align VMware Linux VMDK files memo for the reason):
filter = [ "a|/dev/mapper/nlun|","a|/dev/sdb|","r/.*/" ]
Name the VGs explicitly activated on LVM start (this is just a list of VGs plus a tag - the heartbeat NIC's hostname):
volume_list = [ "rootvg", "@vorh6t01.domain.com" ]
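For orientation, the filter line goes into the devices section of /etc/lvm/lvm.conf and volume_list into the activation section. A minimal sketch of the two edits together (everything else left at defaults):

devices {
    ...
    filter = [ "a|/dev/mapper/nlun|","a|/dev/sdb|","r/.*/" ]
    ...
}
activation {
    ...
    volume_list = [ "rootvg", "@vorh6t01.domain.com" ]
    ...
}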
The initrd has to be rebuilt to include the new lvm.conf (otherwise the cluster refuses to start):
mkinitrd -f /boot/initramfs-$(uname -r).img $(uname -r)
Repeat the /etc/lvm/lvm.conf changes on the other node.
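Presumably the tag in volume_list on the second node should be that node's own hostname, and the initrd has to be rebuilt there as well; roughly:

vorh6t02 # vi /etc/lvm/lvm.conf
    volume_list = [ "rootvg", "@vorh6t02.domain.com" ]
vorh6t02 # mkinitrd -f /boot/initramfs-$(uname -r).img $(uname -r)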
Generate root SSH keys and exchange them between the cluster nodes:
vorh6t01 # ssh-keygen -t rsa -b 1024 -C "root@vorh6t"
.....
vorh6t01 # cat .ssh/id_rsa.pub >> .ssh/authorized_keys
vorh6t01 # scp -pr .ssh vorh6t02:
Copy SSH host keys between nodes and restart sshd.
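Something along these lines (the exact key files depend on your sshd setup):

vorh6t01 # scp -p /etc/ssh/ssh_host_* vorh6t02:/etc/ssh/
vorh6t02 # /etc/init.d/sshd restart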
One package requires another, so install these RPMs on both nodes:
# yum install lvm2-cluster ccs cman rgmanager
vorh6t01 and vorh6t02 are the two nodes of an HA (fail-over) cluster named vorh6t. Take care to make all names resolvable by DNS and add all names to /etc/hosts on both nodes.
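For example, /etc/hosts could look like this (the node addresses here are placeholders for illustration; the VIP 192.168.131.12 is the one used for the cluster service below):

192.168.131.10   vorh6t01.domain.com   vorh6t01
192.168.131.11   vorh6t02.domain.com   vorh6t02
192.168.131.12   vorh6t.domain.com     vorh6t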
Define cluster:
# ccs_tool create -2 vorh6t
The command above creates the /etc/cluster/cluster.conf file. It can be edited by hand and has to be redistributed to every node in the cluster. The -2 option is required for a two-node cluster; the usual configuration assumes more than two nodes, so that quorum is unambiguous.
Open the file and change the node names to the real names. Check:
# ccs_tool lsnode

Cluster name: vorh6t, config_version: 1

Nodename                        Votes Nodeid Fencetype
vorh6t01.domain.com                1    1
vorh6t02.domain.com                1    2

# ccs_tool lsfence
Name             Agent
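For reference, at this stage the whole file is little more than the node list and an empty resource manager section. A sketch of roughly how it looks after editing the node names (the exact attributes generated by ccs_tool may differ):

<?xml version="1.0"?>
<cluster name="vorh6t" config_version="1">
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="vorh6t01.domain.com" votes="1" nodeid="1"/>
    <clusternode name="vorh6t02.domain.com" votes="1" nodeid="2"/>
  </clusternodes>
  <fencedevices/>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>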
I do not deal with fencing right now. This section will be added later, once the cluster is installed on real physical servers.
Copy /etc/cluster/cluster.conf to the second node:
vorh6t01 # scp /etc/cluster/cluster.conf vorh6t02:/etc/cluster/cluster.conf
You can start the cluster services now to see them working. Start them with /etc/init.d/cman start on both nodes. Check /var/log/messages. See the clustat output:
vorh6t01 # clustat
Cluster Status for vorh6t @ Thu Sep 27 15:04:58 2012
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 vorh6t01.domain.com                         1 Online, Local
 vorh6t02.domain.com                         2 Online

vorh6t02 # clustat
Cluster Status for vorh6t @ Thu Sep 27 15:05:07 2012
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 vorh6t01.domain.com                         1 Online
 vorh6t02.domain.com                         2 Online, Local
Stop the cluster services on both nodes with /etc/init.d/cman stop.
There are two sections related to resources: <resources/> and <service/>. The first is for "global" resources shared between services (like an IP). The second is for resources grouped into a service (like FS + script). Our cluster is a single-purpose cluster, so only the <service> section is used:
...
<rm>
  <failoverdomains/>
  <resources/>
  <service autostart="1" name="vorh6t" recovery="relocate">
    <ip address="192.168.131.12/24" />
  </service>
</rm>
...
Add the cluster services to the init scripts. Start the cluster and the resource manager on both nodes:
# chkconfig --add cman
# chkconfig cman on
# chkconfig --add rgmanager
# chkconfig rgmanager on
# /etc/init.d/cman start
# /etc/init.d/rgmanager start
# clustat
Cluster Status for vorh6t @ Tue Oct 2 12:55:38 2012
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 vorh6t01.domain.com                         1 Online, rgmanager
 vorh6t02.domain.com                         2 Online, Local, rgmanager

 Service Name                   Owner (Last)                   State
 ------- ----                   ----- ------                   -----
 service:vorh6t                 vorh6t01.domain.com            started
Switch the service to the other node:
# clusvcadm -r vorh6t -m vorh6t02
Trying to relocate service:vorh6t...Success
service:vorh6t is now running on vorh6t02.domain.com
Freeze resources (for maintenance):
# clusvcadm -Z vorh6t
Local machine freezing service:vorh6t...Success
Resume normal operation:
# clusvcadm -U vorh6t
Local machine unfreezing service:vorh6t...Success
Resources can and should be nested to create dependencies between them:
...
<rm>
  <failoverdomains/>
  <resources/>
  <service autostart="1" name="vorh6t" recovery="relocate">
    <ip address="10.129.131.12/22">
      <lvm name="vorh6tlv" lv_name="export" vg_name="datavg">
        <fs name="vorh6tfs" device="/dev/datavg/export" mountpoint="/export"
            fstype="ext4" options="discard" force_unmount="1" self_fence="1" />
      </lvm>
    </ip>
  </service>
</rm>
...
Increment config_version at the beginning of /etc/cluster/cluster.conf. Distribute the updated /etc/cluster/cluster.conf and inform the cluster about the changes:
vorh6t01 # scp /etc/cluster/cluster.conf vorh6t02:/etc/cluster/cluster.conf
vorh6t01 # cman_tool version -r -S
Check /var/log/messages for errors on both nodes. Verify the status with clustat. df should show /export mounted on one of the nodes.
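For example, on either node:

# tail /var/log/messages
# clustat
# df -h /export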
Edit the /etc/ssh/sshd_config file to force sshd to listen only on the local IPs (not on the VIP). Populate the shared /export with a virtual FC16 distribution; a similar example can be found in Installing NFS based FC15, with updates for FC16. Then, still in the chrooted FC16 skeleton, install openssh-server using yum and exit the chrooted FC16 (see the sketch after the listing below). The resulting /export listing should look similar to this:
# ls /export/
bin   dev  home  lib64       media  opt   root  sbin  sys  usr
boot  etc  lib   lost+found  mnt    proc  run   srv   tmp  var
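The openssh-server step mentioned above can be done roughly like this (a sketch; depending on the yum configuration inside the skeleton, the bind mounts used in the service script below may be needed first):

# chroot /export /bin/bash
# yum -y install openssh-server
# exit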
Edit /export/etc/ssh/sshd_config to listen only on the VIP.
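A minimal sketch of the two ListenAddress settings, assuming the VIP 192.168.131.12 from the example above and a placeholder address for the node itself:

# grep ^ListenAddress /etc/ssh/sshd_config
ListenAddress 192.168.131.10
# grep ^ListenAddress /export/etc/ssh/sshd_config
ListenAddress 192.168.131.12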
Copy SSH host keys to FC16:
# cp -va /etc/ssh/ssh_host* /export/etc/ssh/
`/etc/ssh/ssh_host_dsa_key' -> `/export/etc/ssh/ssh_host_dsa_key'
`/etc/ssh/ssh_host_dsa_key.pub' -> `/export/etc/ssh/ssh_host_dsa_key.pub'
`/etc/ssh/ssh_host_key' -> `/export/etc/ssh/ssh_host_key'
`/etc/ssh/ssh_host_key.pub' -> `/export/etc/ssh/ssh_host_key.pub'
`/etc/ssh/ssh_host_rsa_key' -> `/export/etc/ssh/ssh_host_rsa_key'
`/etc/ssh/ssh_host_rsa_key.pub' -> `/export/etc/ssh/ssh_host_rsa_key.pub'
Create service script:
# cat /export/sshd-start-stop
case "$1" in
start)
        # bind-mount the pseudo filesystems needed by the chrooted sshd, then start it
        mount -o bind /proc /export/proc
        mount -o bind /sys /export/sys
        mount -o bind /dev /export/dev
        mount -o bind /dev/pts /export/dev/pts
        chroot /export /usr/sbin/sshd
        ;;
stop)
        # kill the sshd listening on the VIP, then tear down the bind mounts
        kill -9 $(netstat -tlnp | awk '/192.168.131.12:22/ {gsub("/sshd","");print $NF}')
        umount /export/dev/pts || umount -l /export/dev/pts
        umount /export/dev || umount -l /export/dev
        umount /export/sys || umount -l /export/sys
        umount /export/proc || umount -l /export/proc
        ;;
status)
        # report failure if nothing listens on the VIP's port 22
        [ 'x'"$(netstat -tlnp | awk '/192.168.131.12:22/ {gsub("/sshd","");print $NF}')" = 'x' ] && exit 1
        exit 0
        ;;
esac
Modify /etc/cluster/cluster.conf:
<rm>
  <failoverdomains/>
  <resources/>
  <service autostart="1" name="vorh6t" recovery="relocate">
    <lvm name="vorh6tlv" lv_name="export" vg_name="datavg">
      <fs name="vorh6tfs" device="/dev/datavg/export" mountpoint="/export"
          fstype="ext4" options="discard" force_unmount="1" self_fence="1" >
        <ip address="10.129.131.12/22">
          <script name="vorh6tssh" file="/export/sshd-start-stop" />
        </ip>
      </fs>
    </lvm>
  </service>
</rm>
Reload the configuration and see SSH started on the VIP. Connect to it; we are on FC16!
$ ssh vorh6t
Last login: Thu Oct 4 05:33:09 2012 from ovws
-bash-4.2# cat /etc/issue
Fedora release 16 (Verne)
This is the way you can implement an HA solution for non-cluster-aware applications.
Building RedHat 6 Cluster with DRBD is a separate (but similar) document.