Install RH6 with a minimal configuration on two VMs (look into the HOWTO on aligning VMware Linux VMDK files). Add an additional (VMDK) disk to every node for the application. My configuration looks as follows (on both nodes):
/dev/sda 128m -> First partition /dev/sda1 used for /boot
/dev/sdb 8g   -> Whole disk used as PV for rootvg
/dev/sdc 30g  -> Whole disk used as PV for orahome
Copy host SSH keys from one node to another:
vorh6t01 # scp vorh6t02:/etc/ssh/ssh_host_\* /etc/ssh/
...
vorh6t01 # service sshd restart
Generate root SSH keys and exchange them between the cluster nodes:
vorh6t01 # ssh-keygen -t rsa -b 1024 -C "root@vorh6t"
.....
vorh6t01 # cat .ssh/id_rsa.pub >> .ssh/authorized_keys
vorh6t01 # scp -pr .ssh vorh6t02:
Update the /etc/hosts file with all relevant IPs and distribute it to both nodes.
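For reference, a minimal /etc/hosts sketch. The 10.10.10.x addresses match the dedicated DRBD replication link used later; the 192.168.168.x addresses and the extra -drbd names are just placeholders of mine, adjust them to your network:

# cat /etc/hosts
127.0.0.1       localhost localhost.localdomain
192.168.168.1   vorh6t01.domain.com vorh6t01
192.168.168.2   vorh6t02.domain.com vorh6t02
192.168.168.3   vorh6t.domain.com   vorh6t       # cluster alias / service IP (placeholder)
10.10.10.240    vorh6t01-drbd                    # DRBD replication link
10.10.10.241    vorh6t02-drbd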
Do the following LVM preparation on both nodes:
# pvcreate --dataalignment 4k /dev/sdc
  Physical volume "/dev/sdc" successfully created
# vgcreate orahome /dev/sdc
  Volume group "orahome" successfully created
# lvcreate -n export -L25g /dev/orahome
  Logical volume "export" created
There is no need to run mkfs at this stage, only lvcreate.
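If you want to double-check the LVM layout before moving on, the standard reporting commands will show it (output omitted here):

# pvs
# vgs orahome
# lvs orahome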
DRBD (Distributed Replicated Block Device) will make our two /dev/sdc disks, one dedicated to each node, behave like shared storage. This makes our VMs storage independent and allows the RH fail-over cluster to work.
There is still no binary distribution for RH6; you can purchase one, with support, from the author, LINBIT. However, you are still able to compile it from source (thanks to the GPL):
# yum install make gcc kernel-devel flex rpm-build libxslt
# cd /tmp && wget -q -O - http://oss.linbit.com/drbd/8.4/drbd-8.4.4.tar.gz | tar zxvf -
# cd drbd-8.4.4/
# ./configure --with-utils --with-km --with-udev --with-rgmanager --with-bashcompletion --prefix=/usr --localstatedir=/var --sysconfdir=/etc
# make
# make install
Note: you have to recompile the kernel module every time you upgrade the kernel:
# make module
You can put everything in /etc/drbd.conf; however, the practice recommended by LINBIT is to separate the common and resource configuration using include directives:
# cat /etc/drbd.conf
# You can find an example in /usr/share/doc/drbd.../drbd.conf.example
include "drbd.d/global_common.conf";
include "drbd.d/*.res";
Copy global_common.conf from the distribution to /etc/drbd.d and edit it to fit your needs.
# cat /etc/drbd.d/global_common.conf
global {
        usage-count no;
}
common {
        handlers {
        }
        startup {
                wfc-timeout 300;
                degr-wfc-timeout 0;
        }
        options {
        }
        disk {
        }
        net {
                protocol C;
                cram-hmac-alg sha1;
                shared-secret "9szdFmSkQEoXU1s7UNVbpqYrhhIsGjhQ4MxzNeotPku3NkJEq3LovZcHB2pITRy";
                use-rle yes;
        }
}
Some security is not a bad idea; use "shared-secret".
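The secret is just an arbitrary string shared by both nodes (up to 64 characters, if I recall the DRBD limit correctly). One way to generate a random one, assuming openssl is installed (any other method works just as well):

# openssl rand -base64 48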
# cat /etc/drbd.d/export.res
resource export {
        device /dev/drbd1;
        disk /dev/orahome/export;
        meta-disk internal;
        disk {
                resync-rate 40M;
                fencing resource-and-stonith;
        }
        net {
                csums-alg sha1;
                after-sb-0pri discard-zero-changes;
                after-sb-1pri discard-secondary;
                after-sb-2pri disconnect;
        }
        handlers {
                fence-peer "/usr/lib/drbd/rhcs_fence";
        }
        on vorh6t01.domain.com {
                address 10.10.10.240:7789;
        }
        on vorh6t02.domain.com {
                address 10.10.10.241:7789;
        }
}
I've added a 10.10.10/24 NIC to both VMs for replication purposes only. As you can see, DRBD runs on top of an LVM logical volume; this will help me with the backup procedure later. LVM on top of DRBD also works, but the cluster software cannot manage that configuration well, so it is not recommended.
Replicate the configuration to the second node:
root@vorh6t01:~ # scp -pr /etc/drbd.* root@vorh6t02:/etc/
Initialize DRBD:
root@vorh6t01:~ # drbdadm create-md export
...
root@vorh6t02:~ # drbdadm create-md export
...
root@vorh6t01:~ # drbdadm up export
root@vorh6t02:~ # drbdadm up export
# cat /proc/drbd
version: 8.4.4 (api:1/proto:86-101)
GIT-hash: 74402fecf24da8e5438171ee8c19e28627e1c98a build by root@vorh6t01.domain.com, 2014-03-18 12:05:58
 1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:31456284
As you can see, it is in the Connected state, with both sides marked as Secondary and Inconsistent.
Let's help DRBD make a decision:
root@vorh6t01:~ # drbdadm primary --force export
root@vorh6t01:~ # cat /proc/drbd
version: 8.4.4 (api:1/proto:86-101)
GIT-hash: 74402fecf24da8e5438171ee8c19e28627e1c98a build by root@vorh6t01.domain.com, 2014-03-18 12:05:58
 1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:2169856 nr:0 dw:0 dr:2170520 al:0 bm:132 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:27475996
        [>...................] sync'ed:  7.4% (26832/28948)M
        finish: 0:11:03 speed: 41,416 (27,464) K/sec
OK, vorh6t01 becomes Primary and UpToDate, and synchronization begins.
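You do not have to wait for the initial sync to finish before continuing; if you want to keep an eye on it, something like this works:

# watch -n5 cat /proc/drbd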
Format our filesystem:
root@vorh6t01:~ # mkfs.ext3 -j -m0 -b4096 /dev/drbd1
...
Try mounting it:
root@vorh6t02:~ # mkdir /export
root@vorh6t01:~ # mkdir /export && mount /dev/drbd1 /export
root@vorh6t01:~ # df
Filesystem            Size  Used Avail Use% Mounted on
...
/dev/drbd1             25G  5.9G   19G  24% /export
root@vorh6t01:~ # umount /export
Fix the chkconfig line of the /etc/init.d/drbd script on both nodes. Also remove any hint lines between ### BEGIN INIT INFO and ### END INIT INFO. This fix adjusts the drbd start/stop order to the correct place (for RH6), between the network and the cluster software.
...
# chkconfig: 2345 20 80
...
### BEGIN INIT INFO
# Provides: drbd
### END INIT INFO
...
Make DRBD start at boot time on both nodes:
# chkconfig --add drbd
# chkconfig drbd on
Install these RPMs on both nodes (with all dependencies):
# yum install lvm2-cluster ccs cman rgmanager
vorh6t01 and vorh6t02 are the two nodes of the HA (fail-over) cluster named vorh6t. Take care to make all names resolvable by DNS and add all names to /etc/hosts on both nodes.
Define cluster:
# ccs_tool create -2 vorh6t
The command above creates the /etc/cluster/cluster.conf file. It can be edited by hand and has to be redistributed to every node in the cluster. The -2 option is required for a two-node cluster; the usual configuration assumes more than two nodes, to make quorum decisions clear.
Open the file and change the node names to the real names. The resulting file should look like this:
<?xml version="1.0"?>
<cluster name="vorh6t" config_version="1">
    <cman two_node="1" expected_votes="1" transport="udpu" />
    <clusternodes>
        <clusternode name="vorh6t01.domain.com" votes="1" nodeid="1">
            <fence>
                <method name="single">
                </method>
            </fence>
        </clusternode>
        <clusternode name="vorh6t02.domain.com" votes="1" nodeid="2">
            <fence>
                <method name="single">
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <fencedevices>
    </fencedevices>
    <rm>
        <failoverdomains/>
        <resources/>
    </rm>
</cluster>
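Before distributing the file, you can sanity-check the edited XML; the ccs_config_validate utility (shipped with the RH6 cluster packages installed above, to my knowledge) does a basic schema check:

# ccs_config_validate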
I am using transport="udpu" here because my network does not support multicast, and broadcasts are not welcome either. Without this option, my cluster behaved unpredictably. Check:
# ccs_tool lsnode

Cluster name: vorh6t, config_version: 1

Nodename                        Votes Nodeid Fencetype
vorh6t01.domain.com                1    1
vorh6t02.domain.com                1    2

# ccs_tool lsfence
Name             Agent
Copy /etc/cluster/cluster.conf to second node:
vorh6t01 # scp /etc/cluster/cluster.conf vorh6t02:/etc/cluster/cluster.conf
You can start the cluster services now to see it working. Start them with /etc/init.d/cman start on both nodes. Check /var/log/messages. See the clustat output:
vorh6t01 # clustat
Cluster Status for vorh6t @ Thu Sep 27 15:04:58 2012
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 vorh6t01.domain.com                         1 Online, Local
 vorh6t02.domain.com                         2 Online

vorh6t02 # clustat
Cluster Status for vorh6t @ Thu Sep 27 15:05:07 2012
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 vorh6t01.domain.com                         1 Online
 vorh6t02.domain.com                         2 Online, Local
Stop the cluster services on both nodes with /etc/init.d/cman stop.
There are two sections related to resources: <resources/> and <service/>. The first is for "global" resources shared between services (like an IP). The second is for resources grouped by service (like FS + script). Our cluster is a single-purpose cluster, so we only fill in the <service> section.
...
    <rm>
        <failoverdomains/>
        <resources/>
        <service autostart="1" name="vorh6t" recovery="relocate">
            <ip address="192.168.131.12/24" />
        </service>
    </rm>
...
Copy config file to second node.
Add cluster services to init scripts. Start cluster and resource manager on both nodes:
# chkconfig --add cman
# chkconfig cman on
# chkconfig --add rgmanager
# chkconfig rgmanager on
# /etc/init.d/cman start
# /etc/init.d/rgmanager start
# clustat
Cluster Status for vorh6t @ Tue Oct 2 12:55:38 2012
Member Status: Quorate

 Member Name                             ID   Status
 ------ ----                             ---- ------
 vorh6t01.domain.com                         1 Online, rgmanager
 vorh6t02.domain.com                         2 Online, Local, rgmanager

 Service Name                  Owner (Last)                  State
 ------- ----                  ----- ------                  -----
 service:vorh6t                vorh6t01.domain.com           started
Switch the service to the other node:
# clusvcadm -r vorh6t -m vorh6t02
Trying to relocate service:vorh6t...Success
service:vorh6t is now running on vorh6t02.domain.com
Freeze resources (for maintenance):
# clusvcadm -Z vorh6t
Local machine freezing service:vorh6t...Success
Resume normal operation:
# clusvcadm -U vorh6t
Local machine unfreezing service:vorh6t...Success
Resources can and should be nested to create dependencies between them:
...
    <rm>
        <failoverdomains/>
        <resources/>
        <service autostart="1" name="vorh6t" recovery="relocate">
            <drbd name="vorh6tdrdb" resource="export">
                <fs name="vorh6tfs" device="/dev/drbd/by-res/export/0" mountpoint="/export" fstype="ext3" force_unmount="1" self_fence="1" />
            </drbd>
            <ip address="192.168.168.3/22" />
        </service>
    </rm>
...
Increment config_version at the beginning of /etc/cluster/cluster.conf. Distribute updated /etc/cluster/cluster.conf and inform cluster about changes:
vorh6t01 # scp /etc/cluster/cluster.conf vorh6t02:/etc/cluster/cluster.conf
# cman_tool version -r -S
Check /var/log/messages for errors on both nodes. Verify the status with clustat; df should show /export mounted on one of the nodes.
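For example, on each node (the exact output will vary):

# tail -n50 /var/log/messages
# clustat
# df -h /export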
An RH cluster is pretty much broken without well-configured fencing. You can see the available fencing agents in /usr/sbin/fence*. Here we'll use VMware fencing. Install the prerequisites:
# yum install openssl-devel
Install the VI Perl Toolkit on both nodes; VMware sometimes calls it the vSphere SDK, CLI or whatever. It should install /usr/lib/vmware-vcli/apps/ and other tools in /usr/bin. The package called "VMware-vSphere-Perl-SDK-5.5.0*" was OK for me.
/tmp # tar zxf VMware-vSphere-Perl-SDK-5.5.0-2043780.x86_64.tar.gz
/tmp # cd vmware-vsphere-cli-distrib
/tmp/vmware-vsphere-cli-distrib # ./vmware-install.pl
Check that everything works and that your user has sufficient rights to talk to the vCenter (VC). Check it on both nodes:
# fence_vmware --action=status --ip="VCNAME" --username="VCUSER" --password="PASSWORD" --plug=vorh6t01
Fix /etc/cluster/cluster.conf:
<?xml version="1.0"?>
<cluster name="vorh6t" config_version="3">
    <cman two_node="1" expected_votes="1" transport="udpu" />
    <clusternodes>
        <clusternode name="vorh6t01.domain.com" votes="1" nodeid="1">
            <fence>
                <method name="single">
                    <device name="vmware" port="vorh6t01" />
                </method>
            </fence>
        </clusternode>
        <clusternode name="vorh6t02.domain.com" votes="1" nodeid="2">
            <fence>
                <method name="single">
                    <device name="vmware" port="vorh6t02" />
                </method>
            </fence>
        </clusternode>
    </clusternodes>
    <fencedevices>
        <fencedevice name="vmware" agent="fence_vmware" ipaddr="VCNAME" action="off" login="VCUSER" passwd="PASSWORD" />
    </fencedevices>
...
port is the name of the VM in the VC; ipaddr is the name or IP of the VC.
Copy to neighbour and propagate changes:
vorh6t01:~ # scp /etc/cluster/cluster.conf vorh6t02:/etc/cluster/cluster.conf
vorh6t01:~ # cman_tool version -r -S
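If you want to verify fencing end to end, you can fence a node on purpose with fence_node (shipped with cman). Be aware that this really powers the VM off via the VC, so do it only while testing:

vorh6t01 # fence_node vorh6t02.domain.com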
The following load.sh script generates continuous I/O on /export, which is useful for exercising the cluster during fail-over tests:

# cat /export/load.sh
#!/bin/bash
mkdir /export/$(hostname -s)
cd /export/$(hostname -s) || exit
i=1
while true ; do
        a=$(dd if=/dev/urandom bs=4k count=256 2>/dev/null)
        echo "$a" > $(echo "$a" | md5sum -b | awk '{print $1}')
        echo $i
        i=$(($i+1))
done
# chmod +x /export/load.sh
If you know the second node is definitely dead, but your fencing has not worked (e.g. the ESX host is dead), you can acknowledge the fencing manually, like this:
root@vorh6t02:~ # fence_ack_manual vorh6t01.domain.com
About to override fencing for vorh6t01.domain.com.
Improper use of this command can cause severe file system damage.

Continue [NO/absolutely]? absolutely
Done