Name resolution is important: define the server names in DNS or hard-code them in /etc/hosts.
$ host vorh6t01
vorh6t01.domain.com has address 192.168.0.105
$ host vorh6t02
vorh6t02.domain.com has address 192.168.0.108
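If you go the /etc/hosts route, a small helper can confirm that every node name really has an entry. This is only a sketch: the function name missing_in_hosts is made up for illustration, and it checks a hosts-format file only, not DNS.

```shell
#!/bin/sh
# Print every given name that has no entry in a hosts-format file.
# Usage: missing_in_hosts FILE NAME...
# (missing_in_hosts is a hypothetical helper, not part of any package.)
missing_in_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        # Strip comments, then look for the name in any column after the address.
        if ! awk -v n="$name" '
                { sub(/#.*/, "") }
                { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                END { exit found ? 0 : 1 }' "$hosts_file"; then
            echo "$name"
        fi
    done
}
```

For example, "missing_in_hosts /etc/hosts vorh6t01 vorh6t02" prints nothing when both short names are present.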
The servers used here are Cisco UCS servers, so UCS fencing will be defined later.
Create a UCS profile and install RH6 with a minimal configuration on both nodes. Create a shared LUN on the storage and set up the required zoning and masking so that both servers can see it.
Copy the host SSH keys from one node to the other. This matters more for an HA (fail-over) cluster, where it makes SSH to the floating IP easy, but I copy them even for AA clusters:
root@vorh6t01:/etc/ssh # scp ssh_host_* root@vorh6t02:/etc/ssh/
...
root@vorh6t01:/etc/ssh # >/root/.ssh/known_hosts
root@vorh6t01:/etc/ssh # ssh vorh6t02 /etc/init.d/sshd restart
...
Fix /etc/hosts so that it lists all cluster-related IPs and names, and copy it to the other node.
Generate root SSH keys and exchange them between the cluster nodes:
root@vorh6t01:~ # ssh-keygen -t rsa -b 1024 -C "root@vorh6t0x"
.....
root@vorh6t01:~ # cat .ssh/id_rsa.pub >> .ssh/authorized_keys
root@vorh6t01:~ # scp -pr .ssh vorh6t02:
Install these RPMs on both nodes (with all dependencies):
# yum install lvm2-cluster ccs cman rgmanager gfs2-utils
vorh6t01 and vorh6t02 are the two nodes of a cluster named vorh6t0x. Make sure all names resolve through DNS, and add all of them to /etc/hosts on both nodes.
Define the cluster:
root@vorh6t01:~ # ccs_tool create -2 vorh6t0x
The command above creates the /etc/cluster/cluster.conf file. It can be edited by hand and has to be redistributed to every node in the cluster. The -2 option builds a two-node cluster; the classic cluster configuration assumes more than two nodes, which makes quorum easy.
Open the file and change the node names to the real names. The resulting file should look like this:
<?xml version="1.0"?>
<cluster name="vorh6t0x" config_version="1">
  <cman two_node="1" expected_votes="1" transport="udpu" />
  <clusternodes>
    <clusternode name="vorh6t01.domain.com" votes="1" nodeid="1">
      <fence>
        <method name="single">
        </method>
      </fence>
    </clusternode>
    <clusternode name="vorh6t02.domain.com" votes="1" nodeid="2">
      <fence>
        <method name="single">
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>
I am using transport="udpu" here because my network does not support multicast, and broadcasts are not welcome either. Without this option my cluster behaves unpredictably. Check the results:
root@vorh6t01:~ # ccs_config_validate
Configuration validates
root@vorh6t01:~ # ccs_tool lsnode

Cluster name: vorh6t0x, config_version: 1

Nodename                        Votes Nodeid Fencetype
vorh6t01.domain.com                1    1
vorh6t02.domain.com                1    2
Copy /etc/cluster/cluster.conf to the second node:
vorh6t01 # scp /etc/cluster/cluster.conf vorh6t02:/etc/cluster/cluster.conf
You can now start the cluster services on both nodes with the command /etc/init.d/cman start to see the cluster working. Check /var/log/messages and look at the clustat output:
root@vorh6t01:~ # clustat
Cluster Status for vorh6t0x @ Thu Feb 26 11:37:05 2015
Member Status: Quorate

 Member Name                 ID   Status
 ------ ----                 ---- ------
 vorh6t01.domain.com            1 Online, Local
 vorh6t02.domain.com            2 Online

root@vorh6t02:~ # clustat
Cluster Status for vorh6t0x @ Thu Feb 26 11:37:10 2015
Member Status: Quorate

 Member Name                 ID   Status
 ------ ----                 ---- ------
 vorh6t01.domain.com            1 Online
 vorh6t02.domain.com            2 Online, Local
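If you want to script this check instead of reading the output by eye, something like the following may do. It is a sketch that assumes the clustat layout shown above (a "Member Status: Quorate" line and "Online" in the third column); cluster_healthy is a made-up name.

```shell
#!/bin/sh
# Read `clustat` output on stdin. Succeed only if the cluster is quorate
# and the number of members reported "Online" equals the expected count.
# (cluster_healthy is a hypothetical helper; the column layout is assumed
# to match the clustat output shown in this article.)
cluster_healthy() {
    expected=$1
    awk -v want="$expected" '
        /Member Status: Quorate/ { quorate = 1 }
        $3 ~ /^Online/           { online++ }
        END { exit (quorate && online + 0 == want + 0) ? 0 : 1 }'
}
```

Usage on a live node: clustat | cluster_healthy 2 && echo "both nodes online".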
Add cluster services to init scripts:
# chkconfig --add cman
# chkconfig cman on
This cluster will not manage any resources; it only provides the infrastructure for a shared clustered file system, so no additional resource configuration is required. Only fencing has to be configured for the cluster to function normally. These servers are UCS, so fence_cisco_ucs will be used.
I created a local user "FENCEUSER" on USCMANAGER with the poweroff and server-profile roles.
Check that it works:
# fence_cisco_ucs --ip=USCMANAGER --username=FENCEUSER --password=FENCEUSERPASS \
    --ssl --suborg=org-YourSubOrgString --plug=vorh6t01 --action=status
Status: ON
You can use on or off as the action parameter to turn the neighbour server on or off. It is not smart to turn off the server you are running the command on.
The --suborg string is usually the name of your "Sub-Organization" (in Cisco terms) prefixed with "org-". For example, if the Sub-Organization is called "Test" in UCS manager, the option becomes --suborg=org-Test.
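The status check can be repeated for every node before fencing goes into cluster.conf. The sketch below uses only the options shown in the command above. The FENCE_AGENT override and the function names are my own additions for dry runs, and USCMANAGER, FENCEUSER and FENCEUSERPASS are the same placeholders as before.

```shell
#!/bin/sh
# Query the power status of one server profile through the fence agent.
# FENCE_AGENT may point at a stand-in command for dry runs (an assumption
# added here, not a feature of fence_cisco_ucs itself).
ucs_status() {
    "${FENCE_AGENT:-fence_cisco_ucs}" --ip=USCMANAGER --username=FENCEUSER \
        --password=FENCEUSERPASS --ssl --suborg="$1" --plug="$2" --action=status
}

# Succeed only if every listed node reports "Status: ON".
# Usage: all_nodes_on SUBORG NODE...
all_nodes_on() {
    suborg=$1
    shift
    for node in "$@"; do
        ucs_status "$suborg" "$node" | grep -q '^Status: ON' || {
            echo "$node is not ON or could not be queried" >&2
            return 1
        }
    done
}
```

For example: all_nodes_on org-YourSubOrgString vorh6t01 vorh6t02.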
Once the fencing tests have worked, fix cluster.conf:
# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster name="vorh6t0x" config_version="2">
        <logging syslog_priority="error"/>
  <fence_daemon post_fail_delay="20" post_join_delay="30" clean_start="1" />
  <cman two_node="1" expected_votes="1" transport="udpu" />
  <clusternodes>
    <clusternode name="vorh6t01.domain.com" votes="1" nodeid="1">
      <fence>
        <method name="single">
                <device name="ucsfence" port="vorh6t01" action="off" />
        </method>
      </fence>
    </clusternode>                                                 
    <clusternode name="vorh6t02.domain.com" votes="1" nodeid="2">
      <fence>
        <method name="single">
                <device name="ucsfence" port="vorh6t02" action="off" />
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
        <fencedevice name="myfence" agent="fence_manual" />
        <fencedevice name="ucsfence"
                agent="fence_cisco_ucs"
                ipaddr="USCMANAGER"
                login="FENCEUSER"
                passwd="FENCEUSERPASS"
                ssl="on"
                suborg="org-YourSubOrgString"
                />
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>
Do not forget to increment the config_version number and save the changes. Verify the config file:
vorh6t01 # ccs_config_validate
Configuration validates
Distribute the file and update the cluster:
vorh6t01 # scp /etc/cluster/cluster.conf vorh6t02:/etc/cluster/cluster.conf
vorh6t01 # cman_tool version -r -S
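Since a forgotten config_version bump is an easy mistake, the increment can be scripted. This is a minimal sketch that assumes the attribute sits on the opening cluster tag as in the files shown here; bump_config_version is a hypothetical helper name.

```shell
#!/bin/sh
# Increment config_version on the <cluster ...> tag of a cluster.conf file
# in place and print the new version number.
# (bump_config_version is a hypothetical helper; it assumes the attribute
# is on the same line as the opening <cluster tag.)
bump_config_version() {
    conf=$1
    current=$(sed -n 's/.*<cluster [^>]*config_version="\([0-9][0-9]*\)".*/\1/p' "$conf" | head -n 1)
    [ -n "$current" ] || { echo "config_version not found in $conf" >&2; return 1; }
    next=$((current + 1))
    # Rewrite only the <cluster ...> tag, going through a temporary file.
    sed "s/\(<cluster [^>]*config_version=\"\)$current\"/\1$next\"/" "$conf" > "$conf.new" &&
        mv "$conf.new" "$conf"
    echo "$next"
}
```

Run it on a copy first, then validate with ccs_config_validate before distributing the file.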
Check that fencing works by turning off the network on one node.
We will create the LVM structure on the multipath device, not on the underlying SCSI disks. It is important to set a correct filter line in /etc/lvm/lvm.conf that explicitly includes the multipathed devices and excludes the others; otherwise you will see "Duplicate PV found" and LVM may decide to use a single-path disk instead of the multipathed one. Here is my "filter" line, which admits only the "rootvg" device and the multipath device:
filter = [ "a|^/dev/mapper/data|", "a|^/dev/disk/by-path/pci-0000:01:00.0-scsi-0:2:0:0|", "r/.*/" ]
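To see how such a filter line is evaluated, here is a rough emulation in shell. It is a sketch under assumptions: patterns are tried in order, the first match decides, a device that matches nothing is accepted, and grep -E stands in for LVM's own regex engine. It also tests one path name at a time, while LVM itself considers all names of a device. lvm_filter_decision is a made-up name.

```shell
#!/bin/sh
# Decide whether one device path is accepted by an ordered list of
# LVM-style filter patterns such as 'a|^/dev/mapper/data|' or 'r/.*/'.
# The first pattern whose regex matches wins; no match means "accept".
# (Hypothetical helper; grep -E only approximates LVM's regex dialect.)
lvm_filter_decision() {
    dev=$1
    shift
    for pat in "$@"; do
        action=${pat%"${pat#?}"}   # first character: a or r
        regex=${pat#??}            # drop the action and the opening delimiter
        regex=${regex%?}           # drop the closing delimiter
        if printf '%s\n' "$dev" | grep -Eq -- "$regex"; then
            [ "$action" = a ] && echo accept || echo reject
            return 0
        fi
    done
    echo accept
}
```

With the three patterns from the filter line above, /dev/mapper/data is accepted and a plain path such as /dev/sdb falls through to "r/.*/" and is rejected.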
Multipath is not installed by a minimal installation, so add it on both nodes:
# yum install --enablerepo=updates device-mapper-multipath
# /etc/init.d/multipathd start
# chkconfig --add multipathd
# chkconfig multipathd on
My copy of /etc/multipath.conf:
defaults {
        user_friendly_names yes
        flush_on_last_del       yes
        queue_without_daemon    no
        no_path_retry           fail
}
# Local disks in UCS:
blacklist {
        device {
                vendor "LSI"
                product "UCSB-MRAID12G"
        }
}
devices {
        device {
                vendor                  "HITACHI"
                product                 "*"
                path_checker            "directio"
                path_grouping_policy    "multibus"
                path_selector           "service-time 0"
                failback                "immediate"
                rr_weight               "uniform"
                rr_min_io_rq            "128"
                features                "0"
        }
}
multipaths {
        multipath {
                wwid 360060e800756ce00003056ce00008148
                alias   data
        }
}
As you can see, the LUN with this wwid will appear as /dev/mapper/data, exactly as written in LVM's filter line.
Rescan the multipath devices with the command "multipath" and check that "/dev/mapper/data" has been created.
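Because the LVM filter depends on the alias, it can be worth checking which alias multipath.conf assigns to a given wwid. The following is a sketch for simple files laid out like the one above (one "multipath {" block per LUN, one keyword per line); alias_for_wwid is a hypothetical helper.

```shell
#!/bin/sh
# Print the alias configured for a wwid in a multipath.conf-style file.
# Usage: alias_for_wwid FILE WWID
# (Hypothetical helper; assumes "multipath {" and "}" sit on their own lines.)
alias_for_wwid() {
    awk -v want="$2" '
        { gsub(/"/, "") }                     # tolerate quoted values
        $1 == "multipath" && $2 == "{" { inblk = 1; wwid = ""; name = ""; next }
        inblk && $1 == "wwid"          { wwid = $2 }
        inblk && $1 == "alias"         { name = $2 }
        # Append "" to force a string comparison of the hex identifiers.
        inblk && $1 == "}"             { if ((wwid "") == (want "")) print name; inblk = 0 }
    ' "$1"
}
```

For the configuration above, alias_for_wwid /etc/multipath.conf 360060e800756ce00003056ce00008148 should print "data", and the device should then exist as /dev/mapper/data.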
Enable the LVM cluster features on both nodes and start clvmd:
# lvmconf --enable-cluster
# /etc/init.d/clvmd start
# chkconfig --add clvmd
# chkconfig clvmd on
Create the PV, the clustered VG and the LV on one node:
vorh6t01:~ # pvcreate --dataalignment 4k /dev/mapper/data
  Physical volume "/dev/mapper/data" successfully created
vorh6t01:~ # vgcreate -c y datavg /dev/mapper/data
  Clustered volume group "datavg" successfully created
vorh6t01:~ # lvcreate -n export -L 20g /dev/datavg
  Logical volume "export" created
On the second node, check with the commands pvs, vgs and lvs that everything is visible there too.
Create GFS2 on one node as follows:
vorh6t01:~ # mkfs.gfs2 -p lock_dlm -t vorh6t0x:export -j 2 /dev/datavg/export
This will destroy any data on /dev/datavg/export.
It appears to contain: symbolic link to `../dm-7'

Are you sure you want to proceed? [y/n] y

Device:                    /dev/datavg/export
Blocksize:                 4096
Device Size                20.00 GB (5242880 blocks)
Filesystem Size:           20.00 GB (5242878 blocks)
Journals:                  2
Resource Groups:           80
Locking Protocol:          "lock_dlm"
Lock Table:                "vorh6t0x:export"
UUID:                      ae65b8eb-997c-9a3f-079d-092d7d07d2ae
where vorh6t0x is the cluster name (it must match the name in cluster.conf), export is the file system name, and -j 2 creates two journals, one for each of our two nodes.
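The lock table name and the journal count both follow from cluster.conf, so the command line can be derived from it. The sketch below only prints the command and runs nothing; gfs2_mkfs_command is a made-up helper, and it assumes one journal per clusternode entry, as in this two-node setup.

```shell
#!/bin/sh
# Print (not run) a mkfs.gfs2 command line derived from cluster.conf:
# the lock table is CLUSTERNAME:FSNAME and there is one journal per node.
# Usage: gfs2_mkfs_command CONF FSNAME DEVICE
# (Hypothetical helper; assumes every node in the file mounts the file system.)
gfs2_mkfs_command() {
    conf=$1
    fsname=$2
    device=$3
    cluster=$(sed -n 's/.*<cluster [^>]*name="\([^"]*\)".*/\1/p' "$conf" | head -n 1)
    nodes=$(grep -c '<clusternode ' "$conf")
    [ -n "$cluster" ] && [ "$nodes" -gt 0 ] || return 1
    echo "mkfs.gfs2 -p lock_dlm -t $cluster:$fsname -j $nodes $device"
}
```

Example: gfs2_mkfs_command /etc/cluster/cluster.conf export /dev/datavg/export.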
Then, mount it:
# mkdir /export
# mount -o noatime,nodiratime -t gfs2 /dev/datavg/export /export
# echo "/dev/datavg/export /export gfs2 noatime,nodiratime 0 0" >> /etc/fstab
# chkconfig --add gfs2 ; chkconfig gfs2 on
The /etc/init.d/gfs2 script, part of gfs2-utils, mounts and unmounts the GFS2 file systems listed in /etc/fstab at the appropriate time: after the cluster has started and before it goes down.
Reboot the nodes and check that "/export" is mounted afterwards. If it is not, check again that the line in /etc/fstab and the "filter" line in /etc/lvm/lvm.conf are correct. Both were described above.
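The /etc/fstab part of that check can be automated with a few lines of awk. This is only a sketch; fstab_gfs2_entry is a hypothetical helper that looks for a non-comment line with the given mount point and file system type gfs2.

```shell
#!/bin/sh
# Print the device of the gfs2 entry for a mount point in an fstab-format
# file and fail if there is none.
# Usage: fstab_gfs2_entry FILE MOUNTPOINT
# (Hypothetical helper for illustration.)
fstab_gfs2_entry() {
    awk -v mp="$2" '
        $1 ~ /^#/ { next }                       # skip comment lines
        $2 == mp && $3 == "gfs2" { print $1; found = 1 }
        END { exit found ? 0 : 1 }' "$1"
}
```

For example: fstab_gfs2_entry /etc/fstab /export || echo "no gfs2 line for /export".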
The previous configuration implies that the GFS2 service (and CLVM) take care of themselves without cluster intervention. The cluster software supplies the infrastructure only.
The second configuration implements GFS2 as a cluster service. The gfs2 and clvmd system services should therefore be disabled, and /etc/fstab should not contain GFS2 lines.
# chkconfig --del gfs2
# chkconfig --del clvmd
# chkconfig --add rgmanager ; chkconfig rgmanager on
# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster name="vorh6t0x" config_version="3">
  <fence_daemon post_fail_delay="20" post_join_delay="30" clean_start="1" />
  <cman two_node="1" expected_votes="1" transport="udpu" />
  <clusternodes>
    <clusternode name="vorh6t01.domain.com" votes="1" nodeid="1">
      <fence>
        <method name="single">
                <device name="ucsfence" ipaddr="daucs01p" port="vorh6t01" action="off" />
        </method>
      </fence>
    </clusternode>
    <clusternode name="vorh6t02.domain.com" votes="1" nodeid="2">
      <fence>
        <method name="single">
                <device name="ucsfence" ipaddr="daucs02p" port="vorh6t02" action="off" />
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
        <fencedevice name="myfence" agent="fence_manual" />
        <fencedevice name="ucsfence"
                agent="fence_cisco_ucs"
                login="FENCEUSER"
                passwd="FENCEUSERPASS"
                ssl="on"
                suborg="org-YourSubOrgString"
                />
  </fencedevices>
  <rm>
        <resources>
                <script file="/etc/init.d/clvmd" name="clvmd"/>
                <clusterfs name="export"
                        device="/dev/datavg/export"
                        fstype="gfs2"
                        mountpoint="/export"
                        options="noatime,nodiratime"
                        force_unmount="1"
                        />
        </resources>
    <failoverdomains>
        <failoverdomain name="node01" restricted="1">
                <failoverdomainnode name="vorh6t01.domain.com"/>
        </failoverdomain>
        <failoverdomain name="node02" restricted="1">
                <failoverdomainnode name="vorh6t02.domain.com"/>
        </failoverdomain>
    </failoverdomains>
        <service name="mount01" autostart="1" recovery="restart" domain="node01">
                <script ref="clvmd">
                        <clusterfs ref="export"/>
                </script>
        </service>
        <service name="mount02" autostart="1" recovery="restart" domain="node02">
                <script ref="clvmd">
                        <clusterfs ref="export"/>
                </script>
        </service>
  </rm>
</cluster>
I still have not found a solution for the situation where one node loses its connection to the LUN. A well-tuned multipath passes the hardware failure up to clvmd, but then LVM hangs for some reason. The hang also blocks the DLM protocol between the nodes, so ALL GFS2 nodes hang too. Turning off the "bad" node _may_ release the rest of the nodes, but usually only a total reboot clears the hang.
That is why the second configuration option (the cluster takes care of mounting GFS2) was tested. The idea was that the cluster would fence the "bad" node and restore GFS2 functionality on the surviving nodes. This does not happen: a hanging LVM breaks cluster functionality completely.
The following steps mount the file system on a single node (vorh6t03 here) with cluster locking disabled:
vorh6t03:~ # vgchange --config 'global {locking_type = 0}' -c n datavg
  WARNING: Locking disabled. Be careful! This could corrupt your metadata.
  Volume group "datavg" successfully changed
vorh6t03:~ # vgchange -ay datavg
  1 logical volume(s) in volume group "datavg" now active
vorh6t03:~ # mount -t gfs2 /dev/datavg/export /export/ -o lockproto=lock_nolock,ignore_local_fs
vorh6t03:~ # df /export
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/datavg-export  100G  668M  100G   1% /export
Reverting changes:
vorh6t03:~ # umount /export
vorh6t03:~ # vgchange -an datavg
  0 logical volume(s) in volume group "datavg" now active
vorh6t03:~ # vgexport datavg
  Volume group "datavg" successfully exported