Here is an example of a particular installation. It is not a generic HOWTO. Adapt it to your needs.
An HP server with QLogic FC adapters was used, with RedHat 5.6 installed on it. No special software was installed.
Eight FC zones were created. The server has two FC adapters connected to two fabric switches, and each switch is connected to each NetApp head by two FC cards. Therefore the total number of zones is eight: four for the active NetApp controller and four for the partner head.
As follows from the above, the server can see every LUN through eight paths: four active and four backup. Four LUNs were created instead of a single LUN to boost performance. This number was chosen as a compromise between performance and easy maintenance.
UPDATE: Using a striped volume did not show the expected performance boost. The I/O bottleneck simply moved to the LVM device aggregating the stripes. Today I'd recommend a single LUN when working with central storage.
netapp> igroup create -f -t linux $IGNAME $WWN1 $WWN2
netapp> igroup set $IGNAME alua yes
netapp> vol create $VOLNAME $AGGRNAME 600g
netapp> exportfs -z /vol/$VOLNAME
netapp> vol options $VOLNAME minra on
netapp> vol autosize $VOLNAME -m 14t on
netapp> qtree create /vol/$VOLNAME/LUN0
netapp> qtree create /vol/$VOLNAME/LUN1
netapp> qtree create /vol/$VOLNAME/LUN2
netapp> qtree create /vol/$VOLNAME/LUN3
netapp> lun create -s 250g -t linux -o noreserve /vol/$VOLNAME/LUN0/lun0
netapp> lun create -s 250g -t linux -o noreserve /vol/$VOLNAME/LUN1/lun1
netapp> lun create -s 250g -t linux -o noreserve /vol/$VOLNAME/LUN2/lun2
netapp> lun create -s 250g -t linux -o noreserve /vol/$VOLNAME/LUN3/lun3
netapp> lun map /vol/$VOLNAME/LUN0/lun0 $IGNAME 0
netapp> lun map /vol/$VOLNAME/LUN1/lun1 $IGNAME 1
netapp> lun map /vol/$VOLNAME/LUN2/lun2 $IGNAME 2
netapp> lun map /vol/$VOLNAME/LUN3/lun3 $IGNAME 3
The second command defines my host (igroup) as ALUA-enabled. I'll configure ALUA on the server side later. The LUNs were created with the "noreserve" option. Together with the volume "autosize" option this simulates thin provisioning.
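To watch how much space the thin LUNs actually consume on the filer, commands along these lines can be used (a sketch; I have not re-verified the exact output on every ONTAP release):

netapp> df -r /vol/$VOLNAME
netapp> lun show -v /vol/$VOLNAME/LUN0/lun0

Here df -r shows space usage including reserves, and lun show -v reports whether space reservation is disabled for the LUN.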
These commands were tested a number of times and usually worked well. However, they can hang the server or cause other damage. Use them during a maintenance window, have a good backup, and do not run them too frequently on the same server.
Common actions for SCSI (and FC) devices are: removing old (dead) SCSI devices, rescanning for newly added devices, and resizing existing devices. Sometimes devices are still in use by the OS, and then a reboot may be required.
This small script will try to remove all SCSI devices. The kernel will not allow it to remove devices that are still in use. A device can be "in use" if a partition is still mounted, LVM uses it, or multipath is performing I/O on it right now. Paths with no I/O on them will be removed successfully, so rescan devices immediately after running this script.
Using this script is dangerous: the server can hang and data loss may occur. Try to remove only the relevant devices instead.
# ( cd /sys/class/scsi_device/ ; for d in * ; do
    echo "scsi remove-single-device $(echo $d|tr ':' ' ')" > /proc/scsi/scsi
  done )
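A safer, more targeted alternative (my own sketch, not part of the original procedure) is to delete a single known-dead path through sysfs:

# echo 1 > /sys/block/sdX/device/delete    # sdX is the specific path you want gone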
The next step is to rescan the devices. This command will reset the FC loop and all LUNs will be re-discovered. Due to the event-driven nature of modern Linux, you usually do not have to rescan SCSI devices (next command) as well.
# for FC in /sys/class/fc_host/host?/issue_lip ; do echo "1" > $FC ; sleep 5 ; done ; sleep 20
This command will rescan SCSI devices (not FC; useful when adding disks to a VM):
# for SH in /sys/class/scsi_host/host?/scan ; do echo "- - -" > $SH ; done
Enable the multipath daemon:
Update for RH6: Replace /dev/mpath with /dev/mapper throughout this document.
# chkconfig --add multipathd
# chkconfig multipathd on
# /etc/init.d/multipathd restart
Here is the content of a working /etc/multipath.conf. I've tried to be informative in the comments:
defaults {
    user_friendly_names yes
    # turn it to yes for clustered environment
    #flush_on_last_del yes
    # multipathd should always be running:
    queue_without_daemon no
}
## Blacklist non-SAN devices
blacklist {
    # Common non-disks devices:
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    # We are on HP HW, cciss is our single boot device:
    devnode "^cciss!c[0-9]d[0-9]*"
    # This HW has card reader, identified by:
    device {
        vendor "Single"
        product "*"
    }
}
# See /usr/share/doc/device-mapper-multipath-X.X.X/ for compiled defaults.
# This section is not needed anymore for modern distributions.
# Override NETAPP's default definitions to use ALUA:
devices {
    device {
        vendor "NETAPP"
        product "*"
        path_grouping_policy group_by_prio
        getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
        prio alua
        prio_callout "/sbin/mpath_prio_alua /dev/%n"
        features "3 queue_if_no_path pg_init_retries 50"
        path_checker tur
        hardware_handler "1 alua"
        failback immediate
    }
}
# Use fixed names for our LUNs:
multipaths {
    multipath {
        wwid 360a98000646564494b34653545345552
        alias nlun0
    }
    multipath {
        wwid 360a98000646564494b34653545347261
        alias nlun1
    }
    multipath {
        wwid 360a98000646564494b34653545353938
        alias nlun2
    }
    multipath {
        wwid 360a98000646564494b34653545356151
        alias nlun3
    }
}
Aliases for the LUN names are very helpful for maintenance tasks and will be used widely later in this memo. You can see these WWIDs, at the very least, here:
/dev/disk/by-id # ll
total 0
lrwxrwxrwx 1 root root  9 Jul 18 16:46 scsi-360a98000646564494b34653545345552 -> ../../sda
lrwxrwxrwx 1 root root  9 Jul 18 16:46 scsi-360a98000646564494b34653545347261 -> ../../sdb
lrwxrwxrwx 1 root root  9 Jul 18 16:46 scsi-360a98000646564494b34653545353938 -> ../../sdc
lrwxrwxrwx 1 root root  9 Jul 18 16:46 scsi-360a98000646564494b34653545356151 -> ../../sdd
lrwxrwxrwx 1 root root 10 Jul 18 16:46 usb-Single_Flash_Reader_058F63356336 -> ../../sdag
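The same WWID can also be obtained for a single path with the scsi_id call used in getuid_callout above (RH5 syntax):

# /sbin/scsi_id -g -u -s /block/sda
360a98000646564494b34653545345552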
You can use this NetApp LUN serial to WWID converter if you work with NetApp.
Flush and rebuild multipath configuration:
# multipath -F
# multipath -v3 > multipath.out
# view multipath.out
Check the command's output (multipath.out) to verify that ALUA is in use. Verify that paths are grouped by priority:
# multipath -ll nlun0
nlun0 (360a98000646564494b34653545345552) dm-9 NETAPP,LUN
[size=250G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=200][active]
 \_ 0:0:1:0 sde  8:64   [active][ready]
 \_ 0:0:2:0 sdi  8:128  [active][ready]
 \_ 1:0:0:0 sdq  65:0   [active][ready]
 \_ 1:0:1:0 sdu  65:64  [active][ready]
\_ round-robin 0 [prio=40][enabled]
 \_ 0:0:0:0 sda  8:0    [active][ready]
 \_ 0:0:3:0 sdm  8:192  [active][ready]
 \_ 1:0:2:0 sdy  65:128 [active][ready]
 \_ 1:0:3:0 sdag 66:0   [active][ready]
The long number in brackets is the WWID mentioned above.
Edit /etc/lvm/lvm.conf to fix the following lines:
......
    # By default we accept every block device:
    #filter = [ "a/.*/" ]
    filter = [ "a|/dev/mpath/nlun|","a|/dev/cciss/|","r/.*/" ]
    #filter = [ "a|/dev/mapper/pv_|", "a|^/dev/disk/by-path/pci-0000:01:00.0-scsi-0:2:0:0|", "r/.*/" ]
    #filter=["a|/dev/mpath/mroot|", "a|/dev/mapper/mroot|", "r/.*/"]
......
My root VG resides on a cciss device, and my data VG will be on multipath devices. I do not want LVM to start using SCSI disks directly, so all other devices are ignored. Moreover, the pattern is narrowed to use only the "nlun" aliases, which will be important later.
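A quick way to double-check the filter (a sanity check, not part of the original procedure) is to list what LVM is willing to scan:

# lvmdiskscan    # only /dev/mpath/nlunX and /dev/cciss devices should be listed, no raw /dev/sdX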
Create the PVs and the VG with these commands:
# for i in 0 1 2 3 ; do pvcreate --dataalignment 4k /dev/mpath/nlun$i ; done
# vgcreate vg_data /dev/mpath/nlun{0,1,2,3}
The --dataalignment 4k option should align data blocks to NetApp blocks (to be verified). A stripe size of 64k was chosen to fit into the DMA size (probably outdated). The FS block size is set to 4k, which is good for modern storage and good for Oracle DBF files, since the default Oracle block size became 8k. Create the striped volume and format it:
# lvcreate -i 4 -I 64k -n oradbs -L 600g /dev/vg_data
# mkfs.ext3 -j -m0 -b4096 /dev/vg_data/oradbs
Fix /etc/fstab with the _netdev mount option:
/dev/vg_data/oradbs /oradbs ext3 _netdev 2 2
The _netdev option delays mounting of this filesystem until the multipath service is up. This is important at boot time.
Rebuild the initrd image to reflect the lvm.conf and multipath.conf changes. This is not necessary, but it makes the boot cleaner.
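To exercise the _netdev handling without a full reboot, you can mimic what the init scripts do (a sketch; unmount the FS first if it is already mounted):

# mount -a -O _netdev    # mounts only the fstab entries flagged _netdev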
# mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)
Update for RH6:
# mkinitrd -f /boot/initramfs-$(uname -r).img $(uname -r)
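Optionally, before rebooting, you can peek into the rebuilt image to see whether the configuration files were picked up (a sketch for the RH5 gzipped-cpio initrd format; which files actually get included depends on your mkinitrd version and root device):

# zcat /boot/initrd-$(uname -r).img | cpio -t 2>/dev/null | grep -E 'lvm.conf|multipath'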
Reboot the server to see if everything works well at boot time.
Update: This is not a real performance test, just dd. It reflects nothing meaningful: everything runs in memory, since the server has more than 10 GB of RAM.
# dd if=/dev/zero of=10g.file bs=1024k count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 33.2477 seconds, 323 MB/s
# sync     # flush local cache
# dd if=10g.file of=/dev/null bs=1024k
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 5.01478 seconds, 2.1 GB/s
#          # PAM installed on this NetApp
# dd if=10g.file of=/dev/null bs=1024k
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 4.69177 seconds, 2.3 GB/s
#          # Local cache also add something
Resize NetApp's LUNs:
netapp> lun resize /vol/$VOL/LUN0/lun0 400g
netapp> lun resize /vol/$VOL/LUN1/lun1 400g
netapp> lun resize /vol/$VOL/LUN2/lun2 400g
netapp> lun resize /vol/$VOL/LUN3/lun3 400g
Rescan SCSI devices:
# for device in /sys/block/sd* ; do echo 1 > $device/device/rescan ; sleep 2 ; done
Check /var/log/messages for capacity change messages, like:
kernel: sdz: detected capacity change from 268435456000 to 429496729600
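To catch all the resize messages at once, something like this works:

# grep 'detected capacity change' /var/log/messages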
Ask the multipath daemon to rescan its slave devices:
# multipath
: nlun3 (360a98000646564494b34653545356151)  NETAPP,LUN
[size=400G][features=1 queue_if_no_path][hwhandler=0][n/a]
....
Update for RH6: A special command for resizing was introduced:
# multipathd -k'resize map nlun0'
Issue this command for every resized multipathed device.
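Since all four LUNs were resized here, a small loop saves typing (a sketch using the nlunX aliases defined above):

# for i in 0 1 2 3 ; do multipathd -k"resize map nlun$i" ; done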
Resize the physical volumes using the LVM command:
# pvresize /dev/mpath/nlun0
  Physical volume "/dev/mpath/nlun0" changed
  1 physical volume(s) resized / 0 physical volume(s) not resized
# pvscan
  PV /dev/mpath/nlun0    VG vg_data   lvm2 [399.97 GB / 162.47 GB free]
  PV /dev/mpath/nlun1    VG vg_data   lvm2 [249.97 GB / 12.47 GB free]
  PV /dev/mpath/nlun2    VG vg_data   lvm2 [249.97 GB / 12.47 GB free]
  PV /dev/mpath/nlun3    VG vg_data   lvm2 [249.97 GB / 12.47 GB free]
  PV /dev/cciss/c0d0p2   VG rootvg    lvm2 [136.56 GB / 130.47 GB free]
  Total: 5 [1.26 TB] / in use: 5 [1.26 TB] / in no VG: 0 [0 ]
# pvresize /dev/mpath/nlun1
# pvresize /dev/mpath/nlun2
# pvresize /dev/mpath/nlun3
Resize the logical volume and the FS:
# lvresize -L 1500g /dev/vg_data/oradbs
  Using stripesize of last segment 8.00 MB
  Extending logical volume oradbs to 1.46 TB
  Logical volume oradbs successfully resized
# resize2fs /dev/vg_data/oradbs
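Once resize2fs finishes, a quick df confirms the new size (assuming the FS is mounted on /oradbs, as in the fstab entry above):

# df -h /oradbs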
Create a FlexClone on the NetApp using the desired snapshot:
netapp> vol clone create $SNAPVOL -s none -b $VOL $SNAPNAME
netapp> lun map /vol/$SNAPVOL/LUN0/lun0 $IGNAME
netapp> lun online /vol/$SNAPVOL/LUN0/lun0
netapp> lun map /vol/$SNAPVOL/LUN1/lun1 $IGNAME
netapp> lun online /vol/$SNAPVOL/LUN1/lun1
netapp> lun map /vol/$SNAPVOL/LUN2/lun2 $IGNAME
netapp> lun online /vol/$SNAPVOL/LUN2/lun2
netapp> lun map /vol/$SNAPVOL/LUN3/lun3 $IGNAME
netapp> lun online /vol/$SNAPVOL/LUN3/lun3
Rescan the FC devices:
# for FC in /sys/class/fc_host/host?/issue_lip ; do echo "1" > $FC ; done ; sleep 20
Rescan the multipath devices:
# multipath -F
# multipath
# multipath -ll
You should see four additional mpathXX devices among the nlunX devices.
Verify that LVM still uses the nlunX devices:
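To list the maps together with their WWIDs (a quick check; the clone LUNs get new WWIDs, different from the nlunX ones):

# multipath -ll | egrep '^(mpath|nlun)'    # prints only the map header lines: name, WWID, vendor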
# pvscan
  PV /dev/mpath/nlun0    VG vg_data   lvm2 [249.97 GB / 99.97 GB free]
  PV /dev/mpath/nlun1    VG vg_data   lvm2 [249.97 GB / 99.97 GB free]
  PV /dev/mpath/nlun2    VG vg_data   lvm2 [249.97 GB / 99.97 GB free]
  PV /dev/mpath/nlun3    VG vg_data   lvm2 [249.97 GB / 99.97 GB free]
  PV /dev/cciss/c0d0p2   VG rootvg    lvm2 [136.56 GB / 130.47 GB free]
  Total: 5 [1.11 TB] / in use: 5 [1.11 TB] / in no VG: 0 [0 ]
If you see "Found duplicate PV qCNLd6KGQmdVPwnwQAadJ8IH7KWo8xEy: using /dev/mpath14 not /dev/nlun0", you are probably in trouble. This should not happen, because our /etc/lvm/lvm.conf includes only the "nlunX" devices.
Make a separate copy of lvm.conf:
# mkdir /lvmtemp
# cp /etc/lvm/lvm.conf /lvmtemp
# export LVM_SYSTEM_DIR=/lvmtemp/
# vi /lvmtemp/lvm.conf
Replace the filter line to see mpathXX devices instead of nlunX devices:
....
    #filter = [ "a|/dev/mpath/nlun|","a|/dev/cciss/|","r/.*/" ]
    filter = [ "a|/dev/mpath/mpath|","a|/dev/cciss/|","r/.*/" ]
    #filter = [ "a|/dev/mapper/mpath|","a|/dev/cciss/|","r/.*/" ]
....
Verify that you see only mpathXX devices now:
# pvscan
  PV /dev/mpath/mpath24   VG vg_data   lvm2 [249.97 GB / 99.97 GB free]
  PV /dev/mpath/mpath25   VG vg_data   lvm2 [249.97 GB / 99.97 GB free]
  PV /dev/mpath/mpath26   VG vg_data   lvm2 [249.97 GB / 99.97 GB free]
  PV /dev/mpath/mpath27   VG vg_data   lvm2 [249.97 GB / 99.97 GB free]
  PV /dev/cciss/c0d0p2    VG rootvg    lvm2 [136.56 GB / 130.47 GB free]
  Total: 5 [1.11 TB] / in use: 5 [1.11 TB] / in no VG: 0 [0 ]
Rename the VG and change the PVIDs using the vgimportclone script, which comes with the LVM2 package:
# vgimportclone -n vg_restore /dev/mpath/mpath*
  WARNING: Activation disabled. No device-mapper interaction will be attempted.
  Physical volume "/tmp/snap.knd24669/vgimport3" changed
  1 physical volume changed / 0 physical volumes not changed
  WARNING: Activation disabled. No device-mapper interaction will be attempted.
  Physical volume "/tmp/snap.knd24669/vgimport2" changed
  1 physical volume changed / 0 physical volumes not changed
  WARNING: Activation disabled. No device-mapper interaction will be attempted.
  Physical volume "/tmp/snap.knd24669/vgimport1" changed
  1 physical volume changed / 0 physical volumes not changed
  WARNING: Activation disabled. No device-mapper interaction will be attempted.
  Physical volume "/tmp/snap.knd24669/vgimport0" changed
  1 physical volume changed / 0 physical volumes not changed
  WARNING: Activation disabled. No device-mapper interaction will be attempted.
  Volume group "vg_data" successfully changed
  Volume group "vg_data" successfully renamed to "vg_restore"
  Reading all physical volumes. This may take a while...
  Found volume group "vg_restore" using metadata type lvm2
  Found volume group "rootvg" using metadata type lvm2
Activate the VG and mount the file system:
# vgchange -ay vg_restore
  1 logical volume(s) in volume group "vg_restore" now active
# mount /dev/vg_restore/oradbs /oradbs_restore
DO NOT FORGET TO REVERT ALL CHANGES WHEN FINISHED!!! See the next chapter.
Redefine the LVM configuration directory to the directory used in the previous chapter. Verify that you see mpathXX devices in the pvscan output:
# export LVM_SYSTEM_DIR=/lvmtemp/
# pvscan
  PV /dev/mpath/mpath24   VG vg_restore   lvm2 [249.97 GB / 99.97 GB free]
  PV /dev/mpath/mpath25   VG vg_restore   lvm2 [249.97 GB / 99.97 GB free]
  PV /dev/mpath/mpath26   VG vg_restore   lvm2 [249.97 GB / 99.97 GB free]
  PV /dev/mpath/mpath27   VG vg_restore   lvm2 [249.97 GB / 99.97 GB free]
  PV /dev/cciss/c0d0p2    VG rootvg       lvm2 [136.56 GB / 130.47 GB free]
  Total: 5 [1.11 TB] / in use: 5 [1.11 TB] / in no VG: 0 [0 ]
Note that the multipath maps to be removed are mpath2{4,5,6,7}; these names will be used later in this chapter. Your output may vary, so substitute your own devices.
Unmount the file system(s) and deactivate the VG; the output should show 0 active LVs:
# umount /oradbs_restore
# vgchange -an vg_restore
  0 logical volume(s) in volume group "vg_restore" now active
Save the list of SCSI devices to be removed:
# rm -f /tmp/scsi-disks
# for m in mpath2{4,5,6,7} ; do
    multipath -ll $m | cut -c6-13 | grep ":.:" | tee -a /tmp/scsi-disks
  done
Flush the multipath devices, but only the desired maps:
# for m in mpath2{4,5,6,7} ; do multipath -f $m ; done
# cd /dev/mapper ; ls
On success you should not see the flushed maps in the /dev/mapper directory; otherwise, you have to check what is keeping the maps busy, resolve the problem, and repeat the flush. Verify that the relevant FS is unmounted and the LVM PVs are not in use (inactive or exported).
Now remove the SCSI devices previously saved in /tmp/scsi-disks:
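If a flush fails, something still holds the map open; a couple of standard tools help to find it (a sketch):

# dmsetup ls --tree            # shows which device-mapper devices stack on top of the maps
# fuser -vm /oradbs_restore    # shows processes still using the mount point, if it is still mounted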
# for d in $(cat /tmp/scsi-disks) ; do
    echo "scsi remove-single-device $(echo $d|tr ':' ' ')" > /proc/scsi/scsi
  done
Destroy the clone on the NetApp:
netapp> vol offline $SNAPVOL
netapp> vol destroy $SNAPVOL -f
Recheck the configuration:
# unset LVM_SYSTEM_DIR
# pvscan
  PV /dev/mpath/nlun0    VG vg_data   lvm2 [249.97 GB / 99.97 GB free]
  PV /dev/mpath/nlun1    VG vg_data   lvm2 [249.97 GB / 99.97 GB free]
  PV /dev/mpath/nlun2    VG vg_data   lvm2 [249.97 GB / 99.97 GB free]
  PV /dev/mpath/nlun3    VG vg_data   lvm2 [249.97 GB / 99.97 GB free]
  PV /dev/cciss/c0d0p2   VG rootvg    lvm2 [136.56 GB / 130.47 GB free]
  Total: 5 [1.11 TB] / in use: 5 [1.11 TB] / in no VG: 0 [0 ]
To summarize, the adding procedure looks like:
LUN -> SCSI -> multipath -> LVM -> FS
and the removing procedure should be exactly the reverse:
FS -> LVM -> multipath -> SCSI -> LUN