This article will demonstrate several concepts: that a modern system can boot without a separate /boot partition, that GRUB2 can boot directly from an LVM raid5 logical volume, and that such a setup survives disk failure and replacement.
To that end, a virtual machine with four 3T disks was created; it will simulate a homebrew NAS. Any number of disks greater than two will work. I like Red Hat, so Fedora (version 25, the latest stable release at the time of writing) will be used in this POC.
If you repeat this POC with KVM, use VIRTIO disks, not SCSI (I had a problem booting from any SCSI disk other than the first).
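For reference, the disk images for such a VM could be created on the host roughly like this (a sketch; the qcow2 paths and file names are illustrative, not taken from the original setup):

# on the KVM host: create four sparse 3T images to attach as VIRTIO disks
for d in a b c d; do
    qemu-img create -f qcow2 /var/lib/libvirt/images/lvmraid-vd$d.qcow2 3T
done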
During installation, select only the first disk (vda) and choose "I will configure partitioning", then click Done. In the new window, select the LVM scheme and "Click here to create them automatically". Several partitions and logical volumes will be created for you. Remove /home and /boot, and shrink the root LV to 3G (we will install a minimal OS). The final configuration will include a BIOS Boot partition plus an LVM partition holding the fedora VG with the root and swap LVs.
Click Done, then Accept Changes. At software selection, choose Minimal Install and Done, then Begin Installation. While it installs, set the root password.
Log in to the new system:
[root@lvmraid ~]# parted /dev/vda print
Model: QEMU QEMU HARDDISK (virtblk)
Disk /dev/vda: 3299GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: pmbr_boot

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  2097kB  1049kB                     bios_grub
 2      2097kB  5467MB  5465MB                     lvm
As you can see, there is no separate /boot partition, yet the system boots without problems. To understand what is happening, let's look at /boot/grub2/grub.cfg (long lines may wrap on your screen):
menuentry 'Fedora (4.11.5-200.fc25.x86_64) 25 (Twenty Five)' --class fedora --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-4.11.5-200.fc25.x86_64-advanced-8a988bbc-ae0c-4031-b494-47ec547fccde' {
	load_video
	set gfxpayload=keep
	insmod gzio
	insmod part_gpt
	insmod lvm
	insmod xfs
	set root='lvmid/tdBs7T-BM2b-A1e1-6hBU-f2ta-deYo-pTJxzB/ejNHG1-VyOb-JKtz-lBCF-BdJs-8jej-tRs4ce'
	if [ x$feature_platform_search_hint = xy ]; then
	  search --no-floppy --fs-uuid --set=root --hint='lvmid/tdBs7T-BM2b-A1e1-6hBU-f2ta-deYo-pTJxzB/ejNHG1-VyOb-JKtz-lBCF-BdJs-8jej-tRs4ce' 8a988bbc-ae0c-4031-b494-47ec547fccde
	else
	  search --no-floppy --fs-uuid --set=root 8a988bbc-ae0c-4031-b494-47ec547fccde
	fi
	linux16 /boot/vmlinuz-4.11.5-200.fc25.x86_64 root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap rhgb quiet LANG=en_US.UTF-8
	initrd16 /boot/initramfs-4.11.5-200.fc25.x86_64.img
}
In this entry, 8a988bbc-ae0c-4031-b494-47ec547fccde is the file system UUID, tdBs7T-BM2b-A1e1-6hBU-f2ta-deYo-pTJxzB is the VG UUID, and ejNHG1-VyOb-JKtz-lBCF-BdJs-8jej-tRs4ce is the LV UUID. GRUB2 will search for the FS UUID that resides in the LV with that UUID, which in turn resides in the VG with that UUID.
[root@lvmraid ~]# df /
Filesystem              1K-blocks    Used Available Use% Mounted on
/dev/mapper/fedora-root   3135488 1047916   2087572  34% /
[root@lvmraid ~]# blkid /dev/mapper/fedora-root
/dev/mapper/fedora-root: UUID="8a988bbc-ae0c-4031-b494-47ec547fccde" TYPE="xfs"
[root@lvmraid ~]# vgdisplay fedora | grep UUID
  VG UUID               tdBs7T-BM2b-A1e1-6hBU-f2ta-deYo-pTJxzB
[root@lvmraid ~]# lvdisplay /dev/fedora/root | grep UUID
  LV UUID                ejNHG1-VyOb-JKtz-lBCF-BdJs-8jej-tRs4ce
grub2-install uses this information to copy the necessary GRUB modules to the BIOS Boot partition and to point them at the LV that contains the /boot directory. Thus GRUB2 itself can do everything needed to mount the root file system; perhaps one day an initrd will not be needed at all.
As a result of the installation, the root VG is very small, just big enough to hold our root and swap LVs. We need to grow it to make room for the conversion and for future use. At the same time, I will add an extra partition covering the rest of the space, to be used later by ZFS.
[root@lvmraid ~]# parted /dev/vda
GNU Parted 3.2
Using /dev/vda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) unit s                     <-- Switch to sectors to keep the new partition aligned
(parted) p
Model: QEMU QEMU HARDDISK (virtblk)
Disk /dev/vda: 6442450944s
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: pmbr_boot

Number  Start  End        Size       File system  Name  Flags
 1      2048s  4095s      2048s                         bios_grub
 2      4096s  10678271s  10674176s                     lvm

(parted) resizepart 2 20479999      <-- Should end one sector before the new partition
(parted) p
Model: QEMU QEMU HARDDISK (virtblk)
Disk /dev/vda: 6442450944s
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: pmbr_boot

Number  Start  End        Size       File system  Name  Flags
 1      2048s  4095s      2048s                         bios_grub
 2      4096s  20479999s  20475904s                     lvm

(parted) mkpart zfs 20480000 -1     <-- Start should be a multiple of 2048s
Warning: You requested a partition from 20480000s to 6442450943s (sectors 20480000..6442450943).
The closest location we can manage is 20480000s to 6442450910s (sectors 20480000..6442450910).
Is this still acceptable to you?
Yes/No? y
(parted) align-check
alignment type(min/opt)  [optimal]/minimal?
Partition number? 3
3 aligned
(parted) p
Model: QEMU QEMU HARDDISK (virtblk)
Disk /dev/vda: 6442450944s
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: pmbr_boot

Number  Start      End          Size         File system  Name  Flags
 1      2048s      4095s        2048s                           bios_grub
 2      4096s      20479999s    20475904s                       lvm
 3      20480000s  6442450910s  6421970911s                     zfs

(parted) q
Information: You may need to update /etc/fstab.
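A quick sanity check of the numbers: the new partition starts at sector 20480000 = 2048 × 10000, i.e. exactly on a 1 MiB boundary (2048 sectors × 512 B), which is why align-check reports it as aligned; partition 2 accordingly ends at 20479999s, one sector earlier.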
Re-read the partition table and resize the PV:
[root@lvmraid ~]# parted /dev/vda p
Model: QEMU QEMU HARDDISK (virtblk)
Disk /dev/vda: 3299GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: pmbr_boot

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  2097kB  1049kB                     bios_grub
 2      2097kB  10.5GB  10.5GB                     lvm
 3      10.5GB  3299GB  3288GB               zfs

[root@lvmraid ~]# partprobe /dev/vda
[root@lvmraid ~]# pvresize /dev/vda2
  Physical volume "/dev/vda2" changed
  1 physical volume(s) resized / 0 physical volume(s) not resized
[root@lvmraid ~]# pvs
  PV         VG     Fmt  Attr PSize PFree
  /dev/vda2  fedora lvm2 a--  9.76g 4.68g
Install gdisk, which we will use to clone the partition table (and rsync, which we will need later):
[root@lvmraid ~]# dnf install gdisk rsync -y
..
[root@lvmraid ~]# gdisk /dev/vda
GPT fdisk (gdisk) version 1.0.1

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.

Command (? for help): x

Expert command (? for help): u
Type device filename, or press <Enter> to exit: /dev/vdb

Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!

Do you want to proceed? (Y/N): Y
OK; writing new GUID partition table (GPT) to /dev/vdb.
The operation has completed successfully.

Expert command (? for help): u
Type device filename, or press <Enter> to exit: /dev/vdc

Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!

Do you want to proceed? (Y/N): y
OK; writing new GUID partition table (GPT) to /dev/vdc.
The operation has completed successfully.

Expert command (? for help): u
Type device filename, or press <Enter> to exit: /dev/vdd

Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!

Do you want to proceed? (Y/N): y
OK; writing new GUID partition table (GPT) to /dev/vdd.
The operation has completed successfully.

Expert command (? for help): q

[root@lvmraid ~]# cat /proc/partitions
major minor  #blocks  name

  11        0     495616 sr0
   8        0 3221225472 vda
   8        1       1024 vda1
   8        2   10237952 vda2
   8        3 3210985455 vda3
   8       16 3221225472 vdb
   8       17       1024 vdb1
   8       18   10237952 vdb2
   8       19 3210985455 vdb3
   8       48 3221225472 vdd
   8       49       1024 vdd1
   8       50   10237952 vdd2
   8       51 3210985455 vdd3
   8       32 3221225472 vdc
   8       33       1024 vdc1
   8       34   10237952 vdc2
   8       35 3210985455 vdc3
 253        0    3145728 dm-0
 253        1    2183168 dm-1
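The same replication can be scripted non-interactively with sgdisk from the same gdisk package (a sketch, not what I ran above; randomizing the GUIDs afterwards avoids duplicate identifiers on the clones):

# clone vda's GPT onto the other disks, then give each clone unique GUIDs
for d in vdb vdc vdd; do
    sgdisk -R=/dev/$d /dev/vda    # replicate vda's partition table onto $d
    sgdisk -G /dev/$d             # randomize the disk and partition GUIDs
done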
Add all the second partitions as PVs to the existing VG:
[root@lvmraid ~]# pvcreate /dev/vd{b,c,d}2
  Physical volume "/dev/vdb2" successfully created.
  Physical volume "/dev/vdc2" successfully created.
  Physical volume "/dev/vdd2" successfully created.
[root@lvmraid ~]# vgextend fedora /dev/vd{b,c,d}2
  Volume group "fedora" successfully extended
[root@lvmraid ~]# pvs
  PV         VG     Fmt  Attr PSize PFree
  /dev/vda2  fedora lvm2 a--  9.76g 4.68g
  /dev/vdb2  fedora lvm2 a--  9.76g 9.76g
  /dev/vdc2  fedora lvm2 a--  9.76g 9.76g
  /dev/vdd2  fedora lvm2 a--  9.76g 9.76g
Now the difficult part begins. There is no direct conversion from a plain linear volume to the raid5 type, so we will create new volumes and copy the data into them.
[root@lvmraid ~]# lvs
  LV   VG     Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root fedora -wi-ao---- 3.00g
  swap fedora -wi-ao---- 2.08g
[root@lvmraid ~]# swapoff /dev/fedora/swap
[root@lvmraid ~]# lvremove /dev/fedora/swap
Do you really want to remove active logical volume fedora/swap? [y/n]: y
  Logical volume "swap" successfully removed
[root@lvmraid ~]# lvcreate --type raid5 -i 3 -L2g -n swap /dev/fedora
  Using default stripesize 64.00 KiB.
  Rounding size 2.00 GiB (512 extents) up to stripe boundary size 2.00 GiB (513 extents).
  Logical volume "swap" created.
[root@lvmraid ~]# lvs
  LV   VG     Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root fedora -wi-ao---- 3.00g
  swap fedora rwi-a-r--- 2.00g                                    100.00
[root@lvmraid ~]# lvdisplay -m /dev/fedora/swap
  --- Logical volume ---
  LV Path                /dev/fedora/swap
  LV Name                swap
  VG Name                fedora
  LV UUID                HFO44o-Rn6j-Uk04-8jon-wT0i-6Zud-YETCUI
  LV Write Access        read/write
  LV Creation host, time lvmraid, 2017-06-17 20:41:07 +0300
  LV Status              available
  # open                 0
  LV Size                2.00 GiB
  Current LE             513
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     1024
  Block device           253:9

  --- Segments ---
  Logical extents 0 to 512:
    Type                raid5
    Monitoring          monitored
    Raid Data LV 0
      Logical volume    swap_rimage_0
      Logical extents   0 to 512
    Raid Data LV 1
      Logical volume    swap_rimage_1
      Logical extents   0 to 512
    Raid Data LV 2
      Logical volume    swap_rimage_2
      Logical extents   0 to 512
    Raid Data LV 3
      Logical volume    swap_rimage_3
      Logical extents   0 to 512
    Raid Metadata LV 0  swap_rmeta_0
    Raid Metadata LV 1  swap_rmeta_1
    Raid Metadata LV 2  swap_rmeta_2
    Raid Metadata LV 3  swap_rmeta_3

[root@lvmraid ~]# mkswap /dev/fedora/swap
Setting up swapspace version 1, size = 2 GiB (2151673856 bytes)
no label, UUID=322d5c29-4292-4238-95b2-15ae40aa7b6d
[root@lvmraid ~]# swapon /dev/fedora/swap
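A note on the numbers: -i 3 requests three data stripes, and the raid5 type adds one parity stripe on top, so the LV spans all four PVs — which is exactly what the four rimage/rmeta subvolume pairs in the lvdisplay output show.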
Re-creating the swap was the easy part; now let's recreate the root LV:
[root@lvmraid ~]# lvcreate --type raid5 -i 3 -L3g -n slash /dev/fedora
  Using default stripesize 64.00 KiB.
  Logical volume "slash" created.
[root@lvmraid ~]# mkfs.ext4 -j /dev/fedora/slash
..
[root@lvmraid ~]# mount /dev/fedora/slash /mnt
[root@lvmraid ~]# rsync -avxAHSX / /mnt/
..
[root@lvmraid ~]# touch /mnt/.autorelabel
Prepare the chroot environment for the copied root FS and enter it:
[root@lvmraid ~]# mount -t proc proc /mnt/proc
[root@lvmraid ~]# mount -t sysfs sys /mnt/sys
[root@lvmraid ~]# mount -o bind /dev /mnt/dev
[root@lvmraid ~]# mount -o bind /dev/pts /mnt/dev/pts
[root@lvmraid ~]# chroot /mnt
[root@lvmraid /]# export PS1="chroot> "
chroot> vi /etc/fstab
chroot> df
Filesystem               1K-blocks    Used Available Use% Mounted on
/dev/mapper/fedora-slash   3030800 1008120   1849012  36% /
devtmpfs                   1013872       0   1013872   0% /dev
chroot> vi /etc/default/grub
chroot> grep GRUB_CMDLINE_LINUX /etc/default/grub
GRUB_CMDLINE_LINUX=""
chroot>
Edit /etc/fstab, replacing the original /dev/fedora/root with the new /dev/fedora/slash. Note the change of FS type to ext4 if you formatted it as ext4, as in my example. Also delete all references to logical volumes from /etc/default/grub.
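For illustration, after the edit the root line in /etc/fstab could look something like this (a sketch; your mount options and the remaining entries may differ):

/dev/mapper/fedora-slash  /  ext4  defaults  1 1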
Ask GRUB2 to rebuild its configuration file:
chroot> grub2-mkconfig > /boot/grub2/grub.cfg
Generating grub configuration file ...
  WARNING: Failed to connect to lvmetad. Falling back to device scanning.
..
Found linux image: /boot/vmlinuz-4.11.5-200.fc25.x86_64
Found initrd image: /boot/initramfs-4.11.5-200.fc25.x86_64.img
  WARNING: Failed to connect to lvmetad. Falling back to device scanning.
..
Found linux image: /boot/vmlinuz-0-rescue-8717d74812804ff99cd287fc96174044
Found initrd image: /boot/initramfs-0-rescue-8717d74812804ff99cd287fc96174044.img
  WARNING: Failed to connect to lvmetad. Falling back to device scanning.
..
device-mapper: reload ioctl on osprober-linux-fedora-root failed: Device or resource busy
Command failed
done
chroot>
These warnings can be ignored: lvmetad runs outside the chroot environment, which is what triggers them.
Install the bootloader on any drive other than vda. I want to preserve vda's boot sector so I can still boot the old system until everything is confirmed working. The --modules=raid5rec option forces GRUB2 to include that module in its preloaded core image; otherwise GRUB would not be able to boot with a missing/failed disk.
chroot> grub2-install --modules=raid5rec /dev/vdb
Installing for i386-pc platform.
  WARNING: Failed to connect to lvmetad. Falling back to device scanning.
..
Installation finished. No error reported.
Now we need to rebuild the initrd to include the LVM RAID modules:
chroot> mkinitrd -f /boot/initramfs-$(uname -r).img $(uname -r)
Creating: target|kernel|dracut args|basicmodules
dracut: No '/dev/log' or 'logger' included for syslog logging
chroot> exit
exit
[root@lvmraid ~]# poweroff
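On Fedora, mkinitrd is a thin wrapper around dracut (hence the dracut messages above), so an equivalent direct invocation would be roughly:

# rebuild the initramfs for the running kernel, overwriting the existing image
dracut -f /boot/initramfs-$(uname -r).img $(uname -r)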
Turn off the VM, set the vdb disk first in the boot order, and turn the VM back on.
If you have SELinux enabled, as in the default installation, the VM will relabel the new root FS (as requested by the .autorelabel file we created) and reboot twice.
If everything is OK and the server successfully booted from vdb, you can delete the now-unused /dev/fedora/root and install the bootloader on all the other disks:
[root@lvmraid ~]# df /
Filesystem               1K-blocks    Used Available Use% Mounted on
/dev/mapper/fedora-slash   3030800 1017552   1839580  36% /
[root@lvmraid ~]# lvs
  LV    VG     Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root  fedora -wi-a----- 3.00g
  slash fedora rwi-aor--- 3.00g                                    100.00
  swap  fedora rwi-aor--- 2.00g                                    100.00
[root@lvmraid ~]# lvremove /dev/fedora/root
Do you really want to remove active logical volume fedora/root? [y/n]: y
  Logical volume "root" successfully removed
[root@lvmraid ~]# pvs
  PV         VG     Fmt  Attr PSize PFree
  /dev/vda2  fedora lvm2 a--  9.76g 8.09g
  /dev/vdb2  fedora lvm2 a--  9.76g 8.09g
  /dev/vdc2  fedora lvm2 a--  9.76g 8.09g
  /dev/vdd2  fedora lvm2 a--  9.76g 8.09g
[root@lvmraid ~]# grub2-install --modules=raid5rec /dev/vda
Installing for i386-pc platform.
Installation finished. No error reported.
[root@lvmraid ~]# grub2-install --modules=raid5rec /dev/vdc
Installing for i386-pc platform.
Installation finished. No error reported.
[root@lvmraid ~]# grub2-install --modules=raid5rec /dev/vdd
Installing for i386-pc platform.
Installation finished. No error reported.
Verify that the server boots from any of the four disks.
As you remember, a third partition was added to every disk for use by ZFS. Now you can install the ZFS software and create a raidz-type pool from those partitions. This is covered in the ZFS recipes article.
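For reference, the pool creation would look roughly like this (a sketch; the pool name tank is illustrative, and the details are in that article):

# create a raidz pool from the third partition of every disk
zpool create tank raidz /dev/vd{a,b,c,d}3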
Now remove one disk from the running server. I removed vdc:
[root@lvmraid ~]# pvs
  WARNING: Device for PV OSwx1N-z1QG-InRi-Es4C-201n-0GuQ-8QIpPh not found or rejected by a filter.
  PV         VG     Fmt  Attr PSize PFree
  /dev/vda2  fedora lvm2 a--  9.76g 8.09g
  /dev/vdb2  fedora lvm2 a--  9.76g 8.09g
  /dev/vdd2  fedora lvm2 a--  9.76g 8.09g
  [unknown]  fedora lvm2 a-m  9.76g 8.09g
[root@lvmraid ~]# lvs
  WARNING: Device for PV OSwx1N-z1QG-InRi-Es4C-201n-0GuQ-8QIpPh not found or rejected by a filter.
  LV    VG     Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  slash fedora rwi-aor-p- 3.00g                                    100.00
  swap  fedora rwi-aor-p- 2.00g                                    100.00
Let's check the ability to boot without one drive. The server hung briefly while powering off, then continued by itself and booted without problems.
Now let's reattach the disk (simulating a temporary disconnect — the kind of event that kills mdraid). This is done online:
[root@lvmraid ~]# pvs
  PV         VG     Fmt  Attr PSize PFree
  /dev/vda2  fedora lvm2 a--  9.76g 8.09g
  /dev/vdb2  fedora lvm2 a--  9.76g 8.09g
  /dev/vdc2  fedora lvm2 a--  9.76g 8.09g
  /dev/vdd2  fedora lvm2 a--  9.76g 8.09g
[root@lvmraid ~]# lvs
  LV    VG     Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  slash fedora rwi-aor--- 3.00g                                    100.00
  swap  fedora rwi-aor--- 2.00g                                    100.00
Everything returned to normal, even showing 100% synchronization (well, I did not change much data ;-).
NOTE: The following instructions worked fine on FC25 but did not work for me on Fedora 29; something changed in some version in between. The working procedure for FC29 is described at the end of this section.
Let's simulate a disk failure and its replacement. I will disconnect the disk again, reformat it with qemu-img, and reattach it. Then I will copy the partition scheme from the remaining disks, re-create the PV for LVM, and ask LVM to rebuild the LVs:
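The reformatting step happens on the host; recreating the image gives the VM what looks like a brand-new blank disk (a sketch; the image path is illustrative):

# on the host: overwrite the old image to simulate a replacement drive
qemu-img create -f qcow2 /var/lib/libvirt/images/lvmraid-vdc.qcow2 3T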
[root@lvmraid ~]# pvs
  WARNING: Device for PV b0lZy0-LE9R-CzE8-u7SQ-tJWW-ZacV-nj0L92 not found or rejected by a filter.
  PV         VG     Fmt  Attr PSize PFree
  /dev/vda2  fedora lvm2 a--  9.76g 8.09g
  /dev/vdb2  fedora lvm2 a--  9.76g 8.09g
  /dev/vdc2  fedora lvm2 a--  9.76g 8.09g
  [unknown]  fedora lvm2 a-m  9.76g 8.09g
[root@lvmraid ~]# grep " vd" /proc/partitions
 252        0 3221225472 vda
 252        1       1024 vda1
 252        2   10237952 vda2
 252        3 3210985455 vda3
 252       16 3221225472 vdb
 252       17       1024 vdb1
 252       18   10237952 vdb2
 252       19 3210985455 vdb3
 252       32 3221225472 vdc
 252       33       1024 vdc1
 252       34   10237952 vdc2
 252       35 3210985455 vdc3
 252       64 3221225472 vde
A new disk, vde, has appeared.
[root@lvmraid ~]# gdisk /dev/vda
GPT fdisk (gdisk) version 1.0.1

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.

Command (? for help): x

Expert command (? for help): u
Type device filename, or press <Enter> to exit: /dev/vde

Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!

Do you want to proceed? (Y/N): y
OK; writing new GUID partition table (GPT) to /dev/vde.
The operation has completed successfully.

Expert command (? for help): q
[root@lvmraid ~]# partprobe /dev/vde
The GPT was copied from one of the surviving disks (vda).
[root@lvmraid ~]# pvs
  WARNING: Device for PV b0lZy0-LE9R-CzE8-u7SQ-tJWW-ZacV-nj0L92 not found or rejected by a filter.
  PV         VG     Fmt  Attr PSize PFree
  /dev/vda2  fedora lvm2 a--  9.76g 8.09g
  /dev/vdb2  fedora lvm2 a--  9.76g 8.09g
  /dev/vdc2  fedora lvm2 a--  9.76g 8.09g
  [unknown]  fedora lvm2 a-m  9.76g 8.09g
[root@lvmraid ~]# pvcreate --uuid "b0lZy0-LE9R-CzE8-u7SQ-tJWW-ZacV-nj0L92" --norestorefile /dev/vde2
  Couldn't find device with uuid b0lZy0-LE9R-CzE8-u7SQ-tJWW-ZacV-nj0L92.
  WARNING: Device for PV b0lZy0-LE9R-CzE8-u7SQ-tJWW-ZacV-nj0L92 not found or rejected by a filter.
  Physical volume "/dev/vde2" successfully created.
[root@lvmraid ~]# vgcfgrestore fedora
  Restored volume group fedora
[root@lvmraid ~]# pvs
  PV         VG     Fmt  Attr PSize PFree
  /dev/vda2  fedora lvm2 a--  9.76g 8.09g
  /dev/vdb2  fedora lvm2 a--  9.76g 8.09g
  /dev/vdc2  fedora lvm2 a--  9.76g 8.09g
  /dev/vde2  fedora lvm2 a--  9.76g 8.09g
The PV was re-created with the exact same UUID, then the LVM metadata was restored (vgcfgrestore reads the most recent metadata backup, kept under /etc/lvm/backup, by default). Now we have a healthy VG whose LVs still need repair.
[root@lvmraid ~]# lvs
  LV    VG     Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  slash fedora rwi-aor-r- 3.00g                                    100.00
  swap  fedora rwi-aor-r- 2.00g                                    100.00
[root@lvmraid ~]# lvchange --rebuild /dev/vde2 /dev/fedora/swap
Do you really want to rebuild 1 PVs of logical volume fedora/swap [y/n]: y
  Logical volume fedora/swap changed.
[root@lvmraid ~]# lvchange --rebuild /dev/vde2 /dev/fedora/slash
Do you really want to rebuild 1 PVs of logical volume fedora/slash [y/n]: y
  device-mapper: reload ioctl on (253:21) failed: Invalid argument
  Failed to lock logical volume fedora/slash.
[root@lvmraid ~]# lvchange --rebuild /dev/vde2 /dev/fedora/slash
Do you really want to rebuild 1 PVs of logical volume fedora/slash [y/n]: y
  Logical volume fedora/slash changed.
[root@lvmraid ~]# lvs
  LV    VG     Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  slash fedora rwi-aor--- 3.00g                                    7.81
  swap  fedora rwi-aor--- 2.00g                                    100.00
You may need to retry the rebuild command; sometimes it does not work on the first try.
Once Cpy%Sync reaches 100, all our LVs are healthy again.
Update: the working procedure for FC29.
Instead of re-creating the PV with the same UUID, remove the missing PV first and then add the new one, like this:
# vgreduce --removemissing --force fedora
# pvcreate /dev/vde2
# vgextend fedora /dev/vde2
Also, lvconvert --repair could be used instead of lvchange --rebuild.
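A minimal sketch of the repair variant, assuming the VG has already been extended with the new PV as above:

# allocate replacement images for the failed leg and resynchronize
lvconvert --repair fedora/slash
lvconvert --repair fedora/swap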
And finally, re-install the bootloader on the new disk:
[root@lvmraid ~]# grub2-install --modules=raid5rec /dev/vde
Installing for i386-pc platform.
Installation finished. No error reported.