The LVM philosophy dates back to long before Linux. Many years ago I used LVM on HP-UX and AIX. Both companies shared their experience and patents, and you can see their logos in the code. The syntax of the basic classic commands is mostly compatible with their HP-UX counterparts.
The main idea was to insert a kind of "virtualization" layer between the file system and the physical disk. Before LVM, a disk was partitioned during installation, and after that it was almost impossible to change the partition scheme. LVM provides the needed flexibility (if you know how to use it). It becomes possible to create file systems larger than any single disk, resize file systems on the fly, replace underlying disks, and much more.
For the purposes of this article, I'll use a CentOS 8 KVM virtual server with five additional 1G disks to experiment with.
root@centos8:~ # cat /proc/partitions
major minor  #blocks  name
 252        0   20971520 vda
 252        1     262144 vda1
 252        2   20708352 vda2
 252       16    1048576 vdb
 252       32    1048576 vdc
 252       48    1048576 vdd
 252       64    1048576 vde
 252       80    1048576 vdf
Despite the common myth that it is impossible to use a disk without a partition table, a partition table is absolutely unnecessary with LVM. However, each PV (Physical Volume, i.e., disk) must be initialized, which writes a Physical Volume header onto it.
root@centos8:~ # pvcreate /dev/vdb
  Physical volume "/dev/vdb" successfully created.
root@centos8:~ # pvs
  PV         VG     Fmt  Attr PSize   PFree
  /dev/vda2  rootvg lvm2 a--  <19.75g <13.75g
  /dev/vdb          lvm2 ---    1.00g   1.00g
A NOTE about data alignment: you can check where the data actually begins with the command pvs -o +pe_start, which shows 1.00m for the current version. This value suits almost any storage subsystem. If it were smaller than the storage's native block size, every I/O request from LVM could turn into two I/O requests on the storage. In that case, you can adjust the offset with the --dataalignment parameter (see the man pages).
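A minimal sketch of checking and adjusting the alignment; the 8m value is purely illustrative, take the real block size from your storage documentation:

root@centos8:~ # pvs -o +pe_start /dev/vdb        # show where the data area starts on the PV
root@centos8:~ # pvremove /dev/vdb                # only safe while the PV does not belong to a VG
root@centos8:~ # pvcreate --dataalignment 8m /dev/vdb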
The next step is to create a Volume Group (VG). As the name suggests, it is just a group of volumes grouped together for some reason. The most obvious reason is that they all reside on the same disk. We will see other examples, too, but this is the most common situation: one VG corresponds to one PV (disk or LUN).
root@centos8:~ # vgcreate datavg /dev/vdb
  Volume group "datavg" successfully created
root@centos8:~ # vgdisplay datavg
  --- Volume group ---
  VG Name               datavg
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  1
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                0
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               1020.00 MiB
  PE Size               4.00 MiB
  Total PE              255
  Alloc PE / Size       0 / 0
  Free  PE / Size       255 / 1020.00 MiB
  VG UUID               PO21b0-NezY-r0Mm-FTKa-h1r5-4I4o-zga3Dl
vgdisplay is a classic command that, together with the other "*display" commands, goes back ages. It is shown here to demonstrate the concept of the PE (Physical Extent). A PE is the smallest chunk of space from which volumes are built, and it is this concept that makes LVM flexible: a volume is assembled from PEs, and where those PEs physically live does not affect the volume itself.
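The PE size is fixed when the VG is created. If the default 4 MiB does not suit you (for example, on very large disks), it can be chosen at creation time with the -s option. A sketch, assuming a spare disk /dev/vdX (an illustrative name, not one of the disks used in this article):

root@centos8:~ # vgcreate -s 16m bigvg /dev/vdX   # 16 MiB extents instead of the default 4 MiB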
Finally, we can create an LV (Logical Volume). Its size can be set in bytes (k, m, g, t suffixes are accepted) using the -L option, or as a number of PEs using the -l option.
root@centos8:~ # lvcreate -n v1 -l 10 datavg
  Logical volume "v1" created.
root@centos8:~ # lvcreate -n v2 -L 40M datavg
  Logical volume "v2" created.
These are the minimum required parameters: the desired volume name (-n), its size (-l or -L), and the VG name.
root@centos8:~ # lvs datavg
  LV   VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  v1   datavg -wi-a----- 40.00m
  v2   datavg -wi-a----- 40.00m
root@centos8:~ # lvdisplay datavg/v2
  --- Logical volume ---
  LV Path                /dev/datavg/v2
  LV Name                v2
  VG Name                datavg
  LV UUID                fnc9oc-x6tm-mF2g-cin3-1E7m-7YIg-PEiW2C
  LV Write Access        read/write
  LV Creation host, time centos8, 2020-10-04 18:35:28 +0300
  LV Status              available
  # open                 0
  LV Size                40.00 MiB
  Current LE             10
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:5
root@centos8:~ # lvdisplay -c datavg/v2
  /dev/datavg/v2:datavg:3:1:-1:0:81920:10:-1:0:-1:253:5
lvs is the modern, compact command, whereas lvdisplay is the classic one. The -c option is useful when parsing the output of the "*display" commands in scripts; using lvs -o with an explicit field list works well in scripts, too.
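For example, a script-friendly query might look like the sketch below; pick the fields you actually need from the list printed by lvs -o help:

root@centos8:~ # lvs --noheadings --nosuffix --units m -o lv_name,lv_size datavg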
Let's create filesystems and mount them:
root@centos8:~ # mkfs.ext4 -j -m0 /dev/datavg/v1
..
root@centos8:~ # mkfs.xfs /dev/datavg/v2
..
root@centos8:~ # echo "/dev/datavg/v1 /mnt/v1 ext4 defaults 1 2" >> /etc/fstab
root@centos8:~ # echo "/dev/datavg/v2 /mnt/v2 xfs defaults 0 0" >> /etc/fstab
root@centos8:~ # mkdir -p /mnt/v{1,2}
root@centos8:~ # mount -a
root@centos8:~ # df -P | grep mnt
/dev/mapper/datavg-v1   35M  782K   34M   3% /mnt/v1
/dev/mapper/datavg-v2   35M  2.4M   33M   7% /mnt/v2
However, our volume is too small to do real work on it:
root@centos8:~ # rsync -a /usr/ /mnt/v1/
rsync: write failed on "/mnt/v1/bin/nmcli": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(374) [receiver=3.1.3]
root@centos8:~ # df -P | grep mnt
/dev/mapper/datavg-v1   35M   34M  677K  99% /mnt/v1
/dev/mapper/datavg-v2   35M  2.4M   33M   7% /mnt/v2
Let's increase it:
root@centos8:~ # lvresize -L+40m /dev/datavg/v1
  Size of logical volume datavg/v1 changed from 40.00 MiB (10 extents) to 80.00 MiB (20 extents).
  Logical volume datavg/v1 successfully resized.
root@centos8:~ # resize2fs /dev/datavg/v1
resize2fs 1.45.4 (23-Sep-2019)
Filesystem at /dev/datavg/v1 is mounted on /mnt/v1; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 1
The filesystem on /dev/datavg/v1 is now 81920 (1k) blocks long.
root@centos8:~ # df -P | grep mnt
/dev/mapper/datavg-v1   74M   35M   39M  47% /mnt/v1
/dev/mapper/datavg-v2   35M  2.4M   33M   7% /mnt/v2
It is important to note that resizing works in both directions: growing is safe, but shrinking can corrupt your data. This is why it is important to use relative values (+40m in our case) with -L (or -l) rather than absolute ones.
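A sketch of the difference; the second form is the dangerous one, because a typo in an absolute size silently turns into a shrink request:

root@centos8:~ # lvresize -L+40m /dev/datavg/v1   # grow BY 40 MiB - unambiguous
root@centos8:~ # lvresize -L40m /dev/datavg/v1    # set size TO 40 MiB - a shrink if the LV is larger

Recent LVM versions can also resize the filesystem in the same step with the -r (--resizefs) flag, e.g. lvresize -r -L+40m /dev/datavg/v1, which saves the separate resize2fs/xfs_growfs call.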
The resizing action clearly demonstrates the concept of PE:
root@centos8:~ # pvdisplay -m /dev/vdb
  --- Physical volume ---
  PV Name               /dev/vdb
  VG Name               datavg
  PV Size               1.00 GiB / not usable 4.00 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              255
  Free PE               225
  Allocated PE          30
  PV UUID               pdkzo6-NIjC-k7my-eeXd-l4cx-6iIM-smiGvw
  --- Physical Segments ---
  Physical extent 0 to 9:
    Logical volume      /dev/datavg/v1
    Logical extents     0 to 9
  Physical extent 10 to 19:
    Logical volume      /dev/datavg/v2
    Logical extents     0 to 9
  Physical extent 20 to 29:
    Logical volume      /dev/datavg/v1
    Logical extents     10 to 19
  Physical extent 30 to 254:
    FREE
As you can see, the volumes are interleaved at the physical level, but the file system is not aware of it. Let's grow the second volume:
root@centos8:~ # lvresize -l+100%FREE datavg/v2
  Size of logical volume datavg/v2 changed from 40.00 MiB (10 extents) to 940.00 MiB (235 extents).
  Logical volume datavg/v2 successfully resized.
root@centos8:~ # xfs_growfs /mnt/v2
meta-data=/dev/mapper/datavg-v2  isize=512    agcount=2, agsize=5120 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=10240, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=1368, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 10240 to 240640
root@centos8:~ # df -P | grep /mnt
/dev/mapper/datavg-v1   74M   35M   39M  47% /mnt/v1
/dev/mapper/datavg-v2  935M   12M  924M   2% /mnt/v2
Note that +100%FREE can be used only with the -l option, and that xfs_growfs takes the mount point as its argument rather than the device.
root@centos8:~ # pvdisplay -m /dev/vdb
  --- Physical volume ---
  PV Name               /dev/vdb
  VG Name               datavg
  PV Size               1.00 GiB / not usable 4.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              255
  Free PE               0
  Allocated PE          255
  PV UUID               pdkzo6-NIjC-k7my-eeXd-l4cx-6iIM-smiGvw
  --- Physical Segments ---
  Physical extent 0 to 9:
    Logical volume      /dev/datavg/v1
    Logical extents     0 to 9
  Physical extent 10 to 19:
    Logical volume      /dev/datavg/v2
    Logical extents     0 to 9
  Physical extent 20 to 29:
    Logical volume      /dev/datavg/v1
    Logical extents     10 to 19
  Physical extent 30 to 254:
    Logical volume      /dev/datavg/v2
    Logical extents     10 to 234
root@centos8:~ # vgs datavg
  VG     #PV #LV #SN Attr   VSize    VFree
  datavg   1   2   0 wz--n- 1020.00m    0
root@centos8:~ # pvs /dev/vdb
  PV        VG     Fmt  Attr PSize    PFree
  /dev/vdb  datavg lvm2 a--  1020.00m    0
It is a very common mistake to grow a file system to the very end of the VG at the first request. Remember that we wanted to copy the /usr data to volume v1? The task remains, but now there is no room for it. Moreover, XFS cannot be shrunk (extX can). The only solution is to increase the size of our VG. This can be done in two ways: grow the existing PV, or add another PV. Always prefer the first method if possible. Adding another PV is easy, but then the health of the VG depends on the health of two PVs.
The following command, run on the KVM host, resizes my vdb disk online.
kvmhost# virsh blockresize centos8 /var/lib/libvirt/images/CentOS8.vdb.qcow2 2G
Block device '/var/lib/libvirt/images/CentOS8.vdb.qcow2' is resized
Due to the nature of the virtio driver used by KVM, the disk size is updated automatically and there is no need to rescan the disk for changes. In real life (for example, with SCSI or FC LUNs), however, it is common practice to rescan the block device with:
# echo 1 > /sys/block/sdb/device/rescan
root@centos8:~ # tail /var/log/messages
Oct  4 19:39:56 centos8 kernel: virtio_blk virtio5: [vdb] new size: 4194304 512-byte logical blocks (2.15 GB/2.00 GiB)
Oct  4 19:39:56 centos8 kernel: vdb: detected capacity change from 1073741824 to 2147483648
Although the disk itself has been resized, LVM is not aware of it, as the pvs output proves. You must update the PV size with the pvresize command:
root@centos8:~ # pvs
  PV         VG     Fmt  Attr PSize    PFree
  /dev/vda2  rootvg lvm2 a--   <19.75g <13.75g
  /dev/vdb   datavg lvm2 a--  1020.00m       0
root@centos8:~ # pvresize /dev/vdb
  Physical volume "/dev/vdb" changed
  1 physical volume(s) resized or updated / 0 physical volume(s) not resized
root@centos8:~ # pvs
  PV         VG     Fmt  Attr PSize   PFree
  /dev/vda2  rootvg lvm2 a--  <19.75g <13.75g
  /dev/vdb   datavg lvm2 a--   <2.00g   1.00g
The output of the pvresize command has never been clear to me.
Since we have touched on adding another PV to a VG, we should discuss migration between disks. We now have two volumes belonging to one VG, and we would like to move the XFS volume to another disk, and moreover, into a separate VG. First, we add another disk to the VG:
root@centos8:~ # lvs datavg
  LV   VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  v1   datavg -wi-ao----  80.00m
  v2   datavg -wi-ao---- 940.00m
root@centos8:~ # pvcreate /dev/vdc
  Physical volume "/dev/vdc" successfully created.
root@centos8:~ # vgextend datavg /dev/vdc
  Volume group "datavg" successfully extended
root@centos8:~ # pvs
  PV         VG     Fmt  Attr PSize    PFree
  /dev/vda2  rootvg lvm2 a--   <19.75g  <13.75g
  /dev/vdb   datavg lvm2 a--    <2.00g    1.00g
  /dev/vdc   datavg lvm2 a--  1020.00m 1020.00m
root@centos8:~ # pvmove -b -n datavg/v2 /dev/vdb
The pvmove command evacuates the PV named on the command line. If there are several possible destinations, you can also name a specific target PV. If you want to move only a specific LV, name it with -n. The -b option runs the command in the background and returns to the prompt immediately; otherwise, progress percentages are printed. Even without -b, this command is safe against interrupts and even reboots: once LVM is activated again, pvmove continues to successful completion. How does it work? It creates a temporary mirror for the LV, and as soon as all PEs are synchronized, it drops the PEs on the original PV. In other words, the procedure is safe at any moment and is done online. The drawback of this technique is that every PE, even an empty one, is copied, which can easily inflate thin-provisioned LUNs on the storage.
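For reference, a sketch of the pvmove variants just described, using the device names from this article's setup:

root@centos8:~ # pvmove /dev/vdb                        # evacuate everything from vdb to any free PEs in the VG
root@centos8:~ # pvmove /dev/vdb /dev/vdc               # evacuate vdb, placing the PEs on vdc only
root@centos8:~ # pvmove -n datavg/v2 /dev/vdb /dev/vdc  # move only the LV v2 off vdb and onto vdc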
Checking the disks with pvdisplay -m will show that vdb now contains only v1 and vdc only v2. It is time to split the VG in two. The procedure requires the split-off part to be inactive (offline), which fits well with the fact that you have to update fstab with the new names anyway.
root@centos8:~ # df -P | grep mnt
/dev/mapper/datavg-v1   74M   35M   39M  47% /mnt/v1
/dev/mapper/datavg-v2  935M   12M  924M   2% /mnt/v2
root@centos8:~ # umount /mnt/v2
root@centos8:~ # lvchange -an datavg/v2
root@centos8:~ # vgsplit datavg xfsvg /dev/vdc
  New volume group "xfsvg" successfully split from "datavg"
root@centos8:~ # vgchange -ay xfsvg
  1 logical volume(s) in volume group "xfsvg" now active
root@centos8:~ # mount /dev/xfsvg/v2 /mnt/v2
root@centos8:~ # df -P | grep mnt
/dev/mapper/datavg-v1   74M   35M   39M  47% /mnt/v1
Nothing is mounted, because systemd takes care of mounts these days (WHY???). It still remembers the old mounts and insists on them. You can see this in /var/log/messages:
Oct  4 20:54:51 centos8 kernel: XFS (dm-3): Mounting V5 Filesystem
Oct  4 20:54:51 centos8 kernel: XFS (dm-3): Ending clean mount
Oct  4 20:54:51 centos8 systemd[1]: mnt-v2.mount: Unit is bound to inactive unit dev-datavg-v2.device. Stopping, too.
Oct  4 20:54:51 centos8 systemd[1]: Unmounting /mnt/v2...
Oct  4 20:54:51 centos8 kernel: XFS (dm-3): Unmounting Filesystem
Oct  4 20:54:51 centos8 systemd[1]: Unmounted /mnt/v2.
First, update /etc/fstab with the new VG name, then:
root@centos8:~ # systemctl daemon-reload
root@centos8:~ # mount /mnt/v2
root@centos8:~ # df -P | grep mnt
/dev/mapper/datavg-v1   74M   35M   39M  47% /mnt/v1
/dev/mapper/xfsvg-v2   935M   42M  894M   5% /mnt/v2
root@centos8:~ # pvs
  PV         VG     Fmt  Attr PSize    PFree
  /dev/vda2  rootvg lvm2 a--   <19.75g <13.75g
  /dev/vdb   datavg lvm2 a--    <2.00g  <1.92g
  /dev/vdc   xfsvg  lvm2 a--  1020.00m  80.00m
We just ran a complex example of migration plus splitting. More typically, pvmove is used to migrate between attached storage arrays for upgrade purposes. In that case the procedure looks roughly like the sketch below.
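A sketch of that procedure, assuming the new LUN appears as /dev/mapper/newlun (an illustrative name):

root@centos8:~ # pvcreate /dev/mapper/newlun          # initialize the LUN from the new storage
root@centos8:~ # vgextend datavg /dev/mapper/newlun   # add it to the existing VG
root@centos8:~ # pvmove /dev/vdb                      # evacuate the old PV online
root@centos8:~ # vgreduce datavg /dev/vdb             # remove the old PV from the VG
root@centos8:~ # pvremove /dev/vdb                    # wipe the LVM header from the old disk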
Refer to the HOWTO LUNs on Linux using native tools for low-level details.
Earlier I recommended against adding multiple PVs to one VG, because the VG's health then depends on the health of all its PVs. With RAID technology in play, this statement can flip to its exact opposite. LVM redundancy is implemented at the LV level, which makes maintaining it a headache: every LV in the same VG must be created with the same level of redundancy, or you may lose any LV for which you forgot to add redundancy. Nevertheless, I have used this kind of redundancy on my home NAS for many years and have replaced almost all of the original hard drives without losing a single bit.
First, we need a VG with multiple PVs. Two are enough if we are talking about a mirror; LVM also supports RAID5, for which you need at least three PVs.
root@centos8:~ # pvcreate /dev/vd{d,e,f}
  Physical volume "/dev/vdd" successfully created.
  Physical volume "/dev/vde" successfully created.
  Physical volume "/dev/vdf" successfully created.
root@centos8:~ # vgcreate raidvg /dev/vd{d,e,f}
  Volume group "raidvg" successfully created
root@centos8:~ # pvs /dev/vd{d,e,f}
  PV         VG     Fmt  Attr PSize    PFree
  /dev/vdd   raidvg lvm2 a--  1020.00m 1020.00m
  /dev/vde   raidvg lvm2 a--  1020.00m 1020.00m
  /dev/vdf   raidvg lvm2 a--  1020.00m 1020.00m
I will create many different LVs at once:
root@centos8:~ # lvcreate -n plain -l 10 raidvg
  Logical volume "plain" created.
root@centos8:~ # pvs /dev/vd{d,e,f}
  PV         VG     Fmt  Attr PSize    PFree
  /dev/vdd   raidvg lvm2 a--  1020.00m  980.00m
  /dev/vde   raidvg lvm2 a--  1020.00m 1020.00m
  /dev/vdf   raidvg lvm2 a--  1020.00m 1020.00m
This is a normal LV. It resides on the vdd disk and will be lost if that disk fails.
root@centos8:~ # lvcreate -n mirror -l 10 --type raid1 -m 1 raidvg /dev/vde /dev/vdf
  Logical volume "mirror" created.
root@centos8:~ # pvs /dev/vd{d,e,f}
  PV         VG     Fmt  Attr PSize    PFree
  /dev/vdd   raidvg lvm2 a--  1020.00m 980.00m
  /dev/vde   raidvg lvm2 a--  1020.00m 976.00m
  /dev/vdf   raidvg lvm2 a--  1020.00m 976.00m
This is an example of creating a mirrored LV. I explicitly named the desired PVs to balance the space usage.
root@centos8:~ # lvcreate -n raid5lv -l 20 --type raid5 -i 2 raidvg
  Using default stripesize 64.00 KiB.
  Logical volume "raid5lv" created.
This is an example of creating a RAID5 volume. The stripe count (-i) counts only the data stripes. We have three PVs here; the equivalent of one of them will hold the parity, which leaves us with only two data stripes.
root@centos8:~ # lvcreate -n stripelv -l 30 --type striped -i 3 raidvg
  Using default stripesize 64.00 KiB.
  Logical volume "stripelv" created.
A striped volume has nothing to do with redundancy; it fails if any one of its disks fails. Traditionally it is built to boost performance, but that is largely a myth. The only real benefit is the increased effective I/O queue size, and you can achieve the same by simply increasing the queue size of the device itself. The default value of 32 is fine for physical devices; when working with external storage, increasing the queue size can sometimes help. Striping does not help with local SSDs, and it does not solve the seek-latency problem of rotating disks. Therefore, striping shows gains only for streaming writes or sequential reads, not for real workloads.
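If you want to experiment with the queue depth instead of striping, it can be inspected and changed per device; a sketch (the 128 is illustrative, and the setting does not survive a reboot unless made persistent, e.g. via a udev rule):

root@centos8:~ # cat /sys/block/vdb/queue/nr_requests     # current queue size
root@centos8:~ # echo 128 > /sys/block/vdb/queue/nr_requests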
root@centos8:~ # lvs raidvg -o +devices -a
  LV                 VG     Attr       LSize   Pool .. Cpy%Sync Convert Devices
  mirror             raidvg rwi-a-r---  40.00m         100.00           mirror_rimage_0(0),mirror_rimage_1(0)
  [mirror_rimage_0]  raidvg iwi-aor---  40.00m                          /dev/vde(1)
  [mirror_rimage_1]  raidvg iwi-aor---  40.00m                          /dev/vdf(1)
  [mirror_rmeta_0]   raidvg ewi-aor---   4.00m                          /dev/vde(0)
  [mirror_rmeta_1]   raidvg ewi-aor---   4.00m                          /dev/vdf(0)
  plain              raidvg -wi-a-----  40.00m                          /dev/vdd(0)
  raid5lv            raidvg rwi-a-r---  80.00m         100.00           raid5lv_rimage_0(0),raid5lv_rimage_1(0),raid5lv_rimage_2(0)
  [raid5lv_rimage_0] raidvg iwi-aor---  40.00m                          /dev/vdd(11)
  [raid5lv_rimage_1] raidvg iwi-aor---  40.00m                          /dev/vde(12)
  [raid5lv_rimage_2] raidvg iwi-aor---  40.00m                          /dev/vdf(12)
  [raid5lv_rmeta_0]  raidvg ewi-aor---   4.00m                          /dev/vdd(10)
  [raid5lv_rmeta_1]  raidvg ewi-aor---   4.00m                          /dev/vde(11)
  [raid5lv_rmeta_2]  raidvg ewi-aor---   4.00m                          /dev/vdf(11)
  stripelv           raidvg -wi-a----- 120.00m                          /dev/vdd(21),/dev/vde(22),/dev/vdf(22)
This is the final picture of how the constituent parts of the created LVs are laid out across the disks.
Now I will shut down the virtual machine and replace the (broken) last disk (vdf) with an empty one.
kvmhost# qemu-img create -f qcow2 CentOS8.vdf.qcow2 1g
Formatting 'CentOS8.vdf.qcow2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off refcount_bits=16
The raidvg comes up in "partial" status; note the p in the attributes:
root@centos8:~ # vgs raidvg
  WARNING: Couldn't find device with uuid XYFbbU-4RmW-c1rE-LH1H-St0c-fgAx-YT2GE1.
  WARNING: VG raidvg is missing PV XYFbbU-4RmW-c1rE-LH1H-St0c-fgAx-YT2GE1 (last written to /dev/vdf).
  VG     #PV #LV #SN Attr   VSize  VFree
  raidvg   3   4   0 wz-pn- <2.99g <2.62g
root@centos8:~ # lvs raidvg -o +devices -a
  WARNING: Couldn't find device with uuid XYFbbU-4RmW-c1rE-LH1H-St0c-fgAx-YT2GE1.
  WARNING: VG raidvg is missing PV XYFbbU-4RmW-c1rE-LH1H-St0c-fgAx-YT2GE1 (last written to /dev/vdf).
  LV                 VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  mirror             raidvg rwi---r-p-  40.00m                                                     mirror_rimage_0(0),mirror_rimage_1(0)
  [mirror_rimage_0]  raidvg Iwi---r---  40.00m                                                     /dev/vde(1)
  [mirror_rimage_1]  raidvg Iwi---r-p-  40.00m                                                     [unknown](1)
  [mirror_rmeta_0]   raidvg ewi---r---   4.00m                                                     /dev/vde(0)
  [mirror_rmeta_1]   raidvg ewi---r-p-   4.00m                                                     [unknown](0)
  plain              raidvg -wi-------  40.00m                                                     /dev/vdd(0)
  raid5lv            raidvg rwi---r-p-  80.00m                                                     raid5lv_rimage_0(0),raid5lv_rimage_1(0),raid5lv_rimage_2(0)
  [raid5lv_rimage_0] raidvg Iwi---r---  40.00m                                                     /dev/vdd(11)
  [raid5lv_rimage_1] raidvg Iwi---r---  40.00m                                                     /dev/vde(12)
  [raid5lv_rimage_2] raidvg Iwi---r-p-  40.00m                                                     [unknown](12)
  [raid5lv_rmeta_0]  raidvg ewi---r---   4.00m                                                     /dev/vdd(10)
  [raid5lv_rmeta_1]  raidvg ewi---r---   4.00m                                                     /dev/vde(11)
  [raid5lv_rmeta_2]  raidvg ewi---r-p-   4.00m                                                     [unknown](11)
  stripelv           raidvg -wi-----p- 120.00m                                                     /dev/vdd(21),/dev/vde(22),[unknown](22)
The recovery procedure depends on the version of LVM you are currently using. For CentOS 8 this would be:
root@centos8:~ # vgreduce --removemissing --force raidvg
  WARNING: Couldn't find device with uuid XYFbbU-4RmW-c1rE-LH1H-St0c-fgAx-YT2GE1.
  WARNING: VG raidvg is missing PV XYFbbU-4RmW-c1rE-LH1H-St0c-fgAx-YT2GE1 (last written to /dev/vdf).
  WARNING: Couldn't find device with uuid XYFbbU-4RmW-c1rE-LH1H-St0c-fgAx-YT2GE1.
  WARNING: Couldn't find device with uuid XYFbbU-4RmW-c1rE-LH1H-St0c-fgAx-YT2GE1.
  WARNING: Couldn't find device with uuid XYFbbU-4RmW-c1rE-LH1H-St0c-fgAx-YT2GE1.
  WARNING: Removing partial LV raidvg/stripelv.
  WARNING: Couldn't find device with uuid XYFbbU-4RmW-c1rE-LH1H-St0c-fgAx-YT2GE1.
  Logical volume "stripelv" successfully removed
  Wrote out consistent volume group raidvg.
Bye-bye, stripelv !!
root@centos8:~ # lvs raidvg -o +devices -a
  LV                 VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  mirror             raidvg rwi---r--- 40.00m                                                     mirror_rimage_0(0),mirror_rimage_1(0)
  [mirror_rimage_0]  raidvg Iwi---r--- 40.00m                                                     /dev/vde(1)
  [mirror_rimage_1]  raidvg vwi---r--- 40.00m
  [mirror_rmeta_0]   raidvg ewi---r---  4.00m                                                     /dev/vde(0)
  [mirror_rmeta_1]   raidvg ewi---r---  4.00m
  plain              raidvg -wi------- 40.00m                                                     /dev/vdd(0)
  raid5lv            raidvg rwi---r--- 80.00m                                                     raid5lv_rimage_0(0),raid5lv_rimage_1(0),raid5lv_rimage_2(0)
  [raid5lv_rimage_0] raidvg Iwi---r--- 40.00m                                                     /dev/vdd(11)
  [raid5lv_rimage_1] raidvg Iwi---r--- 40.00m                                                     /dev/vde(12)
  [raid5lv_rimage_2] raidvg vwi---r--- 40.00m
  [raid5lv_rmeta_0]  raidvg ewi---r---  4.00m                                                     /dev/vdd(10)
  [raid5lv_rmeta_1]  raidvg ewi---r---  4.00m                                                     /dev/vde(11)
  [raid5lv_rmeta_2]  raidvg ewi---r---  4.00m
root@centos8:~ # pvcreate /dev/vdf
  Physical volume "/dev/vdf" successfully created.
root@centos8:~ # vgextend raidvg /dev/vdf
  Volume group "raidvg" successfully extended
Recovery does not happen automatically; you must run the following:
root@centos8:~ # lvconvert --repair raidvg/mirror -b
  raidvg/mirror must be active to perform this operation.
root@centos8:~ # vgchange -ay raidvg
  3 logical volume(s) in volume group "raidvg" now active
root@centos8:~ # lvconvert --repair raidvg/mirror -b
Attempt to replace failed RAID images (requires full device resync)? [y/n]: y
  Faulty devices in raidvg/mirror successfully replaced.
root@centos8:~ # lvconvert --repair raidvg/raid5lv -b
Attempt to replace failed RAID images (requires full device resync)? [y/n]: y
  Faulty devices in raidvg/raid5lv successfully replaced.
root@centos8:~ # lvs raidvg -o +devices -a
  LV                 VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  mirror             raidvg rwi-a-r--- 40.00m                                    100.00           mirror_rimage_0(0),mirror_rimage_1(0)
  [mirror_rimage_0]  raidvg iwi-aor--- 40.00m                                                     /dev/vde(1)
  [mirror_rimage_1]  raidvg iwi-aor--- 40.00m                                                     /dev/vdd(22)
  [mirror_rmeta_0]   raidvg ewi-aor---  4.00m                                                     /dev/vde(0)
  [mirror_rmeta_1]   raidvg ewi-aor---  4.00m                                                     /dev/vdd(21)
  plain              raidvg -wi-a----- 40.00m                                                     /dev/vdd(0)
  raid5lv            raidvg rwi-a-r--- 80.00m                                    100.00           raid5lv_rimage_0(0),raid5lv_rimage_1(0),raid5lv_rimage_2(0)
  [raid5lv_rimage_0] raidvg iwi-aor--- 40.00m                                                     /dev/vdd(11)
  [raid5lv_rimage_1] raidvg iwi-aor--- 40.00m                                                     /dev/vde(12)
  [raid5lv_rimage_2] raidvg iwi-aor--- 40.00m                                                     /dev/vdf(1)
  [raid5lv_rmeta_0]  raidvg ewi-aor---  4.00m                                                     /dev/vdd(10)
  [raid5lv_rmeta_1]  raidvg ewi-aor---  4.00m                                                     /dev/vde(11)
  [raid5lv_rmeta_2]  raidvg ewi-aor---  4.00m                                                     /dev/vdf(0)
You can read about the design of my next home NAS here: Redundant disks without MDRAID.
Snapshots in LVM are implemented as true COW (copy-on-write): on a write, the old data block is first copied from the volume area into the snapshot area, causing a double write for every snapshot. This means that taking a snapshot costs performance, that you can afford only a small number of snapshots, and that you need to allocate disk space for the snapshot area.
Where can snapshots be used? Taking a snapshot before an OS update is a very good use case. In another case, a snapshot was used to create a hot backup of a database: the database stayed in backup mode only while the snapshot was being taken, then a regular backup was made from the snapshot's data, and the snapshot was deleted as soon as the backup finished.
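A sketch of such a backup flow, assuming a database that supports a backup (freeze) mode; the db-* commands, the dblv volume, and the 2g size estimate are all placeholders, not real names:

root@centos8:~ # db-begin-backup                             # hypothetical: put the DB into backup mode
root@centos8:~ # lvcreate -s -n snap_db -L 2g datavg/dblv    # snapshot the (hypothetical) DB volume
root@centos8:~ # db-end-backup                               # hypothetical: release the DB immediately
root@centos8:~ # mount -o ro,nouuid /dev/datavg/snap_db /mnt/dbsnap   # nouuid is needed to mount an XFS snapshot
# ... run the regular backup against /mnt/dbsnap ...
root@centos8:~ # umount /mnt/dbsnap && lvremove -y datavg/snap_db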
You must have free space in the VG to take a snapshot, and one of the most important parameters is the estimated snapshot size. If more space is needed than was allocated at creation time, the snapshot becomes "Invalid" and can no longer be used, only deleted. Obviously, the maximum useful snapshot size equals the LV size, for the case where the LV is completely overwritten.
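To avoid an invalidated snapshot, watch its fill level and grow it in time; lvm.conf can even do this automatically via monitoring. A sketch, using the snap_v1 volume that is created below (the threshold values are illustrative):

root@centos8:~ # lvs -o lv_name,origin,data_percent datavg   # Data% shows how full the snapshot is
root@centos8:~ # lvextend -L+40m datavg/snap_v1              # grow the snapshot space manually
# Or let LVM grow it for you, in /etc/lvm/lvm.conf:
#   snapshot_autoextend_threshold = 70
#   snapshot_autoextend_percent = 20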
root@centos8:~ # vgs datavg
  VG     #PV #LV #SN Attr   VSize  VFree
  datavg   1   1   0 wz--n- <2.00g <1.92g
root@centos8:~ # lvs datavg
  LV   VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  v1   datavg -wi-ao---- 80.00m
root@centos8:~ # lvcreate -s -n snap_v1 -l 100%ORIGIN datavg/v1
  Logical volume "snap_v1" created.
root@centos8:~ # lvs datavg -a
  LV      VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  snap_v1 datavg swi-a-s--- 84.00m      v1     0.01
  v1      datavg owi-aos--- 80.00m
Let's delete all the original data and then restore it from the snapshot. Reverting to a snapshot is called a "merge" in LVM terminology, probably because the snapshot disappears as a result of the command.
root@centos8:~ # ll /mnt/v1/
total 33
dr-xr-xr-x.  2 root root 12288 Oct  4 18:50 bin
drwxr-xr-x.  2 root root  1024 May 11  2019 games
drwxr-xr-x.  3 root root  1024 Oct  4 18:50 include
dr-xr-xr-x. 30 root root  1024 Oct  4 18:50 lib
dr-x------.  2 root root  1024 Oct  4 18:50 lib64
drwx------.  2 root root  1024 Oct  4 18:50 libexec
drwx------.  2 root root  1024 Oct  4 18:50 local
dr-x------.  2 root root  1024 Oct  4 18:50 sbin
drwx------.  2 root root  1024 Oct  4 18:50 share
drwx------.  2 root root  1024 Oct  4 18:50 src
lrwxrwxrwx.  1 root root    10 May 11  2019 tmp -> ../var/tmp
root@centos8:~ # rm -rf /mnt/v1/*
root@centos8:~ # find /mnt/v1
/mnt/v1
root@centos8:~ # lvs datavg -a
  LV      VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  snap_v1 datavg swi-a-s--- 84.00m      v1     0.26
  v1      datavg owi-aos--- 80.00m
root@centos8:~ # sync
root@centos8:~ # lvs datavg -a
  LV      VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  snap_v1 datavg swi-a-s--- 84.00m      v1     0.50
  v1      datavg owi-aos--- 80.00m
Time to revert!
root@centos8:~ # lvconvert --merge datavg/snap_v1
  Delaying merge since origin is open.
  Merging of snapshot datavg/snap_v1 will occur on next activation of datavg/v1.
root@centos8:~ # lvs datavg -a
  LV        VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  [snap_v1] datavg Swi-a-s--- 84.00m      v1     0.51
  v1        datavg Owi-aos--- 80.00m
root@centos8:~ # umount /mnt/v1
root@centos8:~ # lvs datavg -a
  LV        VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  [snap_v1] datavg Swi-a-s--- 84.00m      v1     0.51
  v1        datavg Owi-a-s--- 80.00m
root@centos8:~ # lvchange -an datavg/v1
root@centos8:~ # lvs datavg -a
  LV        VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  [snap_v1] datavg Swi---s--- 84.00m      v1
  v1        datavg Owi---s--- 80.00m
root@centos8:~ # lvchange -ay datavg/v1
root@centos8:~ # lvs datavg -a
  WARNING: Cannot find matching snapshot segment for datavg/v1.
  WARNING: Cannot find matching snapshot segment for datavg/v1.
  Internal error: WARNING: Segment type error found does not match expected type snapshot for datavg/snap_v1.
  LV        VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  [snap_v1] datavg Swi-XXs-X- 84.00m      v1
  v1        datavg Owi-aos--- 80.00m
root@centos8:~ # lvs datavg -a
  LV   VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  v1   datavg -wi-ao---- 80.00m
root@centos8:~ # mount /mnt/v1
mount: /mnt/v1: /dev/mapper/datavg-v1 already mounted on /mnt/v1.
root@centos8:~ # ll /mnt/v1/
total 33
dr-xr-xr-x.  2 root root 12288 Oct  4 18:50 bin
drwxr-xr-x.  2 root root  1024 May 11  2019 games
drwxr-xr-x.  3 root root  1024 Oct  4 18:50 include
dr-xr-xr-x. 30 root root  1024 Oct  4 18:50 lib
dr-x------.  2 root root  1024 Oct  4 18:50 lib64
drwx------.  2 root root  1024 Oct  4 18:50 libexec
drwx------.  2 root root  1024 Oct  4 18:50 local
dr-x------.  2 root root  1024 Oct  4 18:50 sbin
drwx------.  2 root root  1024 Oct  4 18:50 share
drwx------.  2 root root  1024 Oct  4 18:50 src
lrwxrwxrwx.  1 root root    10 May 11  2019 tmp -> ../var/tmp
This transcript demonstrates a typical wrong procedure. You must unmount the filesystem before running the "merge" command; otherwise the command detects that the LV is busy and postpones the action until the next LV activation. According to the man pages, unmounting the filesystem should be enough to trigger the merge, but in practice it is not: an actual deactivation and activation is required. The correct procedure is shown below.
The snapshot can be mounted read-write, and you can work with it; in that case, however, it becomes useless for recovery. If you remove files from a snapshot and then "merge" it, the files will be removed from the origin as well.
This leads us to an interesting idea: test the changes on the snapshot, and once you are happy with the results, apply them to the original volume. For example:
root@centos8:~ # lvcreate -s -n snap_v1 -l 100%ORIGIN datavg/v1
  Logical volume "snap_v1" created.
root@centos8:~ # mkdir /mnt/v1clone
root@centos8:~ # mount /dev/datavg/snap_v1 /mnt/v1clone
root@centos8:~ # rm -rf /mnt/v1clone/*
root@centos8:~ # rsync -a /etc /mnt/v1clone/
root@centos8:~ # ll /mnt/v1clone/
total 8
drwxr-xr-x. 79 root root 7168 Oct  5 09:42 etc
root@centos8:~ # umount /mnt/v1clone
root@centos8:~ # umount /mnt/v1
root@centos8:~ # lvconvert --merge datavg/snap_v1
  Merging of volume datavg/snap_v1 started.
  datavg/v1: Merged: 77.75%
  datavg/v1: Merged: 100.00%
root@centos8:~ # mount /mnt/v1
root@centos8:~ # ll /mnt/v1/
total 8
drwxr-xr-x. 79 root root 7168 Oct  5 09:42 etc
I already wrote a good article on Cloning Logical Volume using LVM. I will direct you there instead of repeating it here.
This chapter is dedicated to restoring LVM headers that have been overwritten for some reason. Reason number one: someone, following a common recommendation, creates a partition table on an already existing PV. It is fortunate if that partition table is an MSDOS MBR, because it is small and the LVM PV header can be recovered. A GPT table, however, is also written to the end of the disk, which can corrupt the data there. And if the evildoer has already created a file system on the freshly made partition, only data recovery will help.
This procedure relies on the fact that every LVM operation calls "vgcfgbackup" before and after making changes. The latest configuration is backed up to the /etc/lvm/backup directory, and all previous states can be found in /etc/lvm/archive. This information can be used to recover LVM headers and metadata (but not the data itself). Let's take an example:
root@centos8:~ # pvs
  PV         VG     Fmt  Attr PSize    PFree
  /dev/vda2  rootvg lvm2 a--   <19.75g <13.75g
  /dev/vdb   datavg lvm2 a--    <2.00g  <1.92g
  /dev/vdc   xfsvg  lvm2 a--  1020.00m  80.00m
  /dev/vdd   raidvg lvm2 a--  1020.00m 892.00m
  /dev/vde   raidvg lvm2 a--  1020.00m 932.00m
  /dev/vdf   raidvg lvm2 a--  1020.00m 976.00m
root@centos8:~ # fdisk /dev/vdc

Welcome to fdisk (util-linux 2.32.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

The old LVM2_member signature will be removed by a write command.

Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0xe9d73e6a.

Command (m for help): o
Created a new DOS disklabel with disk identifier 0x43789486.
The old LVM2_member signature will be removed by a write command.

Command (m for help): n
Partition type
   p   primary (0 primary, 0 extended, 4 free)
   e   extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1):
First sector (2048-2097151, default 2048):
Last sector, +sectors or +size{K,M,G,T,P} (2048-2097151, default 2097151):

Created a new partition 1 of type 'Linux' and of size 1023 MiB.
Partition #1 contains a xfs signature.

Do you want to remove the signature? [Y]es/[N]o: n

Command (m for help): p

Disk /dev/vdc: 1 GiB, 1073741824 bytes, 2097152 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x43789486

Device     Boot Start     End Sectors  Size Id Type
/dev/vdc1        2048 2097151 2095104 1023M 83 Linux

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
I am pleasantly surprised by the safety improvements in the fdisk tool: it warns about all dangerous actions, even in color. Perhaps reason number one will happen less often from now on. The operation I performed overwrote the MBR but not the beginning of the partition, because I answered "No" when fdisk offered to remove the XFS signature. If our evil admin had answered Yes, LVM recovery would not help, because the data itself would be corrupted.
root@centos8:~ # pvs
  PV         VG     Fmt  Attr PSize    PFree
  /dev/vda2  rootvg lvm2 a--   <19.75g <13.75g
  /dev/vdb   datavg lvm2 a--    <2.00g  <1.92g
  /dev/vdd   raidvg lvm2 a--  1020.00m 892.00m
  /dev/vde   raidvg lvm2 a--  1020.00m 932.00m
  /dev/vdf   raidvg lvm2 a--  1020.00m 976.00m
root@centos8:~ # more /etc/lvm/backup/xfsvg
..
    physical_volumes {
        pv0 {
            id = "YcxPqw-Kn9R-Sc1d-j8KS-tGLk-pD8t-qSxFyF"
            device = "/dev/vdc"    # Hint only
..
root@centos8:~ # pvcreate --uuid YcxPqw-Kn9R-Sc1d-j8KS-tGLk-pD8t-qSxFyF --restorefile /etc/lvm/backup/xfsvg /dev/vdc
  WARNING: Couldn't find device with uuid YcxPqw-Kn9R-Sc1d-j8KS-tGLk-pD8t-qSxFyF.
  Device /dev/vdc excluded by a filter.
This is another mechanism that prevents accidental destruction of existing partitions. But since we are sure the /dev/vdc1 partition was created by mistake, we simply wipe it:
root@centos8:~ # wipefs /dev/vdc
DEVICE OFFSET TYPE UUID LABEL
vdc    0x1fe  dos
root@centos8:~ # wipefs /dev/vdc -a
wipefs: error: /dev/vdc: probing initialization failed: Device or resource busy
root@centos8:~ # wipefs /dev/vdc -af
/dev/vdc: 2 bytes were erased at offset 0x000001fe (dos): 55
And again, systemd treats my Linux the Microsoft way. Thank goodness wipefs supports forced deletion. Let's repeat the pvcreate command:
root@centos8:~ # pvcreate --uuid YcxPqw-Kn9R-Sc1d-j8KS-tGLk-pD8t-qSxFyF --restorefile /etc/lvm/backup/xfsvg /dev/vdc
  WARNING: Couldn't find device with uuid YcxPqw-Kn9R-Sc1d-j8KS-tGLk-pD8t-qSxFyF.
  Can't open /dev/vdc exclusively.  Mounted filesystem?
Bad luck. Yes, the filesystem was mounted at /mnt/v2 before the crisis, but it is not mounted now. Let's continue after a reboot, remembering to remove the mount point from fstab before rebooting.
root@centos8:~ # pvcreate --uuid YcxPqw-Kn9R-Sc1d-j8KS-tGLk-pD8t-qSxFyF --restorefile /etc/lvm/backup/xfsvg /dev/vdc
  WARNING: Couldn't find device with uuid YcxPqw-Kn9R-Sc1d-j8KS-tGLk-pD8t-qSxFyF.
  Physical volume "/dev/vdc" successfully created.
root@centos8:~ # vgcfgrestore xfsvg
  Restored volume group xfsvg.
root@centos8:~ # vgs
  VG     #PV #LV #SN Attr   VSize    VFree
  datavg   1   1   0 wz--n-   <2.00g  <1.92g
  raidvg   3   3   0 wz--n-   <2.99g   2.73g
  rootvg   1   4   0 wz--n-  <19.75g <13.75g
  xfsvg    1   1   0 wz--n- 1020.00m  80.00m
root@centos8:~ # lvs
  LV      VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  v1      datavg -wi-ao----  80.00m
  mirror  raidvg rwi-a-r---  40.00m                                    100.00
  plain   raidvg -wi-a-----  40.00m
  raid5lv raidvg rwi-a-r---  80.00m                                    100.00
  slash   rootvg -wi-ao----   3.00g
  swap    rootvg -wi-ao----   1.00g
  var     rootvg -wi-ao----   1.00g
  var_log rootvg -wi-ao----   1.00g
  v2      xfsvg  -wi------- 940.00m
root@centos8:~ # vgchange -ay xfsvg
  1 logical volume(s) in volume group "xfsvg" now active
root@centos8:~ # mount /dev/xfsvg/v2 /mnt/v2
The pvcreate command with the restore options creates an LVM PV header matching the previous configuration; the subsequent vgcfgrestore command restores the VG metadata onto that header.
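If you need a state older than the latest backup, vgcfgrestore can also list the archived versions and restore from a specific one; a sketch (the archive file name is illustrative, take the real one from the listing):

root@centos8:~ # vgcfgrestore --list xfsvg                                 # show backup and archive entries with timestamps
root@centos8:~ # vgcfgrestore -f /etc/lvm/archive/xfsvg_00001-1234567890.vg xfsvg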
The following example shows the procedure for restoring from a file backup. Suppose you need to rebuild a server completely from scratch, and you do not even know which VGs and LVs existed or what size they were. You must first restore the /etc/lvm/backup directory from a tape backup, then recreate the LVM infrastructure, mount the filesystems, and proceed with the full restore. I wrote about this in detail in the article Bare Metal Restore (BMR) using bareos file level backup.