LVM2 recipes

The LVM philosophy dates back to long before Linux. Many years ago I used LVM on HP-UX and AIX. Both companies shared their experience and patents, and you can see their logos in the code. The syntax of the basic classic commands is largely compatible with the HP-UX commands.

The main idea was to insert a kind of "virtualization" layer between the file system and the physical disk. Before LVM, the disk was partitioned during installation, and after that it was almost impossible to change the partitioning scheme. LVM provides the needed flexibility (if you know how to use it). It becomes possible to create a file system larger than the existing disk, resize file systems on the fly, replace underlying disks, and much more.

For the purposes of this article, I'll use a CentOS 8 KVM virtual server with five additional 1G disks to experiment with.

root@centos8:~ # cat /proc/partitions
major minor  #blocks  name

 252        0   20971520 vda
 252        1     262144 vda1
 252        2   20708352 vda2
 252       16    1048576 vdb
 252       32    1048576 vdc
 252       48    1048576 vdd
 252       64    1048576 vde
 252       80    1048576 vdf

Basic Functionality

Despite the common myth that it is impossible to use a disk without a partition table, a partition table is absolutely unnecessary when the disk is used with LVM. However, each PV (Physical Volume, i.e. a disk) must be initialized, which means allocating a Physical Volume header on it.

root@centos8:~ # pvcreate /dev/vdb
  Physical volume "/dev/vdb" successfully created.
root@centos8:~ # pvs
  PV         VG     Fmt  Attr PSize   PFree  
  /dev/vda2  rootvg lvm2 a--  <19.75g <13.75g
  /dev/vdb          lvm2 ---    1.00g   1.00g

NOTE about data alignment. You can check where the data actually begins with the command pvs -o +pe_start, which shows 1m for the current version. This value suits almost any storage subsystem. If this value were smaller than the storage's native block size, each I/O request from LVM could turn into two I/O requests on the storage. In that case, you can adjust the offset using the --dataalignment parameter (see the man pages).
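
As an illustration only (not needed in this setup), the offset can be checked and, for a disk that has not yet been initialized, set explicitly at pvcreate time; /dev/vdX is a placeholder and the 4m value is purely an example:

# pvs -o +pe_start
# pvcreate --dataalignment 4m /dev/vdX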

The next step is to create a Volume Group (VG). As the name suggests, it is just a group of volumes gathered together for some reason. The most obvious reason is that they all reside on the same disk. We will see other examples later, but the most common situation is one VG per PV (disk or LUN).

root@centos8:~ # vgcreate datavg /dev/vdb
  Volume group "datavg" successfully created
root@centos8:~ # vgdisplay datavg
  --- Volume group ---
  VG Name               datavg
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  1
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                0
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               1020.00 MiB
  PE Size               4.00 MiB
  Total PE              255
  Alloc PE / Size       0 / 0   
  Free  PE / Size       255 / 1020.00 MiB
  VG UUID               PO21b0-NezY-r0Mm-FTKa-h1r5-4I4o-zga3Dl

vgdisplay is a classic command that came from the old days together with the other "*display" commands. It is shown here to demonstrate the concept of the PE (Physical Extent). A PE is the smallest chunk of data that volumes are built from, and it is this concept that makes LVM flexible: a volume is assembled from PEs, and where they are physically located does not affect the volume itself.

Finally, we can create an LV (Logical Volume). Its size can be set in bytes (k, m, g, t suffixes are accepted) using the -L option, or as a number of PEs using the -l option.

root@centos8:~ # lvcreate -n v1 -l 10 datavg
  Logical volume "v1" created.
root@centos8:~ # lvcreate -n v2 -L 40M datavg
  Logical volume "v2" created.

These are the minimum required parameters: the desired volume name (-n), its size (-l or -L) and the VG name.

root@centos8:~ # lvs datavg
  LV   VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  v1   datavg -wi-a----- 40.00m                                                    
  v2   datavg -wi-a----- 40.00m
root@centos8:~ # lvdisplay datavg/v2
  --- Logical volume ---
  LV Path                /dev/datavg/v2
  LV Name                v2
  VG Name                datavg
  LV UUID                fnc9oc-x6tm-mF2g-cin3-1E7m-7YIg-PEiW2C
  LV Write Access        read/write
  LV Creation host, time centos8, 2020-10-04 18:35:28 +0300
  LV Status              available
  # open                 0
  LV Size                40.00 MiB
  Current LE             10
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:5

root@centos8:~ # lvdisplay -c datavg/v2
  /dev/datavg/v2:datavg:3:1:-1:0:81920:10:-1:0:-1:253:5

lvs is a modern, compact command, whereas lvdisplay is a classic one. The -c option is useful when parsing the output of "*display" commands in scripts. Using lvs -o <field> in scripts is also a good choice.
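
For example, a script-friendly invocation might look like this (fields and options as documented in lvs(8)):

# lvs --noheadings --separator : -o lv_name,lv_size --units m datavg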

Let's create filesystems and mount them:

root@centos8:~ # mkfs.ext4 -j -m0 /dev/datavg/v1
 ..
root@centos8:~ # mkfs.xfs /dev/datavg/v2
 ..
root@centos8:~ # echo "/dev/datavg/v1 /mnt/v1 ext4 defaults 1 2" >> /etc/fstab
root@centos8:~ # echo "/dev/datavg/v2 /mnt/v2 xfs defaults 0 0" >> /etc/fstab
root@centos8:~ # mkdir -p /mnt/v{1,2}
root@centos8:~ # mount -a
root@centos8:~ # df -P | grep mnt
/dev/mapper/datavg-v1        35M  782K   34M   3% /mnt/v1
/dev/mapper/datavg-v2        35M  2.4M   33M   7% /mnt/v2

However, our volume is too small to do any real work with:

root@centos8:~ # rsync -a /usr/ /mnt/v1/
rsync: write failed on "/mnt/v1/bin/nmcli": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(374) [receiver=3.1.3]
root@centos8:~ # df -P | grep mnt
/dev/mapper/datavg-v1        35M   34M  677K  99% /mnt/v1
/dev/mapper/datavg-v2        35M  2.4M   33M   7% /mnt/v2

Let's increase it:

root@centos8:~ # lvresize -L+40m /dev/datavg/v1
  Size of logical volume datavg/v1 changed from 40.00 MiB (10 extents) to 80.00 MiB (20 extents).
  Logical volume datavg/v1 successfully resized.
root@centos8:~ # resize2fs /dev/datavg/v1
resize2fs 1.45.4 (23-Sep-2019)
Filesystem at /dev/datavg/v1 is mounted on /mnt/v1; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 1
The filesystem on /dev/datavg/v1 is now 81920 (1k) blocks long.
root@centos8:~ # df -P | grep mnt
/dev/mapper/datavg-v1        74M   35M   39M  47% /mnt/v1
/dev/mapper/datavg-v2        35M  2.4M   33M   7% /mnt/v2

It is important to note that resizing works in both directions, and while growing is safe, shrinking can corrupt the data. This is why it is better to use relative values (+40m in our case) with -L (or -l) rather than absolute ones.
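
If you prefer a single step, lvresize can also grow the filesystem together with the LV via the -r (--resizefs) option, which calls fsadm behind the scenes; a sketch using the same volume:

# lvresize -r -L +40m datavg/v1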

The resizing action clearly demonstrates the concept of PE:

root@centos8:~ # pvdisplay -m /dev/vdb
  --- Physical volume ---
  PV Name               /dev/vdb
  VG Name               datavg
  PV Size               1.00 GiB / not usable 4.00 MiB
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              255
  Free PE               225
  Allocated PE          30
  PV UUID               pdkzo6-NIjC-k7my-eeXd-l4cx-6iIM-smiGvw
   
  --- Physical Segments ---
  Physical extent 0 to 9:
    Logical volume      /dev/datavg/v1
    Logical extents     0 to 9
  Physical extent 10 to 19:
    Logical volume      /dev/datavg/v2
    Logical extents     0 to 9
  Physical extent 20 to 29:
    Logical volume      /dev/datavg/v1
    Logical extents     10 to 19
  Physical extent 30 to 254:
    FREE

As you can see, the volumes are interleaved at the physical level, but the file system is not aware of it. Let's increase the second volume:

root@centos8:~ # lvresize -l+100%FREE datavg/v2
  Size of logical volume datavg/v2 changed from 40.00 MiB (10 extents) to 940.00 MiB (235 extents).
  Logical volume datavg/v2 successfully resized.
root@centos8:~ # xfs_growfs /mnt/v2
meta-data=/dev/mapper/datavg-v2  isize=512    agcount=2, agsize=5120 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=10240, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=1368, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 10240 to 240640
root@centos8:~ # df -P | grep /mnt
/dev/mapper/datavg-v1        74M   35M   39M  47% /mnt/v1
/dev/mapper/datavg-v2       935M   12M  924M   2% /mnt/v2

The +100%FREE notation can be used only with the -l option. Note that xfs_growfs accepts a mount point as its argument instead of a device.

root@centos8:~ # pvdisplay -m /dev/vdb
  --- Physical volume ---
  PV Name               /dev/vdb
  VG Name               datavg
  PV Size               1.00 GiB / not usable 4.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              255
  Free PE               0
  Allocated PE          255
  PV UUID               pdkzo6-NIjC-k7my-eeXd-l4cx-6iIM-smiGvw
   
  --- Physical Segments ---
  Physical extent 0 to 9:
    Logical volume      /dev/datavg/v1
    Logical extents     0 to 9
  Physical extent 10 to 19:
    Logical volume      /dev/datavg/v2
    Logical extents     0 to 9
  Physical extent 20 to 29:
    Logical volume      /dev/datavg/v1
    Logical extents     10 to 19
  Physical extent 30 to 254:
    Logical volume      /dev/datavg/v2
    Logical extents     10 to 234
   
root@centos8:~ # vgs datavg
  VG     #PV #LV #SN Attr   VSize    VFree
  datavg   1   2   0 wz--n- 1020.00m    0
root@centos8:~ # pvs /dev/vdb
  PV         VG     Fmt  Attr PSize    PFree
  /dev/vdb   datavg lvm2 a--  1020.00m    0

It is a very common mistake to grow the file system to the maximum on the first request. Do you remember that we wanted to copy the /usr data to volume v1? That task remains, but now there is no room for it. Moreover, XFS cannot be shrunk (extX can). The only solution is to increase the size of our VG. This can be done in two ways: grow the existing PV or add another PV. Always use the first method if possible. Adding another PV is easy, but then the health of the VG depends on the health of two PVs.
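
For the record, shrinking an extX filesystem must be done offline and in the right order: filesystem first, then the LV. A rough sketch, with 40m as a purely illustrative target size (double-check the numbers and have a backup):

# umount /mnt/v1
# e2fsck -f /dev/datavg/v1
# resize2fs /dev/datavg/v1 40M
# lvreduce -L 40m datavg/v1
# mount /mnt/v1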

The following command on the KVM host resizes my vdb disk online.

kvmhost# virsh blockresize centos8 /var/lib/libvirt/images/CentOS8.vdb.qcow2 2G
Block device '/var/lib/libvirt/images/CentOS8.vdb.qcow2' is resized

Due to the nature of the virtio driver used in KVM, the disk size is updated automatically and there is no need to rescan the disk for changes. However, in real life it is common practice to rescan the block device using the command:

# echo 1 > /sys/block/sdb/device/rescan
root@centos8:~ # tail /var/log/messages
Oct  4 19:39:56 centos8 kernel: virtio_blk virtio5: [vdb] new size: 4194304 512-byte logical blocks (2.15 GB/2.00 GiB)
Oct  4 19:39:56 centos8 kernel: vdb: detected capacity change from 1073741824 to 2147483648

Although the disk itself has been resized, LVM is not aware of it, as the pvs command proves. You must update the PV size with the pvresize command:

root@centos8:~ # pvs
  PV         VG     Fmt  Attr PSize    PFree  
  /dev/vda2  rootvg lvm2 a--   <19.75g <13.75g
  /dev/vdb   datavg lvm2 a--  1020.00m      0 
root@centos8:~ # pvresize /dev/vdb
  Physical volume "/dev/vdb" changed
  1 physical volume(s) resized or updated / 0 physical volume(s) not resized
root@centos8:~ # pvs
  PV         VG     Fmt  Attr PSize   PFree  
  /dev/vda2  rootvg lvm2 a--  <19.75g <13.75g
  /dev/vdb   datavg lvm2 a--   <2.00g   1.00g

The output of the pvresize command has never been clear to me.

Migration

Since we have touched on adding another PV to a VG, we should discuss migration between disks. We now have two volumes belonging to one VG, and we want to move the XFS volume to another disk and, moreover, into a separate VG. First, we need to add another disk to the VG:

root@centos8:~ # lvs datavg
  LV   VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  v1   datavg -wi-ao----  80.00m
  v2   datavg -wi-ao---- 940.00m
root@centos8:~ # pvcreate /dev/vdc
  Physical volume "/dev/vdc" successfully created.
root@centos8:~ # vgextend datavg /dev/vdc
  Volume group "datavg" successfully extended
root@centos8:~ # pvs
  PV         VG     Fmt  Attr PSize    PFree   
  /dev/vda2  rootvg lvm2 a--   <19.75g  <13.75g
  /dev/vdb   datavg lvm2 a--    <2.00g    1.00g
  /dev/vdc   datavg lvm2 a--  1020.00m 1020.00m
root@centos8:~ # pvmove -b -n datavg/v2 /dev/vdb 

The pvmove command evacuates the PV named on the command line. If there are several possible destinations, you can also name a specific target PV. If you only want to move a specific LV, you must name it (-n). The -b option runs the command in the background and returns to the prompt immediately; otherwise, the progress percentage is printed. Even without the -b option, this command is safe against interrupts and even reboots: once LVM is activated again, pvmove continues until it completes successfully. How does it work? It creates a temporary mirror for the LV, and as soon as all PEs are synchronized, it drops the PEs on the original PV. So the procedure is safe at any moment and is done online. The drawback of this technology is that every PE, even an empty one, is copied, which can easily inflate thin-provisioned LUNs on the storage.
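
When pvmove runs in the background, its progress can still be watched, for example via the copy percentage reported by lvs (the temporary mirror shows up as a hidden pvmove volume):

# lvs -a -o lv_name,copy_percent,devices datavg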

By checking the disks with the pvdisplay -m command, you will see that vdb now contains only v1 and vdc only v2. It is time to split the VG into two parts. The procedure requires the part being split off to be inactive (offline). This fits nicely with the fact that you need to update fstab with the new names anyway.

root@centos8:~ # df -P | grep mnt
/dev/mapper/datavg-v1        74M   35M   39M  47% /mnt/v1
/dev/mapper/datavg-v2       935M   12M  924M   2% /mnt/v2
root@centos8:~ # umount /mnt/v2
root@centos8:~ # lvchange -an datavg/v2
root@centos8:~ # vgsplit datavg xfsvg /dev/vdc
  New volume group "xfsvg" successfully split from "datavg"
root@centos8:~ # vgchange -ay xfsvg
  1 logical volume(s) in volume group "xfsvg" now active
root@centos8:~ # mount /dev/xfsvg/v2 /mnt/v2
root@centos8:~ # df -P | grep mnt
/dev/mapper/datavg-v1        74M   35M   39M  47% /mnt/v1

Nothing is mounted, because systemd takes care of mounts these days (WHY???). It still remembers the old mounts and insists on them. You can see this in /var/log/messages:

Oct  4 20:54:51 centos8 kernel: XFS (dm-3): Mounting V5 Filesystem
Oct  4 20:54:51 centos8 kernel: XFS (dm-3): Ending clean mount
Oct  4 20:54:51 centos8 systemd[1]: mnt-v2.mount: Unit is bound to inactive unit dev-datavg-v2.device. Stopping, too.
Oct  4 20:54:51 centos8 systemd[1]: Unmounting /mnt/v2...
Oct  4 20:54:51 centos8 kernel: XFS (dm-3): Unmounting Filesystem
Oct  4 20:54:51 centos8 systemd[1]: Unmounted /mnt/v2.

First, update /etc/fstab with the new VG name, then:

root@centos8:~ # systemctl daemon-reload
root@centos8:~ # mount /mnt/v2
root@centos8:~ # df -P | grep mnt
/dev/mapper/datavg-v1        74M   35M   39M  47% /mnt/v1
/dev/mapper/xfsvg-v2        935M   42M  894M   5% /mnt/v2
root@centos8:~ # pvs
  PV         VG     Fmt  Attr PSize    PFree  
  /dev/vda2  rootvg lvm2 a--   <19.75g <13.75g
  /dev/vdb   datavg lvm2 a--    <2.00g  <1.92g
  /dev/vdc   xfsvg  lvm2 a--  1020.00m  80.00m

We have run through a complex example of migration and splitting. Typically, the pvmove command is used to migrate between attached storage arrays for upgrade purposes. In that case the procedure looks like this (a command-level sketch follows below):

  1. Rescan for added disks (rescan fiber, SCSI, other)
  2. Create the PV with pvcreate
  3. Add the new PV to the VG with the vgextend command
  4. Migrate off the old disk with pvmove oldpv
  5. Remove the old PV from the VG using the vgreduce command
  6. Remove the PV header with pvremove oldpv
  7. Dangerous phase!!! Remove the SCSI disk from the kernel device list using the "scsi remove-single-device" command with the correct options
  8. Detach the old disk.

Refer to the HOWTO "LUNs on Linux using native tools" for the low-level details.
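
In command form, the LVM part of such a migration (steps 2-6) might look roughly like this, with sdold/sdnew as placeholder device names:

# pvcreate /dev/sdnew
# vgextend datavg /dev/sdnew
# pvmove /dev/sdold
# vgreduce datavg /dev/sdold
# pvremove /dev/sdold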

Redundancy

Earlier I did not recommend adding multiple PVs to one VG, because then its health depends on the health of two PVs. That statement can be turned on its head when RAID technology is used. LVM redundancy is implemented at the LV level, which makes maintaining it a headache: each LV in the same VG must be created with the same level of redundancy, otherwise you could lose any LV for which you forgot to configure redundancy. Nevertheless, I have been using this kind of redundancy on my home NAS for many years and have replaced almost all of the original hard drives without losing a single bit.

First, we need a VG with multiple PVs. Two are enough if we are talking about a mirror. LVM also supports RAID5; you must have at least three PVs to implement it.

root@centos8:~ # pvcreate /dev/vd{d,e,f}
  Physical volume "/dev/vdd" successfully created.
  Physical volume "/dev/vde" successfully created.
  Physical volume "/dev/vdf" successfully created.
root@centos8:~ # vgcreate raidvg /dev/vd{d,e,f}
  Volume group "raidvg" successfully created
root@centos8:~ # pvs /dev/vd{d,e,f}
  PV         VG     Fmt  Attr PSize    PFree   
  /dev/vdd   raidvg lvm2 a--  1020.00m 1020.00m
  /dev/vde   raidvg lvm2 a--  1020.00m 1020.00m
  /dev/vdf   raidvg lvm2 a--  1020.00m 1020.00m

I will create many different LVs at once:

root@centos8:~ # lvcreate -n plain -l 10 raidvg
  Logical volume "plain" created.
root@centos8:~ # pvs /dev/vd{d,e,f}
  PV         VG     Fmt  Attr PSize    PFree   
  /dev/vdd   raidvg lvm2 a--  1020.00m  980.00m
  /dev/vde   raidvg lvm2 a--  1020.00m 1020.00m
  /dev/vdf   raidvg lvm2 a--  1020.00m 1020.00m

This is a normal LV. It resides on the vdd disk and will be lost if that disk fails.

root@centos8:~ # lvcreate -n mirror -l 10 --type raid1 -m 1 raidvg /dev/vde /dev/vdf
  Logical volume "mirror" created.
root@centos8:~ # pvs /dev/vd{d,e,f}
  PV         VG     Fmt  Attr PSize    PFree  
  /dev/vdd   raidvg lvm2 a--  1020.00m 980.00m
  /dev/vde   raidvg lvm2 a--  1020.00m 976.00m
  /dev/vdf   raidvg lvm2 a--  1020.00m 976.00m

This is an example of creating a mirrored LV. I have specified the desired PV to balance the space usage.

root@centos8:~ # lvcreate -n raid5lv -l 20 --type raid5 -i 2 raidvg
  Using default stripesize 64.00 KiB.
  Logical volume "raid5lv" created.

This is an example of creating a RAID5 volume. The stripe count (-i) counts data stripes only. We have three PVs here; the capacity of one of them goes to parity, which leaves us with only two data stripes.

root@centos8:~ # lvcreate -n stripelv -l 30 --type striped -i 3 raidvg
  Using default stripesize 64.00 KiB.
  Logical volume "stripelv" created.

A striped volume has nothing to do with redundancy: it fails if any of its disks fails. Traditionally it is built to improve performance, but this is largely a myth. The only real benefit is the increased I/O queue depth, and you can achieve the same by simply increasing the queue depth of the device itself. The default value of 32 is fine for physical devices; when working with external storage, increasing the queue depth can sometimes help. Striping does not help with local SSD devices, and it does not solve the seek latency problem of rotating disks. Therefore, striping shows performance gains only for streaming writes or sequential reads, not for real workloads.
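
For comparison, the per-device queue depth mentioned above can be inspected and tuned via sysfs for a SCSI disk (the exact path depends on the driver, and 64 below is just an illustrative value):

# cat /sys/block/sdb/device/queue_depth
# echo 64 > /sys/block/sdb/device/queue_depth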

root@centos8:~ # lvs raidvg -o +devices -a
  LV                 VG     Attr       LSize   Pool ..  Cpy%Sync Convert Devices                                                    
  mirror             raidvg rwi-a-r---  40.00m          100.00           mirror_rimage_0(0),mirror_rimage_1(0)                      
  [mirror_rimage_0]  raidvg iwi-aor---  40.00m                           /dev/vde(1)                                                
  [mirror_rimage_1]  raidvg iwi-aor---  40.00m                           /dev/vdf(1)                                                
  [mirror_rmeta_0]   raidvg ewi-aor---   4.00m                           /dev/vde(0)                                                
  [mirror_rmeta_1]   raidvg ewi-aor---   4.00m                           /dev/vdf(0)                                                
  plain              raidvg -wi-a-----  40.00m                           /dev/vdd(0)                                                
  raid5lv            raidvg rwi-a-r---  80.00m          100.00           raid5lv_rimage_0(0),raid5lv_rimage_1(0),raid5lv_rimage_2(0)
  [raid5lv_rimage_0] raidvg iwi-aor---  40.00m                           /dev/vdd(11)                                               
  [raid5lv_rimage_1] raidvg iwi-aor---  40.00m                           /dev/vde(12)                                               
  [raid5lv_rimage_2] raidvg iwi-aor---  40.00m                           /dev/vdf(12)                                               
  [raid5lv_rmeta_0]  raidvg ewi-aor---   4.00m                           /dev/vdd(10)                                               
  [raid5lv_rmeta_1]  raidvg ewi-aor---   4.00m                           /dev/vde(11)                                               
  [raid5lv_rmeta_2]  raidvg ewi-aor---   4.00m                           /dev/vdf(11)                                               
  stripelv           raidvg -wi-a----- 120.00m                           /dev/vdd(21),/dev/vde(22),/dev/vdf(22)                     

This is the final picture of how the constituent parts of the created LVs are placed on the disks.

Now I will shut down the virtual machine and replace the (supposedly broken) last disk (vdf) with an empty one.

kvmhost# qemu-img create -f qcow2 CentOS8.vdf.qcow2 1g
Formatting 'CentOS8.vdf.qcow2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off refcount_bits=16

The raidvg came up in "partial" status; note the p in the attributes:

root@centos8:~ # vgs raidvg
  WARNING: Couldn't find device with uuid XYFbbU-4RmW-c1rE-LH1H-St0c-fgAx-YT2GE1.
  WARNING: VG raidvg is missing PV XYFbbU-4RmW-c1rE-LH1H-St0c-fgAx-YT2GE1 (last written to /dev/vdf).
  VG     #PV #LV #SN Attr   VSize  VFree 
  raidvg   3   4   0 wz-pn- <2.99g <2.62g
root@centos8:~ # lvs raidvg -o +devices -a
  WARNING: Couldn't find device with uuid XYFbbU-4RmW-c1rE-LH1H-St0c-fgAx-YT2GE1.
  WARNING: VG raidvg is missing PV XYFbbU-4RmW-c1rE-LH1H-St0c-fgAx-YT2GE1 (last written to /dev/vdf).
  LV                 VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices                                                    
  mirror             raidvg rwi---r-p-  40.00m                                                     mirror_rimage_0(0),mirror_rimage_1(0)                      
  [mirror_rimage_0]  raidvg Iwi---r---  40.00m                                                     /dev/vde(1)                                                
  [mirror_rimage_1]  raidvg Iwi---r-p-  40.00m                                                     [unknown](1)                                               
  [mirror_rmeta_0]   raidvg ewi---r---   4.00m                                                     /dev/vde(0)                                                
  [mirror_rmeta_1]   raidvg ewi---r-p-   4.00m                                                     [unknown](0)                                               
  plain              raidvg -wi-------  40.00m                                                     /dev/vdd(0)                                                
  raid5lv            raidvg rwi---r-p-  80.00m                                                     raid5lv_rimage_0(0),raid5lv_rimage_1(0),raid5lv_rimage_2(0)
  [raid5lv_rimage_0] raidvg Iwi---r---  40.00m                                                     /dev/vdd(11)                                               
  [raid5lv_rimage_1] raidvg Iwi---r---  40.00m                                                     /dev/vde(12)                                               
  [raid5lv_rimage_2] raidvg Iwi---r-p-  40.00m                                                     [unknown](12)                                              
  [raid5lv_rmeta_0]  raidvg ewi---r---   4.00m                                                     /dev/vdd(10)                                               
  [raid5lv_rmeta_1]  raidvg ewi---r---   4.00m                                                     /dev/vde(11)                                               
  [raid5lv_rmeta_2]  raidvg ewi---r-p-   4.00m                                                     [unknown](11)                                              
  stripelv           raidvg -wi-----p- 120.00m                                                     /dev/vdd(21),/dev/vde(22),[unknown](22)                    

The recovery procedure depends on the version of LVM you are currently using. For CentOS 8 this would be:

root@centos8:~ # vgreduce --removemissing --force raidvg
  WARNING: Couldn't find device with uuid XYFbbU-4RmW-c1rE-LH1H-St0c-fgAx-YT2GE1.
  WARNING: VG raidvg is missing PV XYFbbU-4RmW-c1rE-LH1H-St0c-fgAx-YT2GE1 (last written to /dev/vdf).
  WARNING: Couldn't find device with uuid XYFbbU-4RmW-c1rE-LH1H-St0c-fgAx-YT2GE1.
  WARNING: Couldn't find device with uuid XYFbbU-4RmW-c1rE-LH1H-St0c-fgAx-YT2GE1.
  WARNING: Couldn't find device with uuid XYFbbU-4RmW-c1rE-LH1H-St0c-fgAx-YT2GE1.
  WARNING: Removing partial LV raidvg/stripelv.
  WARNING: Couldn't find device with uuid XYFbbU-4RmW-c1rE-LH1H-St0c-fgAx-YT2GE1.
  Logical volume "stripelv" successfully removed
  Wrote out consistent volume group raidvg.

Bye-bye, stripelv !!

root@centos8:~ # lvs raidvg -o +devices -a
  LV                 VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices                                                    
  mirror             raidvg rwi---r--- 40.00m                                                     mirror_rimage_0(0),mirror_rimage_1(0)                      
  [mirror_rimage_0]  raidvg Iwi---r--- 40.00m                                                     /dev/vde(1)                                                
  [mirror_rimage_1]  raidvg vwi---r--- 40.00m                                                                                                                
  [mirror_rmeta_0]   raidvg ewi---r---  4.00m                                                     /dev/vde(0)                                                
  [mirror_rmeta_1]   raidvg ewi---r---  4.00m                                                                                                                
  plain              raidvg -wi------- 40.00m                                                     /dev/vdd(0)                                                
  raid5lv            raidvg rwi---r--- 80.00m                                                     raid5lv_rimage_0(0),raid5lv_rimage_1(0),raid5lv_rimage_2(0)
  [raid5lv_rimage_0] raidvg Iwi---r--- 40.00m                                                     /dev/vdd(11)                                               
  [raid5lv_rimage_1] raidvg Iwi---r--- 40.00m                                                     /dev/vde(12)                                               
  [raid5lv_rimage_2] raidvg vwi---r--- 40.00m                                                                                                                
  [raid5lv_rmeta_0]  raidvg ewi---r---  4.00m                                                     /dev/vdd(10)                                               
  [raid5lv_rmeta_1]  raidvg ewi---r---  4.00m                                                     /dev/vde(11)                                               
  [raid5lv_rmeta_2]  raidvg ewi---r---  4.00m                                                                                                                
root@centos8:~ # pvcreate /dev/vdf
  Physical volume "/dev/vdf" successfully created.
root@centos8:~ # vgextend raidvg /dev/vdf
  Volume group "raidvg" successfully extended

Recovery does not happen automatically; you have to run the following:

root@centos8:~ # lvconvert --repair raidvg/mirror -b
  raidvg/mirror must be active to perform this operation.
root@centos8:~ # vgchange -ay raidvg
  3 logical volume(s) in volume group "raidvg" now active
root@centos8:~ # lvconvert --repair raidvg/mirror -b
Attempt to replace failed RAID images (requires full device resync)? [y/n]: y
  Faulty devices in raidvg/mirror successfully replaced.
root@centos8:~ # lvconvert --repair raidvg/raid5lv -b
Attempt to replace failed RAID images (requires full device resync)? [y/n]: y
  Faulty devices in raidvg/raid5lv successfully replaced.
root@centos8:~ # lvs raidvg -o +devices -a
  LV                 VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices                                                    
  mirror             raidvg rwi-a-r--- 40.00m                                    100.00           mirror_rimage_0(0),mirror_rimage_1(0)                      
  [mirror_rimage_0]  raidvg iwi-aor--- 40.00m                                                     /dev/vde(1)                                                
  [mirror_rimage_1]  raidvg iwi-aor--- 40.00m                                                     /dev/vdd(22)                                               
  [mirror_rmeta_0]   raidvg ewi-aor---  4.00m                                                     /dev/vde(0)                                                
  [mirror_rmeta_1]   raidvg ewi-aor---  4.00m                                                     /dev/vdd(21)                                               
  plain              raidvg -wi-a----- 40.00m                                                     /dev/vdd(0)                                                
  raid5lv            raidvg rwi-a-r--- 80.00m                                    100.00           raid5lv_rimage_0(0),raid5lv_rimage_1(0),raid5lv_rimage_2(0)
  [raid5lv_rimage_0] raidvg iwi-aor--- 40.00m                                                     /dev/vdd(11)                                               
  [raid5lv_rimage_1] raidvg iwi-aor--- 40.00m                                                     /dev/vde(12)                                               
  [raid5lv_rimage_2] raidvg iwi-aor--- 40.00m                                                     /dev/vdf(1)                                                
  [raid5lv_rmeta_0]  raidvg ewi-aor---  4.00m                                                     /dev/vdd(10)                                               
  [raid5lv_rmeta_1]  raidvg ewi-aor---  4.00m                                                     /dev/vde(11)                                               
  [raid5lv_rmeta_2]  raidvg ewi-aor---  4.00m                                                     /dev/vdf(0)                                                

You can read about the design of my next home NAS here: Redundant disks without MDRAID.

Using Snapshots

The snapshot in LVM is implemented as a real COW (copy-on-write): on every write, the old data block is first copied from the volume space to the snapshot space, causing a double write operation for each snapshot. This means that having a snapshot can degrade performance, that you can only afford a small number of snapshots, and that you need to allocate disk space for the snapshot area.

Where can a snapshot be used? Taking a snapshot before updating the OS is a very good use case. In another case, a snapshot was used to create a hot backup of a database: the database stayed in backup mode only while the snapshot was being taken, then a regular backup was made from the snapshot's data, and the snapshot was deleted as soon as the backup finished.

You must have free space in your VG to take a snapshot. One of the most important parameters is the estimated snapshot size: if more space is required than was allocated at creation time, the snapshot becomes "Invalid" and can no longer be used, only deleted. Obviously the maximum useful snapshot size equals the LV size, for the case when the origin is completely overwritten.
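
To reduce the risk of a snapshot overflowing, LVM can auto-extend snapshots when dmeventd monitoring is enabled; the behaviour is controlled by the snapshot_autoextend_threshold and snapshot_autoextend_percent settings in the activation section of /etc/lvm/lvm.conf. A quick way to check the current values (tune them to taste):

# lvmconfig activation/snapshot_autoextend_threshold activation/snapshot_autoextend_percent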

root@centos8:~ # vgs datavg
  VG     #PV #LV #SN Attr   VSize  VFree 
  datavg   1   1   0 wz--n- <2.00g <1.92g
root@centos8:~ # lvs datavg
  LV   VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  v1   datavg -wi-ao---- 80.00m                                                    
root@centos8:~ # lvcreate -s -n snap_v1 -l 100%ORIGIN datavg/v1
  Logical volume "snap_v1" created.
root@centos8:~ # lvs datavg -a 
  LV      VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  snap_v1 datavg swi-a-s--- 84.00m      v1     0.01                                   
  v1      datavg owi-aos--- 80.00m                                                    

Let's delete all the original data and then restore it from the snapshot. Reverting to a snapshot is called a "merge" in LVM terminology, probably because the snapshot disappears as a result of the command.

root@centos8:~ # ll /mnt/v1/
total 33
dr-xr-xr-x.  2 root root 12288 Oct  4 18:50 bin
drwxr-xr-x.  2 root root  1024 May 11  2019 games
drwxr-xr-x.  3 root root  1024 Oct  4 18:50 include
dr-xr-xr-x. 30 root root  1024 Oct  4 18:50 lib
dr-x------.  2 root root  1024 Oct  4 18:50 lib64
drwx------.  2 root root  1024 Oct  4 18:50 libexec
drwx------.  2 root root  1024 Oct  4 18:50 local
dr-x------.  2 root root  1024 Oct  4 18:50 sbin
drwx------.  2 root root  1024 Oct  4 18:50 share
drwx------.  2 root root  1024 Oct  4 18:50 src
lrwxrwxrwx.  1 root root    10 May 11  2019 tmp -> ../var/tmp
root@centos8:~ # rm -rf /mnt/v1/*
root@centos8:~ # find /mnt/v1
/mnt/v1
root@centos8:~ # lvs datavg -a 
  LV      VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  snap_v1 datavg swi-a-s--- 84.00m      v1     0.26                                   
  v1      datavg owi-aos--- 80.00m                                                    
root@centos8:~ # sync
root@centos8:~ # lvs datavg -a 
  LV      VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  snap_v1 datavg swi-a-s--- 84.00m      v1     0.50                                   
  v1      datavg owi-aos--- 80.00m                                                    

The revert time !!

root@centos8:~ # lvconvert --merge datavg/snap_v1
  Delaying merge since origin is open.
  Merging of snapshot datavg/snap_v1 will occur on next activation of datavg/v1.
root@centos8:~ # lvs datavg -a 
  LV        VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  [snap_v1] datavg Swi-a-s--- 84.00m      v1     0.51                                   
  v1        datavg Owi-aos--- 80.00m                                                    
root@centos8:~ # umount /mnt/v1
root@centos8:~ # lvs datavg -a 
  LV        VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  [snap_v1] datavg Swi-a-s--- 84.00m      v1     0.51                                   
  v1        datavg Owi-a-s--- 80.00m                                                    
root@centos8:~ # lvchange -an datavg/v1
root@centos8:~ # lvs datavg -a  
  LV        VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  [snap_v1] datavg Swi---s--- 84.00m      v1                                            
  v1        datavg Owi---s--- 80.00m                                                    
root@centos8:~ # lvchange -ay datavg/v1
root@centos8:~ # lvs datavg -a         
  WARNING: Cannot find matching snapshot segment for datavg/v1.
  WARNING: Cannot find matching snapshot segment for datavg/v1.
  Internal error: WARNING: Segment type error found does not match expected type snapshot for datavg/snap_v1.
  LV        VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  [snap_v1] datavg Swi-XXs-X- 84.00m      v1                                            
  v1        datavg Owi-aos--- 80.00m                                                    
root@centos8:~ # lvs datavg -a 
  LV   VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  v1   datavg -wi-ao---- 80.00m                                                    
root@centos8:~ # mount /mnt/v1
mount: /mnt/v1: /dev/mapper/datavg-v1 already mounted on /mnt/v1.
root@centos8:~ # ll /mnt/v1/
total 33
dr-xr-xr-x.  2 root root 12288 Oct  4 18:50 bin
drwxr-xr-x.  2 root root  1024 May 11  2019 games
drwxr-xr-x.  3 root root  1024 Oct  4 18:50 include
dr-xr-xr-x. 30 root root  1024 Oct  4 18:50 lib
dr-x------.  2 root root  1024 Oct  4 18:50 lib64
drwx------.  2 root root  1024 Oct  4 18:50 libexec
drwx------.  2 root root  1024 Oct  4 18:50 local
dr-x------.  2 root root  1024 Oct  4 18:50 sbin
drwx------.  2 root root  1024 Oct  4 18:50 share
drwx------.  2 root root  1024 Oct  4 18:50 src
lrwxrwxrwx.  1 root root    10 May 11  2019 tmp -> ../var/tmp

This listing demonstrates a typical wrong procedure flow. You must unmount the filesystem before the "merge" command; otherwise the command detects that the LV is busy and postpones the action until the next activation of the LV. According to the man pages, unmounting the filesystem should be enough to trigger the merge, but in practice it is not: an actual reactivation is required. The correct procedure is shown below.

The snapshot can be mounted read-write and you can work with it, but in that case it becomes useless for recovery: if you remove files from a snapshot and then "merge" it, the files will be removed from the origin as well.

This leads us to an interesting idea: test changes on the snapshot and, once you are happy with the results, apply them to the original volume. For example:

root@centos8:~ # lvcreate -s -n snap_v1 -l 100%ORIGIN datavg/v1
  Logical volume "snap_v1" created.
root@centos8:~ # mkdir /mnt/v1clone
root@centos8:~ # mount /dev/datavg/snap_v1 /mnt/v1clone
root@centos8:~ # rm -rf /mnt/v1clone/*
root@centos8:~ # rsync -a /etc /mnt/v1clone/
root@centos8:~ # ll /mnt/v1clone/
total 8
drwxr-xr-x. 79 root root 7168 Oct  5 09:42 etc
root@centos8:~ # umount /mnt/v1clone
root@centos8:~ # umount /mnt/v1
root@centos8:~ # lvconvert --merge datavg/snap_v1
  Merging of volume datavg/snap_v1 started.
  datavg/v1: Merged: 77.75%
  datavg/v1: Merged: 100.00%
root@centos8:~ # mount /mnt/v1
root@centos8:~ # ll /mnt/v1/
total 8
drwxr-xr-x. 79 root root 7168 Oct  5 09:42 etc

Cloning

I already wrote a good article on Cloning Logical Volume using LVM. I will direct you there instead of repeating it here.

Recovery

This chapter is dedicated to restoring LVM headers that have been overwritten for some reason. Reason number one: someone, following a common recommendation, creates a partition table on a disk that is already a PV. You are lucky if that partition table is an MSDOS MBR, because it is small and the LVM PV header can be recovered. A GPT table, however, is also written to the end of the disk, which can corrupt the data stored there. And if the evildoer has already created a file system on the freshly made partition, only data recovery will help.

This procedure relies on the fact that every LVM operation calls "vgcfgbackup" before and after processing. The latest configuration is backed up to the /etc/lvm/backup directory, and any previous states can be found in /etc/lvm/archive. This information can be used to recover LVM headers and metadata (but not the data itself). Let's take an example:
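
Before touching anything, it is worth listing the metadata backups that LVM already keeps for the affected VG, for example:

# vgcfgrestore --list xfsvg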

root@centos8:~ # pvs
  PV         VG     Fmt  Attr PSize    PFree  
  /dev/vda2  rootvg lvm2 a--   <19.75g <13.75g
  /dev/vdb   datavg lvm2 a--    <2.00g  <1.92g
  /dev/vdc   xfsvg  lvm2 a--  1020.00m  80.00m
  /dev/vdd   raidvg lvm2 a--  1020.00m 892.00m
  /dev/vde   raidvg lvm2 a--  1020.00m 932.00m
  /dev/vdf   raidvg lvm2 a--  1020.00m 976.00m
root@centos8:~ # fdisk /dev/vdc

Welcome to fdisk (util-linux 2.32.1).                                                                                                                                     
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

The old LVM2_member signature will be removed by a write command.

Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0xe9d73e6a.

Command (m for help): o
Created a new DOS disklabel with disk identifier 0x43789486.
The old LVM2_member signature will be removed by a write command.

Command (m for help): n
Partition type
   p   primary (0 primary, 0 extended, 4 free)
   e   extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1): 
First sector (2048-2097151, default 2048): 
Last sector, +sectors or +size{K,M,G,T,P} (2048-2097151, default 2097151): 

Created a new partition 1 of type 'Linux' and of size 1023 MiB.
Partition #1 contains a xfs signature.

Do you want to remove the signature? [Y]es/[N]o: n

Command (m for help): p

Disk /dev/vdc: 1 GiB, 1073741824 bytes, 2097152 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x43789486

Device     Boot Start     End Sectors  Size Id Type
/dev/vdc1        2048 2097151 2095104 1023M 83 Linux

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

I am pleasantly surprised by the safety improvements in the fdisk tool: it warns about all dangerous actions, even in color. Perhaps reason number one will happen less often from now on. The operation I performed overwrote the MBR but not the beginning of the partition, because I answered "No" when fdisk offered to remove the XFS signature. If our evil admin had answered Yes, LVM recovery would not help, because the data itself would be corrupted.

root@centos8:~ # pvs
  PV         VG     Fmt  Attr PSize    PFree  
  /dev/vda2  rootvg lvm2 a--   <19.75g <13.75g
  /dev/vdb   datavg lvm2 a--    <2.00g  <1.92g
  /dev/vdd   raidvg lvm2 a--  1020.00m 892.00m
  /dev/vde   raidvg lvm2 a--  1020.00m 932.00m
  /dev/vdf   raidvg lvm2 a--  1020.00m 976.00m
root@centos8:~ # more /etc/lvm/backup/xfsvg
 ..
        physical_volumes {

                pv0 {
                        id = "YcxPqw-Kn9R-Sc1d-j8KS-tGLk-pD8t-qSxFyF"
                        device = "/dev/vdc"     # Hint only
 ..
root@centos8:~ # pvcreate --uuid YcxPqw-Kn9R-Sc1d-j8KS-tGLk-pD8t-qSxFyF --restorefile /etc/lvm/backup/xfsvg /dev/vdc
  WARNING: Couldn't find device with uuid YcxPqw-Kn9R-Sc1d-j8KS-tGLk-pD8t-qSxFyF.
  Device /dev/vdc excluded by a filter.

This is another mechanism that prevents accidental destruction of existing partitions. But we are sure that the /dev/vdc1 partition was created by mistake, so we simply wipe it:

root@centos8:~ # wipefs /dev/vdc
DEVICE OFFSET TYPE UUID LABEL
vdc    0x1fe  dos       
root@centos8:~ # wipefs /dev/vdc -a
wipefs: error: /dev/vdc: probing initialization failed: Device or resource busy
root@centos8:~ # wipefs /dev/vdc -af
/dev/vdc: 2 bytes were erased at offset 0x000001fe (dos): 55

And again, systemd treats my Linux the Microsoft way. Thank goodness wipefs supports forced deletion. Let's repeat the pvcreate command:

root@centos8:~ # pvcreate --uuid YcxPqw-Kn9R-Sc1d-j8KS-tGLk-pD8t-qSxFyF --restorefile /etc/lvm/backup/xfsvg /dev/vdc
  WARNING: Couldn't find device with uuid YcxPqw-Kn9R-Sc1d-j8KS-tGLk-pD8t-qSxFyF.
  Can't open /dev/vdc exclusively.  Mounted filesystem?

Bad luck. Yes, the filesystem was mounted at /mnt/v2 before the crisis, but it is not mounted now. Let's continue after a reboot; remember to remove the mount point from fstab before rebooting.

root@centos8:~ # pvcreate --uuid YcxPqw-Kn9R-Sc1d-j8KS-tGLk-pD8t-qSxFyF --restorefile /etc/lvm/backup/xfsvg /dev/vdc
  WARNING: Couldn't find device with uuid YcxPqw-Kn9R-Sc1d-j8KS-tGLk-pD8t-qSxFyF.
  Physical volume "/dev/vdc" successfully created.
root@centos8:~ # vgcfgrestore xfsvg
  Restored volume group xfsvg.
root@centos8:~ # vgs
  VG     #PV #LV #SN Attr   VSize    VFree  
  datavg   1   1   0 wz--n-   <2.00g  <1.92g
  raidvg   3   3   0 wz--n-   <2.99g   2.73g
  rootvg   1   4   0 wz--n-  <19.75g <13.75g
  xfsvg    1   1   0 wz--n- 1020.00m  80.00m
root@centos8:~ # lvs
  LV      VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  v1      datavg -wi-ao----  80.00m                                                    
  mirror  raidvg rwi-a-r---  40.00m                                    100.00          
  plain   raidvg -wi-a-----  40.00m                                                    
  raid5lv raidvg rwi-a-r---  80.00m                                    100.00          
  slash   rootvg -wi-ao----   3.00g                                                    
  swap    rootvg -wi-ao----   1.00g                                                    
  var     rootvg -wi-ao----   1.00g                                                    
  var_log rootvg -wi-ao----   1.00g                                                    
  v2      xfsvg  -wi------- 940.00m                                                    
root@centos8:~ # vgchange -ay xfsvg
  1 logical volume(s) in volume group "xfsvg" now active
root@centos8:~ # mount /dev/xfsvg/v2 /mnt/v2

The pvcreate command with the restore options creates an LVM PV header matching the previous configuration. The subsequent vgcfgrestore command restores the VG metadata into this PV header.

BMR

This scenario is about restoring from a file-level backup. Suppose you need to rebuild a server completely from scratch, and you do not even know which VGs and LVs existed or what size they were. You must first restore the /etc/lvm/backup directory from the tape backup, then recreate the LVM infrastructure, the filesystems and mounts, and proceed with the full restore (a rough sketch follows). I wrote about this in detail in the article Bare Metal Restore (BMR) using bareos file level backup.
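
In outline, the LVM part of such a restore uses the same recovery commands shown above, driven entirely by the restored backup files. The device name and UUID below are placeholders taken from /etc/lvm/backup/<vgname>; filesystem parameters are not stored in the LVM metadata, so the filesystems must be recreated separately and then filled from the file-level backup:

# pvcreate --uuid <uuid-from-backup-file> --restorefile /etc/lvm/backup/datavg /dev/vdX
# vgcfgrestore datavg
# vgchange -ay datavg
# mkfs.ext4 /dev/datavg/v1
# mount /dev/datavg/v1 /mnt/v1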


Updated on Mon Oct 5 22:27:45 IDT 2020