Install the ZFS software as explained on the ZFS on Linux site. For these tests I will use the platform created during the Redundant disks without MDRAID POC. Fedora 25 is installed there, so I followed this instruction to install ZFS.
After installation, you will have a number of disabled services:
[root@lvmraid ~]# systemctl list-unit-files | grep zfs
zfs-import-cache.service                   disabled
zfs-import-scan.service                    disabled
zfs-mount.service                          disabled
zfs-share.service                          disabled
zfs-zed.service                            disabled
zfs.target                                 disabled
Enable a few of them. I will not use the ZFS sharing services, only mounting.
[root@lvmraid ~]# systemctl enable zfs.target
[root@lvmraid ~]# systemctl enable zfs-mount.service
[root@lvmraid ~]# systemctl enable zfs-import-cache.service
[root@lvmraid ~]# systemctl start zfs-mount.service
Job for zfs-mount.service failed because the control process exited with error code.
See "systemctl status zfs-mount.service" and "journalctl -xe" for details.
[root@lvmraid ~]# zpool status
The ZFS modules are not loaded.
Try running '/sbin/modprobe zfs' as root to load them.
[root@lvmraid ~]# modprobe zfs
[root@lvmraid ~]# zpool status
no pools available
It looks like the services do not load the ZFS kernel module at startup. Studying the sources shows that the module will be loaded automatically once a zpool is defined. Instead of relying on that, we will force loading of the zfs module at every boot, following Fedora's recommendations:
[root@lvmraid ~]# cat > /etc/sysconfig/modules/zfs.modules << EOFcat
#!/bin/sh
exec /usr/sbin/modprobe zfs
EOFcat
[root@lvmraid ~]# chmod 755 /etc/sysconfig/modules/zfs.modules
[root@lvmraid ~]# reboot
The zfs-auto-snapshot script helps automate scheduled snapshot creation and retention. I highly recommend installing this tool on production systems.
[root@lvmraid ~]# wget https://github.com/zfsonlinux/zfs-auto-snapshot/archive/master.zip
[root@lvmraid ~]# unzip master.zip
[root@lvmraid ~]# cd zfs-auto-snapshot-master/
[root@lvmraid zfs-auto-snapshot-master]# make install
[root@lvmraid zfs-auto-snapshot-master]# cd
[root@lvmraid ~]# rm -f /etc/cron.d/zfs-auto-snapshot /etc/cron.hourly/zfs-auto-snapshot \
    /etc/cron.weekly/zfs-auto-snapshot /etc/cron.monthly/zfs-auto-snapshot
ZFS snapshots use redirect-on-write technology (often mistakenly called copy-on-write). The default schedule of the just-installed scripts creates a lot of snapshots which in practice do not help much but consume considerable resources: the oldest snapshot can take up a lot of disk space, and rotating the frequent tiny snapshots costs CPU time. So I keep only the daily snapshots and have deleted the rest of the schedule.
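After this cleanup, only the daily job remains. For reference, the remaining /etc/cron.daily/zfs-auto-snapshot script looks roughly like this (the exact options depend on the version you installed):

# cat /etc/cron.daily/zfs-auto-snapshot
#!/bin/sh
# keep 31 daily snapshots of every dataset that has not opted out
# via the com.sun:auto-snapshot property ('//' means all datasets)
exec zfs-auto-snapshot --quiet --syslog --label=daily --keep=31 //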
The ZFS pool is the place where file systems (or volumes) are created. The pool spreads data between the physical disks and takes care of redundancy. Although you can create a pool without any redundancy, this is not common. We will create a RAID5-like (raidz) pool from the third partition of each of our disks.
NOTE: If you have many disks and want to control the size of each raid group, simply insert the keyword "raidz" again after the desired number of disks to start a new group.
NOTE: Almost always use the option -o ashift=12, which sets a 4K I/O block size for the pool. The default value of 9 corresponds to a 512-byte block size, which can cause serious performance degradation on modern disks.
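To verify which ashift a pool actually got, something like the following should work (the pool-property form is available on newer OpenZFS releases; zdb reads the value from the cached pool configuration):

# zpool get ashift export
# zdb -C export | grep ashift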
I used the -m none option so that the pool itself is not mounted. "export" is the name of the created pool; I plan to mount its file systems under the /export hierarchy, hence the name.
[root@lvmraid ~]# zpool create -o ashift=12 -m none export raidz /dev/vd?3
[root@lvmraid ~]# zpool list export -v
NAME       SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
export    11.9T   412K  11.9T         -     0%     0%  1.00x  ONLINE  -
  raidz1  11.9T   412K  11.9T         -     0%     0%
    vda3      -      -      -         -      -      -
    vdb3      -      -      -         -      -      -
    vdc3      -      -      -         -      -      -
    vdd3      -      -      -         -      -      -
[root@lvmraid ~]# zpool status export -v
  pool: export
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        export      ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            vda3    ONLINE       0     0     0
            vdb3    ONLINE       0     0     0
            vdc3    ONLINE       0     0     0
            vdd3    ONLINE       0     0     0

errors: No known data errors
[root@lvmraid ~]# zpool history
History for 'export':
2017-06-25.17:11:49 zpool create -o ashift=12 -m none export raidz /dev/vda3 /dev/vdb3 /dev/vdc3 /dev/vdd3
[root@lvmraid ~]#
The last command is very useful if you rarely deal with ZFS or share this duty with someone else. Another useful command is zpool iostat:
[root@lvmraid ~]# zpool iostat export -v 5
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
export       412K  11.9T      0      0      0    470
  raidz1     412K  11.9T      0      0      0    470
    vda3        -      -      0      0     17  1.20K
    vdb3        -      -      0      0     17  1.19K
    vdc3        -      -      0      0     17  1.20K
    vdd3        -      -      0      0     17  1.20K
----------  -----  -----  -----  -----  -----  -----
You can check the version of ZFS and the enabled features with the zpool upgrade -v command.
Here are more examples of pool creation taken from other sources. This is an example of a RAID10 pool:
# zpool create -o ashift=12 -m none pool10 mirror /dev/sdc /dev/sdd mirror /dev/sde /dev/sdf spare /dev/sdg
# zpool status pool10
  pool: pool10
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        pool10      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
        spares
          sdg       AVAIL

errors: No known data errors
When you have many disks, combining all of them into one raid group can hurt performance. It is wise to split them into smaller groups. This is an example of a ZFS pool created from three raid groups with double parity (6 data + 2 parity disks each) and no spare disks.
# zpool create -o ashift=12 -m none internal \
    raidz2 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj \
    raidz2 /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr \
    raidz2 /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx /dev/sdy /dev/sdz
# zpool status internal
  pool: internal
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        internal    ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            sdc     ONLINE       0     0     0
            sdd     ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            sdh     ONLINE       0     0     0
            sdi     ONLINE       0     0     0
            sdj     ONLINE       0     0     0
          raidz2-1  ONLINE       0     0     0
            sdk     ONLINE       0     0     0
            sdl     ONLINE       0     0     0
            sdm     ONLINE       0     0     0
            sdn     ONLINE       0     0     0
            sdo     ONLINE       0     0     0
            sdp     ONLINE       0     0     0
            sdq     ONLINE       0     0     0
            sdr     ONLINE       0     0     0
          raidz2-2  ONLINE       0     0     0
            sds     ONLINE       0     0     0
            sdt     ONLINE       0     0     0
            sdu     ONLINE       0     0     0
            sdv     ONLINE       0     0     0
            sdw     ONLINE       0     0     0
            sdx     ONLINE       0     0     0
            sdy     ONLINE       0     0     0
            sdz     ONLINE       0     0     0

errors: No known data errors
When a ZFS pool is built from virtual disks, files or external storage LUNs, it is possible to resize the pool online by increasing the underlying disk size. Set the pool option autoexpand to "on" (it is off by default) and make zpool rescan the disk online:
# zpool set autoexpand=on zpoolname
# zpool get autoexpand zpoolname
NAME       PROPERTY    VALUE   SOURCE
zpoolname  autoexpand  on      local
# zpool online zpoolname /dev/datavg/zfs
ZFS performance depends on the pool structure. A large raidz group usually performs badly, because almost any I/O causes a full stripe to be read or rewritten. Adding a log and a cache on a fast single device can improve performance.
# lvcreate -n zfslog -L128m rootvg
# lvcreate -n zfscache -L128g rootvg
# zpool add z1 log /dev/rootvg/zfslog
# zpool add z1 cache /dev/rootvg/zfscache
It is possible to add, remove or resize the cache without interruption:
# zpool remove z1 /dev/rootvg/zfscache
# lvresize -L256g /dev/rootvg/zfscache
# zpool add z1 cache /dev/rootvg/zfscache
[root@lvmraid ~]# zfs create -o mountpoint=/export/data export/data
[root@lvmraid ~]# df -hP
Filesystem      Size  Used Avail Use% Mounted on
..
export/data     8.7T  128K  8.7T   1% /export/data
This is an example of creating a simple file system. The file system is mounted automatically when the mountpoint option is specified. If you want to mount ZFS manually using the mount command, you can set a so-called "legacy" mountpoint, like this:
# zfs set mountpoint=legacy export/data
# mount.zfs export/data /mnt
Another useful file-system option on Linux is -o acltype=posixacl -o xattr=sa, which adds support for POSIX ACLs as extended attributes and embeds that information in the same I/O call.
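For example, these options could be set at creation time or changed later with zfs set (shown here against the export/data file system used in this article):

# zfs create -o acltype=posixacl -o xattr=sa export/data
# zfs set acltype=posixacl export/data    # or switch an existing FS
# zfs set xattr=sa export/data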
You can check the attributes by using the zfs get command:
# zfs get all export/data
NAME         PROPERTY  VALUE  SOURCE
..
Check whether encryption is supported and enabled for your zpool:
# zpool get feature@encryption zpoolname
NAME       PROPERTY            VALUE   SOURCE
zpoolname  feature@encryption  active  local
If it is supported but disabled, enable it:
# zpool set feature@encryption=enabled zpoolname
Finally, create the encrypted fileset:
# zfs create -o encryption=on -o keyformat=passphrase -o keylocation=prompt -o mountpoint=/mnt/home zpoolname/home
Let's create the very first snapshot for this FS. It is totally useless, because there is no data in the FS yet, but it will demonstrate how snapshots consume disk space.
[root@lvmraid ~]# zfs snap export/data@initial
In this command, the part after the @ sign ("initial") is the desired snapshot name, and the part before it ("export/data") is the file system being snapshotted.
Now, let's copy some data into the FS, then take another snapshot:
[root@lvmraid ~]# rsync -a /etc /export/data/
[root@lvmraid ~]# df -hP /export/data/
Filesystem      Size  Used Avail Use% Mounted on
export/data     8.7T   18M  8.7T   1% /export/data
[root@lvmraid ~]# zfs snap export/data@etc_copied
[root@lvmraid ~]# zfs list export/data -o space
NAME         AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
export/data  8.65T  17.4M     22.4K   17.4M              0          0
[root@lvmraid ~]# zfs list -r export/data -t snapshot -o space
NAME                    AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
export/data@initial         -  22.4K         -       -              -          -
export/data@etc_copied      -      0         -       -              -          -
As you can see, the new data (18M according to "df") does not affect the disk usage of the snapshots; a snapshot only holds data that has since been deleted or overwritten. Let's delete something to demonstrate.
[root@lvmraid ~]# rm -rf /export/data/etc
[root@lvmraid ~]# df -hP /export/data
Filesystem      Size  Used Avail Use% Mounted on
export/data     8.7T  128K  8.7T   1% /export/data
[root@lvmraid ~]# zfs list export/data -o space
NAME         AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
export/data  8.65T  17.4M     17.4M   25.4K              0          0
[root@lvmraid ~]# zfs list -r export/data -t snapshot -o space
NAME                    AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
export/data@initial         -  22.4K         -       -              -          -
export/data@etc_copied      -  17.4M         -       -              -          -
The amount of deleted data is subtracted from the live data of the FS (as shown by "df") and added to the snapshot usage (the "USEDSNAP" column). The more detailed listing shows that this space belongs to the snapshot "etc_copied". The initial snapshot still uses almost no space, because the deleted data did not yet exist when that snapshot was created.
You can roll the whole FS back only to the latest snapshot. If you want to revert to an older snapshot, you have to remove the newer snapshots first.
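As a shortcut, the -r flag of zfs rollback destroys the newer snapshots for you. A sketch only (running it here would discard the etc_copied snapshot, which the next example still needs):

# zfs rollback -r export/data@initial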
[root@lvmraid ~]# zfs rollback export/data@etc_copied
[root@lvmraid ~]# df -hP /export/data
Filesystem      Size  Used Avail Use% Mounted on
export/data     8.7T   18M  8.7T   1% /export/data
[root@lvmraid ~]# zfs list export/data -o space
NAME         AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
export/data  8.65T  17.4M     23.9K   17.4M              0          0
[root@lvmraid ~]# zfs list -r export/data -t snapshot -o space
NAME                    AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
export/data@initial         -  22.4K         -       -              -          -
export/data@etc_copied      -  1.50K         -       -              -          -
The FS was reverted, and the snapshot's disk usage turned back into live data usage.
Another great command when working with ZFS snapshots:
[root@lvmraid ~]# rm /export/data/etc/passwd
rm: remove regular file '/export/data/etc/passwd'? y
[root@lvmraid ~]# zfs snap export/data@passwd_removed
[root@lvmraid ~]# zfs diff export/data@etc_copied export/data@passwd_removed
M       /export/data/etc
-       /export/data/etc/passwd
-       /export/data/etc/passwd/<xattrdir>
-       /export/data/etc/passwd/<xattrdir>/security.selinux
The listing above explains itself. zfs diff is a very nice command!
Let's restore one file from the snapshot, copying it back:
[root@lvmraid ~]# cd /export/data/.zfs/snapshot
[root@lvmraid snapshot]# ll
total 0
dr-xr-xr-x. 1 root root 0 Jun 25 20:32 etc_copied
dr-xr-xr-x. 1 root root 0 Jun 25 20:32 initial
dr-xr-xr-x. 1 root root 0 Jun 25 20:32 passwd_removed
[root@lvmraid snapshot]# rsync -av etc_copied/etc/passwd /export/data/etc/
sending incremental file list
passwd

sent 1,182 bytes  received 35 bytes  2,434.00 bytes/sec
total size is 1,090  speedup is 0.90
[root@lvmraid snapshot]# df -hP
Filesystem              Size  Used Avail Use% Mounted on
..
export/data             8.7T   18M  8.7T   1% /export/data
export/data@etc_copied  8.7T   18M  8.7T   1% /export/data/.zfs/snapshot/etc_copied
[root@lvmraid snapshot]# zfs diff export/data@etc_copied
M       /export/data/etc
-       /export/data/etc/passwd
-       /export/data/etc/passwd/<xattrdir>
-       /export/data/etc/passwd/<xattrdir>/security.selinux
+       /export/data/etc/passwd
+       /export/data/etc/passwd/<xattrdir>
+       /export/data/etc/passwd/<xattrdir>/security.selinux
The hidden .zfs directory automatically mounts the required snapshot for you, so you can copy a single file from there. The "zfs diff" command proves that this is not a real revert: the snapshot still holds the deleted data blocks, and the file (with exactly the same name and metadata) is created anew in fresh data blocks of the FS.
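By default the .zfs directory does not show up in directory listings (snapdir=hidden). If you prefer to see it with a plain ls, you can flip the property; a small sketch:

# zfs set snapdir=visible export/data
# ls -a /export/data    # .zfs is now listed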
First, we find all the snapshots belonging to the FS that we want to clone.
TIP: The zfs list command can be very slow on a loaded system. A much faster way to check the names of snapshots is to list the .zfs/snapshot pseudo directory.
[root@lvmraid ~]# zfs list -r export/data -t snapshot
NAME                         USED  AVAIL  REFER  MOUNTPOINT
export/data@initial         22.4K      -  25.4K  -
export/data@etc_copied      35.2K      -  17.4M  -
export/data@passwd_removed  28.4K      -  17.4M  -
[root@lvmraid ~]# zfs clone -o mountpoint=/clone/data export/data@etc_copied export/data_clone
[root@lvmraid ~]# df -hP
..
export/data        8.7T   18M  8.7T   1% /export/data
export/data_clone  8.7T   18M  8.7T   1% /clone/data
The clone was then created using the snapshot as its basis. And, of course, you can mount it somewhere else.
It is not obvious which FS is a clone and which snapshot is its base. Here is one way to find out:
[root@lvmraid ~]# zfs list -o name,origin,clones -r -t snapshot export/data
NAME                        ORIGIN  CLONES
export/data@initial         -
export/data@etc_copied      -       export/data_clone
export/data@passwd_removed  -
[root@lvmraid ~]# zfs list -o name,origin,clones export/data_clone
NAME               ORIGIN                  CLONES
export/data_clone  export/data@etc_copied  -
ZFS has an interesting feature that I will demonstrate here:
[root@lvmraid ~]# zfs destroy -r export/data
cannot destroy 'export/data': filesystem has dependent clones
use '-R' to destroy the following datasets:
export/data_clone
[root@lvmraid ~]# zfs promote export/data_clone
[root@lvmraid ~]# zfs list -o name,origin,clones export/data
NAME         ORIGIN                        CLONES
export/data  export/data_clone@etc_copied  -
[root@lvmraid ~]# zfs list -o name,origin,clones export/data_clone -r -t snapshot
NAME                          ORIGIN  CLONES
export/data_clone@initial     -
export/data_clone@etc_copied  -       export/data
As a result of the promote command, the clone and its base switched roles, and the clone inherited all the previous snapshots. Now it is possible to remove the original FS:
[root@lvmraid ~]# zfs destroy -r export/data
[root@lvmraid ~]# zfs list -o name,origin,clones export/data_clone -r -t snapshot
NAME                          ORIGIN  CLONES
export/data_clone@initial     -
export/data_clone@etc_copied  -
We will use the same ZFS system as both origin and target, so the sending process is simply piped into the receiving process. You can also replicate data to another ZFS system over the network: SSH can be used as the channel if you want protection on the wire, or netcat if you want maximum copy efficiency.
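A rough sketch of the netcat variant, assuming a host named dr-host, a free TCP port 9000 and a trusted network (netcat gives no encryption or authentication, and the -l syntax differs between netcat implementations):

# on the receiving side (dr-host):
nc -l 9000 | zfs recv -u dr/data
# on the sending side:
zfs send export/data@etc_copied | nc dr-host 9000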
First we need to select the snapshot to start from:
[root@lvmraid ~]# zfs list -r -t snapshot export/data_clone
NAME                           USED  AVAIL  REFER  MOUNTPOINT
export/data_clone@initial     22.4K      -  25.4K  -
export/data_clone@etc_copied  22.4K      -  17.4M  -
[root@lvmraid ~]# zfs send -R export/data_clone@etc_copied | zfs recv -v export/data
receiving full stream of export/data_clone@initial into export/data@initial
received 39.9KB stream in 1 seconds (39.9KB/sec)
receiving incremental stream of export/data_clone@etc_copied into export/data@etc_copied
received 20.0MB stream in 1 seconds (20.0MB/sec)
cannot mount '/clone/data': directory is not empty
[root@lvmraid ~]# zfs set mountpoint=/export/data export/data
[root@lvmraid ~]# zfs mount export/data
[root@lvmraid ~]# df -hP
..
export/data_clone  8.7T   18M  8.7T   1% /clone/data
export/data        8.7T   18M  8.7T   1% /export/data
[root@lvmraid ~]# zfs list -r -t snapshot export/data
NAME                     USED  AVAIL  REFER  MOUNTPOINT
export/data@initial     22.4K      -  25.4K  -
export/data@etc_copied      0      -  17.4M  -
The mount point received in the stream (/clone/data) was already occupied by the original FS, so I changed it to another one, and the mount succeeded.
As you can see, the FS was copied completely, including the contents of its snapshots. This is a good way to transfer data from one ZFS system to another. Incremental copies are supported as well.
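A minimal sketch of such an incremental transfer, assuming the target side already holds the @initial snapshot (the dr-storage host and dr pool names are just placeholders):

# zfs send -i export/data_clone@initial export/data_clone@etc_copied | \
      ssh dr-storage zfs recv -u dr/data_clone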
Replication to a remote server can be piped through SSH, which, besides encrypting the traffic, solves the authentication and permission problem. For example, you could have a user called prodzfs on your DR storage that the root user can ssh to without a password, using an authorized key. The replication script then contains something similar to:
..
# Remote zfs command:
ZFS=" ssh prodzfs@dr-storage sudo zfs"
Give it sudo permissions, of course.
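A minimal sketch of such a sudoers entry on the DR side (the user name and zfs path follow the example above; adjust to your system):

# /etc/sudoers.d/prodzfs
prodzfs ALL=(root) NOPASSWD: /usr/sbin/zfs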
When initializing big file systems, an interruption can happen: a network disconnect, a power failure, and so on. Always use the -s option when initializing big file systems for replication. It allows resuming the replication from the point of failure instead of starting over. Let's see an example:
# zfs send z1/iso@backupScript_202008282355 | $ZFS receive -s -F -u dr/iso
^C
# $ZFS get receive_resume_token dr/iso
NAME          PROPERTY              VALUE                              SOURCE
internal/iso  receive_resume_token  1-f69e398fb-d0-2KilometerLongLine  -
# zfs send -t 1-f69e398fb-d0-2KilometerLongLine | $ZFS receive -s dr/iso@backupScript_202008282355
Once an interruption occurs (simulated here by pressing Ctrl-C), you have to fetch the resume token from the remote site. Note that $ZFS here implements the tip shown above. The operation can then be resumed by referencing this token. However, the receive side must still name the originating snapshot, probably because this information is missing from the input stream.
Here is an example of an incremental update of the remote site:
prodstorage:~ # cat /root/bin/zfs-resync.sh
#!/bin/bash
# 3 3 * * * [ -x /root/bin/zfs-resync.sh ] && /root/bin/zfs-resync.sh |& logger -t zfs-resync

# Local pool:
LP=z1
# Remote pool:
RP="dr"
# Remote zfs command:
ZFS=" ssh prodzfs@drstorage sudo zfs"
# Filesystems to resync:
FS=" home iso public www "

for F in $FS ; do
        # get last (in list, not in time) remote snap name:
        FROMSNAP=$($ZFS list -H -t snapshot -o name -r ${RP}/${F} | awk -F"@" 'END {print $2}')
        [ -z "$FROMSNAP" ] && continue
        # get last (in list, not in time) local snap name:
        TOSNAP=$(/usr/sbin/zfs list -H -t snapshot -o name -r ${LP}/${F} | awk -F"@" 'END {print $2}')
        [ -z "$TOSNAP" ] && continue
        [ "$FROMSNAP" = "$TOSNAP" ] && continue
        # send incremental stream:
        echo "/usr/sbin/zfs send -R -I ${LP}/${F}@${FROMSNAP} ${LP}/${F}@${TOSNAP} | $ZFS receive -u ${RP}/${F}"
        /usr/sbin/zfs send -R -I ${LP}/${F}@${FROMSNAP} ${LP}/${F}@${TOSNAP} | $ZFS receive -u ${RP}/${F}
done
Global overview
# zpool list
NAME     SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
noname   299G   216G  82.7G        -         -    58%    72%  1.00x  ONLINE  -

The most interesting value is "CAP". When it exceeds 85%, overall ZFS performance drops. The other value, "FRAG", is for informational purposes only. It is not about data fragmentation, which is impossible by ZFS design; it indicates the fragmentation of the free space and generally reflects the usage and maturity of that zpool. There is nothing you can do about this value; the only cure is to replicate the data to another pool and recreate the original. However, after a while the value will creep back, because it reflects the usage pattern.
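As a small sketch of how that 85% threshold could be watched from cron (the awk filter is just one possible way to strip the percent sign):

# zpool list -H -o name,capacity | awk '{ c=$2; sub("%","",c); if (c+0 > 85) print $1" is "c"% full" }'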
You can get a more detailed view with the -v option:
# zpool list -v
NAME         SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
internal    26.2T  19.4T  6.79T        -         -    10%    74%  1.00x  ONLINE  -
  raidz2-0  8.72T  6.45T  2.27T        -         -    10%  74.0%      -  ONLINE
    sdb     1.09T      -      -        -         -      -      -      -  ONLINE
    sdc     1.09T      -      -        -         -      -      -      -  ONLINE
    sdd     1.09T      -      -        -         -      -      -      -  ONLINE
    sde     1.09T      -      -        -         -      -      -      -  ONLINE
    sdf     1.09T      -      -        -         -      -      -      -  ONLINE
    sdg     1.09T      -      -        -         -      -      -      -  ONLINE
    sdh     1.09T      -      -        -         -      -      -      -  ONLINE
    sdi     1.09T      -      -        -         -      -      -      -  ONLINE
  raidz2-1  8.72T  6.47T  2.25T        -         -    10%  74.2%      -  ONLINE
    sdj     1.09T      -      -        -         -      -      -      -  ONLINE
    sdk     1.09T      -      -        -         -      -      -      -  ONLINE
    sdl     1.09T      -      -        -         -      -      -      -  ONLINE
    sdm     1.09T      -      -        -         -      -      -      -  ONLINE
    sdn     1.09T      -      -        -         -      -      -      -  ONLINE
    sdo     1.09T      -      -        -         -      -      -      -  ONLINE
    sdp     1.09T      -      -        -         -      -      -      -  ONLINE
    sdq     1.09T      -      -        -         -      -      -      -  ONLINE
  raidz2-2  8.72T  6.45T  2.27T        -         -    11%  74.0%      -  ONLINE
    sdr     1.09T      -      -        -         -      -      -      -  ONLINE
    sds     1.09T      -      -        -         -      -      -      -  ONLINE
    sdt     1.09T      -      -        -         -      -      -      -  ONLINE
    sdu     1.09T      -      -        -         -      -      -      -  ONLINE
    sdv     1.09T      -      -        -         -      -      -      -  ONLINE
    sdw     1.09T      -      -        -         -      -      -      -  ONLINE
    sdx     1.09T      -      -        -         -      -      -      -  ONLINE
    sdy     1.09T      -      -        -         -      -      -      -  ONLINE

This output adds details about the RAIDZ groups. However, there is no per-disk filling information.
Let's drill down into the zpool usage by its filesets.
# zfs list
NAME                 USED  AVAIL  REFER  MOUNTPOINT
noname               216G  73.3G    24K  none
noname/SuseAcademy  28.3G  73.3G   896M  /mnt/SuseAcademy
noname/home          179G  73.3G  87.4G  /home

This is a quick overview of the filesets and the space they use. You must read it correctly: "REFER" shows how much space the current (live) data occupies, while "USED" is the total consumption of the fileset, including everything held beyond the live data (snapshots, children, reservations).
Let's add more details:
# zfs list noname/SuseAcademy -o space
NAME                AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
noname/SuseAcademy  73.3G  28.3G     27.5G    896M             0B         0B

As mentioned above, most of the space is taken up by snapshots ("USEDSNAP").
Let's find the snapshots taking up that space.
# zfs list noname/SuseAcademy -r -t all
NAME                                                      USED  AVAIL  REFER  MOUNTPOINT
noname/SuseAcademy                                       28.3G  73.3G   896M  /mnt/SuseAcademy
..
noname/SuseAcademy@zfs-auto-snap_daily-2023-09-03-1621      0B      -  28.3G  -
noname/SuseAcademy@zfs-auto-snap_daily-2023-09-04-1329      0B      -  28.3G  -
noname/SuseAcademy@zfs-auto-snap_daily-2023-09-05-0650      0B      -   896M  -
noname/SuseAcademy@zfs-auto-snap_daily-2023-09-06-0617      0B      -   896M  -
..

I removed several similar lines above and below the data change. The output shows that the deleted data does not appear in the snapshots' "USED" value; it shows up in "REFER". Around this time I deleted about 28 GB of outdated VM images, after which REFER decreased. If you want to see exactly what has changed, use the zfs diff command.
Let's pull out one disk. ZFS does not detect the problem until it accesses the disk, and then it shows:
[root@lvmraid ~]# zpool status
  pool: export
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        export      DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            vda3    ONLINE       0     0     0
            vdb3    ONLINE       0     0     0
            vdc3    UNAVAIL      1   432     0  corrupted data
            vdd3    ONLINE       0     0     0

errors: No known data errors
Now reconnect the disconnected disk. ZFS does not notice the reconnected disk by itself; it probably has to be told to rescan. I was too lazy to read the manual and simply rebooted the server. Everything returned to normal:
[root@lvmraid ~]# zpool status
  pool: export
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        export      ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            vda3    ONLINE       0     0     0
            vdb3    ONLINE       0     0     0
            vdc3    ONLINE       0     0    95
            vdd3    ONLINE       0     0     0

errors: No known data errors
[root@lvmraid ~]# zpool clear export
[root@lvmraid ~]# zpool status
  pool: export
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        export      ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            vda3    ONLINE       0     0     0
            vdb3    ONLINE       0     0     0
            vdc3    ONLINE       0     0     0
            vdd3    ONLINE       0     0     0

errors: No known data errors
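For the record, a reboot is probably not required here: telling ZFS to bring the device back online should trigger a resilver of the missed writes. A sketch, not taken from the session above:

# zpool online export vdc3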
Now, the hard part. I'm going to replace the disk with an empty one. First, we need to copy the partition table from one of the other disks, as described in Redundant disks without MDRAID.
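One possible way to copy the partition table, assuming GPT labels and the gdisk package installed (here vda is a healthy disk and vde is the new empty one):

# sgdisk -R=/dev/vde /dev/vda    # replicate the partition table of vda onto vde
# sgdisk -G /dev/vde             # randomize the GUIDs so the copy is unique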
Then fix ZFS by replacing the device:
[root@lvmraid ~]# zpool status
  pool: export
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        export      DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            vda3    ONLINE       0     0     0
            vdb3    ONLINE       0     0     0
            vdc3    UNAVAIL      1   333     0  corrupted data
            vdd3    ONLINE       0     0     0

errors: No known data errors
[root@lvmraid ~]# zpool replace export /dev/vdc3 /dev/vde3
[root@lvmraid ~]# zpool status
  pool: export
 state: ONLINE
  scan: resilvered 21.9M in 0h0m with 0 errors on Mon Jun 26 17:35:56 2017
config:

        NAME        STATE     READ WRITE CKSUM
        export      ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            vda3    ONLINE       0     0     0
            vdb3    ONLINE       0     0     0
            vde3    ONLINE       0     0     0
            vdd3    ONLINE       0     0     0

errors: No known data errors
The hard part turns out to be a piece of cake.