I have avoided using NFSv4 for a long time, but ignoring a problem does not make it go away. Let's see how to survive with NFSv4 on Linux, using CentOS 8 as an example.
NFSv3 does not support extended ACLs; it controls file access using the POSIX UID/GID scheme. This creates a problem on NFS clients: local users with mismatched UIDs cannot access files on an NFS share. Worse, it opens a security hole between such users, since a user with a given UID on one client can access files owned by a different user who happens to have the same UID elsewhere. In contrast, NFSv4 supports string-based extended ACLs. A file is owned by USER@DOMAIN (similar to Windows) instead of a UID/GID combination.
It follows from the changes above that the user now has to prove being the USER@DOMAIN they claim to be. NFSv4 implements a Kerberos-based authentication scheme for this, which makes it a natural fit for Active Directory. It is possible to make v4 work without Kerberos, at the cost of losing this functionality.
Another strange ability has been added to solve a nonexistent problem: the pseudo-root filesystem. This is best described with examples, which follow below.
Stateful version 4 is marketed as an advantage over stateless version 3. This notion looks ridiculous in the age of stateless microservices, which have proven their scalability and redundancy.
Fewer processes and fewer open ports are definitely a big plus for v4. There is no need for rpcbind and the rest of the family, and the new protocol is much more firewall-friendly.
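Once the v4-only server described below is configured, this is easy to verify: only TCP port 2049 should be listening for NFS. A quick check (a sketch; the exact output will vary):

nfs-server:~ # ss -tln | grep 2049
LISTEN 0      64           0.0.0.0:2049        0.0.0.0:*
LISTEN 0      64              [::]:2049           [::]:*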
Since this POC uses CentOS 8, I took Red Hat solution 3320581 as a starting point to set up an NFSv4-only server, so as not to mix its capabilities with other NFS versions. Install the nfs-utils RPM and edit /etc/nfs.conf to explicitly enable all v4 options and disable the others:
[nfsd]
vers2=n
vers3=n
vers4=y
vers4.0=y
vers4.1=y
vers4.2=y
The rpcbind family of helper services is not needed in v4, so it should be disabled. As a side effect, this makes NFSv3 client mounts impossible from that server, but in general a server is usually not a client.
nfs-server:~ # systemctl mask --now rpc-statd.service rpcbind.service rpcbind.socket
nfs-server:~ # systemctl enable --now nfs-server
Add something to exports:
nfs-server:~ # mkdir -p /export/project1/work
nfs-server:~ # cat /etc/exports
/export/project1 -sec=sys,no_root_squash,sync 192.168.120.0/24(ro)
nfs-server:~ # exportfs -a
nfs-server:~ # exportfs
/export/project1      192.168.120.0/24
nfs-server:~ # systemctl disable --now firewalld.service
Removed /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
Alternatively, instead of disabling the firewall, you can allow the NFS service on both the server and client side, for example:
# firewall-cmd --list-all
# firewall-cmd --add-service=nfs --permanent
# firewall-cmd --reload
On the client, install the same nfs-utils package and repeat the same /etc/nfs.conf configuration. Almost the same services need to be disabled/enabled:
nfs-client1:~ # yum install -y nfs-utils
nfs-client1:~ # vi /etc/nfs.conf
nfs-client1:~ # systemctl mask --now rpc-statd.service rpcbind.service rpcbind.socket
Created symlink /etc/systemd/system/rpc-statd.service → /dev/null.
Created symlink /etc/systemd/system/rpcbind.service → /dev/null.
Created symlink /etc/systemd/system/rpcbind.socket → /dev/null.
nfs-client1:~ # systemctl enable --now nfs-client.target
nfs-client1:~ # systemctl disable --now firewalld.service
Removed /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
nfs-client1:~ # mount -t nfs4 nfs-server:/export/project1 /mnt
nfs-client1:~ # mount
..
nfs-server:/export/project1 on /mnt type nfs4 (rw,relatime,vers=4.2,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.120.209,local_lock=none,addr=192.168.120.241)
NOTE: The handy "showmount" tool will not work here because it relies on the disabled RPC services.
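If you need to see what is mountable, one workaround is to mount the server's root and look around (a sketch, assuming the kernel server builds the v4 pseudo-root automatically, as it does on CentOS 8):

nfs-client1:~ # mkdir /tmp/probe
nfs-client1:~ # mount -t nfs4 nfs-server:/ /tmp/probe
nfs-client1:~ # ls -R /tmp/probe
nfs-client1:~ # umount /tmp/probe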
Now unmount the client, unexport everything, and switch to a pseudo-root layout:
nfs-client1:~ # umount /mnt

nfs-server:~ # exportfs -ua
nfs-server:~ # cat /etc/exports
/export/project1 -sec=sys,no_root_squash,sync,fsid=root 192.168.120.0/24(ro)
/export/project1/work -sec=sys,no_root_squash,sync nfs-client1(rw)
nfs-server:~ # exportfs -a
nfs-server:~ # exportfs
/export/project1/work      nfs-client1
/export/project1           192.168.120.0/24

nfs-client1:~ # mount -t nfs4 nfs-server:/ /mnt
nfs-client1:~ # df /mnt
Filesystem    Size  Used Avail Use% Mounted on
nfs-server:/  2.9G  1.2G  1.6G  42% /mnt
nfs-client1:~ # ll /mnt/
total 4
drwxr-xr-x. 2 root root 4096 Dec 29 09:50 work
What have we achieved here? The client is unaware of the server's filesystem structure and simply mounts /. This is probably safer; for example, an export path like "/vol/my_volume" would hint that the remote system is a NetApp. Let's play with this feature some more:
nfs-client1:~ # umount /mnt
nfs-client1:~ # mount -t nfs4 nfs-server:/work /mnt
nfs-client1:~ # ll /mnt
total 0
nfs-client1:~ # df /mnt
Filesystem        Size  Used Avail Use% Mounted on
nfs-server:/work  2.9G  1.2G  1.6G  42% /mnt
nfs-client1:~ # touch /mnt/client1.file
touch: cannot touch '/mnt/client1.file': Read-only file system
The last command demonstrates that the second line in /etc/exports did not work. After a few experiments, I discovered that the /etc/exports syntax does not always behave as the manual suggests. The actual configuration should always be verified with the exportfs -v command. For example:
nfs-server:~ # exportfs -v
/export/project1/work      nfs-client1(sync,wdelay,hide,no_subtree_check,sec=sys,ro,secure,no_root_squash,no_all_squash)
/export/project1           192.168.120.0/24(sync,wdelay,hide,no_subtree_check,fsid=0,sec=sys,ro,secure,no_root_squash,no_all_squash)
As you can see, nfs-client1 got ro permissions even though it was explicitly set to rw. The working setup looks like this:
nfs-server:~ # cat /etc/exports
/export/project1 -sec=sys,no_root_squash,sync,fsid=root 192.168.120.0/24(ro)
/export/project1/work -sec=sys,no_root_squash,sync,rw nfs-client1
Let's go back to the pseudo-root filesystem. Once /export/project1 has been set as the root fsid, that path is subtracted from the rest of the exports. The client cannot mount /export/project1/work directly, only /work.
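To illustrate (the exact error text may vary between nfs-utils versions), mounting the full path now fails while the pseudo path succeeds:

nfs-client1:~ # mount -t nfs4 nfs-server:/export/project1/work /mnt
mount.nfs4: mounting nfs-server:/export/project1/work failed, reason given by server: No such file or directory
nfs-client1:~ # mount -t nfs4 nfs-server:/work /mnt
nfs-client1:~ # umount /mnt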
Another bizarre behavior: if the exported "work" is a plain directory inside "/export/project1", the parent's rule is applied to it and the explicitly specified child rule is ignored. You have to mount a separate filesystem at this directory for the child rule to take effect. That is why many examples found on the Internet use the pseudo "mount -o bind" trick, as sketched below.
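For reference, the bind-mount approach looks roughly like this (a sketch; /srv/work1 is a hypothetical path holding the real data):

nfs-server:~ # mkdir -p /export/project1/work
nfs-server:~ # mount -o bind /srv/work1 /export/project1/work
nfs-server:~ # echo '/srv/work1 /export/project1/work none bind 0 0' >> /etc/fstab

Instead of bind mounts, I built the following structure of dedicated filesystems on the NFS server for my tests: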
nfs-server:~ # df | grep export
/dev/mapper/rootvg-export     976M  2.6M  907M   1% /export
/dev/mapper/rootvg-project1   976M  2.6M  907M   1% /export/project1
/dev/mapper/rootvg-work1      976M  2.6M  907M   1% /export/project1/work
/dev/mapper/rootvg-project2   976M  2.6M  907M   1% /export/project2
/dev/mapper/rootvg-work2      976M  2.6M  907M   1% /export/project2/work
And in the end, everything worked as intended:
nfs-client1:~ # mount -t nfs4 nfs-server:/ /mnt
nfs-client1:~ # mount -t nfs4 nfs-server:/work /mnt/work
nfs-client1:~ # df
..
nfs-server:/      976M  2.5M  907M   1% /mnt
nfs-server:/work  976M  2.5M  907M   1% /mnt/work
nfs-client1:~ # touch /mnt/client1.file
touch: cannot touch '/mnt/client1.file': Read-only file system
nfs-client1:~ # touch /mnt/work/client1.file
nfs-client1:~ # ls -l /mnt/work/client1.file
-rw-r--r--. 1 root root 0 Dec 29 14:37 /mnt/work/client1.file
nfs-client1:~ # umount /mnt/work
nfs-client1:~ # umount /mnt
Let's check out another use of this pseudo-root feature. Imagine that one server family belongs to project1 and another server family belongs to project2. The exports file looks like this:
nfs-server:~ # cat /etc/exports
/export/project1 -sec=sys,no_root_squash,sync,fsid=root,ro nfs-client1
/export/project1/work -sec=sys,no_root_squash,sync,rw nfs-client1
/export/project2 -sec=sys,no_root_squash,sync,fsid=root,ro nfs-client2
/export/project2/work -sec=sys,no_root_squash,sync,rw nfs-client2
There are two NFS exports marked as the root fsid. This is fine as long as they are exported to different clients. Let's check on both clients:
nfs-client1:~ # mount -t nfs4 nfs-server:/ /mnt
nfs-client1:~ # mount -t nfs4 nfs-server:/work /mnt/work
nfs-client1:~ # ll /mnt/work/
total 16
-rw-r--r--. 1 root root     0 Dec 29 14:37 client1.file
drwx------. 2 root root 16384 Dec 29 11:50 lost+found

nfs-client2:~ # mount -t nfs4 nfs-server:/ /mnt
nfs-client2:~ # mount -t nfs4 nfs-server:/work /mnt/work
nfs-client2:~ # ll /mnt/work/
total 16
drwx------. 2 root root 16384 Dec 29 11:50 lost+found
Oh! It might finally be useful. You can manage your production and test environments the same way, keeping the differences only on the NFS server side.
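For example, every client could carry identical mount entries in /etc/fstab, no matter which project it belongs to (a sketch):

nfs-client1:~ # grep nfs-server /etc/fstab
nfs-server:/      /mnt       nfs4  defaults  0 0
nfs-server:/work  /mnt/work  nfs4  defaults  0 0

The same two lines on nfs-client2 would transparently deliver the project2 data instead.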
I found some examples of how to configure NFSv4 to work with Kerberos. The most solid explanations come from Microsoft. It looks as if NFSv4 was sponsored by Microsoft to bring the UNIX universe closer.
As for the bottom line: if you want to use file ownership in the USER@DOMAIN form, as designed, you need to implement sec=krb5. You have to create service accounts in AD for both the NFS server and the client, generate keytabs for them, and deploy them accordingly. Only then can both of them talk to the KDC to verify user information.
This looks very useful when one of the NFS servers or clients is a Microsoft Windows server. But this is overkill for NFS services between two Linux servers.
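I did not build this end to end, but the configuration would take roughly the following shape (a sketch, assuming keytabs are already deployed on both machines; on CentOS 8 the server side relies on gssproxy, while the client needs rpc-gssd):

nfs-server:~ # cat /etc/exports
/export/project1 -sec=krb5,sync,fsid=root nfs-client1
nfs-server:~ # exportfs -ra

nfs-client1:~ # systemctl enable --now rpc-gssd
nfs-client1:~ # mount -t nfs4 -o sec=krb5 nfs-server:/ /mnt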
The sec=sys option, used throughout this article, returns the ability to use the UID/GID pair between the NFS server and client.
Let's investigate how it works:
nfs-server:~ # useradd -g users -u 1000 user1000
nfs-server:~ # useradd -g users -u 1001 user1001
However, on both clients, we will mix users up:
nfs-client1:~ # useradd -g users -u 1000 bob
nfs-client1:~ # useradd -g users -u 1001 alice

nfs-client2:~ # useradd -g users -u 1000 alice
nfs-client2:~ # useradd -g users -u 1001 bob
For this experiment we don't need two sets of exports; one set will be shared by both clients:
nfs-server:~ # exportfs -ua
nfs-server:~ # cat /etc/exports
/export/project1 -sec=sys,no_root_squash,sync,fsid=root,ro nfs-client1 nfs-client2
/export/project1/work -sec=sys,no_root_squash,sync,rw nfs-client1 nfs-client2
nfs-server:~ # exportfs -a
nfs-server:~ # exportfs -v
/export/project1           nfs-client1(sync,wdelay,hide,no_subtree_check,fsid=0,sec=sys,ro,secure,no_root_squash,no_all_squash)
/export/project1           nfs-client2(sync,wdelay,hide,no_subtree_check,fsid=0,sec=sys,ro,secure,no_root_squash,no_all_squash)
/export/project1/work      nfs-client1(sync,wdelay,hide,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash)
/export/project1/work      nfs-client2(sync,wdelay,hide,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash)
Finally, let's mount on both clients and run some tests:
nfs-client2:~ # mount -t nfs4 nfs-server:/ /mnt
nfs-client2:~ # mount -t nfs4 nfs-server:/work /mnt/work
nfs-client2:~ # ll /mnt/work/client1.file
-rw-r--r--. 1 root root 0 Dec 29 14:37 /mnt/work/client1.file

root@nfs-client1:~ # chown bob:users /mnt/work/client1.file
root@nfs-client1:~ # ll /mnt/work/client1.file
-rw-r--r--. 1 bob users 0 Dec 29 14:37 /mnt/work/client1.file

root@nfs-client2:~ # ll /mnt/work/client1.file
-rw-r--r--. 1 alice users 0 Dec 29 14:37 /mnt/work/client1.file

root@nfs-server:~ # ll /export/project1/work/client1.file
-rw-r--r--. 1 user1000 users 0 Dec 29 14:37 /export/project1/work/client1.file
As expected, switching back to the old UID/GID scheme brings back the same old issues: the UID and GID must be identical for all actors in the NFS play. And of course, there are no extended ACLs in this scheme. This is probably a limitation of the built-in Linux server, because a quick search shows that the NetApp and Ganesha NFS servers support ACLs with sec=sys. However, I have not tested this yet.
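For completeness, on servers that do support v4 ACLs, they are managed from the client with the nfs4-acl-tools package (untested in this setup; bob@example.com is a placeholder principal):

nfs-client1:~ # yum install -y nfs4-acl-tools
nfs-client1:~ # nfs4_getfacl /mnt/work/client1.file
nfs-client1:~ # nfs4_setfacl -a A::bob@example.com:rw /mnt/work/client1.file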