fstrim, discard and storage

I first encountered this problem when I tried to create an XFS file system on a rather large (several terabytes) volume from IBM V5000 storage. The formatting process got stuck: the storage CPU was 100% busy, yet there was no bandwidth usage and no I/O operations. The cause was the "discard" process. When mkfs sees a device that supports "discard", it sends this command to the storage. The command itself is short, which explains why there was no I/O, but the storage becomes busy while executing it.

The second time, it happened with EMC storage, which unexpectedly lost its paths one after another until the multipath driver finally gave up. The LUN became disconnected and the system almost hung. The reason was that the OS (SuSE) tried to return unused blocks with the "fstrim" command, which sends the same "discard" command to the disk device. The EMC storage became busy and stopped responding on the port. The multipath "directio" path_checker could not get a response from the storage, so it declared the path failed. The multipath driver had already spread discard commands across all ports, so the paths failed one by one until the LUN was declared disconnected. The disconnect can be worked around by tuning parameters in multipath.conf, but this does not eliminate the storage freeze.

Since we see that storage arrays do not like the "discard" command, let's think about what it is and whether we really need it. The discard command helps us keep thin volumes really thin. Because of file system fragmentation, a thin volume keeps growing despite the large amount of empty space between the used blocks. The "discard" command tells the storage which blocks are unused so that they can be returned to the pool. SSDs benefit even more from this information. They cannot rewrite a single byte in place and must rewrite a whole page instead. Information about empty blocks can greatly ease this process and extend the life of the drive. However, external central storage, even with SSD drives, manages its drives on its own and does not require external intervention for this. Since these arrays have a problem with the "discard" command (I think it may be resolved later), we have to decide whether we are so short of space that we must keep our thin volumes really thin. The usual answer is no, so let's see how to avoid the "discard" command and the load it puts on our storage.
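Before changing anything, it may help to check which block devices advertise discard support at all. The sketch below reads the discard_max_bytes attribute from sysfs; a value of 0 means the device does not accept discard. The function name list_discard_devices and its optional sysfs root argument are my own for illustration, not part of any tool. "lsblk --discard" should show similar information.

```shell
#!/bin/sh
# List block devices that advertise discard support, one "name max_bytes" per line.
# The sysfs root is a parameter (hypothetical, for trying the function on a copy);
# it defaults to /sys.
list_discard_devices() {
    sysfs_root="${1:-/sys}"
    for f in "$sysfs_root"/block/*/queue/discard_max_bytes; do
        # Skip when the glob matched nothing or the attribute is unreadable.
        [ -r "$f" ] || continue
        max_bytes=$(cat "$f")
        # A non-zero value means the device accepts discard requests.
        if [ "$max_bytes" -gt 0 ] 2>/dev/null; then
            dev=${f#"$sysfs_root"/block/}
            dev=${dev%%/*}
            echo "$dev $max_bytes"
        fi
    done
}
```

Running list_discard_devices without arguments inspects the live system. Devices that do not appear in the output are not affected by anything described here.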

Avoid discard when running mkfs

Use the appropriate option when creating file systems:

# mkfs.xfs -K ... /dev/device
# mkfs.ext4 -E nodiscard ... /dev/device

Disabling the fstrim command

I ran into this problem on SuSE 12, but I think the problem exists not only in SuSE. The fstrim service is launched by a systemd timer. To disable it:

# systemctl disable fstrim.timer
# sed -e 's/\(ExecStart=.*\)/#\1/' -i /usr/lib/systemd/system/fstrim.service
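Since the sed command above edits a packaged unit file in place, it may be worth trying it on a copy first. The sketch below applies the same expression to a file you pass in and succeeds only if no active ExecStart line remains. The function name disable_execstart is hypothetical, and GNU sed is assumed for the -i option.

```shell
#!/bin/sh
# Comment out ExecStart lines in the given unit file and verify the result.
# Usage (on a copy): disable_execstart /tmp/fstrim.service
disable_execstart() {
    unit_file="$1"
    # Same sed expression as above, applied in place (GNU sed assumed).
    sed -i -e 's/\(ExecStart=.*\)/#\1/' "$unit_file"
    # Return success only if no uncommented ExecStart line is left.
    ! grep -q '^ExecStart=' "$unit_file"
}
```

A package update may restore the original file under /usr/lib/systemd/system. If that is a concern, "systemctl mask fstrim.service fstrim.timer" might be a more durable alternative, though I have not tested it on the systems described here.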

Avoid discard mount options

Look for the word "discard" in "man mount" and use the nodiscard mount option where appropriate. At the time of writing it is relevant for BTRFS, EXT4 and FAT.
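To find file systems that are currently mounted with discard enabled, something like the following sketch may help. It scans the options field of /proc/mounts for "discard" (or a "discard=..." variant, which BTRFS can show) and prints the mount point and file system type. The function name list_discard_mounts and the optional file argument are assumptions of mine, there only so the function can be tried on a sample file.

```shell
#!/bin/sh
# Print "mountpoint fstype" for every mount whose options include discard.
# The mounts file is a parameter (hypothetical); it defaults to /proc/mounts.
list_discard_mounts() {
    mounts_file="${1:-/proc/mounts}"
    # Fields: device, mount point, fs type, options, dump, pass.
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "discard" || opts[i] ~ /^discard=/) {
                print $2, $3
                break
            }
    }' "$mounts_file"
}
```

Any mount point this prints is a candidate for changing the options in /etc/fstab. An entry mounted with nodiscard is deliberately not matched.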


Updated on Sun Apr 14 12:48:03 IDT 2019