Bug 700138

Summary:	If PORTAGE_TMPDIR is on btrfs, use subvolume instead of directory for performance
Product:	Portage Development	Reporter:	adebeus
Component:	Enhancement/Feature Requests	Assignee:	Portage team <dev-portage>
Status:	CONFIRMED ---
Severity:	normal	CC:	gentoo, lssndrbarbieri
Priority:	Normal
Version:	unspecified
Hardware:	All
OS:	Linux
Whiteboard:
Package list:		Runtime testing required:	---
Bug Depends on:
Bug Blocks:	835380

Description adebeus 2019-11-15 03:03:14 UTC

If PORTAGE_TMPDIR is on btrfs, portage should make "${PORTAGE_TMPDIR}/portage/${CATEGORY}/${P}" a subvolume instead of a directory to improve performance.

Currently, "rm -rf" is used for cleanup, which can be slow on large source trees, especially on btrfs. Those are exactly the trees for which PORTAGE_TMPDIR needs to be on disk instead of tmpfs in the first place. On any filesystem, "rsync -a --delete /tmp/empty/ /dir/to/delete/" is faster than "rm -rf" since it deletes the files in the optimal order, so that would also be a potential performance enhancement. On btrfs, however, deleting a subvolume is by far the fastest way to delete a large number of files.
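As a rough illustration of the rsync approach, the following Python sketch syncs an empty directory over the target with "rsync -a --delete" and falls back to shutil.rmtree when rsync is missing or fails. The function name fast_rmtree and the injectable run parameter are hypothetical and not part of Portage. Whether this actually beats "rm -rf" on a given filesystem is an assumption that would need to be measured.

```python
import shutil
import subprocess
import tempfile


def fast_rmtree(path, run=subprocess.run):
    """Empty `path` via `rsync -a --delete`, then remove it.

    Hypothetical helper, not Portage code. Falls back to shutil.rmtree
    if rsync is missing or exits non-zero. Returns the method that
    emptied the tree: "rsync" or "rmtree".
    """
    method = "rmtree"
    with tempfile.TemporaryDirectory() as empty:
        try:
            # Trailing slashes make rsync compare directory contents, so
            # everything under `path` is deleted to match the empty source.
            result = run(["rsync", "-a", "--delete", empty + "/", path + "/"])
            if result.returncode == 0:
                method = "rsync"
        except OSError:
            pass  # rsync is not installed or not executable
    # Remove the directory itself (and its contents, if rsync did not run).
    shutil.rmtree(path, ignore_errors=True)
    return method
```

The run parameter exists only so that the external command can be replaced in tests; a caller would normally use fast_rmtree(path) with the default.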

Apart from cleanup, I think using a subvolume might also help the btrfs block allocator make better decisions, especially if other I/O is going on at the same time as the build.

There might be permission issues with this unless the filesystem is mounted with user_subvol_rm_allowed, but in that case it should be possible to fall back to using directories, just as if another filesystem were used instead of btrfs.
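A minimal Python sketch of that fallback, assuming the btrfs-progs command-line tool is used ("btrfs subvolume create" and "btrfs subvolume delete"). The helper names make_build_dir and remove_build_dir and the run parameter are hypothetical and are not existing Portage functions. Any failure (not btrfs, missing tool, insufficient permissions) is treated the same way and leads to the plain-directory path.

```python
import os
import shutil
import subprocess


def _btrfs(args, run):
    """Run a btrfs-progs command quietly; True on success, False otherwise."""
    try:
        result = run(
            ["btrfs", *args],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        return False  # btrfs-progs is not installed or not executable
    return result.returncode == 0


def make_build_dir(path, run=subprocess.run):
    """Create `path` as a subvolume if possible, else as a directory."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    if _btrfs(["subvolume", "create", path], run):
        return "subvolume"
    # Not btrfs, no permission, or no btrfs-progs: behave as before.
    os.makedirs(path, exist_ok=True)
    return "directory"


def remove_build_dir(path, run=subprocess.run):
    """Delete `path` as a subvolume if possible, else remove it recursively."""
    if _btrfs(["subvolume", "delete", path], run):
        return "subvolume"
    # Fails for plain directories or when deletion is not permitted.
    shutil.rmtree(path, ignore_errors=True)
    return "directory"
```

Both helpers return which path was taken, so a caller could log or count how often the fallback is hit.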

Comment 1 Enne Eziarc 2019-11-15 10:14:17 UTC

I'd be interested to see the standard set of kernel benchmark tables for this, including the medium-term effect it has on the whole system.

In my own real-world experience with a point-in-time rollback system that cycled through only about 50 subvols per hour, we ended up needing a workaround based on carefully spaced deletes and sync calls; the latency spikes were unacceptable otherwise.

So, I doubt this pattern actually makes things faster. It might look that way when you're not paying attention to anything but emerge output scrolling past, but in fact it just defers the space deallocation to Btrfs' internal queue. The block dealloc IO then happens all at once during the next flush, at VFS commit priority instead of under portage's ionice class and cgroup.
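For illustration only, a workaround of the "spaced deletes and sync calls" kind could look roughly like the Python sketch below. It is not the code from the rollback system described above: the function name, the pause length, and the choice of "btrfs subvolume sync" (which waits until deleted subvolumes have been cleaned up) are all assumptions.

```python
import subprocess
import time


def paced_subvolume_delete(subvolumes, mountpoint, pause_s=5.0,
                           run=subprocess.run, sleep=time.sleep):
    """Delete subvolumes one at a time, waiting for cleanup in between.

    Hypothetical helper. `pause_s` is an arbitrary illustrative value.
    """
    subvolumes = list(subvolumes)
    for i, subvol in enumerate(subvolumes):
        run(["btrfs", "subvolume", "delete", subvol], check=True)
        # Block until the deleted subvolume has been fully cleaned up,
        # so deallocation work does not pile up for a later flush.
        run(["btrfs", "subvolume", "sync", mountpoint], check=True)
        if i < len(subvolumes) - 1:
            sleep(pause_s)  # spread the IO load out over time
```

This trades total wall-clock time for smoother latency, which is the opposite of what the original request is optimizing for.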

Comment 2 Zac Medico gentoo-dev 2019-11-15 20:07:53 UTC

We can have plugins to support various implementations. A plugin based on the containers storage library would use the storage driver you have configured in /etc/containers/storage.conf. Available drivers include btrfs, devmapper, and zfs:

https://github.com/containers/storage/tree/master/drivers

Comment 3 Sam James archtester 2023-05-23 09:45:00 UTC

FWIW, using rsync more widely is not necessarily a bad idea, if nothing else.