An optimized implementation of dblink._merge_contents could improve performance significantly when merging a large number of binary packages. There could be a completely native implementation using a C extension, and possibly a hybrid implementation that calls copy_file_range via ctypes.
This should support sparse files, using the lseek(2) SEEK_DATA and SEEK_HOLE operations as described in the copy_file_range man page.
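The SEEK_DATA / SEEK_HOLE loop mentioned above could look roughly like the following. This is only a sketch, not the actual portage code: the name copy_sparse and the chunk_size parameter are made up for illustration, and it uses plain read/write instead of copy_file_range to stay short.

```python
import errno
import os


def _write_all(fd, buf):
    """Write the whole buffer, since os.write may write only part of it."""
    view = memoryview(buf)
    while view:
        view = view[os.write(fd, view):]


def copy_sparse(src_path, dst_path, chunk_size=1 << 20):
    """Copy src_path to dst_path, skipping holes reported by lseek(2).

    Hypothetical helper for illustration only.
    """
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        dst_fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o666)
        try:
            size = os.fstat(src_fd).st_size
            offset = 0
            while offset < size:
                try:
                    data_start = os.lseek(src_fd, offset, os.SEEK_DATA)
                except OSError as e:
                    if e.errno == errno.ENXIO:
                        break  # no more data, only a trailing hole remains
                    raise
                data_end = os.lseek(src_fd, data_start, os.SEEK_HOLE)
                # Position both descriptors at the start of the data extent.
                os.lseek(src_fd, data_start, os.SEEK_SET)
                os.lseek(dst_fd, data_start, os.SEEK_SET)
                remaining = data_end - data_start
                while remaining > 0:
                    buf = os.read(src_fd, min(chunk_size, remaining))
                    if not buf:
                        break  # source shrank while copying
                    _write_all(dst_fd, buf)
                    remaining -= len(buf)
                offset = data_end
            # Extend the destination so that a trailing hole is preserved.
            os.ftruncate(dst_fd, size)
        finally:
            os.close(dst_fd)
    finally:
        os.close(src_fd)
```

Whether the destination really ends up sparse depends on the destination filesystem; the sketch only avoids writing the hole regions.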
Actually copy_file_range returns EXDEV in cases where os.rename would fail, so it looks like we'll be using sendfile instead.
sendfile has two drawbacks compared to copy_file_range:
1. copy_file_range can take advantage of reflinking on btrfs, avoiding a physical copy of the data. sendfile can't.
2. sendfile requires out_fd to be a socket on older kernels (before Linux 2.6.33).
If copy_file_range fails with EXDEV, the code will fall back to sendfile.
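For illustration, the fallback could be sketched in pure Python like this. It assumes Python 3.8+ on Linux for os.copy_file_range; the function name copy_range and the choice to also fall back on ENOSYS (kernels without the syscall) are assumptions of this sketch, not necessarily what the patch does.

```python
import errno
import os

# Errors after which this sketch stops trying copy_file_range.
# EXDEV is the case discussed above; ENOSYS is an added assumption.
_FALLBACK_ERRNOS = (errno.EXDEV, errno.ENOSYS)


def copy_range(src_fd, dst_fd, count):
    """Copy up to count bytes from src_fd to dst_fd at their current offsets.

    Hypothetical helper. Returns the number of bytes actually copied.
    """
    copied = 0
    use_cfr = hasattr(os, "copy_file_range")
    while copied < count:
        if use_cfr:
            try:
                n = os.copy_file_range(src_fd, dst_fd, count - copied)
            except OSError as e:
                if e.errno not in _FALLBACK_ERRNOS:
                    raise
                use_cfr = False  # retry this chunk with sendfile
                continue
        else:
            # offset=None makes sendfile read from, and advance, the
            # current offset of src_fd on Linux.
            n = os.sendfile(dst_fd, src_fd, None, count - copied)
        if n == 0:
            break  # end of file reached early
        copied += n
    return copied
```

Both calls may copy fewer bytes than requested, which is why the sketch loops until count is reached or a call returns 0.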
There's also this mmap/memcpy trick that could be useful if sendfile doesn't work: http://stackoverflow.com/questions/26582920/mmap-memcpy-to-copy-file-from-a-to-b
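A rough Python equivalent of that mmap/memcpy idea, using the standard mmap module, might look like this. The name copy_mmap and the chunk_size parameter are invented for this sketch, and it copies in bounded chunks rather than with a single memcpy.

```python
import mmap
import os


def copy_mmap(src_path, dst_path, chunk_size=1 << 20):
    """Copy a file by mapping source and destination into memory.

    Hypothetical helper for illustration only.
    """
    size = os.path.getsize(src_path)
    with open(src_path, "rb") as src, open(dst_path, "w+b") as dst:
        if size == 0:
            return  # an empty file cannot be mapped; dst is already empty
        # The destination must have its final size before it is mapped.
        dst.truncate(size)
        with mmap.mmap(src.fileno(), 0, access=mmap.ACCESS_READ) as s, \
                mmap.mmap(dst.fileno(), 0, access=mmap.ACCESS_WRITE) as d:
            for off in range(0, size, chunk_size):
                end = min(off + chunk_size, size)
                d[off:end] = s[off:end]
            d.flush()
```

Note that this writes every byte, so it does not preserve holes in sparse files.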
Patch posted for review: https://archives.gentoo.org/gentoo-portage-dev/message/ef75dcd303d3c044f56b8df0259817da https://github.com/gentoo/portage/pull/131
This is in the master branch: https://gitweb.gentoo.org/proj/portage.git/commit/?id=8ab5c8835931fd9ec098dbf4c5f416eb32e4a3a4
Fixed in portage-2.3.5. Also see bug 617778 and bug 618086.
(In reply to Zac Medico from comment #8)
> Fixed in portage-2.3.5.
>
> Also see bug 617778 and bug 618086.

This causes a bug when I run emerge in a Docker container: https://bugs.gentoo.org/show_bug.cgi?id=627374
Hi Zac, there is a case where syscall_326 (copy_file_range) is not implemented in the runtime, and portage gets an EINVAL error during qmerge.
The relevant strace is

[pid 8651] lstat("/disk01/xmass/gentoo/usr/bin/ebump", {st_mode=S_IFREG|0755, st_size=8516, ...}) = 0
[pid 8651] open("/dev/shm/portage/app-portage/gentoolkit-0.4.0/image/disk01/xmass/gentoo/usr/bin/ebump", O_RDONLY) = 12
[pid 8651] fstat(12, {st_mode=S_IFREG|0755, st_size=8518, ...}) = 0
[pid 8651] open("/disk01/xmass/gentoo/usr/bin/ebump#new", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 13
[pid 8651] fstat(13, {st_mode=S_IFREG|0666, st_size=0, ...}) = 0
[pid 8651] lseek(12, 0, SEEK_DATA) = 0
[pid 8651] lseek(12, 0, SEEK_HOLE) = 8518
[pid 8651] lseek(12, 0, SEEK_SET) = 0
[pid 8651] syscall_326(0xc, 0, 0xd, 0x7ffd7c2a6ec8, 0x2146, 0) = -1 (errno 38)
[pid 8651] sendfile(13, 12, [0], 8518) = 8518
[pid 8651] lseek(12, 8518, SEEK_DATA) = -1 EINVAL (Invalid argument)
[pid 8651] close(13) = 0
[pid 8651] close(12) = 0
(In reply to Benda Xu from comment #11)
> The relevant strace is
>
> [pid 8651] lstat("/disk01/xmass/gentoo/usr/bin/ebump", {st_mode=S_IFREG|0755, st_size=8516, ...}) = 0
> [pid 8651] open("/dev/shm/portage/app-portage/gentoolkit-0.4.0/image/disk01/xmass/gentoo/usr/bin/ebump", O_RDONLY) = 12
> [pid 8651] fstat(12, {st_mode=S_IFREG|0755, st_size=8518, ...}) = 0
> [pid 8651] open("/disk01/xmass/gentoo/usr/bin/ebump#new", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 13
> [pid 8651] fstat(13, {st_mode=S_IFREG|0666, st_size=0, ...}) = 0
> [pid 8651] lseek(12, 0, SEEK_DATA) = 0
> [pid 8651] lseek(12, 0, SEEK_HOLE) = 8518
> [pid 8651] lseek(12, 0, SEEK_SET) = 0
> [pid 8651] syscall_326(0xc, 0, 0xd, 0x7ffd7c2a6ec8, 0x2146, 0) = -1 (errno 38)

The above errno 38 corresponds to ENOSYS, so it falls back to sendfile, as designed.

> [pid 8651] sendfile(13, 12, [0], 8518) = 8518

The sendfile call succeeds.

> [pid 8651] lseek(12, 8518, SEEK_DATA) = -1 EINVAL (Invalid argument)

Normally the lseek error is ENXIO here, as shown in the lseek man page:

    ENXIO  whence is SEEK_DATA or SEEK_HOLE, and the file offset is beyond the end of the file.

With which kernel version and filesystem did you observe this EINVAL error from lseek?
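To make the expected behaviour concrete, here is a small sketch of how a caller might tell "no more data" (ENXIO) apart from any other lseek failure. The helper name has_data_at is hypothetical and this is not the portage implementation.

```python
import errno
import os


def has_data_at(fd, offset):
    """Return True if lseek(SEEK_DATA) finds data at or after offset.

    Hypothetical helper. ENXIO is treated as "no more data"; any other
    error, such as the EINVAL seen in the strace above, is re-raised.
    Note that a successful call moves the file offset of fd.
    """
    try:
        os.lseek(fd, offset, os.SEEK_DATA)
    except OSError as e:
        if e.errno == errno.ENXIO:
            return False
        raise
    return True
```

With an 8518-byte file like the one in the strace, has_data_at(fd, 8518) would be expected to return False on a filesystem that follows the man page.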
(In reply to Zac Medico from comment #12)
> With which kernel versions and filesystem did you observe this EINVAL error
> from lseek?

Ah-ha, thanks. The kernel is 3.10.0-327.22.2.el7.x86_64 from RHEL7, and the file system is Fujitsu FeFS (an enterprise derivative of Lustre). I am also using PRoot, which intercepts syscalls via ptrace. I suspect PRoot is the cause of the EINVAL here.
(In reply to Benda Xu from comment #13) Yeah that's pretty exotic. Let's close this bug and file a new one.