It seems that this commit
has changed the old (wrong) behavior of getdents on a 32bit OS not using Large File Support (LFS), so that it now correctly fails instead of "wrapping around" on >32bit inodes and offsets. This also affects readdir().
However, this correction can manifest in horrible ways: software compiled without _FILE_OFFSET_BITS=64 et al. that used to 'work' because it didn't actually use the d_ino or d_off fields returned by getdents will now get back a NULL pointer from readdir(), with errno set to EOVERFLOW. This can cause at least the following:
'shared-mime-info' fails to generate its database without any error message, breaking any consumers of the database
'update-ca-certificates' calls c_rehash and fails silently, which breaks https for applications that use it.
sandbox uses readdir() and can print warnings like the following (not sure of the effect, but the build is probably not sandboxed):
(sandbox) error: in /var/tmp/portage/sys-apps/sandbox-2.13/work/sandbox-2.13/libsbutil/src/file.c, function rc_ls_dir(), line 122:
(sandbox) strerror() = 'Value too large for defined data type'
(sandbox) Failed to readdir() '/etc/sandbox.d'!
I suspect that the above (and other undiscovered) programs will need to be patched with LFS support.
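For anyone auditing a package for this, here is a minimal sketch of a readdir() loop that tells "end of directory" apart from "readdir() failed" (e.g. with EOVERFLOW). The function name count_dir_entries is made up for illustration and is not taken from any of the packages above; the errno handling follows what POSIX documents for readdir().

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count the entries in `path`.
 * Returns the number of entries on success, or -1 on failure with errno set
 * (for example EOVERFLOW when an inode number or offset does not fit). */
long count_dir_entries(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long count = 0;
    for (;;) {
        /* readdir() returns NULL both at end of directory and on error;
         * only the error case changes errno, so clear it first. */
        errno = 0;
        struct dirent *entry = readdir(dir);
        if (entry == NULL)
            break;
        count++;
    }

    int saved_errno = errno;    /* closedir() may overwrite errno */
    closedir(dir);

    if (saved_errno != 0) {
        fprintf(stderr, "readdir(%s) failed: %s\n", path, strerror(saved_errno));
        errno = saved_errno;
        return -1;
    }
    return count;
}
```

Code that simply does `while ((entry = readdir(dir)) != NULL)` and never looks at errno will treat the EOVERFLOW case as an empty or truncated directory, which would match the silent failures described above.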
Several of us in #gentoo-arm have already seen this behavior inside 32bit chroots via qemu-static wrapper, but I imagine it will also affect anyone left using x86 32bit userland with filesystems that use >32bit for inodes such as ext4 and xfs.
See qemu bug filed here (although it may in fact not simply be a qemu bug):
Interesting. Given the structure of the problem that you describe, we would need to fix the programs using glibc... So let's make this here a tracker and file bugs for packages that are affected.
Practical question, how "usual" is it to still have 32bit OS *not* using LFS?
(Don't we kind of force this on by now?)
Where is it forced on? I'm certainly not well versed in this, and perhaps my chroot is just in a bad state and it's not as big an issue as it would seem?
_FILE_OFFSET_BITS=64 is set on a package-by-package basis. On autoconf build systems, the AC_SYS_LARGEFILE macro takes care of passing the proper flags.
https://bugs.gentoo.org/471102 has more details.
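For packages without autoconf, a rough sketch of what "enabling LFS" means at the source level: the feature-test macro glibc checks is _FILE_OFFSET_BITS, and it has to be defined before the first system header is included (or passed as -D_FILE_OFFSET_BITS=64 in CFLAGS). The helper names off_t_bits and d_ino_bits below are made up for illustration.

```c
/* Must come before any system header, otherwise it has no effect. */
#define _FILE_OFFSET_BITS 64

#include <dirent.h>
#include <limits.h>
#include <sys/types.h>

/* Fail the build, rather than readdir() at run time, if LFS is not active. */
_Static_assert(sizeof(off_t) == 8, "off_t is not 64 bits; LFS is not enabled");

/* Hypothetical helper: width in bits of off_t in this translation unit. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}

/* Hypothetical helper: width in bits of the d_ino field of struct dirent. */
int d_ino_bits(void)
{
    return (int)(sizeof(((struct dirent *)0)->d_ino) * CHAR_BIT);
}
```

On a 64bit userland both are 64 bits anyway; the define only changes anything on 32bit targets, which is where the static assertion is useful.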
Hmm, I didn't see that other bug initially. Should we close this as a duplicate and add bugs to the other tracker? Or just proceed attaching them here?
There's a relevant discussion on LKML
It seems to be more to do with calling getdents on a 64bit kernel, where the guest expects 32bit behaviour. Definitely affects 32bit qemu, not sure yet if it affects a 32bit native (i686) chroot without qemu emulation.
And here's the glibc bug
It seems there's a conundrum of: qemu thinks glibc should fix it, glibc thinks kernel should fix it, kernel did nothing. Now glibc thinks distros should fix it by forcing LFS on.
I haven't had luck building sandbox with LFS turned on (see #681892).
Hello. I have created a bunch of patches with a new "getdents64_x32" syscall here https://bugzilla.kernel.org/show_bug.cgi?id=205957. They are hard to apply, so I've provided the order in which to apply them.
Now I am able to build applications inside arm (via qemu user) without any problem (no LFS workarounds required).
Thanks for the patches, Andrew! We are unlikely to apply them in Gentoo, as they change the syscall surface quite a bit. But it's a nice illustration of a problem in the kernel/user interface.
I've been in contact with the glibc developer who made the initial changes that broke this. I've tested a preliminary patch he's working on to correct this behaviour (or at least put it back to the semi-broken way it was before glibc-2.28).
The initial patch makes it possible to work around the issue with qemu, but it does also seem to break something on ARM. He has also put the patch forward for review to the glibc maintainers. I'll update here as things progress.
As discussed here (I just noticed the link was broken when I first reported this)
Some people have had success working around the issue by just moving their chroots to a non-ext4 filesystem in the interim. The issue stems from the fact that ext4 always stores a 64bit hash in the d_off field (of type off_t) of struct dirent, and when a 64bit host qemu gets the 64bit value from the 64bit kernel, it cannot fit it into the 32bit field in the guest properly. Alternatively you could also compile qemu as a 32bit application. YMMV
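To make the "cannot fit" part concrete, here is a simplified sketch of the kind of range check involved. This is not the actual glibc or qemu code; the function name check_dir_offset_32 is made up, and it assumes the 32bit field is a signed 32bit integer.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: can a 64-bit directory offset be represented in a
 * signed 32-bit field? Returns 0 if it fits, -EOVERFLOW otherwise. */
int check_dir_offset_32(int64_t d_off)
{
    if (d_off < INT32_MIN || d_off > INT32_MAX)
        return -EOVERFLOW;
    return 0;
}
```

A small sequential offset passes this check, while a 64bit hash with any of the upper 32 bits set does not, so on ext4 the failure can show up even in a directory with only a handful of files.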
Wow I dropped the ball on this. A patchset was put forward in October (2020).
I have tested it and it seems to reverse the bad behaviour discussed, in a 32bit qemu-chroot running atop a 64bit kernel, on an ext4 filesystem that uses a 64bit hash in d_off.
Maybe if some more of us replied to the thread on the ML (it seems to have been reviewed, then fell off the horizon), we could get this accepted into mainline glibc. Currently I have been hand patching my chroot to use 2.33 + those patches, but it's only a matter of time before the patchset falls behind.
The bug has been referenced in the following commit(s):
Author: Andreas K. Hüttel <email@example.com>
AuthorDate: 2022-01-12 22:00:13 +0000
Commit: Andreas K. Hüttel <firstname.lastname@example.org>
CommitDate: 2022-01-12 23:00:07 +0000
sys-libs/glibc: 2.34 patchset 11 and re-keyword
Package-Manager: Portage-3.0.30, Repoman-3.0.3
Signed-off-by: Andreas K. Hüttel <email@example.com>
sys-libs/glibc/Manifest | 2 +-
sys-libs/glibc/glibc-2.34-r6.ebuild | 5 ++---
2 files changed, 3 insertions(+), 4 deletions(-)