sys-libs/db-3.2.3h-r4 fails to compile, see attachment. I fetched a clean copy from rsync.nl.gentoo.org (i.e. removed /usr/portage/sys-libs/db before syncing) to be sure it's not an rsync problem again, and also removed /var/cache/edb/dep/dep-db-3.2.3h-r4.ebuild prior to fetching clean archives (which was necessary to get openssh to build properly).
Since Bugzilla fails to attach a file to the bug (complains about not having specified a file), I've put it onto my homepage: http://sascha.silbe.org/db.script.bz2
Try unmasking 3.2.9 and build that. 3.2.3h has some issues.
I am unable to reproduce the problem, and this is a singular report. If it is reproducible with a more recent version of portage, please re-open the bug report and provide as much detail as possible. Thanks!
Created attachment 2002 [details] Typescript of a failed "emerge sys-libs/db" for db-3.2.9
It still happens with sys-libs/db-3.2.9. But I found out that it works with an ext2 /var filesystem, so it probably has something to do with the XFS / (root) filesystem. I've attached the typescript.
I also had exactly the same problem with sys-libs/db. I first noticed it when upgrading to 3.2.9, but while unmerging/remerging it, I found that the same problem arises with 3.2.3h-r4, too. My / filesystem is XFS. As a temporary fix, I changed $(cp) -pr to $(cp) -r in the install_docs target in /var/.../build_unix/Makefile. I don't know what side effects omitting the "-p" option might have, but nothing noticeable so far. I needed to manually run ebuild install and ebuild qmerge in order to keep portage happy.
Created attachment 2030 [details] Typescript of a failed 'cp -pr'
I've verified that the real problem is in sys-apps/fileutils, not sys-libs/db (see attached typescript). You probably want to reassign the bug to the maintainer of sys-apps/fileutils.
Could you run that typescript again, and this time do two things differently?
1. When you ls, use ls --color=off (the ANSI codes make this unreadable).
2. Attach the typescript without bzipping it; compression makes it impossible to view without downloading it.
Thanks!
This time as a screenshot:

hybrid root # mkdir x y
hybrid root # cp -pr x y
cp: preserving permissions for `y/x': Invalid argument
hybrid root # mount
/dev/main_vg/gentoo-root on / type xfs (rw,noatime)
proc on /proc type proc (rw)
none on /dev type devfs (rw)
tmpfs on /mnt/.init.d type tmpfs (rw,mode=0644,size=1024k)
tmpfs on /dev/shm type tmpfs (rw)
sphere:/home on /sphere/home type nfs (rw,addr=192.168.1.3)
/dev/hda1 on /boot type ext2 (rw,noatime)
hybrid root # uname -a
Linux hybrid.sascha.silbe.org 2.4.19-gentoo-1 #2 SMP Thu Apr 4 00:10:50 CEST 2002 i586 AuthenticAMD
hybrid root #
Sascha, does this happen if you run the vanilla-sources kernel as well? If you could try that with the vanilla-sources kernel, that would be fantastic.
Hmm. Which kernel is this? gentoo-sources? It would be nice if you could test whether it is reproducible with mjc-sources as well, as I think it might be related to the 54_xfs-2.4.18-split-xattr.bz2 patch in there... Not sure though, just a theory. Anyway, I'm unfortunately working twelve hours a day for three weeks now, so I've got ~1 hour a day for Gentoo, so... Anyone? mjc? =)
jmorgan pointed out on IRC that this might be an SMP problem. Would you mind testing without SMP support, too? Thanks
I've just tried sys-kernel/xfs-sources-2.4.18. Same problem. sys-kernel/vanilla-sources cannot work (no XFS support). I'll attach the kernel config of 2.4.18-2-hybrid (xfs-sources-2.4.18).
Created attachment 2068 [details] Kernel config for 2.4.18-2-hybrid (sys-kernel/xfs-sources-2.4.18)
Happens for sys-kernel/mjc-sources-2.4.19_pre10, too.
*** Bug 4464 has been marked as a duplicate of this bug. ***
From Bug #4464:
> This happens on my PII-266 with gcc 2.95.3, kernel 2.4.19-r7 and with XFS, but on my athlon 1.4, with the same packages, and with XFS it doesn't happen.

I'm running Gentoo on an AMD K6-2. I don't believe it's a processor issue, though. Do you use exactly the same USE flags on both systems? If the answer is 'yes', please try different optimizations and post your results. Thanks!
Just verified that it happens on any XFS filesystem, not only on /.
Just a note: I experience the same issue with XFS filesystems and cp reporting Invalid argument. It -seems- to be an issue peculiar to the added support in fileutils for acls.

For me the kernel is built from xfs-sources (synched July 10, xfs-sources-2.4.18.ebuild). The problem is exhibited on both a Via C3 machine (CHOST="i586-pc-linux-gnu" CFLAGS="-march=pentium -O2 -pipe") and on a Pentium4 machine (CHOST="i686-pc-linux-gnu" CFLAGS="-march=i686 -O2 -pipe"). The USE keyword "acl" is set.

I noticed on both machines that ls -l will list every file or directory on the XFS volumes with a '+' sign indicating the presence of an ACL. It seems odd that it does this even on files with just the default (normal Unix) ACLs, but that might be by design.

For me, it does not occur with cp -p for files, even when the files have extra acls. It always occurs with directories though:

host% mkdir a
host% cp -a a b
cp: preserving permissions for `b': Invalid argument

The directory 'b' however is still created, and ACLs are copied across correctly. If 'a' has default acls, then the copy produces no error messages:

host% mkdir a
host% setfacl -m d:dummy:rwx a
host% cp -a a b
host% getfacl b
# file: b
# owner: root
# group: root
user::rwx
group::rwx
other::r-x
default:user::rwx
default:user:ctdummy:rwx
default:group::rwx
default:mask::rwx
default:other::r-x

Running strace: (complete log!)
strace cp -a a b
execve("/bin/cp", ["cp", "-a", "a", "b"], [/* 30 vars */]) = 0
brk(0) = 0x8054804
open("/etc/ld.so.preload", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
close(3) = 0
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=45824, ...}) = 0
old_mmap(NULL, 45824, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40016000
close(3) = 0
open("/lib/libacl.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340\23"..., 1024) = 1024
fstat64(3, {st_mode=S_IFREG|0644, st_size=23930, ...}) = 0
old_mmap(NULL, 21924, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x40022000
mprotect(0x40027000, 1444, PROT_NONE) = 0
old_mmap(0x40027000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x4000) = 0x40027000
close(3) = 0
open("/lib/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\320\205"..., 1024) = 1024
fstat64(3, {st_mode=S_IFREG|0755, st_size=1335898, ...}) = 0
old_mmap(NULL, 1188992, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x40028000
mprotect(0x40141000, 38016, PROT_NONE) = 0
old_mmap(0x40141000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x118000) = 0x40141000
old_mmap(0x40147000, 13440, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x40147000
close(3) = 0
open("/lib/libattr.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\220\n\0"..., 1024) = 1024
fstat64(3, {st_mode=S_IFREG|0644, st_size=9244, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4014b000
old_mmap(NULL, 10116, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x4014c000
mprotect(0x4014e000, 1924, PROT_NONE) = 0
old_mmap(0x4014e000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x1000) = 0x4014e000
close(3) = 0
munmap(0x40016000, 45824) = 0
brk(0) = 0x8054804
brk(0x805482c) = 0x805482c
brk(0x8055000) = 0x8055000
geteuid32() = 0
lstat64("b", 0xbffffa30) = -1 ENOENT (No such file or directory)
lstat64("a", {st_mode=S_IFDIR|0775, st_size=6, ...}) = 0
mkdir("b", 040775) = 0
lstat64("b", {st_mode=S_IFDIR|0775, st_size=6, ...}) = 0
stat64("b", {st_mode=S_IFDIR|0775, st_size=6, ...}) = 0
open("/dev/null", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = -1 ENOTDIR (Not a directory)
open("a", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
fstat64(3, {st_mode=S_IFDIR|0775, st_size=6, ...}) = 0
fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
brk(0x8057000) = 0x8057000
getdents64(0x3, 0x8054d10, 0x1000, 0) = 48
getdents64(0x3, 0x8054d10, 0x1000, 0) = 0
close(3) = 0
utime("b", [2002/07/14-16:26:06, 2002/07/14-16:25:37]) = 0
chown32(0xbffffceb, 0, 0) = 0
SYS_229(0xbffffce9, 0x400260c3, 0xbffff540, 0x84, 0x400274e4) = 44
SYS_226(0xbffffceb, 0x400260c3, 0x8054dc0, 0x2c, 0) = 0
SYS_229(0xbffffce9, 0x40026094, 0xbffff540, 0x84, 0x400274e4) = -1 ENODATA (No data available)
SYS_226(0xbffffceb, 0x40026094, 0x8054d10, 0x4, 0) = -1 EINVAL (Invalid argument)
write(2, "cp: ", 4cp: ) = 4
write(2, "preserving permissions for `b\'", 30preserving permissions for `b') = 30
write(2, ": Invalid argument", 18: Invalid argument) = 18
write(2, "\n", 1 ) = 1
_exit(1) = ?

Following the code, in fileutils-4.1.8/lib/acl.c, line 157:

    acl = acl_get_file (src_path, ACL_TYPE_DEFAULT);

returns an acl, but sets errno to ENODATA. Then when setting it, line 164:

    if (acl_set_file (dst_path, ACL_TYPE_DEFAULT, acl))

errno is set to EINVAL. The acl_get_file code has:

    retval = getxattr(path_p, name, ext_acl_p, size_guess);
    /* ... */
    else if (retval == 0 || errno == ENOATTR) {
        if (type == ACL_TYPE_ACCESS) {
            struct stat st;
            if (stat(path_p, &st) == 0)
                return acl_from_mode(st.st_mode);
            else
                return NULL;
        } else
            return acl_init(0);
    }
    /* ...*/

In this situation, getxattr fails with ENOATTR, and so acl_get_file returns acl_init(0) while errno is still set to ENOATTR (== ENODATA).
This is not in accordance with the 1003.1e spec (withdrawn though it may be), which says it should not return an error in this case, but instead return an empty acl list (it does return an empty acl list, but errno is set in this code).

Moving on, acl_set_file(...) returns an error. This is fair, because according to the spec, the acl passed to acl_set_file must be valid, which implies it must contain the user, group and other entries. Being an empty acl list, it is thus invalid. Following the chain into the kernel (erk!), it does indeed check whether the acl has zero entries before writing it, and returns EINVAL if so (linux/fs/xfs/xfs_acl.c:87, in acl_ext_attr_to_xfs()). (As far as I can tell though, it doesn't check for general validity of the acl in the sense of acl_valid().)

The following patch to lib/acl.c in fileutils fixes this error, as well as another bug due to the same code: if a directory 'a' had a default acl and a subdirectory 'c' with no default acl, the command cp -a c c2 within the directory 'a' would create a subdirectory 'c2' with inherited default acls.

START-OF-PATCH
*** lib/acl.c.orig Mon Jul 15 02:24:56 2002
--- lib/acl.c Mon Jul 15 02:25:12 2002
***************
*** 155,158 ****
--- 155,160 ----
    if (S_ISDIR (mode))
      {
+       acl_entry_t dummy;
+
        acl = acl_get_file (src_path, ACL_TYPE_DEFAULT);
        if (acl == NULL)
***************
*** 162,174 ****
          }

!       if (acl_set_file (dst_path, ACL_TYPE_DEFAULT, acl))
!         {
!           error (0, errno, _("preserving permissions for %s"),
!                  quote (dst_path));
            acl_free(acl);
            return -1;
          }
!       else
!         acl_free(acl);
      }
    return 0;
--- 164,196 ----
          }

!       switch (acl_get_entry (acl,ACL_FIRST_ENTRY,&dummy))
!         {
!         case -1:
!           error (0, errno, "%s", quote (src_path));
            acl_free(acl);
            return -1;
+
+         case 0:
+           /* empty acl */
+           if (acl_delete_def_file (dst_path))
+             {
+               error (0, errno, _("preserving permissions for %s"),
+                      quote (dst_path));
+               acl_free(acl);
+               return -1;
+             }
+           break;
+
+         default:
+           if (acl_set_file (dst_path, ACL_TYPE_DEFAULT, acl))
+             {
+               error (0, errno, _("preserving permissions for %s"),
+                      quote (dst_path));
+               acl_free(acl);
+               return -1;
+             }
          }
!
!       acl_free(acl);
      }
    return 0;
END-OF-PATCH

On a related note, there is an error in the acl-20020330 library in acl_extended_file.c. In this version of xfs at least, xfs_acl_vget() in linux/fs/xfs/xfs_acl.c returns a larger size than required to hold the acl (it does this on purpose), and so the size-based checking in acl_extended_file() does not work. This is the reason why ls -l marks everything with a '+' on XFS file systems with the acl USE option.

Perhaps getxattr() should be returning ENOATTR instead if there's no acl, but that's not what it does at the moment. It *does* however return the actual size when given a non-null pointer and size. In fact, it should return E2BIG if the offered size is insufficient. If a non-zero size is supplied, getxattr() also returns ENOATTR, as it possibly should in the absence of an acl.

START-OF-PATCH
*** cmd/acl/libacl/acl_extended_file.c.orig Fri Mar 1 10:08:36 2002
--- cmd/acl/libacl/acl_extended_file.c Mon Jul 15 03:11:26 2002
***************
*** 31,47 ****
  acl_extended_file(const char *path_p)
  {
!       int base_size = sizeof(acl_ea_header) + 3 * sizeof(acl_ea_entry);
        int retval;

!       retval = getxattr(path_p, ACL_EA_ACCESS, NULL, 0);
!       if (retval < 0 && errno != ENOATTR)
!               return -1;
!       if (retval > base_size)
!               return 1;
!       retval = getxattr(path_p, ACL_EA_DEFAULT, NULL, 0);
!       if (retval < 0 && errno != ENOATTR)
!               return -1;
!       if (retval >= base_size)
!               return 1;
        return 0;
  }
--- 31,50 ----
  acl_extended_file(const char *path_p)
  {
!       char buf[sizeof(acl_ea_header) + 3 * sizeof(acl_ea_entry)];
        int retval;

!       retval = getxattr(path_p, ACL_EA_ACCESS, buf, sizeof buf);
!       if (retval < 0)
!               if (errno == E2BIG)
!                       return 1;
!               else if (errno != ENOATTR)
!                       return -1;
!
!       retval = getxattr(path_p, ACL_EA_DEFAULT, buf, sizeof buf);
!       if (retval < 0)
!               if (errno == E2BIG)
!                       return 1;
!               else if (errno != ENOATTR)
!                       return -1;
        return 0;
  }
END-OF-PATCH

If such a patch is applied, the equivalent one with fgetxattr() should probably be applied to acl_extended_fd.c.
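For readers following the fileutils patch above, its three-way decision can be isolated into a small pure function. This is only a sketch: the enum and the function name are hypothetical and not part of fileutils or libacl, and it assumes the acl_get_entry() return convention the patch relies on (-1 on error, 0 when the ACL has no entries, anything else when an entry was found).

```c
/* Hypothetical names for illustration; not part of fileutils or libacl. */
enum default_acl_action
{
  ACTION_FAIL,            /* reading the source ACL failed */
  ACTION_DELETE_DEFAULT,  /* source has no default ACL: clear the destination's */
  ACTION_SET_DEFAULT      /* source has entries: copy them to the destination */
};

/* Map the status of acl_get_entry (acl, ACL_FIRST_ENTRY, ...) to the action
   the patch takes.  The return value alone is inspected; errno is ignored on
   purpose, because it may still hold a stale ENODATA from an earlier call. */
static enum default_acl_action
choose_default_acl_action (int get_entry_status)
{
  switch (get_entry_status)
    {
    case -1:
      return ACTION_FAIL;
    case 0:
      /* An empty ACL is not valid input for acl_set_file(), so the
         destination's default ACL is removed instead. */
      return ACTION_DELETE_DEFAULT;
    default:
      return ACTION_SET_DEFAULT;
    }
}
```

With the directory from the strace log (no default ACL on 'a'), the status would be 0 and the sketch picks ACTION_DELETE_DEFAULT, which avoids handing an empty ACL to acl_set_file() and therefore avoids the EINVAL.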
Just a note: current XFS code has getxattr() return E2BIG when size is too small (but non-zero.) The manpage says it should return ERANGE. These clearly are inconsistent. Also, the manpage is unclear as to what behaviour should happen when a zero size is passed and there is no such attribute on the file: should it return -1 and set ENOATTR, or should it return a size sufficient to hold the attribute should it have existed? The latter is what XFS currently does for the attribute holding acls, but it's not clear if this behaviour is correct. Haven't checked to see if this is consistent with the absence of other sorts of attribute on the file.
this is a bug in fileutils.
I've been running my system with the patches above now for a couple of days. Both cp and ls seem to be doing the right thing now. Would it be possible to examine those patches and, if they are fine, update the acl and fileutils packages?
Oh, actually the patch I'm using for libacl/acl_extended_file.c also checks for ERANGE (as per the manpage) as well as E2BIG; I also applied the equivalent patch to acl_extended_fd.c
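A minimal sketch of that error classification, kept separate from the actual getxattr() call so it can be read on its own. The names are hypothetical and not part of libacl; ENODATA is used in place of ENOATTR on the assumption, stated earlier in this report, that the two have the same value on Linux.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical names for illustration; not part of libacl. */
enum acl_probe
{
  ACL_PROBE_ERROR = -1,   /* unexpected failure */
  ACL_PROBE_MINIMAL = 0,  /* no ACL attribute, or it fits the 3-entry buffer */
  ACL_PROBE_EXTENDED = 1  /* attribute is larger than the 3-entry buffer */
};

/* Classify the result of a getxattr() call that was given a buffer sized
   for a header plus three entries.  saved_errno must be copied from errno
   immediately after the call. */
static enum acl_probe
classify_acl_probe (ssize_t retval, int saved_errno)
{
  if (retval >= 0)
    return ACL_PROBE_MINIMAL;
  /* ERANGE is what the manpage documents; E2BIG is what XFS was observed
     to return in this report.  Both mean "buffer too small". */
  if (saved_errno == ERANGE || saved_errno == E2BIG)
    return ACL_PROBE_EXTENDED;
  if (saved_errno == ENODATA)
    return ACL_PROBE_MINIMAL;
  return ACL_PROBE_ERROR;
}
```

Accepting both error codes keeps the check working whether the filesystem follows the manpage or behaves like the XFS code described above.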
*** Bug 5525 has been marked as a duplicate of this bug. ***
*** Bug 5587 has been marked as a duplicate of this bug. ***
Does an ebuild exist that would allow me to apply the patches you refer to so I may test your resolution on my system?
I agree, this *needs* to be integrated ASAP. Or at least provide a testing version of fileutils. I think this is a *definite* showstopper for those of us using XFS/ACL.
Created attachment 2725 [details, diff] Patch to fs/xfs/xfs_acl.c from SGI cvs to fix the cp -pr issues

After much searching I found info in the SGI CVS log about this problem; this was the next cvs revision of xfs_acl.c that fixed it. It has been running on my laptop (used fairly constantly) since Jul 2 without any problems, as well as a friend's "I want to play with Gentoo and I like XFS" workstation. As I recall the root cause is a possible bug in libacl. More info at http://oss.sgi.com/cgi-bin/cvsweb.cgi/linux-2.4-xfs/linux/fs/xfs/xfs_acl.c (this patch is CVS revision 1.23)
I can confirm that Disconnect's patch seems to have resolved the issue. It did require a little bit of hand patching for the mjc sources, but otherwise it was fairly routine.
I report the same error with ghostscript ebuild (cp -a)... on fresh 1.4 install with XFS and Posix ACL switched on in kernel... Without ACL in kernel all is ok...
*** Bug 7423 has been marked as a duplicate of this bug. ***
So, this is a fileutils bug that needs a version update, and/or it is fixed in recent xfs versions. Am I on the right page? If this is still a problem with xfs-sources-2.4.19-r1, please re-open the bug.
cube root # emerge =sys-kernel/xfs-sources-2.4.19-r1
Calculating dependencies
emerge: all ebuilds that could satisfy "=sys-kernel/xfs-sources-2.4.19-r1" have been masked.
cube root # grep xfs-sources /usr/portage/profiles/package.mask
# New xfs-sources, please test.
>=sys-kernel/xfs-sources-2.4.19

I cannot find where it is masked, so I cannot test it. :(
Sascha: that _is_ the mask. >= means "greater than or equal to", i.e. everything newer than or matching xfs-sources-2.4.19 will be masked.
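To make the ">=" rule concrete, here is a deliberately simplified sketch in C. It is not Portage's real version comparison: it only understands dotted numeric versions with an optional "-rN" revision and ignores suffixes such as _pre. Both function names are made up for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified comparison: dotted numbers first, then an optional "-rN"
   revision.  Returns <0, 0 or >0 like strcmp().  Hypothetical helper. */
static int
compare_versions (const char *a, const char *b)
{
  char *end_a, *end_b;
  long rev_a, rev_b;

  for (;;)
    {
      long na = strtol (a, &end_a, 10);
      long nb = strtol (b, &end_b, 10);
      if (na != nb)
        return na < nb ? -1 : 1;
      a = end_a;
      b = end_b;
      if (*a != '.' || *b != '.')
        break;
      a++;
      b++;
    }

  /* A version with additional dotted components counts as newer. */
  if (*a == '.')
    return 1;
  if (*b == '.')
    return -1;

  /* A missing revision is treated as -r0. */
  rev_a = strncmp (a, "-r", 2) == 0 ? strtol (a + 2, NULL, 10) : 0;
  rev_b = strncmp (b, "-r", 2) == 0 ? strtol (b + 2, NULL, 10) : 0;
  return (rev_a > rev_b) - (rev_a < rev_b);
}

/* Non-zero if a ">=mask_version" entry covers the candidate version. */
static int
masked_by_ge (const char *mask_version, const char *candidate)
{
  return compare_versions (candidate, mask_version) >= 0;
}
```

Under these assumptions, masked_by_ge ("2.4.19", "2.4.19-r1") is non-zero, which is why the emerge above reports the ebuild as masked, while masked_by_ge ("2.4.19", "2.4.18") is zero.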
Oops, of course. I was interpreting it like /etc/make.profile/packages, which is exactly the opposite. :) Just tried xfs-sources-2.4.19-r1 and had serious trouble. Mount choked at boot time with a kernel error and hung on the manual invocation. When trying to sync filesystems with <SysRq>+<s>, even the kernel hung. The kernel config is basically the same as for xfs-sources-2.4.18. I used "make oldconfig" and entered some safe values for the new options. I'm using XFS over LVM for all filesystems except /boot (ext2 on a primary partition) and /usr/vice/cache (ext2 over LVM).
I'm still getting a kernel OOPS as soon as mount tries to mount the first XFS partition:

XFS mounting filesystem lvm(58,0)
Unable to handle kernel NULL pointer dereference at virtual address 00000008
printing eip:
c022c4bf
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c022c4bf>] Not tainted
EFLAGS: 00010256
eax: 00000000 ebx: d6b96240 ecx: d6b96040 edx: 00000000
esi: d6af4017 edi: 00002a05 ebp: 00000000 esp: d6bf7d54
ds: 0018 es: 0018 ss: 0018
Process mount (pid: 2831, stackpage=d6bf7000)
Stack: 00000000 00000000 00000000 00000246 00000000 00000000 d6b79c20 00000000
       00000000 d6b96240 d6af4017 00002a05 d6b79c00 c022c702 00000000 00000000
       00000000 00000200 00002a05 d6b96240 00000000 00000000 d6b79c20 d6af4017
Call Trace: [<c022c702>] [<c021876d>] [<c0221cde>] [<c02220c9>] [<c0236fc3>] [<c0133e38>] [<c01410ba>] [<c0141a83>]
            [<c015309e>] [<c014102c>] [<c0141dfd>] [<c0154273>] [<c0154590>] [<c01543d9>] [<c01549c1>] [<c0108ec7>]
Code: 8b 45 08 0f b7 40 10 89 04 24 e8 b2 f7 ff ff 89 44 24 18 8d

I've completely rebuilt the kernel after "make mrproper" to be sure there was nothing messed up during the compile.
Wowzer, lemme get mjc back in on this one. Apparently the issues with xfs-sources-r1 are LVM related, and I don't use LVM and know little about it.
I'm now running a vanilla 2.4.19 kernel patched with ftp://oss.sgi.com/projects/xfs/download/patches/2.4.19/xfs-2.4.19-all-i386.bz2 (+freeswan+loop-aes, but that does not matter). This one works fine, no kernel panics after all. It seems like one of the additional patches from http://gentoo.lostlogicx.com/patches-2.4.19-xfs-r1.tar.bz2 is causing the problems. What's in there?
xfs-sources-2.4.19-r2 hit rsync recently. Can you test that (assuming you don't use grsecurity, as grsecurity is currently broken in that patch)?
And sorry to request more testing... but it is a whole new version of XFS and stuff... I'm currently working on resolving the grsecurity issue...
I've now tested xfs-sources-2.4.19-r2. It seems to work fine (i.e. I could boot it properly and use w3m to read some pages on the console). Because my current kernel includes FreeSWAN 1.98b + FreeSWAN-alg-0.8.0 and loop-AES, I'll keep it instead of changing back to xfs-sources, so I cannot say anything about its long-term reliability. When 2.4.20 is out, I'll come back to you. :) Thanks!