After installing the new kernel source I wanted to rebuild the afs module. After compiling for a while I got an error message. Reproducible: Always
Created attachment 124718: Build.log
Created attachment 124719: emerge --info
Please set your locale to C when reporting bugs.
Confirmed. I think the only fix is currently in openafs' CVS. If no new release comes out in the next few days, I'll look into backporting it.
I was planning to put out a snapshot release, but the latest snapshot (20070717) gets stuck at what I guess is an internal configuration parameter issue. I hope to get this sorted out soon...
Created attachment 125866 (diff): openafs[-kernel]-1.5.19.ebuild -> openafs[-kernel]-1.5.21-20070721.ebuild
(In reply to comment #6)
> Created an attachment (id=125866)
> openafs[-kernel]-1.5.19.ebuild -> openafs[-kernel]-1.5.21-20070721.ebuild
Thanks. However, when I wrote comment #4 I was under the impression that snapshots were of the stable 1.4 release, which they aren't. I don't think switching to a development code base is a good way to solve compatibility issues with a newer kernel, so I'd very much like to stick with 1.4.x. After some digging, I've put an openafs-kernel-1.4.4-r1 ebuild in the tree. I would very much appreciate feedback on it.
(In reply to comment #7)
> Thanks. However, when I wrote comment #4 I was under the impression that
> snapshots were of the stable 1.4 release, which they aren't. I don't think
The snapshots are not the 1.5.* release either; there is only a single CVS stream, and internally VERSION=devel. But Gentoo requires a package version number, so I took the last pre-snapshot version number. Strictly speaking, these would be "openafs[-kernel]-devel-20070721" packages.
At first I tried to patch 1.4.4 (by comparing it with the snapshot), but found more than two problems. At the third one I stopped fixing it and started using the snapshot. The first two were:
1: "current->thread_info" -> "task_thread_info(current)"
2:
===
+#if defined(SLAB_CTOR_VERIFY)
 if ((flags & (SLAB_CTOR_VERIFY|SLAB_CTOR_CONSTRUCTOR)) == SLAB_CTOR_CONSTRUCTOR)
     inode_init_once(AFSTOV(vcp));
+#endif
===
PS: A direct CVS snapshot also contains up-to-date docs, but I used the last 1.5.21 tarball.
> After some digging, I've put an openafs-kernel-1.4.4-r1 ebuild in the tree. I
> would very much appreciate feedback on it.
First of all it was quite difficult to get the openafs-gentoo-0.1.4.tar.bz2. Finally I found it on http://gentoo.osuosl.org/distfiles/
Compiling was successful, starting too. I can login with klog. But when I try to access the files on my account at the university I get the following errors:
sm afs> ls
ls: cannot access PRIVAT: No such file or directory
ls: cannot access BACKUP: No such file or directory
ls: cannot access nsmail: No such file or directory
ls: cannot access GNUstep: No such file or directory
BACKUP GNUstep PRIVAT Programme liprefs.js nsmail wmaker
Desktop.kde33 Noten PUBLIC bin ns_imap public_html
No problems and no error messages with kernel-2.6.21 and openafs-1.4.4
(In reply to comment #9)
> First of all it was quite difficult to get the openafs-gentoo-0.1.4.tar.bz2.
> Finally I found it on http://gentoo.osuosl.org/distfiles/
You were probably too quick for your local mirror :)
> Compiling was successful, starting too. I can login with klog. But when I try
> to access the files on my account at the university I get the following errors:
>
> sm afs> ls
> ls: cannot access PRIVAT: No such file or directory
> ls: cannot access BACKUP: No such file or directory
> ls: cannot access nsmail: No such file or directory
> ls: cannot access GNUstep: No such file or directory
> BACKUP GNUstep PRIVAT Programme liprefs.js nsmail wmaker
> Desktop.kde33 Noten PUBLIC bin ns_imap public_html
I cannot reproduce that. Running on the same architecture, same kernel. Do you have this on more than one afs share? Any other hints that could help me reproduce this?
Same error with CVS-Snapshot of 2007-07-24 (modified Ebuild 1.5.19 / 1.5.21)
> I cannot reproduce that. Running on the same architecture, same kernel. Do
> you have this on more than one afs share? Any other hints that could help me
> reproduce this?
Maybe I don't know enough about afs.
Scenario: I'm a student at a university. The university provides some space at its compute centre, which I can access via afs. They use more than one volume to provide the afs drive.
At home I have a desktop computer (Athlon XP2600+, x86) with gentoo-sources-2.6.21-r3 and openafs-1.4.4. Connected to the desktop computer I have a notebook (PentiumIII, x86) with gentoo-sources-2.6.22-r1 and openafs-kernel-1.4.4-r1 / openafs-kernel cvs of today (2007-07-24).
On my desktop box it works without problems; on the notebook openafs produces the error messages described above. Apart from the hardware differences, the USE flags are almost the same on both computers. The afs configuration is identical on both machines. And 2.6.21-r3 with openafs-1.4.4 also worked on the notebook.
I've added openafs-1.4.4_p20070724 to the tree. I hope this proves a better attempt :) Sven: it troubles me a bit that both 1.4.x and 1.5.x trigger your problems. I don't know what to make of that (either a common error in both release branches, or an error on your machine?). In any case, look if "dmesg" tells you anything useful. As before, all feedback on this new ebuild is much appreciated.
First: Thanx a lot for your support. It's much work and time you put into this maintenance.
I tried the last posted ebuild - same error.
dmesg on my notebook (openafs-1.4.4_p20070724) says:
Found system call table at 0xc0408540 (pattern scan)
Starting AFS cache scan...found 8 non-empty cache files (0%).
afs: Lost contact with file server 134.109.221.12 in cell tu-chemnitz.de (all multi-homed ip addresses down for the server)
afs: Lost contact with file server 134.109.221.12 in cell tu-chemnitz.de (all multi-homed ip addresses down for the server)
On my other computer it's still working (openafs-1.4.4). But just for fun I looked into dmesg and found:
afs: Lost contact with file server 134.109.132.75 in cell tu-chemnitz.de (all multi-homed ip addresses down for the server)
afs: file server 134.109.132.75 in cell tu-chemnitz.de is back up (multi-homed address; other same-host interfaces may still be down)
afs: file server 134.109.132.75 in cell tu-chemnitz.de is back up (multi-homed address; other same-host interfaces may still be down)
afs: Lost contact with file server 134.109.132.80 in cell tu-chemnitz.de (all multi-homed ip addresses down for the server)
afs: Lost contact with file server 134.109.132.80 in cell tu-chemnitz.de (all multi-homed ip addresses down for the server)
afs: file server 134.109.132.80 in cell tu-chemnitz.de is back up (multi-homed address; other same-host interfaces may still be down)
afs: file server 134.109.132.80 in cell tu-chemnitz.de is back up (multi-homed address; other same-host interfaces may still be down)
afs: Lost contact with file server 134.109.132.79 in cell tu-chemnitz.de (all multi-homed ip addresses down for the server)
afs: Lost contact with file server 134.109.132.79 in cell tu-chemnitz.de (all multi-homed ip addresses down for the server)
afs: file server 134.109.132.79 in cell tu-chemnitz.de is back up (multi-homed address; other same-host interfaces may still be down)
afs: file server 134.109.132.79 in cell tu-chemnitz.de is back up (multi-homed address; other same-host interfaces may still be down)
It looks like a problem with the afs server to me. But it's still strange that it works on one computer and not on the other. And it's really odd that the behaviour of afs would have changed so drastically from 1.4.4 to 1.4.4-r1. The only thing I changed was the kernel. Even there I used 'make oldconfig' to carry over as much of the 2.6.21-r3 kernel config to 2.6.22-r1 as possible. So I guess the problem lies in the new kernel rather than in the openafs versions.
What I did to narrow down the error:
- I checked the afs partition (fsck.ext2)
- I deleted all the cache files on the afs partition
Config:
/etc/openafs/cacheinfo:
/afs:/var/cache/openafs:185000
/etc/openafs/CellServDB:
>tu-chemnitz.de #Technische Universitaet Chemnitz, Germany
134.109.2.1 #zuse.hrz.tu-chemnitz.de
134.109.2.15 #phoenix.hrz.tu-chemnitz.de
134.109.200.7 #aetius.hrz.tu-chemnitz.de
/etc/openafs/ThisCell:
tu-chemnitz.de
The problem isn't urgent. It just means not upgrading the kernel on my desktop computer.
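The kernel log excerpts in this comment mix "Lost contact" and "is back up" lines, so it is hard to see at a glance which file servers are still unreachable. A minimal shell sketch that summarizes this is given here; it assumes the message wording shown in this report, and `afs_down_servers` is a hypothetical helper name, not an OpenAFS tool:

```shell
# afs_down_servers: hypothetical helper, not part of OpenAFS.
# Reads kernel log lines on stdin and prints every file server whose most
# recent message was "Lost contact" (no later "is back up" line).
afs_down_servers() {
    awk '
        # Return the word that follows the first standalone "server".
        function server_ip(   i) {
            for (i = 1; i < NF; i++)
                if ($i == "server") return $(i + 1)
            return ""
        }
        /afs: Lost contact with file server/ { state[server_ip()] = "down" }
        /afs: file server .* is back up/     { state[server_ip()] = "up" }
        END {
            for (ip in state)
                if (state[ip] == "down") print ip
        }
    ' | sort
}
```

It could be fed from the kernel log, for example `dmesg | afs_down_servers`. A server listed here is not necessarily the cause of the missing directories; it only narrows down where to look.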
(In reply to comment #14)
> First: Thanx a lot for your support. It's much work and time you put into this
> maintenance.
It makes one wonder why they don't release official new sources to cope with newer kernel versions, instead of letting every distribution make its own hacks.
> I tried the last posted ebuild - same error.
Have you tried the new ebuild on your stable machine yet (keep the kernel)? If it introduces the same error on yet another machine, even when it has an old kernel, then that would mean the error is in the newer ebuild.
btw: at first sight, I can access tu-chemnitz from both my x86 and amd64 with this newer openafs and newer kernel
> Have you tried the new ebuild on your stable machine yet (keep the kernel)?
Test scenarios:
Notebook P3: Kernel-2.6.22-r1 Openafs-1.4.4 / Openafs-1.5.19 and Openafs-Kernel 1.4.4-r1, 1.4.4_p20070724, 1.5.21.ebuild
-> Error
Notebook P3: Boot older kernel-2.6.21-r3 Openafs-1.4.4 and Openafs-Kernel 1.4.4
-> works perfectly
Notebook P3: Reboot kernel-2.6.22-r1 again
-> Error
AthlonXP: Kernel-2.6.21-r3 Openafs-1.4.4 and Openafs-Kernel 1.4.4
-> works perfectly
AthlonXP: Kernel-2.6.21-r3 Openafs-1.4.4
Upgrade to Openafs-Kernel-1.4.4_p20070724
-> Error
AthlonXP: Kernel-2.6.21-r3 Openafs-1.4.4
Downgrade from openafs-kernel-1.4.4_p20070724 to 1.4.4
-> Error
AthlonXP: reboot Kernel-2.6.21-r3 Openafs-1.4.4
-> Error
Upgrade / downgrade / cleaning the afs cache doesn't help anymore. Now I always get the error messages on my stable system.
> btw: at first sight, I can access tu-chemnitz from both my x86 and amd64 with
> this newer openafs and newer kernel
/afs/tu-chemnitz still works. But an ls in tu-chemnitz shows:
ls: cannot access archiv: No such file or directory
ls: cannot access global: No such file or directory
ls: cannot access mount: No such file or directory
ls: cannot access ToSCA: No such file or directory
ls: cannot access wsadmin: No such file or directory
IWP ToSCA admin archiv common dept docu ftp global gnu home mount
openafs product project service stura tucz ubc urz wsadmin www zuv
I can still change into and list the directories that are not reported as errors (e.g. www, home, product, ...)
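The symptom in this comment is that names appear in the directory listing but cannot be accessed afterwards. A small shell sketch for collecting exactly those names, so they can be compared between machines, might look like this; `list_unreachable_entries` is a hypothetical name used only for illustration:

```shell
# list_unreachable_entries: hypothetical helper for illustration only.
# Prints each name that the directory listing returns but that cannot be
# examined afterwards, which is the pattern seen in the ls output above.
# Limitation: file names containing newlines are not handled.
list_unreachable_entries() {
    dir=${1:-.}
    ls -A "$dir" 2>/dev/null | while IFS= read -r name; do
        # -e follows symlinks, -L catches dangling symlinks; if both fail,
        # the name was listed but the entry itself cannot be looked up.
        if [ ! -e "$dir/$name" ] && [ ! -L "$dir/$name" ]; then
            printf '%s\n' "$name"
        fi
    done
}
```

For example, `list_unreachable_entries /afs/tu-chemnitz.de` on the affected and the working machine would show whether the same names fail on both. On a healthy directory the function prints nothing.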
(In reply to comment #16)
> > Have you tried the new ebuild on your stable machine yet (keep the kernel)?
snip
> AthlonXP: Kernel-2.6.21-r3 Openafs-1.4.4
> Upgrade to Openafs-Kernel-1.4.4_p20070724
> -> Error
>
> AthlonXP: Kernel-2.6.21-r3 Openafs-1.4.4
> Downgrade from openafs-kernel-1.4.4_p20070724 to 1.4.4
> -> Error
>
> AthlonXP: reboot Kernel-2.6.21-r3 Openafs-1.4.4
> -> Error
>
> Upgrade / downgrade / cleaning the afs cache doesn't help anymore. Now I always
> get the error messages on my stable system.
This is troublesome...
*) How do you clean your cache?
*) When you downgraded to 1.4.4, are you sure you
- built openafs-kernel-1.4.4 (plain) as well?
- built it against the correct kernel headers (it says when it begins building which ones it's gonna use, it's determined by the /usr/src/linux softlink)
> > btw: at first sight, I can access tu-chemnitz from both my x86 and amd64 with
> > this newer openafs and newer kernel
> /afs/tu-chemnitz still works. But an ls in tu-chemnitz shows:
> ls: cannot access archiv: No such file or directory
> ls: cannot access global: No such file or directory
> ls: cannot access mount: No such file or directory
> ls: cannot access ToSCA: No such file or directory
> ls: cannot access wsadmin: No such file or directory
> IWP ToSCA admin archiv common dept docu ftp global gnu home mount
> openafs product project service stura tucz ubc urz wsadmin www zuv
>
> I can still change into and list the directories that are not reported as
> errors (e.g. www, home, product, ...)
I get:
stefaan@bubbles /afs/tu-chemnitz.de $ ls
ls: cannot access zuv: Connection timed out
IWP archiv docu gnu openafs service ubc www
ToSCA common ftp home product stura urz zuv
admin dept global mount project tucz wsadmin
stefaan@bubbles /afs/tu-chemnitz.de $ cd archiv/
stefaan@bubbles /afs/tu-chemnitz.de/archiv $ ls
CVS develop-work logs production-work security
config images metadaten pub tmp
develop-linux index.html perl58 publication tools
develop-stable log production-stable publicationt www
stefaan@bubbles /afs/tu-chemnitz.de/archiv $ cd ../global
stefaan@bubbles /afs/tu-chemnitz.de/global $ ls
README Win2K capp i386_linux24 sgi_65 wfw
README.software Win95 dos i386_linux26 sun4x_57 win
TOOLS WinNT hp_ux11i ia64_linux24 sun5
WfWg amd64_linux26 i386_linux22 nt text
stefaan@bubbles /afs/tu-chemnitz.de/global $ cd ../mount/
stefaan@bubbles /afs/tu-chemnitz.de/mount $ ls
ls: cannot access cd.proew_lin: No such device
ls: cannot access test1.thm: No such device
ls: cannot access test.thm: Connection timed out
ls: cannot access tosca.logs.SL307X86.readonly: No such device
ls: cannot access test.thm.ronsc: Connection timed out
cd.CT2002 cd.ansys10.0sp1_lib cd.sw-vol4
cd.CT2005 cd.ansys10_win64emt cd.sw-vol5
cd.Corel9_2 cd.catia_pmey cd.sw-vol6
cd.Corel9_3 cd.catia_pmey1 cd.sw-vol7
...
I'm guessing the errors I get could be normal? (because of firewalling maybe?)
Forgot to mention these:
afs: Lost contact with file server 134.109.140.109 in cell tu-chemnitz.de (all multi-homed ip addresses down for the server)
afs: Lost contact with file server 134.109.140.109 in cell tu-chemnitz.de (all multi-homed ip addresses down for the server)
afs: Lost contact with file server 134.109.221.12 in cell tu-chemnitz.de (all multi-homed ip addresses down for the server)
afs: Lost contact with file server 134.109.221.12 in cell tu-chemnitz.de (all multi-homed ip addresses down for the server)
afs: Lost contact with file server 134.109.221.11 in cell tu-chemnitz.de (all multi-homed ip addresses down for the server)
afs: Lost contact with file server 134.109.221.11 in cell tu-chemnitz.de (all multi-homed ip addresses down for the server)
I cannot ping these. For the other servers that don't appear in these messages, I at least get "Destination Port Unreachable" when trying to ping them.
*) How do you clean your cache?
cd /var/cache/openafs
rm -rf *
*) When you downgraded to 1.4.4, are you sure you
- built openafs-kernel-1.4.4 (plain) as well?
- built it against the correct kernel headers (it says when it begins building which ones it's gonna use, it's determined by the /usr/src/linux softlink)
emerge -1 =openafs-kernel-1.4.4
emerge -1 =openafs-1.4.4
Installed Linux-Headers: 2.6.17-r2
I remember there was an installation problem with one package (but I don't remember which one :)), so I masked all versions above. It might be an idea to upgrade the Linux-Headers as well.
Installed Kernel (AthlonXP): 2.6.21-r3
There's no other kernel on this system. The symlink points to this kernel.
>I'm guessing the errors I get could be normal?
No, this isn't normal. Normally you can access this afs cell from all over the world (I did it in South Korea).
>I cannot ping these.
Same problem here from inside the campus net.
I checked the CellServDB again, but it seems ok. The computers of the compute centre use the same one. I have now written to the compute centre. If there's a problem with afs, they will tell me.
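Since the module is built against whatever /usr/src/linux points to, one quick check is whether that symlink matches the kernel that is actually running. The following is a minimal sketch, assuming the usual naming of source trees as linux-<release>; `check_kernel_link` is a hypothetical helper, not a Gentoo tool. As far as I know, the Linux-Headers package mentioned above holds the userspace headers and is separate from this tree.

```shell
# check_kernel_link: hypothetical helper for illustration only.
# Compares the source tree behind a symlink (default /usr/src/linux) with a
# kernel release string (default: the running kernel from uname -r).
# Assumes the tree directory is named linux-<release>, as in
# linux-2.6.21-gentoo-r3; other naming schemes will report a mismatch.
check_kernel_link() {
    link=${1:-/usr/src/linux}
    release=${2:-$(uname -r)}
    if [ ! -d "$link" ]; then
        echo "missing: $link is not a directory"
        return 2
    fi
    target=$(readlink -f "$link")
    tree=${target##*/}
    if [ "$tree" = "linux-$release" ]; then
        echo "match: $tree"
        return 0
    fi
    echo "mismatch: $link -> $tree, expected linux-$release"
    return 1
}
```

Called without arguments, it compares /usr/src/linux against `uname -r`. A mismatch would mean the module was probably built for a different kernel than the one booted, which is worth ruling out before looking elsewhere.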
Ok, I asked at the compute centre. All the volumes in the afs cell have a copy on a different file server. That means that if one server is offline, there is always another server from which you can get your data, at least with read-only access. So it's not a problem if some servers don't answer a ping.
FYI: I'll be starting the 2.6.22 stable push in the next few days.
I didn't see the 2.6.22-gentoo-r2 kernel mentioned here, so I thought I would just add that openafs 1.4.4 fails to compile for me on an x86_64 nocona with that kernel. It works perfectly with 2.6.21 on the same machine. This is probably no great surprise. Since this is a production machine, I'm afraid I don't have the possibility to test the unstable openafs on it, as suggested elsewhere in this thread, but thought I should add this comment for completeness. If possible, I'll try the unstable openafs on an x86 or ppc machine. PS: I can provide the full log file if anyone is interested.
(In reply to comment #22)
> Since this is a production machine, I'm afraid I don't have
> the possibility to test the unstable openafs on it, as suggested elsewhere in
> this thread, but thought I should add this comment for completeness. If
> possible, I'll try the unstable openafs on an x86 or ppc machine.
I'd be extremely interested in that.
1) Openafs stable won't work on the 2.6.22 kernel
2) That's why I added a newer version pulled from the stable cvs tree.
   It's been tested by only two people: Sven and myself.
   For Sven it gives a lot of "No such file or directory" errors
   For me, it works like a charm on 3 boxes (all different architectures)
3) Like this, I cannot ask any arch team to mark it as stable.
So what we need is:
- either someone that can reproduce Sven's findings (I haven't been able to)
- or some more people that test the new openafs ebuild
In any case, further testing is GREATLY appreciated.
I noted the same problem after upgrading to 2.6.22-gentoo-r2, and found my way to this bug report. I've built the suggested openafs-kernel-1.4.4_p20070724 on one machine, and so far it appears to be working. I'm able to klog and all attempts to access data so far have been successful. The particular machine is a Thinkpad R50 with Pentium-M. It's also worth noting that the AFS cell it belongs to is "legacy AFS", not OpenAFS. I'll run this for a few days, then move my dual-P3 deskside to the same kernel/openafs levels, assuming there are no problems.
You mentioned you needed testers:
1. openafs-kernel-1.4.4 does not compile with 2.6.22-r2
2. openafs-kernel-1.4.4_p20070724 compiles with gentoo-sources 2.6.21-r4 and 2.6.22-r2.
This has no problems (so far) on 2.6.21. If I ever get around to rebooting my machine, I'll let you know if it works good on 2.6.22.
Btw -- there are lots of compile warnings, and portage warns me about "poor programing practices, and possible random runtime failures". I can ignore this right...?
(In reply to comment #25)
> Btw -- there are lots of compile warnings, and portage warns me about "poor
> programing practices, and possible random runtime failures". I can ignore this
> right...?
Yes. Work is in progress upstream to fix many of the errors, but those fixes evidently only appear in the experimental branch, so it'll be a while before this is fixed in a stable release.
Thanks for testing!
Sven: what you report seems more and more odd. Any progress?
BTW: Only read-only volumes can have (active) copies on different servers.
> Sven: what you report seems more and more odd. Any progress?
> BTW: Only read-only volumes can have (active) copies on different servers.
No, unfortunately no progress. Compiling is fine, and klog also works. But I still have the problem that I can't access several directories. And I'm sure there's no problem with the afs servers: at the university I had to install a Linux computer using Scientific Linux with kernel 2.6.18 and openafs-1.4.4, and it works on that machine without problems.
To be sure, I uninstalled openafs completely (configs deleted as well), installed a new kernel (gentoo-sources-2.6.22-r4) and afterwards installed openafs-1.4.4_p20070724. Same problem again. Downgrading to kernel 2.6.21 and openafs-1.4.4 didn't help anymore either.
That's really strange. I'm running out of ideas.
(In reply to comment #25)
> If I ever get around to rebooting my machine, I'll let you know if it works
> good on 2.6.22.
I rebooted. Openafs works fine (so far) on 2.6.22-gentoo-r2.
Just FYI -- I only run openafs-client + kerberos, so I can't test the server part. If you need testing for the server, and you send me working config files, I'll be happy to try them on my machine.
Thanks for your support,
GI
PS: openafs is quite critical to me. All my work is on a subversion repository I store on Stanford's OpenAFS cell. So if something messes up, I'll be in *deep* trouble...
(In reply to comment #23)
I have been trying OpenAFS 1.4.4_p20070724 with 2.6.22 kernels for a little over a week now on the following arches: nocona, pentium4, pentium3, pentium2, and ppc32 (G4). I have not run into any problems outside of trying to use OpenAFS behind a NAT, which is known to have timeouts depending on the client and server version involved.
The kernels, etc., involved are:
nocona: default-linux/amd64/2007.0/server, gcc-4.1.2, glibc-2.5-r4, 2.6.22-gentoo-r5 x86_64
pentium4: default-linux/x86/2007.0/desktop, gcc-4.1.2, glibc-2.5-r4, 2.6.22-gentoo-r5 i686
pentium3: default-linux/x86/2007.0/server, gcc-4.1.2, glibc-2.5-r4, 2.6.22-gentoo-r5 i686
pentium2: default-linux/x86/2007.0/desktop, gcc-4.1.2, glibc-2.5-r4, 2.6.22-gentoo-r5 i686
ppc32 (G4): default-linux/ppc/ppc32/2007.0/desktop/G4, gcc-4.1.2, glibc-2.5-r4, 2.6.22-gentoo-r5 ppc
I use a 2G OpenAFS cache on all machines, and on the pentium3, I tried pushing files over AFS larger than that without problem. In that case, and a few others during initial testing, checksums showed no change in original and copied-over-afs files.
Perhaps worth noting is that 2.6.22 is still marked unstable on ppc32. It was on the G4 that I encountered the timeouts. It is the only machine I tested behind NAT.
If you want more details, let me know. I think I did some timing on the large-file transfer, but it's probably reliability more than speed that you're interested in.
/Mike
(In reply to comment #29)
Thank you very much, Mike, for your elaborate report. Indeed, stability is the issue here. Performance is just an added plus for now.
Sven, are you working from behind a NAT? (I've had some curious problems with NATs before, that were solved by restarting BOTH server and client, but I guess you can't really restart your univ's servers)
>Sven, are you working from behind a NAT? (I've had some curious problems with
>NATs before, that were solved by restarting BOTH server and client, but I
>guess you can't really restart your univ's servers)
Yes, I'm behind a NAT. And no, I can't restart the university's AFS-server. :)
So... rereading the error reports, at first I was at a loss because I didn't understand how an upgrade could result in a failure that a downgrade couldn't fix. I guess it makes sense that something could've gone wrong on the server side, something stateful that isn't reset by a client change. Unless there are other ideas, I suggest we try and mark this ebuild as stable? Remarks are welcome...
FYI: there is an SRPM at openafs.org, http://dl.openafs.org/dl/1.4.4/fedora-7/SRPMS/openafs-1.4.4-fc7.3.src.rpm which is claimed to build and work with Fedora 2.6.22 line kernels. So it must contain the appropriate patches. Can anyone review this?
(In reply to comment #32) Stable sounds good to me.
(In reply to comment #33)
> FYI: there is an SRPM at openafs.org,
> http://dl.openafs.org/dl/1.4.4/fedora-7/SRPMS/openafs-1.4.4-fc7.3.src.rpm which
> is claimed to build and work with Fedora 2.6.22 line kernels. So it must
> contain the appropriate patches. Can anyone review this?
I've tried that. Actually, it was my first attempt at getting this going. But it patches several things up starting from openafs-1.4.3 (not 1.4.4), and something else failed horribly too, though I don't remember the details (sorry).
Bottom line is that I'm a bit disappointed that now heavily patched packages are appearing that are backed by openafs.org, while I would like to have seen a single source package that works on everything. But I've read they lack manpower currently, which is very understandable.
So what I've done with 1.4.4_p20070724 is probably quite similar to the fedora-src-rpm, except that I haven't been selecting which patches to include, and just pulled up to a certain cvs date.
I forgot to apply the delta-patch in the openafs-1.4.4_p2007... ebuild. It was applied in the openafs-kernel-... ebuild however, which made it work on 2.6.22 kernels (for most people at least). I've pushed a new net-fs/openafs-1.4.4_p20070724-r1 ebuild now, if any of you would like to test and share your results? Sven: I'm hoping this fixes your problem as well.
I tried it. Same error as before. But don't worry about it. It seems I'm the only one with this problem. Searching Google, I found only one person with similar error messages. Maybe I'll have the opportunity to install openafs on my neighbor's gentoo. Then I'll see whether it's something specific to my two computers, and if so I will have to check the configurations again.
The first problem (compile error) is fixed. So if you want, let's close the bug report.
(In reply to comment #37)
> The first problem (compile error) is fixed. So if you want, let's close
> the bug report.
Good idea. Thanks for all the input. Don't hesitate to file another report if your other problem persists.
Sven, were you by any chance using gcc-4.2? If so, see bug #194122.
That's exactly my problem. On both machines I'm using gcc-4.2.0. I never thought of a gcc problem. But the errors described in bug #194122 are identical to my issues. At least I know now that I'm not the only one with this problem.
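For anyone landing here with the same "No such file or directory" symptom, a quick way to see whether the active compiler falls into the series discussed in the last two comments is sketched below. `is_gcc_42` is a hypothetical helper; it only inspects the currently selected gcc, which is not necessarily the one the installed module was built with.

```shell
# is_gcc_42: hypothetical helper for illustration only.
# Succeeds when the given version string (default: gcc -dumpversion) belongs
# to the 4.2 series, the compiler series named in the last two comments.
is_gcc_42() {
    version=${1:-$(gcc -dumpversion)}
    case $version in
        4.2|4.2.*) return 0 ;;
        *)         return 1 ;;
    esac
}
```

A typical use would be `is_gcc_42 && echo "built with gcc-4.2? see bug #194122"`. On Gentoo, `gcc-config -l` lists the installed compiler profiles if a different one needs to be selected before rebuilding.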