Have reproduced the following behavior twice:

# df -h
# above hangs after listing some of the files
# Message in dmesg:
Starting AFS cache scan...found 1714 non-empty cache files (3%).
BUG: unable to handle kernel paging request at fffffffffffffffe
IP: [<ffffffff80530ce8>] _read_lock+0x0/0xc
PGD 203067 PUD 204067 PMD 0
Oops: 0002 [1] SMP
CPU 1
Modules linked in: libafs(P) snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_timer snd i2c_i801 i2c_core soundcore snd_page_alloc
Pid: 4746, comm: afsd Tainted: P 2.6.25-gentoo-r7 #2
RIP: 0010:[<ffffffff80530ce8>] [<ffffffff80530ce8>] _read_lock+0x0/0xc
RSP: 0018:ffff8100755cde88 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff8100755cdeec RCX: 0000000000000000
RDX: 0000000000000010 RSI: ffffc20001f67348 RDI: fffffffffffffffe
RBP: 0000000049031e33 R08: 0000000000000000 R09: 0000000100000000
R10: ffffc20001d1f050 R11: 00000000000000c0 R12: 0000000049031e33
R13: 0000000049031e33 R14: 0000000049031cd8 R15: 0000000049031e21
FS: 0000000000000000(0000) GS:ffff81007f36acc0(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: fffffffffffffffe CR3: 000000006e80d000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process afsd (pid: 4746, threadinfo ffff8100755cc000, task ffff81007e7c6080)
Stack: ffffffff8809134b ffff8100755cdeec ffffffff88096fe4 0000000049031e33
 ffffffff88087109 4903101349031e21 00000000490315cf 0000000049031e33
 0000000036298831 0000000049031e33 0000000036298831 ffffffff807502a0
Call Trace:
 [<ffffffff8809134b>] ? :libafs:afs_osi_TraverseProcTable+0x12/0x61
 [<ffffffff88096fe4>] ? :libafs:afs_GCPAGs+0xad/0x164
 [<ffffffff88087109>] ? :libafs:afs_Daemon+0x352/0x419
 [<ffffffff880d058c>] ? :libafs:afsd_launcher+0x225/0x6b3
 [<ffffffff8022c9cd>] ? schedule_tail+0x27/0x5c
 [<ffffffff880d0367>] ? :libafs:afsd_launcher+0x0/0x6b3
 [<ffffffff8020bc88>] ? child_rip+0xa/0x12
 [<ffffffff880d0367>] ? :libafs:afsd_launcher+0x0/0x6b3
 [<ffffffff880d0394>] ? :libafs:afsd_launcher+0x2d/0x6b3
 [<ffffffff8020bc7e>] ? child_rip+0x0/0x12
Code: 8b 07 38 e0 75 0a 66 89 c2 fe c6 f0 66 0f b1 17 0f 94 c2 0f b6 c2 85 c0 0f 95 c0 0f b6 c0 c3 fe 07 c3 fe 07 56 9d c3 fe 07 fb c3 <f0> 83 2f 01 79 05 e8 0d cb de ff c3 9c 58 fa f0 83 2f 01 79 05
RIP [<ffffffff80530ce8>] _read_lock+0x0/0xc
 RSP <ffff8100755cde88>
CR2: fffffffffffffffe
---[ end trace ff550a46ac0834f8 ]---

Reproducible: Always

Steps to Reproduce:
(see above)

Actual Results:
The system continues to run, but df -h is hung. Unfortunately, the system will then NOT reboot with "reboot"; it must be powered off.

Note this is an x86_64 Intel machine (CXXFLAGS="-march=nocona -O2 -pipe") running Portage 2.1.4.5 (default-linux/amd64/2007.0, gcc-4.1.2, glibc-2.6.1-r0, 2.6.25-gentoo-r7 x86_64). As it uses the AMD profile (as suggested by Gentoo), I am filing this under AMD. Please feel free to change.
I updated the subject line because you left off the package name -- make sure I got it right. Also, please post output of 'emerge --info' command.
Hm, this list thread looks quite relevant, a slightly older afs but very similar oops: http://www.nabble.com/Erratic-kernel-Oops-(afsd-1.4.7_pre3-and-Linux-2.6.25)-td16772220.html
Hi Wormo,

Yes, you got the package name just right. Sorry I left that out.

Below is the emerge --info. The only thing that has changed since I reported the bug is that I switched profiles to default/linux/amd64/2008.0/no-multilib. This resulted in no change in USE flags for me from the 2007 profile I had been using since I was running without multilib there as well. So outside of the profile name change, this is the correct emerge --info.

I also think you are right about the relevance of nabble.com link you gave. Since the machine in question is running production (typically without AFS), I'm a little afraid to try the GCPAGs thing suggested in that link, however.

Thanks for the interest,
/Mike

# emerge --info
Portage 2.1.4.5 (default/linux/amd64/2008.0/no-multilib, gcc-4.1.2, glibc-2.6.1-r0, 2.6.25-gentoo-r7 x86_64)
=================================================================
System uname: 2.6.25-gentoo-r7 x86_64 Intel(R) Pentium(R) D CPU 2.80GHz
Timestamp of tree: Wed, 12 Nov 2008 12:00:01 +0000
ccache version 2.4 [enabled]
app-shells/bash: 3.2_p33
dev-lang/python: 2.5.2-r7
dev-python/pycrypto: 2.0.1-r6
dev-util/ccache: 2.4-r7
sys-apps/baselayout: 1.12.11.1
sys-apps/sandbox: 1.2.18.1-r2
sys-devel/autoconf: 2.13, 2.61-r2
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.1-r1
sys-devel/binutils: 2.18-r3
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool: 1.5.26
virtual/os-headers: 2.6.23-r3
ACCEPT_KEYWORDS="amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=nocona -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/revdep-rebuild /etc/terminfo /etc/udev/rules.d"
CXXFLAGS="-march=nocona -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="ccache distlocks metadata-transfer parallel-fetch sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="http://ftp.linux.ee/pub/gentoo/distfiles/ http://ftp.rhnet.is/pub/gentoo/ http://mirror.gentoo.no/ http://gentoo.osuosl.org/"
LDFLAGS="-Wl,-O1"
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="acl afs amd64 apache2 berkdb bzip2 cli cracklib crypt cups curl dri emacs fam fortran gdbm geoip gpm iconv ipv6 isdnlog ldap mailwrapper midi mmx mudflap ncurses network-cron nls nptl nptlonly openmp pam pcre perl pppd python readline reflection session spl sse sse2 ssl sysfs tcpd threads unicode vhosts xattr xml xorg zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="access actions alias asis auth auth_anon auth_dbm auth_digest authz_default authz_host autoindex cache case_filter_in case_filter cern_meta cgi cgid charset_lite dav dav_fs dav_lock deflate dir disk_cache echo env expires ext_filter file_cache filter headers imap include info log_config logio mem_cache mime mime_magic negotiation proxy proxy_connect proxy_ftp proxy_http rewrite setenvif so speling status unique_id unique_id userdir usertrack vhost_alias" APACHE2_MPMS="worker" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" USERLAND="GNU" VIDEO_CARDS="fbdev glint i810 intel mach64 mga neomagic nv r128 radeon savage sis tdfx trident vesa vga via vmware voodoo"
Unset: CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, LANG, LC_ALL, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
I think the kernel config is especially relevant for this. Could you check whether CONFIG_KEYS is enabled or not? If not, could you try enabling it in your kernel? (i.e. recompile kernel, install kernel, rebuild openafs-kernel, reboot)
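For reference, a minimal sketch of that check, assuming the kernel sources are reachable through the usual /usr/src/linux symlink (adjust the path if yours differs). The helper name kconfig_enabled is made up for illustration and is not part of any existing tool:

```shell
# Hypothetical helper (not part of any tool): succeeds if a kernel option
# is built in (=y) or built as a module (=m) in the given .config file.
kconfig_enabled() {
    # $1 = option name, e.g. CONFIG_KEYS; $2 = path to a kernel .config
    grep -Eq "^$1=(y|m)\$" "$2"
}

# Example: check the tree that /usr/src/linux points at (assumed path).
if [ -f /usr/src/linux/.config ]; then
    if kconfig_enabled CONFIG_KEYS /usr/src/linux/.config; then
        echo "CONFIG_KEYS is enabled"
    else
        echo "CONFIG_KEYS is not set"
    fi
fi
```

A disabled option shows up in .config as the comment line "# CONFIG_KEYS is not set", which the anchored pattern deliberately does not match.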
Could you provide some feedback please?
(In reply to comment #5)
> Could you provide some feedback please?

Sorry for the slow reply. I had surgery and was out of it for a while.

Currently:

tjasse ~ # cd /usr/src/linux
tjasse linux # grep CONFIG_KEYS .config
# CONFIG_KEYS is not set
tjasse linux # uname -a
Linux tjasse 2.6.26-gentoo-r4 #1 SMP Sun Dec 7 15:17:18 CET 2008 x86_64 Intel(R) Pentium(R) D CPU 2.80GHz GenuineIntel GNU/Linux

But this also applied to the kernel I was running when I reported the error:

tjasse linux # cd ../linux-2.6.25-gentoo-r7/
tjasse linux-2.6.25-gentoo-r7 # grep CONFIG_KEYS .config
# CONFIG_KEYS is not set

I also now have OpenAFS 1.4.8 installed, but not running, on the system, instead of the 1.4.8_pre3 that I initially had the problem with. Since this is a production machine and I unfortunately don't have any other x86_64 machines to test on, the soonest I would be able to test would be over the weekend. Assuming I can do that, which kernel/openafs version would it help you most for me to try?
(In reply to comment #6)
> # CONFIG_KEYS is not set
>
> But this also applied to the kernel I was running when I reported the error:

OpenAFS follows different code paths depending on this parameter, and I currently have the feeling that the path with CONFIG_KEYS enabled is the better-tested one. This may be relevant because there have lately been changes in the code that depends on CONFIG_KEYS.

> I also now have OpenAFS 1.4.8 installed but not running on the system now,
> instead of the 1.4.8_pre3 that I initially had the problem with. Since this
> is a production machine and I unfortunately don't have any other x86_64
> machines to test on, the soonest test I would be able to do would be over the
> weekend. Assuming I can do that, which kernel/openafs version would it help
> you most for me to try?

Well, if you have 1.4.8 (as opposed to) without CONFIG_KEYS ready to test and you can easily reproduce, it would be interesting to know whether that works. Otherwise, I would still bet on 1.4.8 with a kernel that has CONFIG_KEYS enabled. Take care to build the new kernel, reboot into it, make sure you build against the new kernel (KERNEL_DIR/KBUILD_OUTPUT), remerge openafs-kernel, and only then start the client.

I hope your tests turn out well. Thanks for testing.
(In reply to comment #7)

Well, I have now tried compiling 2.6.27-gentoo-r7 with both CONFIG_KEYS and CONFIG_KEYS_DEBUG_PROC_KEYS set. I compiled net-fs/openafs-kernel-1.4.8 against this kernel, but as soon as I try to run /etc/init.d/openafs-client start I get a kernel Oops, and libafs is involved in it. So far I don't seem to have any problems with the kernel with CONFIG_KEYS set, as long as I don't run openafs. I tried to be careful with the compiling, etc., as you outlined.
Closing, as it seems to work with CONFIG_KEYS.