
Bug 223355

Summary: net-fs/openafs-1.4.7 kernel Oops on 2.6.25-r4
Product: Gentoo Linux Reporter: Michael Hammer (RETIRED) <mueli>
Component: New packages Assignee: Stefaan De Roeck (RETIRED) <stefaan>
Status: RESOLVED FIXED    
Severity: normal CC: hrabe
Priority: High    
Version: unspecified   
Hardware: AMD64   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---
Bug Depends on:    
Bug Blocks: 218127    

Description Michael Hammer (RETIRED) gentoo-dev 2008-05-23 14:39:12 UTC
Hi Stefann!

Because of a special hardware situation I recently had to upgrade to 2.6.25-r4. On this client I am using OpenAFS and emerged openafs-1.4.7(-kernel). After light usage I get the following kernel Oops, which is fully reproducible:

BUG: unable to handle kernel paging request at fffffffffffffffe
IP: [<ffffffff804e10a0>] _read_lock+0x0/0xc
PGD 203067 PUD 204067 PMD 0 
Oops: 0002 [1] SMP 
CPU 1 
Modules linked in: snd_pcm_oss snd_mixer_oss snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device libafs(P) i915 drm snd_hda_intel snd_pcm snd_timer snd snd_page_alloc e1000e e1000 dm_mod scsi_wait_scan ata_piix pata_mpiix
Pid: 6111, comm: afsd Tainted: P         2.6.25-gentoo-r4 #3
RIP: 0010:[<ffffffff804e10a0>]  [<ffffffff804e10a0>] _read_lock+0x0/0xc
RSP: 0018:ffff81007b661e78  EFLAGS: 00010282
RAX: 0000000000000002 RBX: ffff81007b661eec RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000010 RDI: fffffffffffffffe
RBP: fffffffffffffffe R08: ffff81007d4dd8c0 R09: 0000000000000000
R10: ffff81007bc8bd00 R11: 0000000000000009 R12: 000000004836d24b
R13: 000000004836d1cb R14: 000000004836d1cb R15: 000000004836c430
FS:  0000000000000000(0000) GS:ffff81007e0250c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: fffffffffffffffe CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process afsd (pid: 6111, threadinfo ffff81007b660000, task ffff81007d0f9080)
Stack:  ffffffff8811ea19 000000004836d24b ffff81007b661eec 000000004836d24b
 ffffffff88125431 000000004836d24b ffffffff88113448 4836be144836d1a7
 000000004836d239 000000004836d24b 000000002289a495 000000004836d24b
Call Trace:
 [<ffffffff8811ea19>] ? :libafs:afs_osi_TraverseProcTable+0x1a/0x69
 [<ffffffff88125431>] ? :libafs:afs_GCPAGs+0xa4/0x182
 [<ffffffff88113448>] ? :libafs:afs_Daemon+0x503/0x57b
 [<ffffffff88165081>] ? :libafs:afsd_launcher+0x352/0x78b
 [<ffffffff8022840a>] ? schedule_tail+0x27/0x5c
 [<ffffffff88164d2f>] ? :libafs:afsd_launcher+0x0/0x78b
 [<ffffffff8020bc68>] ? child_rip+0xa/0x12
 [<ffffffff88164d2f>] ? :libafs:afsd_launcher+0x0/0x78b
 [<ffffffff88164d60>] ? :libafs:afsd_launcher+0x31/0x78b
 [<ffffffff8020bc5e>] ? child_rip+0x0/0x12


Code: 8b 07 38 e0 75 0a 66 89 c2 fe c6 f0 66 0f b1 17 0f 94 c2 0f b6 c2 85 c0 0f 95 c0 0f b6 c0 c3 fe 07 c3 fe 07 56 9d c3 fe 07 fb c3 <f0> 83 2f 01 79 05 e8 e5 67 e6 ff c3 9c 58 fa f0 83 2f 01 79 05 
RIP  [<ffffffff804e10a0>] _read_lock+0x0/0xc
 RSP <ffff81007b661e78>
CR2: fffffffffffffffe
---[ end trace 8e69ef1e1094e2b0 ]---

This one seems to be related to:
http://thread.gmane.org/gmane.comp.file-systems.openafs.devel/7560

They suggest setting:
$ echo 2 > /proc/sys/afs/GCPAGs

This one is IMHO not a duplicate of #220635, so I am filing it as a new bug. Would you advise disabling GCPAGs? Could we do this in the init script, or at least document it somewhere?
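
For illustration, a minimal sketch of how an init script could apply the workaround above. The function name disable_gcpags and its optional path argument are hypothetical and not part of the OpenAFS init script. It assumes the libafs module is already loaded, so that /proc/sys/afs/GCPAGs exists, and that writing 2 disables the PAG garbage collection as described in the linked thread.

```shell
# Hypothetical helper: write 2 to the GCPAGs control file to apply the
# workaround from the linked thread. The path argument exists only so the
# function can be tried against a scratch file.
disable_gcpags() {
    gcpags_file="${1:-/proc/sys/afs/GCPAGs}"
    if [ -w "$gcpags_file" ]; then
        echo 2 > "$gcpags_file"
    else
        # The module is probably not loaded yet, or we are not root.
        echo "GCPAGs control file not writable: $gcpags_file" >&2
        return 1
    fi
}
```

Calling disable_gcpags without arguments right after the kernel module has been loaded would apply the setting before afsd sees any real traffic.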

g, mueli

emerge --info:

Portage 2.1.4.4 (default-linux/amd64/2006.1, gcc-4.1.2, glibc-2.6.1-r0, 2.6.25-gentoo-r4 x86_64)
=================================================================
System uname: 2.6.25-gentoo-r4 x86_64 Intel(R) Pentium(R) D CPU 3.00GHz
Timestamp of tree: Tue, 20 May 2008 16:18:01 +0000
distcc 2.18.3 x86_64-pc-linux-gnu (protocols 1 and 2) (default port 3632) [enabled]
app-shells/bash:     3.2_p33
dev-java/java-config: 1.3.7, 2.1.6
dev-lang/python:     2.4.4-r9
dev-python/pycrypto: 2.0.1-r6
sys-apps/baselayout: 1.12.11.1
sys-apps/sandbox:    1.2.18.1-r2
sys-devel/autoconf:  2.13, 2.61-r1
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.1
sys-devel/binutils:  2.18-r1
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool:   1.5.26
virtual/os-headers:  2.6.23-r3
ACCEPT_KEYWORDS="amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=nocona -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/config"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c /etc/udev/rules.d"
CXXFLAGS="-march=nocona -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="distcc distlocks metadata-transfer sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="ftp://gentoo.inode.at/source/ ftp://ftp.tugraz.at/mirror/gentoo ftp://gd.tuwien.ac.at/opsys/linux/gentoo/ "
LANG="en_US.UTF-8"
LC_ALL="en_US.UTF-8"
MAKEOPTS="-j4"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/portage/local/layman/gentoo-de /usr/portage/local/layman/test"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="X alsa amd64 berkdb cdr cli cracklib crypt cups dbus dri dvd emacs fortran gdbm gif gpm gtk hal iconv icq ipv6 irc isdnlog java jpeg kde kdeenablefinal kerberos ldap midi mp3 mudflap ncurses nfs nls nntp nptl nptlonly nsplugin opengl openmp pam pcre pdf perl ppds pppd python readline reflection session spl ssl svg tcpd unicode xinerama xorg zlib" ALSA_CARDS="hda-intel" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="evdev keyboard mouse" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" USERLAND="GNU" VIDEO_CARDS="fbdev i128 i810 vesa vga"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LDFLAGS, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 1 Stefaan De Roeck (RETIRED) gentoo-dev 2008-05-23 16:54:00 UTC
Thanks for the thorough analysis.

If I get this correctly, the choice is between
* a system that runs out of PAGs, which amounts to a DoS on heavily used systems
* a system that may crash because of some lock error in the garbage collection
(correct me if I'm wrong)
so the safe choice on a 2.6.25 kernel seems to be disabling gcpags, which I would consider changing in the source, since this default is a configuration parameter.

I have not yet seen the crash you're describing, but then again my machine does not undergo heavy afs traffic at the moment.  

I'm checking with #220635 anyway to see whether they're affected by the same bug (locking errors may have varying consequences).  

Hoping for more information on the OpenAFS mailing lists...
Comment 2 Stefaan De Roeck (RETIRED) gentoo-dev 2008-05-23 16:58:10 UTC
I forgot to mention that on my system (2.6.25-gentoo-r1 with openafs-1.4.7), /proc/sys/afs/GCPAGs jumps from 1 (enabled) to 8 (error condition) after some time.  
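
A small sketch for checking this state from a shell. The function name gcpags_state and its optional path argument are hypothetical. The mapping covers only the values mentioned in this bug: 1 (enabled) and 8 (error condition) as observed above, and 2 as the value suggested in the description to disable the garbage collection. Any other value is reported as unknown rather than guessed at.

```shell
# Hypothetical helper: print a label for the current GCPAGs value.
# Only the values named in this bug report are mapped.
gcpags_state() {
    gcpags_file="${1:-/proc/sys/afs/GCPAGs}"
    if [ ! -r "$gcpags_file" ]; then
        echo "unavailable"
        return 1
    fi
    case "$(cat "$gcpags_file")" in
        1) echo "enabled" ;;
        2) echo "disabled" ;;
        8) echo "error" ;;
        *) echo "unknown" ;;
    esac
}
```

Running gcpags_state periodically (for example from cron) would show whether a machine moves from the enabled state to the error state over time.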
Comment 3 Michael Hammer (RETIRED) gentoo-dev 2008-05-23 17:28:50 UTC
I fully agree that neither possibility is nice. I do _not_ demand that you undef GCPAGs by default - that makes no sense. I am going to apply a patch on my affected machines. Let's hope that upstream will fix the issue - I am willing to do further testing if I can help in any way.... it seems you're watching the bug - don't hesitate to ping me ;)

@/proc/sys/afs/GCPAGs : I can attest that at the moment of the kernel Oops /proc/sys/afs/GCPAGs had not reached an error state. Perhaps this is the difference between your running machine and my crashing ones?

So far and thx for all the fish

mueli
Comment 4 Michael Hammer (RETIRED) gentoo-dev 2008-05-23 17:29:21 UTC
BTW: Sry Stefaan for misspelling your name!
Comment 5 Stefaan De Roeck (RETIRED) gentoo-dev 2008-06-21 07:14:43 UTC
Upstream still doesn't seem to have a release that solves this problem.  I'm beginning to think we'll need to pull patches from cvs (again).  
Comment 6 Carsten Lohrke (RETIRED) gentoo-dev 2008-07-21 19:15:53 UTC
*** Bug 232588 has been marked as a duplicate of this bug. ***
Comment 7 Stefaan De Roeck (RETIRED) gentoo-dev 2008-11-28 13:08:05 UTC
Could you check whether this works for you in net-fs/openafs-1.4.8?  Thanks
Comment 8 Michael Hammer (RETIRED) gentoo-dev 2009-04-04 16:30:07 UTC
The problem is under control for me. With openafs-1.4.8 I haven't been able to reproduce it so far. In any case it's entirely an upstream issue, so I'll close this bug.

g, mueli