Bug#: 223355
Status: RESOLVED
Resolution: FIXED
Assigned To: Stefaan De Roeck <stefaan@gentoo.org>
Reporter: Michael Hammer <mueli@gentoo.org>

Bug 223355 blocks: 218127

Description:   Opened: 2008-05-23 14:39 0000
Hi Stefann!

Because of a special hardware situation I recently had to upgrade to
2.6.25-r4. On this client I am using openafs and emerged
openafs-1.4.7(-kernel). After light usage I get the following kernel Oops,
which is absolutely reproducible:

BUG: unable to handle kernel paging request at fffffffffffffffe
IP: [<ffffffff804e10a0>] _read_lock+0x0/0xc
PGD 203067 PUD 204067 PMD 0 
Oops: 0002 [1] SMP 
CPU 1 
Modules linked in: snd_pcm_oss snd_mixer_oss snd_seq_dummy snd_seq_oss
snd_seq_midi_event snd_seq snd_seq_device libafs(P) i915 drm snd_hda_intel
snd_pcm snd_timer snd snd_page_alloc e1000e e1000 dm_mod scsi_wait_scan
ata_piix pata_mpiix
Pid: 6111, comm: afsd Tainted: P         2.6.25-gentoo-r4 #3
RIP: 0010:[<ffffffff804e10a0>]  [<ffffffff804e10a0>] _read_lock+0x0/0xc
RSP: 0018:ffff81007b661e78  EFLAGS: 00010282
RAX: 0000000000000002 RBX: ffff81007b661eec RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000010 RDI: fffffffffffffffe
RBP: fffffffffffffffe R08: ffff81007d4dd8c0 R09: 0000000000000000
R10: ffff81007bc8bd00 R11: 0000000000000009 R12: 000000004836d24b
R13: 000000004836d1cb R14: 000000004836d1cb R15: 000000004836c430
FS:  0000000000000000(0000) GS:ffff81007e0250c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: fffffffffffffffe CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process afsd (pid: 6111, threadinfo ffff81007b660000, task ffff81007d0f9080)
Stack:  ffffffff8811ea19 000000004836d24b ffff81007b661eec 000000004836d24b
 ffffffff88125431 000000004836d24b ffffffff88113448 4836be144836d1a7
 000000004836d239 000000004836d24b 000000002289a495 000000004836d24b
Call Trace:
 [<ffffffff8811ea19>] ? :libafs:afs_osi_TraverseProcTable+0x1a/0x69
 [<ffffffff88125431>] ? :libafs:afs_GCPAGs+0xa4/0x182
 [<ffffffff88113448>] ? :libafs:afs_Daemon+0x503/0x57b
 [<ffffffff88165081>] ? :libafs:afsd_launcher+0x352/0x78b
 [<ffffffff8022840a>] ? schedule_tail+0x27/0x5c
 [<ffffffff88164d2f>] ? :libafs:afsd_launcher+0x0/0x78b
 [<ffffffff8020bc68>] ? child_rip+0xa/0x12
 [<ffffffff88164d2f>] ? :libafs:afsd_launcher+0x0/0x78b
 [<ffffffff88164d60>] ? :libafs:afsd_launcher+0x31/0x78b
 [<ffffffff8020bc5e>] ? child_rip+0x0/0x12


Code: 8b 07 38 e0 75 0a 66 89 c2 fe c6 f0 66 0f b1 17 0f 94 c2 0f b6 c2 85 c0
0f 95 c0 0f b6 c0 c3 fe 07 c3 fe 07 56 9d c3 fe 07 fb c3 <f0> 83 2f 01 79 05 e8
e5 67 e6 ff c3 9c 58 fa f0 83 2f 01 79 05 
RIP  [<ffffffff804e10a0>] _read_lock+0x0/0xc
 RSP <ffff81007b661e78>
CR2: fffffffffffffffe
---[ end trace 8e69ef1e1094e2b0 ]---
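
To see at a glance which OpenAFS functions appear in a trace like the one
above, the frames can be filtered on the `:libafs:` module prefix. This is only
a rough sketch: the function name `libafs_frames` is made up for illustration,
and it assumes the 2.6.25-style frame format shown in this Oops.

```shell
#!/bin/sh
# Hypothetical helper: read an Oops or dmesg dump on stdin and print the
# libafs function names found in the call trace, once each, in order of
# first appearance. Assumes frames look like
#   [<addr>] ? :libafs:afs_GCPAGs+0xa4/0x182
libafs_frames() {
    sed -n 's/.*:libafs:\([A-Za-z0-9_]*\)+.*/\1/p' | awk '!seen[$0]++'
}
```

It could be fed from the kernel log, for example `dmesg | libafs_frames`.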

This one seems to be related to:
http://thread.gmane.org/gmane.comp.file-systems.openafs.devel/7560

They suggest setting
$ echo 2 > /proc/sys/afs/GCPAGs

This one is IMHO not a duplicate of #220635, so I am posting it as a new bug.
Would you advise disabling GCPAGs? Can we do this in the init script, or can
we at least document it somewhere?

g, mueli

emerge --info:

Portage 2.1.4.4 (default-linux/amd64/2006.1, gcc-4.1.2, glibc-2.6.1-r0,
2.6.25-gentoo-r4 x86_64)
=================================================================
System uname: 2.6.25-gentoo-r4 x86_64 Intel(R) Pentium(R) D CPU 3.00GHz
Timestamp of tree: Tue, 20 May 2008 16:18:01 +0000
distcc 2.18.3 x86_64-pc-linux-gnu (protocols 1 and 2) (default port 3632)
[enabled]
app-shells/bash:     3.2_p33
dev-java/java-config: 1.3.7, 2.1.6
dev-lang/python:     2.4.4-r9
dev-python/pycrypto: 2.0.1-r6
sys-apps/baselayout: 1.12.11.1
sys-apps/sandbox:    1.2.18.1-r2
sys-devel/autoconf:  2.13, 2.61-r1
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.1
sys-devel/binutils:  2.18-r1
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool:   1.5.26
virtual/os-headers:  2.6.23-r3
ACCEPT_KEYWORDS="amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=nocona -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config
/usr/kde/3.5/shutdown /usr/share/config"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf
/etc/gconf /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c
/etc/udev/rules.d"
CXXFLAGS="-march=nocona -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="distcc distlocks metadata-transfer sandbox sfperms strict
unmerge-orphans userfetch"
GENTOO_MIRRORS="ftp://gentoo.inode.at/source/ ftp://ftp.tugraz.at/mirror/gentoo
ftp://gd.tuwien.ac.at/opsys/linux/gentoo/ "
LANG="en_US.UTF-8"
LC_ALL="en_US.UTF-8"
MAKEOPTS="-j4"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress
--force --whole-file --delete --stats --timeout=180 --exclude=/distfiles
--exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/portage/local/layman/gentoo-de
/usr/portage/local/layman/test"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="X alsa amd64 berkdb cdr cli cracklib crypt cups dbus dri dvd emacs fortran
gdbm gif gpm gtk hal iconv icq ipv6 irc isdnlog java jpeg kde kdeenablefinal
kerberos ldap midi mp3 mudflap ncurses nfs nls nntp nptl nptlonly nsplugin
opengl openmp pam pcre pdf perl ppds pppd python readline reflection session
spl ssl svg tcpd unicode xinerama xorg zlib" ALSA_CARDS="hda-intel"
ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file
hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route
share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias
authn_anon authn_dbm authn_default authn_file authz_dbm authz_default
authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs
dav_lock deflate dir disk_cache env expires ext_filter file_cache filter
headers include info log_config logio mem_cache mime mime_magic negotiation
rewrite setenvif speling status unique_id userdir usertrack vhost_alias"
ELIBC="glibc" INPUT_DEVICES="evdev keyboard mouse" KERNEL="linux"
LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses
text" USERLAND="GNU" VIDEO_CARDS="fbdev i128 i810 vesa vga"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LDFLAGS, LINGUAS,
PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

------- Comment #1 From Stefaan De Roeck 2008-05-23 16:54:00 0000 -------
Thanks for the thorough analysis.

If I understand this correctly, the choice is between
* a system that runs out of PAGs, which amounts to a DoS on heavily used
systems
* a system that may crash because of a locking error in the garbage collection
(correct me if I'm wrong)
so the safe choice on a 2.6.25 kernel seems to be disabling gcpags. I would
consider changing that in the source, as this default is a configuration
parameter.

I have not yet seen the crash you're describing, but then again my machine does
not undergo heavy afs traffic at the moment.  

I'm checking with #220635 anyway to see whether they're affected by the same
bug (locking errors may have varying consequences).  

Hoping for more information on the OpenAFS mailing lists...

------- Comment #2 From Stefaan De Roeck 2008-05-23 16:58:10 0000 -------
I forgot to mention that on my system (2.6.25-gentoo-r1 with openafs-1.4.7),
/proc/sys/afs/GCPAGs jumps from 1 (enabled) to 8 (error condition) after some
time.  
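
A small sketch for checking that value from a script. It only maps the values
mentioned in this bug: 1 as enabled and 8 as an error condition (as stated
above), and 2 as the setting the reporter uses to disable GCPAGs. Any other
value is reported as unknown rather than guessed. The function name
`describe_gcpags` and the optional path argument are assumptions for this
example.

```shell
#!/bin/sh
# Hypothetical helper: print a short description of the current GCPAGs state.
#   $1 = control file, defaults to /proc/sys/afs/GCPAGs (overridable for testing)
describe_gcpags() {
    ctl="${1:-/proc/sys/afs/GCPAGs}"
    if [ ! -r "$ctl" ]; then
        echo "unavailable"
        return 1
    fi
    state=$(cat "$ctl")
    case "$state" in
        1) echo "enabled" ;;
        2) echo "disabled" ;;          # value suggested in the report
        8) echo "error" ;;             # error condition seen in this comment
        *) echo "unknown ($state)" ;;  # not covered by this bug report
    esac
}
```

Running `describe_gcpags` now and then, for example from cron, would show when
the value changes from 1 to 8.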

------- Comment #3 From Michael Hammer 2008-05-23 17:28:50 0000 -------
I fully agree that neither possibility is nice. I do _not_ demand that you
undef GCPAGs by default - that makes no sense. I am going to patch my affected
machines. Let's hope that upstream will fix the issue - I am willing to do
further testing if I can help in any way.... it seems you're watching the
bug - don't hesitate to ping me ;)

@/proc/sys/afs/GCPAGs : I can attest that at the moment of the kernel Oops
/proc/sys/afs/GCPAGs hadn't reached an error state. Perhaps this is the
difference between your running and my segfaulting machines?

So far and thx for all the fish

mueli

------- Comment #4 From Michael Hammer 2008-05-23 17:29:21 0000 -------
BTW: Sry Stefaan for misspelling your name!

------- Comment #5 From Stefaan De Roeck 2008-06-21 07:14:43 0000 -------
Upstream still doesn't seem to have a release that solves this problem.  I'm
beginning to think we'll need to pull patches from cvs (again).  

------- Comment #6 From Carsten Lohrke 2008-07-21 19:15:53 0000 -------
*** Bug 232588 has been marked as a duplicate of this bug. ***

------- Comment #7 From Stefaan De Roeck 2008-11-28 13:08:05 0000 -------
Could you check whether this works for you in net-fs/openafs-1.4.8?  Thanks

------- Comment #8 From Michael Hammer 2009-04-04 16:30:07 0000 -------
The problem is under control for me. With openafs-1.4.8 I haven't been able to
reproduce it so far. In any case it's entirely an upstream issue and
therefore I'll close this bug.

g, mueli
