139406 – sys-kernel/gentoo-sources-2.6.17-r1: reading from /proc/cpuinfo (e.g. by uname -p) hangs

Status:	RESOLVED OBSOLETE

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	[OLD] Core system
Hardware:	All Linux

Importance:	High normal
Assignee:	Gentoo Kernel Bug Wranglers and Kernel Maintainers

Reported:	2006-07-06 03:49 UTC by Martin von Gagern
Modified:	2011-11-23 09:43 UTC
CC List:	0 users

Attachments
Boot messages of my 2.6.17-r1 kernel (2.6.17-r1.boot, 31.14 KB, text/plain) 2006-07-09 14:51 UTC, Martin von Gagern
Boot messages of my 2.6.17-r1 kernel (2.6.17-r1.boot, 31.82 KB, text/plain) 2006-07-09 14:54 UTC, Martin von Gagern
dmesg on 2.6.17-gentoo-r2 (dmesg.combined, 31.96 KB, text/plain) 2006-07-18 02:40 UTC, Martin von Gagern
dmesg on 2.6.17-gentoo-r2 untainted (dmesg, 34.18 KB, text/plain) 2006-07-24 13:18 UTC, Martin von Gagern

Description Martin von Gagern 2006-07-06 03:49:26 UTC

I just had a situation where emerging neon hung while running configure. Examining running processes, I identified "uname -p" as the culprit. Calling that manually hung as well, as did attaching an strace to an already running instance. Sending SIGKILL did not work either. Running "strace uname -p" showed me that it hung while reading from /proc/cpuinfo. "cat /proc/cpuinfo" hung as well.

Rebooting the system solved the problem. I don't know how, or even whether, this can be reproduced. An ugly bug, that one.
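A reader hitting a similar hang may want to probe /proc/cpuinfo without getting a shell stuck as well. The following is a minimal sketch, not something taken from this report: the function name probe_read and the 5-second default are made up for illustration. It assumes the reader is stuck in uninterruptible sleep (which would match SIGKILL having no effect), so it runs the read in the background and polls instead of waiting for it, since any wrapper that waits for the child may end up stuck too.

```shell
#!/bin/sh
# probe_read FILE [SECONDS]  (hypothetical helper, for illustration only)
# Reads FILE in a background process and polls once per second.
# Prints "ok" if the read finished, "hung" if it is still running
# after SECONDS seconds (default 5).
probe_read() {
    file=$1
    limit=${2:-5}
    cat "$file" >/dev/null 2>&1 &
    pid=$!
    waited=0
    # kill -0 sends no signal; it only checks that the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            echo "hung"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "ok"
}

# Example: probe_read /proc/cpuinfo 5
```

On a healthy system this prints "ok" almost immediately. If it prints "hung", the background cat is left behind on purpose, because killing it is not expected to work in that state.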

I'm using sys-kernel/gentoo-sources-2.6.17-r1, and after reboot my cpuinfo looks like this:

# cat /proc/cpuinfo 
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 3
model name      : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping        : 4
cpu MHz         : 3000.000
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe constant_tsc pni monitor ds_cpl cid xtpr
bogomips        : 6026.10

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 15
model           : 3
model name      : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping        : 4
cpu MHz         : 3000.000
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe constant_tsc pni monitor ds_cpl cid xtpr
bogomips        : 6020.50

# uname -a
Linux server 2.6.17-gentoo-r1 #1 SMP Thu Jul 6 10:19:21 CEST 2006 i686 Intel(R) Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux

I'm using SMT (CONFIG_SMP=y and CONFIG_X86_HT=y)
and cpufreqd (CONFIG_X86_P4_CLOCKMOD=y).
Don't know if either one has anything to do with this issue.

Comment 1 Martin von Gagern 2006-07-09 02:45:50 UTC

It just happened again.

I configured some package several times, and suddenly it got stuck reading cpuinfo. So this problem was not there from the start but rather occurred sometime while the system was up.

Connecting to cpufreqd fails as well:
$ cpufreqd-get 
socket I'll try to connect: /tmp/cpufreqd-6BRIMP/cpufreqd

And killing it is no better: be it SIGTERM or SIGKILL, the process stays there.
Attaching strace to it hangs as well, as with the uname above.
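Processes that ignore SIGKILL like this are usually in uninterruptible sleep, shown as state D by ps. As a hedged sketch (assuming the procps version of ps; the function names are invented here), the following lists such processes together with the kernel function they are waiting in, which is the kind of detail that helps narrow down where a read of /proc/cpuinfo is blocked.

```shell
#!/bin/sh
# filter_blocked: keep the header line plus every row whose STAT column
# starts with D (uninterruptible sleep). Reads ps-style output on stdin.
filter_blocked() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# list_blocked: show PID, state, wait channel and command name of all
# processes currently in state D. Assumes procps ps (-e, -o, wchan:32).
list_blocked() {
    ps -eo pid,stat,wchan:32,comm | filter_blocked
}

# Example: list_blocked
```

The WCHAN column is only meaningful if the kernel can resolve symbol names; otherwise it may show a dash or a raw address.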

Because I've been asked about my preemption settings on IRC today:
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_BKL=y

I will disable CONFIG_PREEMPT_BKL for the time being and see if that helps.
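For anyone comparing their own setup, the preemption options can be pulled out of a kernel config file as sketched below. The function name is hypothetical, and the default path /usr/src/linux/.config is an assumption about where the config of the running kernel lives; adjust it as needed.

```shell
#!/bin/sh
# preempt_settings [CONFIG_FILE]  (hypothetical helper)
# Prints every CONFIG_PREEMPT* line, including "# ... is not set" ones.
# The default path is an assumption, not something stated in this report.
preempt_settings() {
    config=${1:-/usr/src/linux/.config}
    grep -E '^(# )?CONFIG_PREEMPT' "$config"
}

# Example: preempt_settings /usr/src/linux/.config
```

If the kernel was built with CONFIG_IKCONFIG_PROC, the same lines can be read from /proc/config.gz after decompressing it.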

Comment 2 Daniel Drake (RETIRED) gentoo-dev 2006-07-09 11:42:57 UTC

Please attach dmesg output from after the hang occurs

Comment 3 Martin von Gagern 2006-07-09 12:49:56 UTC

(In reply to comment #2)
> Please attach dmesg output from after the hang occurs

I did not save the dmesg, but the last event was hours before the hang, and there was nothing related.

Comment 4 Daniel Drake (RETIRED) gentoo-dev 2006-07-09 13:10:25 UTC

And what was the last event?

Comment 5 Martin von Gagern 2006-07-09 13:30:29 UTC

(In reply to comment #4)
> And what was the last event?

Some message about an unknown PS/2 mouse after I last used my KVM switch.
But accessing /proc/cpuinfo definitely worked several times after that.

Oh, I have that line in my logs:
Jul  9 04:24:22 server kernel: [4358097.856000] logips2pp: Detected unknown logitech mouse model 94

I rebooted my system at 12:36, which in my estimation was less than an hour after the bug occurred. So there were about 7 hours without any kernel messages.

Comment 6 Daniel Drake (RETIRED) gentoo-dev 2006-07-09 13:43:08 UTC

Ok, in that case just post a dmesg from a clean boot please.

Comment 7 Martin von Gagern 2006-07-09 14:51:04 UTC

Created attachment 91316
Boot messages of my 2.6.17-r1 kernel

(In reply to comment #6)
> Ok, in that case just post a dmesg from a clean boot please.

I grabbed the messages from my kern.log. It is my understanding that they contain all dmesg contents as well. This log was from the latest boot process before comment #1, so a boot after which the error occurred.

In the meantime I've switched to 2.6.17-r2 and disabled CONFIG_PREEMPT_BKL, so if I reboot now and grab the dmesg contents, things would be slightly different.

Comment 8 Martin von Gagern 2006-07-09 14:54:12 UTC

Created attachment 91317
Boot messages of my 2.6.17-r1 kernel

(In reply to comment #7)
> I grabbed the messages from my kern.log.

Lost a few lines in the process, sorry about that.

Comment 9 Daniel Drake (RETIRED) gentoo-dev 2006-07-09 15:03:31 UTC

Please post dmesg output even if it is from a slightly different kernel. syslog often misses stuff...

Comment 10 Daniel Drake (RETIRED) gentoo-dev 2006-07-09 15:09:32 UTC

Also you need to reproduce this on a clean kernel (i.e. no fritz stuff, not even loaded then unloaded: must be completely untainted)
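Whether a running kernel counts as tainted can be checked through /proc/sys/kernel/tainted, which holds a bitmask that stays set until reboot (hence "not even loaded then unloaded"). The sketch below is illustrative only: the function name is invented, and it decodes just bit 0, the one set when a proprietary module has been loaded.

```shell
#!/bin/sh
# taint_status VALUE  (hypothetical helper)
# Interprets the number read from /proc/sys/kernel/tainted.
# Only bit 0 (proprietary module loaded) is decoded; every other
# non-zero value is reported generically.
taint_status() {
    value=$1
    if [ "$value" -eq 0 ]; then
        echo "untainted"
    elif [ $((value & 1)) -ne 0 ]; then
        echo "tainted (proprietary module loaded)"
    else
        echo "tainted (other reason)"
    fi
}

# Example: taint_status "$(cat /proc/sys/kernel/tainted)"
```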

Comment 11 Martin von Gagern 2006-07-09 18:11:22 UTC

(In reply to comment #9)
> Please post dmesg output even if it is from a slightly different kernel.
> syslog often misses stuff...

Currently the dmesg contents are incomplete: there have been too many messages since the last boot. And I don't want to reboot my system just now; I'm accessing it remotely, and if anything goes wrong I'd probably have to drive there just to fix it. I'll create this log as soon as I manage to, probably around the end of the week.

(In reply to comment #10)
> Also you need to reproduce this on a clean kernel (i.e. no fritz stuff, not
> even loaded then unloaded: must be completely untainted)

I'll see whether I can get capisuite working with mISDN as well. If so, then I'll switch and have a clean kernel. Otherwise this will have to wait even longer, until I find some time when I can do without my capisuite answering machine.

Comment 12 Henrik Brix Andersen 2006-07-10 00:09:55 UTC

(In reply to comment #11)
> Currently the dmesg contents are incomplete: there have been too many
> messages since the last boot.

The dmesg from when you last booted the machine is stored in /var/log/dmesg - you can attach that :)

Comment 13 Daniel Drake (RETIRED) gentoo-dev 2006-07-13 10:53:09 UTC

In that case I'll close this for now, please reopen once you have reproduced on a clean kernel and provided the extra info. At that point you'd need to test the latest development kernel too, which is currently 2.6.18-rc1.

Comment 14 Martin von Gagern 2006-07-18 02:40:30 UTC

Created attachment 92068
dmesg on 2.6.17-gentoo-r2

(In reply to comment #12)
> The dmesg from when you last booted the machine is stored in /var/log/dmesg -
> you can attach that :)

Thanks! It just happened again. I got that and combined it with the current dmesg, they had enough overlap. This is the kernel from my comment #7, 2.6.17-r2 with CONFIG_PREEMPT_BKL disabled. I'm now installing vanilla sources and will boot them without fritzcapi.
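The exact commands used to combine the two logs are not given in this report. One possible way to do such a merge, shown purely as a sketch with an invented function name, is to print the boot-time log in full and then append only those lines of the current dmesg that come after the last line of the boot log. It assumes that line appears verbatim in the current output, which is the "enough overlap" condition.

```shell
#!/bin/sh
# merge_dmesg BOOT_LOG CURRENT_LOG  (hypothetical helper)
# Prints BOOT_LOG, then the part of CURRENT_LOG that follows the last
# line of BOOT_LOG. If that line is not found, a marker is printed and
# CURRENT_LOG is appended in full.
merge_dmesg() {
    boot=$1
    current=$2
    awk '
        # First file: print as-is and remember its final line.
        NR == FNR { print; last = $0; next }
        # Second file: buffer lines and note where the overlap ends.
        { cur[++n] = $0; if ($0 == last) cut = n }
        END {
            if (!cut) print "--- no overlap found; messages may be missing ---"
            for (i = cut + 1; i <= n; i++) print cur[i]
        }
    ' "$boot" "$current"
}

# Example: dmesg > /tmp/dmesg.now && merge_dmesg /var/log/dmesg /tmp/dmesg.now
```

Matching on a single line is deliberately simple; with kernel timestamps enabled, as in the log line quoted earlier, lines are unlikely to repeat, so the cut point should be unambiguous.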

Comment 15 Martin von Gagern 2006-07-19 03:11:47 UTC

(In reply to comment #13)
> At that point you'd need to test the latest development kernel too

2.6.18-rc2 does not work for me. I just filed bug 141015 about this.

(In reply to comment #10)
> Also you need to reproduce this on a clean kernel (i.e. no fritz stuff, not
> even loaded then unloaded: must be completely untainted)

At least I have capisuite working with mISDN instead of fritzcapi, so my kernel should be untainted, although it still uses modules not from the main kernel source tree.

Comment 16 Martin von Gagern 2006-07-24 13:18:03 UTC

Created attachment 92636
dmesg on 2.6.17-gentoo-r2 untainted

(In reply to comment #10)
> Also you need to reproduce this on a clean kernel (i.e. no fritz stuff, not
> even loaded then unloaded: must be completely untainted)

OK, it just happened again, with mISDN instead of fritzcapi this time.

(In reply to comment #13)
> In that case I'll close this for now, please reopen once you have reproduced
> on a clean kernel and provided the extra info.

Reproduced with a clean, i.e. untainted, kernel.
Merged /var/log/dmesg and the current dmesg to build the attached file.

> At that point

What point? Before or after reopening? Reopening is done in an instant, but trying to reproduce this may take months, and actually reproducing it certainly won't coincide with me reopening this bug.

> you'd need to test the latest development kernel too.

As 2.6.18-rc2 still does not work for me, that is out of the question for now. I could try the 2.6.17.6 vanilla sources, if that is any help.

Comment 17 Martin von Gagern 2011-11-23 09:43:35 UTC

As this is about an ancient kernel, using ancient drivers, and I seem to be the only one who was affected by this, and I don't encounter it any more, I'm changing the resolution from NEEDINFO to OBSOLETE.