Bug 78270 - kernel BUG at mm/rmap.c:483! --> When I try to emerge, and... ¿randomly?
Summary: kernel BUG at mm/rmap.c:483! --> When I try to emerge, and... ¿randomly?
Status: RESOLVED INVALID
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: x86 Linux
Importance: High critical
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
Reported: 2005-01-16 13:32 UTC by José María (Spain)
Modified: 2005-02-09 11:56 UTC (History)

Description José María (Spain) 2005-01-16 13:32:40 UTC
My system:
AMD Athlon 800 MHz (not an Athlon XP).

My kernel:
2.6.10-r4 (but I had similar bugs with 2.6.9-r1)

This is what appears:
------------[ cut here ]------------
kernel BUG at mm/rmap.c:483!
invalid operand: 0000 [#1]
Modules linked in:
CPU:    0
EIP:    0060:[<c0143ed9>]    Not tainted VLI
EFLAGS: 00010296   (2.6.10-gentoo-r4)
EIP is at page_remove_rmap+0x29/0x40
eax: fffffff0   ebx: 00000000   ecx: c1215000   edx: c1215000
esi: d0aa8058   edi: c1215000   ebp: 00001000   esp: d1924d64
ds: 007b   es: 007b   ss: 0068
Process doexe (pid: 9678, threadinfo=d1924000 task=d01b9520)
Stack: c013ddf8 c1215000 d1924d84 c0131c10 00000080 10a80067 40416000 cff63404
       40017000 00000000 c013df67 c04ddb68 cff63400 40016000 00001000 00000000
       c04ddb68 40016000 cff63404 40017000 00000000 c013dfdb c04ddb68 cff63400
Call Trace:
 [<c013ddf8>] zap_pte_range+0x128/0x240
 [<c0131c10>] file_read_actor+0x0/0xe0
 [<c013df67>] zap_pmd_range+0x57/0x80
 [<c013dfdb>] unmap_page_range+0x4b/0x80
 [<c013e10d>] unmap_vmas+0xfd/0x1b0
 [<c0142308>] exit_mmap+0x78/0x140
 [<c0112aec>] mmput+0x2c/0x80
 [<c0156e89>] exec_mmap+0x79/0xf0
 [<c015702a>] flush_old_exec+0xca/0x650
 [<c0156e00>] kernel_read+0x50/0x60
 [<c0174bcb>] load_elf_binary+0x33b/0xc80
 [<c013545e>] buffered_rmqueue+0xbe/0x150
 [<c01356ba>] __alloc_pages+0x1ca/0x360
 [<c01ecbc2>] copy_from_user+0x42/0x80
 [<c0156928>] copy_strings+0x188/0x200
 [<c013df67>] zap_pmd_range+0x57/0x80
 [<c013dfdb>] unmap_page_range+0x4b/0x80
 [<c013e10d>] unmap_vmas+0xfd/0x1b0
 [<c0142308>] exit_mmap+0x78/0x140
 [<c0112aec>] mmput+0x2c/0x80
 [<c0156e89>] exec_mmap+0x79/0xf0
 [<c015702a>] flush_old_exec+0xca/0x650
 [<c0156e00>] kernel_read+0x50/0x60
 [<c0174bcb>] load_elf_binary+0x33b/0xc80
 [<c013545e>] buffered_rmqueue+0xbe/0x150
 [<c01356ba>] __alloc_pages+0x1ca/0x360
 [<c01ecbc2>] copy_from_user+0x42/0x80
 [<c0156928>] copy_strings+0x188/0x200
 [<c01577cd>] search_binary_handler+0x5d/0x1b0
 [<c0157aa0>] do_execve+0x180/0x200
 [<c0101c02>] sys_execve+0x42/0xa0
 [<c0102fcf>] syscall_call+0x7/0xb
Code: 26 00 8b 54 24 04 8b 02 f6 c4 08 75 28 83 42 08 ff 0f 98 c0 84 c0 74 11 8b 42 08 40 78 0d 9c 58 fa ff 0d 70 a1 4e c0 50 9d 90 c3 <0f> 0b e3 01 f8 35 3c c0 eb e9 0f 0b e0 01 f8 35 3c c0 eb ce 8d
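
When comparing a report like the one above with other reports of the same BUG, it can help to reduce the oops to its BUG location and the list of functions in the call trace. The following is only a minimal sketch, assuming the line formats seen in this 2.6.10 oops; the names `parse_oops`, `FRAME_RE` and `BUG_RE` are made up for this illustration and are not part of any kernel tool.

```python
import re

# Matches call-trace lines such as " [<c013ddf8>] zap_pte_range+0x128/0x240".
FRAME_RE = re.compile(
    r"\[<([0-9a-f]{8,16})>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)
# Matches the header line "kernel BUG at mm/rmap.c:483!".
BUG_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")


def parse_oops(text):
    """Return ((file, line) or None, list of call-trace frames) found in text."""
    bug = BUG_RE.search(text)
    location = (bug.group(1), int(bug.group(2))) if bug else None
    frames = [
        {
            "addr": int(addr, 16),      # return address on the stack
            "symbol": symbol,           # function containing that address
            "offset": int(offset, 16),  # offset of the address inside the function
            "size": int(size, 16),      # total size of the function
        }
        for addr, symbol, offset, size in FRAME_RE.findall(text)
    ]
    return location, frames


# A few lines copied from the oops in this report.
sample = """\
kernel BUG at mm/rmap.c:483!
EIP is at page_remove_rmap+0x29/0x40
 [<c013ddf8>] zap_pte_range+0x128/0x240
 [<c0131c10>] file_read_actor+0x0/0xe0
 [<c013df67>] zap_pmd_range+0x57/0x80
"""

location, frames = parse_oops(sample)
print(location)
for frame in frames:
    print(f"{frame['symbol']:<20} +{frame['offset']:#x} of {frame['size']:#x}")
```

Two oopses that hit the same BUG line through the same sequence of symbols are likely to be the same failure, although the trace alone cannot tell a kernel bug from faulty hardware.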



Reproducible: Sometimes
Steps to Reproduce:
1. Install Gentoo from the universal LiveCD (2004.3).
2. While compiling, a similar error appeared (I don't remember whether it was exactly the same).
3. I downloaded *2.6.10-r4*.tar.gz and linux-2.6.10.tar.bz2 and compiled the kernel; the system booted and worked fine.
4. Finally, I tried to emerge "eagle-usb" and the bug appeared. This is a driver for a DSL modem (Sagem Fast 800 E2T). More info at www.eagle-usb.org; I downloaded the ebuild and the .tar.gz from there.
5. The final "variable" is that I'm a newbie; I know this will be difficult for you to reproduce ;).

Actual Results:  
I tried other Gentoo-based distributions:
www.sysresccd.org
desktop.vidalinux.org
The first one (based on the LiveCD) hangs when running modprobe.
The second one hangs during the emerge, but sometimes hangs at earlier stages
of the installation.

Expected Results:  
Nothing. (I would simply expect no bug.)
Comment 2 José María (Spain) 2005-01-16 14:11:24 UTC
I posted this error to the kernel developers and received the following email:
>Your system is not broken, this is a known bug.
>Can you check whether 2.6.11-rc1-mm1-jedi1 fixes it?
> 2.6.11-rc1 : ftp://ftp.kernel.org:/pub/linux/kernel/v2.6/testing/
> -mm1 patch : ftp://ftp.kernel.org:/pub/linux/kernel/people/akpm/patches/2.6/
>     -jedi1 :      ftp://ftp.c9x.org/linux-kernel/

José María
Comment 3 Daniel Drake (RETIRED) gentoo-dev 2005-01-16 15:29:07 UTC
Well..could you please try that?
Comment 4 José María (Spain) 2005-01-17 08:08:05 UTC
Tried it and... it crashed :o(
Comment 5 José María (Spain) 2005-01-17 08:18:37 UTC
Here is another post from someone who is fighting this bug. He mentions another patch. I won't test this patch against 2.6.9; I have asked him to port it to 2.6.11 so that I can test it together with the two people who are in contact with me trying to eliminate this bug. I'm not a programmer, so what follows is all Greek to me.

> We still do not know; we'd very much like to know.
> 
> It would not be the fault of any userspace program
> (unless they corrupt via /dev/mem or something like that).
> 
> It may be a core kernel problem, but I've searched repeatedly and
> failed.  It may be a driver problem e.g. GregKH's incident suggested
> a problem in DRM, and Andrea has pointed to a worrying ioctl there
> (looks like it could ClearPageReserved too early): I've been halfway
> through following that up for a few weeks now.  Are you using DRM?
> (but the hallmarks in your case are different.)
> 
> It can be caused by somewhere freeing a page it no longer holds;
> but in that case we'd usually expect to see the Bad page state
> error coming from free_pages_check rather than prep_new_page,
> and to be followed by the rmap.c BUG rather than following it.
> 
> It could easily be caused by bad memory bitflipping in a page table
> (but in general, we'd expect to be hearing of swap_free errors,
> or random corruption, if that were generally the case - I think).
> Please give memtest86 a good run to rule out that possibility.
> 
> If memtest86 is satisfied, would you mind running with the patch
> below (against 2.6.9, suitable for i386 or x86_64, but not suitable
> for the various architectures which use PG_arch_1)?  To give us more
> debug info - it's unlikely to solve the mystery on its own, but I
> hope it might help us to look in the right direction.  And send me
> any "Bad rmap" and "Bad page state" log entries you find (but
> perhaps this was a one-off, and nothing more will appear).
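
To make the "bad memory bitflipping in a page table" hypothesis from the quoted mail concrete, here is a small sketch of what a single flipped bit does to a page table entry. It assumes the 32-bit non-PAE i386 layout (upper 20 bits are the page frame number, lower 12 bits are flags); whether this machine ran without PAE is an assumption. The sample value is the same as the word 10a80067 in the stack dump, chosen only for illustration; that the word really is a page table entry is also an assumption.

```python
PAGE_SHIFT = 12  # 4 KiB pages on i386 without PAE (assumed layout)


def split_pte(pte):
    """Split a 32-bit PTE into (page frame number, flag bits)."""
    return pte >> PAGE_SHIFT, pte & ((1 << PAGE_SHIFT) - 1)


def flip_bit(value, bit):
    """Return value with one bit inverted, as a faulty RAM cell might."""
    return value ^ (1 << bit)


pte = 0x10A80067               # illustrative value only
pfn, flags = split_pte(pte)

corrupted = flip_bit(pte, 20)  # one bit flips inside the frame-number field
bad_pfn, bad_flags = split_pte(corrupted)

print(f"original : pfn={pfn:#x} flags={flags:#05x}")
print(f"corrupted: pfn={bad_pfn:#x} flags={bad_flags:#05x}")
print(f"the entry now points {bad_pfn - pfn} pages away")
```

The flags still say the entry is valid, but it now refers to a different physical page, one that this mapping never took a reference on. Tearing down such a mapping is the kind of situation in which the accounting checked by the BUG in page_remove_rmap can fail.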
Comment 6 Daniel Drake (RETIRED) gentoo-dev 2005-01-28 03:49:10 UTC
Any progress on this? Is your discussion on a public mailing list?
Comment 7 Daniel Drake (RETIRED) gentoo-dev 2005-02-09 07:52:42 UTC
Please reopen when you reply to comment #5
Testing 2.6.11-rc3 might be an idea.
Comment 9 José María (Spain) 2005-02-09 11:16:25 UTC
Sorry. I thought I had already posted here to close the bug :(.

Finally someone told me to check the RAM and, eureka! My memory was faulty. Once I took out the bad SIMM, I had no more problems with "2.6.11-rc1-mm1-jedi1". Later I decided to try the 2.6.10 kernel from Gentoo (gentoo-dev-sources) and everything works fine.

I talked with the kernel developers and they told me that they did not find any error in "rmap". They think that many of the reported bugs of this kind are due to faulty SIMMs.

José María
Comment 11 José María (Spain) 2005-02-09 11:21:14 UTC
I think you should recommend that anyone with errors similar to mine run a thorough memory test (overnight?) with "memtest86":
http://www.memtest86.com/memtest86-3.2.iso.zip
(This is a bootable CD-ROM image.)

I was ¿lucky? because I found errors within 15 minutes.

See you
José María
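
For readers wondering what a memory tester does in principle, the sketch below shows the basic write-then-verify idea on a simulated memory. It is not memtest86's actual algorithm and cannot replace it: a user-space program only sees the virtual memory it was given, not all physical RAM. The `pattern_test` function and the `StuckBitMemory` class are hypothetical names invented for this illustration.

```python
def pattern_test(mem, pattern=0x55):
    """Write a pattern and then its complement to every byte, verifying each pass.

    Returns a sorted list of offsets whose read-back value was wrong.
    """
    bad = set()
    for value in (pattern, pattern ^ 0xFF):
        for i in range(len(mem)):
            mem[i] = value
        for i in range(len(mem)):
            if mem[i] != value:
                bad.add(i)
    return sorted(bad)


class StuckBitMemory(bytearray):
    """Simulated faulty RAM: bit 3 of one cell always reads back as 1."""

    STUCK_AT = 100

    def __getitem__(self, index):
        value = super().__getitem__(index)
        if index == self.STUCK_AT:
            value |= 0x08
        return value


print(pattern_test(bytearray(256)))       # healthy memory
print(pattern_test(StuckBitMemory(256)))  # memory with one stuck bit
```

Testing with both a pattern and its complement matters: a bit stuck at 1 is invisible while the pattern also has a 1 in that position, and only shows up once the complement is written.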
Comment 12 Daniel Drake (RETIRED) gentoo-dev 2005-02-09 11:56:13 UTC
Ok, thanks for letting us know. Bad memory is often a cause of these things.