Summary: | sys-kernel/usermode-sources-2.6.14: kernel panic when assigning more than 256M | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Toralf Förster <toralf> |
Component: | Current packages | Assignee: | Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | blaisorblade_spam, user-mode-linux-devel |
Priority: | High | ||
Version: | 2005.0 | ||
Hardware: | All | ||
OS: | Linux | ||
Attachments: |
diff between working/non working config
unified diff
uml kernel config file |
Description
Toralf Förster
2005-11-05 10:14:29 UTC
Could you please try vanilla-sources-2.6.13.x and see if the bug exists there?

Using linux-2.6.13.4, the system hangs at 100% CPU after:

tfoerste@n22 ~/workspace/local_bin $ start_uml.sh
Checking for /proc/mm...not found
Checking for the skas3 patch in the host...not found
UML running in SKAS0 mode
Checking PROT_EXEC mmap in /tmp...OK
Kernel virtual memory size shrunk to 254803968 bytes
Linux version 2.6.13.4 (root@n22) (gcc version 3.3.6 (Gentoo 3.3.6, ssp-3.3.6-1.0, pie-8.7.8)) #1 Sat Nov 5 22:18:43 CET 2005
Built 1 zonelists
Kernel command line: ubda=/opt/uml/root_fs ubdb=/opt/uml/swap_fs eth0=tuntap,,,192.168.0.254 mem=256M root=98:0
PID hash table entries: 2048 (order: 11, 32768 bytes)
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Memory: 253568k available
Mount-cache hash table entries: 512
Checking for host processor cmov support...Yes
Checking for host processor xmm support...No
Checking that ptrace can change system call numbers...OK
Checking syscall emulation patch for ptrace...missing
Checking that host ptys support output SIGIO...Yes
Checking that host ptys support SIGIO on close...No, enabling workaround
Checking for /dev/anon on the host...Not available (open failed with errno 2)
NET: Registered protocol family 16
mconsole (version 2) initialized on /home/tfoerste/.uml/toralf/mconsole
ubd: Synchronous mode
UML Audio Relay (host dsp = /dev/sound/dsp, host mixer = /dev/sound/mixer)
Netdevice 0 : TUN/TAP backend - IP = 192.168.0.254
Coda Kernel/Venus communications, v6.0.0, coda@cs.cmu.edu
NTFS driver 2.1.23 [Flags: R/O].
Initializing Cryptographic API
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
tun: Universal TUN/TAP device driver, 1.6
tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
NET: Registered protocol family 2
IP route cache hash table entries: 4096 (order: 2, 16384 bytes)
TCP established hash table entries: 16384 (order: 5, 131072 bytes)
TCP bind hash table entries: 16384 (order: 4, 65536 bytes)
TCP: Hash tables configured (established 16384 bind 16384)
TCP reno registered
TCP bic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
Initialized stdio console driver
Console initialized on /dev/tty0
Initializing software serial port version 1
ubda: unknown partition table
ubdb: unknown partition table
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
line_ioctl: tty0: ioctl KDSIGACCEPT called
INIT: version 2.86 booting
line_ioctl: tty0: ioctl TIOCLINUX called
Gentoo Linux; http://www.gentoo.org/
Copyright 1999-2005 Gentoo Foundation; Distributed under the GPLv2
* Mounting proc at /proc ... [ ok ]
* Mounting sysfs at /sys ... [ ok ]
* Mounting /dev for udev ... [ ok ]
* Configuring system to use udev ...
* Setting /sbin/udevsend as hotplug agent ... [ ok ]
* Mounting devpts at /dev/pts ... [ ok ]
* Activating (possible) swap ...Adding 262136k swap on /dev/ubdb. Priority:-1 extents:1

Paolo, any ideas here? 2.6.14-bs1 produces a panic:

Kernel panic - not syncing: copy_context_skas0 : failed to wait for SIGUSR1/SIGTRAP, pid = 31380, n = 31380, errno = 0, status = 0xb7f

This seems to have been introduced some time after 2.6.13 (which doesn't work, but at least gets further).

It's funny, but using sys-kernel/gentoo-sources-2.6.14-r2 as sources and starting the linux executable under strace, UML at least brought up a console in an xterm as expected, but the halt command misbehaved:

* Configuring system to use udev ...
* Setting /sbin/udevsend as hotplug agent ... [ ok ]
* Mounting devpts at /dev/pts ... [ ok ]
* Activating (possible) swap ...Adding 262136k swap on /dev/ubdb. Priority:-1 extents:1 across:262136k [ ok ]
* Checking root filesystem .../dev/ubda: clean, 197933/655360 files, 832912/1310720 blocks [ ok ]
* Remounting root filesystem read/write ... [ ok ]
* Setting hostname to n22_uml ... [ ok ]
* Checking all filesystems ... [ ok ]
* Mounting local filesystems ... [ ok ]
* Activating (possibly) more swap ... [ ok ]
* Setting system clock to hardware clock [UML] ... [ ok ]
cannot set up thread-local storage: cannot set up LDT for thread-local storage
cannot set up thread-local storage: cannot set up LDT for thread-local storage
cannot set up thread-local storage: cannot set up LDT for thread-local storage
INIT: Entering runlevel: 3
cannot set up thread-local storage: cannot set up LDT for thread-local storage
cannot set up thread-local storage: cannot set up LDT for thread-local storage
cannot set up thread-local storage: cannot set up LDT for thread-local storage
cannot set up thread-local storage: cannot set up LDT for thread-local storage
cannot set up thread-local storage: cannot set up LDT for thread-local storage
login(pam_unix)[507]: session opened for user root by LOGIN(uid=0)
INIT: Switching to runlevel: 0
INIT: Sending processes the TERM signal
login(pam_unix)[507]: session closed for user root
cannot set up thread-local storage: cannot set up LDT for thread-local storage
cannot set up thread-local storage: cannot set up LDT for thread-local storage
* Deactivating swap ... [ ok ]
* Unmounting filesystems ... [ ok ]
* Remounting remaining filesystems readonly ...cannot set up thread-local storage: cannot set up LDT for thread-local storage
/etc/init.d/halt.sh: line 187: 580 Segmentation fault sleep 1
[ ok ]
Power down.
[{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 25716
rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, 0xbfee0608, WNOHANG) = -1 ECHILD (No child processes)
sigreturn() = ? (mask now [RTMIN])
rt_sigaction(SIGINT, {SIG_DFL}, {0x80786d0, [], 0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [RTMIN], 8) = 0
read(255, "\n\n##############################"..., 1632) = 118
rt_sigprocmask(SIG_BLOCK, NULL, [RTMIN], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [RTMIN], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [RTMIN], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [RTMIN], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [RTMIN], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [RTMIN], 8) = 0
rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
exit_group(0) = ?

Any joy with gentoo-sources-2.6.15?

No, neither with gentoo-sources nor with straight vanilla-sources 2.6.15. The host kernel is vanilla 2.6.15. It stops after:

...
FS: Mounted root (ext3 filesystem) readonly.
line_ioctl: tty0: ioctl KDSIGACCEPT called
INIT: version 2.86 booting
line_ioctl: tty0: ioctl TIOCLINUX called
Gentoo Linux; http://www.gentoo.org/
Copyright 1999-2005 Gentoo Foundation; Distributed under the GPLv2
* Mounting proc at /proc ... [ ok ]
* Mounting sysfs at /sys ... [ ok ]
* Mounting /dev for udev ... [ ok ]
* Configuring system to use udev ...
* Setting /sbin/udevsend as hotplug agent ... [ ok ]
* Mounting devpts at /dev/pts ... [ ok ]
* Activating (possible) swap ...Adding 262136k swap on /dev/ubdb. Priority:-1 extents:1 across:262136k [ ok ]

Created attachment 76850 [details]
diff between working/non working config
Yep, joy factor increased significantly: it now works with straight vanilla 2.6.15 as the host kernel and linux-2.6.15-gentoo as the UML kernel. I kept changing the .config (diff attached) until I got UML up and running.

Can you post a unified diff (diff -u) between the configs? Unified diffs are much easier to read and show which file is which.

Created attachment 76886 [details, diff]
unified diff
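
When comparing a working and a non-working kernel config, it can help to narrow the unified diff to the option lines that actually changed. The following is a minimal sketch, assuming a POSIX `diff` and a `grep` that supports `-E`; the function name `config_changes` and the file names in the usage note are hypothetical and do not come from this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of this report): print only the option lines
# that differ between two kernel .config files, in unified-diff notation.
config_changes() {
    old_config=$1
    new_config=$2
    # diff -u marks removed lines with '-' and added lines with '+';
    # keep only CONFIG_ lines, including "# CONFIG_FOO is not set" entries.
    diff -u "$old_config" "$new_config" | grep -E '^[+-](# )?CONFIG_'
}
```

For example, `config_changes config.broken config.working` would list the options to bisect, with the old value prefixed by `-` and the new one by `+`.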
BTW, IMO the bug seems to be related rather to the file system stuff. The processor type does not seem to be the root cause, because I've checked this.

Even with the new config, the UML kernel sometimes didn't come up. After removing the option mem=256M from the UML command line, these problems went away.

For comment #1 and comment #3: 2.6.14-latest bs and 2.6.15 should have solved that bug. There were all kinds of miscompilations in arch/um/kernel/skas/clone.c (at one point the whole stack content is invalidated, and we couldn't explain this clearly to GCC). Now we've found a way to make this compile reliably to what we mean.

Comment #4: either you're using an NPTL guest fs (which doesn't work yet; we've almost fixed this) or the problem is one which is known to be fixed in 2.6.14-bs3 and 2.6.15 (and I think we're in the latter case). The real bug in the trace is:

cannot set up thread-local storage: cannot set up LDT for thread-local storage

Comment #2, #6 and #9: yes, the diff says this very clearly. I see a suspicious CONFIG_CODA_FS - probably not many people have mixed it with UML. Also, I don't see the full config, but I'd try disabling CONFIG_SMP and CONFIG_HIGHMEM at the very least. Attach the full config please.

As a side note and an excuse for the problems: this high instability (which has been excessive) was brought in by the introduction of SKAS0 mode. It didn't replace TT or SKAS3 modes, which are still available (though nobody is going to use TT now and it'll become buggy). But it was made the default mode (it was fast) while it still gave a number of problems (the ones you see) when used widely. We've now fixed all of them, it seems. The reason to introduce it was that SKAS0 is much faster than TT.

Just seen the latest comment - OK, attach the config (it can be obtained with ./linux --showconfig) and let's look for strange options. I'd suggest retesting 2.6.15 after doing "make defconfig ARCH=um" and adjusting only the defaults on that (no, the config often doesn't start from the default one - Kbuild often picks the host one if it's in /boot).

It's more likely a random crash than an FS-related one.

Created attachment 76974 [details]
uml kernel config file

The attached .config is _not_ working using the command line:

$>/usr/src/uml/linux ubda=/opt/uml/root_fs ubdb=/opt/uml/swap_fs eth0=tuntap,,,192.168.0.254 mem=256M umid=toralf

It stops after

...
Gentoo Linux; http://www.gentoo.org/
Copyright 1999-2005 Gentoo Foundation; Distributed under the GPLv2
* Mounting proc at /proc ... [ ok ]
* Mounting sysfs at /sys ... [ ok ]
* Mounting /dev for udev ... [ ok ]
* Configuring system to use udev ...
* Setting /sbin/udevsend as hotplug agent ... [ ok ]
* Mounting devpts at /dev/pts ... [ ok ]
* Activating (possible) swap ...Adding 262136k swap on /dev/ubdb. Priority:-1 extents:1 across:262136k [ ok ]

The same executable works with nearly the same command line but _without_ the option "mem=256M".

The only setting in the .config which is untested is:
CONFIG_MPENTIUMM=y

Give CONFIG_M386 a try. Also, test disabling CONFIG_MODE_TT to see if it helps.

Additionally, since the problem is with mem=256M, I would check the first entry in this FAQ. The error message is different, which could be due to internal changes in UML, but I would still check it.
http://uml.harlowhill.com/index.php/Troubleshooting#probsrunning

Somebody: please append "when assigning more than 256M" to the subject.

Hi,

this has no effect:
echo 262144 > /proc/sys/vm/max_map_count

switching to CONFIG_M686=y works! (even with mem=256M)

switching to CONFIG_MODE_TT not tested

switching to CONFIG_M386 I got a compile error:

/usr/src/uml # make CC='ccache gcc' ARCH=um linux
CHK arch/um/include/uml-config.h
UPD arch/um/include/uml-config.h
ccache gcc -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -ffreestanding -Os -fomit-frame-pointer -D__arch_um__ -DSUBARCH=\"i386\" -Dvmap=kernel_vmap -Din6addr_loopback=kernel_in6addr_loopback -Iarch/um/include -I/usr/src/linux-2.6.15-gentoo/arch/um/kernel/tt/include -I/usr/src/linux-2.6.15-gentoo/arch/um/kernel/skas/include -D_FILE_OFFSET_BITS=64 -march=i386 -mpreferred-stack-boundary=2 -D_GNU_SOURCE -D_LARGEFILE64_SOURCE -S -o arch/um/user-offsets.s arch/um/sys-i386/user-offsets.c
CHK arch/um/include/user_constants.h
CHK include/linux/version.h
UPD include/linux/version.h
SYMLINK include/asm -> include/asm-um
SPLIT include/linux/autoconf.h -> include/config/*
ccache gcc -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -ffreestanding -Os -fomit-frame-pointer -D__arch_um__ -DSUBARCH=\"i386\" -Iarch/um/include -I/usr/src/linux-2.6.15-gentoo/arch/um/kernel/tt/include -I/usr/src/linux-2.6.15-gentoo/arch/um/kernel/skas/include -Dvmap=kernel_vmap -Din6addr_loopback=kernel_in6addr_loopback -Derrno=kernel_errno -Dsigprocmask=kernel_sigprocmask -fno-unit-at-a-time -U__i386__ -Ui386 -march=i386 -mpreferred-stack-boundary=2 -D_LARGEFILE64_SOURCE -Wdeclaration-after-statement -nostdinc -isystem /usr/lib/gcc/i686-pc-linux-gnu/3.4.4/include -D__KERNEL__ -Iinclude -include include/linux/autoconf.h -S -o arch/um/kernel-offsets.s arch/um/sys-i386/kernel-offsets.c
In file included from include/asm/atomic.h:9,
                 from include/linux/spinlock.h:232,
                 from include/linux/capability.h:45,
                 from include/linux/sched.h:7,
                 from arch/um/sys-i386/kernel-offsets.c:3:
include/asm/arch/atomic.h: In function `atomic_add_return':
include/asm/arch/atomic.h:192: error: structure has no member named `x86'
make: *** [arch/um/kernel-offsets.s] Error 1

> switching to CONFIG_MODE_TT not tested
I suggested switching it off (just to make it clear).

> M386 gives error
Thanks, will fix this.

Forgot one question, and sorry, I don't want to be unkind - did you remember to test the CONFIG_M686 case (the working one) with more than one boot?

>with more than one boot?
Oops, with M686 it doesn't work reliably; now it does not boot even though it started before ?!?

>> switching to CONFIG_MODE_TT not tested
>I suggested switching it off (just to make it clear)
You're right, this is the root cause. Now I can start the UML kernel with CONFIG_M686 as well as with CONFIG_MPENTIUMM, and/or with a lot of other kernel options changed. The UML kernel now starts under all circumstances tested so far, and with the option mem=256M :-)

>I don't want to be unkind
heh, null problemo :-)

Bump. Any news on these stability issues?

The report says switching "CONFIG_MODE_TT" off helped. And currently, there is no known reason why a user should need to enable TT mode. So, IMHO this bug can be considered RESOLVED (since it's a configuration that is not recommended, it would likely turn out to be RESOLVED INVALID).

Toralf, are you happy with the resolution of turning TT mode off?

>Toralf, are you happy with the resolution of turning TT mode off?
Oh yes, of course, the UML now works fine and the option isn't really needed.
I close the bug. Only one remark:
>since it's a configuration not recommended
The kernel menuconfig help says: "Normally, this should be set to Y"

> The kernel menuconfig help says: "Normally, this should be set to Y"
Yep, good note; however, this has likely already been fixed.
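
The options flagged as suspicious in this thread can be checked in one pass against a saved config, for example the output of `./linux --showconfig` redirected to a file. The following is a minimal sketch, assuming the usual kernel .config line format (`CONFIG_FOO=y`, `CONFIG_FOO=m`, or `# CONFIG_FOO is not set`); the function name `check_uml_config` and the choice of options to list are illustrative assumptions, not something provided by the UML maintainers.

```shell
#!/bin/sh
# Hypothetical helper (not part of this report): list which of the options
# discussed in this thread are enabled (=y or =m) in a kernel .config file.
check_uml_config() {
    config_file=$1
    for opt in CONFIG_MODE_TT CONFIG_SMP CONFIG_HIGHMEM CONFIG_CODA_FS; do
        # Lines such as "# CONFIG_SMP is not set" do not match this pattern.
        if grep -q "^${opt}=[ym]\$" "$config_file"; then
            echo "$opt is enabled"
        fi
    done
}
```

Running `check_uml_config saved.config` (file name hypothetical) prints one line per enabled option and nothing when all four are off, which is the configuration that ended up working in this report for CONFIG_MODE_TT.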
|