After upgrading the kernel of my PowerBook G4 to 3.2.1-gentoo-r2, the system sometimes hangs after an oops in the sungem driver. I also had this issue when I tried kernel 3.1.6-gentoo, but never when I tried kernel 3.0.6-gentoo.
Reproducible: Sometimes
Steps to Reproduce:
1. Boot the system.
2. Pray.
Actual Results: Sometimes, the system oopses in the sungem driver and hangs shortly after. At other times it works fine.
Expected Results: System should never oops and hang.
Created attachment 300469: Kernel configuration for the failing kernel.
Can you post the oops with debug information turned on?
Alright, I did some more testing, and I found what causes the oops to occur. It only happens when the system is on AC power at the time of booting; the system boots fine when it is running on battery at boot time. Nothing seems to happen when I plug the AC in after the system has booted successfully. I've also confirmed that this happens with a vanilla 3.2.11 kernel. Here is the oops with a 3.2.1-gentoo-r2 kernel (typed over manually):
Oops: Machine check, sig: 7 [#1]
PowerMac
Modules linked in: snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss reiserfs ext3 jbd dm_mod loop generic_nvram apm_emu apm_emulation usbhid hid appletouch cryptomgr crypto_hash aead pcompress crypto_blkcipher arc4 crypto_algapi nouveau b43 mac80211 snd_aoa_codec_tas snd_aoa_fabric_layout snd_aoa cfg80211 crypto rng_core ttm fbcon font bitblit softcursor ochi_hcd drm_kms_helper drm evdev firewire_ohci snd_aoa_i2sbus rtc_generic firewire_core sungem snd_pcm i2c_powermac sr_mod sungem_phy ehci_hcd crc_itu_t pmac_zilog cdrom serial_core snd_timer usbcore snd_page_alloc snd fb hwmon soundcore backlight snd_aoa_soundbus i2c_algo_bit usb_common i2c_core ssb nls_base cfbcopyarea cfbimgblt cfbfillrect uninorth_agp agpart unix ext2 sd_mod pata_macio libata scsi_mod
NIP: f199c53c LR: f199c63c CTR: f199c5cc
REGS: efb85ce0 TRAP: 0200 Not tainted (3.2.1-gentoo-r2)
MSR: 00149030 <EE,ME,IR,DR> CR: 42002022 XER: 20000000
TASK = eebb5200[1332] 'mii-tool' THREAD: efb84000
GPR00: a9b520f5 efb85d90 eebb5200 c10ed3c0 60020000 00000000 0000000c 00000001
GPR08: 00000000 f1a0620c 00000002 00000006 80000022 1001b198 00000040 00000000
GPR16: 100bc358 00000040 1089f210 108a8800 100c0000 ffffffff 100131b0 00000000
GPR24: bfe51618 00000000 00000000 efb85e00 f19a1754 efb85e00 c10ed000 c10ed3c0
NIP [f199c53c] __phy_read+0x44/0xd4 [sungem]
LR [f199c63c] gem_ioctl+0x70/0xc4 [sungem]
Call trace:
[efb85d90] [c02c77ac] 0xc02c77ac (unreliable)
[efb85db0] [f199c63c] gem_ioctl+0x70/0xc4 [sungem]
[efb85dc0] [c01bca38] dev_ifsioc+0x1a8/0x3c8
[efb85df0] [c01bd150] dev_ioctl+0x4f8/0x768
[efb85e80] [c01a5fb4] sock_ioctl+0x90/0x2b8
[efb85ea0] [c00cc9c8] do_vfs_ioctl+0xa4/0x754
[efb85f10] [c00cd0b8] sys_ioctl+0x40/0x88
[efb85f40] [c0011a3c] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xff68b1c
LR = 0xff68a84
Instruction dump:
7c842b78 93e1001c 7c7f1b78 90010024 81230000 3929620c 7c0004ac 7c804d2c
81230000 3929620c 7c0004ac 7c004c2c <0c000000> 4c00012c 74090001 40820050
---[ end trace 0f53d7bba7f41b67 ]---
The system becomes unusable after this; it does still respond somewhat but the boot process never finishes.
Can you replicate with CONFIG_DEBUG_INFO=y and paste the oops here? You can take a picture with a digital camera if that is easier. The option is found under "Kernel hacking".
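Before rebuilding, it may help to confirm that the option actually ended up in the kernel configuration. The following is a minimal sketch in Python; the helper name config_enabled and the sample text are hypothetical, and it assumes the usual .config convention of one OPTION=value entry per line.

```python
def config_enabled(config_text, option):
    """Return True if `option` is set to 'y' in the given .config text."""
    for line in config_text.splitlines():
        # Disabled options appear as "# CONFIG_FOO is not set" and never match.
        if line.strip() == f"{option}=y":
            return True
    return False


# Hypothetical excerpt of a kernel .config, for illustration only.
sample = (
    "CONFIG_DEBUG_KERNEL=y\n"
    "CONFIG_DEBUG_INFO=y\n"
    "# CONFIG_DEBUG_INFO_REDUCED is not set\n"
)

print(config_enabled(sample, "CONFIG_DEBUG_INFO"))
print(config_enabled(sample, "CONFIG_DEBUG_INFO_REDUCED"))
```

To check a real configuration, the contents of the .config file in the kernel source tree could be read and passed in as config_text.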
I've reproduced the bug with CONFIG_DEBUG_INFO=y. Unfortunately, apart from making the kernel image and initramfs about ten times larger and making the system quite slow to boot, this didn't do anything. The oops is exactly the same down to the letter, with no extra information whatsoever. I've also confirmed that this bug still exists in 3.2.12-gentoo.
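For context: CONFIG_DEBUG_INFO is not expected to change the text of an oops. It adds debugging symbols to vmlinux and the module objects, so that the symbol+offset pairs in the trace can be resolved to source lines afterwards, for example with gdb's `list *(symbol+offset)`. The sketch below, in Python, extracts those pairs from oops text and prints the matching gdb commands. The function name gdb_commands is hypothetical, and the assumption that a frame tagged [sungem] lives in a file called sungem.ko is only illustrative.

```python
import re

# Matches "symbol+0xOFF/0xLEN", optionally followed by " [module]" on the same line.
FRAME_RE = re.compile(
    r"(?P<sym>[A-Za-z_][\w.]*)\+(?P<off>0x[0-9a-f]+)/0x[0-9a-f]+"
    r"(?:[ \t]+\[(?P<mod>\w+)\])?"
)


def gdb_commands(oops_text):
    """Return unique (module, gdb command) pairs in order of appearance.

    module is None for frames that belong to the core kernel image.
    """
    seen = []
    for m in FRAME_RE.finditer(oops_text):
        entry = (m.group("mod"), f"list *({m.group('sym')}+{m.group('off')})")
        if entry not in seen:
            seen.append(entry)
    return seen


# A few lines taken from the oops in this report.
sample = """NIP [f199c53c] __phy_read+0x44/0xd4 [sungem]
LR [f199c63c] gem_ioctl+0x70/0xc4 [sungem]
[efb85db0] [f199c63c] gem_ioctl+0x70/0xc4 [sungem]
[efb85dc0] [c01bca38] dev_ifsioc+0x1a8/0x3c8
[efb85df0] [c01bd150] dev_ioctl+0x4f8/0x768
"""

for module, cmd in gdb_commands(sample):
    # Assumed file names: <module>.ko for modules, vmlinux for built-in code.
    target = f"{module}.ko" if module else "vmlinux"
    print(f"{target}: {cmd}")
```

Each printed command would be run inside gdb after loading the corresponding object file that was built with debug info.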
Since I can also reproduce this on a vanilla kernel, I filed bug #42992 on kernel.org. https://bugzilla.kernel.org/show_bug.cgi?id=42992
We'll follow the upstream bug and attempt to backport any feasible patch that is identified there.