I upgraded to linux-2.6.0.gentoo and 2.4.22-gentoo-r2 one week ago (on the same day). Since then, I regularly encounter system freezes when I use the network. This happens on both kernel versions.

I have run several tests, including:
- a CPU stress test, to make sure the freezes are not caused by insufficient cooling
- a memory test using memtest86 for several hours (17 iterations, iirc)

I observed that the system freezes only when the network is in use (browsing, scp). I can reproduce the behaviour by scp-ing the kernel tree from another machine to the affected one. If I scp over the loopback device (127.0.0.1), the problem doesn't occur.

I have also swapped the network card to rule out a hardware failure. The system freezes on network usage with the new card as well. The behaviour shows up both under X and on the plain console.

I just emerged mm-sources and ran the test again. The problem doesn't seem to exist on this kernel version: I scp-ed several kernel trees from another machine in parallel, several times, without any freeze occurring. I have not yet tested the 2.4 vanilla sources, but I believe they would give the same result. I believe some patch introduced in the latest gentoo kernels (in both of them) is causing my system freezes...

Reproducible: Always

Steps to Reproduce:
1. boot my system
2. scp a sufficiently big file, or many small ones, from another machine, or alternatively use my web browser long enough
3.

Actual Results:
computer freezes completely.

Expected Results:
not freeze :)

System specs:
athlon xp, nforce2 mobo, rtl8139 NIC, nvidia geforce 2gts (nvidia driver)
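For anyone trying to reproduce this, the scp step described in the report could be scripted roughly as below. This is only a sketch: the host name `otherbox`, the paths, and the number of copies are hypothetical placeholders, not values from the report. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the reproduction step: start several scp copies of a kernel tree
# in parallel. Host name and paths are hypothetical placeholders.
REMOTE_HOST="${REMOTE_HOST:-otherbox}"        # assumed name of the second machine
REMOTE_TREE="${REMOTE_TREE:-/usr/src/linux}"  # assumed location of the tree there
DEST_DIR="${DEST_DIR:-/tmp/scp-repro}"        # scratch directory on this machine
COPIES="${COPIES:-3}"                         # number of parallel transfers
DRY_RUN="${DRY_RUN:-1}"                       # 1 = only print the commands

run_repro() {
    i=1
    while [ "$i" -le "$COPIES" ]; do
        cmd="scp -rq $REMOTE_HOST:$REMOTE_TREE $DEST_DIR/copy-$i"
        if [ "$DRY_RUN" = "1" ]; then
            echo "$cmd"
        else
            mkdir -p "$DEST_DIR"
            $cmd &          # run each transfer in the background
        fi
        i=$((i + 1))
    done
    wait                    # returns once all background transfers finish
}

run_repro
```

Setting `DRY_RUN=0` would actually start the transfers. On an affected machine that is expected to end in a freeze, so unsaved work should be closed first.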
Created attachment 23230: my kernel config
What did you upgrade from?
I upgraded from 2.6.0-test11-gentoo and 2.4.20-gentoo-r? (sorry, I can't remember the version number; probably the last one before 2.4.22). As I pointed out, the bug applies to both upgrades (before upgrading: no problem; after upgrading: both kernel versions freeze).
'cpu stress test' ... why not just use cpuburn ? ;)
Next time I will...I used seti; but there's always something new to learn.
If you can try removing the patches from gentoo-dev-sources one at a time to see exactly which one it is, that would help a lot.
how do I do that? remove them one by one from /usr/portage/distfiles?
can you try gentoo-sources-2.4.22-r5 and let us know if that clears up your problem?
I will try gentoo-sources-2.4.22-r5 and report back. Meanwhile, be informed that 2.6.1-mm reintroduced the problem (I got rid of it by using 2.6.0-mm, as I already mentioned). I know that is beyond the scope of your responsibility, but maybe it helps...
The last comment is apparently bullshit. I was referring to 2.6.1-gentoo, not 2.6.1-mm. Sorry if that caused any confusion...
scp on gentoo-sources-2.4.22-r5 freezes (as gentoo-sources-2.4.22-r2 did)... As a matter of fact kernel-2.6.1-mm DOES freeze (see my yesterday's comment)... kernel-2.6.0-mm works...
Can you see if enabling CONFIG_PREEMPT helps?
CONFIG_PREEMPT does not seem to be the cause. I suspected it too, because I remembered the issue with 2.6-test10, but the kernels I've tried so far that showed the behaviour froze regardless of whether CONFIG_PREEMPT was set. What confuses me is that the problem now also occurs with the mm-series kernels (at least with 2.6.1-mm; I haven't tried the latest one). It looks as if a patch was introduced into the 2.6-gentoo series before it reached 2.6-mm (at version 2.6.1-mm1), and was introduced into the 2.4-gentoo series at the same time as into the 2.6-gentoo kernel. Someone asked me earlier to remove the patches one by one. I have not tried this so far because I don't know how to do that...
If it happens with all those kernels, I'd be tempted to suspect acpi. To test, unpack the tarball of patches (gentoo-dev-sources) and look at the patch names. Then set UNIPATCH_EXCLUDE to the name of the acpi patch on the command line when you remerge gentoo-dev-sources:

  UNIPATCH_EXCLUDE="999-acpi-blah" emerge gentoo-dev-sources

and see if that fixes it.
I tried it today, but emerge doesn't react to UNIPATCH_EXCLUDE:

hydra root 526 (~): UNIPATCH_EXCLUDE="408_acpi-20031203-2.6.0" emerge gentoo-dev-sources
Calculating dependencies ...done!
>>> emerge (1 of 1) sys-kernel/gentoo-dev-sources-2.6.1 to /
>>> md5 src_uri ;-) linux-2.6.1.tar.bz2
>>> md5 src_uri ;-) genpatches-2.6-1.15.tar.bz2
kernel
>>> Unpacking source...
>>> Unpacking genpatches-2.6-1.15.tar.bz2 to /var/tmp/portage/gentoo-dev-sources-2.6.1/work
>>> Unpacking linux-2.6.1.tar.bz2 to /var/tmp/portage/gentoo-dev-sources-2.6.1/work
 * Applying 125_x86_64_org_patches_2.6.1_rc3-brad.patch... [ ok ]
 * Applying 151_libata_siliconimage_3112_4_fixes.patch... [ ok ]
 * Applying 200_r8169-8110S-12172003.patch... [ ok ]
 * Applying 201_prism54_wlan_01032004.patch... [ ok ]
 * Applying 202_bcm5700_broadcom_gigabit_drvr_11272003.patch... [ ok ]
 * Applying 226_ieee1394_updates_01042004.patch... [ ok ]
 * Applying 227_alsa-1.0.0rc2-2.6.1rc1.patch... [ ok ]
 * Applying 300_NVIDIA_forcedeth_v20.patch... [ ok ]
 * Applying 400_bootsplash-3.1.3-2.6.0-test9.patch... [ ok ]
 * Applying 401_supermount-2.0.3.patch... [ ok ]
 * Applying 402_i2cisa_remove_dep.patch... [ ok ]
 * Applying 403_speakup_accessibility.patch... [ ok ]
 * Applying 405_lirc_infrared-2.6.1-rc1-20040106.patch... [ ok ]
 * Applying 408_acpi-20031203-2.6.0.patch... [ ok ]
 * Applying 410_libata_enable_sil.patch...

I tried all kinds of filenames: with the '.patch' suffix, without it, with the version number, without it, ...
try UNIPATCH_EXCLUDE="408" :)
sorry, doesn't work either..:(

hydra root 502 (/home/felix): UNIPATCH_EXCLUDE="408" emerge gentoo-dev-sources
Calculating dependencies ...done!
>>> emerge (1 of 1) sys-kernel/gentoo-dev-sources-2.6.1 to /
>>> md5 src_uri ;-) linux-2.6.1.tar.bz2
>>> md5 src_uri ;-) genpatches-2.6-1.15.tar.bz2
kernel
>>> Unpacking source...
>>> Unpacking genpatches-2.6-1.15.tar.bz2 to /var/tmp/portage/gentoo-dev-sources-2.6.1/work
>>> Unpacking linux-2.6.1.tar.bz2 to /var/tmp/portage/gentoo-dev-sources-2.6.1/work
 * Applying 125_x86_64_org_patches_2.6.1_rc3-brad.patch... [ ok ]
 * Applying 151_libata_siliconimage_3112_4_fixes.patch... [ ok ]
 * Applying 200_r8169-8110S-12172003.patch... [ ok ]
 * Applying 201_prism54_wlan_01032004.patch... [ ok ]
 * Applying 202_bcm5700_broadcom_gigabit_drvr_11272003.patch... [ ok ]
 * Applying 226_ieee1394_updates_01042004.patch... [ ok ]
 * Applying 227_alsa-1.0.0rc2-2.6.1rc1.patch... [ ok ]
 * Applying 300_NVIDIA_forcedeth_v20.patch... [ ok ]
 * Applying 400_bootsplash-3.1.3-2.6.0-test9.patch... [ ok ]
 * Applying 401_supermount-2.0.3.patch... [ ok ]
 * Applying 402_i2cisa_remove_dep.patch... [ ok ]
 * Applying 403_speakup_accessibility.patch... [ ok ]
 * Applying 405_lirc_infrared-2.6.1-rc1-20040106.patch... [ ok ]
 * Applying 408_acpi-20031203-2.6.0.patch... [ ok ]
 * Applying 410_libata_enable_sil.patch... [ ok ]
>>> Source unpacked.
wait, what am I saying :) speaking without thinking again. edit the ebuild and within UNIPATCH_LIST="yadda" add 408
gentoo-dev-sources-2.6.1.ebuild doesn't contain anything like that. gentoo-dev-sources-2.6.1-r1.ebuild (which I think is not used, because my emerge reports that it uses 1.15 of the patchset, not 1.16) contains this line:

  UNIPATCH_LIST="${DISTDIR}/genpatches-2.6-${GPV}.tar.bz2"

Would adding '408' at the end really make sense (if I understood correctly, that's what I should do)? Shouldn't I rather remove 408 from UNIPATCH_LIST? Removing the patch from the tar.bz2 doesn't work (of course) because of the checksum :(

Why isn't there something like UNIPATCH_EXCLUDE for ppl like me? Excluding a patch from the patchset seems really complicated to me... and tiring.
No, adding it to the end is the correct way: if an entry isn't a patch or a tarball, it counts as an exclusion. Sorry, I actually wrote the code but didn't even think about how to use it when I last commented!
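The rule described in this comment (an entry that is neither a patch nor a tarball counts as an exclusion) can be illustrated with a small shell sketch. This is not the actual ebuild or eclass code. The function name `classify_unipatch_list` and the list of file suffixes it recognises are assumptions made for the illustration.

```shell
#!/bin/sh
# Sketch only, NOT the real implementation: split a UNIPATCH_LIST-style
# string into entries to apply (patches, tarballs) and exclusions (the rest).
# The suffix list below is an assumption made for this illustration.
classify_unipatch_list() {
    for item in $1; do
        case "$item" in
            *.patch|*.diff|*.tar.bz2|*.tar.gz|*.tbz2|*.tgz)
                echo "apply $item" ;;
            *)
                echo "exclude $item" ;;
        esac
    done
}

# A list like the one in the -r1 ebuild, with 408 appended as discussed above.
classify_unipatch_list "/usr/portage/distfiles/genpatches-2.6-1.15.tar.bz2 408"
```

Under these assumptions the tarball is reported as something to apply and the bare `408` as an exclusion, which is why appending the number to UNIPATCH_LIST removes the patch rather than adding one.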
I have just committed a quick addition to the ebuild so that exclusions can be given not only in UNIPATCH_LIST but also by passing UNIPATCH_EXCLUDE.
OK, I ran

  ACCEPT_KEYWORDS="~x86" emerge gentoo-dev-sources

(after modifying gentoo-dev-sources-2.6.1-r1.ebuild). The acpi patch was not applied (as expected :)). Unfortunately, it didn't help... still freezing on scp :( If I have time (Monday night probably) I will also try removing the other patches.
Some recent changes to netfilter and other goodies in rc3 might have fixed this for you. We will try to roll out a release asap. Please re-open if rc3 (or 2.6.2 final, whichever comes first) doesn't fix it.
I recently tested 2.6.3-gentoo and 2.6.3-mm1. Both still show the behaviour :(

I made one observation which might or might not help: once, while I was testing 2.6.3-gentoo (the scp experiment), another task was running in the background and accessing the hard drive (a cronjob doing emerge sync). The copy seemed to work fine for quite a while (I believe the machine would usually have crashed within that amount of time). When I discovered which other task was running, I stopped the scp task to prevent corruption of the portage repository. After the cronjob finished I resumed the scp task and guess what...

I can't believe I am the only one suffering from this phenomenon - will I ever be able to move on to a newer kernel?

btw. I also suspected the hdparm parameters to be the cause, but I tried without the -X66 and -u1 options; that didn't help. These are my current settings:
disc1_args="-d1 -m16 -c3 -A1"
I know this is old, but does it still happen with a 2.6.4 or 2.6.5 kernel? How about an unpatched vanilla 2.6 kernel?
Two months without word. This bug has rotted away.