After switching the kernel from 2.6.17-gentoo-r4 to 2.6.17-gentoo-r7, I noticed on my ThinkPad T41 that after waking up from suspend to mem/disk, pppoe-start fails and I get that message in my log file. Restarting the network (net.eth0 restart) resolves the problem, but the behaviour is annoying.

n22 ~ # q -i ppp
net-dialup/ppp-2.4.3-r16
net-dialup/rp-pppoe-3.8
net-dialup/rppppoek-0.40
n22 ~ # uname -r
2.6.17-gentoo-r7
n22 ~ # cat /etc/conf.d/net
# /etc/conf.d/net
#
config_eth0=( "dhcp" "192.168.0.254/24" )
dhcpcd_eth0="-t 6"
dhcp_eth0="nontp"
n22 ~ # emerge --info
Portage 2.1-r2 (default-linux/x86/2006.1, gcc-3.4.6, glibc-2.3.6-r4, 2.6.17-gentoo-r7 i686)
=================================================================
System uname: 2.6.17-gentoo-r7 i686 Intel(R) Pentium(R) M processor 1700MHz
Gentoo Base System version 1.12.4
ccache version 2.3 [enabled]
app-admin/eselect-compiler: [Not Present]
dev-lang/python: 2.4.3-r1
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache: 2.3
dev-util/confcache: [Not Present]
sys-apps/sandbox: 1.2.17
sys-devel/autoconf: 2.13, 2.59-r7
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2
sys-devel/binutils: 2.16.1-r3
sys-devel/gcc-config: 1.3.13-r3
sys-devel/libtool: 1.5.22
virtual/os-headers: 2.6.11-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=pentium-m -fomit-frame-pointer -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/X11/xkb /usr/share/config"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-O2 -march=pentium-m -fomit-frame-pointer -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig ccache distlocks metadata-transfer parallel-fetch sandbox sfperms strict"
GENTOO_MIRRORS="ftp://mirror.icis.pcz.pl/gentoo/ http://ftp.belnet.be/mirror/rsync.gentoo.org/gentoo/ ftp://ftp.belnet.be/mirror/rsync.gentoo.org/gentoo/ http://ftp-stud.fht-esslingen.de/pub/Mirrors/gentoo/"
LINGUAS="de en"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude='/distfiles' --exclude='/local' --exclude='/packages'"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.de.gentoo.org/gentoo-portage"
USE="x86 X aac acpi alsa apache2 arts berkdb bitmap-fonts bzip2 clamav cli crypt css cups dlloader dri exif fam fastbuild ffmpeg fortran gd gdbm gpm ipv6 isdnlog jai java javascript jimi libg++ logrotate mbox mmx mmxext ncurses nls nptl nptlonly nsplugin opengl pam pcre pdf perl ppds pppd python readline real reflection rtc samba session spl sse sse2 ssl subversion tcpd truetype truetype-fonts type1-fonts udev unicode userlocales win32codecs xorg zlib elibc_glibc input_devices_keyboard input_devices_mouse kernel_linux linguas_de linguas_en userland_GNU video_cards_vga video_cards_radeon"
Unset: CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, PORTAGE_RSYNC_EXTRA_OPTS
BTW, at home (where DHCP times out after 6 seconds and only the static IP address 192.168.0.1 is assigned to eth0), an "ifconfig eth0 down && ifconfig eth0 up" solves the problem.
I wonder what made you think this network configuration uses ppp or rp-pppoe. This should go to the kernel or DHCP maintainers, not net-dialup.
Which DHCP client/version are you using?
I have net-misc/dhcpcd-2.0.5-r1, but the problem appears at home, where I use a static IP address and connect to the internet via a DSL modem. And yes, it's more of a kernel problem; this is what I got from E1000-devel:

> the bugreport indicates that you're having pppoe errors or suspend/resume. Your nic is only marginally involved in that sequence. Suspend/resume is supported by the e1000 nic but the kernel code hasn't changed since 2.6.15 or so, so obviously something else is the problem (and not e1000).
> ...
> If you read the lkml archives in the last 4 months, you will see that there have been numerous suspend issues with 2.6.17. None of those were related to network cards but all of them to the suspend/resume code.
> Cheers,
> Auke
Yep, it seems to be a general kernel/network problem, because at work (DHCP) I have the same issue: after suspend and resume I have to restart my network :-( With gentoo-sources-2.6.17-gentoo-r4 I have no problem: the network runs fine after a resume, all of my ssh connections are still up and running, Konqueror's fish protocol works fine, etc.
Reverting patch 4040_e1000-7.1.9-k4.patch in gentoo-sources-2.6.17-r7 resolved all issues.
Is this reproducible on the latest development kernel (currently 2.6.18-rc6)?
(In reply to comment #7)
> Is this reproducible on the latest development kernel (currently 2.6.18-rc6)?

Unfortunately, 2.6.18-rc6 shows the same behaviour: I have to restart the network after resuming from suspend/hibernate, and this bug breaks at least all my open ssh sessions :-( Should I try to reverse-apply the 4040-* patch to 2.6.18-rc6 to see whether the new kernel works with the old e1000 driver, or is that not worth the effort?
What would really rock is if you were to bisect the change history and find the exact patch which introduces the problem (the patch in gentoo-sources is a collection of many changes).

http://www.reactivated.net/weblog/archives/2006/01/using-git-bisect-to-find-buggy-kernel-patches/

You can reduce the number of changes to test by specifying paths in the "start" command. You'd probably want to use:

# git bisect start drivers/net/e1000
# git bisect bad v2.6.18-rc6
# git bisect good v2.6.17
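The idea behind git bisect can be illustrated with a small Python sketch. It is a simplification, assuming a linear history in which every commit before the culprit is good and every commit from the culprit onwards is bad; real git bisect walks the commit graph and chooses midpoints by reachability, not by list index. The names `first_bad` and `is_bad` are hypothetical: `is_bad` stands in for "build this kernel, suspend, resume and check whether eth0 still works".

```python
from typing import Callable, Sequence, Tuple


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Binary search for the first bad commit in a linear history.

    Assumes len(commits) >= 2, commits[0] is known good, commits[-1] is known
    bad, and every commit from the culprit onwards is bad (good...good, bad...bad).
    Returns the first bad commit and the number of commits that were tested.
    """
    good, bad = 0, len(commits) - 1  # indices of the known-good and known-bad endpoints
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1  # one probe = one kernel build plus one suspend/resume check
        if is_bad(commits[mid]):
            bad = mid   # the culprit is at mid or earlier
        else:
            good = mid  # the culprit is after mid
    return commits[bad], tests


if __name__ == "__main__":
    # Hypothetical history of 30 commits in which commit "c17" broke resume.
    history = [f"c{i}" for i in range(30)]
    culprit, tests = first_bad(history, lambda c: int(c[1:]) >= 17)
    print(f"first bad commit: {culprit} (found after {tests} tests)")
```

In a real kernel bisection, each call to `is_bad` corresponds to one round of testing, and the answer is recorded with "git bisect good" or "git bisect bad" instead of being returned from a function.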
(In reply to comment #9)
> What would really rock is if you were to bisect the change history and find

OK, I'll do that:

tfoerste@n22 ~/devel/linux-git $ git bisect start drivers/net/e1000
tfoerste@n22 ~/devel/linux-git $ bit bisect bad v2.6.18-rc6
bash: bit: command not found
tfoerste@n22 ~/devel/linux-git $ git bisect bad v2.6.18-rc6
tfoerste@n22 ~/devel/linux-git $ git bisect good v2.6.17
Bisecting: 28 revisions left to test after this

This will take some time, I think ...
tfoerste@n22 ~/devel/linux-git $ git bisect good
cac925a4aab1b7233d3beb591f53498816058a08 is first bad commit
Emailed the following upstream to Auke Kok <auke-jan.h.kok@intel.com>:

You are right, the first bisection result was wrong; sorry for bothering you with a wrong commit hash. Now I have carefully bisected the 3308 revisions between v2.6.17 and v2.6.18-rc6 again and checked each result twice. The last good commit is acfbc9fde2ec7f304398f6ad7644002e07bf84bc. The first bad commit is 2db10a081c5c1082d58809a1bcf1a6073f4db160.
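As a rough estimate of the effort involved: each bisection step halves the remaining range, so bisecting N revisions takes about ceil(log2 N) test kernels. The small Python sketch below just does that arithmetic; it is an approximation only (the helper name `bisect_steps` is hypothetical), since git's real step count depends on the shape of the merge history.

```python
import math


def bisect_steps(revisions: int) -> int:
    """Approximate number of test builds needed to bisect `revisions` commits."""
    return math.ceil(math.log2(revisions))


# 2**11 = 2048 < 3308 <= 4096 = 2**12, so the full v2.6.17..v2.6.18-rc6 range needs about 12 builds
print(bisect_steps(3308))
# the path-limited run (28 revisions left after the first split) needs about 5 more
print(bisect_steps(28))
```

This also shows why restricting the bisection to drivers/net/e1000 is attractive: a few dozen candidate revisions need only a handful of suspend/resume cycles, while the whole tree needs roughly twice as many.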
Any response from Auke on this? It would be good if you could CC netdev@vger.kernel.org on future communication.
This was the last message I sent to Auke, on 14.09.2006 19:07 (he asked me to try the latest netdev tree):

OK, tried it today with

tfoerste@n22 ~/devel/linux-netdev-2.6 $ cg-status
Heads:
    ALL            eefc351b23ac30484b009dee8601e969b1afc7a3
    e100-sbit      0e11584e049d848c76bc0e92171df4f9712e6e99
    master         95064a75ebf8744e1ff595e8cd7ff9b6c851523e
    origin         95064a75ebf8744e1ff595e8cd7ff9b6c851523e
   >upstream       c233289c29369dba7177ca873e9b8ed457af2a78
    upstream-fixes 71d28725548be203e8b8f6ad63b1f64fd7f02d4d
    upstream-greg  357eb4cf75fdb9dbe46b64d50f3670de6a45dc91
    upstream-linus 71d28725548be203e8b8f6ad63b1f64fd7f02d4d

No success :-(
It would be good if you could file this at http://bugzilla.kernel.org; then we'd have something to track.
ok, here it is: http://bugzilla.kernel.org/show_bug.cgi?id=7207
Will keep an eye on that
*** Bug 154661 has been marked as a duplicate of this bug. ***
fixed upstream
Fixed in gentoo-sources-2.6.18-r3 (genpatches-2.6.18-4)