Summary: | Network failure with >sys-kernel/hardened-sources-3.4.5 tulip driver (DECchip 21140) | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Felix Tiede <info> |
Component: | Hardened | Assignee: | The Gentoo Linux Hardened Kernel Team (OBSOLETE) <hardened-kernel+disabled> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | kernel, pageexec, spender |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | AMD64 | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: | Current 3.6.7-hardened kernel config |
Description
Felix Tiede
2012-11-27 06:13:17 UTC
(In reply to comment #0)
> My DECchip 21140 network card seems to not handle incoming packets with
> kernels younger than sys-kernel/hardened-sources-3.4.5. I've tried with
> hardened-sources-3.5.4 and hardened-sources-3.6.7 - in both versions packets
> are transmitted but no incoming packet is handled while it works very well
> with hardened-sources-3.4.5 and prior to that.
>
> I've seen no log messages from the kernel and I suspect this might be an
> upstream bug.
>
> Kernel tulip driver configuration for all affected and unaffected versions:
> # grep TULIP .config
> CONFIG_NET_TULIP=y
> CONFIG_TULIP=m
> # CONFIG_TULIP_MWI is not set
> CONFIG_TULIP_MMIO=y
> CONFIG_TULIP_NAPI=y
> # CONFIG_TULIP_NAPI_HW_MITIGATION is not set

I don't have a card with that chipset, so I'm going to have to ask you to try a few things:

1) Does this happen with the equivalent vanilla sources? If so, can you bracket which version bump it broke under? If it doesn't happen under vanilla, but does under hardened, then

2) Try using CONFIG_PAX_KERNEXEC_PLUGIN_METHOD_BTS for your Return address method under non-exe pages in the PaX config menu, rather than OR.

3) If BTS vs OR makes no difference, then please test the very latest hardened-sources and see if it's an issue there. I'll have the very latest from upstream available by the end of the day. If it is, then we'll have to pass stuff along upstream.

It would be nice to see your config. Also, as a first try, disable the PaX features that rely on a gcc plugin and see if that changes anything (KERNEXEC/SIZE_OVERFLOW/STACKLEAK/CONSTIFY/LATENT_ENTROPY).

Created attachment 330802
Current 3.6.7-hardened kernel config

(In reply to comment #2)
This is the config which I used to compile 3.6.7 (failing) with. It's basically the same configuration as with older versions, as I only use 'make oldconfig' for newer kernels.
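The suggestion above to rule out the gcc-plugin-based PaX features can be checked against a kernel config before rebuilding. The following is a minimal sketch and was not posted in this bug: the helper name `list_pax_plugin_features` is hypothetical, and the full `CONFIG_PAX_*` symbol names are assumptions expanded from the short names in the comment, so verify them against your own `.config`.

```shell
#!/bin/sh
# Hypothetical helper: print the gcc-plugin-based PaX options that are
# enabled (=y) in a kernel config file. The symbol names are assumptions
# expanded from the short names KERNEXEC, SIZE_OVERFLOW, STACKLEAK,
# CONSTIFY and LATENT_ENTROPY; check them against your own tree.
list_pax_plugin_features() {
    config_file=${1:-.config}
    # "|| true" keeps the exit status 0 when no option is enabled.
    grep -E '^CONFIG_PAX_(KERNEXEC|SIZE_OVERFLOW|MEMORY_STACKLEAK|CONSTIFY_PLUGIN|LATENT_ENTROPY)=y$' \
        "$config_file" || true
}

# Usage: list_pax_plugin_features /usr/src/linux/.config
```

An empty result would suggest that none of these features is enabled in that config, in which case disabling them is unlikely to change the behaviour.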
(In reply to comment #1)
> 2) Try using CONFIG_PAX_KERNEXEC_PLUGIN_METHOD_BTS for your Return address method
> under non-exe pages in the PaX config menu, rather than OR.

I am not using CONFIG_PAX_KERNEXEC at all. (And currently I have no idea why, but I'm sure there was a reason when that option was new...)

(In reply to comment #1)
> I don't have a card with that chipset, so I'm going to have to ask you to
> try a few things:
>
> 1) Does this happen with the equivalent vanilla sources? If so, can you
> bracket which version bump it broke under?

It is at least broken for vanilla-sources-3.6.7 and vanilla-sources-3.6.8 as well. I can't narrow it down further, as I have to keep the box alive and can't experiment with the network down for too long (each cycle takes about 15 minutes).

Given this, I suspect this is an upstream regression introduced somewhere between 3.4.5 and 3.5.4.

It is already open upstream at https://bugzilla.kernel.org/show_bug.cgi?id=48691 - at least that seems like my bug.

(In reply to comment #5)
> It is already open upstream at
> https://bugzilla.kernel.org/show_bug.cgi?id=48691 - at least that seems like
> my bug.

Given your info above, this should be fixed in hardened-sources-3.7.3, which is based on vanilla 3.7.3. Can you confirm?

(In reply to comment #6)
> Given your info above, this should be fixed in hardened-sources-3.7.3, which
> is based on vanilla 3.7.3. Can you confirm?

Unfortunately not. I've tried with hardened-sources-3.7.3 and the link worked for about 13.5 minutes (that's when pppd, which uses the link in question, went down) and I was unable to revive it.

Going back to hardened-sources-3.4.5 with unchanged configuration, the link is as of now up and stable for more than 6 hours.
I might have been mistaken with my earlier assumption about this bug and https://bugzilla.kernel.org/show_bug.cgi?id=48691 being the same.

(In reply to comment #7)
> Unfortunately not.
> I've tried with hardened-sources-3.7.3 and the link worked for about 13.5
> minutes (that's when pppd, which uses the link in question, went down) and
> I was unable to revive it.
>
> Going back to hardened-sources-3.4.5 with unchanged configuration, the
> link is as of now up and stable for more than 6 hours.
>
> I might have been mistaken with my earlier assumption about this bug and
> https://bugzilla.kernel.org/show_bug.cgi?id=48691 being the same.

Do you have any more information on this? Have you tried any of the 3.8 series?

(In reply to comment #8)
> Do you have any more information on this? Have you tried any of the 3.8
> series?

Yes, just a few hours ago with hardened-sources-3.8.5: exactly the same result as with 3.7.3. The link worked for a short amount of time, went down, and not even rebooting the box helped. So my guess is that any version above 3.4.5 does "something" to the card after it has been up for some time, which kills the link, and only booting the older kernel resets this "something" so the link comes back up.

I just thought about something: my card is actually a multi-port NIC using a PCI-PCI bridge between its four network chips and the system's PCI bus. Is it possible that this problem is less related to the tulip NIC driver and more a problem with the kernel's PCI subsystem and the driver for that PCI-PCI bridge?

I apologize if that actually was the missing bit of information here. Hardware information below:

    # lspci -n
    01:04.0 0604: 1011:0024 (rev 03)
    # lspci -vvv
    01:04.0 PCI bridge: Digital Equipment Corporation DECchip 21152 (rev 03) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 64, Cache Line Size: 16 bytes
        Bus: primary=01, secondary=02, subordinate=02, sec-latency=64
        I/O behind bridge: 0000a000-0000bfff
        Memory behind bridge: edc00000-eddfffff
        Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
        Secondary status: 66MHz- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
        BridgeCtl: Parity+ SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
            PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [dc] Power Management version 1
            Flags: PMEClk- DSI- D1- D2- AuxCurrent=220mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
            Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
            Bridge: PM- B3+

(In reply to comment #10)
> I just thought about something: my card is actually a multi-port NIC using
> a PCI-PCI bridge between its four network chips and the system's PCI bus.
> Is it possible that this problem is less related to the tulip NIC driver
> and more a problem with the kernel's PCI subsystem and the driver for that
> PCI-PCI bridge?

It could be. In comment 4 you say that you hit it with vanilla. Was it ever working and then it broke? If so, git bisect down to the commit that broke it.

(In reply to comment #11)
> It could be. In comment 4 you say that you hit it with vanilla. Was it ever
> working and then it broke? If so, git bisect down to the commit that broke
> it.

I'll try. It will take some time, as this is my main server: each test takes a long time and I can't take the box down for long periods.

(In reply to Felix Tiede from comment #12)
> I'll try. It will take some time, as this is my main server: each test
> takes a long time and I can't take the box down for long periods.

Any news here?

(In reply to Anthony Basile from comment #13)
> Any news here?

I'm going to assume this issue is fixed.
I've compiled tulip many times recently, configured as in comment 0, and had no problem. I never did hit the original issue.
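
For reference, the bisect requested earlier in this thread can be largely automated with `git bisect run`. The sketch below was not part of the bug discussion. It uses a throwaway repository with made-up commits (hypothetical stand-ins for kernel commits) and a made-up marker file, so that it runs anywhere git is installed.

```shell
#!/bin/sh
# Demonstration of git bisect on a throwaway repository. The commits and
# the "rx=ok"/"rx=broken" marker file are made up for illustration only.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "bisect demo"
git config user.email "demo@example.invalid"

# Create ten commits; commit 7 is the one that introduces the regression.
i=1
while [ "$i" -le 10 ]; do
    if [ "$i" -ge 7 ]; then
        echo "rx=broken" > state
    else
        echo "rx=ok" > state
    fi
    echo "$i" > serial
    git add state serial
    git commit -q -m "commit $i"
    i=$((i + 1))
done

good=$(git rev-list --max-parents=0 HEAD)   # oldest commit, known good
bad=$(git rev-parse HEAD)                   # newest commit, known bad

# "git bisect run" treats exit status 0 as good and 1-124 or 126-127 as bad.
git bisect start "$bad" "$good" >/dev/null
result=$(git bisect run grep -q '^rx=ok$' state)
first_bad=$(printf '%s\n' "$result" | sed -n 's/ is the first bad commit$//p')
first_bad_subject=$(git log -1 --format=%s "$first_bad")
git bisect reset >/dev/null 2>&1

echo "first bad commit: $first_bad_subject"
```

The script reports "commit 7" as the first bad commit. On a real kernel tree there is no one-line test: each step is a build, a boot, and a link check, after which the revision is marked by hand with `git bisect good` or `git bisect bad`. Given the versions reported here, one assumed starting point would be `git bisect start v3.5 v3.4`. If the cause is believed to be in the driver or PCI code, the search can be restricted by appending `-- drivers/net/ethernet/dec/tulip drivers/pci`. Both choices are assumptions; if the regression arrived through a stable update, the stable tree's tags would apply instead.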