I get:

<0>Kernel panic - not syncing: Fatal exception in interrupt

when using bridges with the 2.6.14-hardened-r8 kernel. I run from a tmpfs system, so I cannot get a kernel dump. I have also disabled symbols (since I run everything from RAM, I want to keep things small).

I can easily reproduce it by setting up an Ethernet bridge (2 physical NICs, or 1 NIC and a tap interface), assigning an IP address to the bridge interface (by DHCP, for example) and then generating some traffic. After a while (up to a couple of hours) it crashes.

The funny thing is that when I ran traffic through the bridge without assigning any IP to br0, it ran all through the night without crashing.

Any ideas how to track this down?

It seems like there have been problems with bridges before: http://www.archivesat.com/Gentoo_Linux_hardened/thread666661.htm
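For anyone trying to reproduce this, the setup steps above can be sketched roughly as follows. This is only a sketch under assumptions, not the reporter's actual script: the interface names (eth0, eth1), the use of `dhcpcd` as the DHCP client, and the helper names `bridge_repro_commands` and `run` are all made up for illustration. It only prints the commands unless `dry_run=False` is passed (which needs root), and it does not generate the traffic itself.

```python
import shlex
import subprocess
from typing import List, Sequence


def bridge_repro_commands(bridge: str, ports: Sequence[str],
                          use_dhcp: bool = True) -> List[List[str]]:
    """Build the command list for a bridge over the given ports.

    Hypothetical helper: interface names and the DHCP client are assumptions.
    """
    if not ports:
        raise ValueError("a bridge needs at least one port")
    cmds = [["brctl", "addbr", bridge]]            # create the bridge device
    for port in ports:
        cmds.append(["brctl", "addif", bridge, port])          # enslave the port
        cmds.append(["ip", "link", "set", "dev", port, "up"])  # bring the port up
    cmds.append(["ip", "link", "set", "dev", bridge, "up"])
    if use_dhcp:
        # Giving the bridge itself an address is the step that the report
        # associates with the panic; dhcpcd is an assumed client.
        cmds.append(["dhcpcd", bridge])
    return cmds


def run(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command, and execute it only when dry_run is False."""
    for cmd in cmds:
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(bridge_repro_commands("br0", ["eth0", "eth1"]), dry_run=True)
```

Passing `use_dhcp=False` gives the variant with no IP address on br0, which according to the report ran overnight without crashing.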
Created attachment 89374
kernel config

I attached my kernel config.
Created attachment 89516
hardened-sources-2.6.14-r8-br_netfilter.patch

The patch fixes the problem. I have been dumping random data over the bridge (and to the bridge interface) for over 24 hours without a kernel panic. The code in the patch matches what is found in upstream releases (for example 2.6.16-gentoo-r10), so it should be good.
(In reply to comment #2)
> The code in the patch is what is found in upstream releases too so it should be
> good. (for example 2.6.16-gentoo-r10)

So in other words, this bug affects all kernels and not just hardened? If so, we need to get the right kernel guys on the job so the fix can be included in genpatches.
I have not had time to investigate further, but according to the mailing list link I mentioned above, it's a patch in hardened sources that appeared in 2.6.14-hardened-r6 (1431_15.4_bridge-netfilter-race.patch) that provokes the problem. Also see http://bugzilla.kernel.org/show_bug.cgi?id=5803 (it seems the problem that 1431_15.4_bridge-netfilter-race.patch is supposed to fix is fixed in vanilla 2.6.15.4).

But yes, it would be great to have the right kernel guy fix the bridges in all kernels that are affected.
What is the status here? Am I the only one who uses bridges with "stable" hardened? I have been running a kernel with the previously posted patch and pumping random data over a bridge for 8 days (almost 9 now), so I guess it fixes the problem.

I increased the severity to major because a kernel that crashes is kind of useless.
johnm, please review and apply.
(In reply to comment #5)
> How is the status here? Am I the only one who uses bridges with "stable"
> hardened?

Nope, you're not the only one. I've been chasing this problem on a server I upgraded yesterday and it took me until now to put everything together -- then I found your post. :-)

I am having the exact same problem as you, on the exact same version of the hardened kernel, using a bridged NIC (TAP) interface. I just upgraded the kernel and system libs this week and now I'm having panics within 2-16 hours of uptime. I found another user in the forums having a similar issue with the same kernel -- he had to drop bridging on his side to fix it (something I cannot do).

I can try the patch, but I'd rather get a status on when/if/how this patch will be included in an official release before I go through the effort. Any updates, folks?

> I increased the severity to major because a kernel that crashes is kind of
> useless.

Agreed, but I suggest going one higher, to "critical", defined as "crashes, loss of data, severe memory leak". Our production system is crashing on a regular basis due to this issue. I've had to revert to an older version of the kernel that is no longer in-tree, but that kernel has problems with other parts of the networking layer that I need to work around... I'm kind of stuck at this point.
Since I depend on IPsec (openswan) and shorewall, I will also need the ipt_policy match, which comes in the 2.6.16 series, so I'm waiting for 2.6.16 to be stabilized. That depends on the grsecurity 2.1.9 release (#138453).

I'm raising the severity to critical anyway.
From talking with pageexec@: there is only a minor bug in the PaX part of the kernel that prevented this from really going stable on x86 and amd64, in that the kernel won't boot if no PaX is enabled. I know he has no intention of working on .16 anymore now that .17 is here.
Closing this bug since it's fixed in newer versions.