After about 24 hours the new kernel caused a hard lockup with the attached stacktrace. The server works normally with gentoo-sources-2.6.31-r10.
I also had lockup issues (though nfs related) with kernel 2.6.32.
Please write if you need more data to debug.
It seems that everything that can be served from memory still works, but as soon as the hard disk has to be accessed the command simply stalls without output (e.g. dmesg works, restarting varnish does not).
The same goes for server applications like varnish, apache, qmail etc. They simply "hang".
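In case it helps to narrow down which processes are stuck on disk access, here is a minimal sketch that lists processes in uninterruptible sleep (state "D" in /proc/<pid>/stat). The function names parse_stat and blocked_processes are made up for illustration, and the script only works while reading /proc still succeeds on the affected machine, which is an assumption.

```python
import os

def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST closing parenthesis.
    head, _, tail = text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    return int(pid_str), comm, tail.split()[0]

def blocked_processes(proc_root="/proc"):
    """List (pid, comm) of processes in uninterruptible sleep (state 'D')."""
    blocked = []
    if not os.path.isdir(proc_root):
        return blocked
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile or entry is unreadable
        if state == "D":
            blocked.append((pid, comm))
    return blocked

if __name__ == "__main__":
    for pid, comm in blocked_processes():
        print(pid, comm)
```

Running this from a shell that is still responsive should print one line per blocked process; an empty output only means nothing was in state "D" at that moment.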
Created attachment 238905 [details]
Stacktrace from dmesg about the kernel crash
Created attachment 238907 [details]
Kernel boot information
Created attachment 238909 [details]
Emerge info (server now runs with kernel 2.6.31 again)
It actually seems to be a SOFT lockup, because I can still log in and do memory-related things. Sorry for that. If somebody could change the topic I would be glad. Thanks.
Can you try upgrading to the latest gentoo-sources and see if this behavior continues?
Also, I'm not familiar with reiserfs (which could be connected to your problem judging from your stacktraces) but are the messages in your boot log normal?
The problem is that the computer with the issues is a frequently used server.
So testing is a bit dangerous. I will see what I can do.
(In reply to comment #6)
> The problem is that the computer with the issues is a frequently used server.
> So testing is a bit dangerous. I will see what I can do.
Unfortunately, I can't pinpoint a cause for sure (except that it's triggered by I/O writes), and some searches on the net didn't bring up anything interesting (various Ubuntu bug reports blaming dm-crypt and whatnot).
If you want to avoid downtime it would be better to post this on the official kernel bugzilla in case someone more experienced can provide you with a solution.
(If you actually do this, don't forget to give us the link of the upstream bug report.)
Created attachment 239441 [details]
Kernel Stack Trace of Kernel 2.6.34-r2
After adding advanced monitoring I can confirm that Kernel 2.6.34-r2 also crashes with this stack trace.
(In reply to comment #5)
> Also, I'm not familiar with reiserfs (which could be connected to your problem
> judging from your stacktraces) but are the messages in your boot log normal?
The reiserfs messages in the boot log are a result of the last crash with the kernel in question. They are journal operations that show up because the file system sync could not be executed before the reboot, i.e. this is completely normal.
If I should report this upstream, just let me know.
(In reply to comment #10)
> If I should report this upstream, just let me know.
Sorry for the slow reply.
I do indeed think that the best course of action would be to report this upstream.
Please submit this upstream and post the URL back here. We'll track the upstream bug and backport any patches as needed.
Sorry for the delay. I will report the bug upstream as soon as I have some spare time.
Strangely, kernel 2.6.31-r10 is also producing soft lockups now, which normally should not occur. However, I have no stack trace of such a lockup yet.
It does not seem to be a hardware issue (hard disks etc. seem to be fine), and it only occurs while our backup is running or after it has run. The backup saves more than 10GB to a NAS which is connected via NFS.
If the backup is disabled no lockups occur.
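Since the lockups only show up around the NFS backup, one thing that could be logged during the backup window is how much dirty and in-writeback page cache accumulates. The following is a minimal sketch under that assumption, not a diagnosis: it reads the Dirty and Writeback fields from /proc/meminfo at a fixed interval. The function names and the 10 second interval are made up for illustration.

```python
import time

def parse_meminfo(text, keys=("Dirty", "Writeback")):
    """Return {field: value in kB} for the requested /proc/meminfo fields."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in keys:
            values[name] = int(rest.split()[0])
    return values

def sample(path="/proc/meminfo"):
    """Read the current Dirty/Writeback values from the given file."""
    with open(path) as fh:
        return parse_meminfo(fh.read())

if __name__ == "__main__":
    # Print one timestamped sample every 10 seconds (interval is arbitrary).
    while True:
        print(time.strftime("%Y-%m-%d %H:%M:%S"), sample(), flush=True)
        time.sleep(10)
```

Redirecting the output to a file that does not live on the affected disk (a tmpfs, for example) would keep the log usable even if disk writes stall.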
I will look into this over the next days/weeks and report back on this issue.
With the new lockups occurring with kernel 2.6.31 too, I personally do not think that the available information suffices to report it upstream yet.