I first encountered this error when starting Konqueror or Firefox on sites that load many images. My KDE processes start to hang in disk state one after another until the whole desktop hard locks. If I switch from X to the console, I can log in as root but not as a user, and "ps axuw" shows that some processes are hung in disk state. Switching back to X hard locks the machine; after that only SSH logins as root are possible. Reboot hangs. I can do Alt+SysRq+I, ...+S, +U to remount the partitions read-only and then reboot without filesystem corruption.

CPU-intensive programs like games (Neverwinter Nights, Quake 4, Doom 3) work, although they do intensive IO from time to time. But desktop apps that load many small files in a short time (Konqueror, Firefox) make the system block. I first thought it was related to my home partition being on NFSv4, but I got the same problem on my laptop, which does not do IO on any NFS mounts; it is much harder to reproduce there. What both systems have in common is that their filesystems are on XFS. The only way to fix it was to go back to gentoo-sources-2.6.22-r9.

There were no unusual syslog or dmesg entries before or while the processes were hanging. Usually only KDE processes hang in disk state; only once did I see a login process hanging in disk state. I cannot tell if this happens with other filesystems because all my systems are on XFS. If you need more info, please tell me.

Reproducible: Sometimes

Steps to Reproduce:
1. Install gentoo-sources-2.6.23-r3 on a system with XFS filesystems.
2. Run IO-intensive applications that do many small IO operations in a short time, e.g. Konqueror or Firefox on web pages with many images.

Actual Results:
Sometimes the web browser gets stuck in disk state, then other processes supporting the desktop follow, until the whole desktop stops responding.

Expected Results:
Should work without problems, as it did before the gentoo-sources-2.6.23 series.
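For anyone checking for the same symptom: a minimal sketch of the "ps axuw" check above, assuming a Linux /proc filesystem. It lists processes whose state letter in /proc/<pid>/stat is D (uninterruptible "disk" sleep), which ps reports as D in the STAT column. The function names are hypothetical helpers, not part of any existing tool.

```python
import os


def parse_stat_state(stat_line: str) -> tuple[str, str]:
    """Return (comm, state) from one /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the LAST ')' rather than on whitespace.
    """
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 2]  # single state letter after ") "
    return comm, state


def processes_in_disk_sleep(proc_root: str = "/proc") -> list[tuple[int, str]]:
    """List (pid, comm) for every process in state 'D' (uninterruptible sleep)."""
    hung = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = parse_stat_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable while scanning
        if state == "D":
            hung.append((int(entry), comm))
    return hung


if __name__ == "__main__":
    for pid, comm in processes_in_disk_sleep():
        print(pid, comm)
```

Running it repeatedly while Konqueror or Firefox is loading a heavy page would show whether the same processes stay stuck in D, rather than just passing through it briefly during normal IO.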
System partitions are on XFS, /boot is on reiserfs3; /home is on NFSv4 on the first machine and on XFS on the second machine. The Firefox and Konqueror caches are symlinked to local (XFS) mounts, so I suppose NFS has nothing to do with this. No unusual dmesg or syslog entries appear.
Trying the latest gentoo-sources-2.6.23-r6 changes the behaviour, but the system still becomes unresponsive. Processes are no longer shown in disk state when I look at ps or top, but the system still comes to a full stop during intensive IO operations (not bulk transfers), as described before. Maybe this helps narrow the problem down to the patches that changed between r3 and r6.
A similar problem was discussed extensively on the upstream mailing lists for 2.6.23.X. Can you test with gentoo-sources-2.6.24-r2 and if the problem persists, please test with the latest development kernel which is 2.6.25-rc2 as of this comment.
(In reply to comment #2)
> A similar problem was discussed extensively on the upstream mailing lists for
> 2.6.23.X.

Do you have a link so I can do some reading? Would be nice... ;-)

> Can you test with gentoo-sources-2.6.24-r2 and if the problem persists, please
> test with the latest development kernel which is 2.6.25-rc2 as of this comment.

I will try 2.6.24 then and let you know.
> Can you test with gentoo-sources-2.6.24-r2

I have tested with 2.6.24-r3. The problem persists, but the symptoms changed: instead of "top" showing processes in disk state, these processes now just hang in sleep state. I did not try to kill them. First Konqueror stopped responding, later kdesktop stopped responding. After switching to a tty and back to X, even X stopped responding (switching to the console with Ctrl+Alt+F# was no longer possible). I then did Alt+SysRq+S,U,B.

> and if the problem persists, please
> test with the latest development kernel which is 2.6.25-rc2 as of this comment.

I suppose I need to use vanilla sources for that? Are there any Gentoo-specific patches I should apply first?
No Gentoo-specific patches are needed; just always grab the latest development kernel for testing, which is 2.6.25-rc5 at this point.
I still haven't tried 2.6.25. But I am currently running 2.6.24 quite successfully after switching from the CFQ to the AS IO scheduler. At least I have had no lockups since then. I admit, though, that I haven't been using my system very intensively since then.
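A minimal sketch of how the scheduler switch can be inspected and changed at runtime, assuming the sysfs file /sys/block/<dev>/queue/scheduler, which lists the available elevators with the active one in brackets (e.g. "noop anticipatory deadline [cfq]"). On 2.6-era kernels the AS scheduler appears there under the name "anticipatory". The helper names are hypothetical, writing the file requires root, and the setting does not survive a reboot.

```python
from pathlib import Path


def parse_schedulers(text: str) -> tuple[str, list[str]]:
    """Split the contents of /sys/block/<dev>/queue/scheduler.

    The kernel lists every available elevator and brackets the active one,
    e.g. "noop anticipatory deadline [cfq]".
    Returns (active_name, all_names_without_brackets).
    """
    names = text.split()
    active = next(n.strip("[]") for n in names if n.startswith("["))
    return active, [n.strip("[]") for n in names]


def current_scheduler(device: str) -> str:
    """Return the active elevator for a block device such as 'sda'."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return parse_schedulers(path.read_text())[0]


def set_scheduler(device: str, name: str) -> None:
    """Select a different elevator for one block device (needs root)."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    _, available = parse_schedulers(path.read_text())
    if name not in available:
        raise ValueError(f"{name!r} not available for {device}: {available}")
    path.write_text(name)
```

For example, `set_scheduler("sda", "anticipatory")` would move a disk assumed to be named sda from CFQ to AS, and `current_scheduler("sda")` confirms the change. Checking the list first gives a clear error on kernels where AS was not built in.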
Thanks for the update. Let us know what the results are if you get a chance to try a more intensive test.
Using AS instead of CFQ seemed to completely fix it on my setup with 2.6.24. I assume this is related to changes in CFQ not playing well with NFS/XFS setups. I will try CFQ again when gentoo-sources updates to 2.6.25 stable...
(In reply to comment #8)
> I will try CFQ again when gentoo-sources updates to 2.6.25 stable...

Have you had a chance to test with the latest gentoo-sources-2.6.25-rX release?
Please reopen with the results if you're interested in pursuing this further.