Summary: | gentoo-dev-sources-2.6.9 does not rectify nasty VM/kswapd issue in mainline | | |
---|---|---|---|
Product: | Gentoo Linux | Reporter: | kfm |
Component: | [OLD] Core system | Assignee: | Daniel Drake (RETIRED) <dsd> |
Status: | VERIFIED FIXED | | |
Severity: | major | CC: | andre.hinrichs, kernel |
Priority: | High | Keywords: | InVCS |
Version: | unspecified | | |
Hardware: | All | | |
OS: | Linux | | |
URL: | http://ck.kolivas.org/patches/2.6/2.6.9/2.6.9-ck2/2.6.9-ck2.eml | | |
Whiteboard: | | | |
Package list: | | Runtime testing required: | --- |
Description
kfm
2004-10-26 15:30:34 UTC
I'm currently waiting for the patch to make it into the upstream 2.6.10 tree, then I'll add it to our patchset. It hasn't been applied by Linus yet. However, a patch has been applied which looks like it might be the same fix done in a different way. Perhaps you could revert the one you posted and see if this one helps:

http://linux.bkbits.net:8080/linux-2.6/diffs/mm/vmscan.c@1.231?nav=index.html|src/|src/mm|hist/mm/vmscan.c
http://linux.bkbits.net:8080/linux-2.6/cset@1.2263

It was merged earlier today.

Will include in future gentoo-dev-sources release.

Thank you very much, both for the rapid response and the heads-up. I notice that you have not marked the bug as closed; if you discover any more information prior to closure that could be relevant to the issues raised here, I would be most grateful if you could post again on this bug (time, energy and inclination permitting, of course, as it is of great interest to me, at least ;). Cheers.

It will be closed once we release a new gentoo-dev-sources version containing this patch.

In gentoo-dev-sources-2.6.9-r2

Thanks, Daniel.

I'm using 2.6.9-gentoo-r8 now and still have this problem. It is especially nasty on a notebook. I don't know if this kernel is already patched.

Can you please define "this problem"? There are a few mentioned on this bug.

Sure. I have a Dell Inspiron 8000 notebook with Gentoo on it. Unfortunately, I need some M$ Win programs, so I've installed vmware on it. Most times I start this virtual machine, the kswapd0 process takes a lot of CPU. The RAM is not fully used. I've added the head of the top output at the end.
Current kernel is 2.6.9-gentoo-r8

top - 17:19:50 up 2:03, 4 users, load average: 1.21, 1.10, 1.09
Tasks: 126 total, 2 running, 124 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.2% us, 92.6% sy, 0.0% ni, 2.4% id, 1.7% wa, 0.1% hi, 0.1% si
Mem: 514552k total, 469580k used, 44972k free, 7340k buffers
Swap: 987988k total, 36184k used, 951804k free, 360164k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
38 root 25 0 0 0 0 R 95.2 0.0 111:36.02 kswapd0

Could you please test development-sources-2.6.10-rc2 and see if the problem exists there?

Firstly, I wonder if you're using any experimental kernel features such as 4k stacks or "Use register arguments". Not that I know of any possible side effect, but 4k stacks in particular change the way in which the VM works. With proprietary software such as vmware, it's best to stick to a "regular" configuration for the testing case.

The sources do include the patch mentioned in this bug. Can you confirm that it is a problem that (1) does not occur in 2.6.8.1 (2) does *or* doesn't occur in 2.6.10-rc2?

I've started using Alan Cox's 2.6.9 branch as a basis for my kernels because he seems to be focussing on bug fixing/stabilisation in general. I'd be interested to know if it happens in 2.6.9-ac11 also (2.6.9-ac12 is experimental by his standards). Perhaps, if it transpires that it does not occur in one of the other (newer) branches, it might be worth trying to isolate the change that fixes the problem and backporting it. Then again, maybe it's one of those corner cases and you might be better off just waiting for the situation to settle (and using 2.6.8.1 in the meantime).

Another suggestion is to try using the "mapped watermark" patches from the 2.6.9 -ck set, which seem to regulate swap usage pretty effectively (at least for desktop systems). It's been a while since I've used vmware but I recall that it stresses the system very hard! It may or may not help.
---
http://ck.kolivas.org/patches/2.6/2.6.9/2.6.9-ck3/patches/mwII.diff
http://ck.kolivas.org/patches/2.6/2.6.9/2.6.9-ck3/patches/mwII-oc.diff

One other thing: I noticed before that if you're not using a real partition for a host's virtual disk, then vmware seems to be quite sensitive to the filesystem being used. In particular, it really seems to stink with reiserfs! I'm aware that that shouldn't pertain to the swap issue but thought it worthy of mention.

Did some testing with different kernels. First of all, let me say that I do not use the 4k stack option.
The problem does NOT occur with 2.6.8.1.
The problem still exists with 2.6.10-rc3.
Haven't tried the "watermark patches". Will try to do so next week if possible.
vmware is NOT used with its own partition! So comment #12 might be an issue. I decided to do so because of easier backups...

As this issue is also in upstream, there is nothing we can do here in the gentoo tree. Please open a bug at bugzilla.kernel.org for this.

Andre: Perhaps you could try this patch: http://marc.theaimsgroup.com/?l=linux-kernel&m=110357628419245&w=2 I think it solves the issue you are describing.

Daniel: thanks - that patch is good! I took your gentoo-dev-sources-2.6.9-r12 release and added 5 good patches, all of which, with the exception of the first, were applied upstream at some point or another:

* The "1G lowmem" patch from -ck (well, I have exactly 1G RAM).
* The aforementioned "include total_scanned" patch from Andrew Morton.
* A fix from Jens Axboe to prevent blk_recalc_rq_segments from indulging in bad segment coalescing (due to not taking ->max_segment_size into account).
* A fix from Arjan van de Ven to change the "hysteresis" for the queue congestion to be an additional 1/16th of the number of requests.
* A fix from Marcelo Tosatti to limit the amount of memory which is under pageout writeout to be a little more than the amount of memory at which balance_dirty_pages() callers will synchronously throttle.
Apparently, this prevents a simple dd operation from driving the system nuts.

Despite 2.6.9 being the only 2.6 kernel ever to cause a catastrophic crash here (on the first occasion that I tried it), I took the plunge and rebooted my main server with this kernel. It's not been up for long yet, hence I am still keeping a close eye on things. Nonetheless, performance is great and none of the usual oddities that I have come to associate with 2.6.9 have made themselves apparent - particularly with respect to the numerous OOM/swap/VM issues that have been under broad discussion of late. Having said that, later posts in the thread that you linked to still hint at problems under certain circumstances. One can only hope that they've been adequately resolved in the newly released 2.6.10 ;) Anyway, thanks for your insights - Merry Christmas.

The pages_scanned fix is now in Linus' tree and will be included in gentoo-dev-sources-2.6.10-r3

Splendid news and not before time I might add ;) Thanks for the update.
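
For anyone hitting the same symptom, the top snapshot quoted earlier in this bug can be checked mechanically. What follows is only a rough sketch in Python: the function names (parse_snapshot, swap_used_pct, looks_like_kswapd_spin) and the 50% / 10% thresholds are assumptions made up for illustration, not anything taken from the kernel or from the patches discussed here. It flags the case where kswapd is burning CPU although very little swap is actually in use.

```python
import re

# Figures copied from the top snapshot quoted in this bug (2.6.9-gentoo-r8).
SNAPSHOT = """\
Cpu(s): 3.2% us, 92.6% sy, 0.0% ni, 2.4% id, 1.7% wa, 0.1% hi, 0.1% si
Mem: 514552k total, 469580k used, 44972k free, 7340k buffers
Swap: 987988k total, 36184k used, 951804k free, 360164k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
38 root 25 0 0 0 0 R 95.2 0.0 111:36.02 kswapd0
"""


def parse_snapshot(text):
    """Pull system CPU %, swap figures (kB) and kswapd CPU % out of a top header."""
    sys_cpu = float(re.search(r"([\d.]+)%\s*sy", text).group(1))
    swap = re.search(
        r"Swap:\s+(\d+)k total,\s+(\d+)k used,\s+(\d+)k free,\s+(\d+)k cached", text
    )
    kswapd_cpu = 0.0
    for line in text.splitlines():
        fields = line.split()
        # Process rows have 12 columns: %CPU is the ninth, COMMAND the last.
        if len(fields) == 12 and fields[-1].startswith("kswapd"):
            kswapd_cpu = max(kswapd_cpu, float(fields[8]))
    return {
        "sys_cpu": sys_cpu,
        "swap_total_kb": int(swap.group(1)),
        "swap_used_kb": int(swap.group(2)),
        "kswapd_cpu": kswapd_cpu,
    }


def swap_used_pct(stats):
    """Percentage of swap in use; 0.0 when no swap is configured."""
    total = stats["swap_total_kb"]
    return 100.0 * stats["swap_used_kb"] / total if total else 0.0


def looks_like_kswapd_spin(stats, cpu_threshold=50.0, swap_threshold=10.0):
    """Hypothetical heuristic: kswapd is busy, yet swap is barely touched."""
    return (
        stats["kswapd_cpu"] >= cpu_threshold
        and swap_used_pct(stats) <= swap_threshold
    )


if __name__ == "__main__":
    stats = parse_snapshot(SNAPSHOT)
    print(stats, looks_like_kswapd_spin(stats))
```

With the figures from the report above, the check comes out true: kswapd0 sits at 95.2% CPU while only about 3.7% of swap is in use.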