The ZFS/Lustre developers discovered that __pte_alloc_kernel() does not honor the gfp flags passed to vmalloc()[0], but they opted to set PF_MEMALLOC in their code to work around it. That has caused problems[1], so we need a proper fix. There has been a patch for it sitting in RedHat's bug tracker[2] for more than a year, but no one is doing anything to upstream it. Is there anything we can do?

0: http://marc.info/?l=linux-mm&m=128942194520631&w=4
1: https://github.com/zfsonlinux/spl/issues/116
2: https://bugzilla.kernel.org/show_bug.cgi?id=30702
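To make the failure mode concrete, here is a minimal userspace sketch of it. It is not kernel code: the flag values, the `model_*` function names, and the `patched` switch are all hypothetical stand-ins, and the hard-coded mask in the page-table step is an assumption of the model. It only illustrates how a caller that asks for a "no filesystem" allocation can still end up with a step that is allowed to enter the filesystem, and how threading the caller's flags through would avoid that.

```c
/* Userspace model only. Flag values and function names are hypothetical
 * stand-ins and are NOT the kernel's definitions. */
#include <assert.h>

/* Hypothetical flag bits modelled on the kernel's gfp flags. */
#define MODEL_GFP_IO     0x1u  /* allocation may start disk I/O */
#define MODEL_GFP_FS     0x2u  /* allocation may call into the filesystem */
#define MODEL_GFP_KERNEL (MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_IO)

/* Records the flags each allocation step actually used. */
struct model_trace {
    unsigned int data_page_flags;
    unsigned int page_table_flags;
};

/* Stand-in for the page-table allocation as described in this bug:
 * it takes no flags argument, so it can only use a fixed mask. */
unsigned int model_pte_alloc(void)
{
    return MODEL_GFP_KERNEL;
}

/* Stand-in for a fixed variant that receives the caller's flags. */
unsigned int model_pte_alloc_gfp(unsigned int gfp)
{
    return gfp;
}

/* Stand-in for vmalloc(): data pages use the caller's flags, while the
 * page-table step uses either the fixed mask or the caller's flags. */
struct model_trace model_vmalloc(unsigned int gfp, int patched)
{
    struct model_trace t;

    t.data_page_flags = gfp;
    t.page_table_flags = patched ? model_pte_alloc_gfp(gfp)
                                 : model_pte_alloc();
    return t;
}

/* Returns 1 if any step of the allocation was allowed to enter the
 * filesystem, which is what a NOFS caller is trying to rule out. */
int model_may_enter_fs(struct model_trace t)
{
    return ((t.data_page_flags | t.page_table_flags) & MODEL_GFP_FS) != 0;
}
```

In this model, `model_may_enter_fs(model_vmalloc(MODEL_GFP_NOFS, 0))` is 1 even though the caller asked for NOFS, while the same call with `patched` set to 1 gives 0.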
Created attachment 312319 [details, diff]
Prasad Gajanan Joshi's 2011-03-14 patch from the RedHat bug tracker
I will follow this upstream bug and backport any approved patch that makes it into Linus's tree.
In that case, I am going to reassign this bug to myself. If I am not mistaken, the Linux 3.5 merge window is open. I want to get this fixed before it closes.
Created attachment 314493 [details, diff]
Updated patch for Linux 3.4

I have updated Prasad Gajanan Joshi's patch from the RedHat bug tracker for Linux 3.4. I plan to send it upstream for review by more experienced kernel developers.
Created attachment 314501 [details, diff]
Patch against Linux 3.5-rc1

I am attaching a patch against Linux 3.5-rc1. This patch has a proper Git commit message.
We met on the FreeNode #zfsonlinux channel (I am Orfheo there), and you suggested that I test this patch under Gentoo with the vanilla 3.4.0 kernel. I did. Apart from a minor problem applying the patch, I was able to compile the vanilla 3.4.0 kernel with your patch (attachment 314493) on an old machine (a P4 with around 2 GB of RAM), where you thought the problem was present. The patch appears to fix the problem with both the vanilla and the gentoo-sources 3.4.0 kernels (it applies to both with the same minor problem): the machine withstood my test for hours, whereas before it would hang after a short while. This was with updated zfs and spl (from your overlay) and a PREEMPT kernel. Hope it helps.
I tested the same setup as in my previous comment (Comment 6) with a non-PREEMPT 3.4.0 kernel, using the same patch on the same machine. No hangs and no errors in dmesg.
Patching sys-kernel/openvz-sources-2.6.32.53.5 with the patch from comment 1 on this bug results in http://pastebin.com/VN0gWWVN.
Created attachment 319656 [details, diff]
Updated patch against Linux 3.5.0

I still need to find time to implement support for 6 architectures that are in mainline, but I am posting an updated patch against Linux 3.5.0. This eliminates the fuzz when applying it against sys-kernel/vanilla-sources-3.5.0.
I have rewritten the parts of sys-fs/spl that caused ZFS to require this patch and opened an upstream pull request: https://github.com/zfsonlinux/spl/pull/147 I will merge those patches into a revision bump of sys-fs/spl unless a serious regression is found in the next week.
(In reply to comment #10)
> I have rewritten the parts of sys-fs/spl that caused ZFS to require this
> patch and opened an upstream pull request:
>
> https://github.com/zfsonlinux/spl/pull/147
>
> I will merge those patches into a revision bump of sys-fs/spl unless a
> serious regression is found in the next week.

I am afraid that patch was found to introduce external memory fragmentation issues in practice, so I am forced to withdraw it.
A workaround that eliminates the need for a kernel patch has been committed to the ZFSOnLinux Git repository. It will be in the 0.6.0-rc11 release within the next few weeks. Those who wish to use it sooner are welcome to use the sys-kernel/spl-9999, sys-fs/zfs-kmod-9999 and sys-fs/zfs-9999 ebuilds.
0.6.0-rc11 has been committed to portage. It eliminates the need to fix this kernel issue.