373109 – >=sys-kernel/gentoo-sources-2.6.39-r2: Network copy freezes system

Status:	RESOLVED UPSTREAM

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	[OLD] Core system
Hardware:	AMD64 Linux

Importance:	Normal normal
Assignee:	Gentoo Kernel Bug Wranglers and Kernel Maintainers

URL:	https://bugzilla.kernel.org/show_bug....
Whiteboard:	[linux-3.1]
Keywords:

Depends on:
Blocks:	375279

Reported:	2011-06-26 11:47 UTC by Marcus Becker
Modified:	2011-08-09 08:03 UTC
CC List:	4 users

See Also:
Package list:
Runtime testing required:	---

Attachments
manual config 2.6.39-r1 (config-2.6.39-r1.txt, 90.90 KB, text/plain) 2011-06-26 11:47 UTC, Marcus Becker
dmesg (dmesg.txt, 30.19 KB, text/plain) 2011-06-26 11:47 UTC, Marcus Becker
emerge --info (emerge_info.txt, 4.89 KB, text/plain) 2011-06-26 11:48 UTC, Marcus Becker
bisect 2.6.39.1 and 2.6.39.2 (git_bisect.txt, 3.85 KB, text/plain) 2011-06-30 18:27 UTC, Marcus Becker
Do not use DMA address over 32bit range (high_dma_address_fix.patch, 829 bytes, patch) 2011-07-19 07:29 UTC, Guo-Fu Tseng
dmesg with patch applied (dmesg-test, 30.17 KB, text/plain) 2011-07-19 13:24 UTC, Marc Schiffbauer
/var/log/messages with patch applied (tail_messages_shorten.txt, 52.80 KB, text/plain) 2011-07-19 17:06 UTC, Marcus Becker
unmap first descriptor, not just the frags (jme-tx-unmap.patch, 449 bytes, text/plain) 2011-07-20 00:42 UTC, Chris Wright
DMA unmap fix (dma_unmap_fix.patch, 1021 bytes, patch) 2011-07-20 17:24 UTC, Guo-Fu Tseng
DMA unmap fix (dma_unmap_fix.patch, 1.01 KB, patch) 2011-07-20 17:27 UTC, Guo-Fu Tseng

Description Marcus Becker 2011-06-26 11:47:07 UTC

Created attachment 278227 [details]
manual config 2.6.39-r1

I have had this problem for 2 days now -.-

If I copy files over the network (tested nfs mount and scp), the system becomes unresponsive and freezes on me.

My testing with gentoo-sources-2.6.39-r2 (genkernel and my own config):
KDE - nfs mounted via autofs (Dolphin) - complete freeze of the desktop (every now and then I could move the mouse 1-2cm)
KDE - nfs mounted via autofs (Konsole via cp) - complete freeze of the desktop (every now and then I could move the mouse 1-2cm)
XFCE4 - nfs mounted via autofs (Thunar) - complete freeze of the desktop (every now and then I could move the mouse 1-2cm)
XFCE4 - nfs mounted via autofs (Terminal via cp) - complete freeze of the desktop (every now and then I could move the mouse 1-2cm)
XFCE4 - scp command in Terminal - complete freeze of the desktop (every now and then I could move the mouse 1-2cm)
TTY1 (console with cp) - nfs mounted via autofs - Alt+F2 (to switch tty) responds only every now and then or not at all, keyboard input is delayed or dropped

No problems with gentoo-sources-2.6.39-r1 ^^

I attach emerge --info and my current manual config, which works with gentoo-sources-2.6.39-r1.

Comment 1 Marcus Becker 2011-06-26 11:47:36 UTC

Created attachment 278229 [details]
dmesg

Comment 2 Marcus Becker 2011-06-26 11:48:03 UTC

Created attachment 278231 [details]
emerge --info

Comment 3 Marcus Becker 2011-06-28 11:46:52 UTC

The patch contains a lot of changes to intel-iommu. Maybe DMA is broken on my system with this kernel? That would explain why all the input devices stop working properly, or respond only with a long delay, while another device is streaming constantly to memory: the input device just doesn't get an interrupt ^^

Comment 4 Marcus Becker 2011-06-28 19:45:29 UTC

I disabled GART IOMMU and then unticked all supported processor vendors except Intel, and it's still the same issue. I can freeze my system by trying to copy about 3GB of data over the network...

this disables GART iommu as well:
[*] Supported processor vendors  --->
  [*]   Support Intel processors
only for AMD processors: 
[*] GART IOMMU support (NEW)

Comment 5 Marc Schiffbauer gentoo-dev 2011-06-28 21:42:21 UTC

Which network driver do you use?

I can confirm such problems here on my laptop using the jme LAN driver. And it does not matter whether I use GBit or 100 MBit/s.

I will try to revert to r1 too

-Marc

Comment 6 Marcus Becker 2011-06-28 22:09:05 UTC

jme here too... but there was nothing about wired network drivers in the patch. :/

Comment 7 Mike Pagano gentoo-dev 2011-06-29 11:48:16 UTC

Can one of you guys do a bisect between vanilla-sources-2.6.29.1 and vanilla-sources-2.6.29.2 ?

Also, can you test the latest git-sources to see if this has been addressed?

Comment 8 Marc Schiffbauer gentoo-dev 2011-06-29 12:28:03 UTC

Mike,

I am in contact with the upstream author of the jme driver and I will report back here if we find something.

If I find the time I will bisect in between too.

(.39 btw not .29 ;))

Comment 9 Marcus Becker 2011-06-29 18:20:13 UTC

I can reproduce the problem with sys-kernel/git-sources-3.0_rc5, same happens using cp via nfs in Terminal running xfce4 desktop. input devices become unresponsive.

Comment 10 Marcus Becker 2011-06-29 21:28:21 UTC

I am sorry, I tested the first bisect build and marked it as bad, and with the next kernel I get a kernel panic...

Comment 11 Marcus Becker 2011-06-29 22:01:08 UTC

My fault, I keep telling git 'bad' on the first bisect step but always end up with 3.0.0-rc5-0063-g0d72c6f.
Should I get any output on git bisect bad?

Comment 12 Marcus Becker 2011-06-30 18:27:47 UTC

Created attachment 278727 [details]
bisect 2.6.39.1 and 2.6.39.2

sorry, I had too much beer yesterday :(
This worked very well this time, I tested by attempting to copy a ~10GB folder over the network.

Comment 13 Marcus Becker 2011-06-30 23:02:49 UTC

here is a nice log ^^

disi-bigtop linux-2.6.39 # git bisect log
git bisect start
# bad: [62b218cb13724881b5314f10ac0f177f4fdef8b6] Linux 2.6.39.2
git bisect bad 62b218cb13724881b5314f10ac0f177f4fdef8b6
# good: [cf29f916c310c9b13c19514b496700c549597e11] Linux 2.6.39.1
git bisect good cf29f916c310c9b13c19514b496700c549597e11
# good: [cf29f916c310c9b13c19514b496700c549597e11] Linux 2.6.39.1
git bisect good cf29f916c310c9b13c19514b496700c549597e11
# bad: [a4d37345244dea111a49dda25cc30b2ae7dab05c] x86/amd-iommu: Use only per-device dma_ops
git bisect bad a4d37345244dea111a49dda25cc30b2ae7dab05c
# bad: [0db9466ed48263ab2951e89240b482912695c4a6] iwl4965: fix 5GHz operation
git bisect bad 0db9466ed48263ab2951e89240b482912695c4a6
# bad: [646543453327a2b85083f4012d3bbeb5dabdabb8] arch/tile: allocate PCI IRQs later in boot
git bisect bad 646543453327a2b85083f4012d3bbeb5dabdabb8
# good: [3a2bc9ae5ee092a0db8aa07d695e15b14a3fe2a4] intel-iommu: Speed up processing of the identity_mapping function
git bisect good 3a2bc9ae5ee092a0db8aa07d695e15b14a3fe2a4
# bad: [b8f794de1463ab32ed90c97ad6edbcecd931abed] intel-iommu: Remove Host Bridge devices from identity mapping
git bisect bad b8f794de1463ab32ed90c97ad6edbcecd931abed
# bad: [80ebe0ace73cb376f66bdeeb92f4e7b5d4a3f8fb] intel-iommu: Use coherent DMA mask when requested
git bisect bad 80ebe0ace73cb376f66bdeeb92f4e7b5d4a3f8fb
# bad: [87cc4d1e3e05af38c7c51323a3d86fe2572ab033] intel-iommu: Dont cache iova above 32bit
git bisect bad 87cc4d1e3e05af38c7c51323a3d86fe2572ab033
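
For readers unfamiliar with the workflow behind the log above, here is a minimal sketch of a bisect run on a throwaway repository. Everything in it is invented for illustration: the repository, the ten commits, and the file named BROKEN that stands in for the regression. It assumes git is installed and does not touch a kernel tree. In the real case each step meant building and booting the checked-out kernel and then running the network copy by hand before answering `git bisect good` or `git bisect bad`.

```sh
# Throwaway repository; nothing here touches a real kernel tree.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" config user.name "bisect demo"
git -C "$repo" config user.email "demo@example.invalid"

# Ten commits. Commit 7 adds a file named BROKEN (the made-up regression).
i=1
while [ "$i" -le 10 ]; do
    echo "$i" > "$repo/version.txt"
    if [ "$i" -eq 7 ]; then
        : > "$repo/BROKEN"
    fi
    git -C "$repo" add -A
    git -C "$repo" commit -q -m "commit $i"
    i=$((i + 1))
done

# Mark the endpoints: newest commit is bad, oldest commit is good.
oldest=$(git -C "$repo" rev-list --max-parents=0 HEAD)
git -C "$repo" bisect start
git -C "$repo" bisect bad HEAD
git -C "$repo" bisect good "$oldest"

# Let git drive the search. Exit status 0 means good, 1 means bad.
git -C "$repo" bisect run sh -c 'test ! -e BROKEN'

# Once the search has finished, refs/bisect/bad names the first bad commit.
first_bad=$(git -C "$repo" log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $first_bad"

# Print the same kind of log as shown above, then leave bisect mode.
git -C "$repo" bisect log
git -C "$repo" bisect reset
```

Once both a good and a bad revision are known, every further verdict makes git check out the next candidate and report how many revisions are left to test, so silence after the very first `git bisect bad` is not by itself a sign of a problem.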

Comment 14 Stratos Psomadakis (RETIRED) gentoo-dev 2011-07-13 10:48:17 UTC

Ok, so reverting this commit resolves your problem?

Marc can you confirm that?

Comment 15 Marcus Becker 2011-07-13 11:29:05 UTC

(In reply to comment #14)
> Ok, so reverting this commit resolves your problem?
> 
> Marc can you confirm that?

I haven't tried removing that single patch and building 2.6.39.2, but this is what bisect gave me, and I only said good if I was able to copy the complete 10GB folder over the network on the command line without a keyboard freeze. As you can see in the attached log, that worked 3 times during the bisect. When it was bad, the input devices were interrupted after about 1 min of copying over a Gigabit network. Constantly hitting Ctrl+C stopped the copying after ~30 seconds and the input devices (keyboard) slowly regained control, so I could do the next bisect step.

Actually, I would have to read up on how to do this with patch -? blubb.diff etc. :)

Still running 2.6.39.1 and copied yesterday (in Terminal running xfce4 desktop) the whole CentOS 6.0 DVD release (~5GB) over the network, no problems.

Comment 16 Marc Schiffbauer gentoo-dev 2011-07-13 13:49:53 UTC

(In reply to comment #14)
> Ok, so reverting this commit resolves your problem?
> 
> Marc can you confirm that?

Yes. To be sure I bisected the kernels myself: v2.6.39.1 vs. v2.6.39.2

Result: same bisect log

Then I checked out v2.6.39.2 again and reverted commit 87cc4d1e3e05af38c7c51323a3d86fe2572ab033, rebuilt the kernel and tested again.

So, by reverting this single commit the kernel is working again.
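
The revert step can be tried out safely on a throwaway repository first. The sketch below is only an illustration: the repository, the three commits and the file names are made up, and it assumes git is installed. In the kernel tree the corresponding step is the one described above, with v2.6.39.2 checked out and 87cc4d1e3e05af38c7c51323a3d86fe2572ab033 as the commit to revert, followed by a rebuild.

```sh
# Throwaway repository standing in for the kernel tree.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" config user.name "revert demo"
git -C "$repo" config user.email "demo@example.invalid"

# Three commits, each adding its own file; the second plays the suspect.
for n in 1 2 3; do
    echo "change $n" > "$repo/file$n.txt"
    git -C "$repo" add -A
    git -C "$repo" commit -q -m "change $n"
done
suspect=$(git -C "$repo" rev-parse HEAD~1)

# Undo only the suspect commit; the later commit stays in place.
# --no-edit accepts the default commit message without opening an editor.
git -C "$repo" revert --no-edit "$suspect"

# file2.txt is gone again, file1.txt and file3.txt are untouched.
ls "$repo"
git -C "$repo" log --oneline
```

A revert adds a new commit on top of the history rather than rewriting it, so the tree can be returned to its previous state by resetting or by reverting the revert.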

My Testcase:
0) be in console only, no X
1) make sure eth0 is at 1 GBit/s (which is often 100MB/s here...)
2) mount an nfsv3 share
3) pv /path/to/big/file/on/nfs > /dev/null

For working kernels this worked fine for a 3GB file.
For broken kernels this always stopped at about 1.3GB and machine was frozen after that (Only SysRQ reboot possible)

So the bad commit is:

commit 87cc4d1e3e05af38c7c51323a3d86fe2572ab033
Author: Chris Wright <chrisw@sous-sol.org>
Date:   Sat May 28 13:15:04 2011 -0500
                   
    intel-iommu: Dont cache iova above 32bit
                   
    commit 1c9fc3d11b84fbd0c4f4aa7855702c2a1f098ebb upstream.
               
    Mike Travis and Mike Habeck reported an issue where iova allocation
    would return a range that was larger than a device's dma mask.

    https://lkml.org/lkml/2011/3/29/423               

    The dmar initialization code will reserve all PCI MMIO regions and copy
    those reservations into a domain specific iova tree.  It is possible for
    one of those regions to be above the dma mask of a device.  It is typical
    to allocate iovas with a 32bit mask (despite device's dma mask possibly
    being larger) and cache the result until it exhausts the lower 32bit
    address space.  Freeing the iova range that is >= the last iova in the
    lower 32bit range when there is still an iova above the 32bit range will
    corrupt the cached iova by pointing it to a region that is above 32bit.
    If that region is also larger than the device's dma mask, a subsequent
    allocation will return an unusable iova and cause dma failure.
    
    Simply don't cache an iova that is above the 32bit caching boundary.
    
    Reported-by: Mike Travis <travis@sgi.com>
    Reported-by: Mike Habeck <habeck@sgi.com>
    Acked-by: Mike Travis <travis@sgi.com>
    Tested-by: Mike Habeck <habeck@sgi.com>
    Signed-off-by: Chris Wright <chrisw@sous-sol.org>
    Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

:040000 040000 fdd2ca77df8333e2888f326c7ea26b6d7dbcc2c1 fe5353f31fc5d54a5068517e07c533c2e59d9f42 M      drivers

Comment 17 Stratos Psomadakis (RETIRED) gentoo-dev 2011-07-13 14:01:03 UTC

Ok, Marcus, do you want to notify upstream about the guilty commit? Either report it at bugzilla.kernel.org and/or send an email to linux-kernel@vger.kernel.org and stable@kernel.org.

Be sure to include your dmesg, config, and bisect log, and CC kernel@gentoo.org. 

Thanks.

Comment 18 Marcus Becker 2011-07-13 14:35:49 UTC

(In reply to comment #17)
> Ok, Marcus do you want to notify upstream about the guilty commit? Either
> report at bugzilla.kernel.org, or/and send email at
> linux-kernel@vger.kernel.org, and stable@kernel.org.
> 
> Be sure to include your dmesg, config, and bisect log, and CC
> kernel@gentoo.org. 
> 
> Thanks.

Done and totally scared now, don't want any trouble with Linus :(
https://bugzilla.kernel.org/show_bug.cgi?id=39312
I also sent an email as you suggested...

Comment 19 Guo-Fu Tseng 2011-07-13 17:52:59 UTC

Thank you guys for pinpointing the root cause! :)

Comment 20 Stratos Psomadakis (RETIRED) gentoo-dev 2011-07-13 19:26:51 UTC

Marcus, can you please add kernel@gentoo.org to the CC list at bugzilla.kernel.org?

Comment 21 Stratos Psomadakis (RETIRED) gentoo-dev 2011-07-14 10:50:41 UTC

We're going to follow the upstream bug, and reflect any updates/changes here.

Comment 22 Guo-Fu Tseng 2011-07-19 07:29:33 UTC

Created attachment 280345 [details, diff]
Do not use DMA address over 32bit range

On second thought, I haven't heard of any other bug report
suggesting that high addresses make other hardware (which
also uses high addresses) unstable. I suspect there might be
something wrong with the JMicron Ethernet hardware.

This patch tries not to use addresses over the 32bit range, to see if that works.

Could anyone help me test this patch?
I cannot reproduce the issue here.

Comment 23 Jason Lamb 2011-07-19 13:07:08 UTC

I can confirm this problem with my laptop (Clevo P150HM), which uses the JMicron JMC250 PCIE GigE Controller (rev 05). Any large copy (using KDE and Dolphin) causes the sluggish system behavior described; the copy also starts off at ~70-80 MB/s but after a couple of seconds throttles down to about ~25k/s.

This is using kernel 2.6.39-r2 previously and 2.6.39-r3 currently, and the jme 1.0.8 driver, which appears to have the "Do not use DMA.." patch code already incorporated in the 2.6.39-r3 tree.

Strange thing is that if I turn off the wired ethernet, and just use WiFi, (Intel 6230), and try to copy the same files, the same way, the network copy performance is poor, (about 700K), but there is no hang of the overall system. That behavior just happens with the JMC250.

Comment 24 Marc Schiffbauer gentoo-dev 2011-07-19 13:22:01 UTC

(In reply to comment #22)
> Created attachment 280345 [details, diff]
> Do not use DMA address over 32bit range
> 
> At second thought, I haven't heard any other bug report
> that suggest the high address cause other hardware(which
> also use high address) unstable. I suspect there might be
> something wrong with the JMicron Ethernet hardware.
> 
> Trying not to use the Address over 32bit range see if it works.
> 
> Could anyone help me testing this patch?
> I can not reproduce the issue here.

Hi Guo-Fu,

thanks for the patch. I tested it against the 2.6.39-r3 gentoo kernel.

I am sorry, this does not fix the issue, but the behavior of the system during failure is different:

The machine does not freeze anymore: Input responsive all the time while doing the test.

But: The network copy still stops at the same time after about 1.3G have been copied. Hitting Ctrl-C then makes the blocked process stop after about a minute or so but I can switch to another tty and login without problems all the time.

The interesting part now may be the kernel messages that I am seeing then:
(complete file attached after this post)

reg 3
DMAR:[DMA Read] Request device [03:00.0] fault addr 0 
DMAR:[fault reason 06] PTE Read access is not set
DRHD: handling fault status reg 3
[... many times]
------------[ cut here ]------------
WARNING: at drivers/pci/intel-iommu.c:2761 intel_unmap_page+0x14c/0x180()
Hardware name: P150HMx
Driver unmaps unmatched page at PFN 0
Modules linked in: tun ip6_tables iptable_filter ip_tables x_tables coretemp nfsd ipv6 microcode snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss sha256_generic aesni_intel cryptd aes_x86_64 aes_generic cbc kvm_intel kvm acpi_cpufreq mperf snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device snd_hda_codec_hdmi snd_hda_codec_realtek arc4 snd_hda_intel ecb snd_hda_codec snd_hwdep snd_pcm snd_timer iwlagn mac80211 snd cfg80211 firewire_ohci sdhci_pci sdhci mmc_core rfkill uvcvideo videodev pcspkr video backlight snd_page_alloc intel_agp intel_gtt i2c_i801 media firewire_core battery agpgart processor rtc_cmos jme mii ac thermal button xhci_hcd v4l2_compat_ioctl32 i2c_core rtc_core rtc_lib scsi_transport_iscsi fuse nfs nfs_acl auth_rpcgss lockd sunrpc zlib_deflate raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 dm_snapshot dm_crypt dm_mirror dm_region_hash dm_log dm_mod scsi_wait_scan hid_monterey hid_microsoft hid_logitech hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech usbhid ohci_hcd uhci_hcd usb_storage ehci_hcd usbcore sg ata_piix ahci libahci pata_pcmcia pcmcia pcmcia_core pata_mpiix libata
Pid: 0, comm: swapper Tainted: G        W   2.6.39-gentoo-r3 #1
Call Trace:
 <IRQ>  [<ffffffff8104239b>] ? warn_slowpath_common+0x7b/0xc0
 [<ffffffff81042495>] ? warn_slowpath_fmt+0x45/0x50
 [<ffffffff813025ac>] ? intel_unmap_page+0x14c/0x180
 [<ffffffffa0156f79>] ? jme_free_rx_resources+0x69/0x1a0 [jme]
 [<ffffffffa015a3f3>] ? jme_link_change_tasklet+0x583/0xec0 [jme]
 [<ffffffff810488de>] ? tasklet_action+0x5e/0x100
 [<ffffffff81048f40>] ? __do_softirq+0xa0/0x1b0
 [<ffffffff8109d39f>] ? handle_irq_event_percpu+0x9f/0x1f0
 [<ffffffff8149840c>] ? call_softirq+0x1c/0x30
 [<ffffffff810048dd>] ? do_softirq+0x4d/0x80
 [<ffffffff810492b6>] ? irq_exit+0x96/0xb0
 [<ffffffff8100455c>] ? do_IRQ+0x5c/0xd0
 [<ffffffff81496c13>] ? common_interrupt+0x13/0x13
 <EOI>  [<ffffffff81496c0e>] ? common_interrupt+0xe/0x13
 [<ffffffff813d1b27>] ? poll_idle+0x17/0x70
 [<ffffffff813d1b1a>] ? poll_idle+0xa/0x70
 [<ffffffff813d1c2b>] ? cpuidle_idle_call+0xab/0x1f0
 [<ffffffff81001216>] ? cpu_idle+0x96/0xe0
 [<ffffffff816d0b64>] ? start_kernel+0x394/0x39f
 [<ffffffff816d040f>] ? x86_64_start_kernel+0xf4/0xfa
---[ end trace 7b8527fe8e683c20 ]---
jme 0000:03:00.0: eth0: Link is down
jme 0000:03:00.0: eth0: Link is up at ANed: 1000 Mbps, Full-Duplex, MDI
Allocating 1-page iova for 0000:03:00.0 failed
Device 0000:03:00.0 request: 1@225583840 dir 2 --- failed
Allocating 1-page iova for 0000:03:00.0 failed
Device 0000:03:00.0 request: 1@225583040 dir 2 --- failed
Allocating 1-page iova for 0000:03:00.0 failed
Device 0000:03:00.0 request: 1@224969840 dir 2 --- failed
Allocating 1-page iova for 0000:03:00.0 failed
[... many times]
jme: Allocating resources for TX error, Device STOPPED!
jme 0000:03:00.0: eth0: Link is down
jme 0000:03:00.0: eth0: Link is up at ANed: 1000 Mbps, Full-Duplex, MDI
Allocating 1-page iova for 0000:03:00.0 failed
Device 0000:03:00.0 request: 1@225e51040 dir 2 --- failed
Allocating 1-page iova for 0000:03:00.0 failed
Device 0000:03:00.0 request: 1@1bef6a840 dir 2 --- failed
[... and so on]
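
Since these messages repeat many times, it can help to count them instead of reading them. The following is a small sketch that does so on a handful of lines copied from the output above; the helper name `count_matches` and the variable names are made up for this example. On a live system the same `grep -c -F` patterns could be applied to the output of `dmesg` or to /var/log/messages.

```sh
# A few sample lines copied from the kernel messages above.
log=$(cat <<'EOF'
DMAR:[DMA Read] Request device [03:00.0] fault addr 0
DMAR:[fault reason 06] PTE Read access is not set
Allocating 1-page iova for 0000:03:00.0 failed
Device 0000:03:00.0 request: 1@225583840 dir 2 --- failed
Allocating 1-page iova for 0000:03:00.0 failed
jme: Allocating resources for TX error, Device STOPPED!
EOF
)

# count_matches PATTERN: number of lines in $log containing the fixed string.
# -F treats the pattern literally, so the brackets need no escaping.
count_matches() {
    printf '%s\n' "$log" | grep -c -F -- "$1"
}

dmar_faults=$(count_matches 'DMAR:[DMA Read]')
iova_failures=$(count_matches 'Allocating 1-page iova')

echo "DMAR read faults:         $dmar_faults"
echo "iova allocation failures: $iova_failures"
```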

Comment 25 Marc Schiffbauer gentoo-dev 2011-07-19 13:24:28 UTC

Created attachment 280361 [details]
dmesg with patch applied

kernel messages that appeared during test with applied patch from comment #22

Comment 26 Marcus Becker 2011-07-19 17:06:52 UTC

Created attachment 280391 [details]
/var/log/messages with patch applied

I shortened the attachment (it was 40 MB :))
It didn't lock the input but flooded the logfile with those errors, and I lost the device twice during the copy. It automatically brought the device back up and acquired an IP via dhcp. Then it wouldn't get an IP any more and I gave up...

Comment 27 Chris Wright 2011-07-20 00:41:08 UTC

(In reply to comment #24)
> (In reply to comment #22)
> > Created attachment 280345 [details, diff]
> > Do not use DMA address over 32bit range
> > 
> > At second thought, I haven't heard any other bug report
> > that suggest the high address cause other hardware(which
> > also use high address) unstable. I suspect there might be
> > something wrong with the JMicron Ethernet hardware.
> > 
> > Trying not to use the Address over 32bit range see if it works.
> > 
> > Could anyone help me testing this patch?
> > I can not reproduce the issue here.
> 
> Hi Guo-Fu,
> 
> thanks for the patch. I tested it against the 2.6.39-r3 gentoo kernel.
> 
> I am sorry, but this does not fix the issue, but I the behavior of the system
> during failure is different:
> 
> The machine does not freeze anymore: Input responsive all the time while doing
> the test.

Right, this is because the scan is both smaller and starting from a cached point.

> But: The network copy still stops at the same time after about 1.3G have been
> copied. Hitting Ctrl-C then makes the blocked process stop after about a minute
> or so but I can switch to another tty and login without problems all the time.
> 
> The interesting part now may be the kernel messages that I am seeing then:
> (complete file attached after this post)
> 
> reg 3
> DMAR:[DMA Read] Request device [03:00.0] fault addr 0 

This is showing that the driver failed to allocate a dma mapping in the IOMMU. The driver told the device to DMA to address 0, but there is no mapping for that address.  The driver can catch this by checking the return value of pci_map_page() with pci_dma_mapping_error().

However, this is just a symptom.  I believe the cause is the driver not unmapping dma descriptors correctly.

Guo-Fu Tseng, can you review the unmapping path carefully?  I think we're missing one descriptor per tx unmap cycle.
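
To see why a single missed unmap per transmit cycle matters, here is a toy model in shell arithmetic. It is not the jme driver code: the pool size, the fragment count and the function name `simulate_tx` are all assumptions chosen for illustration. Each simulated packet maps one head descriptor plus its fragments and then unmaps either only the fragments (the suspected bug) or everything (the intended behavior).

```sh
# simulate_tx POOL FRAGS MAX_PACKETS UNMAP_HEAD
# POOL        number of free mapping slots at the start (made-up figure)
# FRAGS       fragments per packet, in addition to the head descriptor
# MAX_PACKETS stop after this many packets even if nothing went wrong
# UNMAP_HEAD  1 = also release the head descriptor, 0 = leak it
# Prints how many packets could be sent.
simulate_tx() {
    pool=$1 frags=$2 max=$3 unmap_head=$4
    sent=0
    while [ "$sent" -lt "$max" ]; do
        need=$((frags + 1))               # head descriptor plus fragments
        [ "$pool" -ge "$need" ] || break  # mapping fails: pool exhausted
        pool=$((pool - need))             # map everything for this packet
        pool=$((pool + frags))            # unmap the fragments
        if [ "$unmap_head" -eq 1 ]; then
            pool=$((pool + 1))            # correct path: unmap the head too
        fi
        sent=$((sent + 1))
    done
    echo "$sent"
}

echo "leaky unmap:   $(simulate_tx 1000 2 5000 0) packets before exhaustion"
echo "correct unmap: $(simulate_tx 1000 2 5000 1) packets (limit reached)"
```

In this model the leaky variant always stops after the same number of packets for a given pool size, which would be consistent with the copy stalling at a roughly constant point, while the correct variant runs until the packet limit.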

Comment 28 Chris Wright 2011-07-20 00:42:17 UTC

Created attachment 280423 [details]
unmap first descriptor, not just the frags

This is an example of what I'm referring to.

Comment 29 Guo-Fu Tseng 2011-07-20 09:46:35 UTC

Thank you Chris!
Your information is very useful.
I should check the return value, and I did miss an unmap!

Michał Mirosław sent a patch to lkml-netdev on Jul 11 that also
pointed out the unmap issue.
"[PATCH v2 10/46] net: jme: convert to generic DMA API"

I'll soon format a patch and submit it to lkml-netdev.
Thank you all for the help! :)

Comment 30 Jason Lamb 2011-07-20 16:41:37 UTC

I can confirm that the Michał Mirosław patch entitled "[PATCH v2 10/46] net: jme: convert to generic DMA API", referenced by Guo-Fu, which was sent to the lkml-netdev mailing list on 07/11/11 and is shown here:

http://www.spinics.net/lists/netdev/msg169620.html

does indeed fix this problem for my system. I can now do full rate copies with no system sluggishness.

Thanks all..

Comment 31 Guo-Fu Tseng 2011-07-20 16:44:38 UTC

(In reply to comment #30)
> I can confirm that the Michał Mirosław patch, entitled "[PATCH v2 10/46] net:
> jme: convert to generic DMA API", referenced by Guo-Fu, that was sent to to
> lkml-netdev mailing list on 07/11/11, shown here;
> 
> http://www.spinics.net/lists/netdev/msg169620.html
> 
> does indeed fix this problem for my system. I can now do full rate copies with
> no system sluggishness.
> 
> Thanks all..

Thank you for testing.
However, Michał Mirosław's patch is _NOT_CORRECT_.
I'll post another one soon.

Comment 32 Guo-Fu Tseng 2011-07-20 17:24:23 UTC

Created attachment 280475 [details, diff]
DMA unmap fix

I haven't got time to run the basic test.
Kind of busy recently.

But according to the report, I believe this patch should fix the issue.
Could anyone kindly help me test it?

Comment 33 Guo-Fu Tseng 2011-07-20 17:27:21 UTC

Created attachment 280477 [details, diff]
DMA unmap fix

Just adding a compiler hint on top of the last patch.

Comment 34 Marcus Becker 2011-07-20 21:28:58 UTC

(In reply to comment #33)
> Created attachment 280477 [details, diff]
> DMA unmap fix
> 
> Just adding compiler hint against last patch.

Stupid question :) could you create the diff against gentoo-sources-r2 or 3 or will it go into r4?

Comment 35 Marc Schiffbauer gentoo-dev 2011-07-20 23:38:36 UTC

(In reply to comment #33)
> Created attachment 280477 [details, diff]
> DMA unmap fix
> 
> Just adding compiler hint against last patch.

Hi Guo-Fu,

The patch applied with one hunk for me against vanilla 2.6.39.3

The issue seems to be fixed for me with that patch. 

I copied the 3GB file via NFS several times now without any problem and at full speed (~88 MB/s) which did not work a single time before.

Thanks!

-Marc

Comment 36 Mike Pagano gentoo-dev 2011-07-20 23:55:53 UTC

Only submitted upstream patches from Linus' tree go into gentoo-sources.

http://dev.gentoo.org/~mpagano/genpatches/faq.htm

Comment 37 Guo-Fu Tseng 2011-07-21 00:28:19 UTC

(In reply to comment #35)
> (In reply to comment #33)
> > Created attachment 280477 [details, diff]
> > DMA unmap fix
> > 
> > Just adding compiler hint against last patch.
> 
> Hi Guo-Fu,
> 
> The patch applied with one hunk for me against vanilla 2.6.39.3
> 
> The issue seems to be fixed for me with that patch. 
> 
> I copied the 3GB file via NFS several times now without any problem and at full
> speed (~88 MB/s) which did not work a single time before.
> 
> Thanks!
> 
> -Marc

Thank you Marc!
I'll submit the patch to upstream kernel today. :)

Comment 38 Guo-Fu Tseng 2011-07-21 03:02:13 UTC

I've submitted it to netdev.
Here is the status of the patch:
http://patchwork.ozlabs.org/patch/105878/

Comment 39 Jason Lamb 2011-07-21 14:57:47 UTC

I can also confirm that Guo-Fu's patch, from

http://patchwork.ozlabs.org/patch/105878/

also applied to gentoo-sources-2.6.39-r3 without issue. It has resolved any issues that I had with both performance and system lag, when doing large copies.

Thanks Guo.

Comment 40 Marcus Becker 2011-07-21 16:47:45 UTC

(In reply to comment #39)
> I can also confirm that Guo-Fu's patch, from;
> 
> http://patchwork.ozlabs.org/patch/105878/
> 
> also applied to gentoo-sources-2.6.39-r3 without issue. It has resolved any
> issues that I had with both performance and system lag, when doing large
> copies.
> 
> Thanks Guo.

same here with clean gentoo-sources-2.6.39-r3 sources, copied ~6GB over the network via nfs, no problems. Thanks :)