Summary: | Kernels 2.6.17 and 18 lose network connection to Solaris 10 machines | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Ian Ballantyne <ian> |
Component: | [OLD] Core system | Assignee: | Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel> |
Status: | RESOLVED INVALID | ||
Severity: | normal | ||
Priority: | High | ||
Version: | unspecified | ||
Hardware: | x86 | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: |
Ethereal capture of ssh session from start to death.
terminal log of ssh session (username exchanged for XXX)
Ethereal capture of http download of picture
Ethereal capture of ssh session for kernel 2.6.16-r13
Ethereal capture of http download of picture kernel 2.6.16-r13
Ethereal capture of ssh session 2.6.16-r13, tcp_window_scaling = 0
Ethereal capture of http picture download 2.6.16-r13, tcp_window_scaling = 0
Ethereal capture of ssh session 2.6.18, tcp_window_scaling = 0
Ethereal capture of http picture download 2.6.18, tcp_window_scaling = 0 |
Description
Ian Ballantyne
2006-10-06 06:12:38 UTC
Created attachment 98932 [details]
Ethereal capture of ssh session from start to death.
This ethereal dump was obtained by starting a capture, opening an ssh session and entering 'cat README1'.
Created attachment 98933 [details]
terminal log of ssh session (username exchanged for XXX)
Created attachment 98934 [details]
Ethereal capture of http download of picture
This is the Ethereal dump obtained when trying to get the file dscf0004.jpg, which is available in the directory on the web server. The attempt was made using Konqueror 3.4.3. The connection seemed to die after 4 KB of data; at least that is what Konqueror showed in its progress dialog.
It's not a Gentoo problem, it's a kernel problem, so you should post this bug on http://kernel.org.
> It's not a Gentoo problem, it's a kernel problem, so you should post this bug on http://kernel.org.
Does this mean the bug won't be handled here?
No, at least we won't send you there until we've looked at it for ourselves. Does "echo 0 > /proc/sys/net/ipv4/tcp_default_win_scale" help? See http://lwn.net/Articles/92727/
2.6.17 scales the window size based on your RAM size much more than earlier kernels. My system jumped from scale factor 2 to 7 (1GB RAM).
Created attachment 99219 [details]
Ethereal capture of ssh session for kernel 2.6.16-r13
This is an Ethereal capture of an ssh session that I opened to the server using kernel 2.6.16-r13. I've looked through it and it also seems to be filled with errors, although the transmission of data is completed. In the terminal output of the cat, there were 2 noticeable "delays" in the data flow, presumably caused by the errors, corrections and continuations that occurred in the transmission.
Created attachment 99220 [details]
Ethereal capture of http download of picture kernel 2.6.16-r13
This is another capture of an attempted download from the server to my system running kernel 2.6.16-r13. Again, this log appears to have numerous errors in it, which are corrected, and the download completes. From the user's point of view, the download hangs briefly at 4 KB, then continues. I am guessing that this hang corresponds to the errors in the transmission.
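To put the scale factors quoted earlier in this thread (2 and 7) into numbers, here is a minimal sketch. It assumes only the standard TCP rule that the advertised window is a 16-bit field shifted left by the negotiated scale factor; the function name max_window_bytes is made up for this illustration and is not taken from the kernel.

```shell
#!/bin/bash
# Largest window a TCP endpoint can advertise for a given scale factor:
# the 16-bit window field (at most 65535) shifted left by that factor.
# max_window_bytes is a hypothetical helper name used only in this sketch.
max_window_bytes() {
    local scale=$1
    echo $(( 65535 << scale ))
}

# Scale factor 0 (scaling effectively off) and the two factors quoted above.
for scale in 0 2 7; do
    echo "scale factor $scale: up to $(max_window_bytes "$scale") bytes"
done
```

A device on the path that ignores or mangles the scale option would interpret the raw 16-bit value without the shift, which is one plausible reason a larger factor makes such breakage more visible.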
My machine has 1.5 GB RAM. The attempt to echo 0 > /proc/... failed with a "No such file or directory" error:
# echo 0 > /proc/sys/net/ipv4/tcp_default_win_scale
-bash: /proc/sys/net/ipv4/tcp_default_win_scale: No such file or directory
However, I did find /proc/sys/net/ipv4/tcp_window_scaling, which seems to be relevant, so I tried the following:
Kernel 2.6.16-r13:
# cat /proc/sys/net/ipv4/tcp_window_scaling
1
# echo 0 > /proc/sys/net/ipv4/tcp_window_scaling
with significant results: there were no visible delays in getting data, and in the Ethereal dumps there appear to be no errors or anything like that, just a window update while getting the dscf0004.jpg file.
Kernel 2.6.18:
# cat /proc/sys/net/ipv4/tcp_window_scaling
1
# echo 0 > /proc/sys/net/ipv4/tcp_window_scaling
Again with significant results: an ssh connection has no delays whatsoever, and the Ethereal dumps showed no errors at all. Getting the image dscf0004.jpg did generate a number of TCP Window Full frames; however, there was again no noticeable delay in the transfer of the data.
I will attach Ethereal captures of the results from the echo 0 > /proc/.../tcp_window_scaling
I read the article at lwn. Does this mean our crisco ;-) routers with the newest software updates are responsible???
Created attachment 99225 [details]
Ethereal capture of ssh session 2.6.16-r13, tcp_window_scaling = 0
This is the Ethereal capture from an ssh session from kernel 2.6.16-r13 after doing an echo 0 > /proc/sys/net/ipv4/tcp_window_scaling. The capture is the result of doing a cat of README1. This capture appears to have no errors in it.
Created attachment 99226 [details]
Ethereal capture of http picture download 2.6.16-r13, tcp_window_scaling = 0
This is the Ethereal capture from an http download of dscf0004.jpg from kernel 2.6.16-r13 after doing an echo 0 > /proc/sys/net/ipv4/tcp_window_scaling. There are a number of TCP Window Full frames to be seen; however, there seem to be no errors, and there were no delays in getting the data from the web server.
Created attachment 99228 [details]
Ethereal capture of ssh session 2.6.18, tcp_window_scaling = 0
This is the Ethereal capture from an ssh session from kernel 2.6.18 after doing an echo 0 > /proc/sys/net/ipv4/tcp_window_scaling. The capture is the result of doing a cat of README1. This capture appears to have no errors in it, only two TCP window updates.
Created attachment 99229 [details]
Ethereal capture of http picture download 2.6.18, tcp_window_scaling = 0
This is the Ethereal capture from an http download of dscf0004.jpg from kernel 2.6.18 after doing an echo 0 > /proc/sys/net/ipv4/tcp_window_scaling. There are a lot of TCP Window Full frames to be seen and two Window Updates at the end of the capture; however, there seem to be no errors, and there were no delays in getting the data from the web server.
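The captures in this report were compared by eye. As a rough sketch of how the same comparison could be scripted, the helpers below count frames that a current Wireshark tshark flags with its TCP analysis filters. This assumes tshark (the command-line successor to Ethereal's tethereal) is installed and supports the -Y display-filter option; treating retransmissions as the "errors" mentioned above is also an assumption, and the function names are invented for this illustration.

```shell
#!/bin/bash
# Count the frames in a capture that match a display filter.
# $1: capture file, $2: display filter expression.
# count_frames and summarize_capture are hypothetical helper names.
count_frames() {
    tshark -r "$1" -Y "$2" 2>/dev/null | wc -l | tr -d ' '
}

# Summarize the TCP analysis flags discussed in this thread for one capture.
summarize_capture() {
    local capture=$1
    echo "retransmissions: $(count_frames "$capture" tcp.analysis.retransmission)"
    echo "window full: $(count_frames "$capture" tcp.analysis.window_full)"
    echo "window updates: $(count_frames "$capture" tcp.analysis.window_update)"
}
```

Calling summarize_capture on a capture taken with tcp_window_scaling = 1 and on one taken with it set to 0 would give a side-by-side view of the difference described in the comments.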
It means either some routers in your path are broken, or the network stack on the remote end is at fault. Setting that /proc value is an acceptable workaround: you would lose out on *some* performance on a low-latency LAN, but other than that things should work OK... |
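A minimal sketch of the workaround as a pair of shell helpers, assuming the /proc path used throughout this report. The WSCALE_FILE variable and the function names are invented here so the helpers can be tried against an ordinary file; on a real system, writing the value requires root.

```shell
#!/bin/bash
# Path of the tunable used in this report. WSCALE_FILE is a hypothetical
# override so the helpers can be exercised against a scratch file.
WSCALE_FILE=${WSCALE_FILE:-/proc/sys/net/ipv4/tcp_window_scaling}

# Print the current setting: 1 = window scaling enabled, 0 = disabled.
show_window_scaling() {
    cat "$WSCALE_FILE"
}

# Disable window scaling; on a real system this lasts until the next reboot.
disable_window_scaling() {
    echo 0 > "$WSCALE_FILE"
}
```

The same tunable is exposed to sysctl as net.ipv4.tcp_window_scaling, so on systems that read /etc/sysctl.conf at boot, a line "net.ipv4.tcp_window_scaling = 0" there should keep the setting across reboots.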