Bug 150262 - Kernels 2.6.17 and 18 lose network connection to Solaris 10 machines
Summary: Kernels 2.6.17 and 18 lose network connection to Solaris 10 machines
Status: RESOLVED INVALID
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: x86 Linux
Importance: High normal
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-10-06 06:12 UTC by Ian Ballantyne
Modified: 2006-10-09 19:54 UTC
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
Ethereal capture of ssh session from start to death. (ethereal_capture-ssh-cat_README1.log,62.58 KB, text/x-log)
2006-10-06 06:16 UTC, Ian Ballantyne
terminal log of ssh session (username exchanged for XXX) (cat_README1.out,31.52 KB, text/plain)
2006-10-06 06:18 UTC, Ian Ballantyne
Ethereal capture of http download of picture (capture-www-dscf0004.jpg.log,105.06 KB, text/x-log)
2006-10-06 06:22 UTC, Ian Ballantyne
Ethereal capture of ssh session for kernel 2.6.16-r13 (capture-ssh-cat_README1-2.6.16-r13.log,141.47 KB, text/x-log)
2006-10-09 11:40 UTC, Ian Ballantyne
Ethereal capture of http download of picture kernel 2.6.16-r13 (capture-www-dscf0004.jpg-2.6.16-r13.log,860.07 KB, text/x-log)
2006-10-09 11:43 UTC, Ian Ballantyne
Ethereal capture of ssh session 2.6.16-r13, tcp_window_scaling = 0 (capture-ssh-cat_README1-2.6.16-r13_tcp_window_scaling_is_0.log,130.29 KB, text/x-log)
2006-10-09 12:41 UTC, Ian Ballantyne
Ethereal capture of http picture download 2.6.16-r13, tcp_window_scaling = 0 (capture-www-dscf0004.jpg-2.6.16-r13_tcp_window_scaling_is_0.log,810.66 KB, text/x-log)
2006-10-09 12:44 UTC, Ian Ballantyne
Ethereal capture of ssh session 2.6.18, tcp_window_scaling = 0 (capture-ssh-cat_README1-2.6.18_tcp_window_scaling_is_0.log,130.77 KB, text/x-log)
2006-10-09 12:46 UTC, Ian Ballantyne
Ethereal capture of http picture download 2.6.18, tcp_window_scaling = 0 (capture-www-dscf0004.jpg-2.6.18_tcp_window_scaling_is_0.log,811.78 KB, text/x-log)
2006-10-09 12:49 UTC, Ian Ballantyne

Description Ian Ballantyne 2006-10-06 06:12:38 UTC
Since updating to kernel 2.6.18 I've been having no end of problems connecting to a Solaris 10 server running on SPARC. Whenever I try to download anything via HTTP or open an ssh session to the server, the connection dies as soon as any significant amount of data arrives, meaning more than about 32 KB.  Curiously, FTP over an unencrypted connection does not appear to be affected.  The problem is constant and reproducible with kernels 2.6.18, 2.6.17-r7 and 2.6.17-r4.  It does not occur with any kernel up to and including 2.6.16-r13, meaning the problem first appeared in kernel 2.6.17.  It is reproducible on three different machines in our office: one with an AMD 2600+, an MSI laptop with an Intel CPU, and a Sun W1100z workstation with a 64-bit AMD CPU running a 64-bit kernel.

I've been in contact with our network admins, who can find no problem in the network.  They too see that the connection dies with a packet retransmission and a message from the server that the previous segment was lost.  The dumps they get when they sniff the network connection at the server are identical to the dumps I get when I sniff on my local workstation.  They too consider this to be a problem in the Linux kernel.

I have made a test directory available on the web server where I have saved some ethereal dumps and sample data: www.meduniwien.ac.at/user/ib/.  These dumps are from my personal workstation, which has an AMD 2600+ CPU (see cpuinfo below).  The ethereal dumps were taken on a 2.6.18 kernel, the configuration for which is available in the sample data directory on the web server (and will be available here too if I can attach files).

An ethereal dump on the server itself is not possible because nothing from X or the necessary libraries to run graphical applications is installed.

The server itself is running Solaris zones.  The address of the server, 149.148.224.2, is a Solaris zone.  The same problem happens when I log into the root zone of the machine (yes, I am the administrator of the machine): ssh connections die after about 32 KB.  The root zone is protected by a firewall from external access.  The network firewall logs and server firewall logs show nothing that would indicate dropped data.  Because of service availability requirements, a zone or server reboot is not possible at this point unless there is a critical situation.

If there is any need for further information, please tell me and I will do what I can.


emerge --info

Portage 2.0.54-r2 (default-linux/x86/2005.1, gcc-3.3.6, glibc-2.3.6-r4, 2.6.18-gentoo i686)
=================================================================
System uname: 2.6.18-gentoo i686 AMD Athlon(tm) XP 2600+
Gentoo Base System version 1.6.14
app-admin/eselect-compiler: [Not Present]
dev-java/java-config: 1.2.11-r1
dev-lang/python:     2.3.5, 2.4.2
dev-python/pycrypto: [Not Present]
dev-util/ccache:     [Not Present]
dev-util/confcache:  [Not Present]
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.59-r7
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r1
sys-devel/binutils:  2.16.1-r2
sys-devel/gcc-config: 1.3.13-r2
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.11-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-march=athlon-xp -O3 -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.3/env /usr/kde/3.3/share/config /usr/kde/3.3/shutdown /usr/kde/3.4/env /usr/kde/3.4/share/config /usr/kde/3.4/shutdown /usr/lib/X11/xkb /usr/lib/mozilla/defaults/pref /usr/share/config"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-march=athlon-xp -O3 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks sandbox sfperms strict"
GENTOO_MIRRORS="ftp://gd.tuwien.ac.at/opsys/linux/gentoo http://gd4.tuwien.ac.at/opsys/linux/gentoo http://gentoo.oregonstate.edu  http://www.ibiblio.org/pub/Linux/distributions/gentoo"
LINGUAS="en_GB de uk"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/usr/portage/build"
PORTDIR="/usr/portage/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="x86 X aalib acl alsa apm arts audiofile avi berkdb bitmap-fonts bzip2 cdr cli crypt cups curl dlloader dri dts dvd eds emboss encode esd exif expat fam ffmpeg flac foomaticdb fortran gd gdbm gif glut gmp gnome gpm gstreamer gtk gtk2 gtkhtml idn imagemagick imlib ipv6 isdnlog java jpeg junit kde lcms ldap libcaca libg++ libwww mad mikmod mng motif mozilla mp3 mpeg ncurses nls offensive ogg oggvorbis opengl oss pam pcre pdflib perl png ppds pppd python qt qt3 qt4 quicktime readline recode reflection samba sdl session slang speex spell spl sqlite ssl svga tcl tcpd theora tiff tk truetype truetype-fonts type1-fonts udev usb vorbis xine xinerama xml2 xmms xorg xpm xv xvid zlib video_cards_apm video_cards_ark video_cards_ati video_cards_chips video_cards_cirrus video_cards_cyrix video_cards_dummy video_cards_fbdev video_cards_glint video_cards_i128 video_cards_i740 video_cards_i810 video_cards_imstt video_cards_mga video_cards_neomagic video_cards_nsc video_cards_nv video_cards_rendition video_cards_s3 video_cards_s3virge video_cards_savage video_cards_siliconmotion video_cards_sis video_cards_sisusb video_cards_tdfx video_cards_tga video_cards_trident video_cards_tseng video_cards_v4l video_cards_vesa video_cards_vga video_cards_via video_cards_vmware video_cards_voodoo input_devices_keyboard input_devices_mouse input_devices_evdev linguas_en_GB linguas_de linguas_uk userland_GNU kernel_linux elibc_glibc"
Unset:  CTARGET, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, PORTAGE_RSYNC_EXTRA_OPTS, PORTAGE_RSYNC_OPTS, PORTDIR_OVERLAY



# cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 10
model name      : AMD Athlon(tm) XP 2600+
stepping        : 0
cpu MHz         : 1921.105
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow ts
bogomips        : 3844.28
Comment 1 Ian Ballantyne 2006-10-06 06:16:10 UTC
Created attachment 98932 [details]
Ethereal capture of ssh session from start to death.

This ethereal dump was obtained by starting a capture, opening an ssh session and entering 'cat README1'.
Comment 2 Ian Ballantyne 2006-10-06 06:18:38 UTC
Created attachment 98933 [details]
terminal log of ssh session (username exchanged for XXX)
Comment 3 Ian Ballantyne 2006-10-06 06:22:33 UTC
Created attachment 98934 [details]
Ethereal capture of http download of picture

This is the ethereal dump obtained when trying to get the file dscf0004.jpg, which is available in the directory on the web server.  The attempt was made using Konqueror 3.4.3.  The connection seemed to die after 4 KB of data; at least that's what Konqueror showed in its progress dialogue.
Comment 4 yogeshbug 2006-10-06 06:45:42 UTC
It's not a Gentoo problem, it's a kernel problem, so you should post this bug at http://kernel.org.
Comment 5 Ian Ballantyne 2006-10-09 02:49:27 UTC
> It's not a Gentoo problem, it's a kernel problem, so you should post this bug at
> http://kernel.org.

Does this mean the bug won't be handled here?
Comment 6 Daniel Drake (RETIRED) gentoo-dev 2006-10-09 08:10:23 UTC
No, at least we won't send you there until we've looked at it for ourselves.

Does "echo 0 > /proc/sys/net/ipv4/tcp_default_win_scale" help?  See http://lwn.net/Articles/92727/

2.6.17 scales the window size based on your RAM size much more than earlier kernels. My system jumped from scale factor 2 to 7 (1GB RAM).
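
As a rough illustration of why a larger scale factor matters (plain arithmetic, not kernel code): with TCP window scaling, the 16-bit window field in each segment is shifted left by the scale factor announced in the SYN. If something in the path drops or rewrites that option, the peer may interpret the field with a different scale than the one the receiver used. The function names and the 256 KiB buffer size below are hypothetical, chosen only for the example.

```python
# Illustrative arithmetic only; these helpers are hypothetical, not kernel APIs.

def advertised_field(window_bytes: int, scale: int) -> int:
    """Value the receiver places in the 16-bit TCP window field."""
    return min(window_bytes >> scale, 0xFFFF)


def effective_window(field: int, scale: int) -> int:
    """Window the peer computes from the field and the scale it believes applies."""
    return field << scale


if __name__ == "__main__":
    receive_window = 262144  # assumed 256 KiB receive buffer
    for scale in (2, 7):
        field = advertised_field(receive_window, scale)
        print(f"scale {scale}: field={field}, "
              f"with scale={effective_window(field, scale)}, "
              f"if scale is lost={effective_window(field, 0)}")
```

With a small scale factor the unscaled reading is still close to the real window, so a mangled option goes mostly unnoticed; with a factor of 7 the peer would see only a small fraction of the real window.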
Comment 7 Ian Ballantyne 2006-10-09 11:40:07 UTC
Created attachment 99219 [details]
Ethereal capture of ssh session for kernel 2.6.16-r13

This is an ethereal capture of an ssh session that I opened to the server using kernel 2.6.16-r13.  I've looked through it and it also seems to be filled with errors, although the transmission of data completes.  In the terminal output of the cat, there were 2 noticeable "delays" in the data flow, presumably caused by the errors, corrections and continuations that occurred in the transmission.
Comment 8 Ian Ballantyne 2006-10-09 11:43:04 UTC
Created attachment 99220 [details]
Ethereal capture of http download of picture kernel 2.6.16-r13

This is another capture of an attempted download from the server to my system running kernel 2.6.16-r13.  Again, this log appears to have numerous errors in it, which are corrected, and the download completes.  From the user's point of view, the download hangs briefly at 4 KB, then continues.  I am guessing that this hang corresponds to the errors in the transmission.
Comment 9 Ian Ballantyne 2006-10-09 12:36:24 UTC
My machine has 1.5 GB of RAM.  The attempt to echo 0 > /proc/... failed with a "No such file or directory" error:

# echo 0 > /proc/sys/net/ipv4/tcp_default_win_scale
-bash: /proc/sys/net/ipv4/tcp_default_win_scale: No such file or directory

However I did find /proc/sys/net/ipv4/tcp_window_scaling which seems to be relevant, so I tried the following:

Kernel 2.6.16-r13:
# cat /proc/sys/net/ipv4/tcp_window_scaling
1
# echo 0 > /proc/sys/net/ipv4/tcp_window_scaling

with significant results: there were no visible delays in getting data, and in the ethereal dumps there appear to be no errors or anything like that, just a window update while getting the dscf0004.jpg file.

Kernel 2.6.18:
# cat /proc/sys/net/ipv4/tcp_window_scaling
1
# echo 0 > /proc/sys/net/ipv4/tcp_window_scaling

Again with significant results: an ssh connection has no delays whatsoever, and the ethereal dumps showed no errors at all.
Getting the image dscf0004.jpg did generate a number of TCP Window Full frames; however, there was again no noticeable delay in the transfer of the data.
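
For anyone wanting to script the check-and-disable steps above, here is a minimal Python sketch. The helper names are hypothetical, the path is the one used in the commands above, and writing to it needs root. The setting is not persistent across reboots.

```python
from pathlib import Path

# Path used in the commands above; it is a parameter so the helpers
# can also be tried against an ordinary file.
SYSCTL = Path("/proc/sys/net/ipv4/tcp_window_scaling")


def read_flag(path: Path = SYSCTL) -> int:
    """Return the integer currently stored at path."""
    return int(path.read_text().strip())


def write_flag(value: int, path: Path = SYSCTL) -> int:
    """Write value to path and return the previous value (root needed for /proc/sys)."""
    previous = read_flag(path)
    path.write_text(f"{value}\n")
    return previous
```

Calling write_flag(0) as root would correspond to the echo 0 above, and the returned value can be kept to restore the old setting later.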

I will attach ethereal captures of the results from the echo 0 > /proc/.../tcp_window_scaling

I read the article at lwn.  Does this mean our crisco ;-) routers with the newest software updates are responsible???
Comment 10 Ian Ballantyne 2006-10-09 12:41:31 UTC
Created attachment 99225 [details]
Ethereal capture of ssh session 2.6.16-r13, tcp_window_scaling = 0

This is the ethereal capture from an ssh session from kernel 2.6.16-r13 after doing an echo 0 > /proc/sys/net/ipv4/tcp_window_scaling.  The capture is the result of doing a cat of README1.  This capture appears to have no errors in it.
Comment 11 Ian Ballantyne 2006-10-09 12:44:47 UTC
Created attachment 99226 [details]
Ethereal capture of http picture download 2.6.16-r13, tcp_window_scaling = 0

This is the ethereal capture from an HTTP download of dscf0004.jpg from kernel 2.6.16-r13 after doing an echo 0 > /proc/sys/net/ipv4/tcp_window_scaling. There are a number of TCP Window Full frames to be seen; however, there seem to be no errors, and there were no delays in getting the data from the web server.
Comment 12 Ian Ballantyne 2006-10-09 12:46:49 UTC
Created attachment 99228 [details]
Ethereal capture of ssh session 2.6.18, tcp_window_scaling = 0

This is the ethereal capture from an ssh session from kernel 2.6.18 after doing an echo 0 > /proc/sys/net/ipv4/tcp_window_scaling.  The capture is the result of doing a cat of README1.  This capture appears to have no errors in it, only two TCP window updates.
Comment 13 Ian Ballantyne 2006-10-09 12:49:19 UTC
Created attachment 99229 [details]
Ethereal capture of http picture download 2.6.18, tcp_window_scaling = 0

This is the ethereal capture from an HTTP download of dscf0004.jpg from kernel 2.6.18 after doing an echo 0 > /proc/sys/net/ipv4/tcp_window_scaling. There are a lot of TCP Window Full frames to be seen and two Window Updates at the end of the capture; however, there seem to be no errors, and there were no delays in getting the data from the web server.
Comment 14 Daniel Drake (RETIRED) gentoo-dev 2006-10-09 19:53:57 UTC
It means either some routers in your path are broken, or that the network stack on the remote end is at fault.
Comment 15 Daniel Drake (RETIRED) gentoo-dev 2006-10-09 19:54:35 UTC
And setting that /proc value is an acceptable workaround. You would lose out on *some* performance on a low-latency LAN, but other than that things should work OK...
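
To put a rough number on the performance cost: without window scaling the advertised window cannot exceed 65535 bytes, so a single connection can have at most that much unacknowledged data in flight per round trip. A back-of-the-envelope sketch follows; the round-trip times are assumed example values, not measurements from this bug, and the function name is hypothetical.

```python
MAX_UNSCALED_WINDOW = 65535  # largest value of the 16-bit TCP window field, in bytes


def max_throughput_mbit(rtt_seconds: float,
                        window_bytes: int = MAX_UNSCALED_WINDOW) -> float:
    """Upper bound on single-connection throughput: one full window per round trip."""
    return window_bytes * 8 / rtt_seconds / 1_000_000


if __name__ == "__main__":
    # Assumed example round-trip times.
    for label, rtt in (("0.5 ms round trip", 0.0005), ("50 ms round trip", 0.05)):
        print(f"{label}: at most {max_throughput_mbit(rtt):.1f} Mbit/s")
```

The cap only matters when the link bandwidth multiplied by the round-trip time exceeds the 64 KB window, so how much is lost depends on the particular path.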