Bug 243816 - [2.6.27 regression] hangs multiple browsers
Summary: [2.6.27 regression] hangs multiple browsers
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: x86 Linux
Importance: High normal
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL: many
Whiteboard: linux-2.6.27-regression
Keywords:
Depends on:
Blocks:
 
Reported: 2008-10-24 13:07 UTC by Robert Bradbury
Modified: 2008-11-01 23:03 UTC
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
Config file for kernel 2.6.27 which hangs (config-2.6.27,65.24 KB, text/plain)
2008-10-28 06:33 UTC, Robert Bradbury

Description Robert Bradbury 2008-10-24 13:07:01 UTC
Multiple browsers (Firefox, Galeon, Seamonkey, etc.) hang accessing external web pages.

Pages which respond immediately (and are small?) may be fine.  E.g. www.google.com works fine.  However, slashdot.org downloads most of the home page but then hangs when trying to download information from jobs.slashdot.org.



Reproducible: Always

Steps to Reproduce:
1. Build 2.6.27 kernel (with CONFIG_HIGH_RES_TIMERS & CONFIG_SCHED_HRTICK on or off)
2. Boot kernel & start xdm
3. Initiate browsing activity (Firefox, Seamonkey, etc.)

Actual Results:  
Pings work fine.  Traceroutes work fine.  Large FTPs work fine.  Browsing can be slow (at first I thought the DSL line or the web itself had a problem), but I later came to the conclusion that browser requests to the net were "hanging".  This seems particularly true when a large number of remote requests are started in quick succession (e.g. a browser restart with multiple tabs).

May be related to the multi-threaded, multi-file-handle poll() calls issued by Mozilla-based browsers (though I think I tried Opera once and it appeared to exhibit similar behavior).  Lynx may have worked OK (but its net access may be much more sequential than parallel).

Expected Results:  
Pages, even lots of them, should load normally.

May be related to bug #242634 but unsure.  Router from HP Pavilion to DSL line is a Linksys 654G.  Have not changed /etc/sysctl.conf.

Interestingly, browsers could not load bugs.gentoo.org page but did appear to be able to load bugzilla.mozilla.org.  Browsers behave normally when one falls back to kernel 2.6.26-gentoo-r1.
Comment 1 Rafał Mużyło 2008-10-24 13:40:24 UTC
Let me see:
1. I'm on gentoo-sources 2.6.27
2. my firefox session is ca. 250 tabs
3. all of them, including about 2 dozen of bugs.gentoo.org, load fine
4. jobs.slashdot.org seems to open just fine
5. my internet connection is nothing outstanding (avg 40kbps)

so it seems to be your problem only (or at least hard to reproduce)
and it's definitely not a blocker
Comment 2 Jeroen Roovers (RETIRED) gentoo-dev 2008-10-24 21:40:08 UTC
1) Please post your `emerge --info' and attach the kernel config.
2) Please familiarise yourself with [1] - this isn't a blocker.

[1] https://bugs.gentoo.org/page.cgi?id=fields.html#bug_severity
Comment 3 Robert Bradbury 2008-10-25 02:47:47 UTC
Portage 2.2_rc12 (default/linux/x86/2008.0/desktop, gcc-4.3.2, glibc-2.8_p20080602-r0, 2.6.26-gentoo-r1 i686)
=================================================================
System uname: Linux-2.6.26-gentoo-r1-i686-Intel-R-_Pentium-R-_4_CPU_2.80GHz-with-glibc2.0
Timestamp of tree: Fri, 24 Oct 2008 09:30:01 +0000
distcc 2.18.3 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
ccache version 2.3 [enabled]
app-shells/bash:     3.2_p33
dev-java/java-config: 1.3.7, 2.1.6-r1
dev-lang/python:     2.4.4-r15, 2.5.2-r8
dev-python/pycrypto: 2.0.1-r6
dev-util/ccache:     2.3
dev-util/cmake:      2.6.2
sys-apps/baselayout: 1.12.11.1
sys-apps/sandbox:    1.2.18.1-r2
sys-devel/autoconf:  2.13, 2.61-r2
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.1-r1
sys-devel/binutils:  2.18-r3
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool:   1.5.26
virtual/os-headers:  2.6.23-r3
ACCEPT_KEYWORDS="x86"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=prescott -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/config /var/bind /var/lib/hsqldb"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/splash /etc/terminfo /etc/texmf/web2c /etc/udev/rules.d"
CXXFLAGS="-O2 -march=prescott -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="ccache distlocks nostrip parallel-fetch preserve-libs protect-owned sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="http://mirror.datapipe.net/gentoo http://adelie.polymtl.ca/ http://194.117.143.69 http://open-systems.ufl.edu/mirrors/gentoo"
LDFLAGS="-Wl,-O1"
LINGUAS="en"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_EXTRA_OPTS="--contimeout=300 --timeout=300"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/root3/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X acl acpi alsa apache2 berkdb bluetooth branding bzip2 cairo cdr cli cracklib crypt cups dbus dri dvd dvdr dvdread eds emboss encode esd evo fam firefox fortran gdbm gif gnome gpm gstreamer gtk hal iconv ipv6 isdnlog java jpeg kerberos ldap libnotify mad midi mikmod mono mp3 mpeg mudflap ncurses nptl nptlonly ogg opengl openmp pam pcre pdf perl png ppds pppd python qt3 qt3support qt4 quicktime readline reflection ruby sdl session spell spl ssl startup-notification svg sysfs tcltk tcpd tiff truetype unicode usb vorbis win32codecs x86 xml xorg xv xvid zlib" ALSA_CARDS="hda-intel" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic auth_digest authn_anon authn_dbd authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock dbd deflate dir disk_cache env expires ext_filter file_cache filter headers ident imagemap include info log_config logio mem_cache mime mime_magic negotiation proxy proxy_ajp proxy_balancer proxy_connect proxy_http rewrite setenvif so speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="keyboard mouse" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="en" LIRC_DEVICES="devinput hauppauge" USERLAND="GNU" VIDEO_CARDS="fbdev i810 v4l vesa vmware"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, LANG, LC_ALL, MAKEOPTS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTDIR_OVERLAY
Comment 4 Jeroen Roovers (RETIRED) gentoo-dev 2008-10-26 23:43:37 UTC
And the kernel config?
Comment 5 Wormo (RETIRED) gentoo-dev 2008-10-27 01:19:41 UTC
Have you tried the workaround discussed in bug 242634, i.e. disabling TCP sack, dsack, and timestamps?
Comment 6 Robert Bradbury 2008-10-28 06:21:26 UTC
In answer to the questions: no, adding the changes from bug #242634 to sysctl.conf, e.g.
# Uncomment the tcp_sack & tcp_dsack lines to fix broken router
net.ipv4.tcp_sack = 0
net.ipv4.tcp_dsack = 0

does not fix the browser "hang" problem.
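
Whether those settings are actually in effect after a reboot can be checked by reading them back from /proc. A minimal sketch in Python, assuming the usual /proc/sys/net/ipv4 layout; the function name and its `base` parameter are illustrative and not part of any tool mentioned in this bug:

```python
from pathlib import Path

# Settings named in this report and in the workaround from bug #242634.
TCP_KEYS = ("tcp_sack", "tcp_dsack", "tcp_timestamps")


def read_tcp_settings(base="/proc/sys/net/ipv4", keys=TCP_KEYS):
    """Return {key: int value, or None if the file is missing or unreadable}."""
    settings = {}
    for key in keys:
        try:
            settings[key] = int((Path(base) / key).read_text().strip())
        except (OSError, ValueError):
            settings[key] = None
    return settings


if __name__ == "__main__":
    for key, value in read_tcp_settings().items():
        print(f"net.ipv4.{key} = {value}")
```

A value of 0 means the feature is disabled; None means the entry could not be read on this system.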

The problems remain approximately as before.  Boot in 2.6.26-gentoo-r1 (as I currently am) and everything seems OK.  Boot in 2.6.27 and, while pings are OK, browsers hang (this is without changing any "user" software or the system configuration).  www.gentoo.org is a site that will hang.  www.slashdot.org is "iffy".  www.google.org and www.mozilla.org (though not some of the tested URLs from their home page) work OK.  I believe I have read that there are scheduler changes in 2.6.27 -- could this have anything to do with it?


 
Comment 7 Robert Bradbury 2008-10-28 06:33:26 UTC
Created attachment 170057
Config file for kernel 2.6.27 which hangs

This is the kernel .config from the browser hang problems.  The only recent change (over the last couple of days) was to enable CONFIG_VIDEO_FB_IVTV to try and resolve a problem with 2.6.27 being unable to access a Hauppauge PVR 150 card.  That did not solve the problem -- no "TV" program (mplayer, xawtv, etc.) seems to be able to process the video output of the card (while it worked OK in versions <= ~2.6.25).  It is definitely true, however, that I can "hang"/crash the console (and all associated VTs) using mplayer under 2.6.27 with the Hauppauge card -- so one or more of those drivers (intelfb, ivtv, Hauppauge) clearly has significant problems.

(And yes, I know, file another bug report...).
Comment 8 Jeroen Roovers (RETIRED) gentoo-dev 2008-10-28 17:01:30 UTC
It would be useful if you were able to find a test that causes the same hang without all the dependencies (a huge web browser, X11, window managers and the like). Actual debugging of the kernel would be even better. :)
Comment 9 Mike Pagano gentoo-dev 2008-10-28 19:53:17 UTC
Can you please test with the latest development kernel, which is git-sources-2.6.28_rc2-r2 at this time?
Comment 10 Robert Bradbury 2008-10-28 22:21:18 UTC
Jeroen, I doubt the feasibility of devising such a test case.  As I have mentioned, it seems like the network in general works (pings work, large ftp requests work and lynx appears to work).  The big browsers (seamonkey, firefox, opera, etc.) appear to work for small simple pages but not for large complicated pages.  It might help if there were a graded list of public external HTTP web sites of increasing complexity which I could use to test when the browsers fail.  My analysis to date seems to suggest that it has to do with the scheduling of and responses to the multi-stream (multi-channel) network requests and the poll() calls associated with waiting until one or more of them returns results.  But I could be wrong.  It could also be the multi-threaded nature of the complicated browsers.  The browsers are by far one of the better stress tests.

If you have an alternative stress test (e.g. a multi-site ftp download that operates with parallel streams) I would be willing to test that (that at least would leave X11 and the browsers out of the picture).
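
One possible shape for such a browser-free test is a small script that starts many HTTP requests at once and reports which ones complete, fail, or time out. This is only a sketch under assumptions: it uses Python's standard library, the names `fetch` and `fetch_all` are made up for illustration, and it has not been shown to trigger the hang described in this bug.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, timeout=15.0):
    """Fetch one URL; return (url, outcome, seconds elapsed)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            size = len(response.read())
        outcome = f"complete, {size} bytes"
    except Exception as exc:  # timeouts, resets, DNS and HTTP errors alike
        outcome = f"failed: {type(exc).__name__}"
    return url, outcome, time.monotonic() - start


def fetch_all(urls, workers=8, timeout=15.0):
    """Start the requests in parallel, roughly like a browser restoring tabs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps the results in the same order as the input URLs.
        return list(pool.map(lambda u: fetch(u, timeout), urls))
```

Calling `fetch_all()` with a list of sites and printing each result under both 2.6.26 and 2.6.27 would show whether stalled transfers appear without X11 or a browser in the picture; a connection that stops delivering data should show up as a failure once the timeout expires.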

Mike, I don't believe I am up to "git"ing yet another kernel.  I don't do "git".  I can barely manage CVS for firefox.  At one time I downloaded and built and ran a current kernel but I have no interest in doing it for testing.  Truth be told I would much rather spend my time debugging unresolved problems that I encounter regularly with firefox rather than attempting to deal with 6 million lines of linux kernel code that are a far cry from UNIX versions popular in the '70s and '80s which I was once upon a time familiar with.

I only filed this bug report because I considered non-functional browsers a show-stopper and it was clearly associated with 2.6.27 vs. 2.6.26.  If you want to hand me a 2.6.28 kernel (preferably no modules) which will work on my hardware (you have a .config and can change the modules to non-modules) I will test it next time I am in a kernel testing mode (I commonly boot a functional linux and will leave it running for weeks).
Comment 11 Mike Pagano gentoo-dev 2008-10-28 22:48:55 UTC
>Mike, I don't believe I am up to "git"ing yet another kernel.

Would you consider emerge git-sources?

You don't really have to "do git".
Comment 12 Robert Bradbury 2008-10-30 04:19:26 UTC
To further diagnose this, I booted 2.6.27, ran seamonkey and tried loading a number of URLs in different tabs (some simultaneously, some sequentially).  A few, such as google, loaded completely.  Most hung with the rotating (loading) icon on the tabs.

Using strace to attach the seamonkey process, one sees a continuous (repeating) system call sequence:
ioctl(3, FIONREAD, [0])                 = 0
gettimeofday({1225266270, 991298}, NULL) = 0
poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=17, events=POLLIN|POLLPRI}, {fd=19, events=POLLIN|POLLPRI}, {fd=20, events=POLLIN|POLLPRI}, {fd=21, events=POLLIN|POLLPRI}, {fd=6, events=POLLIN}], 7, 10) = 0 (Timeout)
gettimeofday({1225266271, 3985}, NULL)  = 0
gettimeofday({1225266271, 4091}, NULL)  = 0
gettimeofday({1225266271, 4142}, NULL)  = 0
gettimeofday({1225266271, 4189}, NULL)  = 0
gettimeofday({1225266271, 4241}, NULL)  = 0
ioctl(3, FIONREAD, [0])                 = 0
gettimeofday({1225266271, 4345}, NULL)  = 0

I am no expert on firefox so don't really know what is going on, but it looks like it may be polling on incoming TCP/IP requests to multiple sites and isn't getting any data.  See the next comment on sites that work vs. don't work for further information.  Why it does multiple gettimeofday() calls in rapid succession isn't clear to me.
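
The poll() call in that trace waits up to 10 ms on seven descriptors and returns 0 because none of them has data. A minimal Python sketch of the same pattern, with a local socket pair standing in for a browser connection; `poll_once` is an illustrative name, not anything taken from the browser source:

```python
import select
import socket


def poll_once(sockets, timeout_ms=10):
    """Poll the sockets for readable data; return the descriptors that are ready."""
    poller = select.poll()
    for sock in sockets:
        poller.register(sock, select.POLLIN | select.POLLPRI)
    # An empty list corresponds to what strace reports as "= 0 (Timeout)".
    return [fd for fd, _events in poller.poll(timeout_ms)]


if __name__ == "__main__":
    reader, writer = socket.socketpair()
    print(poll_once([reader]))  # nothing has been sent yet, so the call times out
    writer.sendall(b"data")
    print(poll_once([reader]) == [reader.fileno()])
    reader.close()
    writer.close()
```

In the hung state the trace is consistent with the first case repeating indefinitely: the descriptors stay registered, but no data ever arrives to make them readable.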
Comment 13 Robert Bradbury 2008-10-30 04:37:39 UTC
Actually, the last couple of lines in the previous comment may be part of the repeating sequence.

A list of various URLs tested and their status is:
http://www.gentoo.org/	complete
http://www.google.com/search? ...	complete, with 10 to 100 results requested
http://ekiga.org/index.php?rub=1&pos=10	complete
http://www.gentoo.org/doc/en/handbook/handbook-x86.xml	hung, no data
http://bugs.gentoo.org/show_bug.cgi?id=232643	hung, no data
http://www.gossamer-threads.com/lists/linux/kernel/983751?page=last	hung, no data
http://forum.skype.com/index.php?showtopic=100897&view=findpost&p=463103	hung, no data
http://ubuntuforums.org/showthread.php?t=529970	hung, no data
http://www.de.gentoo-wiki.com/Webcam	hung, no data
http://forums.fedoraforum.org/showthread.php?t=196202	partially hung (page loaded, but tab still spinning)
http://www.skype.com/	hung, no data
http://www.harvard.edu/	complete
http://www.mit.edu/	hung, no data
http://www.bpl.org/	slow loading, then completes page
http://maps.google.com/	works fine at various locations
http://www.orkut.com/	seems to work fine on multiple pages
http://www.slashdot.org/	hangs after loading some data
http://www.verizon.com/	hangs on first request, mostly downloads on "reload"
http://www.att.com/	hangs on first request, reload requests load some data, then hang
http://www.boston.com/	hangs, perhaps loading some data
http://www.nytimes.com/	hangs talking to graphics8.nytimes.com
http://www.ncbi.nlm.nih.gov/	hangs on first request, reload request gets most of main page
http://thomas.loc.gov/	complete
http://www.msn.com/	hangs after loading some data
http://search.yahoo.com/	hangs after loading some data

Now, it looks to me as if Google and thomas.loc.gov may be doing something very funny with their HTTP responses (perhaps shoving multiple "sends" into the pipe before they wait for acknowledgements concerning the packets).  I am *not* a TCP/HTTP protocol expert.  But it seemed as if those sites never would hang, unlike the other sites, which didn't load any (much?) data or hung in the middle of loading multiple requests.  The "hangs" are associated with poll()ing for any of the requested streams to return data (in contrast to data always being available on a stream).

Now, I had what might be an atypical kernel TCP configuration (don't know what is typical) and tried changing it to one that was claimed to work for 2.6.27 for someone else but that didn't impact the situation.  Since I don't know what a "basic" configuration is I would like to request that someone who has working browsers on a "simple" machine (e.g. Intel Prescott, Intel 915 (or similar) graphics) send me their .config file so I can compare it with mine to see what might be different (esp. in the TCP, Scheduling and Timer options).  Thanks.
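
A comparison like that can be narrowed to the relevant options with a short script. The sketch below is an assumption about how one might do it, not an existing tool: the function names are invented, and the keyword list (TCP, SCHED, TIMER, HZ) is a guess at what covers the TCP, scheduling and timer options.

```python
import re

# Matches "CONFIG_FOO=value" and "# CONFIG_FOO is not set".
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Parse kernel .config text into {option: value}; unset options map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_configs(mine, theirs, keywords=("TCP", "SCHED", "TIMER", "HZ")):
    """Return {option: (my value, their value)} for differing options matching a keyword."""
    a, b = parse_config(mine), parse_config(theirs)
    differences = {}
    for name in sorted(set(a) | set(b)):
        if any(word in name for word in keywords) and a.get(name) != b.get(name):
            differences[name] = (a.get(name), b.get(name))
    return differences
```

Passing the contents of two .config files to `diff_configs()` returns only the options that differ; an option that is absent from one file entirely shows up with None on that side.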



Comment 14 Daniel Drake (RETIRED) gentoo-dev 2008-10-30 23:07:21 UTC
Please retest with gentoo-sources-2.6.27-r2
Comment 15 Robert Bradbury 2008-11-01 21:47:05 UTC
2.6.27-r2 is a significant improvement.  Seamonkey can load and browse multiple sites which could previously not be reached.  I am a bit unsure about Galeon.  The first restart appeared to hang.  The second restart worked fine (this was with several dozen tabs (and therefore sites) being accessed).

Are there any hints as to exactly what was fixed between r0 and r2?
And can I remove the modifications to /etc/sysctl.conf?
Comment 16 Daniel Drake (RETIRED) gentoo-dev 2008-11-01 23:03:59 UTC
I think it's likely that the changes in bug #242634 fixed this too. Please reopen if problems persist.