Bug 105664 - MLdonkey 2.6.4 keeps crashing without feedback
Summary: MLdonkey 2.6.4 keeps crashing without feedback
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: x86 Linux
Importance: High critical
Assignee: Gentoo net-p2p team
URL:
Whiteboard:
Keywords:
Duplicates: 111326
Depends on:
Blocks:
 
Reported: 2005-09-12 00:17 UTC by Master One
Modified: 2005-11-20 16:56 UTC
CC List: 3 users

See Also:
Package list:
Runtime testing required: ---


Description Master One 2005-09-12 00:17:29 UTC
Yesterday I upgraded from 2.5.16 to 2.6.4, because I recently had some 
troubles with the core crashing all the time. Unfortunately the situation did 
not get any better. 
 
I spent the whole afternoon investigating the matter, but did not get any 
further. 
 
There seem to be absolutely no hints in the system-log (/var/log/messages) or 
in the mldonkey-log (/var/log/mldonkey.log). 
 
mlnet (running on a server with gentoo-sources-2.6.12-r10 / gcc 3.4.4-r1 / 
glibc 2.3.5-r1 / nptl / mldonkey emerged with USE flags "gd" & "threads") 
generally works, but always crashes suddenly after a seemingly random period 
(it can be after half an hour, or after several hours). When that happens, 
sancho-gui (0.9.4-47 running on a WinXP workstation) disconnects, and the mlnet 
process on the server just disappears without further notice. Strangely, it can 
only be restarted after I delete the mlnet.pid file in the mldonkey home dir. I 
have no idea why, because I didn't have to do this with mldonkey 2.5.16, and I 
can't even remember the mlnet.pid file being stored in the home dir (isn't that 
what /var/run is there for?). 
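
A leftover pid file like this can be checked by hand. The following is only a minimal sketch, assuming that mlnet.pid holds a single decimal PID as plain text (not verified for mldonkey); the function names are hypothetical and not part of mldonkey or the Gentoo init script.

```python
import os
from pathlib import Path


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def remove_stale_pidfile(pidfile: Path) -> bool:
    """Delete pidfile if the PID it names is gone; return True if deleted."""
    try:
        pid = int(pidfile.read_text().strip())
    except (FileNotFoundError, ValueError):
        return False  # no file, or contents are not a plain PID (assumption)
    if pid <= 0 or pid_is_running(pid):
        return False  # leave invalid or still-live entries alone
    pidfile.unlink()
    return True
```

Calling `remove_stale_pidfile(Path(...))` with the location of mlnet.pid before starting the service would remove the file only when no process with that PID is left.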
 
I have emerged mldonkey 2.6.4 normally, since it is in portage now, and before 
that, I upgraded ocaml to 3.08.3 the same way (so emerged it from portage, and 
not using the "batch" USE-flag). 
 
As already mentioned, there is no hint as to why mlnet crashes or what exactly 
happens when it does. The only abnormal messages in mldonkey.log are the 
repeated lines of "[BT] Unknown BT client found please report the next line to 
the dev team: BTUC:.....", although I do not expect these to be causing the 
problem. 
 
I already searched bugs.gentoo.org, and found bug #103411, but that one is 
about a memory problem, which does not occur here (mlnet stays at a memory 
usage of about 6%, and that machine has 1 GB RAM). 
 
The only change I made recently was playing around with the NICE setting 
in /etc/conf.d/mldonkey, which was set to "19" by default. At first I lowered 
that setting to "3" and then to "0", because I thought the crashes might have 
something to do with CPU usage. The machine has a P4 2.4, but I let the 
ondemand CPU governor scale it down to 300 MHz under low load. It could be a 
coincidence, but I think lowering the NICE value really helped and the number 
of crashes went down (that is, I have the feeling that the periods between 
crashes have become longer). 
 
I use mldonkey only on bittorrent at the moment, all other protocols are 
deactivated. Could it be, that mldonkey can be killed by "fake"-datapackages, 
"hostile"-uploaders or "hostile"-clientsoftware? 
 
Those crashes did not appear in the past. When I started with 2.5.16, the core 
ran stably for days without any problem. It only got worse within the past few 
months, which is why I thought it might be an influence from outside (changes 
in the BT protocol, or problems with other uploaders' client software). The 
upgrade of ocaml and mldonkey itself did not help at all. 
 
On the mldonkey forums it was suggested, that it could be a Gentoo problem, 
because such an issue is not known on other Linux or *BSD distributions. 
 
Isn't there any way to analyse this problem further, to find out why the core 
crashes without any hints in the logs and seemingly after a random period? I 
would expect traces to remain somewhere in the system when a process 
disappears. 
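
One generic way to get such a trace is to start the program from a small wrapper that records how it ended. This is a sketch only: it works for any command run in the foreground, the helper names are hypothetical, and whether the init script's way of starting mlnet can be replaced like this is an assumption.

```python
import signal
import subprocess


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a human-readable reason."""
    if returncode < 0:
        # On POSIX, a value of -N means the child was killed by signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = str(-returncode)  # signal number without a known name
        return f"killed by signal {name}"
    return f"exited with status {returncode}"


def run_and_report(argv: list[str]) -> str:
    """Run argv in the foreground, wait for it, and report how it ended."""
    result = subprocess.run(argv)
    return describe_exit(result.returncode)
```

A result such as "killed by signal SIGSEGV" would at least distinguish a crash in native code from a normal exit or an external kill.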
 
Hopefully someone has an idea concerning this matter, or is fighting with the 
same problem, so that this issue can be solved with collective thinking. The 
current situation is very depressing. I use Gentoo on all my machines, and the 
mentioned server also handles some other services, so switching to another 
distribution (or even FreeBSD) is not possible. I can't believe that the answer 
is simply "mldonkey does not work on Gentoo linux". 

Reproducible: Always
Steps to Reproduce:
Just start mldonkey as a service (using /etc/init.d/mldonkey start). 
Actual Results:  
It crashed after a seemingly random time without any feedback.   

Expected Results:  
It should be running stable and uninterrupted for days or even weeks. 

Portage 2.0.51.22-r2 (default-linux/x86/2005.0, gcc-3.4.4, glibc-2.3.5-r1, 
2.6.12-gentoo-r10 i686) 
================================================================= 
System uname: 2.6.12-gentoo-r10 i686 Intel(R) Pentium(R) 4 CPU 2.40GHz 
Gentoo Base System version 1.6.13 
distcc 2.18.3 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) 
[enabled] 
ccache version 2.3 [enabled] 
dev-lang/python:     2.3.5-r2 
sys-apps/sandbox:    1.2.12 
sys-devel/autoconf:  2.13, 2.59-r6 
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6 
sys-devel/binutils:  2.15.92.0.2-r10 
sys-devel/libtool:   1.5.18-r1 
virtual/os-headers:  2.6.11-r2 
ACCEPT_KEYWORDS="x86" 
AUTOCLEAN="yes" 
CBUILD="i686-pc-linux-gnu" 
CFLAGS="-O2 -march=pentium4 -pipe -fomit-frame-pointer" 
CHOST="i686-pc-linux-gnu" 
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3/share/config /usr/share/config /var/qmail/control" 
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d" 
CXXFLAGS="-O2 -march=pentium4 -pipe -fomit-frame-pointer" 
DISTDIR="/usr/portage/distfiles" 
FEATURES="autoconfig ccache distcc distlocks sandbox sfperms strict" 
GENTOO_MIRRORS="http://gentoo.inode.at http://gentoo.osuosl.org 
http://www.ibiblio.org/pub/Linux/distributions/gentoo" 
LANG="de_DE" 
LC_ALL="de_DE@euro" 
LINGUAS="de" 
MAKEOPTS="-j5" 
PKGDIR="/usr/portage/packages" 
PORTAGE_TMPDIR="/var/tmp" 
PORTDIR="/usr/portage" 
PORTDIR_OVERLAY="/usr/local/portage" 
SYNC="rsync://lanmaster/gentoo-portage" 
USE="x86 acpi apache2 bash-completion berkdb crypt eds fortran gd gpm 
gstreamer logrotate ncurses nls nptl ogg pam perl pic python readline samba 
ssl tcpd threads vorbis xml2 zlib linguas_de userland_GNU kernel_linux 
elibc_glibc" 
Unset:  ASFLAGS, CTARGET, LDFLAGS
Comment 1 Marcin Kryczek (RETIRED) gentoo-dev 2005-09-12 06:02:43 UTC
you wrote that you lowered the nice level from 19 to 0 and the crashes became 
rarer. well, in fact you increased mlnet's priority (-20 is the highest and 19 
is the lowest). it may mean that mlnet dies when it does not get a sufficient 
amount of cpu time (which of course shouldn't happen, but...). could you try 
turning off the CPU governor, so the machine always runs at its default 
2.4 GHz, and let us know if that changes anything?
Comment 2 Master One 2005-09-12 08:22:36 UTC
@Marcin Kryczek 
 
It may be worth a try, but I don't really think that has something to do with 
it. 
 
The reason is that the machine stays at 300 MHz most of the time, and mlnet then 
consumes only about 10% CPU and 6% MEM. The ondemand trigger is set to 80%, and 
as soon as that value is reached, the CPU immediately goes up to 2.4 GHz, so 
nothing is maxing out the CPU at any given time. 
 
On the other hand, those crashes just seem to appear totally randomly. ATM the 
core shows an uptime of 3.5 hours, with the CPU frequency staying at 300 MHz 
and 15 BT downloads. 
 
After the next crash occurs, I will set the CPU governor to "performance" to 
see what happens then. 
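
To confirm which governor is actually active before and after such a switch, the value can be read from sysfs. A minimal sketch, assuming the standard Linux cpufreq layout under /sys/devices/system/cpu is available on the machine; `read_governors` is a hypothetical helper name.

```python
from pathlib import Path

# Standard Linux cpufreq location; assumed to be mounted and populated.
CPUFREQ_ROOT = Path("/sys/devices/system/cpu")


def read_governors(root: Path = CPUFREQ_ROOT) -> dict[str, str]:
    """Map each CPU directory name (e.g. 'cpu0') to its active governor."""
    governors = {}
    for gov_file in sorted(root.glob("cpu[0-9]*/cpufreq/scaling_governor")):
        cpu = gov_file.parent.parent.name
        governors[cpu] = gov_file.read_text().strip()
    return governors


if __name__ == "__main__":
    for cpu, governor in read_governors().items():
        print(f"{cpu}: {governor}")
```

An empty result means no cpufreq driver exposes a governor on that system.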
Comment 3 Jan Essert 2005-09-12 10:35:54 UTC
I _suppose_ I'm getting mldonkey crashes, too. 
 
I say 'suppose', since the error I'm experiencing is system hang on shutdown 
while 'Stopping service mldonkey' and a leftover mlnet.pid. Could this be 
related to bug #103433? 
 
I'll try to witness such a crash, right now it's running and I can stop it 
without problems with /etc/init.d/mldonkey stop. 
Comment 4 spiralvoice 2005-09-13 13:30:18 UTC
(In reply to comment #0)

> distcc 2.18.3 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) 
> MAKEOPTS="-j5" 

I had some problems on Solaris with distcc, make -j5 and Ocaml applications.
Try compiling Ocaml and MLDonkey without distcc and with make -j1. Maybe it helps.
Comment 5 Daniel Vianna 2005-09-13 19:04:55 UTC
Both the versions in portage and the precompiled cores from
http://download.berlios.de/pub/mldonkey/spiralvoice/ crash. The 2.6.4 precomp
core logs

2005/09/13 22:53:39 [cF] Checksum computation failed: Exception: os_read failed: Input/output error

before dying.
Comment 6 Master One 2005-09-15 07:47:09 UTC
@Daniel Vianna  
  
That has to be another problem, because my issue does not result in any error 
message.  
  
In the meantime, I tried some different things:  
  
- Recompiled ocaml 3.08.3 and mldonkey 2.6.4 with the following settings: 
     CFLAGS="-O1 -march=pentium4 -pipe -fomit-frame-pointer"  
     MAKEOPTS="-j1"  
     FEATURES="-ccache -distcc" 
 
- Added the following system settings:  
  /etc/security/limits.conf 
     *               soft    nproc           4096  
     *               hard    nproc           16384  
     *               soft    nofile          4096  
     *               hard    nofile          65536 
  /etc/sysctl.conf 
     kernel.shmall = 2097152  
     kernel.shmmax = 2147483648  
     kernel.shmmni = 4096  
     kernel.sem = 250 32000 100 128  
     fs.file-max = 65536 
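
Whether limits.conf entries like these reach a given process can be checked from inside that process. The sketch below only reads the limits of the Python process running it, using the standard `resource` module; it does not inspect mlnet itself, and the helper names are hypothetical.

```python
import resource


def current_limits() -> dict[str, tuple[int, int]]:
    """Return (soft, hard) limits for open files and for processes."""
    return {
        "nofile": resource.getrlimit(resource.RLIMIT_NOFILE),
        "nproc": resource.getrlimit(resource.RLIMIT_NPROC),
    }


def format_limit(value: int) -> str:
    """Render a limit value, spelling out the 'no limit' marker."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name}: soft={format_limit(soft)} hard={format_limit(hard)}")
```

Run from the same login session or service environment that starts the daemon, this shows the values that child processes inherit.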
 
I don't know if any of these measures helped, but it seems to be more stable 
again. The current uptime of the core is one day; before that it was about 9 
hours (then it crashed again after I added some new torrents). 

BTW, since the upgrade to 2.6.4, I (again) have the problem with those 
phantom commits. When a file download is finished, committed and moved from the 
incoming folder to the final destination, files with the same name and a size 
of 0 KB keep showing up in the incoming folder. No idea what that's all 
about... 
Comment 7 Master One 2005-09-15 07:50:26 UTC
BTW Since last month there is the new ocaml version 3.08.4, which seems to be a 
bugfix-release. Any idea, why that one is still not in portage? It may be an 
idea, to reemerge mldonkey with ocaml 3.08.4 installed. 
Comment 8 Master One 2005-09-15 23:19:34 UTC
I forgot to mention that I have set the cpufreq governor to "performance" since 
the last crash, so maybe all the other settings have no influence at all, and 
it was all about the P4 frequency throttling. I will do some more tests with 
the ondemand governor as soon as I find the time (I really would like to have 
that working; the ondemand governor works really well for everything else, and 
why let that machine run at 2.4 GHz 24/7 if it can also operate at only 300 
MHz when load is low?). 
Comment 9 Master One 2005-09-23 01:51:30 UTC
I think the problem is solved: 

It was indeed the "ondemand" CPU governor! 

I have reverted the mentioned system changes, updated to ocaml 3.08.4 and 
mldonkey 2.6.4-r1 (both compiled with my system-wide standard settings), and 
switched to the "performance" CPU governor. Since then, mlnet has been running 
for days without interruption or crash. 

Because I used the "ondemand" CPU governor for quite some time, and it did not 
cause any problems at the beginning, I think that something changed with one 
of the last kernel upgrades. 

The only remaining problem now is that I still get phantom files with a size 
of 0 KB in the incoming folder after a commit. That's not really tragic, but 
annoying nevertheless. 
Comment 10 Jakub Moc (RETIRED) gentoo-dev 2005-11-03 00:43:29 UTC
*** Bug 111326 has been marked as a duplicate of this bug. ***
Comment 11 Jakub Moc (RETIRED) gentoo-dev 2005-11-03 00:43:46 UTC
Reopen wrt Bug 111326.
Comment 12 César Fernández 2005-11-06 03:08:53 UTC
It crashes like hell. It's impossible to use any mlnet >=2.6.5. Maybe they should
be masked. 2.6.4-r2 works... well, fine.

I don't use any cpufreq program.

Portage 2.0.53_rc7 (default-linux/x86/2005.0, gcc-3.4.4, glibc-2.3.5-r3,
2.6.14-gentoo i686)
=================================================================
System uname: 2.6.14-gentoo i686 AMD Athlon(TM) XP 1800+
Gentoo Base System version 1.12.0_pre9
ccache version 2.4 [enabled]
dev-lang/python:     2.4.2
sys-apps/sandbox:    1.2.13
sys-devel/autoconf:  2.13, 2.59-r7
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r1
sys-devel/binutils:  2.16.1
sys-devel/libtool:   1.5.20-r1
virtual/os-headers:  2.6.11-r2
ACCEPT_KEYWORDS="x86 ~x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-march=athlon-xp -mmmx -m3dnow -msse -mfpmath=sse,387 -ffast-math -O2
-fomit-frame-pointer -frename-registers -funroll-loops -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3.5/env
/usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/kde/3/share/config
/usr/lib/X11/xkb /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-march=athlon-xp -mmmx -m3dnow -msse -mfpmath=sse,387 -ffast-math -O2
-fomit-frame-pointer -frename-registers -funroll-loops -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig ccache distlocks sandbox sfperms strict"
GENTOO_MIRRORS="http://linuv.uv.es/mirror/gentoo/ http://www.caliu.info/pub/gentoo/"
LANG="es_ES.UTF-8"
LC_ALL="es_ES.UTF-8"
LDFLAGS="-Wl,-O1"
LINGUAS="es"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="x86 16bit 3dnow 3dnowext 7zip S3TC X a52 aac aalib acpi alsa apache2
audiofile bash-completion berkdb bidi bzip2 cairo cddb cdparanoia cdr chroot cjk
clock-screen crypt cscope css cups curl dba dbus dlloader dts dvd dvdr dvdread
dynagraph ecc edl eds emboss erandom exif faac faad fam fbcon ffmpeg flac
font-server fontconfig foomaticdb foreign-sysvinit ftp gd gdbm gif gimpprint
glibc-omitfp glitz gpm graphviz gs gtk2 hal hardened hpn icecast iconv idn
imagemagick imlib imlib2 immqt-bc ipv6 irmc ithreads jabber java javascript jbig
jce jikes jpeg jpeg2k justify kde kdeenablefinal lcms libcaca libg++ libwww
linguas_es live lm_sensors logitech-mouse logrotate lzo lzw-tiff mad matroska
md5sum mikmod mmap mmx mmxext mng monkey moznocompose moznoirc moznomail mozsvg
mp3 mpeg mpeg4 mpi mplayer msn musepack musicbrainz mysql mysqli ncurses network
nls no-old-linux no_wxgtk1 nomac nomalloccheck nomotif nptl nptlonly ogg
oggvorbis openexr opengl pam pdflib perl pic png ppds python qt quicktime
rdesktop readline rtc ruby sftplogging slp speex spell sse ssl stencil-buffer
svg symlink tcpd tga theora threads tiff toolbar truetype truetype-fonts udev
unicode urandom usb userlocales utf8 vcd vhosts vim-with-x visualization vorbis
win32codecs wmf xine xml2 xpm xprint xrandr xscreensaver xv xvid yv12 zeroconf
zip zlib userland_GNU kernel_linux elibc_glibc"
Unset:  ASFLAGS, CTARGET, MAKEOPTS
Comment 13 spiralvoice 2005-11-07 14:40:51 UTC
CFLAGS="... -fomit-frame-pointer ..."
see bug #111626 for more details.
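
For anyone who wants to test without that flag, a CFLAGS string can be checked and filtered as below. This is only an illustrative sketch with hypothetical helper names; it edits a string and does not touch make.conf or Portage.

```python
import shlex


def has_flag(cflags: str, flag: str = "-fomit-frame-pointer") -> bool:
    """Return True if flag appears as a separate token in cflags."""
    return flag in shlex.split(cflags)


def without_flag(cflags: str, flag: str = "-fomit-frame-pointer") -> str:
    """Return cflags with every occurrence of flag removed."""
    return " ".join(tok for tok in shlex.split(cflags) if tok != flag)


if __name__ == "__main__":
    # CFLAGS as quoted in the original report (comment #0).
    reported = "-O2 -march=pentium4 -pipe -fomit-frame-pointer"
    print(has_flag(reported))
    print(without_flag(reported))
```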
Comment 14 Jan Essert 2005-11-10 08:20:15 UTC
ok, it seems my old problem was caused by parallel shutdown in /etc/conf.d/rc, 
will search or submit this as another bug 
Comment 15 Marcin Kryczek (RETIRED) gentoo-dev 2005-11-20 16:56:23 UTC
marking as closed