8607 – metalog hanging from time to time

Summary: metalog hanging from time to time

Status:	RESOLVED TEST-REQUEST

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	[OLD] Core system
Hardware:	x86 Linux

Importance:	High major
Assignee:	Heinrich Wendel (RETIRED)

URL:
Whiteboard:
Keywords:

Duplicates (2):	18384 31277
Depends on:	3434
Blocks:	20593

Reported:	2002-10-01 07:20 UTC by Sascha Silbe
Modified:	2007-04-09 10:13 UTC (History)
CC List:	13 users

See Also:
Package list:
Runtime testing required:	---

Attachments
Patch to make it multiprocess (metalog-spawn.diff, 1.37 KB, patch) 2003-04-16 16:02 UTC, Jedi/Sector One
this is a patch made against metalog-0.6-r10 (metalog.patch, 1.38 KB, patch) 2003-05-06 15:42 UTC, Roger
this is a patch made against metalog-0.6-r10 (metalog.patch, 1.38 KB, application/octet-stream) 2003-05-06 15:42 UTC, Roger
this is a patch made against metalog-0.6-r10 (metalog.patch, 1.38 KB, text/plain) 2003-05-06 15:42 UTC, Roger

Description Sascha Silbe 2002-10-01 07:20:57 UTC

From time to time, metalog hangs. Because it handles syslog requests, many other processes hang, too.
First I thought it would occur after a long uptime, but today it happened 10 hours after the reboot.
There's no apparent cause for this, so I cannot really reproduce this. I guess it's some kind of race condition.
strace shows that the two metalog processes are waiting for pause() and syslog() to return:

root@cube:/home/sascha# strace -ff -p 6239
pause(

root@cube:/home/sascha# strace -ff -p 6247
syslog(0x2, 0xbffff518, 0x800

It happened with gcc 2.95.x and 3.2 and with Kernel 2.4.18 and 2.4.19.

Any suggestions on how to reproduce or at least trace this (without generating gigabytes of logs during normal operation)?

Comment 1 SpanKY gentoo-dev 2002-10-01 10:04:45 UTC

Well, anything that the kernel logs gets stored in its internal buffer and is
sent to a user-space logging daemon.

So you could type `dmesg` to see if there are any funky kernel logs...

Comment 2 Sascha Silbe 2002-10-03 06:51:07 UTC

It just happened again. There's nothing new in the kernel buffer (dmesg output). Sending SIGCONT does not help, and sending SIGHUP just makes metalog exit:

=== Begin screenshot 1 ===
root@cube:/home/sascha# strace -p 6312
pause()                                 = ? ERESTARTNOHAND (To be restarted)
--- SIGCONT (Continued) ---
pause()                                 = ? ERESTARTNOHAND (To be restarted)
--- SIGHUP (Hangup) ---
write(2, "Unlinking pid file: /var/run/met"..., 41) = -1 EBADF (Bad file descriptor)
unlink("/var/run/metalog.pid")          = 0
getpid()                                = 6312
write(2, "Process [6312] died with signal "..., 36) = -1 EBADF (Bad file descriptor)
kill(6320, SIGTERM)                     = 0
--- SIGCHLD (Child exited) ---
wait4(-1, NULL, WNOHANG, NULL)          = 6320
write(2, "Klog child [6320] died\n", 23) = -1 EBADF (Bad file descriptor)
wait4(-1, NULL, WNOHANG, NULL)          = -1 ECHILD (No child processes)
munmap(0x40018000, 4096)                = 0
munmap(0x40017000, 4096)                = 0
munmap(0x4001a000, 4096)                = 0
munmap(0x40019000, 4096)                = 0
munmap(0x40016000, 4096)                = 0
_exit(1)                                = ?
=== End screenshot 1 ===

=== Begin screenshot 2 ===
root@cube:/home/sascha# strace -p 6320
syslog(0x2, 0xbffff518, 0x800)          = ? ERESTARTSYS (To be restarted)
--- SIGCONT (Continued) ---
syslog(0x2, 0xbffff518, 0x800)          = ? ERESTARTSYS (To be restarted)
--- SIGHUP (Hangup) ---
--- SIGTERM (Terminated) ---
=== End screenshot 2 ===

The last few lines of dmesg output:

=== Begin dmesg ===
eth0: no IPv6 routers present
ipsec0: no IPv6 routers present
NVRM: AGPGART: allocated 136 pages
NVRM: AGPGART: freed 136 pages
NVRM: AGPGART: allocated 42 pages
NVRM: AGPGART: freed 42 pages
=== End dmesg ===

The old syslog file:

=== Begin /var/log/syslog/current.old ===
Oct  2 23:34:09 [pluto] "cube" #13: initiating Main Mode to replace #12
Oct  2 23:34:10 [pluto] "cube" #13: Peer ID is ID_IPV4_ADDR: '192.168.1.1'
Oct  2 23:34:10 [pluto] "cube" #13: ISAKMP SA established
Oct  2 23:38:39 [sSMTP mail] sendmail sent mail for sascha
Oct  2 23:52:37 [sSMTP mail] sendmail sent mail for sascha
Oct  3 00:09:45 [kernel] NVRM: AGPGART: allocated 136 pages
Oct  3 00:11:14 [kernel] NVRM: AGPGART: freed 136 pages
Oct  3 00:24:52 [pluto] "cube" #14: initiating Main Mode to replace #13
Oct  3 00:24:52 [pluto] "cube" #14: Peer ID is ID_IPV4_ADDR: '192.168.1.1'
Oct  3 00:24:52 [pluto] "cube" #14: ISAKMP SA established
Oct  3 00:33:40 [sSMTP mail] sendmail sent mail for sascha
Oct  3 00:42:21 [kernel] NVRM: AGPGART: allocated 42 pages
Oct  3 00:42:46 [kernel] NVRM: AGPGART: freed 42 pages
Oct  3 01:00:07 [sSMTP mail] /usr/lib/sendmail sent mail for root
Oct  3 01:10:20 [pluto] "cube" #15: initiating Main Mode to replace #14
Oct  3 01:10:21 [pluto] "cube" #15: Peer ID is ID_IPV4_ADDR: '192.168.1.1'
Oct  3 01:10:21 [pluto] "cube" #15: ISAKMP SA established
Oct  3 01:55:16 [pluto] "cube" #16: responding to Quick Mode
Oct  3 01:55:16 [pluto] "cube" #16: IPsec SA established
Oct  3 02:00:28 [pluto] "cube" #17: initiating Main Mode to replace #15
Oct  3 02:00:28 [pluto] "cube" #17: Peer ID is ID_IPV4_ADDR: '192.168.1.1'
Oct  3 02:00:28 [pluto] "cube" #17: ISAKMP SA established
Oct  3 02:06:39 [pluto] "cube" #17: ignoring Delete SA payload
Oct  3 02:06:39 [pluto] "cube" #17: received and ignored informational message
Oct  3 02:42:41 [pluto] "cube" #18: initiating Main Mode to replace #17
Oct  3 02:42:41 [pluto] "cube" #18: Peer ID is ID_IPV4_ADDR: '192.168.1.1'
Oct  3 02:42:41 [pluto] "cube" #18: ISAKMP SA established
Oct  3 03:29:48 [pluto] "cube" #19: initiating Main Mode to replace #18
Oct  3 03:29:49 [pluto] "cube" #19: Peer ID is ID_IPV4_ADDR: '192.168.1.1'
Oct  3 03:29:49 [pluto] "cube" #19: ISAKMP SA established
Oct  3 04:13:40 [pluto] "cube" #20: initiating Main Mode to replace #19
Oct  3 04:13:40 [pluto] "cube" #20: Peer ID is ID_IPV4_ADDR: '192.168.1.1'
Oct  3 04:13:40 [pluto] "cube" #20: ISAKMP SA established
Oct  3 04:55:54 [pluto] "cube" #21: initiating Main Mode to replace #20
Oct  3 04:55:54 [pluto] "cube" #21: Peer ID is ID_IPV4_ADDR: '192.168.1.1'
Oct  3 04:55:54 [pluto] "cube" #21: ISAKMP SA established
Oct  3 05:38:06 [pluto] "cube" #22: initiating Main Mode to replace #21
Oct  3 05:38:06 [pluto] "cube" #22: Peer ID is ID_IPV4_ADDR: '192.168.1.1'
Oct  3 05:38:06 [pluto] "cube" #22: ISAKMP SA established
Oct  3 06:22:36 [pluto] "cube" #23: initiating Main Mode to replace #22
=== End /var/log/syslog/current.old ===

The new file (i.e. after restarting metalog) contains these new entries:

=== Begin /var/log/syslog/current ===
Oct  3 13:40:53 [sshd(pam_unix)] session closed for user root
Oct  3 13:41:28 [pluto] "cube" #54: max number of retransmissions (2) reached STATE_MAIN_R1
Oct  3 13:41:28 [pluto] "cube" #53: max number of retransmissions (2) reached STATE_MAIN_R1
Oct  3 13:41:28 [pluto] "cube" #52: max number of retransmissions (2) reached STATE_MAIN_R1
Oct  3 13:41:28 [pluto] "cube" #51: max number of retransmissions (2) reached STATE_MAIN_R1
Oct  3 13:41:28 [pluto] "cube" #50: max number of retransmissions (2) reached STATE_MAIN_R1
Oct  3 13:41:28 [pluto] "cube" #49: max number of retransmissions (2) reached STATE_MAIN_R1
Oct  3 13:41:28 [pluto] "cube" #48: max number of retransmissions (2) reached STATE_MAIN_R1
Oct  3 13:41:28 [pluto] "cube" #47: max number of retransmissions (2) reached STATE_MAIN_R1
Oct  3 13:41:28 [pluto] "cube" #46: max number of retransmissions (2) reached STATE_MAIN_R1
Oct  3 13:41:28 [pluto] "cube" #45: max number of retransmissions (2) reached STATE_MAIN_R1
Oct  3 13:41:28 [pluto] "cube" #44: max number of retransmissions (2) reached STATE_MAIN_R1
Oct  3 13:41:28 [pluto] "cube" #43: max number of retransmissions (2) reached STATE_MAIN_R1
Oct  3 13:41:28 [pluto] "cube" #42: max number of retransmissions (2) reached STATE_MAIN_R1
Oct  3 13:41:28 [pluto] "cube" #41: max number of retransmissions (2) reached STATE_MAIN_R1
Oct  3 13:41:28 [pluto] "cube" #40: max number of retransmissions (2) reached STATE_MAIN_R1
Oct  3 13:41:28 [pluto] "cube" #39: max number of retransmissions (2) reached STATE_MAIN_R1
Oct  3 13:41:28 [pluto] "cube" #38: max number of retransmissions (2) reached STATE_MAIN_R1
Oct  3 13:41:28 [pluto] "cube" #37: max number of retransmissions (2) reached STATE_MAIN_R1
Oct  3 13:41:28 [pluto] "cube" #36: max number of retransmissions (2) reached STATE_MAIN_R1
Oct  3 13:41:28 [pluto] "cube" #35: max number of retransmissions (2) reached STATE_MAIN_R1
Oct  3 13:41:28 [pluto] "cube" #34: max number of retransmissions (2) reached STATE_MAIN_R1
Oct  3 13:41:28 [pluto] "cube" #33: max number of retransmissions (2) reached STATE_MAIN_R1
Oct  3 13:41:28 [pluto] "cube" #32: max number of retransmissions (2) reached STATE_QUICK_R1
Oct  3 13:41:28 [pluto] "cube" #31: max number of retransmissions (2) reached STATE_QUICK_R1
Oct  3 13:41:28 [pluto] "cube" #30: max number of retransmissions (2) reached STATE_QUICK_R1
Oct  3 13:41:28 [pluto] "cube" #29: max number of retransmissions (2) reached STATE_QUICK_R1
Oct  3 13:41:28 [pluto] "cube" #28: max number of retransmissions (2) reached STATE_QUICK_R1
=== End /var/log/syslog/current ===

It seems to be related to syslog, not klog.
Any suggestions on how to proceed?

Comment 3 Marcel Köppen 2003-01-20 05:31:23 UTC

I have the same problem here, and it seems to depend on the kernel I use. With
2.4.20-pre8-ac3 everything works as expected, but with vanilla 2.4.20,
2.4.20-ck1 and 2.4.21-pre3 metalog stops working after some time.

Comment 4 Martin Zwickel 2003-02-12 02:59:09 UTC

I have the same problem!
It just sucks... *argh*
I'll try to find a solution on my own until it gets fixed.

Comment 5 Wouter Deconinck 2003-03-30 05:20:37 UTC

I have the same problem.

Also, logging in is not possible anymore (neither as a user nor as root). I type the username (if not using su), then I get "Password:". If I type the password and press <ENTER>, nothing happens. On the console I get a 60-second timeout.

My internet connection is also down while I have this problem (VPN, using pptp and pppd).

Both problems are caused by the logger (su wants to write an entry, pppd probably also wants to write something, and they have to wait...).
There are two possibilities: a kernel problem (mine is 2.4.20) with sending log entries, or a problem with metalog (0.6-r10) receiving log entries.

Here are some diagnostics for su.

=== where does it go wrong for 'su'? ===
# strace -o su.crash su
# reboot
# strace -o su.normal su
# diff su.crash su.normal
<cut>
3556c3472,3508
< send(3, "<37>Mar 30 00:16:14 su(pam_unix)"..., 133, 0 <unfinished ...>
---
> send(3, "<37>Mar 30 00:44:40 su(pam_unix)"..., 133, 0) = 133
<cut>
... the rest is only executed in the normal case (although strace somehow doesn't allow me to log in).
========================================

Comment 6 Jens Kreiensiek 2003-04-02 02:40:19 UTC

Same problem here too! My configuration is
- gentoo-sources-2.4.20-r2
- metalog-0.6-r10

Comment 7 simon 2003-04-06 07:31:30 UTC

I have the same problem with metalog stopping from time to time with gentoo-2.4.20-r2 Kernel.

Comment 8 Nils Ohlmeier 2003-04-11 09:26:11 UTC

I have the same problem here. And it is definitely a metalog problem,
because I had a reproducible freeze with a program I am developing.
But the same freeze did not occur on my laptop, where sysklogd is
running.
The problem occurs both with synchronization turned on and without.

My workaround is to always keep a root console open from which I can
restart metalog, which solves the problem temporarily.
If the system does not shut down, hitting CTRL-ALT-DEL three times brings the
system down, but sadly without unmounting the disks.

I just installed a version of metalog with debugging symbols. Maybe I can
deliver some gdb output.

For completeness:
Portage 2.0.47-r10 (default-x86-1.4, gcc-3.2.2, glibc-2.3.1-r4) 
================================================================= 
System uname: 2.4.20-gentoo-r2 i686 AMD Athlon(TM) XP 1900+ 
GENTOO_MIRRORS="ftp://ftp.tu-clausthal.de/pub/linux/gentoo/ 
ftp://ftp.ibiblio.org/pub/linux/distribution/gentoo " 
CONFIG_PROTECT="/etc /var/qmail/control /usr/kde/2/share/config 
/usr/kde/3/share/config /usr/X11R6/lib/X11/xkb /usr/kde/3.1/share/config 
/usr/share/config" 
CONFIG_PROTECT_MASK="/etc/gconf /etc/env.d" 
PORTDIR="/usr/portage" 
DISTDIR="/usr/portage/distfiles" 
PKGDIR="/usr/portage/packages" 
PORTAGE_TMPDIR="/var/tmp" 
PORTDIR_OVERLAY="/usr/local/portage" 
USE="x86 libg++ mikmod gdbm slang readline svga tcltk tcpd libwww perl 
python motif 3dnow acpi acpi4linux alsa -apm arts avi berkdb -bonobo cdr 
crypt cups dga directfb dvb dvd encode -esd -evo fbcon flash -gb gif 
-gnome gphoto2 gpm gtk imap imlib innodb java jpeg -ldap kde maildir 
matrox mbox mmx mozilla mpeg mysql ncurses nls odbc oggvorbis opengl 
oss pam pda pdflib pic png qt qtmt quicktime samba sasl scanner sdl slp 
spell sse ssl tiff tetex truetype wmf X xml2 xmms xv zlib" 
COMPILER="gcc3" 
CHOST="i686-pc-linux-gnu" 
CFLAGS="-mcpu=athlon-xp -O3 -pipe -fomit-frame-pointer -fPIC" 
CXXFLAGS="-mcpu=athlon-xp -O3 -pipe -fomit-frame-pointer -fPIC" 
ACCEPT_KEYWORDS="x86" 
MAKEOPTS="-j2" 
AUTOCLEAN="yes" 
SYNC="rsync://rsync.de.gentoo.org/gentoo-portage" 
FEATURES="sandbox ccache"

Comment 9 Nils Ohlmeier 2003-04-11 15:11:06 UTC

Here is a backtrace of the problem. I don't have the time right now to
investigate this further, so I'm just storing it here:
 
cloudcity root # gdb /usr/sbin/metalog 8033
GNU gdb 5.3
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
Attaching to program: /usr/sbin/metalog, process 8033
Reading symbols from /usr/lib/libpcre.so.0...done.
Loaded symbols for /usr/lib/libpcre.so.0
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
0x43bdb9a7 in pause () from /lib/libc.so.6
(gdb) bt
#0  0x43bdb9a7 in pause () from /lib/libc.so.6
#1  0x0804b14b in spawnCommand (command=0x7 <Address 0x7 out of bounds>,
    date=0x804d920 "Apr 11 20:24:20", prg=0x804bf4c "kernel",
    info=0xbfffefc3 "usb-storage: queuecommand() called") at metalog.c:716
#2  0x0804b2e1 in processLogLine (logcode=7, date=0x804d920 "Apr 11 20:24:20",
    prg=0x804bf4c "kernel", info=0xbfffefc3 "usb-storage: queuecommand() called")
    at metalog.c:772
#3  0x0804b4f8 in process (sockets=0xbffff810) at metalog.c:854
#4  0x0804ba8d in main (argc=6, argv=0xbffff874) at metalog.c:1058
#5  0x43b45dc4 in __libc_start_main () from /lib/libc.so.6
(gdb)

Comment 10 Nils Ohlmeier 2003-04-12 05:43:47 UTC

After looking at the code and my backtrace, I have a question for the people
who have the problem: do you use the console logging section at the end of the
configuration?

It seems that this is the cause of the problem. I just disabled my console
logging and will report whether this fixes the problem.

Comment 11 Sascha Silbe 2003-04-12 06:48:59 UTC

Yes, I do use console logging. Will disable it temporarily, too.

Comment 12 Christoph Probst 2003-04-16 08:43:19 UTC

I'm able to trigger this bug using fetchnews, which comes with the leafnode news server. I just have to run four or five fetchnews processes at the same time (just by starting them with fetchnews &) and metalog reproducibly freezes. (Because of fetchnews' bad timeout behaviour, the bug is triggered by crond from time to time.)

The bug only appears when metalog's console logging is activated. I use:

| chris@starbed2:/etc/metalog$ tail -4 metalog.conf
| console logging :
|
|   facility = "*"
|   command = "/usr/sbin/consolelog.sh"

| chris@starbed2:/etc/metalog$ cat /usr/sbin/consolelog.sh
| #!/bin/sh
| echo "$1 [$2] $3" > /dev/vc/10
| ...


The metalog master process is stuck at

| #0 0x400ce9c7 in pause () from /lib/libc.so.6
| #1 0x0804b15b in spawnCommand (command=0x3e <Address 0x3e out of bounds>, ...)
| #2 0x0804b2f1 in processLogLine

Looking at the source:

| static int spawnCommand(const char * const command, ...)

is called by processLogLine() at line 772:

| spawnCommand(block->command, date, prg, info);

While "command" is out of bounds, "block->command" isn't, and its value is

| (gdb) p block->command
| $4 = 0x804ea38 "/usr/sbin/consolelog.sh"

OK, so: can anyone tell what is happening there?

Comment 13 Nils Ohlmeier 2003-04-16 15:20:36 UTC

I think the "out of bounds" is an error in the debugger: if the command
address were not accessible, the stat() call at line 705 would fail
(probably with a segfault).
I suspect the problem lies rather in the way metalog waits for the external
command to return, using pause() and a signal handler that is supposed to
change the value of command_child.
This style of programming is not multi-process safe: the signal handler
could change command_child between the while check and the pause() call,
and then pause() will never be interrupted.

Comment 14 Nils Ohlmeier 2003-04-16 15:32:25 UTC

Sorry, I forgot to mention that the problem (at least for me) only occurs with
console logging activated. I ran my system for one day without console
logging, and metalog did not hang once during the whole day (with console
logging I get more than 3 hangs per day).
And in case anyone thinks the new 0.7 could solve the problem:
no, it does not.

So the workaround for Gentoo is easy: remove the console logging from
the config file, or at least document in the config file that activating
it is risky because of the possible hangs.

I'll try to point the metalog developers to this bug and to my concern about
their programming approach.

Comment 15 Jedi/Sector One 2003-04-16 16:02:27 UTC

Created attachment 10754 [details, diff]
Patch to make it multiprocess

This (untested) patch against 0.7 should keep metalog from waiting for spawned
processes to complete before it goes on with logging.

Comment 16 Christoph Probst 2003-04-17 19:17:40 UTC

Your patch seems to solve the problem. I used it and wasn't able to trigger the bug anymore. To cross-check, I reinstalled metalog without the patch and the bug reappeared. Everything seems to be OK now. :-)

Comment 17 Nils Ohlmeier 2003-04-18 12:20:05 UTC

Yes, although the patch's solution is not the best one, it works, and that
counts. With the patch applied I have had no lockup for the last 36 hours.

With a new ebuild that applies this patch we can close this bug :-)
A backport of the patch to 0.6 should not be hard, because the code has
not changed in the relevant areas.

Comment 18 Jedi/Sector One 2003-04-18 12:39:07 UTC

I will release 0.8 (with the patch and some minor changes) in a few days. I'll submit
the new ebuild as soon as it is released.
 
Thanks again to all Gentoo Linux freaks not only for the cool distro, but also for their 
help, reactivity and coolness. 
 
-Frank.

Comment 19 Grant Goodyear (RETIRED) gentoo-dev 2003-04-20 18:55:50 UTC

Waiting a few days to see if 0.8 is released as promised.
(Also taking over this bug, since I have a few others related
to metalog.  Unfortunately, I don't currently use metalog because it's
still lacking remote logging.)

Comment 20 Roger 2003-05-06 14:05:31 UTC

Found this bug too, and it makes the system quite unstable if you don't have a root window open to restart metalog. (As you can see from my posts to the gentoo-user mailing list :-/)

Comment 21 Roger 2003-05-06 15:42:06 UTC

Created attachment 11594 [details, diff]
this is a patch made against metalog-0.6-r10

This was done after an ebuild <metalog.ebuild> unpack (that is, after all other
patches were applied). This is merely the same patch already submitted, back-ported
to version 0.6-r10.

This problem turned up as soon as I installed the procmail/postfix/spamassassin
combo... I guess the logging done by postfix & spamassassin greatly aggravates the
problem.

I would strongly suggest that this patch be incorporated or that users use
sysklogd instead.

The only thing left to do is to modify the ebuild file to incorporate the
patch/hack. If this patch is not recommended, maybe mask the ebuild or
something.

Comment 22 Roger 2003-05-06 15:42:44 UTC

Created attachment 11595 [details]
this is a patch made against metalog-0.6-r10

Comment 23 Roger 2003-05-06 15:42:52 UTC

Created attachment 11596 [details]
this is a patch made against metalog-0.6-r10

Comment 24 Roger 2003-05-06 15:46:56 UTC

Sorry about the triple post. Bugzilla errored during the submission for some reason and appeared to have failed to send.

Comment 25 Roger 2003-05-07 13:49:35 UTC

OK, I give up on metalog. Not only did it have this bug, it also keeps my cardbus/pcmcia from working entirely! Somehow metalog prevents the second (or the first) pcmcia slot from working, so I only get one pcmcia slot working. Very odd how a system logger can have so many bugs in it! From a little more research, it looks like syslog-ng is the real contender here. And this patch merely prevents the system freeze; the metalog daemon itself will still freeze (logging will freeze... permanently?). This metalog package *should* be masked! ...completely. lol.

Comment 26 Peter Simons 2003-05-19 07:33:44 UTC

Just wondering: is anything going on regarding fixing metalog? (I have these hangs too, since I upgraded to kernel 2.4.20.) The 0.8 version someone mentioned a month ago doesn't seem to have appeared.

Comment 27 Martin Holzer (RETIRED) gentoo-dev 2003-06-29 14:03:51 UTC

*** Bug 18384 has been marked as a duplicate of this bug. ***

Comment 28 jani 2003-08-14 07:47:27 UTC

I tracked down this bug myself, only to find that it was found months ago on Gentoo and there's still no bugfix ebuild. At least it was educational for me.

Comment 29 Martin Holzer (RETIRED) gentoo-dev 2003-10-09 09:22:57 UTC

Is this fixed in 0.7?

Comment 30 Nils Ohlmeier 2003-10-09 10:08:13 UTC

No, as I already wrote before, 0.7 does not fix the problem.
But you can apply one of the patches from the top of this bug (they are
all the same); they fix the problem.
Personally, I run metalog without console logging to prevent trouble. But as
there still does not seem to be a 0.8 available, I would prefer that one
of the Gentoo developers create a metalog-0.7-r2.ebuild which applies
the patch, to finally close this bug.

Comment 31 Heinrich Wendel (RETIRED) gentoo-dev 2003-10-09 12:22:39 UTC

I contacted the developer; he said he forgot about 0.8 but will provide it
soon.

Comment 32 Heinrich Wendel (RETIRED) gentoo-dev 2003-10-17 04:31:11 UTC

*** Bug 31277 has been marked as a duplicate of this bug. ***

Comment 33 Jonathan Manning 2003-10-24 11:05:41 UTC

I don't have console logging enabled, and it still happens. I don't want
to be another "me too", but this seems to be different from most reports
here. Perhaps it happens when I 'tail -f /var/log/everything/current', but
I haven't noticed a strong correlation there.

In addition to my primary machine with Gentoo already set up, I'm having the
same problem with a LiveCD install (1.4-rc4, yes I know 1.4 is out...). Metalog
is the logger for the LiveCD, and it's exhibiting the same hanging symptoms
(which a '/etc/init.d/metalog restart' fixes).

I think my solution is just to move to syslog-ng on my primary machine. I'll
give metalog one more try using "~x86" to see if unstable fixes it first.
I second the request for a new 0.7 revision that adds this patch.

The issues with the LiveCD are a major showstopper for an install. Is this
fixed in 1.4, or is 1.4 == 1.4-rc4?

Comment 34 Joerg Schaible 2003-10-25 11:17:09 UTC

I've been hit by this problem, too. Added myself to CC; I want to be informed about anything
new.

Comment 35 Seemant Kulleen (RETIRED) gentoo-dev 2003-11-12 13:03:08 UTC

Heinrich, any further contact with the developer?

Comment 36 Heinrich Wendel (RETIRED) gentoo-dev 2003-11-13 05:13:08 UTC

Not yet, I'll write him another mail.

Comment 37 Heinrich Wendel (RETIRED) gentoo-dev 2003-11-30 12:44:40 UTC

I added a snapshot of the CVS tree (untouched for 6 months); the bug should be fixed in this version, please test.

Comment 38 Heinrich Wendel (RETIRED) gentoo-dev 2003-11-30 12:46:57 UTC

(you have to wait for the tgz to be synced to the mirrors)

Comment 39 Joerg Schaible 2003-12-18 13:28:52 UTC

Just to give a feedback: I've been running the new version now for more than two weeks and I had no occurrence of this issue anymore.

Comment 40 Paul Tötterman 2003-12-19 04:45:38 UTC

I'd like to add my experiences to this. With gentoo-sources-2.4.20-r[89] I haven't had any problems with metalog. But when upgrading to gentoo-dev-sources-2.6.0-test* and 2.6.0 I've had the same kind of login failures as described here. And yes, I was using console-logging.

Comment 41 Evan Powers 2004-02-22 11:02:31 UTC

I've just started testing kernel 2.6.3 on my system. No problems with my old 2.4 kernel, but the new one exhibits this bug. Console logging enabled of course. I emerged the ~x86 metalog-0.8_pre20031130; it appears to resolve the problem.

Comment 42 Thomas R. (TRauMa) 2004-02-22 19:50:04 UTC

The currently stable metalog-0.8-CVS WFM. I'm still a bit uncomfortable with metalog, now that I've read all the comments. From the install guide, section 10:

"Gentoo offers several system loggers to choose from. There are sysklogd, which is the traditional set of system logging daemons, msyslog, a flexible system logger with a modularized design, syslog-ng, an advanced system logger, and metalog which is a highly-configurable system logger.

If you can't choose one, use syslog-ng as it is very powerful yet comes with a great default configuration. "

Perhaps add some words of warning about metalog, now that it seems to be largely unmaintained?

Comment 43 Richard Scott 2007-04-09 04:23:23 UTC

I'm still getting this problem... and I'm using v0.8_rc4.

I don't use console logging, but I do execute external bash scripts via metalog.conf.

Can I be of any help debugging this? It is getting worse for me.

My "emerge --info" is as follows:

Portage 2.1.2.2 (hardened/x86/2.6, gcc-3.4.6, glibc-2.3.6-r5, 2.6.18-hardened-r6 i686)
=================================================================
System uname: 2.6.18-hardened-r6 i686 Intel(R) Pentium(R) 4 CPU 2.80GHz
Gentoo Base System release 1.12.9
Timestamp of tree: Sun, 08 Apr 2007 22:50:01 +0000
distcc 2.18.3 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
ccache version 2.4 [disabled]
dev-lang/python:     2.3.5-r3, 2.4.3-r4
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache:     2.4-r6
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.60
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils:  2.16.1-r3
sys-devel/gcc-config: 1.3.14
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.17-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-march=i686 -O2 -pipe -fomit-frame-pointer"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /var/bind"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/php/apache1-php5/ext-active/ /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c"
CXXFLAGS="-march=i686 -O2 -pipe -fomit-frame-pointer"
DISTDIR="/usr/portage/distfiles"
FEATURES="distlocks metadata-transfer sandbox sfperms strict"
GENTOO_MIRRORS="ftp://212.219.56.132/sites/www.ibiblio.org/gentoo/ ftp://212.219.56.135/sites/www.ibiblio.org/gentoo/ ftp://212.219.56.138/sites/www.ibiblio.org/gentoo/ http://212.219.56.135/sites/www.ibiblio.org/gentoo/ ftp://ftp.free.fr/mirrors/ftp.gentoo.org/"
LC_ALL="C"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/opt/portage"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="apache2 apm berkdb bzip2 crypt curl curlwrappers gd gdbm gif gmp gpm hardened idn innodb jpeg libg++ libwww midi mysql ncurses nls nptl nptlonly pam pcre perl php pic png python readline session snmp ssl tcpd tetex tiff truetype winbind x86 xml xml2 xorg zlib" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" ELIBC="glibc" INPUT_DEVICES="mouse keyboard" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" USERLAND="GNU"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LDFLAGS, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

Comment 44 SpanKY gentoo-dev 2007-04-09 04:31:03 UTC

File a new bug, please... the only way to really figure this out is to build metalog with debugging symbols and, when it hangs, attach to the process with gdb and get a backtrace.