
Bug 144886

Summary: app-admin/metalog with command in metalog.conf deadlocks anything trying to log to the system logger
Product: Gentoo Linux
Reporter: Jozsef Daniel <simius>
Component: Current packages
Assignee: SpanKY <vapier>
Status: RESOLVED FIXED
Severity: normal
CC: ka0ttic, r3pek
Priority: Normal
Version: 2006.0
Hardware: AMD64
OS: Other
URL: http://sourceforge.net/tracker/index.php?func=detail&aid=1394932&group_id=30635&atid=399902
Whiteboard:
Package list:
Runtime testing required: ---
Bug Depends on: 173601    
Bug Blocks:    
Attachments: gdb output of hung metalog

Description Jozsef Daniel 2006-08-23 11:03:36 UTC
It happened several times, but I have no idea what triggered it.

metalog version: 0.8_pre20031130
kernel version: 2.6.17-gentoo-r4

I have console logging enabled on tty11.

Lots of things that needed logging - shutdown, login, etc - became impossible. Logins timed out. The console log froze too (didn't display the USB drive I plugged in). When I checked ps ax, I found several DEFUNCT instances of consolelog.sh.
After sigkilling metalog, everything went on its way, fixed. I could restart metalog and it would work again.

It did this to me right after bootup just now. Couldn't log in, couldn't restart... (Man, I hate hard resetting a Linux box.) I downgraded to 0.7-r1, I will keep you informed if it does the same thing. (Hope not.)
Comment 1 Jakub Moc (RETIRED) gentoo-dev 2006-08-23 11:11:03 UTC
Why don't you use 0.8_rc1-r2 which is stable everywhere???
Comment 2 Jozsef Daniel 2006-08-23 13:47:04 UTC
Well, it's not. Tried it. It did the exact same thing.

0.7-r1 didn't kill the system, it just logged a bit, and then - God knows why - just stopped logging. No error messages, no nothing, just no logging. No logging on console, no logging in files. Dead silence.

I unmerged metalog and emerged syslog-ng.
I propose the keyword ~amd64 to be set for ALL metalog versions until this is solved.
Comment 3 Small_Penguin 2006-11-21 16:58:00 UTC
I've got the same bug with metalog-0.8_rc1-r2 on a suspend2-2.6.18 smp kernel.
I can log in immediately after boot, but not a few seconds later.
`rc-update del metalog' fixed it.

It seems like there is currently no solution to this problem other than rebuilding metalog (a temporary solution, since I did this two days ago but have the same problem again now) or using another logger like syslog-ng.

There are other people suffering from this metalog misbehaviour:

http://forums.gentoo.org/viewtopic-t-484031-highlight-metalog.html
http://forums.gentoo.org/viewtopic-t-500043-highlight-metalog.html?sid=5521630ad18fda4965305337878ae6e4
http://forums.gentoo.org/viewtopic-t-494029-highlight-metalog.html?sid=5521630ad18fda4965305337878ae6e4
Comment 4 Carlos Silva (RETIRED) gentoo-dev 2006-12-28 17:30:48 UTC
OK, this may be just me, but can you guys try replacing the very first line of /usr/sbin/consolelog.sh?
Replace "#!/bin/sh" with "#!/bin/bash".
This seems stupid, since /bin/sh is a link to bash, but it SOLVES the problem.
Maybe the problem is in bash and not in metalog or the script.
Comment 5 SpanKY gentoo-dev 2006-12-29 20:58:28 UTC
still can't reproduce this myself ... the source code looks fine too

you could try adding this to the top of your metalog.conf:

Metalog :

  program  = "metalog"
  logdir   = "/var/log/metalog"
  break    = 1
Comment 6 SpanKY gentoo-dev 2007-01-01 00:56:08 UTC
ok, i don't think this is consolelog.sh, i think this is any command ...

i just observed it on a machine of mine, and poking at it with strace makes it look like there's a race condition / deadlock in there somewhere

going to rebuild with debugging turned on so i can throw gdb at it if it happens again
Comment 7 SpanKY gentoo-dev 2007-01-01 05:23:25 UTC
Created attachment 105080
gdb output of hung metalog
Comment 8 SpanKY gentoo-dev 2007-01-01 05:27:38 UTC
looking at the gdb backtrace it's readily apparent what the problem is ... metalog uses functions in its signal handler that are not reentrant and according to the POSIX specs, that is not valid usage and may lead to undefined behavior

in this case, the localtime() function is allowed to not be reentrant by definition:
http://www.opengroup.org/onlinepubs/009695399/functions/localtime.html

since the metalog signal handlers call the doLog() function and that calls localtime(), this misbehavior is allowed according to spec

in other words, the fault here lies in the implementation of unsafe signal handlers in metalog
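
For illustration, here is a minimal sketch of the usual way around this class of bug: the handler only records that a signal arrived, and the timestamping and logging happen later from the main loop, outside signal context. This is not metalog's code and not necessarily how 0.8-rc2 fixed it. The names on_signal, install_handler, drain_pending_signal and pending_signal are hypothetical, and the single-slot flag is a simplification (a second signal arriving before the drain overwrites the first).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical names for illustration; none of this is taken from metalog. */

/* Written by the handler, read by the main loop. sig_atomic_t is the only
 * object type a handler may portably store to. */
static volatile sig_atomic_t pending_signal = 0;

/* The handler does nothing but record the signal number. No localtime(),
 * no stdio, no malloc(): none of those are async-signal-safe. */
static void on_signal(int signo)
{
    pending_signal = signo;
}

/* Install on_signal for one signal. Returns 0 on success, -1 on error. */
static int install_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, NULL);
}

/* Call this from the main loop, i.e. outside signal context. If a signal
 * was recorded, format a timestamped message into buf and return 1;
 * otherwise return 0 and leave buf untouched. localtime_r() writes into a
 * caller-supplied struct instead of shared static storage. */
static int drain_pending_signal(char *buf, size_t len)
{
    int signo = pending_signal;
    time_t now;
    struct tm tm_buf;
    char stamp[32] = "unknown time";

    assert(buf != NULL && len > 0);
    if (signo == 0)
        return 0;
    pending_signal = 0;

    now = time(NULL);
    if (localtime_r(&now, &tm_buf) != NULL)
        strftime(stamp, sizeof(stamp), "%Y-%m-%d %H:%M:%S", &tm_buf);
    snprintf(buf, len, "[%s] got signal %d", stamp, signo);
    return 1;
}
```

A daemon would call install_handler() once at startup and drain_pending_signal() on each pass of its main loop, writing the resulting message through its normal logging path. The point of the design is that the non-reentrant work can never interrupt itself, because it no longer runs inside the handler.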
Comment 9 Small_Penguin 2007-01-27 22:39:28 UTC
(In reply to comment #8)
> looking at the gdb backtrace it's readily apparent what the problem is ...
> metalog uses functions in its signal handler that are not reentrant and
> according to the POSIX specs, that is not valid usage and may lead to undefined
> behavior

[snip]

> in other words, the fault here lies in the implementation of unsafe signal
> handlers in metalog

Shouldn't that be sufficient to mask the package in the portage tree? The Gentoo Handbook lists this logger in the install guide. However, metalog, though powerful, makes the whole system unstable, and after looking at the forums at SourceForge it seems this won't be fixed soon.

Imagine a new user trying Gentoo for the first time and ending up with a system that constantly hangs for no apparent reason...
Comment 10 SpanKY gentoo-dev 2007-01-28 03:27:19 UTC
no
Comment 11 SpanKY gentoo-dev 2007-01-28 09:47:13 UTC
fixed in 0.8-rc2