Ever since sometime after the separation of com_err and mit-krb5, some processes that use Kerberos have been hanging while staying runnable the whole time (smells like an infinite loop). In particular, this goes for saslauthd, but I've seen it in imapd as well. I attached gdb to such a process while it was hung in this manner, and it seems that the error lies in libcom_err. I currently have com_err-1.38 installed. The stack trace is very long, but here's the top of it:
    #0 0xb7ef1a96 in error_message () from /lib/libcom_err.so.2
    #1 0xb7f78468 in krb5_locate_kdc () from /usr/lib/libkrb5.so.3
    #2 0xb7f77c1b in krb5int_locate_server () from /usr/lib/libkrb5.so.3
    [...]
I tried to go further, but the following output from gdb confuses me, since there is no file called auth.c in e2fsprogs:
    [...]
    (gdb) f 0
    #0 0xb7ef1a96 in error_message () from /lib/libcom_err.so.2
    (gdb) list
    192 auth.c: No such file or directory.
    in auth.c
The "error_message" function appears to be defined in lib/et/error_message.c, but that file is not even 192 lines long. Of course, I didn't emerge any of the associated libraries or programs with debugging symbols, so it's understandable that debugging doesn't work perfectly...
I have re-emerged com_err with FEATURES="noclean nostrip" and CFLAGS="-g", and I've managed to get some more debugging info from an ipop3d process that died from this. Apparently, the initial debugging info was, as I suspected, wrong, and the loop is taking place in error_message.c rather than "auth.c" (wherever that came from). The actual loop in question is the one from lines 57 to 65 in that file:
    for (et = _et_list; et; et = et->next) {
        if (et->table->base == table_num) {
            /* This is the right table */
            if (et->table->n_msgs <= offset)
                goto oops;
            return(et->table->msgs[offset]);
        }
    }
Apparently, the first element of _et_list points to itself:
    (gdb) p _et_list
    $5 = (struct et_list *) 0xb7ba1bc8
    (gdb) p *_et_list
    $6 = {next = 0xb7ba1bc8, table = 0xb7b9c8c4}
In other words, _et_list and _et_list->next both point to the same address, so whenever the first table's base does not match table_num, the loop never reaches the end of the list and never terminates. That's all I have for now. Hopefully I'll be able to find out how this happened to _et_list in the first place.
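For illustration only, here is a minimal sketch of one way a list node could end up pointing at itself: the same caller-owned node gets pushed onto the list head twice. This is an assumption, not something the debugging session above establishes. The struct field names follow the gdb output, while register_table() and list_has_cycle() are hypothetical helpers that are not part of libcom_err.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the com_err types. Field names follow the
   gdb output above; the exact layout is an assumption. */
struct error_table {
    const char *const *msgs;
    long base;
    int n_msgs;
};

struct et_list {
    struct et_list *next;
    const struct error_table *table;
};

/* Hypothetical registration helper: pushes a caller-owned node onto the
   head of the list without checking whether it is already linked. */
static void register_table(struct et_list **head, struct et_list *node)
{
    assert(head != NULL && node != NULL);
    node->next = *head;   /* if *head == node, node now points to itself */
    *head = node;
}

/* Hypothetical diagnostic: Floyd's tortoise-and-hare cycle check.
   Returns 1 if following next pointers would never reach NULL. */
static int list_has_cycle(const struct et_list *head)
{
    const struct et_list *slow = head;
    const struct et_list *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            return 1;
    }
    return 0;
}
```

Calling register_table() twice with the same node leaves head == node and node->next == node, which matches the shape seen in gdb. A lookup loop like the one quoted above would then spin forever for any table_num other than that node's base.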
I think I've seen mention of this before ... could you try contacting the upstream author, please? tytso@alum.mit.edu
For unknown reasons, this doesn't happen for me anymore. Maybe upstream fixed it?
dunno ... there have been fixes incorporated which bring com_err up to speed with the version used in mit-krb5