Summary: | asterisk 1.2.4 causes festival 1.4.3-r3 to go into infinite loop and lockup system | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | David Fannin <dfannin> |
Component: | Current packages | Assignee: | Gentoo Accessibility Team <accessibility> |
Status: | RESOLVED NEEDINFO | ||
Severity: | critical | CC: | rajiv, stkn |
Priority: | High | ||
Version: | 2005.1 | ||
Hardware: | x86 | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: |
Festival.conf
patch for festival-1.4.3 to fix (possible) endless loop in case of a configuration error
reimplementation of the main loop based on select() |
Description
David Fannin
2006-03-05 18:02:47 UTC
Trying to reproduce the problem. Please attach your festival.conf.

Created attachment 81569
Festival.conf
Setting "usecache=yes" creates the problem for me.
Additional configuration info:

asterisk use flags: +alsa -bri +curl -debug +doc +gtk -h323 -hardened -lowmem +mmx +mysql -nosamples +odbc -osp -postgres -pri +speex +sqlite +ssl -ukcid +zaptel
festival use flags: +asterisk -doc

Does /tmp/asterisk/cache exist? What are the permissions on that directory?

/tmp:
drwxrwxrwt 31 root root 4096 Mar 6 15:50 /tmp
/tmp/asterisk:
drwxr-xr-x 3 asterisk asterisk 4096 Mar 4 23:36 /tmp/asterisk
/tmp/asterisk/cache:
drwxr-xr-x 2 asterisk asterisk 4096 Mar 4 23:53 /tmp/asterisk/cache

Created attachment 81571
patch for festival-1.4.3 to fix (possible) endless loop in case of a configuration error
The lockup is caused by missing error handling in the festival server code.
The server process spawns a new child process for every request and uses waitpid
(in a loop) to wait for child processes that have finished.
In case of an error, waitpid returns -1 (e.g. if one of the children dies prematurely) and the main process ends up stuck in the waitpid loop, consuming 100% of CPU time.
The attached patch adds some error handling to avoid that.
However, one issue remains: dead children may end up as zombie processes
that are not removed by the server process. I have no idea how to avoid this at the moment, but that situation is still better than before.
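
For readers unfamiliar with the failure mode, here is a minimal sketch of a waitpid loop that cannot spin when waitpid returns -1. It is not the code from attachment 81571 and not taken from the festival sources; the helper name reap_children is hypothetical, and the use of WNOHANG is an assumption made for the illustration.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper, not from festival or the attached patch.
 * Collects every child that has already exited and returns how many
 * were collected. It never blocks and it always terminates. */
int reap_children(void)
{
    int reaped = 0;

    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, WNOHANG);

        if (pid > 0) {          /* one exited child was collected */
            reaped++;
            continue;
        }
        if (pid == 0)           /* children exist, but none has exited yet */
            break;
        if (errno == EINTR)     /* interrupted by a signal: try again */
            continue;
        break;                  /* ECHILD (no children left) or another error: stop */
    }
    return reaped;
}
```

The important part is the final break: a return value of -1 ends the loop instead of being retried forever. Calling a helper like this at regular intervals would also collect zombie children.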
Is ;(voice_us1_mbrola) enabled in your /etc/festival/server.scm file? If yes, is mbrola installed?

mbrola is installed, but not used (commented out in the config file). I was using the default voice. Interesting info on the patch. Running in non-cache mode, I am getting the following process hanging around:
root 21736 15323 0 16:42 ? 00:00:00 [festival] <defunct>
It didn't seem to be a problem, but odd nonetheless. The festival server process is:
root 15323 1 0 Mar05 ? 00:00:00 /usr/bin/festival --server -b /etc/festival/server.scm

Hmm, OK, I think that's because the server process doesn't wait in the loop anymore, meaning those zombie processes will be reaped after the next request. I guess the only real solution would be to rewrite the server loop that handles new incoming connections. Maybe I can get something working in the next couple of days.

Sorry, I should have clarified that the festival defunct process was with the unpatched release. I will make changes to the ebuild and patch it, and see if it helps. BTW, I saw some posts that you may be upgrading to the latest release (festival 1.9.5?). Looks like it has some very good new voices. Is that being posted to portage anytime soon, and should I wait for that to be released, along with the newest version of asterisk (1.2.5)? I had to unmask asterisk 1.2.4, but it seems to be working just fine.

Applied the patch to 1.4.3-r3, and set cache to yes. With asterisk/festival on the same host, I verified cache is working: it is creating files in the cache directory, and festival CPU usage appears low on repeat phrases. No loops or lockups as yet in my limited testing. I will try the festival network server config, and repeat the test. The <defunct> process is still there, but appears not to be a problem. The patch also solved another issue. When asterisk called festival, it would log a large number (50 - 100) of event messages with "utils.c negative timestamp error" for each festival playback. These messages have now stopped after applying the patch. Thanks!!

Created attachment 82032
reimplementation of the main loop based on select()
The new patch changes the main loop in the festival server to use a select() call with a timeout. After each serviced request or timeout, waitpid is called to clean up child tasks.
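
The general shape of such a loop can be sketched as follows. This is an illustration under assumptions, not the code from attachment 82032: the names poll_and_reap, server_loop and handle_client are hypothetical, and the 5 second timeout is arbitrary.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_sec seconds for fd to become
 * readable, then collect any children that have exited in the meantime.
 * Returns 1 if fd is readable, 0 on timeout, -1 on a select() error. */
int poll_and_reap(int fd, int timeout_sec)
{
    fd_set readfds;
    struct timeval tv;
    int ready;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    ready = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (ready < 0 && errno == EINTR)
        ready = 0;              /* a signal arrived: treat it like a timeout */

    /* Non-blocking cleanup: stops on 0 (nothing exited) or -1 (no children). */
    while (waitpid(-1, NULL, WNOHANG) > 0) {
    }

    return ready > 0 ? 1 : ready;
}

/* Hypothetical server loop: one child per connection, cleanup once per pass. */
void server_loop(int listen_fd, void (*handle_client)(int client_fd))
{
    for (;;) {
        int ready = poll_and_reap(listen_fd, 5);
        if (ready < 0)
            break;              /* unrecoverable select() error */
        if (ready == 0)
            continue;           /* timeout: children were collected, wait again */

        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd < 0) {
            if (errno == EINTR || errno == ECONNABORTED)
                continue;       /* transient: go back to waiting */
            break;              /* anything else: stop rather than spin */
        }
        if (fork() == 0) {      /* child: serve one request, then exit */
            handle_client(client_fd);
            _exit(0);
        }
        close(client_fd);       /* parent, or failed fork: drop its copy */
    }
}
```

Because select() returns at the latest when the timeout expires, the cleanup runs at a bounded interval even when no requests arrive, so zombie children do not accumulate between requests.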
Festival 1.95 is now in portage. Is this still happening?

All, is this still an issue with festival 1.95-beta?

I am closing this since I haven't heard whether this is continuing to be an issue with festival 1.95. |