Bug 125157

Summary:	asterisk 1.2.4 causes festival 1.4.3-r3 to go into infinite loop and lockup system
Product:	Gentoo Linux	Reporter:	David Fannin <dfannin>
Component:	Current packages	Assignee:	Gentoo Accessibility Team <accessibility>
Status:	RESOLVED NEEDINFO
Severity:	critical	CC:	rajiv, stkn
Priority:	High
Version:	2005.1
Hardware:	x86
OS:	Linux
Whiteboard:
Package list:		Runtime testing required:	---
Attachments:	Festival.conf; patch for festival-1.4.3 to fix (possible) endless loop in case of a configuration error; reimplementation of the main loop based on select()

Description David Fannin 2006-03-05 18:02:47 UTC

Summary of Bug:
With asterisk-1.2.4 using festival-1.4.3-r3, when asterisk is configured to cache festival speech files, festival goes into an infinite loop. During operation (i.e. a request to the server for text-to-speech translation), festival eventually uses 90%+ of system CPU and causes a system lockup.

How to create the problem:

1) configure "festival.conf" (in the /etc/asterisk directory) and
set "usecache=yes".
2) set up an asterisk extension to use festival (extensions.conf):

exten => s,1, Festival('Testing Festival Speech')
exten => s,2, Festival('All work and no play make jack a dull boy')
exten => s,3, Festival('All of you be damned, we cant have heaven crammed')
exten => s,4, Festival('Rock and Roll will never die')

3) Activate extension.  Sometimes requires a repeat.

Reproducible:
Yes - problem reproduced on two Gentoo Systems (AMD Sempron 3100+ and Intel P4 2.8G).   Deactivating the cache feature eliminates the problem.

emerge info:
Portage 2.0.54 (default-linux/x86/2005.1, gcc-3.4.4, glibc-2.3.5-r2, 2.6.15-gentoo-r1ast01 i686)
=================================================================
System uname: 2.6.15-gentoo-r1ast01 i686 AMD Sempron(tm) Processor 3100+
Gentoo Base System version 1.6.14
distcc 2.18.3 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [enabled]
ccache version 2.3 [enabled]
dev-lang/python:     2.3.5-r2, 2.4.2
sys-apps/sandbox:    1.2.12
sys-devel/autoconf:  2.13, 2.59-r6
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r1
sys-devel/binutils:  2.16.1
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.11-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=athlon-xp -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3.4/env /usr/kde/3.4/share/config /usr/kde/3.4/shutdown /usr/kde/3/share/config /usr/lib/X11/xkb /usr/lib/mozilla/defaults/pref /usr/share/config /usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/ /var/bind /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O2 -march=athlon-xp -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig ccache distcc distlocks sandbox sfperms strict"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo"
MAKEOPTS="-j8"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="x86 16bit 3dnow 3dnowext 7zip X a52 aac aalib acpi aim alsa amd apache2 apm arts asf asterisk async audiofile automount avi berkdb bitmap-fonts bonobo bootsplash bzip2 ccache cdparanoia cdr clamav clamd cli command-args cpdflib cpudetection crypt css cups curl dba dbus dga divx4linux dts dv dvd dvdr dvdread edl eds emboss encode enscript esd exif expat extensions fam fame fastcgi fat ffmpeg flac font-server fontconfig foomaticdb fortran fping fpx ftp gcj gd gdbm geometry gif gimpprint glut glx gnome gnome-print gphoto2 gpm gs gstreamer gtk gtk2 gtkhtml guile hal icecast idn ieee1394 imagemagick imap imlib innodb inode iodbc jack jai java java-internal javamail javascript jbig jce jp2 jpeg jpeg2k junit kde kdexdeltas lame largeterminal lcms ldap libcaca libg++ libgd libvisual libwww live lm_sensors lzo lzw mad mbox mcal mhash mikmod milter mime ming mjpeg mmap mmx mmxext mng motif moznocompose moznoirc moznomail mp3 mp4live mpeg mpeg2 mpeg4 mpi mplayer mpm-worker mysql mysqli nagios-dns nagios-ntp nagios-ping nagios-ssh nas ncurses netpbm network nfs nls nptl ntfs objc odbc ofx ogg oggvorbis on-the-fly-crypt openal opengl oss pam pcre pdflib pear perl php png postgres ppds python qt quicktime rar readline real recode rtc samba sasl sdl session sharedext sharedmem silc slang sndfile sockets speex spell sql sqlite sse sse2 ssl subtitles subversion tcltk tcpd tetex threads tidy tiff tokenizer transcode truetype truetype-fonts type1-fonts udev unicode usb v4l v4l2 vcd vcdimager vidix vim vim-pager vim-with-x vmdbmysql vorbis win32codecs wmf xine xml xml2 xmlrpc xmms xosd xpm xsl xslt xv xvid xvmc yv12 zapras zip zlib zvbi userland_GNU kernel_linux elibc_glibc"
Unset:  ASFLAGS, CTARGET, LANG, LC_ALL, LDFLAGS, LINGUAS

Comment 1 Stefan Knoblich (RETIRED) gentoo-dev

2006-03-06 12:22:36 UTC

trying to reproduce the problem

Comment 2 Stefan Knoblich (RETIRED) gentoo-dev

2006-03-06 14:35:21 UTC

please attach your festival.conf

Comment 3 David Fannin 2006-03-06 14:52:47 UTC

Created attachment 81569 [details]
Festival.conf 

Setting the "usecache=yes" creates the problem for me.

Comment 4 David Fannin 2006-03-06 14:55:59 UTC

additional configuration info:

asterisk use flags:
+alsa -bri +curl -debug +doc +gtk -h323 -hardened -lowmem +mmx +mysql -nosamples
+odbc -osp -postgres -pri +speex +sqlite +ssl -ukcid +zaptel

festival use flags
+asterisk -doc

Comment 5 Stefan Knoblich (RETIRED) gentoo-dev

2006-03-06 15:59:22 UTC

does /tmp/asterisk/cache exist?

what are the permissions on that directory?

Comment 6 David Fannin 2006-03-06 16:06:30 UTC

/tmp:
drwxrwxrwt  31 root root 4096 Mar  6 15:50 /tmp

/tmp/asterisk:
drwxr-xr-x  3 asterisk asterisk 4096 Mar  4 23:36 /tmp/asterisk

/tmp/asterisk/cache:
drwxr-xr-x  2 asterisk asterisk 4096 Mar  4 23:53 /tmp/asterisk/cache

Comment 7 Stefan Knoblich (RETIRED) gentoo-dev

2006-03-06 17:03:16 UTC

Created attachment 81571 [details, diff]
patch for festival-1.4.3 to fix (possible) endless loop in case of a configuration error

the lockup is caused by missing error handling in the festival server code.

the server process spawns a new child process for every request and uses waitpid
(in a loop) to wait for child processes that have finished.

in case of an error waitpid will return -1 (e.g. if one of the children dies prematurely) and the main process will end up stuck in the waitpid loop, consuming 100% of cpu time.

Attached patch adds some error handling to avoid that.
However one issue remains: dead children may end up as zombie processes
and are not removed by the server process. I have no idea how to avoid this at the moment, but that situation is still better than before.
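
For illustration only (this is not the attached patch, and the function name reap_children is hypothetical), a minimal sketch of a waitpid loop that treats a -1 return as a reason to stop instead of retrying forever. It assumes a POSIX system and uses WNOHANG so the caller never blocks:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: reap every child that has already exited,
 * without blocking. Returns the number of children reaped, or -1 on
 * an unexpected error. */
int reap_children(void)
{
    int reaped = 0;

    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, WNOHANG);

        if (pid > 0) {          /* one child collected, look for more */
            reaped++;
            continue;
        }
        if (pid == 0)           /* children exist, none has exited yet */
            return reaped;
        if (errno == EINTR)     /* interrupted by a signal: retry */
            continue;
        if (errno == ECHILD)    /* no children left: not an error */
            return reaped;
        return -1;              /* anything else: report it, do not spin */
    }
}
```

The point of the sketch is the two exits on -1: ECHILD simply means there is nothing to wait for, and any other error is returned to the caller rather than looped on.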

Comment 8 Stefan Knoblich (RETIRED) gentoo-dev

2006-03-06 17:05:54 UTC

is ;(voice_us1_mbrola) enabled in your /etc/festival/server.scm file?

if yes, is mbrola installed?

Comment 9 David Fannin 2006-03-06 17:37:11 UTC

mbrola is installed, but not used (commented out in the config file).  I was using the default voice.

interesting info on the patch.   Running in non-cache mode, I am getting the following process hanging around:
root     21736 15323  0 16:42 ?        00:00:00 [festival] <defunct>

didn't seem to be a problem, but odd, nonetheless

the festival server process is:
root     15323     1  0 Mar05 ?        00:00:00 /usr/bin/festival --server -b /etc/festival/server.scm

Comment 10 Stefan Knoblich (RETIRED) gentoo-dev

2006-03-06 18:21:53 UTC

hmm ok, i think that's because the server process doesn't wait in the loop
anymore, meaning those zombie processes will be reaped after the next
request.

i guess the only real solution would be to rewrite the server loop that handles new incoming connections. maybe i can get something working in the next couple of
days.

Comment 11 David Fannin 2006-03-06 20:10:10 UTC

Sorry, I should have clarified that the festival defunct process was with the unpatched release.  I will make changes to the ebuild and patch it, and see if it helps.

BTW - I saw some posts that you may be upgrading to the latest release (festival 1.9.5?). Looks like it has some very good new voices. Is that being posted to portage anytime soon, and should I wait for that to be released, along with the newest version of asterisk (1.2.5)?

I had to unmask asterisk 1.2.4,  but it seems to be working just fine.

Comment 12 David Fannin 2006-03-06 20:41:06 UTC

applied the patch to 1.4.3-r3, and set cache to yes.

With asterisk/festival on the same host, I verified the cache is working - it is creating files in the cache directory, and festival cpu usage appears low on repeat phrases. No loops or lockups as yet in my limited testing.  I will try the festival network server config, and repeat the test.  The <defunct> process is still there, but appears not to be a problem.

The patch also solved another issue.  When asterisk called festival, it would log a large number (50 - 100) of event messages with "utils.c negative timestamp error" for each festival playback.   These messages have now stopped after applying the patch.

Thanks!!

Comment 13 Stefan Knoblich (RETIRED) gentoo-dev

2006-03-13 13:07:29 UTC

Created attachment 82032 [details, diff]
reimplementation of the main loop based on select()

new patch changes the main loop in the festival server to use a select call with a timeout. after each serviced request / timeout, waitpid is called to clean up child tasks.
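
A rough sketch of that loop structure, for readers without the attachment. This is not the patch itself: serve_once, handle_client and cleanup_children are hypothetical names, the per-request work is stubbed out, and a POSIX system is assumed. One iteration waits on the listening socket with a timeout, forks a child for an incoming connection, and reaps finished children either way:

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the per-request work done in the child. */
static void handle_client(int client_fd)
{
    close(client_fd);
}

/* Collect all children that have already exited, without blocking. */
static void cleanup_children(void)
{
    int status;
    while (waitpid(-1, &status, WNOHANG) > 0)
        ;
}

/* One iteration of the server loop: wait up to timeout_sec for a
 * connection on listen_fd, fork a child to serve it, then reap.
 * Returns 1 if a child was started, 0 if nothing was served,
 * -1 on a select() error other than EINTR. */
int serve_once(int listen_fd, int timeout_sec)
{
    fd_set readfds;
    struct timeval tv;
    int served = 0;

    FD_ZERO(&readfds);
    FD_SET(listen_fd, &readfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    int ready = select(listen_fd + 1, &readfds, NULL, NULL, &tv);
    if (ready < 0 && errno != EINTR)
        return -1;

    if (ready > 0 && FD_ISSET(listen_fd, &readfds)) {
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd >= 0) {
            pid_t pid = fork();
            if (pid == 0) {             /* child: serve and exit */
                close(listen_fd);
                handle_client(client_fd);
                _exit(0);
            }
            close(client_fd);           /* parent keeps only listen_fd */
            if (pid > 0)
                served = 1;
        }
    }

    cleanup_children();                 /* runs after a request or a timeout */
    return served;
}
```

A server would call serve_once in a plain loop. Because the timeout bounds each wait, exited children are collected periodically even when no new request arrives.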

Comment 14 William Hubbs gentoo-dev

2006-09-30 15:27:11 UTC

Festival 1.95 is now in portage.
Is this still happening?

Comment 15 William Hubbs gentoo-dev

2006-11-04 13:06:46 UTC

All, is this still an issue with festival 1.95-beta?

Comment 16 William Hubbs gentoo-dev

2007-03-04 18:22:41 UTC

I am closing this since I haven't heard whether this is continuing to be an issue with festival 1.95.