Symptoms: "/etc/init.d/apache2 restart" occasionally fails, mostly when the system is under load.

It can apparently be fixed by changing the following line in the script:

start-stop-daemon -o --quiet --stop --pidfile /var/run/apache2.pid

to:

start-stop-daemon --retry 15/60 -o --quiet --stop --pidfile /var/run/apache2.pid

I think this works because the original start-stop-daemon command doesn't wait for apache to actually finish. There is then a race between apache shutting down and the start half of the restart. For example:

Before adding --retry:

bash# /etc/init.d/apache2 stop ; ls /var/run/apache2.pid ; /etc/init.d/apache2 start
 * Stopping apache2... [ ok ]
/var/run/apache2.pid
 * Starting apache2... [ !! ]

After adding --retry:

bash# /etc/init.d/apache2 stop ; ls /var/run/apache2.pid ; /etc/init.d/apache2 start
 * Stopping apache2... [ ok ]
ls: /var/run/apache2.pid: No such file or directory
 * Starting apache2... [ ok ]

If this is a correct diagnosis, it might be worth adding a conf.d variable for the --retry timeout. I think this bug might also be present in some other runscripts...

Thanks,
Charlie
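The conf.d suggestion could look something like the sketch below. It is not the actual Gentoo runscript: ebegin/eend and the rest of the script are left out, APACHE2_STOP_TIMEOUT is a hypothetical variable name (not an existing setting), and the schedule uses the TERM-then-wait form (-15/60) that comes up later in this thread rather than the report's 15/60.

```shell
# Sketch only. APACHE2_STOP_TIMEOUT is a hypothetical conf.d variable,
# e.g. set in /etc/conf.d/apache2 as:
#   APACHE2_STOP_TIMEOUT=60

APACHE2_PIDFILE=/var/run/apache2.pid

apache2_stop() {
    # Fall back to 60 seconds when conf.d does not set a timeout.
    timeout="${APACHE2_STOP_TIMEOUT:-60}"

    # Schedule "-15/$timeout": send signal 15 (SIGTERM), then wait up to
    # $timeout seconds for the process named in the pidfile to exit.
    # Without --retry, start-stop-daemon returns right after signalling.
    start-stop-daemon --retry "-15/${timeout}" -o --quiet --stop \
        --pidfile "$APACHE2_PIDFILE"
}
```

Keeping the timeout in conf.d lets busy servers, which take longer to shut down, raise it without editing the init script.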
Whoops! There's a hidden bug here. The modification given in the bug report sends SIG_0 to the apache process, so the code is misleading (but luckily works). The following change would behave as it appears to, and send SIGTERM:

- start-stop-daemon -o --quiet --stop --pidfile /var/run/apache2.pid
+ start-stop-daemon --retry -15/60 -o --quiet --stop --pidfile /var/run/apache2.pid

But that might not be what is wanted: the daemon is actually signalled by apache2ctl, so here we want to send apache no signal at all. Perhaps it should be:

+ start-stop-daemon --retry -0/60 -o --quiet --stop --pidfile /var/run/apache2.pid

However, I can't find any reference to SIG_0 in the headers, and I don't know whether it is guaranteed to have no effect on apache. It seems logical, though... Needs input from a wise one.

Charlie
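The difference between 15/60, -15/60 and -0/60 comes from start-stop-daemon's --retry schedule syntax. Per its manual page, an item written as -signal sends that signal, a bare number is a timeout in seconds to wait for the process to exit, and forever repeats the rest of the schedule. So 15/60 is just two waits and sends no signal itself; start-stop-daemon presumably probes with the null signal while waiting, which would match the SIG_0 observation. On the SIG_0 question: signal 0 is POSIX's "null signal". kill(pid, 0) performs the existence and permission checks without delivering anything, and there is no SIG_0 constant in the headers. The helper below is hypothetical (not part of start-stop-daemon) and only spells out how a schedule string reads, item by item.

```shell
# Hypothetical helper (not part of start-stop-daemon): print how a
# slash-separated --retry schedule is interpreted, item by item.
# Only the multi-item form is handled; a single bare number is expanded
# by start-stop-daemon itself into signal/timeout/KILL/timeout.
describe_retry_schedule() {
    rest="$1"
    while [ -n "$rest" ]; do
        # Split off the first item at the next '/'.
        item="${rest%%/*}"
        if [ "$item" = "$rest" ]; then
            rest=""
        else
            rest="${rest#*/}"
        fi
        case "$item" in
            forever) echo "repeat the rest of the schedule forever" ;;
            -0)      echo "send signal 0 (null signal: existence check only)" ;;
            -*)      echo "send signal ${item#-}" ;;
            *)       echo "wait up to ${item}s for the process to exit" ;;
        esac
    done
}
```

For example, describe_retry_schedule 15/60 prints two "wait up to ..." lines, while -15/60 starts with "send signal 15".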
apache1 should get this fix too.
*** Bug 32366 has been marked as a duplicate of this bug. ***
I've modified the init.d script as of 2.0.48-r2. Please check it and see whether this still happens for you (I can't reproduce it).
*** Bug 39730 has been marked as a duplicate of this bug. ***
For me it looks like this works. I was using apache-2.0.49-r3 and it also crashed on every restart, or whenever apache was under heavy traffic (I think so, anyway; so far I haven't found a pattern to the crashes). Now I can restart it without a crash.
Feeling guilty that I haven't responded before. I am unable to make a restart fail with the newer version of the apache2 runscript (currently using 2.0.49). However, if I insert "ls /var/run/apache2.pid" after the start-stop-daemon line in the stop() function, I can see that under heavy load the file still exists (and therefore the apache process still exists) after the stop function has finished. I think the race condition may still exist, but is masked by the extra time checkconfig() takes to complete. What (I think) is needed is some way to poll for the end of the apache2 process, which is what I was trying to do with the modification in the original bug report.
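One way to do that polling, sketched under the assumption that the pidfile holds the PID of apache's parent process: treat a missing pidfile, or a PID that no longer answers kill -0, as "stopped", and otherwise sleep and check again until a timeout expires. This is roughly the job --retry does inside start-stop-daemon; wait_for_exit is a hypothetical helper name.

```shell
# Hypothetical helper: wait until the process named in a pidfile exits.
# Returns 0 once it is gone, 1 if it is still running after $2 seconds.
wait_for_exit() {
    pidfile="$1"
    timeout="${2:-60}"
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # No pidfile: the daemon removed it on exit (or never started).
        [ -f "$pidfile" ] || return 0
        pid=$(cat "$pidfile")
        # kill -0 sends the null signal, which only tests that the PID
        # exists. It also fails without permission, so run this as root.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}
```

In a stop() function this would run right after the stop signal is sent, e.g. wait_for_exit /var/run/apache2.pid 60, with stop() returning failure if it times out.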
An alternative method can be found in the squid init script. This hasn't happened to me either in a long, long time, so I think we can assume with relative confidence that the problem is gone. But then again, I've been using reload lately :).
Had a look at the squid runscript... I wrote something similar before I found the --retry schedule option to start-stop-daemon, which let a single extra option replace about 10 lines of my very inelegant shell scripting. Even better (I imagine) would be to use apache2ctl to restart the server, as suggested in Bug 39730, assuming that it does the right thing.

Anyway, I haven't had any recent problems with restart, but I suggest that a theoretical race condition still exists because stop() doesn't wait for the apache process to finish. Even if it doesn't cause restart() to fail, it might interfere with the runscript dependencies. Personally, if my web server were at all important, I'd prefer a race-free restart to minimise the risk of downtime. Equally, I'd like stop() to return only once the server has stopped... Lucky for me, I only run my server for fun!
I've had this problem too. I'm running about 15 virtual hosts on that machine. The workaround is to use reload rather than restart; you need restart rarely enough that this shouldn't be considered a real problem (basically only when you change /etc/conf.d/apache2). The server takes around 20 000 hits per day on the main virtual host. The other virtual hosts are less loaded, but at peak times a restart can take up to a minute or so (luckily I've only had to restart the service during peak times once or twice).
The attached patch seems to do the trick for me. The problem is that after apachectl signals apache to stop, it reports 'OK' while apache is only beginning to stop its processes, which takes a while. In other words, apachectl returns before the processes have actually stopped. My patch waits until no processes are owned by the user apache (which should work unless you have several apache installations).
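The patch's approach, waiting until no processes are owned by the apache user, might look roughly like the sketch below. It is not the attached patch itself: the function name and default timeout are hypothetical, and it assumes procps-style ps -u USER -o pid= output. As noted above, it would also wait on any unrelated processes running as that user.

```shell
# Hypothetical helper (not the attached patch): wait until no processes
# owned by USER remain. Returns 0 when none are left, 1 on timeout.
wait_for_user_procs() {
    user="$1"
    timeout="${2:-60}"
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # -o pid= prints one PID per line with no header; ps prints
        # nothing (and exits non-zero) when no process matches.
        if [ -z "$(ps -u "$user" -o pid= 2>/dev/null)" ]; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}
```

Usage would be along the lines of wait_for_user_procs apache 60 after apachectl has returned.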
Created attachment 43658: Patch for /etc/init.d/apache
Created attachment 48065: Another init patch for apache2, derived from /etc/init.d/squid

I had this problem too and have used this script for a long time. I think it is better for the patch to check with ps -C 'apache2'. I used grep before, but grep sometimes matched its own process, so the script kept waiting.
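Why grep can match itself: in a pipeline such as ps ax | grep apache2, ps may list the grep process, whose own command line contains "apache2", so the check can keep seeing a match. Selecting by command name with procps ps -C avoids this, because grep's command name is grep. A minimal sketch, assuming procps ps; count_procs and wait_for_no_procs are hypothetical helper names, and the bracket trick for grep is shown only as a comment.

```shell
# Hypothetical helper: count running processes whose command name is $1.
# "ps -C NAME" selects by command name, so (unlike "ps ax | grep NAME")
# the grep process itself is never counted.
count_procs() {
    ps -C "$1" -o pid= 2>/dev/null | wc -l | tr -d ' '
}

# Wait until no process named $1 remains; give up after $2 seconds.
wait_for_no_procs() {
    name="$1"
    timeout="${2:-60}"
    waited=0
    while [ "$(count_procs "$name")" -gt 0 ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# If only grep is available: the pattern [a]pache2 matches "apache2" but
# not grep's own command line, which contains "[a]pache2".
#   ps ax | grep -c '[a]pache2'
```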
I found that if apache is under very high load when it starts to terminate, there is still a race condition and it cannot exit quickly, especially with mpm_worker. Looking at how apache terminates: after 10 seconds it sends SIGTERM, and 10 seconds later it sends SIGKILL to any child processes that still haven't exited, so shutdown takes about half a minute. I think that is good reason to use this patch; otherwise restarting a heavily loaded server via the init script will fail.

[Mon Jan 10 18:44:39 2005] [warn] child process 14908 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:39 2005] [warn] child process 15093 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:39 2005] [warn] child process 15086 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:39 2005] [warn] child process 15240 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:39 2005] [warn] child process 15095 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:39 2005] [warn] child process 14916 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:39 2005] [warn] child process 15596 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:39 2005] [warn] child process 12973 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:39 2005] [warn] child process 14908 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:39 2005] [warn] child process 15093 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:39 2005] [warn] child process 15086 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:39 2005] [warn] child process 15240 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:39 2005] [warn] child process 15095 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:39 2005] [warn] child process 14916 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:39 2005] [warn] child process 12973 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:41 2005] [warn] child process 14908 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:41 2005] [warn] child process 15093 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:41 2005] [warn] child process 15086 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:41 2005] [warn] child process 15240 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:41 2005] [warn] child process 15095 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:41 2005] [warn] child process 14916 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:41 2005] [warn] child process 12973 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:45 2005] [error] child process 14908 still did not exit, sending a SIGKILL
[Mon Jan 10 18:44:45 2005] [error] child process 15093 still did not exit, sending a SIGKILL
[Mon Jan 10 18:44:45 2005] [error] child process 15086 still did not exit, sending a SIGKILL
[Mon Jan 10 18:44:45 2005] [error] child process 15240 still did not exit, sending a SIGKILL
[Mon Jan 10 18:44:45 2005] [error] child process 15095 still did not exit, sending a SIGKILL
[Mon Jan 10 18:44:45 2005] [error] child process 14916 still did not exit, sending a SIGKILL
[Mon Jan 10 18:44:45 2005] [error] child process 12973 still did not exit, sending a SIGKILL
[Mon Jan 10 18:45:02 2005] [notice] caught SIGTERM, shutting down

I have already tested this on a production server running near capacity (~150 of 200 MaxClients), mostly serving a PHP web board. System load is typically 20-25 during peak periods. I also found some other random, unexpected segfaults with mpm_worker, which I am trying to investigate.
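To check whether a given shutdown went through the escalation shown in this log, one could count the distinct child PIDs that were sent SIGTERM and SIGKILL. A rough sketch, assuming the error_log wording quoted above ("child process N still did not exit, sending a SIGTERM" / "... SIGKILL") with one entry per line; shutdown_escalations is a hypothetical helper name.

```shell
# Hypothetical helper: given an apache error_log, report how many distinct
# child PIDs had to be sent SIGTERM and SIGKILL during shutdown.
# Assumes lines of the form:
#   [date] [warn] child process 14908 still did not exit, sending a SIGTERM
shutdown_escalations() {
    log="$1"
    for sig in SIGTERM SIGKILL; do
        # Pull the PID out of each matching line, then count unique PIDs.
        n=$(sed -n "s/.*child process \([0-9][0-9]*\) still did not exit, sending a $sig\$/\1/p" "$log" |
            sort -u | wc -l | tr -d ' ')
        echo "$sig: $n"
    done
}
```

Fed the log excerpt above, one entry per line, it should print SIGTERM: 8 and SIGKILL: 7.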
The new init script in the (currently) hard-masked ebuild uses apachectl. Can anyone test whether the errors still occur?

Upgrade Instructions:
---------------------

After we have refreshed the packages on 8th Jan, you will need to do the following to upgrade. If you do not want to install masked/unstable packages on your machine(s), wait: these ebuilds will be unmasked and marked stable as soon as we have determined that everything is working properly.

- unmask all needed packages (using /etc/portage/package.unmask and /etc/portage/package.keywords)
- emerge -uav world

With this update, we are bringing some changes to the Apache configuration:

- /etc/apache{,2}/conf is moving to /etc/apache{,2}
- a new httpd.conf replaces the commonapache{,2}.conf and apache{,2}.conf files
- /etc/apache{,2}/conf/vhosts is moving to /etc/apache{,2}/vhosts.d

After installing this update, you will need to manually migrate any changes you've made to your existing configuration files into the new configuration files. See http://dev.gentoo.org/~vericgar/package-refresh.txt
I still sometimes get restart failures with 2.0.52-r3, though I don't know whether they're caused by the init script or by the mpm compile function not working cleanly...
If this "fix" is what is currently in stable, then it's not fixed ... I still regularly need to zap/start after a failed restart.
I'm using apache-2.2.6-r2 and still have to zap/start after a failed restart. It's quite annoying. I've been testing apache2-init.patch for a couple of days and haven't had any problems, so I'd put my two cents in for adding it to the apache ebuild.