29932 – /etc/init.d/apache{1,2} restart failures

Status:	RESOLVED TEST-REQUEST

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	Current packages
Hardware:	All Linux

Importance:	High normal
Assignee:	Apache Team - Bugzilla Reports

Duplicates (2):	32366 39730
Blocks:	37039

Reported:	2003-09-29 10:45 UTC by phceac
Modified:	2007-11-13 19:44 UTC
CC List:	5 users

Attachments
Patch for /etc/init.d/apache (apa, 741 bytes, patch) 2004-11-10 07:35 UTC, Ole Tange
Another init patch for apache2, derived from /etc/init.d/squid (apache2-init.patch, 913 bytes, patch) 2005-01-09 15:46 UTC, Phattanon Duangdara

Description phceac 2003-09-29 10:45:50 UTC

Symptoms: occasional failure of "/etc/init.d/apache2 restart", mostly when the system is under load.

Can apparently be fixed by changing the following line in the file:
   start-stop-daemon  -o --quiet --stop --pidfile /var/run/apache2.pid
to
  start-stop-daemon --retry 15/60 -o --quiet --stop --pidfile /var/run/apache2.pid

I think this works because the original start-stop-daemon command doesn't wait for apache to actually finish. There is then a race between apache stopping and the start that follows during a restart.
For example:
Before adding --retry
 bash#  /etc/init.d/apache2 stop ; ls /var/run/apache2.pid ; /etc/init.d/apache2 start
 * Stopping apache2...                                [ ok ]
/var/run/apache2.pid
 * Starting apache2...                                [ !! ]

After adding --retry
 bash#  /etc/init.d/apache2 stop ; ls /var/run/apache2.pid ; /etc/init.d/apache2 start
 * Stopping apache2...                               [ ok ]
ls: /var/run/apache2.pid: No such file or directory
 * Starting apache2...                               [ ok ]



If this is the correct diagnosis, it might be worth adding a conf.d variable for the timeout passed to --retry.
I think this bug might be present in some other runscripts as well...

Thanks, 
Charlie
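
The conf.d timeout suggested above could look roughly like the sketch below. The variable name APACHE2_STOP_TIMEOUT and the helper stop_timeout are hypothetical and not part of any shipped runscript; the 60-second default simply mirrors the value used in the report.

```shell
# Hypothetical entry in /etc/conf.d/apache2 (the name is an assumption):
#   APACHE2_STOP_TIMEOUT=90

# Print the stop timeout in seconds, falling back to 60 when the
# variable is unset, empty, or not a plain non-negative integer.
stop_timeout() {
    case "${APACHE2_STOP_TIMEOUT:-}" in
        ''|*[!0-9]*) echo 60 ;;
        *)           echo "$APACHE2_STOP_TIMEOUT" ;;
    esac
}

# Possible use inside stop(), keeping the schedule from the report:
#   start-stop-daemon --retry "15/$(stop_timeout)" -o --quiet --stop \
#       --pidfile /var/run/apache2.pid
```

Validating the value in the runscript means a typo in conf.d degrades to the default rather than passing a malformed schedule to start-stop-daemon.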

Comment 1 phceac 2003-10-14 04:35:16 UTC

Whoops! There's a hidden bug here. The modification given in the bug report
sends SIG_0 to the apache process, so the code is misleading (but luckily
works). The following change would behave as it appears, and send SIGTERM:
-  start-stop-daemon -o --quiet --stop --pidfile /var/run/apache2.pid
+  start-stop-daemon --retry -15/60 -o --quiet --stop --pidfile /var/run/apache2.pid

But that might not be what is wanted, as the daemon is actually signalled
by apache2ctl, so we actually want to send no signal to apache. Perhaps it
should be:
+  start-stop-daemon --retry -0/60 -o --quiet --stop --pidfile /var/run/apache2.pid
However, I can't find any reference to SIG_0 in the headers, and I don't
know if it is guaranteed to have no effect on apache. Seems logical though...
Needs input from a wise one.
Charlie
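
On the SIG_0 question: POSIX defines signal number 0 for kill as a pure existence and permission check. Nothing is delivered to the target process, which is also why no SIG_0 constant appears in the headers. A minimal sketch of that check follows; pid_alive is a hypothetical helper name, and how start-stop-daemon itself treats -0 inside a --retry schedule is not confirmed here.

```shell
# Return 0 if a process with the given PID exists and may be signalled
# by the caller. Signal 0 only performs the check; no signal is sent,
# so the target process is not affected.
pid_alive() {
    kill -0 "$1" 2>/dev/null
}

# Possible use:
#   if pid_alive "$(cat /var/run/apache2.pid)"; then
#       echo "apache2 is still running"
#   fi
```

Note that the check also fails when the caller lacks permission to signal the process, so it is most reliable when run as root, as init scripts are.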

Comment 2 Martin Holzer (RETIRED) gentoo-dev 2003-10-22 12:47:15 UTC

apache1 should do this too

Comment 3 Martin Holzer (RETIRED) gentoo-dev 2003-10-31 03:36:32 UTC

*** Bug 32366 has been marked as a duplicate of this bug. ***

Comment 4 Robin Johnson archtester 2004-01-17 03:54:13 UTC

I've modified the init.d script as of 2.0.48-r2.
Please check it and see if this still happens for you (I can't reproduce it).

Comment 5 Martin Holzer (RETIRED) gentoo-dev 2004-01-29 00:12:08 UTC

*** Bug 39730 has been marked as a duplicate of this bug. ***

Comment 6 Joerg Flade 2004-06-09 01:02:45 UTC

For me it looks like this works.
I was using apache-2.0.49-r3 and it also crashed at every restart, or when apache was under heavy traffic (I think so, at least; so far I've found no pattern to the crashes).
Now I can restart it without a crash.

Comment 7 phceac 2004-06-10 08:59:10 UTC

Feeling guilty that I haven't responded before.

I am unable to make a restart fail with the newer version of the apache2 runscript (currently using 2.0.49).

However, if I insert a "ls /var/run/apache2.pid" after the start-stop-daemon line in the stop() function, then I can see that the file still exists and therefore the apache process still exists, under heavy load, after the end of the stop function.

I think that the race condition may still exist, but is masked by the extra time taken for the checkconfig() to complete. What (I think) is needed is some way to poll for the end of the apache2 process, which is what I was trying to do with the modification in the original bug report.
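
The polling described above could be sketched as follows. This is only an illustration under stated assumptions: wait_for_pid_exit is a hypothetical helper, the one-second interval is arbitrary, and the existence check relies on signal 0, which sends nothing to the process.

```shell
# wait_for_pid_exit PID TIMEOUT
# Poll once per second until PID no longer exists.
# Returns 0 once the process is gone, 1 if TIMEOUT seconds pass first.
wait_for_pid_exit() {
    wfpe_pid=$1
    wfpe_left=$2
    while kill -0 "$wfpe_pid" 2>/dev/null; do
        if [ "$wfpe_left" -le 0 ]; then
            return 1
        fi
        sleep 1
        wfpe_left=$((wfpe_left - 1))
    done
    return 0
}

# Possible use in stop(): read the PID before asking apache to stop,
# because the pidfile is removed when the server exits.
#   pid=$(cat /var/run/apache2.pid)
#   ... existing stop command ...
#   wait_for_pid_exit "$pid" 60 || eerror "apache2 did not stop in time"
```

With something like this at the end of stop(), the function would return only after the parent apache process has gone, so a following start would no longer race with the shutdown.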

Comment 8 Jaco Kroon 2004-06-11 01:05:14 UTC

An alternative method may be found in the squid init script.  This hasn't happened to me either in a long, long time now, so I think we can, with relative confidence, assume that the problem is gone.

But then again, I've been using reload of late :).

Comment 9 phceac 2004-06-11 05:36:45 UTC

Had a look at the squid runscript... I wrote something similar, before I found the --retry schedule option to start-stop-daemon. That allowed a single additional option to replace about 10 lines of my very inelegant shell-scripting. 
Even better (I imagine) to use apache2ctl to restart the server, as suggested in Bug 39730, assuming that it does the right thing.

Anyway, I have not had any recent problems with restart, but I suggest that a theoretical race condition still exists because stop() doesn't wait for the apache process to finish. Even if it doesn't cause restart() to fail, perhaps it would interfere with the runscript dependencies.  Personally, if my web server was at all important, I'd prefer to have a non-racing restart, to minimise risk of downtime. Equally, I'd like to have stop() only return when the server has stopped... 
Lucky for me, I only run my server for fun!

Comment 10 Jaco Kroon 2004-09-23 13:22:46 UTC

I have had problems.  I'm running probably 15 or so virtual hosts on the particular machine.  The solution is to use reload rather than restart; the cases where you do want restart are not frequent enough for this to be considered a real problem (basically when you change /etc/conf.d/apache2).  The particular server takes around 20 000 hits per day on the main virtual container; the other containers are less loaded, but at peak times a restart can take up to a minute or so (luckily I've only once or twice *had* to restart the service during peak times).

Comment 11 Ole Tange 2004-11-10 07:33:59 UTC

The attached patch seems to do the trick for me. The problem is that after
signalling apache to stop, apachectl says 'OK' and begins to stop the
processes. After a while they are all stopped. However, apachectl returns
before the processes are actually stopped.

My patch waits until no processes are owned by the user apache (which should
work unless you have several apache installations).

Comment 12 Ole Tange 2004-11-10 07:35:33 UTC

Created attachment 43658
Patch for /etc/init.d/apache

Comment 13 Phattanon Duangdara 2005-01-09 15:46:26 UTC

Created attachment 48065
Another init patch for apache2, derived from /etc/init.d/squid

I had this problem too, and I have been using this script for a long time.

I think it is better for this patch to use ps -C 'apache2' for the check. I used
grep before, but grep sometimes matched its own process and kept waiting.
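
The self-match mentioned above happens because a pipeline such as "ps ax | grep apache2" also lists the grep process, whose argument contains the pattern. One hedged way around it, besides ps -C, is to compare only the command-name column instead of searching whole lines. The helper name count_named is hypothetical, and the sketch assumes a listing of "PID COMMAND" pairs on stdin.

```shell
# count_named NAME
# Read "PID COMMAND" lines on stdin and print how many have a command
# name exactly equal to NAME. A process called "grep" is never counted
# for NAME=apache2, whatever its arguments are.
count_named() {
    awk -v name="$1" '$2 == name { n++ } END { print n + 0 }'
}

# Possible use (prints the number of running apache2 processes):
#   ps -e -o pid= -o comm= | count_named apache2
```

A wait loop can then keep polling until the count reaches zero, without ever being kept alive by its own grep.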

Comment 14 Phattanon Duangdara 2005-01-10 03:53:27 UTC

I found that when apache is under very high load and starts to terminate, there is still a race condition and it cannot exit in a short time, especially with mpm_worker.

I investigated how apache terminates: after 10 seconds it sends SIGTERM, and 10 seconds later it issues SIGKILL to the remaining unterminated child processes; the whole thing takes about half a minute.

I think there is good reason to use this patch; otherwise restarting via the init script on a heavily loaded server will fail.

[Mon Jan 10 18:44:39 2005] [warn] child process 14908 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:39 2005] [warn] child process 15093 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:39 2005] [warn] child process 15086 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:39 2005] [warn] child process 15240 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:39 2005] [warn] child process 15095 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:39 2005] [warn] child process 14916 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:39 2005] [warn] child process 15596 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:39 2005] [warn] child process 12973 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:39 2005] [warn] child process 14908 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:39 2005] [warn] child process 15093 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:39 2005] [warn] child process 15086 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:39 2005] [warn] child process 15240 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:39 2005] [warn] child process 15095 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:39 2005] [warn] child process 14916 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:39 2005] [warn] child process 12973 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:41 2005] [warn] child process 14908 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:41 2005] [warn] child process 15093 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:41 2005] [warn] child process 15086 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:41 2005] [warn] child process 15240 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:41 2005] [warn] child process 15095 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:41 2005] [warn] child process 14916 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:41 2005] [warn] child process 12973 still did not exit, sending a SIGTERM
[Mon Jan 10 18:44:45 2005] [error] child process 14908 still did not exit, sending a SIGKILL
[Mon Jan 10 18:44:45 2005] [error] child process 15093 still did not exit, sending a SIGKILL
[Mon Jan 10 18:44:45 2005] [error] child process 15086 still did not exit, sending a SIGKILL
[Mon Jan 10 18:44:45 2005] [error] child process 15240 still did not exit, sending a SIGKILL
[Mon Jan 10 18:44:45 2005] [error] child process 15095 still did not exit, sending a SIGKILL
[Mon Jan 10 18:44:45 2005] [error] child process 14916 still did not exit, sending a SIGKILL
[Mon Jan 10 18:44:45 2005] [error] child process 12973 still did not exit, sending a SIGKILL
[Mon Jan 10 18:45:02 2005] [notice] caught SIGTERM, shutting down

I have already tested this on a production server running at about 150 of 200 MaxClients, mostly serving a PHP web board. System load is typically 20-25 during peak periods.

I have also seen some other random, unexpected segfaults with mpm_worker and am trying to investigate them.

Comment 15 Benedikt Böhm (RETIRED) gentoo-dev 2005-01-22 00:44:24 UTC

The new init script in the (currently) hard-masked ebuild uses apachectl. Can anyone test whether the errors still occur?

Upgrade Instructions:
---------------------

After we have refreshed the packages on 8th Jan, to upgrade you will need to
do the following.

If you do not want to install masked/unstable packages on your machine(s),
these ebuilds will be unmasked and marked stable as soon as we have
determined that everything is working properly.

- unmask all needed packages (using /etc/portage/package.unmask and
  /etc/portage/package.keywords)
- emerge -uav world

With this update, we are bringing some changes to the Apache configuration:

- /etc/apache{,2}/conf is moving to be /etc/apache{,2}
- new httpd.conf replaces commonapache{,2}.conf and apache{,2}.conf files
- /etc/apache{,2}/conf/vhosts is moving to be /etc/apache{,2}/vhosts.d

After installing this update, you will need to manually migrate any changes
you've made to your existing configuration files into the new configuration
files.

See http://dev.gentoo.org/~vericgar/package-refresh.txt

Comment 16 Benedikt Böhm (RETIRED) gentoo-dev 2005-01-30 13:04:44 UTC

I still get restart failures with 2.0.52-r3 sometimes, though I don't know if it's caused by the init script or by the mpm compile function not working cleanly...

Comment 17 Jaco Kroon 2007-02-26 08:17:24 UTC

If this "fix" is what is currently in stable, then it's not fixed ... I still regularly need to zap/start after a failed restart.

Comment 18 James Johnson 2007-11-13 19:44:35 UTC

I'm using apache-2.2.6-r2 and still have to zap/start after a failed restart.  It's quite annoying.  I have been testing the apache2-init.patch for a couple of days and haven't had any problems.  I would put my two cents in for adding this to the apache ebuild.