Bug 481456

Summary:	www-servers/nginx - init.d script should not check configuration on reload/upgrade (upstream said)
Product:	Gentoo Linux	Reporter:	Thomas Deutschmann (RETIRED) <whissi>
Component:	Current packages	Assignee:	No maintainer - Look at https://wiki.gentoo.org/wiki/Project:Proxy_Maintainers if you want to take care of it <maintainer-needed>
Status:	CONFIRMED ---
Severity:	normal	CC:	bugs, dev-zero, proxy-maint
Priority:	Normal
Version:	unspecified
Hardware:	All
OS:	Linux
URL:	http://mailman.nginx.org/pipermail/nginx/2013-August/040193.html
Whiteboard:
Package list:		Runtime testing required:	---

Description Thomas Deutschmann (RETIRED) gentoo-dev

2013-08-17 12:44:59 UTC

Hi,

following the nginx mailing list today I saw $URL:

> [...]
> 
> I don't think that calling "nginx -t" as a mandatory step before 
> configuration reload is a good idea: nginx binary running and 
> nginx binary on disk might be different, and "nginx -t" result 
> might be incorrect because of this, in some cases rejecting valid 
> configurations.
> 
> Additionally, it does duplicate work by parsing/loading a 
> configuration which will be again parsed by a master process 
> during configuration reload.  While in most cases it's not 
> significant, I've seen configurations taking more than 1m to load 
> due to big geo module bases used.
> 
> [...]

In addition:
Our reload function just validates kill's return value. But the kill command may succeed, but nginx may refuse to load the new configuration. In the beginning of the same thread, upstream says:

> Reloading is enough.  What is very wrong is to assume that sending 
> a HUP signal to nginx is enough for a reload.  For various 
> reasons, ranging from configuration syntax errors to out of memory 
> problems, configuration reload might fail.
> 
> [...]

So for me it seems to be wrong to tell the user "Refreshing nginx was successful", because we cannot tell. A better message would be "Sending SIGHUP to nginx" using ebegin and when succeed, show an einfo or ewarn telling the user to watch logs for errors, due to nginx may have refused to reload, see http://mailman.nginx.org/pipermail/nginx/2013-August/040186.html



Reproducible: Always

Comment 1 Johan Bergström 2013-09-29 12:48:59 UTC

Sounds ok to me (if it only applies to reload/upgrade). Patch?

Comment 2 Johan Bergström 2014-07-11 00:03:54 UTC

(In reply to Johan Bergström from comment #1)
> Sounds ok to me (if it only applies to reload/upgrade). Patch?

Wouldn't it be more reasonable to change the nginx return value if the reload isn't ok (for instance, configuration syntax error)?

Comment 3 Thomas Deutschmann (RETIRED) gentoo-dev

2014-07-11 00:13:22 UTC

Which return value? We are sending a signal using "kill". So all we will get is a return value from "kill"... we never know (without checking nginx's error log) if HUP'ing was really successful.

Or am I missing something?

Comment 4 Tiziano Müller (RETIRED) gentoo-dev

2014-08-06 06:07:32 UTC

We changed it to use `nginx -t` at some point and evaluate this return value.
So I guess we can close this as FIXED.

Comment 5 Torbjörn Lönnemark 2016-03-04 07:02:10 UTC

This bug seems to have been mistakenly closed as FIXED. The argument was to *remove* a config check (using nginx -t), not to add it.

Assuming the arguments presented in the original report and the referenced bug thread still hold (and I'd say they do), the original proposed resolution looks good to me, at least for the 'reload' action.

That is:
> A better message would be "Sending SIGHUP to nginx" using
> ebegin and when succeed, show an einfo or ewarn telling the user to watch logs for
> errors, due to nginx may have refused to reload, see http://mailman.nginx.org/pipermail
> /nginx/2013-August/040186.html

Comment 6 Thomas Deutschmann (RETIRED) gentoo-dev

2016-03-04 13:24:34 UTC

So let's re-opening this bug for further discussions:

The problem:
============
Problem 1: `nginx -t` isn't reliable/can lead to unexpected results

The command will always use the binary from disk. This version don't have to match which the version of the running nginx master process. That means even if `nginx -t` succeeded (because your configuration is valid with the nginx version on disk) you could have an invalid configuration for the _running_ nginx version.

(and that's why https://bugs.gentoo.org/show_bug.cgi?id=481456#c4 wasn't a fix for this bug)

Problem 2: Duplicated time consuming work when dealing with large configurations

It is duplicated work because the nginx master process will do the same check when receiving the HUP signal.

Regarding problem 1:
We could record nginx' version in OpenRC (service_set_value) and only allow reload when the nginx version from disk matches the one running. Then we should never hit such a situation.

However we should indicate in the output that we only sent the command to reload but cannot tell if nginx is now really using the new configuration (only nginx can you tell). I.e. it is possible that nginx rejects reloading and keep using the old configuration from memory:

# grep `pgrep --uid 0 nginx` /var/log/nginx/error_log
2016/03/04 14:17:03 [emerg] 26354#26354: unknown directive "this_will_break_configuration" in /etc/nginx/nginx.conf:67

The only reliable way, which can be used in scripts/CFM tools, seems to be the "upgrade" command. Does anybody knows a reason why you shouldn't always use "upgrade"?

Regarding problem 2:
I am not sure if we need to address this problem. While I agree that this can double nginx startup time I prefer a response I can use in scripts.

But we could add "skip_cfgcheck_on_{reload,restart,start}" in /etc/conf.d/nginx to allow users to bypass these checks.