Subject: [ANNOUNCE] haproxy-1.3.15.2 and 1.3.14.6 (major bug)
Date: Sat, 21 Jun 2008 23:11:52 +0200
Sender: Willy Tarreau

Hi all!

Alexander Staubo has run a benchmark on haproxy+mongrel during which he noticed an anomaly in the response time distribution when running with maxconn=1:

http://affectioncode.wordpress.com/2008/06/11/comparing-nginx-and-haproxy-for-web-applications/

My first analysis was that the problem was caused by "direct" requests (those with a server cookie) always being considered before the load-balanced ones. But while fixing this design idiocy, I discovered a real problem: it was perfectly possible for a fresh new request to be served immediately without passing through the queue, causing requests in the queue to be delayed for at least as long as the queue timeout, until they might eventually expire. Now *that* explains the horrible peaks on Alexander's graphs.

My problem was that this was a real misdesign, which could not be fixed by a three-line patch. So I spent the whole week reworking the queue management logic in a saner manner and running regression tests. I have backported the fix to both 1.3.15 and 1.3.14, carefully testing both of them. Since the logic is cleaner and clearer now, and given the time I have spent on this, I am quite confident that there is no regression. But I will not lie to you, it is a big patch so you have to apply it with care. Especially distro maintainers should wait at least 1 or 2 weeks before upgrading, "just in case", but they should upgrade eventually because their users are affected.

The good news is that not only does this fix a number of 503 errors and long response times when running with a low maxconn, but as an added bonus, the "redispatch" option is now naturally honoured when a server's maxqueue is reached, so it is no longer necessary to trade off between large queues and the risk of returning 503 errors.

I believe the people most affected are those running Ruby on Rails, because they often set maxconn to 1 on the servers, which increases the risk of the problem occurring. Those people should observe a notable improvement. Note that the measured response time among valid responses may increase, because all requests are now actually served; if so, it means the previous figure was lower only because some requests never reached the server and therefore took no time there. On the other hand, the new code requires less CPU power and fewer task wakeups than before, so users of high-traffic sites may notice slightly lower CPU usage.

Last, my friend Benoit has set up a reverse proxy cache on our dedicated server, so I have updated the DNS record for haproxy.1wt.eu to point to flx01.formilux.org. It should be faster to get updates now :-) Obviously, if you notice anything strange, please tell me. The cache is configured to keep objects for 24 hours by default, but you can force a reload if in doubt.

Please find updates here:

http://haproxy.1wt.eu/download/1.3/src/
http://haproxy.1wt.eu/download/1.3/bin/

Regards,
Willy

Reproducible: Always

Steps to Reproduce:
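For illustration, a minimal configuration of the affected kind (a sketch only; the proxy name, addresses, ports, and timeout values are placeholders, not taken from the report): each Mongrel backend is limited to one concurrent request with "maxconn 1", so excess requests wait in haproxy's queue, and "maxqueue" together with "option redispatch" bounds how long they can sit on a given server's queue.

    listen rails_app 0.0.0.0:8000
        mode http
        balance roundrobin
        # With the fixed versions, a queued request is re-dispatched to
        # another server instead of getting a 503 once this server's
        # maxqueue limit is reached (per the announcement above).
        option redispatch
        retries 3
        contimeout 5000
        clitimeout 30000
        srvtimeout 30000
        # One in-flight request per Mongrel (the Rails case described
        # above); up to 10 further requests may wait per server.
        server mongrel1 127.0.0.1:3001 maxconn 1 maxqueue 10 check
        server mongrel2 127.0.0.1:3002 maxconn 1 maxqueue 10 check

With the pre-fix queue logic, a fresh request arriving at a setup like this could be served immediately while already-queued requests kept waiting, which is exactly the latency-spike pattern visible in Alexander's benchmark.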
(In reply to comment #0)
> But I will not lie to you, it is a big patch so you have to
> apply it with care. Especially distro maintainers should wait at least
> 1 or 2 weeks before upgrading, "just in case", but they should upgrade
> eventually because their users are affected.

I will follow his advice and wait for 2 weeks.
Fixed in CVS.