Bug 438710 - http://packages.gentoo.org downtime
Summary: http://packages.gentoo.org downtime
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Infrastructure
Classification: Unclassified
Component: Other web server issues
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Infrastructure
URL:
Whiteboard:
Keywords:
Duplicates: 439252 439520
Depends on:
Blocks:
 
Reported: 2012-10-17 14:31 UTC by Derk W te Bokkel
Modified: 2012-11-02 21:15 UTC
CC List: 14 users

See Also:
Package list:
Runtime testing required: ---


Description Derk W te Bokkel 2012-10-17 14:31:38 UTC
As per description, both sites are unavailable via HTTP. Ping indicates that they are up, but they are not serving pages.
Comment 1 Alec Warner (RETIRED) archtester gentoo-dev Security 2012-10-17 14:41:32 UTC
barbet (the host) is not functioning sufficiently to serve queries.


13:10 -willikins:#gentoo-infra- (nagios) **PROBLEM** Service: Content Check HTTP planet.gentoo.org | Host: barbet | State: CRITICAL | Info: CRITICAL - Socket timeout after 21 seconds | Date: Wed Oct 17 13:10:15 UTC 2012
13:41 -willikins:#gentoo-infra- (nagios) **PROBLEM** Service: Content Check HTTP get.gentoo.org | Host: barbet | State: CRITICAL | Info: CRITICAL - Socket timeout after 21 seconds | Date: Wed Oct 17 13:41:15 UTC 2012
13:42 -willikins:#gentoo-infra- (nagios) **PROBLEM** Service: Content Check HTTP devmanual.gentoo.org | Host: barbet | State: CRITICAL | Info: CRITICAL - Socket timeout after 21 seconds | Date: Wed Oct 17 13:42:55 UTC 2012
13:45 -willikins:#gentoo-infra- (nagios) **PROBLEM** Service: Content Check HTTP packages.gentoo.org | Host: barbet | State: CRITICAL | Info: CRITICAL - Socket timeout after 21 seconds | Date: Wed Oct 17 13:45:15 UTC 2012
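
The Nagios content checks quoted above can be approximated with a short script for anyone who wants to reproduce the symptom from their own machine. This is only a sketch, not the actual Nagios plugin: the function name content_check and the expected string are assumptions for illustration, and only the 21-second timeout is taken from the log output.

```python
import socket
import urllib.request


def content_check(url, expected, timeout=21.0):
    """Fetch url and return a Nagios-style (state, info) tuple.

    Hypothetical helper: the name and the return shape are illustrative.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError as exc:
        # URLError and HTTPError are OSError subclasses. A connect timeout
        # arrives wrapped in URLError.reason; a read timeout is raised directly.
        reason = getattr(exc, "reason", exc)
        if isinstance(reason, socket.timeout) or isinstance(exc, socket.timeout):
            return "CRITICAL", f"Socket timeout after {timeout:g} seconds"
        return "CRITICAL", f"Request failed: {reason}"
    if expected not in body:
        return "CRITICAL", f"Expected string {expected!r} not found"
    return "OK", f"Found {expected!r}"
```

Calling content_check("http://packages.gentoo.org/", "Gentoo") would report CRITICAL with a socket timeout if the host accepts the connection but never answers, which matches a machine that responds to ping but does not serve pages. The string "Gentoo" is a placeholder; the real check may look for something else.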

I'll try to spare an hour to move the services somewhere else. These machines are notorious for bad memory :/
Comment 2 Alec Warner (RETIRED) archtester gentoo-dev Security 2012-10-17 15:32:34 UTC
Barbet cannot merge packages, nor can it start varnish to serve content, due to what I can only imagine are memory and disk errors (lots of SMART errors in the logs).
Comment 3 Alec Warner (RETIRED) archtester gentoo-dev Security 2012-10-17 22:55:05 UTC
All services except bouncer and packages have been restored.
Comment 4 Derk W te Bokkel 2012-10-22 01:55:16 UTC
planet.gentoo.org has fallen over again ..
Comment 5 Mike Doty (RETIRED) gentoo-dev 2012-10-22 02:08:48 UTC
apache/varnishd restarted. apache had some stuck processes that had to be killed with kill -9.
Comment 6 Wei-Chun Chung 2012-10-22 05:48:09 UTC
Hi all, the packages.gentoo.org website seems dead again now (although I saw it working about 2 or 3 hours ago).
Comment 7 Wim Muskee 2012-10-23 10:31:22 UTC
Don't know if related, but devmanual.gentoo.org is also unresponsive. Checked from multiple locations.
Comment 8 Jeroen Roovers (RETIRED) gentoo-dev 2012-10-24 16:27:37 UTC
*** Bug 439520 has been marked as a duplicate of this bug. ***
Comment 9 Alec Warner (RETIRED) archtester gentoo-dev Security 2012-10-24 18:22:49 UTC
*** Bug 439252 has been marked as a duplicate of this bug. ***
Comment 10 Alec Warner (RETIRED) archtester gentoo-dev Security 2012-10-24 18:26:43 UTC
So, an update. 80% of infra was at a conference (followed by conference cleanup and travel home). A bunch of low-risk HTTP services run on the same box. Due to what I suspect is a bug in our cfengine deployment, packages.gentoo.org is not currently serving content properly. This eventually causes all the apache workers to hang while waiting on mod_python to serve content.

Ideally we would not be using mod_python (fcgi is probably better). There also appears to be an issue where varnish is hitting its open-file-descriptor limit. I need to poke it when it is in this state and figure out what all the descriptors are for. I'm guessing that too will lead me back to hung apache workers, and more generally to the fact that lots of search engines like to hammer packages.gentoo.org.
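
One way to see what a process's descriptors are for is to read the symlinks under /proc/<pid>/fd. The following is a minimal sketch, assuming a Linux /proc filesystem and sufficient privileges to inspect the target process. The function names (classify, fd_summary, soft_nofile_limit) are made up for illustration and are not part of varnish or any infra tooling.

```python
import os
from collections import Counter


def classify(target):
    """Map a /proc/<pid>/fd symlink target to a coarse category."""
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("anon_inode:"):
        return "anon_inode"
    if target.startswith("/"):
        return "file"
    return "other"


def fd_summary(pid):
    """Count the open descriptors of pid, grouped by category."""
    fd_dir = f"/proc/{pid}/fd"
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed between listdir and readlink
        counts[classify(target)] += 1
    return counts


def soft_nofile_limit(pid):
    """Return the soft 'Max open files' limit of pid, or None if unlimited or absent."""
    with open(f"/proc/{pid}/limits") as fh:
        for line in fh:
            if line.startswith("Max open files"):
                value = line.split()[3]  # fields: Max open files <soft> <hard> files
                return None if value == "unlimited" else int(value)
    return None
```

Comparing sum(fd_summary(pid).values()) against soft_nofile_limit(pid) shows how close the process is to its limit. A count dominated by sockets would be consistent with many connections waiting on hung backends, though that is only a guess until the process is inspected in the failing state.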

Currently p.g.o is not enabled, so we are serving a number of 404s or the default apache page. This was done primarily to spare the other vhosts on the machine (get, planet, devmanual).

-A
Comment 11 Alec Warner (RETIRED) archtester gentoo-dev Security 2012-10-26 07:28:31 UTC
I wasted another two evenings on this and I've given up on the existing codebase. We are in the process of provisioning some VMs for a 2012 GSOC project to replace the existing codebase, so we might as well deploy it. Expect the site to stay down a few more days.

-A
Comment 12 Christian Ruppert (idl0r) gentoo-dev 2012-11-02 21:15:32 UTC
Should work again.