Summary: | Forums flap every day between 3:30am and 4:30am UTC | ||
---|---|---|---|
Product: | Gentoo Infrastructure | Reporter: | Alec Warner <antarus> |
Component: | Forums | Assignee: | Gentoo Infrastructure <infra-bugs> |
Status: | RESOLVED OBSOLETE | ||
Severity: | normal | CC: | forum-mods |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- |
Description
Alec Warner (RETIRED)
2013-09-01 22:44:44 UTC
For those not in the know, Grebe and Grouse run load balancing for the forums, and also run MySQL.

Grebe's LB logs, 9/01/2013: outage began 3:33, ended 4:09. Found with:

zegrep '(Disabling|Enabling)' /var/log/messages | less

Grebe's sar numbers:

00:00:01   pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
03:30:02    3668.74   1446.59    322.05      0.01    457.56      0.00      0.00      0.00      0.00
03:40:02    7406.57   2126.09    381.13      0.10   1963.16   1215.83      0.00   1215.82    100.00
03:50:02    5787.58   2159.74   1409.06      0.14   3219.06   1464.49      0.00   1464.49    100.00
04:10:01    3359.44   1402.28    873.41      0.11   3142.33   1018.54      0.00   1018.54    100.00
04:20:01       0.18    434.65    348.73      0.00    226.70      0.00      0.00      0.00      0.00

This basically says 'wow, we are moving tons of pages in and out.'

00:00:01  kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact
03:20:02    5394980  11067292     67.23    680724   7061900   3639672     14.88   7793564   2606520
03:30:02    2624224  13838048     84.06    683056   9764440   3685528     15.07   7835368   5262008
03:40:02     271112  16191160     98.35    678512  12132944   3690996     15.09   7843888   7623772
03:50:02     310220  16152052     98.12    668032  12153024   3653032     14.93   6973288   8471164
04:10:01    6586740   9875532     59.99    658396   5989484   3643032     14.89   6956520   2352204

So we see here that during the outage interval, we are quite low on memory on a percentage basis. Now of course we still have ~250 MB free, so it's not the end of the world.

Sep 1 03:24:05 grebe kernel: [12150592.407818] grsec: mount of /dev/mapper/vg-var_lib_mysql_snapshot to /var/tmp/mylvmbackup/mnt/grebe.gentoo.org-mysql-master-backup by /bin/mount[mount:25522] uid/euid:0/0 gid/egid:0/0, parent /usr/bin/mylvmbackup[mylvmbackup:25483] uid/euid:0/0 gid/egid:0/0
Sep 1 04:01:44 grebe kernel: [12152849.445516] grsec: unmount of /dev/mapper/vg-var_lib_mysql_snapshot by /bin/umount[umount:28191] uid/euid:0/0 gid/egid:0/0, parent /usr/bin/mylvmbackup[mylvmbackup:25483] uid/euid:0/0 gid/egid:0/0

The mylvmbackup snapshot was mounted from 3:24 to 4:01, which implies the outage window for grebe is 3:24 -> 4:01.
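To make the correlation concrete, here is a minimal sketch that computes how much of the LB outage window overlaps the snapshot mount window, and which sar samples show memory nearly exhausted. The times and %memused values are copied from the logs above (rounded to the minute); the 95% threshold and all function and variable names are assumptions made for illustration, not anything taken from the infra tooling.

```python
from datetime import datetime, timedelta

DAY = "2013-09-01"  # date of the outage described above


def at(hhmm: str) -> datetime:
    """Turn an HH:MM string from the logs into a datetime on the outage day."""
    return datetime.strptime(f"{DAY} {hhmm}", "%Y-%m-%d %H:%M")


def overlap(a, b) -> timedelta:
    """Length of the intersection of two (start, end) windows, zero if disjoint."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(end - start, timedelta(0))


# Windows taken from the report; seconds are dropped for simplicity.
lb_outage = (at("03:33"), at("04:09"))       # LB Disabling/Enabling log lines
backup_mounted = (at("03:24"), at("04:01"))  # grsec mount/unmount log lines

shared = overlap(lb_outage, backup_mounted)
overlap_minutes = int(shared.total_seconds() // 60)
outage_minutes = int((lb_outage[1] - lb_outage[0]).total_seconds() // 60)

# %memused column from the sar memory table above.
SAR_MEMUSED = {
    "03:20:02": 67.23,
    "03:30:02": 84.06,
    "03:40:02": 98.35,
    "03:50:02": 98.12,
    "04:10:01": 59.99,
}
THRESHOLD = 95.0  # hypothetical cut-off for "quite low on memory"
flagged = [stamp for stamp, pct in SAR_MEMUSED.items() if pct > THRESHOLD]

print(f"{overlap_minutes} of {outage_minutes} outage minutes overlap the backup")
print("samples above threshold:", flagged)
```

With these inputs the backup window covers 28 of the 36 outage minutes, and both high-memory samples (03:40 and 03:50) fall inside it.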
I added debugging for mylvmbackup, and I disabled mylvmbackup on grouse (by chmodding the script to 000). mylvmbackup seems to jibe with the outage times. I've disabled backups on both grouse and grebe to see if we can go 24hrs without an outage.

-A

No outage today, which confirms mylvmbackup is the cause. Now we just need to figure out how to limit its memory use...

-A
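One possible direction for limiting memory use, sketched under an assumption the bug does not confirm: that the pressure comes from the page cache filling up while the backup reads the snapshot (kbcached grows from roughly 7 GB to 12 GB in the sar output). The sketch copies a file while asking the kernel to drop the pages it has already handled, using `os.posix_fadvise` with `POSIX_FADV_DONTNEED`. The function name, chunk size, and demo file names are hypothetical, and this is not how mylvmbackup itself copies data.

```python
import os
import tempfile

CHUNK_BYTES = 8 * 1024 * 1024  # hypothetical chunk size


def copy_dropping_cache(src_path: str, dst_path: str,
                        chunk_bytes: int = CHUNK_BYTES) -> int:
    """Copy src to dst, advising the kernel to drop already-copied pages.

    Returns the number of bytes copied. On platforms without
    os.posix_fadvise this degrades to a plain chunked copy.
    """
    can_advise = hasattr(os, "posix_fadvise")
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            buf = src.read(chunk_bytes)
            if not buf:
                break
            dst.write(buf)
            copied += len(buf)
            if can_advise:
                # Dirty pages cannot be dropped, so push them to disk first.
                dst.flush()
                os.fsync(dst.fileno())
                # Advisory only: the kernel may ignore these hints.
                os.posix_fadvise(src.fileno(), 0, copied,
                                 os.POSIX_FADV_DONTNEED)
                os.posix_fadvise(dst.fileno(), 0, copied,
                                 os.POSIX_FADV_DONTNEED)
    return copied


# Small self-contained demo on temporary files (names are placeholders).
with tempfile.TemporaryDirectory() as tmp:
    src_file = os.path.join(tmp, "snapshot.bin")
    dst_file = os.path.join(tmp, "backup.bin")
    payload = os.urandom(3 * 1024 * 1024)
    with open(src_file, "wb") as f:
        f.write(payload)
    copied_bytes = copy_dropping_cache(src_file, dst_file,
                                       chunk_bytes=1024 * 1024)
    with open(dst_file, "rb") as f:
        round_trip_ok = f.read() == payload

print(copied_bytes, round_trip_ok)
```

The fsync per chunk trades throughput for a bounded cache footprint; whether that trade is acceptable for a nightly backup would need measuring on the real hosts.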