In the layman-2.0 API changes I made, I added 'If-Modified-Since' and 'User-Agent' headers to the HTTP request sent to the Gentoo server. The URL is:

http://www.gentoo.org/proj/en/overlays/repositories.xml

I would very much like to get some log data to see how many of the requests came from layman-2.* and how many times the 'If-Modified-Since' header saved bandwidth by returning a 304 (seen by the client as an HTTPError). For reference, the layman versions prior to 2.0* did not send any header data with the request.

My intent is to get an idea of the percentage of requests using the testing _rc*/-9999 versions compared to the total, as well as the general bandwidth savings that my changes are bringing about for the 2.0* requests. I also wouldn't mind knowing how many requests are made in total for a given time period.

For reference, the first rc release in the tree was layman-2.0.0_rc1-r1 (17 Jul 2011), with -9999 having started a little before that.

Would it be possible to get log data (or results) from July 2011 to the end of December 2011? I don't know what is convenient or easily doable, so please use your judgement. If only December 2011 is practical due to large volumes of data, that would be fine too. It will give me an idea of how my changes are working/helping.

If there are any other changes I can make to help reduce the load on Gentoo's servers, please voice them. I'll see what I can do to improve things.

Thank you
--
Brian Dolbec <dolsen@gentoo.org>

Reproducible: Always
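For readers unfamiliar with the mechanism described in the request above, here is a minimal sketch of a conditional GET. It is not layman's actual code: it uses Python 3's urllib (layman 2.0 itself predates that), and the function names, the `opener` parameter, and the "Layman-example" User-Agent string are all hypothetical.

```python
import urllib.error
import urllib.request

URL = "http://www.gentoo.org/proj/en/overlays/repositories.xml"


def build_request(url, last_modified=None, user_agent="Layman-example"):
    """Build a GET request carrying the two headers discussed above.

    `last_modified` is the Last-Modified value remembered from the
    previous successful fetch (an HTTP date string), or None.
    The default user agent is a made-up placeholder.
    """
    headers = {"User-Agent": user_agent}
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return urllib.request.Request(url, headers=headers)


def fetch_if_modified(request, opener=urllib.request.urlopen):
    """Return (body, last_modified), or (None, None) on a 304.

    urllib raises HTTPError for a 304 Not Modified response, so the
    "nothing changed" case is handled in the except branch.
    """
    try:
        with opener(request) as response:
            return response.read(), response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return None, None  # cached copy is still current
        raise
```

A caller would keep the returned Last-Modified value next to its cached copy of the file and pass it back in on the next run; a (None, None) result means the cached copy can be reused.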
Digging by month, http_code, count:

Jul  200  548241
Jul  304   36867
Aug  200  582149
Aug  304   71817
Sep  200  585667
Sep  304   81864
Oct  200  653376
Oct  304  351349
Nov  200  639034
Nov  304  795785
Dec  200  622275
Dec  304  826015

Now the real oddity is that we don't see a reduction in 200's like I'd expect from the roll-out of the header. Let's see if we actually save any bandwidth.

month, bytes:

Jul  102801938102
Aug  112226131167
Sep  114863415973
Oct  129538081241
Nov  129602584351
Dec  128053145082

I will do a more complete analysis of traffic in another post. My guess is that there are just more folks using the old client, and they fetch more often, which drowns out the git-using folks.

UAs by month:

Jul  "Layman-2.0.0_rc1"   31578
Jul  "Layman-2.0-git"      8322
Aug  "Layman-2.0.0_rc1"   45005
Aug  "Layman-2.0.0-git"    2593
Aug  "Layman-2.0-git"      1850
Aug  "Layman-2.0.0_rc2"   42129
Sep  "Layman-2.0.0_rc1"   12926
Sep  "Layman-2.0.0-git"    4886
Sep  "Layman-2.0.0_rc3"   28124
Sep  "Layman-2.0-git"      1520
Sep  "Layman-2.0.0_rc2"   49786
Oct  "Layman-2.0.0_rc1"   11686
Oct  "Layman-2.0.0-git"    4752
Oct  "Layman-2.0.0_rc3"  350227
Oct  "Layman-2.0-git"      1534
Oct  "Layman-2.0.0_rc2"    3472
Nov  "Layman-2.0.0_rc1"    8506
Nov  "Layman-2.0.0-git"    5612
Nov  "Layman-2.0.0_rc3"  807441
Nov  "Layman-2.0-git"      1462
Nov  "Layman-2.0.0_rc2"    1337
Dec  "Layman-2.0.0_rc1"    3910
Dec  "Layman-2.0.0-git"    8669
Dec  "Layman-2.0.0_rc3"  823422
Dec  "Layman-2.0-git"      1222
Dec  "Layman-2.0.0_rc2"     899

Unique IPs by month:

Jul  40949
Aug  41291
Sep  39856
Oct  42943
Nov  41369
Dec  40160
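To make the 200-versus-304 trend easier to read, the monthly counts posted above can be turned into a 304 share per month. This is only a small sketch over the figures already given; the names `COUNTS` and `share_304` are made up for illustration.

```python
# Monthly counts copied from the figures above: month -> (200s, 304s).
COUNTS = {
    "Jul": (548241, 36867),
    "Aug": (582149, 71817),
    "Sep": (585667, 81864),
    "Oct": (653376, 351349),
    "Nov": (639034, 795785),
    "Dec": (622275, 826015),
}


def share_304(ok_count, not_modified_count):
    """Fraction of the 200+304 responses that were 304 Not Modified."""
    total = ok_count + not_modified_count
    return not_modified_count / total if total else 0.0


for month, (ok, not_modified) in COUNTS.items():
    share = share_304(ok, not_modified)
    print(f"{month}: {ok + not_modified:>8} requests, {share:.1%} answered with 304")
```

The share climbs from single digits in July to more than half of all requests by December, even though the absolute number of 200's does not drop.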
One idea might be bots. They don't obey the rules because they are not smart enough to send the header. Sadly, bots account for very little traffic to this particular URL (200 queries over 6 months). We can expand the search and look for non-layman-like UAs. That gives us 2500 queries over six months. Nothing substantial.

One key indicator is adoption and traffic spread. No one is hammering us with requests from a specific IP. However, what if a lot of IPs are just using the old UA? Let's use the UA data from my previous post, but this time include the older layman UA (which is Python-urllib).

Month, UA, count:

Jul  "Layman-2.0-git"        8322
Jul  "Layman-2.0.0_rc1"     31578
Jul  "Python-urllib/1.16"      62
Jul  "Python-urllib/1.17"       7
Jul  "Python-urllib/2.4"       31
Jul  "Python-urllib/2.5"      101
Jul  "Python-urllib/2.6"   351119
Jul  "Python-urllib/2.7"   194758
Aug  "Layman-2.0-git"        1850
Aug  "Layman-2.0.0-git"      2593
Aug  "Layman-2.0.0_rc1"     45005
Aug  "Layman-2.0.0_rc2"     42129
Aug  "Python-urllib/1.16"      62
Aug  "Python-urllib/1.17"       4
Aug  "Python-urllib/2.4"       87
Aug  "Python-urllib/2.5"       93
Aug  "Python-urllib/2.6"   373042
Aug  "Python-urllib/2.7"   189958
Sep  "Layman-2.0-git"        1520
Sep  "Layman-2.0.0-git"      4886
Sep  "Layman-2.0.0_rc1"     12926
Sep  "Layman-2.0.0_rc2"     49786
Sep  "Layman-2.0.0_rc3"     28124
Sep  "Python-urllib/1.16"      60
Sep  "Python-urllib/2.4"      150
Sep  "Python-urllib/2.5"       90
Sep  "Python-urllib/2.6"   373660
Sep  "Python-urllib/2.7"   198028
Oct  "Layman-2.0-git"        1534
Oct  "Layman-2.0.0-git"      4752
Oct  "Layman-2.0.0_rc1"     11686
Oct  "Layman-2.0.0_rc2"      3472
Oct  "Layman-2.0.0_rc3"    350227
Oct  "Python-urllib/1.16"      64
Oct  "Python-urllib/2.4"      158
Oct  "Python-urllib/2.5"       69
Oct  "Python-urllib/2.6"   394447
Oct  "Python-urllib/2.7"   238640
Nov  "Layman-2.0-git"        1462
Nov  "Layman-2.0.0-git"      5612
Nov  "Layman-2.0.0_rc1"      8506
Nov  "Layman-2.0.0_rc2"      1337
Nov  "Layman-2.0.0_rc3"    807441
Nov  "Python-urllib/1.16"      62
Nov  "Python-urllib/2.4"      151
Nov  "Python-urllib/2.5"       62
Nov  "Python-urllib/2.6"   369482
Nov  "Python-urllib/2.7"   241322
Dec  "Layman-2.0-git"        1222
Dec  "Layman-2.0.0-git"      8669
Dec  "Layman-2.0.0_rc1"      3910
Dec  "Layman-2.0.0_rc2"       899
Dec  "Layman-2.0.0_rc3"    823422
Dec  "Python-urllib/1.16"      62
Dec  "Python-urllib/1.17"       1
Dec  "Python-urllib/2.4"      155
Dec  "Python-urllib/2.5"       62
Dec  "Python-urllib/2.6"   375870
Dec  "Python-urllib/2.7"   234327

Based on UA, the query rate of older (stable?) Layman actually seems to be on the rise, which is why we see an increase in 200's. In Jul we saw approximately 550k hits from urllib; in Dec we saw ~610k from the same UAs (summing the Dec rows above).

My last thought for the night is to map retcodes to UAs.

Client, http_code, count:

Layman         200   104436
Layman         304  2157124
Python-urllib  200  3522371
Python-urllib  304     5593

The header stuff seems to be working admirably. The XML is about 220k and an HTTP 304 is probably 100x smaller than that.

Note: The numbers are fudged a bit. We also served 400's, 500's and so forth on occasion. However, the query count for those is minimal, so they were discarded.
Let me know if you need more data.
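As a rough follow-up to the retcode-by-UA totals above, the sketch below turns them into a 304 rate per client and a back-of-the-envelope estimate of bytes not sent. The sizes are assumptions, not measurements: "220k" is read as 220,000 bytes and a 304 is taken to be exactly 100x smaller, following the estimate above. Header overhead and any compression are ignored, and the names are made up for illustration.

```python
# Totals copied from the retcode-by-UA mapping above.
RETCODES = {
    "Layman": {200: 104436, 304: 2157124},
    "Python-urllib": {200: 3522371, 304: 5593},
}

# Assumed sizes (not measured): "220k" read as 220,000 bytes, and a
# 304 response taken to be 100x smaller than the full XML.
FULL_BYTES = 220_000
NOT_MODIFIED_BYTES = FULL_BYTES // 100


def hit_rate(codes):
    """Fraction of a client's 200+304 responses that were 304s."""
    return codes[304] / (codes[200] + codes[304])


def bytes_saved(codes):
    """Estimated bytes not transferred because a 304 replaced a full 200."""
    return codes[304] * (FULL_BYTES - NOT_MODIFIED_BYTES)


for client, codes in RETCODES.items():
    saved_gb = bytes_saved(codes) / 1e9
    print(f"{client}: {hit_rate(codes):.1%} 304s, roughly {saved_gb:.0f} GB not sent")
```

Under those assumptions, about 95% of the Layman 2.0 requests were answered with a 304, which works out to something on the order of 470 GB not transferred over the six months. Treat that as an order-of-magnitude figure only.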