Bug 398465 - request for repositories.xml server log data
Summary: request for repositories.xml server log data
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Infrastructure
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Infrastructure
Reported: 2012-01-11 03:19 UTC by Brian Dolbec
Modified: 2012-01-11 07:42 UTC

Description Brian Dolbec (RETIRED) gentoo-dev 2012-01-11 03:19:22 UTC
In the layman-2.0 API changes I made, I added 'If-Modified-Since' and
'User-Agent' headers to the HTTP request sent to the Gentoo server.
The URL is:
 http://www.gentoo.org/proj/en/overlays/repositories.xml

I would very much like to get some log data to see how many of the
requests made were from layman-2.* and how many times the
'If-Modified-Since' header saved bandwidth by returning a 304
(Not Modified) response, which the client sees as an HTTPError.

For reference, the layman versions prior to 2.0* did not
add any headers of their own to the request.
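
For readers unfamiliar with the mechanism, here is a minimal sketch of such a conditional request. It uses Python 3's `urllib.request` and is not taken from layman's source; the function names and the default `Layman-example` user agent string are made up for illustration.

```python
import urllib.error
import urllib.request
from email.utils import formatdate

URL = "http://www.gentoo.org/proj/en/overlays/repositories.xml"


def build_request(url, last_fetch_epoch=None, user_agent="Layman-example"):
    """Build a GET request carrying the two headers described above.

    user_agent is a hypothetical value; last_fetch_epoch is the Unix
    timestamp of the locally cached copy, if there is one.
    """
    headers = {"User-Agent": user_agent}
    if last_fetch_epoch is not None:
        # HTTP dates are expressed in GMT, hence usegmt=True.
        headers["If-Modified-Since"] = formatdate(last_fetch_epoch, usegmt=True)
    return urllib.request.Request(url, headers=headers)


def fetch_if_modified(request, opener=urllib.request.urlopen):
    """Return the body as bytes, or None when the server answers 304."""
    try:
        with opener(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return None  # cached copy is still current; no body was sent
        raise
```

Calling `fetch_if_modified(build_request(URL, last_fetch_epoch))` would then either return fresh XML or `None`, in which case the cached file can be reused.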

My intent is to get an idea of the percentage of requests using the testing
_rc*/-9999 versions compared to the total, as well as the general
bandwidth savings that my changes are bringing about for the 2.0*
requests.  I would also like to know how many requests are made in total for a given time period.

For reference, the first rc release in the tree was:

*layman-2.0.0_rc1-r1 (17 Jul 2011)

with -9999 having started a little before that.

Would it be possible to get log data (or results) from July 2011 to the end of December 2011?  I don't know what is convenient or easily doable; please use your judgement.  If only December 2011 is feasible because of the volume of data, that would be fine too.  It will give me an idea of how well my changes are working.


If there are any other changes I can make to help reduce the load on Gentoo's servers, please let me know.  I'll see what I can do to improve things.

Thank you
-- 
Brian Dolbec <dolsen@gentoo.org>


Reproducible: Always
Comment 1 Alec Warner (RETIRED) archtester gentoo-dev Security 2012-01-11 06:56:10 UTC
Digging by month, http_code, count
Jul 200 548241
Jul 304 36867
Aug 200 582149
Aug 304 71817
Sep 200 585667
Sep 304 81864
Oct 200 653376
Oct 304 351349
Nov 200 639034
Nov 304 795785
Dec 200 622275
Dec 304 826015

Now the real oddity is that we don't see a reduction in 200s like I'd expect from the roll-out of the header. Let's see if we actually save any bandwidth.

month, bytes
Jul 102801938102
Aug 112226131167
Sep 114863415973
Oct 129538081241
Nov 129602584351
Dec 128053145082
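
One way to read the two tables above together is to divide bytes by requests for each month. The sketch below does that with the numbers copied from this comment. It assumes the bytes column covers the same requests as the 200/304 counts, which the comment does not state explicitly; the function name is made up for illustration.

```python
# month: (200 count, 304 count, bytes served), copied from the tables above
TRAFFIC = {
    "Jul": (548241, 36867, 102801938102),
    "Aug": (582149, 71817, 112226131167),
    "Sep": (585667, 81864, 114863415973),
    "Oct": (653376, 351349, 129538081241),
    "Nov": (639034, 795785, 129602584351),
    "Dec": (622275, 826015, 128053145082),
}


def monthly_summary(traffic):
    """Return (month, requests, share of 304s, average bytes per request)."""
    rows = []
    for month, (ok, not_modified, total_bytes) in traffic.items():
        requests = ok + not_modified
        rows.append(
            (month, requests, not_modified / requests, total_bytes / requests)
        )
    return rows


for month, requests, share_304, avg_bytes in monthly_summary(TRAFFIC):
    print(f"{month} requests={requests} 304={share_304:.1%} avg_bytes={avg_bytes:,.0f}")
```

Total bytes can grow while the average per request falls, so the per-request figure is the more direct indicator of whether the header is paying off.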

I will do a more complete analysis of traffic in another post. My guess is that there are simply more folks using the old client, and they fetch more often, which drowns out the git-using folks.

UAs by month.
Jul "Layman-2.0.0_rc1" 31578
Jul "Layman-2.0-git" 8322
Aug "Layman-2.0.0_rc1" 45005
Aug "Layman-2.0.0-git" 2593
Aug "Layman-2.0-git" 1850
Aug "Layman-2.0.0_rc2" 42129
Sep "Layman-2.0.0_rc1" 12926
Sep "Layman-2.0.0-git" 4886
Sep "Layman-2.0.0_rc3" 28124
Sep "Layman-2.0-git" 1520
Sep "Layman-2.0.0_rc2" 49786
Oct "Layman-2.0.0_rc1" 11686
Oct "Layman-2.0.0-git" 4752
Oct "Layman-2.0.0_rc3" 350227
Oct "Layman-2.0-git" 1534
Oct "Layman-2.0.0_rc2" 3472
Nov "Layman-2.0.0_rc1" 8506
Nov "Layman-2.0.0-git" 5612
Nov "Layman-2.0.0_rc3" 807441
Nov "Layman-2.0-git" 1462
Nov "Layman-2.0.0_rc2" 1337
Dec "Layman-2.0.0_rc1" 3910
Dec "Layman-2.0.0-git" 8669
Dec "Layman-2.0.0_rc3" 823422
Dec "Layman-2.0-git" 1222
Dec "Layman-2.0.0_rc2" 899

Unique IPs by month
Jul 40949
Aug 41291
Sep 39856
Oct 42943
Nov 41369
Dec 40160
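
For anyone wanting to reproduce tallies like the ones in this comment from their own logs, here is a rough sketch. It assumes the Apache "combined" log format; the comment does not say which format or tooling was actually used on the Gentoo servers, and the regex and function name are illustrative only.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format, e.g.
# host ident user [dd/Mon/yyyy:HH:MM:SS zone] "METHOD path proto" status size "referer" "ua"
LINE_RE = re.compile(
    r'\S+ \S+ \S+ \[\d{2}/(?P<month>\w{3})/\d{4}[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def tally(lines, path="/proj/en/overlays/repositories.xml"):
    """Count requests for one path by (month, status) and by (month, UA)."""
    by_status = Counter()
    by_ua = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None or match["path"] != path:
            continue  # skip unparsable lines and requests for other URLs
        by_status[(match["month"], match["status"])] += 1
        by_ua[(match["month"], match["ua"])] += 1
    return by_status, by_ua
```

Because `tally` accepts any iterable of lines, an open log file can be passed to it directly. Unique IPs per month could be collected the same way by capturing the first field into a set.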
Comment 2 Alec Warner (RETIRED) archtester gentoo-dev Security 2012-01-11 07:42:05 UTC
One idea might be bots: they don't obey the rules because they are not smart enough to send the header. However, bots account for very little traffic to this particular URL (200 queries over six months). We can expand the search and look for all non-layman-like UAs. That gives us 2500 queries over six months. Nothing substantial.

One key indicator is adoption and traffic spread. No one is hammering us with requests from a specific IP. However, what if a lot of IPs are just using the old UA? Let's use the UA data from my previous post, but this time include the older layman UA (which is Python-urllib).

Month, UA, count
Jul "Layman-2.0-git" 8322
Jul "Layman-2.0.0_rc1" 31578
Jul "Python-urllib/1.16" 62
Jul "Python-urllib/1.17" 7
Jul "Python-urllib/2.4" 31
Jul "Python-urllib/2.5" 101
Jul "Python-urllib/2.6" 351119
Jul "Python-urllib/2.7" 194758
Aug "Layman-2.0-git" 1850
Aug "Layman-2.0.0-git" 2593
Aug "Layman-2.0.0_rc1" 45005
Aug "Layman-2.0.0_rc2" 42129
Aug "Python-urllib/1.16" 62
Aug "Python-urllib/1.17" 4
Aug "Python-urllib/2.4" 87
Aug "Python-urllib/2.5" 93
Aug "Python-urllib/2.6" 373042
Aug "Python-urllib/2.7" 189958
Sep "Layman-2.0-git" 1520
Sep "Layman-2.0.0-git" 4886
Sep "Layman-2.0.0_rc1" 12926
Sep "Layman-2.0.0_rc2" 49786
Sep "Layman-2.0.0_rc3" 28124
Sep "Python-urllib/1.16" 60
Sep "Python-urllib/2.4" 150
Sep "Python-urllib/2.5" 90
Sep "Python-urllib/2.6" 373660
Sep "Python-urllib/2.7" 198028
Oct "Layman-2.0-git" 1534
Oct "Layman-2.0.0-git" 4752
Oct "Layman-2.0.0_rc1" 11686
Oct "Layman-2.0.0_rc2" 3472
Oct "Layman-2.0.0_rc3" 350227
Oct "Python-urllib/1.16" 64
Oct "Python-urllib/2.4" 158
Oct "Python-urllib/2.5" 69
Oct "Python-urllib/2.6" 394447
Oct "Python-urllib/2.7" 238640
Nov "Layman-2.0-git" 1462
Nov "Layman-2.0.0-git" 5612
Nov "Layman-2.0.0_rc1" 8506
Nov "Layman-2.0.0_rc2" 1337
Nov "Layman-2.0.0_rc3" 807441
Nov "Python-urllib/1.16" 62
Nov "Python-urllib/2.4" 151
Nov "Python-urllib/2.5" 62
Nov "Python-urllib/2.6" 369482
Nov "Python-urllib/2.7" 241322
Dec "Layman-2.0-git" 1222
Dec "Layman-2.0.0-git" 8669
Dec "Layman-2.0.0_rc1" 3910
Dec "Layman-2.0.0_rc2" 899
Dec "Layman-2.0.0_rc3" 823422
Dec "Python-urllib/1.16" 62
Dec "Python-urllib/1.17" 1
Dec "Python-urllib/2.4" 155
Dec "Python-urllib/2.5" 62
Dec "Python-urllib/2.6" 375870
Dec "Python-urllib/2.7" 234327

Based on the UAs, the query rate of older (stable?) layman appears to be on the rise, which is why we see an increase in 200s. In Jul we saw approximately 550k hits from urllib; in Dec the table above adds up to ~610k from the same UAs.
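
The per-month urllib totals can be checked by summing the Python-urllib rows of the table above. A small sketch, with the counts copied from the table and the variable names chosen for illustration:

```python
# month: {Python-urllib version: request count}, copied from the table above
URLLIB = {
    "Jul": {"1.16": 62, "1.17": 7, "2.4": 31, "2.5": 101, "2.6": 351119, "2.7": 194758},
    "Aug": {"1.16": 62, "1.17": 4, "2.4": 87, "2.5": 93, "2.6": 373042, "2.7": 189958},
    "Sep": {"1.16": 60, "2.4": 150, "2.5": 90, "2.6": 373660, "2.7": 198028},
    "Oct": {"1.16": 64, "2.4": 158, "2.5": 69, "2.6": 394447, "2.7": 238640},
    "Nov": {"1.16": 62, "2.4": 151, "2.5": 62, "2.6": 369482, "2.7": 241322},
    "Dec": {"1.16": 62, "1.17": 1, "2.4": 155, "2.5": 62, "2.6": 375870, "2.7": 234327},
}

# Sum every urllib version into a single per-month total.
urllib_totals = {month: sum(counts.values()) for month, counts in URLLIB.items()}

for month, total in urllib_totals.items():
    print(f"{month} {total}")
```

Nearly all of this traffic comes from the 2.6 and 2.7 user agents; the older versions contribute a few hundred requests per month at most.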

My last thought for the night is to map retcodes to UAs.

Layman:
200 104436
304 2157124

Python-urllib:
200 3522371
304 5593

The header stuff seems to be working admirably. The xml is about 220k and an HTTP 304 is probably 100x smaller than that.
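
A back-of-the-envelope sketch of what those numbers imply follows. The sizes are assumptions taken loosely from the comment above: "220k" is read as 220 * 1000 bytes, and a 304 is taken to be exactly 100x smaller, so the result is an order-of-magnitude estimate rather than a measurement. The function names are made up for illustration.

```python
# status counts per client family over the six months, copied from the listing above
RETCODES = {
    "Layman": {200: 104436, 304: 2157124},
    "Python-urllib": {200: 3522371, 304: 5593},
}

XML_BYTES = 220 * 1000                # "about 220k"; reading k as 1000 bytes is an assumption
NOT_MODIFIED_BYTES = XML_BYTES / 100  # "probably 100x smaller"; a guess, not a measurement


def share_304(counts):
    """Fraction of a client family's requests that were answered with 304."""
    return counts[304] / (counts[200] + counts[304])


def bytes_saved(counts):
    """Estimated bytes not sent because a 304 replaced a full 200 response."""
    return counts[304] * (XML_BYTES - NOT_MODIFIED_BYTES)


for family, counts in RETCODES.items():
    print(f"{family}: 304 share {share_304(counts):.1%}, "
          f"saved about {bytes_saved(counts) / 1e9:.0f} GB")
```

Under these assumptions the estimated savings for the Layman clients are of the same order of magnitude as the monthly byte totals listed in Comment 1 added together.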

Note: The numbers are approximate. We also served 4xx and 5xx responses on occasion, but the query count for those is minimal, so they were discarded.
Comment 3 Alec Warner (RETIRED) archtester gentoo-dev Security 2012-01-11 07:42:27 UTC
Let me know if you need more data.