Bug 134076 - resources for new gentoo-stats service
Summary: resources for new gentoo-stats service
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Infrastructure
Classification: Unclassified
Component: Other
Hardware: All Linux
Importance: High enhancement
Assignee: Gentoo Infrastructure
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-05-22 20:23 UTC by Marius Mauch (RETIRED)
Modified: 2006-08-26 11:08 UTC
CC List: 3 users

See Also:
Package list:
Runtime testing required: ---


Description Marius Mauch (RETIRED) gentoo-dev 2006-05-22 20:23:17 UTC
See dev.gentoo.org/~genone/temp/soc-application-v3.txt for a description of the new service as well as a rough roadmap.

Once the project enters the test phase, I won't be able to host it on my own boxes anymore due to bandwidth limitations, so I'd like to have a place on Gentoo infra to host it.
The service will mostly consist of a (rather large) database and a web interface for data uploads and DB queries.

So what do I need for this?
- >=5 GB of diskspace (estimate about 10-20 MB per uploading client, so this will grow quite a bit when it goes into public test/production)
- sufficient bandwidth (especially for uploads to the server)
- a mysql-5 database
- apache2 with mod_python
Can't say much about required cpu/ram.
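To put the 10-20 MB per client figure in perspective, here is a quick linear estimate. It is only a sketch: it assumes storage grows linearly with the number of uploading clients and uses decimal units (1 GB = 1000 MB); neither assumption is stated in the request itself, and `disk_needed_gb`/`clients_supported` are illustrative helper names.

```python
# Rough disk-space arithmetic for the figures in this request.
# Assumptions: storage grows linearly with the number of uploading clients,
# and decimal units are used (1 GB = 1000 MB).

MB_PER_GB = 1000


def disk_needed_gb(clients: int, mb_per_client: float) -> float:
    """Disk space (GB) needed for `clients` uploaders at `mb_per_client` each."""
    return clients * mb_per_client / MB_PER_GB


def clients_supported(disk_gb: float, mb_per_client: float) -> int:
    """How many uploaders fit into `disk_gb` at `mb_per_client` each."""
    return int(disk_gb * MB_PER_GB // mb_per_client)


if __name__ == "__main__":
    for per_client in (10, 20):
        print(f"{per_client} MB/client: 5 GB holds {clients_supported(5, per_client)} clients; "
              f"1000 clients need {disk_needed_gb(1000, per_client):.0f} GB")
```

Under these assumptions the requested 5 GB covers roughly 250-500 clients, while a thousand clients would need 10-20 GB.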

Not much server-side code exists yet, so the required software isn't fixed, but it's unlikely to change.

If you think you won't be able to set this up by the end of June, please let me know so I can look for other options. Thanks in advance.
Comment 1 Lance Albertson (RETIRED) gentoo-dev 2006-05-22 20:45:47 UTC
(In reply to comment #0)

> So what do I need for this?
> - >=5 GB of diskspace (estimate about 10-20 MB per uploading client, so this
> will grow quite a bit when it goes into public test/production)

Is that database space or web server space? And do you have any estimates for growth in the next year if this catches on? 10-20M per client is pretty insane if you start talking about thousands of users. 

> - sufficient bandwidth (especially for uploads to the server)

Any guess on the average or max bandwidth we'll need?

> - a mysql-5 database
> - apache2 with mod_python
> Can't say much about required cpu/ram.

We probably don't have much problem with getting the test environment up, but I will definitely need a better number once this gets into production. I'll have to evaluate our resources and figure out what we need for it.

Thanks-
Comment 2 Marius Mauch (RETIRED) gentoo-dev 2006-05-22 21:44:08 UTC
(In reply to comment #1)
> > So what do I need for this?
> > - >=5 GB of diskspace (estimate about 10-20 MB per uploading client, so this
> > will grow quite a bit when it goes into public test/production)
> 
> Is that database space or web server space? And do you have any estimates for
> growth in the next year if this catches on? 10-20M per client is pretty insane
> if you start talking about thousands of users. 

Well, this is a very rough estimate covering both the database size and the storage space for the reference records (which take up most of it). There are several ways to reduce it by dropping some less important features, so that growth is no longer linear. But at least for (closed) testing I'd like to keep those features for debugging purposes (I need the reference data). Some reduction could also be achieved at the expense of CPU load by compressing data or using more efficient storage formats, though growth would still be linear.

> > - sufficient bandwidth (especially for uploads to the server)
> 
> Any guess on the average or max bandwidth we'll need?

That depends on the number of testers. The workflow looks like this:
- on the first upload, the client sends its reference record (big, about 2-3 MB compressed)
- later updates just send a delta to that reference record (size depends on update interval, but typically <100KB compressed)
- each delta updates the reference record on both sides
- in parallel users query the db (no clue how much traffic this will cause)
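The reference-record/delta scheme in the list above can be sketched as follows. This is a hypothetical model, not the service's actual code: a record is assumed to be a flat mapping of keys to values (the package names and versions are made-up examples), and `make_delta`/`apply_delta` are illustrative names.

```python
# Hypothetical sketch of the reference-record/delta workflow.
# A "record" is modelled as a flat dict (e.g. package -> version); the real
# record format of the service is not specified in this bug.
from typing import Dict, List, Tuple

Record = Dict[str, str]
Delta = Tuple[Dict[str, str], List[str]]  # (added/changed entries, removed keys)


def make_delta(reference: Record, current: Record) -> Delta:
    """Compute what changed between the shared reference and the current state."""
    changed = {k: v for k, v in current.items() if reference.get(k) != v}
    removed = sorted(k for k in reference if k not in current)
    return changed, removed


def apply_delta(reference: Record, delta: Delta) -> Record:
    """Return a new reference record with the delta applied (input is not mutated)."""
    changed, removed = delta
    removed_set = set(removed)
    updated = {k: v for k, v in reference.items() if k not in removed_set}
    updated.update(changed)
    return updated


if __name__ == "__main__":
    # First upload: the client sends its full (large) reference record.
    client_ref = {"sys-apps/portage": "2.1", "dev-lang/python": "2.4.3"}
    server_ref = dict(client_ref)

    # Later updates: only the delta against the shared reference is sent.
    current = {"sys-apps/portage": "2.1.1", "app-editors/vim": "7.0"}
    delta = make_delta(client_ref, current)

    # Both sides apply the same delta, so their reference records stay in sync.
    client_ref = apply_delta(client_ref, delta)
    server_ref = apply_delta(server_ref, delta)
    assert client_ref == server_ref == current
    print(delta)
```

Because both sides apply the same delta to the same starting record, only the changed entries ever cross the wire after the initial upload, which is why the per-update size stays small.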

I'd guess that for testing with 100-200 clients, an average of 1-2 Mbit/s should be sufficient. If that's not possible, we can either limit the number of testers or they will simply have to be patient with uploads.
Seeding the database is the thing that causes the most traffic, but that's a one time cost for each client (in theory at least).
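Plugging the numbers from this comment into a back-of-the-envelope calculation shows the one-time seeding cost against the recurring delta traffic. This sketch assumes the worst-case figures quoted (200 clients, 3 MB per reference record, 100 KB per delta), decimal units (1 MB = 1000 KB = 8 Mbit) and a single 1 Mbit/s link; query traffic is left out because no estimate exists. The helper names are illustrative.

```python
# Back-of-the-envelope upload traffic for the workflow described above.
# Assumptions: decimal units (1 MB = 1000 KB = 8 Mbit); query traffic ignored.


def transfer_seconds(megabytes: float, link_mbit_s: float) -> float:
    """Time to move `megabytes` over a link of `link_mbit_s` Mbit/s."""
    return megabytes * 8 / link_mbit_s


def seeding_mb(clients: int, record_mb: float) -> float:
    """Total one-time volume of the initial reference-record uploads."""
    return clients * record_mb


def delta_round_mb(clients: int, delta_kb: float) -> float:
    """Volume of one round in which every client sends one delta."""
    return clients * delta_kb / 1000


if __name__ == "__main__":
    seed = seeding_mb(200, 3)           # worst case: 200 clients, 3 MB each
    one_round = delta_round_mb(200, 100)  # every client sends a 100 KB delta
    print(f"seeding: {seed:.0f} MB, {transfer_seconds(seed, 1) / 60:.0f} min at 1 Mbit/s")
    print(f"one delta round: {one_round:.0f} MB, {transfer_seconds(one_round, 1):.0f} s at 1 Mbit/s")
```

With these assumptions, seeding all 200 clients takes about 80 minutes of continuous upload at 1 Mbit/s, while a full round of deltas takes under three minutes.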

> We probably don't have much problem with getting the test environment up, but I
> will definitely need a better number once this gets into production. I'll have
> to evaluate our resources and figure out what we need for it.

Sure, that's one reason why I'm planning such an extensive test phase (besides getting test data).
Comment 3 Marius Mauch (RETIRED) gentoo-dev 2006-06-29 19:07:52 UTC
any news on this one? Could make use of it in a week.
Comment 4 Lance Albertson (RETIRED) gentoo-dev 2006-06-30 15:25:47 UTC
(In reply to comment #3)
> any news on this one? Could make use of it in a week.
> 

Haven't gotten very far on it. Let me think about it over the weekend. When would be the last day I could get this done for you?
Comment 5 Marius Mauch (RETIRED) gentoo-dev 2006-06-30 22:47:29 UTC
Well, really need to get a test system up soon if I don't want to run into problems with the overall timeframe (I'm already late due to other problems). So if we can't get something by next weekend (July 10th) I'll have to look for alternate resources for the test system.
Comment 6 Lance Albertson (RETIRED) gentoo-dev 2006-07-11 20:33:34 UTC
(In reply to comment #5)
> Well, really need to get a test system up soon if I don't want to run into
> problems with the overall timeframe (I'm already late due to other problems).
> So if we can't get something by next weekend (July 10th) I'll have to look for
> alternate resources for the test system.

If you can wait a few more days, I'll have a system up and ready for you. I'm going to use a rebuilt toucan if you don't mind. Since you have concerns about space, I'm asking joe to swap the drives out tomorrow morning and rebuild the RAID with bigger drives. I'll make a backup copy of the currently installed system and copy it back. After that, it should just be a matter of setting up mysql/apache/etc. for ya. Sorry for the late response.

Let me know what you think. I may have curtis119 help ya if I get tied up on something.
Comment 7 Marius Mauch (RETIRED) gentoo-dev 2006-07-13 09:33:50 UTC
Given that I haven't made much progress myself (weather sucks the life out of me) that works for me.
Comment 8 Marius Mauch (RETIRED) gentoo-dev 2006-08-01 09:32:49 UTC
A small update: based on the last few test runs, storing a compressed record (with all modules) on disk takes about 200-500 KB. However, compression doesn't really work within the DB, and I don't know how to get a reliable estimate for InnoDB tables (all InnoDB tables are stored in a single file, and that file doesn't seem to shrink when you delete data from the tables), so I have no real estimate there. It's safe to say, though, that I'm going to need (much) more DB space than regular file storage.
Btw, I could use access now to see how the DB code performs there (my systems here are very different from toucan; it might be good to see the code running on something with less CPU but more I/O). It would just be local tests for now (I haven't had a chance to work on the interface code yet).
Comment 9 Marius Mauch (RETIRED) gentoo-dev 2006-08-07 01:13:24 UTC
any progress?
Comment 10 Lance Albertson (RETIRED) gentoo-dev 2006-08-07 06:15:10 UTC
Crap, I totally forgot about this :-(. I'll see about getting someone on this and getting them access.
Comment 11 Curtis Napier (RETIRED) gentoo-dev 2006-08-07 21:58:55 UTC
(In reply to comment #10)
> Crap, I totally forgot about this :-(. I'll see about getting someone on this
> and getting them access.
> 

Did you forget about me? I emailed jrinkovs about the hard drive issue but he hasn't responded yet.
Comment 12 Lance Albertson (RETIRED) gentoo-dev 2006-08-08 06:43:59 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > Crap, I totally forgot about this :-(. I'll see about getting someone on this
> > and getting them access.
> > 
> 
> Did you forget about me? I emailed jrinkovs about the hard drive issue but he
> hasn't responded yet.
> 

It turned out that toucan uses an unusual type of hard drive, so we couldn't go that route. If need be, we can move this to the new box at GNi eventually. I got him all set up last night, so he should be good apart from minor things. Sorry I didn't get a chance to hook you up :(.
Comment 13 Lance Albertson (RETIRED) gentoo-dev 2006-08-26 11:08:41 UTC
We got this done. Please let me know if you need anything else via irc or email. Thanks!