Bug 4950 - distfiles scalability
Summary: distfiles scalability
Status: RESOLVED FIXED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Core
Hardware: x86 Linux
Importance: High enhancement
Assignee: Portage team
 
Reported: 2002-07-13 09:00 UTC by Marcel Kunath
Modified: 2011-10-30 22:38 UTC

Description Marcel Kunath 2002-07-13 09:00:12 UTC
Hello,

I am new to Gentoo. I read about it for two weeks and then tried it. The last week
has been a joy. Great work. I was so impressed that I started coming up with ideas
on how to improve things. I hope this one gets implemented.


Fact:

Emerge runs on a machine and downloads the files to be compiled and merged to

/usr/portage/distfiles

The files are saved locally, and from then on only the local machine can use them.



Problem:

People with 3+ machines on their network and limited download capabilities must
find a way to reduce their bandwidth usage.



Solution:

distfiles must become scalable.

1. 
Create distfiles variables in some configuration file of emerge:

$DISTFILES_HIERARCHY
$DISTFILES_SAVE

2.
The variables default to:

$DISTFILES_HIERARCHY="mirror0 nfs smb local"
$DISTFILES_SAVE="local"

3.
The $DISTFILES_HIERARCHY variable defines the order in which emerge searches when
it is executed. Each space-separated name maps to a directory:

mirror0 defines /usr/portage/distfiles-mirror0
nfs     defines /usr/portage/distfiles-nfs
smb     defines /usr/portage/distfiles-smb
local   defines /usr/portage/distfiles-local

When a user runs emerge it will check the directories sequentially to see whether
the file to be merged is already present and need not be downloaded.


a) mirror0 may be a directly synced mirror of the portage sources on ibiblio.org
using rsync or wget.
b) nfs is a local network NFS file system of the sources.
c) smb is a local network Samba file system of the sources.
d) local is the directory currently known as /usr/portage/distfiles, which
affects only the local machine.


4.
If emerge fails to find the source file in any of the directories it shall be
downloaded. If the file gets downloaded we need to define the location to save
it to. $DISTFILES_SAVE allows us to do so.

(Note: If a user sets $DISTFILES_SAVE to nfs or smb emerge must check if root
has write access to this file system.)

5.
Both variables, $DISTFILES_HIERARCHY and $DISTFILES_SAVE, can easily be tweaked
and changed depending on a user's needs and network setup.
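The lookup and save behaviour proposed in steps 3 and 4 could be sketched roughly as follows, in Python. This is only an illustration of the proposal: the function names are hypothetical and are not part of Portage, and the write check is a simple stand-in for "emerge must check if root has write access".

```python
import os
from typing import Optional, Sequence

BASE = "/usr/portage"  # parent directory used in the proposal


def distfiles_dir(name: str, base: str = BASE) -> str:
    """Map a hierarchy name such as 'nfs' to <base>/distfiles-nfs."""
    return os.path.join(base, "distfiles-" + name)


def find_distfile(filename: str, hierarchy: Sequence[str],
                  base: str = BASE) -> Optional[str]:
    """Return the first existing copy of filename, searching in hierarchy order."""
    for name in hierarchy:
        candidate = os.path.join(distfiles_dir(name, base), filename)
        if os.path.isfile(candidate):
            return candidate
    return None  # not found anywhere: the caller would download it


def save_target(save: str, base: str = BASE) -> str:
    """Return the directory for new downloads, refusing unwritable locations."""
    target = distfiles_dir(save, base)
    if not os.access(target, os.W_OK):
        raise PermissionError("cannot write to " + target)
    return target
```

With the proposed defaults, find_distfile(name, "mirror0 nfs smb local".split()) would be tried first, and only when it returns None would the file be downloaded into save_target("local").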


I need this feature badly since I have ten machines on my network and I NFS out
the sources similar to what is described above. I also have terrible net access
and this is the only way I can limit my bandwidth needs.

Thanks,

Marcel
Comment 1 Markus Krainer 2002-07-13 13:19:11 UTC
I do not really understand what you want to achieve.

What's wrong with setting up one machine to hold the /usr/portage tree
(including distfiles) and nfs-mounting this directory from your other 10
machines?

So no matter how often or from which machine you install a package, the file
gets only downloaded once (the first time you install it). Of course, as you
stated, root must have write permissions to the nfs mount.

 -Markus-
Comment 2 Marcel Kunath 2002-07-13 19:42:58 UTC
Here is why this is needed. When I did my first Gentoo install I had to:

1. Use the 135 MB install.
2. Download 5 packages like metalog, save them to distfiles and merge them.
3. Compile kernel and Reboot.
4. NFS mount a share from my SuSE server.
5. Copy over the files from distfiles to distfiles-nfs.
6. Move distfiles to distfiles-local.
7. Create a symlink from distfiles to distfiles-nfs.
8. Edit fstab to automount NFS share.


If I install a second box using the old scheme it will take:

1. Use the 135 MB install.
2. Download 5 packages like metalog, save them to distfiles and merge them.
3. Compile kernel and Reboot.
4. NFS mount a share from my SuSE server.
5. Move distfiles to distfiles-local.
6. Create a symlink from distfiles to distfiles-nfs.
7. Edit fstab to automount NFS share.

If we install a second machine using MY NEW SCHEME it takes less effort:

1. Use the 135 MB install.
2. Download 5 packages like metalog, save them to distfiles-local and merge
them.
3. Compile kernel and Reboot.
4. NFS mount a share from my SuSE server distfiles-nfs.
5. Edit fstab to automount NFS share.



On top of that I can easily tweak where I write files to by changing a variable.
With the old scheme you have to do a lot of work to achieve this.

Imagine this scenario:

I want to upgrade a Gentoo box. I need to download some stuff. I usually store
my sources on NFS. Suddenly I have disk trouble on my NFS server. I run out of
space, or the disk develops a problem, or maybe the machine has died completely.
What do I do?

Under the old scheme I have to unmount my NFS share, move directories and
redirect symlinks.

Under my scheme all I have to do is change a single variable and it switches
from distfiles-nfs to distfiles-local.

My idea is scalable for the network environment; the old one centres around
single-machine setups.

I want to achieve choice to run my portage tree(s) as I see fit using two
additional variables.

Another point: disks have limited sizes. Maybe I have two 10 GB disks in my
server to NFS out Gentoo sources.

What do I do if one disk fills up? I have to make use of the second disk or
delete files on the first. I still want to use both disks' sources simultaneously.

The only way to do this properly is via LVM on the server, or via my proposed
variables on the client together with

/usr/portage/distfiles-nfs-disk1
/usr/portage/distfiles-nfs-disk2

Once disk1 is full I just set

$DISTFILES_SAVE="nfs-disk2"

You cannot mount disk1 and disk2 to /usr/portage/distfiles at the same time.
It's one or the other using the old scheme. 

There is so much choice, flexibility and scalability you can give administrators
by introducing my two variables.

dictionary.com says:

Scalability is how well a solution to some problem will work when the size of
the problem increases.

You have to look at complex network issues to understand why I think the
variables are needed.

Marcel
Comment 3 SpanKY gentoo-dev 2002-07-14 05:23:08 UTC
that is a hell of a lot of effort ... too much imho since not everyone needs 
this, and those who do usually come up with a solution on their own (see 
below) ;)

edit /etc/samba/smb.conf on the server which will be storing the distfiles

[distfiles]
        comment = gentoo portage distfiles
        path = /usr/portage/distfiles
        guest ok = yes
        write list = @root
        read only = no

then on your other machines put this into your /etc/fstab
//serverwithdistfiles/distfiles /usr/portage/distfiles smbfs guest 0 0
Comment 4 Brian Rozmierski 2002-07-14 14:36:52 UTC
To add to spanky's suggestion: if, during initial installation, you can't mount
the smb (nfs, coda, intermezzo, cans-n-string, etc.) partition, have the
machine hosting the distfiles directory export it through apache and put the
full URL in /etc/make.conf as GENTOO_MIRRORS. Be careful to leave the ibiblio site
there; just put yours in first, space delimited.

This way if the file is on that server it will download it from there first,
else it will fall back to ibiblio, and ultimately the SRC_URI (I think) in the ebuild.

You can also do something similar with rsync, but it only allows you to specify
one SYNC server. (So you'd have to change it eventually, or keep that machine
well in-sync.)
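The fallback order described above can be illustrated with a small Python sketch. It is not Portage's code: candidate_urls is a hypothetical helper, the host names in the usage note are placeholders, and the assumption that each mirror serves files under a distfiles/ subdirectory should be checked against your own mirror layout.

```python
from typing import List


def candidate_urls(filename: str, gentoo_mirrors: str,
                   src_uri: List[str]) -> List[str]:
    """List download locations in the order they would be tried."""
    urls = []
    # Mirrors come first, in the order given (space delimited).
    for mirror in gentoo_mirrors.split():
        # Assumption: mirrors keep the files under a distfiles/ subdirectory.
        urls.append(mirror.rstrip("/") + "/distfiles/" + filename)
    # The upstream locations from the ebuild are the last resort.
    urls.extend(src_uri)
    return urls
```

For example, with GENTOO_MIRRORS set to "http://distserver.example/gentoo http://ibiblio.example/gentoo", the local server is asked first, then the second mirror, then the ebuild's own upstream locations.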
Comment 5 SpanKY gentoo-dev 2002-07-14 14:40:12 UTC
i like that idea brianr, thx for the suggestion ;)
Comment 6 Andy Romeril 2003-01-22 14:23:48 UTC
Here's another wrinkle that an "n-tier distfiles" would help with:

You have a laptop and carry around the current contents of
/usr/portage/distfiles on read-only media, such as CD or DVD to save precious
local disk space.  After an 'emerge sync', you find that one or more packages
requires updated sources.  I already have my "local" sources mounted so that
portage can find them, but now I need to have a place for the *new* sources to
go, since I can't write to the read-only media.

I suppose that it would be possible to simply create symlinks in
/usr/portage/distfiles to every file on the R/O media and simply let portage
download new sources directly into that directory, but the approach lacks elegance.
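For reference, the symlink workaround just described might look like the following Python sketch. The function name and both directory arguments are hypothetical; it links only regular files and leaves anything already present in the target directory alone.

```python
import os


def link_readonly_distfiles(media_dir: str, distdir: str) -> int:
    """Symlink each regular file in media_dir into distdir; return links created."""
    created = 0
    for name in sorted(os.listdir(media_dir)):
        source = os.path.join(media_dir, name)
        link = os.path.join(distdir, name)
        # Skip subdirectories and anything already present in distdir.
        if not os.path.isfile(source) or os.path.lexists(link):
            continue
        os.symlink(source, link)
        created += 1
    return created
```

It would be called with the mount point of the R/O media as media_dir and /usr/portage/distfiles as distdir, and would need re-running whenever the media contents change.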

Thoughts?  Thanks for listening.
Andy Romeril
(aka buckyball)
Comment 7 Troy Dack 2003-01-22 16:05:46 UTC
Also don't forget that the LiveCD can do NFS straight from booting.

I currently have /usr/portage sitting on my RH box and exported via NFS.

My last two re-installs I simply:
1. booted the LiveCD, 
2. started Portmap and NFS 
3. made my partitions, 
4. mounted physical drives, 
5. untarred the stage I was using
6. for good measure rm -fr'd /mnt/gentoo/usr/portage/*
7. mount -t nfs remotebox:/path/to/portage /mnt/gentoo/usr/portage
8. chroot in and away you go

This sped up my install significantly because the emerge rsync didn't have to
rsync the entire tree and I already had ALL my distfiles ready to go.

During the install I configured my fstab to mount /usr/portage at boot and made
sure that I rc-update'd portmap and nfs.  No need for copying files, symlinks or
any of that stuff, and the server does its job, serving files ;)
Comment 8 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2003-07-08 13:40:45 UTC
For multiple distfiles directories, see Kurt Hindenburg's ongoing work here:
http://www.cherrynebula.net/projects/portage-kvh/portage-kvh.html

I suggested a new approach to him that would avoid breakage in any of the current ebuilds.
Comment 9 Marius Mauch (RETIRED) gentoo-dev 2004-02-09 01:27:48 UTC
You can now put normal paths in GENTOO_MIRRORS, that should solve the problem. For the example in the original report the following setup would be equivalent:

GENTOO_MIRRORS="/usr/portage/distfiles-mirror0 /usr/portage/distfiles-nfs /usr/portage/distfiles-smb"
DISTDIR="/usr/portage/distfiles-local"
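To illustrate how such a setup behaves, here is a rough Python sketch. It is not the actual Portage implementation: both function names are hypothetical, and it assumes that entries starting with "/" are treated as local directories that hold the files directly, with a found file being copied into DISTDIR.

```python
import os
import shutil
from typing import List, Optional, Tuple


def split_mirrors(gentoo_mirrors: str) -> Tuple[List[str], List[str]]:
    """Separate local directory entries (starting with '/') from remote URLs."""
    local, remote = [], []
    for entry in gentoo_mirrors.split():
        (local if entry.startswith("/") else remote).append(entry)
    return local, remote


def fetch_from_local(filename: str, local_mirrors: List[str],
                     distdir: str) -> Optional[str]:
    """Copy filename from the first local mirror that has it into distdir."""
    for mirror in local_mirrors:
        source = os.path.join(mirror, filename)
        if os.path.isfile(source):
            # shutil.copy returns the path of the newly created copy.
            return shutil.copy(source, os.path.join(distdir, filename))
    return None  # caller falls back to the remote mirrors
```

With the settings above, all three GENTOO_MIRRORS entries would land in the local list, and anything not found there would be downloaded into /usr/portage/distfiles-local.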