User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7
Build Identifier:

See email conversation below:

> romandas:
> Would it be possible to have the emerge-webrsync script check for the
> existence of portage-latest before checking the dated portage
> snapshots? The reason for this is that I am in a later timezone than
> the main Gentoo servers (an assumption; I am in GMT +1), so when I use
> emerge-webrsync at 0900 my time, it tries to grab the current day's
> portage, which hasn't been posted yet. Whereas portage-latest always
> (as far as I know) points to the latest version available.
>
> I realize this just avoids a simple timeout and an annoying message; I
> even went so far as to modify the script myself (the variable anyway)
> but figure that will get overwritten by the next emerge-webrsync update. I
> am not familiar with shell scripting though (Perl yes, sh no) so did
> not put in anything like "check for portage-latest else check for
> portage-ddmmyy". If I knew how to do that little bit, I'd send you a
> code snippet.
>
> Let me know what you think.

karltk@gentoo.org:
This is a reasonable request. At the very least, this functionality should be exported via a command line parameter. I am on leave just now, however, so I cannot attend to it myself. Can you put this in a bug on bugs.gentoo.org? Then other developers are able to see your request, and hopefully tend to it.

Reproducible: Always

Steps to Reproduce:
1. Be in the CET timezone
2. Run emerge-webrsync around 0900

Actual Results:
The script tries to pull down portage-<datestamp>.tar.bz2

Expected Results:
It would be nice if it tried to pull portage-latest.tar.bz2 first, then switched to the datestamped file.
As it has been over a month since the bug was entered and there was no reaction, I'd like to provide a patch myself. Beware though, I'm no bash hacker. And yes, I'm in the GMT+1 timezone too.

A patch in unified diff format:

--- emerge-webrsync 2006-11-20 14:03:34.000000000 +0100
+++ /usr/sbin/emerge-webrsync 2006-11-20 14:40:54.000000000 +0100
@@ -83,23 +83,26 @@
 echo "Fetching most recent snapshot"
-declare -i attempts=-1
+declare -i attempts=-2
 while (( $attempts < 40 )) ; do
     attempts=$(( attempts + 1 ))
+    if [ $attempts == -2 ]; then
+        FILE_ORIG="portage-latest.tar.bz2"
+    else
-    #this too, sucks. it works in the interim though.
-    if [ "$USERLAND" == "BSD" ] || [ "$USERLAND" == "Darwin" ] ; then
-        daysbefore=$(expr $(date +"%s") - 86400 \* $attempts)
-        day=$(date -r $daysbefore +"%d")
-        month=$(date -r $daysbefore +"%m")
-        year=$(date -r $daysbefore +"%Y")
-    else
-        day=$(date -d "-$attempts day" +"%d")
-        month=$(date -d "-$attempts day" +"%m")
-        year=$(date -d "-$attempts day" +"%Y")
-    fi
-
-    FILE_ORIG="portage-${year}${month}${day}.tar.bz2"
+        #this too, sucks. it works in the interim though.
+        if [ "$USERLAND" == "BSD" ] || [ "$USERLAND" == "Darwin" ] ; then
+            daysbefore=$(expr $(date +"%s") - 86400 \* $attempts)
+            day=$(date -r $daysbefore +"%d")
+            month=$(date -r $daysbefore +"%m")
+            year=$(date -r $daysbefore +"%Y")
+        else
+            day=$(date -d "-$attempts day" +"%d")
+            month=$(date -d "-$attempts day" +"%m")
+            year=$(date -d "-$attempts day" +"%Y")
+        fi
+        FILE_ORIG="portage-${year}${month}${day}.tar.bz2"
+    fi
     echo "Attempting to fetch file dated: ${year}${month}${day}"
     got_md5=0
(In reply to comment #1)

Of course this is wrong:

+ if [ $attempts == -2 ]; then

it should be

+ if [ $attempts == -1 ]; then

Sorry.
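For readers following the correction, here is a minimal sketch of the fetch order the patched loop is aiming for. It is not the emerge-webrsync code: `pick_snapshot` is a hypothetical helper, the loop bound is cut from 40 to 2 for illustration, and the date call assumes GNU date as in the non-BSD branch of the patch.

```shell
#!/bin/sh
# Sketch only: shows which file each loop iteration would try.
# pick_snapshot is a hypothetical helper, not part of emerge-webrsync.
pick_snapshot() {
    attempt=$1
    if [ "$attempt" -eq -1 ]; then
        # First iteration: the counter starts at -2 and is incremented
        # before this check, so it is -1 here, never -2.
        echo "portage-latest.tar.bz2"
    else
        # Later iterations: the snapshot dated $attempt days ago
        # (GNU date syntax, local time, as in the patch).
        echo "portage-$(date -d "-${attempt} day" +%Y%m%d).tar.bz2"
    fi
}

attempts=-2
while [ "$attempts" -lt 2 ]; do
    attempts=$(( attempts + 1 ))
    echo "attempt ${attempts}: $(pick_snapshot "$attempts")"
done
```

With the original `== -2` test the first branch could never match, because the counter has already been bumped to -1 by the time it is checked.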
Most of the time this approach will work fine. However, using portage-latest.tar.bz2 does introduce a race condition between the fetching of the tarball and the md5sum. It's possible that the portage-latest has been updated between those two events. If possible, I'd prefer not to introduce a race condition like this. How much time is wasted, on average, when an attempt is made to fetch the current day's snapshot?
(In reply to comment #3)
> Most of the time this approach will work fine. However, using
> portage-latest.tar.bz2 does introduce a race condition between the fetching of
> the tarball and the md5sum. It's possible that the portage-latest has been
> updated between those two events.

Just to be strict, the md5sum is fetched before the tarball, but that doesn't actually change anything.

> If possible, I'd prefer not to introduce a
> race condition like this.

I can't disagree. But let's consider the following scenarios:

1) Let's leave portage-latest out:
a) All those people syncing after the day change at their place but before the day change and update on the server (isn't that about a quarter of the globe?) perform one 404 hit before they fetch the freshest portage. That's acceptable. It won't happen with a server in UTC+12.
b) Many will have the same date as the file despite living in a different timezone. Everything goes just fine.
c) All those who sync between the day change and the update at the server and at their place will most likely fetch an outdated tarball. That's bad. It won't happen with a server in UTC-12.

2) Now let's make some use of portage-latest:
Everybody tries to fetch the freshest tarball regardless of their local time. However, once in a while some unlucky fellow will fetch the old md5sum and then a fresh tarball just after that, which will make him either move on to the next mirror or, if all the mirrors fail (or the race condition replays), try to fetch the portage tree based on date. In the latter case, they will fall back to the behaviour described in 1a, which will make them fetch the whole tarball again (which might be the same as the latest or an outdated one). Now, how probable is that? Don't leave your home if that happens - a really bad day. Harmful? Not much. Acceptable? Who needs such losers anyway? Just kidding.

> How much time is wasted, on average, when an attempt
> is made to fetch the current day's snapshot?
Not much on my ADSL line with proxy, but why barf at the poor server?

alnitak ~ $ time wget http://distfiles.gentoo.org/snapshots/portage-20061122.tar.bz2.md5sum
--13:39:20-- http://distfiles.gentoo.org/snapshots/portage-20061123.tar.bz2.md5sum
=> `portage-20061123.tar.bz2.md5sum'
Connecting to 10.102.10.102:8888... connected.
Proxy request sent, awaiting response... 404 Not Found
13:39:20 ERROR 404: Not Found.

real 0m0.341s
user 0m0.000s
sys 0m0.000s

PS. Calculating how many servers will match 1a, 1b and 1c is quite complicated, I know. One should take into account the time at which the server does updates, the server's timezone, clients' sync times and their timezones, Gentoo user population density and many more, I suppose. Some statistical analysis would be interesting. Anyone?
(In reply to comment #4)
> 1) Let's leave portage-latest out:
> a) All those people syncing after day change at their place but before day
> change and update on the server (isn't that about quarter of the globe?)
> perform one 404 hit before they fetch the freshest portage. That's acceptable.
> It won't happen with a server in UTC+12.
> b) Many will have the same date as the file despite living in different
> timezone. Everything goes just fine.
> c) All those, who sync between day change and update at the server and at their
> place will most likely fetch an outdated tarball. That's bad. It won't happen
> with a server in UTC-12.

How about if we smooth over the timezone differences by using date -u (UTC mode)? It seems to me that it should have been using UTC all along.

I know the portage-latest race condition is quite unlikely to bite anyone, but still, I'd really prefer not to have one. :)
In svn r5154 I've fixed it to use UTC time for decisions about which snapshots to download.
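The r5154 change itself is not shown in this thread, so the following is only a sketch of the date -u idea, not the committed code. `snapshot_name_utc` is a hypothetical helper, and the `@seconds` form of `-d` assumes GNU date (the BSD branch of the script would use `date -r` instead). The example timestamp is 2006-11-21 23:30 UTC, which is already 00:30 on 2006-11-22 for a client in GMT+1.

```shell
#!/bin/sh
# Sketch only: derive the snapshot name from the UTC date, not the local date.
# snapshot_name_utc is a hypothetical helper, not part of emerge-webrsync.
snapshot_name_utc() {
    now=$1        # current time, in seconds since the epoch
    days_ago=$2   # 0 for today's snapshot, 1 for yesterday's, ...
    stamp=$(date -u -d "@$(( now - 86400 * days_ago ))" +%Y%m%d)
    echo "portage-${stamp}.tar.bz2"
}

# 2006-11-21 23:30 UTC, i.e. 2006-11-22 00:30 in GMT+1.
now=1164151800

# Name based on the UTC date: the same for every client, whatever its timezone.
snapshot_name_utc "$now" 0

# Name based on the local date of a GMT+1 client, for comparison
# (TZ=CET-1 is a POSIX timezone string meaning one hour east of UTC).
TZ=CET-1 date -d "@${now}" +portage-%Y%m%d.tar.bz2
```

The two printed names differ by one day, which is the mismatch described in the original report. Whether the UTC date matches the newest file actually on the mirror still depends on when the server publishes its snapshot.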