I quickly scanned through the mirrorselect code, and it seems that a particularly long DNS resolution or TCP negotiation during the timed fetch would muck up the timing results. I think the best way to get accurate timings is to first fetch a small dummy file (e.g. a 1-byte or 1K file) just to resolve the hostname and "wake up" the path between the local machine and the remote machine, and THEN proceed to do the timing. Am I making sense?
Reproducible: Always
Timing after DNS lookup was cached makes sense to me... let's see what the maintainers think of the idea.
Any developers going to pick {on,up} this idea or shall I write a patch myself, or what?
(In reply to comment #2)
> Any developers going to pick {on,up} this idea or shall I write a patch myself,
> or what?
Patches are always welcome!
We can use socket.getaddrinfo() to resolve the host names to IP addresses. There's an example in the rsync_protocol_scan.py script that's attached to bug 168646. Also, as mentioned in bug #244997, comment #4, I'd like to convert the download timing code to use urllib instead of spawning a fetcher.
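For illustration, here is a minimal sketch of resolving a host with socket.getaddrinfo() before any timing starts. The function name resolve_host and its default port are hypothetical; this is not the code from rsync_protocol_scan.py or from mirrorselect.

```python
import socket


def resolve_host(hostname, port=80):
    """Return the unique IP addresses for hostname, in resolver order.

    Hypothetical helper: calling it before the timed fetch means the
    lookup has already been done (and can be cached by the local
    resolver) by the time the timing starts.
    Raises socket.gaierror if the name cannot be resolved.
    """
    addresses = []
    # AF_UNSPEC asks for both IPv4 and IPv6; SOCK_STREAM limits the
    # results to one entry per address instead of one per socket type.
    results = socket.getaddrinfo(
        hostname, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
    for family, socktype, proto, canonname, sockaddr in results:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

Something like resolve_host("distfiles.gentoo.org") would be called once per mirror before the timed download; a socket.gaierror there is also a cheap way to drop mirrors that do not resolve at all.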
Created attachment 174369
resolve hostname before URL fetch
I had forgotten about this bug until Zac's comment. Anyway. In the past 5 minutes, I whipped up a quick patch. Hopefully this is acceptable until urllib usage is implemented.
Also, another problem that I wasn't clear about in my initial description: You might notice that if you ping a server that you haven't connected to recently, the first ping latency is much higher than the following ping replies. Same thing with HTTP requests. This isn't just DNS resolution. This is basically "waking up" the path. I can't explain the latency in precise terms as I normally would, but I suspect that servers between point A and point B take a while to realize they need to route something. Once the route is figured out, things go nicely.
I suppose we can do the "wake up" by opening a connection to the file without actually downloading it. Then we'll open the connection a second time for the timed run.
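A rough sketch of that two-pass idea using urllib, assuming Python 3's urllib.request. The function name timed_fetch and the opener parameter are hypothetical (the parameter exists only so the function can be exercised without a network); this is not necessarily how mirrorselect ended up implementing it.

```python
import time
import urllib.request


def timed_fetch(url, timeout=10, opener=urllib.request.urlopen):
    """Return (seconds, bytes_read) for the second of two connections.

    Hypothetical helper: the first connection is the untimed "wake up"
    pass, the second one is the timed download.
    """
    # Wake-up pass: open the connection, then close it without reading
    # the body. This absorbs DNS resolution and first-contact latency.
    opener(url, timeout=timeout).close()

    # Timed pass: only this second connection and its download count.
    start = time.monotonic()
    response = opener(url, timeout=timeout)
    try:
        size = len(response.read())
    finally:
        response.close()
    return time.monotonic() - start, size
```

The wake-up pass still costs one round trip per mirror, so a short timeout keeps dead mirrors from stalling the whole run.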
This is fixed in mirrorselect-1.4 (accounts for DNS and routing "wake up").