Bug 681976 - AAAA roundrobin for distfiles.gentoo.org contains dead hosts
Summary: AAAA roundrobin for distfiles.gentoo.org contains dead hosts
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Infrastructure
Classification: Unclassified
Component: Other web server issues
Hardware: All Linux
Importance: Normal major
Assignee: Gentoo Infrastructure
Reported: 2019-03-29 06:48 UTC by petre rodan
Modified: 2022-01-24 22:26 UTC

Description petre rodan 2019-03-29 06:48:23 UTC
This has been bugging me for at least a year now. Every time I have to merge a new package, it takes forever to get the archive from the main distfiles website.
This is because your IPv6 round-robin always seems to have a dead host in it, and my RNG always seems to favor that particular mirror.

So please be so kind as to set up a cron-based script that tests connectivity and removes the dead entries from that record.

lipo ~ # dig +short AAAA distfiles.gentoo.org
2605:bc80:3010::134
2600:3402:200:227::2
2600:3404:200:237::2
2a00:8a60:e012:a00::21

lipo ~ # telnet 2605:bc80:3010::134 80
Trying 2605:bc80:3010::134...
Connected to 2605:bc80:3010::134.
Escape character is '^]'.
^]
telnet> quit
Connection closed.

lipo ~ # telnet 2600:3404:200:237::2 80
Trying 2600:3404:200:237::2...
Connected to 2600:3404:200:237::2.
Escape character is '^]'.
^]
telnet> quit
Connection closed.

lipo ~ # telnet 2a00:8a60:e012:a00::21 80
Trying 2a00:8a60:e012:a00::21...
telnet: connect to address 2a00:8a60:e012:a00::21: Connection timed out
lipo ~ # telnet 2600:3402:200:227::2 80
Trying 2600:3402:200:227::2...
Connected to 2600:3402:200:227::2.
Escape character is '^]'.
^]
telnet> quit
Connection closed.
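
The manual telnet checks above could be scripted roughly as follows. This is only a minimal sketch of the connectivity test, not anything Gentoo Infrastructure runs: the function names (`resolve_aaaa`, `probe`, `split_by_reachability`) and the 5-second timeout are assumptions made for illustration, and the step that would actually edit the DNS record is left out. A result also only reflects the network the script runs from.

```python
import socket


def resolve_aaaa(hostname, port=80):
    """Return the sorted, de-duplicated IPv6 addresses the resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, port, family=socket.AF_INET6,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv6, sockaddr[0] is the textual address.
    return sorted({info[4][0] for info in infos})


def probe(address, port=80, timeout=5.0, connect=socket.create_connection):
    """Return True if a TCP connection to (address, port) succeeds within timeout."""
    try:
        conn = connect((address, port), timeout)
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False
    conn.close()
    return True


def split_by_reachability(addresses, port=80, timeout=5.0,
                          connect=socket.create_connection):
    """Probe every address once and return (alive, dead) lists in input order."""
    alive, dead = [], []
    for address in addresses:
        if probe(address, port, timeout, connect):
            alive.append(address)
        else:
            dead.append(address)
    return alive, dead
```

Calling `split_by_reachability(resolve_aaaa("distfiles.gentoo.org"))` from a cron job would give the list of addresses that timed out from that one vantage point; the `connect` parameter exists only so the probing logic can be exercised without network access.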
Comment 1 Alec Warner (RETIRED) archtester gentoo-dev Security 2019-03-29 17:07:32 UTC
(In reply to petre rodan from comment #0)
> This has been bugging me for at least a year now. Every time I have to merge
> a new package, it takes forever to get the archive from the main distfiles
> website.
> This is because your IPv6 round-robin always seems to have a dead host in it,
> and my RNG always seems to favor that particular mirror.

I'm happy to remove any unavailable mirrors.

> 
> So please be so kind as to set up a cron-based script that tests
> connectivity and removes the dead entries from that record.

Portage should be retrying a number of times; the AAAA record has multiple entries precisely so that if one is dead, Portage will try another. While this might add latency to downloads (each attempt has some time cost), we shouldn't be in a state where more than, say, 3 of the records are dead, so retries should cover us here.

Cron-based DNS edits are non-trivial in our existing setup and have their own issues that I think outweigh the gains of automation here.
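
The fallback idea described above (several addresses behind one name, with the client moving on when one is dead) can be illustrated with a small sketch. It is a hypothetical example and not Portage's or wget's actual implementation; `connect_with_fallback` and its 5-second timeout are invented for illustration. It also shows where the latency cost comes from: every dead address tried before a live one costs up to one full timeout.

```python
import socket


def connect_with_fallback(addresses, port=80, timeout=5.0,
                          connect=socket.create_connection):
    """Try each address in order and return (address, connection, failed).

    `failed` lists the addresses that were tried and did not answer before
    the first success. Raises OSError if no address is reachable.
    """
    failed = []
    for address in addresses:
        try:
            conn = connect((address, port), timeout)
        except OSError:
            # Dead or filtered address: remember it and move on to the next.
            failed.append(address)
            continue
        return address, conn, failed
    raise OSError("no address reachable: " + ", ".join(failed))
```

With four addresses and one dead entry sorted first, this sketch waits out a single timeout and then connects to the second address.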

Comment 2 Alec Warner (RETIRED) archtester gentoo-dev Security 2019-03-29 17:09:32 UTC
(In reply to Alec Warner from comment #1)
Note that for me, all 4 of the addresses in the AAAA record are currently working.

-A
Comment 3 petre rodan 2019-03-30 05:23:47 UTC
2a00:8a60:e012:a00::21 is not working for me, and it looks like I may be getting filtered at the very last hop. My local IPv6 address is currently 2a02:2f0e:808:4e10:1687:67f5:603f:61c1/64.

prodan@lipo ~ $ telnet 2a00:8a60:e012:a00::21 80
Trying 2a00:8a60:e012:a00::21...
telnet: connect to address 2a00:8a60:e012:a00::21: Connection timed out

prodan@lipo ~ $ traceroute6 2a00:8a60:e012:a00::21
traceroute to 2a00:8a60:e012:a00::21 (2a00:8a60:e012:a00::21), 30 hops max, 80 byte packets
 1  2a02:2f0e:808:4e10::1 (2a02:2f0e:808:4e10::1)  4.076 ms  4.012 ms  3.973 ms
 2  2a02:2f0e:8ff:ff00::2 (2a02:2f0e:8ff:ff00::2)  16.673 ms  15.185 ms  15.133 ms
 3  2a02:2f0e:eff:ff01::1 (2a02:2f0e:eff:ff01::1)  17.230 ms  16.450 ms  16.385 ms
 4  2a02:2f00:8708:2:2:0:6:0 (2a02:2f00:8708:2:2:0:6:0)  11.016 ms  11.879 ms  12.977 ms
 5  rcsrds-ic-323846-ffm-b4.c.telia.net (2001:2000:3080:131e::2)  35.586 ms  40.588 ms  35.500 ms
 6  * * *
 7  kr-aah15.x-win.dfn.de (2001:638:c:a100::2)  33.511 ms  34.700 ms  34.794 ms
 8  fw-xwin-2-vlan106.noc.rwth-aachen.de (2a00:8a60:0:f000::12)  32.409 ms  32.387 ms  33.379 ms
 9  n7k-lssnord-1-vl158.noc.rwth-aachen.de (2a00:8a60:0:f001::4)  35.245 ms  37.077 ms  35.676 ms
10  n7k-sw23-2-et2-1.noc.rwth-aachen.de (2a00:8a60:0:f025::2)  38.246 ms  36.529 ms  38.081 ms
11  * * *
12  * * *
13  * * *
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *
[..]

From a different location that uses a Hurricane Electric tunnel, the mirror is indeed reachable:

skycaves ~ # telnet 2a00:8a60:e012:a00::21 80
Trying 2a00:8a60:e012:a00::21...
Connected to 2a00:8a60:e012:a00::21.
Escape character is '^]'.
^]
telnet> quit
Connection closed.

skycaves ~ # traceroute6 2a00:8a60:e012:a00::21
traceroute to 2a00:8a60:e012:a00::21 (2a00:8a60:e012:a00::21), 30 hops max, 80 byte packets
 1  petrerodan-1.tunnel.tserv5.lon1.ipv6.he.net (2001:470:1f08:34a::1)  5.089 ms  8.191 ms  11.104 ms
 2  10ge3-16.core1.lon2.he.net (2001:470:0:67::1)  11.021 ms  10.993 ms  10.970 ms
 3  100ge7-1.core1.fra1.he.net (2001:470:0:37::2)  23.022 ms  22.999 ms  22.977 ms
 4  * * *
 5  kr-aah15.x-win.dfn.de (2001:638:c:a100::2)  27.421 ms  27.082 ms  27.927 ms
 6  fw-xwin-1-vlan106.noc.rwth-aachen.de (2a00:8a60:0:f000::11)  27.031 ms  17.638 ms  17.575 ms
 7  n7k-lssnord-1-vl158.noc.rwth-aachen.de (2a00:8a60:0:f001::4)  18.879 ms  18.342 ms  18.162 ms
 8  n7k-sw23-2-et2-1.noc.rwth-aachen.de (2a00:8a60:0:f025::2)  18.485 ms  19.092 ms  18.979 ms
 9  ftp.halifax.rwth-aachen.de (2a00:8a60:e012:a00::21)  17.461 ms  17.776 ms  17.438 ms

It is true that a timeout exists, but for a package like dev-texlive/texlive-fontsextra, where hundreds of files need to be downloaded, it adds up to 60 seconds of timeout * 3 retries * hundreds of files = many hours of nothing. And I swear that the broken IP is always the one that ends up being used first, which I find extremely weird.

>>> Downloading 'http://distfiles.gentoo.org/distfiles/texlive-module-Asana-Math-2017.tar.xz'
--2019-03-30 07:13:11--  http://distfiles.gentoo.org/distfiles/texlive-module-Asana-Math-2017.tar.xz
Resolving distfiles.gentoo.org... 2a00:8a60:e012:a00::21, 2600:3402:200:227::2, 2605:bc80:3010::134, ...
Connecting to distfiles.gentoo.org|2a00:8a60:e012:a00::21|:80... zzZZZzzzZZ ^C
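
To put a rough number on the estimate above, here is a small worked calculation. The 60-second timeout and 3 retries come from the comment itself; the figure of 300 files is only an assumed stand-in for "hundreds", and the result is an upper bound that assumes every attempt for every file waits out the full timeout on the dead address.

```python
def worst_case_wait_seconds(files, timeout_s=60, retries=3):
    """Upper bound on time spent waiting if every attempt hits the dead host."""
    return files * timeout_s * retries


# Assumed example: 300 files stands in for "hundreds of files".
seconds = worst_case_wait_seconds(300)
hours = seconds / 3600
print(f"{seconds} s = {hours:.1f} h")  # 54000 s = 15.0 h
```

Under these assumptions the wait is 15 hours, which is in line with the "many hours" figure.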

I did indeed tweak the local FETCHCOMMAND, but that's not a proper fix.
Comment 4 Alec Warner (RETIRED) archtester gentoo-dev Security 2019-03-30 13:35:39 UTC
(In reply to petre rodan from comment #3)
> 2a00:8a60:e012:a00::21 is not working for me, and it looks like I may be
> getting filtered at the very last hop. My local IPv6 address is currently
> 2a02:2f0e:808:4e10:1687:67f5:603f:61c1/64.

OK, well, let's follow up with them to see why you appear to be filtered.

My original test was from HE, but I also found some other native IPv6 hosts, tried those, and they worked; so we should determine why your reachability to this particular mirror is impaired. I'm not sure we should remove it just because one user reports a reachability problem when it's likely that many users are using the mirror just fine.


Comment 5 Alec Warner (RETIRED) archtester gentoo-dev Security 2022-01-24 22:26:05 UTC
We moved distfiles.gentoo.org to a CDN; my hope is that you have better connectivity to the CDN.