Bug 596372 - net-misc/wget-1.18: mirroring devmanual.gentoo.org gets stuck in an infinite pit of slashes
Status: RESOLVED UPSTREAM
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo's Team for Core System packages
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-10-06 21:27 UTC by Raymond Jennings
Modified: 2016-12-07 08:22 UTC

See Also:
Package list:
Runtime testing required: ---


Attachments
wget log (err, 175.16 KB, text/plain)
2016-10-06 21:27 UTC, Raymond Jennings
emerge --info (einfo.txt, 6.28 KB, text/plain)
2016-10-06 21:37 UTC, Raymond Jennings

Description Raymond Jennings 2016-10-06 21:27:13 UTC
Created attachment 449360: wget log

wget --mirror devmanual.gentoo.org --limit-rate=100k

This results in what looks like infinite recursion.

Shouldn't wget notice that the URLs point to the same page and not recurse into them again?

It could also be caused by a link with a stray extra slash on the devmanual...
Comment 1 Raymond Jennings 2016-10-06 21:37:43 UTC
Created attachment 449362: emerge --info
Comment 2 Mike Gilbert gentoo-dev 2016-10-06 22:11:40 UTC
I fixed a double-slash in the devmanual.

https://gitweb.gentoo.org/proj/devmanual.git/commit/?id=627ca55670862b7bcab101b9f2d30cd6f467e081

Giving this to base-system to address the wget issue.
Comment 3 Raymond Jennings 2016-10-07 09:32:27 UTC
Double slashes aside, I filed this bug not just to get the devmanual fixed, but also because, if wget saves the file to the same location either way, it should have detected that the second fetch was a duplicate.

Anything after the domain name (including port) is a file path and should be treated accordingly.
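
The check requested here can be sketched in a few lines of Python. This is only an illustration of the idea, not wget's actual logic: the helper names `local_save_path` and `filter_new`, the `index.html` default, and the example URLs are all assumptions made for the sketch. It maps each URL to the path it would occupy on disk (collapsing repeated slashes the way a POSIX filesystem does) and skips any URL whose save path has already been seen.

```python
import posixpath
from urllib.parse import urlsplit


def local_save_path(url: str) -> str:
    """Hypothetical URL-to-disk mapping; not how wget computes it."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Assumption: a URL ending in a slash is saved as index.html.
    if path.endswith("/"):
        path += "index.html"
    # normpath collapses repeated slashes, as a POSIX filesystem would.
    return posixpath.normpath(parts.netloc + "/" + path)


def filter_new(urls):
    """Yield only URLs whose save path has not been produced before."""
    seen = set()
    for url in urls:
        target = local_save_path(url)
        if target in seen:
            continue  # would overwrite a file that was already fetched
        seen.add(target)
        yield url


# Illustrative URLs only; they differ solely in slash placement.
urls = [
    "https://devmanual.gentoo.org/general-concepts/",
    "https://devmanual.gentoo.org//general-concepts/",
    "https://devmanual.gentoo.org/general-concepts//",
]
print(list(filter_new(urls)))
```

Deduplicating by save path only guarantees that no file on disk is fetched twice. It says nothing about whether the server actually returns the same content for each variant.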
Comment 4 SpanKY gentoo-dev 2016-12-07 08:22:36 UTC
(In reply to Raymond Jennings from comment #3)

that is simply not true.  the remote side is free to interpret things however it likes, including treating double slashes differently.

here's a live example.  these two URLs do not produce the same page:
https://www.gnu.org/software/autoconf/manual/
https://www.gnu.org/software/autoconf//manual/
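
A small Python sketch shows why the two URLs are distinct at the URL level, independent of what any particular server does with them. The helper `path_segments` is a name made up for this illustration. The standard library's `urlsplit` keeps the doubled slash intact, so the second URL carries an extra, empty path segment that a server is free to treat as meaningful.

```python
from urllib.parse import urlsplit

single = "https://www.gnu.org/software/autoconf/manual/"
double = "https://www.gnu.org/software/autoconf//manual/"


def path_segments(url: str) -> list:
    """Split the URL path into segments, dropping the leading empty one."""
    return urlsplit(url).path.split("/")[1:]


print(path_segments(single))  # ['software', 'autoconf', 'manual', '']
print(path_segments(double))  # ['software', 'autoconf', '', 'manual', '']
print(path_segments(single) == path_segments(double))  # False
```

No network access is involved; the comparison is purely syntactic. Whether the two paths resolve to the same resource is decided by the remote side.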

i don't think there's a bug here.  your output didn't show an infinite loop, it showed fetching of the same set of resources.

of course, when wget operates in mirror mode, it makes assumptions about the behavior of slashes and files on disk.  there isn't a good answer here.  either way, this should go upstream if you want to pursue it.