Bug 336488 - net-misc/wget: Wget fails to honor robots.txt crawl-delay directive
Status: RESOLVED UPSTREAM
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: High enhancement
Assignee: Gentoo Linux bug wranglers
Reported: 2010-09-08 20:53 UTC by Raymond Jennings
Modified: 2010-09-09 17:05 UTC

Description Raymond Jennings 2010-09-08 20:53:51 UTC
I ran a wget on localhost to test my robots.txt directives, and wget failed to wait as instructed.

I included no command line switches related to waiting.

I checked with #wget on freenode, and they confirmed that waiting as directed is the proper default behavior.

Reproducible: Always

Steps to Reproduce:
1. Run wget against a site whose robots.txt specifies a Crawl-delay directive.

Actual Results:
wget downloads like mad, with no pause between requests.

Expected Results:
wget should wait for the specified crawl delay between requests.
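
To illustrate the expected behavior, here is a minimal sketch in Python using the standard library's urllib.robotparser. It is not wget's code. The robots.txt content, the "Wget" user-agent string, and the helper names crawl_delay_for and fetch_politely are assumptions made up for this example.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, similar in spirit to the one used for the test.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay in seconds that applies to user_agent, or 0.0 if none."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else 0.0


def fetch_politely(urls, robots_txt, user_agent="Wget", fetch=print, sleep=time.sleep):
    """Call fetch(url) for each URL, sleeping for the crawl delay between requests.

    fetch and sleep are injectable so the pacing can be checked without
    real network access or real waiting.
    """
    delay = crawl_delay_for(robots_txt, user_agent)
    for index, url in enumerate(urls):
        # Wait before every request except the first one.
        if index > 0 and delay > 0:
            sleep(delay)
        fetch(url)
```

With the robots.txt above, fetching three URLs would pause twice, for the 10 seconds named in the directive each time. A client with no matching Crawl-delay would not pause at all.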
Comment 1 Raymond Jennings 2010-09-08 21:06:20 UTC
Turns out the upstream chatroom misunderstood me.

It's not yet a feature, but the dev says "it should be".

Downgrading to enhancement.
Comment 2 Raymond Jennings 2010-09-08 21:23:19 UTC
Upstream has been notified here:

https://savannah.gnu.org/bugs/index.php?30999
Comment 3 Jeroen Roovers (RETIRED) gentoo-dev 2010-09-09 17:05:37 UTC
Nothing we can do but wait.