Bug 424385

Summary: app-portage/euscan should stop scanning when blocked by robots.txt
Product: Gentoo Linux
Reporter: Justin Lecher (RETIRED) <jlec>
Component: Current packages
Assignee: Corentin Chary (RETIRED) <iksaif>
Status: RESOLVED INVALID    
Severity: normal    
Priority: Normal    
Version: unspecified   
Hardware: All   
OS: Linux   

Description Justin Lecher (RETIRED) gentoo-dev 2012-07-01 12:59:21 UTC
 * Url 'http://www.kdau.com/files' blocked by robots.txt
 * Generating version from 1.2.0
 * Brute forcing: http://www.kdau.com/files/gelemental-${PV}.tar.bz2
 * Url 'http://www.kdau.com/files/gelemental-1.2.1.tar.bz2' blocked by robots.txt
 * Url 'http://www.kdau.com/files/gelemental-1.2.2.tar.bz2' blocked by robots.txt
 * Url 'http://www.kdau.com/files/gelemental-1.2.3.tar.bz2' blocked by robots.txt
 * Url 'http://www.kdau.com/files/gelemental-1.3.0.tar.bz2' blocked by robots.txt
 * Url 'http://www.kdau.com/files/gelemental-1.4.0.tar.bz2' blocked by robots.txt
 * Url 'http://www.kdau.com/files/gelemental-1.5.0.tar.bz2' blocked by robots.txt
 * Url 'http://www.kdau.com/files/gelemental-2.0.0.tar.bz2' blocked by robots.txt
 * Url 'http://www.kdau.com/files/gelemental-3.0.0.tar.bz2' blocked by robots.txt
 * Url 'http://www.kdau.com/files/gelemental-4.0.0.tar.bz2' blocked by robots.txt

Once the base URL is blocked, euscan can skip the remaining URLs, because they will be blocked too.

Comment 1 Corentin Chary (RETIRED) gentoo-dev 2012-07-02 11:53:17 UTC
Not always: a "Disallow:" rule can apply to just one particular URL.

Anyway, printing these lines is almost free: robots.txt is fetched only once, and each URL is checked against it before any network request is made. The only drawback is the noise in the log.
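
For illustration, here is a minimal sketch of the behaviour described in this comment, written against Python's standard urllib.robotparser module. It is not euscan's actual code. The robots.txt content, the helper names load_rules and allowed_urls, and the shortened version list are all assumptions made up for the example; the real robots.txt of www.kdau.com is not shown in this report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, NOT the real file served by www.kdau.com.
# One specific tarball is allowed, everything else under /files is blocked.
ROBOTS_TXT = """\
User-agent: *
Allow: /files/gelemental-1.2.1.tar.bz2
Disallow: /files
"""


def load_rules(robots_txt):
    """Parse the rules once; a real scan would download this text once per host."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser, urls, user_agent="*"):
    """Keep only the URLs the rules permit. No network request happens here."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


BASE = "http://www.kdau.com/files"
candidates = [BASE] + [
    f"{BASE}/gelemental-{version}.tar.bz2"
    for version in ("1.2.1", "1.2.2", "1.3.0")
]

rules = load_rules(ROBOTS_TXT)
print(allowed_urls(rules, candidates))
# prints ['http://www.kdau.com/files/gelemental-1.2.1.tar.bz2']
```

With these made-up rules the base URL is blocked while one file under it is still permitted, so stopping at the first blocked URL would miss it. Each check is only a string comparison against rules already in memory, which is why the per-URL test is cheap. Note that urllib.robotparser applies the first matching rule in file order, so the Allow line has to come before the broader Disallow line for this example to behave as shown.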