Bug 690438 - Allow bugzilla indexing by search engines
Summary: Allow bugzilla indexing by search engines
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Infrastructure
Classification: Unclassified
Component: Bugzilla
Hardware: All
OS: Linux
Importance: Normal normal
Assignee: Bugzilla Admins
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-07-22 12:42 UTC by Alexander Tsoy
Modified: 2019-07-22 20:34 UTC
CC List: 1 user

See Also:
Package list:
Runtime testing required: ---


Description Alexander Tsoy 2019-07-22 12:42:39 UTC
According to [1], Bugzilla should allow requests from bots/spiders. However, all requests via HTTPS are currently disallowed:

$ curl https://bugs.gentoo.org/robots.txt
User-agent: *
Disallow: *
Disallow: /
Disallow: /index.cgi
Disallow: /show_bug.cgi
Disallow: /attachment.cgi
Disallow: /data/duplicates.rdf
Disallow: /data/cached/
Disallow: /query.cgi
Disallow: /enter_bug.cgi
Disallow: /userprefs.cgi
Disallow: /params.cgi
Disallow: /editsettings.cgi
Disallow: /report.cgi
Disallow: /request.cgi
Disallow: /votes.cgi
Disallow: /showdependencygraph.cgi
Disallow: /showdependencytree.cgi
Disallow: /data/webdot/
Disallow: /buglist.cgi
Disallow: /custom_buglist.cgi
Disallow: /show_activity.cgi
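
As a quick sanity check (not part of the original report), the effect of these rules can be reproduced with Python's standard-library robots.txt parser; the user agent and bug URL below are just illustrative:

# Minimal sketch: ask the stdlib parser whether a crawler may fetch a
# bug page under the HTTPS rules quoted above.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://bugs.gentoo.org/robots.txt")
rp.read()  # fetch and parse the live robots.txt

# With "Disallow: /" in effect, this prints False for any user agent.
print(rp.can_fetch("Googlebot", "https://bugs.gentoo.org/show_bug.cgi?id=690438"))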

And here is the http:// version for comparison:

$ curl http://bugs.gentoo.org/robots.txt
User-agent: *
Allow: /
Allow: /index.cgi
Allow: /show_bug.cgi
Allow: /attachment.cgi
Allow: /data/duplicates.rdf
Allow: /data/cached/
Disallow: /query.cgi
Disallow: /enter_bug.cgi
Disallow: /userprefs.cgi
Disallow: /params.cgi
Disallow: /editsettings.cgi
Disallow: /report.cgi
Disallow: /request.cgi
Disallow: /votes.cgi
Disallow: /showdependencygraph.cgi
Disallow: /showdependencytree.cgi
Disallow: /data/webdot/
Disallow: /buglist.cgi
Disallow: /custom_buglist.cgi
Disallow: /show_activity.cgi
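
For a side-by-side view, a small illustrative sketch (it assumes the plain-HTTP endpoint still serves its own robots.txt rather than redirecting to HTTPS) that prints the rules unique to each variant:

# Fetch both robots.txt variants and show which rules differ.
# Note: urlopen follows redirects, so if http:// redirects to https://
# the two rule sets will come out identical.
from urllib.request import urlopen

http_rules = set(urlopen("http://bugs.gentoo.org/robots.txt").read().decode().splitlines())
https_rules = set(urlopen("https://bugs.gentoo.org/robots.txt").read().decode().splitlines())

print("http only: ", sorted(http_rules - https_rules))
print("https only:", sorted(https_rules - http_rules))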

Was this done intentionally due to load on the Bugzilla servers? If so, then [1] should be updated.


[1] https://bugs.gentoo.org/bots.html
Comment 1 Larry the Git Cow gentoo-dev 2019-07-22 20:34:54 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/proj/gentoo-bugzilla.git/commit/?id=a4b5fb2ffbdffd35f6628739cb6e9c766c838ee1

commit a4b5fb2ffbdffd35f6628739cb6e9c766c838ee1
Author:     Robin H. Johnson <robbat2@gentoo.org>
AuthorDate: 2019-07-22 20:28:15 +0000
Commit:     Robin H. Johnson <robbat2@gentoo.org>
CommitDate: 2019-07-22 20:30:38 +0000

    robots.txt: allow more indexing
    
    A long time ago, web crawlers were discouraged from using SSL to crawl
    Bugzilla, because of the CPU load SSL incurred and because the content
    duplicated the results on the non-SSL side, which the bots handled
    badly.
    
    In the SSL-everywhere world, with forced upgrades to SSL and cheaper
    CPU, this was never revisited, which means that Gentoo Bugzilla isn't
    on Google. Re-enable the safer terms, and let the bots come in!
    
    Fixes: https://bugs.gentoo.org/show_bug.cgi?id=690438
    Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>

 robots-ssl.txt | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)