Bug 295194 - repoman should be able to check whether access to $HOMEPAGE is answered with status code 301
Status: CONFIRMED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Repoman
Hardware: All, OS: All
Importance: High enhancement
Assignee: Portage team
Duplicates: 294791
Depends on: 295335
Blocks: 297028
Reported: 2009-11-30 14:42 UTC by Martin Walch
Modified: 2014-05-18 13:08 UTC
Description Martin Walch 2009-11-30 14:42:37 UTC
About 9% of all website links (http and https) in the official Portage tree are answered with status code 301 (Moved Permanently) when requested. This status code is described in RFC 2616 (HTTP/1.1), p. 62:

> The requested resource has been assigned a new permanent URI and any
> future references to this resource SHOULD use one of the returned
> URIs.  Clients with link editing capabilities ought to automatically
> re-link references to the Request-URI to one or more of the new
> references returned by the server, where possible. This response is
> cacheable unless indicated otherwise.
> 
> The new permanent URI SHOULD be given by the Location field in the
> response. Unless the request method was HEAD, the entity of the
> response SHOULD contain a short hypertext note with a hyperlink to
> the new URI(s).

RFC 1945 (HTTP/1.0) and RFC 2068 (the obsolete earlier RFC for HTTP/1.1) say nearly the same.

The exact numbers I measured were 1221 links in 13653 packages, some of them inherited from eclasses (e.g. php-ezc.eclass and emul-linux-x86). In this test, GET requests were sent, which, according to the protocol, should yield the same status code as HEAD requests. However, some servers misbehave by answering HEAD requests with code 200 (OK) while sending code 301 for GET requests.
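
The HEAD/GET discrepancy described above can be checked per URL. The following is only a minimal Python sketch, not the script from bug #294791; the function names `request_status` and `head_get_mismatch` and the 10-second timeout are assumptions made for illustration. Redirects are deliberately not followed, so a 301 shows up as a 301.

```python
import http.client
from urllib.parse import urlsplit


def request_status(url, method, timeout=10):
    """Send one request and return the raw status code (no redirect following)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request(method, path)
        return conn.getresponse().status
    finally:
        conn.close()


def head_get_mismatch(url, request_status=request_status):
    """Return (head_status, get_status) if the two differ, otherwise None."""
    head = request_status(url, "HEAD")
    get = request_status(url, "GET")
    return None if head == get else (head, get)
```

A server that answers HEAD with 200 but GET with 301 would yield `(200, 301)` here, which is why the measurement used GET requests.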

Although some websites answer with 301 where 404 or 410 would be appropriate, samples show that most 301 responses redirect to the correct website. Furthermore, 3xx codes are sometimes mixed up, but 301 is only rarely sent erroneously (for example, instead of 302).

I suggest making repoman check homepage links whenever a package is modified or updated and emit a warning whenever status code 301 is encountered, so the committing developer can manually check the link and, if appropriate, change it.
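
To make the proposal concrete, here is a rough Python sketch of what such a check might look like. It is not existing repoman code: the names `probe` and `check_homepage`, the warning text, and the timeout are all assumptions for illustration. It also assumes that HOMEPAGE may hold several whitespace-separated URLs, and it only reports; the decision to change the link stays with the developer.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so that a 3xx answer surfaces as HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def probe(url, timeout=10):
    """Return (status, Location header or None) for a single GET request."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(urllib.request.Request(url, method="GET"), timeout=timeout) as resp:
            return resp.status, None
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Location")


def check_homepage(homepage, probe=probe):
    """Return one warning string per HOMEPAGE URL that is answered with 301."""
    warnings = []
    for url in homepage.split():
        if not url.startswith(("http://", "https://")):
            continue  # only http/https links are covered by this proposal
        try:
            status, location = probe(url)
        except OSError:
            continue  # unreachable hosts and timeouts are out of scope for this sketch
        if status == 301:
            warnings.append(
                f"HOMEPAGE {url} moved permanently (301) to {location or 'unknown location'}"
            )
    return warnings
```

The timeout matters in practice: a server that never answers would otherwise stall the whole run.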

Furthermore, I suggest a complete check for all packages once or twice a year.

Doing all this is a good thing, because
- RFC 2616 wants it this way (and standards are good, aren't they?)
- it avoids a lot of redirection (the homepage links are not only used by Gentoo users; they often appear in Google's search results)
- it prevents some hyperlinks from dying (once the redirection disappears, finding the new site manually may take more work)
- it is effective, as only few redirections are wrong (if a website has disappeared completely and the redirection leads to some parked domain or similar, you probably want to keep the old link)
- it reduces bug tracker noise from people like me :)

All this can also be done for SRC_URI.

This suggestion evolved from bug #294791, where you can also find the code snippet I used for counting the redirections with status code 301. Note that the server behind homepage.mac.com seems to be broken, which makes the script hang; kill the httping process (SIGKILL) to continue.
Comment 1 Sebastian Pipping gentoo-dev 2010-03-11 23:48:13 UTC
*** Bug 294791 has been marked as a duplicate of this bug. ***