Bug 231334 - Gentoo handbook is no valid XML
Summary: Gentoo handbook is no valid XML
Status: RESOLVED INVALID
Alias: None
Product: [OLD] Docs on www.gentoo.org
Classification: Unclassified
Component: Installation Handbook
Hardware: All Linux
Importance: High normal
Assignee: Docs Team
URL: http://www.gentoo.org/doc/en/handbook...
Reported: 2008-07-09 20:15 UTC by Dennis Schridde
Modified: 2008-07-10 11:06 UTC


Description Dennis Schridde 2008-07-09 20:15:39 UTC
The Gentoo handbook is served as an .xml file but is not valid XML.
Lots of closing tags are omitted, for example in head, where link tags are closed neither with </link> nor as <link .../>.
Sidenote: I would have expected to find an XHTML 1.1 compliant document in an .xml file.

This creates a problem when you try to save the file to disk and open it via Konqueror or Firefox. Both will complain about these XML errors.

Reproducible: Always

Steps to Reproduce:
Comment 1 nm (RETIRED) gentoo-dev 2008-07-09 21:26:12 UTC
Nope. Please read the actual rendered source code:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

The files just have a .xml extension -- the original source files that we work with are XML, not XHTML or HTML. They are rendered as HTML 4.01 Transitional later on; it's a server thing.

Not a bug; closing.
Comment 2 Dennis Schridde 2008-07-09 21:35:51 UTC
It seems that both Firefox and Konqueror care very little about what the DOCTYPE says. The logic they seem to follow is probably: If it comes in an .xml file it must be XML. And that does not sound that stupid...
I'd say if you are unable to produce valid XML, you should not name the file .xml, but .html.
Comment 3 Thomas Faucher 2008-07-09 21:42:42 UTC
Just to make it obvious:

The W3C Validator says "This Page Is Valid HTML 4.01 Transitional!" and it also "validates as CSS level 2.1".
http://tinyurl.com/67m65k & http://tinyurl.com/6prxpf

The XML seems to be valid:
$ xmllint --valid --noout doc/en/handbook/handbook-sparc.xml
$
Comment 4 Dennis Schridde 2008-07-09 21:57:27 UTC
I don't know which file you ran xmllint on, but here it reports loads of:
handbook-sparc.xml:15: parser error : Opening and ending tag mismatch: link line 11 and head
</head>
       ^

That is exactly the issue I described: omitted end tags, where link is closed neither with </link> nor as <link .../>.

I agree that the content is valid HTML4, but the filetype is XML and not HTML.
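
The mismatch reported by xmllint can be reproduced in a minimal sketch. The fragment below is made up for illustration (it is not the actual handbook markup), and the check uses Python's standard-library XML parser, which only tests well-formedness, not DTD validity.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment modeled on the error above: an HTML 4.01 style
# <link> with no end tag. Fine as HTML, not well-formed as XML.
HTML_STYLE = '<head><link rel="stylesheet" href="main.css"></head>'

# The same fragment with the element closed XML-style.
XML_STYLE = '<head><link rel="stylesheet" href="main.css"/></head>'


def is_well_formed(markup):
    """Return True if markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(HTML_STYLE))  # the unclosed <link> triggers a tag mismatch
print(is_well_formed(XML_STYLE))
```

An XML parser handed the first fragment fails at </head>, which is the same class of error as the xmllint message quoted above.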
Comment 5 Thomas Faucher 2008-07-09 22:04:54 UTC
(In reply to comment #4)
> I don't know which file you ran xmllint on

A fresh CVS checkout of the XML source file, directly from the Gentoo repository.

My guess is you're trying to use xmllint on the file you got by saving the page from your browser. This is an HTML file generated by the web server from the XML source file, not an XML file, as Josh pointed out.
Comment 6 Dennis Schridde 2008-07-09 22:25:33 UTC
It is exactly the file I was talking about all the time...
It is also the one which makes both Firefox and Konqueror bail.
Comment 7 nm (RETIRED) gentoo-dev 2008-07-10 06:09:01 UTC
(In reply to comment #6)
> It is exactly the file I was talking about all the time...
> It is also the one which makes both Firefox and Konqueror bail.

It's the browser's fault if it's saving it as something stupid, not the valid generated content (and valid XML source).

What you need to do is go to "View --> View Source", then go to "File --> Save As" and rename the thing to whatever.html.

Not Gentoo's problem.
Comment 8 Dennis Schridde 2008-07-10 08:50:56 UTC
Saved via wget, not via browser. Same issue.
Comment 10 Camille Huot (RETIRED) gentoo-dev 2008-07-10 09:58:48 UTC
On the server side, we're right in using the extension we want since the MIME type is correct (text/html).
The issue is client side, since the MIME type disappears when the file is saved on disk. Then, the client which wants to read the file thinks it is an XML file since the extension is .xml.
We can solve this issue server-side by using the Content-Disposition HTTP header (http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.5.1), which allows the server to suggest a filename to the client.
Gorg could then s/.xml$/.html/ the filename so the browser (or wget) saves the file using the right extension (.html).
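
A minimal sketch of that idea, in Python rather than whatever gorg actually uses. The function name `content_disposition` is hypothetical, and the `inline` disposition type is an assumption (RFC 2616 section 19.5.1 itself only describes `attachment`); the sketch also does not handle filenames that contain quotes.

```python
import posixpath
import re


def content_disposition(url_path, content_type):
    """Build a Content-Disposition value that suggests a filename.

    Returns None when the URL path has no filename component.
    """
    name = posixpath.basename(url_path)
    if not name:
        return None
    # Drop parameters such as "; charset=UTF-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        # Dot escaped on purpose: an unescaped "." would match any character.
        name = re.sub(r"\.xml$", ".html", name)
    return 'inline; filename="{}"'.format(name)


print(content_disposition("/doc/en/handbook/handbook-sparc.xml",
                          "text/html; charset=UTF-8"))
```

The rename only happens when the response really is text/html, so an XML response would keep its .xml name.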

My 2 cents.
Comment 11 Xavier Neys (RETIRED) gentoo-dev 2008-07-10 11:06:07 UTC
(In reply to comment #2)
> It seems that both Firefox and Konqueror care very little about what the
> DOCTYPE says. The logic they seem to follow is probably: If it comes in an .xml
> file it must be XML. And that does not sound that stupid...

Actually, it does sound stupid.
What you request is a valid XML file, what you get is content whose type is given by the MIME type header. The name of a URL is irrelevant to the type of served content.

> I'd say if you are unable to produce valid XML, you should not name the file
> .xml, but .html.

It IS XML. Would you expect to get some executable PHP code when hitting a URL that ends with .php? Probably not.

(In reply to comment #4)
> I agree that the content is valid HTML4, but the filetype is XML and not HTML.

Again, what you hit IS an XML file.
Served content can be HTML, XHTML, text,... depending on resource, parameters and server. That's why there is a MIME type in the HTTP header.

(In reply to comment #8)
> Saved via wget, not via browser. Same issue.

See `man wget` and look for -E.
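
For illustration, the behaviour the wget man page describes for -E can be approximated as below. This is a sketch based on the man page wording, not wget's implementation; `adjusted_name` and `HTML_TYPES` are hypothetical names.

```python
import re

# Media types for which the man page says the .html suffix is appended.
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def adjusted_name(filename, content_type):
    """Append .html when the served content is HTML but the name does not say so."""
    # Drop parameters such as "; charset=UTF-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    already_html = re.search(r"\.html?$", filename, re.IGNORECASE)
    if media_type in HTML_TYPES and not already_html:
        return filename + ".html"
    return filename


print(adjusted_name("handbook-sparc.xml", "text/html"))
```

The saved name is derived from the MIME type sent by the server, so the file on disk ends in .html even though the URL ends in .xml.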

(In reply to comment #10)
> On the server side, we're right in using the extension we want since the MIME
> type is correct (text/html).

cam++

> The issue is client side, since the MIME type disappears when the file is saved
> on disk. Then, the client which wants to read the file thinks it is an XML file
> since the extension is .xml.

DOS is back :)

> We can solve this issue server-side by using the Content-Disposition HTTP
> header (http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.5.1),
> which allows the server to suggest a filename to the client.
> Gorg could then s/.xml$/.html/ the filename so the browser (or wget) save the
> file using the right extension (.html).

When the result is actually HTML.
* Please file an improvement bug for gorg. *
I'll take care of it later this year.

Almost gone on holiday, still counting down work days, time available this week expired by answering this bug. Not even home BTW.