The Gentoo handbook is served as an .xml file but is not valid XML. Many closing tags are omitted, for example in the head, where link tags are neither closed with </link> nor written as self-closing <link .../>. Side note: I would have expected to find an XHTML 1.1 compliant document in an .xml file. This causes a problem when you save the file to disk and open it in Konqueror or Firefox: both complain about these XML errors. Reproducible: Always
Nope. Please read the actual rendered source code: <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> The files just have a .xml extension; the original source files that we work with are XML, not XHTML or HTML. They are rendered as HTML 4.01 Transitional later on; it's a server thing. Not a bug; closing.
It seems that both Firefox and Konqueror care very little about what the DOCTYPE says. The logic they seem to follow is probably: If it comes in an .xml file it must be XML. And that does not sound that stupid... I'd say if you are unable to produce valid XML, you should not name the file .xml, but .html.
Just to make it obvious: the W3C Validator says "This Page Is Valid HTML 4.01 Transitional!" and it also "validates as CSS level 2.1". http://tinyurl.com/67m65k & http://tinyurl.com/6prxpf The XML seems to be valid: $ xmllint --valid --noout doc/en/handbook/handbook-sparc.xml $
I don't know which file you ran xmllint on, but here it reports loads of: handbook-sparc.xml:15: parser error : Opening and ending tag mismatch: link line 11 and head </head> ^ That is exactly the issue I described: omitted end tags, i.e. neither </link> nor <link .../>. I agree that the content is valid HTML4, but the filetype is XML and not HTML.
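For illustration, the mismatch reported above can be reproduced with any XML parser. The following is a minimal Python sketch; the two head fragments are made up for this example and are not copied from the handbook. An HTML 4 style <link> without an end tag is legal HTML but not well-formed XML, while the self-closing form parses fine.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments for illustration only (not taken from the handbook).
HTML4_HEAD = '<head><link rel="stylesheet" href="main.css"><title>Handbook</title></head>'
XHTML_HEAD = '<head><link rel="stylesheet" href="main.css"/><title>Handbook</title></head>'


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        # The unclosed <link> is still open when </head> arrives: tag mismatch.
        return False
    return True


print(is_well_formed(HTML4_HEAD))  # False
print(is_well_formed(XHTML_HEAD))  # True
```

This only checks well-formedness, which is the property the browsers complain about when they treat the saved file as XML; it does not validate against any DTD.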
(In reply to comment #4) > I don't know which file you ran xmllint on A fresh CVS checkout of the XML source file, directly from the Gentoo repository. My guess is you're trying to use xmllint on the file you got by saving the page from your browser. This is an HTML file generated by the web server from the XML source file, not an XML file, as Josh pointed out.
It is exactly the file I was talking about all the time... It is also the one which makes both Firefox and Konqueror bail.
(In reply to comment #6) > It is exactly the file I was talking about all the time... > It is also the one which makes both Firefox and Konqueror bail. If the browser saves it under a misleading name, that is the browser's fault; the generated content is valid, and so is the XML source. What you need to do is go to "View --> View Source", then go to "File --> Save As" and rename the thing to whatever.html. Not Gentoo's problem.
Saved via wget, not via browser. Same issue.
Try wget 'http://www.gentoo.org/doc/en/handbook/handbook-sparc.xml?style=printable&full=1&passthru=1'
On the server side, we're right in using the extension we want since the MIME type is correct (text/html). The issue is client side, since the MIME type disappears when the file is saved on disk. Then, the client which wants to read the file thinks it is an XML file since the extension is .xml. We can solve this issue server-side by using the Content-Disposition HTTP header (http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.5.1), which allows the server to suggest a filename to the client. Gorg could then s/\.xml$/.html/ the filename so the browser (or wget) saves the file using the right extension (.html). My 2 cents.
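The Content-Disposition idea above could look roughly like the following. This is a sketch in Python, not gorg's code (which is not shown in this thread): the function name `suggested_disposition` is hypothetical, and using the `inline` disposition type with a `filename` parameter is an assumption, chosen so the page is still displayed rather than offered as a download. How clients treat the suggested name may vary.

```python
import posixpath
import re
from typing import Optional
from urllib.parse import urlsplit


def suggested_disposition(url: str, content_type: str) -> Optional[str]:
    """Return a Content-Disposition value suggesting an .html filename.

    Returns None when the response is not HTML or the URL path does not
    end in .xml, so other resources are left untouched.
    """
    # Strip parameters such as "; charset=UTF-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "text/html":
        return None

    # Take the last path segment, ignoring the query string.
    name = posixpath.basename(urlsplit(url).path)
    if not name.endswith(".xml"):
        return None

    # Same substitution as s/\.xml$/.html/ (assumes no quotes in the name).
    html_name = re.sub(r"\.xml$", ".html", name)
    return f'inline; filename="{html_name}"'


print(suggested_disposition(
    "http://www.gentoo.org/doc/en/handbook/handbook-sparc.xml?style=printable",
    "text/html; charset=UTF-8",
))  # inline; filename="handbook-sparc.html"
```

The check on the media type matters: the rename should only happen when the rendered result is actually HTML, not when the raw XML source is served.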
(In reply to comment #2)
> It seems that both Firefox and Konqueror care very little about what the
> DOCTYPE says. The logic they seem to follow is probably: If it comes in an .xml
> file it must be XML. And that does not sound that stupid...

Actually, it does sound stupid. What you request is a valid XML file; what you get is content whose type is given by the MIME type header. The name of a URL is irrelevant to the type of the served content.

> I'd say if you are unable to produce valid XML, you should not name the file
> .xml, but .html.

It IS XML. Would you expect to get some executable PHP code when hitting a URL that ends with .php? Probably not.

(In reply to comment #4)
> I agree that the content is valid HTML4, but the filetype is XML and not HTML.

Again, what you hit IS an XML file. The served content can be HTML, XHTML, text, ... depending on the resource, parameters and server. That's why there is a MIME type in the HTTP header.

(In reply to comment #8)
> Saved via wget, not via browser. Same issue.

See `man wget` and look for -E.

(In reply to comment #10)
> On the server side, we're right in using the extension we want since the MIME
> type is correct (text/html).

cam++

> The issue is client side, since the MIME type disappears when the file is saved
> on disk. Then, the client which wants to read the file thinks it is an XML file
> since the extension is .xml.

DOS is back :)

> We can solve this issue server-side by using the Content-Disposition HTTP
> header (http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.5.1),
> which allows the server to suggest a filename to the client.
> Gorg could then s/\.xml$/.html/ the filename so the browser (or wget) saves the
> file using the right extension (.html).

When the result is actually HTML.

* Please file an improvement bug for gorg.
* I'll take care of it later this year.

Almost gone on holiday, still counting down work days; the time available this week has been used up answering this bug. Not even home BTW.
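As a client-side counterpart, the wget -E hint can be modelled as follows. This is a rough Python sketch of the behaviour the wget man page describes for -E (append .html when the content is HTML and the name does not already end in .htm or .html); it is not wget's code, the function name `adjusted_name` is hypothetical, and query strings in the local name are ignored here.

```python
import re

# Media types that are treated as HTML for naming purposes.
HTML_TYPES = {"text/html", "application/xhtml+xml"}
# Matches a trailing .htm or .html in any letter case.
HTML_SUFFIX = re.compile(r"\.[Hh][Tt][Mm][Ll]?$")


def adjusted_name(local_name: str, content_type: str) -> str:
    """Pick the on-disk name from the Content-Type, not from the URL."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in HTML_TYPES and not HTML_SUFFIX.search(local_name):
        # The suffix is appended, not substituted.
        return local_name + ".html"
    return local_name


print(adjusted_name("handbook-sparc.xml", "text/html; charset=UTF-8"))
# handbook-sparc.xml.html
```

The point is the same as in the replies above: the type comes from the MIME type in the HTTP header, and the file name is adjusted to match it once that header is no longer available on disk.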