Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 328113 - repoman should complain if metadata.xml doesn't refer to dtd
Summary: repoman should complain if metadata.xml doesn't refer to dtd
Status: RESOLVED FIXED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Repoman (show other bugs)
Hardware: All Linux
: High normal (vote)
Assignee: Portage team
URL:
Whiteboard:
Keywords: InVCS
Depends on:
Blocks: 459934
  Show dependency tree
 
Reported: 2010-07-13 14:55 UTC by Dirkjan Ochtman (RETIRED)
Modified: 2015-08-25 01:08 UTC (History)
1 user (show)

See Also:
Package list:
Runtime testing required: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Dirkjan Ochtman (RETIRED) gentoo-dev 2010-07-13 14:55:10 UTC
There were a bunch of metadata.xml files in the tree without the DTD reference. This in turn makes the validation mostly useless (since it only checks wellformedness).
Comment 1 Sebastian Luther (few) 2010-07-13 16:58:39 UTC
What should repoman check for?

I did a grep for the first two lines in all metadata.xml files. The following lines appear in at least one file as one of the first two lines.

empty line
<!DOCTYPE catmetadata SYSTEM "http://www.gentoo.org/dtd/metadata.dtd">
<!DOCTYPE pkgmetadata SYSTEM "http://www.gentoo.org/dtd/metadata.dtd">
<!DOCTYPE pkgmetadata SYSTEM "http://www.gentoo.org/dtd/metadata.dtd">
<!DOCTYPE pkgmetadata SYSTEM "http://www.gentoo.org/dtd/metadata.dtd">
<?xml version = '1.0' encoding = 'UTF-8'?>
<?xml version="1.0" encoding="UTF-8"?>
<?xml version="1.0" encoding="UTF-8"?>
<?xml version="1.0"encoding="UTF-8"?> 
<?xml version="1.0" encoding="utf-8"?>
<?xml version="1.0"?>
<?xml version='1.0' encoding='UTF-8'?>
<pkgmetadata>

What is valid? In which order? etc.
Comment 2 Fabian Groffen gentoo-dev 2010-07-13 17:05:22 UTC
The first line should be the <?xml one to identify it as an XML file.
Comment 3 Sebastian Luther (few) 2010-07-13 18:14:48 UTC
I have a patch that uses the following two regular expressions to match the first resp. the second line.

metadata_xml_header_re = re.compile("^<\?xml [^>]+>$")
metadata_xml_dtd_re = re.compile("^<!DOCTYPE [^ ]+ SYSTEM \"" + metadata_dtd_uri + "\">$")

Let me know if you want something changed.
Comment 4 Fabian Groffen gentoo-dev 2010-07-13 18:25:51 UTC
you probably want to match something such that you get
- warning for trailing whitespace (just style)
- a '<?xml version="1.0" encoding="UTF-8"?>' header, where you can barf at whitespace, lowercase, or use of single quotes (just style)
- a '<!DOCTYPE pkgmetadata SYSTEM "http://www.gentoo.org/dtd/metadata.dtd">' line
  * verify that if pkgmetadata is in there, there is also a <pkgmetadata> tag, and the metadata.xml file is in a package location (2nd dir?)
  * verify that is catmetadata is in there, there is also a <catmetadata> tag, and the metadata.xml file is in a category location (1st dir?)
  * verify that the url points to 'http://www.gentoo.org/dtd/metadata.dtd'

leave the rest to the xml validator

My €0.02 :)
Comment 5 Fabian Groffen gentoo-dev 2010-07-13 18:26:46 UTC
(In reply to comment #4)
> you probably want to match something such that you get
> - warning for trailing whitespace (just style)

I meant leading whitespace before the <?xml thing
Comment 6 Fabian Groffen gentoo-dev 2010-07-13 18:30:39 UTC
And verify that the file is actually UTF-8 encoded.  Would be handy for ChangeLog files too, as some devs still don't have UTF-8 fully sorted out.
Comment 7 dwfreed 2013-02-24 07:25:06 UTC
(In reply to comment #4)
>   * verify that if pkgmetadata is in there, there is also a <pkgmetadata>
> tag, and the metadata.xml file is in a package location (2nd dir?)
>   * verify that is catmetadata is in there, there is also a <catmetadata>
> tag, and the metadata.xml file is in a category location (1st dir?)

These particular checks would have been helpful to prevent bug 458054 from happening.  The other checks mentioned in comment 4 also have merit, as package metadata.xml's are still inconsistent regarding the first three lines.  I used the following command against the tree to check this:

for i in */*/metadata.xml; do cat -vETn $i | head -n 3; done | sort -u | less

Category metadata.xml's are at least consistent here, but checks to ensure that it stays that way are always a good idea.  From a consistency standpoint, I'd suggest requiring that the first three lines match exactly to the following:

For category metadata.xml:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE catmetadata SYSTEM "http://www.gentoo.org/dtd/metadata.dtd">
<catmetadata>

For package metadata.xml:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE pkgmetadata SYSTEM "http://www.gentoo.org/dtd/metadata.dtd">
<pkgmetadata>

If it doesn't match, throw a warning.  Just my thoughts on the subject.
Comment 8 Zac Medico gentoo-dev 2013-02-25 05:09:43 UTC
We already use xml.etree.ElementTree.parse() to parse the metadata.xml, so if we could use that to validate it, then that would be ideal.
Comment 11 Zac Medico gentoo-dev 2013-03-03 19:10:30 UTC
This is fixed in 2.1.11.54 and 2.2.0_alpha165.