Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 279206 - metadata.dtd: make <herd> required attribute
Summary: metadata.dtd: make <herd> required attribute
Status: RESOLVED OBSOLETE
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: New packages (show other bugs)
Hardware: All Linux
: High QA (vote)
Assignee: Alec Warner
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 279145
  Show dependency tree
 
Reported: 2009-07-26 19:11 UTC by Thilo Bangert (RETIRED) (RETIRED)
Modified: 2011-09-23 09:26 UTC (History)
10 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments
require a herd tag patch (metadata.diff,431 bytes, patch)
2010-09-15 07:16 UTC, Torsten Veller (RETIRED)
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Thilo Bangert (RETIRED) (RETIRED) gentoo-dev 2009-07-26 19:11:10 UTC
Hi,

your packages metadata.xml is missing a <herd> tag. Quoting from the Gentoo Developer Handbook, Chapter 4
http://www.gentoo.org/proj/en/devrel/handbook/handbook.xml?part=2&chap=4

<quote>
There must at least be one herd subtag. The contents of this tag must be the name of a herd as specified in the herds.xml file or the "no-herd" herd. It must occur at least once.
</quote>

Thanks.
Comment 1 Peter Volkov (RETIRED) gentoo-dev 2009-07-29 15:18:42 UTC
Fixed for psimedia.

But why don't we enforce this trough metadata.dtd? Like this:

--- /usr/portage/metadata/dtd/metadata.dtd.orig 2009-07-29 19:14:00.000000000 +0400
+++ /usr/portage/metadata/dtd/metadata.dtd      2009-07-29 19:14:08.000000000 +0400
@@ -5,7 +5,7 @@
 <!ATTLIST catmetadata pkgname CDATA "">

 <!-- Metadata for a package -->
-<!ELEMENT pkgmetadata ( (herd|maintainer|longdescription|use|upstream)* )>
+<!ELEMENT pkgmetadata ( herd, (maintainer|longdescription|use|upstream)* )>
 <!ATTLIST pkgmetadata pkgname CDATA "">
Comment 2 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2009-09-09 13:35:40 UTC
Infra isn't responsible for that file. I think neysx/docs are, as it's a copy of the dtds from the website.
Comment 3 Xavier Neys (RETIRED) gentoo-dev 2009-09-09 16:04:18 UTC
Let's just say I edit/commit DTD's that are not related to our doc system as a favour to fellow devs.
Comment 4 Xavier Neys (RETIRED) gentoo-dev 2009-09-09 16:52:32 UTC
1) it would be herd+

2) find . -name metadata.xml -exec xmllint --noout --valid {} \;

** Before : (2 invalid metadata.xml)
./gentoo-x86/dev-java/bytelist/metadata.xml:2: validity error : Validation failed: no DTD found !
<pkgmetadata>
            ^
./gentoo-x86/dev-java/jvyamlb/metadata.xml:2: validity error : Validation failed: no DTD found !
<pkgmetadata>
            ^

**After:

./gentoo-x86/sci-visualization/gfsview/metadata.xml:9: element pkgmetadata: validity error : Element pkgmetadata content does not follow the DTD, expecting (herd+ , (maintainer | longdescription | use | upstream)*), got (maintainer herd )
</pkgmetadata>
              ^
and 199 more errors ...

More than metadata.xml files without any <herd> tag
The reason is that you are forcing herd to appear first. Feel free to fix the specs and about 200 metadata.xml files
Comment 5 Peter Volkov (RETIRED) gentoo-dev 2009-09-26 14:47:47 UTC
Xavier, why do you want us fix "about 200 metadata.xml files" _first_ instead of changing  dtd and having fix appear automagically with next version bump. Really DTD should enforce policy not follow it. In other case we'll never get to the point where all metadata.xml files will follow policy.
Comment 6 Hans de Graaff gentoo-dev 2009-12-07 22:39:53 UTC
I think the issue that Xavier describes is that putting (herd+, (maintainer | longdescription | use | upstream)*) in the DTD has the effect that at least one herd tag is required, but it also has the side-effect of requiring the herd tag to be the first tag within pkgmetadata, and that affects many more metadata.xml files than just the ones without a herd tag. This happens because the comma describes a sequence that must be followed.

Comment 7 Hans de Graaff gentoo-dev 2009-12-07 22:50:27 UTC
The following patch seems to more or less work, although it really is a bit of a workaround given that all the <herd> elements must follow each other. I get 17 invalid metadata.xml files in the current tree with this, most of them without any <herd> tag, and one with a duplicate <herd> tag.

--- metadata.dtd.orig	2009-12-07 22:49:17.000000000 +0100
+++ metadata.dtd	2009-12-07 23:24:25.000000000 +0100
@@ -5,7 +5,8 @@
 <!ATTLIST catmetadata pkgname CDATA "">
 
 <!-- Metadata for a package -->
-<!ELEMENT pkgmetadata ( (herd|maintainer|longdescription|use|upstream)* )>
+<!ENTITY % optionals "(maintainer | longdescription | use | upstream)*">
+<!ELEMENT pkgmetadata (%optionals; , herd+ , %optionals;) >
 <!ATTLIST pkgmetadata pkgname CDATA "">
 
   <!-- One tag for each herd this package is assigned to. -->
Comment 8 Peter Volkov (RETIRED) gentoo-dev 2009-12-15 15:20:34 UTC
(In reply to comment #7)
> -<!ELEMENT pkgmetadata ( (herd|maintainer|longdescription|use|upstream)* )>
> +<!ENTITY % optionals "(maintainer | longdescription | use | upstream)*">
> +<!ELEMENT pkgmetadata (%optionals; , herd+ , %optionals;) >

This fails in case metadata.xml has herd , something, herd:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE pkgmetadata SYSTEM "http://www.gentoo.org/dtd/metadata.dtd">
<pkgmetadata>
    <herd>no-herd</herd>
    <maintainer>
        <email>maintainer-needed@gentoo.org</email>
        <name>Default assignee for orphaned packages</name>
    </maintainer>
    <herd>netmon</herd>
</pkgmetadata>

But, I'll highlight this question in -dev.
Comment 9 Hans de Graaff gentoo-dev 2009-12-15 15:50:40 UTC
(In reply to comment #8)

> This fails in case metadata.xml has herd , something, herd:

Yes, I know. It seems that DTDs just are not equipped to express that we want at least one herd elements, and we don't care about order. When I tested this on the main tree earlier I found only one package with a problem here, and that had a duplicate <herd>no-herd<//herd> element.

My preference would be to still make this change and have better checking but require that all <herd> elements are grouped together.
Comment 10 Hans de Graaff gentoo-dev 2010-01-03 13:25:34 UTC
Jonathan Callen (abdc) came up with a better approach on the mailing list:

<!-- Metadata for a package -->
<!ELEMENT pkgmetadata ((maintainer|longdescription|use|upstream)*,herd,(herd|maintainer|longdescription|use|upstream)*) >
<!ATTLIST pkgmetadata pkgname CDATA "">


This enforces that there is at least one herd element, but its location and the location of the other elements is not important. I've just ran this version against the whole tree and only got one (correct) error:

find . -name metadata.xml -exec xmllint --noout --valid {} \;
./dev-libs/luaexpat/metadata.xml:8: element pkgmetadata: validity error : Element pkgmetadata content does not follow the DTD, expecting ((maintainer | longdescription | use | upstream)* , herd , (herd | maintainer | longdescription | use | upstream)*), got (maintainer )
</pkgmetadata>
Comment 11 Petteri Räty (RETIRED) gentoo-dev 2010-03-07 06:23:01 UTC
(In reply to comment #10)
> 
> 
> This enforces that there is at least one herd element, but its location and the
> location of the other elements is not important. I've just ran this version
> against the whole tree and only got one (correct) error:
> 

neysx: can we get this committed?
Comment 12 Peter Volkov (RETIRED) gentoo-dev 2010-03-09 19:08:35 UTC
Petteri, thread on -dev suggests that we don't need make things complex:
http://archives.gentoo.org/gentoo-dev/msg_808eb0a624a84a659c6d5876e9a0dc5e.xml

And this means that somebody needs to write script to fix the tree first.

The only problem with this approach is that some teams want ordering (to put first item into Assigned to field in bugzilla). But since Bug wranglers do not use this ordering I don't see any reason to keep it and it should be added somehow differently together with automatic bug assignment.
Comment 13 Petteri Räty (RETIRED) gentoo-dev 2010-03-10 12:39:50 UTC
(In reply to comment #12)
> Petteri, thread on -dev suggests that we don't need make things complex:
> http://archives.gentoo.org/gentoo-dev/msg_808eb0a624a84a659c6d5876e9a0dc5e.xml
> 
> And this means that somebody needs to write script to fix the tree first.
> 

In comment 10 Hans says the tree is already in order.
Comment 14 Peter Volkov (RETIRED) gentoo-dev 2010-03-12 06:06:43 UTC
(In reply to comment #13)
> In comment 10 Hans says the tree is already in order.
 
And that comment suggests:

<!ELEMENT pkgmetadata
((maintainer|longdescription|use|upstream)*,herd,(herd|maintainer|longdescription|use|upstream)*)
>

which is not what we want.
Comment 15 Petteri Räty (RETIRED) gentoo-dev 2010-03-12 11:22:57 UTC
(In reply to comment #14)
> (In reply to comment #13)
> 
> which is not what we want.
> 

What do we want then? As far as I see it makes <herd> a required attribute which is the point of this bug. Forced ordering etc is a different issue. One issue per bug. This can be committed without changing the tree. When the tree is in order can we force the order if that's wanted.
Comment 16 Hans de Graaff gentoo-dev 2010-03-14 11:46:37 UTC
+1 on committing this fix.

As for the ordering for bug assignment, I'm not sure that can/needs to be enforced by the XML checks anyway, since you still need the freedom to have a herd or a maintainer listed as the first contact. Putting the elements in a specific place in metadata.xml doesn't help with this other than putting in a restriction that doesn't help.
Comment 17 Petteri Räty (RETIRED) gentoo-dev 2010-07-13 13:29:02 UTC
CCing a bunch of people with access to the file. Maybe we should rationalize the people who maintain this stuff or is there some kind of rationalize to the current set of people?

betelgeuse@corvid /var/cvsroot/gentoo/xml/htdocs/dtd $ grep cvsxsl /etc/group                                                                        cvsxsl:!:3018:klieber,pauldv,rajiv,blackace,neysx,solar,robbat2,nightmorph,dabbott 
Comment 18 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2010-07-13 13:40:45 UTC
(In reply to comment #17)
> cvsxsl:!:3018:klieber,pauldv,rajiv,blackace,neysx,solar,robbat2,nightmorph,dabbott 
Lots of those are retired, we should remove them. The XSL was locked tightly because breaking it breaks multiple Gentoo websites and parts of tree building.
That said, if we can ensure changes won't break stuff, more people DO need to be able to change those files. Propose a list of people to add please.

(And lets make sure ALL of the tree has valid metadata.xml with what the DTD changes will be before the DTD change is committed).


Comment 19 Petteri Räty (RETIRED) gentoo-dev 2010-07-13 14:33:33 UTC
(In reply to comment #18)
> (In reply to comment #17)
> > cvsxsl:!:3018:klieber,pauldv,rajiv,blackace,neysx,solar,robbat2,nightmorph,dabbott 
> Lots of those are retired, we should remove them. 

Opened bug 328111 for the removal.
Comment 20 nm (RETIRED) gentoo-dev 2010-07-13 17:06:47 UTC
Come up with a current patch, and I'll be happy to apply it. Thanks.
Comment 21 Jorge Manuel B. S. Vicetto Gentoo Infrastructure gentoo-dev 2010-07-13 17:35:40 UTC
(In reply to comment #18)
> (In reply to comment #17)
> > cvsxsl:!:3018:klieber,pauldv,rajiv,blackace,neysx,solar,robbat2,nightmorph,dabbott 
> Lots of those are retired, we should remove them. The XSL was locked tightly
> because breaking it breaks multiple Gentoo websites and parts of tree building.
> That said, if we can ensure changes won't break stuff, more people DO need to
> be able to change those files. Propose a list of people to add please.

We'll proceed with the clean-up in the bug Petteri opened. I'll check with infra if I have enough privileges to do it or not.

> (And lets make sure ALL of the tree has valid metadata.xml with what the DTD
> changes will be before the DTD change is committed).

I've gone through the output of the retirement script and fixed all metadata.xml files missing a herd tag or without a proper herd - <herd></herd>.
I can run the script again before we're ready to commit the dtd change to ensure no broken file is in the tree.
Comment 22 Matthew Kasa (RETIRED) gentoo-dev 2010-07-13 19:18:44 UTC
(In reply to comment #17)
> CCing a bunch of people with access to the file. Maybe we should rationalize
> the people who maintain this stuff or is there some kind of rationalize to the
> current set of people?

I think the rationale is those people are the ones who know what they're doing when it comes to changing xsl and dtd stuff without breaking things :)
Comment 23 Mark Loeser (RETIRED) gentoo-dev 2010-09-14 18:30:56 UTC
This seems easy enough to fix.  I'll commit it if the correct access is given.

Jorge: Is the tree currently in line with what the DTD is going to be validating?
Comment 24 Torsten Veller (RETIRED) gentoo-dev 2010-09-14 21:51:19 UTC
(In reply to comment #23)
> Jorge: Is the tree currently in line with what the DTD is going to be
> validating?

All metadata.xml files in gentoo remain valid with the change in comment #10
Comment 25 Peter Volkov (RETIRED) gentoo-dev 2010-09-15 05:46:46 UTC
(In reply to comment #24)
> (In reply to comment #23)
> > Jorge: Is the tree currently in line with what the DTD is going to be
> > validating?
> 
> All metadata.xml files in gentoo remain valid with the change in comment #10

But as was discussed on mailing list it is durty hack... It's much more clear to force ordering and add attribute for set order for bug-wranglers.
Comment 26 solar (RETIRED) gentoo-dev 2010-09-15 06:00:54 UTC
Should this go via the council?
Comment 27 Peter Volkov (RETIRED) gentoo-dev 2010-09-15 06:06:33 UTC
(In reply to comment #26)
> Should this go via the council?

If you wish. But since solution that suites everybody exists (but is not implemented yet) I don't see how council may help here.

Comment 28 Petteri Räty (RETIRED) gentoo-dev 2010-09-15 06:16:42 UTC
(In reply to comment #25)
> 
> But as was discussed on mailing list it is durty hack... It's much more clear
> to force ordering and add attribute for set order for bug-wranglers.
> 

It might not be the ideal solution but it's still better than what we currently have.
Comment 29 Peter Volkov (RETIRED) gentoo-dev 2010-09-15 06:42:52 UTC
Don't get me wrong. I'm not against applying solution in #10. I'm just telling that we had different resolution. May be it's good idea to apply it, keep this bug open and add reference on it in comments in dtd.

BTW, do I understand correctly that nobody is responsible for xml/dtd(s)? Looks like this is the real problem here.
Comment 30 nm (RETIRED) gentoo-dev 2010-09-15 07:00:31 UTC
(In reply to comment #20)
> Come up with a current patch, and I'll be happy to apply it. Thanks.
> 

Again. This. ^. Whatever it is. If it's what you ebuild devs want and have all agreed upon, then just gimme a patch and I'll do it.
Comment 31 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2010-09-15 07:12:20 UTC
Is <herd /> valid?
Comment 32 Torsten Veller (RETIRED) gentoo-dev 2010-09-15 07:16:48 UTC
Created attachment 247423 [details, diff]
require a herd tag patch

This is the patch under discussion.
Comment 33 nm (RETIRED) gentoo-dev 2010-09-15 09:07:52 UTC
(In reply to comment #31)
> Is <herd /> valid?
This is XML; it'd be <herd/>, no space. Only HTML uses spaces. That's if the rest of the devs think it's okay to have an empty herd tag in the first place.
Comment 34 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2010-09-15 09:09:10 UTC
I think <herd/> looks better than <herd>no-herd</herd>.
Comment 35 Diego Elio Pettenò (RETIRED) gentoo-dev 2010-09-15 09:16:43 UTC
I'm quite sure that XML allows whitespace after <tagname, so

<herd/>
<herd />
<herd      />
<herd
/>

are all valid as well as

<herd   >foo</herd>
Comment 36 Diego Elio Pettenò (RETIRED) gentoo-dev 2010-09-15 09:19:30 UTC
For reference http://www.w3.org/TR/2008/REC-xml-20081126/#sec-starttags

STag	   ::=   	'<' Name (S Attribute)* S? '>'
ETag	   ::=   	'</' Name S? '>'
EmptyElemTag	   ::=   	'<' Name (S Attribute)* S? '/>'
S	   ::=   	(#x20 | #x9 | #xD | #xA)+

Comment 37 nm (RETIRED) gentoo-dev 2010-09-15 09:28:22 UTC
It's not part of our XML code style. It's unnecessary and just adds more bytes, times many thousands, to our tree. The less cruft shoved into metadata files, the better.
Comment 38 Peter Volkov (RETIRED) gentoo-dev 2010-09-15 09:36:08 UTC
(In reply to comment #34)
> I think <herd/> looks better than <herd>no-herd</herd>.

Last time it was discussed on -dev we've decided to use no-herd. Also this is documented in our policy:

http://devmanual.gentoo.org/ebuild-writing/misc-files/metadata/index.html
"If no herd is suitable, no-herd should be used"

If necessary I can grep archives to find exact discussion but as things stand now we are not allowed to use <herd/>
Comment 39 Diego Elio Pettenò (RETIRED) gentoo-dev 2010-09-15 09:41:14 UTC
Project pages already don't follow "our" (or rather, your) code style. And we're talking about metadata.xml files that are touched by a good part of devs who haven't even ever read your code style atall

And more importantly for compression and de-duplication purposes it is to have _the same format_, not saving one byte. We currently have less than 15K metadata.xml files, of which most will not use the old syntax; adding a single whitespace is not going to make much of a difference. Especially since it's not going to change the disk space used by block size.

But rather than creating false pretense of "only HTML allows that", and start the mirror climbing on that, can we agree on the idea? I agree with Robin that having <herd />, <herd/> or <herd    /> is much better than having no-herd in there, given that it's not really a herd...

But more importantly, why this stuff is being discussed here and not on -dev?
Comment 40 Jorge Manuel B. S. Vicetto Gentoo Infrastructure gentoo-dev 2010-11-02 19:40:06 UTC
Josh,

sorry for pushing this one over you, but as you expressed a willingness to apply a patch, I'm assigning this to you.
Comment 41 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2011-09-23 09:26:58 UTC
Per bug #382929, we did the other way.