Summary: | 2.2_rc2 --jobs breaks PMS's invariants | ||
---|---|---|---|
Product: | Portage Development | Reporter: | Łukasz Michalik <lpmichalik> |
Component: | Core | Assignee: | Portage team <dev-portage> |
Status: | RESOLVED WORKSFORME | ||
Severity: | normal | CC: | ciaran.mccreesh, m.debruijne, pms, spb |
Priority: | High | ||
Version: | 2.2 | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- |
Description
Łukasz Michalik
2008-07-26 10:40:06 UTC
(In reply to comment #0)
> PMS 11.4 states that there shall be no variancy between pkg_setup and
> pkg_postinst, however --jobs allows that to happen by performing merges of
> other packages between those phases.

Is it really necessary for the rule to be so strict?

(In reply to comment #1)
> Is it really necessary for the rule to be so strict?

Yes, unfortunately it is. The way around it is to build binary packages in parallel, ensuring that only one pkg_setup is running at a time, and then install them linearly (rerunning pkg_setup as you would for binaries). That way none of the existing behaviours ebuilds expect are changed, and the only things that break are those that already break for binary packages.

Can't you build something like qt and gtk in parallel and not have to worry about it?

There are cases when you can do this and nothing too bad will happen, but unfortunately there is no reliable way to know what those cases are. In the general case, then, one has to assume that you can't.

The algorithm that portage currently uses is to traverse the subgraph of deep dependencies of a given package and build it in parallel only when there are no merges scheduled within that subgraph.

How about if we amend PMS 11.4 to allow for this?

No go. Consider, for example, two plugins for the same program. They aren't interdependent in any way, yet there's still a parallelism constraint upon them. PMS is worded that way because it's the laxest set of rules possible that don't impose changes upon ebuild behaviour under parallelism. Any ebuild that breaks under the rules in PMS will also break when used as a binary package; this isn't the case if you start introducing changes to / between pkg_setup and pkg_preinst.

I think the way portage does it is fine.

Package a reads something in from ROOT in pkg_setup and sticks a modified version back to ROOT in pkg_postinst.
Package b reads the same thing in from ROOT in pkg_setup and sticks a different modified version back in pkg_postinst. Previously this would work, even with binary packages. Now it won't. For examples of 'reads something in', consider things calling 'eselect opengl'.

We haven't had any reports of problems caused by portage's current behavior.

So? People haven't happened to have run something that, on their particular system under whatever load it is under, triggers the race yet. Or if they have, they've rerun the build and seen it mysteriously work, and gone no further. We're talking race conditions here, which means an annoying source of inconsistent, difficult to reproduce, very weird breakages. You don't fix them when someone comes across a consistent, reproducible problem. You design so they can't happen.

Well, unless this is observable in practice then I don't care.

It is. It's observable in practice by very weird, very hard to reproduce bugs.

I believe that the risk is negligible.

Know all those parallel make bugs that keep on biting people? Know how much of a pain in the arse they are? Exactly the same issue, except that the impact of a failure can be a lot worse than a broken build.

I still haven't seen any hard evidence to support your claims.

Comment #8 contains a full description of how there's a race condition. Which part of it don't you understand?

Like I already said, I believe the risk is negligible.

You can believe whatever you want, but that doesn't make it true. Or do you believe that praying that no-one will get hit by the bugs that you know exist is the correct way to design software? In any case, please update the documentation to say "There is a chance that using --jobs will completely break your system beyond any possibility of repair. The Portage authors believe it is a very small chance, and that everyone is lucky, so this probably won't happen."

You can believe whatever you want but that doesn't make it true.
I already explained to you how it's broken. You agree that it's broken, but think that the chances of anyone actually noticing the breakage are very small. So why not document that it's broken but that you think people will probably get away with it? Or better yet, why not just fix the thing? It's a simple change.

Zac, are you serious? Ciaran points out a valid case where Portage can fail. Just because you don't want or don't know how to fix the bug, however small the chance of it triggering, doesn't mean it doesn't exist. Worksforme is an improper resolution.

I contend that any problems can and should be fixed at the ebuild level.

So how does an ebuild specify that its pkg_setup has to be run invariantly with pkg_preinst, and who is going to go through and add that to every ebuild that hasn't been proven to be safe?

The invariance is a natural consequence of dependency handling as mentioned in comment #5. If a relevant variance occurs then it is due to an unspecified dependency. If and when such a case is discovered, an ebuild or one of the ebuilds in the subgraph of its deep dependencies needs to be updated to specify the missing dependency.

Note that problems with invariance can still occur due to unspecified dependencies, even if PMS 11.4 is strictly adhered to. So, adhering to PMS 11.4 doesn't really solve the root problem. It would only serve to hide some cases of variance and not others. In practice, the current approach used by portage (comment #5) has proven to be quite reliable. Given that there may still be variance issues even if PMS 11.4 is strictly adhered to, I see no reason not to continue using the approach that portage currently uses.

We established in comment #6 that what you said in comment #5 is nonsense. And we're not aiming for 'quite reliable' here, we're aiming for 'doesn't break things'. Whilst sticking to what PMS allows doesn't quite guarantee that, not sticking to PMS is a lot worse.
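For readers following the disagreement, the scheduling rule described in comment #5 and the two-plugins objection raised against it can be put side by side in a small sketch. This is not Portage's code: the function names `deep_deps` and `can_start`, the dependency mapping, and the package names are all hypothetical, and the rule is implemented only as it is worded in this thread.

```python
def deep_deps(pkg, deps):
    """Return every package reachable from pkg through the deps mapping."""
    seen = set()
    stack = list(deps.get(pkg, ()))
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(deps.get(cur, ()))
    return seen


def can_start(pkg, deps, pending_merges):
    """Rule as worded in comment #5 (an illustration, not Portage's scheduler):
    start pkg only when no merge is still scheduled inside its
    deep-dependency subgraph."""
    return not (deep_deps(pkg, deps) & set(pending_merges))


# Hypothetical dependency graph: two plugins for the same host program.
deps = {
    "plugin-a": ["host-app"],
    "plugin-b": ["host-app"],
    "host-app": ["libfoo"],
    "libfoo": [],
}

# host-app is still waiting to be merged, so plugin-a has to wait.
print(can_start("plugin-a", deps, {"host-app", "plugin-a"}))

# Only the two plugins remain. Neither is in the other's subgraph,
# so the rule lets both run at the same time.
print(can_start("plugin-a", deps, {"plugin-a", "plugin-b"}))
print(can_start("plugin-b", deps, {"plugin-a", "plugin-b"}))
```

Under these assumptions the check has no way to notice that both plugins might touch the same file belonging to the host program, which is the kind of constraint the dependency graph does not record.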
(In reply to comment #25)
> We established in comment #6 that what you said in comment #5 is nonsense.

What you've established in comment #6 is that some ebuilds may interact with the live file system in a way that is not compatible with the approach mentioned in comment #5. I'd like to see some specific examples of this so that we can make an attempt to fix them. If they can't be fixed for some reason, then we may have to restrict parallelization for those specific ebuilds. It doesn't seem worthwhile to adhere to PMS 11.4 when, as established in comment #24, it's not a cure-all as it only serves as a workaround for some poorly behaved ebuilds and won't help for some other poorly behaved ebuilds.

(In reply to comment #25)
> And
> we're not aiming for 'quite reliable' here, we're aiming for 'doesn't break
> things'. Whilst sticking to what PMS allows doesn't quite guarantee that, not
> sticking to PMS is a lot worse.

In practice the current approach used by portage has been shown to be 'a lot worse'. I appreciate that you're aiming to avoid breakage but I think you've missed the mark and chosen a sub-optimal solution.

(In reply to comment #26)
> In practice the current approach used by portage has been shown to be 'a lot
> worse'. I appreciate that you're aiming to avoid breakage but I think you've
> missed the mark and chosen a sub-optimal solution.

s/has been/has not been/
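The race that this thread attributes to comment #8 is a read-modify-write on shared state under ROOT. A minimal, deterministic sketch of it follows, assuming each package reads a shared file in pkg_setup and writes its modified copy back in pkg_postinst. The file name `plugins.conf`, the function `run`, and the package names `a` and `b` are hypothetical; no real ebuild or Portage code is modelled.

```python
def run(schedule):
    """Replay a list of (package, phase) steps against a fake ROOT."""
    root = {"plugins.conf": []}   # hypothetical stand-in for a file under ROOT
    snapshot = {}                 # what each package read during pkg_setup
    for pkg, phase in schedule:
        if phase == "pkg_setup":
            # Read the shared state and remember it for later.
            snapshot[pkg] = list(root["plugins.conf"])
        elif phase == "pkg_postinst":
            # Write back the remembered copy plus this package's own entry.
            root["plugins.conf"] = snapshot[pkg] + [pkg]
    return root["plugins.conf"]


# One package at a time: nothing changes between a package's two phases.
serial = [
    ("a", "pkg_setup"), ("a", "pkg_postinst"),
    ("b", "pkg_setup"), ("b", "pkg_postinst"),
]

# Interleaved: b's pkg_setup runs before a's pkg_postinst.
interleaved = [
    ("a", "pkg_setup"), ("b", "pkg_setup"),
    ("a", "pkg_postinst"), ("b", "pkg_postinst"),
]

print(run(serial))       # ['a', 'b']
print(run(interleaved))  # ['b']  (a's change is silently overwritten)
```

In the interleaved schedule b writes back a copy that predates a's change, so a's entry is lost without any error being raised. Whether a real system hits that ordering depends on timing, which fits the thread's description of the failures as hard to reproduce.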