Today I lost grub functionality after an earlier world update. My guess is that grub located its stage2 by block address, and that address changed during the update. To prevent this kind of error, I suggest adding /boot/grub to the CONFIG_PROTECT list by default.
If you have issues with grub, post the error messages and other relevant information, and describe your problem in more detail. Grub stages are not configuration files, and CONFIG_PROTECT is designed only for configuration files.
OK, the error message looked something like this: "GRUB 064 expected 032F". The numbers are made up, as I don't remember the exact values, but it looked very much like this, and I got no boot menu. The problem is that I can't embed the appropriate stage 1.5 in my partition. Because of this, grub can't use a stage 1.5 at a fixed address to load a filesystem driver and find its stage2 through the filesystem. Instead, the stage1 that is embedded in the boot record of this partition is patched with the block address of the stage2 file, so the stage2 file must not change its physical position. It seems that installing a new version did indeed change this physical location. Maybe "cat new_stage2 > /boot/grub/stage2" or something similar would actually work. Also note that some information, such as the location of the configuration file and the saved default boot menu entry, is stored in this stage2, so it makes sense to treat it as a configuration file that should not be replaced automatically. It would be possible to protect just /boot/grub/stage2, if the mechanism supports protecting single files.
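The difference between rewriting stage2 in place and replacing it can be sketched in shell roughly as follows. This is only an illustration under assumptions: the helper names are hypothetical, and an unchanged inode number does not by itself guarantee that the filesystem reuses the same data blocks, which is what a stage1 patched with a block address would actually need.

```shell
# Hypothetical helpers, for illustration only.

# Print the inode number of a file.
inode_of() {
    ls -i "$1" | awk '{ print $1 }'
}

# Rewrite the existing file in place: the redirection truncates and refills
# the same inode, so the name keeps pointing at the same file object.
# Whether the same disk blocks are reused is up to the filesystem.
overwrite_in_place() {
    cat "$1" > "$2"
}

# Replace via a temporary file and a rename: the name ends up pointing at a
# different inode, and the old file object is dropped.
replace_by_rename() {
    cp "$1" "$2.tmp" && mv "$2.tmp" "$2"
}
```

Comparing `inode_of /boot/grub/stage2` before and after an update would at least show whether the file object was replaced, though it cannot prove that the block address stayed the same.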
Which grub version are you using? Post the version and reopen then.
This was an update from grub-0.96-r1 to grub-0.96-r2.
OK, the information about the saved default being stored in stage2 is no longer correct, as I found out when examining my own occurrence of bug 83287. It seems that stage2 is no longer modified after installation. Looking at the r2 ebuild, line 161 does address the issue, but does not solve it. People won't notice that stage2.old is still in use, and will either delete it or wonder why re-merging the same package suddenly breaks things. As far as I can see, there is no way to figure out which stage2 is still in use once there are already two copies, so the only robust solution would be to use sequence numbers or something similar to keep an arbitrary number of stage2 files around. It would probably also be a good idea to issue a warning that tells the user why there is more than one stage2 and under which conditions the old ones may be deleted. Although I myself usually never see such warnings in the middle of some "emerge -uD world" output...
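The sequence-number idea could look roughly like the sketch below. It is not taken from any ebuild: the function names and the ".old.N" naming scheme are assumptions made for illustration. It relies on the fact that renaming a file within one filesystem leaves its data where it is, so a stage1 that still points at the old blocks keeps working.

```shell
# Hypothetical helper: print the first unused name of the form
# <file>.old.<N>, so that no earlier copy is ever overwritten.
next_backup_name() (
    file=$1
    n=1
    while [ -e "${file}.old.${n}" ]; do
        n=$((n + 1))
    done
    printf '%s\n' "${file}.old.${n}"
)

# Hypothetical helper: move the current file aside under a fresh numbered
# name, warn about it, then install the new file under the original name.
install_keeping_old() (
    new=$1
    target=$2
    if [ -e "$target" ]; then
        backup=$(next_backup_name "$target")
        mv "$target" "$backup" || exit 1
        echo "Kept previous $target as $backup; it may still be in use by the installed stage1." >&2
    fi
    cp "$new" "$target"
)
```

Because every run picks a new number, a third or fourth merge never removes a stage2 that an installed stage1 might still reference; cleaning up is left to the user once grub has been reinstalled.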
Either keep all old stage2 files in /boot/grub or, since we install the files in /lib/grub and copy them to /boot/grub during postinst, copy only the files that don't exist yet and skip the rest. In the latter case we would either provide a grub-copy-files script that does the copy on request, or advise users to use grub-install, which also performs the copying. I prefer the script/grub-install solution, because I think it's easier to maintain than having /boot/grub fill up with old stage2 files.
(In reply to comment #6)
> Either keep all old stage2 files in /boot/grub or (as we install the files in
> /lib/grub and copy them during postinst to /boot/grub) only copy the files if
> they don't exist, else skip them and provide the user with a grub-copy-files
> script that does the copy upon request or advise users to use grub-install,
> which also performs the copying.
We could advise users to run ebuild config and do the copying there, maybe that would be a cleaner Gentoo-like solution?
(In reply to comment #7)
> We could advise users to run ebuild config and do the copying there, maybe that
> would be a cleaner Gentoo-like solution?
Yeah, I thought about it, but at first I felt that copying the files doesn't match the purpose of config. Thinking about it more, you're right: I guess that's the cleanest and most Gentoo-like solution. Do the initial copying in postinst if the grub files don't exist in /boot/grub yet, and have users run ebuild config to do the copying once they are ready to finish the grub update. This way they stay on the grub version that is already running, and nothing can break from portage doing things automatically.
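A minimal sketch of that split, assuming the source and destination directories are passed in as arguments. The function names are hypothetical; in a real ebuild the first would presumably be called from pkg_postinst and the second from pkg_config.

```shell
# Hypothetical postinst-style step: copy only the files that are missing in
# the destination, and leave every existing file untouched.
copy_missing_files() (
    src=$1
    dst=$2
    mkdir -p "$dst" || exit 1
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        if [ -e "$dst/$name" ]; then
            echo "skipping existing $dst/$name"
        else
            cp "$f" "$dst/$name" || exit 1
        fi
    done
)

# Hypothetical config-style step: copy everything, overwriting what is
# there. Meant to run only when the user explicitly asks for it.
copy_all_files() (
    src=$1
    dst=$2
    mkdir -p "$dst" || exit 1
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        cp "$f" "$dst/" || exit 1
    done
)
```

With this split, an automatic update can at most add files to /boot/grub, and overwriting a stage2 that is in use happens only on explicit request, after which the user would still need to reinstall grub so that stage1 points at the new file.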
I'm for config protection, because running etc-update or dispatch-conf after updates is common practice, and an informational message to this effect is printed at the end of a bulk emerge, not after an individual ebuild. I'm also for a generic approach instead of custom, package-specific scripts. So my order of preference would be:
1. config protection
2. ebuild config
3. custom script
Keep in mind that not everybody uses grub-install to set up grub in the first place.
We can't CONFIG_PROTECT /boot/grub, because the files are installed in /lib/grub and then copied to /boot/grub. CONFIG_PROTECT doesn't catch this case. And installing directly into /boot/grub breaks grub-install and isn't the way upstream designed it.
While I'm thinking about it, I could copy the files from /lib/grub to /boot/grub before the actual merge to the live filesystem takes place. This way we could CONFIG_PROTECT it. But this is ugly in my mind, because /boot isn't a place to check for updated configuration files. /boot/grub should be under total user control IMHO.
(In reply to comment #11)
> But this is ugly in my mind, because /boot isn't a place
> to check for updated configuration files. /boot/grub should be under total user
> control IMHO.
I guess CONFIG_PROTECT in /boot would produce more bugs (think confused users) than this feature would solve. And yes, it's really ugly and CONFIG_PROTECT is not really designed for the kind of data found in /boot/grub.
*** Bug 99548 has been marked as a duplicate of this bug. ***
Isn't this the point of keeping older versions of grub around? If you're having problems with a new grub, downgrade.
we're not going to CONFIG_PROTECT the boot files
*** Bug 160365 has been marked as a duplicate of this bug. ***
(In reply to comment #16)
> *** Bug 160365 has been marked as a duplicate of this bug. ***
Um, I REALLY think some effort should be made to fix this. The current behavior is that, with not so much as an ewarn, the third time you emerge grub (but not the first two!), your system will, at some random time in the future, fail to boot. Not fixing such a bug is insane! I mean, I'm getting used to seemingly innocuous upgrades breaking stuff, but with this kind of bug most users won't have a clue what's wrong, and some won't be able to fix it. It's far preferable to simply not touch /boot and instruct the user to install grub manually. (Who ever wants their grub updated, anyway?) I'm attaching an (untested) patch that renames the stage2 file in a way that doesn't delete files necessary for the system to boot. This will still cause breakage if the ABI between stage2 and later stages changes. We could either make it more automatic and add a config file telling the ebuild how to install stage1, or make it less automatic and leave /boot untouched, but the current design is badly broken.
Created attachment 105611: doesn't ever delete a stage2, warns the user (untested)
Ouch! This bit me again today, when I wanted to boot after yesterday's world update. Why exactly is this bug marked WONTFIX, instead of the patch from comment #18 or some similar solution being applied? I think a non-booting system without even a warning is severe enough to warrant some effort to avoid it.
Reopening for inclusion.
Ok, a variation on this patch is included now, as it's important for the -r5 upgrade.