Bug 98768 - grub stages broken after upgrade
Summary: grub stages broken after upgrade
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: x86 Linux
Importance: High normal
Assignee: Gentoo's Team for Core System packages
URL:
Whiteboard:
Keywords:
Duplicates: 99548 160365
Depends on:
Blocks:
 
Reported: 2005-07-12 05:36 UTC by Martin von Gagern
Modified: 2008-03-30 17:22 UTC
CC List: 5 users

See Also:
Package list:
Runtime testing required: ---


Attachments
doesn't ever delete a stage2, warns the user (untested) (grub-dont-delete-stage2.diff, 849 bytes, patch)
2007-01-06 10:09 UTC, Tom Felker

Description Martin von Gagern 2005-07-12 05:36:57 UTC
Today I lost grub functionality because of an earlier world update. I guess that grub accessed its stage2 by block address, and that this address changed in the update. To prevent this kind of error, I suggest adding /boot/grub to the CONFIG_PROTECT list by default.
Comment 1 Jakub Moc (RETIRED) gentoo-dev 2005-07-12 05:49:48 UTC
If you have issues with grub, then post the error messages and other relevant
information and describe your problems in more detail. Grub stages are not
configuration files, which the CONFIG_PROTECT feature is designed for.
Comment 2 Martin von Gagern 2005-07-12 06:22:11 UTC
OK, error message looked somewhat like this: "GRUB 064 expected 032F". The
numbers are pure imagination, I don't remember the exact values, but it looked
very much like this, and I got no boot menu.

The problem is that I can't embed the appropriate stage 1.5 in my partition.
Because of this, grub can't use this stage 1.5 at a fixed address to load a
filesystem driver and find its stage 2 using the filesystem. Instead, the stage1
that is embedded in the boot record of this partition is patched with the block
address of the stage2 file. So the stage2 file must not change its physical
position. It seems like installing a new version did indeed change this physical
location. Maybe "cat new_stage2 > /boot/grub/stage2" or something similar might
actually work.
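
A minimal sketch of the distinction described above, written in Python purely for illustration. Nothing here comes from the grub ebuild, and both function names are hypothetical. Overwriting a file in place keeps the same inode, whereas writing a new file and renaming it over the old one always produces a different inode with freshly allocated data. Keeping the inode does not by itself guarantee that the block addresses stay the same (that depends on the filesystem and on the file size), so this only shows why an in-place write has a chance of working.

```python
import os
import tempfile


def overwrite_in_place(path: str, data: bytes) -> None:
    """Rewrite an existing file's contents without replacing the file itself."""
    # "r+b" opens the existing file without truncating it first.
    with open(path, "r+b") as f:
        f.write(data)
        f.truncate()          # drop any leftover tail of the old contents
        f.flush()
        os.fsync(f.fileno())  # push the data to disk before returning


def replace_by_rename(path: str, data: bytes) -> None:
    """Write a new file next to the old one, then rename it over the old one."""
    tmp = path + ".new"
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        stage2 = os.path.join(d, "stage2")  # stand-in file, not a real stage2
        with open(stage2, "wb") as f:
            f.write(b"old image")
        before = os.stat(stage2).st_ino
        overwrite_in_place(stage2, b"new image")
        print("same inode after in-place write:", os.stat(stage2).st_ino == before)
        replace_by_rename(stage2, b"newer image")
        print("same inode after rename:", os.stat(stage2).st_ino == before)
```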

Also note that some information, like the location of the configuration file or
the saved default boot menu entry, is stored in this stage2, so it makes sense to
treat it as a configuration file, which should not automatically be replaced. It
would be possible to just protect /boot/grub/stage2, if this mechanism supports
protecting single files.
Comment 3 Jakub Moc (RETIRED) gentoo-dev 2005-07-12 06:31:55 UTC
Which grub version are you using? Post the version and reopen then.
Comment 4 Martin von Gagern 2005-07-13 00:14:49 UTC
This was an update from grub-0.96-r1 to grub-0.96-r2.
Comment 5 Martin von Gagern 2005-07-15 10:23:24 UTC
OK, the information about the saved default being stored in stage2 is no longer
correct, as I found out when examining my own occurrence of bug 83287. It seems
that stage2 is not changed after installation any more.

Looking at the r2 ebuild, line 161 in fact addresses the issue, but does not
solve it. People won't notice that the stage2.old is still used and either
delete it or start wondering why re-merging the same package suddenly breaks
things. As far as I can see, there is no way to figure out which stage2 is still
in use if there are already two copies, so the only robust solution would be to
use sequence numbers or something similar to keep an arbitrary number of stage2
files around.
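
The sequence-number idea could be sketched roughly as follows. This is an illustration in Python, not the ebuild's actual logic: the `stage2.old.<N>` naming scheme and both function names are assumptions made up for this example. The current file is renamed aside rather than copied, since a rename leaves the file's data where it is on disk.

```python
import os
import re
import tempfile


def next_backup_name(directory: str, base: str = "stage2") -> str:
    """Return '<base>.old.<N>' with N one past the highest number in use."""
    pattern = re.compile(re.escape(base) + r"\.old\.(\d+)")
    used = []
    for name in os.listdir(directory):
        match = pattern.fullmatch(name)
        if match:
            used.append(int(match.group(1)))
    return f"{base}.old.{max(used, default=-1) + 1}"


def keep_old_copy(directory: str, base: str = "stage2"):
    """Rename the current file aside; return the backup name, or None if absent."""
    current = os.path.join(directory, base)
    if not os.path.exists(current):
        return None
    backup = next_backup_name(directory, base)
    # A rename only changes the directory entry, so an older boot record
    # that points at this file's blocks can still find them.
    os.rename(current, os.path.join(directory, backup))
    return backup


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as boot_grub:
        for _ in range(3):
            with open(os.path.join(boot_grub, "stage2"), "wb") as f:
                f.write(b"stand-in stage2 image")
            print(keep_old_copy(boot_grub))
```

With this scheme no old copy is ever deleted automatically, which is the property asked for here; deciding when an old copy is safe to remove is still left to the user.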

It would probably also be a good idea to issue a warning, to notify the user why
there is more than one stage2 and under which condition the old one may be
deleted. Although I myself usually never see such warnings in the middle of some
"emerge -uD world" output...
Comment 6 Sven Wegener gentoo-dev 2005-07-15 17:03:10 UTC
Either keep all old stage2 files in /boot/grub or (as we install the files in
/lib/grub and copy them during postinst to /boot/grub) only copy the files if
they don't exist, else skip them and provide the user with a grub-copy-files
script that does the copy upon request or advise users to use grub-install,
which also performs the copying.

I prefer the script/grub-install solution, because I think it's easier to
maintain rather than having /boot/grub filled up with old stage2 files.
Comment 7 Jakub Moc (RETIRED) gentoo-dev 2005-07-16 05:42:41 UTC
(In reply to comment #6)
> Either keep all old stage2 files in /boot/grub or (as we install the files in
> /lib/grub and copy them during postinst to /boot/grub) only copy the files if
> they don't exist, else skip them and provide the user with a grub-copy-files
> script that does the copy upon request or advise users to use grub-install,
> which also performs the copying.

We could advise users to run ebuild config and do the copying there, maybe that
would be a cleaner Gentoo-like solution?
Comment 8 Sven Wegener gentoo-dev 2005-07-16 07:23:34 UTC
(In reply to comment #7)
> We could advise users to run ebuild config and do the copying there, maybe that
> would be a cleaner Gentoo-like solution?

Yeah, I thought about it, but I was thinking that config doesn't match the purpose
of copying the files. As I think about it more, you're right; I guess that's the
cleanest and most Gentoo-like solution. Do the initial copying in postinst
if the grub files don't exist in /boot/grub yet and have users run ebuild config
to do the copying once they are ready to finish the grub update. This way they
will stay at the same grub version that is already running and nothing can break
from portage doing automatic things.
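
The "copy only what is missing" step proposed here could look roughly like the sketch below. It is written in Python for illustration only; a real ebuild would do this in its postinst shell function, and the function name `copy_missing` is hypothetical. Files that already exist in the destination are left untouched and reported back, so the caller can tell the user that a manual step is still pending.

```python
import os
import shutil
import tempfile


def copy_missing(src_dir: str, dst_dir: str):
    """Copy regular files that dst_dir lacks; return (copied, skipped) name lists."""
    os.makedirs(dst_dir, exist_ok=True)
    copied, skipped = [], []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        if not os.path.isfile(src):
            continue  # ignore subdirectories and special files
        if os.path.exists(dst):
            skipped.append(name)  # never overwrite a file that may be in use
        else:
            shutil.copy2(src, dst)
            copied.append(name)
    return copied, skipped


if __name__ == "__main__":
    # Temporary directories stand in for /lib/grub and /boot/grub.
    with tempfile.TemporaryDirectory() as lib_grub, tempfile.TemporaryDirectory() as boot_grub:
        for name in ("stage1", "stage2"):
            with open(os.path.join(lib_grub, name), "wb") as f:
                f.write(b"new " + name.encode())
        with open(os.path.join(boot_grub, "stage2"), "wb") as f:
            f.write(b"old stage2")
        copied, skipped = copy_missing(lib_grub, boot_grub)
        print("copied:", copied)
        print("left alone:", skipped)
```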
Comment 9 Martin von Gagern 2005-07-16 11:43:54 UTC
I'm for config protection, because running etc-update or dispatch-conf after
updates is common practice, and an informational message to this effect is
printed at the end of a bulk emerge, not after an individual ebuild. I'm also for
a generic approach instead of custom specific scripts. So my order of preference
would be:
1. config protection
2. ebuild config
3. custom script
Keep in mind that not everybody uses grub-install to set up grub in the first place.
Comment 10 Sven Wegener gentoo-dev 2005-07-16 12:11:32 UTC
We can't CONFIG_PROTECT /boot/grub, because the files are installed in /lib/grub
and then copied to /boot/grub. This case isn't caught by CONFIG_PROTECT. And
directly installing into /boot/grub breaks grub-install and isn't the way upstream
designed it.
Comment 11 Sven Wegener gentoo-dev 2005-07-16 12:14:42 UTC
While I'm thinking about it, I could copy the files from /lib/grub to /boot/grub
before the actual merge to the live filesystem takes place. This way we could
CONFIG_PROTECT it. But this is ugly in my mind, because /boot isn't a place
to check for updated configuration files. /boot/grub should be under total user
control IMHO.
Comment 12 Jakub Moc (RETIRED) gentoo-dev 2005-07-18 01:02:40 UTC
(In reply to comment #11)
> But this is ugly in my mind, because /boot isn't a place
> to check for updated configuration files. /boot/grub should be under total user
> control IMHO.

I guess CONFIG_PROTECT in /boot would produce more bugs (think confused users)
than this feature would solve. And yes, it's really ugly and CONFIG_PROTECT is
not really designed for the kind of data found in /boot/grub. 
Comment 13 Sven Wegener gentoo-dev 2005-07-19 09:46:59 UTC
*** Bug 99548 has been marked as a duplicate of this bug. ***
Comment 14 SpanKY gentoo-dev 2005-08-19 16:35:13 UTC
isn't this the point of keeping older versions of grub around? if you're having
problems with a new grub, downgrade
Comment 15 SpanKY gentoo-dev 2006-09-07 22:18:29 UTC
we're not going to CONFIG_PROTECT the boot files
Comment 16 Jakub Moc (RETIRED) gentoo-dev 2007-01-06 09:11:34 UTC
*** Bug 160365 has been marked as a duplicate of this bug. ***
Comment 17 Tom Felker 2007-01-06 10:07:45 UTC
(In reply to comment #16)
> *** Bug 160365 has been marked as a duplicate of this bug. ***

Um, I REALLY think some effort should be made to fix this.  The current behavior is that, with not so much as an ewarn, the third time you emerge grub (but not the first two!), your system will, at some random time in the future, fail to boot.  Not fixing such a bug is insane!  I mean I'm getting used to seemingly innocuous upgrades breaking stuff, but with this kind of bug most users won't have a clue what's wrong, and some won't be able to fix it.  It's far preferable to simply not touch /boot and instruct the user to install grub manually.  (Who ever wants their grub updated, anyway?)

I'm attaching an (untested) patch that renames the stage2 file in a way that doesn't delete files necessary for the system to boot.  This will still cause breakage if the ABI between stage2 and later stages changes.  We could either make it more automatic and make a config file telling the ebuild how to install stage1, or less automatic and leave /boot untouched, but the current design is badly broken.
Comment 18 Tom Felker 2007-01-06 10:09:29 UTC
Created attachment 105611
doesn't ever delete a stage2, warns the user (untested)
Comment 19 Jakub Moc (RETIRED) gentoo-dev 2007-02-15 07:26:43 UTC
*** Bug 160365 has been marked as a duplicate of this bug. ***
Comment 20 Martin von Gagern 2008-03-30 10:31:44 UTC
Ouch! This bit me again today, as I wanted to boot after yesterday's world update. Why exactly is this bug marked WONTFIX, instead of the patch from comment #18 or some similar solution being applied? I think a non-booting system without even a warning is severe enough to warrant some effort in order to avoid it.
Comment 21 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2008-03-30 17:20:35 UTC
Reopening for inclusion.
Comment 22 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2008-03-30 17:22:31 UTC
Ok, a variation on this patch is included now, as it's important for the -r5 upgrade.