<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "http://bugs.gentoo.org/bugzilla.dtd">

<bugzilla version="2.22.7"
          urlbase="http://bugs.gentoo.org/"
          maintainer="bugzilla@gentoo.org"
>

    <bug>
          <bug_id>98768</bug_id>
          
          <creation_ts>2005-07-12 05:36 0000</creation_ts>
          <short_desc>grub stages broken after upgrade</short_desc>
          <delta_ts>2008-03-30 17:22:31 0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>Gentoo Linux</product>
          <component>Core system</component>
          <version>2005.0</version>
          <rep_platform>x86</rep_platform>
          <op_sys>Linux</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          
          <everconfirmed>1</everconfirmed>
          <reporter>Martin.vGagern@gmx.net</reporter>
          <assigned_to>base-system@gentoo.org</assigned_to>
          <cc>eric.brown@dnbrown.net</cc>
          <cc>jakub@gentoo.org</cc>
          <cc>robmoss@gentoo.org</cc>
          <cc>swegener@gentoo.org</cc>
          <cc>tomfelker@gmail.com</cc>
      
          <long_desc isprivate="0">
            <who>Martin.vGagern@gmx.net</who>
            <bug_when>2005-07-12 05:36:57 0000</bug_when>
            <thetext>Today I lost grub functionality after a recent world update. I guess that grub accessed its stage2 by block address, and that address changed in the update. To prevent this kind of error, I suggest adding /boot/grub to the CONFIG_PROTECT list by default.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>jakub@gentoo.org</who>
            <bug_when>2005-07-12 05:49:48 0000</bug_when>
            <thetext>If you have issues with grub, then post the error messages and other relevant
information and describe your problems in more detail. Grub stages are not
configuration files, which the CONFIG_PROTECT feature is designed for.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>Martin.vGagern@gmx.net</who>
            <bug_when>2005-07-12 06:22:11 0000</bug_when>
            <thetext>OK, error message looked somewhat like this: &quot;GRUB 064 expected 032F&quot;. The
numbers are pure imagination, I don&apos;t remember the exact values, but it looked
very much like this, and I got no boot menu.

The problem is that I can&apos;t embed the appropriate stage 1.5 in my partition.
Because of this, grub can&apos;t use this stage 1.5 at a fixed address to load a
filesystem driver and find its stage 2 using the filesystem. Instead, the stage1
that is embedded in the boot record of this partition is patched with the block
address of the stage2 file. So the stage2 file must not change its physical
position. It seems like installing a new version did indeed change this physical
location. Maybe &quot;cat new_stage2 &gt; /boot/grub/stage2&quot; or something similar might
actually work.

Also note that some information, like the location of the configuration file or
the saved default boot menu entry, is stored in this stage2, so it makes sense to
treat it as a configuration file, which should not automatically be replaced. It
would be possible to just protect /boot/grub/stage2, if this mechanism supports
protecting single files.
</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>jakub@gentoo.org</who>
            <bug_when>2005-07-12 06:31:55 0000</bug_when>
            <thetext>Which grub version are you using? Post the version and reopen then.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>Martin.vGagern@gmx.net</who>
            <bug_when>2005-07-13 00:14:49 0000</bug_when>
            <thetext>This was an update from grub-0.96-r1 to grub-0.96-r2.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>Martin.vGagern@gmx.net</who>
            <bug_when>2005-07-15 10:23:24 0000</bug_when>
            <thetext>OK, the information about the saved default being stored in stage2 is no longer
correct, as I found out when examining my own occurrence of bug 83287. It seems
that stage2 is not changed after installation any more.

Looking at the r2 ebuild, line 161 in fact addresses the issue, but does not
solve it. People won&apos;t notice that the stage2.old is still used and either
delete it or start wondering why re-merging the same package suddenly breaks
things. As far as I can see, there is no way to figure out which stage2 is still
in use if there are already two copies, so the only robust solution would be to
use sequence numbers or something similar to keep an arbitrary number of stage2
files around.

It would probably also be a good idea to issue a warning, to notify the user why
there is more than one stage2 and under which condition the old one may be
deleted. Although I myself usually never see such warnings in the middle of some
&quot;emerge -uD world&quot; output...
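
A minimal sketch of the sequence-number idea, not taken from any ebuild: the
function name preserve_stage2 and the stage2.N naming scheme are assumptions
made for illustration only.

```shell
# Hypothetical helper: move an existing stage2 aside under the next free
# sequence number instead of overwriting a single stage2.old.
preserve_stage2() {
    dir=$1
    # Nothing to preserve on a first install.
    [ -e "$dir/stage2" ] || return 0
    n=1
    # Find the first sequence number that is not taken yet.
    while [ -e "$dir/stage2.$n" ]; do
        n=$((n + 1))
    done
    mv "$dir/stage2" "$dir/stage2.$n"
    # Report the new name so the caller can mention it in a warning.
    echo "$dir/stage2.$n"
}
```

Since mv within one filesystem is a rename, the data blocks of the old file
stay in place, so a stage1 that refers to them by block address would still
find them. Calling preserve_stage2 /boot/grub before copying the new files
would keep every earlier stage2 around.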
</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>swegener@gentoo.org</who>
            <bug_when>2005-07-15 17:03:10 0000</bug_when>
            <thetext>Either keep all old stage2 files in /boot/grub or (as we install the files in
/lib/grub and copy them during postinst to /boot/grub) only copy the files if
they don&apos;t exist, else skip them and provide the user with a grub-copy-files
script that does the copy upon request or advise users to use grub-install,
which also performs the copying.

I prefer the script/grub-install solution, because I think it&apos;s easier to
maintain rather than having /boot/grub filled up with old stage2 files.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>jakub@gentoo.org</who>
            <bug_when>2005-07-16 05:42:41 0000</bug_when>
            <thetext>(In reply to comment #6)
&gt; Either keep all old stage2 files in /boot/grub or (as we install the files in
&gt; /lib/grub and copy them during postinst to /boot/grub) only copy the files if
&gt; they don&apos;t exist, else skip them and provide the user with a grub-copy-files
&gt; script that does the copy upon request or advise users to use grub-install,
&gt; which also performs the copying.

We could advise users to run ebuild config and do the copying there, maybe that
would be a cleaner Gentoo-like solution?
</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>swegener@gentoo.org</who>
            <bug_when>2005-07-16 07:23:34 0000</bug_when>
            <thetext>(In reply to comment #7)
&gt; We could advise users to run ebuild config and do the copying there, maybe that
&gt; would be a cleaner Gentoo-like solution?

Yeah, I thought about it, but I was thinking that config doesn&apos;t match the purpose
of copying the files. Thinking about it more, you&apos;re right; I guess that&apos;s the
cleanest and most Gentoo-like solution. Do the initial copying in postinst
if the grub files don&apos;t exist in /boot/grub yet and have users run ebuild config
to do the copying once they are ready to finish the grub update. This way they
will stay at the same grub version that is already running and nothing can break
from portage doing automatic things.
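
A rough sketch of the copy-only-if-missing step described above, not the actual
ebuild code: the function name copy_missing_grub_files and its two directory
arguments are assumptions for illustration.

```shell
# Hypothetical helper: copy GRUB files into the boot directory only when
# no copy exists there yet; files that are already present are left alone.
copy_missing_grub_files() {
    src=$1
    dest=$2
    for f in "$src"/*; do
        # Skip subdirectories and the unexpanded pattern of an empty dir.
        [ -f "$f" ] || continue
        name=$(basename "$f")
        if [ -e "$dest/$name" ]; then
            echo "skipping $name (already present)"
        else
            cp "$f" "$dest/$name"
            echo "copied $name"
        fi
    done
}
```

A postinst step could call this for the initial install, while an explicit
config step would do the unconditional copy once the user is ready to finish
the update.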
</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>Martin.vGagern@gmx.net</who>
            <bug_when>2005-07-16 11:43:54 0000</bug_when>
            <thetext>I&apos;m for config protection, because running etc-update or dispatch-conf after
updates is common practice, and an informational message to this effect is
printed at the end of a bulk emerge, not after an individual ebuild. I&apos;m also for
a generic approach instead of custom specific scripts. So my order of preference
would be:
1. config protection
2. ebuild config
3. custom script
Keep in mind that not everybody uses grub-install to set up grub in the first place.
</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>swegener@gentoo.org</who>
            <bug_when>2005-07-16 12:11:32 0000</bug_when>
            <thetext>We can&apos;t CONFIG_PROTECT /boot/grub, because the files are installed in /lib/grub
and then copied to /boot/grub. This case isn&apos;t caught by CONFIG_PROTECT. And
directly installing into /boot/grub breaks grub-install and isn&apos;t the way upstream
designed it.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>swegener@gentoo.org</who>
            <bug_when>2005-07-16 12:14:42 0000</bug_when>
            <thetext>While I&apos;m thinking about it, I could copy the files from /lib/grub to /boot/grub
before the actual merge to the live filesystem takes place. This way we could
CONFIG_PROTECT it. But this is ugly in my mind, because /boot isn&apos;t a place
to check for updated configuration files. /boot/grub should be under total user
control IMHO.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>jakub@gentoo.org</who>
            <bug_when>2005-07-18 01:02:40 0000</bug_when>
            <thetext>(In reply to comment #11)
&gt; But this is ugly in my mind, because /boot isn&apos;t a place
&gt; to check for updated configuration files. /boot/grub should be under total user
&gt; control IMHO.

I guess CONFIG_PROTECT in /boot would produce more bugs (think confused users)
than this feature would solve. And yes, it&apos;s really ugly and CONFIG_PROTECT is
not really designed for the kind of data found in /boot/grub. </thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>swegener@gentoo.org</who>
            <bug_when>2005-07-19 09:46:59 0000</bug_when>
            <thetext>*** Bug 99548 has been marked as a duplicate of this bug. ***</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>vapier@gentoo.org</who>
            <bug_when>2005-08-19 16:35:13 0000</bug_when>
            <thetext>isn&apos;t this the point of keeping older versions of grub around?  if you&apos;re having
problems with a new grub, downgrade</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>vapier@gentoo.org</who>
            <bug_when>2006-09-07 22:18:29 0000</bug_when>
            <thetext>we&apos;re not going to CONFIG_PROTECT the boot files</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>jakub@gentoo.org</who>
            <bug_when>2007-01-06 09:11:34 0000</bug_when>
            <thetext>*** Bug 160365 has been marked as a duplicate of this bug. ***</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>tomfelker@gmail.com</who>
            <bug_when>2007-01-06 10:07:45 0000</bug_when>
            <thetext>(In reply to comment #16)
&gt; *** Bug 160365 has been marked as a duplicate of this bug. ***

Um, I REALLY think some effort should be made to fix this.  The current behavior is that, with not so much as an ewarn, the third time you emerge grub (but not the first two!), your system will, at some random time in the future, fail to boot.  Not fixing such a bug is insane!  I mean, I&apos;m getting used to seemingly innocuous upgrades breaking stuff, but with this kind of bug most users won&apos;t have a clue what&apos;s wrong, and some won&apos;t be able to fix it.  It&apos;s far preferable to simply not touch /boot and instruct the user to install grub manually.  (Who ever wants their grub updated, anyway?)

I&apos;m attaching an (untested) patch that renames the stage2 file in a way that doesn&apos;t delete files necessary for the system to boot.  This will still cause breakage if the ABI between stage2 and later stages changes.  We could either make it more automatic and make a config file telling the ebuild how to install stage1, or less automatic and leave /boot untouched, but the current design is badly broken.
</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>tomfelker@gmail.com</who>
            <bug_when>2007-01-06 10:09:29 0000</bug_when>
            <thetext>Created an attachment (id=105611)
doesn&apos;t ever delete a stage2, warns the user (untested)

</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>jakub@gentoo.org</who>
            <bug_when>2007-02-15 07:26:43 0000</bug_when>
            <thetext>*** Bug 160365 has been marked as a duplicate of this bug. ***</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>Martin.vGagern@gmx.net</who>
            <bug_when>2008-03-30 10:31:44 0000</bug_when>
            <thetext>Ouch! This bit me again today, as I wanted to boot after yesterday&apos;s world update. Why exactly is this bug marked WONTFIX, instead of the patch from comment #18 or some similar solution being applied? I think a non-booting system without even a warning is severe enough to warrant some effort in order to avoid it.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>robbat2@gentoo.org</who>
            <bug_when>2008-03-30 17:20:35 0000</bug_when>
            <thetext>Reopening for inclusion.</thetext>
          </long_desc>
          <long_desc isprivate="0">
            <who>robbat2@gentoo.org</who>
            <bug_when>2008-03-30 17:22:31 0000</bug_when>
            <thetext>Ok, a variation on this patch is included now, as it&apos;s important for the -r5 upgrade.</thetext>
          </long_desc>
      
          <attachment
              isobsolete="0"
              ispatch="1"
              isprivate="0"
          >
            <attachid>105611</attachid>
            <date>2007-01-06 10:09 0000</date>
            <desc>doesn&apos;t ever delete a stage2, warns the user (untested)</desc>
            <filename>grub-dont-delete-stage2.diff</filename>
            <type>text/plain</type>
            <data encoding="base64">LS0tIC91c3IvcG9ydGFnZS9zeXMtYm9vdC9ncnViL2dydWItMC45Ny1yMy5lYnVpbGQJMjAwNi0x
Mi0wMiAxODowNjo0Mi4wMDAwMDAwMDAgLTA2MDAKKysrIGdydWItMC45Ny1yMy5lYnVpbGQJMjAw
Ny0wMS0wNiAwNDowNToyNy4wMDAwMDAwMDAgLTA2MDAKQEAgLTEyOSw3ICsxMjksMTcgQEAKIAkJ
bG4gLXNuZiBncnViLmNvbmYgIiR7ZGlyfSIvZ3J1Yi9tZW51LmxzdAogCWZpCiAKLQlbWyAtZSAk
e2Rpcn0vZ3J1Yi9zdGFnZTIgXV0gJiYgbXYgIiR7ZGlyfSIvZ3J1Yi9zdGFnZTJ7LC5vbGR9CisJ
aWYgW1sgLWUgJHtkaXJ9L2dydWIvc3RhZ2UyIF1dOyB0aGVuCisJCW12ICIke2Rpcn0iL2dydWIv
c3RhZ2UyICQobWt0ZW1wIFwKKwkJCSIke2Rpcn0iL2dydWIvc3RhZ2UyLSQoZGF0ZSAtLXJmYy0z
MzM5PWRhdGUpLlhYWFhYWCkKKwkJZXdhcm4KKwkJZXdhcm4gIioqKiBJTVBPUlRBTlQgTk9URTog
eW91IHdpbGwgbmVlZCB0byBydW4gZ3J1YiB0byBpbnN0YWxsIgorCQlld2FybiAiIHRoZSBuZXcg
dmVyc2lvbidzIHN0YWdlMSB0byB5b3VyIE1CUi4gIFVudGlsIHlvdSBkbywiCisJCWV3YXJuICIg
c3RhZ2UxIGFuZCBzdGFnZTIgd2lsbCBzdGlsbCBiZSB0aGUgb2xkIHZlcnNpb24sIGJ1dCIKKwkJ
ZXdhcm4gIiBsYXRlciBzdGFnZXMgd2lsbCBiZSB0aGUgbmV3IHZlcnNpb24sIHdoaWNoIGNvdWxk
IgorCQlld2FybiAiIGNhdXNlIHByb2JsZW1zLiIKKwkJZXdhcm4KKwlmaQogCiAJZWluZm8gIkNv
cHlpbmcgZmlsZXMgZnJvbSAvbGliL2dydWIgYW5kIC91c3IvbGliL2dydWIgdG8gIiR7ZGlyfSIi
CiAJZm9yIHggaW4gL2xpYiovZ3J1Yi8qLyogL3Vzci9saWIqL2dydWIvKi8qIDsgZG8K
</data>        

          </attachment>
    </bug>

</bugzilla>