132462 – since udev-089, udev-start.sh prevents from booting on athlon xp

Status:	RESOLVED INVALID

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	New packages
Hardware:	All Linux

Importance:	Low critical
Assignee:	Greg Kroah-Hartman (RETIRED)

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	udev-meta

Reported:	2006-05-06 08:33 UTC by Jimmy.Jazz
Modified:	2006-08-30 21:17 UTC
CC List:	2 users

See Also:
Package list:
Runtime testing required:	---

Description Jimmy.Jazz 2006-05-06 08:33:58 UTC

Hello,

I have rarely used such a high severity level in Bugzilla, but if you remerge udev-090 you really will not be able to boot again without a LiveCD or a distribution CD. Sorry if that hurts any of the devs' feelings :)

Indeed, a really nasty bug appears in udev-090 and udev-089. It affects the early boot process, so it is really difficult to diagnose because you cannot stop the kernel messages scrolling past on the display.

What is strange is that it worked until now, that is, until I emerged the same version of the package again!

The problem resides in the udev-start.sh file.

Facts:

Since udev-089, the populate_udev() function calls udevtrigger instead of trigger_events(), which is no longer supported as of udev-090.
But udevtrigger doesn't work on an Athlon XP (at least on my computer). Increasing the 300 loop iterations doesn't help either.

Possible solution:
As a workaround, you can add the "old" trigger_events() function, already present in the udev-089 package, to /lib/rcscripts/addons/udev-start.sh and call that function instead of udevtrigger in populate_udev().
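
For readers who do not have the udev-089 script at hand, here is a minimal sketch of what a trigger_events-style helper does: it walks sysfs and writes "add" to each uevent file so the kernel re-emits the event for devices it already knows about. This is not the original Gentoo function. The name trigger_events_sketch and the optional sysfs-root argument are assumptions added for illustration, so that the function can be tried on a scratch directory.

```shell
# Sketch only, NOT the function shipped in udev-089.
# $1 is a hypothetical override for the sysfs root (defaults to /sys).
trigger_events_sketch() {
    sysfs_root=${1:-/sys}
    for uevent in \
        "$sysfs_root"/class/*/*/uevent \
        "$sysfs_root"/block/*/uevent \
        "$sysfs_root"/block/*/*/uevent
    do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$uevent" ] || continue
        # Writing "add" asks the kernel to send the add event again.
        echo add > "$uevent"
    done
}
```

In populate_udev() the call would then be trigger_events_sketch (with no argument) in place of udevtrigger.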

Hope that helps.

Jj

Comment 1 Matthew Daubenspeck 2006-05-10 05:23:18 UTC

I have the same problem on amd64.

Comment 2 Jimmy.Jazz 2006-06-01 07:20:18 UTC

(In reply to comment #1)
> I have the same problem on amd64.
> 
Sorry for the late reply.

To make my Athlon XP boot again, I modified the udev-start.sh script as follows:

 diff -ruN /lib/rcscripts/addons/udev-start.sh /var/tmp/udev-start.sh
--- /lib/rcscripts/addons/udev-start.sh 2006-04-17 21:30:21.000000000 +0200
+++ /var/tmp/udev-start.sh      2006-06-01 16:09:31.000000000 +0200
@@ -51,7 +51,7 @@
        # populate /dev with devices already found by the kernel
        if [ "$(get_KV)" -gt "$(KV_to_int '2.6.14')" ] ; then
                ebegin "Populating /dev with existing devices through uevents"
-               udevtrigger
+               trigger_events
                eend 0
        else
                ebegin "Populating /dev with existing devices with udevstart"
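
The guard in the hunk above compares the running kernel version with 2.6.14 as an integer. get_KV and KV_to_int are helpers provided by the Gentoo init scripts and their code is not shown in this report, so the following is only a sketch of how such a conversion can work. The encoding (major * 65536 + minor * 256 + micro) and the name kv_to_int_sketch are assumptions.

```shell
# Sketch only: turn "major.minor.micro[-suffix]" into a comparable integer.
# Assumes exactly three numeric components before any "-" suffix.
kv_to_int_sketch() {
    ver=${1%%-*}        # drop suffixes such as -rc5 or -gentoo-r9
    major=${ver%%.*}
    rest=${ver#*.}
    minor=${rest%%.*}
    micro=${rest#*.}
    micro=${micro%%.*}
    echo $(( major * 65536 + minor * 256 + micro ))
}

kv_to_int_sketch 2.6.14    # prints 132622
```

Under this encoding both kernels mentioned in this report (2.6.16 and 2.6.17-rc5) compare greater than 2.6.14, so both take the udevtrigger branch.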

Comment 3 Greg Kroah-Hartman (RETIRED) gentoo-dev 2006-06-01 09:27:01 UTC

That means that some module is being automatically loaded for your 
machine that is causing it to lock up.

Any chance you can modify the trigger_events function to handle all devices
also (take the comment out of the line it says to) and see if it still
locks up there?

If so, can you add some "echos" to that function to narrow it down to what
device is causing the problem?
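
A rough sketch of the kind of instrumentation being asked for: print each device path before its uevent is triggered, so that the last line on the screen names the device being handled when the machine hangs. The function name, the optional sysfs-root argument, and the assumption that "all devices" means the /sys/bus/*/devices/* entries are illustrative, not taken from the real script.

```shell
# Sketch only: a trigger loop that announces each device before poking it.
# $1 is a hypothetical override for the sysfs root (defaults to /sys).
trigger_events_verbose() {
    sysfs_root=${1:-/sys}
    for uevent in \
        "$sysfs_root"/bus/*/devices/*/uevent \
        "$sysfs_root"/class/*/*/uevent \
        "$sysfs_root"/block/*/uevent \
        "$sysfs_root"/block/*/*/uevent
    do
        [ -f "$uevent" ] || continue
        # Echo first: if the write below locks up, this line is the last output.
        echo "triggering ${uevent%/uevent}"
        echo add > "$uevent"
    done
}
```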

Comment 4 Jimmy.Jazz 2006-06-01 12:46:55 UTC

(In reply to comment #3)

> That means that some module is being automatically loaded for your 
> machine that is causing it to lock up.

We should not describe this as a "module initialisation lock-up" :). The boot process doesn't stop in the udev-start.sh script but later, when it tries to access the filesystems. Because nothing was found in /dev, the kernel panics. The problem is certainly with udevtrigger: it does nothing when it is called in the udev-start.sh script. That's why /dev/.udev stays empty.
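
One way to make the "/dev/.udev stays empty" observation visible during boot is a small check right after the trigger call. This is only a sketch: the helper names are made up, and the only thing taken from this report is the /dev/.udev path.

```shell
# Sketch only: report whether the udev database directory has any entries.
dir_is_empty() {
    dir=$1
    # A missing directory counts as empty.
    [ -d "$dir" ] || return 0
    # ls -A also lists dot entries, but not "." and "..".
    [ -z "$(ls -A "$dir" 2>/dev/null)" ]
}

report_udev_state() {
    db_dir=${1:-/dev/.udev}
    if dir_is_empty "$db_dir"; then
        echo "udev database at $db_dir is empty or missing"
        return 1
    fi
    echo "udev database at $db_dir is populated"
}
```

Calling report_udev_state after udevtrigger (and again after trigger_events) would show which of the two actually filled the directory.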

> 
> Any chance you can modify the trigger_events function to handle all devices
> also (take the comment out of the line it says to) and see if it still
> locks up there?

trigger_events works well and populates /dev, so all the modules can be loaded.
What bothers me is that the problem occurs only on the Athlon XP, although, apart from the processor (Athlon XP vs. AMD64), my two machines have the same boot config (root=/dev/ram0 init=/linuxrc real_root=/dev/evms/root doevms2). It will definitely not work with udevtrigger.

> 
> If so, can you add some "echos" to that function to narrow it down to what
> device is causing the problem?
> 

The easiest way to get something readable on the display was to increase the loop size in populate_udev() and to add --verbose to udevtrigger. udevtrigger doesn't print anything and of course doesn't populate /dev.

With the trigger_events script I didn't have that problem.

PS: I would do some more tests on the server, but I had so many problems booting it from a LiveCD. Only with a lot of luck could I access the root filesystem and restore udev-start.sh. I'm not really in a hurry to start again ;)

Thx

Jj

Comment 5 Greg Kroah-Hartman (RETIRED) gentoo-dev 2006-06-01 14:55:03 UTC

Ah, so this might be an evms issue, not a udev one.

You didn't mention that the boot process failed because the root partition
was not found, that's very important.

Are you using the genkernel package?

What happens if you do not use any initramfs/initrd for your kernel?

Comment 6 Jimmy.Jazz 2006-06-05 06:15:47 UTC

(In reply to comment #5)
> Ah, so this might be an evms issue, not a udev one.

I really don't believe this could be an evms issue. The /dev devices are simply not created when udev-start.sh is called. Replacing udevtrigger with trigger_events corrects the issue.

As I mentioned above, both computers have exactly the same config and I encounter the issue only on the 32-bit kernel. The real difference is the kernel version: the amd64 Athlon and the Athlon XP are running a 2.6.16-gentoo-r9 64-bit kernel and a linux-2.6.17-rc5 vanilla 32-bit kernel, respectively.

Both are using acpi, sysfs, evms2, etc.

> 
> You didn't mention that the boot process failed because the root partition
> was not found, that's very important.
> 

Yes, root is on an evms partition, so I need to load the kernel and the evms2 module into memory to access the root partition. Only /boot stays on an ext2 partition. Swap is on its own standalone evms container. I guess that is no longer necessary.

> Are you using the genkernel package?
> 

Yes, that makes more sense than doing all the work by hand ;). Really a great tool. I'm using sys-kernel/genkernel-3.3.11d. Works great.

Here is the command:
genkernel --gensplash --gensplash-res=1024x768 --no-install --no-clean --evms2 --kerneldir=/usr/src/linux all

The genkernel.conf is standard except for CACHE_DIR="/var/genkernel/pkg/%%ARCH%%"

> What happens if you do not use any initramfs/initrd for your kernel?
> 

Of course, it's impossible for the kernel to find the root partition ;)
I cannot get past the first step, when the modules are loaded into memory (/dev/ram).

When the issue occurs the evms root partition is already mounted, so it can't be related to the evms module, since that is already loaded.

Comment 7 Jimmy.Jazz 2006-06-29 10:05:34 UTC

Hello,

I'm not sure what really happened, but after remerging the whole world with gcc 4.1.1 instead of gcc 4.2, the problem vanished. Probably the -O3 flag or some gcc 4.2 optimization corrupted the udevtrigger code. The problem only affected the 32-bit Athlon XP computer.

My flags were:
CFLAGS="-march=athlon-xp -O3 -pipe"
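
If the optimisation level is the suspect, one thing to try is a more conservative setting in /etc/make.conf, followed by a rebuild of udev. This fragment is only a suggestion based on the guess above. Nothing in this report confirms that -O2 alone would have avoided the problem, since the compiler version changed at the same time.

```shell
# /etc/make.conf fragment (assumption: same -march, lower optimisation level)
CFLAGS="-march=athlon-xp -O2 -pipe"
CXXFLAGS="${CFLAGS}"
```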


Jj

Comment 8 Greg Kroah-Hartman (RETIRED) gentoo-dev 2006-08-30 21:17:54 UTC

Then this was a compiler issue.  Closing bug.