Bug 178756 - sys-kernel/genkernel-3.4.8 problems with network boot (nfsroot=)
Summary: sys-kernel/genkernel-3.4.8 problems with network boot (nfsroot=)
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Hosted Projects
Classification: Unclassified
Component: genkernel
Hardware: All Linux
Importance: High normal
Assignee: Gentoo Genkernel Maintainers
URL:
Whiteboard:
Keywords:
Duplicates: 194227 201151
Depends on:
Blocks:
 
Reported: 2007-05-16 12:31 UTC by Stefan Hellermann
Modified: 2008-06-26 17:49 UTC (History)
4 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments
simple patch to let udhcpc no longer fail (fix_wrong_udhcpc_call.patch,460 bytes, patch)
2007-06-02 16:10 UTC, Stefan Hellermann
patch udhcpc to get the root-path from dhcp-server automatically (busybox_udhcp_get_rootpath.patch,800 bytes, patch)
2007-06-02 17:05 UTC, Stefan Hellermann
patch busybox: reintroduce -R as -O (busybox_udhcp_add_-O_option.patch,4.49 KB, patch)
2007-06-02 18:19 UTC, Stefan Hellermann
patch for genkernel to work with last patch (genkernel_udhcpc_-R_is_now_-O.patch,439 bytes, patch)
2007-06-02 18:24 UTC, Stefan Hellermann
patch busybox: reintroduce -R as -O (busybox_udhcp_add_-O_option-v2.patch,4.39 KB, patch)
2007-06-02 19:40 UTC, Stefan Hellermann
Enhancements to genkernel (genkernel.diff,1.57 KB, patch)
2007-07-09 09:42 UTC, Stephan Schenk

Description Stefan Hellermann 2007-05-16 12:31:15 UTC
While trying to use a "Gentoo 2007.0 Minimal i686 CD" as the base for a PXE/NFSroot setup, I got the following error:

udhcpc: invalid option -- R

After that it cannot mount the nfsroot, yet it still tries to switch_root to /newroot, which fails with:

Booting (initramfs)...
switch_root: Bad init '/sbin/init'
kernel panic - not syncing: Attempted to kill init
(I think this is a second unwanted behavior: when "exec switch_root ..." fails, it should somehow return to /linuxrc.)

On the "Gentoo 2007.0 Minimal i686 CD" the e1000 network driver and NFS are built as modules, not compiled in. So passing ip=dhcp or nfsroot=x.x.x.x:/netroot cannot work, because at boot time the kernel has no network access and cannot mount an NFS export.
But genkernel seems to support this case: it first loads the modules, then checks whether ip=??? is present and otherwise uses udhcpc to get an IP address and the nfsroot. A quick search showed that busybox used to support the -R option for requesting custom options from the DHCP server, but current busybox lacks this feature.

I think we either have to patch busybox udhcpc to support the -R option or use another method to get an IP address and the nfsroot. Maybe we should also send requests on all network devices, not only on a hardcoded eth0.

I know I could recompile the kernel with nfs and e1000 compiled in and use nfsroot and ip=dhcp ... but that would not add useful things to genkernel :)

Regards, Stefan Hellermann

Reproducible: Always

Steps to Reproduce:
1. Copy "Gentoo 2007.0 Minimal i686 CD" kernel and initrd on a pxe bootserver, export the cd with NFS
2. Boot the kernel from cd with "root=/dev/ram0 init=/linuxrc initrd=gentoo.igz real_root=/dev/nfs nfsroot=192.168.0.14:/srv/netboot/cd looptype=squashfs loop=image.squashfs cdroot" (it won't work by providing ip=dhcp)

(same happens if I export a stage3 with NFS and use this as nfsroot, then the kernel-cmdline would be: "root=/dev/ram0 init=/linuxrc initrd=gentoo.igz real_root=/dev/nfs nfsroot=192.168.0.14:/srv/netboot/stage3" (again, providing ip=dhcp does not change anything))
Actual Results:  
udhcpc: invalid option -- R

then further

Attempting to mount NFS CD image on 192.168.0.14:/srv/netboot/cd
NFS Mounting failed. Is the path corrent ?

then further

Booting (initramfs)...
switch_root: Bad init '/sbin/init'
kernel panic - not syncing: Attempted to kill init

Expected Results:  
udhcpc gets an ip and a nfsroot (if nfsroot is not given on cmdline)

/linuxrc uses this nfsroot and mounts it to /newroot (or /newroot/mnt/cdrom)

switch_root would start the system on nfs
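
The expected flow above (take nfsroot from the kernel command line if it is present, otherwise use what the DHCP server returned) could be sketched roughly as follows. This is only an illustration, not genkernel's actual code: the function names get_cmdline_nfsroot and choose_nfsroot are made up, and the DHCP root-path is simply passed in as the second argument.

```shell
#!/bin/sh
# Hypothetical helper: print the value of nfsroot= from a kernel command line.
# $1 is the command line text (for example the contents of /proc/cmdline).
# The body runs in a subshell so that "set -f" (no globbing) does not leak out.
get_cmdline_nfsroot() (
    set -f
    for arg in $1; do
        case "$arg" in
            nfsroot=*)
                printf '%s\n' "${arg#nfsroot=}"
                exit 0
                ;;
        esac
    done
    exit 1
)

# Hypothetical helper: prefer nfsroot= from the command line ($1), otherwise
# fall back to the root-path value obtained from the DHCP client ($2).
# Fails if neither source provides a value.
choose_nfsroot() {
    get_cmdline_nfsroot "$1" && return 0
    [ -n "$2" ] || return 1
    printf '%s\n' "$2"
}
```

In an init script this might be called as choose_nfsroot "$(cat /proc/cmdline)" "$DHCP_ROOTPATH", where DHCP_ROOTPATH is again a hypothetical variable name.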
Comment 1 John R. Graham gentoo-dev 2007-06-02 11:52:24 UTC
Stefan,

I don't think this is genkernel's fault, per se (at least not part of it).  genkernel already has a solution built into the init script that it loads onto the initrd to load modules.  It interprets a kernel command line option called "doload" to specify additional modules that need to be loaded early.  You could include

    doload=e1000,nfs

on your kernel command line in grub.conf or lilo.conf.

- John
Comment 2 Stefan Hellermann 2007-06-02 15:42:20 UTC
It already loads the appropriate modules; that's not the problem (I already know about doload=...).
The problem is that the init script calls udhcpc with an option that was only supported by older versions of busybox. Therefore udhcpc prints an error and does not run at all.
The -R option is meant to fetch a root-path, which is sometimes useful but not required, since you can pass nfsroot=... on the command line instead.
But maybe the root-path could be obtained through a udhcpc script: http://udhcp.busybox.net/README.udhcpc says that a variable named rootpath is available in the script.
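
A minimal sketch of that idea, assuming udhcpc exports the rootpath variable to its script as the README describes and passes the event name (for example bound or renew) as the first argument. The function name handle_udhcpc_event and the ROOTPATH_FILE location are hypothetical and not part of genkernel or busybox.

```shell
#!/bin/sh
# Hypothetical udhcpc event handler: when a lease is bound or renewed and the
# DHCP client has set $rootpath, store it in a file for the init script.
# ROOTPATH_FILE is an assumed name; /tmp/dhcp-rootpath is an arbitrary default.
handle_udhcpc_event() {
    case "$1" in
        bound|renew)
            if [ -n "${rootpath:-}" ]; then
                printf '%s\n' "$rootpath" > "${ROOTPATH_FILE:-/tmp/dhcp-rootpath}"
            fi
            ;;
    esac
}
```

A real udhcpc script would end with a call such as handle_udhcpc_event "$1" and would also configure the interface. The variable may well stay empty unless the client actually requests the option from the server.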
Comment 3 Stefan Hellermann 2007-06-02 16:10:15 UTC
Created attachment 120949 [details, diff]
simple patch to let udhcpc no longer fail

It seems that all we have to do is remove "-R rootpath".
According to udhcpc's README [1], rootpath is set automatically in udhcpc.script, but sadly it isn't. Maybe because it tries to get rootpath instead of root-path from the DHCP server? (That would be a busybox bug.)

[1] http://udhcp.busybox.net/README.udhcpc
Comment 4 Stefan Hellermann 2007-06-02 17:05:14 UTC
Created attachment 120955 [details, diff]
patch udhcpc to get the root-path from dhcp-server automatically

With this patch to busybox, udhcpc gets the root-path every time.
Comment 5 Stefan Hellermann 2007-06-02 18:19:34 UTC
Created attachment 120968 [details, diff]
patch busybox: reintroduce -R as -O

Sorry for all the spam! After some searching I found this old patch:
http://busybox.net/lists/udhcp/attachments/20040212/737c60a4/dhcp-patch.obj
Busybox has changed much since then, so I took the idea and wrote a new patch, including a rename of the -R option to -O, as -R is already taken.
(Everything based on sys-kernel/busybox-1.5)

This and a small change in genkernel's initrd.scripts file (udhcpc -R => -O) will bring back the old behavior.
Comment 6 Stefan Hellermann 2007-06-02 18:24:19 UTC
Created attachment 120971 [details, diff]
patch for genkernel to work with last patch

Here's the relevant change in genkernel, -R is now -O
Comment 7 Stefan Hellermann 2007-06-02 19:40:04 UTC
Created attachment 120976 [details, diff]
patch busybox: reintroduce -R as -O

V2 of the patch, now it works as expected ;)

Again this patch is for busybox-1.5! Will genkernel update to busybox-1.5? Here I'm running a somewhat modified genkernel-script with busybox-1.5.
Comment 8 SpanKY gentoo-dev 2007-06-22 04:57:17 UTC
we need to get away from the "lets patch busybox" mentality and to "lets fix busybox upstream and then use the new version"

has this patch been posted to the busybox list ?  i'm no expert (well, i dont know s**t beyond running `dhcpcd`), but why does the dhcp request need OPTION_REQ ?  doesnt the outgoing packet include "rootpath" in it which means it should get a "rootpath" back ?
Comment 9 Stefan Hellermann 2007-06-22 15:22:23 UTC
(In reply to comment #8)
> we need to get away from the "lets patch busybox" mentality and to "lets fix
> busybox upstream and then use the new version"
> 
> has this patch been posted to the busybox list ?  i'm no expert (well, i dont
> know s**t beyond running `dhcpcd`), but why does the dhcp request need
> OPTION_REQ ?  doesnt the outgoing packet include "rootpath" in it which means
> it should get a "rootpath" back ?
> 

I think udhcpc should request rootpath and some more options by default, but it doesn't and it doesn't give you the option to do so. I could post this patch to busybox devel list, of course!
Comment 10 Stephan Schenk 2007-07-09 09:42:26 UTC
Created attachment 124306 [details, diff]
Enhancements to genkernel

Hi,

considering the summary of the bug I would like to add some comments and suggestions. I tried to use genkernel's (3.4.8) initrd to boot a cluster with / on NFS (no Install-CD, real system). I managed to boot everything after a few tweaks I'd like to share.

* work around -R option in udhcp:
This is discussed here in detail (I simply removed -R and used 'nfsroot' command line option)

* set hostname in udhcpc.script
For me this turned out to be necessary for onesis to determine the node class on boot-up. As far as I can see, this does not do harm either --- correct me if I'm wrong. For this to succeed I had to turn on 'hostname' in busybox.

* do not use a predefined set of mount-options
Instead of hardcoding the NFS mount options in initrd.scripts, we could simply use REAL_ROOTFLAGS, possibly with some default values.

* networking will not work once machine is booted
This took me quite a while to figure out since 'ifconfig' reports everything is fine. I simply kill the udhcpc from busybox immediately after configuration of the interface. In the 'boot' runlevel I start dhcpcd on eth0 (not net.eth0 though) and everything works fine.

I'll attach a patch with my changes which will hopefully be useful to you.

Stephan
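
The REAL_ROOTFLAGS suggestion above might look roughly like this. It is only a sketch: the helper name nfs_mount_options and the default option string are illustrative assumptions, not what genkernel's initrd.scripts actually hardcodes.

```shell
#!/bin/sh
# Illustrative defaults only; not necessarily the options genkernel uses.
DEFAULT_NFS_FLAGS='ro,nolock'

# Hypothetical helper: print the NFS mount options, preferring the value of
# REAL_ROOTFLAGS when it is set and non-empty.
nfs_mount_options() {
    if [ -n "${REAL_ROOTFLAGS:-}" ]; then
        printf '%s\n' "$REAL_ROOTFLAGS"
    else
        printf '%s\n' "$DEFAULT_NFS_FLAGS"
    fi
}

# Possible use inside an init script (not executed here):
#   mount -t nfs -o "$(nfs_mount_options)" "$NFSROOT" /newroot
```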
Comment 11 Andrew Gaffney (RETIRED) gentoo-dev 2007-08-13 22:30:41 UTC
+		# kill dhcp server (runs as busybox)
+		busybox killall busybox

Either I'm misunderstanding something, or that's just asking for trouble. What's going on there?
Comment 12 Stephan Schenk 2007-08-14 08:31:30 UTC
(In reply to comment #11)
> +               # kill dhcp server (runs as busybox)
> +               busybox killall busybox
> 
> Either I'm misunderstanding something, or that's just asking for trouble.
> What's going on there?
> 
I know it sounds a little strange. The udhcp client (provided by the busybox binary) sets up the network connection, and the root filesystem can be mounted via NFS. However, later in the boot process the network connection is dead, even though ifconfig reports a valid configuration. Somehow the network configuration does not survive the boot process, so the network has to be reconfigured in the boot runlevel. Once the pivot_root succeeds, the udhcpc of the initrd is no longer accessible, so in the boot runlevel I start a DHCP client again (dhcpcd in my case). Since it is not a good idea to have two DHCP clients on the same interface, I kill the udhcpc from the initrd.

I did a little debugging to figure out which processes are running once the network has been successfully configured by the initrd. It turns out that udhcpc shows up as 'busybox' in 'ps xa', as one might expect from the way busybox works. Since this is the only busybox process running (at least on my 80+ test systems), I simply used killall, which in turn is also provided by busybox. Hence the strange command line 'busybox killall busybox'.
Comment 13 Stephan Schenk 2007-08-14 10:18:04 UTC
(In reply to comment #11)
Silly me. It's so simple. I should have had a look at busybox' udhcp code much earlier. In networking/udhcp/dhcpc.c all available options are listed. Instead of using the obscure killall one could easily supply the '-q' option to udhcpc which will cause it to quit after obtaining the lease (and configuring the interface). Tested it on my machines and it works. 

By the way, it seems reasonable to me to increase the timeout for the DHCP answer since the default '3' is rather short. Increasing it to 60 should be more than enough. Currently, my DHCP call looks like this in findnfsmount():

busybox udhcpc -n -T 60 -q -s /bin/udhcpc.scripts
Comment 14 Stefan Hellermann 2007-08-14 10:54:14 UTC
The patch I provided to fetch the root-path with udhcpc still applies to busybox-1.6.1, the one I'm using in my latest initramfs for many machines here. But I haven't posted this patch to busybox-devel-list so far.

The command "busybox udhcpc -n -T 60 -q -s /bin/udhcpc.scripts -O rootpath" seems reasonable to me (added the -O rootpath). You could drop the busybox prefix, as the initramfs installs all busybox utilities as links to busybox.
Comment 15 Chris Gianelloni (RETIRED) gentoo-dev 2007-11-07 20:47:45 UTC
*** Bug 194227 has been marked as a duplicate of this bug. ***
Comment 16 Andrew Gaffney (RETIRED) gentoo-dev 2007-12-03 21:54:56 UTC
*** Bug 201151 has been marked as a duplicate of this bug. ***
Comment 17 Andrew Gaffney (RETIRED) gentoo-dev 2007-12-03 21:56:30 UTC
I plan on fixing this one properly when I add proper netboot support to genkernel in 3.5. Until then, just build a kernel without an initramfs for nfsroot. You don't need one.
Comment 18 Michael Hordijk 2007-12-03 21:59:46 UTC
(In reply to comment #17)
> I plan on fixing this one properly when I add proper netboot support to
> genkernel in 3.5. Until then, just build a kernel without an initramfs for
> nfsroot. You don't need one.

I'll hack around it until then.  I need an initramfs for a bunch of other things, including doing unionfs on top of a RO NFS root.  Thanks.
Comment 19 Stefan Hellermann 2007-12-03 22:52:11 UTC
I've sent my busybox patch and some ideas upstream.

May I add a feature request for genkernel-3.5?
Before booting from NFS I would like to set up a bridge with eth0 as its only interface, then fetch the IP address with udhcpc on the bridge interface and mount NFS. This makes it possible to use Xen and any Qemu-based emulator/virtualization with bridged interfaces when booting from NFS, since they need the bridge for a direct connection to the network. Adding a bridge later is nearly impossible when eth0 is in use for your rootfs.

Here is the function I'm using; call it before setting up NFS, of course. I rename the real device to peth0 and call the bridge eth0, so it's transparent to every other application. I also use $IFDEV instead of a hardcoded eth0.

setup_bridge() {
        # Check if $IFDEV exists, otherwise return
        ip link list dev "${IFDEV}" > /dev/null 2>&1 || return 1
        # Rename it to p$IFDEV (physical-Ifdev)
        ip link set "${IFDEV}" down
        ip link set "${IFDEV}" name "p${IFDEV}"
        ifconfig "p${IFDEV}" up promisc
        # create a bridge with the name $IFDEV
        brctl addbr "${IFDEV}"
        # this is needed, otherwise it lasts 15s until the bridge is usable
        brctl setfd "${IFDEV}" 0
        # add the physical interface to our new bridge
        brctl addif "${IFDEV}" "p${IFDEV}"
}

It's not perfect but in use for many machines ;)

Cheers
Stefan
Comment 20 Stefan Hellermann 2007-12-10 12:10:07 UTC
My busybox_udhcp_add_-O_option-v2.patch has been accepted upstream with some changes and will be in busybox-1.9, so it is obsolete here. genkernel_udhcpc_-R_is_now_-O.patch is still needed to make use of it.
Comment 21 Andrew Gaffney (RETIRED) gentoo-dev 2008-03-14 16:12:23 UTC
I've fixed this in SVN. The new udhcpc command is:

busybox udhcpc -n -T 15 -q -s /bin/udhcpc.scripts
Comment 22 Stefan Hellermann 2008-03-14 16:45:30 UTC
(In reply to comment #21)
> I've fixed this in SVN. The new udhcpc command is:
> 
> busybox udhcpc -n -T 15 -q -s /bin/udhcpc.scripts
> 

Now I have to put the nfsroot information on the kernel command line. If you use >=busybox-1.9.0 you can append -O rootpath; with this, udhcpc requests the nfsroot from the DHCP server.
But it's no longer important for me, I use my own initramfs now. I can put it onto my homepage if someone is interested.
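
One way to combine both cases would be to append -O rootpath only when the installed busybox is new enough. The sketch below is hypothetical and not part of genkernel: the function names are made up, the version string is passed in by the caller, and only the major and minor numbers are compared.

```shell
#!/bin/sh
# Hypothetical check: succeed if the dotted version in $1 is at least 1.9,
# the release in which the -O option was reported to be available.
busybox_supports_O() {
    bb_major=${1%%.*}
    bb_rest=${1#*.}
    bb_minor=${bb_rest%%.*}
    [ "$bb_major" -gt 1 ] || { [ "$bb_major" -eq 1 ] && [ "$bb_minor" -ge 9 ]; }
}

# Hypothetical helper: print the udhcpc arguments from the comment above,
# adding "-O rootpath" when the given busybox version supports it.
udhcpc_args() {
    bb_args='-n -T 15 -q -s /bin/udhcpc.scripts'
    if busybox_supports_O "$1"; then
        bb_args="$bb_args -O rootpath"
    fi
    printf '%s\n' "$bb_args"
}
```

With busybox-1.7.4 this yields the command exactly as committed, so nfsroot= still has to be given on the kernel command line in that case.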
Comment 23 Andrew Gaffney (RETIRED) gentoo-dev 2008-03-14 17:43:03 UTC
We're only using busybox-1.7.4 due to issues with forward-porting the mdadm patches.
Comment 24 Chris Gianelloni (RETIRED) gentoo-dev 2008-06-26 17:49:40 UTC
OK.  This is resolved in genkernel 3.4.10, which is now in the tree and stable.