Bug 472516 - =sys-fs/zfs-kmod-0.6.1-r1 - Boot hang at 'Importing ZFS pools'
Summary: =sys-fs/zfs-kmod-0.6.1-r1 - Boot hang at 'Importing ZFS pools'
Status: RESOLVED UPSTREAM
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: AMD64 Linux
Importance: Normal major
Assignee: Richard Yao (RETIRED)
URL:
Whiteboard:
Keywords: Bug, UPSTREAM
Depends on:
Blocks:
 
Reported: 2013-06-06 18:03 UTC by Christer Ekholm
Modified: 2014-06-24 16:45 UTC
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
Patch that might fix the issue (patch,1.36 KB, patch)
2013-06-09 16:06 UTC, Richard Yao (RETIRED)
Patch that might fix the issue (patch,985 bytes, patch)
2013-06-09 16:09 UTC, Richard Yao (RETIRED)

Description Christer Ekholm 2013-06-06 18:03:47 UTC
With zfs-kmod-0.6.1-r1, boot hangs after displaying
"Mounting ZFS filesystems". I have also tried running "zfs mount" in
single-user mode, and that hangs too. Downgrading to 0.6.1 helps.

Since I have one zvol, I suspect it has something to do with
zfs-kmod-0.6.1-fix-zvol-initialization.patch.

 $ LANG=C sudo zfs get all pool0/ibdata
 NAME          PROPERTY              VALUE                  SOURCE
 pool0/ibdata  type                  volume                 -
 pool0/ibdata  creation              Wed Apr 24 23:35 2013  -
 pool0/ibdata  used                  851M                   -
 pool0/ibdata  available             166G                   -
 pool0/ibdata  referenced            813M                   -
 pool0/ibdata  compressratio         1.00x                  -
 pool0/ibdata  reservation           none                   default
 pool0/ibdata  volsize               800M                   local
 pool0/ibdata  volblocksize          4K                     -
 pool0/ibdata  checksum              on                     default
 pool0/ibdata  compression           off                    default
 pool0/ibdata  readonly              off                    default
 pool0/ibdata  copies                1                      default
 pool0/ibdata  refreservation        851M                   local
 pool0/ibdata  primarycache          all                    default
 pool0/ibdata  secondarycache        all                    default
 pool0/ibdata  usedbysnapshots       0                      -
 pool0/ibdata  usedbydataset         813M                   -
 pool0/ibdata  usedbychildren        0                      -
 pool0/ibdata  usedbyrefreservation  37.9M                  -
 pool0/ibdata  logbias               latency                default
 pool0/ibdata  dedup                 off                    inherited from pool0
 pool0/ibdata  mlslabel              none                   default
 pool0/ibdata  sync                  standard               default
 pool0/ibdata  refcompressratio      1.00x                  -
 pool0/ibdata  written               813M                   -
 pool0/ibdata  snapdev               hidden                 default

pool0 has mountpoint=/ but canmount=off. My real / is ext3.

Please ask for any additional information you think is relevant.


Reproducible: Always
Comment 1 Christer Ekholm 2013-06-06 18:20:19 UTC
Oh, the kernel version might be relevant: I'm running a home-built 3.9.4.
Comment 2 Richard Yao (RETIRED) gentoo-dev 2013-06-08 15:55:40 UTC
Code involving zvols should not be able to cause a hang at "Mounting ZFS filesystems". Would you try removing zfs from the boot runlevel, rebooting, running `zpool import -N pool0` and then running `zfs mount -a`?
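The procedure suggested above separates importing the pool from mounting its datasets (`zpool import -N` imports without mounting anything), so a hang can be attributed to one phase or the other. A minimal sketch, assuming the OpenRC init script is named `zfs` and the pool is `pool0`; the helper names and the `DRY_RUN` switch are hypothetical, and by default the helpers only print the commands they would run:

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1 (the default) only print the command,
# with DRY_RUN=0 actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1 (then reboot): keep the zfs init script out of the boot runlevel.
disable_zfs_boot() {
    run rc-update del zfs boot
}

# Step 2: import the pool without mounting (-N), then mount all datasets
# in a separate step, so a hang can be pinned to import or to mount.
import_then_mount() {
    run zpool import -N "$1"
    run zfs mount -a
}
```

To really perform the steps, one would set `DRY_RUN=0` as root, call `disable_zfs_boot`, reboot, and then call `import_then_mount pool0`.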
Comment 3 Christer Ekholm 2013-06-08 16:33:33 UTC
I checked again, and the hang is actually at
"Importing ZFS pools". I'm very sorry about that; I waited a couple of
days before reporting it and apparently remembered wrong.

Anyway, I have now tested your suggestion.
If I boot single-user, the system hangs when I run
"zpool import -N pool0".

I have also tested this on a second machine by creating a zvol there,
and I have no problem at all on that machine.

The kernel is now 3.9.5.
Comment 4 Christer Ekholm 2013-06-08 16:56:32 UTC
I have made some more testing...

I saved the zvol's contents elsewhere and then destroyed it:

 sudo dd bs=4096 conv=sparse if=/dev/pool0/ibdata of=/slask/ibdata
 sudo zfs destroy pool0/ibdata

After that, I don't get the hang, and when I later recreated the zvol,
the hang reappeared.
Comment 5 Richard Yao (RETIRED) gentoo-dev 2013-06-09 16:06:54 UTC
Created attachment 350538 [details, diff]
Patch that might fix the issue

(In reply to Christer Ekholm from comment #3)
> I checked again, and the hang is actually at
> "Importing ZFS pools".

I am attaching a patch. Would you place it at /etc/portage/patches/sys-fs/zfs-kmod-0.6.1-r1/zfs-kmod-0.6.1-zvol-initialization.patch, rebuild sys-fs/zfs-kmod-0.6.1-r1 and let me know whether it works?
Comment 6 Richard Yao (RETIRED) gentoo-dev 2013-06-09 16:09:14 UTC
Created attachment 350540 [details, diff]
Patch that might fix the issue

The previous patch included code from another patch by mistake. I am attaching a new version that removes it. Please use this one instead.
Comment 7 Christer Ekholm 2013-06-09 18:07:01 UTC
No, that didn't help.
Comment 8 Christer Ekholm 2013-06-14 21:52:23 UTC
I have experimented a bit by adding some printk() calls to the code:

--- /tmp/portage/sys-fs/zfs-kmod-0.6.1-r1/work/zfs-zfs-0.6.1/module/zfs/zvol.c	2013-06-14 23:36:31.547361236 +0200
+++ zvol.c	2013-06-14 23:41:33.443424000 +0200
@@ -102,6 +102,7 @@
 	if (*minor >= (1 << MINORBITS))
 		return ENXIO;
 
+	printk(KERN_ALERT "ZFS: zvol_find_minor: %u\n", *minor);
 	return 0;
 }
 
@@ -1213,6 +1214,7 @@
 	zvol_state_t *zv;
 	int error = 0;
 
+	printk(KERN_ERR "ZFS: zvol_alloc: %s\n", name);
 	zv = kmem_zalloc(sizeof (zvol_state_t), KM_SLEEP);
 	if (zv == NULL)
 		goto out;
@@ -1481,6 +1483,7 @@
 {
 	spa_t *spa = NULL;
 	int error = 0;
+	printk(KERN_ALERT "ZFS: zvol_create_minors: %s\n",pool);
 
 	if (zvol_inhibit_dev)
 		return (0);
@@ -1502,6 +1505,7 @@
 	}
 	mutex_exit(&zvol_state_lock);
 
+	printk(KERN_ALERT "ZFS: zvol_create_minors: done\n");
 	return error;
 }
 
@@ -1569,6 +1573,7 @@
 zvol_init(void)
 {
 	int error;
+	printk(KERN_ALERT "ZFS: zvol_init\n");
 
 	list_create(&zvol_state_list, sizeof (zvol_state_t),


If I boot single-user and do
 # echo 7 > /proc/sys/kernel/printk
 # zpool import

I get this output (typed manually):

ZFS: zvol_create_minors: pool0
ZFS: zvol_find_minor: 0
ZFS: zvol_alloc: pool0/ibdata
ZFS: zvol_create_minors: done
SPL: using hostid 0x00000000

After that the machine hangs.
No zvol_init message, apparently?
I don't know whether this is useful.
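For reference, the `echo 7 > /proc/sys/kernel/printk` step above sets the first field of that file, the console log level. A message reaches the console only if its level is numerically lower than that value, so at 7 everything from KERN_EMERG (0) to KERN_INFO (6) is shown, while KERN_DEBUG (7) is not. A small sketch of that rule; the function names are hypothetical:

```shell
#!/bin/sh
# Map a printk level name to its numeric value as defined by the kernel.
printk_level() {
    case "$1" in
        KERN_EMERG)   echo 0 ;;
        KERN_ALERT)   echo 1 ;;
        KERN_CRIT)    echo 2 ;;
        KERN_ERR)     echo 3 ;;
        KERN_WARNING) echo 4 ;;
        KERN_NOTICE)  echo 5 ;;
        KERN_INFO)    echo 6 ;;
        KERN_DEBUG)   echo 7 ;;
        *)            return 1 ;;
    esac
}

# reaches_console LEVEL_NAME "CONTENTS OF /proc/sys/kernel/printk"
# Succeeds if a message of that level would be printed on the console.
# Returns 2 for an unknown level name.
reaches_console() {
    level=$(printk_level "$1") || return 2
    # The first whitespace-separated field is the console log level.
    set -- $2
    [ "$level" -lt "$1" ]
}
```

For example, `reaches_console KERN_ERR "$(cat /proc/sys/kernel/printk)"` succeeds whenever the console log level is above 3.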
Comment 9 Christer Ekholm 2013-06-17 21:49:07 UTC
Good news: I have reproduced the hang on my other machine.

I compared the kernel settings of the two machines and adjusted them
until I got the hang on the other server as well.

I found that with CONFIG_PREEMPT_NONE=y I get the hang, but with
CONFIG_PREEMPT=y I don't.

I have not yet tested the reverse on my first machine, because I can't
reboot it right now, but I will as soon as possible.
Comment 10 Christer Ekholm 2013-06-17 23:42:29 UTC
I have now tested CONFIG_PREEMPT=y on the first machine as well, and
yes, that helped.
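Since the result hinges on the kernel's preemption model, a quick way to see which model a kernel was built with is to inspect its `.config`. A sketch that only recognises the three models offered by 3.9-era kernels (CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_VOLUNTARY and CONFIG_PREEMPT); the function name is hypothetical:

```shell
#!/bin/sh
# preempt_model FILE: print the preemption model selected in a kernel .config.
# Patterns are anchored so that, e.g., CONFIG_PREEMPT_NONE=y does not match
# the CONFIG_PREEMPT=y test. Returns 1 if no known model is selected.
preempt_model() {
    if grep -q '^CONFIG_PREEMPT=y$' "$1"; then
        echo preempt
    elif grep -q '^CONFIG_PREEMPT_VOLUNTARY=y$' "$1"; then
        echo voluntary
    elif grep -q '^CONFIG_PREEMPT_NONE=y$' "$1"; then
        echo none
    else
        echo unknown
        return 1
    fi
}
```

For example, `preempt_model /usr/src/linux/.config`; for a running kernel built with CONFIG_IKCONFIG_PROC=y, one can first decompress `/proc/config.gz` with `zcat` into a temporary file.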
Comment 11 Richard Yao (RETIRED) gentoo-dev 2013-06-23 18:49:02 UTC
This bug should have received much more attention, but I have been busy with things offline. That said, would you try regenerating your initramfs with the following?

genkernel all --no-clean --zfs --callback="env ACCEPT_KEYWORDS=** EGIT_BRANCH=gentoo-next spl_LIVE_REPO='https://github.com/ryao/spl.git' zfs_kmod_LIVE_REPO='https://github.com/ryao/zfs.git' zfs_LIVE_REPO='https://github.com/ryao/zfs.git' emerge --oneshot --nodeps sys-kernel/spl sys-fs/zfs-kmod sys-fs/zfs"

You might need to make some minor adjustments for your setup. That will pull my latest development code from git, and knowing whether or not it resolves your issue will help me figure out what is wrong here.
Comment 12 Christer Ekholm 2013-06-23 20:12:35 UTC
I don't use an initramfs at all; is that a problem?

I have tested approximately what you suggested by:

Adding to /etc/portage/package.keywords

 # ZFS testing
 sys-kernel/spl **
 sys-fs/zfs-kmod **
 sys-fs/zfs **

Adding to /etc/portage/make.conf

 #ZFS-testing
 EGIT_BRANCH=gentoo-next
 spl_LIVE_REPO='https://github.com/ryao/spl.git'
 zfs_kmod_LIVE_REPO='https://github.com/ryao/zfs.git'
 zfs_LIVE_REPO='https://github.com/ryao/zfs.git'

And rebuilding zfs, zfs-kmod and spl.

According to the build logs, the commits used were:

GIT update -->
   repository:               https://github.com/ryao/zfs.git
   at the commit:            cdc7fc1523ee428fab03b3285c94d135f39e4c61
   branch:                   gentoo-next
   storage directory:        "/usr/portage/distfiles/egit-src/zfs.git"
   checkout type:            bare repository

GIT update -->
   repository:               https://github.com/ryao/spl.git
   at the commit:            198b2763b3aa7d802d101a3acfa4958c075fe85b
   branch:                   gentoo-next
   storage directory:        "/usr/portage/distfiles/egit-src/spl.git"
   checkout type:            bare repository
   


The server still hangs with CONFIG_PREEMPT_NONE=y, but not with
CONFIG_PREEMPT=y.

The kernel version is now 3.9.7.
Comment 13 Richard Yao (RETIRED) gentoo-dev 2013-07-06 13:40:43 UTC
(In reply to Christer Ekholm from comment #12)
> I don't use an initramfs at all; is that a problem?

It is only a problem when using ZFS as your rootfs.

> The server still hangs with CONFIG_PREEMPT_NONE=y, but not with
> CONFIG_PREEMPT=y.

At this point, I suggest filing an upstream bug. A bug filed here effectively gets attention only from me, and my time is extremely limited. We would make more progress on this issue if upstream and I collaborated on it, as we do on other issues.

https://github.com/zfsonlinux/zfs/issues/new
Comment 14 Christer Ekholm 2013-07-06 21:54:32 UTC
Ok. I have reported this at:
https://github.com/zfsonlinux/zfs/issues/1574
Comment 15 Christer Ekholm 2014-04-16 18:49:21 UTC
This is now fixed upstream in zfs-0.6.2-129-gba6a240.
Comment 16 Richard Yao (RETIRED) gentoo-dev 2014-06-24 16:45:03 UTC
This has been fixed in Gentoo since 0.6.2-r5. Closing as resolved upstream.