Bug 532248 - sys-fs/zfs-0.6.3-r1 with kernel 3.14.25 - BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 in zap_create_claim+0x4b/0x2d0 [zfs]
Summary: sys-fs/zfs-0.6.3-r1 with kernel 3.14.25 - BUG: unable to handle kernel NULL p...
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Richard Yao (RETIRED)
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-12-11 10:08 UTC by alexander haensch
Modified: 2016-01-16 20:01 UTC (History)
CC List: 12 users

See Also:
Package list:
Runtime testing required: ---


Attachments
Proposed patch that fixes the issue (fix.patch,773 bytes, patch)
2015-02-16 16:35 UTC, Ivan Vecera

Description alexander haensch 2014-12-11 10:08:43 UTC
I am not 100% sure when this happens, or whether it is caused by sys-fs/zfs-0.6.3-r1 or by the new kernel.

Reproducible: Always




[  434.590685] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[  434.590860] IP: [<ffffffffa0bcab2b>] zap_create_claim+0x4b/0x2d0 [zfs]
[  434.591004] PGD 0 
[  434.591114] Oops: 0000 [#1] SMP 
[  434.591274] Modules linked in: binfmt_misc arc4 ecb md4 cifs fscache act_police cls_basic cls_fw cls_u32 sch_tbf sch_prio sch_htb sch_hfsc sch_ingress sch_sfq nf_conntrack_snmp xt_CHECKSUM xt_statistic xt_CT xt_LOG xt_connlimit xt_realm xt_addrtype xt_comment xt_recent xt_nat ipt_ULOG ipt_REJECT ipt_MASQUERADE ipt_ECN ipt_CLUSTERIP ipt_ah xt_set ip_set nf_nat_tftp nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_udplite nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_TPROXY nf_defrag_ipv6 xt_time xt_TCPMSS xt_tcpmss xt_sctp
[  434.600310]  xt_policy xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_connmark xt_CLASSIFY xt_AUDIT xt_tcpudp xt_state iptable_raw iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables veth x_tables bridge stp llc iTCO_wdt iTCO_vendor_support gpio_ich x86_pkg_temp_thermal coretemp kvm_intel kvm crc32c_intel microcode ixgbe pcspkr joydev mdio i2c_i801 lpc_ich mfd_core igb ioatdma isci i2c_algo_bit dca acpi_cpufreq processor thermal_sys button zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) ipv6 xts gf128mul aes_x86_64 cbc sha512_generic sha256_generic sha1_generic iscsi_tcp libiscsi_tcp
[  434.604424]  libiscsi scsi_transport_iscsi tg3 ptp pps_core libphy e1000 fuse nfs lockd sunrpc multipath linear raid0 dm_raid raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid10 dm_snapshot dm_bufio dm_crypt dm_mirror dm_region_hash dm_log dm_mod hid_sunplus hid_sony led_class hid_samsung hid_pl hid_petalynx hid_gyration sl811_hcd usbhid ohci_pci ohci_hcd uhci_hcd usb_storage aic94xx lpfc qla2xxx megaraid_sas megaraid_mbox megaraid_mm megaraid aacraid sx8 DAC960 cciss 3w_9xxx 3w_xxxx mptsas mptfc scsi_transport_fc scsi_tgt mptspi mptscsih mptbase atp870u dc395x qla1280 imm parport dmx3191d sym53c8xx gdth advansys initio BusLogic arcmsr aic7xxx aic79xx scsi_transport_spi sg pdc_adma sata_inic162x sata_mv ata_piix sata_qstor sata_vsc sata_uli sata_sis sata_sx4 sata_nv sata_via
[  434.608989]  sata_svw sata_sil24 sata_sil sata_promise pata_sl82c105 pata_cs5530 pata_cs5520 pata_via pata_jmicron pata_marvell pata_sis pata_netcell pata_sc1200 pata_pdc202xx_old pata_triflex pata_atiixp pata_opti pata_amd pata_ali pata_it8213 pata_pcmcia pcmcia pcmcia_core pata_ns87415 pata_ns87410 pata_serverworks pata_artop pata_it821x pata_optidma pata_hpt3x2n pata_hpt3x3 pata_hpt37x pata_hpt366 pata_cmd64x pata_efar pata_rz1000 pata_sil680 pata_radisys pata_pdc2027x pata_mpiix ehci_pci ehci_hcd ahci libsas libahci i2c_core usbcore libata usb_common
[  434.611595] CPU: 18 PID: 5121 Comm: txg_sync Tainted: P           O 3.14.25-hardened-r1 #1
[  434.611669] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.0a 07/31/2013
[  434.611740] task: ffff881fdfe2f6d0 ti: ffff881fdfe2fc30 task.ti: ffff881fdfe2fc30
[  434.611813] RIP: 0010:[<ffffffffa0bcab2b>]  [<ffffffffa0bcab2b>] zap_create_claim+0x4b/0x2d0 [zfs]
[  434.611958] RSP: 0000:ffff881e7485fb58  EFLAGS: 00010282
[  434.612026] RAX: 000000000000001d RBX: ffff881fbb7fd000 RCX: 000000000000001e
[  434.612097] RDX: 000000000000001d RSI: 000000000000001c RDI: ffff881fbb7fd000
[  434.612168] RBP: ffff881e7485fbc8 R08: 0000000000000000 R09: 0000000000000002
[  434.612240] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000001c
[  434.612312] R13: 000000000000001d R14: ffff881fbd79e000 R15: 0000000000000000
[  434.612383] FS:  0000000000000000(0000) GS:ffff88207f440000(0000) knlGS:0000000000000000
[  434.612456] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  434.612525] CR2: 0000000000000018 CR3: 000000000159d000 CR4: 00000000001607f0
[  434.612596] Stack:
[  434.612659]  ffff881c882e2d48 0000000000000246 ffff881c57d3f018 ffff881c882e2d48
[  434.612919]  0000000000000001 ffff881fa77a70c0 ffff881e7485fb98 ffffffff8158a269
[  434.613180]  ffff881e7485fbb8 ffff881fbd79e000 0000000000000000 ffff881fa77a70c0
[  434.613439] Call Trace:
[  434.613508]  [<ffffffff8158a269>] ? mutex_unlock+0x9/0x10
[  434.613585]  [<ffffffffa0bcb1ab>] spa_feature_decr+0x4b/0xc0 [zfs]
[  434.613663]  [<ffffffffa0b69542>] ? bptree_is_empty+0x82/0x90 [zfs]
[  434.613744]  [<ffffffffa0b944da>] dsl_scan_sync+0x7da/0xa20 [zfs]
[  434.613818]  [<ffffffffa09cae7f>] ? spl_kmem_cache_free+0x11f/0x180 [spl]
[  434.613896]  [<ffffffffa0bfbd71>] ? zio_wait+0x151/0x1f0 [zfs]
[  434.613977]  [<ffffffffa0ba4ec7>] spa_sync+0x4e7/0xb50 [zfs]
[  434.614049]  [<ffffffff8107f6b1>] ? autoremove_wake_function+0x11/0x40
[  434.614121]  [<ffffffff8107ef95>] ? __wake_up_common+0x55/0x90
[  434.614194]  [<ffffffff81098597>] ? ktime_get_ts+0x47/0xe0
[  434.614274]  [<ffffffffa0bb52cc>] txg_init+0x53c/0xab0 [zfs]
[  434.614352]  [<ffffffff81078389>] ? enqueue_task_fair+0x1e9/0x520
[  434.614437]  [<ffffffffa0bb4f80>] ? txg_init+0x1f0/0xab0 [zfs]
[  434.614515]  [<ffffffffa09cd373>] __thread_create+0x1c3/0x1e0 [spl]
[  434.614594]  [<ffffffffa09cd300>] ? __thread_create+0x150/0x1e0 [spl]
[  434.614675]  [<ffffffff810643d4>] kthread+0xc4/0xe0
[  434.614747]  [<ffffffff81064310>] ? kthread_freezable_should_stop+0x60/0x60
[  434.614822]  [<ffffffff8158c2f4>] ret_from_fork+0x74/0xa0
[  434.614892]  [<ffffffff81064310>] ? kthread_freezable_should_stop+0x60/0x60
[  434.614965] Code: 55 48 89 d0 48 89 e5 48 83 ec 70 48 89 5d d8 48 89 fb 4c 89 65 e0 49 89 f4 4c 89 6d e8 49 89 d5 4c 89 7d f8 4d 89 c7 4c 89 75 f0 <41> 8b 70 18 48 89 4d b8 b9 08 00 00 00 49 8b 50 08 44 89 4d ac 
[  434.618150] RIP  [<ffffffffa0bcab2b>] zap_create_claim+0x4b/0x2d0 [zfs]
[  434.618275]  RSP <ffff881e7485fb58>
[  434.618340] CR2: 0000000000000018
[  434.618406] ---[ end trace bac7ffad34254698 ]---
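A note on reading the trace above: the faulting address (CR2: 0000000000000018) is small and non-zero, and the marked instruction bytes `<41> 8b 70 18` decode to a 32-bit load from offset 0x18 of R08, which the register dump shows as 0. That pattern is what reading a structure member through a NULL base pointer looks like. The sketch below only illustrates the arithmetic; the structure `demo_info_t` and the helper `field_address` are hypothetical and are not taken from the ZFS headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only: three pointers followed by
 * a 32-bit flag. This is NOT copied from the ZFS sources. */
typedef struct demo_info {
    const char *uname;
    const char *guid;
    const char *desc;
    int         can_readonly;
} demo_info_t;

/* Address the CPU would touch when reading `can_readonly` through `base`.
 * The value is computed arithmetically; nothing is dereferenced. */
static uintptr_t field_address(uintptr_t base)
{
    /* Guard against wrap-around of the address computation. */
    assert(base <= UINTPTR_MAX - offsetof(demo_info_t, can_readonly));
    return base + offsetof(demo_info_t, can_readonly);
}
```

On a platform with 8-byte pointers, `field_address(0)` evaluates to 0x18, i.e. the same form of address as the one reported in CR2.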
Comment 1 alexander haensch 2014-12-11 10:39:00 UTC
found some people with the same problem:

https://github.com/zfsonlinux/zfs/issues/2946
Comment 2 Richard Yao (RETIRED) gentoo-dev 2014-12-19 18:17:50 UTC
(In reply to alexander haensch from comment #1)
> found some people with the same problem:
> 
> https://github.com/zfsonlinux/zfs/issues/2946

I was travelling when this issue was filed. It does look like a regression might have occurred when I did my backports. It is possible that head is also affected given that the backports were strictly changes from head. I will look into this on the weekend.
Comment 3 J. Roeleveld 2015-01-28 10:32:44 UTC
Any update on this?

I get the same issue on a clean install using:
kernel: gentoo-sources-3.17.7
zfs-0.6.3-r2
zfs-kmod-0.6.3-r1
spl-0.6.3-r1

Steps to reproduce (consistently):

zpool create data raidz2 /dev/xvd[bcde]
zfs create data/volume1
zfs destroy data/volume1
Comment 4 Markus Osterhoff 2015-02-01 09:04:12 UTC
(In reply to J. Roeleveld from comment #3)
> I get the same issue on a clean install using:
> kernel: gentoo-sources-3.17.7
> zfs-0.6.3-r2
> zfs-kmod-0.6.3-r1
> spl-0.6.3-r1
> 
> Steps to reproduce (consistently):
> 
> zpool create data raidz2 /dev/xvd[bcde]
> zfs create data/volume1
> zfs destroy data/volume1

Same here. It works with *-0.6.3 for zfs/spl on 3.16.7-tuxonice (laptop) and on 3.14.27-gentoo (PC).
Comment 5 Jared B. 2015-02-03 20:43:41 UTC
I've also run into this.  It actually left my system unbootable until I used a recovery disk to downgrade ZFS back to 0.6.3.  This has also been brought up on the upstream tracker:
https://github.com/zfsonlinux/zfs/issues/3019
Comment 6 Ivan Vecera 2015-02-16 16:34:18 UTC
The problem seems to be in the patch 0021-Illumos-4390-I-O-errors-can-corrupt-space-map-when-d.patch from the sys-fs/zfs-kmod-0.6.3-r1 package.
The corresponding ZoL upstream commit uses the new-style spa_feature_* API, but version 0.6.3 provides the old one. @ryao backported this commit and converted the new API usage to the old one, but missed one call to spa_feature_decr in dsl_scan.c.
There the function is called as `spa_feature_decr(spa, SPA_FEATURE_ASYNC_DESTROY, tx);`, but the old API declares it as `void spa_feature_decr(spa_t *spa, zfeature_info_t *feature, dmu_tx_t *tx);`.
SPA_FEATURE_ASYNC_DESTROY is an enum constant with value 0, so the feature parameter is received as a NULL pointer and later dereferenced.

The attached patch should fix this problem.
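
The mismatch described above can be reproduced in miniature. The following is a minimal sketch, not the ZFS code: every type and function name in it (`feature_info_t`, `feature_id_t`, `old_feature_decr`, and so on) is hypothetical. It relies on the C rule that an integer constant expression with value 0, which includes an enumeration constant, is a null pointer constant, so the leftover call is accepted by the compiler (some compilers may warn). Unlike the kernel code path, the old-style function here checks for NULL, so the sketch reports the problem instead of crashing. The corrected call shows one plausible shape of a fix and is not a copy of the attached patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real types; names are illustrative only. */
typedef struct feature_info {
    const char *name;
    int         refcount;
} feature_info_t;

typedef enum feature_id {
    FEATURE_ASYNC_DESTROY = 0,   /* first enumerator, value 0 */
    FEATURE_EMPTY_BPOBJ,
    FEATURE_COUNT
} feature_id_t;

static feature_info_t feature_table[FEATURE_COUNT] = {
    { "async_destroy", 1 },
    { "empty_bpobj",   1 },
};

/* Old-style API: takes a pointer to the feature descriptor.
 * The NULL check exists only so this demonstration does not crash. */
static int old_feature_decr(feature_info_t *feature)
{
    if (feature == NULL)
        return -1;               /* the kernel would oops here instead */
    feature->refcount--;
    return 0;
}

/* New-style API: takes the enum id and looks the descriptor up itself. */
static int new_feature_decr(feature_id_t fid)
{
    assert((int)fid >= 0 && (int)fid < FEATURE_COUNT);
    return old_feature_decr(&feature_table[fid]);
}

/* Leftover new-style argument passed to the old-style function.
 * The enum constant 0 converts silently to a NULL pointer. */
static int leftover_call(void)
{
    return old_feature_decr(FEATURE_ASYNC_DESTROY);
}

/* One plausible correction: pass the table entry, not the enum value. */
static int fixed_call(void)
{
    return old_feature_decr(&feature_table[FEATURE_ASYNC_DESTROY]);
}
```

In this sketch `leftover_call()` returns -1 because the callee receives NULL, while `fixed_call()` decrements the reference count of the intended table entry and returns 0.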
Comment 7 Ivan Vecera 2015-02-16 16:35:21 UTC
Created attachment 396602
Proposed patch that fixes the issue
Comment 8 Ian 2015-02-22 07:48:34 UTC
I had this problem too, but unfortunately I found it when trying out ZFS for the first time (wow, is this ZFS stupidly slow!?). I don't know if anyone else had this problem, but my ZFS got into a state where it would crash the kernel on boot (even when removed from the boot runlevel, i.e. not mounting).

The patch attached here worked just fine for me. The data is not important, so I intend to continue with this setup until the ebuild gets an official update. Good thing I finally checked dmesg; it turns out ZFS isn't as slow as I feared once it stopped crashing.

versions:
Kernel=3.14.14-gentoo
zfs-kmod=0.6.3-r1 - patched with attached file in this bug
Comment 9 Sod off! I am no loger here! 2015-02-22 14:49:41 UTC
I had the same problem on my server (kernel 3.17.7, ZFS 0.6.3-r1). The kernel crashed on boot for the zpool on which I had run some `zfs destroy` commands.

I booted a FreeBSD live CD and performed the `zfs destroy` commands I needed. I cannot recall whether I ran a scrub from FreeBSD or from Linux on the next boot, but after the FreeBSD boot the pool worked fine in Linux as well and the kernel did not hang.

Looking forward to an updated ebuild.


BR
Erik
Comment 10 Azamat H. Hackimov 2015-03-26 10:49:55 UTC
Hello. I got the same problem, with I/O hanging on the system. Every time, I had to hard-reset the system to revive it. In the end I was left with a destroyed, unrecoverable RAID5 with a faulted ZFS pool on it.
So I had to update the ZFS packages to version 0.6.3.1.3 (the 0.6.3-1.3 tag on GitHub) from a local overlay and recreate the ZFS pool. No issues since then.
Comment 11 Richard Yao (RETIRED) gentoo-dev 2016-01-16 20:01:41 UTC
I am going through old bug reports and regret to see that I dropped the ball on this. It was resolved in 0.6.4, which was committed on 27 Apr 2015.