
Bug 746539

Summary: >=net-wireless/iwd-1.8-r3: crash in dmesg at boot
Product: Gentoo Linux    Reporter: m1027 <m1027>
Component: Keywording    Assignee: Ben Kohler <bkohler>
Status: RESOLVED FIXED    
Severity: normal CC: balint, bkohler, cruzki123, desgranges.arnaud, miklosh, sam
Priority: Normal    
Version: unspecified   
Hardware: All   
OS: Linux   
See Also: https://bugzilla.kernel.org/show_bug.cgi?id=208599
Whiteboard:
Package list:
Runtime testing required: ---
Attachments:
Description      Flags
emerge --info    none
kernel patch     none

Description m1027 2020-10-04 13:16:28 UTC
iwd-1.7 has been removed from portage and did not have this issue.

All remaining builds (iwd-1.8 and iwd-1.9) have the following issue according to dmesg at boot. iwd seems to work afterwards, though.

[   16.086329] ------------[ cut here ]------------
[   16.086333] WARNING: CPU: 0 PID: 370 at net/wireless/nl80211.c:7284 nl80211_get_reg_do+0x1fc/0x230
[   16.086334] CPU: 0 PID: 370 Comm: iwd Not tainted 5.8.13 #15
[   16.086335] Hardware name: LENOVO 20KFCTO1WW/20KFCTO1WW, BIOS N20ET55W (1.40 ) 06/01/2020
[   16.086336] RIP: 0010:nl80211_get_reg_do+0x1fc/0x230
[   16.086338] Code: 00 00 00 48 89 ef e8 13 ff 85 ff 85 c0 0f 84 01 ff ff ff eb a6 48 89 ef 48 89 04 24 e8 4d ff e0 ff 48 8b 04 24 e9 43 ff ff ff <0f> 0b 48 89 ef e8 3a ff e0 ff b8 ea ff ff ff e9 2f ff ff ff e9 78
[   16.086338] RSP: 0018:ffff9ab9c041bb98 EFLAGS: 00010202
[   16.086339] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[   16.086340] RDX: ffff95472b7b0008 RSI: 0000000000000000 RDI: ffff95472b7b02e0
[   16.086340] RBP: ffff9547264c1d00 R08: 0000000000000004 R09: ffff954728585014
[   16.086340] R10: ffff954728581000 R11: 0000000000000001 R12: ffff9ab9c041bbf0
[   16.086341] R13: 0000000000000000 R14: ffff954728585014 R15: ffff95472b7b02e0
[   16.086341] FS:  00007f346c24e740(0000) GS:ffff95472e400000(0000) knlGS:0000000000000000
[   16.086342] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   16.086342] CR2: 000055f2e8fc3010 CR3: 00000004222c2002 CR4: 00000000003606f0
[   16.086343] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   16.086343] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   16.086344] Call Trace:
[   16.086346]  genl_rcv_msg+0x1ae/0x2f0
[   16.086348]  ? genl_family_rcv_msg_attrs_parse.isra.0+0xd0/0xd0
[   16.086349]  netlink_rcv_skb+0x46/0x110
[   16.086350]  genl_rcv+0x1f/0x30
[   16.086351]  netlink_unicast+0x197/0x230
[   16.086352]  netlink_sendmsg+0x1ed/0x400
[   16.086353]  __sys_sendto+0x1d3/0x1f0
[   16.086355]  __x64_sys_sendto+0x21/0x30
[   16.086356]  do_syscall_64+0x4d/0x1d0
[   16.086358]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   16.086360] RIP: 0033:0x7f346c3d862c
[   16.086361] Code: 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 21 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 6c e9 91 0c 04 00 0f 1f 80 00 00 00 00 55 48
[   16.086361] RSP: 002b:00007fff38eb6978 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[   16.086362] RAX: ffffffffffffffda RBX: 000055f2e8faab00 RCX: 00007f346c3d862c
[   16.086362] RDX: 000000000000001c RSI: 000055f2e8fc2ac0 RDI: 0000000000000004
[   16.086363] RBP: 000055f2e8fc1910 R08: 0000000000000000 R09: 0000000000000000
[   16.086363] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff38eb6a08
[   16.086364] R13: 000055f2e8fb4910 R14: 000055f2e8fb4790 R15: 0000000000000000
[   16.086364] ---[ end trace a7bf7b4f3a41fe0a ]---
Comment 1 Ben Kohler gentoo-dev 2020-10-04 14:29:05 UTC
I'm actually seeing this too, but I've been on iwd-1.9 for a while. I think this may be due to kernel 5.8, but I haven't had time to test yet. Can you try a different kernel series?
Comment 2 Ben Kohler gentoo-dev 2020-10-04 15:33:24 UTC
Hmm I'm still seeing this on gentoo-kernel-bin-5.4.69... are you using iwlwifi or some other driver?
Comment 3 Ben Kohler gentoo-dev 2020-10-04 16:03:29 UTC
I also have noticed that if I start iwd manually after boot is complete, there is no issue.  Could this be a problem with timing and maybe udev module loading?

Do you boot via openrc or systemd?
Comment 4 m1027 2020-10-04 16:19:34 UTC
Thanks for the ultra fast support.

Yes, I am using systemd, and the system does boot quite fast. Depending on external factors (such as how quickly I type the disk decryption password), the crash happens within the first 10 to 16 seconds of boot.

Udev: I have only two rules in /etc/udev/rules.d, which should only apply when two specific USB devices are present, and that is normally not the case.

Kernel: I could test against the upcoming 5.9. But as you say, this issue has been present since some older releases already.

Kernel options: I am using CONFIG_IWLWIFI, CONFIG_IWLMVM, and have the ucode compiled-in.

Hm...
Comment 5 m1027 2020-10-04 19:00:22 UTC
Yes, and I can confirm that this misbehaviour does not occur when iwd.service is started manually after boot.
Comment 6 Ben Kohler gentoo-dev 2020-10-05 15:26:08 UTC
I bisected this in iwd and the issue shows up after this commit:
https://git.kernel.org/pub/scm/network/wireless/iwd.git/commit/?id=b43e915b989dcbb0fa763fb7f256e30fe7426f14

But I believe it's just exposing a kernel bug.  Is anyone able to test on a kernel from last year?
Comment 7 m1027 2020-10-05 15:47:09 UTC
I did at least this quick test:

Kernel compiled without CONFIG_CFG80211_CRDA_SUPPORT, iwd compiled including
crda support.

Same issue.
Comment 8 Ben Kohler gentoo-dev 2020-10-05 15:48:49 UTC
Yeah and something else I tested was with CFG80211=y (built-in, with regulatory.db & .db.p7s built-in) as well, to get it initialized earlier.  No change.
Comment 9 Ben Kohler gentoo-dev 2020-10-20 11:50:19 UTC
Still happening on 5.9.1, someone probably needs to report this to (kernel) upstream.  In the meantime I do not see any negative impacts of this crash so I'm not strongly motivated to work on it right now.
Comment 10 m1027 2020-10-20 16:19:57 UTC
(In reply to Ben Kohler from comment #9)
> Still happening on 5.9.1, someone probably needs to report this to (kernel)
> upstream.  In the meantime I do not see any negative impacts of this crash
> so I'm not strongly motivated to work on it right now.

Okay, I've just posted this issue to iwd@lists.01.org and linked it here.
Subject: >=iwd-1.8: crash in dmesg at boot
It should be public in a few minutes.

Hope that's the right place for a start.
Comment 11 Ben Kohler gentoo-dev 2020-10-20 16:21:45 UTC
Maybe you will get a good response there, but I have already talked to iwd devs via IRC and they are pointing me to the kernel (iow, file at bugzilla.kernel.org)
Comment 12 m1027 2020-10-20 18:27:14 UTC
Follow this if you wish:

https://lists.01.org/hyperkitty/list/iwd@lists.01.org/thread/PSQEBUVXJLMR7TB2DDVY2R6JNXYIQLSD/

In case it turns out that someone needs to forward it further I could do so.
Comment 13 m1027 2020-10-21 10:29:07 UTC
I've found a bug on kernel.org which seems to be it, and commented there.

https://bugzilla.kernel.org/show_bug.cgi?id=208599
Comment 14 Arnaud Desgranges 2020-10-24 13:23:11 UTC
Created attachment 668300 [details]
emerge --info
Comment 15 Arnaud Desgranges 2020-10-24 13:24:34 UTC
I can confirm this bug with either sys-kernel/gentoo-sources-5.4.72 or vanilla 5.6.19. I use an Intel wireless device, the Intel(R) Dual Band Wireless AC 8265, REV=0x230 (iwlwifi). My computer is an Intel® NUC NUC8i7HVK. I use iwd as the backend for NetworkManager. It crashes at boot when started by systemd.
Comment 16 m1027 2020-12-02 16:10:03 UTC
FYI: Still an issue after upgrading iwd from 1.9 to 1.10.
Comment 17 Ben Kohler gentoo-dev 2020-12-02 16:13:32 UTC
This isn't something that will get fixed in iwd, it's correct behavior in iwd exposing a kernel bug.  But it doesn't look like there's been any action on the kernel bug report, unfortunately.
Comment 18 Balint SZENTE 2020-12-08 14:28:34 UTC
(In reply to m1027 from comment #13)
> I've found a bug on kernel.org which seems to be it, and commented there.
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=208599

I believe it is the same issue as:
https://bugzilla.kernel.org/show_bug.cgi?id=205005
Comment 19 m1027 2020-12-08 14:40:34 UTC
(In reply to Balint SZENTE from comment #18)
> 
> I believe it is the same issue as:
> https://bugzilla.kernel.org/show_bug.cgi?id=205005

FYI: My Fn + WIFI key do actually work well, I receive proper journal messages
and get wifi toggled on and off. However, whatever state I leave it before
rebooting, I get the mentioned kernel warning. So, I don't believe your link
to the other bug report addresses the same thing.
Comment 20 Balint SZENTE 2020-12-08 14:54:45 UTC
(In reply to m1027 from comment #19)
> (In reply to Balint SZENTE from comment #18)
> > 
> > I believe it is the same issue as:
> > https://bugzilla.kernel.org/show_bug.cgi?id=205005
> 
> FYI: My Fn + WIFI key do actually work well, I receive proper journal
> messages
> and get wifi toggled on and off. However, whatever state I leave it before
> rebooting, I get the mentioned kernel warning. So, I don't believe your link
> to the other bug report addresses the same thing.

That Fn+WIFI key thing is just a sidetrack there (it was the author's guess, but it has nothing to do with the issue; see his "comment 2"). The call stack is exactly the same, but triggered by wpa_supplicant.

This is the issue in all cases (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/wireless/nl80211.c?h=v5.10-rc7#n7574):

	/* a self-managed-reg device must have a private regdom */
	if (WARN_ON(!regdom && self_managed)) {
		nlmsg_free(msg);
		return -EINVAL;
	}

I do believe it is the same "root cause".
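To make the quoted check concrete, here is a minimal user-space model of when it fires. It is only a sketch: `struct wiphy_model`, `struct regdom_model` and `get_reg_model()` are hypothetical stand-ins, not the kernel's real types or functions. It assumes the behaviour visible in the snippet above: a self-managed-regulatory device queried before it has a private regdomain takes the WARN_ON path and gets -EINVAL, while other devices fall back to the global regdomain.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for kernel state. */
struct regdom_model {
    char alpha2[3];                     /* country code, e.g. "DE" */
};

struct wiphy_model {
    bool self_managed;                  /* driver manages its own regdomain */
    const struct regdom_model *regd;    /* private regdomain, may be NULL */
};

/*
 * Model of the per-wiphy branch of the GET_REG handler:
 * - a wiphy with a private regdomain reports that regdomain;
 * - a self-managed wiphy WITHOUT one is rejected with -EINVAL
 *   (this is where the real kernel prints the WARN_ON backtrace);
 * - any other wiphy falls back to the global regdomain.
 */
int get_reg_model(const struct wiphy_model *wiphy,
                  const struct regdom_model *global,
                  const struct regdom_model **out)
{
    assert(wiphy != NULL && global != NULL && out != NULL);

    const struct regdom_model *regdom = wiphy->regd;

    if (regdom == NULL && wiphy->self_managed)
        return -EINVAL;                 /* the WARN_ON path */

    *out = (regdom != NULL) ? regdom : global;
    return 0;
}
```

In this model, a device queried too early (`self_managed = true`, `regd = NULL`) is the only case that fails, which matches the observation that starting iwd later, after the driver has finished initialising, avoids the warning.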
Comment 21 Balint SZENTE 2020-12-08 15:39:57 UTC
I also have an "extra" warning in net/wireless/sme.c on my system:

[    9.805107] ------------[ cut here ]------------
[    9.805111] WARNING: CPU: 0 PID: 1453 at net/wireless/nl80211.c:7301 nl80211_get_reg_do+0x1ec/0x220
[    9.805112] CPU: 0 PID: 1453 Comm: iwd Not tainted 5.9.12-gentoo #1
[    9.805113] Hardware name: Dell Inc. Latitude 7490/0KP0FT, BIOS 1.16.0 07/13/2020
[    9.805114] RIP: 0010:nl80211_get_reg_do+0x1ec/0x220
[    9.805115] Code: 24 0c 01 00 00 00 e8 a3 51 a5 ff 85 c0 0f 84 01 ff ff ff eb a6 48 89 ef 48 89 04 24 e8 3d bb e9 ff 48 8b 04 24 e9 43 ff ff ff <0f> 0b 48 89 ef e8 2a bb e9 ff b8 ea ff ff ff e9 2f ff ff ff e9 78
[    9.805116] RSP: 0018:ffffb79a80ec7c18 EFLAGS: 00010202
[    9.805116] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[    9.805117] RDX: ffff95b0eab68008 RSI: 0000000000000000 RDI: ffff95b0eab682e0
[    9.805117] RBP: ffff95b0ebfc8900 R08: 0000000000000004 R09: ffff95b0ea0a1014
[    9.805118] R10: 0000000000000016 R11: 0000000000000001 R12: ffffb79a80ec7c70
[    9.805118] R13: 0000000000000000 R14: ffff95b0ea0a1014 R15: ffff95b0eab682e0
[    9.805119] FS:  00007f3e58eb6740(0000) GS:ffff95b0ee400000(0000) knlGS:0000000000000000
[    9.805119] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    9.805120] CR2: 00007f3e59041b10 CR3: 00000004670a4004 CR4: 00000000003706f0
[    9.805120] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    9.805121] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    9.805121] Call Trace:
[    9.805124]  ? _cond_resched+0x10/0x20
[    9.805126]  genl_rcv_msg+0x19f/0x300
[    9.805140]  ? genl_family_rcv_msg_attrs_parse.isra.0+0xd0/0xd0
[    9.805141]  netlink_rcv_skb+0x44/0x110
[    9.805142]  genl_rcv+0x1f/0x30
[    9.805143]  netlink_unicast+0x18c/0x230
[    9.805144]  netlink_sendmsg+0x219/0x430
[    9.805146]  __sys_sendto+0x17a/0x190
[    9.805147]  ? __sys_recvmsg+0x51/0xa0
[    9.805149]  ? do_epoll_wait+0xab/0xd0
[    9.805150]  __x64_sys_sendto+0x20/0x30
[    9.805150]  do_syscall_64+0x33/0x40
[    9.805152]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    9.805153] RIP: 0033:0x7f3e58fb36dc
[    9.805154] Code: c0 ff ff ff ff eb bc 0f 1f 80 00 00 00 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 19 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 64 c3 0f 1f 00 55 48 83 ec 20 48 89 54 24 10
[    9.805154] RSP: 002b:00007ffe93006e38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[    9.805155] RAX: ffffffffffffffda RBX: 0000556a61fca870 RCX: 00007f3e58fb36dc
[    9.805155] RDX: 000000000000001c RSI: 0000556a61fded60 RDI: 0000000000000004
[    9.805155] RBP: 0000556a61fd5bf0 R08: 0000000000000000 R09: 0000000000000000
[    9.805156] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe93006e90
[    9.805156] R13: 00007ffe93006e8c R14: 0000556a61fd4500 R15: 0000000000000000
[    9.805157] ---[ end trace a34f283c1281abcf ]---
[   10.379331] wlan0: authenticate with 8c:3b:ad:de:93:44
[   10.382435] wlan0: send auth to 8c:3b:ad:de:93:44 (try 1/3)
[   10.389599] wlan0: authenticated
[   10.390439] ------------[ cut here ]------------
[   10.390443] WARNING: CPU: 6 PID: 1453 at net/wireless/sme.c:533 cfg80211_connect+0x59f/0x6b0
[   10.390445] CPU: 6 PID: 1453 Comm: iwd Tainted: G        W         5.9.12-gentoo #1
[   10.390445] Hardware name: Dell Inc. Latitude 7490/0KP0FT, BIOS 1.16.0 07/13/2020
[   10.390447] RIP: 0010:cfg80211_connect+0x59f/0x6b0
[   10.390448] Code: 83 e7 f8 48 89 0a 48 8b 4c 06 f8 48 89 4c 02 f8 48 29 fa 8d 0c 10 48 29 d6 89 c8 c1 e8 03 89 c1 f3 48 a5 e9 51 ff ff ff 0f 0b <0f> 0b b8 8d ff ff ff e9 21 fd ff ff 0f 0b 48 89 44 24 20 45 31 c9
[   10.390449] RSP: 0018:ffffb79a80ec7a20 EFLAGS: 00010286
[   10.390450] RAX: 0000000000000000 RBX: ffff95b0ea16e850 RCX: 0000000000000000
[   10.390450] RDX: ffffffff9518fce0 RSI: ffff95b0e4147a3a RDI: ffff95b0ea16e8c6
[   10.390450] RBP: 0000000000000000 R08: 000000000000006f R09: ffff95b0eab68420
[   10.390451] R10: ffff95b0eab68400 R11: 000000000000002a R12: ffffb79a80ec7ab8
[   10.390451] R13: ffff95b0ea16e8c0 R14: ffff95b0eab68000 R15: ffff95b0ea16e000
[   10.390452] FS:  00007f3e58eb6740(0000) GS:ffff95b0ee580000(0000) knlGS:0000000000000000
[   10.390452] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   10.390453] CR2: 00007f3e59118000 CR3: 00000004670a4001 CR4: 00000000003706e0
[   10.390453] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   10.390454] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   10.390454] Call Trace:
[   10.390456]  nl80211_connect+0x5bd/0x870
[   10.390458]  genl_rcv_msg+0x19f/0x300
[   10.390460]  ? genl_family_rcv_msg_attrs_parse.isra.0+0xd0/0xd0
[   10.390461]  netlink_rcv_skb+0x44/0x110
[   10.390462]  genl_rcv+0x1f/0x30
[   10.390464]  netlink_unicast+0x18c/0x230
[   10.390465]  netlink_sendmsg+0x219/0x430
[   10.390467]  __sys_sendto+0x17a/0x190
[   10.390468]  ? vfs_read+0x13d/0x170
[   10.390469]  __x64_sys_sendto+0x20/0x30
[   10.390471]  do_syscall_64+0x33/0x40
[   10.390472]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   10.390473] RIP: 0033:0x7f3e58fb36dc
[   10.390474] Code: c0 ff ff ff ff eb bc 0f 1f 80 00 00 00 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 19 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 64 c3 0f 1f 00 55 48 83 ec 20 48 89 54 24 10
[   10.390474] RSP: 002b:00007ffe93006e38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[   10.390475] RAX: ffffffffffffffda RBX: 0000556a61fca870 RCX: 00007f3e58fb36dc
[   10.390476] RDX: 000000000000009c RSI: 0000556a61fe4880 RDI: 0000000000000004
[   10.390476] RBP: 0000556a61fdd2c0 R08: 0000000000000000 R09: 0000000000000000
[   10.390476] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe93006ea8
[   10.390477] R13: 00007ffe93006ea4 R14: 0000556a61fd4500 R15: 0000000000000000
[   10.390477] ---[ end trace a34f283c1281abd0 ]---
[   10.390500] ------------[ cut here ]------------
[   10.390502] WARNING: CPU: 6 PID: 1453 at net/wireless/sme.c:533 cfg80211_connect+0x59f/0x6b0
[   10.390503] CPU: 6 PID: 1453 Comm: iwd Tainted: G        W         5.9.12-gentoo #1
[   10.390503] Hardware name: Dell Inc. Latitude 7490/0KP0FT, BIOS 1.16.0 07/13/2020
[   10.390504] RIP: 0010:cfg80211_connect+0x59f/0x6b0
[   10.390505] Code: 83 e7 f8 48 89 0a 48 8b 4c 06 f8 48 89 4c 02 f8 48 29 fa 8d 0c 10 48 29 d6 89 c8 c1 e8 03 89 c1 f3 48 a5 e9 51 ff ff ff 0f 0b <0f> 0b b8 8d ff ff ff e9 21 fd ff ff 0f 0b 48 89 44 24 20 45 31 c9
[   10.390505] RSP: 0018:ffffb79a80ec7a20 EFLAGS: 00010286
[   10.390506] RAX: 0000000000000000 RBX: ffff95b0ea16e850 RCX: 0000000000000000
[   10.390506] RDX: ffffffff9518fce0 RSI: ffff95b0e414703a RDI: ffff95b0ea16e8c6
[   10.390507] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff95b0eab68420
[   10.390507] R10: ffff95b0eab683f8 R11: 000000000000002a R12: ffffb79a80ec7ab8
[   10.390507] R13: ffff95b0ea16e8c0 R14: ffff95b0eab68000 R15: ffff95b0ea16e000
[   10.390508] FS:  00007f3e58eb6740(0000) GS:ffff95b0ee580000(0000) knlGS:0000000000000000
[   10.390508] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   10.390509] CR2: 00007f3e59118000 CR3: 00000004670a4001 CR4: 00000000003706e0
[   10.390509] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   10.390510] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   10.390510] Call Trace:
[   10.390511]  nl80211_connect+0x5bd/0x870
[   10.390513]  genl_rcv_msg+0x19f/0x300
[   10.390515]  ? genl_family_rcv_msg_attrs_parse.isra.0+0xd0/0xd0
[   10.390516]  netlink_rcv_skb+0x44/0x110
[   10.390518]  genl_rcv+0x1f/0x30
[   10.390519]  netlink_unicast+0x18c/0x230
[   10.390520]  netlink_sendmsg+0x219/0x430
[   10.390522]  __sys_sendto+0x17a/0x190
[   10.390523]  ? do_epoll_wait+0xab/0xd0
[   10.390524]  __x64_sys_sendto+0x20/0x30
[   10.390526]  do_syscall_64+0x33/0x40
[   10.390532]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   10.390532] RIP: 0033:0x7f3e58fb36dc
[   10.390533] Code: c0 ff ff ff ff eb bc 0f 1f 80 00 00 00 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 19 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 64 c3 0f 1f 00 55 48 83 ec 20 48 89 54 24 10
[   10.390534] RSP: 002b:00007ffe93006e38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[   10.390535] RAX: ffffffffffffffda RBX: 0000556a61fca870 RCX: 00007f3e58fb36dc
[   10.390536] RDX: 000000000000009c RSI: 0000556a61fe5f60 RDI: 0000000000000004
[   10.390536] RBP: 0000556a61fe5b40 R08: 0000000000000000 R09: 0000000000000000
[   10.390537] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe93006e90
[   10.390537] R13: 00007ffe93006e8c R14: 0000556a61fd4500 R15: 0000000000000000
[   10.390538] ---[ end trace a34f283c1281abd1 ]---

I have module support disabled, everything is built into kernel:

CONFIG_CMDLINE="iwlwifi.enable_ini=0"
CONFIG_FW_LOADER=y
CONFIG_EXTRA_FIRMWARE="intel-ucode/06-8e-0a i915/kbl_dmc_ver1_04.bin iwlwifi-8265-36.ucode regulatory.db regulatory.db.p7s"
CONFIG_EXTRA_FIRMWARE_DIR="/lib/firmware"
CONFIG_CFG80211_CRDA_SUPPORT=y
CONFIG_CFG80211=y
CONFIG_CFG80211_REQUIRE_SIGNED_REGDB=y
CONFIG_CFG80211_USE_KERNEL_REGDB_KEYS=y
CONFIG_CFG80211_DEFAULT_PS=y
CONFIG_CFG80211_CRDA_SUPPORT=y
CONFIG_CFG80211_WEXT=y
CONFIG_MAC80211=y
CONFIG_MAC80211_HAS_RC=y
CONFIG_MAC80211_RC_MINSTREL=y
CONFIG_MAC80211_RC_DEFAULT_MINSTREL=y
CONFIG_MAC80211_RC_DEFAULT="minstrel_ht"
CONFIG_MAC80211_LEDS=y
CONFIG_MAC80211_STA_HASH_MAX_SIZE=0
CONFIG_IWLWIFI=y
CONFIG_IWLWIFI_LEDS=y
CONFIG_IWLMVM=y

I also have a Lenovo IdeaPad 510-14ISK with exactly the same issue.
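To double-check that the wireless options listed above really are built in (`=y`) rather than built as modules (`=m`), a tiny scan of `.config`-style text is enough. The helper below, `config_is_builtin()`, is hypothetical and only illustrative: it looks for a line that reads exactly `SYMBOL=y`, so comment lines such as `# CONFIG_FOO is not set` and `=m` entries never match.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: scan .config-style text (one KEY=value per line)
 * and report whether `symbol` is set to "y", i.e. built into the kernel.
 */
bool config_is_builtin(const char *text, const char *symbol)
{
    assert(text != NULL && symbol != NULL);

    size_t n = strlen(symbol);
    const char *line = text;

    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        /* Match exactly "<symbol>=y" on this line. */
        if (len == n + 2 && strncmp(line, symbol, n) == 0 &&
            line[n] == '=' && line[n + 1] == 'y')
            return true;

        if (end == NULL)
            break;                      /* last line had no newline */
        line = end + 1;
    }
    return false;
}
```

For example, feeding it the contents of `/usr/src/linux/.config` and asking for `CONFIG_CFG80211` should return true for the configuration shown above.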
Comment 22 cruzki 2020-12-16 16:19:10 UTC
I have also had this bug since I updated to 5.10. The key point is that I randomly end up without Wi-Fi when the bug happens (but only on 5.10, not on 5.9). Has anyone experienced the same problem? Could it be a different problem, or a configuration issue?
Comment 23 m1027 2020-12-17 11:37:31 UTC
(In reply to cruzki from comment #22)
> I have this bug also since I update to 5.10. The key point is that I
> randomly end without wifi when the bug happen (but only on 5.10, not in
> 5.9). Have someone experience the same problem? Could be a different
> problem? Could be a configuration issue?

FYI: No such additional issues here with 5.10.1. Just the original ones.
Comment 24 Balint SZENTE 2020-12-18 17:58:00 UTC
Upgrading to gentoo-sources-5.10.1 made no difference to me. The kernel log and the iwd crash at startup is the same.
Comment 25 miklosh 2020-12-30 23:34:48 UTC
I can confirm that the same bug is happening to me.
I use =openrc-0.42.1, =iwd-1.9-r1, =sys-kernel/gentoo-sources-5.10.3.


[    6.277400] ------------[ cut here ]------------
[    6.277406] WARNING: CPU: 7 PID: 3572 at net/wireless/nl80211.c:7575 nl80211_get_reg_do+0x1f1/0x220
[    6.277407] Modules linked in: algif_aead bnep ecb md4 fuse pkcs8_key_parser iwlmvm btusb btrtl btbcm btintel crc32c_intel bluetooth iwlwifi ecdh_generic ecc snd_hda_codec_hdmi evdev efivarfs
[    6.277417] CPU: 7 PID: 3572 Comm: iwd Tainted: G                T 5.10.3-gentoo #3
[    6.277418] Hardware name: Gigabyte Technology Co., Ltd. B550 AORUS MASTER/B550 AORUS MASTER, BIOS F11p 12/18/2020
[    6.277420] RIP: 0010:nl80211_get_reg_do+0x1f1/0x220
[    6.277421] Code: 24 0c 01 00 00 00 e8 4e fc 9b ff 85 c0 0f 84 fc fe ff ff eb a6 48 89 ef 48 89 04 24 e8 c8 be ef ff 48 8b 04 24 e9 43 ff ff ff <0f> 0b 48 89 ef e8 b5 be ef ff b8 ea ff ff ff e9 2f ff ff ff b8 97
[    6.277422] RSP: 0018:ffff9d1d0126bb68 EFLAGS: 00010202
[    6.277424] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[    6.277425] RDX: ffff9b350ed98008 RSI: 0000000000000000 RDI: ffff9b350ed98300
[    6.277425] RBP: ffff9b3509870700 R08: 0000000000000004 R09: ffff9b35050ea014
[    6.277426] R10: 0000000000000016 R11: 0000000000000000 R12: ffff9d1d0126bbc8
[    6.277427] R13: 0000000000000000 R14: ffff9b35050ea014 R15: ffff9b350ed98300
[    6.277428] FS:  00007f186d5ac740(0000) GS:ffff9b3c1ebc0000(0000) knlGS:0000000000000000
[    6.277428] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.277429] CR2: 000055e0fcfb0000 CR3: 0000000109976000 CR4: 0000000000350ee0
[    6.277430] Call Trace:
[    6.277434]  ? _cond_resched+0x10/0x20
[    6.277437]  genl_family_rcv_msg_doit.isra.0+0xea/0x150
[    6.277438]  genl_rcv_msg+0xdb/0x1e0
[    6.277440]  ? __cfg80211_rdev_from_attrs+0x1b0/0x1b0
[    6.277442]  ? nl80211_send_regdom.constprop.0+0x1a0/0x1a0
[    6.277443]  ? genl_family_rcv_msg_doit.isra.0+0x150/0x150
[    6.277445]  netlink_rcv_skb+0x4b/0xf0
[    6.277446]  genl_rcv+0x1f/0x30
[    6.277448]  netlink_unicast+0x186/0x220
[    6.277450]  netlink_sendmsg+0x22f/0x460
[    6.277453]  __sys_sendto+0x18b/0x1a0
[    6.277455]  ? __sys_recvmsg+0x6a/0xb0
[    6.277456]  __x64_sys_sendto+0x20/0x30
[    6.277458]  do_syscall_64+0x33/0x40
[    6.277460]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[    6.277461] RIP: 0033:0x7f186d6ad3dc
[    6.277462] Code: c0 ff ff ff ff eb bc 0f 1f 80 00 00 00 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 21 45 31 c9 45 31 c0 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 6c c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48
[    6.277463] RSP: 002b:00007fffe1f8e0f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[    6.277464] RAX: ffffffffffffffda RBX: 000055e0fcf9a870 RCX: 00007f186d6ad3dc
[    6.277465] RDX: 000000000000001c RSI: 000055e0fcfaf190 RDI: 0000000000000004
[    6.277465] RBP: 000055e0fcfa4450 R08: 0000000000000000 R09: 0000000000000000
[    6.277466] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffe1f8e14c
[    6.277467] R13: 000055e0fcfa4500 R14: 000055e0fbfee37d R15: 0000000000000000
[    6.277468] ---[ end trace 3467a46600bad98d ]---
Comment 26 Ben Kohler gentoo-dev 2021-01-22 18:11:14 UTC
There is at least a bit of new info on the upstream kernel bug report:

"I've been trying to track down the cause of this warning for a few days now. I think it's a combination of changes to both the kernel and iwd. The warning seems to arise because when iwd is launched, it tries to get regulatory domain information from the kernel. If the adapter (phy0) has not been set up at this point, the warning is produced. In my case, at this point in the boot wlan0 exists, but the associated phy0 does not. For now, i "fix" that situation by inserting "ip link set wlan0 up" into the init script at the point immediately before iwd is launched. Bringing up the WLAN appears to set up the the missing phy0 and the warning is no longer produced. This explains why the warning is seen during boot, but does not appear if the network is manually restarted after login."
Comment 27 Balint SZENTE 2021-02-05 15:51:51 UTC
(In reply to Ben Kohler from comment #26)
> There is at least a bit of new info on the upstream kernel bug report:
> 
> [...] For now, i
> "fix" that situation by inserting "ip link set wlan0 up" into the init
> script at the point immediately before iwd is launched. [...]

In my case adding

start_pre() {
        ip link set wlan0 up
}

to /etc/init.d/iwd did not fix the issue; iwd-1.10 crashes the same way. I even tried adding the command to the initramfs init script, but to no avail. I'm on gentoo-sources-5.10.13 and not using predictable interface names.

Can anybody confirm the `ip link` workaround?
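For anyone who wants to try the workaround outside an init script, `ip link set wlan0 up` boils down to setting the `IFF_UP` flag on the interface. `ip` itself talks rtnetlink; the sketch below instead uses the classic `SIOCGIFFLAGS`/`SIOCSIFFLAGS` ioctls, which set the same flag. `iface_set_up()` is a hypothetical name, setting the flag requires CAP_NET_ADMIN (typically root), and whether bringing the link up early actually avoids the warning is exactly what is still unconfirmed here.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: set IFF_UP on `ifname`, roughly what
 * `ip link set <ifname> up` achieves. Returns 0 on success, -1 on failure.
 */
int iface_set_up(const char *ifname)
{
    assert(ifname != NULL);

    if (strlen(ifname) >= IFNAMSIZ)
        return -1;                      /* name too long for ifr_name */

    /* Any socket can serve as a handle for interface ioctls. */
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct ifreq ifr;
    int rc = -1;

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);

    /* Read the current flags, add IFF_UP, write them back. */
    if (ioctl(fd, SIOCGIFFLAGS, &ifr) == 0) {
        ifr.ifr_flags |= IFF_UP;
        if (ioctl(fd, SIOCSIFFLAGS, &ifr) == 0)
            rc = 0;
    }

    close(fd);
    return rc;
}
```

Calling `iface_set_up("wlan0")` right before launching iwd would reproduce the `start_pre()` experiment above in a form that can be timed and logged independently.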
Comment 28 cruzki 2021-02-06 18:43:26 UTC
In my case, adding this line removed the crash, but now it exposes a different one (I think). When I try to connect to an eduroam network, iwd crashes with the following trace.

[    5.902043] udevd[1778]: Error changing net interface name wlan0 to wlp1s0: Device or resource busy
[    5.902057] udevd[1778]: could not rename interface '4' from 'wlan0' to 'wlp1s0': Device or resource busy
[    7.781103] elogind-daemon[2960]: New seat seat0.
[    7.783985] elogind-daemon[2960]: Watching system buttons on /dev/input/event2 (Power Button)
[    7.810758] elogind-daemon[2960]: Watching system buttons on /dev/input/event0 (Lid Switch)
[    7.810923] elogind-daemon[2960]: Watching system buttons on /dev/input/event1 (Sleep Button)
[    8.193284] wlan0: authenticate with ac:a3:1e:c7:02:b0
[    8.196703] wlan0: send auth to ac:a3:1e:c7:02:b0 (try 1/3)
[    8.197725] wlan0: authenticated
[    8.200340] wlan0: associate with ac:a3:1e:c7:02:b0 (try 1/3)
[    8.202211] wlan0: RX AssocResp from ac:a3:1e:c7:02:b0 (capab=0x11 status=0 aid=1)
[    8.203948] wlan0: associated
[    8.240846] BUG: kernel NULL pointer dereference, address: 0000000000000000
[    8.240855] #PF: supervisor read access in kernel mode
[    8.240861] #PF: error_code(0x0000) - not-present page
[    8.240865] PGD 0 P4D 0 
[    8.240878] Oops: 0000 [#1] SMP
[    8.240887] CPU: 3 PID: 2613 Comm: iwd Not tainted 5.10.6-gentoo #1
[    8.240893] Hardware name: Acer TravelMate P648-M/BAD40_SL          , BIOS V1.14 09/07/2016
[    8.240898] RIP: 0010:0xffffffffb5343251
[    8.240906] Code: 04 49 8b 44 24 10 44 89 c2 0f 85 ac 01 00 00 ff 50 d0 85 c0 0f 85 68 01 00 00 48 8b 75 30 48 c7 c7 a0 e3 19 b6 b9 04 00 00 00 <f3> a6 0f 97 c0 1c 00 84 c0 75 0b 8b 45 50 85 c0 0f 85 a4 01 00 00
[    8.240913] RSP: 0018:ffffb0d08197fd58 EFLAGS: 00010246
[    8.240921] RAX: 0000000000000000 RBX: ffffa043d023bfc0 RCX: 0000000000000004
[    8.240926] RDX: 3c412f9496d1d76d RSI: 0000000000000000 RDI: ffffffffb619e3a0
[    8.240931] RBP: ffffb0d08197fe80 R08: 0000000000000007 R09: fffffffffffffff8
[    8.240935] R10: 0000000000000000 R11: ffffffffb5e3b808 R12: ffffa043d023b340
[    8.240941] R13: ffffa043d0cb1900 R14: ffffa043f273f800 R15: 000000000000010e
[    8.240947] FS:  00007fb17f454740(0000) GS:ffffa04674580000(0000) knlGS:0000000000000000
[    8.240954] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    8.240959] CR2: 0000000000000000 CR3: 0000000108f6a003 CR4: 00000000001706e0
[    8.240963] Call Trace:
[    8.240970]  ? 0xffffffffb532a7ce
[    8.240974]  ? 0xffffffffb5342e6b
[    8.240979]  ? 0xffffffffb5322f23
[    8.240983]  ? 0xffffffffb5341fa3
[    8.240987]  ? 0xffffffffb53232d0
[    8.240992]  ? 0xffffffffb592497d
[    8.240995]  ? 0xffffffffb5a00068
[    8.241001] CR2: 0000000000000000
[    8.241007] ---[ end trace 7e41832a4fa2826f ]---
[    8.241012] RIP: 0010:0xffffffffb5343251
[    8.241018] Code: 04 49 8b 44 24 10 44 89 c2 0f 85 ac 01 00 00 ff 50 d0 85 c0 0f 85 68 01 00 00 48 8b 75 30 48 c7 c7 a0 e3 19 b6 b9 04 00 00 00 <f3> a6 0f 97 c0 1c 00 84 c0 75 0b 8b 45 50 85 c0 0f 85 a4 01 00 00
[    8.241023] RSP: 0018:ffffb0d08197fd58 EFLAGS: 00010246
[    8.241029] RAX: 0000000000000000 RBX: ffffa043d023bfc0 RCX: 0000000000000004
[    8.241033] RDX: 3c412f9496d1d76d RSI: 0000000000000000 RDI: ffffffffb619e3a0
[    8.241037] RBP: ffffb0d08197fe80 R08: 0000000000000007 R09: fffffffffffffff8
[    8.241041] R10: 0000000000000000 R11: ffffffffb5e3b808 R12: ffffa043d023b340
[    8.241045] R13: ffffa043d0cb1900 R14: ffffa043f273f800 R15: 000000000000010e
[    8.241050] FS:  00007fb17f454740(0000) GS:ffffa04674580000(0000) knlGS:0000000000000000
[    8.241054] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    8.241058] CR2: 0000000000000000 CR3: 0000000108f6a003 CR4: 00000000001706e0
[    8.253902] elogind-daemon[2960]: Watching system buttons on /dev/input/event4 (AT Translated Set 2 keyboard)
[    8.257316] elogind-daemon[2960]: New session c1 of user sddm.
[    8.313702] wlan0: deauthenticating from ac:a3:1e:c7:02:b0 by local choice (Reason: 3=DEAUTH_LEAVING)
[   15.435017] elogind-daemon[2960]: New session c2 of user cruzki.
[   15.492132] elogind-daemon[2960]: Removed session c1.
[   37.850449] usb 1-1: new low-speed USB device number 5 using xhci_hcd
[   37.997593] input: USB Optical Mouse as /devices/pci0000:00/0000:00:14.0/usb1/1-1/1-1:1.0/0003:04B3:310C.0001/input/input18
[   37.997986] hid-generic 0003:04B3:310C.0001: input: USB HID v1.11 Mouse [USB Optical Mouse] on usb-0000:00:14.0-1/input0
[  102.181478] udevd[4797]: Error changing net interface name wlan0 to wlp1s0: Device or resource busy
[  102.181536] udevd[4797]: could not rename interface '5' from 'wlan0' to 'wlp1s0': Device or resource busy
[  104.442234] wlan0: authenticate with ac:a3:1e:c7:02:b0
[  104.445542] wlan0: send auth to ac:a3:1e:c7:02:b0 (try 1/3)
[  104.446502] wlan0: authenticated
[  104.446971] wlan0: associate with ac:a3:1e:c7:02:b0 (try 1/3)
[  104.448027] wlan0: RX AssocResp from ac:a3:1e:c7:02:b0 (capab=0x11 status=0 aid=1)
[  104.449567] wlan0: associated
[  104.482356] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  104.482359] #PF: supervisor read access in kernel mode
[  104.482361] #PF: error_code(0x0000) - not-present page
[  104.482362] PGD 0 P4D 0 
[  104.482366] Oops: 0000 [#2] SMP
[  104.482369] CPU: 1 PID: 4792 Comm: iwd Tainted: G      D           5.10.6-gentoo #1
[  104.482371] Hardware name: Acer TravelMate P648-M/BAD40_SL          , BIOS V1.14 09/07/2016
[  104.482373] RIP: 0010:0xffffffffb5343251
[  104.482375] Code: 04 49 8b 44 24 10 44 89 c2 0f 85 ac 01 00 00 ff 50 d0 85 c0 0f 85 68 01 00 00 48 8b 75 30 48 c7 c7 a0 e3 19 b6 b9 04 00 00 00 <f3> a6 0f 97 c0 1c 00 84 c0 75 0b 8b 45 50 85 c0 0f 85 a4 01 00 00
[  104.482377] RSP: 0018:ffffb0d086a7fd58 EFLAGS: 00010246
[  104.482379] RAX: 0000000000000000 RBX: ffffa0448749b840 RCX: 0000000000000004
[  104.482380] RDX: 3c412f9496d1d76d RSI: 0000000000000000 RDI: ffffffffb619e3a0
[  104.482382] RBP: ffffb0d086a7fe80 R08: 0000000000000007 R09: fffffffffffffff8
[  104.482383] R10: 0000000000000000 R11: ffffffffb5e3b808 R12: ffffa0448749bfc0
[  104.482384] R13: ffffa0443c820500 R14: ffffa043e96fec00 R15: 000000000000010e
[  104.482386] FS:  00007fd4b68ab740(0000) GS:ffffa04674480000(0000) knlGS:0000000000000000
[  104.482388] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  104.482389] CR2: 0000000000000000 CR3: 0000000109fd3006 CR4: 00000000001706e0
[  104.482390] Call Trace:
[  104.482393]  ? 0xffffffffb532a7ce
[  104.482394]  ? 0xffffffffb5342e6b
[  104.482395]  ? 0xffffffffb5322f23
[  104.482396]  ? 0xffffffffb5341fa3
[  104.482397]  ? 0xffffffffb53232d0
[  104.482398]  ? 0xffffffffb592497d
[  104.482399]  ? 0xffffffffb5a00068
[  104.482401] CR2: 0000000000000000
[  104.482403] ---[ end trace 7e41832a4fa28270 ]---
[  104.482404] RIP: 0010:0xffffffffb5343251
[  104.482407] Code: 04 49 8b 44 24 10 44 89 c2 0f 85 ac 01 00 00 ff 50 d0 85 c0 0f 85 68 01 00 00 48 8b 75 30 48 c7 c7 a0 e3 19 b6 b9 04 00 00 00 <f3> a6 0f 97 c0 1c 00 84 c0 75 0b 8b 45 50 85 c0 0f 85 a4 01 00 00
[  104.482408] RSP: 0018:ffffb0d08197fd58 EFLAGS: 00010246
[  104.482410] RAX: 0000000000000000 RBX: ffffa043d023bfc0 RCX: 0000000000000004
[  104.482411] RDX: 3c412f9496d1d76d RSI: 0000000000000000 RDI: ffffffffb619e3a0
[  104.482413] RBP: ffffb0d08197fe80 R08: 0000000000000007 R09: fffffffffffffff8
[  104.482414] R10: 0000000000000000 R11: ffffffffb5e3b808 R12: ffffa043d023b340
[  104.482415] R13: ffffa043d0cb1900 R14: ffffa043f273f800 R15: 000000000000010e
[  104.482417] FS:  00007fd4b68ab740(0000) GS:ffffa04674480000(0000) knlGS:0000000000000000
[  104.482418] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  104.482420] CR2: 0000000000000000 CR3: 0000000109fd3006 CR4: 00000000001706e0
[  104.543691] wlan0: deauthenticating from ac:a3:1e:c7:02:b0 by local choice (Reason: 3=DEAUTH_LEAVING)


After this crash, I cannot reload it. I could connect using a different kernel that differs only in the firmware versions (I think; I can provide both config files). Could it be that it makes the firmware crash? In that case, I assume this is a different bug, no?
Comment 29 Balint SZENTE 2021-02-10 11:24:53 UTC
(In reply to cruzki from comment #28)
> In my case, adding this line have remove the crash but now it expose a
> different one [I think]. When I try to connect to an eduroam network, iwd
> crash with the following trace.
> 
> [    5.902043] udevd[1778]: Error changing net interface name wlan0 to
> wlp1s0: Device or resource busy
> [    5.902057] udevd[1778]: could not rename interface '4' from 'wlan0' to
> 'wlp1s0': Device or resource busy
> [...]
>
> After this crash, I cannot reload it. I could connect with a different
> kernel that only differs in the firmware versions (I think, I could provide
> both config files) Could be that it makes the firmware crash? In that case,
> I assume this a different bug, no?

Try disabling predictable network interface names so that udev does not try to rename the interface (which is probably being used by the `ip` command at that moment).
Comment 30 cruzki 2021-02-12 09:37:53 UTC
> Try to disable the predictable network interface names so that udev will not
> try to rename the interface (that is used probably by `ip` command in that
> moment).

Adding net.ifnames=0 to the kernel command line solves the problem for me. Should I report this upstream?
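For reference: `net.ifnames=0` is a kernel command-line parameter (with GRUB it typically goes into `GRUB_CMDLINE_LINUX` in `/etc/default/grub`, followed by regenerating `grub.cfg`). A minimal sketch for confirming after a reboot that it actually took effect; `cmdline_has` is a hypothetical helper, and its optional file argument exists only so it can be tried against a sample string instead of `/proc/cmdline`.

```shell
# Hypothetical helper: succeed if PARAM appears as a whole token on the
# kernel command line (default source: /proc/cmdline).
cmdline_has() {
    local param="$1"
    local file="${2:-/proc/cmdline}"
    # One token per line, then require an exact, fixed-string line match,
    # so "net.ifnames" alone or "net.ifnames=1" do not count as a hit.
    tr -s ' \t' '\n' < "$file" | grep -qxF -- "$param"
}
```

Usage: `cmdline_has net.ifnames=0 && echo "predictable interface names disabled"`.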
Comment 31 Balint SZENTE 2021-02-23 12:26:52 UTC
(In reply to cruzki from comment #30)
> > Try to disable the predictable network interface names so that udev will not
> > try to rename the interface (that is used probably by `ip` command in that
> > moment).
> 
> Adding net.ifnames=0 solve the problem for me. Should I report this upstream?

I don't think so. This is just a workaround so that udev does not try to rename the device while it is kept busy by the `ip` command.

Anyway, for me the `ip link set wlan0 up` workaround is completely unreliable. *Sometimes* the system boots without the crash, but *more often* it boots with it. It must be some kind of race or ordering issue.
Comment 32 Ben Kohler gentoo-dev 2021-02-23 13:49:03 UTC
Well, the comment on the kernel bug report seems to describe the problem pretty well, but nobody is really working on fixing it there.

The workaround mentioned there does work for me: bringing up wlan0 just before iwd is launched. It's a bit of a kludge, though.

For systemd users, the change would look like this:

--- iwd.service
+++ iwd.service
@@ -23,6 +23,7 @@
 ConfigurationDirectory=iwd
 StateDirectory=iwd
 StateDirectoryMode=0700
+ExecStartPre=ip link set wlan0 up
 
 [Install]
 WantedBy=multi-user.target
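A packaged unit file that is edited in place is likely to be overwritten on the next upgrade. A minimal sketch, assuming a systemd drop-in is acceptable, that puts the same `ExecStartPre=` line under `/etc/systemd/system/iwd.service.d/` instead; the function name `write_iwd_dropin` and the file name `link-up.conf` are hypothetical, and the unit-directory argument exists only for testing.

```shell
# Hypothetical helper: write a drop-in that brings the interface up before iwd.
write_iwd_dropin() {
    local iface="${1:-wlan0}"
    local unit_dir="${2:-/etc/systemd/system}"
    local dropin_dir="$unit_dir/iwd.service.d"

    mkdir -p "$dropin_dir" || return 1
    # ExecStartPre= lines from a drop-in are added to the unit's own list.
    cat > "$dropin_dir/link-up.conf" <<EOF
[Service]
ExecStartPre=ip link set $iface up
EOF
}
```

Run `write_iwd_dropin wlan0` as root, then `systemctl daemon-reload` and restart iwd; `systemctl edit iwd` achieves the same interactively. Like the diff above, this relies on systemd resolving the bare `ip`; older systemd versions need the absolute path to `ip`.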



OpenRC people could probably put something like this into /etc/conf.d/iwd:

start_pre() {
  ip link set wlan0 up
}


I won't apply this to the official package since it's such a kludge, but give it a try locally if you want to silence the error until this is fixed in the kernel.
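Since comment 31 suspects a race or ordering problem, here is a slightly more defensive variant of the same `/etc/conf.d/iwd` snippet: it waits a few seconds for the interface to appear in sysfs before bringing it up, and never blocks iwd from starting. This is only a sketch and is not claimed to fix the race; `wait_for_iface`, `IWD_IFACE` and `SYSFS_NET` are hypothetical names, and it assumes, as above, that the iwd init script does not define its own `start_pre()`.

```shell
# /etc/conf.d/iwd (sketch). OpenRC sources this file, so a start_pre()
# defined here runs before iwd starts, unless the init script defines its own.
IWD_IFACE="${IWD_IFACE:-wlan0}"          # hypothetical variable name
SYSFS_NET="${SYSFS_NET:-/sys/class/net}" # overridable only for testing

# Hypothetical helper: poll sysfs once per second until the interface
# exists; give up (return 1) after the given number of seconds.
wait_for_iface() {
    local iface="$1"
    local remaining="${2:-5}"
    while [ ! -e "$SYSFS_NET/$iface" ]; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

start_pre() {
    if wait_for_iface "$IWD_IFACE" 5; then
        ip link set "$IWD_IFACE" up
    else
        ewarn "$IWD_IFACE did not appear; starting iwd without bringing it up"
    fi
    # Always succeed so that a failure here never prevents iwd from starting.
    return 0
}
```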
Comment 33 Balint SZENTE 2021-02-23 14:17:27 UTC
(In reply to Ben Kohler from comment #32)
> Well the comment on the kernel bug report seems to detail the problem pretty
> well, but nobody is really working on fixing it there.
> 
> [...]
> 
> OpenRC people could probably put something like this into /etc/conf.d/iwd:
> 
> start_pre() {
>   ip link set wlan0 up
> }
> 
> 
> I won't apply this to the official package since it's such a kludge, but
> give it a try locally if you want to silence the error until this is fixed
> in the kernel.

This is exactly what I wrote in comment 27, but in my case, as I said in my previous comment, it is completely random: it seldom boots up OK and often does not. I could not find any deterministic pattern, so the fix depends heavily on "luck" and perhaps also on the kernel build (at least on my Dell Latitude 7490 with an i7-8650U and Intel WiFi 8265).
Comment 34 Ben Kohler gentoo-dev 2021-02-23 14:31:29 UTC
Sorry, I missed that. I have only confirmed the workaround on systemd.

And in my case, even without any attempted fix, it succeeds on something like 1 or 2 in 10 boots.
Comment 35 Balint SZENTE 2021-02-23 15:28:50 UTC
(In reply to Ben Kohler from comment #34)
> Sorry I missed that.  I have only confirmed the workaround on systemd.
> 
> And in my case, even without any attempted fix, it succeeds on something
> like 1 or 2 in 10 boots.

Obviously, the fix has nothing to do with systemd or OpenRC, but with the hardware/kernel build itself. In your case it would certainly work with OpenRC as well.

Just for the record, may I ask what hardware you have? I am trying to narrow down whether the issue is related to CPU speed (e.g. whether the regulatory database gets initialized in time for iwd) and to the WiFi adapter type.

Why does the fix always work in your case, but only sometimes (1 or 2 in 10 boots) in mine? Without the fix, it *never* works for me.
Comment 36 Ben Kohler gentoo-dev 2021-02-23 15:31:18 UTC
Right now I'm primarily testing on a Thinkpad T550 with Intel Corporation Wireless 7265 (rev 99).

And yeah, the fix isn't init-specific, but the *timing* may be just different enough for it to matter. What if you put "ip link set wlan0 up; sleep 3" in your start_pre? I know a 3-second sleep is ugly, but just as a test.
Comment 37 Balint SZENTE 2021-02-23 16:39:38 UTC
(In reply to Ben Kohler from comment #36)
> Right now I'm primarily testing on a Thinkpad T550 with Intel Corporation
> Wireless 7265 (rev 99).
> 
> And yeah the fix isn't init-specific, but the *timing* may be just different
> enough for it to matter-- what if you put "ip link set wlan0 up; sleep 3" in
> your start_pre?  I know a 3 second sleep is ugly, but just as a test.

I already tried even 10 s, but unfortunately it made no difference. When I have a little free time, I will try to diff the kernel logs of a successful and an unsuccessful start. Maybe that will give some hint about the startup/initialization order.
Comment 38 Balint SZENTE 2021-02-23 20:03:45 UTC
Finally, I found the "working recipe" for my system: the delay had to be added between iwd and NetworkManager. Apparently iwd needs a little time to set up, and NetworkManager should not be started right after it.

As a conclusion:

1. In `/etc/init.d/iwd`, add the following (see Comment 32 for systemd):

start_pre() {
        ip link set wlan0 up
}

This is required to make the iwd crash (the call-trace warning in dmesg) disappear.

Apparently for some people this is enough.

2. I also added `sleep 2` to `/etc/init.d/NetworkManager`:

start_pre() {
        sleep 2
        checkpath -q -d -m 0755 /run/NetworkManager
}

In my case, delaying the start of NetworkManager was required; otherwise it always failed to connect at startup.

Waiting 1 second was not enough; the crash still appeared occasionally. 2 seconds seems to be a good value.

So there are two issues:

1. interaction between iwd and the kernel (the fix with ip link)
2. interaction between iwd and NetworkManager (the fix with sleep before starting NetworkManager)
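A fixed `sleep 2` works here but is a guess at timing (1 second was not enough). One possible alternative, sketched under untested assumptions: poll a readiness check a bounded number of times instead of sleeping unconditionally. `retry_until` and `iwd_on_bus` are hypothetical helpers, and treating "iwd's D-Bus name `net.connman.iwd` is owned" as "iwd is ready for NetworkManager" is an assumption that nobody in this bug has tested.

```shell
# Hypothetical helper: run a command up to N times, one second apart,
# returning 0 as soon as it succeeds and 1 if every attempt fails.
retry_until() {
    local tries="$1"
    shift
    while [ "$tries" -gt 0 ]; do
        "$@" && return 0
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] && sleep 1
    done
    return 1
}

# Hypothetical readiness check (assumption): iwd has claimed its D-Bus name.
iwd_on_bus() {
    dbus-send --system --print-reply --dest=org.freedesktop.DBus \
        /org/freedesktop/DBus org.freedesktop.DBus.NameHasOwner \
        string:net.connman.iwd 2>/dev/null | grep -q 'boolean true'
}

# Possible replacement for the start_pre() in /etc/init.d/NetworkManager above:
# up to 10 one-second attempts instead of always sleeping 2 seconds.
start_pre() {
    retry_until 10 iwd_on_bus || ewarn "iwd not on D-Bus yet, starting anyway"
    checkpath -q -d -m 0755 /run/NetworkManager
}
```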
Comment 39 Ben Kohler gentoo-dev 2021-06-09 14:23:24 UTC
Created attachment 714909
kernel patch

Upstream bug report links to a patch on the linux-wireless mailing list now:

https://lore.kernel.org/linux-wireless/iwlwifi.20210409123755.ba2ea961f4ae.I8fde32d3196e860efa3b4ec464c42194195b42ec@changeid/

I'm attaching it here. This patch applies to the kernel sources (e.g. gentoo-sources or vanilla-sources), not to iwd.
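One way to have Portage apply such a patch whenever the kernel sources are (re)installed is the user-patch directory: EAPI 6 and later ebuilds must call `eapply_user`, which picks up `*.patch`/`*.diff` files from `/etc/portage/patches/<category>/<package>/`. A minimal sketch; the function name is hypothetical, the root argument exists only for testing, and the kernel still has to be rebuilt afterwards.

```shell
# Hypothetical helper: copy a kernel patch into Portage's user-patch directory
# so eapply_user applies it the next time the sources package is emerged.
stage_kernel_patch() {
    local patch_file="$1"
    local pkg="${2:-sys-kernel/gentoo-sources}"
    local root="${3:-/}"
    local dest="${root%/}/etc/portage/patches/$pkg"

    # eapply_user only applies files ending in .patch or .diff.
    case "$patch_file" in
        *.patch|*.diff) ;;
        *) echo "not a .patch/.diff file: $patch_file" >&2; return 1 ;;
    esac
    [ -f "$patch_file" ] || return 1
    mkdir -p "$dest" && cp -- "$patch_file" "$dest/"
}
```

After staging, re-emerge the sources (e.g. `emerge --oneshot sys-kernel/gentoo-sources`, or pass `sys-kernel/vanilla-sources` as the second argument) and rebuild the kernel as usual. Remove the staged file once you move to a kernel that already contains the fix, since it would then fail to apply.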
Comment 40 Balint SZENTE 2021-06-14 15:00:18 UTC
(In reply to Ben Kohler from comment #39)
> Created attachment 714909
> kernel patch
> 
> Upstream bug report links to a patch on the linux-wireless mailing list now:
> 
> https://lore.kernel.org/linux-wireless/iwlwifi.20210409123755.ba2ea961f4ae.I8fde32d3196e860efa3b4ec464c42194195b42ec@changeid/
> 
> I'm attaching it here.  This patch applies to (eg) gentoo-sources or
> vanilla-sources, not iwd.

Thanks, Ben, for the link. I confirm the patch works on my machine. I have reverted the changes I made to the iwd and NetworkManager init scripts.
Comment 41 m1027 2021-11-02 21:04:00 UTC
With linux-5.15 I don't get this kernel warning anymore.
Comment 42 Ben Kohler gentoo-dev 2021-11-03 11:53:09 UTC
Same here. Maybe we can close out this bug in a few months, once 5.15.x goes stable.
Comment 43 m1027 2021-11-06 14:31:43 UTC
Yes, let's close this issue as linux-5.15 fixes it. Thanks to everybody!