Just a copy of the bug I have created upstream:

----

Hi,

I have a regression with my audio:

00:05.0 Audio device: NVIDIA Corporation MCP61 High Definition Audio (rev a2)
        Subsystem: ASUSTeK Computer Inc. MCP61 High Definition Audio
        Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 22, NUMA node 0
        Memory at dbff8000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: <access denied>
        Kernel driver in use: snd_hda_intel

I use the LTS kernel from Gentoo. Since the migration to 5.10 (two weeks ago), the sound crackles from the boot sequence until shutdown. When I try to play a sound, I can hear both the crackling and my sound.

I tried many kernels:

=> Working
linux-5.4.97-gentoo
linux-5.4.109-gentoo
linux-5.4.113

=> Compile but fail to launch (kernel panic, I do not understand… If you have any idea, it can help me to identify which version introduced this bug)
linux-5.5.18
linux-5.5.19-gentoo
linux-5.6.0

=> Not working
linux-5.6.18-gentoo
linux-5.7.19-gentoo
linux-5.8.18-gentoo
linux-5.9.16-gentoo
linux-5.10.0-gentoo
linux-5.10.23-gentoo
linux-5.10.27-gentoo
linux-5.11.14
linux-5.12-rc7

I checked my logs but do not see anything except the input device numbers. They are different… but is it an issue?
5/6/7 => OK
0/1/2 => KO

Apr 17 19:21:41 fixe kernel: input: HDA NVidia Rear Mic as /devices/pci0000:00/0000:00:05.0/sound/card0/input0
Apr 17 19:21:41 fixe kernel: input: HDA NVidia Line as /devices/pci0000:00/0000:00:05.0/sound/card0/input1
Apr 17 19:21:41 fixe kernel: input: HDA NVidia Line Out as /devices/pci0000:00/0000:00:05.0/sound/card0/input2
=> 5.12-rc7

Apr 17 19:24:40 fixe kernel: input: HDA NVidia Rear Mic as /devices/pci0000:00/0000:00:05.0/sound/card0/input5
Apr 17 19:24:40 fixe kernel: input: HDA NVidia Line as /devices/pci0000:00/0000:00:05.0/sound/card0/input6
Apr 17 19:24:40 fixe kernel: input: HDA NVidia Line Out as /devices/pci0000:00/0000:00:05.0/sound/card0/input7
=> 5.4.113

Reproducible: Always

Steps to Reproduce:
1. Boot with 5.10.27 kernel

Actual Results:
Sound crackling

Expected Results:
No sound
This seems to be a driver problem. You will have to bisect the kernel, because probably only people with access to the hardware can find the problem; see https://wiki.gentoo.org/wiki/Kernel_git-bisect for details.

> => Compile but fail to launch (kernel panic, I do not understand…

Please go into detail. You shouldn't see a kernel panic, so we (you) should understand why this is happening.
Created attachment 700908
Screenshot

Hi Thomas,

Thanks for your answer. I know git bisect, but for now the gap between 5.4.113 and 5.6.18 is a bit large. Let's try to reduce it first.

The attached screenshot shows my kernel panic. I have found this link: https://lkml.org/lkml/2020/3/14/186

It seems that another Gentoo developer had this issue. I will look at his patch.
Git bisect will get you to the right commit in about 10 attempts; the number of commits between the two endpoints matters little, since each attempt halves the remaining range. Of course, if you suspect a particular commit, then it's well worth trying that first, but if you're going to be throwing darts at the wall, then you may as well go for git bisect.
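To illustrate why the size of the range matters so little, here is a minimal sketch of the halving search that bisection performs. It assumes a linear history, and the commit numbers and the position of the first bad commit are hypothetical, not taken from the kernel tree; `find_first_bad` is an illustrative helper, not part of git.

```python
from typing import Callable, Sequence, Tuple


def find_first_bad(commits: Sequence[int],
                   is_bad: Callable[[int], bool]) -> Tuple[int, int]:
    """Return (first bad commit, number of builds tested).

    Assumes commits[0] is known good, commits[-1] is known bad,
    and history is linear (a simplification of what git bisect handles).
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1  # one kernel build and boot per iteration
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], tests


# Hypothetical range of 1000 commits between a good and a bad kernel.
commits = list(range(1001))
FIRST_BAD = 637  # hypothetical position of the regression
culprit, tests = find_first_bad(commits, lambda c: c >= FIRST_BAD)
print(f"first bad commit: {culprit}, builds tested: {tests}")
```

Doubling the range adds only one more build, so a range of a thousand commits needs about 10 builds and a range of sixteen thousand needs about 14.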
It's also worth pointing out that you should only compare the vanilla-sources or perhaps gentoo-sources x.y.0 versions since higher patch numbers than 0 indicate that commits from newer kernels have almost certainly been backported, potentially introducing this bug that originally might not have shipped with that kernel series.
The following patch solved the kernel panic issue:

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 9b294c13809a..da9f4ea9bf4c 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -11,6 +11,12 @@ extra-y += vmlinux.lds
 CPPFLAGS_vmlinux.lds += -U$(UTS_MACHINE)
+# smpboot's init_secondary initializes stack canary.
+# Make sure we don't emit stack checks before it's
+# initialized.
+nostackp := $(call cc-option, -fno-stack-protector)
+CFLAGS_smpboot.o := $(nostackp)
+

The regression is between 5.5.19 (gentoo) and 5.6.0 (vanilla).

Niklāvs, I will check 5.5.19 vanilla tomorrow to be sure. Then I will have a look at bisecting.
Created attachment 702111
Commit

Hi,

After bisecting between 5.5 and 5.6, I ended up at commit 88452da92ba2b264a3922218c2cec13aac51c502.

It does not mean anything to me, probably because I do not know C. I took a look at this commit and compared it with 5.12 (which does not work either). Most of the commit has disappeared.

diff --git a/include/sound/hdaudio.h b/include/sound/hdaudio.h
index e05b95e83d5a..81373a2efd96 100644
--- a/include/sound/hdaudio.h
+++ b/include/sound/hdaudio.h
@@ -317,6 +317,7 @@ struct hdac_bus {
 	struct hdac_rb corb;
 	struct hdac_rb rirb;
 	unsigned int last_cmd[HDA_MAX_CODECS]; /* last sent command */
+	wait_queue_head_t rirb_wq;
 
 	/* CORB/RIRB and position buffers */
 	struct snd_dma_buffer rb;
diff --git a/sound/hda/hdac_bus.c b/sound/hda/hdac_bus.c
index 8f19876244eb..48b227fff204 100644
--- a/sound/hda/hdac_bus.c
+++ b/sound/hda/hdac_bus.c
@@ -43,6 +43,7 @@ int snd_hdac_bus_init(struct hdac_bus *bus, struct device *dev,
 	mutex_init(&bus->cmd_mutex);
 	mutex_init(&bus->lock);
 	INIT_LIST_HEAD(&bus->hlink_list);
+	init_waitqueue_head(&bus->rirb_wq);
 	bus->irq = -1;
 	return 0;
 }
diff --git a/sound/hda/hdac_controller.c b/sound/hda/hdac_controller.c
index 7e7be8e4dcf9..cd1c3b282657 100644
--- a/sound/hda/hdac_controller.c
+++ b/sound/hda/hdac_controller.c
@@ -216,6 +216,9 @@ void snd_hdac_bus_update_rirb(struct hdac_bus *bus)
 		else if (bus->rirb.cmds[addr]) {
 			bus->rirb.res[addr] = res;
 			bus->rirb.cmds[addr]--;
+			if (!bus->rirb.cmds[addr] &&
+			    waitqueue_active(&bus->rirb_wq))
+				wake_up(&bus->rirb_wq);
 		} else {
 			dev_err_ratelimited(bus->dev,
 				"spurious response %#x:%#x, last cmd=%#08x\n",
diff --git a/sound/pci/hda/hda_controller.c b/sound/pci/hda/hda_controller.c
index 2f3b7a35f2d9..f30a053d981e 100644
--- a/sound/pci/hda/hda_controller.c
+++ b/sound/pci/hda/hda_controller.c
@@ -792,21 +792,25 @@ static int azx_rirb_get_response(struct hdac_bus *bus, unsigned int addr,
 	struct hda_bus *hbus = &chip->bus;
 	unsigned long timeout;
 	unsigned long loopcounter;
-	int do_poll = 0;
+	wait_queue_entry_t wait;
 	bool warned = false;
 
+	init_wait_entry(&wait, 0);
  again:
 	timeout = jiffies + msecs_to_jiffies(1000);
 
 	for (loopcounter = 0;; loopcounter++) {
 		spin_lock_irq(&bus->reg_lock);
-		if (bus->polling_mode || do_poll)
+		if (!bus->polling_mode)
+			prepare_to_wait(&bus->rirb_wq, &wait,
+					TASK_UNINTERRUPTIBLE);
+		if (bus->polling_mode)
 			snd_hdac_bus_update_rirb(bus);
 		if (!bus->rirb.cmds[addr]) {
-			if (!do_poll)
-				bus->poll_count = 0;
 			if (res)
 				*res = bus->rirb.res[addr]; /* the last value */
+			if (!bus->polling_mode)
+				finish_wait(&bus->rirb_wq, &wait);
 			spin_unlock_irq(&bus->reg_lock);
 			return 0;
 		}
@@ -814,7 +818,9 @@ static int azx_rirb_get_response(struct hdac_bus *bus, unsigned int addr,
 		if (time_after(jiffies, timeout))
 			break;
 #define LOOP_COUNT_MAX 3000
-		if (hbus->needs_damn_long_delay ||
+		if (!bus->polling_mode) {
+			schedule_timeout(msecs_to_jiffies(2));
+		} else if (hbus->needs_damn_long_delay ||
 		    loopcounter > LOOP_COUNT_MAX) {
 			if (loopcounter > LOOP_COUNT_MAX && !warned) {
 				dev_dbg_ratelimited(chip->card->dev,
@@ -829,19 +835,12 @@ static int azx_rirb_get_response(struct hdac_bus *bus, unsigned int addr,
 		}
 	}
 
+	if (!bus->polling_mode)
+		finish_wait(&bus->rirb_wq, &wait);
+
 	if (hbus->no_response_fallback)
 		return -EIO;
 
-	if (!bus->polling_mode && bus->poll_count < 2) {
-		dev_dbg(chip->card->dev,
-			"azx_get_response timeout, polling the codec once: last cmd=0x%08x\n",
-			bus->last_cmd[addr]);
-		do_poll = 1;
-		bus->poll_count++;
-		goto again;
-	}
-
-
 	if (!bus->polling_mode) {
 		dev_warn(chip->card->dev,
 			 "azx_get_response timeout, switching to polling mode: last cmd=0x%08x\n",

Any idea?
So this is the bad commit? Have you tried reverting this commit? I am wondering a little bit because this commit is included in 5.6(.0), i.e. in every 5.6.x kernel but you said 5.6(.0) worked for you.
Hi Thomas, 5.6.x kernels contain the bug (at least the first and latest tag). Maybe a confusion between my “compile but fail to launch” issue and the sound regression? Most of the source code has been changed on 5.10/5.12 (I did not look at the other branches). I do not know exactly how git revert works but if it is like “patch -R”, I will have some conflicts for the main parts. I will try to remove what I can on 5.10+revert.
Created attachment 702132
Workaround

I managed to create this workaround for 5.10.31 (with revert/manual merge). It is probably quite dirty but it works for me…

diff --git a/sound/hda/hdac_bus.c b/sound/hda/hdac_bus.c
index 9766f6af8743..44d1e309a8b6 100644
--- a/sound/hda/hdac_bus.c
+++ b/sound/hda/hdac_bus.c
@@ -44,7 +44,6 @@ int snd_hdac_bus_init(struct hdac_bus *bus, struct device *dev,
 	mutex_init(&bus->cmd_mutex);
 	mutex_init(&bus->lock);
 	INIT_LIST_HEAD(&bus->hlink_list);
-	init_waitqueue_head(&bus->rirb_wq);
 	bus->irq = -1;
 
 	/*
diff --git a/sound/hda/hdac_controller.c b/sound/hda/hdac_controller.c
index b98449fd92f3..abdc0bb1b462 100644
--- a/sound/hda/hdac_controller.c
+++ b/sound/hda/hdac_controller.c
@@ -218,9 +218,6 @@ void snd_hdac_bus_update_rirb(struct hdac_bus *bus)
 		else if (bus->rirb.cmds[addr]) {
 			bus->rirb.res[addr] = res;
 			bus->rirb.cmds[addr]--;
-			if (!bus->rirb.cmds[addr] &&
-			    waitqueue_active(&bus->rirb_wq))
-				wake_up(&bus->rirb_wq);
 		} else {
 			dev_err_ratelimited(bus->dev,
 				"spurious response %#x:%#x, last cmd=%#08x\n",
diff --git a/sound/pci/hda/hda_controller.c b/sound/pci/hda/hda_controller.c
index b972d59eb1ec..0b621bd5b5ab 100644
--- a/sound/pci/hda/hda_controller.c
+++ b/sound/pci/hda/hda_controller.c
@@ -777,9 +777,46 @@ static int azx_rirb_get_response(struct hdac_bus *bus, unsigned int addr,
 {
 	struct azx *chip = bus_to_azx(bus);
 	struct hda_bus *hbus = &chip->bus;
+	unsigned long timeout;
+	unsigned long loopcounter;
+	int do_poll = 0;
+	bool warned = false;
 	int err;
 
  again:
+	timeout = jiffies + msecs_to_jiffies(1000);
+
+	for (loopcounter = 0;; loopcounter++) {
+		spin_lock_irq(&bus->reg_lock);
+		if (bus->polling_mode || do_poll)
+			snd_hdac_bus_update_rirb(bus);
+		if (!bus->rirb.cmds[addr]) {
+			if (!do_poll)
+				bus->poll_count = 0;
+			if (res)
+				*res = bus->rirb.res[addr]; /* the last value */
+			spin_unlock_irq(&bus->reg_lock);
+			return 0;
+		}
+		spin_unlock_irq(&bus->reg_lock);
+		if (time_after(jiffies, timeout))
+			break;
+#define LOOP_COUNT_MAX 3000
+		if (hbus->core.needs_damn_long_delay ||
+		    loopcounter > LOOP_COUNT_MAX) {
+			if (loopcounter > LOOP_COUNT_MAX && !warned) {
+				dev_dbg_ratelimited(chip->card->dev,
+					"too slow response, last cmd=%#08x\n",
+					bus->last_cmd[addr]);
+				warned = true;
+			}
+			msleep(2); /* temporary workaround */
+		} else {
+			udelay(10);
+			cond_resched();
+		}
+	}
+
 	err = snd_hdac_bus_get_response(bus, addr, res);
 	if (!err)
 		return 0;
@@ -787,6 +824,16 @@ static int azx_rirb_get_response(struct hdac_bus *bus, unsigned int addr,
 	if (hbus->no_response_fallback)
 		return -EIO;
 
+	if (!bus->polling_mode && bus->poll_count < 2) {
+		dev_dbg(chip->card->dev,
+			"azx_get_response timeout, polling the codec once: last cmd=0x%08x\n",
+			bus->last_cmd[addr]);
+		do_poll = 1;
+		bus->poll_count++;
+		goto again;
+	}
+
+
+
 	if (!bus->polling_mode) {
 		dev_warn(chip->card->dev,
 			 "azx_get_response timeout, switching to polling mode: last cmd=0x%08x\n",
Adding CONFIG_SND_HDA_CODEC_ANALOG=y solved the issue.

Maybe the description is a bit too generic:

  Say Y or M here to include Analog Devices HD-audio codec support in
  snd-hda-intel driver, such as AD1986A.

=> AD1986A is just one chipset among many: https://doc.ubuntu-fr.org/audio_intel_hda

I have compiled my own kernels for 14 years and never needed this option!
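For anyone who wants to check whether a given kernel configuration has this option enabled, here is a minimal sketch. It relies only on the standard `.config` syntax (`CONFIG_X=y` or `# CONFIG_X is not set`); the helper name `kconfig_value` and the sample configuration text are hypothetical and do not come from the reporter's system.

```python
import re
from typing import Optional


def kconfig_value(config_text: str, option: str) -> Optional[str]:
    """Return the value of a Kconfig option ('y', 'm', ...) or None if unset or absent."""
    pattern = re.compile(r"^" + re.escape(option) + r"=(.*)$")
    for line in config_text.splitlines():
        match = pattern.match(line.strip())
        if match:
            return match.group(1)
    # Lines such as "# CONFIG_X is not set" never match, so they end up here.
    return None


# Hypothetical excerpt of a kernel .config, for illustration only.
SAMPLE_CONFIG = """\
CONFIG_SND_HDA_INTEL=m
# CONFIG_SND_HDA_CODEC_ANALOG is not set
CONFIG_SND_HDA_CODEC_REALTEK=m
"""

value = kconfig_value(SAMPLE_CONFIG, "CONFIG_SND_HDA_CODEC_ANALOG")
print("Analog Devices codec support:", value if value else "not enabled")
```

To run it against a real configuration, pass the contents of the `.config` file from the kernel source directory instead of the sample string.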