I first ran into this bug after =sys-kernel/gentoo-sources-4.0.9 (not completely sure, though). With both my own configuration and genkernel's default configuration, my whole system suddenly freezes after a random amount of uptime. Initially it took over 10 days (4.0.9), but currently it happens after less than 4-5 h (4.4.5). I'm not sure whether it's really related to the kernel version.

By "freeze" I mean that the power stays on and the screen keeps showing the last frame, but nothing responds (no SysRq, no Caps Lock, no LEDs). The only thing I can do is a hard reset. It seems that not even a panic is triggered, since I have loaded a panic kernel (and it works with echo c > /proc/sysrq-trigger).

With my current config it freezes for only ~2 sec and the following line shows up in dmesg:

[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle

Reproducible: Couldn't Reproduce

Steps to Reproduce:
Seems to happen at random (currently only once, less than ~4-5 h after booting), but it seems somehow related to chromium. As far as I remember it never happened without chromium running; however, it should be possible to reproduce it without chromium.

Actual Results:
Everything freezes for ~2 sec and the following line shows up in dmesg:

[drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle

Some other lines that might be related (dmesg):

...
[ 0.283757] [drm] Initialized i915 1.6.0 20151010 for 0000:00:02.0 on minor 0
...
[ 1.982637] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
...
[ 63.349412] WARNING: CPU: 0 PID: 5113 at drivers/gpu/drm/i915/intel_uncore.c:619 hsw_unclaimed_reg_debug+0x64/0x7c()
[ 63.349414] Unclaimed register detected before reading register 0x22380
[ 63.349416] Modules linked in: iwlmvm snd_hda_codec_hdmi iTCO_wdt asus_nb_wmi iTCO_vendor_support asus_wmi x86_pkg_temp_thermal iwlwifi lpc_ich mfd_core wmi efivarfs
[ 63.349428] CPU: 0 PID: 5113 Comm: BrowserBlocking Tainted: G W 4.4.5-gentoo #3
[ 63.349429] Hardware name: ASUSTeK COMPUTER INC. UX303LAB/UX303LAB, BIOS UX303LAB.210 08/25/2015
[ 63.349431] 0000000000000000 ffff88021ec03d18 ffffffff812c5e8c ffff88021ec03d60
[ 63.349434] 0000000000000009 ffff88021ec03d50 ffffffff810753a5 ffffffff813f11fd
[ 63.349437] ffff880212da0000 ffff880212da0000 ffff880212da0000 ffff880212da0080
[ 63.349439] Call Trace:
[ 63.349442] <IRQ> [<ffffffff812c5e8c>] dump_stack+0x4d/0x63
[ 63.349449] [<ffffffff810753a5>] warn_slowpath_common+0x9a/0xb3
[ 63.349453] [<ffffffff813f11fd>] ? hsw_unclaimed_reg_debug+0x64/0x7c
[ 63.349456] [<ffffffff81075401>] warn_slowpath_fmt+0x43/0x4b
[ 63.349458] [<ffffffff813f4ee8>] ? fw_domains_get_with_thread_status+0xd/0x58
[ 63.349461] [<ffffffff813f11fd>] hsw_unclaimed_reg_debug+0x64/0x7c
[ 63.349464] [<ffffffff813f2369>] gen6_read32+0x43/0xae
[ 63.349467] [<ffffffff813e9c5a>] intel_lrc_irq_handler+0x96/0x1ae
[ 63.349470] [<ffffffff813b3784>] gen8_gt_irq_handler+0x75/0x1d8
[ 63.349473] [<ffffffff813b3952>] gen8_irq_handler+0x6b/0x520
[ 63.349476] [<ffffffff810a86b1>] handle_irq_event_percpu+0x78/0x1a7
[ 63.349478] [<ffffffff810a8806>] handle_irq_event+0x26/0x46
[ 63.349481] [<ffffffff810ab1d7>] handle_edge_irq+0xa1/0xbe
[ 63.349484] [<ffffffff8100630b>] handle_irq+0x104/0x10c
[ 63.349486] [<ffffffff81005c76>] do_IRQ+0x46/0xb5
[ 63.349490] [<ffffffff8175eebf>] common_interrupt+0x7f/0x7f
[ 63.349491] <EOI>
[ 63.349493] ---[ end trace be1d80709bc4e6de ]---
[ 411.849338] perf interrupt took too long (2505 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[ 759.268856] kworker/dying (6) used greatest stack depth: 12472 bytes left
[ 1255.449707] perf interrupt took too long (5036 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
[ 4729.323445] kworker/dying (34) used greatest stack depth: 12448 bytes left
...
[24005.351739] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... render ring idle
[26561.371210] asus_wmi: Unknown key cf pressed
...
[58269.439309] [drm] stuck on render ring
[58269.441141] [drm] GPU HANG: ecode 8:0:0xac277ffe, in chrome [5110], reason: Ring hung, action: reset
[58269.441143] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[58269.441144] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[58269.441145] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[58269.441146] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[58269.441147] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[58269.526788] drm/i915: Resetting chip after gpu hang
...
[61122.440706] [drm] stuck on render ring
[61122.441982] [drm] GPU HANG: ecode 8:0:0x84dfbffe, in chrome [3967], reason: Ring hung, action: reset
[61122.443699] drm/i915: Resetting chip after gpu hang
...

Boot-Options (my setup sucks (initramfs+boot-options in kernel) due to some UEFI problems):

[ 0.000000] Linux version 4.4.5-gentoo (root@jarvis) (gcc version 4.9.3 (Gentoo 4.9.3 p1.5, pie-0.6.4) ) #3 SMP Mon Mar 14 16:59:19 CET 2016
[ 0.000000] Command line: BOOT_IMAGE=/michael-emergency-kernel-4.4.5 crashkernel=128M
[ 0.000000] efi: EFI v2.40 by American Megatrends
[ 0.000000] efi: ESRT=0xdce2cd98 ACPI=0xdb71c000 ACPI 2.0=0xdb71c000 SMBIOS=0xdce2c918
[ 0.000000] DMI: ASUSTeK COMPUTER INC. UX303LAB/UX303LAB, BIOS UX303LAB.210 08/25/2015
[ 0.000000] Kernel command line: root=/dev/mapper/vg_jarvis-lv_root dolvm rootfstype=ext4 init=/usr/lib/systemd/systemd BOOT_IMAGE=/michael-emergency-kernel-4.4.5 crashkernel=128M
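To pull the hang events out of a long dmesg log like the one above, a small Python sketch along these lines might help. It is only an illustration: the function name `find_i915_hangs` is made up here, and the regular expressions are modelled on the exact message wording quoted in this report, so other kernel versions may need different patterns.

```python
import re

# Patterns modelled on the dmesg lines quoted in this report (assumption:
# other kernel versions may word these messages differently).
HANGCHECK_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] \[drm:i915_hangcheck_elapsed\] \*ERROR\* (?P<msg>.*)"
)
GPU_HANG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] \[drm\] GPU HANG: ecode (?P<ecode>\S+), "
    r"in (?P<comm>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


def find_i915_hangs(dmesg_text):
    """Return a list of dicts, one per i915 hang event found in dmesg_text."""
    events = []
    for line in dmesg_text.splitlines():
        m = GPU_HANG_RE.search(line)
        if m:
            events.append({
                "kind": "gpu_hang",
                "uptime_s": float(m.group("ts")),   # seconds since boot
                "ecode": m.group("ecode"),
                "process": m.group("comm"),
                "pid": int(m.group("pid")),
                "reason": m.group("reason"),
                "action": m.group("action"),
            })
            continue
        m = HANGCHECK_RE.search(line)
        if m:
            events.append({
                "kind": "hangcheck",
                "uptime_s": float(m.group("ts")),
                "message": m.group("msg").strip(),
            })
    return events
```

Feeding it the text of a saved log (for example the output of `dmesg` redirected to a file) gives the uptime at which each hang happened and, for full GPU hangs, the process that was blamed.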
Created attachment 428360 [details] /sys/class/drm/card0/error

[58269.439309] [drm] stuck on render ring
[58269.441141] [drm] GPU HANG: ecode 8:0:0xac277ffe, in chrome [5110], reason: Ring hung, action: reset
[58269.441143] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[58269.441144] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[58269.441145] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[58269.441146] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[58269.441147] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[58269.526788] drm/i915: Resetting chip after gpu hang
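Since the kernel asks for the crash dump to be attached, it can be handy to copy it to a regular file right after a hang. The following is a minimal Python sketch under a few assumptions: the helper name `save_gpu_error` and the output file name pattern are invented for illustration, the source path is the one printed in the log above, and reading it may require root.

```python
import time
from pathlib import Path


def save_gpu_error(src="/sys/class/drm/card0/error", dest_dir="."):
    """Copy the i915 error state to a timestamped file and return its path.

    Returns None if the source cannot be read or is empty.
    """
    try:
        # Read the whole file explicitly; sysfs files often report size 0.
        data = Path(src).read_bytes()
    except OSError:
        return None
    if not data.strip():
        return None
    # Hypothetical naming scheme, e.g. i915-error-20160314-165919.txt
    dest = Path(dest_dir) / time.strftime("i915-error-%Y%m%d-%H%M%S.txt")
    dest.write_bytes(data)
    return dest
```

Calling `save_gpu_error(dest_dir="/root")` after a "GPU crash dump saved" message would leave a copy that can be attached to a report.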
Created attachment 428362 [details] My custom configuration that caused the system to freeze.
Created attachment 428364 [details] My current configuration that somehow prevents the system from freezing.
Created attachment 428366 [details] Diff between the freezing and non-freezing configuration.

Seems to be the relevant change to me (not sure, though):

> CONFIG_HANGCHECK_TIMER=m

Things that could be related:

1187a1188,1190
> # CONFIG_INTEL_MEI is not set
> # CONFIG_INTEL_MEI_ME is not set
> # CONFIG_INTEL_MEI_TXE is not set
< # CONFIG_WATCHDOG_CORE is not set
< # CONFIG_WATCHDOG_NOWAYOUT is not set
---
> CONFIG_WATCHDOG_CORE=y
> CONFIG_WATCHDOG_NOWAYOUT=y
2288c2291
< # CONFIG_SOFT_WATCHDOG is not set
---
> CONFIG_SOFT_WATCHDOG=m
2304c2307
< # CONFIG_I6300ESB_WDT is not set
---
> CONFIG_I6300ESB_WDT=m
2306c2309,2310
< # CONFIG_ITCO_WDT is not set
---
> CONFIG_ITCO_WDT=m
> CONFIG_ITCO_VENDOR_SUPPORT=y
2351c2355
< # CONFIG_MFD_CORE is not set
---
> CONFIG_MFD_CORE=m
2366c2370
< # CONFIG_LPC_ICH is not set
---
> CONFIG_LPC_ICH=m

Mistakes that should be unrelated (I reconfigured a lot and forgot to change these after oldconfig in this version; other versions crashed without these mistakes):

< CONFIG_INITRAMFS_SOURCE="/boot/initramfs/4.1.15-r1.cpio"
< # CONFIG_LOCKUP_DETECTOR is not set
< # CONFIG_DETECT_HUNG_TASK is not set
< # CONFIG_PANIC_ON_OOPS is not set
< CONFIG_PANIC_ON_OOPS_VALUE=0
< # CONFIG_DEBUG_RT_MUTEXES is not set
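A line-based diff of two kernel configs is noisy because options move around between versions. As a rough alternative, the two files can be compared option by option. The Python sketch below is only an illustration: `parse_kconfig` and `diff_kconfig` are made-up names, and treating an option that is absent from a file the same as "is not set" is a simplifying assumption.

```python
import re

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Map each option in a kernel .config to its value (None = 'is not set')."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = SET_RE.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m:
            options[m.group(1)] = None
    return options


def diff_kconfig(old_text, new_text):
    """Return {option: (old_value, new_value)} for every option that differs.

    An option missing from one file is reported as None on that side,
    i.e. it is treated like 'is not set' (simplifying assumption).
    """
    old, new = parse_kconfig(old_text), parse_kconfig(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }
```

Run on the freezing and non-freezing configs, this would list only the options whose effective value changed, such as the watchdog-related ones above, regardless of where they sit in the file.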
FYI (bugs from the kernel bug tracker that could be related): https://bugzilla.kernel.org/buglist.cgi?bug_status=__all__&content=drm%3Ai915_hangcheck_elapsed
(In reply to Michael Weiss from comment #5)
> FYI (bugs from the kernel bug tracker that could be related):
> https://bugzilla.kernel.org/buglist.cgi?bug_status=__all__&content=drm%3Ai915_hangcheck_elapsed

After looking at them more closely, I found the following:
https://bugs.freedesktop.org/buglist.cgi?bug_status=__open__&content=drm%3Ai915_hangcheck_elapsed

They seem to be closely related (and still open). Did I report this in the wrong place? I'm new to all of this (just trying to help here), so please let me know what I could improve, change, etc. Thanks :)
Experienced three more crashes: one yesterday and two today. The one yesterday froze everything, i.e. I have no idea what happened that time. The first one today happened after exiting i3 (shutting down the X server), but it didn't completely freeze the system: I couldn't switch to another VT, but the SysRq keys still worked, i.e. I have a dump (I'll look into that later). The second one today was most likely caused by one of the following parameters: "drm.debug=0x06 i915.semaphores=1".

My 4.4.5 setup (forgot to include that; semi-working, i.e. one crash so far):
=sys-kernel/gentoo-sources-4.4.5
=media-libs/mesa-11.0.6
=x11-drivers/xf86-video-intel-2.99.917-r2
=x11-libs/libdrm-2.4.65

My current 4.5.0 setup (since the third crash; now testing without the parameters):
=sys-kernel/gentoo-sources-4.5.0
=media-libs/mesa-11.1.2-r1
=x11-drivers/xf86-video-intel-2.99.917_p20160316
=x11-libs/libdrm-2.4.67
=x11-base/xorg-server-1.18.2
=x11-base/xorg-drivers-1.18-r1

I'll now ask the devs at #intel-gfx@freenode.net what I should do about this (move it over to freedesktop.org, etc.) and how I could help.
Created attachment 428656 [details] Output from: lspci -vvv -s 2

My CPU: Intel Core i7-5500U
My GPU: Intel HD Graphics 5500
My Notebook: ASUS ZenBook UX303LA
Fixed with the new setup:
=sys-kernel/gentoo-sources-4.5.0
=x11-libs/libdrm-2.4.67
=media-libs/mesa-11.1.2-r1
=x11-base/xorg-server-1.18.2
=x11-drivers/xf86-input-synaptics-1.8.2
=x11-drivers/xf86-video-intel-2.99.917_p20160316
=x11-drivers/xf86-input-evdev-2.10.1
=x11-base/xorg-drivers-1.18-r1

It has been working flawlessly for 3 days now :) All issues are gone and no i915-related errors show up in dmesg anymore.