Bug 663844 - media-libs/mesa-18.2.0_rc2 crashes xorg-server
Summary: media-libs/mesa-18.2.0_rc2 crashes xorg-server
Status: RESOLVED TEST-REQUEST
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: x86 Linux
Importance: Normal critical
Assignee: Gentoo X packagers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-08-17 05:08 UTC by ad PC
Modified: 2019-05-26 15:23 UTC
CC List: 2 users

See Also:
Package list:
Runtime testing required: ---


Attachments
emerge --info (663844-info.log,18.73 KB, text/x-log)
2018-08-19 18:32 UTC, Bertrand Jacquin
Xorg.0.log (Xorg.0.log,7.44 KB, text/x-log)
2018-08-19 18:32 UTC, Bertrand Jacquin
mesa-18.1.6.log (mesa-18.1.6.log,7.36 KB, text/x-log)
2018-08-20 00:26 UTC, Bertrand Jacquin
mesa-18.2.0-rc3.log (mesa-18.2.0-rc3.log,7.17 KB, text/x-log)
2018-08-20 00:26 UTC, Bertrand Jacquin
build-mesa-18.1.6.log (build-mesa-18.1.6.log.xz,89.10 KB, application/x-xz)
2018-08-20 20:21 UTC, Bertrand Jacquin
build-mesa-18.2.0_rc3.log (build-mesa-18.2.0_rc3.log.xz,64.10 KB, application/x-xz)
2018-08-20 20:21 UTC, Bertrand Jacquin
gdb-mesa-18.1.6.log (gdb-mesa-18.1.6.log,7.36 KB, text/x-log)
2018-08-20 20:22 UTC, Bertrand Jacquin
gdb-mesa-18.2.0_rc3.log (gdb-mesa-18.2.0_rc3.log,7.17 KB, text/x-log)
2018-08-20 20:23 UTC, Bertrand Jacquin

Description ad PC 2018-08-17 05:08:59 UTC
This version triggers an assertion that leads to an abort with this backtrace:

[   200.589] (EE) 
[   200.589] (EE) Backtrace:
[   200.589] (EE) 0: /usr/libexec/Xorg (xorg_backtrace+0x4d) [0x70239d204d]
[   200.589] (EE) 1: /usr/libexec/Xorg (0x7023802000+0x1d4869) [0x70239d6869]
[   200.589] (EE) 2: /lib64/libpthread.so.0 (0x36bef869000+0x14050) [0x36bef87d050]
[   200.589] (EE) 3: /lib64/libc.so.6 (gsignal+0x10b) [0x36bef4d76eb]
[   200.589] (EE) 4: /lib64/libc.so.6 (abort+0x151) [0x36bef4d8f31]
[   200.589] (EE) 5: /lib64/libc.so.6 (0x36bef4a1000+0x2e20a) [0x36bef4cf20a]
[   200.589] (EE) 6: /lib64/libc.so.6 (0x36bef4a1000+0x2e292) [0x36bef4cf292]
[   200.589] (EE) 7: /usr/lib64/dri/i965_dri.so (0x36be9db9000+0x19c9a0) [0x36be9f559a0]
[   200.590] (EE) 8: /usr/lib64/dri/i965_dri.so (0x36be9db9000+0x171821) [0x36be9f2a821]
[   200.590] (EE) 9: /usr/lib64/libEGL.so.1 (0x36be8d3c000+0x1b508) [0x36be8d57508]
[   200.590] (EE) 10: /usr/lib64/libEGL.so.1 (eglMakeCurrent+0x219) [0x36be8d4b949]
[   200.590] (EE) 11: /usr/lib64/xorg/modules/libglamoregl.so (0x36beafbb000+0x767c) [0x36beafc267c]
[   200.590] (EE) 12: /usr/lib64/xorg/modules/libglamoregl.so (glamor_init+0x247) [0x36beafc5887]
[   200.590] (EE) 13: /usr/lib64/xorg/modules/drivers/modesetting_drv.so (0x36beb1ef000+0x122fd) [0x36beb2012fd]
[   200.590] (EE) 14: /usr/lib64/xorg/modules/drivers/modesetting_drv.so (0x36beb1ef000+0x9a04) [0x36beb1f8a04]
[   200.590] (EE) 15: /usr/libexec/Xorg (AddScreen+0xe7) [0x70238628d7]
[   200.590] (EE) 16: /usr/libexec/Xorg (InitOutput+0x3fe) [0x70238a94ae]
[   200.590] (EE) 17: /usr/libexec/Xorg (0x7023802000+0x64a7f) [0x7023866a7f]
[   200.590] (EE) 18: /lib64/libc.so.6 (__libc_start_main+0xe7) [0x36bef4c29f7]
[   200.590] (EE) 19: /usr/libexec/Xorg (_start+0x2a) [0x702384f14a]
[   200.590] (EE) 
[   200.590] (EE) 
Fatal server error:
[   200.590] (EE) Caught signal 6 (Aborted). Server aborting
[   200.590] (EE) 
[   200.590] (EE) 
Please consult the The X.Org Foundation support 
	 at http://wiki.x.org
 for help. 
[   200.590] (EE) Please also check the log file at "/var/lib/gdm/.local/share/xorg/Xorg.0.log" for additional information.
[   200.590] (EE) 
[   200.627] (EE) Server terminated with error (1). Closing log file.

Reproducible: Always

Actual Results:  
xserver aborts
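
Editorial note: several frames in the backtrace above have no symbol name and only show a module path with a base address and an offset. As a minimal sketch, assuming the log format shown above, the helper below (frame_offsets is a hypothetical name, not part of Xorg or binutils) extracts the module and offset of each such frame so they can be looked up later with a tool like addr2line.

```shell
#!/bin/sh
# Read Xorg log lines on stdin and print "<module> <offset>" for every
# backtrace frame that has no symbol name, i.e. frames of the form
#   N: /path/to/module (0xBASE+0xOFFSET) [0xADDRESS]
# Frames that already carry a symbol (e.g. "eglMakeCurrent+0x219") are skipped.
frame_offsets() {
    sed -n 's/^.*(EE) [0-9][0-9]*: \([^ ]*\) (0x[0-9a-f]*[+]\(0x[0-9a-f]*\)) .*$/\1 \2/p'
}
```

For example, "frame_offsets < Xorg.0.log" lists one module/offset pair per line. With split debug information installed, an offset can then be tried with "addr2line -f -e <debug file> <offset>"; whether the offset maps directly to an address in the file depends on how the module was linked, so treat the result as a hint.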
Comment 1 Jonas Stein gentoo-dev 2018-08-17 21:40:12 UTC
Thank you for the report. Please recompile and *attach* the logfiles and 
paste the emerge info as described on
https://wiki.gentoo.org/wiki/Attach_the_logs_to_the_bug_ticket
Perhaps we can find further information in the logs.
The logs must be part of the ticket, but not on external websites.
Please reopen this ticket (Status:unconfirmed) afterwards.
Comment 2 ad PC 2018-08-18 07:26:35 UTC
A side note for all affected users: masking the following packages resolves the issue for me.

>=x11-libs/libdrm-2.4.93
>=x11-base/xorg-server-1.20.1
>=media-libs/mesa-18.2.0_rc2

I guess that the crash is due to DRI changes in mesa-18.2.0_rc2.
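
Editorial note: the workaround above can be written as a package.mask fragment. This is a minimal sketch; mask_entries is a hypothetical helper name, and the atoms are exactly the ones listed in the comment.

```shell
#!/bin/sh
# Print the mask entries from the comment above, one atom per line.
# Lines starting with '#' are comments in package.mask.
mask_entries() {
    cat <<'EOF'
# Workaround for the Xorg abort reported in bug 663844
>=x11-libs/libdrm-2.4.93
>=x11-base/xorg-server-1.20.1
>=media-libs/mesa-18.2.0_rc2
EOF
}
```

As root, "mask_entries >> /etc/portage/package.mask" would append them, assuming package.mask is a plain file on the system in question; if it is a directory, redirect into a file inside it instead.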
Comment 3 Matt Turner gentoo-dev 2018-08-18 08:17:29 UTC
There's definitely not enough information here. What versions were you using when things were crashing?

I don't particularly like bug wranglers marking bugs as RESOLVED/NEEDINFO but in this case it's exactly right.
Comment 4 Bertrand Jacquin 2018-08-19 18:31:38 UTC
A SEGV appears here as well, with the i915 driver:

[   944.539] (EE) Backtrace:
[   944.539] (EE) 0: /usr/bin/X (xorg_backtrace+0x80) [0x55d1e8c54d20]
[   944.539] (EE) 1: /usr/bin/X (0x55d1e8a6f000+0x1ea878) [0x55d1e8c59878]
[   944.539] (EE) 2: /lib64/libpthread.so.0 (0x7f7957bad000+0x151e0) [0x7f7957bc21e0]
[   944.539] (EE) 3: /usr/lib64/dri/i965_dri.so (0x7f7951149000+0x344da9) [0x7f795148dda9]
[   944.539] (EE) 4: /usr/lib64/dri/i965_dri.so (0x7f7951149000+0x499695) [0x7f79515e2695]
[   944.539] (EE) 5: /usr/lib64/dri/i965_dri.so (0x7f7951149000+0x4cce07) [0x7f7951615e07]
[   944.539] (EE) 6: /usr/lib64/dri/i965_dri.so (0x7f7951149000+0x45596a) [0x7f795159e96a]
[   944.539] (EE) 7: /usr/lib64/libgbm.so.1 (0x7f7952024000+0x5326) [0x7f7952029326]
[   944.539] (EE) 8: /usr/lib64/libgbm.so.1 (0x7f7952024000+0x5700) [0x7f7952029700]
[   944.539] (EE) 9: /usr/lib64/libgbm.so.1 (gbm_create_device+0x57) [0x7f79520268b7]
[   944.539] (EE) 10: /usr/lib64/xorg/modules/libglamoregl.so (glamor_egl_init+0x92) [0x7f795223b762]
[   944.539] (EE) 11: /usr/lib64/xorg/modules/drivers/modesetting_drv.so (0x7f7952469000+0xa2b8) [0x7f79524732b8]
[   944.539] (EE) 12: /usr/bin/X (InitOutput+0xa92) [0x55d1e8b15932]
[   944.540] (EE) 13: /usr/bin/X (0x55d1e8a6f000+0x58b46) [0x55d1e8ac7b46]
[   944.540] (EE) 14: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x7f79577ff05d]
[   944.540] (EE) 15: /usr/bin/X (_start+0x2a) [0x55d1e8aaf04a]

See attached emerge --info.

I had to install x11-drivers/xf86-video-intel with USE="-dri -dri3" to recover X
Comment 5 Bertrand Jacquin 2018-08-19 18:32:04 UTC
Created attachment 544044 [details]
emerge --info
Comment 6 Bertrand Jacquin 2018-08-19 18:32:46 UTC
Created attachment 544046 [details]
Xorg.0.log
Comment 7 Matt Turner gentoo-dev 2018-08-19 22:04:03 UTC
Okay, so we're getting a crash down in i965.

To confirm: 

  - mesa-18.2.0_rc2
  - xorg-server-1.19.5(-r2)

Can you please rebuild Mesa with debugging symbols (compile it with -g -O0)?

Is it a particular action that triggers the crash? Closing a window, maximizing something, etc?

18.2.0_rc3 is in the tree as well, so it's worth a try. What was the last version of Mesa you used that worked?

Also, please provide the output of 'lspci -nn | grep VGA'
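
Editorial note: one common way to rebuild a single package with debugging symbols on Gentoo is a per-package environment override. The sketch below assumes that /etc/portage/package.env is a plain file; the function name and the file name debugsyms.conf are hypothetical choices. It takes the configuration root as an argument so it can be tried in a scratch directory first.

```shell
#!/bin/sh
# Create a Portage environment override that builds Mesa with -g -O0 and
# keeps the debug information in separate files (FEATURES=splitdebug).
# $1 is the configuration root; "/" (the default) targets the live system.
setup_mesa_debug_env() {
    root=${1:-/}
    root=${root%/}
    mkdir -p "$root/etc/portage/env" || return 1
    cat > "$root/etc/portage/env/debugsyms.conf" <<'EOF'
CFLAGS="${CFLAGS} -g -O0"
CXXFLAGS="${CXXFLAGS} -g -O0"
FEATURES="${FEATURES} splitdebug"
EOF
    # Apply the override to media-libs/mesa only.
    echo "media-libs/mesa debugsyms.conf" >> "$root/etc/portage/package.env"
}
```

After running it as root, "emerge --oneshot media-libs/mesa" rebuilds Mesa with these settings.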
Comment 8 Bertrand Jacquin 2018-08-20 00:25:50 UTC
(In reply to Matt Turner from comment #7)
> Okay, so we're getting a crash down in i965.

While lshw is reporting the card as i915

> Can you please rebuild Mesa with debugging symbols (compile it with -g -O0)?

Sure, here is the GDB output. See mesa-18.1.6.log

> Is it a particular action that triggers the crash? Closing a window,
> maximizing something, etc?

The crash appeared right after the update, when starting any application that uses X. After a reboot, sddm did not start because X itself could not start.

> 18.2.0_rc3 is in the tree as well, so it's worth a try.

Crash also happens with 18.2.0_rc3, here is the gdb output. See mesa-18.2.0-rc3.log

> What was the last
> version of Mesa you used that worked?

I upgraded from mesa-17.3.9

> Also, please provide the output of 'lspci -nn | grep VGA'

# lspci -nn | grep VGA
00:02.0 VGA compatible controller [0300]: Intel Corporation Haswell-ULT Integrated Graphics Controller [8086:0a16] (rev 09)

# lspci -vv -s 00:02.0
00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
        Subsystem: Samsung Electronics Co Ltd Haswell-ULT Integrated Graphics Controller
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 43
        Region 0: Memory at f7800000 (64-bit, non-prefetchable) [size=4M]
        Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Region 4: I/O ports at f000 [size=64]
        [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
                Address: fee0f00c  Data: 41c1
        Capabilities: [d0] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [a4] PCI Advanced Features
                AFCap: TP+ FLR+
                AFCtrl: FLR-
                AFStatus: TP-
        Kernel driver in use: i915

# lshw -C display
  *-display
       description: VGA compatible controller
       product: Haswell-ULT Integrated Graphics Controller
       vendor: Intel Corporation
       physical id: 2
       bus info: pci@0000:00:02.0
       version: 09
       width: 64 bits
       clock: 33MHz
       capabilities: msi pm vga_controller bus_master cap_list rom
       configuration: driver=i915 latency=0
       resources: irq:43 memory:f7800000-f7bfffff memory:e0000000-efffffff ioport:f000(size=64) memory:c0000-dffff
Comment 9 Bertrand Jacquin 2018-08-20 00:26:08 UTC
Created attachment 544076 [details]
mesa-18.1.6.log
Comment 10 Bertrand Jacquin 2018-08-20 00:26:32 UTC
Created attachment 544078 [details]
mesa-18.2.0-rc3.log
Comment 11 Matt Turner gentoo-dev 2018-08-20 00:37:18 UTC
(In reply to Bertrand Jacquin from comment #8)

Thanks!

> (In reply to Matt Turner from comment #7)
> > Okay, so we're getting a crash down in i965.
> 
> While lshw is reporting the card as i915

It's confusing, but the i965 driver in Mesa is for Intel graphics since ~2006 and i915 is for chips earlier than that. The kernel driver is shared between those generations and called i915 as well.

Excellent, all the logs look very helpful. I'm going to ask the appropriate person upstream to see if he has any idea why brw_disk_cache_init() is crashing.
Comment 12 Matt Turner gentoo-dev 2018-08-20 00:42:52 UTC
(In reply to Bertrand Jacquin from comment #9)
> Created attachment 544076 [details]
> mesa-18.1.6.log

Oh, you're able to reproduce with 18.1.6 as well? Uh oh :(
Comment 13 Matt Turner gentoo-dev 2018-08-20 00:52:59 UTC
Looks like our build-id code is failing. Can you attach your Mesa build log and also the output of 'file /usr/lib64/dri/i965_dri.so'?
Comment 14 Bertrand Jacquin 2018-08-20 20:20:02 UTC
(In reply to Matt Turner from comment #13)
> Looks like our build-id code is failing. Can you attach your Mesa build log

See attachment

> and also the output of 'file /usr/lib64/dri/i965_dri.so'?

* mesa-18.1.6

$ file /usr/lib64/dri/i965_dri.so
/usr/lib64/dri/i965_dri.so: ELF 64-bit LSB pie executable x86-64, version 1 (SYSV), dynamically linked, stripped

$ file /usr/lib/debug/usr/lib64/dri/i965_dri.so.debug
/usr/lib/debug/usr/lib64/dri/i965_dri.so.debug: ELF 64-bit LSB shared object x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=88ff306e3cd93545d994354ce0ecbfb06e261846, with debug_info, not stripped

* mesa-18.2.0_rc3

$ file /usr/lib64/dri/i965_dri.so
/usr/lib64/dri/i965_dri.so: ELF 64-bit LSB pie executable x86-64, version 1 (SYSV), dynamically linked, stripped

$ file /usr/lib/debug/usr/lib64/dri/i965_dri.so.debug
/usr/lib/debug/usr/lib64/dri/i965_dri.so.debug: ELF 64-bit LSB shared object x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=5c01f4b72042b9788523a9c228e9dceac6a9f8da, with debug_info, not stripped
Comment 15 Bertrand Jacquin 2018-08-20 20:21:21 UTC
Created attachment 544210 [details]
build-mesa-18.1.6.log
Comment 16 Bertrand Jacquin 2018-08-20 20:21:42 UTC
Created attachment 544212 [details]
build-mesa-18.2.0_rc3.log
Comment 17 Bertrand Jacquin 2018-08-20 20:22:42 UTC
Created attachment 544214 [details]
gdb-mesa-18.1.6.log
Comment 18 Bertrand Jacquin 2018-08-20 20:23:04 UTC
Created attachment 544216 [details]
gdb-mesa-18.2.0_rc3.log
Comment 19 Matt Turner gentoo-dev 2018-08-20 23:30:10 UTC
> strip: x86_64-pc-linux-gnu-strip --strip-unneeded -R .comment -R .GCC.command.line -R .note.gnu.build-id -R .note.go.buildid -R .note.gnu.gold-version


We're stripping the build-id section off, which is necessary for the shader cache.

Maybe the portage team can offer some advice.
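
Editorial note: whether a build-id survived stripping can be read from the 'file' output quoted in comment 14. A minimal sketch, assuming the output format shown there (build_id_from_file_output is a hypothetical helper name):

```shell
#!/bin/sh
# Read `file` output on stdin and print the SHA-1 build-id if one is reported.
# Prints nothing when the output has no "BuildID[sha1]=" field, which is the
# case for the stripped i965_dri.so in comment 14.
build_id_from_file_output() {
    sed -n 's/.*BuildID\[sha1]=\([0-9a-f]*\).*/\1/p'
}
```

Usage: "file /usr/lib64/dri/i965_dri.so | build_id_from_file_output". An empty result on the installed driver matches the situation described in this comment. "readelf -n" on the same file lists the ELF notes directly.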
Comment 20 Bertrand Jacquin 2018-08-20 23:46:03 UTC
(In reply to Matt Turner from comment #19)
> > strip: x86_64-pc-linux-gnu-strip --strip-unneeded -R .comment -R .GCC.command.line -R .note.gnu.build-id -R .note.go.buildid -R .note.gnu.gold-version
> 
> 
> We're stripping the build-id section off, which is necessary for the shader
> cache.
> 
> Maybe the portage team can offer some advice.

Indeed, this is an option I have in my own make.conf. Is the build-id necessary for Mesa to work properly?
Comment 21 Matt Turner gentoo-dev 2018-08-20 23:52:03 UTC
For the shader cache, yes, it is necessary.

In toolchain.eclass, there is a block of code:

    # # Turn on the -Wl,--build-id flag by default for ELF targets. #525942
    # # This helps with locating debug files.
    # case ${CTARGET} in
    # *-linux-*|*-elf|*-eabi)
    #   tc_version_is_at_least 4.5 && confgcc+=(
    #       --enable-linker-build-id
    #   )
    #   ;;
    # esac

It's all commented out. I don't think any binary on your system would have a .note.gnu.build-id even if you didn't strip them.

But Mesa explicitly wants this.

For my own education, how do you even add this in make.conf?
Comment 22 Bertrand Jacquin 2018-08-21 00:05:57 UTC
(In reply to Matt Turner from comment #21)
> For the shader cache, yes, it is necessary.
> 
> In toolchain.eclass, there is a block of code:
> 
>     # # Turn on the -Wl,--build-id flag by default for ELF targets. #525942
>     # # This helps with locating debug files.
>     # case ${CTARGET} in
>     # *-linux-*|*-elf|*-eabi)
>     #   tc_version_is_at_least 4.5 && confgcc+=(
>     #       --enable-linker-build-id
>     #   )
>     #   ;;
>     # esac
> 
> It's all commented out. I don't think any binary on your system would have a
> .note.gnu.build-id even if you didn't strip them.
> 
> But Mesa explicitly wants this.

Interesting. Is this needed at runtime or only for debugging purposes?

> For my own education, how do you even add this in make.conf?

PORTAGE_STRIP_FLAGS="--strip-unneeded -R .comment -R .GCC.command.line -R .note.gnu.build-id -R .note.go.buildid -R .note.gnu.gold-version"

Which takes precedence over the default set in /usr/lib/portage/python3.6/estrip:

: ${PORTAGE_STRIP_FLAGS=${SAFE_STRIP_FLAGS} ${DEF_STRIP_FLAGS}}

Would it make sense for the mesa ebuild to warn the user if PORTAGE_STRIP_FLAGS is defined manually? Or to ensure that -R .note.gnu.build-id is not included in PORTAGE_STRIP_FLAGS?
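
Editorial note: a check of the kind suggested above could look roughly like the sketch below. strips_build_id is a hypothetical function, not existing Portage or ebuild code. It only recognizes the "-R <section>" and "--remove-section=<section>" spellings, so it is not exhaustive.

```shell
#!/bin/sh
# Return 0 if the given strip flags would remove the .note.gnu.build-id
# section, 1 otherwise. $1 is the whole flag string, e.g. the value of
# PORTAGE_STRIP_FLAGS.
strips_build_id() {
    prev=
    # Intentionally unquoted: split the flag string into words.
    for word in $1; do
        case $word in
            --remove-section=.note.gnu.build-id)
                return 0 ;;
            .note.gnu.build-id)
                [ "$prev" = "-R" ] && return 0 ;;
        esac
        prev=$word
    done
    return 1
}
```

An ebuild could call something like this on "${PORTAGE_STRIP_FLAGS}" and print a warning with ewarn when it returns 0; whether that belongs in the mesa ebuild is the open question raised in this comment.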
Comment 23 Bertrand Jacquin 2018-08-21 00:13:00 UTC
> Interesting. Is this needed at runtime or only for debugging purposes?

It looks like this is needed at runtime: after a rebuild with the default PORTAGE_STRIP_FLAGS, X starts properly.
Comment 24 Matt Turner gentoo-dev 2018-08-21 00:25:45 UTC
Yes, Mesa reads its own build-id to determine what on-disk cached shaders it produced and only consumes those.

I have never heard of anyone setting PORTAGE_STRIP_FLAGS before, and since as far as I can tell -R .note.gnu.build-id does literally nothing except break Mesa, I think we should mark this as RESOLVED/INVALID.

I'm honestly not sure if we should warn users. Adding stuff like this without any idea why is not a recipe for success.
Comment 25 ad PC 2018-08-21 20:02:14 UTC
The issue is NOT resolved for me. This is the triggered assertion:

Xorg: ../mesa-18.2.0-rc2/src/mesa/drivers/dri/i965/intel_batchbuffer.c:724: execbuffer: Assertion `!(bo->kflags & EXEC_OBJECT_PINNED)' failed.

I will report it to upstream.
Comment 26 Bertrand Jacquin 2018-08-21 21:25:21 UTC
> Adding stuff like this
> without any idea why is not a recipe for success.

Are you saying I added that without any idea why ?
Comment 27 Matt Turner gentoo-dev 2018-08-22 01:35:32 UTC
(In reply to ad PC from comment #25)
> The issue is NOT resolved for me. This is the triggered assertion:
> 
> Xorg: ../mesa-18.2.0-rc2/src/mesa/drivers/dri/i965/intel_batchbuffer.c:724:
> execbuffer: Assertion `!(bo->kflags & EXEC_OBJECT_PINNED)' failed.
> 
> I will report it to upstream.

Thank you. Looks like you indeed have a different issue.

(In reply to Bertrand Jacquin from comment #26)
> > Adding stuff like this
> > without any idea why is not a recipe for success.
> 
> Are you saying I added that without any idea why ?

That would be my guess.
Comment 29 Bertrand Jacquin 2018-08-23 23:14:31 UTC
> (In reply to Bertrand Jacquin from comment #26)
> > > Adding stuff like this
> > > without any idea why is not a recipe for success.
> > 
> > Are you saying I added that without any idea why ?
> 
> That would be my guess.

You are making an inappropriate assumption here.

No matter what, a SEGV should not be expected to happen, even if a build-id is missing.
Comment 30 Matt Turner gentoo-dev 2018-08-24 00:59:47 UTC
(In reply to Bertrand Jacquin from comment #29)
> > (In reply to Bertrand Jacquin from comment #26)
> > > > Adding stuff like this
> > > > without any idea why is not a recipe for success.
> > > 
> > > Are you saying I added that without any idea why ?
> > 
> > That would be my guess.
> 
> You are making an inappropriate assumption here.

But is it correct?

> No matter what, a SEGV should not be expected to happen, even if a
> build-id is missing.

Patches welcome -- to be sent to mesa-dev@lists.freedesktop.org

I look at this as the same issue as users passing a bunch of -mno-sse* flags when their -march=... value did not imply them and breaking the build when they prevent run-time enabled optimizations from even compiling.

Sure, the build can be broken with flags that are perfectly legal. But there's no reason to pass those flags, so WTF are we doing that?
Comment 31 Jonas Stein gentoo-dev 2019-05-26 14:58:07 UTC
@ad PC, does the commit in

https://bugs.freedesktop.org/show_bug.cgi?id=107651#c4
fix it for you?
Comment 32 ad PC 2019-05-26 15:23:10 UTC
Yes, but I moved to a new kernel version. This incident can be closed now.