Summary: | Can't log in to Gnome: gnome-shell segfault error 4 in libcogl.so.12.1.1 when user is not in "video" group | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Markus Goppelt <markus.goppelt> |
Component: | [OLD] GNOME | Assignee: | Gentoo systemd Team <systemd> |
Status: | RESOLVED CANTFIX | ||
Severity: | normal | CC: | alexander, gnome, jer, xarthisius |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | AMD64 | ||
OS: | Linux | ||
URL: | https://wiki.gentoo.org/wiki/Fglrx#Permissions | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Bug Depends on: | |||
Bug Blocks: | 463242 | ||
Attachments: | /var/log/messages; Output of "emerge cogl"; Relevant part of /var/log/messages; full backtrace with gdb; output of glxinfo; systemd-apply-acl-nvidia.patch |
Description
Markus Goppelt
2013-08-24 15:07:54 UTC
*** This bug has been marked as a duplicate of bug 481348 ***

Created attachment 356902: Output of "emerge cogl"
The patch (see emerge.cogl) from https://bugs.gentoo.org/show_bug.cgi?id=481348 doesn't help:

Aug 24 22:20:10 phenom gnome-session[3537]: gnome-session[3537]: WARNING: Application 'gnome-shell.desktop' killed by signal 11
Aug 24 22:20:10 phenom kernel: gnome-shell[3613]: segfault at 0 ip 00007f6c82f88b80 sp 00007fffa3f3ed68 error 4 in libcogl.so.12.1.1[7f6c82f61000+9b000]
Aug 24 22:20:10 phenom gnome-session[3537]: WARNING: Application 'gnome-shell.desktop' killed by signal 11
Aug 24 22:20:10 phenom gnome-session[3537]: libGL error: open uki failed (Operation not permitted)
Aug 24 22:20:10 phenom gnome-session[3537]: libGL error: reverting to (slow) indirect rendering
Aug 24 22:20:10 phenom kernel: gnome-shell[3633]: segfault at 0 ip 00007fe8be0f3b80 sp 00007ffffeb4dc28 error 4 in libcogl.so.12.1.1[7fe8be0cc000+9b000]
Aug 24 22:20:10 phenom gnome-session[3537]: gnome-session[3537]: WARNING: Application 'gnome-shell.desktop' killed by signal 11
Aug 24 22:20:10 phenom gnome-session[3537]: gnome-session[3537]: WARNING: App 'gnome-shell.desktop' respawning too quickly
Aug 24 22:20:10 phenom gnome-session[3537]: WARNING: Application 'gnome-shell.desktop' killed by signal 11
Aug 24 22:20:10 phenom gnome-session[3537]: WARNING: App 'gnome-shell.desktop' respawning too quickly
Aug 24 22:20:10 phenom gnome-session[3537]: Unrecoverable failure in required component gnome-shell.desktop

And no Option "Xinerama" in my xorg.conf.

You will need to get a better backtrace then:
https://wiki.gentoo.org/index.php?title=GNOME/3.8-upgrade-guide&redirect=no#Getting_backtraces

Created attachment 356966: Relevant part of /var/log/messages

Created attachment 356968: full backtrace with gdb
Problem appears to occur in cogl-driver-gl.c at line 270 (see backtrace):

Program terminated with signal 11, Segmentation fault.
#0 parse_gl_version (version_string=0x0, major_out=major_out@entry=0x7f74a0, minor_out=minor_out@entry=0x7f74a4) at ./driver/gl/gl/cogl-driver-gl.c:270

@Pacho: Does the backtrace provide enough information for the developers?

Created attachment 356978: output of glxinfo
The problem could be

OpenGL shading language version string: (null)

(see output of glxinfo). I looked at parse_gl_version in cogl-driver-gl.c: the segfault could result from version_string being NULL.

The problem was that testuser wasn't in the "video" group. With

OpenGL shading language version string: 4.20

the segfault doesn't occur (login works).

(In reply to Markus Goppelt from comment #10)
> The problem was that testuser wasn't in the "video" group. With
>
> OpenGL shading language version string: 4.20
>
> the segfault doesn't occur (login works).

I thought systemd was supposed to handle groups itself; not sure if the ACL problems I see in the logs could be involved :/

(In reply to Pacho Ramos from comment #11)
I suspect this is the case only with the Catalyst driver. Markus, could you recheck with xf86-video-ati? But first check that you have enabled CONFIG_TMPFS_POSIX_ACL in your kernel config.

The ACL things disappeared after recompiling "my" kernel:

grep -i acl /usr/src/linux/.config
CONFIG_EXT2_FS_POSIX_ACL=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_FS_POSIX_ACL=y
CONFIG_GENERIC_ACL=y
CONFIG_TMPFS_POSIX_ACL=y
# CONFIG_NFS_V3_ACL is not set
# CONFIG_NFSD_V3_ACL is not set

But libcogl still segfaults with testuser not in the "video" group. At least with x11-drivers/ati-drivers.

Do I need radeon-ucode for xf86-video-ati?

(In reply to Markus Goppelt from comment #14)
> Do I need radeon-ucode for xf86-video-ati?

This depends on the video card model, but most likely you need it.

It does not segfault with xf86-video-ati.

(In reply to Markus Goppelt from comment #16)
> It does not segfault with xf86-video-ati.

Are you sure that it uses hardware acceleration and not software rendering (llvmpipe)? You can find this in the glxinfo output.

glxinfo
...
direct rendering: Yes
...

No difference between user "markus" (in "video") and user "testuser" (not in "video").

(In reply to Markus Goppelt from comment #18)
> glxinfo
> ...
> direct rendering: Yes

No.
You should look at the "OpenGL renderer string". For example on my system:

hardware acceleration
$ glxinfo | egrep '^(OpenGL|direct)'
direct rendering: Yes
OpenGL vendor string: X.Org
OpenGL renderer string: Gallium 0.4 on AMD CAPE VERDE
OpenGL version string: 2.1 Mesa 9.2.0-rc2
OpenGL shading language version string: 1.30
OpenGL extensions:

llvmpipe
$ glxinfo | egrep '^(OpenGL|direct)'
direct rendering: Yes
OpenGL vendor string: VMware, Inc.
OpenGL renderer string: Gallium 0.4 on llvmpipe (LLVM 3.3, 128 bits)
OpenGL version string: 2.1 Mesa 9.2.0-rc2
OpenGL shading language version string: 1.30
OpenGL extensions:

Same thing. For user "markus":

direct rendering: Yes
OpenGL vendor string: X.Org
OpenGL renderer string: Gallium 0.4 on AMD TURKS
OpenGL core profile version string: 3.1 (Core Profile) Mesa 9.1.6
OpenGL core profile shading language version string: 1.40
OpenGL core profile context flags: (none)
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 9.1.6
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:

For "testuser" (not in "video"):

direct rendering: Yes
OpenGL vendor string: X.Org
OpenGL renderer string: Gallium 0.4 on AMD TURKS
OpenGL core profile version string: 3.1 (Core Profile) Mesa 9.1.6
OpenGL core profile shading language version string: 1.40
OpenGL core profile context flags: (none)
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 9.1.6
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:

Thank you, Markus.

@systemd
I just noticed that consolekit ebuild does this check:
use acl && CONFIG_CHECK="~TMPFS_POSIX_ACL"
How about adding it to systemd ebuild?

(In reply to Alexander Tsoy from comment #21)
> Thank you, Markus.
>
> @systemd
> I just noticed that consolekit ebuild does this check:
> use acl && CONFIG_CHECK="~TMPFS_POSIX_ACL"
> How about adding it to systemd ebuild?

If it helps anyone, why not.
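
For anyone repeating the renderer-string check described above, it could be scripted roughly like this. It is only a sketch: the function name classify_glxinfo and its output labels are made up for illustration, and it assumes glxinfo prints the same field names as in the outputs pasted in this bug.

```shell
# Classify glxinfo output read from stdin (hypothetical helper, not part of
# any package). Prints one of: unknown, no-glsl, software, hardware.
classify_glxinfo() {
    awk '
        /^OpenGL renderer string:/                 { renderer = $0 }
        /^OpenGL shading language version string:/ { glsl = $0 }
        END {
            # No renderer line at all: glxinfo probably failed.
            if (renderer == "")        { print "unknown";  exit }
            # "(null)" is the value that made parse_gl_version crash.
            if (glsl ~ /\(null\)/)     { print "no-glsl";  exit }
            # llvmpipe in the renderer string means software rendering.
            if (renderer ~ /llvmpipe/) { print "software"; exit }
            print "hardware"
        }'
}
```

Usage would be something like "glxinfo | classify_glxinfo", run once as a user in "video" and once as a user outside it. Note that "hardware" here only means "not llvmpipe and GLSL string present"; it does not prove direct rendering.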
Will CC ati-drivers maintainers as, per wiki doc, looks like (at least with consolekit) adding user to "video" shouldn't be needed, but I don't know if it's the same for systemd (usually, it is)

(In reply to Michał Górny from comment #22)
> > @systemd
> > I just noticed that consolekit ebuild does this check:
> > use acl && CONFIG_CHECK="~TMPFS_POSIX_ACL"
> > How about adding it to systemd ebuild?
>
> If it helps anyone, why not.

See comment 13 and comment 14. CONFIG_TMPFS_POSIX_ACL also affects the acl support on devtmpfs. If this option is disabled, then systemd-logind is unable to control access rights to devices. Relevant part of the log:

Aug 25 14:21:11 phenom systemd-logind[2303]: Failed to apply ACLs: Operation not supported
Aug 25 14:21:23 phenom systemd-logind[2303]: Failed to apply ACLs: Operation not supported

+ 06 Sep 2013; Pacho Ramos <pacho@gentoo.org> systemd-204.ebuild,
+ systemd-206-r3.ebuild:
+ Check for TMPFS_POSIX_ACL when needed (#482336#c24 by Alexander Tsoy)
+

Markus, please retry building with acl and enabling needed kernel options

(In reply to Pacho Ramos from comment #25)
> + 06 Sep 2013; Pacho Ramos <pacho@gentoo.org> systemd-204.ebuild,
> + systemd-206-r3.ebuild:
> + Check for TMPFS_POSIX_ACL when needed (#482336#c24 by Alexander Tsoy)
> +
>
> Markus, please retry building with acl and enabling needed kernel options

My systemd has acl and my kernel has CONFIG_TMPFS_POSIX_ACL. Segfault with ati-drivers, no segfault with xf86-video-ati: "OpenGL renderer string: Gallium 0.4 on AMD TURKS". After putting user "testuser" in group "video" also no segfault with ati-drivers: "OpenGL renderer string: AMD Radeon HD 6670". And I'm getting trained at switching back and forth between ati-drivers and xf86-video-ati.

Do you have 3D acceleration when not being in "video" group?

Yes, I already checked in Comment 20.

(In reply to Markus Goppelt from comment #29)
> Yes, I already checked in Comment 20.
That looks to be with free driver, I mean with proprietary one, the one that makes cogl crash, to see what occurs when you launch any other stuff needing 3D

(In reply to Pacho Ramos from comment #30)
> (In reply to Markus Goppelt from comment #29)
> > Yes, I already checked in Comment 20.
>
> That looks to be with free driver, I mean with proprietary one, the one that
> makes cogl crash, to see what occurs when you launch any other stuff needing
> 3D

This requires running another DE/WM. =/ Also see the output of "getfacl /dev/dri/card0" in both cases (user in "video" group and user not in "video" group).

I did glxinfo with twm.

ati-drivers without "video":

direct rendering: No (If you want to find out why, try setting LIBGL_DEBUG=verbose)
OpenGL vendor string: ATI Technologies Inc.
OpenGL renderer string: AMD Radeon HD 6670
OpenGL version string: 2.1 (4.2.12217 Compatibility Profile Context 8.861)
OpenGL shading language version string: (null)
OpenGL extensions:

With "video":

direct rendering: Yes
OpenGL vendor string: Advanced Micro Devices, Inc.
OpenGL renderer string: AMD Radeon HD 6670
OpenGL core profile version string: 4.2.12217 Core Profile Context 8.861
OpenGL core profile shading language version string: 4.20
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 4.2.12217 Compatibility Profile Context 8.861
OpenGL shading language version string: 4.20
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
OpenGL extensions:

xf86-video-ati without "video":

libGL error: failed to load driver: r600
libGL error: Try again with LIBGL_DEBUG=verbose for more details.

After that segmentation fault.
With "video":

direct rendering: Yes
OpenGL vendor string: X.Org
OpenGL renderer string: Gallium 0.4 on AMD TURKS
OpenGL core profile version string: 3.1 (Core Profile) Mesa 9.1.6
OpenGL core profile shading language version string: 1.40
OpenGL core profile context flags: (none)
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 9.1.6
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:

I did the getfacl thing for xf86-video-ati. No difference between "video" and no "video":

getfacl: Removing leading '/' from absolute path names
# file: dev/dri/card0
# owner: root
# group: video
user::rw-
user:testuser:rw-
group::rw-
mask::rw-
other::---

Should I do getfacl for ati-drivers?

I would compare both drivers, yes :/

getfacl for ati-drivers:

getfacl: Removing leading '/' from absolute path names
# file: dev/ati/card0
# owner: root
# group: video
user::rw-
group::rw-
other::---

Same result for "video" and no "video". And there is no /dev/dri folder with ati-drivers.

(In reply to Markus Goppelt from comment #34)
> getfacl for ati-drivers:
>
> getfacl: Removing leading '/' from absolute path names
> # file: dev/ati/card0
...
>
> And there is no /dev/dri folder with ati-drivers.

Does ati-drivers install udev rules? I guess "uaccess" tag is needed for ati device.

An example for drm subsystem:

root # pwd
/lib/udev/rules.d
root # grep -r drm ./
./50-udev-default.rules:SUBSYSTEM=="drm", GROUP="video"
./70-uaccess.rules:SUBSYSTEM=="drm", KERNEL=="card*", TAG+="uaccess"

(In reply to Alexander Tsoy from comment #35)
> (In reply to Markus Goppelt from comment #34)
> > getfacl for ati-drivers:
> >
> > getfacl: Removing leading '/' from absolute path names
> > # file: dev/ati/card0
> ...
> >
> > And there is no /dev/dri folder with ati-drivers.
>
> Does ati-drivers install udev rules? I guess "uaccess" tag is needed for ati
> device.
>
> An example for drm subsystem:
>
> root # pwd
> /lib/udev/rules.d
> root # grep -r drm ./
> ./50-udev-default.rules:SUBSYSTEM=="drm", GROUP="video"
> ./70-uaccess.rules:SUBSYSTEM=="drm", KERNEL=="card*", TAG+="uaccess"

Please execute the following command as root and post the output:

# udevadm info -a -p $(udevadm info -q path -n /dev/ati/card0)

root@phenom ~ # udevadm info -q path -n /dev/ati/card0
device node not found
root@phenom ~ # ls /dev/ati/card0 -l
crw-rw---- 1 root video 251, 0 Sep 13 16:09 /dev/ati/card0

(In reply to Markus Goppelt from comment #37)
> root@phenom ~ # udevadm info -q path -n /dev/ati/card0
> device node not found

This means that the device node is not created by udev, so it is not in the udev database. And according to the systemd-logind sources [1], ACLs get applied only to devices with the "uaccess" tag. This can probably be worked around by creating a udev rule containing something like this:

<matching rules>, OPTIONS+="static_node=ati/card0", TAG+="uaccess"

but imho it's not worth doing. Adding a warning to the ebuild (like in nvidia-drivers) and a note in the systemd documentation should be enough.

[1] http://cgit.freedesktop.org/systemd/systemd/tree/src/login/logind-acl.c

(In reply to Alexander Tsoy from comment #38)
> Adding warning to ebuild (like in nvidia-drivers) and a note in
> systemd documentation should be enough.

I mean adding a warning to the ati-drivers ebuild. Also, the docs at the $URL seem to be outdated.

@Alexander, per http://wiki.gentoo.org/wiki/Fglrx#Permissions , looks like consolekit was able to handle this, is this a logind regression or doc needs fixing?

(In reply to Pacho Ramos from comment #40)
> @Alexander, per http://wiki.gentoo.org/wiki/Fglrx#Permissions , looks like
> consolekit was able to handle this, is this a logind regression or doc needs
> fixing?

AFAIK, consolekit uses "udev-acl" tag for the same purposes as logind uses "uaccess".
Maybe Markus misconfigured something and device /dev/dri/card0 should exist, and /dev/ati/card0 is unrelated to the problem? Currently I have no time to test ati-drivers on my hardened system. %)

(In reply to Alexander Tsoy from comment #41)
> AFAIK, consolekit uses "udev-acl" tag for the same purposes as logind uses
> "uaccess".

Related part of 70-udev-acl.rules installed by consolekit:

# DRI video devices
SUBSYSTEM=="drm", KERNEL=="card*", TAG+="udev-acl"

Also see udev-acl.c from consolekit sources.

(In reply to Alexander Tsoy from comment #41)
> (In reply to Pacho Ramos from comment #40)
> > @Alexander, per http://wiki.gentoo.org/wiki/Fglrx#Permissions , looks like
> > consolekit was able to handle this, is this a logind regression or doc needs
> > fixing?
>
> AFAIK, consolekit uses "udev-acl" tag for the same purposes as logind uses
> "uaccess". May be Markus misconfigured something and device /dev/dri/card0
> should exist, and /dev/ati/card0 is unrelated to the problem? Currently I
> have no time to test ati-drivers on my hardened system. %)

/dev/ati/card0 is very much related to the problem. I checked with twm. Doing

chmod o+rw /dev/ati/card0

enabled direct rendering for testuser (while not in "video"). glxinfo went from "direct rendering: No" to "direct rendering: Yes". startx resets the permissions to rw-rw----. In both cases "OpenGL renderer string: AMD Radeon HD 6670".

Can anyone on the systemd team look into this issue of granting the needed permissions for ati devices?
I am lost here :| (and cannot test as I only have setups with nvidia and intel)

My father reported the same problem to me with /dev/nvidiactl and /dev/nvidia0 when using nvidia-drivers <= 304.x, while it works with newer drivers :/ It's solved by adding people to the "video" group for those old drivers, while that's not needed with newer ones, but I don't know the reason for this difference :(

Created attachment 370342: systemd-apply-acl-nvidia.patch

This is: https://bugzilla.novell.com/show_bug.cgi?id=808319

They apply the attached patch to fix it for nvidia... for the fglrx driver I am unsure :S, maybe it depends on the version you are running

Not sure if ati-drivers and nvidia-drivers maintainers know a better way to handle this -> I don't know how nvidia upstream expects us to handle this issue and how ATI expects to do the appropriate with their drivers

(In reply to Pacho Ramos from comment #48)
> Not sure if ati-drivers and nvidia-drivers maintainers know a better way to
> handle this -> I don't know how nvidia upstream expects us to handle this
> issue and how ATI expects to do the appropriate with their drivers

All that x11-drivers/nvidia-drivers does with regard to security is set the "video" group through NVreg_DeviceFileGID and NVreg_DeviceFileMode (see files/nvidia-169.07). If systemd disregards that, then it should also reimplement a similar scheme to prevent non-privileged users from accessing the device. Nvidia (upstream) basically do not care about implementing security at all.

But it looks like media-libs/cogl is simply trying to access something it's not permitted to, in which case it should gracefully fall back, I guess.
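
In case it helps with comparing device nodes, the getfacl comparison done earlier in this bug could be scripted along these lines. This is a rough sketch under assumptions: has_user_acl is a hypothetical helper name, and it only looks for a named-user entry in the getfacl text format shown in the outputs above.

```shell
# Succeed (exit 0) if the getfacl output on stdin contains a named-user ACL
# entry for the user given as $1, e.g. "user:testuser:rw-".
# Hypothetical helper for illustration; it does not check the permission bits.
has_user_acl() {
    awk -F: -v u="$1" '
        # Owner entries look like "user::rw-" (empty second field),
        # so only a matching non-empty name counts.
        $1 == "user" && $2 == u { found = 1 }
        END { exit !found }'
}
```

Something like "getfacl /dev/ati/card0 | has_user_acl testuser || echo no per-user ACL" would then show whether logind applied an ACL for the logged-in user, or whether access still depends only on the "video" group.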
The openSUSE author of the patch replied to me with the following:

Pacho Ramos wrote:
> I have found your mail from the patch you are applying on opensuse for:
> https://bugzilla.novell.com/show_bug.cgi?id=808319
>
> Would like to know if you tried to forward it to systemd upstream and
> how did it end since we doubt about how to better handle this on Gentoo
> (either applying your patch or simply telling people to add them to
> video group)

The video group is a bad solution. On openSUSE we try to avoid groups as much as possible. Systemd upstream did not want to solve the problem directly IIRC. They came up with the dead device nodes for udev. In theory that could be used for the nvidia nodes as well. I was also considering to write a dummy kernel module that just pretends to provide the devices. Since our systemd maintainer didn't mind the patch we didn't pursue the other options further though. The current patch is here:
https://build.opensuse.org/package/view_file/Base:System/systemd/apply-ACL-for-nvidia-device-nodes.patch?expand=1

cu
Ludwig

--
 (o_   Ludwig Nussel
 //\
 V_/_  http://www.suse.de/
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nürnberg)

----

He just allowed me to forward the mail ;)

I don't know if there's a bug that needs solving here, but ati-drivers is dead, and will not be supported for X.