Bug 690066 - link_shader and deserialize_glsl_program suddenly consume huge amount of RAM
Summary: link_shader and deserialize_glsl_program suddenly consume huge amount of RAM
Status: RESOLVED UPSTREAM
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Eclasses (show other bugs)
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo X packagers
URL: https://bugs.freedesktop.org/show_bug...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-07-17 16:35 UTC by Plüss Roland
Modified: 2019-09-19 17:28 UTC (History)
CC List: 0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
emerge info (emerge_info,6.90 KB, text/plain)
2019-07-31 00:01 UTC, Plüss Roland
mesa build log (meson-logs.txt,49.84 KB, text/plain)
2019-07-31 00:04 UTC, Plüss Roland
meson logs after fixing llvm problem (meson-logs.txt,63.96 KB, text/plain)
2019-07-31 01:30 UTC, Plüss Roland
ninja-log after fixing llvm problem (ninja_log,40.57 KB, text/plain)
2019-07-31 01:31 UTC, Plüss Roland
bisect report (mesa_bisecting_report,3.92 KB, text/plain)
2019-08-27 17:39 UTC, Plüss Roland

Description Plüss Roland 2019-07-17 16:35:10 UTC
See the upstream bug ticket for logs and a detailed description: https://bugs.freedesktop.org/show_bug.cgi?id=111077

In short:

Since an update a few days ago, an application that barely uses 2G of RAM at full load is very slow to load, and compiling its shaders consumes over 16G of RAM until the app eventually crashes.

I don't know exactly what in the update caused the problem, but Mesa, the amdgpu driver and LLVM were all updated.

I also tried Mesa 19.x, but the problem is the same.

The driver is xf86-video-amdgpu-19.0.1. LLVM is 7.0.x.

I've already deleted the Mesa shader cache and all caches the application creates. I've completely recompiled the system (GenToo) to rule out any leftover build problems. I've also tried running the app as a completely fresh user.

before update (working state):
- media-libs/mesa-18.2.8
- x11-drivers/xf86-video-amdgpu-18.1.0
- x11-libs/libdrm-2.4.96
- sys-devel/llvm-6.0.1
- sys-devel/llvmgold-6
- sys-devel/llvm-common-6.0.1

after update (memory consumption bug present):
- media-libs/mesa-18.3.6 (I also tested media-libs/mesa-19.0.6 and
  media-libs/mesa-19.1.1 with same result)
- x11-drivers/xf86-video-amdgpu-19.0.1
- x11-libs/libdrm-2.4.97
- sys-devel/llvm-7.1.0
- sys-devel/llvmgold-7
- sys-devel/llvm-common-7.1.0
Comment 1 Plüss Roland 2019-07-28 16:02:43 UTC
It seems the problem happens with GenToo only. I could not reproduce it on other Linux distributions. I have no idea how to proceed from here, though. The system is practically unusable for anything related to 3D.
Comment 2 Matt Turner gentoo-dev 2019-07-28 16:49:36 UTC
(In reply to Plüss Roland from comment #1)
> It seems the problem happens with GenToo only. I could not reproduce
> it on other Linux distributions. I have no idea how to proceed from
> here, though. The system is practically unusable for anything related to 3D.

Try to bisect, like we've talked about in the upstream bug. Maybe simplify the process by using llvm-6 so you don't have to apply a local change to each bisect point?
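For readers unfamiliar with the bisection workflow suggested here, below is a minimal sketch of an automated `git bisect run`. It runs on a throwaway repository built on the fly, so it works anywhere. The helper name `find_first_bad` and the simulated "memory=huge" regression are hypothetical. In the real case, the good and bad refs would be the Mesa release tags for 18.2.8 and 18.3.6 from the package lists in the description. The test step would be building Mesa and running the application, which is usually done by hand with `git bisect good` / `git bisect bad` rather than `git bisect run`.

```shell
#!/usr/bin/env bash
# Minimal "git bisect run" sketch on a throwaway repository (hypothetical data).
set -uo pipefail

# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

demo_dir=$(mktemp -d)
git init -q "$demo_dir/repo"
cd "$demo_dir/repo" || exit 1

# Ten commits; commit 6 introduces the simulated regression.
for i in $(seq 1 10); do
    if [ "$i" -ge 6 ]; then echo "memory=huge" > state; else echo "memory=ok" > state; fi
    echo "$i" > counter   # guarantees every commit contains a change
    git add state counter
    git commit -q -m "commit $i"
done

# find_first_bad GOOD BAD CMD...
# CMD must exit 0 on a good commit, 125 if the commit cannot be tested
# (bisect skips it), and any other status from 1 to 127 on a bad commit.
find_first_bad() {
    local good=$1 bad=$2
    shift 2
    git bisect start "$bad" "$good" >/dev/null
    git bisect run "$@" >/dev/null
    git bisect log > "$demo_dir/bisect.log"   # keep the log for the bug report
    git log -1 --format=%s refs/bisect/bad     # subject of the first bad commit
    git bisect reset >/dev/null 2>&1           # return to the original HEAD
}

first_good=$(git rev-list --max-parents=0 HEAD)
first_bad=$(find_first_bad "$first_good" HEAD sh -c 'grep -qx "memory=ok" state')
echo "first bad commit: $first_bad"
```

For Mesa, the manual equivalent would start with `git bisect start mesa-18.3.6 mesa-18.2.8`. Each step is then a build, a test run of the application, and `git bisect good` or `git bisect bad`.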
Comment 3 Plüss Roland 2019-07-30 01:23:52 UTC
I tried emerging llvm-6.0.1, but the Mesa build fails the same way, not finding LLVM. Am I missing something configure-specific?
Comment 4 Plüss Roland 2019-07-30 01:26:31 UTC
I found something I find strange. The Mesa configure step checks for "llvm-config" and does not seem to find it. As my regular user, llvm-config is found in PATH (hashed at /usr/lib/llvm/7/bin/llvm-config), but as root it cannot be found anymore. Chances are this confuses the Mesa build. But why does GenToo do it in such a strange way? eselect has no llvm module, so how does it end up like this?
Comment 5 Matt Turner gentoo-dev 2019-07-30 05:34:25 UTC
Please attach a build log for mesa and your `emerge --info` output.
Comment 6 Plüss Roland 2019-07-31 00:01:20 UTC
Created attachment 585212 [details]
emerge info

emerge info as requested
Comment 7 Plüss Roland 2019-07-31 00:04:24 UTC
Created attachment 585214 [details]
mesa build log

mesa build log as requested
Comment 8 Matt Turner gentoo-dev 2019-07-31 00:08:44 UTC
Strange:

llvm-config found: NO need '>= 3.9.0'
Dependency LLVM found: NO (tried config-tool)

meson.build:1017:2: ERROR: Dependency "llvm" not found, tried config-tool
Comment 9 Plüss Roland 2019-07-31 01:30:36 UTC
I recompiled llvm and still got the missing llvm-config. Then I remembered something and tried "su -" instead of "su", and now llvm-config is found. For game development I have additional include, library and bin paths in my .bashrc. It looks like something goes wonky when these environment variables contain non-root directories.

I tried compiling now with "su -", which gets past LLVM but fails partway through the build. I'll attach the logs.
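Comments 4 and 9 point to PATH as the difference between "su" and "su -", since the slotted llvm-config lives under /usr/lib/llvm/7/bin. Below is a minimal sketch, assuming bash and a Meson release that supports native files. It shows which llvm-config a given PATH resolves to. It also shows how to name llvm-config explicitly in a native file's `[binaries]` section so the build stops depending on PATH. The helper names `llvm_config_for_path` and `write_llvm_native_file` and the file name `llvm7.ini` are hypothetical. The demo uses a fake llvm-config in a temporary directory instead of the real one.

```shell
#!/usr/bin/env bash
# Sketch: see which llvm-config a PATH resolves to, and pin it for Meson.
set -u

# llvm_config_for_path PATHVALUE: print the llvm-config that PATHVALUE resolves
# to, or "not found". env -i starts from an empty environment, so nothing from
# the caller's shell startup files leaks in.
llvm_config_for_path() {
    env -i PATH="$1" /bin/sh -c 'command -v llvm-config || echo "not found"'
}

# write_llvm_native_file FILE LLVM_CONFIG: write a Meson native file that names
# llvm-config explicitly, so the build no longer depends on PATH.
write_llvm_native_file() {
    cat > "$1" <<EOF
[binaries]
llvm-config = '$2'
EOF
}

# Demo with a fake slotted install laid out like /usr/lib/llvm/7/bin.
fake=$(mktemp -d)
mkdir -p "$fake/usr/lib/llvm/7/bin"
printf '#!/bin/sh\necho 7.1.0\n' > "$fake/usr/lib/llvm/7/bin/llvm-config"
chmod +x "$fake/usr/lib/llvm/7/bin/llvm-config"

with_slot=$(llvm_config_for_path "$fake/usr/lib/llvm/7/bin:/usr/bin:/bin")
without_slot=$(llvm_config_for_path "/nonexistent")
echo "with LLVM slot in PATH:    $with_slot"
echo "without LLVM slot in PATH: $without_slot"

write_llvm_native_file "$fake/llvm7.ini" "$fake/usr/lib/llvm/7/bin/llvm-config"

# On a real system (paths are assumptions):
#   llvm_config_for_path "$PATH"      # run once after "su" and once after "su -"
#   write_llvm_native_file llvm7.ini /usr/lib/llvm/7/bin/llvm-config
#   meson setup build --native-file llvm7.ini
```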
Comment 10 Plüss Roland 2019-07-31 01:30:59 UTC
Created attachment 585216 [details]
meson logs after fixing llvm problem
Comment 11 Plüss Roland 2019-07-31 01:31:21 UTC
Created attachment 585218 [details]
ninja-log after fixing llvm problem
Comment 12 Plüss Roland 2019-08-11 08:37:09 UTC
Yesterday I updated GenToo and a few packages received updates, including mesa receiving a "downgrade" to media-libs/mesa-19.0.8::gentoo.

I tested and the bug still persists.
Comment 13 Matt Turner gentoo-dev 2019-08-11 17:00:56 UTC
Sorry, I don't know how to help you anymore if you're not able to bisect.
Comment 14 Plüss Roland 2019-08-11 17:26:30 UTC
Well, I posted the compile logs as requested. Anything you might see in there?
Comment 15 Matt Turner gentoo-dev 2019-08-11 17:38:44 UTC
Not really, and those are not the logs I was requesting -- I meant /var/tmp/portage/media-libs/mesa-*/temp/build.log

If that log shows something interesting then the ones you posted might be useful.

But we should be able to zero in on the problem much more quickly if you can bisect. I'm not sure if there's a reason you're not able to do that.
Comment 16 Plüss Roland 2019-08-11 18:52:21 UTC
I think we are miscommunicating here.

1) All mesa versions available in portage right now can be emerged. They show the bug behaviour (so far on GenToo only). I'm not sure what you want those logs for, but if you insist I can provide them.

2) When I try to bisect-compile Mesa from source using the configure line taken from the portage mesa build, compilation fails. All the available logs (meson, ninja) are attached. If you see something in them, it might answer the question of why it does not compile and hopefully help get bisect-compiling to work.
Comment 17 Plüss Roland 2019-08-16 17:44:27 UTC
It looks like the "drive" to solve this problem does not seem to be very high. So let's try something else. How can I enable verbose building with "meson"? This might help to figure out why the build fails. I tried "--verbose" but "meson" complains not knowing this command line option.
Comment 18 Matt Turner gentoo-dev 2019-08-16 19:00:18 UTC
The bisection fails because it doesn't compile, you say. Is that because of the failure in https://bugs.freedesktop.org/show_bug.cgi?id=111077#c11 that I told you how to work around?

I feel like I've given you enough information to proceed with a bisection, and if you feel like you have responded appropriately then I agree that we must be miscommunicating.

So let's try to clear things up.

(1) Can you manually build Mesa from git?
(1.1) if not, why not?

(2) Can you build Mesa from git from the first bisection point?
(2.1) if not, why not? Please explain what you tried and post a log.

(I have no idea what "ninja-log after fixing llvm problem" is. Please capture the output of the build process with something like "ninja install |& tee log" and then post the "log" file)
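Piping a build through `tee`, as suggested above, makes the pipeline report tee's exit status rather than the build's. Below is a minimal bash sketch that keeps both the log and the real status. The helper name `run_logged` is hypothetical. For the verbose output asked about in comment 17, the usage line adds ninja's own `-v` switch, which prints every full command line as it runs.

```shell
#!/usr/bin/env bash
# Sketch: log a build's output and still return the build's own exit status.
set -u

# run_logged LOGFILE CMD...: run CMD, show stdout and stderr on the terminal,
# copy both into LOGFILE, and return CMD's status instead of tee's.
run_logged() {
    local log=$1
    shift
    "$@" 2>&1 | tee "$log"      # same as "CMD |& tee LOGFILE" in bash
    return "${PIPESTATUS[0]}"   # status of CMD, the first element of the pipeline
}

# Demo with a stand-in for a failing build step.
demo_log=$(mktemp)
run_logged "$demo_log" sh -c 'echo "compiling foo.c"; echo "error: boom" >&2; exit 3'
status=$?
echo "build exited with status $status; log saved to $demo_log"

# In a Mesa build tree (directory name is an assumption):
#   run_logged build.log ninja -C build -v install
```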
Comment 19 Plüss Roland 2019-08-17 07:06:05 UTC
> Is that because of the failure in https://bugs.freedesktop.org/show_bug.cgi?id=111077#c11 that I told you how to work around?
Yes, that's where I'm stuck. But if I modify the sources, I can no longer bisect correctly: the modification falsifies the bisection, which does not help.

> (1) Can you manually build Mesa from git?
no

> (1.1) if not, why not?
build failure as mentioned by you above

> (2) Can you build Mesa from git from the first bisection point?
no

> (2.1) if not, why not? Please explain what you tried and post a log.
see 1.1

> I have no idea what "ninja-log after fixing llvm problem" is.
When building Mesa, a file .ninja_log is created in the build directory. I thought this might help, but it does not seem to contain much useful information beyond the error shown on screen during compilation.
Comment 20 Plüss Roland 2019-08-17 07:26:54 UTC
I posted the compiling result by mistake in the other bug report over at Mesa, my bad.
Comment 21 Matt Turner gentoo-dev 2019-08-24 16:36:12 UTC
> It looks like the "drive" to solve this problem does not seem to be very high

This is what I don't get. You seem a little unhappy with my responsiveness but it's been a week since I told you (in the other bug report) to disable Clover to continue bisecting...

I've Cc'd myself on the upstream bug. I'm happy to continue helping. There's nothing I can do from the Gentoo side, so I'm going to mark this bug as UPSTREAM until we are able to bisect.
Comment 22 Plüss Roland 2019-08-25 18:41:21 UTC
Requesting reopening, since this is now back to being a GenToo problem.

There are two reasons I could not get back to this earlier.

First, I ran into a bit of a scheduling conflict, so I had to resolve something else first.

Second, when I tried to test (meaning install) the Git build, running any application caused Mesa/LLVM to fail horribly during shader compilation. When I rebooted, GenToo greeted me with a black screen, and I had a tricky time getting the portage Mesa back so the system would boot properly again.

So the ball is now back with GenToo. I can compile from Git, but GenToo ends up with a black screen. It seems GenToo prevents a successful bisection, so I am moving this bug back here again.

What options do we have now to continue? I guess something depends on Mesa too tightly, so rolling back like this causes trouble. Using the compiled Mesa without rebooting does not work, and rebooting breaks the entire system.
Comment 23 Matt Turner gentoo-dev 2019-08-26 00:30:43 UTC
It's just "Gentoo" without a capital T.

You don't need to install the bisected Mesa to your system. In fact, I would recommend that you not do this, for exactly this reason -- if it's bad, it'll prevent you from using the system!

Install to some other directory and then run your application with 

LIBGL_DRIVERS_PATH=/path/to/install/lib64/dri LD_LIBRARY_PATH=/path/to/install/lib64
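The suggestion above can be wrapped in a small helper so the system Mesa is never touched. Below is a sketch, assuming bash, a lib64 layout as in the path above, and a hypothetical prefix `$HOME/mesa-test`. The helper name `run_with_mesa` is also hypothetical. LD_LIBRARY_PATH makes the dynamic loader pick the private libGL first, and LIBGL_DRIVERS_PATH points libGL at the private DRI drivers.

```shell
#!/usr/bin/env bash
# Sketch: run one program against a privately installed Mesa.
#
# Build and install steps, not executed here (prefix and options are assumptions):
#   meson setup build --prefix="$HOME/mesa-test" --libdir=lib64
#   ninja -C build install
set -u

# run_with_mesa PREFIX CMD...: run CMD with PREFIX's libraries and DRI drivers
# ahead of the system ones; the rest of the system keeps using the packaged Mesa.
run_with_mesa() {
    local prefix=$1
    shift
    LIBGL_DRIVERS_PATH="$prefix/lib64/dri" \
    LD_LIBRARY_PATH="$prefix/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
        "$@"
}

# Example (needs the install above and a running X session):
#   run_with_mesa "$HOME/mesa-test" glxinfo | grep "OpenGL version string"

# Self-check: show what the child process would see.
run_with_mesa /opt/mesa-test sh -c 'echo "drivers: $LIBGL_DRIVERS_PATH"'
```

Checking the "OpenGL version string" line, as in the glxinfo output of the next comment, confirms which Mesa build is actually loaded.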
Comment 24 Plüss Roland 2019-08-26 19:08:51 UTC
I tried running things that way and I get an LLVM error.

# glxinfo
client glx vendor string: Mesa Project and SGI
OpenGL version string: 2.1 Mesa 18.0.0-rc2 (git-241aeb8eb0)
OpenGL ES profile version string: OpenGL ES 2.0 Mesa 18.0.0-rc2 (git-241aeb8eb0)

# LLVM error
LLVM ERROR: Cannot select: 0x7f5b400ade88: v4i32,ch = load<(dereferenceable invariant load 16 from %ir.17, addrspace 2)> 0x7f5b40034b38, 0x7f5b400ade20, undef:i32
  0x7f5b400ade20: i32 = add 0x7f5b400ad328, Constant:i32<16>
    0x7f5b400ad328: i32,ch = CopyFromReg 0x7f5b40034b38, Register:i32 %17
      0x7f5b400ad2c0: i32 = Register %17
    0x7f5b400addb8: i32 = Constant<16>
  0x7f5b400adc18: i32 = undef
In function: main



Not sure what's going on there. Problem here is that using the GIT reference "origin/18.0" I end up at this revision. I see also an "origin/18.3" reference but I would need 18.2.8 to start the bisecting. Can I find this commit?
Comment 25 Matt Turner gentoo-dev 2019-08-26 21:34:14 UTC
(In reply to Plüss Roland from comment #24)
> Not sure what's going on there. Problem here is that using the GIT reference
> "origin/18.0" I end up at this revision. I see also an "origin/18.3"
> reference but I would need 18.2.8 to start the bisecting. Can I find this
> commit?

Yep, you can just do 'git checkout mesa-18.2.8' directly rather than checking out a stable branch (like "origin/18.0").

Similarly, 'git show <ref>' will show you the commit SHA1. E.g., git show mesa-18.2.8

(git tag will show a list of tagged commits, that correspond to particular versions of Mesa. They're always named "mesa-$version")
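Following the "mesa-$version" naming scheme above, here is a short sketch that checks for a release tag before checking it out. A missing tag, as in the next comment, is then reported plainly instead of surfacing as a pathspec error. The helper name `resolve_release` is hypothetical. The demo creates a throwaway repository with one annotated tag so it can run anywhere.

```shell
#!/usr/bin/env bash
# Sketch: resolve a Mesa release tag (mesa-$version) to its commit SHA1.
set -u
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# resolve_release REPO VERSION: print the commit behind tag mesa-VERSION, or
# explain on stderr and return 1 if that tag does not exist in REPO.
resolve_release() {
    local repo=$1 tag="mesa-$2"
    # ^{commit} peels an annotated tag object down to the commit it points at.
    if git -C "$repo" rev-parse -q --verify "refs/tags/$tag^{commit}"; then
        return 0
    fi
    echo "tag $tag not found; is this a clone of the upstream Mesa repository?" >&2
    return 1
}

# Throwaway repository with one annotated release tag.
demo=$(mktemp -d)
git init -q "$demo"
git -C "$demo" commit -q --allow-empty -m "release commit"
git -C "$demo" tag -a mesa-18.2.8 -m "Mesa 18.2.8"

git -C "$demo" tag -l 'mesa-18.2.*'        # list matching release tags
sha=$(resolve_release "$demo" 18.2.8)
echo "mesa-18.2.8 is commit $sha"

# In a real clone: resolve_release . 18.2.8 && git checkout mesa-18.2.8
```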
Comment 26 Plüss Roland 2019-08-26 22:25:05 UTC
This is strange. Such a remote branch does not seem to exist:

error: pathspec 'mesa-18.2.8' did not match any file(s) known to git

Am I on the wrong GIT URL? https://github.com/evelikov/Mesa.git
Comment 27 Matt Turner gentoo-dev 2019-08-26 22:31:16 UTC
(In reply to Plüss Roland from comment #26)
> This is strange. Such a remote branch does not seem to exist:
> 
> error: pathspec 'mesa-18.2.8' did not match any file(s) known to git
> 
> Am I on the wrong GIT URL? https://github.com/evelikov/Mesa.git

No... how did you find that URL? The right link is

https://gitlab.freedesktop.org/mesa/mesa/

(which is the first result when I search for 'mesa git' on google)
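A clone made from a different mirror, as in comment 26, can be repointed at upstream instead of recloning. Below is a sketch in which the helper name `use_upstream` and the clone path `~/src/mesa` are hypothetical. The demo uses local directories as stand-ins for the fork and for upstream so that it runs without network access.

```shell
#!/usr/bin/env bash
# Sketch: repoint an existing clone's "origin" at upstream and fetch its tags.
set -u
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# use_upstream CLONE URL: set origin to URL, then fetch branches and all tags.
use_upstream() {
    git -C "$1" remote set-url origin "$2"
    git -C "$1" fetch -q --tags origin
}

# Local stand-ins: an "upstream" with the release tag, a "fork" without it.
demo=$(mktemp -d)
git init -q "$demo/upstream"
git -C "$demo/upstream" commit -q --allow-empty -m "upstream release"
git -C "$demo/upstream" tag mesa-18.2.8
git init -q "$demo/fork"
git -C "$demo/fork" commit -q --allow-empty -m "fork commit"
git clone -q "$demo/fork" "$demo/clone"

use_upstream "$demo/clone" "$demo/upstream"
echo "origin is now: $(git -C "$demo/clone" remote get-url origin)"
git -C "$demo/clone" tag -l 'mesa-18.2.*'

# For the real repository (clone path is an assumption):
#   use_upstream ~/src/mesa https://gitlab.freedesktop.org/mesa/mesa.git
```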
Comment 28 Plüss Roland 2019-08-26 23:30:30 UTC
No idea how this came about. I googled for Mesa and Git and ended up on a (most probably) older Mesa page, which redirected somewhere else, and that is how I ended up with that URL. Whatever... the gitlab URL has the 18.2.8 tag. I can compile it, and it indeed does not show the memory bug. But it's late at night now. I'll tackle this tomorrow using 18.2.8 as the good bisection point.
Comment 29 Plüss Roland 2019-08-27 17:38:57 UTC
I managed to do a proper bisection now. I attached a report consisting of the git bisect result and the bisect log leading up to it.
Comment 30 Plüss Roland 2019-08-27 17:39:23 UTC
Created attachment 588276 [details]
bisect report
Comment 31 Matt Turner gentoo-dev 2019-08-27 19:54:24 UTC
Excellent!

Now, let's see if reverting that commit on 19.0.8 will produce a working Mesa.

> git checkout mesa-19.0.8
> git revert --no-edit 9176703788c66de8287c6224650b1ff8d4238126

Then build and test like when you are bisecting.
Comment 32 Matt Turner gentoo-dev 2019-08-27 19:56:03 UTC
(In reply to Matt Turner from comment #31)
> Excellent!
> 
> Now, let's see if reverting that commit on 19.0.8 will produce a working
> Mesa.
> 
> > git checkout mesa-19.0.8

One more thing. Right here, when you're on 19.0.8 and before you revert the commit, it's probably a good idea to double check that Mesa is broken. So, compile and test at this point and confirm that it doesn't work.

> > git revert --no-edit 9176703788c66de8287c6224650b1ff8d4238126
> 
> Then build and test like when you are bisecting.

Then here, after the revert, your testing will tell you (if this commit is the culprit) that it is definitely at fault.
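The two checks described above can be scripted so that they produce a single verdict. Below is a sketch on a throwaway repository. The helper name `confirm_culprit`, the tag placement, the `grep` test and the script name `./build-and-run-app.sh` are hypothetical stand-ins. In the real case, the test step is building Mesa and running the application.

```shell
#!/usr/bin/env bash
# Sketch: confirm a suspect commit by testing a release before and after reverting it.
set -u
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# confirm_culprit REPO RELEASE SUSPECT TEST_CMD...
# TEST_CMD exits 0 when the build is good. Prints a verdict; returns 0 only if
# the release fails TEST_CMD and passes once SUSPECT is reverted on top of it.
confirm_culprit() {
    local repo=$1 release=$2 suspect=$3
    shift 3
    git -C "$repo" checkout -q "$release" || return 2
    if (cd "$repo" && "$@"); then
        echo "release already passes; nothing to confirm"
        return 1
    fi
    git -C "$repo" revert --no-edit "$suspect" >/dev/null || return 2
    if (cd "$repo" && "$@"); then
        echo "culprit confirmed"
        return 0
    fi
    echo "still failing after the revert; the suspect is not the whole story"
    return 1
}

# Throwaway history: good base, the suspect change, an unrelated change, release tag.
demo=$(mktemp -d)
git init -q "$demo"
cd "$demo" || exit 1
echo "memory=ok" > state;   git add state; git commit -q -m "good base"
echo "memory=huge" > state; git commit -q -am "suspect change"
suspect=$(git rev-parse HEAD)
echo "unrelated" > other;   git add other; git commit -q -m "unrelated change"
git tag mesa-19.0.8

verdict=$(confirm_culprit "$demo" mesa-19.0.8 "$suspect" grep -qx "memory=ok" state)
echo "$verdict"

# Real case (clone path and test script are assumptions):
#   confirm_culprit ~/src/mesa mesa-19.0.8 \
#       9176703788c66de8287c6224650b1ff8d4238126 ./build-and-run-app.sh
```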
Comment 33 Plüss Roland 2019-08-28 16:36:49 UTC
1) Check whether mesa-19.0.8 fails: yes, the bug is present.
2) Check whether it works after reverting the commit: yes, the bug is gone after the revert.
Comment 34 Matt Turner gentoo-dev 2019-08-28 18:32:45 UTC
Great. I'll post on the FDO bug and see if we can get some progress there.
Comment 35 Matt Turner gentoo-dev 2019-09-02 15:43:11 UTC
Are you not seeing the FDO bug emails? Marek (the AMD developer) needs info on how to reproduce the bug.
Comment 36 Plüss Roland 2019-09-02 17:24:20 UTC
Sorry, I was only following this bug here.
Comment 37 Plüss Roland 2019-09-19 17:28:50 UTC
I want to leave a final comment here for people who stumble across this ticket later via a search. The issue is not solved. If you have a similar issue, please add to the upstream bug ticket.