See the upstream bug ticket for logs and a detailed description: https://bugs.freedesktop.org/show_bug.cgi?id=111077
Since an update a few days ago, an application which barely consumes 2G of RAM at full load is very slow to load, and compiling shaders causes over 16G of RAM to be consumed by the time the app eventually crashes.
I don't know what exactly in the update caused problems but certainly Mesa, the amdgpu driver and LLVM did get updates.
I also tried using Mesa 19.x but the problem is the same.
Driver is xf86-video-amdgpu-19.0.1 . LLVM is 7.0.x .
I've already deleted the mesa shader cache and all caches the application creates. I've totally recompiled the system (GenToo) to make sure no strange problems can be around. I've also tried with a completely fresh user to run the app.
before update (working state):
after update (memory consumption bug present):
- media-libs/mesa-18.3.6 (I also tested media-libs/mesa-19.0.6 and media-libs/mesa-19.1.1 with the same result)
It seems the problem happens with GenToo only. I could not recreate this problem on other Linux distributions. No idea though how to progress from here. The system is practically unusable for anything related to 3D.
(In reply to Plüss Roland from comment #1)
> It seems the problem happens with GenToo only. I could not recreate this
> problem on other Linux distributions. No idea though how to progress from
> here. The system is practically unusable for anything related to 3D.
Try to bisect, like we've talked about in the upstream bug. Maybe simplify the process by using llvm-6 so you don't have to apply a local change to each bisect point?
I tried emerging llvm-6.0.1 but mesa fails the same way not finding LLVM. Am I missing something configure specific?
I figured out something which I find strange. The mesa config seems to check for "llvm-config" and seems not to find it. When I use my regular user I can find llvm-config in the path (hashed at /usr/lib/llvm/7/bin/llvm-config). If I'm root, though, llvm-config cannot be found anymore. Chances are this confuses the mesa build. But why does GenToo do it so strangely? eselect has no llvm module, so how come it is like this?
Please attach a build log for mesa and your `emerge --info` output.
Created attachment 585212 [details]
emerge info as requested
Created attachment 585214 [details]
mesa build log
mesa build log as requested
llvm-config found: NO need '>= 3.9.0'
Dependency LLVM found: NO (tried config-tool)
meson.build:1017:2: ERROR: Dependency "llvm" not found, tried config-tool
I recompiled llvm and still got the missing llvm-config. Then I remembered something and tried using "su -" instead of "su". Now llvm-config is found. For game development I've got additional include, library and bin paths in my .bashrc. Looks like some code gets wonky if paths to non-root directories show up inside one of these env variables.
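One way to see the difference described above is to ask each shell where it resolves the tool from. This is only a minimal sketch and not something taken from the thread: `find_tool` is a hypothetical helper name, and it relies on nothing but the POSIX `command -v` builtin.

```shell
# Report where a tool resolves from in the current PATH,
# or say that it cannot be found (and return non-zero).
find_tool() {
    tool="$1"
    if path=$(command -v "$tool" 2>/dev/null); then
        printf '%s found at %s\n' "$tool" "$path"
    else
        printf '%s not found in PATH\n' "$tool"
        return 1
    fi
}

# Example usage: run once after "su" and once after "su -",
# then compare the two results.
#   find_tool llvm-config
#   echo "$PATH"
```

If the two shells print different results, the environments differ in PATH, which would be consistent with meson reporting "llvm-config found: NO" in one of them.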
I tried compiling now with "su -", which gets past LLVM but fails to compile midway through. I'll append the logs.
Created attachment 585216 [details]
meson logs after fixing llvm problem
Created attachment 585218 [details]
ninja-log after fixing llvm problem
Yesterday I updated GenToo and a few packages received updates including mesa receiving a "downgrade" to media-libs/mesa-19.0.8::gentoo .
I tested and the bug still persists.
Sorry, I don't know how to help you anymore if you're not able to bisect.
Well, I posted the compile logs as requested. Anything you might see in there?
Not really, and those are not the logs I was requesting -- I meant /var/tmp/portage/media-libs/mesa-*/temp/build.log
If that log shows something interesting then the ones you posted might be useful.
But we should be able to zero in on the problem much more quickly if you can bisect. I'm not sure if there's a reason you're not able to do that.
I think we are miscommunicating here.
1) All mesa versions available in portage right now can be emerged. These show the bug behaviour (so far on GenToo only). I'm not sure what you want those logs for but if you insist I can give them.
2) When trying to bisect-compile Mesa from sources using the configure line obtained from portage-mesa, compilation fails. All the logs available (meson, ninja) are attached. If you see something in them, it might answer your question of "why it does not compile" and hopefully help to get bisect-compile to work.
It looks like the "drive" to solve this problem does not seem to be very high. So let's try something else. How can I enable verbose building with "meson"? This might help to figure out why the build fails. I tried "--verbose" but "meson" complains that it does not know this command line option.
The bisection fails because it doesn't compile, you say. Is that because of the failure in https://bugs.freedesktop.org/show_bug.cgi?id=111077#c11 that I told you how to work around?
I feel like I've given you enough information to proceed with a bisection, and if you feel like you have responded appropriately then I agree that we must be miscommunicating.
So let's try to clear things up.
(1) Can you manually build Mesa from git?
(1.1) if not, why not?
(2) Can you build Mesa from git from the first bisection point?
(2.1) if not, why not? Please explain what you tried and post a log.
(I have no idea what "ninja-log after fixing llvm problem" is. Please capture the output of the build process with something like "ninja install |& tee log" and then post the "log" file)
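The capture suggested above uses `|&`, which is bash shorthand for `2>&1 |`. A portable sketch of the same idea follows. `run_logged` is a hypothetical helper, and the `build` directory in the usage line is an assumption about where meson was configured.

```shell
# Run a command, echo its output to the terminal, and save
# both stdout and stderr to the given log file.
run_logged() {
    log="$1"
    shift
    "$@" 2>&1 | tee "$log"
}

# Example usage (assumes a meson build directory named "build"):
#   run_logged log ninja -C build install
```

Note that the exit status of this pipeline is that of `tee`, so the log file itself is the place to look for the build error.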
> Is that because of the failure in https://bugs.freedesktop.org/show_bug.cgi?id=111077#c11 that I told you how to work around?
Yes, that's where I'm stuck. But if I modify the sources I cannot bisect correctly anymore; it falsifies the bisecting, which is of no help.
> (1) Can you manually build Mesa from git?
> (1.1) if not, why not?
build failure as mentioned by you above
> (2) Can you build Mesa from git from the first bisection point?
> (2.1) if not, why not? Please explain what you tried and post a log.
> I have no idea what "ninja-log after fixing llvm problem" is.
When building Mesa, a file .ninja_log is created in the build directory. I thought this might be of help, but it seems not to contain much useful information beyond the error shown on screen during compiling.
I posted the compiling result by mistake in the other bug report over at Mesa, my bad.
> It looks like the "drive" to solve this problem does not seem to be very high
This is what I don't get. You seem a little unhappy with my responsiveness but it's been a week since I told you (in the other bug report) to disable Clover to continue bisecting...
I've Cc'd myself on the upstream bug. I'm happy to continue helping. There's nothing I can do from the Gentoo side, so I'm going to mark this bug as UPSTREAM until we are able to bisect.
Request reopening since it's now back to a GenToo problem.
The reason I could not get back earlier are two-fold.
First I got in a bit of time-conflict so I had to resolve something else first.
Second, when I tried to test (meaning install) the Git build, running any application caused Mesa/LLVM to fail horribly during shader compilation. When I rebooted, GenToo greeted me with a black screen and I had a tricky time getting the portage Mesa back so the system would boot properly again.
So the ball is now back at GenToo. I can compile from GIT but GenToo runs into a black screen. It seems GenToo prevents a successful bisection, so I am moving this bug back here again.
What options do we have now to continue? I guess something depends on Mesa too hard so rolling back like this causes troubles. Using compiled mesa without rebooting is not working and rebooting kills the entire system.
It's just "Gentoo" without a capital T.
You don't need to install the bisected Mesa to your system. In fact, I would recommend that you not do this, for exactly this reason -- if it's bad, it'll prevent you from using the system!
Install to some other directory and then run your application with
I tried running things that way and I get an LLVM error.
client glx vendor string: Mesa Project and SGI
OpenGL version string: 2.1 Mesa 18.0.0-rc2 (git-241aeb8eb0)
OpenGL ES profile version string: OpenGL ES 2.0 Mesa 18.0.0-rc2 (git-241aeb8eb0)
# LLVM error
LLVM ERROR: Cannot select: 0x7f5b400ade88: v4i32,ch = load<(dereferenceable invariant load 16 from %ir.17, addrspace 2)> 0x7f5b40034b38, 0x7f5b400ade20, undef:i32
0x7f5b400ade20: i32 = add 0x7f5b400ad328, Constant:i32<16>
0x7f5b400ad328: i32,ch = CopyFromReg 0x7f5b40034b38, Register:i32 %17
0x7f5b400ad2c0: i32 = Register %17
0x7f5b400addb8: i32 = Constant<16>
0x7f5b400adc18: i32 = undef
In function: main
Not sure what's going on there. Problem here is that using the GIT reference "origin/18.0" I end up at this revision. I see also an "origin/18.3" reference but I would need 18.2.8 to start the bisecting. Can I find this commit?
(In reply to Plüss Roland from comment #24)
> Not sure what's going on there. Problem here is that using the GIT reference
> "origin/18.0" I end up at this revision. I see also an "origin/18.3"
> reference but I would need 18.2.8 to start the bisecting. Can I find this
> commit?
Yep, you can just do 'git checkout mesa-18.2.8' directly rather than checking out a stable branch (like "origin/18.0").
Similarly, 'git show <ref>' will show you the commit SHA1. E.g., git show mesa-18.2.8
(git tag will show a list of tagged commits, that correspond to particular versions of Mesa. They're always named "mesa-$version")
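Put together, the tag lookup described above might look like the following sketch. `resolve_tag` is a hypothetical helper; the git subcommands it uses (`rev-parse --verify`, `tag -l`) are standard. The clone location in the usage lines is an assumption.

```shell
# Print the commit SHA-1 that a release tag points at, or fail
# with a message if the tag does not exist in this clone.
resolve_tag() {
    repo="$1"
    tag="$2"
    if ! git -C "$repo" rev-parse --quiet --verify "refs/tags/$tag^{commit}"; then
        echo "tag $tag not found in $repo" >&2
        return 1
    fi
}

# Example usage (assumes a Mesa clone in ./mesa):
#   git -C mesa tag -l 'mesa-18.2.*'   # list matching release tags
#   resolve_tag mesa mesa-18.2.8       # print the commit it points at
```

A "not found" result for a tag that should exist is a hint that the clone comes from a different repository than expected; `git -C mesa remote get-url origin` shows which URL it was cloned from.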
This is strange. Such a remote branch does not seem to exist:
error: pathspec 'mesa-18.2.8' did not match any file(s) known to git
Am I on the wrong GIT URL? https://github.com/evelikov/Mesa.git
(In reply to Plüss Roland from comment #26)
> This is strange. Such a remote branch does not seem to exist:
> error: pathspec 'mesa-18.2.8' did not match any file(s) known to git
> Am I on the wrong GIT URL? https://github.com/evelikov/Mesa.git
No... how did you find that URL? The right link is
(which is the first result when I search for 'mesa git' on google)
No idea how this came about. I googled for Mesa and GIT and ended up on a (most probably) older page of Mesa which redirected somewhere else and this had been the URL I ended up with. Whatever... that URL has the 18.2.8 branch. I can compile this one and it does indeed not show the memory bug. But now it's late at night. I'll tackle this tomorrow using the 18.2.8 branch as good bisection point.
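For reference, starting a bisection between those two points can be sketched as below. `start_bisect` is a hypothetical helper; the tag names follow the "mesa-$version" scheme mentioned earlier, with 18.2.8 as the good end and 18.3.6 (the version reported as broken) as the bad end. Each step still has to be built and tested by hand.

```shell
# Begin a bisection in the given clone between a known-good
# and a known-bad revision (tags, branches or SHA-1s).
start_bisect() {
    repo="$1"
    good="$2"
    bad="$3"
    git -C "$repo" bisect start "$bad" "$good"
}

# Example usage (assumes a Mesa clone in ./mesa):
#   start_bisect mesa mesa-18.2.8 mesa-18.3.6
# Then, after building and testing each commit git checks out:
#   git -C mesa bisect good   # bug not present
#   git -C mesa bisect bad    # bug present
# When git names the first bad commit:
#   git -C mesa bisect log    # record of the steps taken
#   git -C mesa bisect reset  # return to the original checkout
```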
Managed to do a proper bisection now. Attached is the report, consisting of the git bisect finding and the bisect log leading up to that point.
Created attachment 588276 [details]
Now, let's see if reverting that commit on 19.0.8 will produce a working Mesa.
> git checkout mesa-19.0.8
> git revert --no-edit 9176703788c66de8287c6224650b1ff8d4238126
Then build and test like when you are bisecting.
(In reply to Matt Turner from comment #31)
> Now, let's see if reverting that commit on 19.0.8 will produce a working
> Mesa.
> > git checkout mesa-19.0.8
One more thing. Right here, when you're on 19.0.8 and before you revert the commit, it's probably a good idea to double check that Mesa is broken. So, compile and test at this point and confirm that it doesn't work.
> > git revert --no-edit 9176703788c66de8287c6224650b1ff8d4238126
> Then build and test like when you are bisecting.
Then here, after the revert, your testing will tell you (if this commit is the culprit) that it is definitely at fault.
1) Check whether mesa-19.0.8 fails: Yes, the bug is present.
2) Check that it works after reverting the commit: Yes, the bug is no longer present after reverting the commit.
Great. I'll post on the FDO bug and see if we can get some progress there.
Are you not seeing the FDO bug emails? Marek (the AMD developer) needs info on how to reproduce the bug.
Sorry, I was looking on this list here.
I want to make a final comment here for people stumbling across this ticket at a later time by googling. The issue is not solved. If you have a similar issue, please add to the upstream bug ticket.