Bug 949016 - dev-lang/ruby-3.2.6-r3: Fails to compile with lib/cgi/util.rb:93: [BUG] Segmentation fault at 0xfffffffffffffff8
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Gentoo Ruby Team
Reported: 2025-01-29 16:55 UTC by bajcsielias78
Modified: 2025-02-03 11:06 UTC

Attachments
Build log file (build.txt, 91.85 KB, text/plain), 2025-01-29 16:55 UTC, bajcsielias78
emerge --info (file_949016.txt, 10.75 KB, text/plain), 2025-01-29 16:56 UTC, bajcsielias78
stack trace (file_949016.txt, 11.74 KB, text/plain), 2025-01-29 21:28 UTC, bajcsielias78
Valgrind stack trace (valgrind.txt, 136.26 KB, text/plain), 2025-01-29 21:47 UTC, bajcsielias78
new build log with valgrand (file_949016.txt, 302.35 KB, text/plain), 2025-01-29 22:39 UTC, bajcsielias78
Valrgind environment (file_949016.txt, 1.48 KB, text/plain), 2025-01-29 22:41 UTC, bajcsielias78
Valgrind new (file_949016.txt, 23.86 KB, text/plain), 2025-01-29 22:47 UTC, bajcsielias78
build noommitfp (file_949016.txt, 105.56 KB, text/plain), 2025-01-29 22:57 UTC, bajcsielias78
valgrind idk :) (file_949016.txt, 21.71 KB, text/plain), 2025-01-29 22:59 UTC, bajcsielias78

Description bajcsielias78 2025-01-29 16:55:03 UTC
Created attachment 917852 [details]
Build log file

Ruby no longer compiles; the build crashes at ruby-3.2.6/lib/cgi/util.rb:93: [BUG] Segmentation fault at 0xfffffffffffffff8
Comment 1 bajcsielias78 2025-01-29 16:56:40 UTC
Created attachment 917853 [details]
emerge --info
Comment 2 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-01-29 17:32:22 UTC
Did this version build for you before? If so, when? Could you perhaps give qlop -v output between whenever it last built and now, so we can see what packages got updated?
Comment 3 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-01-29 17:32:36 UTC
Also, how about with distcc off?
Comment 4 bajcsielias78 2025-01-29 18:49:07 UTC
> Did this version build for you before?
The r3 version, no, just plain ruby-3.2.6.

> Also, how about with distcc off?
Guess what? After trying to update ruby several times (the last time it built was about 1 month ago or so), it finally did, but not before syncing the edgets overlay which I only added yesterday and had to sync it 3 times already... makes sense /s.

Just to clarify, I didn't disable distcc, and ruby and its deps come from ::gentoo.

So perhaps there's an invalid memory access, like a race condition, since it seems to be compiled by multiple threads using its own build system or something. (Not a Ruby expert, btw.)
Comment 5 Mike Gilbert gentoo-dev 2025-01-29 19:18:50 UTC
Maybe flaky hardware? A memory test might turn something up.
Comment 6 bajcsielias78 2025-01-29 20:39:48 UTC
(In reply to Mike Gilbert from comment #5)
> Maybe flaky hardware? A memory test might turn something up.

Nope, just did a full memtest and it passed. Although I should run multiple tests to be 100% conclusive, the RAM sticks are still pretty new and robust.
Comment 7 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-01-29 20:42:02 UTC
If you have 32GB of RAM, you can't do a thorough enough test in that timeframe ;)
Comment 8 bajcsielias78 2025-01-29 20:48:29 UTC
I also saw this report https://bugs.gentoo.org/932849 but since it had different filenames when the segfault appeared, I thought (and still partially do) that this is a different issue.

But at the same time, if there is a race condition, perhaps they are the same issues, no matter which filename it uses.

I don't know how to interpret it.
Comment 9 bajcsielias78 2025-01-29 20:48:53 UTC
(In reply to Sam James from comment #7)
> If you have 32GB of RAM, you can't do a thorough enough test in that
> timeframe ;)

One can only do so much :)
Comment 10 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-01-29 21:12:01 UTC
I generally recommend running it overnight.
Comment 11 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-01-29 21:14:08 UTC
You can also try running the failing command under Valgrind inside the build directory, hopefully it was:
./miniruby -I./lib -I. -I.ext/common  ./tool/generic_erb.rb -o builtin_binary.inc \
	./template/builtin_binary.inc.tmpl -- --cross=no
Comment 12 bajcsielias78 2025-01-29 21:27:08 UTC
Had to restart in order to run memtest, so the temp dirs got deleted.

But I started building ruby again, and not long after that, it crashed with the same filename:

/var/tmp/portage/dev-lang/ruby-3.2.6-r2/work/ruby-3.2.6/lib/cgi/util.rb:93: [BUG] Segmentation fault at 0xfffffffffffffff8

I ran the command you provided; the output is in the attachment:
Comment 13 bajcsielias78 2025-01-29 21:28:16 UTC
Created attachment 917883 [details]
stack trace
Comment 14 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-01-29 21:28:54 UTC
(In reply to bajcsielias78 from comment #13)
> Created attachment 917883 [details]
> stack trace

Nice. Can you run it again under valgrind? (just prefix the command w/ 'valgrind', so valgrind ./miniruby ...)
Comment 15 bajcsielias78 2025-01-29 21:47:32 UTC
> Nice. Can you run it again under valgrind? (just prefix the command w/
> 'valgrind', so valgrind ./miniruby ...)

My bad, I didn't know what valgrind was until literally 3 minutes ago.
Comment 16 bajcsielias78 2025-01-29 21:47:58 UTC
Created attachment 917886 [details]
Valgrind stack trace
Comment 17 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-01-29 21:59:35 UTC
Gah. The garbage collection stuff is either noise or a real problem that is far beyond my ability to help.

Can you try again with USE=valgrind on Ruby, and also debugging symbols enabled (see https://wiki.gentoo.org/wiki/Debugging#Per-package)?

USE=valgrind on Ruby should mean it has suppressions for the GC noise.
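For anyone following along, a minimal sketch of that per-package setup. The file names (`debugsyms`, `ruby-debug`, `ruby-valgrind`) and the `PORTAGE_ETC` variable are arbitrary choices made for this illustration, and it assumes package.env and package.use are directories. The flag values are roughly what the linked wiki page suggests, so check it for the current recommendation:

```shell
# PORTAGE_ETC is a hypothetical variable for this sketch: it defaults to a scratch
# directory so the snippet is harmless; set it to /etc/portage (as root) to apply.
PORTAGE_ETC="${PORTAGE_ETC:-./portage-sketch}"
mkdir -p "$PORTAGE_ETC/env" "$PORTAGE_ETC/package.env" "$PORTAGE_ETC/package.use"

# Extra debug info and unstripped binaries for packages that opt in.
cat > "$PORTAGE_ETC/env/debugsyms" <<'EOF'
CFLAGS="${CFLAGS} -ggdb3"
CXXFLAGS="${CXXFLAGS} -ggdb3"
FEATURES="${FEATURES} splitdebug compressdebug -nostrip"
EOF

# Opt dev-lang/ruby into that environment and enable its valgrind USE flag.
echo 'dev-lang/ruby debugsyms' > "$PORTAGE_ETC/package.env/ruby-debug"
echo 'dev-lang/ruby valgrind' > "$PORTAGE_ETC/package.use/ruby-valgrind"
```

Once the files are in the real /etc/portage, rebuilding Ruby should pick them up.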
Comment 18 bajcsielias78 2025-01-29 22:39:44 UTC
Created attachment 917889 [details]
new build log with valgrand
Comment 19 bajcsielias78 2025-01-29 22:41:10 UTC
Created attachment 917890 [details]
Valrgind environment
Comment 20 bajcsielias78 2025-01-29 22:42:07 UTC
Okay, done. Do I have to strip any debug symbols with gdb by any chance, or is this enough?
Comment 21 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-01-29 22:44:01 UTC
What you've done is enough, but it looks like we need to run another command under Valgrind, as that last log looks fine.


make[2]: Entering directory '/var/tmp/portage/dev-lang/ruby-3.2.6-r2/work/ruby-3.2.6/ext/rbconfig/sizeof'
../../../miniruby -I'../../..' -I'../../.././lib' -I'../../../.ext/x86_64-linux' -I'../../../.ext/common' ../../.././tool/generic_erb.rb --output=sizes.c \
	../../.././template/sizes.c.tmpl \
	../../.././configure.ac \
	../../.././ext/rbconfig/sizeof/extconf.rb

Try that one? So..

cd /var/tmp/portage/dev-lang/ruby-3.2.6-r2/work/ruby-3.2.6/ext/rbconfig/sizeof
valgrind ../../../miniruby -I'../../..' -I'../../.././lib' -I'../../../.ext/x86_64-linux' -I'../../../.ext/common' ../../.././tool/generic_erb.rb --output=sizes.c \
	../../.././template/sizes.c.tmpl \
	../../.././configure.ac \
	../../.././ext/rbconfig/sizeof/extconf.rb

But it's really weird that it's now a different file and also the error is:

> /var/tmp/portage/dev-lang/ruby-3.2.6-r2/work/ruby-3.2.6/.ext/x86_64-linux/cgi/escape.so: [BUG] Illegal instruction at 0x000055bd7c9b1e40

I'm afraid I do have to say again that there's a real chance of this ending up being a hardware problem. We do see it every so often (in fact, just last week someone built a new PC, and it turned out to be a HW problem). But let's see.
Comment 22 bajcsielias78 2025-01-29 22:47:31 UTC
Created attachment 917891 [details]
Valgrind new
Comment 23 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-01-29 22:50:08 UTC
```
==32703== 
==32703== Warning: client switching stacks?  SP change: 0x1ffe8020d0 --> 0x1fff0000f0
==32703==          to suppress, use: --max-stackframe=8380448 or greater
vex amd64->IR: unhandled instruction bytes: 0xF3 0x48 0xF 0xAE 0xEE 0x48 0x2D 0xFF 0x0 0x0
vex amd64->IR:   REX=1 REX.W=1 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=1
==32703== valgrind: Unrecognised instruction at address 0x1a5e40.
==32703==    at 0x1A5E40: rb_ec_tag_jump (eval_intern.h:162)
==32703==    by 0x1ABC17: rb_longjmp (eval.c:664)
==32703==    by 0x1ABDB3: rb_exc_exception (eval.c:677)
==32703==    by 0x1ABDD8: rb_exc_raise (eval.c:690)
==32703==    by 0x1A1CD4: raise_loaderror (error.c:3165)
==32703==    by 0x1A48CC: rb_loaderror (error.c:3177)
==32703==    by 0x1293C7: dln_load (dmydln.c:7)
==32703==    by 0x32C2C2: rb_vm_call_cfunc (vm.c:2679)
==32703==    by 0x1FFAF8: require_internal (load.c:1223)
==32703==    by 0x1FFD0C: rb_require_string_internal (load.c:1316)
==32703==    by 0x200322: rb_require_string (load.c:1309)
==32703==    by 0x32060B: vm_call_cfunc_with_frame (vm_insnhelper.c:3287)
==32703== Your program just tried to execute an instruction that Valgrind
==32703== did not recognise.  There are two possible reasons for this.
==32703== 1. Your program has a bug and erroneously jumped to a non-code
==32703==    location.  If you are running Memcheck and you just saw a
==32703==    warning about a bad jump, it's probably your program's fault.
==32703== 2. The instruction is legitimate but Valgrind doesn't handle it,
==32703==    i.e. it's Valgrind's fault.  If you think this is the case or
==32703==    you are not sure, please let us know and we'll try to fix it.
==32703== Either way, Valgrind will now raise a SIGILL signal which will
==32703== probably kill your program.
/var/tmp/portage/dev-lang/ruby-3.2.6-r2/work/ruby-3.2.6/.ext/x86_64-linux/cgi/escape.so: [BUG] Illegal instruction at 0x00000000001a5e40
ruby 3.2.6 (2024-10-30 revision 63aeb018eb) [x86_64-linux]
```

The illegal instruction under Valgrind is different from the thing I pasted above. I suspect it jumps to garbage (and then Valgrind sees it's garbage/unknown and dies because it can't decode it). The longjmp is suspicious.

I find it interesting that this is so reproducible for you. Tomorrow I'll try your *FLAGS and see if I can hit it.

One thing for you to try: can you try -fno-omit-frame-pointer in *FLAGS too?
Comment 24 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-01-29 22:53:19 UTC
Notably, it does:
> checking for setjmp type... __builtin_setjmp

EXTRA_ECONF="--with-setjmp-type=setjmp" may or may not help.

It might be completely unrelated to this issue I've found though, just can't ignore that the crash has longjmp and then gibberish (which is a not-uncommon bug).
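To check which setjmp type configure picked in a given build log, the colour escape sequences need stripping before the line is easy to match. A rough sketch: `detect_setjmp_type` is a helper name made up for this example, and the sample line fed to it only mimics the configure output quoted in this comment:

```shell
# Hypothetical helper: reads a build log on stdin and prints the setjmp type
# that configure reported (the last occurrence, if there are several).
detect_setjmp_type() {
    esc=$(printf '\033')
    # First drop ANSI colour sequences, then keep only the value after the prompt.
    sed "s/${esc}\[[0-9;]*m//g" \
        | sed -n 's/^checking for setjmp type\.\.\. //p' \
        | tail -n 1
}

# Demo with a synthetic line; in practice: detect_setjmp_type < build.log
printf 'checking for setjmp type... \033[33;1m__builtin_setjmp\033[m\n' | detect_setjmp_type
```

If this prints `setjmp` after rebuilding with the EXTRA_ECONF override, the override took effect.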
Comment 25 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-01-29 22:54:57 UTC
(In reply to Sam James from comment #24)
> Notably, it does:
> > checking for setjmp type... __builtin_setjmp
> 
> EXTRA_ECONF="--with-setjmp-type=setjmp" may or may not help.
> 
> It might be completely unrelated to this issue I've found though, just can't
> ignore that the crash has longjmp and then gibberish (which is a
> not-uncommon bug).

Ah! In https://bugzilla.redhat.com/show_bug.cgi?id=1545239#c46, Jakub explains it wasn't really specific to arm64 and was just luck, which sort of explains the bit I was worried about.
Comment 26 bajcsielias78 2025-01-29 22:57:20 UTC
Created attachment 917892 [details]
build noommitfp
Comment 27 bajcsielias78 2025-01-29 22:59:50 UTC
Created attachment 917893 [details]
valgrind idk :)
Comment 28 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-01-29 23:01:08 UTC
```
==18014== Invalid write of size 8
==18014==    at 0x33BEEC: vm_exec_handle_exception (vm.c:2583)
==18014==    by 0x33BEEC: rb_vm_exec (vm.c:2381)
==18014==  Address 0xfffffffffffffff8 is not stack'd, malloc'd or (recently) free'd
```

is absolutely where we go wrong, the question is why it even ends up handling an exception to begin with.

Try: EXTRA_ECONF="--with-setjmp-type=setjmp" emerge -v1 ...
Comment 29 bajcsielias78 2025-01-29 23:01:53 UTC
(In reply to bajcsielias78 from comment #27)
> Created attachment 917893 [details]
> valgrind idk :)

This one's w/o the EXTRA_ECONF feature btw
Comment 30 bajcsielias78 2025-01-29 23:28:08 UTC
> Try: EXTRA_ECONF="--with-setjmp-type=setjmp" emerge -v1 ...

I added this and built successfully 5 times in a row. I'd call it the fix.
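To keep the override across rebuilds without passing it on the command line each time, it can live in a per-package environment file. A sketch, assuming package.env is a directory; the file name `ruby-setjmp` and the `PORTAGE_ETC` variable are made up for this example:

```shell
# PORTAGE_ETC is a hypothetical variable for this sketch: it defaults to a scratch
# directory; set it to /etc/portage (as root) to apply the change for real.
PORTAGE_ETC="${PORTAGE_ETC:-./portage-sketch}"
mkdir -p "$PORTAGE_ETC/env" "$PORTAGE_ETC/package.env"

# Environment file carrying the configure override that was tested in this bug.
printf '%s\n' 'EXTRA_ECONF="--with-setjmp-type=setjmp"' > "$PORTAGE_ETC/env/ruby-setjmp"

# Apply that environment to dev-lang/ruby only.
printf '%s\n' 'dev-lang/ruby ruby-setjmp' > "$PORTAGE_ETC/package.env/ruby-setjmp"
```

This should only matter for ebuild revisions that do not already set the option themselves.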
Comment 31 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-01-29 23:31:44 UTC
Excellent! Thanks for persevering.
Comment 32 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-01-29 23:33:56 UTC
While at it, we should also change 'filter-flags -fomit-frame-pointer' to 'append-flags -fno-omit-frame-pointer', or get rid of it: given that -fomit-frame-pointer is implied by -O* on various arches, the filter is doing nothing right now.
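A toy illustration of why the filter is a no-op in that situation. `filter_flag` and `append_flag` are simplified stand-ins written for this example, not the real flag-o-matic functions, and the starting CFLAGS value is assumed:

```shell
# Simplified stand-ins for flag-o-matic's filter-flags/append-flags (hypothetical
# helper names; the real eclass functions handle more variables and patterns).
filter_flag() {
    new=""
    for f in $CFLAGS; do
        [ "$f" = "$1" ] || new="${new:+$new }$f"
    done
    CFLAGS="$new"
}
append_flag() {
    CFLAGS="${CFLAGS:+$CFLAGS }$1"
}

# Assumed user flags: -fomit-frame-pointer is never spelled out, -O2 implies it.
CFLAGS="-O2 -pipe"

# Nothing to remove, so the compiler still omits the frame pointer.
filter_flag -fomit-frame-pointer
echo "after filter: $CFLAGS"

# Only the explicit negative flag overrides what -O2 turns on.
append_flag -fno-omit-frame-pointer
echo "after append: $CFLAGS"
```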
Comment 33 bajcsielias78 2025-01-29 23:46:43 UTC
(In reply to Sam James from comment #31)
> Excellent! Thanks for persevering.

No problem. It's actually the first time I've had fun talking with the "support chat".

Wish you well!
- Elias
Comment 34 Larry the Git Cow gentoo-dev 2025-01-30 00:10:49 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=950f851501f6dd30c32054048ca3b4af5dcda591

commit 950f851501f6dd30c32054048ca3b4af5dcda591
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2025-01-30 00:08:07 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2025-01-30 00:08:07 +0000

    dev-lang/ruby: disable dangerous __builtin_setjmp; fixup FP filtering
    
    * Disable dangerous __builtin_setjmp. As discussed in the bug, it
      really shouldn't be used pretty much ever - rather setjmp should be used.
    
      Ruby upstream are already disabling it for arm64 and others have pointed
      out that it should be done for all arches, but that hasn't happened yet.
    
      Anyway, a user hit the crash, so let's make the change on our end.
    
    * Fix -fno-omit-frame-pointer filtering. For quite some time, -O* on various
      arches already implies -fomit-frame-pointer, hence filtering -fomit-frame-pointer
      by itself isn't sufficient. Add an explicit `append-flags -fno-omit-frame-pointer`
      to get the desired effect. We can drop it entirely if desired but I'm not
      confident in doing that at this point.
    
    Closes: https://bugs.gentoo.org/949016
    Signed-off-by: Sam James <sam@gentoo.org>

 dev-lang/ruby/ruby-3.1.6-r3.ebuild | 289 +++++++++++++++++++++++++++++++++
 dev-lang/ruby/ruby-3.2.6-r4.ebuild | 295 ++++++++++++++++++++++++++++++++++
 dev-lang/ruby/ruby-3.3.7-r1.ebuild | 302 +++++++++++++++++++++++++++++++++++
 dev-lang/ruby/ruby-3.4.1-r1.ebuild | 316 +++++++++++++++++++++++++++++++++++++
 4 files changed, 1202 insertions(+)