This comes from https://github.com/gentoo/gentoo/pull/41130. Filing this for completeness and for future reference. commonmarker-2.1.1 has been rewritten in Rust (RIIR'd), and its tests segfault when it is built with RUSTONIG_SYSTEM_LIBONIG=1. I'm going to recreate all my comments from GH here.
First go:
```
/var/tmp/portage/dev-ruby/commonmarker-2.1.1/work/ruby32/commonmarker-2.1.1 $ gdb --args ruby -Ilib:test:. -e 'Dir["test/*_test.rb"].each {|f| require f}'
...............
Thread 1 "ruby" received signal SIGSEGV, Segmentation fault.
0x00007ffff760ab01 in main_arena () from /usr/lib64/libc.so.6
(gdb) bt
#0  0x00007ffff760ab01 in main_arena () from /usr/lib64/libc.so.6
#1  0x0000000000000001 in ?? ()
#2  0x0000000000000001 in ?? ()
#3  0x0000000000000004 in ?? ()
#4  0x0000000000000001 in ?? ()
#5  0x0000000000000003 in ?? ()
#6  0x00007fffdbd8bbc0 in OnigEncodingGB18030 () from /usr/lib64/libonig.so.5
#7  0x0000000000000001 in ?? ()
#8  0x00007fffdba431b6 in core::core_arch::x86::m128iExt::as_i8x16 () from /var/tmp/portage/dev-ruby/commonmarker-2.1.1/work/ruby32/commonmarker-2.1.1/lib/commonmarker/commonmarker.so
#9  0x00005555558fcb10 in ?? ()
#10 0x0000000000000004 in ?? ()
#11 0x00007fffdb9acd91 in <Q as hashbrown::Equivalent<K>>::equivalent () from /var/tmp/portage/dev-ruby/commonmarker-2.1.1/work/ruby32/commonmarker-2.1.1/lib/commonmarker/commonmarker.so
#12 0x00007fffdb9b5a44 in hashbrown::map::equivalent_key::{{closure}} () from /var/tmp/portage/dev-ruby/commonmarker-2.1.1/work/ruby32/commonmarker-2.1.1/lib/commonmarker/commonmarker.so
#13 0x00007fffdb979647 in hashbrown::raw::RawTable<T,A>::find::{{closure}} () from /var/tmp/portage/dev-ruby/commonmarker-2.1.1/work/ruby32/commonmarker-2.1.1/lib/commonmarker/commonmarker.so
#14 0x00007fffdb979381 in hashbrown::raw::RawTable<T,A>::find () from /var/tmp/portage/dev-ruby/commonmarker-2.1.1/work/ruby32/commonmarker-2.1.1/lib/commonmarker/commonmarker.so
#15 0x00007fffdb9b64d6 in hashbrown::map::HashMap<K,V,S,A>::get_inner () from /var/tmp/portage/dev-ruby/commonmarker-2.1.1/work/ruby32/commonmarker-2.1.1/lib/commonmarker/commonmarker.so
#16 0x00007fffdb979eca in std::collections::hash::map::HashMap<K,V,S>::get () from /var/tmp/portage/dev-ruby/commonmarker-2.1.1/work/ruby32/commonmarker-2.1.1/lib/commonmarker/commonmarker.so
#17 0x00007fffdb98cd5d in syntect::parsing::scope::ScopeRepository::atom_to_index () from /var/tmp/portage/dev-ruby/commonmarker-2.1.1/work/ruby32/commonmarker-2.1.1/lib/commonmarker/commonmarker.so
#18 0x0000000000000000 in ?? ()
```
Rebuilding with more debug symbols.
OK, reproduced it manually: ``` ~/bugs/commonmarker $ valgrind --vgdb-error=1 --suppressions=/tmp/supp ruby34 --disable-jit -Ilib:test:. -e 'Dir["test/*_test.rb"].each {|f| require f}' ==2680191== Memcheck, a memory error detector ==2680191== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al. ==2680191== Using Valgrind-3.25.0.GIT and LibVEX; rerun with -h for copyright info ==2680191== Command: ruby34 --disable-jit -Ilib:test:. -e Dir["test/*_test.rb"].each\ {|f|\ require\ f} ==2680191== ==2680191== ==2680191== TO DEBUG THIS PROCESS USING GDB: start GDB like this ==2680191== /path/to/gdb ruby34 ==2680191== and then give GDB the following command ==2680191== target remote | /usr/libexec/valgrind/../../bin/vgdb --pid=2680191 ==2680191== --pid is optional if only one valgrind process is running ==2680191== ==2680191== Warning: set address range perms: large range [0x6e7e000, 0x1ee7e000) (defined) Run options: --seed 48819 # Running: .................................................==2680191== Use of uninitialised value of size 8 ==2680191== at 0x22753C7D: match_at (regexec.c:3069) ==2680191== by 0x227590D6: search_in_range (regexec.c:5713) ==2680191== by 0x2275A58E: onig_search_with_param (regexec.c:5839) ==2680191== by 0x224422F6: onig::Regex::search_with_param (lib.rs:723) ==2680191== by 0x223F1804: syntect::parsing::regex::regex_impl::Regex::search (regex.rs:174) ==2680191== by 0x223D7750: syntect::parsing::regex::Regex::search (regex.rs:64) ==2680191== by 0x22419015: syntect::parsing::parser::ParseState::search (parser.rs:449) ==2680191== by 0x22418850: syntect::parsing::parser::ParseState::find_best_match (parser.rs:374) ==2680191== by 0x224172FA: syntect::parsing::parser::ParseState::parse_next_token (parser.rs:274) ==2680191== by 0x22416F5B: syntect::parsing::parser::ParseState::parse_line (parser.rs:240) ==2680191== by 0x223C2101: syntect::easy::HighlightLines::highlight_line (easy.rs:68) ==2680191== by 0x2231115C: 
comrak::plugins::syntect::SyntectAdapter::highlight_html (syntect.rs:46) ==2680191== ==2680191== (action on error) vgdb me ... ``` Then w/ (v)gdb: ``` #0 0x0000000022753c7d in match_at (reg=reg@entry=0x2199f300, str=str@entry=0x240f5580 "puts 'wow'\nd", end=end@entry=0x240f558b "d", in_right_range=in_right_range@entry=0x240f558b "d", sstart=sstart@entry=0x240f5580 "puts 'wow'\nd", msa=msa@entry=0x1ffeff7540) at /usr/src/debug/dev-libs/oniguruma-6.9.10/onig-6.9.10/src/regexec.c:3069 #1 0x00000000227590d7 in search_in_range (reg=0x2199f300, str=0x240f5580 "puts 'wow'\nd", end=0x240f558b "d", start=<optimized out>, range=0x240f558b "d", data_range=0x240f558b "d", region=0x1ffeff90e0, option=0, mp=0x21461e40) at /usr/src/debug/dev-libs/oniguruma-6.9.10/onig-6.9.10/src/regexec.c:5713 #2 0x000000002275a58f in onig_search_with_param (reg=<optimized out>, str=<optimized out>, end=<optimized out>, start=<optimized out>, range=<optimized out>, region=<optimized out>, option=0, mp=0x21461e40) at /usr/src/debug/dev-libs/oniguruma-6.9.10/onig-6.9.10/src/regexec.c:5839 #3 0x00000000224422f7 in onig::Regex::search_with_param<&str> (self=0x231fd840, chars="puts 'wow'\n", from=0, to=11, options=..., region=..., match_param=...) at src/lib.rs:723 #4 0x00000000223f1805 in syntect::parsing::regex::regex_impl::Regex::search (self=0x231fd840, text="puts 'wow'\n", begin=0, end=11, region=...) at src/parsing/regex.rs:174 #5 0x00000000223d7751 in syntect::parsing::regex::Regex::search (self=0x231fd820, text="puts 'wow'\n", begin=0, end=11, region=...) 
at src/parsing/regex.rs:64 #6 0x0000000022419016 in syntect::parsing::parser::ParseState::search (self=0x1ffeff9660, line="puts 'wow'\n", start=0, match_pat=0x231fd800, captures=..., search_cache=0x1ffeff9100, regions=0x1ffeff90e0) at src/parsing/parser.rs:449 #7 0x0000000022418851 in syntect::parsing::parser::ParseState::find_best_match (self=0x1ffeff9660, line="puts 'wow'\n", start=0, syntax_set=0x1ffeffd8b0, search_cache=0x1ffeff9100, regions=0x1ffeff90e0, check_pop_loop=false) at src/parsing/parser.rs:374 #8 0x00000000224172fb in syntect::parsing::parser::ParseState::parse_next_token (self=0x1ffeff9660, line="puts 'wow'\n", syntax_set=0x1ffeffd8b0, start=0x1ffeff8fe0, search_cache=0x1ffeff9100, regions=0x1ffeff90e0, non_consuming_push_at=0x1ffeff9120, ops=0x1ffeff8fe8) at src/parsing/parser.rs:274 #9 0x0000000022416f5c in syntect::parsing::parser::ParseState::parse_line (self=0x1ffeff9660, line="puts 'wow'\n", syntax_set=0x1ffeffd8b0) at src/parsing/parser.rs:240 #10 0x00000000223c2102 in syntect::easy::HighlightLines::highlight_line (self=0x1ffeff9628, line="puts 'wow'\n", syntax_set=0x1ffeffd8b0) at src/easy.rs:68 #11 0x000000002231115d in comrak::plugins::syntect::SyntectAdapter::highlight_html (self=0x1ffeffd8b0, code="puts 'wow'\n", syntax=0x235fff10) at src/plugins/syntect.rs:46 #12 0x0000000022311563 in comrak::plugins::syntect::{impl#1}::write_highlighted (self=0x1ffeffd8b0, output=..., lang=..., code="puts 'wow'\n") at src/plugins/syntect.rs:94 #13 0x00000000222de262 in comrak::html::HtmlFormatter::format_node (self=0x1ffeffd418, node=0x220e88b0, entering=true) at src/html.rs:635 #14 0x00000000222da6f9 in comrak::html::HtmlFormatter::format (self=0x1ffeffd418, node=0x1ffeffdaa8, plain=false) at src/html.rs:403 #15 0x00000000222d6a53 in comrak::html::format_document_with_plugins (root=0x1ffeffdaa8, options=0x1ffeffd648, output=..., plugins=0x1ffeffd890) at src/html.rs:40 #16 0x0000000022236288 in commonmarker::node::CommonmarkerNode::to_html 
(self=0x218c3cf0, args=&[magnus::value::Value](size=1) = {...}) at ext/commonmarker/src/node.rs:1028 #17 0x0000000022290a33 in core::ops::function::Fn::call<fn(&commonmarker::node::CommonmarkerNode, &[magnus::value::Value]) -> core::result::Result<alloc::string::String, magnus::error::Error>, (&commonmarker::node::CommonmarkerNode, &[magnus::value::Value])> () at /usr/lib/rust/1.85.1/lib/rustlib/src/rust/library/core/src/ops/function.rs:79 #18 0x0000000022226ee2 in magnus::method::MethodCAry::call_convert_value<fn(&commonmarker::node::CommonmarkerNode, &[magnus::value::Value]) -> core::result::Result<alloc::string::String, magnus::error::Error>, &commonmarker::node::CommonmarkerNode, core::result::Result<alloc::string::String, magnus::error::Error>> (self=0x53983d000, argc=1, argv=0x5398ac8, rb_self=...) at /home/sam/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/magnus-0.7.1/src/method.rs:601 #19 0x000000002228ec33 in magnus::method::MethodCAry::call_handle_error::{closure#0}<fn(&commonmarker::node::CommonmarkerNode, &[magnus::value::Value]) -> core::result::Result<alloc::string::String, magnus::error::Error>, &commonmarker::node::CommonmarkerNode, core::result::Result<alloc::string::String, magnus::error::Error>> () at /home/sam/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/magnus-0.7.1/src/method.rs:607 #20 0x00000000222a6f30 in core::panic::unwind_safe::{impl#23}::call_once<core::result::Result<magnus::value::Value, magnus::error::Error>, magnus::method::MethodCAry::call_handle_error::{closure_env#0}<fn(&commonmarker::node::CommonmarkerNode, &[magnus::value::Value]) -> core::result::Result<alloc::string::String, magnus::error::Error>, &commonmarker::node::CommonmarkerNode, core::result::Result<alloc::string::String, magnus::error::Error>>> (self=...) 
at /usr/lib/rust/1.85.1/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:272 #21 0x000000002225ffd0 in std::panicking::try::do_call<core::panic::unwind_safe::AssertUnwindSafe<magnus::method::MethodCAry::call_handle_error::{closure_env#0}<fn(&commonmarker::node::CommonmarkerNode, &[magnus::value::Value]) -> core::result::Result<alloc::string::String, magnus::error::Error>, &commonmarker::node::CommonmarkerNode, core::result::Result<alloc::string::String, magnus::error::Error>>>, core::result::Result<magnus::value::Value, magnus::error::Error>> (data=0x1ffeffe110) at /usr/lib/rust/1.85.1/lib/rustlib/src/rust/library/std/src/panicking.rs:584 #22 0x00000000222a6a0b in __rust_try () from /home/sam/bugs/commonmarker/lib/commonmarker/commonmarker.so #23 0x00000000222a321d in std::panicking::try<core::result::Result<magnus::value::Value, magnus::error::Error>, core::panic::unwind_safe::AssertUnwindSafe<magnus::method::MethodCAry::call_handle_error::{closure_env#0}<fn(&commonmarker::node::CommonmarkerNode, &[magnus::value::Value]) -> core::result::Result<alloc::string::String, magnus::error::Error>, &commonmarker::node::CommonmarkerNode, core::result::Result<alloc::string::String, magnus::error::Error>>>> (f=...) at /usr/lib/rust/1.85.1/lib/rustlib/src/rust/library/std/src/panicking.rs:547 #24 std::panic::catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<magnus::method::MethodCAry::call_handle_error::{closure_env#0}<fn(&commonmarker::node::CommonmarkerNode, &[magnus::value::Value]) -> core::result::Result<alloc::string::String, magnus::error::Error>, &commonmarker::node::CommonmarkerNode, core::result::Result<alloc::string::String, magnus::error::Error>>>, core::result::Result<magnus::value::Value, magnus::error::Error>> (f=...) 
at /usr/lib/rust/1.85.1/lib/rustlib/src/rust/library/std/src/panic.rs:358 #25 0x0000000022226bd4 in magnus::method::MethodCAry::call_handle_error<fn(&commonmarker::node::CommonmarkerNode, &[magnus::value::Value]) -> core::result::Result<alloc::string::String, magnus::error::Error>, &commonmarker::node::CommonmarkerNode, core::result::Result<alloc::string::String, magnus::error::Error>> (self=0x1ffeffe30000, argc=1, argv=0x5398ac8, rb_self=...) at /home/sam/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/magnus-0.7.1/src/method.rs:606 #26 0x0000000022239297 in commonmarker::node::init::anon (argc=1, argv=0x5398ac8, rb_self=...) at /home/sam/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/magnus-0.7.1/src/method.rs:850 #27 0x0000000004bfe637 in vm_call_cfunc_with_frame_ (ec=0x53983d0, reg_cfp=0x5498278, calling=<optimized out>, argc=<optimized out>, argv=0x5398ac8, stack_bottom=<optimized out>) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/vm_insnhelper.c:3801 #28 0x0000000004bfd51f in vm_sendish.constprop.0 (ec=<optimized out>, reg_cfp=<optimized out>, cd=<optimized out>, block_handler=<optimized out>, method_explorer=mexp_search_method) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/vm_insnhelper.c:5968 #29 0x0000000004c09daf in vm_exec_core (ec=0x227ce200 <FinishCode.1>) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/insns.def:898 #30 0x0000000004c1f23c in vm_exec_loop (ec=<optimized out>, state=<optimized out>, tag=<optimized out>, result=<optimized out>) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/vm.c:2622 #31 rb_vm_exec (ec=0x53983d0) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/vm.c:2598 #32 0x0000000004c0ffc7 in vm_yield_with_cref (ec=<optimized out>, argc=1, argv=0x1ffeffe758, kw_splat=0, cref=0x0, is_lambda=0) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/vm.c:1676 #33 vm_yield (ec=<optimized out>, argc=1, argv=0x1ffeffe758, kw_splat=0) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/vm.c:1684 #34 rb_yield_0 (argc=1, 
argv=0x1ffeffe758) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/vm_eval.c:1344 #35 rb_yield (val=<optimized out>) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/vm_eval.c:1360 #36 0x00000000048ce8cd in rb_ary_each (ary=679977560) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/array.c:2641 #37 0x0000000004bfe637 in vm_call_cfunc_with_frame_ (ec=0x53983d0, reg_cfp=0x54984e0, calling=<optimized out>, argc=<optimized out>, argv=0x5398900, stack_bottom=<optimized out>) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/vm_insnhelper.c:3801 #38 0x0000000004bfd51f in vm_sendish.constprop.0 (ec=<optimized out>, reg_cfp=<optimized out>, cd=<optimized out>, block_handler=<optimized out>, method_explorer=mexp_search_method) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/vm_insnhelper.c:5968 #39 0x0000000004c0a79b in vm_exec_core (ec=0x227ce200 <FinishCode.1>) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/insns.def:851 #40 0x0000000004c1f23c in vm_exec_loop (ec=<optimized out>, state=<optimized out>, tag=<optimized out>, result=<optimized out>) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/vm.c:2622 #41 rb_vm_exec (ec=0x53983d0) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/vm.c:2598 #42 0x0000000004c0ffc7 in vm_yield_with_cref (ec=<optimized out>, argc=1, argv=0x1ffeffebd8, kw_splat=0, cref=0x0, is_lambda=0) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/vm.c:1676 #43 vm_yield (ec=<optimized out>, argc=1, argv=0x1ffeffebd8, kw_splat=0) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/vm.c:1684 #44 rb_yield_0 (argc=1, argv=0x1ffeffebd8) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/vm_eval.c:1344 #45 rb_yield (val=<optimized out>) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/vm_eval.c:1360 #46 0x00000000048cec6c in rb_ary_collect (ary=599776640) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/array.c:3645 #47 0x0000000004bfe637 in vm_call_cfunc_with_frame_ (ec=0x53983d0, reg_cfp=0x5498630, calling=<optimized out>, argc=<optimized out>, 
argv=0x53987e0, stack_bottom=<optimized out>) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/vm_insnhelper.c:3801 #48 0x0000000004bfd51f in vm_sendish.constprop.0 (ec=<optimized out>, reg_cfp=<optimized out>, cd=<optimized out>, block_handler=<optimized out>, method_explorer=mexp_search_method) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/vm_insnhelper.c:5968 #49 0x0000000004c0a79b in vm_exec_core (ec=0x227ce200 <FinishCode.1>) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/insns.def:851 #50 0x0000000004c1f23c in vm_exec_loop (ec=<optimized out>, state=<optimized out>, tag=<optimized out>, result=<optimized out>) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/vm.c:2622 #51 rb_vm_exec (ec=0x53983d0) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/vm.c:2598 #52 0x00000000049afa4e in rb_vm_invoke_proc (ec=<optimized out>, proc=<optimized out>, argc=<optimized out>, argv=<optimized out>, kw_splat=0, passed_block_handler=0) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/vm.c:1770 #53 rb_proc_call_kw (self=<optimized out>, args=<optimized out>, kw_splat=0) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/proc.c:988 #54 rb_proc_call (self=568030360, args=<optimized out>) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/proc.c:998 #55 rb_call_end_proc (data=568030360) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/eval_jump.c:13 #56 0x00000000049b51ce in exec_end_procs_chain (procs=0x4f77d88 <end_procs.lto_priv>, errp=0x5398440) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/eval_jump.c:105 #57 rb_ec_exec_end_proc (ec=ec@entry=0x53983d0) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/eval_jump.c:121 #58 0x00000000049b72b4 in rb_ec_teardown (ec=0x53983d0) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/eval.c:155 #59 0x00000000049b8882 in rb_ec_cleanup (ec=0x53983d0, ex=RUBY_TAG_NONE) at /usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/eval.c:207 #60 0x00000000049b94f0 in ruby_run_node (n=<optimized out>) at 
/usr/src/debug/dev-lang/ruby-3.4.2/ruby-3.4.2/eval.c:319 #61 0x00000000001083f1 in rb_main (argc=5, argv=0x1ffefff6d8) at ./main.c:43 #62 main (argc=<optimized out>, argv=<optimized out>) at ./main.c:68 ```
I don't get where the uninitialised use is supposed to be. Poking at each of the variables in `frame 0`, they all look fine. The only funky thing is `end`, where Valgrind's `monitor` can't tell me anything useful:
```
(gdb) monitor xb 0x240f558b 8
              __   __   __   __   __   __   __   __
0x240F558B: 0x?? 0x?? 0x?? 0x?? 0x?? 0x?? 0x?? 0x??
Address 0x240F558B len 8 has 8 bytes unaddressable
(gdb) p 0x240f558b
$30 = 604984715
(gdb) p *0x240f558b
$31 = 100
(gdb) p (char*)0x240f558b
$32 = 0x240f558b "d"
```
Maybe it's `msa`?
```
(gdb) p msa
$33 = (MatchArg *) 0x1ffeff7540
(gdb) p *msa
$34 = {
  stack_p = 0x0,
  stack_n = 11,
  options = 0,
  region = 0x1ffeff90e0,
  ptr_num = 2,
  start = 0x240f5580 "puts 'wow'\nd",
  match_stack_limit = 0,
  retry_limit_in_match = 10000000,
  retry_limit_in_search = 0,
  retry_limit_in_search_counter = 0,
  mp = 0x21461e40,
  best_len = -1,
  best_s = 0xb <error: Cannot access memory at address 0xb>,
  subexp_call_in_search_counter = 0,
  skip_search = 0x240f5580 "puts 'wow'\nd"
}
(gdb) p sizeof(*msa)
$36 = 112
(gdb) monitor xb 0x1ffeff7540 112
                00   00   00   00   00   00   00   00
0x1FFEFF7540: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                ff   ff   ff   ff   00   00   00   00
0x1FFEFF7548: 0x0b 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                00   00   00   00   00   00   00   00
0x1FFEFF7550: 0xe0 0x90 0xff 0xfe 0x1f 0x00 0x00 0x00
                00   00   00   00   ff   ff   ff   ff
0x1FFEFF7558: 0x02 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                00   00   00   00   00   00   00   00
0x1FFEFF7560: 0x80 0x55 0x0f 0x24 0x00 0x00 0x00 0x00
                00   00   00   00   ff   ff   ff   ff
0x1FFEFF7568: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                00   00   00   00   00   00   00   00
0x1FFEFF7570: 0x80 0x96 0x98 0x00 0x00 0x00 0x00 0x00
                00   00   00   00   00   00   00   00
0x1FFEFF7578: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                00   00   00   00   00   00   00   00
0x1FFEFF7580: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                00   00   00   00   00   00   00   00
0x1FFEFF7588: 0x40 0x1e 0x46 0x21 0x00 0x00 0x00 0x00
                00   00   00   00   ff   ff   ff   ff
0x1FFEFF7590: 0xff 0xff 0xff 0xff 0x00 0x00 0x00 0x00
                ff   ff   ff   ff   ff   ff   ff   ff
0x1FFEFF7598: 0x0b 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                00   00   00   00   00   00   00   00
0x1FFEFF75A0: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                00   00   00   00   00   00   00   00
0x1FFEFF75A8: 0x80 0x55 0x0f 0x24 0x00 0x00 0x00 0x00
(gdb)
```
Carrying on from there: ``` ==2680191== Continuing ... ==2680191== Jump to the invalid address stated on the next line ==2680191== at 0x10000E003B0001: ??? ==2680191== by 0x227590D6: search_in_range (regexec.c:5713) ==2680191== by 0x2275A58E: onig_search_with_param (regexec.c:5839) ==2680191== by 0x224422F6: onig::Regex::search_with_param (lib.rs:723) ==2680191== by 0x223F1804: syntect::parsing::regex::regex_impl::Regex::search (regex.rs:174) ==2680191== by 0x223D7750: syntect::parsing::regex::Regex::search (regex.rs:64) ==2680191== by 0x22419015: syntect::parsing::parser::ParseState::search (parser.rs:449) ==2680191== by 0x22418850: syntect::parsing::parser::ParseState::find_best_match (parser.rs:374) ==2680191== by 0x224172FA: syntect::parsing::parser::ParseState::parse_next_token (parser.rs:274) ==2680191== by 0x22416F5B: syntect::parsing::parser::ParseState::parse_line (parser.rs:240) ==2680191== by 0x223C2101: syntect::easy::HighlightLines::highlight_line (easy.rs:68) ==2680191== by 0x2231115C: comrak::plugins::syntect::SyntectAdapter::highlight_html (syntect.rs:46) ==2680191== Address 0x10000e003b0001 is not stack'd, malloc'd or (recently) free'd ==2680191== ==2680191== (action on error) vgdb me ... ``` ``` Thread 1 received signal SIGTRAP, Trace/breakpoint trap. 0x0010000e003b0001 in ?? () (gdb) bt #0 0x0010000e003b0001 in ?? () #1 0x0000001ffeff6564 in ?? () #2 0x0000000000000004 in ?? () #3 0x0000000000000003 in ?? () #4 0x00000000227ce200 in RetryLimitInMatch () from /usr/lib64/libonig.so.5 #5 0x0000001ffeff6564 in ?? () #6 0x0000000000000004 in ?? () #7 0x00000000247caad3 in ?? () #8 0x0000001ffeff65f7 in ?? () #9 0x0000000000000001 in ?? () #10 0x0000000000000001 in ?? () #11 0x0000000000000001 in ?? () #12 0x0000000000000001 in ?? () #13 0x0000000000000001 in ?? () #14 0x0000000000000001 in ?? () #15 0x0000000000000001 in ?? () #16 0x00000000224b6476 in core::slice::index::{impl#4}::get_unchecked_mut<u8> (slice=...) 
at /usr/lib/rust/1.85.1/lib/rustlib/src/rust/library/core/src/ub_checks.rs:75 #17 core::slice::index::{impl#7}::get_unchecked_mut<u8> (slice=...) at /usr/lib/rust/1.85.1/lib/rustlib/src/rust/library/core/src/slice/index.rs:555 #18 core::slice::index::{impl#7}::index_mut<u8> (self=..., slice=&mut [u8](size=1) = {...}) at /usr/lib/rust/1.85.1/lib/rustlib/src/rust/library/core/src/slice/index.rs:573 #19 0x00000000224b9831 in core::slice::index::{impl#1}::index_mut<u8, core::ops::range::RangeFrom<usize>> (self=&mut [u8](size=576437034) = {...}, index=...) at /usr/lib/rust/1.85.1/lib/rustlib/src/rust/library/core/src/slice/index.rs:27 #20 miniz_oxide::inflate::stream::push_dict_out (state=0x247c7e90, next_out=0x1ffeff6150) at src/inflate/stream.rs:371 #21 0x00000000224b9131 in miniz_oxide::inflate::stream::inflate (state=0x247c7e90, input=&[u8](size=0), output=&mut [u8](size=1) = {...}, flush=miniz_oxide::MZFlush::Finish) at src/inflate/stream.rs:272 #22 0x00000000224a9ad4 in flate2::ffi::rust::{impl#2}::decompress (self=0x1ffeff8cf8, input=&[u8](size=0), output=&mut [u8](size=1) = {...}, flush=flate2::mem::FlushDecompress::Finish) at src/ffi/rust.rs:72 #23 0x00000000224c32d2 in core::slice::index::{impl#4}::get_unchecked<u8> (slice=&[u8](size=0)) at /usr/lib/rust/1.85.1/lib/rustlib/src/rust/library/core/src/ub_checks.rs:75 #24 core::slice::index::{impl#7}::get_unchecked<u8> (slice=&[u8](size=0)) at /usr/lib/rust/1.85.1/lib/rustlib/src/rust/library/core/src/slice/index.rs:549 #25 core::slice::index::{impl#7}::index<u8> (self=..., slice=&[u8](size=0)) at /usr/lib/rust/1.85.1/lib/rustlib/src/rust/library/core/src/slice/index.rs:564 #26 0x000000002241c241 in core::slice::index::{impl#0}::index<u8, core::ops::range::RangeFrom<usize>> (self=&[u8](size=0), index=...) 
at /usr/lib/rust/1.85.1/lib/rustlib/src/rust/library/core/src/slice/index.rs:16 #27 std::io::impls::{impl#9}::consume (self=0x1ffeff8ce8, amt=0) at /usr/lib/rust/1.85.1/lib/rustlib/src/rust/library/std/src/io/impls.rs:352 #28 0x00000000223fa6e9 in flate2::zio::read<&[u8], flate2::mem::Decompress> (obj=0x1, data=0x1, dst=&mut [u8](size=575366262) = {...}) at /home/sam/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/flate2-1.0.35/src/zio.rs:139 #29 0x0000000000000001 in ?? () #30 0x0000001ffeff65f7 in ?? () #31 0x0000000000000001 in ?? () #32 0x0000000000000000 in ?? () ``` It jumps from: ``` while (1 == 1) { MATCH_AND_RETURN_CHECK(data_range); # <-- here if (s >= range) break; s += enclen(reg->enc, s); ```
```
(gdb) frame 0
#0  0x0000000022762ed0 in match_at (reg=0x20e1e730, str=0x5726340 "puts \"hello\"\nllo\"\n", end=0x572634d "llo\"\n", in_right_range=0x572634d "llo\"\n", sstart=0x5726340 "puts \"hello\"\nllo\"\n", msa=0x1ffeff7720) at /usr/src/debug/dev-libs/oniguruma-6.9.10/onig-6.9.10/src/regexec.c:3078
3078	  BYTECODE_INTERPRETER_START {
(gdb) p p
$29 = (Operation *) 0x20e5f3c0
(gdb) p p.opaddr
$30 = (const void *) 0x300000001
(gdb) p opcode_to_label[95]
$42 = (const void *) 0x500000001
```

```
#ifdef USE_DIRECT_THREADED_CODE
  if (IS_NULL(msa)) {
    for (i = 0; i < reg->ops_used; i++) {
      const void* addr;
      addr = opcode_to_label[reg->ocs[i]];
      p->opaddr = addr;
      p++;
    }
    return ONIG_NORMAL;
  }
#endif
```

On another run:
```
(gdb) frame 0
#0  0x0000000022812ed0 in match_at (reg=0x222825a0, str=0x2223b470 "puts \"hello\"\nllo\"\n#\"", end=0x2223b47d "llo\"\n#\"", in_right_range=0x2223b47d "llo\"\n#\"", sstart=0x2223b470 "puts \"hello\"\nllo\"\n#\"", msa=0x1ffeff4be0) at /usr/src/debug/dev-libs/oniguruma-6.9.10/onig-6.9.10/src/regexec.c:3078
3078	  BYTECODE_INTERPRETER_START {
(gdb) macro exp BYTECODE_INTERPRETER_START
expands to: goto *(p->opaddr);
$58 = (Operation *) 0x2368da20
(gdb) p p.opaddr
$59 = (const void *) 0x1
```
Operation is at https://github.com/kkos/oniguruma/blob/master/src/regint.h#L739:
```
(gdb) p &p
$66 = (Operation **) 0x1ffeff4928
(gdb) monitor xb 0x1ffeff4928 21
                00   00   00   00   00   00   00   00
0x1FFEFF4928: 0x20 0xda 0x68 0x23 0x00 0x00 0x00 0x00
                00   00   00   00   00   00   00   00
0x1FFEFF4930: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                ff   ff   ff   ff   ff
0x1FFEFF4938: 0x02 0x00 0x00 0x00 0x00
```
but it's a union so it's not immediately obvious how much of that is a problem. I still don't really see where the uninitialised read is.
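To make the failure mode concrete, here is a minimal sketch of direct-threaded dispatch, assuming GCC or Clang (it relies on the "labels as values" extension). It is not oniguruma's code: `DemoOp`, `demo_run` and the three opcodes are hypothetical names invented for illustration. It only mirrors the shape seen above, where a setup pass stores label addresses into `opaddr` and the interpreter then does `goto *(p->opaddr)` with no validation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a direct-threaded interpreter (not oniguruma). */
enum { OP_INC, OP_DOUBLE, OP_END, OP_COUNT };

typedef struct {
  const void *opaddr; /* label address the dispatcher jumps to */
  int opcode;         /* original opcode, only used by the setup pass */
} DemoOp;

/* setup != 0: translate opcodes into label addresses and return 0.
 * setup == 0: execute the ops and return the accumulator. */
static int demo_run(DemoOp *ops, size_t n, int setup)
{
  static const void *const opcode_to_label[OP_COUNT] = {
    &&L_INC, &&L_DOUBLE, &&L_END
  };
  DemoOp *p = ops;
  int acc = 1;

  if (setup) {
    for (size_t i = 0; i < n; i++) {
      assert(ops[i].opcode >= 0 && ops[i].opcode < OP_COUNT);
      ops[i].opaddr = opcode_to_label[ops[i].opcode];
    }
    return 0;
  }

  /* No check here: if opaddr was never filled in, or was written through
   * a different struct layout, this is a jump to an arbitrary address. */
  goto *p->opaddr;

L_INC:
  acc += 1;
  p++;
  goto *p->opaddr;
L_DOUBLE:
  acc *= 2;
  p++;
  goto *p->opaddr;
L_END:
  return acc;
}
```

Calling `demo_run(ops, n, 1)` once and then `demo_run(ops, n, 0)` works; skipping the setup call, or overwriting `opaddr` in between, produces the same kind of jump to a bogus address (`0x1`, `0x300000001`, ...) as in the sessions above.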
Using the ebuild to build 08d36110c5670c815ad6d6f969e578049d209080, which should match the crate, _also_ fails.
I started to look at how the crate builds it. They define a bunch of `ONIG_DEBUG_*` at https://github.com/rust-onig/rust-onig/blob/main/onig_sys/build.rs#L70. If I pass those in `CFLAGS` with the ebuild: ``` * CMP: =dev-libs/oniguruma-6.9.10 with dev-libs/oniguruma-6.9.10/image * ABI: libonig.so.5(32) func(+2,~1) * Functions changes summary: 0 Removed, 1 Changed (1 filtered out), 2 Added functions * Variables changes summary: 0 Removed, 0 Changed, 0 Added variable * * 2 Added functions: * * [A] 'function void onig_print_compiled_byte_code_list(FILE*, regex_t*)' {onig_print_compiled_byte_code_list} * [A] 'function int onig_print_names(FILE*, regex_t*)' {onig_print_names} * * 1 function with some indirect sub-type change: * * [C] 'function int onig_parse_tree(Node**, const OnigUChar*, const OnigUChar*, regex_t*, ParseEnv*)' at regparse.c:9390:1 has some indirect sub-type changes: * parameter 5 of type 'ParseEnv*' has sub-type changes: * in pointed to type 'typedef ParseEnv' at regparse.h:455:1: * underlying type 'struct ParseEnv' at regparse.h:423:1 changed: * type size changed from 1312 to 1344 (in bits) * 1 data member insertion: * 'unsigned int max_parse_depth', at offset 1280 (in bits) at regparse.h:452:1 * 1 data member change: * 'unsigned int flags' offset changed from 1280 to 1312 (in bits) (by +32 bits) * ABI: libonig.so.5(64) func(+2,~1) * Functions changes summary: 0 Removed, 1 Changed (1 filtered out), 2 Added functions * Variables changes summary: 0 Removed, 0 Changed, 0 Added variable * * 2 Added functions: * * [A] 'function void onig_print_compiled_byte_code_list(FILE*, regex_t*)' {onig_print_compiled_byte_code_list} * [A] 'function int onig_print_names(FILE*, regex_t*)' {onig_print_names} * * 1 function with some indirect sub-type change: * * [C] 'function int onig_parse_tree(Node**, const OnigUChar*, const OnigUChar*, regex_t*, ParseEnv*)' at regparse.c:9390:1 has some indirect sub-type changes: * parameter 5 of type 'ParseEnv*' has sub-type changes: 
* in pointed to type 'typedef ParseEnv' at regparse.h:455:1: * underlying type 'struct ParseEnv' at regparse.h:423:1 changed: * type size changed from 2176 to 2240 (in bits) * 1 data member insertion: * 'unsigned int max_parse_depth', at offset 2144 (in bits) at regparse.h:452:1 * 1 data member change: * 'unsigned int flags' offset changed from 2144 to 2176 (in bits) (by +32 bits) * SIZE: 3.07MiB -> 3.38MiB, 28 -> 28 files * ------> ABI(+4,~2) SIZE(+10.15%) ``` They affect a bunch of structures: https://github.com/kkos/oniguruma/blob/05bb130c9ad54877e73d1caf08dd95e6ff199d99/src/regparse.h#L451.
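As a rough illustration of what the abidiff output describes, the sketch below shows how one conditionally compiled member shifts every later field. The two structs are hypothetical stand-ins, not the real `ParseEnv`: only the names `max_parse_depth` and `flags` come from the report above, and `parse_depth` is an assumed neighbouring field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout as seen by code built WITHOUT the debug define. */
typedef struct {
  unsigned int parse_depth;
  unsigned int flags;
} EnvPlain;

/* Hypothetical layout as seen by code built WITH the debug define. */
typedef struct {
  unsigned int parse_depth;
  unsigned int max_parse_depth; /* extra member in the debug build */
  unsigned int flags;
} EnvDebug;

/* How far `flags` moves between the two layouts, in bytes. */
static size_t flags_shift(void)
{
  return offsetof(EnvDebug, flags) - offsetof(EnvPlain, flags);
}

/* Simulates code compiled against EnvPlain reading an object that was
 * actually laid out as EnvDebug: it reads whatever sits at the old offset. */
static unsigned int read_flags_with_stale_layout(const EnvDebug *env)
{
  const unsigned char *base = (const unsigned char *)env;
  unsigned int value;

  assert(env != NULL);
  memcpy(&value, base + offsetof(EnvPlain, flags), sizeof value);
  return value;
}
```

With an `EnvDebug` initialised as `{ 1, 42, 7 }`, the stale-layout read yields `max_parse_depth` (42) instead of `flags` (7), and `flags_shift()` equals `sizeof(unsigned int)`, matching the "+32 bits" in the report. This only bites if the two sides of an interface disagree about the defines.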
But I think that's a red herring unless it had bundled headers and always passed those flags but the library on the system wasn't built with it. Also, with the debug flags added (gdb is attached separately): ``` .match_at: str: 0x216170d0, end: 0x216170db, start: 0x216170d0 size: 11, start offset: 0 ==3343695== Use of uninitialised value of size 8 ==3343695== at 0x2281661F: match_at (regexec.c:3078) ==3343695== by 0x22838A59: search_in_range (regexec.c:5713) ==3343695== by 0x228390BB: onig_search_with_param (regexec.c:5839) ==3343695== by 0x224F22F6: onig::Regex::search_with_param (lib.rs:723) ==3343695== by 0x224A1804: syntect::parsing::regex::regex_impl::Regex::search (regex.rs:174) ==3343695== by 0x22487750: syntect::parsing::regex::Regex::search (regex.rs:64) ==3343695== by 0x224C9015: syntect::parsing::parser::ParseState::search (parser.rs:449) ==3343695== by 0x224C8850: syntect::parsing::parser::ParseState::find_best_match (parser.rs:374) ==3343695== by 0x224C72FA: syntect::parsing::parser::ParseState::parse_next_token (parser.rs:274) ==3343695== by 0x224C6F5B: syntect::parsing::parser::ParseState::parse_line (parser.rs:240) ==3343695== by 0x22472101: syntect::easy::HighlightLines::highlight_line (easy.rs:68) ==3343695== by 0x223C115C: comrak::plugins::syntect::SyntectAdapter::highlight_html (syntect.rs:46) ==3343695== ==3343695== (action on error) vgdb me ... ==3343695== Continuing ... ==3343695== Jump to the invalid address stated on the next line ==3343695== at 0x1: ??? 
==3343695== by 0x22838A59: search_in_range (regexec.c:5713) ==3343695== by 0x228390BB: onig_search_with_param (regexec.c:5839) ==3343695== by 0x224F22F6: onig::Regex::search_with_param (lib.rs:723) ==3343695== by 0x224A1804: syntect::parsing::regex::regex_impl::Regex::search (regex.rs:174) ==3343695== by 0x22487750: syntect::parsing::regex::Regex::search (regex.rs:64) ==3343695== by 0x224C9015: syntect::parsing::parser::ParseState::search (parser.rs:449) ==3343695== by 0x224C8850: syntect::parsing::parser::ParseState::find_best_match (parser.rs:374) ==3343695== by 0x224C72FA: syntect::parsing::parser::ParseState::parse_next_token (parser.rs:274) ==3343695== by 0x224C6F5B: syntect::parsing::parser::ParseState::parse_line (parser.rs:240) ==3343695== by 0x22472101: syntect::easy::HighlightLines::highlight_line (easy.rs:68) ==3343695== by 0x223C115C: comrak::plugins::syntect::SyntectAdapter::highlight_html (syntect.rs:46) ==3343695== Address 0x1 is not stack'd, malloc'd or (recently) free'd ``` That's from stepping immediately after the Valgrind trap on the uninitialised read at the `jump ...`, so if it uses 0x1, it must be that `p.opaddr` is corrupted. Using `rr` (where the invalid value is a bit different but still bogus): At the start of `match_at`: ``` (rr) p p.opaddr $38 = (const void *) 0x7cac8120ab01 <main_arena+65> (rr) x/i $pc => 0x7cac66c68618 <match_at+1404>: mov rax,QWORD PTR [rbp-0xca8] (rr) n 0x00007cac8120ab01 in main_arena () from /usr/lib64/libc.so.6 (rr) x/i $pc => 0x7cac8120ab01 <main_arena+65>: add BYTE PTR [rax],al (rr) n Single stepping until exit from function main_arena, which has no line number information. Thread 1 received signal SIGSEGV, Segmentation fault. 0x00007cac8120ab01 in main_arena () from /usr/lib64/libc.so.6 ``` Then: ``` (rr) watch p.opaddr Hardware watchpoint 6: p.opaddr (rr) reverse-continue Continuing. 

Thread 1 hit Hardware watchpoint 6: p.opaddr

Old value = (const void *) 0x7cac8120ab01 <main_arena+65>
New value = (const void *) 0x7ffc5c2d0340
0x00007cac66c680f9 in match_at (reg=0x60d0f7510b90, str=0x60d0f74e87c0 "puts 'wow'\n\201\254|", end=0x60d0f74e87cb "\201\254|", in_right_range=0x60d0f74e87cb "\201\254|", sstart=0x60d0f74e87c0 "puts 'wow'\n\201\254|", msa=0x7ffc5c2cecc0) at /usr/src/debug/dev-libs/oniguruma-6.9.10/onig-6.9.10/src/regexec.c:3009
3009      Operation* p = reg->ops;
(rr) p p
$76 = (Operation *) 0x7ffc5c2ce058
(rr) p *p
$77 = {
  opaddr = 0x7ffc5c2d0340,
[...]
(rr) p p.opaddr
$78 = (const void *) 0x7ffc5c2d0340
(rr) watch p
Hardware watchpoint 7: p
(rr) watch p.opaddr
Hardware watchpoint 8: p.opaddr
(rr) c
Continuing.

Thread 1 hit Hardware watchpoint 6: p.opaddr

Old value = (const void *) 0x7ffc5c2d0340
New value = (const void *) 0x7cac8120ab01 <main_arena+65>

Thread 1 hit Hardware watchpoint 7: p

Old value = (Operation *) 0x7ffc5c2ce058
New value = (Operation *) 0x60d0f7510d60

Thread 1 hit Hardware watchpoint 8: p.opaddr

Old value = (const void *) 0x7ffc5c2d0340
New value = (const void *) 0x7cac8120ab01 <main_arena+65>
```

where we have
```
(rr) list .
3005      unsigned long subexp_call_counters[MAX_SUBEXP_CALL_COUNTERS];
3006    #endif
3007
3008      OnigOptionType options;
3009      Operation* p = reg->ops;
3010      OnigEncoding encode = reg->enc;
3011      OnigCaseFoldType case_fold_flag = reg->case_fold_flag;
3012
3013    #ifdef USE_CALL
3014      unsigned long subexp_call_nest_counter = 0;
(rr) p reg->ops
$87 = (Operation *) 0x60d0f7510d60
```

That comes ultimately from:
```
#3  0x00007cac652222f7 in onig::Regex::search_with_param<&str> (self=0x60d0f7522160, chars=..., from=0, to=11, options=..., region=..., match_param=...) at src/lib.rs:723
[...]
```

which is in the crate:
```
        let r = unsafe {
            let start = beg.add(from);
            let range = beg.add(to);
            if start > end {
                return Err(Error::custom("Start of match should be before end"));
            }
            if range > end {
                return Err(Error::custom("Limit of match should be before end"));
            }
            onig_sys::onig_search_with_param(
                self.raw,
                beg,
                end,
                start,
                range,
                match region {
                    Some(region) => region as *mut Region as *mut onig_sys::OnigRegion,
                    None => std::ptr::null_mut(),
                },
                options.bits(),
                match_param.as_raw(),
            )
```

so `self.raw` is garbage (I think), though I can't print it at that point in gdb. I guess this means it's probably an onig crate bug. I know absolutely zero Rust and don't think I can go further.
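For future readers, here is a minimal sketch of why a bad `opaddr` turns into a jump to a bogus address. It assumes (based on the traces above, not on a full reading of the oniguruma source) that `match_at` dispatches by jumping through a label address stored in each operation. Everything here is hypothetical and heavily simplified: the `Operation` struct only borrows the name, and `run`, `run_demo` and the opcodes are invented for illustration. It relies on the GCC/Clang "labels as values" extension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes; the real engine has many more. */
enum opcode { OP_INC, OP_DOUBLE, OP_END };

/* Simplified stand-in for an operation with a cached label address. */
typedef struct {
    const void *opaddr;   /* jump target, filled in by a setup pass */
    enum opcode opcode;
} Operation;

/* If prepare is nonzero, only fill in opaddr for each operation.
 * Otherwise execute the program by jumping through opaddr. */
static int run(Operation *ops, size_t n, int prepare)
{
    static const void *labels[] = { &&do_inc, &&do_double, &&do_end };
    Operation *p = ops;
    int acc = 1;

    if (prepare) {
        for (size_t i = 0; i < n; i++)
            ops[i].opaddr = labels[ops[i].opcode];
        return 0;
    }

    /* Indirect jump: nothing validates the target. If opaddr was never
     * filled in (or was filled in by code with a different idea of the
     * struct layout), control goes wherever the stale bytes point. */
    goto *p->opaddr;

do_inc:
    acc += 1;
    p++;
    goto *p->opaddr;
do_double:
    acc *= 2;
    p++;
    goto *p->opaddr;
do_end:
    return acc;
}

/* Runs the setup pass, then executes: (1 + 1) * 2. */
static int run_demo(void)
{
    Operation prog[] = {
        { NULL, OP_INC }, { NULL, OP_DOUBLE }, { NULL, OP_END },
    };
    size_t n = sizeof(prog) / sizeof(prog[0]);

    run(prog, n, 1);
    for (size_t i = 0; i < n; i++)
        assert(prog[i].opaddr != NULL);
    return run(prog, n, 0);
}
```

Calling `run_demo()` from a `main` returns 4. Skipping the setup pass (or having it done by a differently configured build) leaves `opaddr` holding whatever was in memory, which would match the "jump to `main_arena+65`" seen in `rr`, though that is only a guess at the mechanism.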
I give up for now, but here's a reproducer for commonmarker from git:
```
#!/bin/bash
set -x
#(cd ext/commonmarker/ && ruby extconf.rb)
export RUSTONIG_SYSTEM_LIBONIG=1
#export CFLAGS="-Og -ggdb3"
export CFLAGS="-Og -ggdb3 -DONIG_DEBUG_PARSE -DONIG_DEBUG_COMPILE -DONIG_DEBUG_MATCH"
export RUSTFLAGS="-C opt-level=0 -C strip=none -C debuginfo=full"
#rm -rf ./ext/commonmarker/target/release/deps
#rm -rf ./ext/commonmarker/target/release/build/onig_sys*
make -C ext/commonmarker -Onone CFLAGS="-Og -ggdb3 -std=gnu17" || exit 1
cp {ext,lib}/commonmarker/commonmarker.so || exit 1
#exec ruby34 --disable-jit -Ilib:test:. -e 'Dir["test/*_test.rb"].each {|f| require f}'
exec ruby34 --disable-jit -Ilib:test:. test/node_test.rb
```

with `test/node_test.rb` being modified to just:
```
# frozen_string_literal: true

require "test_helper"

class NodeTest < Minitest::Test
  def setup
    @document = Commonmarker.parse("Hi *there*. This has __many nodes__!")
  end

  class FenceInfoTest < Minitest::Test
    def setup
      @document = Commonmarker.parse("``` ruby\nputs 'wow'\n```")
      @fence_node = @document.first_child
    end

    def test_has_fence_info
      assert_equal("ruby", @fence_node.fence_info)
    end

    def test_can_set_fence_info
      assert_match(/<pre lang=\"ruby\"/, @document.to_html)
      @fence_node.fence_info = "perl"
      assert_equal("perl", @fence_node.fence_info)
      assert_match(/<pre lang=\"perl\"/, @document.to_html)
    end
  end
end
```

In summary:
* It only crashes with the system oniguruma
* Using the same version as the crate has bundled via oniguruma-9999 + override still crashes
* Valgrind reports an uninitialised memory read but it's not clear to me where
* I _think_ `p.opaddr` is corrupted, but it gets used in a table so I'm not completely sure (EDIT: see https://github.com/gentoo/gentoo/pull/41130#issuecomment-2742137819)

I think the next steps are (not necessarily for me):
* Try the oniguruma crate's testsuite
* Try asan+ubsan on ruby/oniguruma/the crate (will need some special flags for the crate)
* Reduce the Ruby testcase further
* Try to transform the Ruby testcase into just using Ruby's regex engine (Onigmo, a fork of oniguruma, so closely related but not the same code)
* Try to replicate it in a Rust testcase (bleh) using the crate
* Try to replicate it in a pure C oniguruma testcase
* Report it to commonmarker upstream and see what they say (it's still quite possibly a bug there if it passes something invalid down, maybe?)