890286 – app-admin/rasdaemon-0.6.8-r1 crashes on startup

Status:	RESOLVED FIXED

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	Current packages
Hardware:	All Linux

Importance:	Normal normal
Assignee:	Gentoo's Team for Core System packages

Reported:	2023-01-09 10:20 UTC by Gabriele Svelto
Modified:	2024-01-13 20:35 UTC
CC List:	3 users

See Also:	https://github.com/mchehab/rasdaemon/issues/77 https://github.com/mchehab/rasdaemon/pull/93 https://bugs.debian.org/1054152 922061

Attachments
`emerge --info` output (emerge_info.txt, 7.15 KB, text/plain) 2023-01-09 11:05 UTC, Gabriele Svelto
Output of a rasdaemon-0.8.0 crash under Valgrind (file_890286.txt, 19.30 KB, text/plain) 2023-03-25 08:36 UTC, jonys

Description Gabriele Svelto 2023-01-09 10:20:17 UTC

I've been using rasdaemon for a while to monitor the ECC memory in my machine, and I noticed that it recently started crashing on startup. I'm unsure which version was the first one affected, but the latest one definitely is.

The crash happens using stable versions of all the package's dependencies, everything compiled with sys-devel/gcc-11.3.1_p20221209 which is also the current stable compiler. The build CFLAGS are just "-O2" and nothing else. The crash only happens when the `--record` flag is passed and yields the following stack trace:

#0  ___pthread_mutex_lock (mutex=0x7473656d6974202c) at pthread_mutex_lock.c:80
#1  0x00007ffff7ed379c in sqlite3_finalize (pStmt=0x7fff9800e9b0) at sqlite3.c:87444
#2  0x00005555555687f2 in ras_mc_event_closedb (cpu=27, ras=<optimized out>) at ras-record.c:923
#3  0x0000555555564698 in handle_ras_events_cpu (priv=0x5555555c4e30) at ras-events.c:608
#4  0x00007ffff7ce337a in start_thread (arg=<optimized out>) at pthread_create.c:442
#5  0x00007ffff7d6422c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

I've tried compiling with CFLAGS="-O0" to narrow down the crash, but it disappears with optimizations disabled.

Comment 1 Sam James (archtester) 2023-01-09 10:22:33 UTC

Are you sure it's crashing in that thread and not another one?

Could you share the output of "bt full"?

Comment 2 Sam James (archtester) 2023-01-09 10:22:44 UTC

(In reply to Sam James from comment #1)
> Are you sure it's crashing in that thread and not another one?
> 
> Could you share the output of "bt full"?

(+ emerge --info please)

Comment 3 Gabriele Svelto 2023-01-09 11:05:11 UTC

Created attachment 848018
`emerge --info` output

Comment 4 Gabriele Svelto 2023-01-09 11:07:02 UTC

I've double-checked and this is indeed the crashing thread. Here is the output of `bt full`:

#0  ___pthread_mutex_lock (mutex=0x7473656d6974202c) at pthread_mutex_lock.c:80
        type = <optimized out>
        __PRETTY_FUNCTION__ = "___pthread_mutex_lock"
        id = <optimized out>
#1  0x00007ffff7ed379c in sqlite3_finalize (pStmt=0x7fffa400e9b0) at sqlite3.c:87444
        v = 0x7fffa400e9b0
        db = 0x7fffa400f310
        rc = <optimized out>
#2  0x00005555555687f2 in ras_mc_event_closedb (cpu=17, ras=<optimized out>) at ras-record.c:923
        rc = <optimized out>
        db = 0x7fffa4001c40
        priv = 0x7fffa4001bf0
        __func__ = "ras_mc_event_closedb"
#3  0x0000555555564698 in handle_ras_events_cpu (priv=0x5555555c4cf0) at ras-events.c:608
        fd = 38
        kbuf = 0x7fffa4001b80
        page = 0x7fffa4000b70
        pipe_raw = "per_cpu/cpu17/trace_pipe_raw", '\000' <repeats 4067 times>
        pdata = <optimized out>
#4  0x00007ffff7ce337a in start_thread (arg=<optimized out>) at pthread_create.c:442
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737488347696, -2861193018973524210, 140736204891840, 2, 140737350873264, 140736196501504, 2861024794797764366, 2861175029158997774}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0,
              cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#5  0x00007ffff7d6422c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
No locals.

Comment 5 Gabriele Svelto 2023-01-09 11:12:32 UTC

Also this:

(gdb) p $_siginfo._sifields._sigfault.si_addr
$7 = (void *) 0x0

Looks like a NULL pointer access.
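
A side note on the `mutex=0x7473656d6974202c` value in frame #0 of the trace: it does not look like a heap or stack address, and every one of its bytes is printable ASCII. The helper below is only an illustrative sketch (the name `pointer_bytes_as_text` is made up for this note and is not part of rasdaemon, SQLite, or glibc). It renders the eight bytes of a 64-bit value in little-endian order, which is how they sit in memory on x86_64.

```c
#include <ctype.h>
#include <stdint.h>

/*
 * Hypothetical helper, not rasdaemon code: write the 8 bytes of `value`
 * into `out` in little-endian (x86_64 in-memory) order. Bytes that are
 * not printable are replaced with '.'. `out` must hold at least 9 chars.
 */
void pointer_bytes_as_text(uint64_t value, char out[9])
{
    for (int i = 0; i < 8; i++) {
        /* Extract byte i, starting from the least significant one. */
        unsigned char byte = (unsigned char)((value >> (8 * i)) & 0xff);
        out[i] = isprint(byte) ? (char)byte : '.';
    }
    out[8] = '\0';
}
```

Called with `0x7473656d6974202cULL`, the buffer ends up holding `, timest`. That suggests the memory holding the mutex pointer had been overwritten with, or reused for, string data rather than being NULL. This reading is an inference from the trace and is not confirmed anywhere in this report; it would also be compatible with an `si_addr` of 0 if the fault was raised for an invalid (non-canonical) address rather than for an access to address 0.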

Comment 6 Gabriele Svelto 2023-01-13 09:46:45 UTC

An extra bit of information: I was wrong about the crash not presenting itself when compiling the package with -O0. It still happens, it just takes a while longer. This issue might be timing-dependent, and it doesn't look like it's specific to Gentoo: I've tried with a plain build of the upstream sources and I can still reproduce it. I'll report this in the upstream package's bug tracker.

Comment 7 Sam James (archtester) 2023-01-13 09:47:28 UTC

Thanks, please throw a link here when you do.

My guess is https://github.com/mchehab/rasdaemon/issues/77.

Comment 8 Larry the Git Cow (gentoo-dev) 2023-02-19 18:37:58 UTC

The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=c5bc82ad10a33da634522bae36d22966485ffbb3

commit c5bc82ad10a33da634522bae36d22966485ffbb3
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2023-02-19 18:37:38 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2023-02-19 18:37:47 +0000

    app-admin/rasdaemon: add 0.8.0
    
    Closes: https://bugs.gentoo.org/890286
    Signed-off-by: Sam James <sam@gentoo.org>

 app-admin/rasdaemon/Manifest                       |  1 +
 .../files/rasdaemon-0.8.0-bashisms-configure.patch | 40 +++++++++++
 app-admin/rasdaemon/rasdaemon-0.8.0.ebuild         | 83 ++++++++++++++++++++++
 3 files changed, 124 insertions(+)

Comment 9 Sam James (archtester) 2023-02-19 18:39:32 UTC

Sorry, I mixed up libtracefs/libtraceevent. The new version uses an unbundled, much newer copy of libtraceevent.

As for your bug, see https://github.com/mchehab/rasdaemon/issues/77#issuecomment-1399202752.

Comment 10 jonys 2023-03-25 08:34:41 UTC

I'm getting the same bug (identical stack trace) on app-admin/rasdaemon-0.8.0 as well, with an underlying configuration listed as problematic in the linked bug (AMD CPU with _SC_NPROCESSORS_CONF != _SC_NPROCESSORS_ONLN). I'm attaching a log from running rasdaemon under Valgrind; it seems to come down to a use-after-free bug. I'll comment on the GitHub issue as well.
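
For anyone who wants to check whether their machine falls into the configuration mentioned above, the two values can be compared with `sysconf()`. This is a minimal sketch, not code from rasdaemon: `cpu_counts_differ` is a hypothetical helper name, while `_SC_NPROCESSORS_CONF` and `_SC_NPROCESSORS_ONLN` are the glibc constants quoted in the comment.

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper, not rasdaemon code: compare the number of
 * configured CPUs with the number of CPUs currently online.
 *
 * Returns 1 if the two counts differ, 0 if they match, and -1 if
 * sysconf() did not return a usable value. The counts are stored
 * through `configured` and `online` when those pointers are non-NULL.
 */
int cpu_counts_differ(long *configured, long *online)
{
    long conf = sysconf(_SC_NPROCESSORS_CONF);
    long onln = sysconf(_SC_NPROCESSORS_ONLN);

    if (conf < 1 || onln < 1)
        return -1;

    if (configured != NULL)
        *configured = conf;
    if (online != NULL)
        *online = onln;

    return conf != onln;
}
```

A return value of 1 would indicate the mismatch described in the linked upstream issue. Whether a given system with such a mismatch actually hits the crash is not something this check can tell.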

Comment 11 jonys 2023-03-25 08:36:57 UTC

Created attachment 858929
Output of a rasdaemon-0.8.0 crash under Valgrind

Comment 12 Sam James (archtester) 2023-03-25 08:37:25 UTC

Might also be related to https://github.com/mchehab/rasdaemon/pull/93.

Comment 13 Larry the Git Cow (gentoo-dev) 2023-12-29 00:24:01 UTC

The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=758c24a6578bad541a188f0fe513906515dd1bda

commit 758c24a6578bad541a188f0fe513906515dd1bda
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2023-12-29 00:22:14 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2023-12-29 00:22:14 +0000

    app-admin/rasdaemon: backport crash for online vs. configured CPUs
    
    Closes: https://bugs.gentoo.org/890286
    Signed-off-by: Sam James <sam@gentoo.org>

 ...on-0.8.0-check-online-cpus-not-configured.patch |  40 +++++
 ...rasdaemon-0.8.0-table-create-offline-cpus.patch | 179 +++++++++++++++++++++
 app-admin/rasdaemon/rasdaemon-0.8.0-r2.ebuild      |  87 ++++++++++
 3 files changed, 306 insertions(+)

Comment 14 Sam James (archtester) 2023-12-29 00:26:12 UTC

Fingers crossed that does it. Let me know if it doesn't though...

(Also, sorry I didn't backport that *far* sooner. I was hoping for a new release with it and completely forgot.)

Comment 15 jonys 2023-12-29 11:43:24 UTC

(In reply to Sam James from comment #14)
> Fingers crossed that does it. Let me know if it doesn't though...

I just tested the new version (rasdaemon-0.8.0-r2) and it works for me, so it seems the bug is fixed.

Thank you!

Comment 16 Gabriele Svelto 2023-12-29 21:51:14 UTC

It's working fine on my box too, thank you!