Bug 674864 - app-backup/amanda throws segfaults
Summary: app-backup/amanda throws segfaults
Status: RESOLVED WORKSFORME
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal major
Assignee: Robin Johnson
Reported: 2019-01-08 10:30 UTC by Stefan G. Weichinger
Modified: 2019-05-04 20:57 UTC
CC List: 7 users

Description Stefan G. Weichinger 2019-01-08 10:30:27 UTC
Amanda (3.5.1-r1 and 3.4.5 ...) can't access the tape drive.

The Perl script amcheck-device triggers a segfault; see the dmesg lines:

[369317.553102] traps: amandad[19623] general protection ip:7fd73b1bff80 sp:7fd739899ea0 error:0 in libc-2.27.so[7fd73b04a000+1be000]
[756069.014520] amcheck-device[14469]: segfault at 8 ip 00007effb48d56e6 sp 00007ffcd05a9bb8 error 4 in libc-2.27.so[7effb4837000+1be000]
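The two kernel lines above already indicate where inside libc the fault happened: subtracting the mapping base (the first number in the square brackets) from the `ip` value gives an offset into libc-2.27.so. A minimal Python sketch of that arithmetic follows. The function name `fault_offset` and the regular expression are illustrative assumptions, not part of any amanda or kernel tooling.

```python
import re

# Matches both kernel formats seen above: "ip:<hex>" (traps: ...) and
# "ip <hex>" (segfault at ...), followed by "in <object>[<base>+<size>]".
FAULT_RE = re.compile(
    r"\bip[: ]([0-9a-f]+).*? in ([^\s\[]+)\[([0-9a-f]+)\+([0-9a-f]+)\]"
)


def fault_offset(dmesg_line):
    """Return (object_name, offset_into_object) for a fault line, or None."""
    m = FAULT_RE.search(dmesg_line)
    if m is None:
        return None
    ip, name, base, size = m.groups()
    offset = int(ip, 16) - int(base, 16)
    # Sanity check: the instruction pointer must lie inside the mapping.
    if not 0 <= offset < int(size, 16):
        return None
    return name, offset


line = ("amcheck-device[14469]: segfault at 8 ip 00007effb48d56e6 "
        "sp 00007ffcd05a9bb8 error 4 in libc-2.27.so[7effb4837000+1be000]")
name, offset = fault_offset(line)
print(name, hex(offset))  # prints: libc-2.27.so 0x9e6e6
```

The low twelve bits of that offset (0x6e6) are consistent with the faulting address gdb reports later in this thread (0x00007ffff76ba6e6), as expected for the same instruction mapped at a different load address.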

I have rebuilt the packages repeatedly and also tried an older gcc, etc.

We checked the tape library; I can swap tapes, etc.

I am marking this as a "major" bug because I can't write backups to tape.


# emerge --info
!!! Section 'sgw-overlay' in repos.conf has name different from repository name 'gentoo' set inside repository
Portage 2.3.52 (python 3.6.5-final-0, default/linux/amd64/17.0/systemd, gcc-6.4.0, glibc-2.27-r6, 4.14.83-gentoo-smp x86_64)
=================================================================
System uname: Linux-4.14.83-gentoo-smp-x86_64-Intel-R-_Xeon-R-_CPU_E5345_@_2.33GHz-with-gentoo-2.6
KiB Mem:    16460196 total,   1028088 free
KiB Swap:   10490440 total,  10490440 free
Timestamp of repository gentoo: Tue, 08 Jan 2019 09:04:22 +0000
Head commit of repository gentoo: f8fec5aff37488e221182124921f3d8f51ff722e

sh bash 4.4_p12
ld GNU ld (Gentoo 2.29.1 p3) 2.29.1
app-shells/bash:          4.4_p12::gentoo
dev-lang/perl:            5.24.3-r1::gentoo
dev-lang/python:          2.7.15::gentoo, 3.4.8::gentoo, 3.5.5::gentoo, 3.6.5::gentoo
dev-util/cmake:           3.9.6::gentoo
dev-util/pkgconfig:       0.29.2::gentoo
sys-apps/baselayout:      2.6-r1::gentoo
sys-apps/sandbox:         2.13::gentoo
sys-devel/autoconf:       2.69-r4::gentoo
sys-devel/automake:       1.15.1-r2::gentoo
sys-devel/binutils:       2.29.1-r1::gentoo, 2.30-r4::gentoo
sys-devel/gcc:            6.4.0-r1::gentoo, 7.3.0-r3::gentoo
sys-devel/gcc-config:     2.0::gentoo
sys-devel/libtool:        2.4.6-r3::gentoo
sys-devel/make:           4.2.1-r4::gentoo
sys-kernel/linux-headers: 4.14-r1::gentoo (virtual/os-headers)
sys-libs/glibc:           2.27-r6::gentoo
Repositories:

gentoo
    location: /usr/portage
    sync-type: git
    sync-uri: https://github.com/gentoo-mirror/gentoo
    priority: -1000

ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="* -@EULA"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -march=native -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt /var/spool/munin-async/.ssh"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php7.0/ext-active/ /etc/php/apache2-php7.1/ext-active/ /etc/php/cgi-php7.0/ext-active/ /etc/php/cgi-php7.1/ext-active/ /etc/php/cli-php7.0/ext-active/ /etc/php/cli-php7.1/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-O2 -march=native -pipe"
DISTDIR="/usr/portage/distfiles"
EMERGE_DEFAULT_OPTS="--jobs --load-average 12"
ENV_UNSET="DBUS_SESSION_BUS_ADDRESS DISPLAY GOBIN PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync multilib-strict news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://distfiles.gentoo.org"
LANG="de_DE.utf8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
USE="acl amd64 berkdb bzip2 cli crypt cxx dri fortran gdbm iconv ipv6 libtirpc multilib ncurses nls nptl openmp pam pcre readline seccomp ssl systemd tcpd udev unicode xattr zlib" ABI_X86="64" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon plan sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="mmx mmxext sse sse2" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" INPUT_DEVICES="libinput keyboard mouse" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php5-6 php7-1" POSTGRES_TARGETS="postgres9_5 postgres10" PYTHON_SINGLE_TARGET="python3_5" PYTHON_TARGETS="python2_7 python3_5 python3_6" RUBY_TARGETS="ruby23 ruby24" USERLAND="GNU" VIDEO_CARDS="amdgpu fbdev intel nouveau radeon radeonsi vesa dummy v4l" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CC, CPPFLAGS, CTARGET, CXX, INSTALL_MASK, LC_ALL, LINGUAS, MAKEOPTS, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 1 Andrew Savchenko gentoo-dev 2019-01-08 14:39:47 UTC
For a segfault you are expected to provide a backtrace; see:
https://wiki.gentoo.org/wiki/Debugging_with_GDB

Please provide one.
Comment 2 Stefan G. Weichinger 2019-01-08 16:58:22 UTC
I tried as well as I understand it ;-)

I rebuilt amanda etc., then ran

$ gdb --args perl /usr/libexec/amanda/amcheck-device abt abt

("gdb amcheck ..." did not segfault; the fault seems to be in the Perl script that it calls.)

I get:

Starting program: /usr/bin/perl /usr/libexec/amanda/amcheck-device abt abt
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff1ee6700 (LWP 9049)]

Thread 1 "perl" received signal SIGSEGV, Segmentation fault.
0x00007ffff76ba6e6 in ?? () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff76ba6e6 in ?? () from /lib64/libc.so.6
#1  0x00007ffff7703376 in regexec () from /lib64/libc.so.6
#2  0x00007ffff634d218 in try_match (regex=0x555556b392a0, str=str@entry=0x0, errbuf=errbuf@entry=0x7fffffffc680) at match.c:312
#3  0x00007ffff634da2f in do_match (regex=regex@entry=0x555555ef4000 ".*", str=str@entry=0x0, match_newline=match_newline@entry=1) at match.c:344
#4  0x00007ffff634e856 in match_labelstr (labelstr=labelstr@entry=0x555556b1a880, autolabel=autolabel@entry=0x555556b2f070, label=label@entry=0x0, 
    barcode=barcode@entry=0x555556b33f70 "ABT396L4", meta=meta@entry=0x0, storage=<optimized out>) at match.c:1292
#5  0x00007ffff37c640a in _wrap_match_labelstr (cv=<optimized out>) at Amanda/Util.c:3611
#6  0x00007ffff7abeda3 in Perl_pp_entersub () from /usr/lib64/libperl.so.5.26
#7  0x00007ffff7ab6903 in Perl_runops_standard () from /usr/lib64/libperl.so.5.26
#8  0x00007ffff7a3a921 in Perl_call_sv () from /usr/lib64/libperl.so.5.26
#9  0x00007ffff2703bfd in amglue_source_callback_simple (data=0x555556b31d80) at Amanda/MainLoop.c:1681
#10 0x00007ffff60424f3 in ?? () from /usr/lib64/libglib-2.0.so.0
#11 0x00007ffff6041a6a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#12 0x00007ffff6041e28 in ?? () from /usr/lib64/libglib-2.0.so.0
#13 0x00007ffff6041edc in g_main_context_iteration () from /usr/lib64/libglib-2.0.so.0
#14 0x00007ffff6347cf1 in event_loop_wait (wait_eh=0x0, nonblock=0, return_when_empty=0) at event.c:427
#15 0x00007ffff63480e8 in event_loop_run () at event.c:336
#16 0x00007ffff2705175 in run_c () at Amanda/MainLoop.c:1570
#17 0x00007ffff27051ce in _wrap_run_c (cv=<optimized out>) at Amanda/MainLoop.c:2017
#18 0x00007ffff7abeda3 in Perl_pp_entersub () from /usr/lib64/libperl.so.5.26
#19 0x00007ffff7ab6903 in Perl_runops_standard () from /usr/lib64/libperl.so.5.26
#20 0x00007ffff7a41af3 in perl_run () from /usr/lib64/libperl.so.5.26
#21 0x0000555555554e0b in main ()


does that help in any way?

thanks
Comment 3 Andrew Savchenko gentoo-dev 2019-01-08 17:48:28 UTC
(In reply to Stefan G. Weichinger from comment #2)
> ("gdb amcheck ..." did not segfault; the fault seems to be in the Perl
> script that it calls.)

Yes, correct.
 
> I get:
> 
> Starting program: /usr/bin/perl /usr/libexec/amanda/amcheck-device abt abt
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> [New Thread 0x7ffff1ee6700 (LWP 9049)]
> 
> Thread 1 "perl" received signal SIGSEGV, Segmentation fault.
> 0x00007ffff76ba6e6 in ?? () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00007ffff76ba6e6 in ?? () from /lib64/libc.so.6
> #1  0x00007ffff7703376 in regexec () from /lib64/libc.so.6

[...]

> does that help in any way?

That's much better. But note the ?? in the topmost frames: it means your glibc was built without debugging information. Please rebuild it with debug info enabled and post the results again.
Comment 4 Stefan G. Weichinger 2019-01-08 18:12:22 UTC
OK, here is the next run with the rebuilt glibc. I can even see an error myself (but don't know how to fix it):

(gdb) run
Starting program: /usr/bin/perl /usr/libexec/amanda/amcheck-device abt abt
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff1ee6700 (LWP 5351)]

Thread 1 "perl" received signal SIGSEGV, Segmentation fault.
__strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120
120	../sysdeps/x86_64/multiarch/../strlen.S: No such file or directory.
(gdb) bt
#0  __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120
#1  0x00007ffff7704697 in __GI___regexec (preg=preg@entry=0x555556b4b470, string=string@entry=0x0, nmatch=nmatch@entry=0, pmatch=pmatch@entry=0x0, eflags=eflags@entry=0)
    at regexec.c:212
#2  0x00007ffff634d218 in try_match (regex=0x555556b4b470, str=str@entry=0x0, errbuf=errbuf@entry=0x7fffffffc680) at match.c:312
#3  0x00007ffff634da2f in do_match (regex=regex@entry=0x555555ef3ec0 ".*", str=str@entry=0x0, match_newline=match_newline@entry=1) at match.c:344
#4  0x00007ffff634e856 in match_labelstr (labelstr=labelstr@entry=0x555556b1ab50, autolabel=autolabel@entry=0x555556b43510, label=label@entry=0x0, 
    barcode=barcode@entry=0x555556b32ac0 "ABT396L4", meta=meta@entry=0x0, storage=<optimized out>) at match.c:1292
#5  0x00007ffff37c640a in _wrap_match_labelstr (cv=<optimized out>) at Amanda/Util.c:3611
#6  0x00007ffff7abeda3 in Perl_pp_entersub () from /usr/lib64/libperl.so.5.26
#7  0x00007ffff7ab6903 in Perl_runops_standard () from /usr/lib64/libperl.so.5.26
#8  0x00007ffff7a3a921 in Perl_call_sv () from /usr/lib64/libperl.so.5.26
#9  0x00007ffff2703bfd in amglue_source_callback_simple (data=0x555556b31ec0) at Amanda/MainLoop.c:1681
#10 0x00007ffff60424f3 in ?? () from /usr/lib64/libglib-2.0.so.0
#11 0x00007ffff6041a6a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#12 0x00007ffff6041e28 in ?? () from /usr/lib64/libglib-2.0.so.0
#13 0x00007ffff6041edc in g_main_context_iteration () from /usr/lib64/libglib-2.0.so.0
#14 0x00007ffff6347cf1 in event_loop_wait (wait_eh=0x0, nonblock=0, return_when_empty=0) at event.c:427
#15 0x00007ffff63480e8 in event_loop_run () at event.c:336
#16 0x00007ffff2705175 in run_c () at Amanda/MainLoop.c:1570
#17 0x00007ffff27051ce in _wrap_run_c (cv=<optimized out>) at Amanda/MainLoop.c:2017
#18 0x00007ffff7abeda3 in Perl_pp_entersub () from /usr/lib64/libperl.so.5.26
#19 0x00007ffff7ab6903 in Perl_runops_standard () from /usr/lib64/libperl.so.5.26
#20 0x00007ffff7a41af3 in perl_run () from /usr/lib64/libperl.so.5.26
#21 0x0000555555554e0b in main ()
Comment 5 Andrew Savchenko gentoo-dev 2019-01-08 18:19:03 UTC
CC'ing toolchain as this _may_ be a glibc bug.
Comment 6 Stefan G. Weichinger 2019-01-08 18:32:21 UTC
(In reply to Andrew Savchenko from comment #5)
> CC'ing toolchain as this _may_ be a glibc bug.

Thanks, I'll keep checking for further instructions.
Comment 7 Stefan G. Weichinger 2019-01-08 19:40:11 UTC
I tried something. Note: amanda-specific terminology follows.

The tape label "ABT396L4" was mentioned in the error, so I checked for that tape and found it listed with "unknown barcode". So I forced a relabelling and after that amcheck doesn't crash anymore:

$ amlabel abt -f ABT-396 slot 14
Reading label...
Volume with label 'ABT-396' is active and contains data from this configuration.
Consider using 'amrmtape' to remove volume 'ABT-396' from the catalog.
Writing label 'ABT-396'...
Checking label...
Success!
amanda@juno ~ $ gdb --args perl /usr/libexec/amanda/amcheck-device abt abt
GNU gdb (Gentoo 8.1 p1) 8.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from perl...(no debugging symbols found)...done.
(gdb) run
Starting program: /usr/bin/perl /usr/libexec/amanda/amcheck-device abt abt
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff1ee6700 (LWP 11696)]
slot 24: volume 'ABT-450'
Will write to volume 'ABT-450' in slot 24.
NOTE: skipping tape-writable test
[Thread 0x7ffff7fe4500 (LWP 11691) exited]
[Inferior 1 (process 11691) exited normally]
(gdb) quit


And: amflush starts writing to tape right now!
This is some progress.

I had already noticed barcode-related display errors; now I have to look into that in more detail.

Unfortunately the amanda project itself hasn't been helpful at all in recent weeks.
Comment 8 Sergei Trofimovich (RETIRED) gentoo-dev 2019-01-08 20:23:25 UTC
If the registers are not already corrupted, the string passed here is already NULL:
> #2  0x00007ffff634d218 in try_match (regex=0x555556b4b470, str=str@entry=0x0, errbuf=errbuf@entry=0x7fffffffc680) at match.c:312

I suggest running the command under 'valgrind'. That might show signs of an earlier problem.

Is there a way for me to run something similar without special hardware?
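To make the observation about the NULL string easier to repeat on other backtraces, here is a small Python sketch that lists every frame in a gdb `bt` dump whose arguments include a `0x0` pointer. It is only an illustration: the function name `null_pointer_args` and both regular expressions are assumptions written for the output format quoted in this bug, and a NULL argument is not necessarily a bug in every frame.

```python
import re

# "#N  [0xADDR in ]function (args) ..." as printed by gdb's "bt".
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\S+) \((.*)\)")
# "name=0x..." or "name=name@entry=0x..." pointer-style arguments.
ARG_RE = re.compile(r"(\w+)=(?:\w+@entry=)?(0x[0-9a-f]+)\b")


def null_pointer_args(backtrace):
    """Return [(frame_number, function, [names of args equal to 0x0])]."""
    frames = []
    for raw in backtrace.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            frames.append(line)
        elif frames and line:
            # gdb wraps long frames; glue continuation lines back on.
            frames[-1] += " " + line
    found = []
    for frame in frames:
        m = FRAME_RE.match(frame)
        if m is None:
            continue
        number, function, args = m.groups()
        nulls = [name for name, value in ARG_RE.findall(args) if value == "0x0"]
        if nulls:
            found.append((int(number), function, nulls))
    return found


sample = """\
#2  0x00007ffff634d218 in try_match (regex=0x555556b4b470, str=str@entry=0x0, errbuf=errbuf@entry=0x7fffffffc680) at match.c:312
#4  0x00007ffff634e856 in match_labelstr (labelstr=labelstr@entry=0x555556b1ab50, autolabel=autolabel@entry=0x555556b43510, label=label@entry=0x0,
    barcode=barcode@entry=0x555556b32ac0 "ABT396L4", meta=meta@entry=0x0, storage=<optimized out>) at match.c:1292
"""
for number, function, nulls in null_pointer_args(sample):
    print(f"#{number} {function}: NULL {', '.join(nulls)}")
```

On the frames quoted here it reports `str` in try_match and `label` and `meta` in match_labelstr, which suggests the NULL originates at or above match_labelstr rather than inside glibc.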
Comment 9 Stefan G. Weichinger 2019-01-08 20:32:32 UTC
(In reply to Sergei Trofimovich from comment #8)
> If the registers are not already corrupted, the string passed here is
> already NULL:
> > #2  0x00007ffff634d218 in try_match (regex=0x555556b4b470, str=str@entry=0x0, errbuf=errbuf@entry=0x7fffffffc680) at match.c:312
> 
> I suggest running the command under 'valgrind'. That might show signs of
> an earlier problem.
> 
> Is there a way for me to run something similar without special hardware?

I can try to build valgrind and test things, although right now my priority is getting the backups to tape (piled up for 2 weeks or so). So this might take 12 hrs or so until I can test again.

You could set up amanda with a "virtual tapes" setup on disk only, although I don't think this would hit the same bug if the problem really is related to barcodes: the virtual tapes are handled differently afaik.
Comment 10 Sergei Trofimovich (RETIRED) gentoo-dev 2019-01-08 20:36:42 UTC
(In reply to Stefan G. Weichinger from comment #9)
> (In reply to Sergei Trofimovich from comment #8)
> > If the registers are not already corrupted, the string passed here is
> > already NULL:
> > > #2  0x00007ffff634d218 in try_match (regex=0x555556b4b470, str=str@entry=0x0, errbuf=errbuf@entry=0x7fffffffc680) at match.c:312
> > 
> > I suggest running the command under 'valgrind'. That might show signs of
> > an earlier problem.
> > 
> > Is there a way for me to run something similar without special hardware?
> 
> I can try to build valgrind and test things, although right now my priority
> is getting the backups to tape (piled up for 2 weeks or so). So this might
> take 12 hrs or so until I can test again.
> 
> You could set up amanda with a "virtual tapes" setup on disk only, although
> I don't think this would hit the same bug if the problem really is related
> to barcodes: the virtual tapes are handled differently afaik.

Aha. I suggest also building all libraries seen in your backtrace with debugging enabled. Namely:
- dev-libs/glib
- dev-lang/perl

and then getting a new backtrace. 'bt full' might be more useful.

I suspect the perl-5.24 -> 5.26 update changed the C API enough to affect Amanda's SWIG wrapper.
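The data already posted in this bug hints at such a mismatch: the `emerge --info` output in the description lists dev-lang/perl 5.24.3-r1, while the backtraces load libperl.so.5.26. The Python sketch below only flags that difference between two pasted texts; it does not prove that stale Perl modules caused the crash. The helper names are hypothetical, and the patterns are assumptions based on the formats quoted in this thread.

```python
import re


def _series(text, pattern):
    """Return (major, minor) from the first match of pattern, or None."""
    m = re.search(pattern, text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def installed_perl_series(emerge_info):
    # Hypothetical helper: reads the "dev-lang/perl:" line of emerge --info.
    return _series(emerge_info, r"dev-lang/perl:\s+(\d+)\.(\d+)")


def loaded_libperl_series(backtrace):
    # Hypothetical helper: reads the libperl soname seen in a gdb backtrace.
    return _series(backtrace, r"libperl\.so\.(\d+)\.(\d+)")


emerge_info = "dev-lang/perl:            5.24.3-r1::gentoo"
backtrace = ("#6  0x00007ffff7abeda3 in Perl_pp_entersub () "
             "from /usr/lib64/libperl.so.5.26")

installed = installed_perl_series(emerge_info)
loaded = loaded_libperl_series(backtrace)
if installed != loaded:
    print(f"perl series differ: emerge --info {installed}, libperl {loaded}")
```

If the two series differ, rebuilding the Perl modules and XS/SWIG bindings against the currently installed perl is the usual next step before digging further.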
Comment 11 Stefan G. Weichinger 2019-01-08 20:38:39 UTC
OK, great. valgrind is already prepared; the libraries will follow tomorrow.
I'll check back with results.

Although I wonder if I can trigger it again (but should be possible, there are other tapes listed "faulty" in a way).
Comment 12 Stefan G. Weichinger 2019-01-09 18:51:36 UTC
(In reply to Stefan G. Weichinger from comment #11)
> OK, great. valgrind is already prepared; the libraries will follow tomorrow.
> I'll check back with results.
> 
> Although I wonder if I can trigger it again (but should be possible, there
> are other tapes listed "faulty" in a way).

Quick status: I could successfully write several tapes now and even amcheck ran through fine multiple times -> so far not able to reproduce the issue.
Comment 13 Stefan G. Weichinger 2019-01-10 15:44:31 UTC
Along with the rebuild of dev-lang/perl I also rebuilt quite a long list of Perl libraries via perl-cleaner. Now the output of "amtape conf inventory" also displays all the tapes with correct barcodes, which wasn't the case before.

I assume the Perl rebuilds might have fixed the issue, if it really was related to barcodes. I'll keep the non-stripped binaries for a few days and check whether I hit the error again. Otherwise it might be solved already.
Comment 14 Stefan G. Weichinger 2019-01-27 18:31:49 UTC
I think we can close this. No one else has reported it, and my issue went away after several packages were rebuilt during the process.

I will soon rebuild amanda etc. without debug symbols and assume the problem is fixed.