Bug 293835 - sys-cluster/csync2-1.34 segfault after world update
Summary: sys-cluster/csync2-1.34 segfault after world update
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Server
Hardware: x86 Linux
Importance: High major
Assignee: Gentoo Cluster Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-11-20 11:04 UTC by ixuz
Modified: 2010-09-10 18:51 UTC
CC List: 4 users

See Also:
Package list:
Runtime testing required: ---


Attachments
strace of csync2 segfaulting (csync.strace.log, 209.91 KB, text/plain)
2009-11-20 17:36 UTC, Adam Randall
As requested, csync with lots of v's (csync.vvvvvvvvvvvvvvvvvx.log, 113.32 KB, text/plain)
2009-11-20 18:46 UTC, Adam Randall
GDB output from csync2 crashing (csync.gdb.slave.driver.log, 1.21 KB, text/plain)
2009-11-20 21:48 UTC, Adam Randall
diff -rud with respect to the basic csync-1.34-pure-gnutls.patch (csync-1.34-pure-gnutls-r0-to-r1.patch, 407 bytes, patch)
2009-11-20 23:46 UTC, Giampaolo Tomassoni
New version of the shipped patch (csync2-1.34-pure-gnutls-r2.patch, 175.76 KB, patch)
2009-11-25 12:37 UTC, Giampaolo Tomassoni
Proposed csync2-1.34-r1 ebuild + patch + init script (csync2-1.34-r1.tar.bz2, 34.29 KB, application/octet-stream)
2010-06-17 14:49 UTC, Ultrabug
csync2 gentoo init script for standalone (csync2-init, 484 bytes, text/plain)
2010-06-19 19:48 UTC, Ultrabug

Description ixuz 2009-11-20 11:04:30 UTC
~ # csync2 -x

ends in a segmentation fault

output of syslog:

Nov 20 04:05:01 xxx kernel: csync2[7634]: segfault at 20 ip b7e5a229 sp bfb0be6c error 6 in libc-2.9.so[b7d7a000+13c000]

This has been happening since the update to the latest stable gnutls version and the modified csync2 ebuild with SSL support enabled.

Downgrading to the last working gnutls version did not solve the problem.

Reproducible: Always

Steps to Reproduce:
1. emerge -avuDN world
   (so that you get csync2 with SSL support and the latest gnutls)

2. csync2 -xv


Actual Results:  
Marking file as dirty: /etc/crontab
Connecting to host xxb (SSL) ...
Segmentation fault

Expected Results:  
Configs should be synchronized as normal.
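
For readers decoding the kernel log line in the description, here is a minimal sketch. Subtracting the mapping base (the number before the '+') from the instruction pointer gives the offset of the faulting instruction inside libc-2.9.so, and on x86 the error code is a bit field (bit 0: page was present, bit 1: write access, bit 2: user mode). The variable names are illustrative; the values are copied from the log line.

```shell
# Values copied from the "segfault at 20 ip b7e5a229 ... error 6" log line.
ip=0xb7e5a229      # instruction pointer at the time of the fault
base=0xb7d7a000    # address where libc-2.9.so is mapped
err=6              # x86 page-fault error code

# Offset of the faulting instruction inside the library.
offset=$(printf '0x%x' $(( $ip - $base )))
echo "offset in libc: $offset"

# Decode the three low bits of the error code.
present=$(( $err & 1 ))          # 0 means the page was not present
write=$(( ($err >> 1) & 1 ))     # 1 means the access was a write
user=$(( ($err >> 2) & 1 ))      # 1 means the fault happened in user mode
echo "present=$present write=$write user=$user"
```

This prints an offset of 0xe0229 and present=0 write=1 user=1. A user-mode write to address 0x20 on a page that is not mapped is consistent with a write through a near-NULL pointer, although the log line alone does not show which caller passed it. If debug symbols for libc are installed, a tool such as addr2line can map the offset to a function; the library path depends on the system.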
Comment 1 Adam Randall 2009-11-20 16:23:45 UTC
I am having the same problem. Until today I was using net-libs/gnutls-2.6.6, which worked with csync2-1.34, but today my mask on gnutls caused the world update to block. On one of my test machines I updated to gnutls-2.8.4, which caused csync2 to be rebuilt. The build succeeded but, as for the OP, csync2 segfaults on sync. I need this working, as it is integral to my master/slave setup.
Comment 2 Adam Randall 2009-11-20 17:36:29 UTC
Created attachment 210730 [details]
strace of csync2 segfaulting

The output of `strace csync2 -x` ending in a SEGFAULT
Comment 3 Adam Randall 2009-11-20 18:37:22 UTC
Lazy CC
Comment 4 Giampaolo Tomassoni 2009-11-20 18:43:51 UTC
(In reply to comment #2)
> Created an attachment (id=210730) [details]
> strace of csync2 segfaulting
> The output of `strace csync2 -x` ending in a SEGFAULT

Would you mind re-running that csync2 command with a lot of -v options and posting the output?

Something like:

csync2 -vvvvvvvvvvv -x

should do the job.
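
One possible way to collect that output for an attachment is sketched below. The helper name capture_log and the log file name are hypothetical, not part of csync2; the helper simply runs any command and saves stdout and stderr to a file while still showing them on the terminal.

```shell
# Hypothetical helper: run a command, keep its combined output in a log file.
capture_log() {
    logfile=$1
    shift
    # Merge stderr into stdout so that debug messages end up in the log too.
    "$@" 2>&1 | tee "$logfile"
}

# Assumed usage for this bug (not executed here):
#   capture_log csync2-verbose.log csync2 -vvvvvvvvvvv -x
```

Because the command runs on the left side of a pipe, its exit status is not propagated by this sketch; the log file is the only result.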
Comment 5 Giampaolo Tomassoni 2009-11-20 18:44:27 UTC
Lazy CC me too...
Comment 6 Adam Randall 2009-11-20 18:46:25 UTC
Created attachment 210732 [details]
As requested, csync with lots of v's

A very verbose csync2 -x using many -v flags.
Comment 7 Giampaolo Tomassoni 2009-11-20 21:14:19 UTC
(In reply to comment #6)
> Created an attachment (id=210732) [details]
> As requested, csync with lots of v's
> A very verbose csync2 -x using many -v flags.

OK, thanks.

It seems that the problem occurs after the SSL/TLS handshake. It may be in the code that checks and possibly stores a certificate, but I can't say for sure at the moment.

Adam, if you can, I would suggest rebuilding the csync2 executable this way:

CFLAGS="-O0 -g" LDFLAGS= FEATURES=nostrip emerge -v csync2

then start a gdb session issuing:

gdb /usr/sbin/csync2

and finally run the csync2 from gdb issuing:

r -x


Your csync2 session should then start as usual. At the SIGSEGV, the gdb prompt should appear again so that you can issue the following command:

bt

which will show a backtrace of the nested calls that led to the problem.

Please copy that list and post it here.

If you work from a terminal emulator, you can probably copy and paste the list. If your terminal can't do that, try turning session logging/capturing on and you'll get the list in the capture file.
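
As an alternative to copying from an interactive session, the same two gdb commands can be placed in a command file and run in batch mode. This is a sketch under assumptions: the file names csync2-bt.gdb and csync2-gdb.log are made up for illustration, and the binary path is the one given above.

```shell
# Put the gdb commands from the instructions above into a command file.
cat > csync2-bt.gdb <<'EOF'
run -x
bt
EOF

# Assumed invocation (not executed here): -batch makes gdb run the commands
# from the file given with -x and then exit, so the backtrace lands in the log.
#   gdb -batch -x csync2-bt.gdb /usr/sbin/csync2 > csync2-gdb.log 2>&1
```

The resulting log can then be attached to the bug as is.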
Comment 8 Adam Randall 2009-11-20 21:41:13 UTC
Ugh, slave driver :) Okay, I'm working on that now.
Comment 9 Adam Randall 2009-11-20 21:48:58 UTC
Created attachment 210746 [details]
GDB output from csync2 crashing

Requested output of GDB while csync2 exited out with SEGFAULT
Comment 10 Giampaolo Tomassoni 2009-11-20 23:46:23 UTC
Created attachment 210756 [details, diff]
diff -rud with respect to the basic csync-1.34-pure-gnutls.patch

OK,

it seems that there is a problem with the patch.

This is a patch for the csync-1.34-pure-gnutls.patch patch.

To apply it, I suggest first downloading the patch attached here to your home directory (say MYHOME), then issuing:

ebuild /usr/portage/sys-cluster/csync2/csync2-1.34.ebuild unpack

This will create the build environment for csync2 *and* apply the csync-1.34-pure-gnutls.patch file shipped with it. The last line of the command output should show where this environment is located. cd to that path and then cd further into the csync2-1.34 directory you should find there.

At that point issue the command:

patch </MYHOME/csync-1.34-pure-gnutls-r0-to-r1.patch

This will patch the patch.

To build and install the new csync2, issue:

ebuild /usr/portage/sys-cluster/csync2/csync2-1.34.ebuild compile
ebuild /usr/portage/sys-cluster/csync2/csync2-1.34.ebuild install
ebuild /usr/portage/sys-cluster/csync2/csync2-1.34.ebuild qmerge

Then check if csync2 is now working.

Let me know: if it works, I will have to create a new patch to replace the shipped one...
Comment 11 Adam Randall 2009-11-21 00:16:10 UTC
Following your instructions, csync2 is now fully operational with the current version of gnutls! For those wanting a quick method, here's a single command that performs all the steps:

cd ~ && \
ebuild /usr/portage/sys-cluster/csync2/csync2-1.34.ebuild clean && \
ebuild /usr/portage/sys-cluster/csync2/csync2-1.34.ebuild unpack && \
cd /var/tmp/portage/sys-cluster/csync2-1.34/work/csync2-1.34 && \
patch < ~/csync-1.34-pure-gnutls-r0-to-r1.patch && \
ebuild /usr/portage/sys-cluster/csync2/csync2-1.34.ebuild compile && \
ebuild /usr/portage/sys-cluster/csync2/csync2-1.34.ebuild install && \
ebuild /usr/portage/sys-cluster/csync2/csync2-1.34.ebuild qmerge

I tested the above on four servers. I'm now going to push this to the rest of the servers so that I can finally get them operational again. Thanks so much!
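
The steps from comments 10 and 11 can also be sketched as a small script that checks whether the patch applies before changing the sources. Assumptions: the default Portage paths used in this thread, the patch saved in the home directory, and GNU patch (for --dry-run). The run and rebuild_csync2 helpers are hypothetical, and with DRY_RUN=1 (the default here) the commands are only printed.

```shell
#!/bin/sh
# Paths as used in this thread; adjust them if your Portage layout differs.
EBUILD=/usr/portage/sys-cluster/csync2/csync2-1.34.ebuild
WORKDIR=/var/tmp/portage/sys-cluster/csync2-1.34/work/csync2-1.34
PATCH_FILE=$HOME/csync-1.34-pure-gnutls-r0-to-r1.patch

# Print the command in dry-run mode, otherwise execute it and stop on failure.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@" || exit 1
    fi
}

rebuild_csync2() {
    run ebuild "$EBUILD" clean
    run ebuild "$EBUILD" unpack
    # First verify that the patch applies cleanly, then apply it for real.
    run patch -d "$WORKDIR" --dry-run -i "$PATCH_FILE"
    run patch -d "$WORKDIR" -i "$PATCH_FILE"
    for phase in compile install qmerge; do
        run ebuild "$EBUILD" "$phase"
    done
}

rebuild_csync2
```

Setting DRY_RUN=0 in the environment makes the script execute the commands instead of printing them.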
Comment 12 ixuz 2009-11-25 12:10:42 UTC
(In reply to comment #11)

This worked for our four drbd servers too. Thank you.

It would be wonderful if this could make its way into the official ebuild soon. ;-)

Comment 13 Giampaolo Tomassoni 2009-11-25 12:37:39 UTC
Created attachment 211154 [details, diff]
New version of the shipped patch

Urs and Christian, the r1 version of the patch to be shipped via Portage has been published at bug #274213.

I don't know whether you were notified about new comments on that bug, since it is now closed... Anyway, to avoid any problem in getting the patch, I'm adding it here too.
Comment 14 ixuz 2009-11-25 13:27:26 UTC
Sorry, I marked it as WORKSFORME. I didn't know that this would close the ticket...
Comment 15 Giampaolo Tomassoni 2009-11-25 13:36:25 UTC
I don't know either: I was speaking of bug #274213...

Anyway, I guess reopening this one too shouldn't hurt.
Comment 16 Adam Randall 2010-01-19 22:02:21 UTC
What's the status of this updated patch being pushed out officially? We've been using it for weeks now without issue.
Comment 17 Adam Randall 2010-02-26 00:11:17 UTC
We have added another Gentoo server to our mix. This one, though, is running amd64 instead of x86. Using the same instructions to build with the updated patch works very well. I have had no issues with csync2 between the amd64 and x86 machines.
Comment 18 Ultrabug gentoo-dev 2010-06-17 14:49:50 UTC
Created attachment 235723 [details]
Proposed csync2-1.34-r1 ebuild + patch + init script

I can second that with the r0-to-r1 patch, csync2 installs and is stable on both amd64 and x86 (tested on 4 servers).

I'm attaching the modified csync2 portage branch, which includes:
- the r1 ebuild itself and its manifest
- the patch file in the files folder
- a Gentoo init script for those who prefer standalone mode (the ebuild installs it)

Regards
Comment 19 Giampaolo Tomassoni 2010-06-17 16:03:35 UTC
I've got the feeling that the problem is that the Gentoo Linux High-Availability Clustering Team, which is the assignee of this bug, is actually orphaned...

See http://www.gentoo.org/proj/en/metastructure/oldprojects.xml .

I would try "Reassign bug to default assignee of selected component", whatever this may mean.

Does anybody know whether this is going to make any difference?
Comment 20 Adam Randall 2010-06-17 16:41:33 UTC
I don't know what it will take, but it would be quite nice to have this fix in the repository. The workaround is easy enough to implement, but it's still one of those things.
Comment 21 Giampaolo Tomassoni 2010-06-17 17:42:12 UTC
Well, no. I saw that option while logged out, but once logged in it turns out we can't do that...
Comment 22 Ultrabug gentoo-dev 2010-06-17 19:55:06 UTC
(In reply to comment #19)
> I've got the feeling that the problem is the Gentoo Linux High-Availability
> Clustering Team, which is the assignee of this bug, is actually orphaned...
> 
> See http://www.gentoo.org/proj/en/metastructure/oldprojects.xml .
> 
> I would try to "Reassign bug to default assignee of selected component",
> whatever this may mean.
> 
> Anybody know if this is going to make any difference?
> 

I didn't even know about this page, thanks :) Since the cluster team looks empty, I guess this patch will have trouble getting a dev's attention and making it into Portage.

Maybe we should try our luck and ask on IRC?

As a side note, if there is really nobody taking care of the clustering side of Gentoo, maybe we could learn how to do something about it? I really don't know how that is done, but heartbeat and others need some care too, for instance (the scarabeus repo guys did a good job on that).

Regards
Comment 23 Kacper Kowalik (Xarthisius) (RETIRED) gentoo-dev 2010-06-17 20:15:01 UTC
+*csync2-1.34-r1 (17 Jun 2010)
+
+  17 Jun 2010; Kacper Kowalik <xarthisius@gentoo.org>
+  +csync2-1.34-r1.ebuild, +files/csync2-1.34-gnutls.patch:
+  Updating ebuild and gnutls patch. Fixes bug 293835, 293866, 298333. Thanks
+  Giampaolo Tomassoni for patch
+

Please test whether it fixes the issue. I'll leave the bug open while waiting for
your reports.

@Giampaolo Tomassoni: please do not diff against files that are regenerated by
autotools. Cutting unnecessary cruft reduced the size of your patch from 170 KB
to 8 KB. I would be grateful if you could check whether I missed something.
Thanks!

Best regards,
Kacper Kowalik
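
To illustrate the advice about autotools output, here is a sketch that builds two scratch trees and diffs them while excluding generated files. The file names and the exclude list are illustrative only and are not taken from the actual csync2 patch; GNU diff is assumed for --exclude.

```shell
# Two throwaway trees that differ in a source file and in a generated file.
tmp=$(mktemp -d)
mkdir "$tmp/a" "$tmp/b"
echo 'old source' > "$tmp/a/conn.c"
echo 'new source' > "$tmp/b/conn.c"
echo 'generated v1' > "$tmp/a/configure"
echo 'generated v2' > "$tmp/b/configure"

# Recursive unified diff that skips files autotools regenerates anyway.
patch_text=$(cd "$tmp" && diff -ruN \
    --exclude=configure --exclude=Makefile.in --exclude=aclocal.m4 \
    a b)
echo "$patch_text"

rm -rf "$tmp"
```

The resulting patch contains a hunk for conn.c only; the change to configure is left out, which is what keeps such a patch small.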
Comment 24 Ultrabug gentoo-dev 2010-06-19 19:20:19 UTC
(In reply to comment #23)
> +*csync2-1.34-r1 (17 Jun 2010)
> +
> +  17 Jun 2010; Kacper Kowalik <xarthisius@gentoo.org>
> +  +csync2-1.34-r1.ebuild, +files/csync2-1.34-gnutls.patch:
> +  Updating ebuild and gnutls patch. Fixes bug 293835, 293866, 298333. Thanks
> +  Giampaolo Tomassoni for patch

Nice to see someone helping to push the fix, thanks Kacper Kowalik!
I guess I mistook 'orphaned' for 'empty', my bad ;)

> Please test whether it fix the issue. I'll leave the bug open while waiting for
> your reports.

- Install OK for me on both x86 and amd64.
- Sync OK for me between x86 and amd64 machines.

@Kacper Kowalik, would it be possible to push the Gentoo init script we submitted, for those who are not big fans of xinetd, please?
Comment 25 Kacper Kowalik (Xarthisius) (RETIRED) gentoo-dev 2010-06-19 19:37:09 UTC
> @Kacper Kowalik, would it be possible to push the Gentoo init script we
> submitted for those who are not big fans of xinetd please ?
Submitted where? Attachment:
http://bugs.gentoo.org/attachment.cgi?id=235723
contains only the xinetd file that's already in Portage.
Comment 26 Ultrabug gentoo-dev 2010-06-19 19:48:33 UTC
Created attachment 235979 [details]
csync2 gentoo init script for standalone

I did get it wrong indeed, I'm sorry :(

The init script is attached here.

Regards.
Comment 27 Kacper Kowalik (Xarthisius) (RETIRED) gentoo-dev 2010-06-19 20:14:29 UTC
Fixed in tree:
+  19 Jun 2010; Kacper Kowalik <xarthisius@gentoo.org> +files/csync2.initd,
+  csync2-1.34-r1.ebuild:
+  Adding init script. Thanks to Ultrabug <ultrabug@ultrabug.net>

(In reply to comment #24)
> - Install OK for me on both x86 and amd64.
> - Sync OK for me between x86 and amd64 machines.
Closing then. Thanks!
Comment 28 Giampaolo Tomassoni 2010-06-20 07:04:45 UTC
(In reply to comment #23)
> I would be grateful if you could check whether I missed something.

Hi Kacper.

It looks fine. Just a cosmetic note for the record: the HAVE_LIBGNUTLS_OPENSSL #define should really be named HAVE_LIBGNUTLS, since the OpenSSL-compatible library from gnutls is not used anymore.

But this is probably of more interest to upstream than to Gentoo.

Thanks for bringing all this to life!