This is a hard one, as I don't yet understand who's responsible. Perhaps apr.
Symptom: running git-svn init segfaults.
Steps to reproduce and debug:
1. Run git-svn in the perl debugger:
/usr/bin/perl -d /usr/bin/git-svn init -s http://svn.unix-ag.uni-kl.de/vpnc/
2. Attach to that process using gdb in another console:
gdb -p $(ps -C perl -o pid | tail -n1)
3. Resume program in gdb:
4. Continue program to exit command (line 253) in perl debugger:
DB<1> c 253
5. Step through perl cleanup, pressing Enter for every new command
When the perl debugger is at "SVN::_Core::apr_terminate();" from line 58 of SVN/Core.pm, gdb catches the SIGSEGV. A backtrace there looks like this:
#0 0xb7780150 in ?? ()
#1 0xb7aa038a in run_cleanups (cref=0x98f3920) at memory/unix/apr_pools.c:2063
#2 0xb7aa0ac2 in apr_pool_destroy (pool=0x98f3910) at memory/unix/apr_pools.c:756
#3 0xb7aa0aad in apr_pool_destroy (pool=0x96bd110) at memory/unix/apr_pools.c:753
#4 0xb7aa0e1b in apr_pool_terminate () at memory/unix/apr_pools.c:597
#5 0xb7aa1fca in apr_terminate () at misc/unix/start.c:82
#6 0xb7b6365c in _wrap_apr_terminate (my_perl=0x8d52008, cv=0x961407c) at core.c:2300
#7 0x080cc925 in Perl_pp_entersub (my_perl=0x8d52008) at pp_hot.c:2877
#8 0x080b2b8e in Perl_runops_debug (my_perl=0x8d52008) at dump.c:1459
#9 0x08066e22 in S_call_body (my_perl=0x8d52008, myop=0x98f3ae8, is_eval=16 '\020') at perl.c:2731
#10 0x08067ca7 in Perl_call_sv (my_perl=0x8d52008, sv=0x9613ecc, flags=6) at perl.c:2646
#11 0x0806823f in Perl_call_list (my_perl=0x8d52008, oldscope=1, paramList=0x8f5a57c) at perl.c:5207
#12 0x0806d0a2 in perl_destruct (my_perl=0x8d52008) at perl.c:642
#13 0x08063bef in main (argc=6, argv=0xbfc834c4, env=0x0) at perlmain.c:101
(bash-completion curl cvs doc emacs gtk iconv perl subversion
threads tk webdav xinetd
-cgi -mozsha1 -ppcsha1)
(apache2 bash-completion berkdb doc emacs java nls perl python
-debug -elibc_FreeBSD -extras -ruby -vim-syntax -webdav-serf)
(berkdb doc gdbm ithreads
-build -debug -elibc_FreeBSD -perlsuid)
The immediate cause of the segfault is that run_cleanups calls a function through a pointer when there is no function at that address. That's what stack frame #0 is telling us: address 0xb7780150 got called but there's no function there.
I noticed that the offset of the called function relative to most other functions is pretty much fixed. Using that, I managed to set a conditional breakpoint in apr_pool_cleanup_register in order to find out where that function pointer comes from. The result looks like this:
#0 apr_pool_cleanup_register (p=0x93f6848, data=0x9421b90, plain_cleanup_fn=0xb7763150 <cleanup_session>,
child_cleanup_fn=0xb7a833ed <apr_pool_cleanup_null>) at memory/unix/apr_pools.c:1982
#1 0xb7761b3d in svn_ra_neon__open (session=0x93f69c0, repos_URL=0x8fcd4a0 "http://svn.unix-ag.uni-kl.de/vpnc", callbacks=0x93f6950,
callback_baton=0x927a34c, config=0x93f68a8, pool=0x93f6848) at subversion/libsvn_ra_neon/session.c:1027
#2 0xb7ae3b06 in svn_ra_open3 (session_p=Could not find the frame base for
"svn_ra_open3".) at subversion/libsvn_ra/ra_loader.c:471
#3 0xb7ae3c89 in svn_ra_open2 (session_p=Could not find the frame base for
"svn_ra_open2".) at subversion/libsvn_ra/ra_loader.c:501
#4 0xb7ae3d37 in svn_ra_open (session_p=Could not find the frame base for
"svn_ra_open".) at subversion/libsvn_ra/ra_loader.c:524
#5 0xb791f139 in _wrap_svn_ra_open (my_perl=0x8871008, cv=0x91d6044) at svn_ra.c:4832
#6 0x080cc925 in Perl_pp_entersub (my_perl=0x8871008) at pp_hot.c:2877
#7 0x080b2b8e in Perl_runops_debug (my_perl=0x8871008) at dump.c:1459
#8 0x08068a57 in perl_run (my_perl=0x8871008) at perl.c:2361
#9 0x08063c45 in main (argc=6, argv=0xbfa63a84, env=0x10) at perlmain.c:99
Looking at the maps, the address 0xb7763150 of function cleanup_session (which corresponds to 0xb7780150 in my previous run) seems to belong to the code section of /usr/lib/libsvn_ra_neon-1.so.0.0.0. That library is no longer mapped in by the time apr_terminate gets called.
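For anyone repeating the maps lookup, here is a minimal sketch of how it could be scripted. It is not part of the original investigation: find_mapping is a hypothetical helper name, and it assumes a maps file in the usual /proc/<pid>/maps layout (address range, permissions, offset, device, inode, path). It prints the path of the mapping that contains a given address, or "unmapped" if no range covers it.

```sh
# find_mapping ADDR MAPS_FILE   (hypothetical helper, POSIX sh)
# ADDR is hexadecimal, with or without a leading 0x.
# Prints the path of the mapping containing ADDR, or "unmapped".
# The body runs in a subshell so its variables do not leak to the caller.
find_mapping() (
    addr=$(( 0x${1#0x} ))
    result=unmapped
    while read -r range perms offset dev inode path; do
        # Skip anything that does not look like "start-end ...".
        case $range in *-*) ;; *) continue ;; esac
        start=$(( 0x${range%-*} ))
        end=$(( 0x${range#*-} ))
        if [ "$addr" -ge "$start" ] && [ "$addr" -lt "$end" ]; then
            result=${path:-[anonymous]}
            break
        fi
    done < "$2"
    echo "$result"
    # Exit status 0 if a mapping was found, 1 otherwise.
    [ "$result" != unmapped ]
)
```

Run as `find_mapping 0xb7763150 /proc/<pid>/maps` once at the apr_pool_cleanup_register breakpoint and once when gdb reports the SIGSEGV; if the analysis above is right, the first call should name the neon library and the second should print "unmapped". Addresses at the very top of a 64-bit address space can overflow shell arithmetic, so treat results for those with care.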
I guess that perl is responsible for the order in which modules are unloaded, so maybe some perl wizard will have a clue as to how this can be solved.
For completeness, this is my neon version:
(doc gnutls kerberos linguas_de nls socks5 ssl zlib
-expat -linguas_cs -linguas_fr -linguas_ja -linguas_nn
-linguas_pl -linguas_ru -linguas_tr -linguas_zh_CN -pkcs11)
(In reply to comment #1)
> I guess that perl is responsible for the order in which modules are unloaded,
I was wrong; all this happens inside the svn/apr C code. The neon lib is the first one to get loaded by apr, which happens here:
#0 apr_dso_load (res_handle=0xbfd2a968, path=0x9701978 "libsvn_ra_neon-1.so.0", pool=0x9708858) at dso/unix/dso.c:139
#1 0x42f672b5 in svn_dso_load (dso=<value optimized out>, fname=<value optimized out>) at subversion/libsvn_subr/dso.c:93
#2 0xb7ba6e85 in load_ra_module (func=<value optimized out>, compat_func=<value optimized out>, ra_name=<value optimized out>,
pool=<value optimized out>) at subversion/libsvn_ra/ra_loader.c:166
#3 0xb7ba8a8a in svn_ra_open3 (session_p=Could not find the frame base for "svn_ra_open3".
) at subversion/libsvn_ra/ra_loader.c:446
#4 0xb7ba8c89 in svn_ra_open2 (session_p=Could not find the frame base for "svn_ra_open2".
) at subversion/libsvn_ra/ra_loader.c:501
#5 0xb7ba8d37 in svn_ra_open (session_p=Could not find the frame base for "svn_ra_open".
) at subversion/libsvn_ra/ra_loader.c:524
#6 0xb79e4139 in _wrap_svn_ra_open (my_perl=0x8b7c008, cv=0x94e1044) at svn_ra.c:4832
#7 0x080cc925 in Perl_pp_entersub (my_perl=0x8b7c008) at pp_hot.c:2877
#8 0x080b2b8e in Perl_runops_debug (my_perl=0x8b7c008) at dump.c:1459
#9 0x08068a57 in perl_run (my_perl=0x8b7c008) at perl.c:2361
#10 0x08063c45 in main (argc=6, argv=0xbfd2ad44, env=0x20656c69) at perlmain.c:99
It is also the first one to get unloaded during termination:
#0 dso_cleanup (thedso=0x9708928) at dso/unix/dso.c:62
#1 0xb7b4838a in run_cleanups (cref=0x9708868) at memory/unix/apr_pools.c:2063
#2 0xb7b48ac2 in apr_pool_destroy (pool=0x9708858) at memory/unix/apr_pools.c:756
#3 0xb7b48aad in apr_pool_destroy (pool=0x94c3708) at memory/unix/apr_pools.c:753
#4 0xb7b48e1b in apr_pool_terminate () at memory/unix/apr_pools.c:597
#5 0xb7b49fca in apr_terminate () at misc/unix/start.c:82
#6 0xb7c0b65c in _wrap_apr_terminate (my_perl=0x8b7c008, cv=0x941f340) at core.c:2300
#7 0x080cc925 in Perl_pp_entersub (my_perl=0x8b7c008) at pp_hot.c:2877
#8 0x080b2b8e in Perl_runops_debug (my_perl=0x8b7c008) at dump.c:1459
#9 0x08066e22 in S_call_body (my_perl=0x8b7c008, myop=0x9708938, is_eval=44 ',') at perl.c:2731
#10 0x08067ca7 in Perl_call_sv (my_perl=0x8b7c008, sv=0x941f190, flags=6) at perl.c:2646
#11 0x0806823f in Perl_call_list (my_perl=0x8b7c008, oldscope=1, paramList=0x8d845c4) at perl.c:5207
#12 0x0806d0a2 in perl_destruct (my_perl=0x8b7c008) at perl.c:642
#13 0x08063bef in main (argc=6, argv=0xbfd2ad44, env=0x1) at perlmain.c:101
I think there are two possible solutions to this, one in apr and one in subversion.
To address the issue in apr, you could ensure that cleanup functions like apr_pool_terminate only unload libraries after everything else has been cleaned up. This could be done by passing around a temporary pool to which shared objects get moved, and cleaning that pool after the rest. It would probably mean a few wrapper functions to keep the interface compatible. The cleanup function pointer could probably be used to identify library objects.
To address the issue in subversion, the order in which pools get allocated, and will therefore be freed later on, would need to be carefully controlled. There is a special pool for DSOs, but it seems that it gets cleaned along with the whole pool hierarchy, and in this case it probably was allocated before the pool for the neon objects. By ensuring a deterministic order of pool cleanups, DSOs should be cleaned after other objects. The fact that APR seems to reuse clean slots might make this more difficult.
Neither of these approaches is particularly easy or nice to implement, and both seem rather like a task for upstream. I'll try to get the apr devs interested in this issue.
Posted upstream with both SVN and APR.
Gmane seems to be lagging behind quite a bit, so here are the official archives:
Just chiming in to say I've been seeing this too, since my subversion got upgraded from 1.4.x to 1.5.x. Because the segfault happens on cleanup this bug doesn't seem to cause serious problems, unless you have some script or program that depends on git-svn's exact output or return code.
Thanks for the detailed investigation Martin.
(In reply to comment #6)
> bug 234826
Don't think so:
* This here is an invalid but non-NULL pointer
* The fix doesn't seem related to this issue here
* The debian bug referenced from these doesn't seem related either
I recently started running into this when upgrading from subversion-1.4.6 to 1.5.1 and from neon-0.26.4 to 0.28.3.
Upstream mail threads seem to have dried up long ago. Next step would probably be filing upstream bug reports for this, but I don't feel like it right now. If anyone else wants to, feel free to do so.
Gentoo users should probably disable the dso USE flag in order to work around this issue. Subversion maintainers might even consider having the ebuild issue a warning when dso is enabled, in order to notify people about this known bug here.
Judging from the backtraces, these two threads seem to cover this bug:
(In reply to comment #10)
> Judging from the backtraces, these two threads seem to cover this bug.
Agreed. I'll write a mail to the original author of said threads.
(In reply to comment #9)
> Next step would probably be filing upstream bug reports for this
I just did so: http://subversion.tigris.org/issues/show_bug.cgi?id=3289
For reference, my 4-message-thread on the subversion list is probably best readable at http://groups.google.com/group/Subversion-development/browse_frm/thread/ef712fbf8a1ed139/ as gmane still has trouble keeping track of this thread.
> Gentoo users should probably disable the dso USE flag
In the meantime I realized that the Gentoo ebuild does set the dso USE flag by default. This setting is within the ebuild itself, so it should be easy to change, and would probably lead users to disable this DSO stuff on their next emerge -uND world. So I recommend removing that single '+' sign, and perhaps revbump to not depend on the -N flag to emerge. Do you need a patch for this?
(In reply to comment #11)
> In the meantime I realized that the Gentoo ebuild does set the dso USE flag by
> default. This setting is within the ebuild itself, so it should be easy to
> change, and would probably lead users to disable this DSO stuff on their next
> emerge -uND world. So I recommend removing that single '+' sign, and perhaps
> revbump to not depend on the -N flag to emerge. Do you need a patch for this?
That won't be necessary, but it is a decision for the package maintainers to make. However, I'm not sure what the impact of disabling DSO would be.
(In reply to comment #12)
> However, I'm not sure what the impact of disabling DSO would be.
I'd say mostly it would impact time needed for application startup: without dso the libsvn_ra-1.so depends on libsvn_ra_*-1.so.0 so that those and their dependencies have to be loaded and perhaps relocated even when they are not needed. With dso it would load only the module necessary for the server actually used.
Makes me think of how this would interact with prelinking. I'd assume that prelinking could at least reduce the cost of loading those additional modules, while on the other hand the prelinker probably has no chance of optimizing the dynamically loaded modules. Maybe in a prelinked environment -dso might even be preferable. Haven't tested it yet, though.
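Whether a given build links the RA modules directly can be checked from ldd output. The following is only a sketch: list_ra_modules is a hypothetical helper name, and the library path in the usage note is an assumption based on the paths seen earlier in this report.

```sh
# list_ra_modules   (hypothetical helper)
# Reads `ldd` output on stdin and prints each libsvn_ra_* library it
# mentions, one per line, without duplicates. Relies on `grep -o`,
# which GNU and BSD grep both provide.
list_ra_modules() {
    grep -o 'libsvn_ra_[A-Za-z0-9]*-1\.so[.0-9]*' | sort -u
}
```

Usage would be something like `ldd /usr/lib/libsvn_ra-1.so | list_ra_modules`. If the description above is accurate, a build without dso should list the RA modules here, while a dso build should print nothing because the modules are only loaded at run time.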
The CHANGES file also has this to say:
* support 'http-library' (if --enable-runtime-module-search) (r31425, -722)
so serf support might have something to do with this as well. Don't know.
Duplicate of bug #221931 ?
(In reply to comment #11)
> In the meantime I realized that the Gentoo ebuild does set the dso USE flag
> by default. This setting is within the ebuild itself, so it should be easy
> to change, and would probably lead users to disable this DSO stuff on their
> next emerge -uND world. So I recommend removing that single '+' sign, and
> perhaps revbump to not depend on the -N flag to emerge.
Introduction of "dso" USE flag was a sufficient compromise (Bug #221931).
dev-util/git[subversion] should temporarily depend on dev-util/subversion[-dso] (Bug #238586).
(In reply to comment #13)
> The CHANGES file also has this to say:
> * support 'http-library' (if --enable-runtime-module-search) (r31425, -722)
> so serf support might have something to do with this as well. Don't know.
It's unrelated to this bug.
*** Bug 252463 has been marked as a duplicate of this bug. ***
I just hit a similar bug -- no idea if it's the same.
$ git svn clone https://svn.dune-project.org/svn/dune-common/trunk dune-common
and it aborts like this
r5412 = f2316fb49400a5896fc10cd9fddd2e16e143c784 (git-svn)
Checked out HEAD:
however, I have
auto = 0
in my ~/.gitconfig, so it shouldn't be from cleanup, yes?
I'm on gcc 4.3.3
* dev-util/git [R 126.96.36.199] <target>
-bash-completion -cgi -curl -cvs -doc emacs gtk iconv -mozsha1 perl (-ppcsha1) subversion -threads -tk vim-syntax -webdav -xinetd build_options: -optional_tests split strip
I should add that, in that dir, I get
$ git gc
Counting objects: 29375, done.
Compressing objects: 100% (29265/29265), done.
Writing objects: 100% (29375/29375), done.
Total 29375 (delta 22347), reused 0 (delta 0)
$ git svn fetch
so it shouldn't be from any kind of cleanup.
(In reply to comment #17)
> I just hit a similar bug -- no idea if it's the same.
Do you have dso enabled for your subversion? Try
echo "dev-util/subversion -dso" >> /etc/portage/package.use
emerge -1 subversion
and see whether that solves your issue. If it does, it is the same thing, and you now have a functioning workaround in place. If it does not, it is a different issue, maybe remotely related, but different enough to warrant a new bug report, I think.
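One caveat with the `>>` form above is that running it twice leaves a duplicate line. A small sketch that appends only when the line is missing follows; ensure_use_line is a hypothetical helper name, and it assumes /etc/portage/package.use is a plain file, as in the command above.

```sh
# ensure_use_line FILE LINE   (hypothetical helper)
# Appends LINE to FILE unless FILE already contains an identical line.
# grep -x matches whole lines only, -F treats LINE as a fixed string.
ensure_use_line() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}
```

It would be called as `ensure_use_line /etc/portage/package.use "dev-util/subversion -dso"`, followed by the same `emerge -1 subversion`.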
> however, I have
> auto = 0
> in my ~/.gitconfig, so it shouldn't be from cleanup, yes?
Don't know about that, maybe some more git-savvy users can tell, otherwise someone would have to check the sources or run things through the debugger to check for calls to dso_cleanup. I don't think the apr pools can be disabled, though, especially as they are on the subversion side and not the git side, so I have a strong feeling this is the same bug and USE=-dso will solve it.
(In reply to comment #19)
> Do you have dso enabled for your subversion? Try
> echo "dev-util/subversion -dso" >> /etc/portage/package.use
> emerge -1 subversion
> and see whether that solves your issue. If it does, it is the same thing, and
> you now have a functioning workaround in place.
I did have dso enabled and disabling it rid me of the segfault, so it seems I have the very same issue, thanks.
Since things are moving very slowly in the subversion upstream, I have reopened bug 238586.
(In reply to comment #21)
> Since things are moving very slowly in the subversion upstream
AFAIK nobody has provided a reproduction script which uses only Subversion Perl bindings, so currently no Subversion developer works on this bug.
I had the same problem. For me, the segfault only triggers with +dso and +sasl. With +dso and -sasl, the segfault is no longer produced. Maybe this helps someone trying to reproduce the problem. I noticed this correlation with the sasl flag only because dmesg showed the segfault occurring in libntlm.so.2.0.22, and this was not evident from the backtrace displayed by gdb.
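For others who want to check their own kernel log the same way, a minimal sketch follows. segfault_objects is a hypothetical helper name, and it assumes the usual Linux message layout of "NAME[PID]: segfault at ADDR ip ADDR sp ADDR error N in OBJECT[BASE+SIZE]"; lines without the trailing "in OBJECT[...]" part are ignored.

```sh
# segfault_objects   (hypothetical helper)
# Reads kernel log lines on stdin and, for each "segfault at" line,
# prints the object named between " in " and the following "[".
segfault_objects() {
    sed -n 's/.*segfault at .* in \([^[]*\)\[.*/\1/p'
}
```

Something like `dmesg | segfault_objects | sort | uniq -c` would then show which libraries the faulting addresses fell into.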
(In reply to comment #23)
> I had the same problem. For me, the segfault only triggers with +dso and +sasl.
> With +dso and -sasl, the segfault is no longer produced. Maybe this helps
> someone trying to reproduce the problem. I noticed this correlation with the
> sasl flag only because dmesg showed the segfault occurring in
> libntlm.so.2.0.22, and this was not evident from backtrace displayed by gdb.
I've always had Subversion with +dso and -sasl, and I've always had the segfault at the end of git-svn operations.
Where is the culprit here? git, svn, apr?
(In reply to comment #25)
> Where is the culprit here? git, svn, apr?
svn or apr. Depends on your point of view. svn is using apr in a way that wasn't originally intended. So you could say that svn is to blame because they use apr for such stuff, but on the other hand, a fix in apr seems at least as reasonable as one in svn, and would help other similar situations as well. At least that was my understanding when last I looked at this.
In http://thread.gmane.org/gmane.comp.java.wicket.devel/22194/focus=22195 Branko Čibej indicated that apr 1.3 might provide a solution by offering pools which are not children of the apr global pool. Today, apr is at 1.4, and looking at its documentation I find apr_pool_create_unmanaged_ex which apparently does just that. It is documented at http://apr.apache.org/docs/apr/1.4/group__apr__pools.html#gaae7212db77bb57f86419cd594f73a92f
So perhaps the proper fix would be having svn use that function when loading neon. That way, the neon library wouldn't get unloaded, and therefore function pointers wouldn't become invalid. As application shutdown will unload a dso as well, I expect no ill effects from this approach. I hope that simple global static variables are a suitable solution to both keep a pointer to the unmanaged dso pool and keep track of whether or not neon has already been loaded.
As I haven't been using git-svn in a while, and as I have compiled my subversion with USE=-dso to avoid this bug here, I haven't seen it in quite a while. Does this still exist?
Yep, I still see it daily
Created attachment 330772
First stab at a fix
OK, this is a COMPLETELY UNTESTED patch that might fix this issue. I'm not even sure that it will compile, as I don't have the time to compile it now, and I'm not sure when I'll next be able to have a look at this. In theory it should work, by making the global DSO pool unmanaged, thus avoiding unloading of DSOs.
(Will assign to subversion maintainers to check and keep CCed apr and git maintainers)
(In reply to comment #22)
> AFAIK nobody has provided a reproduction script which uses only Subversion
> Perl bindings, so currently no Subversion developer works on this bug.
I had submitted a reproducing command to http://subversion.tigris.org/issues/show_bug.cgi?id=3289 but that command no longer works as the repository it references was moved.
You can use this instead:
$ perl -MSVN::Ra -e \
Created attachment 330839
(In reply to comment #28)
> […] I'm not even sure that it will compile, […]
OK, my previous patch did indeed not compile, or rather not link. The problem was that I added the function in an #if / #else / #endif section related to APR_POOL_DEBUG, and in my setup this part of the code wasn't even compiled at all, so the linker complained about an unresolved symbol. Now I moved the new function after the #endif.
The result does compile, and does pass the test from comment #30. Looks good to me.
(In reply to comment #31)
Please send it to upstream.
(In reply to comment #32)
> Please send it to upstream.
Attached the patch from comment #31 to the upstream bug report.
However, seeing how that report has had almost no activity at all, I wouldn't hold my breath waiting for upstream to react to this. I guess most distros won't use the dso configuration, so they won't have any issues like Gentoo does. Therefore the issue doesn't appear to be a major concern for upstream either. So I'd vote for fixing this on the distro level, while upstream takes its time with it.