Summary: | sys-libs/glibc-2.17 - python2.7: relocation error: /lib/libresolv.so.2: symbol __sendmmsg, version GLIBC_PRIVATE not defined in file libc.so.6 with link time reference | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Toralf Förster <toralf> |
Component: | [OLD] Core system | Assignee: | Portage team <dev-portage> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | Martin.vGagern, toolchain |
Priority: | Normal | Keywords: | InVCS |
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
See Also: | https://bugs.gentoo.org/show_bug.cgi?id=464104 | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Bug Depends on: | |||
Bug Blocks: | 445274 | ||
Attachments: |
sys-libs:glibc-2.17:20121227-142120.log.gz
for demonstration only, add proxy to call dlclose from __del__ |
Description
Toralf Förster
2012-12-27 15:38:54 UTC
what do you see if you run the commands as shown below ?

    $ file -L /lib/libc.so.6
    /lib/libc.so.6: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.16, stripped
    $ readelf -sW /lib/libc.so.6 | grep sendmmsg
    1158: 000f58e0 168 FUNC GLOBAL DEFAULT 11 __sendmmsg@@GLIBC_PRIVATE
    1187: 000f58e0 168 FUNC WEAK DEFAULT 11 sendmmsg@@GLIBC_2.14
    $ readelf -sW /lib/libresolv.so.2 | grep sendmmsg
    45: 00000000 0 FUNC GLOBAL DEFAULT UND __sendmmsg@GLIBC_PRIVATE (9)
    $ /lib/libc.so.6
    GNU C Library (GNU libc) stable release version 2.17, by Roland McGrath et al.
    Copyright (C) 2012 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.
    There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    Compiled by GNU CC version 4.7.2.
    Compiled on a Linux 3.7.0 system on 2012-12-25.
    Available extensions:
    C stubs add-on version 2.1.2
    crypt add-on version 2.1 by Michael Glad and others
    Gentoo patchset 1
    GNU Libidn by Simon Josefsson
    Native POSIX Threads Library by Ulrich Drepper et al
    BIND-8.2.3-T5B
    libc ABIs: UNIQUE IFUNC
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/libc/bugs.html>.

hmm, actually that was with a 32bit multilib.
i just built 32bit native and see slightly different output:

    $ readelf -sW /lib/libc.so.6 | grep sendmmsg
    1181: 000eb4a0 192 FUNC GLOBAL DEFAULT 11 sendmmsg@@GLIBC_2.14
    $ readelf -sW /lib/libresolv.so.2 | grep sendmmsg
    38: 00000000 0 FUNC GLOBAL DEFAULT UND sendmmsg@GLIBC_2.14 (13)

it's a little disconcerting that the native 32bit is diff from the multilib 32bit

Here're my outputs :

    tfoerste@n22 ~/devel/linux $ sudo ~/workspace/bin/chr_uml.sh -r /home/tfoerste/virtual/uml/n22unst4
    n22 ~ # file -L /lib/libc.so.6
    /lib/libc.so.6: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.16, stripped
    n22 ~ #
    n22 ~ # readelf -sW /lib/libc.so.6 | grep sendmmsg
    1158: 000f8410 192 FUNC GLOBAL DEFAULT 12 __sendmmsg@@GLIBC_PRIVATE
    1187: 000f8410 192 FUNC WEAK DEFAULT 12 sendmmsg@@GLIBC_2.14
    n22 ~ #
    n22 ~ # readelf -sW /lib/libresolv.so.2 | grep sendmmsg
    45: 00000000 0 FUNC GLOBAL DEFAULT UND __sendmmsg@GLIBC_PRIVATE (9)
    n22 ~ #
    n22 ~ # /lib/libc.so.6
    GNU C Library (GNU libc) stable release version 2.17, by Roland McGrath et al.
    Copyright (C) 2012 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.
    There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    Compiled by GNU CC version 4.7.2.
    Compiled on a Linux 3.7.0 system on 2012-12-27.
    Available extensions:
    C stubs add-on version 2.1.2
    crypt add-on version 2.1 by Michael Glad and others
    Gentoo patchset 1
    GNU Libidn by Simon Josefsson
    Native POSIX Threads Library by Ulrich Drepper et al
    BIND-8.2.3-T5B
    libc ABIs: UNIQUE IFUNC
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/libc/bugs.html>.

(In reply to comment #3)

looks to me like the library is correct. maybe python had libc.so loaded in memory, updated glibc, then invoked some code where it needed to load libresolv.so dynamically ?

does python still fail ? i.e. can you do `emerge pax-utils portage-utils` (or some other simple packages) and have it work ?

(In reply to comment #4)
> (In reply to comment #3)
>
> looks to me like the library is correct. maybe python had libc.so loaded in
> memory, updated glibc, then invoked some code where it needed to load
> libresolv.so dynamically ?

yes - I think too

> does python still fail ? i.e. can you do `emerge pax-utils portage-utils`
> (or some other simple packages) and have it work ?

no - works fine so far.

looks like bad interaction with portage then when upgrading glibc

i'd hate to think we'd have to mark the system C library as a reload check point like we do with portage itself, but maybe we don't have a choice :(

FWIW the package itself was installed fine, but a lock file wasn't removed from the /var/tmp/portage directory IIRC

(In reply to comment #4)
> looks to me like the library is correct. maybe python had libc.so loaded in
> memory, updated glibc, then invoked some code where it needed to load
> libresolv.so dynamically ?

It could be the code from bug #439584, which uses ctypes to load libc.so into memory, and caches it:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=978f3c6d514f0fcf9329d536cc43cf1230e23112
http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=10b6d0129d062f4d5d8a7611023c3f8cc43f1eab

Here's a patch to disable caching of libraries:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=9e37cca4f54260bd8c45a3041fcee00938c71649

(In reply to comment #9)

how about keeping the handle cached, but check things like its stat date ? look at mtime/ctime/size/inode ?

(In reply to comment #10)
> (In reply to comment #9)
>
> how about keeping the handle cached, but check things like its stat date ?
> look at mtime/ctime/size/inode ?

I thought about that, but I didn't see a convenient way to get the exact path of the library file, since ctypes.util.find_library('c') only returns 'libc.so.6'.
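The stat-based check suggested in comment #10 could look roughly like the sketch below. It is only an illustration, not Portage code: `file_signature` and `StatCheckedCache` are hypothetical names, the caller has to supply a full path (which is exactly the piece `find_library` does not provide), and the loader is whatever callable should be re-run when the file on disk changes.

```python
import os
from collections import namedtuple

# Hypothetical helper type: the fields comment #10 proposes comparing.
FileSignature = namedtuple("FileSignature", "inode size mtime_ns ctime_ns")


def file_signature(path):
    """Return the inode/size/mtime/ctime tuple for path."""
    st = os.stat(path)
    return FileSignature(st.st_ino, st.st_size, st.st_mtime_ns, st.st_ctime_ns)


class StatCheckedCache:
    """Cache one loaded object per path; reload when the stat signature changes."""

    def __init__(self, loader):
        self._loader = loader   # e.g. ctypes.CDLL (assumption, see note below)
        self._entries = {}      # path -> (signature, loaded object)

    def get(self, path):
        sig = file_signature(path)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == sig:
            return cached[1]    # file unchanged: reuse the cached object
        value = self._loader(path)
        self._entries[path] = (sig, value)
        return value
```

With `ctypes.CDLL` as the loader, this only decides when to ask for a reload. Whether the dynamic loader then maps the new file is a separate question.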
Maybe it's not worth caching the handle anyway.

(In reply to comment #11)
> (In reply to comment #10)
> > (In reply to comment #9)
> >
> > how about keeping the handle cached, but check things like its stat date ?
> > look at mtime/ctime/size/inode ?
>
> I thought about that, but I didn't see a convenient way to get the exact
> path of the library file, since ctypes.util.find_library('c') only returns
> 'libc.so.6'.

How about parsing output of '/sbin/ldconfig -p'? That's one of the mechanisms utilized by find_library too.

> Maybe it's not worth caching the handle anyway.

(In reply to comment #12)
> How about parsing output of '/sbin/ldconfig -p'? That's one of the
> mechanisms utilized by find_library too.

I'd prefer to let python's ctypes api handle the parsing, if possible. Anyway, the c library seems to load fairly quickly.

_ctypes internally uses C dlopen(), which has its own cache of library handles. Deletion of ctypes.CDLL objects does not trigger call to C dlclose(). Direct call to _ctypes.dlclose() would be needed. (See also http://bugs.python.org/issue14597.)

    # python
    Python 3.3.0+ (3.3:44a4f9289faa+, Dec 30 2012, 03:50:43)
    [GCC 4.7.2] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import ctypes, _ctypes, os
    >>> os.system("gcc -xc - -fPIC -shared -o /usr/lib64/libtest.so <<< 'void func1(){}'")
    0
    >>> l = ctypes.cdll.LoadLibrary("libtest.so")
    >>> l
    <CDLL 'libtest.so', handle 1b4f8f0 at 7f9f8e9b7250>
    >>> l._handle
    28637424
    >>> l.func1
    <_FuncPtr object at 0x7f9f8ea5b6d0>
    >>> del l
    >>> _ctypes.dlclose(28637424)
    >>> _ctypes.dlclose(28637424)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError: ion: shared object not open
    >>> l = ctypes.cdll.LoadLibrary("libtest.so")
    >>> l
    <CDLL 'libtest.so', handle 1b4f8f0 at 7f9f8ea26b50>
    >>> l._handle
    28637424
    >>> l.func1
    <_FuncPtr object at 0x7f9f8ea5b7a0>
    >>> os.system("gcc -xc - -fPIC -shared -o /usr/lib64/libtest.so <<< 'void func2(){}'")
    0
    >>> l = ctypes.cdll.LoadLibrary("libtest.so")
    >>> l
    <CDLL 'libtest.so', handle 1b4f8f0 at 7f9f8ea26c50>
    >>> l._handle
    28637424
    >>> l.func1
    <_FuncPtr object at 0x7f9f8ea5b870>
    >>> l.func2
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python3.3/ctypes/__init__.py", line 366, in __getattr__
        func = self.__getitem__(name)
      File "/usr/lib64/python3.3/ctypes/__init__.py", line 371, in __getitem__
        func = self._FuncPtr((name_or_ordinal, self))
    AttributeError: /usr/lib64/libtest.so: undefined symbol: func2
    >>> _ctypes.dlclose(28637424)
    >>> _ctypes.dlclose(28637424)
    >>> _ctypes.dlclose(28637424)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OSError
    >>> l = ctypes.cdll.LoadLibrary("libtest.so")
    >>> l
    <CDLL 'libtest.so', handle 1b50350 at 7f9f8ea26d10>
    >>> l._handle
    28640080
    >>> l.func1
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python3.3/ctypes/__init__.py", line 366, in __getattr__
        func = self.__getitem__(name)
      File "/usr/lib64/python3.3/ctypes/__init__.py", line 371, in __getitem__
        func = self._FuncPtr((name_or_ordinal, self))
    AttributeError: /usr/lib64/libtest.so: undefined symbol: func1
    >>> l.func2
    <_FuncPtr object at 0x7f9f8ea5b940>

(In reply to comment #14)
> _ctypes internally uses C dlopen(), which has its own cache of library
> handles. Deletion of ctypes.CDLL objects does not trigger call to C
> dlclose(). Direct call to _ctypes.dlclose() would be needed. (See also
> http://bugs.python.org/issue14597.)

We could just ignore this for now, since when emerge updates packages, the only library that's loaded is always in a subprocess which forked from MergeProcess. That subprocess exits soon after loading the library, and the dlopen cache will disappear with it.

Created attachment 333736
for demonstration only, add proxy to call dlclose from __del__

This patch demonstrates that we could use a proxy to call dlclose automatically.

Here's a demonstration showing that if we don't hold a reference to the library, it will be deleted when _get_syncfs() returns. Even after dlclose is called by __del__, it seems that it's still possible to successfully call the syncfs _FuncPtr object (though it's probably not very safe practice). In practice, we wouldn't want to call dlclose until after we've called the syncfs.

    Python 2.7.3 (default, May 5 2012, 23:40:16)
    [GCC 4.5.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from portage.dbapi.vartree import _get_syncfs
    >>> syncfs = _get_syncfs()
    _LibraryHandleProxy __del__ 4152121656
    >>> syncfs
    <_FuncPtr object at 0xa01b1cc>
    >>> f = open('/tmp/foo', 'wb')
    >>> syncfs(f.fileno())
    0

For additional safety, I've added an additional fork for every time that we load a library with ctypes:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=7ebb2f54877edb28621c33e380f8777b1b1dc201

(In reply to comment #17)

can't we do something more pythonic ? like attempt an operation and if it fails, catch the exception and retry ? this error case happens very infrequently.
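For readers without access to attachment 333736, the general idea of a proxy that calls dlclose from `__del__` can be sketched as follows. This is not the attached patch: the class name `LibraryHandleProxy`, its `close()` method and the `closed` property are made up for illustration, and the sketch assumes a glibc-based Linux system where `libc.so.6` can be loaded by name. `_ctypes.dlclose()` is the same private call used in comment #14.

```python
import ctypes
import _ctypes  # private CPython module; dlclose() is the call shown in comment #14


class LibraryHandleProxy:
    """Hypothetical wrapper that releases the dlopen handle when it is dropped."""

    def __init__(self, name):
        self._closed = True            # set first so __del__ is safe if CDLL raises
        self._lib = ctypes.CDLL(name)
        self._closed = False

    def __getattr__(self, attr):
        # Only forward public names; avoids recursion on missing private fields.
        if attr.startswith("_"):
            raise AttributeError(attr)
        return getattr(self._lib, attr)

    @property
    def closed(self):
        return self._closed

    def close(self):
        """Call dlclose() exactly once for the wrapped handle."""
        if not self._closed:
            self._closed = True
            _ctypes.dlclose(self._lib._handle)

    def __del__(self):
        self.close()
```

A caller would keep the proxy alive for as long as it uses function pointers obtained from it, and drop it (or call `close()`) only afterwards, in line with the remark above that dlclose should not happen before syncfs has been called.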
(In reply to comment #18)

Experience seems to indicate that ctypes is prone to putting the process in an irrecoverable state (leading to a segfault, relocation error, or who knows what). For things that are fragile like this, I typically isolate them in a subprocess in order to protect the main process from corruption. For some kinds of corruption, there's no other good way to recover that I'm aware of.

This is fixed in 2.1.11.39 and 2.2.0_alpha150.

*** Bug 464104 has been marked as a duplicate of this bug. ***

(In reply to comment #20)
> This is fixed in 2.1.11.39 and 2.2.0_alpha150.

Note that bug #464104, which was marked a duplicate of this one here, was reported against portage 2.2.0_alpha169. So it seems the fix either didn't work or there was a regression reintroducing the issue.

(In reply to comment #22)
> (In reply to comment #20)
> > This is fixed in 2.1.11.39 and 2.2.0_alpha150.
>
> Note that bug #464104, which was marked a duplicate of this one here, was
> reported against portage 2.2.0_alpha169. So it seems the fix either didn't
> work or there was a regression reintroducing the issue.

It seems like a different code path is triggering bug 464104, so I've re-opened it for investigation. |
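The subprocess isolation described in the reply to comment #18 can be illustrated with a small sketch. It is an illustration under stated assumptions, not the code from the Portage commit: `run_isolated` and `load_libc_and_getpid` are hypothetical names, `os.fork()` is assumed to be available, and the example operation assumes a glibc-based Linux system.

```python
import ctypes
import os


def run_isolated(func):
    """Run func() in a forked child; return True only if it exited with status 0.

    A crash, abort, or exception in the child cannot corrupt the parent,
    which only observes the child's exit status.
    """
    pid = os.fork()
    if pid == 0:
        # Child: never return into the caller's code; always leave via os._exit.
        status = 1
        try:
            func()
            status = 0
        finally:
            os._exit(status)
    # Parent: wait for the child and inspect how it ended.
    _, wait_status = os.waitpid(pid, 0)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0


def load_libc_and_getpid():
    """Hypothetical fragile operation: load libc through ctypes and call into it."""
    libc = ctypes.CDLL("libc.so.6")  # assumption: glibc-based Linux
    libc.getpid()
```

A caller would use `run_isolated(load_libc_and_getpid)` and treat a `False` result as "the operation failed in the child". Unlike catching an exception and retrying in the same process, this also covers failures that end the process outright, such as a relocation error or a segfault.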