Created attachment 761726 [details]
full gdb backtrace

When attempting to do some MIPS keywording, I ran into an issue where running "repoman full -d -x" in a package directory on my developer git tree caused repoman to throw a segmentation fault in libpython3.10.so.1.0. I also reproduced it on the main Portage tree that my machines use for emerge, and also under Python-3.9.

I was able to gdb out the cause as being in Python's Objects/exceptions.c:237:

(gdb) file /usr/bin/python
Reading symbols from /usr/bin/python...
Reading symbols from /usr/lib/debug//usr/bin/python-exec2c.debug...
(gdb) directory /ramfs/portage/dev-lang/python-3.10.1-r3/work/Python-3.10.1
Source directories searched: /ramfs/portage/dev-lang/python-3.10.1-r3/work/Python-3.10.1:$cdir:$cwd
(gdb) run /usr/bin/repoman full -d -x
Starting program: /usr/bin/python /usr/bin/repoman full -d -x
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
process 21832 is executing new program: /usr/bin/python3.10
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

RepoMan scours the neighborhood...
[Detaching after vfork from child process 21836]
[Detaching after vfork from child process 21857]
[Detaching after vfork from child process 21878]
[Detaching after vfork from child process 21899]

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7d1db5f in BaseException_set_tb (_unused_ignored=0x0, tb=<traceback at remote 0x7ffff60a2900>, self=0x0) at Objects/exceptions.c:237
237         Py_XSETREF(self->traceback, tb);

The problem is that 'self' is NULL, so attempting to dereference 'self->traceback' kills it.

Both my developer tree and my working repo tree sit on a NAS that my machines mount over NFSv4.2. Already tried rebooting both my dev box as well as the NAS itself to rule out the usual culprits, but no dice.
Frame #29 looked interesting, as it's the final codepath that repoman hit before the crash:

#29 0x00007ffff7e05e16 in _PyEval_EvalFrame (throwflag=0, f=Frame 0x55555587d270, for file /usr/lib/python3.10/site-packages/portage/xml/metadata.py, line 442, in parse_metadata_use (xml_tree=<lxml.etree._ElementTree at remote 0x7ffff628c640>, uselist={}), tstate=0x555555577470) at ./Include/internal/pycore_ceval.h:46

Line 442 in that file:
    usetags = xml_tree.findall("use")

I dropped a pdb.set_trace() call just before that line, and then stepped through the Python code until it segfaulted:

# repoman full -d -x

RepoMan scours the neighborhood...
> /usr/lib/python3.10/site-packages/portage/xml/metadata.py(444)parse_metadata_use()
-> usetags = xml_tree.findall("use")
(Pdb) step
--Call--
> /usr/lib/python3.10/encodings/iso8859_15.py(14)decode()
-> def decode(self,input,errors='strict'):
(Pdb)
> /usr/lib/python3.10/encodings/iso8859_15.py(15)decode()
-> return codecs.charmap_decode(input,errors,decoding_table)
(Pdb)
--Return--
> /usr/lib/python3.10/encodings/iso8859_15.py(15)decode()->('src/lxml/_elementpath.py', 24)
-> return codecs.charmap_decode(input,errors,decoding_table)
(Pdb)
Segmentation fault

This indicates that there is some kind of issue with how Python handles the ISO8859-15 locale (Western w/ Euro). To be sure, I changed to ISO8859-1 (Western w/o Euro), and repoman will then work fine.

My guess is if it's a core issue with the ISO8859-15 locale, then glibc is probably at fault in some way, but how one debugs that out, I am unsure.

This might make it reproducible on your end. Note that to get ISO8859-15, any references to ISO8859-1 need to be commented out in /etc/locale.gen before running locale-gen, otherwise, it will complain about one being the duplicate of the other and it will skip that one (silently).
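One quick way to separate "the codec is broken" from "something after the codec is broken" is to call the same decoder by hand. The following is only a sketch: the helper name decode_like_pdb_trace is made up for illustration, and it assumes the bytes being decoded were the source file name shown in the pdb return value above.

```python
import codecs
import locale


def decode_like_pdb_trace(raw: bytes):
    """Decode bytes with the ISO-8859-15 codec (hypothetical helper).

    Returns the same (text, bytes_consumed) tuple shape that pdb printed
    for encodings/iso8859_15.py decode().
    """
    decoder = codecs.getdecoder("iso8859_15")
    return decoder(raw, "strict")


if __name__ == "__main__":
    # Shows which encoding the current locale selects, e.g. ISO-8859-15.
    print("preferred encoding:", locale.getpreferredencoding(False))
    print(decode_like_pdb_trace(b"src/lxml/_elementpath.py"))
    # 0xA4 is the one byte most people notice: Euro sign in ISO-8859-15.
    print(decode_like_pdb_trace(b"\xa4"))
```

If this prints the tuple without crashing under the ISO-8859-15 locale, the pure-Python decode step itself is probably fine and the fault is more likely in whatever consumes the result.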
Created attachment 761727 [details] emerge --info from my dev box
I think you'll need to report this upstream and we go from there.

You might want to try to construct a minimal reproducer using the lxml Python module though.

Does it happen outside of lxml too? Just parsing any string with this encoding?
(In reply to Sam James from comment #2)
> I think you'll need to report this upstream and we go from there.
>
> You might want to try construct a minimal reproducer using the lxml Python
> module though.
>
> Does it happen outside of lxml too? Just parsing any string with this
> encoding?

I am not sure. I did not do any further debugging after the call chain in Python's debugger stopped at iso8859_15.py(15)decode(), so I don't know why that got to the point where it needed to throw an exception, but pass in a NULL 'self' to CPython. Both Python's codecs and lxml modules are areas of Python I know very little about.

Also, one of the pieces of information I am missing is what in the backend Portage metadata was involved that didn't fly with ISO8859-15? Clearly repoman was in the middle of parsing something when it choked on its own exception. Its other modes, like 'manifest', seem to work fine under that codec, just not the 'full' mode.
This segfaults for me if I configure my locale as en_US.ISO-8859-15.

```
#!/usr/bin/env python3
from lxml import etree

xml="""
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE pkgmetadata SYSTEM "https://www.gentoo.org/dtd/metadata.dtd">
<pkgmetadata>
  <maintainer type="project">
    <email>python@gentoo.org</email>
  </maintainer>
  <stabilize-allarches/>
  <upstream>
    <remote-id type="pypi">build</remote-id>
    <remote-id type="github">pypa/build</remote-id>
  </upstream>
</pkgmetadata>
"""

root = etree.fromstring(xml).findall("use")
print(root)
```
(In reply to Sam James from comment #4)
> This segfaults for me if I configure my locale as en_US.ISO-8859-15.
> ```
> #!/usr/bin/env python3
> from lxml import etree
>
> xml="""
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE pkgmetadata SYSTEM "https://www.gentoo.org/dtd/metadata.dtd">
> <pkgmetadata>
>   <maintainer type="project">
>     <email>python@gentoo.org</email>
>   </maintainer>
>   <stabilize-allarches/>
>   <upstream>
>     <remote-id type="pypi">build</remote-id>
>     <remote-id type="github">pypa/build</remote-id>
>   </upstream>
> </pkgmetadata>
> """
>
> root = etree.fromstring(xml).findall("use")
> print(root)
> ```

That does indeed SIGSEGV, but in a completely different file:

# gdb
(gdb) file /usr/bin/python
Reading symbols from /usr/bin/python...
Reading symbols from /usr/lib/debug//usr/bin/python-exec2c.debug...
(gdb) directory /ramfs/portage/dev-lang/python-3.10.1-r3/work/Python-3.10.1
Source directories searched: /ramfs/portage/dev-lang/python-3.10.1-r3/work/Python-3.10.1:$cdir:$cwd
(gdb) run py-sigsegv-202201.py
Starting program: /usr/bin/python py-sigsegv-202201.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
process 7966 is executing new program: /usr/bin/python3.10
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
__pyx_f_4lxml_5etree__getThreadErrorLog (__pyx_v_name='_GlobalErrorLog') at src/lxml/etree.c:48609
48609   src/lxml/etree.c: No such file or directory.

As the last line indicates, for some reason, the lxml source isn't being unpacked or found (I am probably missing a step somewhere -- isn't lxml built into Python these days?), but the segfault in src/lxml/etree.c looks to be somewhat different than the one in Objects/exceptions.c. This means you accidentally found a new bug, but one that happens to have a similar root cause to the one I reported. Hah!
I did initially suspect the metadata.xml file, but I think instead, the buggy data is in the tree metadata that is generated on the rsync master and sent down in $PORTDIR/metadata. I *think* that's what repoman is scanning when it chokes, but probably need a portage dev to validate that, as they know repoman's internals a lot better than I do.

FWIW, dmesg shows this as similar to my original crash:
repoman[8186]: segfault at 20 ip 00007aaffb480b5f sp 00007ffdb5630b00 error 4 in libpython3.10.so.1.0[7aaffb404000+1f3000]
Code: 1f 84 00 00 00 00 00 0f 1f 40 00 48 83 ec 08 48 85 f6 74 5e 48 3b 35 a0 62 26 00 74 0d 48 8b 05 07 62 26 00 48 39 46 08 75 2b <4c> 8b 47 20 48 ff 06 48 89 77 20 4d 85 c0 74 05 49 ff 08 74 0c 31

And your testcase:
python3[28567]: segfault at 0 ip 00007bc6a34f4abc sp 00007ffdae805130 error 6 in etree.cpython-310-x86_64-linux-gnu.so[7bc6a34ab000+15c000]
Code: 24 38 0f 88 76 02 00 00 49 8b 01 48 83 f8 01 0f 84 49 01 00 00 49 89 01 48 ff 09 0f 84 fa 01 00 00 49 ff 0b 0f 84 c6 01 00 00 <49> ff 0a 0f 84 9c 01 00 00 48 8b bb 90 00 00 00 4c 89 f9 4c 89 c2

Different modules, different source, both are NULL dereferences, but both are related to locale being ISO8859-15. Weird!

If you want to try to reproduce my error, set your locale to ISO8859-15 and just run "repoman full -d -x" from a package directory within $PORTDIR. If you have an idea of how to backtrace that to whatever data repoman was chewing on, then I will open an upstream bug with both testcases and see what they think.
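For comparing the two dmesg lines above, a small parser helps. This is a rough sketch rather than anything from the kernel docs verbatim: decode_segfault and SEGFAULT_RE are hypothetical names, and the error-code interpretation assumes the usual x86 page-fault bits (bit 0 = page was present, bit 1 = write access, bit 2 = user mode).

```python
import re

# Matches the x86 "segfault at ..." line format shown in this bug.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line: str) -> dict:
    """Split a dmesg segfault line into its parts (hypothetical helper)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    err = int(m["err"], 16)
    return {
        "object": m["obj"],
        "fault_addr": int(m["addr"], 16),
        # Offset of the faulting instruction inside the mapped object.
        "offset": int(m["ip"], 16) - int(m["base"], 16),
        "access": "write" if err & 2 else "read",
        "user_mode": bool(err & 4),
        "page_present": bool(err & 1),
    }


if __name__ == "__main__":
    line = ("repoman[8186]: segfault at 20 ip 00007aaffb480b5f "
            "sp 00007ffdb5630b00 error 4 in "
            "libpython3.10.so.1.0[7aaffb404000+1f3000]")
    print(decode_segfault(line))
```

Read this way, the first crash is a user-mode read at a small offset from NULL inside libpython, and the second is a user-mode write at address 0 inside lxml's etree module, which fits the "both are NULL dereferences" reading.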
(In reply to Joshua Kinard from comment #5)
> (In reply to Sam James from comment #4)
> > This segfaults for me if I configure my locale as en_US.ISO-8859-15.
[snip]
> > 48609 src/lxml/etree.c: No such file or directory.
>
> As the last line indicates, for some reason, the lxml source isn't being
> unpacked or found (I am probably missing a step somewhere -- isn't lxml
> built into Python these days?), but the segfault in src/lxml/etree.c looks
> to be somewhat different than the one in Objects/exceptions.c. This means
> you accidentally found a new bug, but one that happens to have a similar
> root cause to the one I reported. Hah!

Try installing dev-python/lxml with debug symbols & installsources. And wow!

>
> I did initially suspect the metadata.xml file, but I think instead, the
> buggy data is in the tree metadata that is generated on the rsync master and
> sent down in $PORTDIR/metadata. I *think* that's what repoman is scanning
> when it chokes, but probably need a portage dev to validate that, as they
> know repoman's internals a lot better than I do.
>

Yeah, I went into my git checkout's dev-python/build (noticed it from your attached backtrace), ran 'repoman full -dx' and got the crash. You could strace (possibly with limited syscalls like open()) to see if it touches metadata/*.

> FWIW, dmesg shows this as similar to my original crash:
> repoman[8186]: segfault at 20 ip 00007aaffb480b5f sp 00007ffdb5630b00 error
> 4 in libpython3.10.so.1.0[7aaffb404000+1f3000]
> Code: 1f 84 00 00 00 00 00 0f 1f 40 00 48 83 ec 08 48 85 f6 74 5e 48 3b 35
> a0 62 26 00 74 0d 48 8b 05 07 62 26 00 48 39 46 08 75 2b <4c> 8b 47 20 48 ff
> 06 48 89 77 20 4d 85 c0 74 05 49 ff 08 74 0c 31
>
> And your testcase:
> python3[28567]: segfault at 0 ip 00007bc6a34f4abc sp 00007ffdae805130 error
> 6 in etree.cpython-310-x86_64-linux-gnu.so[7bc6a34ab000+15c000]
> Code: 24 38 0f 88 76 02 00 00 49 8b 01 48 83 f8 01 0f 84 49 01 00 00 49 89
> 01 48 ff 09 0f 84 fa 01 00 00 49 ff 0b 0f 84 c6 01 00 00 <49> ff 0a 0f 84 9c
> 01 00 00 48 8b bb 90 00 00 00 4c 89 f9 4c 89 c2
>
> Different modules, different source, both are NULL dereferences, but both
> are related to locale being ISO8859-15. Weird!
>
> If you want to try to reproduce my error, set your locale to ISO8859-15 and
> just run "repoman full -d -x" from a package directory within $PORTDIR. If
> you have an idea of how to backtrace that to whatever data repoman was
> chewing on, then I will open an upstream bug with both testcases and see
> what they think.

Reproduced that too!

I'd try shoving a print() on xml_tree at:
> Line 442 in that file:
>     usetags = xml_tree.findall("use")
Okay, I was wrong, you weren't. I should have read the top of metadata.py in repoman's source, and it is looking at the metadata.xml files in the package directories, and not the tree-generated metadata.

That said, this "bug" only manifests under some weird conditions, primarily when lxml.etree itself attempts to throw a traceback. If the traceback comes out of core Python, there is no segfault. So it is some kind of interaction between lxml and Python going on. And, I think it is limited to just Gentoo systems.

In your provided testcase, because you use the triple quote to define the "xml" variable, that causes your sample XML to begin with a newline, which lxml really doesn't like. On an unaffected system, lxml will complain:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "src/lxml/etree.pyx", line 3252, in lxml.etree.fromstring
  File "src/lxml/parser.pxi", line 1912, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1793, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1082, in lxml.etree._BaseParser._parseUnicodeDoc
  File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
  File "<string>", line 2
lxml.etree.XMLSyntaxError: XML declaration allowed only at the start of the document, line 2, column 6

On an affected system, if using the ISO-8859-15 locale, you get a segmentation fault.

Strangely enough, if you fix that issue, the segmentation fault doesn't happen:

Traceback (most recent call last):
  File "./sam.py", line 18, in <module>
    root = etree.fromstring(xml).findall("use")
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
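The leading-newline trap is easy to show without lxml. The sketch below deliberately uses the standard library's xml.etree.ElementTree as a stand-in (an assumption for illustration: its error messages and str handling differ from lxml.etree, and it says nothing about the segfault itself). The names parse_metadata and fails_without_strip are made up.

```python
# Stand-in only: the thread uses lxml.etree, this uses the stdlib parser.
import xml.etree.ElementTree as ET

# A triple-quoted literal that starts on the next line begins with "\n",
# so the XML declaration is no longer the first thing in the document.
DOC_WITH_LEADING_NEWLINE = """
<?xml version="1.0" encoding="UTF-8"?>
<pkgmetadata><stabilize-allarches/></pkgmetadata>
"""


def fails_without_strip(text: str) -> bool:
    """Return True if parsing the text as-is raises a parse error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return True
    return False


def parse_metadata(text: str):
    """Drop leading whitespace and hand the parser bytes (hypothetical helper)."""
    data = text.lstrip().encode("utf-8")
    return ET.fromstring(data)


if __name__ == "__main__":
    print("fails as written:", fails_without_strip(DOC_WITH_LEADING_NEWLINE))
    root = parse_metadata(DOC_WITH_LEADING_NEWLINE)
    print(root.tag, root.findall("use"))
```

Passing bytes is also what the lxml ValueError quoted above asks for when the document carries an encoding declaration.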
I went on to create another testcase that uses etree.parse() incorrectly:

#!/usr/bin/env python3
from lxml import etree

x=b'<foo>bar</foo>'
x2=etree.parse(x)

etree.parse() expects a file path, not an XML string, so it should naturally return this exception trace:

Traceback (most recent call last):
  File "/usr/portage/local/sys-kernel/mips-sources/./x.py", line 5, in <module>
    x2=etree.parse(x)
  File "src/lxml/etree.pyx", line 3536, in lxml.etree.parse
  File "src/lxml/parser.pxi", line 1875, in lxml.etree._parseDocument
  File "src/lxml/parser.pxi", line 1901, in lxml.etree._parseDocumentFromURL
  File "src/lxml/parser.pxi", line 1805, in lxml.etree._parseDocFromFile
  File "src/lxml/parser.pxi", line 1177, in lxml.etree._BaseParser._parseDocFromFile
  File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 652, in lxml.etree._raiseParseError
OSError: Error reading file '<foo>bar</foo>': failed to load external entity "<foo>bar</foo>"

But if our locale is set to ISO-8859-15, we segfault. Both your testcase and my testcase fault in the same C function.

It turns out, though, that lxml generates its C sourcecode from Cython sources (Python-like metacode). Learning that, I pointed GDB at the generated source and discovered that the generated code is not very helpful:

Program received signal SIGSEGV, Segmentation fault.
__pyx_f_4lxml_5etree__getThreadErrorLog (__pyx_v_name='_GlobalErrorLog') at src/lxml/etree.c:48609
warning: Source file is more recent than executable.
48609               __Pyx_DECREF(__pyx_t_8); __pyx_t_8 = 0;
(gdb) list
48604               __Pyx_XDECREF(((PyObject *)__pyx_r));
48605               __Pyx_INCREF(((PyObject *)__pyx_v_log));
48606               __pyx_r = ((struct __pyx_obj_4lxml_5etree__BaseErrorLog *)__pyx_v_log);
48607               __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
48608               __Pyx_DECREF(__pyx_t_7); __pyx_t_7 = 0;
48609               __Pyx_DECREF(__pyx_t_8); __pyx_t_8 = 0;
48610               goto __pyx_L7_except_return;
48611           }
48612           goto __pyx_L6_except_error;
48613           __pyx_L6_except_error:;
(gdb)

This source is generated from src/lxml/parser.pxi in the lxml source, and the two testcases above fault at two different exception points in the '_raiseParseError' function:

Your case:
    raise error_log._buildParseException(XMLSyntaxError, u"Document is not well formed")

My case:
    raise IOError, message

This kinda leads me to think the real fault lies in Objects/exceptions.c, where CPython is not checking a pointer for NULL before attempting to dereference it.

Why the call to 'findall' on an "lxml.etree._ElementTree object" gets all the way into Objects/exceptions.c baffles me. To that end, I have managed to work out a testcase for my original issue:

test.xml:
<?xml version="1.0" encoding="UTF-8"?>
<foo>bar</foo>

test.py:
#!/usr/bin/env python3
from lxml import etree

x=etree.parse("./test.xml")
x.findall("foo")

I tested this on both my Gentoo dev box (amd64) and my SGI machine (mips), and both return a segmentation fault.
The MIPS machine actually hints that the real bug may be in lxml's _elementpath.py file somewhere:

do_page_fault(): sending SIGSEGV to python3 for invalid read access from 0000000000000010
epc = 0000000076e65d18 in libpython3.10.so.1.0[76dd0000+310000]
ra = 000000007615d754 in _elementpath.cpython-310-mips64-linux-gnuabin32.so[76150000+30000]

Looking at _elementpath.py, the "findall" function is just a stub for "iterfind":

def findall(elem, path, namespaces=None):
    return list(iterfind(elem, path, namespaces))

def iterfind(elem, path, namespaces=None):
    selector = _build_path_iterator(path, namespaces)
    result = iter((elem,))
    for select in selector:
        result = select(result)
    return result

This is really base-level Python here, so I'm not sure if the real issue is hiding further down in _build_path_iterator or if bad C code is being generated. If I could run lxml as a pure Python module instead of generated C code, I think it'd be easier to trace down (or at least confirm it's bad C code).

That said, this issue is only happening on my Gentoo systems. I ran this test on a FreeBSD 13.0-RELEASE-p6 machine under locale ISO8859-15, and it does not segfault, returning a proper traceback. It was using Python 3.8.x. On a Devuan Linux 4 system, after setting the locale to ISO8859-15, it also doesn't segfault on the test case and returns proper tracebacks. That system has Python 3.9. Ran the test on a CentOS 7 VM as well and got the same results, under Python 3.8.

At this point, the only thing I can think of is it is either compiler flags causing bad code to get emitted by gcc, or we've got a patch in one of the system libraries causing an issue. In any event, this looks to be more of a general issue in dev-python/lxml and not an issue in repoman. Want me to close this bug and open a new one under the right bug section for lxml?
FWIW, I have pretty much ruled out the cause being in either gcc-11.2.1 or gcc-10.3.1, as I have tested both out in rebuilding Python-3.10 and dev-python/lxml. Still segfaulting on the test cases. I also backed my CFLAGS all the way down to just "-O0 -pipe", still segfaulting. Also tried lxml-4.6.5, segfault there, too.

Dropping to O0 did expand a little bit more on the crash from your testcase if there's a newline at the beginning of the string, as I now have some visibility into the _Py_DECREF call:

Originally, I couldn't get any further than here:

#1  0x00007ffff701a5cd in __pyx_f_4lxml_5etree__getThreadErrorLog (__pyx_v_name='_GlobalErrorLog') at src/lxml/etree.c:48342
warning: Source file is more recent than executable.
48342               __Pyx_DECREF(__pyx_t_8); __pyx_t_8 = 0;
(gdb) list
48337               __Pyx_XDECREF(((PyObject *)__pyx_r));
48338               __Pyx_INCREF(((PyObject *)__pyx_v_log));
48339               __pyx_r = ((struct __pyx_obj_4lxml_5etree__BaseErrorLog *)__pyx_v_log);
48340               __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
48341               __Pyx_DECREF(__pyx_t_7); __pyx_t_7 = 0;
48342               __Pyx_DECREF(__pyx_t_8); __pyx_t_8 = 0;
48343               goto __pyx_L7_except_return;
48344           }
48345           goto __pyx_L6_except_error;
48346           __pyx_L6_except_error:;
(gdb)

With O0, now I can see this:

#0  0x00007ffff6fded08 in _Py_DECREF (op=0x0) at /usr/include/python3.10/object.h:492
492         if (--op->ob_refcnt != 0) {
(gdb) list
487         // Non-limited C API and limited C API for Python 3.9 and older access
488         // directly PyObject.ob_refcnt.
489     #ifdef Py_REF_DEBUG
490         _Py_RefTotal--;
491     #endif
492         if (--op->ob_refcnt != 0) {
493     #ifdef Py_REF_DEBUG
494             if (op->ob_refcnt < 0) {
495                 _Py_NegativeRefcount(filename, lineno, op);
496             }

'op' is NULL, so the '--op->ob_refcnt' dereference will definitely segfault. Still uncertain how lxml gets to this point. Vars __pyx_t_5, __pyx_t_7, and __pyx_t_8 are all 0, but only __pyx_t_8 is causing the segfault. I can't tell if it's supposed to be a pointer or a standard integer or such.
I am going to try running these testcases under pypy3 (once it decides to stop drawing pretty ASCII pictures in my console), as looking around the lxml source, that will disable the Cython-generated portions, and maybe that will, if the bugs can be reproduced in some fashion, let me trace lxml's raw python code.
We may be able to disable the Cython goo with WITH_CYTHON=false (see also bug 685768, think it may be automagic right now). Also, I can't say this is necessarily it, but there's some reported compatibility problems with lxml and newer libxslt/libxml2, and it's possible that e.g. FreeBSD didn't upgrade to the problematic versions of those.
(In reply to Sam James from comment #9)
> We may be able to disable the Cython goo with WITH_CYTHON=false (see also
> bug 685768, think it may be automagic right now).
>
> Also, I can't say this is necessarily it, but there's some reported
> compatibility problems with lxml and newer libxslt/libxml2, and it's
> possible that e.g. FreeBSD didn't upgrade to the problematic versions of
> those.

I was looking at lxml's ebuild to see if I could figure out how to disable Cython, but I am not very familiar w/ how our dev-python/* ebuilds work when it comes to configuring dependencies. Call it being way too used to autoconf-based --disable-foo/--without-foo magic. Trying pypy3 out looked to be the quickest way to either get a non-cython build or rely on upstream's generated C files.

Starting to think pypy3 is not going to be a quick way, because it is still drawing....fractals (I think) in my console window.

I've got a few more ideas to try, including dropping to Python-3.8, before I give up and put something together for lxml upstream. I am worried that they may claim it's a fault unique to us and refuse to help, though. So want to test all the things I am capable of testing beforehand.

As far as the FreeBSD attempt went, I only tested the prebuilt binpkg versions and did not attempt to install from Ports. The dependent packages their pkg tool wanted to install were:

# pkg install py38-lxml
Updating FreeBSD repository catalogue...
FreeBSD repository is up to date.
All repositories are up to date.
Checking integrity... done (0 conflicting)
The following 4 package(s) will be affected (of 0 checked):

New packages to be INSTALLED:
        libgcrypt: 1.9.4
        libgpg-error: 1.43
        libxslt: 1.1.34_2
        py38-lxml: 4.7.1

Number of packages to be installed: 4

The process will require 16 MiB more space.

So it's not pulling in libxml2 by default. I haven't looked at libxslt yet, as I'm somewhat convinced the bug is definitely within lxml itself. The test cases I have don't even use XSLT.
I've reduced the XML test text down to just two lines, the standard XML declaration and "<foo>bar</foo>". I'll attach them in a few minutes.
Created attachment 762178 [details]
lxml test case #1, etree.parse() XML from a file

First test case that demonstrates the original SIGSEGV when locale is set to ISO-8859-15 on a Gentoo system. This test uses lxml.etree.parse() to read two lines of XML from test1.xml, then calls the 'findall' method of a bound lxml.etree._ElementTree object. The failure will be in Python core, Objects/exceptions.c:237, BaseException_set_tb, because argument 'self' is NULL and a dereference is attempted.

Note: the second test case can also be triggered here if the file argument passed to etree.parse() is missing.
Created attachment 762179 [details] lxml test case #1 test file Test file for lxml test case #1
OK, that libxslt version is ~same as ours. `WITH_CYTHON=false ebuild lxml-4.7.1.ebuild clean merge` avoids Cython for me.
Created attachment 762180 [details] lxml test case #2, XML begins w/ newline Second test case that demonstrates another SIGSEGV when locale is set to ISO-8859-15 on a Gentoo system. This test uses lxml.etree.fromstring() to read two lines of XML from a variable. If the XML in that variable begins with a newline, it causes an exception in lxml because lxml expects the first line to begin with "<?xml ...", and the throwing of that exception leads to a SIGSEGV in Python core because a _Py_DECREF() call was passed a NULL pointer variable that it attempts to pre-decrement and dereference.
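For anyone re-running the two attached test cases across machines or lxml builds, a tiny harness can report whether the child process died from SIGSEGV instead of relying on eyeballing the shell output. This is a sketch for POSIX systems only; crashed_with_sigsegv is a hypothetical helper, the locale name is the one used earlier in this bug, and "test.py" is just the script name from the earlier comment.

```python
import os
import signal
import subprocess
import sys


def crashed_with_sigsegv(argv, locale_name):
    """Run argv with LC_ALL set and report whether it was killed by SIGSEGV.

    On POSIX, subprocess reports death-by-signal as a negative return code.
    """
    env = dict(os.environ, LC_ALL=locale_name)
    proc = subprocess.run(
        argv,
        env=env,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return proc.returncode == -signal.SIGSEGV


if __name__ == "__main__":
    # Assumed names: adjust the locale and script path for your system.
    for loc in ("en_US.ISO-8859-1", "en_US.ISO-8859-15"):
        print(loc, crashed_with_sigsegv([sys.executable, "test.py"], loc))
```

A normal Python traceback makes the child exit with status 1, so this distinguishes "proper exception" from "segfault" without parsing any output.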
(In reply to Sam James from comment #13)
> OK, that libxslt version is ~same as ours.
>
> `WITH_CYTHON=false ebuild lxml-4.7.1.ebuild clean merge` avoids Cython for
> me.

I'll give this a try, as pypy3 failed to compile and I am not going to try and chase that one down.
(In reply to Joshua Kinard from comment #15)
> (In reply to Sam James from comment #13)
> > OK, that libxslt version is ~same as ours.
> >
> > `WITH_CYTHON=false ebuild lxml-4.7.1.ebuild clean merge` avoids Cython for
> > me.
>
> I'll give this a try, as pypy3 failed to compile and I am not going to try
> and chase that one down.

Doesn't look like this works:

 * python3_10: running distutils-r1_run_phase python_compile
python3.10 setup.py build -j 14
warning: src/lxml/xmlerror.pxi:657:22: local variable 'args' referenced before assignment
warning: src/lxml/xmlerror.pxi:658:69: local variable 'args' referenced before assignment
warning: src/lxml/xmlerror.pxi:659:20: local variable 'args' referenced before assignment
warning: src/lxml/xmlerror.pxi:664:22: local variable 'args' referenced before assignment
warning: src/lxml/xmlerror.pxi:665:73: local variable 'args' referenced before assignment
warning: src/lxml/xmlerror.pxi:666:20: local variable 'args' referenced before assignment
warning: src/lxml/xmlerror.pxi:671:22: local variable 'args' referenced before assignment
warning: src/lxml/xmlerror.pxi:672:73: local variable 'args' referenced before assignment
warning: src/lxml/xmlerror.pxi:673:20: local variable 'args' referenced before assignment
Building lxml version 4.7.1.
Building with Cython 0.29.26.
Building against libxml2 2.9.12 and libxslt 1.1.34
Compiling src/lxml/etree.pyx because it changed.
Compiling src/lxml/objectify.pyx because it changed.
Compiling src/lxml/builder.py because it changed.
Compiling src/lxml/_elementpath.py because it changed.
Compiling src/lxml/html/diff.py because it changed.
Compiling src/lxml/html/clean.py because it changed.
Compiling src/lxml/sax.py because it changed.
[1/7] Cythonizing src/lxml/_elementpath.py
[2/7] Cythonizing src/lxml/builder.py
[3/7] Cythonizing src/lxml/etree.pyx
[4/7] Cythonizing src/lxml/html/clean.py
[5/7] Cythonizing src/lxml/html/diff.py
[6/7] Cythonizing src/lxml/objectify.pyx
[7/7] Cythonizing src/lxml/sax.py
[snip]
It looks like setup.py needs --without-cython passed to it. One of the files mentions that the source distribution is supposed to have pre-generated C files so that building without a cython dependency is possible, but in 4.7.1, at least our version, those pre-generated files are missing:

Building lxml version 4.7.1.
WARNING: Trying to build without Cython, but pre-generated 'src/lxml/etree.c' is not available.
WARNING: Trying to build without Cython, but pre-generated 'src/lxml/objectify.c' is not available.
WARNING: Trying to build without Cython, but pre-generated 'src/lxml/builder.c' is not available.
WARNING: Trying to build without Cython, but pre-generated 'src/lxml/_elementpath.c' is not available.
WARNING: Trying to build without Cython, but pre-generated 'src/lxml/html/diff.c' is not available.
WARNING: Trying to build without Cython, but pre-generated 'src/lxml/html/clean.c' is not available.
WARNING: Trying to build without Cython, but pre-generated 'src/lxml/sax.c' is not available.

I unpacked the source tarball of lxml-4.7.1, and 'find . -name \*.c' returns no results. Ditto for lxml-4.6.5. So it looks like upstream is no longer pre-generating those C files, and the build system does not appear to allow a pure Python installation (e.g., even when passing --without-cython, it still attempts to call the C compiler).
(ugh, yes, it looks like it's now hard required.)
Also just tested building dev-python/lxml with cython-0.29.25, segfault in both test cases. I think that's the end of my available rabbit holes, so I guess I will look up how lxml likes having bugs filed and open something up with upstream and see what they say.
I am going to obsolete this bug since repoman is being deprecated as the primary dev tool. If someone else wants to investigate this, I can pass some notes over, though I think everything relevant is already included in this bug.
I don't think this is repoman-specific. The crash is in lxml.
seems to be solved in lxml https://bugs.launchpad.net/lxml/+bug/1972907 Could be nice to push lxml-4.9.0-2
(In reply to Christophe PEREZ from comment #22)
> seems to be solved in lxml https://bugs.launchpad.net/lxml/+bug/1972907
>
> Could be nice to push lxml-4.9.0-2

I bumped lxml to 4.9.0 earlier and wheel versions shouldn't matter. So, apparently we're done!
(thanks for finding that!)
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=fc6127e07d7aeeb535e85944400d6471282caad4

commit fc6127e07d7aeeb535e85944400d6471282caad4
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2022-06-01 04:59:37 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2022-06-01 05:06:03 +0000

    dev-python/lxml: revbump w/ tigher cython dep to avoid miscompile

    Generated bad exception handling code.

    Closes: https://bugs.gentoo.org/830882
    Signed-off-by: Sam James <sam@gentoo.org>

 dev-python/lxml/{lxml-4.9.0.ebuild => lxml-4.9.0-r1.ebuild} | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
(In reply to Sam James from comment #24) > (thanks for finding that!) You're welcome ! ;) https://github.com/streamlink/streamlink/issues/4562 is MY bug ! :D