Bug#: 218378
Status: RESOLVED
Resolution: FIXED
Assigned To: Python Gentoo Team <python@gentoo.org>
Reporter: Marcin Kurek <morgoth6@gmail.com>

Filename | Description | Type | Creator | Created | Size
emerge-info.txt | emerge --info | text/plain | Marcin Kurek | 2008-04-19 09:39 0000 | 5.47 KB
pygtk-try.log | Console output from example fail | text/plain | Marcin Kurek | 2008-04-19 09:40 0000 | 6.93 KB
.config | .25 config file | text/plain | Marcin Kurek | 2008-04-20 09:44 0000 | 78.32 KB
meminfo-after.txt | cat /proc/meminfo | text/plain | Marcin Kurek | 2008-05-22 08:02 0000 | 748 bytes
dmesg.txt | dmesg | text/plain | Marcin Kurek | 2008-05-22 08:04 0000 | 34.00 KB
portage-mem.log.gz | strace -f -o /tmp/st/portage-mem.log -- emerge gtk-engines-qt | text/plain | Marcin Kurek | 2008-05-22 15:30 0000 | 264.71 KB
python-verbose.txt | python -v /usr/bin/emerge | text/plain | Marcin Kurek | 2008-05-28 20:21 0000 | 5.50 KB
python-2.5.2-unicode-listdir.patch | python-2.5.2-unicode-listdir.patch | patch | Duane Griffin | 2008-05-31 15:56 0000 | 851 bytes

Description:   Opened: 2008-04-19 09:38 0000
Recently emerge started to fail here with the following error:

--------
<root@mordorpc portage> emerge -pv pygtk

These are the packages that would be merged, in order:

Calculating dependencies \Traceback (most recent call last):
  File "/usr/bin/emerge", line 7928, in <module>
    retval = emerge_main()
  File "/usr/bin/emerge", line 7922, in emerge_main
    myopts, myaction, myfiles, spinner)
  File "/usr/bin/emerge", line 7164, in action_build
    retval, favorites = mydepgraph.select_files(myfiles)
  File "/usr/bin/emerge", line 2476, in select_files
    expanded_atoms = self._dep_expand(root_config, x)
  File "/usr/bin/emerge", line 2280, in _dep_expand
    cp_set.update(db.cp_all())
  File "/usr/lib64/portage/pym/portage.py", line 7561, in cp_all
    for y in listdir(oroot+"/"+x, EmptyOnError=1, ignorecvs=1, dirsonly=1):
  File "/usr/lib64/portage/pym/portage.py", line 290, in listdir
    list, ftype = cacheddir(mypath, ignorecvs, ignorelist, EmptyOnError, followSymlinks)
  File "/usr/lib64/portage/pym/portage.py", line 226, in cacheddir
    list = os.listdir(mypath)
OSError: [Errno 12] Cannot allocate memory: '/usr/portage/net-mail'
--------

This seems to be quite random: the command does succeed if I rerun it 4 or 5
times. It happens for random packages and always works eventually after a few
emerge reruns.

This machine has 2 GB of RAM, so I think a genuine out-of-memory situation is
very unlikely, especially as I also saw this message when booted into text
mode without X.



Reproducible: Always

Steps to Reproduce:

------- Comment #1 From Marcin Kurek 2008-04-19 09:39:22 0000 -------
Created an attachment (id=150265) [details]
emerge --info

------- Comment #2 From Marcin Kurek 2008-04-19 09:40:53 0000 -------
Created an attachment (id=150266) [details]
Console output from example fail 

As you can see, the first three commands fail, the next one works fine, the
one after that fails again, and so on.

------- Comment #3 From Marcin Kurek 2008-04-20 09:44:18 0000 -------
It seems to be kernel related, as I can observe this problem only on the
2.6.25 kernel and not on 2.6.24.

------- Comment #4 From Marcin Kurek 2008-04-20 09:44:49 0000 -------
Created an attachment (id=150362) [details]
.25 config file 

------- Comment #5 From Marcin Kurek 2008-05-18 17:03:11 0000 -------
Hmmm, ping? This is really annoying when updating the system. Any ideas what
this could be or how to debug it?

------- Comment #6 From Zac Medico 2008-05-18 18:08:46 0000 -------
Unless you show that the problem does not occur with a vanilla kernel (I'm not
sure exactly which kernel sources you are using), it's probably safe to assume
that it's an upstream kernel bug, so you should be looking to kernel.org
for answers.

------- Comment #7 From Duane Griffin 2008-05-21 12:12:41 0000 -------
If this is a kernel bug it should be assigned to the kernel team. We don't
usually mark kernel issues resolved upstream until they have been reported on the
kernel.org bugzilla.

Could you please provide your dmesg and "cat /proc/meminfo" from the system
after it starts showing these symptoms, thanks.

------- Comment #8 From Marcin Kurek 2008-05-22 08:01:19 0000 -------
I guess this could be a portage bug, as a kernel problem of this kind should
show up in other applications too? The system has been perfectly stable for two
days now under heavy use of deluge, firefox and gcc, with no problems.

I can not see anything suspicious in dmesg, but I will attach both files as
suggested.

------- Comment #9 From Marcin Kurek 2008-05-22 08:02:46 0000 -------
Created an attachment (id=153911) [details]
cat /proc/meminfo

------- Comment #10 From Marcin Kurek 2008-05-22 08:04:43 0000 -------
Created an attachment (id=153913) [details]
dmesg

I think there is nothing unusual in it. 

------- Comment #11 From Duane Griffin 2008-05-22 11:51:57 0000 -------
These are from immediately after you saw an OOM? Nothing unusual, as you say.

It could be a bug in portage, but it is strange that it only manifests under
2.6.25. Just to confirm, booting back into 2.6.24 (without changing anything
else) makes the problem go away, right? Could you upload your 2.6.24 config,
please, let's check there weren't any significant changes.

I see you are running with an unstable version of portage, does the problem
still occur if you switch to using 2.1.4.4?

------- Comment #12 From Marcin Kurek 2008-05-22 15:29:00 0000 -------
I tried to look at the strace output from a failing call, but I cannot see
anything unusual.

------- Comment #13 From Marcin Kurek 2008-05-22 15:30:23 0000 -------
Created an attachment (id=153949) [details]
strace -f -o /tmp/st/portage-mem.log -- emerge gtk-engines-qt

------- Comment #14 From Daniel Drake 2008-05-22 16:26:49 0000 -------
I looked at the Python source, I think the error is coming from
Modules/posixmodule.c posix_listdir()
        if ((dirp = opendir(name)) == NULL) {
                return posix_error_with_allocated_filename(name);
        }

i.e. opendir() on something is returning NULL, probably /usr/portage/net-mail.
opendir() is implemented in libc as a wrapper around open() or something like
that.

Then I looked at the strace logs, but it shows that opening
/usr/portage/net-mail quite early on was successful.

It gets to this part:

11723 open("/usr/portage/sec-policy", O_RDONLY|O_NONBLOCK|O_DIRECTORY|0x80000) = 3
11723 fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
11723 getdents(3, /* 68 entries */, 4096) = 2568
11723 brk(0x2deb000)                    = 0x2d8a000
11723 mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8708afb000
11723 getdents(3, /* 0 entries */, 4096) = 0
11723 close(3)                          = 0
11723 write(2, "Traceback (most recent call last"..., 35) = 35

There are no errors here, and nothing to do with net-mail

libc's readdir always calls getdents() at least twice in order to decide that
it has finished reading the directory (when it gets a return value of 0). The
only slightly unusual thing is the brk and mmap in the middle, but this seems
to be just python growing its data segment and allocating some anonymously
mapped memory. I don't see why it would have any effect on anything here.

strange...
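
One way to double-check a trace like the one above is to scan it mechanically. The following is a minimal Python sketch, not something that was used in this bug: it assumes the default "strace -f -o" layout with one call per line (lines wrapped by the bug tracker must be rejoined first), and the names LINE_RE, failed_calls and short_brk_calls are made up for illustration. It lists calls that report an errno and, separately, brk calls that return an address below the one requested, since the raw Linux brk syscall reports failure by returning the unchanged break rather than -1. The ENOENT line in the sample is hypothetical and only there to show what a failing call looks like.

```python
import re

# Assumed layout: "PID call(args) = ret [ERRNO (message)]", one call per line.
LINE_RE = re.compile(
    r"^(?P<pid>\d+)\s+(?P<call>\w+)\((?P<args>.*)\)\s+=\s+"
    r"(?P<ret>-?\d+|0x[0-9a-fA-F]+)"
    r"(?:\s+(?P<err>E[A-Z0-9]+)\s+\((?P<msg>[^)]*)\))?\s*$"
)


def failed_calls(lines):
    """Return (call, errno_name) for every line that reports an errno."""
    failures = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("err"):
            failures.append((m.group("call"), m.group("err")))
    return failures


def short_brk_calls(lines):
    """Return (requested, returned) for brk calls that got less than asked."""
    short = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m.group("call") != "brk":
            continue
        try:
            requested = int(m.group("args"), 16)
            returned = int(m.group("ret"), 16)
        except ValueError:
            continue  # e.g. brk(NULL) as printed by newer strace versions
        # brk(0) only queries the current break, so skip it.
        if requested and returned < requested:
            short.append((requested, returned))
    return short


SAMPLE = [
    # Hypothetical failing call, not taken from the attached log:
    '11723 open("/nonexistent", O_RDONLY) = -1 ENOENT (No such file or directory)',
    # Lines from the excerpt above:
    "11723 getdents(3, /* 68 entries */, 4096) = 2568",
    "11723 brk(0x2deb000)                    = 0x2d8a000",
    "11723 getdents(3, /* 0 entries */, 4096) = 0",
    "11723 close(3)                          = 0",
]

if __name__ == "__main__":
    print(failed_calls(SAMPLE))
    print([(hex(a), hex(b)) for a, b in short_brk_calls(SAMPLE)])
```

Run against the excerpt, this flags the brk(0x2deb000) call, which returned 0x2d8a000. Whether that is connected to the reported ENOMEM is not established by the trace alone.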

------- Comment #15 From Duane Griffin 2008-05-28 13:43:09 0000 -------
Very strange. The memory allocation is very suspicious, given the error
reported by python, even though it succeeds(!)

I think the error is coming from just after the opendir, in the for loop
immediately below. The strace logs show getdents are happening, so it must be
inside the loop, doing readdir calls:

        for (;;) {
                Py_BEGIN_ALLOW_THREADS
                ep = readdir(dirp);
                Py_END_ALLOW_THREADS
                if (ep == NULL)
                        break;

Then outside the loop:

        if (errno != 0 && d != NULL) {
                /* readdir() returned NULL and set errno */
                closedir(dirp);
                Py_DECREF(d);
                return posix_error_with_allocated_filename(name); 
        }

Looking at the code one thing that jumps out is the use of errno directly to
check whether the loop was terminated successfully or on error. It looks like
PyEval_RestoreThread takes care not to modify errno, so that seems safe.
However a quick look on the python issue tracker gives:

http://bugs.python.org/issue1608818

This could explain the problem, but only if the path given was unicode. Marcin,
would you be able to recompile python with the patch given in that ticket? If
you need assistance in doing so then I'd be happy to help. It would be very
interesting to see if the problem goes away with it applied.

BTW, regarding net-mail, note that the reporter says it fails on random
directories. In this case it seems to have failed reading
"/usr/portage/sec-policy", but I doubt the particular directory matters much.
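
The suspected failure mode can be modelled in a few lines. This is a toy Python simulation, not the CPython code and not the patch from the ticket: FakeLibc, listdir_unchecked and listdir_checked are invented names, and the model assumes only what readdir(3) documents, namely that reaching the end of a directory returns NULL and leaves errno unchanged. If an earlier call left errno set, a loop that inspects errno after readdir returns NULL can misread a normal end of directory as an error. Clearing errno before each readdir call, as the readdir(3) man page recommends, avoids that.

```python
import errno
import os


class FakeLibc:
    """Toy stand-in for libc: a global errno plus readdir over a fixed list."""

    def __init__(self, entries):
        self.errno = 0
        self._entries = list(entries)
        self._pos = 0

    def earlier_failure(self):
        # Models some unrelated earlier call that failed and set errno.
        self.errno = errno.ENOMEM

    def readdir(self):
        # Like readdir(3): None at end of directory, errno left untouched.
        if self._pos >= len(self._entries):
            return None
        entry = self._entries[self._pos]
        self._pos += 1
        return entry


def listdir_unchecked(libc, name):
    """Checks errno after the loop without ever clearing it first."""
    names = []
    while True:
        ep = libc.readdir()
        if ep is None:
            break
        names.append(ep)
    if libc.errno != 0:
        raise OSError(libc.errno, os.strerror(libc.errno), name)
    return names


def listdir_checked(libc, name):
    """Same loop, but errno is reset before every readdir call."""
    names = []
    while True:
        libc.errno = 0
        ep = libc.readdir()
        if ep is None:
            break
        names.append(ep)
    if libc.errno != 0:
        raise OSError(libc.errno, os.strerror(libc.errno), name)
    return names


if __name__ == "__main__":
    stale = FakeLibc(["selinux-base-policy", "metadata.xml"])
    stale.earlier_failure()
    try:
        listdir_unchecked(stale, "/usr/portage/sec-policy")
    except OSError as exc:
        print("spurious error:", exc)

    fresh = FakeLibc(["selinux-base-policy", "metadata.xml"])
    fresh.earlier_failure()
    print(listdir_checked(fresh, "/usr/portage/sec-policy"))
```

In this model the directory is read completely in both cases; only the stale errno value makes the first variant raise. That would match an error that names whichever directory happened to be read when errno was last left set, rather than a directory that is actually unreadable.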

------- Comment #16 From Marcin Kurek 2008-05-28 20:18:59 0000 -------
OK, tried this patch and it seems it's a bit broken, as it makes python
completely unusable. 'emerge --help' throws:

Traceback (most recent call last):
  File "/usr/bin/emerge", line 31, in <module>
    import portage
  File "/usr/lib64/portage/pym/portage.py", line 20, in <module>
    import copy, errno, os, re, shutil, time, types
ImportError: No module named time

Looking at the /usr/lib64/python2.5 directory shows me that the lib-dynload
directory is empty (but it was not empty in the portage workdir image).

About the net-mail directory: it does indeed fail on a random directory in
portage; only the stack trace is similar (the same functions show up and it
always ends with File "/usr/lib64/portage/pym/portage.py", line 226, in cacheddir
    list = os.listdir(mypath))

About unicode: my system uses unicode by default.

------- Comment #17 From Marcin Kurek 2008-05-28 20:21:20 0000 -------
Created an attachment (id=154619) [details]
python -v /usr/bin/emerge 

Verbose python output after patch

------- Comment #18 From Duane Griffin 2008-05-31 12:58:01 0000 -------
(In reply to comment #16)
> OK, tried this patch and it seems it's a bit broken, as it makes python
> completely unusable. 'emerge --help' throws:

Hmm, looks like something went wrong somewhere. The patch is really quite
simple and limited in scope; it certainly shouldn't be causing that sort of
trouble. I've just applied it here without any problem, this is what I did:

ebuild /usr/portage/dev-lang/python/python-2.5.2-r4.ebuild unpack
patch -d /var/tmp/portage/dev-lang/python-2.5.2-r4/work/Python-2.5.2 -p1 < proposed-patch.txt
ebuild /usr/portage/dev-lang/python/python-2.5.2-r4.ebuild compile install
sudo ebuild /usr/portage/dev-lang/python/python-2.5.2-r4.ebuild qmerge

> About unicode my system uses unicode by default.

Ah, very interesting...

------- Comment #19 From Duane Griffin 2008-05-31 13:42:17 0000 -------
(In reply to comment #18)
> ...I've just applied it here without any problem...

Whoa -- spoke too soon! Sorry about that, I didn't test correctly before. I get
the same error you did. For anyone else playing along at home -- don't follow
those previous instructions.

------- Comment #20 From Duane Griffin 2008-05-31 15:56:58 0000 -------
Created an attachment (id=154963) [details]
python-2.5.2-unicode-listdir.patch

You were right -- the patch was a bit broken. I apologise for not inspecting it
closer or testing it properly before asking you to try it. Here is a working
and tested version, if you don't mind having another go.

------- Comment #21 From Marcin Kurek 2008-06-04 07:05:57 0000 -------
This problem was highly random, so I cannot be 100% sure it's gone, but I have
been using portage with this patch for a few days now and there have been no
OOM messages.

I think that was it. Thanks for not closing this bug and for helping me out
with this, as I probably would never have found issue1608818 on the Python bug
tracker, since it's quite old now.

------- Comment #22 From Duane Griffin 2008-06-04 13:20:43 0000 -------
Excellent, glad to be of service.

Since this seems like a fairly critical python bug (for you and anyone else
using unicode, anyway) I'll send it over to the Python team. I'm unfamiliar
with their procedures, but they may want to add the patch to our patch set
and/or try to push it upstream. I've also updated the ticket on the Python bug
tracker and uploaded the working version of the patch.

------- Comment #23 From Marcin Kurek 2008-06-18 07:39:16 0000 -------
I want to confirm that this fixed the portage problem for good :) The same
problem also appeared on my gf's machine, so I wonder whether the patch can be
pulled into the python ebuild as soon as possible?

------- Comment #24 From Marcin Kurek 2008-06-25 16:34:39 0000 -------
I see this patch was not included in the new python 2.5.2-r5, as I started to
observe the same problem here as soon as I updated python.

------- Comment #25 From Rafal 2008-07-30 07:35:08 0000 -------
(In reply to comment #20)
> Created an attachment (id=154963) [edit] [details]
> python-2.5.2-unicode-listdir.patch
> 
> You were right -- the patch was a bit broken. I apologise for not inspecting it
> closer or testing it properly before asking you to try it. Here is a working
> and tested version, if you don't mind having another go.
> 

Hello

I have this problem too, but I'm new to Gentoo and I don't know what to do
with python-2.5.2-unicode-listdir.patch. Can somebody explain in easy steps
what I have to do?

------- Comment #26 From Rafal 2008-07-30 12:44:35 0000 -------
I downloaded Python from python.org and changed everything as shown on this page:
http://bugs.gentoo.org/attachment.cgi?id=154963&action=diff
Then I ran ./configure, make, make install, with no effect. I always get the
same report when I run emerge (something):

!!! Failed to complete portage imports. There are internal modules for
!!! portage and failure here indicates that you have a problem with your
!!! installation of portage. Please try a rescue portage located in the
!!! portage tree under '/usr/portage/sys-apps/portage/files/' (default).
!!! There is a README.RESCUE file that details the steps required to perform
!!! a recovery of portage.
    No module named _socket

Traceback (most recent call last):
  File "/usr/bin/emerge", line 28, in <module>
    import portage
  File "/usr/lib/portage/pym/portage.py", line 55, in <module>
    import getbinpkg
  File "/usr/lib/portage/pym/getbinpkg.py", line 10, in <module>
    import htmllib,HTMLParser,string,formatter,sys,os,xpak,time,tempfile,base64,urllib2
  File "/usr/lib/python2.5/urllib2.py", line 92, in <module>
    import httplib
  File "/usr/lib/python2.5/httplib.py", line 71, in <module>
    import socket
  File "/usr/lib/python2.5/socket.py", line 45, in <module>
    import _socket
ImportError: No module named _socket

------- Comment #27 From Tiziano Müller 2008-07-31 14:20:16 0000 -------
Fixed in python-2.5.2-r7, sorry for the delay.

------- Comment #28 From Zac Medico 2008-09-20 15:03:32 0000 -------
*** Bug 238174 has been marked as a duplicate of this bug. ***
