
Bug 451158

Summary: sys-apps/portage: handle [Errno 28] No space left on device (ENOSPC) errors when there is no space left on /var/tmp/portage
Product: Portage Development
Reporter: Enrico Tagliavini <enrico.tagliavini>
Component: Core
Assignee: Portage team <dev-portage>
Status: CONFIRMED
Severity: normal
CC: alexanderyt
Priority: Normal    
Version: unspecified   
Hardware: All   
OS: All   
Whiteboard:
Package list:
Runtime testing required: ---
Bug Depends on:    
Bug Blocks: 456008    

Description Enrico Tagliavini 2013-01-10 10:54:01 UTC
I found this by accident. I was emerging firefox-18, which requires a lot more space than firefox-17, with a 5.5 GB tmpfs mounted on /var/tmp/portage. As expected, the build failed once the tmpfs was completely full.

emerge ended its execution with a traceback. Below is the final part of the output:

tar: ./libxul.so: Wrote only 1024 of 10240 bytes
./libunicharutil_external_s.a.desc
./libunicharutil_external_s.a
tar: ./libunicharutil_external_s.a.desc: Cannot write: No space left on device
tar: ./libunicharutil_external_s.a: Cannot write: No space left on device
./libxpcomglue_s_nomozalloc.a
tar: ./libxpcomglue_s_nomozalloc.a: Cannot write: No space left on device
./libxpcomglue.a
tar: ./libxpcomglue.a: Cannot write: No space left on device
./libxpcomglue_s.a
tar: ./libxpcomglue_s.a: Cannot write: No space left on device
./libmozalloc.so
./libmozglue.a
tar: ./libmozalloc.so: Cannot write: No space left on device
tar: ./libmozglue.a: Cannot write: No space left on device
./libmemory.a
tar: ./libmemory.a: Cannot write: No space left on device
./libjemalloc.a
tar: ./libjemalloc.a: Cannot write: No space left on device
tar: Exiting with failure status due to previous errors
make[1]: *** [install] Error 2
make[1]: Leaving directory `/var/tmp/portage/www-client/firefox-18.0/work/mozilla-release/obj-x86_64-unknown-linux-gnu/browser/installer'
make: *** [install] Error 2
emake failed
 * ERROR: www-client/firefox-18.0 failed (install phase):
 *   emake install failed
 * 
 * Call stack:
 *     ebuild.sh, line  93:  Called src_install
 *   environment, line 7129:  Called die
 * The specific snippet of code:
 *       MOZ_MAKE_FLAGS="${MAKEOPTS}" emake DESTDIR="${D}" install || die "emake install failed";
 * 
 * If you need support, post the output of `emerge --info '=www-client/firefox-18.0'`,
 * the complete build log and the output of `emerge -pqv '=www-client/firefox-18.0'`.
 * The complete build log is located at '/var/tmp/portage/www-client/firefox-18.0/temp/build.log'.
 * The ebuild environment file is located at '/var/tmp/portage/www-client/firefox-18.0/temp/environment'.
 * Working directory: '/var/tmp/portage/www-client/firefox-18.0/work/mozilla-release/obj-x86_64-unknown-linux-gnu'
 * S: '/var/tmp/portage/www-client/firefox-18.0/work/mozilla-release'
 * QA Notice: Unrecognized configure options:
 * 
 *      --enable-application
 *      --enable-optimize
 *      --with-system-jpeg
 *      --with-system-zlib
 *      --enable-pango
 *      --enable-system-cairo
 *      --disable-pedantic
 *      --disable-updater
 *      --disable-strip
 *      --disable-install-strip
 *      --disable-profilelocking
 *      --enable-default-toolkit
 *      --enable-official-branding
 *      --enable-dbus
 *      --disable-tests
 *      --enable-startup-notification
 *      --disable-system-sqlite
 *      --enable-necko-wifi
 *      --enable-ogg
 *      --enable-wave
 *      --with-system-libvpx
 *      --with-system-nspr
 *      --with-nspr-prefix
 *      --with-system-nss
 *      --with-nss-prefix
 *      --with-system-libevent
 *      --enable-system-hunspell
 *      --disable-gnomevfs
 *      --disable-gnomeui
 *      --enable-gio
 *      --disable-crashreporter
 *      --disable-gconf
 *      --disable-mailnews
 *      --with-system-png
 *      --enable-system-ffi
 *      --with-default-mozilla-five-home
 *      --disable-gstreamer
 *      --disable-system-sqlite
 *      --enable-methodjit
 *      --enable-tracejit
 *      --enable-extensions
 *      --enable-application
 *      --enable-optimize
 *      --with-system-jpeg
 *      --with-system-zlib
 *      --enable-pango
 *      --enable-system-cairo
 *      --disable-pedantic
 *      --disable-updater
 *      --disable-strip
 *      --disable-install-strip
 *      --disable-profilelocking
 *      --enable-default-toolkit
 *      --enable-official-branding
 *      --enable-dbus
 *      --disable-tests
 *      --enable-startup-notification
 *      --disable-system-sqlite
 *      --enable-necko-wifi
 *      --enable-ogg
 *      --enable-wave
 *      --with-system-libvpx
 *      --with-system-nspr
 *      --with-nspr-prefix
 *      --with-system-nss
 *      --with-nss-prefix
 *      --with-system-libevent
 *      --enable-system-hunspell
 *      --disable-gnomevfs
 *      --disable-gnomeui
 *      --enable-gio
 *      --disable-crashreporter
 *      --disable-gconf
 *      --disable-mailnews
 *      --with-system-png
 *      --enable-system-ffi
 *      --with-default-mozilla-five-home
 *      --disable-gstreamer
 *      --disable-system-sqlite
 *      --enable-methodjit
 *      --enable-tracejit
 *      --enable-extensions
Traceback (most recent call last):
  File "/usr/bin/emerge", line 48, in <module>
    retval = emerge_main()
  File "/usr/lib64/portage/pym/_emerge/main.py", line 1044, in emerge_main
    gc_locals=locals().clear)
  File "/usr/lib64/portage/pym/_emerge/actions.py", line 3885, in run_action
    myopts, myaction, myfiles, spinner)
  File "/usr/lib64/portage/pym/_emerge/actions.py", line 464, in action_build
    retval = mergetask.merge()
  File "/usr/lib64/portage/pym/_emerge/Scheduler.py", line 1011, in merge
    rval = self._merge()
  File "/usr/lib64/portage/pym/_emerge/Scheduler.py", line 1396, in _merge
    self._main_loop()
  File "/usr/lib64/portage/pym/_emerge/Scheduler.py", line 1373, in _main_loop
    self._event_loop.iteration()
  File "/usr/lib64/portage/pym/portage/util/_eventloop/EventLoop.py", line 191, in iteration
    if not x.callback(f, event, *x.args):
  File "/usr/lib64/portage/pym/portage/util/_async/PipeLogger.py", line 119, in _output_handler
    self._unregister_if_appropriate(event)
  File "/usr/lib64/portage/pym/_emerge/AbstractPollTask.py", line 129, in _unregister_if_appropriate
    self.wait()
  File "/usr/lib64/portage/pym/_emerge/AsynchronousTask.py", line 57, in wait
    self._wait_hook()
  File "/usr/lib64/portage/pym/_emerge/AsynchronousTask.py", line 161, in _wait_hook
    self._exit_listener_stack.pop()(self)
  File "/usr/lib64/portage/pym/_emerge/SpawnProcess.py", line 142, in _pipe_logger_exit
    self.wait()
  File "/usr/lib64/portage/pym/_emerge/AsynchronousTask.py", line 57, in wait
    self._wait_hook()
  File "/usr/lib64/portage/pym/_emerge/AsynchronousTask.py", line 161, in _wait_hook
    self._exit_listener_stack.pop()(self)
  File "/usr/lib64/portage/pym/_emerge/EbuildPhase.py", line 208, in _ebuild_exit
    self.scheduler.output(msg, log_path=logfile)
  File "/usr/lib64/portage/pym/portage/util/_async/SchedulerInterface.py", line 84, in output
    f.close()
IOError: [Errno 28] No space left on device

Reproducible: Always

Steps to Reproduce:
1. Give /var/tmp/portage limited space (e.g. mount a tmpfs on it; 5.5 GB triggered the problem for me)
2. emerge something big, like firefox-18
3. Wait for the space to fill up and the build to fail
4. It should be possible to speed this up by filling the space manually during the emerge with something like dd if=/dev/zero of=/var/tmp/portage/filler bs=1M count=8192, but I have not tried this
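
Step 4 is untested. As a possibly quicker way to trigger just the final IOError, here is a minimal sketch assuming a Linux system, where every write to /dev/full fails with ENOSPC. A small buffered write appears to succeed and the error only surfaces when close() flushes the buffer, which matches the f.close() frame at the bottom of the traceback above. reproduce_enospc_on_close is a hypothetical name, and this exercises Python's file I/O directly, not emerge.

```python
import errno


def reproduce_enospc_on_close(path="/dev/full"):
    """Return the errno raised by close() after a small buffered write.

    Hypothetical helper. On Linux, every write to /dev/full fails with ENOSPC.
    Returns None if close() succeeds (e.g. for a regular file with free space).
    """
    f = open(path, "w")
    # The message fits in the buffer, so nothing reaches the device yet.
    f.write("emerge log message\n")
    try:
        # close() flushes the buffer; this is where ENOSPC shows up.
        f.close()
    except OSError as e:  # IOError is an alias of OSError on Python 3
        return e.errno
    return None
```

On a Linux box, reproduce_enospc_on_close() should return errno.ENOSPC (28), the same [Errno 28] as in the traceback.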
Actual Results:  
emerge exits with a traceback

Expected Results:  
emerge should handle the error and exit cleanly

This should not be hard to fix. Since the space is exhausted, a simple solution could be to wrap this f.close() in a try/except and ignore the error. It is only the log file, so I think losing it should not be a problem in the end.
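
The try/except idea could look roughly like this hypothetical helper (close_log is an invented name, not Portage code). Rather than a bare pass, it prints a warning so the error stays visible, and it re-raises any error other than ENOSPC.

```python
import errno
import sys


def close_log(f, stream=None):
    """Close a log file, tolerating a full device.

    Hypothetical helper. Returns True if the file closed cleanly and False if
    the final flush failed with ENOSPC; any other error is re-raised.
    """
    if stream is None:
        stream = sys.stderr
    try:
        f.close()
    except OSError as e:  # IOError is OSError on Python 3
        if e.errno != errno.ENOSPC:
            raise
        # The buffered log data is lost, but the user is told why.
        stream.write("warning: log file %r could not be written: %s\n"
                     % (getattr(f, "name", f), e))
        return False
    return True
```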

A better solution might be to open the file without buffering, by passing 0 as the buffering argument of open(). That way the write itself should fail (which can be caught with try/except), and the close should then succeed, since no data is left pending.
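
A rough sketch of the unbuffered variant, with one assumption spelled out: on Python 3, buffering=0 is only accepted in binary mode (text mode raises ValueError), so the message is encoded and written to a file opened with "ab". With no buffer, a full device makes write() fail directly, and close() has nothing left to flush. append_log is a hypothetical name, not a Portage function.

```python
import errno


def append_log(path, msg):
    """Append msg to path without buffering.

    Hypothetical helper. Returns True on success, False if the write failed
    with ENOSPC; other errors are re-raised.
    """
    # buffering=0 requires binary mode on Python 3.
    f = open(path, "ab", buffering=0)
    try:
        data = msg.encode("utf-8")
        # Unbuffered writes go straight to the OS, so a full device fails
        # here rather than at close(). Loop because a raw write may be short.
        while data:
            n = f.write(data)
            data = data[n:]
    except OSError as e:
        if e.errno != errno.ENOSPC:
            raise
        return False
    finally:
        # Nothing is buffered, so close() has no pending data to write.
        f.close()
    return True
```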
Comment 1 Zac Medico gentoo-dev 2013-01-10 11:07:23 UTC
I think it's fine to exit with a traceback for this kind of error, since it's a severe problem, and it's too much trouble to handle these kinds of exceptions in the multitude of places where they can occur.
Comment 2 Enrico Tagliavini 2013-01-10 11:25:01 UTC
True, it is a lot of trouble, but I still think this kind of exit should be avoided if possible. In this case the fix seems very easy to me, though I might be wrong since I don't know the Portage code well. Catching every possible weird effect is hard, but this single specific case is worth fixing IMHO.

Running out of space on a device is not that unusual, I think. tmpfs is just one case; with the default Portage temporary directory, your hard drive can fill up too.
Comment 3 Zac Medico gentoo-dev 2013-01-10 12:43:43 UTC
(In reply to comment #2)
> True, it is a lot of trouble, but I still think this kind of exit should
> be avoided if possible.

If the traceback alone is what bothers people, then I think they're a little too concerned with superficiality. OTOH, if it's the fact that the error is fatal, I don't have much sympathy there either, since logging is *very* important (without logging, it can be difficult or impossible to diagnose problems).

> In this case the fix seems very easy to me, though I might be wrong
> since I don't know the Portage code well. Catching every possible
> weird effect is hard, but this single specific case is worth fixing IMHO.

Does your "fix" involve hiding the traceback and/or making it non-fatal? As explained, those don't really interest me.

> Running out of space on a device is not that unusual, I think. tmpfs is
> just one case; with the default Portage temporary directory, your hard
> drive can fill up too.

Sure, and with the current behavior, you can easily diagnose the problem and fix it.
Comment 4 Enrico Tagliavini 2013-01-10 19:58:52 UTC
No, of course the traceback alone is not the problem; it is just not elegant IMHO. The problem might be (though from your comment it doesn't seem to be the case) that emerge just aborts its execution, so it might leave some files or processes around, or who knows what.

(In reply to comment #3)
> Sure, and with the current behavior, you can easily diagnose the problem and
> fix it.

That was not my point. I did not want to hide the error, nor even the call trace. Sorry if I expressed my idea badly. The try-except-pass was not meant to be taken literally; yes, I explained it very badly. The IOError message must be printed, of course. You can also print the call trace without interrupting execution if you wish. I don't think printing the call trace is useful to the user, but feel free to do so.

Still, I think exiting that way is very wrong. For example, the elog contains no trace of the problem: no mention that a device is full. Logs are very important, as you said. If emerge can continue its execution, it can at least write something more to the logs. Of course not if the same device is used for /var/log, but in that case you would need an in-memory logger like journald; with such a logger you can still log something. Also, even if not now, you might need to take some action after such an error in the future. It is worth having clean code to build on, which is also easier to maintain.
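
The idea of still reporting the problem somewhere when the build log's device is full could be sketched like this. It is a hypothetical illustration, not Portage code: log_with_fallback and the "emerge.fallback" logger name are invented, and the assumption is that the fallback logger is configured with a handler that does not write to the full filesystem (for example syslog or the systemd journal).

```python
import errno
import logging

# Hypothetical fallback logger; give it a handler that writes somewhere other
# than the full filesystem (e.g. syslog or the systemd journal).
fallback = logging.getLogger("emerge.fallback")


def log_with_fallback(path, msg, logger=fallback):
    """Append msg to the build log at path; on ENOSPC, report via logger.

    Returns True if the message reached the file, False if it went to the
    fallback logger instead. Errors other than ENOSPC are re-raised.
    """
    try:
        # Leaving the with block closes the file, so a failed flush is caught.
        with open(path, "a") as f:
            f.write(msg)
    except OSError as e:
        if e.errno != errno.ENOSPC:
            raise
        logger.error("could not write to %s (%s); message was: %s",
                     path, e, msg.rstrip())
        return False
    return True
```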

Also please note I'm just brainstorming, not proposing a definitive solution :)
Comment 5 Zac Medico gentoo-dev 2013-01-11 00:54:19 UTC
(In reply to comment #4)
> No, of course the traceback alone is not the problem; it is just not
> elegant IMHO. The problem might be (though from your comment it doesn't
> seem to be the case) that emerge just aborts its execution, so it might
> leave some files or processes around, or who knows what.

That's true, it would be much nicer if we handled it like a SIGINT, which causes it to kill all of the subprocesses and close all of the logs.
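
Handling the error like a SIGINT could, very roughly, mean terminating the running child processes and closing the open logs before exiting with a failure status. A minimal sketch under those assumptions; it does not reflect how Portage's Scheduler actually handles SIGINT, and abort_build, procs and logs are hypothetical names.

```python
import subprocess


def abort_build(procs, logs, timeout=5.0):
    """Terminate child processes and close log files after a fatal error.

    Hypothetical helper. procs is a list of subprocess.Popen objects and logs
    a list of open file objects.
    """
    # Ask every still-running child to exit (SIGTERM on POSIX).
    for p in procs:
        if p.poll() is None:
            p.terminate()
    # Reap them, escalating to SIGKILL if one ignores SIGTERM.
    for p in procs:
        try:
            p.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            p.kill()
            p.wait()
    # Close the logs; ignore errors, since we are already aborting and a
    # full device may prevent the final flush.
    for f in logs:
        try:
            f.close()
        except OSError:
            pass
```

A caller could wrap its main loop in try/except OSError, call abort_build() when the errno is ENOSPC, and then exit with a non-zero status.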
Comment 6 Enrico Tagliavini 2013-01-11 19:24:31 UTC
(In reply to comment #5)
> That's true, it would be much nicer if we handled it like a SIGINT, which
> causes it to kill all of the subprocesses and close all of the logs.

Sounds like a very good idea