Bug 595772 - emerge: Be more informative when a phase is 'killed by signal 7'
Summary: emerge: Be more informative when a phase is 'killed by signal 7'
Status: UNCONFIRMED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Enhancement/Feature Requests
Hardware: All Linux
Importance: Normal normal
Assignee: Portage team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 910332
Reported: 2016-10-01 12:48 UTC by segmentation fault
Modified: 2023-07-14 10:44 UTC (History)
0 users

See Also:
Package list:
Runtime testing required: ---


Description segmentation fault 2016-10-01 12:48:55 UTC
All of a sudden, after some merges, emerge refused to cooperate and printed:

The ebuild phase 'unpack' has been killed by signal 7

I have not managed to find anything of use to get past this error. Trying with the '--debug' option did not help either: I got a lot of output - but NOT for this failure! For example:

...lots of output here, then:

+ return 0
+ __ebuild_phase post_pkg_setup
+ declare -F post_pkg_setup
+ set +x
 * The ebuild phase 'unpack' has been killed by signal 7.

...output that deals with 'die hooks' after the error.

emerge would enter the post_pkg_setup phase, would obviously try to unpack - and fail without telling me why. I have not found any documentation on 'signal 7' either. Hardware, RAM, disk, CPU etc. are all fine.
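
For reference, a signal number can be mapped to its symbolic name with Python's standard library. This is a minimal sketch, and the helper name describe_signal is made up for illustration. On Linux on x86 and ARM, signal 7 is SIGBUS; the numbering differs on some other architectures, so the result depends on the platform. SIGBUS can be raised when a memory-mapped file cannot be backed by storage, which is one plausible link to a full filesystem, though nothing in this report confirms that this is what happened here.

```python
import signal

def describe_signal(signum):
    """Return the symbolic name for a signal number, or a fallback string."""
    try:
        # signal.Signals is an enum of the signals known on this platform.
        return signal.Signals(signum).name
    except ValueError:
        return "unknown signal %d" % signum

# The number-to-name mapping is platform-dependent.
print(describe_signal(7))
```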

Finally, after trying 

emerge --resume --nodeps

emerge was able to go a bit further and printed some Python errors that mentioned 'mtime' functions (I don't have them anymore), indicating that there might be 'no space left on device'!

I was *sure* this could *never* happen to *me* - I regularly check the space left with

df

and it has been constantly showing me '36% in use'. And I was sure I had not exhausted the maximum number of files on *that* filesystem, which only held a Gentoo system of 1000 packages on a 160 GB disk...

What did NOT occur to me, however, was to check *inode utilization* with

df -i

which showed: 100%! 2.4 *million* inodes were consumed on my root filesystem. :shock:
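
Both numbers can also be read programmatically. The following is a minimal sketch using os.statvfs; the function name fs_usage is an assumption for illustration, and the percentages are computed slightly differently from df (which excludes blocks reserved for root), so they only roughly match its output.

```python
import os

def fs_usage(path="/"):
    """Return (block_use_percent, inode_use_percent) for the filesystem holding path."""
    st = os.statvfs(path)

    def percent(total, free):
        # Filesystems without a fixed inode table may report total == 0.
        return 0.0 if total == 0 else 100.0 * (total - free) / total

    # f_blocks/f_bfree count data blocks, f_files/f_ffree count inodes.
    return percent(st.f_blocks, st.f_bfree), percent(st.f_files, st.f_ffree)

blocks, inodes = fs_usage("/")
print("blocks used: %.0f%%, inodes used: %.0f%%" % (blocks, inodes))
```

A filesystem can be far from full by the first number and completely exhausted by the second, which is the situation described above.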

Solution
-----------

Delete some files. Check with 'df -i' and 'df'.

I proceeded as follows:

I decided to copy and run the du-inodes script, found at:

http://askubuntu.com/questions/316027/find-directories-with-lots-of-files-in

I copied it to:

/usr/local/sbin/du-inodes

and ran it as follows:

du-inodes / > du-inodes-root-filesystem-gentoo-20161001

After some 40 minutes, I was able to find the culprits (mainly some extracted portage snapshots) with:

sort -k1nr < du-inodes-root-filesystem-gentoo-20161001 | head -n 20
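
The same kind of search can be sketched in Python. This is not the du-inodes script from the link above, only an approximation under stated assumptions: the function name inode_counts is made up, every directory entry is counted once (so hard links are counted more than once), and other filesystems mounted below the starting directory are skipped.

```python
import os
from collections import Counter

def inode_counts(root):
    """Map each directory under root to the number of entries in its whole subtree."""
    root = os.path.abspath(root)
    root_dev = os.lstat(root).st_dev
    direct = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Stay on one filesystem: drop subdirectories that live on another device.
        dirnames[:] = [d for d in dirnames
                       if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev]
        direct[dirpath] = len(dirnames) + len(filenames)
    totals = Counter()
    # Longest paths first, so a directory's total is complete before it is
    # added to its parent.
    for dirpath in sorted(direct, key=len, reverse=True):
        totals[dirpath] += direct[dirpath]
        parent = os.path.dirname(dirpath)
        if dirpath != root and parent in direct:
            totals[parent] += totals[dirpath]
    return totals
```

Calling inode_counts("/").most_common(20) would list the twenty directory trees holding the most entries, comparable to the sort and head pipeline above.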


Suggestion for enhancement
---------------------------------------

Don't leave your users in the dark. Maybe there exists some other option, besides '--debug', to let one see the true reason behind such a sudden 'killed by signal 7' error in the 'unpack' phase. To the error

The ebuild phase 'unpack' has been killed by signal 7

you can add a hint:

Disk full? Check with 'df -i' and 'df'.

It took me three days to find out the hard way - that was a completely unnecessary waste of precious time.
Comment 1 Zac Medico gentoo-dev 2016-10-01 13:18:11 UTC
I thought this kind of thing usually triggered ENOSPC. What filesystem are you using?
Comment 2 segmentation fault 2016-10-05 09:09:24 UTC
(In reply to Zac Medico from comment #1)
> I thought this kind of thing usually triggered ENOSPC. What filesystem are
> you using?

This is an ext3 system. 

I was going to write a long post enumerating all my attempts to resolve the "signal 7" error, but it occurred to me that, even though I had recently done

emerge -1 system

I might still need to re-merge tar (I always think that tar is part of the system set, but obviously this is not the case), so I gave it a try:

emerge -1 =app-arch/tar-1.28-r1

After that I tried an emerge command that I knew would be 'killed by signal 7' during the 'unpack' phase - but it was NOT!

Thus it *seems* that possible causes of signal 7 may be, among others

- not enough space, killing any process that tries to use some space on disk,
- an installed tar that needs to be re-merged after any upgrades that touch something crucial to it. Note that I had already done
  - revdep-rebuild (more than once, there was nothing to be done the last time)
  - python-updater

so it must be something that these tools do not catch (binutils-related?).

Whatever the true reason, my point here is: please insert some message along the lines of "signal 7 caught - please check a) if you have enough space and inodes with 'df' and 'df -i', then b) re-merge tar and see if the error persists". That is: give your users some hints. They don't have to hit the nail on the head each time - but at least the user has something to try and is not clueless.

Informative messages that give hints do not actually cost anything. It's just a print statement that you have to insert at that point. But they make a huge difference. 
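
To illustrate how small such a change could be, here is a minimal sketch in Python. It is not Portage code: the names SIGNAL_HINTS and killed_message are hypothetical, and the hint texts simply paraphrase the two suggestions made in this comment.

```python
import signal

# Hypothetical hint table, keyed by signal name; the entries paraphrase
# the suggestions from this report and are not part of Portage.
SIGNAL_HINTS = {
    "SIGBUS": [
        "check free space and free inodes with 'df' and 'df -i'",
        "re-merge app-arch/tar and see if the error persists",
    ],
}

def killed_message(phase, signum):
    """Build the 'killed by signal' message, followed by any known hints."""
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = None
    suffix = " (%s)" % name if name else ""
    lines = ["The ebuild phase '%s' has been killed by signal %d%s."
             % (phase, signum, suffix)]
    for hint in SIGNAL_HINTS.get(name, []):
        lines.append("Hint: " + hint)
    return "\n".join(lines)

print(killed_message("unpack", int(signal.SIGBUS)))
```

Keying the table by signal name rather than number avoids relying on the platform-specific numbering.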

Here is my 'suggestion of the day' (you may nominate it for the "suggestion of the year award" :-)):

Create a new package, say dev-misc/gentoo-hints. It will be a huge database of hints. If installed and enabled, any error caught will trigger a search in that database and will print any hints found there. Start small, build it up day-by-day. Start assembling a (yet another) huge tree of relations:

- error X seems to be related to condition Y.
- condition Y depends on package Z
- error X is mentioned in bugs #XXXXXX, #YYYYYY and #ZZZZZZ
- a partial solution to error X was reported to be...

you get the idea. Then traverse the dependency tree (as portage does with its own dependencies) and print a 'tree of hints'. You might be able to model it quite easily with ebuilds:

Instead of .ebuild files, you have .ehint files. Structure of an .ehint file will be the same as that of an ebuild. But you only use the DEPEND variable - at least for the start. Then:

- Each error gets its own .ehint file, named after the "error key".
- Use available portage functionality to print the 'dependency tree', just like 'emerge --pretend --tree' would do, with the message: 'These are the hints that might be of interest to you:' (in place of 'These are the packages that would be merged, in order:').
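
The idea of hints that depend on other hints can be sketched without any Portage machinery. Everything below is hypothetical: the EHINTS dictionary stands in for a directory of .ehint files, each entry holding a hint text and a DEPEND-like list of further hint keys, and the keys and texts are invented for illustration.

```python
# Hypothetical stand-in for .ehint files: key -> (hint text, DEPEND list).
EHINTS = {
    "signal-7-unpack": ("phase 'unpack' killed by signal 7", ["fs-full", "tar-stale"]),
    "fs-full": ("check 'df' and 'df -i'", ["inode-hogs"]),
    "inode-hogs": ("look for directories holding very many files", []),
    "tar-stale": ("re-merge app-arch/tar", []),
}

def hint_tree(key, hints=EHINTS, depth=0, seen=None):
    """Return the hint for key and, indented below it, the hints it depends on."""
    seen = set() if seen is None else seen
    # Skip unknown keys and keys already printed, so cycles terminate.
    if key in seen or key not in hints:
        return []
    seen.add(key)
    text, depend = hints[key]
    lines = ["%s[%s] %s" % ("  " * depth, key, text)]
    for dep in depend:
        lines.extend(hint_tree(dep, hints, depth + 1, seen))
    return lines

print("These are the hints that might be of interest to you:")
print("\n".join(hint_tree("signal-7-unpack")))
```

A real implementation would presumably reuse Portage's own dependency resolver, as proposed above; the recursion here only shows the shape of the output.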

Who is going to maintain such a database? Well, we already write tons of information. We just have to mark it appropriately (a kind of 'semantic web for the poor user'). To this end, each bug might have a button "Add a hint". Upon clicking it, one would add a sentence with a hint that is thought to be related to the bug. There you are! Collaborative hint piling! :-) The package maintainers should keep an eye on the hints added to the bugs assigned to them, so that abuse of the system is avoided. A script gathers all hints and packages them automatically into .ehint files and the DEPEND variables therein.

With time, you will build a powerful hint system. If enabled, it would provide the user with valuable hints. Amen. :-)