Bug 719128 - emerge: handle fetch error gracefully instead of hanging
Status: CONFIRMED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Core
Hardware: All Linux
Importance: Normal normal
Assignee: Portage team
Reported: 2020-04-23 21:33 UTC by Thomas Capricelli
Modified: 2020-05-01 01:41 UTC
Description Thomas Capricelli 2020-04-23 21:33:20 UTC
My distfiles directory is shared over NFS across a dozen computers.
On one of them, I have a weird problem, unrelated to this bug report, where a network reconnect kills rpc.statd.

As a result, NFS is half-working: reading and even writing are fine, but locking fails.



Reproducible: Always

Actual Results:  
When emerging, portage tries to take a lock on the distfile, fails, and then hangs forever. That is:
* emerge still displays the package as being emerged
* the lock file in /usr/portage/distfiles/ stays in place and prevents every other machine from fetching the file


Expected Results:  
Instead of hanging, portage should fail the emerge cleanly, ideally reporting a fetch error.
Or even better, a lock error while fetching.

/tmp/portage/xxxx/yyyy/temp/build.log actually contains the error. Typically:

!!! Error while waiting to lock '/usr/portage/distfiles/.libvdpau-1.4.tar.bz2.portage_lockfile': [Errno 37] Nessun lock disponibile

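(The message is localized; in English it reads "No locks available", which is errno 37, ENOLCK, on Linux.)

So portage/ebuild is evidently aware that an error happened, yet it still hangs.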
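
To make the expected behaviour concrete, here is a minimal Python sketch of a lock helper that keeps retrying only while the lock is merely contended and gives up immediately on errors such as ENOLCK. This is not Portage's actual locking code: `acquire_lock`, `LockUnavailable`, the `lock_fn` parameter and the timeout values are all hypothetical names chosen for illustration.

```python
import errno
import fcntl
import os
import time


class LockUnavailable(Exception):
    """Hypothetical error: the lock cannot be obtained and waiting will not help."""


# errno values that mean "another process holds the lock, try again later".
RETRYABLE = {errno.EAGAIN, errno.EACCES}


def acquire_lock(path, timeout=30.0, poll=0.5, lock_fn=fcntl.lockf):
    """Take an exclusive lock on path and return the open file descriptor.

    lock_fn is injectable only so that failures can be simulated in tests.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o664)
    deadline = time.monotonic() + timeout
    try:
        while True:
            try:
                lock_fn(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                return fd
            except OSError as e:
                if e.errno not in RETRYABLE:
                    # e.g. ENOLCK when rpc.statd is gone: fail now, do not wait.
                    raise LockUnavailable(f"cannot lock {path!r}: {e}") from e
                if time.monotonic() >= deadline:
                    raise LockUnavailable(f"timed out waiting to lock {path!r}") from e
                time.sleep(poll)
    except BaseException:
        os.close(fd)  # never leak the descriptor on failure
        raise
```

A caller would catch `LockUnavailable` and mark the fetch as failed, so the emerge ends with an error rather than waiting indefinitely. The sketch deliberately leaves the lock file itself alone; whether it is safe to remove it on failure depends on how the other machines use it.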
Comment 1 Zac Medico gentoo-dev 2020-05-01 01:41:06 UTC
Maybe `strace -p <pid>` will show which call it's blocking on? You can also use `kill -s SIGUSR1 <pid>` to get a pdb shell in the hung process, then type 'bt' to get a backtrace.

You need to target the pid that is actually hung, since it might have an irrelevant parent process that is waiting for it.
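
One way to find the right pid is to list everything below the top-level emerge process and look at the leaves. The following is a rough Python sketch that does this by scanning /proc on Linux; `parse_stat` and `descendants` are hypothetical helper names, not part of Portage.

```python
import os


def parse_stat(text):
    """Split a /proc/<pid>/stat line into (pid, comm, state, ppid)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lpar = text.index("(")
    rpar = text.rindex(")")
    pid = int(text[:lpar])
    comm = text[lpar + 1:rpar]
    state, ppid = text[rpar + 1:].split()[:2]
    return pid, comm, state, int(ppid)


def descendants(root_pid, proc="/proc"):
    """Return [(pid, comm, state)] for every process below root_pid."""
    children = {}
    for name in os.listdir(proc):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc, name, "stat")) as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited (or was unreadable) while scanning
        children.setdefault(ppid, []).append((pid, comm, state))
    found, stack = [], [root_pid]
    while stack:
        for entry in children.get(stack.pop(), []):
            found.append(entry)
            stack.append(entry[0])
    return found
```

Calling `descendants(<pid of emerge>)` and printing each tuple gives the candidates; a process with no children of its own is the most likely target for `strace -p` or SIGUSR1. The state letter comes straight from /proc (`S` is interruptible sleep, `D` is uninterruptible sleep).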