Summary: | sys-apps/portage: emerge sync verification with gemato raised unhandled OSError: [Errno 5] Input/output error | ||
---|---|---|---|
Product: | Portage Development | Reporter: | Zoltan Puskas <zoltan> |
Component: | Core | Assignee: | Portage team <dev-portage> |
Status: | CONFIRMED --- | ||
Severity: | normal | CC: | gentoo, jstein |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- |
Description
Zoltan Puskas
2021-04-16 12:39:25 UTC
An EIO error is generally unexpected here. It is a form of verification error, and we really should log the full traceback due to the unexpected nature of the error. Do you know the root cause of your EIO error? It could have been due to a hardware or file system problem. If it's not reproducible, and you've determined that the underlying file system and media are healthy, then there's nothing to do except possibly consider how you can prevent this from occurring again (some form of RAID with error correction could possibly help). (In reply to Zac Medico from comment #2) > Do you know the root cause of your EIO error? It could have been due to a > hardware or file system problem. If it's not reproducible, and you've > determined that the underlying file system and media are healthy, then > there's nothing to do except possibly consider how you can prevent this from > occurring again (some form of RAID with error correction could possibly > help). # TL;DR - A single file was corrupted (either file or the checksum) and BTRFS did it's thing, thus it became an I/O error. - Removing the file and running `emerge --sync` again resolved the issue. - It would be nice if verification code could print out which files have failed the verification. # Root causing I had some time over the weekend to look into this on my side. My portage tree is located in /var/db/repos/gentoo, which is the / partition of the SSD. Host seems to work properly otherwise, mount is RW. In the end I managed find the issue that causes this failure, not sure about the root cause for that failure though. What I did: - Ran `emerge-webrsync`. I was thinking maybe there is a portage tree corruption and this will fix it. It did not. - Ran `btrfs check`, which returned no errors. - Ran `smartctl -t short /dev/sda3`, which returned no error. My SSD's TBW and age numbers are also well within their useful life. - Ran `btrfs scrub start /dev/sda3`, which found 2 uncorrectable errors: $ dmesg | grep "checksum error at" | cut -d\ -f24- | sed 's/.$//' 4096, links 1 (path: var/db/repos/gentoo/app-arch/dpkg/Manifest 2341, links 1 (path: var/db/repos/gentoo/app-arch/dpkg/Manifest Running $ less /var/db/repos/gentoo/app-arch/dpkg/Manifest gave me a read error. I'm not sure how this file got corrupted. Maybe it happened due to a power outage or a cosmic ray or something else. Removing the file and rerunning portage tree sync fixed things. So in the end portage was correct to fail the verification, but it was more cryptic then it should be. Rather then a full stack trace maybe bubbling up an exception and catching it to print a proper error message would be more useful. |