Bug 698072 - sys-apps/openrc - /etc/init.d/fsck fails on absent/non-essential partitions cause system to be unusable
Status: UNCONFIRMED
Alias: None
Product: Gentoo Hosted Projects
Classification: Unclassified
Component: OpenRC
Hardware: All Linux
Importance: Normal normal with 1 vote
Assignee: OpenRC Team
Reported: 2019-10-20 03:08 UTC by augustin
Modified: 2020-12-12 04:16 UTC

Description augustin 2019-10-20 03:08:42 UTC
I copy here the relevant sections from:
https://linux.overshoot.tv/wiki/openrc_and_fsck

It covers two different but closely related scenarios that cause the boot process to fail, leaving the user perplexed as to what happened, in situations where the boot process could very well have completed.


================================================
fsck fails with a non-system critical partition
================================================

The main problem here is that OpenRC can fail critically when fsck cannot check a drive or partition that is not critical for the system to run. The root partition is fine and can be mounted normally, and possibly even the /home/ partition. However, a removable medium, a file storage device, or any other non-critical medium (i.e. a partition without which the system can run fine) on which fsck fails may cause the boot process to fail altogether, with OpenRC refusing to properly mount the root filesystem.

In such a situation, the only solution is to boot the machine with a live CD or a rescue bootable removable device, manually mount the system's root device, and manually edit /etc/fstab to remove the offending entry, before rebooting the system normally, and only then deal with the failing media.

The above scenario may happen when fsck fails to repair a given partition.

The way OpenRC handles such non-system-critical problems is not ideal. It makes diagnosis and recovery more difficult than they should be.

The most critical problems currently are:
- When fsck fails, the user is dropped to a login shell. The user may have turned the computer on, gone away for a while, and come back to a login shell where she expected to find the usual Desktop Environment login screen. There is no indication on the screen of why the regular boot process failed.
- The root partition was never mounted read-write, so nothing was logged and the user cannot investigate anything. He has to reboot and try as best he can to follow the boot output and figure out where it fails.
- The root partition is mounted read-only. It is expected that the user will know how to remount the root partition in write mode, so that he can modify /etc/fstab to (at least temporarily) fix the problem and be able to boot the system. That is, of course, assuming that the user has already identified what the problem is.

There should be two important goals:
- Make sure that the user/administrator is properly aware that a partition failed fsck.
- Allow diagnosis and recovery to be as painless as possible.

The partition may not be system critical, but the data within is probably important to the user. Thus, it is not enough to rely on logging only. At the same time, the system should be able to continue with the boot process, so that the user can straight away access the critical system files like /etc/fstab and deal with the problem.

The best solution would thus probably be to drop directly into interactive mode whenever fsck encounters a critical problem with any drive.
OpenRC should indicate which drive failed fsck and what the error is, and then wait for user input to continue the boot process, leaving the failing drive unmounted but mounting and starting the system itself.
Thus the user is properly informed, but he can also easily complete the boot sequence and use a fully operational system to deal with the actual problem.
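
The behaviour proposed above can be sketched as a small decision function. This is only an illustration under stated assumptions, not how OpenRC's fsck service is implemented: the names FsckResult and plan_boot are hypothetical, the set of critical mount points is assumed to be just "/", and the status bits are the ones documented in fsck(8) (4 = errors left uncorrected, 8 = operational error).

```python
from dataclasses import dataclass

# Exit-status bits documented in fsck(8).
FSCK_UNCORRECTED = 4   # filesystem errors left uncorrected
FSCK_OPERATIONAL = 8   # operational error (e.g. the device could not be opened)


@dataclass
class FsckResult:
    """Hypothetical record of one per-filesystem check."""
    mountpoint: str
    status: int          # exit status reported for this filesystem
    message: str = ""


def plan_boot(results, critical=("/",)):
    """Decide how to continue after fsck (illustrative policy only).

    Returns (can_continue, skipped_mountpoints, notices).
    """
    skipped, notices = [], []
    can_continue = True
    for r in results:
        failed = bool(r.status & (FSCK_UNCORRECTED | FSCK_OPERATIONAL))
        if not failed:
            continue
        # Always tell the user which filesystem failed and why.
        notices.append(
            f"fsck failed on {r.mountpoint} (status {r.status}): {r.message}"
        )
        if r.mountpoint in critical:
            can_continue = False          # the system itself is affected
        else:
            skipped.append(r.mountpoint)  # leave unmounted, keep booting
    return can_continue, skipped, notices
```

For example, calling plan_boot with a clean "/" and a failed /media/my_usb_drive (the status value 8 for the missing device is an assumption made for this example) yields a plan that continues booting, skips the removable drive, and carries one notice to show to the user.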

================================================
fsck fails with unplugged devices
================================================

The problem described in this section is the same as the one described in the previous section, but the conditions that trigger it make it all the more unacceptable that the boot process was not allowed to complete.

In this situation, a removable device that was not plugged in, that the user did not intend to plug in, and that is absolutely not necessary for the system to boot and run normally, left the system unusable: the boot process was interrupted, the root partition was mounted read-only, and the user was dropped to a root login shell without any indication of what went wrong. Again, the solution was to reboot the system with a live CD or a rescue bootable removable device and manually edit /etc/fstab, which again requires, of course, that the user has correctly guessed the root cause of the problem.

The situation is thus: the user has an entry in /etc/fstab for a removable vfat USB stick:
UUID=AAAA-BBBB /media/my_usb_drive  vfat user,noauto 0

At the time of booting, the given drive was not plugged in, was not intended to be plugged in, and is not necessary at all for the system to run.
Arriving at a proper diagnosis was difficult because, as in the previous section, nothing is logged: the root filesystem never gets mounted in a way that allows logs to be written. The error message only flashes very quickly on the screen before the user is dropped to a login shell. Only after several boot attempts, making a video recording of the screen output, and analysing the video was the user able to figure out what the problem was. (Thankfully, nowadays, smartphones with video recording capabilities are commonly available!)

fsck failed with the following error message (manually copied from the video recording, since no logs exist):

fsck.fat 4.1 (2017-01-24)
open: No such file or directory.
* Filesystems couldn't be fixed.
* rc: Aborting!
* fsck:caught SIGTERM, aborting.
INIT: Entering runlevel 3
Comment 1 Joseph 2020-12-12 03:43:16 UTC
The same thing happened to me on a new installation (maybe 10 days old).
I was upgrading the kernel, and when I rebooted I ended up at a command-line login prompt.
The "/" root partition is mounted read-only.

When I try to start any service, e.g. the network, I get:
fsck.fat 4.1 (2017-01-24) open: no such file or directory
Filesystems couldn't be fixed
ERROR: fsck failed to start

It might be related to the vfat boot partition (I don't know). I checked the root partition with GParted and it checked OK. The vfat boot partition failed the check. But I can still boot-strap the system and access the boot partition normally.

My fstab:
LABEL=boot		/boot		vfat		noauto,noatime	1 2
UUID=d32946b3-2236-4998-80dd-68b7d78e0c7b  	/	ext4	noatime	0 1
LABEL=swap		none		swap		sw		0 0

I don't know what the fix is. The fstab looks OK.
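
One way to narrow down reports like the ones in this bug is to list the fstab entries that fsck is asked to check and whose device cannot be found. The following is a minimal sketch, not an OpenRC tool: missing_checked_entries and resolve_spec are hypothetical helpers. It assumes that a missing sixth field means pass number 0, as described in fstab(5), and that udev populates /dev/disk/by-label and /dev/disk/by-uuid. Quoted values such as UUID="..." are not handled.

```python
import os


def resolve_spec(spec):
    """Map an fstab device spec to a path that can be tested for existence."""
    if spec.startswith("UUID="):
        return "/dev/disk/by-uuid/" + spec[len("UUID="):]
    if spec.startswith("LABEL="):
        return "/dev/disk/by-label/" + spec[len("LABEL="):]
    return spec


def missing_checked_entries(fstab_text, exists=os.path.exists):
    """Return (spec, mountpoint) for entries with passno > 0 whose device is absent."""
    missing = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        fields = line.split()
        if len(fields) < 3:
            continue                      # malformed line, ignore
        spec, mountpoint = fields[0], fields[1]
        # Sixth field is the fsck pass number; absent means 0 (not checked).
        passno = int(fields[5]) if len(fields) > 5 else 0
        if passno > 0 and not exists(resolve_spec(spec)):
            missing.append((spec, mountpoint))
    return missing
```

On a running system this could be called as missing_checked_entries(open("/etc/fstab").read()). The exists parameter is there so the check can be exercised without real devices. Whether a missing device is actually the cause in the cases reported here is not established by this thread.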