13320 – checkroot & checkfs don't detect fsck error codes

Status:	RESOLVED FIXED

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	[OLD] Core system
Hardware:	All All

Importance:	High normal
Assignee:	Martin Schlemmer (RETIRED)

Reported:	2003-01-05 15:57 UTC by Malcolm Scott
Modified:	2003-07-19 18:05 UTC
CC List:	1 user

Attachments
Patch for checkfs (checkfs.patch, 376 bytes, patch) 2003-01-26 08:47 UTC, Malcolm Scott
Patch for checkroot (checkroot.patch, 328 bytes, patch) 2003-01-26 08:47 UTC, Malcolm Scott

Description Malcolm Scott 2003-01-05 15:57:10 UTC

When booting just now, I noticed that my root partition's maximum mount count
had been reached, and the filesystem underwent a thorough check. At completion,
I noticed fsck print the message "**** REBOOT LINUX ****" on the console, but
the status was labelled as "[ ok ]" and the bootup process continued.

According to the fsck man page, that message should have been accompanied by a
return code of 2. I've looked at the checkroot script, and it looks like that
SHOULD have caused 'sulogin' to be run. However, it didn't; I'm not sure why,
but I think this needs investigating - it was only by chance that I happened to
be watching the messages at that point.

-----

On another note, looking at the checkroot script, the message that would have
been printed before 'sulogin' was run reads "Filesystem couldn't be fixed :(" -
this is incorrect, since error 2 means "Filesystem errors corrected; system
should be rebooted". Error codes >=4 mean that the filesystem wasn't fixed.
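
For reference, the exit status described above is a bitwise OR of individual conditions, so it can be decoded bit by bit. The sketch below follows the wording of the fsck(8) man page (e2fsck(8) words code 2 slightly differently, as quoted above); the function name describe_fsck_status is made up for illustration and is not part of baselayout.

```shell
#!/bin/sh
# Decode an fsck exit status into one line per condition.
# Bit meanings are taken from fsck(8); the function name is hypothetical.
describe_fsck_status() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    [ $((code & 1)) -ne 0 ] && echo "filesystem errors corrected"
    [ $((code & 2)) -ne 0 ] && echo "system should be rebooted"
    [ $((code & 4)) -ne 0 ] && echo "filesystem errors left uncorrected"
    [ $((code & 8)) -ne 0 ] && echo "operational error"
    [ $((code & 16)) -ne 0 ] && echo "usage or syntax error"
    [ $((code & 32)) -ne 0 ] && echo "checking canceled by user request"
    [ $((code & 128)) -ne 0 ] && echo "shared-library error"
    return 0
}
```

Calling describe_fsck_status 3, for example, prints both the "corrected" and the "rebooted" lines, since 3 is 1 and 2 combined.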

Comment 1 Martin Schlemmer (RETIRED) gentoo-dev 2003-01-07 13:14:02 UTC

OK, the scripts take the easiest route and just drop to sulogin for anything
that is not OK or 100% fixed.  We can enhance this if really needed.

Then, the fsck issue ... it's either going to be a bash issue (not returning
the correct return code for commands executed), or glibc (as it has the
exec*() functions), or an issue specific to fsck.  Any chance you can try to
get it to return 2 again on the console, and check with 'echo $?' whether it
really did return 2?  Far-fetched, I know :/

I will try to have a look at fsck's source some time, and see whether it is
fine.

What version of e2fsutils?

Comment 2 Malcolm Scott 2003-01-07 16:25:41 UTC

Umm, without attacking my hard disk with an axe? :-) I'll see what I can do...

I'm using e2fsprogs-1.32-r2.

Comment 3 Martin Schlemmer (RETIRED) gentoo-dev 2003-01-08 14:34:52 UTC

Sorry, it's a lot to ask, I know :/  Like I said, I'll try to check the source
for problems as soon as I get a chance ...

Comment 4 Martin Schlemmer (RETIRED) gentoo-dev 2003-01-26 03:25:45 UTC

Just had a thought ... what filesystem?  It could be that it's not ext2/3, and
the real binary that fsck calls does not return the correct return code ...

Comment 5 Malcolm Scott 2003-01-26 07:50:12 UTC

It's ext3.

Comment 6 Malcolm Scott 2003-01-26 08:46:42 UTC

Aha! Got it. The fault is with the checkroot/checkfs scripts, in the section
where the code reads:

                if [ "$?" -eq 0 ]
                then
                        eend 0
                elif [ "$?" -eq 1 ]
                then
                        ewend 1 "Filesystem repaired"
                else

The first '[' test overwrites '$?' with its own exit status: 0 if the test
succeeded, 1 if it failed. So when fsck returns anything non-zero, the 'elif'
sees 1, reports "Filesystem repaired", and the script never enters the 'else'
section. The following fixes it:

                return=$?
                if [ $return -eq 0 ]
                then
                        eend 0
                elif [ $return -eq 1 ]
                then
                        ewend 1 "Filesystem repaired"
                else

I'll attach patches for both files shortly.
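
The difference between the two patterns can be reproduced without touching a real disk. This is only a sketch: fake_fsck is a hypothetical stand-in that exits with whatever status it is given, and the "ok" / "repaired" / "failed" strings are placeholders for the eend/ewend/sulogin branches of the real scripts.

```shell
#!/bin/sh
# fake_fsck is a hypothetical stand-in: it simply returns the given status.
fake_fsck() { return "$1"; }

# Buggy pattern: the first [ ... ] test overwrites $? before the elif reads it.
buggy_check() {
    fake_fsck "$1"
    if [ "$?" -eq 0 ]; then
        echo "ok"
    elif [ "$?" -eq 1 ]; then
        echo "repaired"
    else
        echo "failed"
    fi
}

# Fixed pattern: save the status once, then test the saved copy.
fixed_check() {
    fake_fsck "$1"
    retval=$?
    if [ "$retval" -eq 0 ]; then
        echo "ok"
    elif [ "$retval" -eq 1 ]; then
        echo "repaired"
    else
        echo "failed"
    fi
}

# Compare both versions for a few fsck-style exit statuses.
for code in 0 1 2 4; do
    echo "status $code: buggy=$(buggy_check "$code") fixed=$(fixed_check "$code")"
done
```

For statuses 2 and 4 the buggy version reports "repaired" while the fixed version reports "failed", which matches the "[ ok ]"-style behaviour seen at boot.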

Comment 7 Malcolm Scott 2003-01-26 08:47:29 UTC

Created attachment 7647
Patch for checkfs

Comment 8 Malcolm Scott 2003-01-26 08:47:46 UTC

Created attachment 7648
Patch for checkroot

Comment 9 Martin Schlemmer (RETIRED) gentoo-dev 2003-01-26 09:50:02 UTC

Bleh, you are right.  Good eye, thanks!

Fixed in CVS.

Comment 10 Malcolm Scott 2003-02-06 13:52:01 UTC

This change doesn't seem to have made it into baselayout-1.8.6.2 ...

Comment 11 1 2003-07-19 18:05:57 UTC

Martin:
> Ok, scripts choose the easiest way, and just drop to sulogin for anything
> not ok, or 100% fixed.  We can enhance this if really needed.

I think it *is* needed. The wrong message "Filesystem couldn't be fixed"
shocks and misleads many people, when the situation actually just requires a
reboot.
For some people, ext3 seems to produce this message on *each* check (there
are some threads in the forums about it, e.g.
<http://forums.gentoo.org/viewtopic.php?t=51324>).

Just printing another message if the return code is 2 would be enough, IMO.
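
A separate branch for return code 2 could look roughly like the sketch below. It is not the baselayout code: the function name fsck_action, the action keywords, and the message texts are all hypothetical, and the real scripts would call eend/ewend/sulogin rather than echo.

```shell
#!/bin/sh
# Sketch only: map an fsck exit status to a boot-time action and message.
# Names and messages here are hypothetical, not baselayout's own.
fsck_action() {
    case "$1" in
        0)   echo "continue: filesystem is clean" ;;
        1)   echo "continue: filesystem repaired" ;;
        2|3) echo "reboot: system should be rebooted" ;;
        *)   echo "sulogin: filesystem couldn't be fixed" ;;
    esac
}
```

Here 3 is treated like 2 because it is the "corrected" and "reboot" bits combined; any status with the 4 bit or higher set falls through to the sulogin branch.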