Bug 139411 - chown utility looks up numeric ids which causes hang in bootmisc with "chown 0:0"
Summary: chown utility looks up numeric ids which causes hang in bootmisc with "chown ...
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: High critical
Assignee: Gentoo's Team for Core System packages
URL:
Whiteboard:
Keywords:
Duplicates: 142626 142790
Depends on:
Blocks:
 
Reported: 2006-07-06 05:00 UTC by Torsten Kurbad
Modified: 2006-08-08 17:02 UTC
CC List: 3 users

See Also:
Package list:
Runtime testing required: ---


Attachments
coreutils-numeric.patch (coreutils-numeric.patch,7.43 KB, patch)
2006-08-08 09:04 UTC, Doug Goldstein (RETIRED)

Description Torsten Kurbad 2006-07-06 05:00:41 UTC
Hi there,

I have a reproducible bug that has already occurred on two of my x86 servers:

/etc/init.d/bootmisc

as found in

sys-apps/baselayout-1.12.1

hangs during boot at line 115, while running

chown 0:0 /tmp/.{ICE,X11}-unix

It's possible to continue the boot process by pressing <ctrl>-c, but that shouldn't be a permanent solution.
I just wonder whether this line is really necessary. The X11 directories are freshly created on wipe as well as on clean, and during the boot process this is obviously done by root, thus by uid:gid 0:0.
Why this chown, then?

Looking forward to being enlightened
Torsten
Comment 1 SpanKY gentoo-dev 2006-07-06 17:00:03 UTC
if a simple chown is hanging, you've got bigger problems

the chown/chmod are sanity checks to make sure we have correct permissions, just like the comment says above the code

i'd suggest you drop the redirect to /dev/null and see if you get any errors ... if not, try adding `strace` before the chown call and see wtf is going on
Comment 2 Torsten Kurbad 2006-07-07 03:42:06 UTC
Thanks for the tip, strace indeed revealed the problem - and it's not a trivial one:

We use an LDAP directory coupled with a Kerberos V password database to authenticate "normal" users on Linux. The nsswitch.conf thus looks as follows:

passwd:      compat files ldap
shadow:      compat files ldap
group:       compat files ldap

In addition there are pam_ldap and pam_krb5 installed - and called by
/etc/pam.d/system-auth

strace shows that chown 0:0 queries LDAP, but since no network interfaces are up yet, it keeps trying indefinitely to resolve the hostname of the LDAP server.
A trivial solution seems to be to replace chown 0:0 with
chown root:root

Don't ask me why chown is that allergic to numeric IDs.

Anyway, chown root:root seems perfectly valid to me (a standard Linux system without a user "root" wouldn't be able to run anything, let alone X11), so can it go into baselayout, please?

Torsten
Comment 3 SpanKY gentoo-dev 2006-07-07 15:01:14 UTC
we use 0:0 because of stupid *BSD systems that map the root group to gid 10 and the wheel group to gid 0

seems odd that numeric ids are looked up in the database though ... is there a case where this would make sense (i can't think of one) ?
Comment 4 Torsten Kurbad 2006-07-09 03:48:12 UTC
Ouch, *BSD - yes I know the problem with groups. Have two FreeBSD servers running myself.

I also don't know why numeric IDs are looked up and named ones are not. I guess there's something wrong with the way nss_ldap handles numeric IDs. There was a revision bump lately; maybe the error lies there.
The question is whether it comes from one of the patches applied in the nss_ldap ebuild, or whether the good folks at PADL Software built the sh*t directly into their software.
I will examine that further on Monday.
Comment 5 Torsten Kurbad 2006-07-11 02:33:58 UTC
I have narrowed the problem down somewhat.
The standard bootmisc works smoothly up to

sys-auth/nss_ldap-2.39-r1

The next version in portage is:

sys-auth/nss_ldap-2.49

This version, as well as all later ones, tries to resolve the numeric IDs.
My knowledge of C is insufficient to see what leads to that change, so perhaps someone else should have a look...
Comment 6 SpanKY gentoo-dev 2006-07-14 21:28:16 UTC
np, we have a dev who "loves" this package ;)
Comment 7 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2006-07-17 18:56:40 UTC
vapier: I'm looking at it, but could you also please look at chown in coreutils? By default it does a getpwnam/getgrnam on the arguments that are passed in. If this fails (returning NULL instead of a pointer to a struct), it tries to convert the argument to a numeric value. I'd like to know whether it has always behaved this way.
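A minimal C sketch of the name-first order described above. It follows Robin's description and is not the actual coreutils source; `resolve_uid_name_first` is a hypothetical name. The `getpwnam()` call is what consults the sources listed in nsswitch.conf. For a name no source knows, such as "0", every source is queried (ldap included) before the numeric fallback is ever reached.

```c
#include <assert.h>
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>

/* Name-first resolution of an owner operand (illustrative sketch).
   Returns 0 and stores the uid on success, -1 if spec is neither a
   known user name nor a plain decimal number. */
int resolve_uid_name_first(const char *spec, uid_t *out)
{
    assert(spec != NULL && out != NULL);

    /* Step 1: look spec up as a user NAME. For an unknown name like "0",
       glibc asks each passwd source in nsswitch.conf in turn; with
       "compat files ldap" that includes the LDAP server, which is where
       the boot-time delay comes from. */
    struct passwd *pw = getpwnam(spec);
    if (pw != NULL) {
        *out = pw->pw_uid;
        return 0;
    }

    /* Step 2: only now try spec as a decimal uid. */
    if (*spec < '0' || *spec > '9')
        return -1;                /* rejects "", "-1", " 7", unknown names */
    char *end;
    errno = 0;
    unsigned long v = strtoul(spec, &end, 10);
    if (*end != '\0' || errno != 0 || (unsigned long)(uid_t)v != v)
        return -1;                /* trailing junk or out of uid_t range */
    *out = (uid_t)v;
    return 0;
}
```

On a system without a user literally named "0", `resolve_uid_name_first("0", &uid)` ends up with uid 0, but only after the full NSS lookup has failed.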

The getpwnam/getgrnam calls always go to your configured NSS sources, so this may be a repeat of the lookup delay issues.
Torsten: could you please try nss_ldap-250-r1 and see EXACTLY how long the delays are? (250-r1 changes the timeout behavior of nss_ldap on purpose).
In 249, it shouldn't actually be a hang, but a very long timeout (nearly 5 minutes for each lookup).
Comment 8 Torsten Kurbad 2006-07-20 04:13:30 UTC
Ok, boys, I measured the timeouts. They appear to be exactly 30 seconds with 250-r1. That's indeed not very long compared to the BIOS POST, SCSI detection, etc. Anyway, there must be a way to keep nss_ldap from even trying when none of the network devices are up yet (which is definitely the case at that early boot stage).
Comment 9 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2006-07-20 08:56:12 UTC
ok, if they are 30 seconds with 250-r1, then they were definitely significantly longer with older versions.

spanky: in chown from coreutils, could the logic possibly be changed to check whether the passed-in value is numeric first, instead of looking it up and converting it to a number only after the lookup fails? The file you'd need to change is ${S}/lib/userparse.c

Torsten: the problem is that you can't differentiate between the remote LDAP server being totally down, and the local network being down. Both cases are the same effective error returned to Linux.
Comment 10 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2006-08-03 11:29:47 UTC
*** Bug 142626 has been marked as a duplicate of this bug. ***
Comment 11 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2006-08-03 12:19:30 UTC
base-system: please read the summary below, and fix coreutils asap.

I was asked why this doesn't seem to behave. Here's a short summary of what happens:
1. user calls 'chown 0:0 foo'
2. chown splits this into two STRINGS, user="0", group="0"
3. chown (via some code in the lib/ portion of coreutils), does getpwnam("0"), getgrnam("0").
4. this causes NSS to go and look for a user and group with a NAME of "0". Notice not a number of zero, but a string name of "0".
5. NSS checks files for a user/group named "0". Finds nothing.
6. NSS checks ldap for a user/group named "0". LONG delay happens here because the LDAP server (if local) is not yet started or (if remote) is not yet accessible (networking isn't up).
7. chown code decides that if nothing was found so far, try to convert it to a number. This succeeds, and the chown is actually done at this point.

#7 needs to move way up, so that chown realizes the input is a numeric value, not a string, and does not look it up as a name.
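Moving step 7 to the front could look roughly like this. It is a hypothetical sketch: `resolve_uid_numeric_first` and the `nss_lookups` counter are illustrative names. The counter exists only to show that an all-digit owner such as "0" never reaches NSS. Be aware that this order deviates from the POSIX wording for chown's owner operand, because a user literally named "0" would be ignored.

```c
#include <assert.h>
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>

/* Counts how often the NSS user database is consulted. */
static int nss_lookups = 0;

static struct passwd *counted_getpwnam(const char *name)
{
    nss_lookups++;
    return getpwnam(name);
}

/* Accepts only a non-empty run of decimal digits; 0 on success. */
static int parse_decimal_id(const char *spec, unsigned long *out)
{
    if (*spec < '0' || *spec > '9')
        return -1;                /* rejects "", "-1", " 7", "root" */
    char *end;
    errno = 0;
    unsigned long v = strtoul(spec, &end, 10);
    if (*end != '\0' || errno != 0)
        return -1;
    *out = v;
    return 0;
}

/* Numeric-first order: digits are taken as a uid with no NSS query;
   only non-numeric specs are looked up by name. */
int resolve_uid_numeric_first(const char *spec, uid_t *out)
{
    assert(spec != NULL && out != NULL);
    unsigned long v;
    if (parse_decimal_id(spec, &v) == 0) {
        if ((unsigned long)(uid_t)v != v)
            return -1;            /* does not fit in uid_t */
        *out = (uid_t)v;
        return 0;
    }
    struct passwd *pw = counted_getpwnam(spec);
    if (pw == NULL)
        return -1;
    *out = pw->pw_uid;
    return 0;
}
```

With this order, `chown 0:0` during early boot would never touch nss_ldap, whether or not the network is up.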
Comment 13 Jimmy.Jazz 2006-08-03 16:02:48 UTC
(In reply to comment #11)

I agree with your conclusion, but upgrading nss_ldap didn't bring any improvement :(

As you asked, I upgraded to nss_ldap-250-r1 and timed the delay until the ldap request timed out. I can confirm that the lookup takes ... longer than 30 minutes. That is far longer than the expected 30 seconds.
Tired of waiting for a response that never came, I finally stopped the process. So chown never returned, and you will certainly be disappointed by the following result:

+ mkdir -p /tmp/.ICE-unix /tmp/.X11-unix
+ date
jeu aoû  3 21:10:54 MEST 2006
+ chown 0:0 /tmp/.ICE-unix /tmp/.X11-unix

Ctrl+c (30 minutes is really really time consuming :))

Moreover, I didn't set idle_timelimit in /etc/ldap.conf and simply left it at its default value (presumably 3600 seconds).

Jj

Comment 14 Jimmy.Jazz 2006-08-03 16:30:57 UTC
(In reply to comment #11)

It's late, time for me to go to bed. Tomorrow is another working day ;)

I forgot to add the timing settings you mentioned, so I ran another test with them, but without any more success.

#cat /etc/ldap.conf
...
nss_reconnect_tries 4                   # number of times to double the sleep time
nss_reconnect_sleeptime 1               # initial sleep value
nss_reconnect_maxsleeptime 16   # max sleep value to cap at
nss_reconnect_maxconntries 2    # how many tries before sleeping
# This leads to a delay of 15 seconds (1+2+4+8=15)
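The 15-second figure in that comment is a capped doubling series. The sketch below reproduces the arithmetic. Its assumption is taken from the comments in the excerpt, not from the nss_ldap source: the sleep starts at nss_reconnect_sleeptime, doubles each round, is capped at nss_reconnect_maxsleeptime, and runs for nss_reconnect_tries rounds. `total_backoff_seconds` is a hypothetical helper. It ignores nss_reconnect_maxconntries and the time spent in the connection attempts themselves.

```c
#include <assert.h>

/* Hypothetical model of the reconnect back-off described in the
   ldap.conf comments: sleep `sleeptime` seconds, double the sleep after
   each round, never sleep longer than `maxsleeptime`, and give up after
   `tries` rounds. Returns the total number of seconds slept. */
unsigned total_backoff_seconds(unsigned tries, unsigned sleeptime,
                               unsigned maxsleeptime)
{
    assert(sleeptime > 0);        /* a zero start would never grow */
    unsigned total = 0;
    unsigned s = sleeptime;
    for (unsigned i = 0; i < tries; i++) {
        unsigned nap = (s < maxsleeptime) ? s : maxsleeptime;
        total += nap;
        if (s < maxsleeptime)
            s *= 2;               /* stop doubling once the cap is hit */
    }
    return total;
}
```

`total_backoff_seconds(4, 1, 16)` gives 1 + 2 + 4 + 8 = 15, matching the comment. With 6 rounds the cap takes effect and the total is 15 + 16 + 16 = 47.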

After replacing chown 0:0 with chown root:root, bootmisc doesn't hang anymore.

You were definitely right.

Good night

Jj
Comment 15 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2006-08-04 07:47:54 UTC
*** Bug 142790 has been marked as a duplicate of this bug. ***
Comment 16 SpanKY gentoo-dev 2006-08-04 08:52:56 UTC
sorry, but this is by design and is required by spec:
http://www.opengroup.org/onlinepubs/009695399/utilities/chown.html

OPERANDS

    The following operands shall be supported:

owner[:group]
    A user ID and optional group ID to be assigned to file. The owner portion of this operand shall be a user name from the user database or a numeric user ID. Either specifies a user ID which shall be given to each file named by one of the file operands. If a numeric owner operand exists in the user database as a user name, the user ID number associated with that user name shall be used as the user ID. Similarly, if the group portion of this operand is present, it shall be a group name from the group database or a numeric group ID. Either specifies a group ID which shall be given to each file. If a numeric group operand exists in the group database as a group name, the group ID number associated with that group name shall be used as the group ID.


what this means is that if you have "0" as a username, then the uid associated with that username will be used rather than the numeric uid 0

so add this to the end of your /etc/passwd:
 0:x:3456:3456::/:/bin/false
then run:
 touch foo
 chown 0 foo
 stat -c%u foo
notice how the output is uid 3456, not uid 0
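The precedence rule can be shown without touching the real /etc/passwd. This sketch uses a hypothetical two-entry in-memory table holding `root` and the `0:x:3456:...` entry from the example above. It compares the POSIX order (name first) with a numeric-first shortcut. All function and type names here are illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Stand-in for the user database, including a user literally NAMED "0". */
struct user { const char *name; uid_t uid; };

static const struct user userdb[] = {
    { "root", 0 },
    { "0",    3456 },
};

static const struct user *find_user(const char *name)
{
    for (size_t i = 0; i < sizeof userdb / sizeof userdb[0]; i++)
        if (strcmp(userdb[i].name, name) == 0)
            return &userdb[i];
    return NULL;
}

/* Accepts only a non-empty run of decimal digits that fits in uid_t. */
static int parse_uid(const char *s, uid_t *out)
{
    if (*s == '\0')
        return -1;
    for (const char *p = s; *p; p++)
        if (*p < '0' || *p > '9')
            return -1;
    errno = 0;
    unsigned long v = strtoul(s, NULL, 10);
    if (errno != 0 || (unsigned long)(uid_t)v != v)
        return -1;
    *out = (uid_t)v;
    return 0;
}

/* POSIX order: a matching user name wins, even if it looks numeric. */
int posix_owner(const char *spec, uid_t *out)
{
    assert(spec != NULL && out != NULL);
    const struct user *u = find_user(spec);
    if (u != NULL) {
        *out = u->uid;
        return 0;
    }
    return parse_uid(spec, out);
}

/* Shortcut order: anything all-digit is taken literally, no lookup. */
int numeric_first_owner(const char *spec, uid_t *out)
{
    assert(spec != NULL && out != NULL);
    if (parse_uid(spec, out) == 0)
        return 0;
    const struct user *u = find_user(spec);
    if (u == NULL)
        return -1;
    *out = u->uid;
    return 0;
}
```

With this table, `posix_owner("0", ...)` yields 3456 while `numeric_first_owner("0", ...)` yields 0. That is exactly the difference the spec pins down, and it is why the numeric-first order cannot simply become chown's default.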
Comment 17 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2006-08-05 10:48:21 UTC
Spanky: a LOT of other stuff in the system forbids numeric values as usernames.

# useradd -u 3456 -g 100 -s /bin/false 0
useradd: invalid user name '0'

(add it manually now instead)
# echo "0:x:3456:100:testcase:/tmp:/bin/false" >>/etc/passwd

(now show how getent is broken)
# getent passwd 0
root:x:0:0:root:/root:/bin/bash

If chown isn't fixed, the one alternative is to write a service that rotates the correct nsswitch.conf into place at the correct time, which isn't an easy task (uberlord tried a few variations on that idea, I know).

A different alternative would be to find a chown-like tool that can explicitly be told that its input is a numeric uid/gid and should not be looked up otherwise.
Comment 18 SpanKY gentoo-dev 2006-08-05 13:38:41 UTC
> a LOT of other stuff in the system forbids numeric values as usernames.

what's your point ?  chown has a spec that is accepted by everyone, it is certainly not our place to go changing that behavior

is said behavior stupid ?  certainly is imho, but it's in the spec, thus it will always retain that behavior until the POSIX/IEEE/whoever changes their mind

> A different alternative would be to find a chown-like tool that can explicitly
> be told that its input is a numeric uid/gid and should not be looked up
> otherwise.

what i was thinking of was asking the coreutils guys what they thought of a flag to chown/chgrp that explicitly forces numeric ids to not be looked up ... like a -n flag or something
Comment 19 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2006-08-05 14:52:56 UTC
+1 on the -n numeric flag. I'll even code it if they like the idea.
Comment 20 SpanKY gentoo-dev 2006-08-05 15:23:45 UTC
or change bootmisc to 'use net'
Comment 21 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2006-08-05 15:30:26 UTC
'use net' doesn't solve it for those with a local LDAP server, and it also conflicts with the runlevels, since bootmisc is in boot and net is in default.
Comment 22 SpanKY gentoo-dev 2006-08-05 17:36:23 UTC
fixed in svn by dropping the chown as it is just a sanity check

this will cause problems for people who run the `mkdir` as a non-root user, but then again in that case the `chown` would have failed anyway, as non-root users cannot chown to 0:0
Comment 23 Doug Goldstein (RETIRED) gentoo-dev 2006-08-08 09:04:02 UTC
Created attachment 93767 [details, diff]
coreutils-numeric.patch

Adds -n and --numeric to chown and chgrp. I tried to change as little code as possible.

I'm sure you won't like it Spanky.
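An opt-in switch like the one proposed in comments 18-23 keeps the POSIX behavior by default and skips the name lookup only when asked. The sketch below is not the attached patch. `parse_numeric_flag`, `resolve_owner` and the option table are hypothetical, and only the uid half is shown (a group operand would be handled the same way with `getgrnam()`). It uses the GNU `getopt_long()` interface.

```c
#include <assert.h>
#include <errno.h>
#include <getopt.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical option table: -n / --numeric. */
static const struct option long_opts[] = {
    { "numeric", no_argument, NULL, 'n' },
    { NULL, 0, NULL, 0 }
};

/* Scans argv for -n/--numeric. Returns 1 if present, 0 if absent,
   -1 on an unknown option. Afterwards optind indexes the first operand. */
int parse_numeric_flag(int argc, char *argv[])
{
    int numeric = 0, c;
    while ((c = getopt_long(argc, argv, "n", long_opts, NULL)) != -1) {
        if (c == 'n')
            numeric = 1;
        else
            return -1;
    }
    return numeric;
}

/* Default: POSIX name-first lookup. With numeric_only set, the spec
   must be all digits and getpwnam() is never called. */
int resolve_owner(const char *spec, int numeric_only, uid_t *out)
{
    assert(spec != NULL && out != NULL);
    if (!numeric_only) {
        struct passwd *pw = getpwnam(spec);
        if (pw != NULL) {
            *out = pw->pw_uid;
            return 0;
        }
    }
    if (*spec < '0' || *spec > '9')
        return -1;
    char *end;
    errno = 0;
    unsigned long v = strtoul(spec, &end, 10);
    if (*end != '\0' || errno != 0 || (unsigned long)(uid_t)v != v)
        return -1;
    *out = (uid_t)v;
    return 0;
}
```

With the flag set, `resolve_owner("0", 1, &uid)` returns uid 0 without calling `getpwnam()`, so nss_ldap is never contacted. Without the flag, the spec-mandated name-first order is unchanged.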