Bug#: 139411
Status: RESOLVED
Resolution: FIXED
Assigned To: Gentoo's Team for Core System packages <base-system@gentoo.org>
Reporter: Torsten Kurbad <torsten@tk-webart.de>

Filename Description Type Creator Created Size
coreutils-numeric.patch coreutils-numeric.patch patch Doug Goldstein 2006-08-08 09:04 0000 7.43 KB
Description:   Opened: 2006-07-06 05:00 0000
Hi there,

I have a reproducible bug that has already occurred on two of my x86 servers:

/etc/init.d/bootmisc

as found in

sys-apps/baselayout-1.12.1

hangs on boot in line 115, while doing

chown 0:0 /tmp/.{ICE,X11}-unix

It's possible to continue the boot process by pressing <ctrl>-c, but that
shouldn't be a permanent solution.
I just wonder, if this line is really necessary. The X11 directories are
freshly created on wipe as well as on clean, and during the boot process, this
is obviously done by root, thus by uid:gid 0:0.
Why then this chown anyway?

Looking forward to being enlightened
Torsten

------- Comment #1 From SpanKY 2006-07-06 17:00:03 0000 -------
if a simple chown is hanging, you've got bigger problems

the chown/chmod are sanity checks to make sure we have correct permissions,
just like the comment says above the code

i'd suggest you drop the redirect to /dev/null and see if you get any errors
... if not, try adding `strace` before the chown call and see wtf is going on

------- Comment #2 From Torsten Kurbad 2006-07-07 03:42:06 0000 -------
Thanks for the tip, strace indeed revealed the problem - and it's not a
trivial one:

We use an LDAP directory coupled with a kerberos V password database for
authentication of "normal" users against Linux. The nsswitch.conf thus looks as
follows:

passwd:      compat files ldap
shadow:      compat files ldap
group:       compat files ldap

In addition there are pam_ldap and pam_krb5 installed - and called by
/etc/pam.d/system-auth

strace shows that chown 0:0 looks for the LDAP entries, but since there are no
network interfaces up, it tries indefinitely to resolve the hostname of the
LDAP server.
A trivial solution seems to be to substitute chown 0:0 with
chown root:root

Don't ask me why chown reacts so allergically to numeric IDs.

Anyway, chown root:root seems pretty valid to me (a standard Linux system
without a user "root" wouldn't be able to run anything, not even speaking of
X11), so can it go to baselayout, please?

Torsten

------- Comment #3 From SpanKY 2006-07-07 15:01:14 0000 -------
we use 0:0 because of stupid *BSD systems that map the root group to gid 10 and
the wheel group to gid 0

seems odd that numeric id's are looked up in the database though ... is there a
case where this would make sense (i cant think of one) ?

------- Comment #4 From Torsten Kurbad 2006-07-09 03:48:12 0000 -------
Ouch, *BSD - yes I know the problem with groups. Have two FreeBSD servers
running myself.

I also don't know, why numeric IDs are looked up and named ones not. I guess,
there's something wrong with the way, nss_ldap handles numeric IDs. There was a
revision bump lately, maybe the error lies there.
Question is, whether it depends on one of the patches applied in the nss_ldap
ebuild, or whether the good folks at PADL software built the sh*t directly into
their software.
I will examine that further on monday.

------- Comment #5 From Torsten Kurbad 2006-07-11 02:33:58 0000 -------
I somehow narrowed the problem down.
The standard bootmisc works smoothly up to

sys-auth/nss_ldap-2.39-r1

The next version in portage is:

sys-auth/nss_ldap-2.49

And this one, as well as all following ones, tries to resolve the numeric IDs.
My knowledge of C is insufficient to see what leads to that change, so perhaps
someone else should have a look...

------- Comment #6 From SpanKY 2006-07-14 21:28:16 0000 -------
np, we have a dev who "loves" this package ;)

------- Comment #7 From Robin Johnson 2006-07-17 18:56:40 0000 -------
vapier: I'm looking at it, but could you also please look at coreutils for
chown? By default it does a getpwnam/getgrnam on the arguments that are passed
in. If  this fails (returning NULL instead of a pointer to a struct), they try
to convert the argument to a numeric value. I'm interested if this behavior has
always been this way?

The getpwnam/getgrnam would always go to your configured NSS source, so this
may be a repeat of the lookup delay issues:
Torsten: could you please try nss_ldap-250-r1 and see EXACTLY how long the
delays are? (250-r1 changes the timeout behavior of nss_ldap on purpose).
In 249, it shouldn't actually be a hang, but a very long timeout (nearly 5
minutes for each lookup).

------- Comment #8 From Torsten Kurbad 2006-07-20 04:13:30 0000 -------
Ok, boys, I measured the timeouts. They appear to be exactly 30 seconds with
250-r1. That's indeed not a very long time compared to BIOS posts,
SCSI-detection, etc... Anyway, there must be a way to keep nss_ldap from even
looking if there is none of the network devices up yet (which is definitely the
case during that early boot stage)

------- Comment #9 From Robin Johnson 2006-07-20 08:56:12 0000 -------
ok, if they are 30 seconds with 250-r1, then they were definitely significantly
longer on older versions.

spanky: in chown from coreutils, could the logic possibly be changed to see
that the passed in value is numeric instead of trying to look it up and only
after that fails converting to numeric? The file you'd need to change is
${S}/lib/userparse.c

Torsten: the problem is that you can't differentiate between the remote LDAP
server being totally down, and the local network being down. Both cases are the
same effective error returned to Linux.

------- Comment #10 From Robin Johnson 2006-08-03 11:29:47 0000 -------
*** Bug 142626 has been marked as a duplicate of this bug. ***

------- Comment #11 From Robin Johnson 2006-08-03 12:19:30 0000 -------
base-system: please read the summary below, and fix coreutils asap.

I was asked why this doesn't seem to behave. here's a short summary of what
happens:
1. user calls 'chown 0:0 foo'
2. chown splits this into two STRINGS, user="0", group="0"
3. chown (via some code in the lib/ portion of coreutils), does getpwnam("0"),
getgrnam("0").
4. this causes NSS to go and look for a user and group with a NAME of "0".
Notice not a number of zero, but a string name of "0".
5. NSS checks files for a user/group named "0". Finds nothing.
6. NSS checks ldap for a user/group named "0". LONG delay happens here because
the LDAP server (if local) is not yet started or (if remote) is not yet
accessible (networking isn't up).
7. chown code decides that if nothing was found so far, try to convert it to a
number. This succeeds, and the chown is actually done at this point.

#7 needs to move way up, to realize that the input is a numeric value, and not
a string, and should not be looked up as a name.
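
The reordering asked for here could be sketched in shell roughly as follows. This is only an illustration of the "check for a number first" idea, not the coreutils code: `is_numeric_id` and `describe_owner_lookup` are hypothetical helper names, and the sketch only reports which path would be taken instead of calling chown or touching NSS.

```shell
# Hypothetical helper: succeeds only if $1 is a non-empty string of decimal digits.
is_numeric_id() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Hypothetical helper: report how an owner operand would be handled
# if the numeric check ran before any name lookup.
describe_owner_lookup() {
    if is_numeric_id "$1"; then
        echo "numeric"   # would be used as an ID directly, no getpwnam/getgrnam
    else
        echo "name"      # would still go through the configured NSS sources
    fi
}

describe_owner_lookup 0
describe_owner_lookup root
```

With this ordering an operand such as "0" would never reach the ldap source, which is the delay described in step 6.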

------- Comment #13 From Jimmy.Jazz@gmx.net 2006-08-03 16:02:48 0000 -------
(In reply to comment #11)

I agree with your conclusion but upgrading nss_ldap didn't give any improvement
:(

As you asked me to upgrade to nss_ldap-250-r1 and time the exact delay until
the ldap request timed out, i'm able to confirm the delay for the lookup is ...
longer than 30 minutes. That is far longer than the 30 seconds expected.
Tired of waiting for a response that never came, i finally stopped the process. So,
chown never returned and you will certainly be disappointed by the
following result:

+ mkdir -p /tmp/.ICE-unix /tmp/.X11-unix
+ date
jeu août  3 21:10:54 MEST 2006
+ chown 0:0 /tmp/.ICE-unix /tmp/.X11-unix

Ctrl+c (30 minutes is really really time consuming :))

Moreover i didn't set the idle_timelimit in /etc/ldap.conf and simply left it at
its default value (certainly 3600 seconds).

Jj

------- Comment #14 From Jimmy.Jazz@gmx.net 2006-08-03 16:30:57 0000 -------
(In reply to comment #11)

It's late, it's time for me to go to bed. Tomorrow is another working day ;)

I forgot to add the timings you mentioned so i did it in another test but
without more success.

#cat /etc/ldap.conf
...
nss_reconnect_tries 4                   # number of times to double the sleep time
nss_reconnect_sleeptime 1               # initial sleep value
nss_reconnect_maxsleeptime 16   # max sleep value to cap at
nss_reconnect_maxconntries 2    # how many tries before sleeping
# This leads to a delay of 15 seconds (1+2+4+8=15)
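
The 15-second figure in that last comment can be reproduced with a small sketch. It assumes the semantics stated in the comments of the excerpt (the sleep starts at the initial value, doubles each time, and is capped at the maximum); `backoff_total` is a hypothetical helper written for this illustration, not something taken from nss_ldap.

```shell
# Hypothetical helper: total sleep time for a doubling backoff.
# $1 = number of sleeps, $2 = initial sleep (s), $3 = cap on a single sleep (s)
backoff_total() {
    tries=$1
    sleep_time=$2
    max_sleep=$3
    total=0
    i=0
    while [ "$i" -lt "$tries" ]; do
        total=$((total + sleep_time))
        # Double the sleep for the next round, but never exceed the cap.
        sleep_time=$((sleep_time * 2))
        if [ "$sleep_time" -gt "$max_sleep" ]; then
            sleep_time=$max_sleep
        fi
        i=$((i + 1))
    done
    echo "$total"
}

backoff_total 4 1 16   # prints 15 (1+2+4+8)
```

This only adds up the sleeps; time spent in the connection attempts themselves is not counted.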

After replacing chown 0:0 with chown root:root, bootmisc doesn't lock anymore.

Definitely you were right.

Good night

Jj

------- Comment #15 From Robin Johnson 2006-08-04 07:47:54 0000 -------
*** Bug 142790 has been marked as a duplicate of this bug. ***

------- Comment #16 From SpanKY 2006-08-04 08:52:56 0000 -------
sorry, but this is by design and is required by spec:
http://www.opengroup.org/onlinepubs/009695399/utilities/chown.html

OPERANDS

    The following operands shall be supported:

owner[:group]
    A user ID and optional group ID to be assigned to file. The owner portion
of this operand shall be a user name from the user database or a numeric user
ID. Either specifies a user ID which shall be given to each file named by one
of the file operands. If a numeric owner operand exists in the user database as
a user name, the user ID number associated with that user name shall be used as
the user ID. Similarly, if the group portion of this operand is present, it
shall be a group name from the group database or a numeric group ID. Either
specifies a group ID which shall be given to each file. If a numeric group
operand exists in the group database as a group name, the group ID number
associated with that group name shall be used as the group ID.


what this means is that if you have "0" as a username, then the uid associated
with that username will be used rather than the numeric uid 0

so add this to the end of your /etc/passwd:
 0:x:3456:3456::/:/bin/false
then run:
 touch foo
 chown 0 foo
 stat -c%u foo
notice how the output is uid 3456, not uid 0
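
The same name-first rule can be tried without editing the real /etc/passwd. The sketch below resolves an operand against a throwaway passwd-format file; `resolve_owner` is a hypothetical helper that only mimics the order described in the spec text (name match first, numeric fallback second) and is not how chown itself is implemented.

```shell
# Hypothetical helper: resolve an owner operand against a passwd-format file.
# $1 = passwd-format file, $2 = owner operand
resolve_owner() {
    awk -F: -v op="$2" '
        # Force string comparison so "0" only matches the name "0".
        ($1 "") == (op "") { print $3; found = 1; exit }
        END {
            if (!found) {
                if (op ~ /^[0-9]+$/) print op   # no such name: use the number
                else exit 1                     # neither a known name nor a number
            }
        }
    ' "$1"
}

tmp=$(mktemp)
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' '0:x:3456:3456::/:/bin/false' > "$tmp"
resolve_owner "$tmp" 0      # prints 3456: the user named "0" wins
resolve_owner "$tmp" root   # prints 0
resolve_owner "$tmp" 42     # prints 42: no such name, numeric fallback
rm -f "$tmp"
```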

------- Comment #17 From Robin Johnson 2006-08-05 10:48:21 0000 -------
Spanky: a LOT of other stuff in the system forbids numeric values as usernames.

# useradd -u 3456 -g 100 -s /bin/false 0
useradd: invalid user name '0'

(add it manually now instead)
# echo "0:x:3456:100:testcase:/tmp:/bin/false" >>/etc/passwd

(now show how getent is broken)
# getent passwd 0
root:x:0:0:root:/root:/bin/bash

The one alternative to not fixing this is to write a service that rotates the
correct nsswitch into place at the correct time, which isn't an easy task.
(uberlord tried a few variation ideas on it i know).

A different alternative would be to find a chown-like tool that can explicitly
be told that its input is a numeric uid/gid and should not be looked up
otherwise.

------- Comment #18 From SpanKY 2006-08-05 13:38:41 0000 -------
> a LOT of other stuff in the system forbids numeric values as usernames.

what's your point ?  chown has a spec that is accepted by everyone, it is
certainly not our place to go changing that behavior

is said behavior stupid ?  certainly is imho, but it's in the spec, thus it
will always retain that behavior until the POSIX/IEEE/whoever changes their
mind

> A different alternative would be to find a chown-like tool that can explicitly
> be told that its input is a numeric uid/gid and should not be looked up
> otherwise.

what i was thinking of was asking the coreutils guys what they thought of a
flag to chown/chgrp that explicitly forces numeric ids to not be looked up ...
like a -n flag or something

------- Comment #19 From Robin Johnson 2006-08-05 14:52:56 0000 -------
+1 on the -n numeric flag. I'll even code it if they like the idea.

------- Comment #20 From SpanKY 2006-08-05 15:23:45 0000 -------
or change bootmisc to 'use net'

------- Comment #21 From Robin Johnson 2006-08-05 15:30:26 0000 -------
'use net' does solve it for those with a local LDAP server, and is also a
conflict with runlevels, since bootmisc is in boot, and net is in default.

------- Comment #22 From SpanKY 2006-08-05 17:36:23 0000 -------
fixed in svn by dropping the chown as it is just a sanity check

this will cause problems for people who run the `mkdir` as a non root user, but
then again in that case the `chown` would have failed anyways as non-root users
cannot chown to 0:0

------- Comment #23 From Doug Goldstein 2006-08-08 09:04:02 0000 -------
Created an attachment (id=93767) [details]
coreutils-numeric.patch

Adds -n and --numeric to chown and chgrp. I tried to change as little code as
possible.

I'm sure you won't like it Spanky.
