(1) in `__syscall_egetcwd()', line #151: `st' is used uninitialized... should be initialized with "/". (2) in `recurser()', line #131 `path_buf' is overwritten with "/.." ... but should have "/.." appended.
Something like this: ----- Index: getcwd.c =================================================================== --- getcwd.c (revision 125) +++ getcwd.c (working copy) @@ -128,7 +128,7 @@ if (strlen(path_buf) + 4 > path_size) { goto oops; } - snprintf(path_buf, 4, "/.."); + snprintf(path_buf + strlen(path_buf), 4, "/.."); if (recurser(path_buf, path_size, root_dev, root_ino) == 0) return 0; @@ -147,6 +147,8 @@ size_t olderrno; olderrno = errno; + if (lstat("/", &st) < 0) + return -1; len = -1; cwd = recurser(buf, size, st.st_dev, st.st_ino); if (cwd) { @@ -175,6 +177,7 @@ if (path == NULL) return NULL; } + snprintf(buf, 2, "."); ret = __syscall_egetcwd(path, alloc_size); if (ret >= 0) { @@ -182,6 +185,7 @@ buf = realloc(path, ret); if (buf == NULL) buf = path; + printf("getcwd (%d) = '%s'\n", strlen(buf), buf); return buf; } if (buf == NULL) @@ -197,11 +201,12 @@ __set_errno(0); tmpbuf = getcwd(buf, size); - if (tmpbuf) { + if (tmpbuf) lstat(buf, &st); - } else { + else if (ENAMETOOLONG == errno) + return __egetcwd(buf, SB_PATH_MAX); + else return tmpbuf; - } if (errno) { /* If lstat() failed with eerror = ENOENT, then its ----- This btw still only have the 4096 max path it can return ...
Anyhow, besides the path size limit, does the first 3 hunks look like the fixes you suggested?
For standard usage, yes. Problem is still PATH_MAX... at the point the test in bug #21766 fails, `cwd_len' is somewhere between (PATH_MAX + 1) and (PATH_MAX + 9)... The problem with `egetcwd()' in sandbox 1.2.10 is it tries to stat() every directory component relative to "/" on its way back... in the outermost recursion it's slightly above PATH_MAX... and fails... The generic one from glibc, on the other hand, uses paths relative to ".", basically a gigantic chain of "../" + currently searched directory name. That means, unless the average component length is less than two, it can handle more than PATH_MAX... in the scenario with `confdir3' (8 bytes + '/'), limit is 12267 (give or take a few, depending on how long the parts before `/confdir3/...' are, but definetely plenty enough). (unfortunately, there's a small error in getcwd.c that looks like is sitting there forever... when it runs out of "../.."s, assuming `dots[]' is "../../../../..", after extending it becomes "../../../../..../../../../.." (note missing slash)... limiting depth to 76. Looking at older sandbox versions, `dots[]' is a bit longer there, but is it enough?)
Created attachment 63103 [details, diff] getcwd_glibc.diff
Created attachment 63103 [details, diff] getcwd_glibc.diff Here's a patch for glibc getcwd.c that removes said limit.
Ok, here is the thing ... I have tried with longer dot list ... and it always fails like you said around 12 thousand. The problem however is that at least on amd64 (can't remember with my p4) the limit is not long enough (need about 13 thousand or 14 ...). Anyhow, somewhere getting too engrossed in finding a generic version, I forgot what the whole issue was about. The issue really was that on some (still an issue?) 2.4 kernels, the getcwd syscall fails before it even gets to PATH_MAX, but do not return an error. The returned cwd is however not valid .. thus the lstat I do. So basically that test is just to get the getcwd() issue on 2.4 resolved, and not to get cwd to handle that long paths. If we anyhow do it differently than normal, coreutils/whatever will any expect something not valid. It would help if we could handle it, but currently any implementation is just _too_ slow.
Sorry, I'm still not convinced. Isn't PATH_MAX 4096 on linux, always? I checked on my ia64 box (64K page size) and amd64 (4K page size, sorry only available option) but PATH_MAX is always 4096. I even compiled old 2.4 kernels on both - same result - coreutils test stops at 4100. I still think the problem is in sandbox.
[Chris... I thought you said you studied bug #21766...] Even though coreutils is only marginally related, here's a short summary, just this once... Once upon a time there used to be kernels whose `getcwd()' syscall would return a truncated path (elements missing at the beginning) if invoked from a path longer than PATH_MAX (or rather, PAGE_SIZE) without setting `errno'. All kernels from 2.4.21 on set errno=ENAMETOOLONG in this case (note that we're talking about the buffer passed to `__d_path()' here, which is always PAGE_SIZE big, and how paths deeper than that are handled. The `getcwd()' syscall itself fails, and always did, with errno=ERANGE in case it is called with an insufficiently sized buffer, /if/ pathlen itself is still within limits). Now, here's what coreutils does during `configure' (comments added by me) /* [snip] */ fail=0; cwd_len=strlen(cwd); while (1) { /* [snip] */ /* cwd_len internally keeps track how long cwd is... */ cwd_len += 1 + strlen (DIR_NAME); /* create and change to DIR_NAME ("confdir3") */ if (mkdir (DIR_NAME, 0700) < 0 || chdir (DIR_NAME) < 0) { /* failing to do so means the whole test failed... bad. (*1) */ fail = 1; break; } if ((c = getcwd (buf, PATH_MAX)) == NULL) { /* getcwd eventually returns NULL if path gets too long... good. */ /* (although it really should check errno, like the original * comment suggests...) */ break; } if ((len = strlen (c)) != cwd_len) { /* getcwd returns non-NULL, but result is too short... means we're * running on linux 2.4.20 or older... bad. */ fail = 1; break; } ++n_chdirs; if (PATH_MAX < len) /* * non-NULL path and len > PATH_MAX means getcwd can actually * return longer paths... good. * * (actually that can never happen because getcwd() is called with * a buffer that is only PATH_MAX big) */ break; } /* * The real problem in #21766 comes next, and is triggered regardless * whether `getcwd()' is good or bad... we're still sitting in a * path that is (PATH_MAX+n) long (n=1..9), but in case the mkdir() * succeeded and chdir() failed above (*1), we're trying to unlink() * DIR_NAME, just to make sure... */ unlink (DIR_NAME); /* [snip] */ Since sandbox hooks `unlink()' and its argument is relative, it tries to calculate rp = `getcwd` + "/" + "confdir3". First case is obvious, in the olden days when `erealpath()' was using the kernel `getcwd', and `getcwd' returns a truncated path rp is not a place where sandbox allows writing, hence the access violation. On arches where PAGE_SIZE is bigger than 4K, errno is ERANGE and rp became "<empty string>" + "/" + "confdir3", with good and bad kernels (actually I've seen that happen on x86 and good kernels too, but never got around to investigate). The easiest hack (what sandbox currently does) is provide a `getcwd()' that can return paths slightly longer than PATH_MAX (+9 should always be sufficient, see above). I suggested the one from glibc a while back, now it's uclibc's (differenly limited and a bit slower, but since there's no need to do anything with the current directory, only its parent, it would still work as long as the parent fits into PATH_MAX). However... I absolutely agree with Martin that in the long run the coreutils test should be modified (may not be accepted upstream because it's sandbox specific, and thus means more work for Gentoo maintainers, but personally I like clean solutions...) My suggestion: split if (mkdir (DIR_NAME, 0700) < 0 || chdir (DIR_NAME) < 0) into two parts and `rmdir(DIRNAME)' if `chdir()' fails, then it's safe to remove the `unlink()' call that is causing all the trouble. Something like if (mkdir (DIR_NAME, 0700) < 0) { /* fail */ } if( chdir (DIR_NAME) < 0) { rmdir (DIR_NAME); /* fail */ } and get rid of the unlink call.
Ok, I wanted to take a shot at explaining, but seems like Barton did nicely. Anyhow, just for the record, the getcwd test goes over 12000: $ ./getcwd-test path size = 12312
Created attachment 63296 [details] getcwd-test.c Source for above test (basically the getcwd test for coreutils).
That is all nice and dandy, but coreutils doesn't get that deep, ever. Bartron, could you please settle this for us one last time?
That depends a bit what problem you're trying to argue... If it's `getcwd()' can't deal with such huge paths, yes, you have a point. If it's whether a configure test can end up being stuck in there... well, comment #9 has an example where exactly that happens... Unlike the test in coreutils 5.2.1, this test just goes on making directories like mad, with 3 checkpoints and only 2 calls to getcwd() along the way... (1) first iteration when `cwd_len' becomes >= PATH_MAX... - fails if c==NULL and ENOENT - fails if c!=NULL or (c==NULL and errno is neither ERANGE nor ENAMETOOLONG) (2) first iteration when `cwd_len' becomes >= dotdot_max + initial_cwd_len - fails if c==NULL, but not if ENAMETOOLONG - fails if c!=NULL, but unexpected length of return value. (3) next iteration from (2), quits loop with fail=0. At (1) I'm getting c=NULL, errno=ENAMETOOLONG, cwd_len=4098... passed. At (2), c=NULL, errno=ENAMETOOLONG, cwd_len=12333... passed. It breaks out at (3), with cwd_len=12342... test successful, getcwd() called only two times (both returned NULL), and here the test is hanging in a working directory that is 12342 bytes long... So I'd guess you're both right... (a) `getcwd()' usually can't deal with a path that long (and if it did, comment #9 would fail at point(1) above), and (b) the attachment in comment #9 is an example of a configure test that is not really sandbox friendly.
(In reply to comment #10) > That is all nice and dandy, but coreutils doesn't get that deep, ever. Bartron, > could you please settle this for us one last time? > Yes, you are right (besides the configure test) ... but what exactly do you expect by from argueing about this? I am probably missing something, but I do not really see what you think should happen?
My problem is, we have an ebuild doing some acrobatics with huge paths in its configure stage (not easily rewritten) just like coreutils. Until sandbox 1.2.9, it just worked. Heavy access violations in 1.2.9 and 1.2.10 (sandbox translates /very/huge/path/to/filename to /filename). With sandbox 1.2.11 (unstable) it seems to work again but it prints "LONG PATH" warnings (in yellow on white) and proceeds with noncanonical arguments (doesn't that mean no sandbox protection at all?) The old variant did not have these problems and our tests have shown it is also five times faster (we measured how many calls to getcwd in 5 secs with old version, then did the same number of calls with new version - 25.3 secs). Why use inferior code?
Private ebuild ? I guess I do not mind changing it back besides the fact that its ugly code and more difficult to port :-) Bartron .. you dont have something laying around that is a bit faster ? ;) Or if not, what was the suggested fix for the old code again ?
PS: are you guys using an older kernel? As the generic version should only be called if the getcwd() call do not fail, but return an invalid path (as Barton explained in comment #7 ) .... Also, sure, we do not 'protect' in that case, but if you actually use the generic code for the test example above, it takes _minutes_ and not just a few seconds longer. Then for the warning .. it should only be one per version .. how do you create the dirs, and why?
A patch that allows a maximum of 1365 (= (PATH_MAX-1) / 3) directories is here http://bugs.gentoo.org/attachment.cgi?id=63296 (or at least, close to that, or assuming the root entry is no longer than 2, next one is no longer than 5, and so forth...) The problem I see is, this is not the only restriction. 1. The test in comment #9 goes even further than that (by at least two)... I'd guess it's not impossible to come up with an algorithm that can handle that too (once a limit is reached, chdir to farthest dir within reach and continue from there, with lots of precautions to restore current working dir under all sorts of error conditions)... but it really does get slow if it's called too often. (tar-1.15 test would take almost 2 minutes even on moderately fast hardware) 2. Sandbox internally uses fixed size buffers of `SB_PATH_MAX' (=PATH_MAX*2) in size at times... what should the maximum be here? In the example above, max depth 1365, avg. dir name length 8, the theoretical maximum path length is already PATH_MAX*3. With an avg. dir name length of 17, it would be PATH_MAX*6, and so on.
Err... wrong URL, should be http://bugs.gentoo.org/attachment.cgi?id=63103 (sorry about that)
Thanks. I still would have liked to know what exactly they are doing, as the getcwd() arobatics we do in sandbox have nothing really to do with the getcwd() call the program does (or any other call for that matter), except if we restrict the app when we should not have.
Marking as reminder, as already pointed out, the one bug was fixed, the code is only used on some older 2.4 kernels, and except if we can really get something speedy, its not worth using a generic implementation if the regular getcwd() fails with ENAMETOOLONG.