Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 171079 - sys-libs/glibc-2.5 - string returned by dirname(argument) is modified if the argument is modified
Summary: sys-libs/glibc-2.5 - string returned by dirname(argument) is modified if the ...
Status: RESOLVED INVALID
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system (show other bugs)
Hardware: All Linux
: High normal (vote)
Assignee: Gentoo Toolchain Maintainers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-03-15 20:33 UTC by Jorge Peixoto de Morais Neto
Modified: 2007-03-19 23:51 UTC (History)
1 user (show)

See Also:
Package list:
Runtime testing required: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jorge Peixoto de Morais Neto 2007-03-15 20:33:49 UTC
#include <libgen.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(){
#define testpath "/usr/bin/dirname"
  char *path=malloc(sizeof(testpath));
  strcpy(path,testpath);
  char *pathdirname=dirname(path);
  puts(pathdirname);
  free(path);
  puts(pathdirname);
  return EXIT_SUCCESS;
}

The above program should print two identical strings in the screen. But it prints only one. freeing the string path should not modify the string pathdirname. There is nothing on the corresponding manual page (accessible via man 3 dirname) that mentions this. Either the function dirname() must be modified, or its documentation. 

Running under gdb, I can see that path and pathdirname point to the same string. This explains the problem. It seems that the dirname function modifies its argument and returns a pointer to its argument. Let me reiterate that this is not mentioned in the documentation.

Reproducible: Always

Steps to Reproduce:
1.Compile the program above with gcc -Wall -Wextra -o dirname dirname.c
2. run the produced executable
3. Notice that it prints the dirname /usr/bin only once, not twice

Actual Results:  
The program prints the dirname /usr/bin only once, not twice

Expected Results:  
The program should print /usr/bin twice. 

Portage 2.1.2.2 (default-linux/x86/2006.1/desktop, gcc-4.1.1, glibc-2.5-r0, 2.6.19-gentoo-r5 i686)
=================================================================
System uname: 2.6.19-gentoo-r5 i686 AMD Athlon(tm) XP 2600+
Gentoo Base System release 1.12.9
Timestamp of tree: Sat, 10 Mar 2007 21:30:08 +0000
dev-java/java-config: 1.3.7, 2.0.31
dev-lang/python:     2.4.3-r4
dev-python/pycrypto: 2.0.1-r5
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.61
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils:  2.16.1-r3
sys-devel/gcc-config: 1.3.14
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.17-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=athlon-xp -pipe -fomit-frame-pointer -fgcse-after-reload -fweb "
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/X11/xkb /usr/share/config"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/gconf /etc/java-config/vms/ /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-O2 -march=athlon-xp -pipe -fomit-frame-pointer -fgcse-after-reload -fweb "
DISTDIR="/usr/portage/distfiles"
EMERGE_DEFAULT_OPTS="--nospinner"
FEATURES="autoconfig distlocks metadata-transfer parallel-fetch sandbox sfperms strict"
GENTOO_MIRRORS="http://gentoo.localhost.net.ar/ http://www.las.ic.unicamp.br/pub/gentoo/ http://gentoo.osuosl.org/ http://mirrors.usu.edu/mirrors/gentoo/ http://distro.ibiblio.org/pub/linux/distributions/gentoo/"
LANG="en_US"
LC_ALL="en_US"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/portage/local/layman/initng /usr/portage/local/layman/enlightenment /usr/local/jorge-portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="3dnow 3dnowext 7zip X aac aalib alsa aotuv bash-completion bzip2 cairo cdr cracklib dbus divx dri emacs emacs-w3 encode fam firefox fortran gcj gcl gif glibc-omitfp gmedia gnuplot gpm gstreamer gtk2 gzip-el iconv ipv6 jpeg kdeenablefinal kdehiddenvisibility lesstif libcaca lirc mad matroska midi mikmod mmx mmxext mng moznopango mp3 mpeg ncurses nptl nptlonly nsplugin nvidia offensive ogg opengl pam pcre pdf png ppds pppd psyco python quicktime readline realmedia rtc sdl session sox spell sse ssl svg symlink tcltk tcpd theora tk toolkit-scroll-bars truetype truetype-fonts type1-fonts unicode usb v4l2 vorbis wmp x86 xml xorg xv zlib" ALSA_CARDS="cmipci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" ELIBC="glibc" INITNG_PLUGINS="also bash_launcher chdir chroot conflict cpout critical cron daemon dev dllaunch envparser find fstat history idleprobe initctl interactive iparser last limit logfile netprobe ngc4 pause pidfile reload renice rlparser simple_launcher stcmd stdout suid syncron syslog unneeded provide" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIRC_DEVICES="inputlirc" USERLAND="GNU" VIDEO_CARDS="nv"
Unset:  CTARGET, INSTALL_MASK, LDFLAGS, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 1 Kevin F. Quinn (RETIRED) gentoo-dev 2007-03-15 22:31:00 UTC
man 3 dirname contains the paragraph:

       Both dirname() and basename() may  modify  the  contents  of  path,  so
       copies should be passed to these functions.  Furthermore, dirname() and
       basename() may return pointers to statically allocated memory which may
       be overwritten by subsequent calls.

Which seems clear enough.  It means you cannot assume what you are assuming about dirname's behaviour.  The manual explicitly says you should make copies of strings passed to dirname(); you must do this if you wish to retain the original string.  Here's the code example from 'man 3 dirname' showing how to do this:

EXAMPLE
              char *dirc, *basec, *bname, *dname;
              char *path = "/etc/passwd";

              dirc = strdup(path);
              basec = strdup(path);
              dname = dirname(dirc);
              bname = basename(basec);
              printf("dirname=%s, basename=%s\n", dname, bname);
Comment 2 Jorge Peixoto de Morais Neto 2007-03-16 00:30:03 UTC
The manual says that the function may modify its argument so that I should pass a copy to it. 
Which I did. 

The manual does not mention that the function returns a pointer to the very argument, so that a modification of the argument results in a modification of the result.

I'm reopening the bug. I don't know if this is the appropriate thing to do. If it is not, tell me. Of course, this is the last time I'll reopen the bug, so if you want it to stay closed, just close it once more. 
Comment 3 Jorge Peixoto de Morais Neto 2007-03-16 00:43:41 UTC
Let me explain why do I think the documentation is bad. 

After reading the documentation, it seemed to me that a good way to work with dirname() would be

char *pathcopy=strdup(path);
if (pathcopy==NULL){
   /*handle_error*/
}else{
char * pathdirname=dirname(pathcopy);
free(pathcopy);
/*do something with pathdirname*/
}

I tried something similar to that, and it did not work, because once pathcopy is freed, pathdirname becomes bogus. This is unpredictable and surprising for someone that has read the documentation. I provided to dirname() a copy of path, just as the documentation asked, but the documentation never told me that I could not free the copy and go on working with the returned result. 


   
Comment 4 Kevin F. Quinn (RETIRED) gentoo-dev 2007-03-16 07:19:59 UTC
You did not provide dirname() with a copy of the path.  All you did was copy the pointer to it.  Copying a pointer does not copy the data - that is elementary C (which is itself a very elementary programming language, you have to do almost everything explicitly in C).  If you look at the example in the man page, it illustrates clearly how to copy the argument using strdup().

Believe me, there's no bug here; everything is behaving as specified, and the manual page clearly describes the behaviour of dirname(), and how to use it - even so far as to provide example code to make things perfectly clear.
Comment 5 Jorge Peixoto de Morais Neto 2007-03-16 10:26:34 UTC
In my first example, I allocated new memory using malloc() and copied the string using strcpy(). This copies the data, not the pointer.

In the second example, I used strdup. strdup copies the data, not the pointer.
Comment 6 Kevin F. Quinn (RETIRED) gentoo-dev 2007-03-16 11:16:15 UTC
Well, you copied testpath to path - that's fine, but you never used testpath afterwards anyway.  You used pathdirname, which you set to a copy of the pointer 'path' (via the result of dirname()).

You seem to be expecting pathdirname to get a copy of the path (including contents), which you won't get.  dirname() certainly isn't going to allocate memory, duplicate the path contents, and return a pointer to the copy.  It's left to you as the caller to manage whether a copy is needed or not.

Instead of:

  char *pathdirname=dirname(pathcopy);

you could do:

  char *pathdirname=strdup(dirname(path)));

after which path is modified (by dirname), pathdirname contains a copy of the result (not just a pointer to the modified path).
Comment 7 Jorge Peixoto de Morais Neto 2007-03-16 12:14:47 UTC
>Well, you copied testpath to path - that's fine, but you never used testpath
>afterwards anyway.
I copied testpath to path because if I called dirname directly on testpath I would probably get a segfault. dirname() should not be called on a string literal, because dirname() may atempt to modify its argument.
>You used pathdirname, which you set to a copy of the
>pointer 'path' (via the result of dirname()).
I'm aware of this. I found this out after running with gdb. It is not mentioned in the man page. 

>You seem to be expecting pathdirname to get a copy of the path (including
>contents)
Yes.
>, which you won't get.  dirname() certainly isn't going to allocate
>memory, duplicate the path contents, and return a pointer to the copy.
It seemed to me that dirname() could do one of three things:
1)allocate memory with malloc, put the result into it, and return a pointer to it.
2)put the result into a static buffer, such as a local array declared with the static keyword, or a global array and return a pointer to this buffer.
3)return a pointer to its argument.
The man page said that the function may return a pointer to statically allocated memory which may be overwritten by subsequent calls. This made me think that possibility 2) was happening. To me, possibility 3) seems quite surprising, and should be mentioned on the man page. In fact, any special behavior of any function should be mentioned in its man page. The programmer should not be required to make tests and/or read the function source code to understand its behavior. And in my opinion, the fact that the returned result is modified if the argument is modified is special behavior. 

>  char *pathdirname=strdup(dirname(path)));

Yes, this is what I am doing now in my code.

Anyway, if you don't see it like me, I'll just ignore this issue. 

Also, I took the liberty of emailing my suggestion to the email I found on the page http://www.win.tue.nl/~aeb/linux/man/ (which I found on the man-pages ebuild). 

Thank you for your attention.
Comment 8 SpanKY gentoo-dev 2007-03-16 23:01:52 UTC
tbh, it doesnt matter if you dislike the behavior of dirname() ... the things you dislike are allowed by spec which means your code needs to account for these things, not the specific implementation:
http://www.opengroup.org/onlinepubs/009695399/functions/dirname.html

while yes, i too agree that the spec sucks for this function, Gentoo is not the forum for such things
Comment 9 Kevin F. Quinn (RETIRED) gentoo-dev 2007-03-19 23:51:15 UTC
(In reply to comment #7)
> It seemed to me that dirname() could do one of three things:
> 1)allocate memory with malloc, put the result into it, and return a pointer
> to it.

This is not something the spec allows for - otherwise it would tell you that you need to free the returned pointer once you are finished with it (or would explicitly say it dynamically allocates the returned memory).  Since it doesn't tell you that, and even says the pointer may reference a static buffer, you cannot free() it.  The logical result is that dirname() therefore cannot malloc memory for the result.

> 2)put the result into a static buffer, such as a local array declared with
> the static keyword, or a global array and return a pointer to this buffer.

This is indeed a possibility.  It's unlikely in a modern implementation, as it implies there is a low length limit to the paths that dirname() can handle (since the static buffer must have a pre-defined size and must be reserved solely for the use of dirname, so is unlikely to be large).

> 3)return a pointer to its argument.

Apart from the static buffer method, this is the only way it can be implemented to conform to the spec (because dynamically allocating memory is not permitted as described above).  My guess is that the authors assumed (correctly or not) that a reader would realise this is possible and has to be catered for by the caller.

Other functions that exhibit the same behaviour off the top of my head include strtok(), strsep(), strfry().  The severity of the warnings in each case varies - but like dirname() they all explicitly say the parameter may be modified (and hence can segfault if passed a string literal).

> Anyway, if you don't see it like me, I'll just ignore this issue. 
> 
> Also, I took the liberty of emailing my suggestion to the email I found on the
> page http://www.win.tue.nl/~aeb/linux/man/ (which I found on the man-pages
> ebuild). 

That's probably best in the end.  We don't maintain the man pages, just integrate them into the distribution (bugging here in the first instance is always best, in case an issue is a result of something we've modified or something else we do).

It's worth noting that the specification for many of these functions is simply an explicit documentation of existing behaviour, written to allow for the existing implementations on many Unix systems, some of which date back to the 1960's

BTW your bug report is appreciated, even if we decide to do nothing as a result.