The current stable dhcpcd makes VMware 5.5 and 6 sad, with an error about something not being implemented, so it may in fact be a VMware bug. But since dhcpcd-3.0.19 fixed the problem here, it may be a nice idea to stabilize at least that version.
Versions after 3.0.16-r1, whilst proving fairly stable, did have some flaws. I would much rather wait another month and get dhcpcd-3.1.3 stable, especially since I punted 3.0.19 from the tree yesterday, before this bug was filed.
I'm fine with 3.1.3 too. It seems like everything above 3.0.16-r1 fixes the problem, so I'm adjusting the summary.
Great! Target date for stable is 01/09/2007 as that's a whole month as required by our guidelines.
Please restore 3.0.19 to the tree or provide a location where I may go to download it -- I inadvertently deleted my copy, and I do in fact need it. At my client site, I found that version is the only one I can get to actually work. Weird, as 3.1.3 works fine for me in every other location I've visited.

Here's the scenario for failure -- and perhaps reason to avoid calling 3.1.3 stable, hence my posting of this to this "bug." This run-time failure only occurs if I enable the following flag in /etc/conf.d/net:

dhcpcd_eth0="-t 60"

Otherwise, I never get an IP address at all. With the flag, all I get are stack dumps, like this one:

# /etc/init.d/net.eth0 restart
* Caching service dependencies ...
* Can't find service 'net.wlan0' needed by 'hostapd'; continuing... [ ok ]
* Unmounting network filesystems ... [ ok ]
* Stopping ntpd ... [ ok ]
* samba -> stop: smbd ... [ ok ]
* samba -> stop: nmbd ... [ ok ]
* Stopping Cisco VPN Client ... [ ok ]
* Stopping eth0
* Bringing down eth0
* Stopping dhcpcd on eth0 ... [ ok ]
* Shutting down eth0 ... [ ok ]
* Starting eth0
* Bringing up eth0
* dhcp
* Running dhcpcd ...
*** glibc detected *** /sbin/dhcpcd: malloc(): memory corruption (fast): 0x08058178 ***
======= Backtrace: =========
/lib/libc.so.6[0xb7e0d2b6]
/lib/libc.so.6[0xb7e0fa00]
/lib/libc.so.6(malloc+0x90)[0xb7e10ae0]
/sbin/dhcpcd[0x804b861]
/sbin/dhcpcd[0x8049550]
/sbin/dhcpcd[0x80499b6]
/sbin/dhcpcd[0x804b1b5]
/sbin/dhcpcd[0x804f289]
/lib/libc.so.6(__libc_start_main+0xe0)[0xb7dbd9e0]
/sbin/dhcpcd[0x80494c1]
======= Memory map: ========
08048000-08055000 r-xp 00000000 08:03 6030144 /sbin/dhcpcd
08055000-08056000 rw-p 0000d000 08:03 6030144 /sbin/dhcpcd
08056000-08077000 rw-p 08056000 00:00 0 [heap]
b7c00000-b7c21000 rw-p b7c00000 00:00 0
b7c21000-b7d00000 ---p b7c21000 00:00 0
b7d9c000-b7da6000 r-xp 00000000 08:03 5638476 /usr/lib/gcc/i686-pc-linux-gnu/4.2.0/libgcc_s.so.1
b7da6000-b7da7000 rw-p 00009000 08:03 5638476 /usr/lib/gcc/i686-pc-linux-gnu/4.2.0/libgcc_s.so.1
b7da7000-b7da8000 rw-p b7da7000 00:00 0
b7da8000-b7ed1000 r-xp 00000000 08:03 6082119 /lib/libc-2.6.so
b7ed1000-b7ed2000 r--p 00128000 08:03 6082119 /lib/libc-2.6.so
b7ed2000-b7ed4000 rw-p 00129000 08:03 6082119 /lib/libc-2.6.so
b7ed4000-b7ed8000 rw-p b7ed4000 00:00 0
b7efe000-b7f18000 r-xp 00000000 08:03 6082112 /lib/ld-2.6.so
b7f18000-b7f1a000 rw-p 00019000 08:03 6082112 /lib/ld-2.6.so
bf9df000-bf9f4000 rw-p bf9df000 00:00 0 [stack]
ffffe000-fffff000 r-xp 00000000 00:00 0 [vdso]
/lib/rcscripts/net/dhcpcd.sh: line 95: 17795 Aborted /sbin/dhcpcd -h "WDAWSONLT" -t 60 eth0 [ !! ]
* Trying fallback configuration
* apipa
* Searching for free addresses in 169.254.0.0/16
* 169.254.136.191/16 [ ok ]
* Mounting network filesystems ...
17905: Connection to grant failed
SMB connection failed
17906: Connection to harrison failed
SMB connection failed
* Could not mount all network filesystems! [ !! ]
* Starting ntpd ... [ ok ]
* samba -> start: smbd ... [ ok ]
* samba -> start: nmbd ... [ ok ]
* Starting Cisco VPN Client ... [ ok ]

Thank you for your consideration of this issue.
If you can get a backtrace of that, and use the -d option for more info, I'll restore 3.0.19.
Instructions on getting a backtrace: http://www.gentoo.org/proj/en/qa/backtraces.xml
Thanks! I'll see about getting you a backtrace a bit later today. As for restoring 3.0.19, it turns out I found a workaround. The comments in bug #187753 made sense for this environment, as this network is very ARP-heavy (storm would be an understatement). I tried the patch given in bug #187753, but that only caused dhcpcd to hang forever. However, modifying my /etc/conf.d/net dhcpcd options to include -A got me going. Now I get my IP address rather quickly and so I'm able to work in this network without need of 3.0.19. As this network is the only environment where I have this stability issue, I'm OK.
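A minimal sketch of that workaround as it might appear in /etc/conf.d/net. The comment above only says that -A was added to the dhcpcd options, so keeping the earlier -t 60 timeout alongside it is an assumption, as is the exact option string:

```shell
# /etc/conf.d/net (fragment)
# -t 60 : the 60-second timeout from the earlier report (assumed to be kept)
# -A    : the option added as the workaround described above
dhcpcd_eth0="-t 60 -A"
```

After changing the file, restarting the interface with /etc/init.d/net.eth0 restart (the command used earlier in this report) should pick up the new options.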
Here's a stacktrace -- I hope it helps -- I don't have debugedit installed so installsources won't work -- I'll get it in place tonight and retest tomorrow.

(gdb) set args -t 60 -d eth0
(gdb) run
Starting program: /sbin/dhcpcd -t 60 -d eth0
Info, eth0: dhcpcd 3.1.3 starting
Info, eth0: hardware address = 00:15:c5:0d:b1:70
Info, eth0: DUID = 00:01:00:01:0e:38:58:22:00:15:c5:0d:b1:70
Info, eth0: broadcasting for a lease
Debug, eth0: sending DHCP_DISCOVER with xid 0x3d576ff2
Debug, eth0: waiting on select for 60 seconds
Debug, eth0: got a packet with xid 0x3d576ff2
Info, eth0: offered 10.33.0.199 from 10.30.0.40
Debug, eth0: sending DHCP_REQUEST with xid 0x3d576ff2
Debug, eth0: waiting on select for 60 seconds
Debug, eth0: got a packet with xid 0x3d576ff2
Info, eth0: got subsequent offer of 10.33.1.83, ignoring
Debug, eth0: waiting on select for 60 seconds
Debug, eth0: got a packet with xid 0x3d576ff2
Info, eth0: checking 10.33.0.199 is available on attached networks
Debug, eth0: sending ARP probe #1
Debug, eth0: sending ARP probe #2

Program received signal SIGSEGV, Segmentation fault.
0xb7e7e383 in ?? () from /lib/libc.so.6
That should have dropped a core file - see if you can run a backtrace on it. Or you can do it by hand:

tar xvjpf /usr/portage/distfiles/dhcpcd-3.1.3.tar.bz2
cd dhcpcd-3.1.3
CFLAGS=-ggdb make

Then run that dhcpcd in gdb. When it drops core, use the command "bt" to get a backtrace.
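For the core-file route, a rough sketch of how the backtrace could be pulled out non-interactively. The helper name bt_from_core is made up for illustration, and the core file's name and location are assumptions (they depend on the system's core dump settings); only the gdb options -batch and -ex are standard:

```shell
# Hypothetical helper: print a backtrace from an existing core file.
# Usage: bt_from_core <path-to-binary> <path-to-core>
bt_from_core() {
    binary=$1
    core=$2
    # Bail out early if there is no readable core file to inspect.
    if [ ! -r "$core" ]; then
        echo "no readable core file at $core" >&2
        return 1
    fi
    # -batch exits after running the commands; -ex bt prints the backtrace.
    gdb -batch -ex bt "$binary" "$core"
}

# Example (not run here): allow core dumps in the current shell, reproduce
# the crash with the debug build, then inspect the result.
#   ulimit -c unlimited
#   ./dhcpcd -d -t 60 eth0
#   bt_from_core ./dhcpcd ./core
```

The backtrace is only as useful as the symbols available, which is why the build above uses CFLAGS=-ggdb.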
Thanks for the clues on generating a backtrace within gdb. Here's the output -- it appears that I need to re-emerge glibc to capture where it's bombing...

# gdb ./dhcpcd
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
warning: not using untrusted file "/home/wdawson/.gdbinit"
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) run -t 60 eth0
Starting program: /home/wdawson/dhcpcd-3.1.3/dhcpcd -t 60 eth0
Error, eth0: dhcpcd already running on pid 16026 (/var/run/dhcpcd-eth0.pid)

Program exited with code 01.
(gdb) run -h WDAWSONLT -t 60 eth0
Starting program: /home/wdawson/dhcpcd-3.1.3/dhcpcd -h WDAWSONLT -t 60 eth0
*** glibc detected *** /home/wdawson/dhcpcd-3.1.3/dhcpcd: malloc(): memory corruption (fast): 0x0805a128 ***
======= Backtrace: =========
/lib/libc.so.6[0xb7e1b2b6]
/lib/libc.so.6[0xb7e1da00]
/lib/libc.so.6(malloc+0x90)[0xb7e1eae0]
/home/wdawson/dhcpcd-3.1.3/dhcpcd[0x804bd41]
/home/wdawson/dhcpcd-3.1.3/dhcpcd[0x80496a3]
/home/wdawson/dhcpcd-3.1.3/dhcpcd[0x8049a5f]
/home/wdawson/dhcpcd-3.1.3/dhcpcd[0x804b480]
/home/wdawson/dhcpcd-3.1.3/dhcpcd[0x8050a19]
/lib/libc.so.6(__libc_start_main+0xe0)[0xb7dcb9e0]
/home/wdawson/dhcpcd-3.1.3/dhcpcd[0x8049621]
======= Memory map: ========
08048000-08057000 r-xp 00000000 08:03 7102770 /home/wdawson/dhcpcd-3.1.3/dhcpcd
08057000-08058000 rw-p 0000f000 08:03 7102770 /home/wdawson/dhcpcd-3.1.3/dhcpcd
08058000-08079000 rw-p 08058000 00:00 0 [heap]
b7c00000-b7c21000 rw-p b7c00000 00:00 0
b7c21000-b7d00000 ---p b7c21000 00:00 0
b7daa000-b7db4000 r-xp 00000000 08:03 5638476 /usr/lib/gcc/i686-pc-linux-gnu/4.2.0/libgcc_s.so.1
b7db4000-b7db5000 rw-p 00009000 08:03 5638476 /usr/lib/gcc/i686-pc-linux-gnu/4.2.0/libgcc_s.so.1
b7db5000-b7db6000 rw-p b7db5000 00:00 0
b7db6000-b7edf000 r-xp 00000000 08:03 5541281 /lib/libc-2.6.so
b7edf000-b7ee0000 r--p 00128000 08:03 5541281 /lib/libc-2.6.so
b7ee0000-b7ee2000 rw-p 00129000 08:03 5541281 /lib/libc-2.6.so
b7ee2000-b7ee6000 rw-p b7ee2000 00:00 0
b7f0c000-b7f26000 r-xp 00000000 08:03 5541271 /lib/ld-2.6.so
b7f26000-b7f28000 rw-p 00019000 08:03 5541271 /lib/ld-2.6.so
bfd47000-bfd5c000 rw-p bfd47000 00:00 0 [stack]
ffffe000-fffff000 r-xp 00000000 00:00 0 [vdso]

Program received signal SIGABRT, Aborted.
0xffffe410 in __kernel_vsyscall ()
(gdb) bt
#0 0xffffe410 in __kernel_vsyscall ()
#1 0xb7ddef05 in raise () from /lib/libc.so.6
#2 0xb7de0721 in abort () from /lib/libc.so.6
#3 0xb7e156ac in ?? () from /lib/libc.so.6
#4 0x00000009 in ?? ()
#5 0xbfd59278 in ?? ()
#6 0x00000400 in ?? ()
#7 0xb7df1f13 in vfprintf () from /lib/libc.so.6
#8 0xb7e1b2b6 in ?? () from /lib/libc.so.6
#9 0x00000002 in ?? ()
#10 0xb7ec9608 in ?? () from /lib/libc.so.6
#11 0xbfd5b934 in ?? ()
#12 0xb7ec9724 in ?? () from /lib/libc.so.6
#13 0xbfd597af in ?? ()
#14 0xb7ec9724 in ?? () from /lib/libc.so.6
#15 0x30687465 in ?? ()
#16 0x61353038 in ?? ()
#17 0x00383231 in ?? ()
#18 0xb7ee0ff4 in ?? () from /lib/libc.so.6
#19 0x0805a120 in ?? ()
#20 0x00000002 in ?? ()
#21 0xbfd5984c in ?? ()
#22 0xb7e1da00 in ?? () from /lib/libc.so.6
#23 0xbfd59898 in ?? ()
#24 0x0000003c in ?? ()
#25 0xbfd597e0 in ?? ()
#26 0xb7e77711 in sendto () from /lib/libc.so.6
#27 0x08054a89 in send_packet (iface=0xbfd59898, type=60, data=0xbfd597e0 "\a", len=-1209567471) at socket.c:495
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) quit
The program is running. Exit anyway? (y or n) y
Created attachment 127179: arp debug

OK, here's a patch against 3.1.3 which adds more debug info to the output - just run dhcpcd -d eth0. It looks like the size of the ARP packet is way too big - it should be just 28 bytes for most people.
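For reference, the 28-byte figure matches the standard ARP layout for Ethernet and IPv4: a fixed 8-byte header followed by the sender and target hardware and protocol addresses. A quick sketch of the arithmetic (the function name arp_packet_size is made up for illustration and is not taken from the dhcpcd source):

```shell
# ARP packet size = 8-byte fixed header (hardware type, protocol type,
# hardware length, protocol length, opcode)
#                 + 2 * (hardware address length + protocol address length)
arp_packet_size() {
    hwlen=$1   # hardware address length in bytes (6 for Ethernet)
    plen=$2    # protocol address length in bytes (4 for IPv4)
    echo $(( 8 + 2 * (hwlen + plen) ))
}

arp_packet_size 6 4   # Ethernet + IPv4, prints 28
```

Anything much larger than this on an ordinary Ethernet interface would suggest the length is being computed or read incorrectly.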
OK, here's the gdb run with backtrace using the latest patch:

# gdb ./dhcpcd
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
warning: not using untrusted file "/home/wdawson/.gdbinit"
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) set args -d -h WDAWSONLT -t 60 eth0
(gdb) run
Starting program: /home/wdawson/Software/dhcpcd-3.1.3/dhcpcd -d -h WDAWSONLT -t 60 eth0
Info, eth0: dhcpcd 3.1.4_pre2 starting
Info, eth0: hardware address = 00:15:c5:0d:b1:70
Info, eth0: DUID = 00:01:00:01:0e:38:58:22:00:15:c5:0d:b1:70
Info, eth0: broadcasting for a lease
Debug, eth0: sending DHCP_DISCOVER with xid 0x117584c0
Debug, eth0: waiting on select for 60 seconds
Debug, eth0: got a packet with xid 0x117584c0
Info, eth0: offered 10.33.0.199 from 10.30.0.40
Debug, eth0: sending DHCP_REQUEST with xid 0x117584c0
Debug, eth0: waiting on select for 60 seconds
Debug, eth0: got a packet with xid 0x117584c0
Info, eth0: got subsequent offer of 10.33.1.79, ignoring
Debug, eth0: waiting on select for 60 seconds
Debug, eth0: got a packet with xid 0x117584c0
Info, eth0: checking 10.33.0.199 is available on attached networks
Debug, eth0: sending ARP probe #1
Debug, eth0: hwlen 6 family 1 arpsize 28
Debug, eth0: arphdr_len 28
Debug, eth0: sending ARP probe #2

Program received signal SIGSEGV, Segmentation fault.
0xb7eaf383 in ?? () from /lib/libc.so.6
(gdb) bt
#0 0xb7eaf383 in ?? () from /lib/libc.so.6
#1 0xb7f5a4d7 in ?? () from /lib/libc.so.6
#2 0xb7f5a4d7 in ?? () from /lib/libc.so.6
#3 0xb7f76120 in ?? () from /lib/libc.so.6
#4 0xb7f76130 in ?? () from /lib/libc.so.6
#5 0xb7f7614c in ?? () from /lib/libc.so.6
#6 0x00000000 in ?? ()
(gdb) quit
The program is running. Exit anyway? (y or n) y
Created attachment 127197: arp packet that crashes dhcpcd

I used wireshark to capture the arp traffic. Here it is as a pcap file.
Created attachment 127208: Attempt to fix ARP crash

Thanks for the packet - I'll see if I can whip together some code to inject the packet into my dhcpcd to see if I can replicate the crash. In the meantime, I think I've seen some places that could potentially cause a crash, so try this patch. You'll have to remove the old one.

Thanks
Here's another backtrace for you based on your last patch:

# gdb ./dhcpcd
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
warning: not using untrusted file "/home/wdawson/.gdbinit"
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) set args -d -h WDAWSONLT -t 60 eth0
(gdb) run
Starting program: /home/wdawson/Software/dhcpcd-3.1.3/dhcpcd -d -h WDAWSONLT -t 60 eth0
Info, eth0: dhcpcd 3.1.4_pre2 starting
Info, eth0: hardware address = 00:15:c5:0d:b1:70
Info, eth0: DUID = 00:01:00:01:0e:38:58:22:00:15:c5:0d:b1:70
Info, eth0: broadcasting for a lease
Debug, eth0: sending DHCP_DISCOVER with xid 0x4671d57b
Debug, eth0: waiting on select for 60 seconds
Debug, eth0: got a packet with xid 0x4671d57b
Info, eth0: offered 10.33.0.168 from 10.30.0.40
Debug, eth0: sending DHCP_REQUEST with xid 0x4671d57b
Debug, eth0: waiting on select for 60 seconds
Debug, eth0: got a packet with xid 0x4671d57b
Info, eth0: got subsequent offer of 10.33.1.83, ignoring
Debug, eth0: waiting on select for 60 seconds
Debug, eth0: got a packet with xid 0x4671d57b
Info, eth0: checking 10.33.0.168 is available on attached networks
Debug, eth0: sending ARP probe #1
Debug, eth0: sending ARP probe #2
Debug, eth0: sending ARP probe #3
*** glibc detected *** /home/wdawson/Software/dhcpcd-3.1.3/dhcpcd: malloc(): memory corruption: 0x0805a200 ***
======= Backtrace: =========
/lib/libc.so.6[0xb7df12b6]
/lib/libc.so.6[0xb7df34f1]
/lib/libc.so.6(malloc+0x90)[0xb7df4ae0]
/home/wdawson/Software/dhcpcd-3.1.3/dhcpcd[0x804bed9]
/home/wdawson/Software/dhcpcd-3.1.3/dhcpcd[0x805426e]
/home/wdawson/Software/dhcpcd-3.1.3/dhcpcd[0x8049a97]
/home/wdawson/Software/dhcpcd-3.1.3/dhcpcd[0x804b554]
/home/wdawson/Software/dhcpcd-3.1.3/dhcpcd[0x8050bb1]
/lib/libc.so.6(__libc_start_main+0xe0)[0xb7da19e0]
/home/wdawson/Software/dhcpcd-3.1.3/dhcpcd[0x8049671]
======= Memory map: ========
08048000-08057000 r-xp 00000000 08:03 966744 /home/wdawson/Software/dhcpcd-3.1.3/dhcpcd
08057000-08058000 rw-p 0000f000 08:03 966744 /home/wdawson/Software/dhcpcd-3.1.3/dhcpcd
08058000-08079000 rw-p 08058000 00:00 0 [heap]
b7c00000-b7c21000 rw-p b7c00000 00:00 0
b7c21000-b7d00000 ---p b7c21000 00:00 0
b7d74000-b7d75000 rw-p b7d74000 00:00 0
b7d75000-b7d88000 r-xp 00000000 08:03 5541289 /lib/libpthread-2.6.so
b7d88000-b7d8a000 rw-p 00012000 08:03 5541289 /lib/libpthread-2.6.so
b7d8a000-b7d8c000 rw-p b7d8a000 00:00 0
b7d8c000-b7eb5000 r-xp 00000000 08:03 5541281 /lib/libc-2.6.so
b7eb5000-b7eb6000 r--p 00128000 08:03 5541281 /lib/libc-2.6.so
b7eb6000-b7eb8000 rw-p 00129000 08:03 5541281 /lib/libc-2.6.so
b7eb8000-b7ebb000 rw-p b7eb8000 00:00 0
b7ebb000-b7ec2000 r-xp 00000000 08:03 5541282 /lib/librt-2.6.so
b7ec2000-b7ec4000 rw-p 00006000 08:03 5541282 /lib/librt-2.6.so
b7ec4000-b7ec5000 rw-p b7ec4000 00:00 0
b7edf000-b7ee9000 r-xp 00000000 08:03 5638476 /usr/lib/gcc/i686-pc-linux-gnu/4.2.0/libgcc_s.so.1
b7ee9000-b7eea000 rw-p 00009000 08:03 5638476 /usr/lib/gcc/i686-pc-linux-gnu/4.2.0/libgcc_s.so.1
b7eea000-b7eeb000 rw-p b7eea000 00:00 0
b7eeb000-b7f05000 r-xp 00000000 08:03 5541271 /lib/ld-2.6.so
b7f05000-b7f07000 rw-p 00019000 08:03 5541271 /lib/ld-2.6.so
bfb50000-bfb66000 rw-p bfb50000 00:00 0 [stack]
ffffe000-fffff000 r-xp 00000000 00:00 0 [vdso]

Program received signal SIGABRT, Aborted.
0xffffe410 in __kernel_vsyscall ()
(gdb) bt
#0 0xffffe410 in __kernel_vsyscall ()
#1 0xb7db4f05 in raise () from /lib/libc.so.6
#2 0xb7db6721 in abort () from /lib/libc.so.6
#3 0xb7deb6ac in ?? () from /lib/libc.so.6
#4 0x00000009 in ?? ()
#5 0xbfb60828 in ?? ()
#6 0x00000400 in ?? ()
#7 0xb7eb6ff4 in ?? () from /lib/libc.so.6
#8 0xb7e9d935 in ?? () from /lib/libc.so.6
#9 0xbfb60de4 in ?? ()
#10 0xb7e9f608 in ?? () from /lib/libc.so.6
#11 0x00000017 in ?? ()
#12 0xbfb64916 in ?? ()
#13 0x0000002a in ?? ()
#14 0xb7e9f621 in ?? () from /lib/libc.so.6
#15 0x00000002 in ?? ()
#16 0xb7e9c59f in ?? () from /lib/libc.so.6
#17 0x0000001b in ?? ()
#18 0xb7e9f625 in ?? () from /lib/libc.so.6
#19 0x00000004 in ?? ()
#20 0xbfb60d5f in ?? ()
#21 0x00000008 in ?? ()
#22 0xb7e9f62b in ?? () from /lib/libc.so.6
#23 0x00000005 in ?? ()
#24 0x0805e854 in ?? ()
#25 0x0805a268 in ?? ()
#26 0xb7e9f62b in ?? () from /lib/libc.so.6
#27 0x00000005 in ?? ()
#28 0xbfb60770 in ?? ()
#29 0xb7e9f62c in ?? () from /lib/libc.so.6
#30 0x00000025 in ?? ()
#31 0x0805a283 in ?? ()
#32 0x0805e74c in ?? ()
#33 0xbfb60d88 in ?? ()
#34 0xbfb60d5f in ?? ()
#35 0x00000008 in ?? ()
#36 0xbfb60790 in ?? ()
#37 0x00000000 in ?? ()
(gdb) quit
The program is running. Exit anyway? (y or n) y
Created attachment 127285: Attempt to debug ARP crash

This has more debugging to try and isolate the location of the error. Any chance you could attach your backtraces instead of putting them in comments?

Thanks
Created attachment 127348: backtrace from latest crash

This is the backtrace from the latest patch. FYI, I will only be at this client for today and part of tomorrow, so my ability to use this network for testing DHCPCD is soon coming to an end.
dhcpcd-3.1.4, which has just been released, *may* fix your issue. If not, I'll need a new backtrace and we'll start over again. If you build it with full debugging and it segfaults, then please attach the core file here. Thanks.