When starting net.eth0 the initscript entered an infinite loop, which I have traced to the while read ... done </proc/net/dev block in the _ifindex() function in iproute2.sh (also in ifconfig.sh but my system doesn't use that). This appears to be the same problem described at: http://lists.debian.org/debian-kernel/2012/02/msg01089.html I have managed to work around it by changing the loop in _ifindex() from: while read line; do ... done </proc/net/dev to: cat /proc/net/dev | while read line; do ... done I've managed to restart the interface before with this kernel (vanilla 3.3.0 with a patch for Apple backlight support) and haven't rebooted since that worked, so I think some set of conditions needs to be met before this bug is triggered, but once I got into the state where reading /proc/net/dev failed it kept failing until I edited the script as described above. Reproducible: Sometimes
@robbat2: Can we rewrite _ifindex() to not use /proc/net/dev, at least for linux systems? I think if we can get away from /proc/net/dev that will be a good solution for this.
WilliamH: can we rely on awk instead? there is basically no other way to get the indices of an interface that will not change the ordering. The interfaces in that file are ordered by when they were added to a system, and that order is important to us. The entire function could be replaced by this awk function: awk " BEGIN{n=-2} /^$IFACE:/{n=NR-2; } END{if(n<0){n=NR-1}; print n}" /proc/net/dev if you want debugging, add this between the /^$IFACE:/ line and the END line: { printf(\"nr-2=%d n=%d s=%s\n\",NR-2,n,\$1);}
(In reply to comment #2) > WilliamH: > can we rely on awk instead? > there is basically no other way to get the indices of an interface that will > not change the ordering. > The interfaces in that file are ordered by when they were added to a system, > and that order is important to us. @robbat2: I would rather not use awk, because that adds another dependency to openrc, and I'm not sure it resolves the issue. Looking at the related debian bug report, it looks like the issue is a kernel issue with reading from /proc/net/dev, regardless of how we do it. Since the issue is sporatic, I'm not sure that the fix originally proposed, or using awk would actually fix this.
(In reply to comment #3) > (In reply to comment #2) > > can we rely on awk instead? > > there is basically no other way to get the indices of an interface that will > > not change the ordering. > > The interfaces in that file are ordered by when they were added to a system, > > and that order is important to us. Is this definitely the case? It seems that the following commit means that the iteration order can now change since the file is now generated by iterating over the buckets in the hash table net->dev_name_head (see list_netdevice() in net/core/dev.c). commit f04565ddf52e401880f8ba51de0dff8ba51c99fd Author: Mihai Maruseac <mihai.maruseac@gmail.com> Date: Thu Oct 20 20:45:10 2011 +0000 dev: use name hash for dev_seq_ops > @robbat2: > I would rather not use awk, because that adds another dependency to openrc, > and I'm not sure it resolves the issue. > > Looking at the related debian bug report, it looks like the issue is a > kernel issue with reading from /proc/net/dev, regardless of how we do it. > Since the issue is sporatic, I'm not sure that the fix originally proposed, > or using awk would actually fix this. I'm not sure the issue is sporadic - I've only had an infinite loop once, but I consistently (across 4 machines) get corrupted output running: while read -r line ; do echo "$line" ; done </proc/net/dev whereas `cat /proc/net/dev` always seems reasonable. After digging into this a bit more the problem seems to be that the read builtin reads a chunk from the file and then seeks backwards to the first newline so that the next call reads the next line and it is this seeking that is broken - so yes, it does seem to be a kernel issue, but it's much simpler to make OpenRC work around the bug. cat seems to use a 4096 byte chunk which is bigger than the total size of /proc/net/dev on all of my systems, so I'm not sure if this still works once you have a lot of interfaces - I suspect that it will be fine since cat will not call lseek at all and the bug appears to be in dev_seq_next() in net/core/dev.c, which does not use the pos parameter in determining what to return - so it will work fine if called to output the devices in order but not if the caller seeks at all.
(In reply to comment #4) > (In reply to comment #3) > > (In reply to comment #2) > > > can we rely on awk instead? > > > there is basically no other way to get the indices of an interface that will > > > not change the ordering. > > > The interfaces in that file are ordered by when they were added to a system, > > > and that order is important to us. > > Is this definitely the case? It seems that the following commit means that > the iteration order can now change since the file is now generated by > iterating over the buckets in the hash table net->dev_name_head (see > list_netdevice() in net/core/dev.c). > > commit f04565ddf52e401880f8ba51de0dff8ba51c99fd > Author: Mihai Maruseac <mihai.maruseac@gmail.com> > Date: Thu Oct 20 20:45:10 2011 +0000 > > dev: use name hash for dev_seq_ops If this order assumption is not valid, there is a way we can get the device names from /sys/class/net. I have a patch that does this. > I'm not sure the issue is sporadic - I've only had an infinite loop once, > but I consistently (across 4 machines) get corrupted output running: > > while read -r line ; do echo "$line" ; done </proc/net/dev > > whereas `cat /proc/net/dev` always seems reasonable. What about using the read command to parse out the device name, e.g.: while read dev rest; do echo $dev; done < /proc/net/dev Does that also give corrupted output?
(In reply to comment #5) > If this order assumption is not valid, there is a way we can get the device > names from /sys/class/net. I have a patch that does this. Interestingly (assuming you're talking about reading /sys/class/net/$dev/ifindex), I get the following ordering on this system - which would seem to imply the ordering assumption is indeed wrong: $ cat /proc/net/dev | cut -c-70 # cut to avoid confusing line wrapping Inter-| Receive | Tr face |bytes packets errs drop fifo frame compressed multicast|byte lo: 8836302 12384 0 0 0 0 0 0 883 virbr0: 0 0 0 0 0 0 0 0 sit0: 0 0 0 0 0 0 0 0 wlan0: 15583373697 11384063 0 0 0 0 0 0 eth0: 0 0 0 0 0 0 0 0 $ for i in /sys/class/net/*; do echo $(cat $i/ifindex) ${i##*/}; done | sort 1 lo 2 eth0 3 sit0 4 wlan0 5 virbr0 > (In reply to comment #4) > > I'm not sure the issue is sporadic - I've only had an infinite loop once, > > but I consistently (across 4 machines) get corrupted output running: > > > > while read -r line ; do echo "$line" ; done </proc/net/dev > > > > whereas `cat /proc/net/dev` always seems reasonable. > > What about using the read command to parse out the device name, e.g.: > > while read dev rest; do echo $dev; done < /proc/net/dev > > Does that also give corrupted output? Yes (with gentoo-sources-3.2.6 on the same machine as above): $ while read dev rest; do echo $dev; done </proc/net/dev Inter-| face virbr0: 0 lo: eth0: I think any solution using read like that will expose the bug because it will try to seek within the file.
Look at how the ordering in ifindex vs /proc/net/dev changes over the change of interfaces here: # for i in /sys/class/net/*; do echo $(cat $i/ifindex) ${i##*/}; done | sort -n cat: /sys/class/net/bonding_masters/ifindex: Not a directory bonding_masters 1 lo 2 eth0 3 eth1 4 eth2 5 eth3 6 bond0 15 bond1 16 dummy0 # cat /proc/net/dev |cut -c-40 Inter-| Receive face |bytes packets errs drop fifo f eth3: 4826672837 16589455 4 100 lo: 2923881716 5759650 0 0 eth2: 4941753388 17055969 0 116 eth1: 10177055798 12884546 0 35777 dummy0: 0 0 0 0 0 eth0: 0 0 0 0 0 bond1: 0 0 0 0 0 bond0: 9768426225 33645424 4 216 # rmmod dummy # for i in /sys/class/net/*; do echo $(cat $i/ifindex) ${i##*/}; done | sort -n cat: /sys/class/net/bonding_masters/ifindex: Not a directory bonding_masters 1 lo 2 eth0 3 eth1 4 eth2 5 eth3 6 bond0 15 bond1 # cat /proc/net/dev |cut -c-40 Inter-| Receive face |bytes packets errs drop fifo f eth3: 4826709377 16589687 4 100 lo: 2923971677 5759851 0 0 eth2: 4941785392 17056227 0 116 eth1: 10177239712 12884987 0 35778 eth0: 0 0 0 0 0 bond1: 0 0 0 0 0 bond0: 9768494769 33645914 4 216 # modprobe dummy # for i in /sys/class/net/*; do echo $(cat $i/ifindex) ${i##*/}; done | sort -n cat: /sys/class/net/bonding_masters/ifindex: Not a directory bonding_masters 1 lo 2 eth0 3 eth1 4 eth2 5 eth3 6 bond0 15 bond1 89 dummy0 # cat /proc/net/dev |cut -c-40 Inter-| Receive face |bytes packets errs drop fifo f eth3: 4826732972 16589806 4 100 lo: 2924075819 5760049 0 0 eth2: 4941809618 17056347 0 116 eth1: 10177583293 12885629 0 35780 dummy0: 0 0 0 0 0 eth0: 0 0 0 0 0 bond1: 0 0 0 0 0 bond0: 9768542590 33646153 4 216
(In reply to comment #6) > I think any solution using read like that will expose the bug because it > will try to seek within the file. I think you mean any solution using read like that with /proc/net/dev right? @robbat2: Can we use the number in /sys/class/net/${IFACE}/ifindex as the default metric? It seems like that number is constant for each interface looking at your information in comment #7. What do you think? Do you know where that value comes from?
(In reply to comment #8) > (In reply to comment #6) > > I think any solution using read like that will expose the bug because it > > will try to seek within the file. > > I think you mean any solution using read like that with /proc/net/dev right? Yes, anything of the form: while read ... ; done </proc/net/dev exposes the bug. AFAICT the issue is in the implementation of /proc/net/dev and affects any attempt to seek within that file.
All, here is my suggestion for a new _ifindex() function for Linux. It returns the value the kernel assigns to /sys/class/net/${IFACE}/ifindex instead of trying to calculate a value. Here are my questions about this: - If we ever hit the else path, what is a reasonable response? I have just assigned 9999 as a metric in that situation, because I do not know what else to do with it. - Does a return code matter since we do not check for it in the network scripts anyway? _ifindex() { local i=0 rc=0 if [ -e /sys/class/net/"${IFACE}"/ifindex ]; then i=$(cat /sys/class/net/"${IFACE}"/ifindex) else rc=1 i=9999 fi echo "${i}" return $rc }
I'm not sure where the kernel gets the ordering in /proc/net/dev from. It's certainly not coming from the ifindex value as that changed in my testcase, while the position of dummy0 stayed the same. (In reply to comment #10) > - If we ever hit the else path, what is a reasonable response? I have > just assigned 9999 as a metric in that situation, because I do not > know what else to do with it. The else path is important. It's a guess of the lowest possible value for the next ifindex value assigned. It needs to be unique as well, so that you don't get conflicting metric values in routes. > - Does a return code matter since we do not check for it in the network > scripts anyway? Nope. > > _ifindex() > { > local i=0 rc=0 > if [ -e /sys/class/net/"${IFACE}"/ifindex ]; then > i=$(cat /sys/class/net/"${IFACE}"/ifindex) > else > rc=1 > i=9999 > fi > echo "${i}" > return $rc > } I suggest this for the else path: index=-1 for f in /sys/class/net/*/ifindex ; do v=$(cat $f) [ $v -gt $index ] && index=$v done echo $(($index+1))
(In reply to comment #11) > I'm not sure where the kernel gets the ordering in /proc/net/dev from. It's > certainly not coming from the ifindex value as that changed in my testcase, > while the position of dummy0 stayed the same. Since the commit referenced in comment #4 (which was included in kernel 3.2) /proc/net/dev is generated by iterating over a hash table, hashed on the name of the interface.
All, here is my updated replacement for _ifindex on linux which encorporates the changes from robbat2. _ifindex() { local index=-1 local f v if [ -e /sys/class/net/"${IFACE}"/ifindex ]; then index=$(cat /sys/class/net/"${IFACE}"/ifindex) else for f in /sys/class/net/*/ifindex ; do v=$(cat $f) [ $v -gt $index ] && index=$v done : $(( index += 1 )) fi echo "${index}" return 0 } @robbat2: Are you ok with this version? If so I'll replace the function for linux systems and commit.
williamh: yes, but definitely as 0.10 version
This is fixed by commit 9127684. Thanks for the report.