I have two full BGP sessions with two providers, and the routing table contains 204456 entries. As you can see, listing all of these entries takes approximately 1 minute 35 seconds on a 1.7GHz CPU that is 93% idle.

ecity ~ # time ip route ls | wc -l
204456

real    1m35.370s
user    0m1.690s
sys     1m32.860s
ecity ~ #

The problem is that when I want to restart or bring up an interface (/etc/init.d/net.eth2 start, for example) it takes too much time because of the function calculate_metric() from /etc/init.d/net.lo:

calculate_metric() {
	local iface="$1" metric="$2"

	# Have we already got a metric?
	local m=$(awk '$1=="'${iface}'" && $2=="00000000" { print $7 }' \
		/proc/net/route)
	if [[ -n ${m} ]] ; then
		echo "${m}"
		return 0
	fi

The awk search in /proc/net/route takes too much time when the routing table is very big, and it slows down the start of any interface.

Reproducible: Always

Steps to Reproduce:
1. Have a very big routing table (approx. 205,000 routes).
2. Start an interface (/etc/init.d/net.eth2 start).

Actual Results: It takes too much time to bring up that interface.

Expected Results: The interface comes up quickly.
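To make the failing step concrete, here is a minimal sketch of the lookup calculate_metric performs, with a hypothetical interface name eth2 substituted for ${iface}. In /proc/net/route, field 2 is the destination (00000000 means a default route) and field 7 is the metric, so awk has to scan the whole table just to find at most a handful of matching lines:

    # hypothetical interface name; time the same scan calculate_metric does
    time awk '$1 == "eth2" && $2 == "00000000" { print $7 }' /proc/net/route

This is the scan that dominates the interface start-up time reported above.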
Try rewriting it like:

local m=$(awk '$1=="'${iface}'" && $2=="00000000" { print $7 ; exit }' \
	/proc/net/route)
Hmm, I've just come into a box with a similar setup (BGP, lots of routes), so I'll try to figure something out as well.

Exiting after the first match doesn't really help, as the match could be at the very end of the data, and the rest of the calculate_metric function is still going to suck badly for those with lots of routes. However, I think there is also an issue with the kernel; read on.

grubbs-int ~ # time wc -l /proc/net/route
159641 /proc/net/route

real    0m0.265s
user    0m0.007s
sys     0m0.257s

grubbs-int ~ # time awk '$1=="'${iface}'" && $2=="00000000" { print $7 }' /proc/net/route
0

real    0m0.344s
user    0m0.100s
sys     0m0.243s

But now, if I get something to do a lot of background access to the routing table (like quagga/zebra)...

grubbs-int ~ # time awk '$1=="'${iface}'" && $2=="00000000" { print $7 }' /proc/net/route
0

real    1m25.909s
user    0m0.120s
sys     1m25.384s

grubbs-int ~ # time wc -l /proc/net/route
142435 /proc/net/route

real    0m30.962s
user    0m0.010s
sys     0m30.685s

The kernel is at fault here, and I don't know of a solution.
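As a rough way to see the contention without a live BGP feed, one can hit /proc/net/route with a few concurrent readers while timing another read. This is only a hedged sketch; it may not reproduce the full slowdown that a constantly-changing quagga/zebra table causes, since a real BGP daemon also modifies the table:

    # assumption: concurrent readers approximate the "background access" above
    for i in 1 2 3 4; do ( cat /proc/net/route > /dev/null ) & done
    time wc -l /proc/net/route
    wait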
I suppose we could cat the routing table to a temporary file and then run awk on that. Aside from that, there's not much more we can do in userland.
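A minimal sketch of that idea (hypothetical temp file and interface name, untested):

    tmp=$(mktemp) || exit 1
    # snapshot the table once, then let awk work on the copy
    cat /proc/net/route > "${tmp}"
    m=$(awk '$1 == "eth2" && $2 == "00000000" { print $7 ; exit }' "${tmp}")
    rm -f "${tmp}"

As the next comment shows, though, the cat itself is where the time goes, so a snapshot only avoids re-reading /proc/net/route more than once; it does not make the initial read any faster.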
# time cat /proc/net/route > route

real    1m18.736s
user    0m0.000s
sys     1m18.081s

Here's a possible alternative: for those of us running BGP, how about documenting the policy the net scripts use to set and calculate metrics, so that we can explicitly set a metric and avoid calculate_metric running at all? (Also, I don't see a global 'metric' variable, only the per-interface metric variables.)
Just some notes here from a discussion with UberLord, as to a plan of action:

1. Add a plain 'metric' variable that provides a default value for all metric_$iface variables.
2. Add documentation (in net.example) on setting a custom metric, as well as on the algorithm used to assign metrics (calculate_metric finds an available metric to use, but we still need to see how it fits into the global scope).
3. Those of us with many routes should set metric=0 (metric_$iface=0 until the global is implemented); see the example below.

Some notes on the algorithm (from grepping the code):

1. Wired gets metric=0.
2. Bridge, tuntap and iptunnel all get metric=1000.
3. Wireless (iwconfig, wpa_supplicant) gets metric=2000.
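A hedged example of item 3 above, for a BGP box that wants to skip calculate_metric entirely (the interface name eth2 is only an example; the variable goes in /etc/conf.d/net):

    # /etc/conf.d/net
    # pin the metric so the net scripts never have to scan /proc/net/route
    metric_eth2="0"

Once the plain 'metric' variable from item 1 exists, metric="0" should serve as the global default instead.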
Fixed in baselayout-1.12.9