When using the Gentoo kernel (I'm using 2.6.20-gentoo-r8) with its PPPoE support, some sites are unresponsive and latency is high when contacting others. If I emerge rp-pppoe and use it instead, all of the latency and unresponsiveness issues go away. It took a while to notice the issue because most TCP/IP sites are still reachable, but some of them (blizzard.com and support.microsoft.com, for me) never respond. The issue occurs when connecting to the sites from the router itself and from a system NATed behind the router. I could not reproduce the bad behavior using Microsoft's PPPoE client or rp-pppoe.

Reproducible: Always

Steps to Reproduce:
1. Configure PPPoE support in the kernel.
2. Configure conf.d/net to set up the ppp0 connection and start it with init.d/net.ppp0.
3. Use a web browser and notice poor responsiveness with many sites; some sites never respond.
4. Shut down the ppp connection.
5. Install rp-pppoe.
6. Run pppoe-start.
7. Repeat step 3 and notice that all issues disappear.

Actual Results:
Actual results are included in the steps above.

Expected Results:
There should be no latency or TCP/IP site connectivity issues.

Please let me know what configuration data or other information you need from me. I have not tried a different kernel; I haven't had the time so far. I have tried different options for PPPoE with no change in results. I'm using the same iptables rules in both the bad and the good case. I am also doing traffic shaping, and whether it is loaded or not does not appear to affect the results.
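Step 3 above relies on watching a web browser. A scripted check makes it easier to compare the kernel PPPoE case with the rp-pppoe case on the same list of sites. The following is only a sketch: the function names `probe` and `compare` and the 10-second timeout are assumptions for illustration, not part of the original report.

```python
import http.client
import time
import urllib.error
import urllib.request


def probe(url, timeout=10.0):
    """Fetch url once and return (responded, seconds_elapsed).

    The whole body is read, so a transfer that starts but then
    stalls is counted as a failure once the timeout expires.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
        responded = True
    except urllib.error.HTTPError:
        # The server answered with an error status, so it is reachable.
        responded = True
    except (OSError, http.client.HTTPException):
        # Timeouts, refused connections, DNS failures, truncated replies.
        responded = False
    return responded, time.monotonic() - start


def compare(urls, timeout=10.0):
    """Probe every URL and return {url: (responded, seconds_elapsed)}."""
    return {url: probe(url, timeout) for url in urls}


# Example use (run once per PPPoE client and compare the output):
#   for url, (ok, secs) in compare(["http://support.microsoft.com/",
#                                   "http://blizzard.com/"]).items():
#       print(url, "ok" if ok else "FAILED", "%.1fs" % secs)
```

Running the same list once with the kernel PPPoE plugin and once with rp-pppoe would give two sets of numbers that can be attached to the bug.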
comment #1
I have the same issue with kernel 2.6.20-hardened-r2 and kernel-pppoe. I've tested support.microsoft.com and get a checksum error in the first ACK packet every time:

TCP: Flags = 0x10 : .A....
TCP: ..0..... = No urgent data
TCP: ...1.... = Acknowledgement field significant
TCP: ....0... = No Push function
TCP: .....0.. = No Reset
TCP: ......0. = No Synchronize
TCP: .......0 = No Fin
TCP: Window = 65535 (0xFFFF)
TCP: Checksum = ERROR: CheckSum is 0xA2B2, Should be 0xF225
TCP: Urgent Pointer = 0 (0x0)
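For anyone who wants to double-check a "Should be" value like the one in the dump above, the TCP checksum can be recomputed from a captured segment. The sketch below uses the standard Internet checksum (one's-complement sum of 16-bit words) over the IPv4 pseudo-header plus the TCP segment. The function names are illustrative, and the raw segment bytes would have to come from your own capture; none are reproduced here.

```python
import socket
import struct


def internet_checksum(data):
    """One's-complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum(src_ip, dst_ip, segment):
    """Expected checksum for a TCP segment (header + payload) over IPv4."""
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, socket.IPPROTO_TCP, len(segment)))
    # The checksum field (bytes 16-17 of the TCP header) counts as zero.
    zeroed = segment[:16] + b"\x00\x00" + segment[18:]
    return internet_checksum(pseudo + zeroed)


def tcp_checksum_ok(src_ip, dst_ip, segment):
    """Compare the checksum stored in the segment with the computed one."""
    (stored,) = struct.unpack("!H", segment[16:18])
    return stored == tcp_checksum(src_ip, dst_ip, segment)


def decode_flags(flags):
    """Names of the TCP flag bits that are set, lowest bit first."""
    names = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]
    return [name for bit, name in enumerate(names) if flags & (1 << bit)]
```

`decode_flags(0x10)` yields only `ACK`, which matches the flag breakdown in the dump.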
It looks like the newest version of rp-pppoe might have a fix for this problem. What would it take to get this fix patched into the kernel so I (or someone) could test it? Here is the CHANGELIST for the most current version of rp-pppoe from their website:

Changes from Version 3.7 to 3.8: (2 April 2006)
- Adjusted code and made it possible to disable debugging code to shrink size of pppoe executable.
- Fixed bug in MD5 code that caused pppoe-server to segfault on 64-bit machines.
- Made various functions and variables static that didn't need to be visible outside their source files.

It looks like this revision fixes an MD5 issue that could be the cause of our checksum errors and the dropped packets on our 64-bit machines.

~James
Sorry, it looks like that change was made back in 2006, not 2007 as I first thought. Anyway, I checked the MD5 change in rp-pppoe, and it had to do with the size of the __u32 type. I double-checked that the type is correct in the kernel, and it is. So I don't think this change has anything to do with the current problem.
Is this still reproducible on the latest development kernel, currently 2.6.22-rc6?
I have the same problem with the newest kernel (2.6.22-gentoo-r1). I am unable to reach certain sites using the net config; others work fine. There are no problems accessing the sites below when using rp-pppoe instead.

Some sites that I am unable to access include:
www.microsoft.com
www.blizzard.com
www.allmusic.com

Relevant net config section:
config_ppp0=( "ppp" )
link_ppp0="eth0"
plugins_ppp0=( "pppoe" )
pppd_ppp0=("defaultroute")
username_ppp0='myusername'
password_ppp0='mypassword'
Just to rule out one suspicion, please see if the following helps:

# echo 0 > /proc/sys/net/ipv4/tcp_window_scaling
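On a production box it may be convenient to change this setting only for the duration of a test and then put the old value back. A minimal sketch, assuming root privileges and a hypothetical helper name `temporary_setting`; window scaling is negotiated when a connection is opened, so only connections started while the value is changed are affected.

```python
from contextlib import contextmanager
from pathlib import Path

# Path taken from the command above.
SYSCTL = Path("/proc/sys/net/ipv4/tcp_window_scaling")


@contextmanager
def temporary_setting(value, path=SYSCTL):
    """Write value to path, yield the previous value, then restore it."""
    old = path.read_text().strip()
    path.write_text(value + "\n")
    try:
        yield old
    finally:
        # Restore the original value even if the test raised an error.
        path.write_text(old + "\n")


# Example use (as root):
#   with temporary_setting("0") as previous:
#       print("window scaling was", previous, "and is now 0")
#       ...  # retry the failing sites here
```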
> # echo 0 > /proc/sys/net/ipv4/tcp_window_scaling

Sorry, but this didn't produce any different behavior.
This bug is slightly confused by 3 people all reporting similar issues but not really providing enough information to suggest they are the same.

James, you mention that 2.6.20 has problems but you don't mention if you have ever found some known working kernels. Can you clarify? Also, please test the latest development kernel, currently 2.6.23-rc1.

Rolf: same question applies to you too.

Tyler: your report seems to indicate that 2.6.22 is broken but previous kernels work. Is that correct? If so, which previous kernels? Also, please test the latest development kernel, currently 2.6.23-rc1.
(In reply to comment #8)
> James, you mention that 2.6.20 has problems but you don't mention if you have
> ever found some known working kernels. Can you clarify? Also, please test the
> latest development kernel, currently 2.6.23-rc1.

I have not found a working kernel, but I have not tried very many, as this system is a production box. However, I went ahead and tested 2.6.23-rc1 tonight during downtime and found it has the same problem as 2.6.20-gentoo-r8 and my current kernel, 2.6.22-gentoo-r1.
James, please file an upstream bug at http://bugzilla.kernel.org and post the new bug URL here.
James, once you have filed a bug upstream, please reopen this bug and copy the URL here for tracking.