When trying to ssh from a box on one private IP subnet to a box on another, with the two subnets connected by an IPsec VPN, ssh fails if the box is running vanilla 2.6.39, but it succeeds if it is running <2.6.39. This does not happen to the 2.6.39 box when ssh-ing to hosts not reached through the IPsec VPN, that is, to hosts on the same private IP subnet or on public IP addresses.

E.g. my box has IP address 192.168.3.7. Running vanilla 2.6.39, I try to ssh to 192.168.100.193. It freezes with only the following output. The same test succeeds when running a 2.6.38 kernel.

blueness@yellowness ~ $ ssh -vvv 192.168.100.193
OpenSSH_5.8p1-hpn13v10lpk, OpenSSL 1.0.0d 8 Feb 2011
debug1: Reading configuration data /home/blueness/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug2: ssh_connect: needpriv 0
debug1: Connecting to 192.168.100.193 [192.168.100.193] port 22.
debug1: Connection established.
debug1: identity file /home/blueness/.ssh/id_rsa type -1
debug1: identity file /home/blueness/.ssh/id_rsa-cert type -1
debug3: Incorrect RSA1 identifier
debug3: Could not load "/home/blueness/.ssh/id_dsa" as a RSA1 public key
debug2: key_type_from_name: unknown key type '-----BEGIN'
debug3: key_read: missing keytype
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug2: key_type_from_name: unknown key type '-----END'
debug3: key_read: missing keytype
debug1: identity file /home/blueness/.ssh/id_dsa type 2
debug1: identity file /home/blueness/.ssh/id_dsa-cert type -1
debug1: Remote protocol version 2.0, remote software version OpenSSH_5.8p1-hpn13v10
debug1: match: OpenSSH_5.8p1-hpn13v10 pat OpenSSH*
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_5.8p1-hpn13v10lpk
debug2: fd 3 setting O_NONBLOCK
debug3: load_hostkeys: loading entries for host "192.168.100.193" from file "/home/blueness/.ssh/known_hosts"
debug3: load_hostkeys: found key type RSA in file /home/blueness/.ssh/known_hosts:221
debug3: load_hostkeys: loaded 1 keys
debug3: order_hostkeyalgs: prefer hostkeyalgs: ssh-rsa-cert-v01@openssh.com,ssh-rsa-cert-v00@openssh.com,ssh-rsa
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: AUTH STATE IS 0
debug2: kex_parse_kexinit: diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
debug2: kex_parse_kexinit: ssh-rsa-cert-v01@openssh.com,ssh-rsa-cert-v00@openssh.com,ssh-rsa,ssh-dss-cert-v01@openssh.com,ssh-dss-cert-v00@openssh.com,ssh-dss
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se
debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: none,zlib@openssh.com,zlib
debug2: kex_parse_kexinit: none,zlib@openssh.com,zlib
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit: first_kex_follows 0
debug2: kex_parse_kexinit: reserved 0
debug2: kex_parse_kexinit: ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
debug2: kex_parse_kexinit: ssh-rsa,ssh-dss,ecdsa-sha2-nistp256
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se
debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: none,zlib@openssh.com
debug2: kex_parse_kexinit: none,zlib@openssh.com
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit: first_kex_follows 0
debug2: kex_parse_kexinit: reserved 0
debug2: mac_setup: found hmac-md5
debug1: REQUESTED ENC.NAME is 'aes128-ctr'
debug1: kex: server->client aes128-ctr hmac-md5 none
debug2: mac_setup: found hmac-md5
debug1: REQUESTED ENC.NAME is 'aes128-ctr'
debug1: kex: client->server aes128-ctr hmac-md5 none
debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP

Reproducible: Always

Steps to Reproduce:
1. Set up two private subnets, A and B, initially isolated.
2. Set up two IPsec routers on each subnet, configured as usual. These are Linux boxes running <2.6.39.
3. On any box on private subnet A running 2.6.39, try to ssh to a box on subnet B. It fails. (There is a similar failure when ssh-ing from B to A when running 2.6.39.)
4. On the same box, boot into a pre-2.6.39 kernel and try to ssh to the same remote box. It succeeds.
This happens on both amd64 and i686. I'm doing the git bisect now and will submit upstream.
The culprit is commit 2c8cec5c10bced2408082a6656170e74ac17231c. Its summary says:

ipv4: Cache learned PMTU information in inetpeer.

So changing how the learned path MTU is cached breaks things when tunneling via an IPsec VPN, and possibly via other tunnels. This is consistent with the misbehavior of ssh, because mismatched MTUs that cause fragmentation are known to make ssh freeze at "expecting SSH2_MSG_KEX_DH_GEX_GROUP". For reference see

http://www.snailbook.com/faq/mtu-mismatch.auto.html

I'm working on a reverse patch right now. A simple revert doesn't apply cleanly. I'll bug upstream.
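For anyone who wants to check whether they are hitting the same MTU problem, here is a rough sketch of probing the tunnel with unfragmentable pings. It assumes the iputils ping found on most Linux boxes (-M do sets the Don't Fragment bit, -s sets the ICMP payload size). The function names icmp_payload and probe_mtu are made up for illustration and are not part of any tool.

```shell
#!/bin/sh
# Largest ICMP echo payload that fits in a single IPv4 packet of the given
# MTU: subtract 20 bytes of IPv4 header (no options) and 8 bytes of ICMP header.
icmp_payload() {
    echo $(( $1 - 28 ))
}

# Hypothetical helper: send one ping with the Don't Fragment bit set, sized
# to fill exactly one packet of the given MTU. Returns 0 if a reply arrives.
# Usage: probe_mtu HOST MTU
probe_mtu() {
    ping -c 1 -W 2 -M do -s "$(icmp_payload "$2")" "$1" >/dev/null 2>&1
}
```

With these defined, something like "probe_mtu 192.168.100.193 1500 || probe_mtu 192.168.100.193 1400" run from the 2.6.39 box shows whether full-size packets are being dropped across the tunnel while smaller ones get through. The candidate sizes are only guesses; the real overhead depends on the IPsec configuration.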
Okay, reading the code carefully, I see that

echo 1 > /proc/sys/net/ipv4/ip_no_pmtu_disc

fixes the problem. So I'm not sure whether this is a "bug" or a new behavior that ipsec users should be aware of. Let's let upstream decide.
(In reply to comment #3)
> Okay reading the code carefully, I see that
>
> echo 1 > /proc/sys/net/ipv4/ip_no_pmtu_disc
>
> fixes the problem. So I'm not sure this is a "bug" or a new behavior that
> ipsec users should be aware of. Let's let upstream decide.

And it breaks other things, such as browsing some web sites.
(In reply to comment #4)
> (In reply to comment #3)
> > Okay reading the code carefully, I see that
> >
> > echo 1 > /proc/sys/net/ipv4/ip_no_pmtu_disc
> >
> > fixes the problem. So I'm not sure this is a "bug" or a new behavior that
> > ipsec users should be aware of. Let's let upstream decide.
>
> And it breaks other things, such as browsing some web sites.

echo 1 > tcp_mtu_probing

fixes the ssh via ipsec problem and does not break other things.

At this point I'm not sure whether this is a "bug" or a new behavior that people setting up ipsec tunnels should be aware of. Any feelings from the kernel@ people?
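To keep the tcp_mtu_probing workaround across reboots, the setting can go into the sysctl configuration. This is only a sketch, assuming the box reads /etc/sysctl.conf at boot (the file location and how it gets loaded vary by distribution). The sysctl name net.ipv4.tcp_mtu_probing corresponds to /proc/sys/net/ipv4/tcp_mtu_probing.

```
# /etc/sysctl.conf fragment (assumed location)
# 1 = TCP MTU probing stays off until an ICMP black hole is detected
net.ipv4.tcp_mtu_probing = 1
```

Running "sysctl -p" as root should apply the file without a reboot.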
We have a workaround. We can watch the upstream bug and see whether anything comes out of it that requires work on our side.