S2S IPSEC tunnels with Linux 3.X Kernel devices unable to traverse the tunnel.

richardstubbs

pfSense site to site IPSEC tunnels with Linux 3.X Kernel devices unable to traverse the tunnel.

It’s a bold title, as I haven’t yet tied it down to the specific kernel version where it first manifests itself.

I had raised an internal issue on our systems and eventually tied the problem to the pfSense IPsec tunnel; with my “fudge” solution of enabling IP forwarding on the affected servers, the issue was closed.

However, with an influx of BYOD in the workplace (most of them Android devices), these devices can only be used on the local LAN and not across the full network, as we are not in control of them. I put a CyanogenMod 7 based ROM (Linux kernel 2.6.37.6) on a device and it works fine; with a CyanogenMod 10.1 based ROM (Linux kernel 3.1.10) on the same device it fails to traverse the IPsec tunnels. I thought this might have been addressed in pfSense 2.0.2 (although nothing was specifically mentioned in the release notes), but after the upgrade the problem still persists.

The problem initially manifested itself with an Ubuntu upgrade.

The following is copied verbatim from

https://groups.google.com/forum/#!topic/sane-ug/iSsuo1VgXhk

=========================
Morning all.

OK, I have an interesting (*subjective) problem to do with Ubuntu 12.04.1 LTS.

Instead of explaining the complex routing topology and the multiple
IPsec-connected sites where this particular problem manifests itself,
I’ll describe a simple two-site setup where the problem also occurs and
has been verified.

History: a client has a standalone web application that has to run on
a certified OS for compliance reasons. It has been running happily on
Ubuntu 10.04.4 LTS, the “do-release-upgrade” has been bugging me, and
as of the point-one release of 12.04 it was on the certified list.

So, being the diligent professional, I snapshotted the VM (and installed
a test 12.04.1 VM to verify deployment of the application) and ran
“do-release-upgrade”.

All went fine and everything was working, until reports came in of the
app not being accessible from the other sites.

Although the VM should have been routable and accessible from all sites,
it could not reach systems in the other networks, nor could they reach it.

So, as an example: a simple “Site A” = “192.168.1.0/24 GW 192.168.1.1”
and “Site B” = “192.168.2.0/24 GW 192.168.2.1”, connected via IPsec
(handled by the GWs).

All sites have a mix of Windows / FreeBSD / Linux running in VMs on
ESX. MTU/MSS for the tunnel is fine and ALL the other systems are
fine.

I deployed a stock 10.04.4 LTS VM and a stock 12.04.1 LTS VM, both from
an ISO with DHCP. “Out of the box”, 10.04.4 LTS was accessible from all
sites as expected, but the 12.04.1 LTS version wasn’t.

NOW, I can fudge the 12.04.1 LTS instances by enabling IP forwarding:

sysctl -w net.ipv4.ip_forward=1

And lo and behold, the VM instantly becomes accessible to all the
networks. Now, I shouldn’t HAVE to do this, and it makes me a little
uncomfortable, as this box shouldn't be forwarding packets and doing so
breaks their security policy.

My whole point in this email is to ask: a) has something changed in
Ubuntu that I have to reconfigure / allow that I’m not aware of, or b)
is this, as I suspect, a bug?

Can anyone shine a light on this for me?

Regards Richard

@richardstubbs
http://www.richard-stubbs.com

I’m going to start doing some Wireshark analysis of the traffic, but if anyone knows the problem or the solution it would be greatly appreciated, as more and more embedded devices are starting to run the 3.x kernel, as are (obviously) standalone Linux servers.

Regards Richard

joegeorge

Have you made any headway?

adprom

Is there any progress on this at all? This is a pretty major bug. I have experienced the same thing myself.

panicattheterminal

Panic!

I have 20 Nexus 7s on my desk to configure for managers to use services at other sites. We use pfSense’s IPsec implementation to connect our branch offices (most of them), and to get around this “bug”, as a stop-gap, I’m using an OpenVPN client on the devices to connect to the public IPs of our various firewalls. This is obviously a labour-intensive task.

Is this as simple as raising a bug ticket? Has someone done this already?

I don’t want to sound ungrateful; I would just like to know whether this can be fixed.

But I don’t have the time to research alternatives: does anyone know of a web-based firewall solution with an IPsec implementation, so I can reluctantly replace pfSense?

PS: my Nexus 7 is rooted with the “Hack/Fudge” :)

dhatz

This sounds like a huge deal, but after some quick googling and searching in the ipsec-tools and FreeBSD forums, I haven't been able to find any references to this problem.

The IPsec programs/libraries used by pfSense have been around for many years and are considered to be very stable.

joegeorge

From what I gather, it's not an issue with BSD or IPsec, just with a specific version of the Linux kernel.

dhatz

@joegeorge:

From what I gather, it's not an issue with BSD or IPsec, just with a specific version of the Linux kernel.

With the Linux 3.x kernel so widely deployed, there should have been similar reports in the FreeBSD forums and mailing lists by now, and I haven't seen any…

richardstubbs

Proxy ARP!

I got around to looking at it this morning.

A packet dump showed just “who-has” ARP requests. Again, this could possibly be suppressed on the boxes themselves, but it’s something you don’t want to do on tens or hundreds of machines, assuming it’s even possible on embedded devices.

On the firewall in the network with the affected 3.x kernel Linux devices, go to
Firewall -> Virtual IPs -> and add:
Type “Proxy ARP”
Interface “LAN”
Type “Network”
Address “the network range of the other devices that are trying to connect, e.g. 192.168.1.0/24”

This will make the firewall respond to any ARP request for this network.

Regards

Richard

cmb

That means your Linux hosts have a broken network config in some regard: they're ARPing for things they should never be ARPing for, because they're not local. A wrong subnet mask, a wrong routing table entry, or an IP alias in a subnet where they shouldn't have one are the most likely possibilities. That's an ugly band-aid covering up the real problem on the Linux boxes; I'd troubleshoot them further and fix the source of the problem.
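To illustrate the point above, here is a minimal POSIX sh sketch of the decision a host makes before sending a packet: if the destination falls inside its own subnet it ARPs for the destination directly, otherwise it ARPs for its gateway. It assumes one interface with a single IPv4 address, a default gateway, no other routes, and a shell with 64-bit arithmetic (bash or dash); the function names are hypothetical. By this logic, a host in 192.168.7.0/24 that ARPs for a 172.30.0.x address has been told somehow (mask, route, alias or redirect) that the address is on-link.

```sh
# Hypothetical helpers illustrating the on-link vs. gateway decision.

ip_to_int() {
  # Convert a dotted-quad IPv4 address to a 32-bit integer.
  local a b c d rest
  a=${1%%.*}; rest=${1#*.}
  b=${rest%%.*}; rest=${rest#*.}
  c=${rest%%.*}; d=${rest#*.}
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

is_onlink() {
  # is_onlink DEST NETWORK/PREFIX -> exit status 0 if DEST is in the subnet.
  local net prefix mask
  net=${2%/*}; prefix=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

next_hop_for() {
  # next_hop_for DEST SUBNET GATEWAY -> prints the address the host ARPs for.
  if is_onlink "$1" "$2"; then
    echo "$1"   # local destination: ARP for it directly
  else
    echo "$3"   # remote destination: ARP for the gateway instead
  fi
}
```

For example, next_hop_for 172.30.0.1 192.168.7.0/24 192.168.7.254 prints the gateway, 192.168.7.254, so on a correctly configured host no "who-has 172.30.0.1" request should ever appear on the LAN.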

stalks

I have just come across this issue on pfSense 2.0.1-RELEASE.

I have the following setup:

[LAN2@172.30.0.0/24] <-> [ServerA/LAN2Gateway@172.30.0.1]
(172.30.0.0/24 #IPSec# 192.168.7.0/24)
[pfSense/LAN1Gateway@192.168.7.254] <-> [LAN1@192.168.7.0/24]

ServerA/LAN2 can communicate to LAN1 fine
LAN1 can communicate to ServerA/LAN2 fine
pfSense cannot reach ServerA/LAN2

It is expected that pfSense itself cannot talk to LAN2, because locally generated traffic doesn't follow the same routing system (as explained in this FAQ).

If I then follow those FAQ guidelines and create a new gateway of 192.168.7.254, and a static route of 172.30.0.0/24 -> 192.168.7.254, I then get the same issue as experienced by others above.

Once the static route is created, the pfSense box begins to send ICMP redirect messages for traffic going from LAN1 to LAN2. The redirect points the machines on LAN1 directly at 172.30.0.1, which is a bogus route as it's not on the same network. It is because of this dynamically generated route to 172.30.0.1 that I then see ARP requests from machines on LAN1 for addresses in 172.30.0.0/24.
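The conditions under which a router sends such a redirect can be sketched as follows. This is a simplified model loosely based on the classic BSD forwarding logic, not pfSense's actual code, and it ignores several extra checks (default routes, source-routed packets, the net.inet.ip.redirect switch). The function and interface names are hypothetical. In the model, a redirect is sent when the packet leaves on the same interface it arrived on and the sender is on that interface's subnet; the advertised next hop is the route's gateway, or the destination itself if the route has no gateway.

```sh
# Simplified model of when a BSD-style router sends an ICMP redirect.
# Hypothetical helpers; NOT pfSense source code.
# Assumes a shell with 64-bit arithmetic (bash, dash).

ip_to_int() {
  # Convert a dotted-quad IPv4 address to a 32-bit integer.
  local a b c d rest
  a=${1%%.*}; rest=${1#*.}
  b=${rest%%.*}; rest=${rest#*.}
  c=${rest%%.*}; d=${rest#*.}
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

in_subnet() {
  # in_subnet IP NETWORK/PREFIX -> exit status 0 if IP is inside the subnet.
  local net prefix mask
  net=${2%/*}; prefix=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

redirect_target() {
  # redirect_target IN_IF OUT_IF SRC IN_IF_SUBNET DST [ROUTE_GATEWAY]
  # Prints the "new nexthop" a redirect would advertise, or nothing
  # when no redirect would be sent.
  [ "$1" = "$2" ] || return 0        # must leave on the arrival interface
  in_subnet "$3" "$4" || return 0    # sender must be on that subnet
  if [ -n "${6:-}" ]; then
    echo "$6"                        # gateway route: point at the gateway
  else
    echo "$5"                        # no gateway: point at the destination
  fi
}
```

With the static route in place, LAN1-to-LAN2 traffic in this example both arrives and leaves on LAN, so redirect_target lan lan 192.168.7.200 192.168.7.0/24 172.30.0.1 prints 172.30.0.1. If the route via pfSense's own LAN address ends up without a gateway, that would be consistent with the "New nexthop: 172.30.0.1" in the ping output, though that part is an inference.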

For example, I begin pinging ServerA from a machine on LAN1, and after a few seconds I then click "Apply Changes" in pfSense to apply the local route for 172.30.0.0/24 via 192.168.7.254.

I also show the route the box will use for 172.30.0.1 before and after.

# ip route get 172.30.0.1
172.30.0.1 via 192.168.7.254 dev br0  src 192.168.7.200 
    cache  ipid 0xf2ae
# ping 172.30.0.1
PING 172.30.0.1 (172.30.0.1) 56(84) bytes of data.
64 bytes from 172.30.0.1: icmp_req=1 ttl=63 time=25.8 ms
64 bytes from 172.30.0.1: icmp_req=2 ttl=63 time=25.8 ms
64 bytes from 172.30.0.1: icmp_req=3 ttl=63 time=28.6 ms
64 bytes from 172.30.0.1: icmp_req=4 ttl=63 time=26.3 ms
64 bytes from 172.30.0.1: icmp_req=5 ttl=63 time=26.1 ms
64 bytes from 172.30.0.1: icmp_req=6 ttl=63 time=26.2 ms
From 192.168.7.254: icmp_seq=7 Redirect Host(New nexthop: 172.30.0.1) ## Clicked APPLY CHANGES here.
64 bytes from 172.30.0.1: icmp_req=7 ttl=63 time=26.2 ms
From 192.168.7.200 icmp_seq=8 Destination Host Unreachable
From 192.168.7.200 icmp_seq=9 Destination Host Unreachable
From 192.168.7.200 icmp_seq=10 Destination Host Unreachable^C
# ip route get 172.30.0.1
172.30.0.1 dev br0  src 192.168.7.200 
    cache <redirected>  ipid 0xf2ae

So you can see that, because the redirect from pfSense added a dynamic route directly to the IP, this machine loses connectivity. I don't see the same issue on Windows boxes, only on Linux boxes with a 3.x kernel.
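A quick way to check whether a Linux host has picked up such a redirected route is to look at "ip route get" for the remote address. Below is a small POSIX sh sketch that classifies that output; the function name is hypothetical, and matching on the word "redirected" assumes the 3.x-era output format shown above, which newer kernels may print differently.

```sh
# Hypothetical helper: classify the text printed by `ip route get DEST`.
# Reads stdin so it can also be run against saved output.
classify_route() {
  out=$(cat)
  case $out in
    *redirected*) echo redirected ;;   # route learned from an ICMP redirect
    *" via "*)    echo via-gateway ;;  # normal route through a gateway
    *)            echo direct ;;       # destination treated as on-link
  esac
}

# Usage on a live host:
#   ip route get 172.30.0.1 | classify_route
```

On the machine above, this would report via-gateway before "Apply Changes" and redirected afterwards.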

I have been able to fix this by running the following commands on those Linux machines (see the better fix in the last edit):
sysctl -w net.ipv4.conf.all.accept_redirects=0
sysctl -w net.ipv6.conf.all.accept_redirects=0
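The kernel's ip-sysctl documentation describes how the "all" and per-interface accept_redirects keys combine for IPv4, and the rule depends on whether forwarding is enabled on the interface. Below is a minimal POSIX sh sketch of that documented rule; the function name is hypothetical and the values are passed in by hand rather than read from a live system.

```sh
# Hypothetical helper modelling the documented IPv4 accept_redirects rule:
#   forwarding ON  -> BOTH conf/all and conf/<iface> must be 1
#   forwarding OFF -> EITHER conf/all or conf/<iface> being 1 is enough
# effective_accept_redirects FORWARDING ALL IFACE   (each 0 or 1)
effective_accept_redirects() {
  if [ "$1" -ne 0 ]; then
    echo $(( $2 && $3 ))
  else
    echo $(( $2 || $3 ))
  fi
}

# Feeding it live values (eth0 is an example interface name):
#   effective_accept_redirects \
#     "$(sysctl -n net.ipv4.conf.eth0.forwarding)" \
#     "$(sysctl -n net.ipv4.conf.all.accept_redirects)" \
#     "$(sysctl -n net.ipv4.conf.eth0.accept_redirects)"
```

If the documented rule applies to the kernel in question, then on a host with forwarding off a per-interface value of 1 (e.g. net.ipv4.conf.eth0.accept_redirects) keeps redirects enabled even when the "all" key is 0, so if the commands above do not seem to take effect it may be worth setting the per-interface key as well. The same documentation gives the default as TRUE for a host and FALSE for a router, which may be part of why the ip_forward "fudge" earlier in the thread appears to help; that is an inference, not something tested here.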

However, why pfSense is sending the redirects still needs to be investigated.

Edit: Added a tshark capture of what a ping looks like with accept_redirects disabled.

Capturing on eth0
  0.000000 192.168.7.203 -> 172.30.0.1   ICMP 98 Echo (ping) request  id=0x17cb, seq=38/9728, ttl=64
  0.000343 192.168.7.254 -> 192.168.7.203 ICMP 70 Redirect             (Redirect for host)
  0.026524   172.30.0.1 -> 192.168.7.203 ICMP 98 Echo (ping) reply    id=0x17cb, seq=38/9728, ttl=63
  1.001713 192.168.7.203 -> 172.30.0.1   ICMP 98 Echo (ping) request  id=0x17cb, seq=39/9984, ttl=64
  1.002096 192.168.7.254 -> 192.168.7.203 ICMP 70 Redirect             (Redirect for host)
  1.028305   172.30.0.1 -> 192.168.7.203 ICMP 98 Echo (ping) reply    id=0x17cb, seq=39/9984, ttl=63
  2.003490 192.168.7.203 -> 172.30.0.1   ICMP 98 Echo (ping) request  id=0x17cb, seq=40/10240, ttl=64
  2.003824 192.168.7.254 -> 192.168.7.203 ICMP 70 Redirect             (Redirect for host)
  2.029856   172.30.0.1 -> 192.168.7.203 ICMP 98 Echo (ping) reply    id=0x17cb, seq=40/10240, ttl=63
  3.005023 192.168.7.203 -> 172.30.0.1   ICMP 98 Echo (ping) request  id=0x17cb, seq=41/10496, ttl=64
  3.005366 192.168.7.254 -> 192.168.7.203 ICMP 70 Redirect             (Redirect for host)
  3.031629   172.30.0.1 -> 192.168.7.203 ICMP 98 Echo (ping) reply    id=0x17cb, seq=41/10496, ttl=63
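To check a longer capture like the one above, a small POSIX sh sketch can tally the echo requests, redirects and replies. The function name and the capture file are hypothetical, and it only looks for the strings that appear in this tshark text output. Equal counts of all three are what you would expect when the firewall keeps sending redirects but the host ignores them.

```sh
# Hypothetical helper: tally a tshark text capture read from stdin.
summarize_capture() {
  req=0 redir=0 rep=0
  while IFS= read -r line; do
    case $line in
      *"Echo (ping) request"*) req=$((req + 1)) ;;
      *"Echo (ping) reply"*)   rep=$((rep + 1)) ;;
      *Redirect*)              redir=$((redir + 1)) ;;
    esac
  done
  echo "requests=$req redirects=$redir replies=$rep"
}

# Usage with a saved capture:
#   summarize_capture < capture.txt
```

For the capture above it would report requests=4 redirects=4 replies=4.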

Another edit:

A better fix is to stop pfSense from sending the redirects in the first place. This doesn't break anything on my network as everything is static, so decide for yourself whether you need ICMP redirects:

Running this from the SSH console as root worked:

     sysctl -w net.inet.ip.redirect=0

BBcan177

I am having a similar issue with an Ubuntu Machine.

A Network 10.10.1.0/24

B Network 10.10.2.0/24

C Network 10.10.3.0/24

I have set up IPsec VPN tunnels A-B and A-C (all pfSense boxes).

I have an Ubuntu server on the A network and an Ubuntu machine on the B network.

When I ping/ssh from the Ubuntu machine on B to the A network, I get Host Unreachable/Destination Host Unreachable.
The Ubuntu machine can resolve the host and IP, as confirmed with a dig -x command. The Ubuntu machine on B can ping the local pfSense router and anything local or on the internet, but it can't ping anything on the A network, including the A router. All other devices have no issue; it's just this one Ubuntu machine.

I have no issue with connectivity between the A and C networks.

If I run this command on the Ubuntu machine in the B network:

sysctl -w net.ipv4.ip_forward=1

I can ping/ssh between A and B. The Ubuntu machine has one NIC plus two additional NICs for a TAP monitoring system, so they are set up as follows:

eth0      Link encap:Ethernet  HWaddr xx:xx:xx:xx:xx:xx
          inet addr:xx.xx.xx.xx  Bcast:xx.xx.xx.255  Mask:255.255.255.0
          inet6 addr: xxxx::xxx:xxxx:xxxx:xxxx/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:750659 errors:0 dropped:0 overruns:0 frame:0
          TX packets:460220 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:122675477 (122.6 MB)  TX bytes:409259079 (409.2 MB)
          Interrupt:19 Memory:f0180000-f01a0000

eth1      Link encap:Ethernet  HWaddr xx:xx:xx:xx:xx:xx
          UP BROADCAST RUNNING NOARP PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:4857110 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1017130172 (1.0 GB)  TX bytes:0 (0.0 B)
          Interrupt:16 Memory:f0280000-f02a0000

eth2      Link encap:Ethernet  HWaddr xx:xx:xx:xx:xx:xx
          UP BROADCAST NOARP PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
          Interrupt:16 Memory:f0300000-f0320000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:554233 errors:0 dropped:0 overruns:0 frame:0
          TX packets:554233 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:783922279 (783.9 MB)  TX bytes:783922279 (783.9 MB)

route -n

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         xx.xx.xx.xx     0.0.0.0         UG    100    0        0 eth0
xx.xx.xx.0      0.0.0.0         255.255.255.0   U     0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     1000   0        0 eth0

So with "sysctl -w net.ipv4.ip_forward=1" set, ping and ssh work, but the traceroute doesn't look as expected.
I don't understand how the machine is forwarding when only one NIC has an address.

PING xx.xx.xx.xx (xx.xx.xx.xx) 56(84) bytes of data.

From xx.xx.xx.xx: icmp_seq=1 Redirect Host(New nexthop: xx.xx.xx.xx)
64 bytes from xx.xx.xx.xx: icmp_req=1 ttl=63 time=46.8 ms

traceroute xx.xx.xx.xx (Traceroute from SO Sensor to SO Server)

traceroute to xx.xx.xx.xx (xx.xx.xx.xx), 30 hops max, 60 byte packets
1 xx.xx.xx.xx (xx.xx.xx.xx) 0.545 ms 0.532 ms 0.519 ms
2 * * *
3 * * *
4 * * *
5 * * *
6 * * *
7 * * *
8 * * *
9 * * *
10 * * *
11 * * *
12 * * *
13 * * *
14 * * *
15 * * *
16 * * *
17 * * *
18 * * *
19 * * *
20 * * *
21 * * *
22 * * *
23 * * *
24 * * *
25 * * *
26 * * *
27 * * *
28 * * *
29 * * *
30 * * *

There are no blocks in iptables, and UFW is set to allow the connectivity.
If anyone has any suggestions, I would appreciate it as I've tried several things to fix this issue without success.
