VPN Client Cannot Connect Through pfSense

EddieA

My wife has her company's VPN client on her laptop. Prior to switching to pfSense (1.2.3), I was using a Linux/iptables based firewall. With that, there were no problems, but now, with pfSense, she cannot connect. No special iptables rules were added for this; it just worked.

There are no firewall messages logged for either outgoing or incoming traffic.

I've attached all my NAT and Firewall rules.

I did find this article on Cisco VPN passthrough, but her client, which does use some Cisco components, doesn't have a Transport tab, nor any options that control the tunneling.

The other part of that advice intrigues me, because enabling AON generates a single Outbound NAT rule to replace the internally generated one(s) from IPSec passthrough. But, in reading this, about Static ports, it's saying that the Automatic option has some internal rules which ensure that certain ports, like UDP 500 for some VPN traffic, are NOT modified. So surely using AON, without any modification to the provided rule, would in fact break the VPN passthrough, because it does not use static ports?

What can I collect to try and see why this is not working? I cannot install any traffic sniffer on her laptop, so any tracing would be limited to the pfSense machine.

Cheers.

BTW If this forum is not the correct one, please feel free to move this post.

PortForward.png_thumb

OutBound.png_thumb

LANFirewall.png_thumb

WanFirewall.png_thumb

danswartz

Have you tried switching to manual NAT and enabling static port?

EddieA

Yes, and it still failed. But, I think I may need to try again tonight, because since posting my question, I found an article that mentioned that any existing states may override new changes. So, I need to clear them out, after each change I make.

Are you thinking that there are other ports that might have to remain "static"? If so, is there any way I can validate that?

Cheers.

danswartz

I had a VOIP-related issue that was fixed by using static ports. AFAIK, there is no downside to using static ports except for rare cases. Yes, I have been burned the same way with forgetting to flush the state table. Let us know!

EddieA

Well, that didn't help. :(

It still gets part of the way through the negotiation, and then, well, it fails.

What I did do though, was to run a packet capture, on the LAN interface, grabbing everything from the laptop host. Maybe, over the weekend, if I get time, I'll try and resurrect my Linux/iptables firewall, and grab a matching trace, to see if I can spot any anomalies. But, I'll admit that I'm not that good at reading packet traces. ;D

But, unless I can get this resolved, it isn't looking good for pfSense, as she needs to work from home a couple of days a week, and I can't pull everything off the network, to let her connect directly.

Changed NAT and Firewall rules posted.

Because I can't be 100% sure about the order in which I changed the rules, the only other test I want to try, first thing tomorrow, is my current rule set, but with the ICMP packets either allowed in or at least logged, as I've seen those cause issues with a different VPN in the past.

Cheers.

OutBound2.png_thumb

WanFirewall2.png_thumb

EddieA

OK, I set up a second copy of pfSense, to run some tests. The only changes, from the default setup, were as follows:

On the WAN interface, I unchecked both "Block Private Networks", and "Block Bogon Networks".

On the Outbound NAT, I selected "Manual Outbound NAT rule generation (Advanced Outbound NAT (AON))", and then in the generated rule, I changed "Static Port" to YES.

No other rules were added or changed.

My wife's VPN still would not connect. Is there ANYTHING else that pfSense changes as the packet traverses the firewall, apart from what NAT requires, that can be controlled and that might be causing this?

I did a full packet capture, on the LAN interface, of the connection attempt, with pfSense as the firewall.

I then switched back, to my "old" Linux firewall, using iptables, and repeated the same packet capture, when she was able to connect, without any issues.

If anyone, with a little more knowledge of ISAKMP negotiation would like to take a quick look, at the captures, I've posted them both on my ftp server: ftp.BogoLinux.net/pub/pfSense.cap and ftp.BogoLinux.net/pub/iptables.cap

I'd love to get to the bottom of this, as pfSense gives me way more control over my firewall than iptables ever did.

Or, should I wait a couple of days, and if I don't get an answer here, take it to the mailing list?

Cheers.

danswartz

Yuck, that is a pain :( Looked at both traces, but am not an IPSEC expert, so didn't mean a lot to me. Does anything show up as blocked in the filter log?

danswartz

A further thought: I assume the trace was captured on the LAN? Can you try on the WAN and see what is getting through outbound?

EddieA

@danswartz:

Yuck, that is a pain :( Looked at both traces, but am not an IPSEC expert, so didn't mean a lot to me. Does anything show up as blocked in the filter log?

Nope, the firewall log was completely blank. That was one of the first things I checked, when I started this exercise.

@danswartz:

A further thought: I assume the trace was captured on the LAN? Can you try on the WAN and see what is getting through outbound?

I can, but in order to make the log readable, I'd have to shut down/disconnect the other machines in the house. I guess that might be the next step.

Cheers.

danswartz

Not necessarily. From what I can see, you are never getting the tunnel up, due to a failure in the ISAKMP stuff, which is all related to UDP port 500, so if you captured only based on that…

EddieA

@danswartz:

Not necessarily. From what I can see, you are never getting the tunnel up, due to a failure in the ISAKMP stuff, which is all related to UDP port 500, so if you captured only based on that…

True. I might just limit it to UDP though, just in case it's something really funky, and the Static port option isn't working correctly.

The one thing I did notice, going back through the traces, is that the "failure" occurs after the VPN has sent a fragmented packet out, and it doesn't get a reply. Packets 43, 44 for pfSense, and 79, 80, for iptables. Maybe the "scrub" option is breaking things, although the fragment bits appear to be correctly set. It's something else to try tonight, when she gets home again.

Cheers.

danswartz

Ah, I think this is a false alarm. If you are using wireshark (as I was), the fragments are NOT considered UDP, but vanilla IP, so they will not show up. If you remove the UDP filter, you will (I think) see those frames. This is another reason to try sniffing the WAN, I guess.
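A capture limited to "udp port 500" would miss the trailing fragments for the same reason, since only the first fragment carries the UDP header. A minimal sketch of a capture filter that keeps both, assuming the capture tool accepts standard pcap-filter (BPF) syntax; the function name and the example host address are hypothetical, and port 4500 is included only on the assumption that NAT traversal may be in use:

```python
def isakmp_capture_filter(host=None, ports=(500, 4500)):
    """Build a pcap-filter expression for IKE traffic plus any IPv4 fragment."""
    port_expr = " or ".join(f"udp port {p}" for p in ports)
    # ip[6:2] is the 16-bit flags + fragment-offset field of the IPv4 header.
    # Masking with 0x3fff keeps the More Fragments bit and the 13-bit offset,
    # so the test matches first, middle, and last fragments alike.
    frag_expr = "ip[6:2] & 0x3fff != 0"
    expr = f"({port_expr}) or ({frag_expr})"
    if host:
        expr = f"host {host} and ({expr})"
    return expr


# Hypothetical laptop address, for illustration only.
print(isakmp_capture_filter("192.0.2.10"))
```

The resulting string can be pasted into whatever capture tool is in use, which keeps the trace readable without shutting down the other machines on the network.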

EddieA

@danswartz:

Ah, I think this is a false alarm. If you are using wireshark (as I was), the fragments are NOT considered UDP, but vanilla IP, so they will not show up. If you remove the UDP filter, you will (I think) see those frames. This is another reason to try sniffing the WAN, I guess.

I wasn't filtering, and I did see all the packets.

OK, next experiment. I ran, matching, traces on both the LAN and WAN interfaces, again for both pfSense and iptables. The relevant capture files can be found, on my FTP server, ftp.BogoLinux.net/pub, for those interested.

Concentrating on the packet(s) sent that fail to elicit a response, here's what I see. Note that all interpretation of the packets was done using WireShark.

On the LAN side, a UDP request of 2220 bytes was sent, which was spread over two packets. The first, was identified by WireShark as an IP packet, and contained 1280 bytes of data. The "More fragments" bit is set. The second packet, was identified as UDP/ISAKMP, containing 940 bytes, with no fragmentation bits set. WireShark also shows the reassembled data. Obviously, this is identical for both pfSense and iptables.

On the WAN side, is where things get different, obviously, as one works, and the other doesn't. ;D

For iptables, what I see, are 5 packets. The 1st 4 are all identified as IP packets, each containing 552 bytes of data, and all have the "More fragments" bit set. The 5th packet, is identified as UDP/ISAKMP, with 12 bytes of data, and no fragmentation bits set. Again, WireShark also shows the reassembled data.

Now, for pfSense. Here there are just 2 packets. The 1st one is identified as UDP/ISAKMP, with 1480 bytes of data, and the "More fragments" bit set. The 2nd packet is identified as IP, with 740 bytes of data, and no fragmentation bits set. WireShark does not show reassembled data.

So, why has WireShark interpreted the 2 packets on the WAN side, for pfSense, the opposite way around, so that it appears not to realise that they are fragments and should be reassembled? I checked through all the headers, but was unable to see anything different that might cause this behaviour. Now, if WireShark has mis-interpreted the packets, is it also possible that the receiving VPN server has made the same "mistake"?

As a final test, I set "Disable firewall scrub", as it looks like the "scrub" also tries to re-assemble fragments. That reloaded the rules; I then cleared the State Table for the IP, on the LAN, trying to connect, and tried again. This still didn't work. Would that be sufficient to reset the "scrub" rules, or should I really have re-booted, to make sure everything was "clean"? Unfortunately, I didn't have a trace running for that last attempt.
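For anyone comparing the headers by hand, the fields that mark a packet as a fragment are the identification value, the More Fragments flag, and the fragment offset. A minimal sketch that reads them from a raw IPv4 header, assuming the bytes start at the IP header; the helper names and the synthetic headers are hypothetical, and this only classifies fragments, it does not explain WireShark's display:

```python
import struct


def fragment_info(ip_header: bytes):
    """Return (identification, more_fragments, offset_bytes, protocol)."""
    if len(ip_header) < 20:
        raise ValueError("need at least 20 bytes of IPv4 header")
    ident, flags_frag = struct.unpack("!HH", ip_header[4:8])
    more_fragments = bool(flags_frag & 0x2000)  # MF bit
    offset_bytes = (flags_frag & 0x1FFF) * 8    # offset is stored in 8-byte units
    protocol = ip_header[9]                     # 17 means UDP
    return ident, more_fragments, offset_bytes, protocol


def describe(ip_header: bytes) -> str:
    _, mf, offset, _ = fragment_info(ip_header)
    if not mf and offset == 0:
        return "unfragmented"
    if offset == 0:
        return "first fragment (carries the UDP header)"
    return "middle fragment" if mf else "last fragment"


def make_header(ident, mf, offset_bytes, total_len):
    """Build a synthetic 20-byte IPv4/UDP header for illustration."""
    flags_frag = (0x2000 if mf else 0) | (offset_bytes // 8)
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, total_len, ident, flags_frag,
                       64, 17, 0, bytes([192, 0, 2, 10]), bytes([198, 51, 100, 1]))


# Shaped like the two pfSense WAN packets: 1480 and 740 bytes of data.
print(describe(make_header(1, True, 0, 1500)))
print(describe(make_header(1, False, 1480, 760)))
```

Fragments of one datagram share the same identification value, so comparing that field across the two WAN packets is a quick check that they belong together.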

Since then, I have rebooted, so guess the only test left, is to repeat the "Disabled scrub" one, this time with a trace running. Unless anyone has any other ideas to try.

Cheers.

EddieA

OK, because I suspected that "scrub" wasn't completely removed in the tests I did previously, I re-booted pfSense and re-ran the traces. This still failed.

BUT, after disabling "scrub", the fragmented packets are no longer NATted. ??? Yeah, that's right. Any packets passing through pfSense, that are not fragmented, are NATted. The fragmented packets are NOT. Take a look at the screenshot, from Wireshark. Packets 20 ->23 and 27 -> 31. WTF

Should I pursue this on this thread, or start a new one to deal with this no-NAT?

Cheers.

NoNAT.png_thumb

kc8apf

I've been pulling my hair out over the exact same issue. This worked perfectly fine under 1.2.2. I'm not sure what changed in 1.2.3 that caused it to break.

For my setup, I see the entire key exchange happen successfully on port 500 and then my laptop sends a fragmented UDP packet on port 4500. The remote side never seems to acknowledge that packet. If I connect directly to the WAN, everything seems to work fine.

I also saw the lack of NAT if scrub is disabled. I believe this is just part of how pf works. In order for NAT to be applied to the packet, it needs to reassemble the fragmented packet first. Scrub has multiple options, one of which is for reassembling fragmented packets. pfSense seems to turn on 'fragment reassemble' and 'random-id' by default.

kc8apf

FWIW, I happen to have a setup and enough gear such that I could set up a parallel pfSense installation for testing. Modifying the config for slight differences in hardware (fxp instead of em, more VLANs since only 2 NICs instead of 5), I found that 1.2.3 does work correctly. So it seems to be related to my hardware or possibly the em driver. It looks like there are some reports of problems with checksum offloading in the em driver in FreeBSD 8.0, but it isn't clear if they apply to earlier versions as well.

After a very hectic night of upgrading my live system to pfSense 2.0-beta1 and then reinstalling 1.2.3-RC1 from scratch to rule out driver problems, it seems that I might have a hardware issue. With 2.0, I was able to connect to my work VPN, but I saw lots of other odd issues such as corrupt packets and the inability of one of my hosts to get a single packet through pf even though the rules allowed it.

I remember the system being 1.2.2 before I did the upgrade to 1.2.3, but 1.2.2 doesn't recognize the em devices in my system. I remember having that experience before, so it's very possible that I was using 1.2.3-RC1 originally.

With 1.2.3-RC1, I'm still seeing some corrupt packets that are blocked by pf due to bad TCP headers. It appears that the packet is being misparsed, so the header data ends up being completely wrong, which leads to a few corrupt packet entries for a single packet. I did notice that my system was a bit hot to the touch (it's a fanless design, and I've had some concerns about its heat dissipation), so I enabled powerd. That seemed to lower the temperature (at least by feel), but I still get occasional corrupt packets. I had previously experimented with powerd, but I don't believe it would have been active after a reboot in the original configuration. I have been unable to VPN successfully with this setup, which is very odd since my original configuration was able to.

I've ordered another of the same hardware config so I can rule out flakey hardware (and it'd be handy to have as a spare anyway). It should arrive in a few days. I'll see what happens in that case. I'll also be trying to figure out a better heat dissipation system given the machine's installed location. More to follow.

danswartz

Interesting. Let us know what shakes out of this.

kc8apf

I'm still waiting for the spare machine to arrive. In the meantime, I happened to have a fan handy so I tried cooling the live machine to see if that had any effect. The machine is now cool to the touch. I still get the occasional corrupt packets with bad TCP headers and cannot get VPN to connect.

A few other interesting observations:

- The packet corruption seems to only occur on subnets that are bridged (specifically, em0 is bridged with vlan 2 on em3 and em1 is bridged with vlan 2 on em2).
- Turning off txcsum, rxcsum, and vlanhwtag on all interfaces has no effect.
- netstat -s and the driver stats don't show any problems other than the bad hdrs.

The spare machine should arrive tomorrow. I need to find a hub or another machine with 2 NICs to do tcpdump from a machine other than the pfSense machine, since it's possible that the problem is happening in the NIC.

kc8apf

Looks like the corrupt packets aren't really corrupt after all. Just a mistake on my behalf. The default snaplength for tcpdump is 64 bytes which isn't necessarily enough for the packets that show up on pflog0. Adding -s 256 seems to have cleaned those up.
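A truncated capture can be spotted by comparing the number of bytes actually captured with the length the IP header claims. A rough sketch, assuming the captured bytes start at the IPv4 header (on pflog0 a pflog header comes first, which this sketch ignores); the function name is hypothetical:

```python
import struct


def is_truncated(captured: bytes) -> bool:
    """True if fewer bytes were captured than the IPv4 total-length field claims."""
    if len(captured) < 4:
        return True  # not even enough bytes to read the length field
    total_len = struct.unpack("!H", captured[2:4])[0]
    return len(captured) < total_len


# A header claiming a 1500-byte packet, of which only 64 bytes were kept.
snippet = (struct.pack("!BBH", 0x45, 0, 1500) + bytes(60))[:64]
print(is_truncated(snippet))
```

Any decoder working on such a snippet runs out of data partway through the headers, which is consistent with the bogus "bad header" entries seen before the snap length was raised.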

So, I've done all I can with the live system without perturbing the family.

EddieA

I really haven't had any time to progress on this, unfortunately. However, I'm hoping that this weekend, I can try again.

But, I think I now know what the issue is, why I can make the connection using Linux iptables, and cannot with pfSense, which I want to confirm with some traces, and "experimentation".

I understand that without scrub reassembling the fragmented packets, it can't do NAT correctly, and looking at the traces for iptables, I'm assuming that it has to do reassembly in order to recalculate the checksums after changing the source IP.
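The link between the source IP and the checksum comes from the UDP pseudo-header: the checksum covers the source and destination addresses as well as the whole UDP segment, and it lives in the UDP header, which only the first fragment carries. A minimal sketch of the standard calculation, with made-up addresses and payload; it illustrates why rewriting the source address changes the checksum and makes no claim about how pf or iptables actually implement the update:

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, complemented (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def udp_checksum(src_ip: str, dst_ip: str, udp_segment: bytes) -> int:
    """Checksum over pseudo-header + UDP segment (checksum field must be zero)."""
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 17, len(udp_segment)))
    return internet_checksum(pseudo + udp_segment) or 0xFFFF


# Hypothetical ISAKMP-style datagram: port 500 to port 500, dummy payload.
payload = b"example payload"
segment = struct.pack("!HHHH", 500, 500, 8 + len(payload), 0) + payload

before_nat = udp_checksum("192.168.1.10", "198.51.100.1", segment)
after_nat = udp_checksum("203.0.113.5", "198.51.100.1", segment)
print(hex(before_nat), hex(after_nat))
```

The two values differ even though the UDP segment itself is unchanged, so a NAT device has to touch the first fragment's UDP header whenever it rewrites the address.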

However, it's what happens after this reassembly where iptables and pfSense differ. I know I say pfSense, but I also know it's the underlying FreeBSD that's really in control here, and there's not a lot pfSense can do to influence this.

In my case, the VPN Client sends 2220 bytes of data, split as 1280 and 940. Now, that gives, on the wire, 1300 and 960. Hmmmmmm, 1300 is the "standard" MTU for Cisco VPNs, isn't it?

Anyway, back to the saga. After traversing iptables, the outbound packets are EXACTLY the same size. However, with pfSense, the packets are split, based on the WAN MTU of 1500, or that's what I'm guessing, because what I have is 1500 and 760, on the wire, which gives 1480 and 740 as the data.

So, it looks like iptables "remembers" how the packets were fragmented, and ensures that the outgoing packets are exactly the same size. Obviously, if the WAN MTU is lower, then the packets are fragmented based on that instead, which is what I saw on my initial trace of iptables.
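The fragment sizes seen in the traces follow directly from the MTU: each fragment except the last carries the largest multiple of 8 bytes that fits after the 20-byte IP header. A small sketch of that arithmetic, assuming a 20-byte header with no IP options; the function name is hypothetical, and the 572-byte MTU is only inferred from the 552-byte fragments in the first iptables trace:

```python
def fragment_payload_sizes(ip_payload_len, mtu, ip_header_len=20):
    """Data bytes carried by each fragment of one IP datagram."""
    # Fragment offsets are counted in 8-byte units, so every fragment
    # except the last must carry a multiple of 8 bytes.
    max_data = (mtu - ip_header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    sizes = []
    remaining = ip_payload_len
    while remaining > max_data:
        sizes.append(max_data)
        remaining -= max_data
    sizes.append(remaining)
    return sizes


print(fragment_payload_sizes(2220, 1300))  # [1280, 940]  client / LAN side
print(fragment_payload_sizes(2220, 1500))  # [1480, 740]  pfSense WAN side
print(fragment_payload_sizes(2220, 572))   # [552, 552, 552, 552, 12]  first iptables WAN trace
```

All three results match the data sizes reported from the captures, which supports the idea that pfSense re-fragments the reassembled datagram to the WAN MTU rather than preserving the original 1280/940 split.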

Is there any way I can make pfSense/FreeBSD replicate this behaviour? I'm going to "force" it, by dropping the MTU on the WAN to 1300, to see if the VPN will connect. But then this change will affect all packets, not just these particular ones.

I wasn't able to validate this "theory", by using ping, as the target IP does accept a 1472 byte payload, with the "don't fragment" option set. But I don't know where, at the destination, the packets are reassembled, and what the MTU is at the point this happens.
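For reference, the 1472-byte figure is simply a 1500-byte MTU minus the IP and ICMP headers. A tiny sketch of the arithmetic, assuming a 20-byte IPv4 header with no options; the function name is hypothetical:

```python
IP_HEADER = 20    # IPv4 header without options
ICMP_HEADER = 8   # ICMP echo request header


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented packet."""
    return mtu - IP_HEADER - ICMP_HEADER


print(max_ping_payload(1500))  # 1472
print(max_ping_payload(1300))  # 1272
```

A successful 1472-byte ping with "don't fragment" set only shows that unfragmented 1500-byte packets reach that address; it says nothing about how the far end treats datagrams that arrive already fragmented.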

I'll post back here, once I've run those tests.

Cheers.