IPSec Issues 2.2.3 and 2.2.4
-
Hi folks,
I bought one of the pfSense-branded firewalls, the SG-4860, and I'm starting to regret it. I have used old computers with Intel NICs up until this one and never had these kinds of problems.
It arrived with 2.2.2. I set up an IPsec tunnel to my Palo Alto firewall and it just worked. I then bridged the remaining ports to make this behave like a home router (single WAN, multiple LAN ports). Did some throughput testing and easily saturated the 50 down / 3 up link I had.
Then came the update to 2.2.3. I was running AES-NI and, like everyone else, got no traffic over the tunnel. This was very frustrating because I had sent this home to another city with my CEO. I looked like a fool. He brought it back, I put it on the bench, and couldn't find anything wrong. I started looking around the forums and didn't find anything until last night: it looks like 2.2.3 broke IPsec when running AES-256, which is what I am running.
So I applied the 2.2.4 patch this evening. Traffic immediately came up over the tunnel, but I see strange throughput issues. Downloading a large ISO over the tunnel on the 50/3 pipe, it starts around 47-50 Mbit/sec, bounces around for roughly the first 400MB of the ISO, then drops to a solid, unwavering 7.6Mbit/sec until that transfer completes. If I start another one, it repeats the same behavior.
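To pin down exactly where in a transfer the rate collapses, it can help to log cumulative bytes at regular intervals and compute per-interval rates. The following is a minimal sketch, not something from this thread: the function names and the sample data are hypothetical, and collecting the `(seconds, cumulative_bytes)` samples (from a download tool or interface counters) is left out.

```python
def interval_rates_mbit(samples):
    """Per-interval throughput in Mbit/s.

    samples: list of (seconds, cumulative_bytes) pairs in time order.
    """
    rates = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        rates.append((b1 - b0) * 8 / (t1 - t0) / 1e6)
    return rates


def first_drop(samples, floor_mbit):
    """Cumulative bytes at the start of the first interval slower than
    floor_mbit, or None if no interval falls below it."""
    for (_, b0), rate in zip(samples, interval_rates_mbit(samples)):
        if rate < floor_mbit:
            return b0
    return None


if __name__ == "__main__":
    # Hypothetical samples: two seconds at 48 Mbit/s, then 7.6 Mbit/s.
    demo = [(0, 0), (1, 6_000_000), (2, 12_000_000), (3, 12_950_000)]
    print(interval_rates_mbit(demo))
    print(first_drop(demo, floor_mbit=20))
```

If the drop always lands at about the same byte offset regardless of elapsed time, that points at something counted per byte or per packet rather than something timer-driven.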
MBUF looks good (I'm set to 1,000,000), nothing else appears to be grossly wrong. Tunnel is pretty standard, AES-256, SHA1, P2's are the same, DH5.
I plug in my old core 2 box, also running 2.2.4, no issues at all, exactly the same tunnel.
I'm considering rolling the firewall back to 2.2.2, but there have been a couple of important security updates.
So are these SG-4860's just lemons? Or, having used old computers as firewalls for years, am I just spoiled by "plain reliable hardware"?
TIA
-
Certainly not lemons, they're your best bet for ensuring hardware-specific things will "just work" on every version. If you had good experiences using old recycled desktops, you're lucky. Generally that's where people have issues because of flaky old hardware.
The AES-NI regression in 2.2.3 was an unfortunate oversight, where all our AES-CBC test setups didn't have AES-NI, and all our production and test systems with AES-NI were only using AES-GCM. Our test procedures have been updated to ensure coverage of all the possible combinations there so such things won't recur, and it's of course fixed in 2.2.4.
There aren't any general performance issues with IPsec or AES-NI either. I have 100 Mb of bandwidth from home to office over Internet, with a 4860 at home, and can max that out consistently no issue.
The symptoms you describe sound like the situation where MSS clamping will help. Try enabling that on the IPsec advanced tab at 1400 and see what the behavior is like.
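For context on why a clamp near 1400 is suggested, here is a rough sketch of the ESP tunnel-mode overhead for AES-CBC with HMAC-SHA1. It assumes an IPv4 outer header with no options, no NAT-T (UDP encapsulation would add 8 bytes), a 16-byte IV, a 12-byte truncated ICV, minimal padding, and a 1500-byte path MTU. The function names are made up for illustration, and how the GUI value maps onto the actual TCP MSS is not covered here.

```python
def esp_tunnel_size(inner_len, iv_len=16, block=16, icv_len=12,
                    outer_ip=20, esp_hdr=8):
    """Size of the outer packet for an inner IP packet of inner_len bytes."""
    trailer = 2  # pad-length byte + next-header byte
    # Inner packet plus trailer is padded up to the cipher block size.
    padded = -(-(inner_len + trailer) // block) * block
    return outer_ip + esp_hdr + iv_len + padded + icv_len


def max_tcp_mss(path_mtu=1500, inner_headers=40):
    """Largest TCP MSS whose encapsulated packet still fits in path_mtu.

    inner_headers = 20-byte IPv4 header + 20-byte TCP header, no options.
    """
    inner = path_mtu
    while esp_tunnel_size(inner) > path_mtu:
        inner -= 1
    return inner - inner_headers


if __name__ == "__main__":
    print(max_tcp_mss())              # largest MSS that avoids fragmentation
    print(esp_tunnel_size(1400 + 40)) # outer size for a full 1400-byte segment
```

Under these assumptions the largest MSS that fits is 1398 bytes, and a full 1400-byte segment produces a 1512-byte outer packet, so a clamp in this neighborhood is the right order of magnitude but the exact value depends on the cipher, the hash, and whether NAT-T is in use.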
-
Just enabled from work, will repeat my testing from home tonight and report back.
-
Seeing the same thing, works fine for a bit, then flatlines just over 7.6Mbps. Rebooted it twice just to make sure.
-
Are you able to switch it to AES-GCM to see if that makes any difference? Depends on whether the other side supports it.
If not, I don't expect it'll make any difference, but try disabling AES-NI and see if that changes the behavior any.
-
No, only a aes128-CCM16 (nothing GCM). Otherwise just AES variants, 3DES. Is GCM all that and a bag of chips? I'm not familiar with it.
No difference with AES-NI disabled, if anything, a bit slower but same behavior (that was the first thing I tested on the new 2.2.4).
-
Those things were to try to narrow down the problem. That pretty much eliminates all those possibilities.
GCM is significantly faster with AES-NI than CBC. It's always preferable with AES-NI where available for that reason. It isn't an answer to a problem like that, just another thing that might possibly help narrow down the cause. Switching it to 3DES helps similarly in narrowing down the cause, though. With 3DES did it behave more or less the same? 3DES is much slower, though you probably have enough headroom on that CPU that it'll max out your connection fine.
Under System>Advanced, Networking, do you still have TSO and LRO disabled? Both are disabled by default. Enabling them can cause similar sounding issues.
Does your other hardware definitely not have issues still?
-
OK, I'll try 3DES tonight and report back.
Yes, TSO/LRO still disabled (on all three tunnels). Besides increasing MBUF buffers, doing the bridge, the unit is pretty much stock.
And yes, I have two other tunnels on generic hardware. One on a dell desktop, Core i5, 4gb ram, small SSD, running AES-NI with an Intel Quad port PCIe nic, just using two nics, still running 2.2.2 that I don't want to upgrade because it's a 2 hour flight away. My other one is what I use at home, it's a circa 2009 Core 2 duo, 2GB ram, old 80gb rotational disk using the onboard nic for lan (I think it's a lower end intel), and a single port PCI Intel GT nic. This one running 2.2.4. It does not have AES-NI support.
Both running full speed of their local connection capabilities (50/50, and 50/3 respectively). I can move around large files without issue/slowdown/etc. Neither have the MSS clamping set.
I did testing on the 4860 before it left me with the 2.2.2 firmware, and it had no issues. Is it reasonably easy to roll back to 2.2.2? That would answer if there is a hardware problem or something with the current point version of pfSense.
-
TSO and LRO are global to the system, not a per-tunnel config.
I thought from your earlier description you had an old system of some sort that you were swapping between at the same location and same config as the 4860, is that not the case? The ones running at other locations wouldn't be relevant.
In that case it's safe to downgrade to 2.2.2 for testing that circumstance. Can use the manual update under System>Firmware with:
https://files.pfsense.org/mirror/updates/old/pfSense-Full-Update-2.2.2-RELEASE-amd64.tgz
It will complain that your config revision is newer, but that's OK in this specific case.
-
That's correct, the Core 2 system is at my house, so is the 4860. I've been swapping back and forth between those two systems just to validate that there is nothing upstream or network related.
The third system is in the other city.
I'll try 3DES tonight, failing that, I'll put the 2.2.2 on.
-
Sorry, didn't get to testing last night, will try tonight.
-
Tested 3DES with and without MSS clamping, even worse throughput than AES, about 4.7Mbit/sec.
Tried AES-256, get about 9.6Mbit/sec.
Rolling back to 2.2.2 right now.
-
So yeah, there is something seriously wrong with builds 2.2.3 and 2.2.4 with respect to IPsec.
Rolled back to 2.2.2, and my throughput goes back to maxing out the circuit (50 down/3up). I've attached a screenshot proving this.
Given the "just work" on every version of pfSense claim for these SG boxes sold on the pfSense store, what options do I have here? Do I submit a ticket of some sort? It's easy to reproduce.
With regards,
-
No, only a aes128-CCM16 (nothing GCM). Otherwise just AES variants, 3DES. Is GCM all that and a bag of chips? I'm not familiar with it.
No difference with AES-NI disabled, if anything, a bit slower but same behavior (that was the first thing I tested on the new 2.2.4).
AES-CCM isn't a great mode for IPSec. In fact, the only support I can find in the FreeBSD kernel for it is in the wireless code, so I'm confused how you've configured to use it. (AES-CCM gets used a lot in 802.11.)
If you don't want to use AES-GCM, have you tried AES-CBC-128 with HMAC-SHA1, because that's the bog-standard "best practice" until you get concerned with the strength of SHA1 and a 128-bit key length.
In fact, I can't find any support for using AES-CCM in the IPsec subsystem in FreeBSD. Here are the auth and encryption tokens that 'setkey' will recognize. These are copy-pasta straight from the source code.
/* authentication alogorithm */
hmac-md5
hmac-sha1
keyed-md5
keyed-sha1
hmac-sha2-256
hmac-sha2-384
hmac-sha2-512
hmac-ripemd160
aes-xcbc-mac
tcp-md5
null
/* encryption alogorithm */
des-cbc
3des-cbc
null
simple
blowfish-cbc
cast128-cbc
des-deriv
des-32iv
rijndael-cbc
aes-ctr
camellia-cbc
Nor can I find any support in the GUI for AES-CCM.
BTW, the only modes registered with the AES-NI module are:
AES-CBC
AES-ICM
AES-GCM
AES-GHASH (128, 192, 256 bit)
AES-XTS
That said, AES-NI isn't going to help much for modes with a separate HMAC (basically all but AES-GCM) because the pass over the packet with the HMAC will dominate the time to encode/decode the packet before transmit/reception.
This is why AES-GCM is a 'win' with AES-NI.
I have ZERO doubt that 3DES is slower than AES.
Please send the output of "ipsec statusall". I don't suggest posting it here in the forum. Since you purchased these from the pfSense store, you have support. Open a ticket. If it's a bug that we've somehow missed, then I'll ensure that you don't "use" that ticket.
-
SG-2220 (yes, they do exist, C2358 2 cores @ 1.7GHz) at home.
C2758 (8 cores @ 2.4GHz) as VPN gateway at work.
Both running pfSense software version 2.2.4
1Gbps link from home, 1Gbps link at work, what happens between those two is good, but not ideal.
Jims-MacBook-Pro:~ jim$ ping -c 3 nfs4
PING nfs4.pfmechanics.com (172.27.32.4): 56 data bytes
64 bytes from 172.27.32.4: icmp_seq=0 ttl=61 time=4.352 ms
64 bytes from 172.27.32.4: icmp_seq=1 ttl=61 time=4.434 ms
64 bytes from 172.27.32.4: icmp_seq=2 ttl=61 time=4.860 ms
--- nfs4.pfmechanics.com ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 4.352/4.549/4.860/0.223 ms
Jims-MacBook-Pro:~ jim$ ssh nfs4
Last login: Sat Aug 1 15:48:30 2015 from 172.21.0.26
FreeBSD 10.1-RELEASE-p5 (GENERIC) #0: Tue Jan 27 08:55:07 UTC 2015
[jim@nfs4 ~]$ rm testfile
[jim@nfs4 ~]$ dd if=/dev/random of=testfile bs=1k count=200k
204800+0 records in
204800+0 records out
209715200 bytes transferred in 6.281192 secs (33387802 bytes/sec)
[jim@nfs4 ~]$ ls -l testfile
-rw-r--r-- 1 jim netgate 209715200 Aug 1 15:49 testfile
[jim@nfs4 ~]$ exit
logout
Connection to nfs4 closed.
Jims-MacBook-Pro:~ jim$ scp nfs4:testfile /tmp/testfile
testfile 100% 200MB 22.2MB/s 00:09
Jims-MacBook-Pro:~ jim$ tcsh
[Jims-MacBook-Pro:~] jim% repeat 10 sftp nfs4:testfile /dev/null
Connected to nfs4.
Fetching /usr/home/jim/testfile to /dev/null
/usr/home/jim/testfile 100% 200MB 25.0MB/s 00:08
Connected to nfs4.
Fetching /usr/home/jim/testfile to /dev/null
/usr/home/jim/testfile 100% 200MB 25.0MB/s 00:08
Connected to nfs4.
Fetching /usr/home/jim/testfile to /dev/null
/usr/home/jim/testfile 100% 200MB 16.7MB/s 00:12
Connected to nfs4.
Fetching /usr/home/jim/testfile to /dev/null
/usr/home/jim/testfile 100% 200MB 16.7MB/s 00:12
Connected to nfs4.
Fetching /usr/home/jim/testfile to /dev/null
/usr/home/jim/testfile 100% 200MB 18.2MB/s 00:11
Connected to nfs4.
Fetching /usr/home/jim/testfile to /dev/null
/usr/home/jim/testfile 100% 200MB 25.0MB/s 00:08
Connected to nfs4.
Fetching /usr/home/jim/testfile to /dev/null
/usr/home/jim/testfile 100% 200MB 28.6MB/s 00:07
Connected to nfs4.
Fetching /usr/home/jim/testfile to /dev/null
/usr/home/jim/testfile 100% 200MB 25.0MB/s 00:08
Connected to nfs4.
Fetching /usr/home/jim/testfile to /dev/null
/usr/home/jim/testfile 100% 200MB 25.0MB/s 00:08
Connected to nfs4.
Fetching /usr/home/jim/testfile to /dev/null
/usr/home/jim/testfile 100% 200MB 25.0MB/s 00:08
[Jims-MacBook-Pro:~] jim%
![Screen Shot 2015-08-01 at 4.06.37 PM.png](/public/imported_attachments/1/Screen Shot 2015-08-01 at 4.06.37 PM.png)
-
And a little longer test
[jim@nfs4 ~]$ dd if=/dev/random of=testfile bs=1k count=2000k
2048000+0 records in
2048000+0 records out
2097152000 bytes transferred in 64.266291 secs (32632224 bytes/sec)
[jim@nfs4 ~]$ ls -l testfile
-rw-r--r-- 1 jim netgate 2097152000 Aug 1 16:10 testfile
[jim@nfs4 ~]$ exit
![Screen Shot 2015-08-01 at 4.21.22 PM.png](/public/imported_attachments/1/Screen Shot 2015-08-01 at 4.21.22 PM.png)
-
and, now that I've recovered the nuc from last night's 1hr+ power hit at work…
Note that running across a LAN is faster, but no VPN.
jim@nucatwork:~ % sudo scp jim@nfs4:testfile /usr/local/www/apache24/data/
Password for jim@nfs4:
testfile 100% 2000MB 87.0MB/s 00:23
jim@nucatwork:~ %
![Screen Shot 2015-08-01 at 5.10.57 PM.png](/public/imported_attachments/1/Screen Shot 2015-08-01 at 5.10.57 PM.png)
-
@jwt:
AES-CCM isn't a great mode for IPSec. In fact, the only support I can find in the FreeBSD kernel for it is in the wireless code, so I'm confused how you've configured to use it. (AES-CCM gets used a lot in 802.11.)
I didn't, likely my post wasn't clear. Those are options on the other end of the tunnel, a Palo Alto Networks 3000 series. The response was to let CMB know what other options I have available to try. Although I appreciate your long response!
I sent an email to the support from the store asking how to use that support ticket, but I have not yet heard back. That was last week Monday when I sent it. I'd love to report back that I got great support on this hardware.
-
I don't doubt you are getting that on different hardware. I'm getting a lot better on some of my old home built hardware from the scrap heap. But not an apples to apples comparison.
Once I rolled back to 2.2.2 I'm getting reasonable performance from the tunnel. With nothing else changing except moving to 2.2.3 or 2.2.4 the tunnel fails to pass traffic and passes it terribly slow respectively. Not exactly "just works".
-
I don't doubt you are getting that on different hardware. I'm getting a lot better on some of my old home built hardware from the scrap heap. But not an apples to apples comparison.
It's pretty close, actually. I'm quite familiar with the SG-4860. If anything, the 2220 is slower, and that was the point. It's really straight-forward to get > 200Mbps using AES-GCM with AES-NI.
If I'd wanted to quote lab performance, I've seen > 1.5Gbps using fairly modern Xeons. But the SG-2220 is slower than what you're using.
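When comparing numbers in this thread, note that the scp/sftp output above is in MB/s while the tunnel rates are quoted in Mbit/s. A small conversion sketch follows; it assumes the "MB" in the scp/sftp progress meter means 2^20 bytes and that Mbit/s means 10^6 bits per second, and the function names are just for illustration.

```python
MIB = 1024 * 1024  # assumed size of the "MB" shown by scp/sftp


def mib_per_s_to_mbit(rate_mib_s):
    """Convert a file-transfer rate in MiB/s to link rate in Mbit/s."""
    return rate_mib_s * MIB * 8 / 1e6


def mbit_to_mib_per_s(rate_mbit):
    """Convert a link rate in Mbit/s to a file-transfer rate in MiB/s."""
    return rate_mbit * 1e6 / 8 / MIB


if __name__ == "__main__":
    for rate in (16.7, 22.2, 25.0, 28.6):
        print(f"{rate} MB/s is about {mib_per_s_to_mbit(rate):.0f} Mbit/s")
    print(f"7.6 Mbit/s is about {mbit_to_mib_per_s(7.6):.2f} MB/s")
```

On those assumptions, the 25.0MB/s sftp runs correspond to roughly 210 Mbit/s, which is where the "> 200Mbps" figure comes from.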
Once I rolled back to 2.2.2 I'm getting reasonable performance from the tunnel. With nothing else changing except moving to 2.2.3 or 2.2.4 the tunnel fails to pass traffic and passes it terribly slow respectively. Not exactly "just works".
Have you turned off AES-NI?