IPSec goes down with high throughput…



  • Until recently, everything was fine. Then I decided to upgrade my FiOS link from 15/5 to 50/15 Mbit/s down/up.
    As mentioned elsewhere, that FiOS link carries all my IPv4 internet traffic to a colocation service, where I have another box that acts as a gateway for my old class-C net. So all internet traffic goes over an IPSec tunnel from my place to the colocation where it gets out to the real internet.

    So until that speed upgrade, everything was fine and dandy, and quite reliable. Since the speed upgrade, as long as I just browse the net, things are mostly OK. The moment I start doing anything that even comes close to using the available bandwidth, the IPSec link collapses, or rather, stops carrying data. On the dashboard the link continues to show as up, and racoon is still shown as running.

    I was searching high and low, looking at rules, Snort auto-blocking, etc. as reasons why my data stopped flowing, until almost by accident I noticed that things start working again if I cycle racoon by clicking the restart service button.

    This is now very easy to recreate: start a massive file transfer, and within seconds, or at most a couple of minutes, the link is dysfunctional. Restart racoon, and as quickly as an IPSec connection can be renegotiated, things are working again.

    So that poses a few questions:
    a) Why would pushing the throughput make the transport unstable? If the CPU is maxed out, that should just create a speed bottleneck, not bring things to a grinding halt.
    b) Why do both racoon and the link show up as "up" on the dashboard when no more data is able to flow?
    c) Does anyone else have similar or related issues?
    d) Any advice on how to get useful system feedback that would help narrow down what's really going on here?

    Thanks!



  • You offer very little technical information, for someone to try and diagnose the issue.

    Anyway, my first instinct would be to check for MTU issues …
    Go to pfsense > System > Advanced, on the Misc tab "enable MSS clamping"
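    As a rough illustration of why MSS clamping matters over IPsec, the sketch below estimates the largest TCP MSS that fits through an ESP tunnel without fragmentation. It assumes IPv4 without options, AES-CBC (16-byte IV and block) with HMAC-SHA1-96 (12-byte ICV), and optional NAT-T UDP encapsulation; the function name and defaults are illustrative, not anything pfSense exposes.

```python
def esp_tunnel_mss(link_mtu: int = 1500, nat_t: bool = False,
                   iv_len: int = 16, block_len: int = 16,
                   icv_len: int = 12) -> int:
    """Estimate the largest TCP MSS that fits in one ESP tunnel-mode packet.

    Assumes IPv4 without options on both inner and outer headers,
    a CBC-mode cipher (iv_len/block_len) and a truncated HMAC (icv_len).
    """
    outer_ip = 20                 # outer IPv4 header added by tunnel mode
    udp = 8 if nat_t else 0       # UDP header for NAT traversal
    esp_header = 8                # SPI (4 bytes) + sequence number (4 bytes)
    budget = link_mtu - outer_ip - udp - esp_header - iv_len - icv_len
    # The encrypted part (inner packet + padding + pad length + next header)
    # must be a whole number of cipher blocks, so round the budget down.
    encrypted = (budget // block_len) * block_len
    inner_packet = encrypted - 2  # minus the pad-length and next-header bytes
    return inner_packet - 20 - 20  # minus inner IPv4 and TCP headers


print(esp_tunnel_mss())            # 1398
print(esp_tunnel_mss(nat_t=True))  # 1382
```

    Under these assumptions, full-size 1460-byte segments from the LAN would not fit, which is what clamping is meant to prevent; a clamp at or a bit below such an estimate leaves some margin for cipher choices this ignores.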



  • @dhatz:

    That setting is already set.

    Anyway, I'm happy to supply whatever info would be helpful, just don't know what that would be. Anything specific?



  • I think this has less to do with pfsense and more to do with your internet.
    We know that FiOS is one side of the connection. Who provides the other side?

    One thing I've noticed about internet companies in the USA (and other places): the closer you get to actually using what you pay for, the more disconnects, throttling and random drop-outs you get. It seems "getting your money's worth" often violates "fair use".


  • ALIX?

    If so, it's possible that the box does not have the CPU power to push traffic that fast over IPsec and when it tries, the CPU maxes out, then it can't respond to DPD queries, and then the tunnel dies.
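    To make this failure mode concrete, here is a toy model (not racoon's actual code): each Dead Peer Detection probe is either acknowledged within a retry timeout or counted as a miss, and after max_fail consecutive misses the peer is declared dead. The parameter names only loosely mirror racoon's dpd_retry and dpd_maxfail settings, and the latency figures are made up for illustration.

```python
from typing import Optional, Sequence


def dpd_declares_dead(ack_latencies: Sequence[Optional[float]],
                      retry_timeout: float, max_fail: int) -> Optional[int]:
    """Return the probe index at which the peer is declared dead, or None.

    ack_latencies[i] is how long the peer took to answer probe i,
    or None if no answer came back at all.
    """
    misses = 0
    for i, latency in enumerate(ack_latencies):
        if latency is None or latency > retry_timeout:
            misses += 1           # a late answer counts the same as no answer
            if misses >= max_fail:
                return i
        else:
            misses = 0            # any timely answer resets the counter
    return None


# Hypothetical: a peer that is alive but too busy to answer in time.
busy_peer = [0.01, 6.0, 7.5, None, 8.0, 9.0]
print(dpd_declares_dead(busy_peer, retry_timeout=5.0, max_fail=5))  # 5
```

    The point of the model is that, under a rule like this, a peer that is merely overloaded looks exactly like one that is gone.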



  • Some more technical info:

    On the remote end is a ZyWALL P1, which is maybe a bit overwhelmed by the new speeds. But it doesn't need to be cycled for things to come back up. As for the other end of the internet connection: it's a commercial colocation service with OC3 links and who knows what else from several providers, so it's rather well connected to the internet backbone, and I have a 100 Mbit Ethernet link into their infrastructure there. So I don't suspect that side's network of being the problem (of course stranger things have happened, so it can't be 100% excluded).

    On the other hand, my pfSense unit should be plenty powerful, with a dual-core, hyperthreaded Atom D510 chip and 4GB RAM. I haven't really seen its CPU break a sweat so far, and if I were willing to pay more, the FiOS link would be able to sustain considerably higher speeds both up and down.

    Frankly, I don't even really need encryption, I just need some sort of way of tunneling the info there, and IPSec is the only reasonable thing my ZyWALL allows. It would support the NULL "encryption" method, but the pfSense box doesn't support that, otherwise I could cut out all the encryption overhead.

    Eventually, a second pfSense box will go to the remote end, but since any major problem would require me to take a plane there to fix things, I'd rather hold off on putting one there until 2.1 is released, OS upgrades are no longer "beta", and they come less often.

    Is there a way to throttle the throughput of the IPSec "interface"? All I really care about is the 15 Mbit/s upload speed, so e.g. throttling download speeds from 50 Mbit/s down to a symmetric 15 Mbit/s wouldn't bother me at all.

    Still, it's a bit disturbing that there isn't some sort of flow control, and that the protocol can be busted that easily just by increasing bandwidth. That would be somewhat of a design flaw in the IPSec protocol, if that's what's happening.

    ---

    I put a limiter rule on the WAN interface for packets from the remote peer to the LAN subnet, and set it to 15 Mbit/s.
    Hope I did it right. I'll see if that helps; the next time I do more than just browse the web, it will show...
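    For intuition on what a 15 Mbit/s cap does to a faster flow, here is a generic token-bucket sketch. pfSense limiters use dummynet, which is not implemented exactly like this, so treat it as an illustration of rate capping in general; the class and the 15 kB burst size are arbitrary choices.

```python
class TokenBucket:
    """Generic token-bucket rate limiter (illustrative, not dummynet)."""

    def __init__(self, rate_bits_per_s: float, burst_bytes: float) -> None:
        self.rate = rate_bits_per_s / 8.0   # refill rate in bytes per second
        self.capacity = burst_bytes         # largest burst allowed through
        self.tokens = burst_bytes           # start with a full bucket
        self.last = 0.0                     # time of the previous call

    def allow(self, size_bytes: int, now: float) -> bool:
        # Refill for the time elapsed since the last packet, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size_bytes:
            self.tokens -= size_bytes       # packet conforms: pass it
            return True
        return False                        # over the rate: drop or delay it


# Offer 1500-byte packets at 120 Mbit/s for one second through a 15 Mbit/s bucket.
bucket = TokenBucket(15e6, burst_bytes=15_000)
passed = sum(1500 for i in range(10_000) if bucket.allow(1500, now=i * 1e-4))
print(passed)  # roughly 1.89 MB: about 15 Mbit/s plus the initial burst
```

    The excess traffic is simply refused rather than slowing the sender down by itself; TCP inside the tunnel then reacts to the losses and backs off.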



  • I've heard that some IPSec implementations have issues with Dead Peer Detection under heavy load. That's probably why everyone turns it off.



  • The probable answer for you is to use pfSense 2.1, since I've been told that it handles WAN IP changes and temporary internet drops between the pfSense box and the remote clients better than 2.0.3 or 2.0.2.

    You don't need heavy bandwidth usage to reproduce these issues you are seeing.
    I get the exact same thing at much lower speeds on 2.0.3 any time there is a short disconnect followed by a reconnect.

    I'd try 2.1.



  • @Klaws:

    I've heard that some IPSec implementations have issues with Dead Peer Detection under heavy load. That's probably why everyone turns it off.

    I turned it off now, but since it's daytime I don't get the high, bandwidth-saturating speeds right now, so it's hard to say if that fixes it. I do get occasional stalls. Maybe these would have been failures before and are now recovering after a second or two…



  • @kejianshi:

    Actually, I'm on 2.1, which is why I'm posting here ;)

    But in my vain attempts to throttle the bandwidth with limiters, I noticed that none of my LAN or WAN interfaces show up in the Traffic Shaper page, and if I try to run any of the wizards, they all complain that I have not enough interfaces, even though I tell the system that I have only one LAN and one WAN interface.
    So something's wonky there, too, besides what mistakes I might otherwise make trying to set up the bandwidth limiting.



  • Well - That is unfortunate.  I was assured it was taken care of in 2.1.  haha.
    I don't think it's caused by bandwidth.
    Question: what do you see in Status > System Logs > IPsec?



  • @kejianshi:

    Well, turning off DPD seems to make the issue less acute, but still…
    ...when I have it turned on to more easily trigger the issue, there's nothing of relevance in the log: the initial negotiation, and then a bunch of these errors, which are also there when things work just fine:

    racoon: [12.34.56.78] ERROR: exchange Identity Protection not allowed in any applicable rmconf.
    

    (IP address changed)
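    For question d), one low-tech way to get feedback is to compare the IPsec log from a period when traffic flows with one from a period when it has stalled: messages that only appear in the stalled window are the interesting ones. A minimal sketch, assuming the log has been copied out as plain-text lines in the racoon format quoted above (any syslog timestamp prefix is skipped by the search); the regex and function names are my own.

```python
import re
from collections import Counter
from typing import Iterable

# Matches e.g. "racoon: [12.34.56.78] ERROR: exchange ... rmconf."
# The "[peer]" part is optional because not every racoon line has one.
RACOON_RE = re.compile(
    r"racoon: (?:\[(?P<peer>[^\]]+)\] )?(?P<level>[A-Z]+): (?P<msg>.*)")


def summarize(lines: Iterable[str]) -> Counter:
    """Count racoon messages by (level, message text)."""
    counts: Counter = Counter()
    for line in lines:
        m = RACOON_RE.search(line)
        if m:
            counts[(m.group("level"), m.group("msg").strip())] += 1
    return counts


def only_when_stalled(working: Iterable[str], stalled: Iterable[str]) -> Counter:
    """Messages seen more often in the stalled window than in the working one."""
    return summarize(stalled) - summarize(working)


noise = ("racoon: [12.34.56.78] ERROR: exchange Identity Protection "
         "not allowed in any applicable rmconf.")
working_log = [noise, noise]
stalled_log = [noise, "racoon: INFO: hypothetical message seen only during a stall"]
print(only_when_stalled(working_log, stalled_log))
```

    The "Identity Protection" error that also shows up when things work cancels out in the subtraction, so only the hypothetical stall-only message remains.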

    As for bandwidth not being the problem, that may be the case, except that an increase in bandwidth triggered the issue; I didn't have it before. So it's likely some sort of interaction between a somewhat stressed ZyWALL P1, which is being pushed by the higher bandwidth demand, and the pfSense software. Who/what/which/why is of course the question: just because it takes some circumstance to trigger bad behavior doesn't mean that circumstance is the true cause.



  • @rcfa:

    Well, turning off DPD seems to make the issue less acute, but still…

    Last night (with less traffic elsewhere to naturally slow down my link), things went down quickly regardless of DPD being turned on or off. So I can remove that from the list of potential culprits.

    Also, things are a bit more stable from my WiFi connected laptop, because it seems the WiFi network acts somewhat like a throttle, meaning the bandwidth is less likely to be pushed up where things go down. But of course that's no solution.

    Have yet to get bandwidth limiters working. :(



  • Is there some reason I'm missing why you can't use site-to-site OpenVPN for this and MUST use IPsec?

    Inside OpenVPN on the client side there is a handy little box:

    Limit outgoing bandwidth  (insert bandwidth limit)
    Maximum outgoing bandwidth for this tunnel. Leave empty for no limit. The input value has to be something between 100 bytes/sec and 100 Mbytes/sec (entered as bytes per second).
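    Since that field takes bytes per second, a 15 Mbit/s cap has to be converted first. A small helper, assuming decimal units (1 Mbit = 1,000,000 bits) and reading the quoted 100 bytes/sec to 100 Mbytes/sec range as 100 to 100,000,000; the function name is made up.

```python
def openvpn_limit_bytes_per_s(mbit_per_s: float) -> int:
    """Convert a Mbit/s figure to the bytes/s value the limit field expects.

    Assumes decimal units and the 100 B/s .. 100 MB/s range quoted in the UI text.
    """
    value = round(mbit_per_s * 1_000_000 / 8)   # 8 bits per byte
    if not 100 <= value <= 100_000_000:
        raise ValueError(f"{value} bytes/s is outside the accepted range")
    return value


print(openvpn_limit_bytes_per_s(15))  # 1875000
```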

    (I eagerly await an irritated reply)

    P.S. I'm still not convinced that your ISP isn't trashing your connection when you start pulling more upload bandwidth than they feel like giving you.
    I've seen ISPs throttle and reset connections over and over, although they never admit to the practice. I think it's part of their business model. If the connection drops or is reset by the ISP, or the modem flakes out for a second or whatever, racoon will not, in my experience, successfully renegotiate that client connection and begin passing data again. It will reconnect and show as connected on both sides, but will not pass any traffic until you restart racoon (as you mentioned earlier). It will also fill your IPsec logs with exactly the same errors every time this happens.

    The changes recently made to IPsec in pfSense account for WAN IP changes: the racoon process will auto-restart if the WAN IP changes, but it doesn't account for a connection being reset or dropping out for a minute or two. I think that feature should be added and should restart racoon exactly the same way it would if the WAN IP changed. Currently IPsec is only any good for me over connections that never falter.
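    The kind of auto-restart proposed here could be approximated with an external watchdog: probe something on the far side of the tunnel, and after several consecutive failures restart the IPsec service. A sketch under assumptions: the probe is a TCP connect to a hypothetical host and port reachable only through the tunnel, and the restart action is left as a caller-supplied function, since no particular pfSense command for restarting racoon is assumed here.

```python
import socket
import time
from typing import Callable, Optional


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_watchdog(probe: Callable[[], bool], restart: Callable[[], None],
                 max_failures: int = 3, interval_s: float = 10.0,
                 rounds: Optional[int] = None,
                 sleep: Callable[[float], None] = time.sleep) -> int:
    """Call restart() after max_failures consecutive failed probes.

    Runs forever if rounds is None; otherwise returns the number of restarts.
    """
    failures = restarts = done = 0
    while rounds is None or done < rounds:
        done += 1
        if probe():
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                restart()          # caller-supplied, e.g. restart the IPsec service
                restarts += 1
                failures = 0       # give the renegotiated tunnel a fresh start
        sleep(interval_s)
    return restarts
```

    For example, run_watchdog(lambda: tcp_probe("10.0.0.1", 22), restart=my_restart) would watch a hypothetical host at 10.0.0.1 on the remote LAN, with my_restart being whatever restart procedure you trust.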



  • @kejianshi:

    Sorry to disappoint you on the irritated part ;) While I'm irritated by the facts, I'm not with the people who try to help.
    The reasons for not using OpenVPN are threefold:
    a) the ZyWall at the other end can't do it
    b) the second pfSense box that is destined to replace that ZyWall unit is sitting here waiting for the release of 2.1, because I don't want to risk some update glitch and then being in a position to have to book a flight to Michigan to bring the unit back under control. And since things worked reliably until recently, there wasn't a big problem with waiting. If I can't find a solution (e.g. get the bandwidth throttling to work), then I may have to reconsider.
    c) the web configurator and OpenVPN would likely run on the same port and possibly interfere with each other (I don't like non-standard ports if I can avoid them), and also, I'd rather avoid the encryption overhead if I can. So once I have a second pfSense box there, I may try simple IP tunnels like GRE, provided my ISP won't filter them out.

    @kejianshi:

    P.S.   I'm still not convinced that your ISP isn't trashing your connection when you start pulling more upload-bandwidth than they feel like giving you.

    Well, I wouldn't put it past Verizon to do something like that. But since I only use bandwidth in bursts, e.g. downloading a software update or something like that, it would be pretty low of Verizon to charge me big bucks for the higher FiOS bandwidth on one hand, and then throttle me on smallish downloads here and there on the other. We're talking about a few tens of megabytes' worth of downloads, with long idle periods interrupted only by a trickle of incoming e-mail and some random script-kiddie attack attempts on my address block (which don't take up much bandwidth). So it's not like I'm anywhere near pushing the official specs of the connection. If Verizon were intentionally doing this under these circumstances, it would be darn low on their part.



  • Yeah - I read the specs on your little IPsec appliance.  It looks pretty nice.  Too bad it doesn't support something other than IPsec. 
    As far as the encryption overhead goes, IPsec has that also. 
    As far as waiting on the 2.1 release, I wouldn't. 2.0.3 is stable and works very well with OpenVPN NOW.
    2.1 is an RC with no official release date. Could be today or next year. I don't know.
    It's also not like 2.0.3 is going to break and quit working just because 2.1 is released.
    You might find yourself with a reliable, bulletproof connection on 2.0.3 and not want to tempt fate with an update.
    If you are serious about not caring about encryption, I think point-to-point L2TP would be fastest, although OpenVPN has been so solid for me I couldn't imagine not using it. As for conflicts between OpenVPN and the web configurator, you can run one on port 80 and the other on port 443. They won't interfere, and it shouldn't hurt your security to put the web configurator on HTTP, because if you are sane at all you will only access it via the VPN or SSH anyway.



  • Is there any way you can try pfSense 1.2.3 for this testing, just to see if it exhibits the same IPsec issue?



  • @pinoyboy:

    Pretty difficult. If I could import a 2.1 settings file into a 1.2.3 setup, I could flash a 1.2.3 image on a CF card and have the system boot from that instead of from the SATA SSD. But trying to recreate that setup, particularly since I never worked with the 1.x versions of pfSense, would be rather error prone, and thus may not say much.

    Is there a specific reason why you're suggesting that test?



  • @kejianshi:

    Yeah - I read the specs on your little IPsec appliance.  It looks pretty nice.  Too bad it doesn't support something other than IPsec. 
    As far as the encryption overhead goes, IPsec has that also.

    Sure, I'm aware of the IPSec encryption overhead with pfSense. The ZyWall would allow a NULL encryption, which would turn the whole thing into a simple tunnel, but pfSense doesn't allow that.

    Of course, once I have a pfSense box on either side, I can just use some other tunnel that works without encryption, but ironically, the encryption overhead won't matter that much anymore then (except for lowering latency a bit), because the CPUs on both sides will be sufficiently powerful.



  • I'm sure things will change for the better with IPsec stability and resilience in pfSense soon. There are just too many smart guys here paying attention to it, even though most of them don't seem to like, want or need it.
    Of course, by then you will not want to use it.  You will be sold on OpenVPN :P


  • @rcfa:

    The ZyWall would allow a NULL encryption



  • I got that question wrong also…  Its kettle right? ::)