After Upgrade to 2.6.0 traffic sent over VPN Tunnel sporadically hangs
-
Greetings everyone. There's a lot to unpack here, from an upgrade to 2.6 to routing and so on, so apologies if this isn't the best place for it.
On Monday I upgraded an old firewall setup from 2.4.5 to 2.6.0. We have a VLAN segment on a dedicated interface that connects to a VPN appliance as its next hop. The appliance maintains a constant tunnel to a remote location, which we use for SSH sessions, a web environment, a dev environment, and so on.
Everything was working perfectly for months on end BEFORE the upgrade, but after the upgrade we are seeing SSH sessions time out after 30 seconds to 2 minutes at most, depending on activity. Likewise, we're noticing that HTTP GET requests and file transfers above a certain size also freeze, while smaller ones get through without an issue. This was never the case prior to the upgrade, and NOTHING on the VPN/tunnel side has changed. The behavior is specific to the VPN/tunnel segment: SSH sessions and file transfers on ANY other network segment work without an issue.
I've tried modifying MSS/MTU values and looked at packet/traffic scrubbing and DF values; NOTHING has made a dent. A tcpdump shows that during a file transfer ALL packets make it through without an issue, and then suddenly the last packet seems to be blocked, at which point retransmissions are attempted and ultimately fail.
I know there have been some reports of occasional bugs and instabilities in 2.6.0, but I've not been able to find anything even remotely related to behavior like this. I've been tearing my hair out for 4 days now, and am absolutely out of ideas.
Are there any new settings introduced in 2.6 that I might have missed and that could be causing these issues on this VPN tunnel network segment?
Thanks to all in advance. This has been absolutely maddening to troubleshoot.
-
Failing after 30s like that sounds a lot like an asymmetric routing issue. If it's on a separate interface that shouldn't be an issue. Unless perhaps the VPN device has multiple interfaces?
Do you have a diagram?
After it fails, what do you do to restore it? Can you just start a new connection immediately?
Steve
-
@stephenw10 Thanks for the response! Yes, that was my take as well, as we've seen that on older systems that temporarily straddled two VLANs connected to the firewall during some system migration work. But this happened only after upgrading from 2.4.5 to 2.6.0 (by way of 2.5.2), so I'm making a semi-broad assumption that, IF it's firewall related, it's a byproduct of the move to FreeBSD 12. There are only 2 of us who maintain this infrastructure, and we're both swearing up and down that no changes were made before or after the pfSense upgrade.
It is on its own interface/segment with a dedicated VLAN. The weird part is the file transfer behavior: a file of 5000 bytes is transferred without issue when using a curl command, but up that to 50000 bytes and it'll hang.
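For anyone chasing a similar size-dependent hang, the 5000 vs. 50000 byte observation can be narrowed down with a simple bisection. The following is only a sketch under assumptions that are not from this thread: `find_stall_threshold` and `http_probe` are hypothetical helper names, the test server is assumed to return exactly `size` bytes for a given URL, and failures are assumed to be monotonic (every size above the threshold hangs).

```python
import http.client
import urllib.request
from typing import Callable


def find_stall_threshold(probe: Callable[[int], bool], ok_size: int, bad_size: int) -> int:
    """Return the smallest size for which probe(size) is False.

    Assumes probe(ok_size) succeeds, probe(bad_size) fails, and that
    every size above the threshold fails as well.
    """
    if not probe(ok_size):
        raise ValueError("ok_size already fails")
    if probe(bad_size):
        raise ValueError("bad_size does not fail")
    lo, hi = ok_size, bad_size  # invariant: lo succeeds, hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if probe(mid):
            lo = mid
        else:
            hi = mid
    return hi


def http_probe(url_template: str, timeout: float = 10.0) -> Callable[[int], bool]:
    """Build a probe that fetches url_template.format(size=n) and checks the length.

    The URL layout is an assumption: the server must return exactly n bytes.
    """
    def probe(size: int) -> bool:
        try:
            with urllib.request.urlopen(url_template.format(size=size), timeout=timeout) as resp:
                return len(resp.read()) == size
        except (OSError, http.client.HTTPException):
            # Timeouts, resets and truncated bodies all count as a stall.
            return False
    return probe


# Hypothetical usage (the host and path are placeholders):
# probe = http_probe("http://test-host.example/blob?bytes={size}")
# print(find_stall_threshold(probe, 5000, 50000))
```

A threshold near a multiple of the segment size would point toward MTU/MSS, while an erratic threshold suggests something stateful, such as the firewall dropping part of the flow.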
The connection fails, and for SSH you can immediately reconnect, only to have it fail again. If it's an HTTP GET of some sort, it'll work just fine unless the page being loaded is over a certain size. Right now our dev team is routing around it, while still allowing us to test directly on the segment.
I'll create a quick diagram in a few and post it as well as other relevant info. I appreciate the help all the same. Thank you again!
-
Just a quick follow-up: I figured out the cause of this problem.
The problem had to do with a rule cleanup that took place prior to the upgrade. While the rules that were cleaned up didn't pertain to the VPN traffic directly, the cleanup revealed that the rules specific to this segment's traffic were affected by two issues:
1. The direction of the traffic flow, since a floating rule existed that altered the gateway used.
2. Quick match was not enabled, which means the rules pertaining to this traffic were not being applied immediately, and the traffic was PROBABLY being handled by a rule further downstream.
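To illustrate why the missing quick flag matters: in pf, a rule without `quick` only marks the packet and evaluation continues, so the last matching rule wins, while a matching `quick` rule stops evaluation on the spot. The sketch below is a toy model of that behavior, not pfSense code. The rule names, the 10.20.0.0/24 VPN subnet, and the `VPN_GW` gateway are made up for illustration and do not come from my actual ruleset.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]
    quick: bool = False
    gateway: Optional[str] = None  # None means "use the default routing table"


def evaluate(rules: List[Rule], packet: dict) -> Optional[Rule]:
    """Return the rule that decides the packet's fate, pf style."""
    winner = None
    for rule in rules:
        if rule.matches(packet):
            winner = rule      # last match wins...
            if rule.quick:
                break          # ...unless a quick rule ends evaluation early
    return winner


VPN_NET = ipaddress.ip_network("10.20.0.0/24")  # hypothetical VPN-side subnet


def build_rules(floating_quick: bool) -> List[Rule]:
    """Floating rule first, then an interface rule, in evaluation order."""
    return [
        Rule(
            name="floating-vpn-gateway",
            matches=lambda p: ipaddress.ip_address(p["dst"]) in VPN_NET,
            quick=floating_quick,
            gateway="VPN_GW",
        ),
        # Rules on interface tabs are treated as quick in this model.
        Rule(name="lan-default-allow", matches=lambda p: True, quick=True),
    ]


packet = {"dst": "10.20.0.5"}
print(evaluate(build_rules(floating_quick=False), packet).name)
print(evaluate(build_rules(floating_quick=True), packet).name)
```

With quick disabled on the floating rule, the later interface rule takes over and the policy gateway is never applied; with quick enabled, the floating rule decides. That matches what I suspect was happening here, though I can't prove which downstream rule was actually catching the traffic.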
Some additional tcpdumps showed the return traffic hitting the firewall on the new VLAN segment for the VPN, but NOT reaching one of our server VLANs, where the request originated. This pinpointed the issue as being firewall related. I didn't want to just dismiss it as a bug without further troubleshooting, but was running out of ideas initially.
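The comparison between the two captures boils down to a set difference: anything seen arriving on the VPN-side interface that never shows up leaving on the server-side interface was dropped or misrouted by the firewall. Here is a rough sketch of that idea, assuming the packets have already been reduced to address/port tuples and that no NAT rewrites them between the two interfaces. The `Packet` type, the function name, and the addresses are all hypothetical.

```python
from typing import Iterable, NamedTuple, Set


class Packet(NamedTuple):
    src: str
    sport: int
    dst: str
    dport: int


def missing_on_egress(ingress: Iterable[Packet], egress: Iterable[Packet]) -> Set[Packet]:
    """Return packets captured on the ingress side that never appear on the egress side."""
    return set(ingress) - set(egress)


# Return traffic as seen on the VPN VLAN (made-up addresses).
vpn_side = [
    Packet("10.20.0.5", 443, "10.10.0.8", 51514),
    Packet("10.20.0.5", 443, "10.10.0.9", 40000),
]
# What actually made it out onto the server VLAN.
server_side = [
    Packet("10.20.0.5", 443, "10.10.0.9", 40000),
]

for pkt in sorted(missing_on_egress(vpn_side, server_side)):
    print("seen on VPN side only:", pkt)
```

A non-empty result for traffic that should be passed is a strong hint that the firewall's rules or state handling, rather than the tunnel, is eating the packets.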
At any rate, all has been fixed and is working again. Thanks so much again for chiming in!