Dropping connection to mssql on the internal network due to WAN issues
-
Whether you're using PfSense 2.7.2-RELEASE or 25.07.1-RELEASE, the behavior is the same. The two versions run on different hardware, so the behavior doesn't depend on the hardware either.
We encountered the following problem.
We have an application that connects to an MSSQL database directly from the client using SQL Server Native Client.
When a user is on a different VLAN in a subnet than the server itself, and routing occurs through PfSense, the connection to MSSQL is lost due to connection issues on the WAN interfaces (we have two of them).
Specifically, the connection to MSSQL is lost in each of these cases: one of the WAN interfaces loses connectivity for a prolonged time (several seconds), or, with automatic WAN switching enabled, one of them experiences significant packet loss.
If PfSense doesn't route traffic from the client to the server, meaning both the client and server are on the same subnet, then these problems don't occur.
We couldn't figure out what the disconnection problem was, and then at some point I noticed a clear pattern. I double-checked, and indeed, as soon as there was a loss on the WAN or the WAN switching process, the connection immediately dropped.
Can anything be done about this? This is a critical issue, and I don't think it should be happening; it's a sign of poor router performance.
But I also want to say that I haven't noticed any impact on other types of connections yet. Apparently, the connection to MSSQL is particularly sensitive to any loss, since, for example, pinging the same server or copying files from it over SMB works fine at the same time.
-
@alex82 said in Dropping connection to mssql on the internal network due to WAN issues:
it's a sign of poor router performance.
Routing occurs at L3 and pfSense is, in fact, a software routing platform. You've chosen to use a routing platform like a multilayer switch; results may vary depending on system configuration.
How are your multiple WAN connections configured? Gateway monitoring configured how?
Have you read (at least) the Multiple WAN Connections article from the documentation?
Any SQL Server troubleshooting attempted?
-
Are you policy routing traffic?
What exactly are you seeing when the connection is lost?
Do the states get killed?
You might have State Killing on Gateway Failure set:
https://docs.netgate.com/pfsense/en/latest/config/advanced-misc.html#state-killing-on-gateway-failure
-
This is how the gateway is configured:

Reset all states is off.

This is the rule on LAN (everything is allowed from LAN, to the VLAN as well):

The connection is lost within our local network. It's just a client on the LAN and a server on a VLAN, and this happens when the WAN has connection issues.
-
We route VLANs through PfSense because the switches only support L3 with the switch manufacturer's router, which we don't use.
WAN monitoring is enabled by default, but I don't think it's particularly relevant in this case, since the WAN issue isn't on the PfSense side, but on the ISP side. The fact is, when problems occur on the WAN interfaces, we experience connectivity issues within the local network when routing through PfSense.
I suspect the issues could also be with RDP connections between VLANs, but I'll double-check that.
When the connection is lost, I see a message in my application stating that the SQL connection to the database has been lost and that the application needs to be restarted.
Connection reset on WAN failures or changes is disabled.
But I thought that even if this feature is enabled, only connections related to the WAN interface are reset, right?
-
I'll add one more thing, as you can see from the screenshots above.
We've disabled automatic WAN failover for now and set one WAN as the default. But even then, when WAN connection issues arise (again, these issues aren't on PfSense's end), we get the effect I described.
Moreover, it seems this happens even when the problem isn't on the default WAN, but I'll double-check that.
-
@alex82 said in Dropping connection to mssql on the internal network due to WAN issues:
We route VLANs through PfSense
VLANs don't route. They switch.
the switches only support L3 with the switch manufacturer's router,
You mean to say something about some non-standard multilayer switches I think—but what? Something about an as-of-yet unnamed vendor's poor network hardware design?
WAN monitoring is enabled by default, but I don't think it's particularly relevant in this case
Well you hadn't said anything about the actual use case for your multiple WAN connection. But it would now appear to be failover.
We've disabled automatic WAN failover for now
While troubleshooting without any failover configured, ensure System / Routing / Gateways / GW_WAN / Gateway Action is checked (i.e., action disabled) in order to properly rule out anything related to gateway monitoring.
What, exactly, is happening with the GW_WAN ISP connection? What type of connection? Why so unstable?
-
Mmm, what is logged when this happens? Is the firewall perhaps simply overloaded with processes triggered by the gateway state change?
But I would look for the states opened between the subnets and whether or not they are being killed.
Is other traffic interrupted?
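A rough sketch of how the state check could be scripted from a shell on the firewall. It relies on `pfctl -ss`, which prints the pf state table, and assumes each state is printed on one line with `address:port` tokens. The endpoint used below (`192.168.20.10:1433`, with 1433 being the default SQL Server port) is a hypothetical placeholder, not a value from this thread.

```python
import subprocess


def dump_states() -> str:
    """Return the output of `pfctl -ss` (show states). Needs root on the firewall."""
    result = subprocess.run(
        ["pfctl", "-ss"], capture_output=True, text=True, check=True
    )
    return result.stdout


def matching_states(pfctl_output: str, endpoint: str) -> list:
    """Return the state lines that contain `endpoint` as a whole token.

    `endpoint` is an address:port string such as '192.168.20.10:1433'
    (hypothetical). Matching whole tokens avoids hitting port 14330 etc.
    """
    return [
        line for line in pfctl_output.splitlines() if endpoint in line.split()
    ]
```

Calling `matching_states(dump_states(), "192.168.20.10:1433")` once while the application is connected and again right after a WAN event would show whether the states for the SQL session disappeared.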
-
@tinfoilmatt said in Dropping connection to mssql on the internal network due to WAN issues:
VLANs don't route. They switch.
Okay, I just said it that way. I meant it works through PfSense and not directly through the switches.
@tinfoilmatt said in Dropping connection to mssql on the internal network due to WAN issues:
You mean to say something about some non-standard multilayer switches I think—but what? Something about an as-of-yet unnamed vendor's poor network hardware design?
Maybe I'm missing something (most likely :) ) and you can help.
The UniFi switches support L3, but as far as I understand, you need their router for this to work. Or is it possible to configure it?
@tinfoilmatt said in Dropping connection to mssql on the internal network due to WAN issues:
Well you hadn't said anything about the actual use case for your multiple WAN connection. But it would now appear to be failover.
That's what you meant. I just don't think it's crucial. But yes, if one WAN is unavailable, the other one becomes the primary WAN.
@tinfoilmatt said in Dropping connection to mssql on the internal network due to WAN issues:
While troubleshooting without any failover configured, ensure System / Routing / Gateways / GW_WAN / Gateway Action is checked (i.e., action disabled) in order to properly rule out anything related to gateway monitoring.
What, exactly, is happening with the GW_WAN ISP connection? What type of connection? Why so unstable?
Yes, we enabled Disable Gateway Monitoring Action today. We're monitoring the impact.
We're hoping the local ISPs are experiencing temporary issues; we're experiencing occasional losses. They say they're fixing the problem.
This hasn't happened before, or only for a short time, so this is the first time we've encountered such behavior where WAN issues persist for a significant period of time.
Could you please tell me what PfSense should do when it's set to Reset all states if the WAN IP address changes?
Only WAN connections are terminated, not all connections. Is that logical?
Could this be related to a long-standing issue with PfSense, where nothing is connected to the WAN interfaces, making it quite difficult to access the router's web interface over the local network? I've been using PfSense for as long as I can remember, and this problem persists. If I disable the WAN or simply configure a new router and connect to it only via LAN without enabling the WAN, the web interface becomes virtually impossible to use; everything freezes constantly or doesn't work at all.
-
@stephenw10 said in Dropping connection to mssql on the internal network due to WAN issues:
Is other traffic interrupted?
I'm just about to check this. But it's not that easy; you have to seize the moment; WAN issues still occur periodically. I'll try to test this soon by simply unplugging the cable.
But I have a feeling the problem also occurs with RDP.
-
@alex82 said in Dropping connection to mssql on the internal network due to WAN issues:
I'll try to test this soon by simply unplugging the cable
Not the same as intermittent connectivity (i.e., latency).
@alex82 said in Dropping connection to mssql on the internal network due to WAN issues:
The UniFi switches
Exactly.
Could you please tell me what PfSense should do when it's set to Reset all states if the WAN IP address changes?
The state table is flushed and any/all connections prior to said flushing must be reestablished anew.
How is GW_WAN configured under System / Routing / Gateways / GW_WAN / State Killing on Gateway Failure?
Could this be related to a long-standing issue with PfSense, where nothing is connected to the WAN interfaces, making it quite difficult to access the router's web interface over the local network? I've been using PfSense for as long as I can remember, and this problem persists. If I disable the WAN or simply configure a new router and connect to it only via LAN without enabling the WAN, the web interface becomes virtually impossible to use; everything freezes constantly or doesn't work at all.
Can't help myself here—but these are nothing statements, much less relevant.
-
Try opening an SSH session between hosts on different subnets and see if it fails. That should also be easy to see if the states are removed.
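If SSH isn't available on the hosts, a similar long-lived test could be done with a small script. This is only a sketch: it assumes Python 3.8+ on both machines, the function names are made up for illustration, and the echo server would need to be started on the server's subnet with a port you allow through the firewall. It keeps one TCP connection open and sends a small probe at a fixed interval, so a killed state or a traffic stall shows up as a failed probe with a timestamp.

```python
import socket
import threading
import time


def _serve(listener: socket.socket) -> None:
    """Accept clients one at a time and echo back whatever they send."""
    while True:
        try:
            conn, _ = listener.accept()
        except OSError:
            return  # listener was closed
        with conn:
            while True:
                try:
                    data = conn.recv(1024)
                except OSError:
                    break
                if not data:
                    break  # client closed the connection
                conn.sendall(data)


def start_echo_server(bind_host: str, port: int) -> socket.socket:
    """Start the echo server in a background thread; port 0 picks a free port."""
    listener = socket.create_server((bind_host, port))
    threading.Thread(target=_serve, args=(listener,), daemon=True).start()
    return listener


def watch_connection(host, port, probes, interval=1.0, timeout=3.0):
    """Hold ONE TCP connection open and probe it every `interval` seconds.

    Returns a list of (timestamp, ok) pairs and stops at the first failure.
    """
    results = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        for _ in range(probes):
            try:
                sock.sendall(b"ping")
                ok = sock.recv(1024) == b"ping"
            except OSError:  # covers timeouts and connection resets
                ok = False
            results.append((time.time(), ok))
            if not ok:
                break
            time.sleep(interval)
    return results
```

Run `start_echo_server` on the server side and `watch_connection` on a client in the other VLAN, then compare the timestamp of the first failed probe with the gateway events in the pfSense logs.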
-
@alex82 said in Dropping connection to mssql on the internal network due to WAN issues:
Could this be related to a long-standing issue with PfSense, where nothing is connected to the WAN interfaces, making it quite difficult to access the router's web interface over the local network?
No. That happens because some elements of the webgui attempt to connect to external resources to check for updates. That is unrelated to routing between internal subnets.
-
@stephenw10 said in Dropping connection to mssql on the internal network due to WAN issues:
internal subnets.
VLAN segments are not subnets. All VLANs in a router-on-a-stick topology (like the one at-issue here) are on the same subnet.
-
I guess they wouldn't have to be using separate subnets but they almost certainly are. There would be no point using a VLAN there if they weren't.
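For anyone who wants to settle that for their own setup, a quick check with Python's standard `ipaddress` module tells you whether two hosts share a subnet, and therefore whether traffic between them is routed by pfSense at all. The addresses and prefix lengths below are hypothetical placeholders, not values from this thread.

```python
import ipaddress


def same_subnet(a: str, b: str) -> bool:
    """True if two interface addresses in CIDR notation are in the same IP network."""
    return ipaddress.ip_interface(a).network == ipaddress.ip_interface(b).network


# Hypothetical addresses: a client on one VLAN and a server on another.
client = "192.168.10.25/24"
server = "192.168.20.10/24"

if same_subnet(client, server):
    print("same subnet: traffic is switched and never reaches the router")
else:
    print("different subnets: traffic is routed through the firewall")
```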
-
@tinfoilmatt said in Dropping connection to mssql on the internal network due to WAN issues:
How is GW_WAN configured under System / Routing / Gateways / GW_WAN / State Killing on Gateway Failure?
Use global behavior (default)
But now we have also set:
Gateway Action - Disable Gateway Monitoring Action
As I understand it, this parameter should not have any effect now.
-
@stephenw10 Private IP address space has been arbitrarily 'sub-divided' along CIDR 'boundaries', which frames are presumably being tagged accordingly. But it's all the same subnet as far as the router is concerned.
-
Hmm, maybe I've misunderstood the setup here.
@alex82 said in Dropping connection to mssql on the internal network due to WAN issues:
When a user is on a different VLAN in a subnet than the server itself, and routing occurs through PfSense, the connection to MSSQL is lost due to connection issues
The client and server are in different VLANs and in different subnets (as I would expect). pfSense has interfaces in both those subnets and is routing traffic between them.
Thus the most likely cause of the problem here is that the states on each interface are being unexpectedly killed breaking the connection.
Second possibility is that during the gateway event all traffic stops and the SQL connection hits some low timeout value.
-
@stephenw10 said in Dropping connection to mssql on the internal network due to WAN issues:
Thus the most likely cause of the problem here is that the states on each interface are being unexpectedly killed breaking the connection.
So, yes :) But the question is, why does this happen with the described events on the WAN?
After a while, I think it will at least be clear whether the setting - Gateway Action - Disable Gateway Monitoring Action helped.
-
@stephenw10 Everything that's been stated gives no indication that either pfSense system has any more than one non-WAN router interface. The topology that I assume here, is a Ubiquiti managed switch connected to a single LAN router interface (i.e., 'pfSense-on-a-stick' so to speak). One L3 subnet, multiple L2 broadcast domains. And I fret to think of what might be downstream of said Ubiquiti managed switch.
@alex82 said in Dropping connection to mssql on the internal network due to WAN issues:
But the question is, why does this happen with the described events on the WAN?
After a while, I think it will at least be clear whether the setting - Gateway Action - Disable Gateway Monitoring Action helped.
My money is on 'yes.' That is, most likely, the reason. There are legitimate reasons to flush all states when a gateway goes down or appears to be degraded beyond a defined threshold.
But your infrastructure design still says nothing about pfSense's routing performance.