TCP Transfers failing after ~65k

A Former User

@palesius, can you see anything being blocked at the moment that isn't working?

palesius

@silence I don't think I understand your question. There shouldn't be anything blocked. There haven't been any changes to the firewall config for at least a month (until I started trying to troubleshoot it). The problem started two days ago.

stephenw10

I have seen similar behavior to this when there is a routing conflict. Specifically almost that exact thing when two WANs share the same gateway. Is that possible?

I assume you see this same traffic pattern from a client behind pfSense that uses WAN2?

I also assume the default gateway for pfSense itself is WAN1?

A good test here would be to change the default gateway to WAN2 and re-run those iperf tests from pfSense. Does the fault now appear to be on WAN1?

You may have something by-passing route-to somehow. Though that wouldn't have just started spontaneously.

Steve

palesius

@stephenw10 Thanks.
Not sure what the initial issue was that was affecting the systems behind the firewall, but that seems to have resolved.

You were correct that switching the default gateway changed which interface exhibits the issue. But right now the issue only seems to happen if I run curl/iperf from the firewall itself, over the non-default WAN interface. So I guess I out-clevered myself by testing it directly on the firewall :(
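For anyone wanting to repeat this kind of test without curl or iperf, here is a minimal Python sketch that pins the source address of a TCP connection and counts how many bytes arrive before the transfer ends or stalls. The function names and the loopback demo are illustrative assumptions, not anything from pfSense; on a real firewall you would pass the WAN2 address as `source_ip` and a reachable test server as `dest`.

```python
import socket
import threading


def serve_once(listener, payload_size):
    """Accept one connection, send payload_size bytes, then close."""
    conn, _ = listener.accept()
    with conn:
        conn.sendall(b"x" * payload_size)


def receive_from(source_ip, dest, timeout=5.0):
    """Connect to dest from source_ip; count bytes until EOF or a stall."""
    received = 0
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        s.bind((source_ip, 0))  # pin the source address (port chosen by the OS)
        s.connect(dest)
        try:
            while True:
                chunk = s.recv(65536)
                if not chunk:  # peer closed: transfer completed
                    break
                received += len(chunk)
        except socket.timeout:
            pass  # no data within the timeout: report what arrived so far
    return received


def loopback_demo(payload_size=200_000):
    """Exercise receive_from against a throwaway server on 127.0.0.1."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    server = threading.Thread(target=serve_once, args=(listener, payload_size))
    server.start()
    got = receive_from("127.0.0.1", listener.getsockname())
    server.join()
    listener.close()
    return got


if __name__ == "__main__":
    print(loopback_demo())
```

A healthy path returns the full payload size; a count that stops well short of it after the timeout would match the symptom described here.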

FWIW, the two interfaces do not share a gateway, they are two different circuits with different IPs, subnets, and gateways. My guess is that the computers behind the pfSense that are told to use the WAN2 are using the correct gateway for that circuit. When the traffic originates on the pfSense itself, there are no rules telling it it should be using the gateway on WAN2, so it hits the end of the TCP window without ever getting an ACK and stops, presumably because the ACKs are coming back on the wrong route & interface.
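As a rough illustration of where a figure near 65k can come from: the window field in the TCP header is 16 bits wide, so without window scaling a sender can have at most 65535 unacknowledged bytes in flight. The sketch below only shows that arithmetic; it is not a confirmed diagnosis of this particular fault, and `max_window` is a made-up helper name.

```python
WINDOW_FIELD_BITS = 16  # width of the window field in the TCP header
MAX_SCALE_SHIFT = 14    # largest window-scale shift allowed by RFC 7323


def max_window(scale_shift=0):
    """Largest advertisable receive window for a given window-scale shift."""
    if not 0 <= scale_shift <= MAX_SCALE_SHIFT:
        raise ValueError("scale shift must be between 0 and 14")
    return (2 ** WINDOW_FIELD_BITS - 1) << scale_shift


if __name__ == "__main__":
    print(max_window())   # 65535: the ceiling when no scaling is in effect
    print(max_window(7))  # 8388480: the same field with a shift of 7
```

If the ACKs never make it back, the sender fills that window once and then has nothing left it is allowed to send.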

stephenw10

@palesius said in TCP Transfers failing after ~65k:

When the traffic originates on the pfSense itself, there are no rules telling it it should be using the gateway on WAN2

In fact I think it's closer to the opposite of that. There are rules in place that tag any traffic using the source address of WAN2 with route-to via the WAN2 gateway. But when routing that traffic the system still looks at the routing table and if the default gateway/route is WAN1 it will try to send it that way. If you check the state table you will probably see states on WAN1 but with the source address of WAN2. Confusingly!
We made changes to the underlying code to address the reply-to bug in 2.5.0/1 and I believe this has introduced this behaviour. However if you test 2.4.5 you will see exactly the same thing except the connection does not fail. It appears that this worked previously because of an undetected bug and that has now been fixed revealing this issue.
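To make the state table check concrete, here is a small Python sketch that filters state listings for entries on the WAN1 interface whose source address belongs to WAN2. It assumes lines shaped like `<iface> <proto> <src:port> -> <dst:port> <state>`, roughly what `pfctl -ss` prints; real output can differ (NAT entries, direction arrows), and the interface names and addresses in the sample are made up.

```python
def mismatched_states(state_lines, wan1_if, wan2_ip):
    """Return state lines on wan1_if whose first endpoint uses wan2_ip."""
    hits = []
    for line in state_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # not a line in the assumed format
        iface, src = fields[0], fields[2]
        src_ip = src.rsplit(":", 1)[0]  # strip the trailing :port
        if iface == wan1_if and src_ip == wan2_ip:
            hits.append(line)
    return hits


# Hypothetical sample: igb0 is WAN1 (203.0.113.2), igb1 is WAN2 (198.51.100.2).
sample = [
    "igb0 tcp 198.51.100.2:51000 -> 192.0.2.50:443 ESTABLISHED:ESTABLISHED",
    "igb1 tcp 198.51.100.2:51001 -> 192.0.2.50:443 ESTABLISHED:ESTABLISHED",
    "igb0 tcp 203.0.113.2:51002 -> 192.0.2.50:443 ESTABLISHED:ESTABLISHED",
]

if __name__ == "__main__":
    for line in mismatched_states(sample, "igb0", "198.51.100.2"):
        print(line)
```

Only the first sample line is reported: it sits on the WAN1 interface but carries the WAN2 source address, which is the confusing combination described above.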

Steve

palesius

So traffic originating from behind the firewall uses the WAN2 gateway and has route-to set to WAN2, whereas traffic originating from the firewall itself is sent over WAN1 with a route-to of WAN2?

Anyway, my immediate problem is fixed, and now I know that, until this issue is fixed, testing WAN2 from the firewall itself is a bad idea (at least over TCP).

stephenw10

Traffic from inside the firewall gets tagged with route-to on the way into the firewall, before the routing decision. That seems to be the key difference. It's unclear, to me at least, exactly what triggers it but we are aware of the issue.
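The ordering difference can be pictured with a toy Python model. This is only a sketch of the behaviour as described in this thread, not of how pf is actually implemented; the `Packet` class, the interface labels, and the WAN2 address are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_ROUTE_IF = "WAN1"  # assumed default gateway interface
WAN2_IP = "198.51.100.2"   # hypothetical WAN2 address


@dataclass
class Packet:
    src_ip: str
    origin: str                     # "lan" or "firewall"
    route_to: Optional[str] = None  # interface requested by policy routing


def egress_interface(pkt):
    """Return the interface the toy model sends pkt out of."""
    if pkt.origin == "lan":
        # Tagged on the way in, before any routing lookup, so the tag wins.
        pkt.route_to = "WAN2"
        return pkt.route_to
    # Firewall-originated: the routing table is consulted first.
    egress = DEFAULT_ROUTE_IF
    if pkt.src_ip == WAN2_IP:
        # The tag is still applied, but the packet has already been routed.
        pkt.route_to = "WAN2"
    return egress


if __name__ == "__main__":
    lan = Packet("192.168.1.10", "lan")
    local = Packet(WAN2_IP, "firewall")
    print(egress_interface(lan), lan.route_to)
    print(egress_interface(local), local.route_to)
```

In this model the LAN packet leaves via WAN2, while the firewall-originated packet leaves via WAN1 even though it carries the WAN2 source address and a route-to of WAN2.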

Steve

Anyuta1166

Is there any solution for this issue? We have recently faced this issue and it is very critical for us as it breaks our production environment. We are using pfSense 2.7.2.

stephenw10

What are you actually seeing? The route-to/reply-to bug discussed here is fixed in 2.7.2.

Anyuta1166

I actually see the same issue as discussed here.
There are two WANs (WAN1 and WAN2). WAN1 is the primary and WAN2 is the backup. WAN1 is the default gateway.
We have a server that can be accessed from the Internet via the WAN2 IP (DNAT from WAN2:443 to INTERNAL_SERVER_IP:443).
The issue is that TCP transfers from the Internet to this server via the WAN2 IP stall after about 65 KB.

stephenw10

Ok so only for file transfers? You can access the server correctly otherwise?

Do you see that traffic passed by the correct rule on WAN2?