Netgate Discussion Forum

    TCP Transfers failing after ~65k

    General pfSense Questions
    16 Posts 4 Posters 1.2k Views
    • P
      palesius
      last edited by palesius

      I'm having a strange issue on my pfSense firewall (a Netgate SG-5100).

      Rather suddenly, with no configuration changes on the firewall, I am seeing transfers stall completely after about 65k has been transferred. I'm thinking it may be a TCP window issue or something similar. When I use curl to download a large file directly, the amount downloaded before it stalls doesn't seem to change for a given host and is always close to 65k (64882, 64964, etc.).
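A quick way to see why the "TCP window" guess is reasonable: the window field in the TCP header is 16 bits wide, so without window scaling the largest window that can be advertised is 65,535 bytes. The minimal Python sketch below only does that arithmetic against the two byte counts quoted above. It does not prove the cause, and the thread does not account for the remaining few hundred bytes.

```python
# Largest window a TCP header can advertise without window scaling:
# the window field is 16 bits wide.
MAX_UNSCALED_WINDOW = 2**16 - 1  # 65535 bytes

# Byte counts reported in the post above (from curl).
observed = [64882, 64964]


def shortfall(n):
    """Bytes between an observed stall point and the unscaled window limit."""
    return MAX_UNSCALED_WINDOW - n


for n in observed:
    print(f"{n} bytes is {shortfall(n)} bytes short of {MAX_UNSCALED_WINDOW}")
# 64882 bytes is 653 bytes short of 65535
# 64964 bytes is 571 bytes short of 65535
```

Both stall points land within about 1% of the limit, which fits a sender that fills one window and then never sees an ACK.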

      The firewall in question has 2 WAN connections, and this issue is only affecting one of them (Comcast cable). There are no traffic limiters.
      A laptop connected directly to the cable modem functions normally. I've tried changing the Ethernet ports on both the firewall and the cable modem, as well as using a different Ethernet cable. I've also tried setting up a fresh interface on the firewall in case there was some hidden setting causing the issue.

      I'm not sure what else to try in order to diagnose it. I'm hesitant to kick it back to the ISP, since they will blame the firewall because a PC connected to the cable modem performs properly.

      • ?
        A Former User @palesius
        last edited by

        @palesius OK, I'm here to help you...

        but you should post more information. For example, you said you have 2 WANs?

        Do you use load balancing?

        How do you know that the other WAN works well?

        At the time this happens, what does your pfSense show?

        What services are you running?

        • P
          palesius @palesius
          last edited by

          Also wanted to add that if I run iperf over UDP everything seems fine.

          • P
            palesius @A Former User
            last edited by

            @silence Yes, we usually use load balancing, but only for failover. Under normal circumstances our servers use Connection 1 and our desktops use Connection 2. If one connection fails, the systems using it will fail over to the other.

            I've tried running iperf (on the pfSense shell) from the interface address of each WAN connection.
            I don't see anything at all strange on the dashboard, no CPU load or high memory usage.

            Services:
            dpinger
            ipsec (don't actually use it, but the service is active)
            3x openvpn
            pcscd
            sshd
            syslogd
            unbound

            • P
              palesius @palesius
              last edited by

              @palesius Just to illustrate, here are the results of running iperf from WAN1 and WAN2 over TCP and UDP:

              WAN2 over TCP
              [21.05.2-RELEASE][root@__hostname__]/root: iperf3 -c targetIP -B WAN2_IP
              Connecting to host targetIP, port 5201
              [ 5] local WAN2_IP port 56291 connected to targetIP port 5201
              [ ID] Interval Transfer Bitrate Retr Cwnd
              [ 5] 0.00-1.01 sec 128 KBytes 1.04 Mbits/sec 2 1.41 KBytes
              [ 5] 1.01-2.01 sec 0.00 Bytes 0.00 bits/sec 1 1.41 KBytes
              [ 5] 2.01-3.02 sec 0.00 Bytes 0.00 bits/sec 1 1.41 KBytes
              [ 5] 3.02-4.01 sec 0.00 Bytes 0.00 bits/sec 1 1.41 KBytes
              [ 5] 4.01-5.05 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
              [ 5] 5.05-6.00 sec 0.00 Bytes 0.00 bits/sec 1 1.41 KBytes
              [ 5] 6.00-7.01 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
              [ 5] 7.01-8.02 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
              [ 5] 8.02-9.03 sec 0.00 Bytes 0.00 bits/sec 0 1.41 KBytes
              [ 5] 9.03-10.01 sec 0.00 Bytes 0.00 bits/sec 1 1.41 KBytes


              [ ID] Interval Transfer Bitrate Retr
              [ 5] 0.00-10.01 sec 128 KBytes 105 Kbits/sec 7 sender
              [ 5] 0.00-10.06 sec 82.0 KBytes 66.8 Kbits/sec receiver

              iperf Done.

              WAN1 over TCP
              [21.05.2-RELEASE][root@__hostname__]/root: iperf3 -c targetIP -B WAN1_IP
              Connecting to host targetIP, port 5201
              [ 5] local WAN1_IP port 45481 connected to targetIP port 5201
              [ ID] Interval Transfer Bitrate Retr Cwnd
              [ 5] 0.00-1.00 sec 849 KBytes 6.95 Mbits/sec 0 47.0 KBytes
              [ 5] 1.00-2.00 sec 1.82 MBytes 15.3 Mbits/sec 0 85.5 KBytes
              [ 5] 2.00-3.00 sec 2.85 MBytes 23.9 Mbits/sec 0 124 KBytes
              [ 5] 3.00-4.00 sec 3.94 MBytes 33.0 Mbits/sec 0 164 KBytes
              [ 5] 4.00-5.00 sec 5.08 MBytes 42.6 Mbits/sec 0 202 KBytes
              [ 5] 5.00-6.00 sec 5.89 MBytes 49.2 Mbits/sec 38 112 KBytes
              [ 5] 6.00-7.00 sec 3.32 MBytes 27.9 Mbits/sec 5 149 KBytes
              [ 5] 7.00-8.00 sec 4.50 MBytes 37.7 Mbits/sec 0 188 KBytes
              [ 5] 8.00-9.00 sec 5.47 MBytes 45.9 Mbits/sec 0 226 KBytes
              [ 5] 9.00-10.00 sec 5.34 MBytes 44.8 Mbits/sec 21 136 KBytes


              [ ID] Interval Transfer Bitrate Retr
              [ 5] 0.00-10.00 sec 39.0 MBytes 32.7 Mbits/sec 64 sender
              [ 5] 0.00-10.06 sec 38.7 MBytes 32.3 Mbits/sec receiver

              iperf Done.

              WAN2 over UDP
              [21.05.2-RELEASE][root@__hostname__]/root: iperf3 -c targetIP -u -b 10M -B WAN2_IP
              Connecting to host targetIP, port 5201
              [ 5] local WAN2_IP port 20254 connected to targetIP port 5201
              [ ID] Interval Transfer Bitrate Total Datagrams
              [ 5] 0.00-1.00 sec 1.19 MBytes 10.0 Mbits/sec 856
              [ 5] 1.00-2.00 sec 1.19 MBytes 10.0 Mbits/sec 856
              [ 5] 2.00-3.00 sec 1.19 MBytes 10.0 Mbits/sec 856
              [ 5] 3.00-4.00 sec 1.19 MBytes 10.0 Mbits/sec 856
              [ 5] 4.00-5.00 sec 1.19 MBytes 10.0 Mbits/sec 857
              [ 5] 5.00-6.00 sec 1.19 MBytes 10.0 Mbits/sec 856
              [ 5] 6.00-7.00 sec 1.19 MBytes 10.0 Mbits/sec 856
              [ 5] 7.00-8.00 sec 1.19 MBytes 10.0 Mbits/sec 856
              [ 5] 8.00-9.00 sec 1.19 MBytes 10.0 Mbits/sec 856
              [ 5] 9.00-10.00 sec 1.19 MBytes 10.0 Mbits/sec 856


              [ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
              [ 5] 0.00-10.00 sec 11.9 MBytes 10.0 Mbits/sec 0.000 ms 0/8561 (0%) sender
              [ 5] 0.00-10.05 sec 11.9 MBytes 9.95 Mbits/sec 0.994 ms 0/8561 (0%) receiver

              iperf Done.

              WAN1 over UDP
              [21.05.2-RELEASE][root@__hostname__]/root: iperf3 -c targetIP -u -b 10M -B WAN1_IP
              Connecting to host targetIP, port 5201
              [ 5] local WAN1_IP port 59558 connected to targetIP port 5201
              [ ID] Interval Transfer Bitrate Total Datagrams
              [ 5] 0.00-1.00 sec 1.19 MBytes 10.0 Mbits/sec 856
              [ 5] 1.00-2.00 sec 1.19 MBytes 10.0 Mbits/sec 856
              [ 5] 2.00-3.00 sec 1.19 MBytes 10.0 Mbits/sec 856
              [ 5] 3.00-4.00 sec 1.19 MBytes 10.0 Mbits/sec 856
              [ 5] 4.00-5.00 sec 1.19 MBytes 10.0 Mbits/sec 857
              [ 5] 5.00-6.00 sec 1.19 MBytes 10.0 Mbits/sec 856
              [ 5] 6.00-7.00 sec 1.19 MBytes 10.0 Mbits/sec 856
              [ 5] 7.00-8.00 sec 1.19 MBytes 10.0 Mbits/sec 856
              [ 5] 8.00-9.00 sec 1.19 MBytes 10.0 Mbits/sec 856
              [ 5] 9.00-10.00 sec 1.19 MBytes 10.0 Mbits/sec 856


              [ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
              [ 5] 0.00-10.00 sec 11.9 MBytes 10.0 Mbits/sec 0.000 ms 0/8561 (0%) sender
              [ 5] 0.00-10.06 sec 11.9 MBytes 9.94 Mbits/sec 0.075 ms 0/8561 (0%) receiver

              iperf Done.
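To make the contrast between the four runs easier to spot, here is a small Python sketch that counts how many per-second intervals in pasted iperf3 text moved zero bytes. It parses only the plain-text layout shown above; the regular expression and the function name `stalled_intervals` are illustrative and are not part of iperf3. The sample is three interval lines and the sender summary from the WAN2 TCP run.

```python
import re

# Matches per-interval lines such as:
# "[ 5] 1.01-2.01 sec 0.00 Bytes 0.00 bits/sec 1 1.41 KBytes"
INTERVAL = re.compile(
    r"\[\s*\d+\]\s+(\d+\.\d+)-(\d+\.\d+)\s+sec\s+([\d.]+)\s+(Bytes|KBytes|MBytes|GBytes)"
)


def stalled_intervals(text):
    """Return (zero_byte_intervals, total_intervals), ignoring summary lines."""
    zero = total = 0
    for line in text.splitlines():
        # The final sender/receiver summary lines cover the whole run, skip them.
        if line.rstrip().endswith(("sender", "receiver")):
            continue
        match = INTERVAL.search(line)
        if not match:
            continue
        total += 1
        if float(match.group(3)) == 0:
            zero += 1
    return zero, total


SAMPLE = """\
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.01 sec 128 KBytes 1.04 Mbits/sec 2 1.41 KBytes
[ 5] 1.01-2.01 sec 0.00 Bytes 0.00 bits/sec 1 1.41 KBytes
[ 5] 2.01-3.02 sec 0.00 Bytes 0.00 bits/sec 1 1.41 KBytes
[ 5] 0.00-10.01 sec 128 KBytes 105 Kbits/sec 7 sender
"""

print(stalled_intervals(SAMPLE))  # (2, 3)
```

Run against the full WAN2 TCP output above, every interval after the first is a zero-byte interval, while the WAN1 TCP and both UDP runs have none.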

              • ?
                A Former User @palesius
                last edited by

                @palesius, strange. Can you see anything being blocked at the moment it stops working?

                • P
                  palesius @A Former User
                  last edited by

                  @silence I don't think I understand your question. There shouldn't be anything blocked. There haven't been any changes to the firewall config for at least a month (until I started trying to troubleshoot it). The problem started two days ago.

                  • stephenw10S
                    stephenw10 Netgate Administrator
                    last edited by

                    I have seen similar behavior to this when there is a routing conflict. Specifically, I've seen almost exactly this when two WANs share the same gateway. Is that possible?

                    I assume you see this same traffic pattern from a client behind pfSense that uses WAN2?

                    I also assume the default gateway for pfSense itself is WAN1?

                    A good test here would be to change the default gateway to WAN2 and re-run those iperf tests from pfSense. Does the fault now appear to be on WAN1?

                    You may have something bypassing route-to somehow, though that wouldn't have just started spontaneously.

                    Steve

                    • P
                      palesius @stephenw10
                      last edited by palesius

                      @stephenw10 Thanks.
                      I'm not sure what the initial issue affecting the systems behind the firewall was, but that seems to have resolved.

                      You were correct that switching the default gateway changed the interface that was exhibiting the issue. But right now the issue only seems to happen if I run curl/iperf from the firewall itself, over the non-default WAN interface. So I guess I out-clevered myself by testing it directly on the firewall :(

                      FWIW, the two interfaces do not share a gateway; they are two different circuits with different IPs, subnets, and gateways. My guess is that the computers behind the pfSense that are told to use WAN2 are using the correct gateway for that circuit. When the traffic originates on the pfSense itself, there are no rules telling it it should be using the gateway on WAN2, so it hits the end of the TCP window without ever getting an ACK and stops, presumably because the ACKs are coming back on the wrong route and interface.

                      • stephenw10S
                        stephenw10 Netgate Administrator
                        last edited by

                        @palesius said in TCP Transfers failing after ~65k:

                        When the traffic originates on the pfSense itself, there are no rules telling it it should be using the gateway on WAN2

                        In fact I think it's closer to the opposite of that. There are rules in place that tag any traffic using the source address of WAN2 with route-to via the WAN2 gateway. But when routing that traffic the system still looks at the routing table and if the default gateway/route is WAN1 it will try to send it that way. If you check the state table you will probably see states on WAN1 but with the source address of WAN2. Confusingly!
                        We made changes to the underlying code to address the reply-to bug in 2.5.0/1, and I believe this has introduced this behaviour. However, if you test 2.4.5 you will see exactly the same thing, except the connection does not fail. It appears that this worked previously because of an undetected bug, and fixing that bug has revealed this issue.
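The state-table check described above can be sketched in Python as a filter over text pasted from `pfctl -ss`. This is only an illustration: it assumes each state line has the layout "interface protocol source -> destination state", which may differ between versions, and it skips NAT-rewritten lines that carry an extra parenthesised address. The interface names (`igb0`, `igb1`), the addresses, and the function name `mismatched_states` are hypothetical.

```python
def mismatched_states(pfctl_ss_output, wan1_if, wan2_ip):
    """Return outbound states on wan1_if whose source address is wan2_ip."""
    hits = []
    for line in pfctl_ss_output.splitlines():
        fields = line.split()
        # Assumed layout: <iface> <proto> <src> -> <dst> <state>
        if len(fields) < 5 or fields[0] != wan1_if or fields[3] != "->":
            continue
        src_ip = fields[2].rsplit(":", 1)[0]  # drop the port
        if src_ip == wan2_ip:
            hits.append(" ".join(fields))
    return hits


# Hypothetical sample, using documentation address ranges.
SAMPLE_STATES = """\
igb0 tcp 198.51.100.10:56291 -> 203.0.113.50:5201       ESTABLISHED:ESTABLISHED
igb0 tcp 192.0.2.10:45481 -> 203.0.113.50:5201       ESTABLISHED:ESTABLISHED
igb1 tcp 198.51.100.10:20254 -> 203.0.113.50:5201       ESTABLISHED:ESTABLISHED
"""

for state in mismatched_states(SAMPLE_STATES, "igb0", "198.51.100.10"):
    print(state)
```

In this made-up sample only the first line is reported: it sits on the WAN1 interface but carries the WAN2 source address, which is the confusing combination described above.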

                        Steve

                        • P
                          palesius @stephenw10
                          last edited by

                          So traffic originating from inside the firewall uses the WAN2 gateway and has route-to set to WAN2, whereas traffic originating from the firewall itself is sent over WAN1 with a route-to of WAN2?

                          Anyway, my immediate problem is fixed, and now I know that, until this issue is fixed, testing WAN2 from the firewall itself is a bad idea (at least over TCP).

                          • stephenw10S
                            stephenw10 Netgate Administrator @palesius
                            last edited by

                            Traffic from inside the firewall gets tagged with route-to on the way into the firewall, before the routing decision. That seems to be the key difference. It's unclear, to me at least, exactly what triggers it, but we are aware of the issue.

                            Steve

                            • A
                              Anyuta1166
                              last edited by

                              Is there any solution for this issue? We recently ran into it, and it is critical for us because it breaks our production environment. We are using pfSense 2.7.2.

                              • stephenw10S
                                stephenw10 Netgate Administrator
                                last edited by

                                What are you actually seeing? The route-to/reply-to bug discussed here is fixed in 2.7.2.

                                • A
                                  Anyuta1166
                                  last edited by

                                  I actually see the same issue as discussed here.
                                  There are 2 WANs (WAN1 and WAN2). WAN1 is the primary and WAN2 is the backup. WAN1 is the default gateway.
                                  We have a server that can be accessed from the Internet via the WAN2 IP (DNAT from WAN2:443 to INTERNAL_SERVER_IP:443).
                                  The issue is that TCP transfers from the Internet to this server via the WAN2 IP get stuck after about 65 KB.

                                  • stephenw10S
                                    stephenw10 Netgate Administrator
                                    last edited by

                                    OK, so only for file transfers? Can you access the server correctly otherwise?

                                    Do you see that traffic passed by the correct rule on WAN2?

                                    Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.