Netgate Discussion Forum

    2.4.5 High latency and packet loss, not in a vm

    Problems Installing or Upgrading pfSense Software
    • ?
      A Former User
      last edited by A Former User

      Bare metal server. Supermicro 5018D-FN4T. No pfBlocker, no limiters or queues. As generic as it can be other than my VLANs. This happens on every boot. I cannot log into the admin interface for 1-2 minutes after the login screen is presented. Latency and packet loss persist for 1-2 minutes after logging in.

      The same issue (latency and packet loss), but to a lesser degree, happens every time the filters are reloaded or an alias is added or edited, and it persists for 2-3 minutes before settling down.

      After filter reload:
      Screen Shot 2020-03-29 at 09.33.29.png

      and:

      Screen Shot 2020-03-29 at 12.00.20.png

      After boot:
      Boot Latency.jpg

        A Former User
        last edited by

        Terrible subject line :P
        Sounds exactly like what we're all discussing here.

          A Former User
          last edited by

          Thanks. I've seen that thread, didn't read every post, it's mostly installs in a vm and with pfblocker. pfblocker exacerbates the underlying problem as best I can tell but isn't the issue.

            A Former User @A Former User
            last edited by

            @jwj said in 2.4.5 High latency and packet loss, not in a vm:

            Thanks. I've seen that thread, didn't read every post, it's mostly installs in a vm and with pfblocker. pfblocker exacerbates the underlying problem as best I can tell but isn't the issue.

            It has sadly been taken over by people who think pfBlockerNG has something to do with it. I have exactly the same problem as you've posted, where changes to the platform result in latency and packet loss. It certainly seems to affect VM platforms worse, but you have the same symptoms.
            Anyway, the more threads the merrier I guess, so that people realise pfBlockerNG isn't the cause (though the rules it applies do seem to help surface the underlying problem).

              cmcdonald Netgate Developer
              last edited by

              I'm seeing the same thing, on bare metal and in VMs.

              Need help fast? https://www.netgate.com/support

                getcom
                last edited by

                Hello all,

                I experienced similar issues, also on bare metal. My conclusion is that it is traffic related. pfBlockerNG also produces traffic with its list, DNSBL and MaxMind updates.
                There was a Netgate patch to pfctl in FreeBSD 11.3 which may have had various side effects.
                More details start here: https://forum.netgate.com/post/901257
                I hit every reported problem, starting with the broken mirror, then missing PHP files, high latency on both gateways, high system load and an unresponsive console.
                I will restore to 2.4.4-p3 tomorrow.

                  mikekoke
                  last edited by

                  Same problem in a physical box.
                  When I edit a rule and apply the changes, the latency rises to 300 ms.

                    asan
                    last edited by


                    I'm also affected.
                    HW: SG-4860

                    Whenever the pfctl process peaks at 100% CPU, ping latency is also very high.

                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=1125ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=1613ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=1190ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=5ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
                    Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
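Spikes like the ones in this ping log are easier to line up with pfctl activity if they are pulled out automatically. The following is only a minimal sketch: it assumes Windows-style `Reply from ...` lines as posted above, and the function name `find_spikes` and the 100 ms threshold are illustrative choices, not anything from pfSense.

```python
import re

# Matches the RTT field of Windows-style ping replies such as:
#   Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
# "time<1ms" is also accepted and treated as 1 ms.
RTT_RE = re.compile(r"time[=<](\d+)ms")


def find_spikes(lines, threshold_ms=100):
    """Return (line_index, rtt_ms) for every reply slower than threshold_ms.

    threshold_ms is an arbitrary illustrative default.
    """
    spikes = []
    for index, line in enumerate(lines):
        match = RTT_RE.search(line)
        if match is None:
            continue  # not a reply line (timeouts, headers, blank lines)
        rtt = int(match.group(1))
        if rtt > threshold_ms:
            spikes.append((index, rtt))
    return spikes


# A few lines taken from the log above.
sample = [
    "Reply from 9.9.9.9: bytes=32 time=2ms TTL=55",
    "Reply from 9.9.9.9: bytes=32 time=1125ms TTL=55",
    "Reply from 9.9.9.9: bytes=32 time=2ms TTL=55",
    "Reply from 9.9.9.9: bytes=32 time=1613ms TTL=55",
]

for index, rtt in find_spikes(sample):
    print(f"line {index}: {rtt} ms")
```

With one ping per second, the line index doubles as a rough timestamp, so the spacing between spikes can be compared against when the filter reload ran.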

                      stephenw10 Netgate Administrator
                      last edited by

                      Try running a packet capture on the WAN when you see this. Filter by pings.
                      Check to see where the latency is happening: ping requests being sent late, responses arriving late, or packets somehow being delayed within pf before they get back to the ping process.

                      Steve

                        A Former User
                        last edited by

                        Delayed by pf. Pings between vlans see the latency when tables are reloaded.

                        From one vlan to another:

                        Screen Shot 2020-04-04 at 16.19.02.png

                          Derelict LAYER 8 Netgate
                          last edited by

                          That is not a packet capture.

                          Chattanooga, Tennessee, USA
                          A comprehensive network diagram is worth 10,000 words and 15 conference calls.
                          DO NOT set a source address/port in a port forward or firewall rule unless you KNOW you need it!
                          Do Not Chat For Help! NO_WAN_EGRESS(TM)

                            A Former User
                            last edited by A Former User

                            I am aware of that. Stand by for a packet capture.

                            pcap.jpg

                            Screen Shot 2020-04-04 at 16.54.05.png

                              Derelict LAYER 8 Netgate
                              last edited by

                              If you are not able to test in a way that allows you to post actual pcaps I don't know how much good it is going to do anyone.

                              It is past the point of trying to convince people this is a problem (in apparently edge cases). Now it's about trying to compile information so it can be identified and corrected.

                                A Former User
                                last edited by

                                That is a pcap, in Wireshark, with my public IP blanked out. I would be happy to send you the file if you would like, but I'll decline to post it publicly; some knucklehead will just decide to go fishing around at my public IP.

                                  stephenw10 Netgate Administrator
                                  last edited by

                                  I find adding the 'time difference' and 'response time' columns useful here.

                                  That will show if the request is delayed. And what the actual response time on the wire is. Like:

                                  Selection_817.png
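The two columns mentioned here can also be worked out by hand from an exported capture, which helps show what each one means. This is a rough sketch under stated assumptions: the input is a hypothetical list of `(timestamp_seconds, kind, seq)` tuples already extracted from the pcap, and `icmp_timing` is an illustrative name, not a Wireshark or pfSense function. Requests and replies are matched by ICMP sequence number, so the order of rows in the capture view does not matter.

```python
def icmp_timing(packets):
    """Compute per-ping timing from (timestamp_s, kind, seq) tuples.

    kind is "request" or "reply". Returns one row per echo request:
    (seq, delta_since_previous_request, response_time). Either of the last
    two values is None when it cannot be computed.
    """
    requests = {}
    replies = {}
    for timestamp, kind, seq in packets:
        if kind == "request":
            requests.setdefault(seq, timestamp)  # keep the first copy seen
        elif kind == "reply":
            replies.setdefault(seq, timestamp)

    rows = []
    previous_sent = None
    # Walk the requests in the order they were actually sent.
    for seq in sorted(requests, key=requests.get):
        sent = requests[seq]
        delta = None if previous_sent is None else sent - previous_sent
        response = replies[seq] - sent if seq in replies else None
        rows.append((seq, delta, response))
        previous_sent = sent
    return rows


# Hypothetical capture: ping 2 is answered late, ping 3 is sent late
# and never answered.
capture = [
    (0.0, "request", 1),
    (0.002, "reply", 1),
    (1.0, "request", 2),
    (2.2, "reply", 2),
    (2.5, "request", 3),
]

for seq, delta, response in icmp_timing(capture):
    print(seq, delta, response)
```

A large delta between requests points at the sender being stalled, while a large response time with a steady delta points at delay somewhere after the request left.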

                                    Derelict LAYER 8 Netgate
                                    last edited by

                                    I just don't think this data is very helpful at diagnosing exactly what is happening.

                                      A Former User @stephenw10
                                      last edited by A Former User

                                      @stephenw10 said in 2.4.5 High latency and packet loss, not in a vm:

                                      I see delta time but not response time as column choices. Maybe it would be more expedient for me to send the pcap. I have used Wireshark exactly once: this time. :)

                                      OK, I see now. Custom column and then icmp.resptime. Does that make any sense if it's not sorted by the icmp seq number?

                                        A Former User
                                        last edited by

                                        I hope this is more useful. If not I'll try again.

                                        pcap2.jpg

                                          A Former User
                                          last edited by

                                          I'll add this to the mix. I changed the average time in the gateway settings. That's the time dpinger averages over. When I changed the setting, saved and then applied it, the interface locked up for an extended time (minutes).

                                          So, I ssh'd in, ran top and did it again:

                                          Screen Shot 2020-04-04 at 21.04.40.png

                                          I can see dpinger using some resources, but why pfctl, ntpd and sshd? I'm not sure if that means anything, but it sure appears odd to me.

                                            riften
                                            last edited by

                                            This looks so much like the problem I had, even before pfSense 2.4.5. The symptoms: latency spikes, then packet loss, over and over. I had just created my first VLAN and given the VLAN interface a static IPv6 address in one of the /64s I should have. But there was no route, and I got this horrible latency and packet loss. I followed the info HERE and created a 'Configuration Override' on the WAN IPv6 and set my VLAN static IPv6, and that was the only way to get darn ATT to route IPv6 from my VLAN. It was trouble free after that, after I spent almost a week pulling out my hair. So just wondering, can you guys ping (route) from your LAN or from the VLANs over IPv6? I am seeing IPv4 pings, but did I miss the IPv6 pings...
                                            I'm on 2.4.5 with no issues, and am using the latest pfBlockerNG. It just looks so familiar...

                                              A Former User
                                              last edited by

                                              I can ping ipv6 without issue. I get a /56 from my isp.

                                              The only thing that has changed in my configuration is the pfsense version.

                                              I have offered to share my config.xml to test on matching hardware. My Supermicro hardware is the same as a box Netgate sells other than not being Netgate branded.

                                              This is a frustrating problem, more so for Netgate than anyone else I'm sure.

                                                stephenw10 Netgate Administrator
                                                last edited by

                                                Can you see what is calling pfctl if you run, say: ps -auxdww | grep pfctl.

                                                  A Former User
                                                  last edited by A Former User

                                                  root 25572 33.5 0.0 8828 4888 - R 09:34 0:04.12 | | `-- /sbin/pfctl -o basic -f /tmp/rules.debug
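For anyone saving repeated `ps` output to a file, as is done later in this thread, the pfctl rows can be picked out and summarised afterwards. The following is a sketch only: it assumes the usual eleven-column `ps aux` layout (USER, PID, %CPU, %MEM, VSZ, RSS, TT, STAT, STARTED, TIME, COMMAND) seen in the line above, and `pfctl_processes` is an illustrative helper name.

```python
def pfctl_processes(ps_output):
    """Extract pfctl rows from saved `ps auxdww` text.

    Returns a list of dicts with user, pid, cpu (percent) and command.
    The grep process itself is skipped so it is not mistaken for pfctl.
    """
    hits = []
    for line in ps_output.splitlines():
        if "pfctl" not in line or "grep pfctl" in line:
            continue
        # Split into the first ten columns plus the full command text.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # not a complete process row
        hits.append(
            {
                "user": fields[0],
                "pid": int(fields[1]),
                "cpu": float(fields[2]),
                "command": fields[10],
            }
        )
    return hits


# The header row is the assumed standard one; the pfctl row is from the post above.
saved = (
    "USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND\n"
    "root 25572 33.5 0.0 8828 4888 - R 09:34 0:04.12 "
    "| | `-- /sbin/pfctl -o basic -f /tmp/rules.debug\n"
)

for proc in pfctl_processes(saved):
    print(proc["pid"], proc["cpu"], proc["command"])
```

The tree prefix in the command column is left intact, since the parent rows around it are what show which process called pfctl.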

                                                    A Former User
                                                    last edited by A Former User

                                                    I was able to run ps auxdww >> psoutput a few times before the shell locked up.

                                                    Here it is: (removed)

                                                      stephenw10 Netgate Administrator
                                                      last edited by

                                                      Thanks, that could be useful.
                                                      Interesting there are things there using far more CPU than I would ever expect.

                                                      You might want to remove it though if those public IPs are static.

                                                      Steve

                                                        A Former User @stephenw10
                                                        last edited by

                                                        @stephenw10 Dynamic. No open ports, so they can bang away all they want ;)

                                                          A Former User @stephenw10
                                                          last edited by A Former User

                                                          @stephenw10

                                                          I have some spare cycles, I suppose a lot of people do. You, however, are slammed.

                                                          If it would be helpful I'm willing to run through a methodical sequence of configurations and test to try to get a handle on the issue(s).

                                                          If you provided an outline of configurations like: Generic install, no ipv6, Test. Make big table(s). Test. Turn on ipv6, test. Make big ipv6 tables. Test. Like that.

                                                          I can give it some hours over the next day or two and see if that helps get a handle on the issue(s).

                                                          I would ask that the tests be specific and the data needed be spelled out clearly so the gaps in my experience don't reduce the usefulness of the exercise.

                                                          I have a Supermicro 5018D-FN4T (32GB ECC), which is the same as Netgate's XG-1541. I have been doing ZFS (single SSD) UEFI installs.

                                                          I wonder if there is something apparently unrelated going on that is common with the installations that are experiencing these issues. Something simple like UPnP or the like. I wouldn't think so, but it would be nice to know exactly what is what as each service is configured in a methodical sequence.

                                                          Anyhow, just a thought.

                                                            stephenw10 Netgate Administrator
                                                            last edited by

                                                            The fact that pfctl is running for so long and using so many cycles implies it's having a very hard time loading the ruleset for some reason.
                                                            I would manually check the /tmp/rules.debug file. Make sure it's not absolutely huge for example.
                                                            If it isn't then start disabling things that add anything to it. So UPnP, and packages like pfBlocker.

                                                            Steve

                                                              A Former User
                                                              last edited by

                                                              Nothing in there that shouldn't be. I have disabled everything including pfblocker. Made a big url alias and the problem persists.

                                                              To me, I could be wrong, it looks like big tables big issue. Small tables small issue. Small tables don't cause a dramatic issue so it appears as if everything is ok when it isn't.

                                                              My curiosity to find out what is going on is waning. If there is anything I can do to help I'd be happy to do so. Otherwise I'll go back to 2.4.4-p3 or onto something else.

                                                                stephenw10 Netgate Administrator
                                                                last edited by

                                                                I agree it does seem like that.

                                                                If you don't actually have any large tables try setting the sysctl in System > Advanced > Firewall back to something closer to the default. So set Firewall Maximum Table Entries to, say, 65k or something even smaller.

                                                                There was code added to allow that to be set, and others have seen that as the issue. We see some reports (I have seen it myself) where you get the error 'unable to allocate memory for (some large table)' but it then loads fine on subsequent reloads. It appears that's where pfctl may be doing something it shouldn't.

                                                                Steve

                                                                  A Former User @stephenw10
                                                                  last edited by

                                                                  @stephenw10 I have done that and seen the 'cannot allocate memory' error when total table entries > Maximum Table Entries. I can take my config (lots of VLANs) with no packages and IPv6 enabled, so the big bogonsv6 table is loaded, and have the issue. Turn off block bogons and the symptoms are eliminated. The Max table entries setting has nothing to do with it. Total table entries is what matters. The only thing I haven't done is start from scratch and add stuff; I always started from my config and then disabled stuff.

                                                                  Anyhow, what happens will happen. I'm not going to get stuck on this for much longer.

                                                                    stephenw10 Netgate Administrator
                                                                    last edited by stephenw10

                                                                    OK, so to confirm: the presence of the large table(s), irrespective of the max table size value, triggers the latency/packet-loss/CPU usage?
                                                                    And removing the table completely eliminates it?

                                                                    Steve

                                                                      A Former User
                                                                      last edited by A Former User

                                                                      @stephenw10

                                                                      Total table size is limited by max table size. If I set max tables at some arbitrarily large number, say 20000000 but only have a few small tables (no bogonsv6, no ip block lists) things are fine, meaning the symptoms of the problem are not noticeable. I have done that.

                                                                      Obviously the opposite cannot be configured: large tables with a small max tables value.

                                                                      I'll demonstrate some time later today, things to do right now.

                                                                        A Former User
                                                                        last edited by A Former User

                                                                        @stephenw10

                                                                        OK, only took a moment.

                                                                        Set max tables to 20000000.
                                                                        Turned off block bogons.
                                                                        Disabled pfblocker.

                                                                        Rebooted.

                                                                        Reboot was fast, 2.4.4-p3 fast.
                                                                        Ran ps auxdww | grep in a while 1 loop
                                                                        Reloaded the filters (status->filter reload)

                                                                        No lag, no latency, didn't notice it in any way.

                                                                        Screen Shot 2020-04-06 at 11.23.07.png

                                                                        Screen Shot 2020-04-06 at 10.54.33.png

                                                                        Screen Shot 2020-04-06 at 10.54.48.png

                                                                        Screen Shot 2020-04-06 at 11.04.36.png

                                                                        Didn't even see pfctl pop up when running top. Must have happened in between refreshes.

                                                                        Conclude what you will from this. The evidence shows max tables limits total table size (which is what it is supposed to do), but the total number of table entries is what makes the symptoms of the issue obvious (cause currently unknown, maybe some regression in pf).
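The distinction drawn here between the configured limit and the total number of entries actually loaded can be made concrete with a small tally. This is only an illustrative sketch: the table names and contents below are made up, `table_report` is a hypothetical helper, and on a real system the lines would have to come from whatever files or aliases feed each table.

```python
def count_entries(lines):
    """Count lines that hold an entry, skipping blanks and '#' comments."""
    return sum(
        1 for line in lines if line.strip() and not line.lstrip().startswith("#")
    )


def table_report(tables, max_entries):
    """Summarise table sizes against a configured maximum.

    tables maps a table name to its list of lines.
    Returns (per_table_counts, total, over_limit).
    """
    counts = {name: count_entries(lines) for name, lines in tables.items()}
    total = sum(counts.values())
    return counts, total, total > max_entries


# Made-up example data; real bogon and block lists are far larger.
example_tables = {
    "bogonsv6": ["# comment line", "2001:db8::/32", "fc00::/7", ""],
    "blocklist": ["192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"],
}

counts, total, over_limit = table_report(example_tables, max_entries=4)
print(counts, total, over_limit)
```

Per the observations in this thread, the `over_limit` case corresponds to the memory allocation error, while the latency symptoms tracked `total` regardless of how high the limit was set.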

                                                                          A Former User
                                                                          last edited by

                                                                          @stephenw10

                                                                          So, I'm either going to go back to 2.4.4-p3 or move to another solution (I have an ISR I could drag out of the closet). I want to go back to the set-and-forget setup I have enjoyed with pfSense for a while now.

                                                                          The question that I feel needs to be answered by the FreeBSD team is this:

                                                                          Why was that hard limit implemented? I would assume there was some observed reason for rewriting that with a hard limit.

                                                                            mikekoke
                                                                            last edited by

                                                                            Has anyone managed to find a permanent solution to the problem where pfblocker and bogons can be enabled without latency or loss?

                                                                              A Former User @mikekoke
                                                                              last edited by A Former User

                                                                              @mikekoke Not that I can see.

                                                                              There is a bug in Redmine that has exactly one update from Netgate: can't reproduce in their testing environment. We are past the question of whether it is a bug. It is. It sure looks like a bug that would require upstream (FreeBSD) participation to resolve.

                                                                              The question is do they even bother fixing it?

                                                                              You could say:

                                                                              1. Use 2.4.5 if you do not have a large number of total items in tables.
                                                                              2. Stay on 2.4.4-p3 if you have a large number of total table items.

                                                                              2.4.4-p3 remains a viable release. Accommodations made to set repositories to the 2.4.4 versions make it a reasonable option.

                                                                              Put all the effort into 2.5 knowing that both current options are safe and secure or divert resources to fixing 2.4.5? FreeBSD 11.3 is not EOL but it is also not a target for ongoing development. Will FreeBSD put resources into this bug?

                                                                              I don't know the answers to those questions. I'm not going to offer an opinion one way or the other. I do think Netgate should put out a statement setting out their position for the short term. 2.5 is the long term resolution.

                                                                                SteveITS Rebel Alliance @A Former User
                                                                                last edited by

                                                                                @jwj said in 2.4.5 High latency and packet loss, not in a vm:

                                                                                Accommodations made to set repositories to the 2.4.4 versions make it a reasonable option.

                                                                                Does that repo/branch choice also affect package updates/installation?

                                                                                Pre-2.7.2/23.09: Only install packages for your version, or risk breaking it. Select your branch in System/Update/Update Settings.
                                                                                When upgrading, allow 10-15 minutes to restart, or more depending on packages and device speed.
                                                                                Upvote 👍 helpful posts!

                                                                                • ?
                                                                                  A Former User
                                                                                  last edited by

                                                                                  Yeah, there are two drop-down menu choices under System->Update->System Update and System->Update->Update Settings.

                                                                                  The base OS/pfSense and the package repo should be correct. As always, back up your configuration, make a snapshot if you're in a virtual env, and have a plan to recover if you end up FUBAR.

                                                                                  It is too bad the download link for 2.4.4-p3 has not been restored. You can open a ticket and ask (nicely :) for one even if you do not own Netgate HW or have a support contract.

                                                                                  • S
                                                                                    SteveITS Rebel Alliance
                                                                                    last edited by

                                                                                    @jwj said in 2.4.5 High latency and packet loss, not in a vm:

                                                                                    System->Update->Update Settings.

                                                                                    Thanks. I got around to testing and this affects what package updates are detected, e.g. Suricata 4.1.7 vs 5.x. So that's good to know. Would be handy if they left the previous version there all the time (and/or had a warning on the package page if you're checking the wrong repo for your version) but nice it's there now.

