Netgate Discussion Forum

    pfsense latency spikes in ESXi

    Virtualization
    38 Posts 8 Posters 7.1k Views
    This topic has been deleted. Only users with topic management privileges can see it.
    • oiyae

      Damn, I even tried the latest OPNsense; it didn't help, same story. It also loses its settings entirely after a hard reset caused by a power loss, for whatever reason.

      • Rod-It

        Are your latencies on the WAN side?

        Who is your ISP? Is your modem in bridged mode?

        My understanding is that if you remove pfSense from the mix and only use your ISP's kit, it works fine. I also believe this issue is not apparent in the 2.5.x builds, though I've not tested that myself. One other suggestion that seems to work is to put your ISP's modem back into router mode and double-NAT; while not ideal, apparently the issue also goes away.

        The issue seems to affect ESXi more than other hypervisors.

        Here is my post on the same thing (it sounds similar to yours)
        https://forum.netgate.com/topic/155642/troubleshooting-wan-latency

        I have CoDel limiters in place, which help a little, but I still get WAN spikes. No matter how many times people tell me it's my ISP, if I remove pfSense and go back to the junk my ISP provides, I do not face these issues.

        My WAN spikes are not limited to times when the WAN is in use. I also see huge spikes when I'm only connected to work via VPN and barely use 1 Mbit/s, and calls can get cut off because of them. I have found ways to limit this, but it's still a PITA.
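For readers unfamiliar with the CoDel limiters mentioned above, here is a minimal sketch of CoDel's drop-scheduling logic as described in RFC 8289, assuming the RFC's default 5 ms target and 100 ms interval. It is not pfSense's limiter implementation and the function names are illustrative: it only shows that once queue delay has stayed at or above the target for a full interval, drops are spaced interval / sqrt(count) apart, so they get closer together the longer the delay persists.

```python
import math

# Defaults from RFC 8289; the values used by a given limiter may differ (assumption).
TARGET_S = 0.005    # acceptable standing queue delay (5 ms)
INTERVAL_S = 0.100  # how long delay must stay above target before dropping (100 ms)


def above_target(sojourn_s: float) -> bool:
    """True if a packet's time in the queue is at or above the target delay."""
    return sojourn_s >= TARGET_S


def should_start_dropping(first_above_target_s: float, now_s: float,
                          interval_s: float = INTERVAL_S) -> bool:
    """True once queue delay has stayed above target for a whole interval."""
    return now_s - first_above_target_s >= interval_s


def drop_schedule(first_drop_s: float, drops: int,
                  interval_s: float = INTERVAL_S) -> list[float]:
    """Times of successive drops while CoDel stays in its dropping state.

    Each next drop is scheduled interval / sqrt(count) after the previous
    one, so the drop rate rises gradually while the delay stays high.
    """
    times = [first_drop_s]
    for count in range(1, drops):
        times.append(times[-1] + interval_s / math.sqrt(count))
    return times


if __name__ == "__main__":
    print(above_target(0.020), should_start_dropping(0.0, 0.12))
    for t in drop_schedule(0.12, 5):
        print(f"drop at {t * 1000:.1f} ms")
```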

        • oiyae @Rod-It

          @rod-it Puma 6 is involved in my case too. Latency spikes happen on both WAN and LAN. I disabled CoDel completely because it made things worse. I don't quite understand why pfSense and the Superhub dislike each other so much.
          Isolated testing of my network showed that:

          • only pfSense causes LAN latency spikes; if it's excluded from the network, everything's fine.
          • pfSense causes packet loss on the last hop to Google according to MTR; with a direct connection to the router in bridge mode, there's no packet loss whatsoever.
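One way to make the direct-versus-through-pfSense comparison above repeatable is to log ping results from both setups and summarize them the same way. A minimal Python sketch, assuming the round-trip times have already been collected (lost replies recorded as None); the function name, the spike rule and the sample numbers are hypothetical and not taken from MTR's output.

```python
from statistics import median
from typing import Optional, Sequence


def summarize_pings(rtts_ms: Sequence[Optional[float]],
                    spike_factor: float = 3.0) -> tuple[float, int, Optional[float]]:
    """Return (loss %, spike count, median RTT) for a run of ping replies.

    None marks a lost reply. spike_factor is an arbitrary, hypothetical
    threshold: a reply counts as a spike when its RTT exceeds
    spike_factor times the median RTT of the replies that did arrive.
    """
    if not rtts_ms:
        raise ValueError("need at least one ping result")
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    if not received:
        return loss_pct, 0, None
    baseline = median(received)
    spikes = sum(1 for r in received if r > spike_factor * baseline)
    return loss_pct, spikes, baseline


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    direct = [12.1, 11.9, 12.4, 12.0, 12.2]
    via_firewall = [12.3, None, 85.0, 12.1, 12.2]
    print(summarize_pings(direct))
    print(summarize_pings(via_firewall))
```

Using a multiple of the median as the spike rule keeps the comparison independent of each link's normal latency; any fixed threshold would work as long as both setups are judged the same way.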

          I will try moving pfSense out of ESXi onto a hardware box; hopefully that will help. This issue is very annoying because at times even simple web browsing turns into a struggle. OPNsense has the same issues. They are especially noticeable with UDP traffic.

          FYI: this issue is still under discussion here https://forum.kitz.co.uk/index.php/topic,24600.195.html and useful tips regarding unbound settings were given here: https://forum.kitz.co.uk/index.php/topic,24600.90.html

          • netblues @oiyae

            Well, the guys at kitz basically found the same thing.
            There are hardware issues with the modems, which are papered over by firmware fixes.
            Most probably those fixes were done (or are only possible) in router mode, not in bridge mode.

            I doubt moving pfSense to dedicated hardware will fix the spikes.

            DNS resolution is irrelevant to the problem (unless, of course, DNS over UDP is one's ONLY traffic).
            So is MTR packet loss at a certain hop.

            As a side note, a direct comparison with a fork that split off long ago can't be conclusive for any pfSense functionality, feature, or issue.

            • Rod-It

              It does seem to point to a Pf issue. However, many discussions still attribute it to the ISP or the router hardware, which is fine, but there are many other people who double-NAT, use Pf 2.5.x, or remove Pf from the mix, and the problems go away. For this reason I am eagerly waiting for 2.5 to be official. I don't want to run beta software at the moment, though I still toy with the idea of trying it; I just don't seem to find the time.

              That said, it can't be too far away, so it's likely just as easy to tough it out a little longer.

              • Rod-It @Rod-It

                @rod-it
                When I say Pf, I mean specifically the BSD network stack, not pf directly.

                I note this because the FreeBSD version in 2.4 is older, and network bugs may be resolved in 2.5 thanks to the upgraded OS.

                This is just a guess, but there are videos online that show the issue being resolved in 2.5.

                • Cool_Corona

                  Are the two modems in bridge mode, and do they have the same LAN subnet?

                  • netblues @Rod-It

                    @rod-it But if you do NAT at the ISP router (it doesn't have to be double NAT), then the corrected firmware "covers" the issue.
                    In bridge mode, it does not (or cannot).

                    Only if another router used with the modem in bridge mode does not experience the lag can it be attributed to pf.

                    I doubt 2.5 can solve this.

                    • Rod-It

                      While I agree with you, there are videos showing that for some people 2.5 does fix the issue, and the forum linked above also shows people using Pf with the SH3 without facing these issues. Perhaps it's related to the NICs in the Pf box and how the SH3 talks to them.

                      I know Puma is the cause, but based on what you say above, if we rule out Pf completely, there is no fix other than downgrading broadband to an SH2 or moving ISP, which in some cases is not an option. The only other option is to remove Pf completely and go back to a physical router (my Nighthawk didn't show these symptoms), or just use the SH3 directly, which is a bad choice for anyone who wants to do more than basic internet. I can also tell you that if I leave the modem in bridged mode and connect a laptop or PC directly, I don't have these issues either.

                      My issues also spiked (no pun intended) back in Feb/March 2020, and you'll see this posted all over the place too. I believe this is when VM put the fix described above in place for anyone in router mode, but sadly it made things worse for those of us who use bridged mode.

                      To be clear, I am not saying this is a Pf or BSD issue directly, but something not gelling well between the SH3 and Pf/BSD.

                      I, like many others, would just love to know a solution. Replacing the modem or changing ISPs is not an option; reverting to a hardware router and not using pfSense is an option, but not one I'd like to have to choose.

                      I appreciate everyone's input and help, though.

                        • oiyae @Cool_Corona

                        @cool_corona completely different subnets, in bridge mode

                          • netblues

                          If we start doubting things like how Ethernet works at the hardware level, then there are a multitude of things we can't really control that we could blame too.
                          For example, if traffic for the bridged modem passes through an L2 switch that receives it on a tagged trunk port and feeds it untagged to the bridged modem, the switch also adds a store-and-forward buffer and a few microseconds of delay.

                          That should be negligible and hard to measure.

                          However, a faulty receiving end on the bridged modem could be positively or negatively affected by this buffering.

                          I'm wildly guessing here; let's hope someone stumbles upon a combination that works for the rest of us.
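netblues's "few microseconds" figure for the switch can be sanity-checked with a bit of arithmetic: a store-and-forward switch has to buffer a whole frame before sending it on, so the delay it adds is roughly one frame's serialization time. A minimal sketch, assuming a maximum-size 802.1Q-tagged Ethernet frame (1522 bytes, as it would arrive on the tagged trunk port) and gigabit or Fast Ethernet links; real switches add some fixed processing time on top, which is ignored here.

```python
def store_and_forward_delay_us(frame_bytes: int, link_bps: float) -> float:
    """Delay (in microseconds) a store-and-forward switch adds per hop.

    The switch must receive the entire frame before forwarding it, so the
    extra delay is roughly one frame's serialization time on the link.
    Queueing, preamble and inter-frame gap are ignored in this sketch.
    """
    return frame_bytes * 8 / link_bps * 1e6


if __name__ == "__main__":
    # 1522 bytes: a maximum-size 802.1Q-tagged Ethernet frame.
    for name, bps in (("1 Gbit/s", 1e9), ("100 Mbit/s", 1e8)):
        print(name, round(store_and_forward_delay_us(1522, bps), 2), "us")
```

At 1 Gbit/s this comes to about 12 µs, and about 122 µs at 100 Mbit/s, well below the millisecond-scale spikes described in this thread, which is consistent with the buffering itself being negligible.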

                            • oiyae

                            Happy new year everyone!

                            Now I have a brand new hardware box: a 6-port Qotom Q555G6 with an i5-7200U, 8 GB RAM and a 64 GB SSD.

                            I installed pfSense and set it up manually from scratch. In general, latency decreased a bit, but the latency spikes and packet drops are the same as before (1-30%).
                            I installed OPNsense and set it up manually again, from scratch. Same latency spikes with packet drops (20-30% at times).
                            Both were used in a failover dual-WAN configuration with one gateway group and dpinger enabled.
                            A direct connection to the router doesn't have these issues. Ping spikes and packet loss happen on both WAN interfaces, even when the ISP modems are connected directly to the pfSense/OPNsense hardware box (one is a Puma 6-affected modem with the firmware patches installed, the other is not affected).
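Since both boxes ran a dual-WAN gateway group with dpinger monitoring, it may help to see roughly how loss and latency figures turn into a gateway status. The sketch below is an illustrative model, not pfSense's code: the class and function names are hypothetical, the threshold values are assumptions (check what is configured for each gateway under System > Routing > Gateways), and it treats reaching a threshold as crossing it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Assumed example values; substitute the ones configured on the gateway.
    latency_warn_ms: float = 200.0
    latency_down_ms: float = 500.0
    loss_warn_pct: float = 10.0
    loss_down_pct: float = 20.0


def gateway_status(avg_rtt_ms: float, loss_pct: float,
                   t: Thresholds = Thresholds()) -> str:
    """Classify a monitored gateway from its average RTT and loss."""
    if loss_pct >= t.loss_down_pct or avg_rtt_ms >= t.latency_down_ms:
        return "down"
    if loss_pct >= t.loss_warn_pct or avg_rtt_ms >= t.latency_warn_ms:
        return "warning"
    return "online"


if __name__ == "__main__":
    print(gateway_status(15.0, 1.0))   # online
    print(gateway_status(15.0, 25.0))  # down
```

Under these assumed thresholds, loss bursts in the 20-30% range would mark a WAN as down, so it may be worth checking whether gateway events in the logs line up with the spikes.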

                            No idea what else to do here; it has annoyed the hell out of me already.

                            It looks like it's an issue with FreeBSD itself. It seems to be related to https://forum.netgate.com/topic/151819/2-4-5-high-latency-and-packet-loss-not-in-a-vm/
                            I also increased Firewall Maximum States to 1632000 and Firewall Maximum Table Entries to 2000000, but it didn't make any difference at all.
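Raising Firewall Maximum States mainly changes how much memory a full state table may use. A rough estimate, assuming about 1 KiB of RAM per state (a commonly quoted approximation, not a measured figure for this box); the helper name is hypothetical.

```python
def state_table_mib(max_states: int, bytes_per_state: int = 1024) -> float:
    """Approximate RAM (MiB) used if the state table fills completely.

    bytes_per_state is an assumed average, not an exact per-state size.
    """
    return max_states * bytes_per_state / (1024 * 1024)


if __name__ == "__main__":
    print(state_table_mib(1_632_000))  # 1593.75
```

That is roughly 1.6 GiB of the box's 8 GB. The limit only matters once the table gets close to full, so if it never did, raising it would not be expected to change latency.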

                              • Rod-It

                              2.4.5 had this issue, and it was supposed to be fixed in p1 for anyone using more than a single core; however, it has not fixed it for me, and I still suffer the same WAN issues. (Notably, I didn't have them in 2.4.5.)

                              I do not see this when directly connected to my ISP's router, but part of the reason for moving away from it was that it's very basic and pfSense does a lot for me.

                              I've still yet to find the time to install 2.5 and play with it, but with its final release not far off, and having waited this long, a little longer won't hurt.

                              I still think something else is going on under the hood, and I've offered logs; I just wouldn't know where to start troubleshooting this.

                                • oiyae @Rod-It

                                @rod-it FYI, on the OPNsense roadmap for 21.1, which is due to be released at the end of January (hopefully), the very first bullet is "Fix stability and reliability issues with regard to vmx(4), vtnet(4), ixl(4), ix(4) and em(4) ethernet drivers." I hope they'll manage to eliminate this issue.
                                I went through the alternatives and am about to give OpenWrt a try. The rest of the open-source/free firewalls are far behind pfSense/OPNsense functionality-wise.

                                With 2.4.5 I had far too high a CPU load.

                                  • bingo600 @oiyae

                                  @oiyae

                                  I have no issues with my Qotom; see my signature.


                                  pfSense+ 23.05.1 (ZFS)

                                  QOTOM-Q355G4 Quad Lan.
                                  CPU  : Core i5 5250U, Ram : 8GB Kingston DDR3LV 1600
                                  LAN  : 4 x Intel 211, Disk  : 240G SAMSUNG MZ7L3240HCHQ SSD

                                    • pfsensation

                                    I'm running ESXi with lots of VMs, similar to the OP, and also using a Virgin Media Superhub 3 in bridged mode connected to a virtualised pfSense box with 4 vCPUs and 6 GB of RAM assigned.

                                    @Rod-It On 2.4.5 I experienced some issues with Unbound crashing and pfBlockerNG DNSBL; however, 2.4.5-P1 resolved most of these problems. I'm using FQ_CoDel, which has helped keep my latencies more in check, but the underlying issues with the SH3 and the ISP, Virgin Media, as a whole still exist. My area was only recently upgraded to DOCSIS 3.1; even though the modem can't support it, I suspect the upgrades helped with congestion/latency in the area.

                                    Here's a quick screenshot of the latencies mapped out by Grafana on my monitoring stack:
                                    [Screenshot: Grafana graph of WAN latencies]
                                    A few spikes here and there, but nothing too major.

                                    I only had a quick skim through the thread, but does taking pfSense out of the equation help at all? @oiyae

                                    PS: I'm on the Vivid 350 package, Hub firmware: 9.1.1912.302

                                      • oiyae @pfsensation

                                      @pfsensation It does. When I had a PC connected directly to the SH3 router, running MTR to 8.8.8.8 for a good few hours, there were no issues whatsoever. At the same time, a laptop wired to the pfSense box showed packet drops and latency spikes with MTR to 8.8.8.8 and google.com.

                                      Docsis 3.0
                                      HW version 5.01
                                      SW version 6.12.18.26

                                        • pfsensation @oiyae

                                        Call up VM and get the SH3 replaced; yours seems to be a much older revision, and it's running much older software than mine. I would start there, as in the newer firmware versions they have improved the latency issues quite a bit by offloading tasks to the WiFi SoC.

                                        Here's the info on mine below

                                        Standard specification compliant : DOCSIS 3.0
                                        Hardware version : 10
                                        Software version : 9.1.1912.302
                                        
                                          • oiyae @pfsensation

                                          @pfsensation They'll send me a replacement that supports 1 Gb; we'll see how it goes.

                                            • Rod-It @oiyae

                                            @oiyae

                                            As far as I know, they'll only send an SH4 if you buy the Gig1 package, or if you specifically ask for one and the rep is kind enough to honour it, but even so, it still uses the Puma chipset.

                                            They do mask the problem better, but it's still there.

                                            If they do send you an SH4, though, I'm going to try that too.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.