Netgate Discussion Forum

    pfsense latency spikes in ESXi

    Virtualization
    38 Posts 8 Posters 6.1k Views
    • O
      oiyae @Cool_Corona
      last edited by oiyae

      @cool_corona completely different subnets, in bridge mode

      • N
        netblues
        last edited by

        If we start doubting things like how Ethernet works at the hardware level, then we have a multitude of things we can't really control to blame too.
        For example, if traffic for the bridged modem passes through an L2 switch that receives it on a tagged trunk port and feeds it untagged to the bridged modem, the switch also adds a store-and-forward buffer and a few microseconds of delay.

        It should be negligible and hard to measure.
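
        To put a rough number on "a few microseconds", here is a minimal sketch of the serialization delay one store-and-forward hop adds. The frame size (1518 bytes) and the port speeds are assumptions chosen for illustration, not values taken from anyone's setup in this thread, and `sf_delay_us` is a hypothetical helper name.

```shell
#!/bin/sh
# Serialization delay added by one store-and-forward hop, in microseconds.
# A store-and-forward switch has to receive the whole frame before it can
# send it on, so the added delay is roughly frame_bits / link_rate.
# usage: sf_delay_us FRAME_BYTES LINK_MBPS   (both values are assumptions)
sf_delay_us() {
    # bits divided by Mbit/s gives microseconds directly
    awk -v bytes="$1" -v mbps="$2" 'BEGIN { printf "%.3f\n", bytes * 8 / mbps }'
}

sf_delay_us 1518 1000   # full-size Ethernet frame on a 1 Gbit/s port
sf_delay_us 1518 100    # same frame on a 100 Mbit/s port
```

        Under these assumptions the extra hop costs on the order of 12 microseconds at gigabit speed, which is far below the millisecond-scale spikes discussed in this thread.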

        However, a faulty receiving end on the bridged modem could be positively or negatively affected by this buffering.

        I'm guessing wildly here; let's hope someone stumbles upon a combination that will work for the rest of us.

        • O
          oiyae
          last edited by oiyae

          Happy new year everyone!

          I now have a brand new hardware box: a 6-port Qotom Q555G6 with an i5-7200U, 8 GB RAM and a 64 GB SSD.

          Installed pfSense and set it up manually from scratch. In general, latency decreased a bit, but the latency spikes and packet drops are the same as before (1-30%).
          Installed OPNsense and again set it up manually from scratch. Same latency spikes with packet drops (20-30% at times).
          Both were used in a fail-safe dual-WAN configuration with one gateway group and dpinger enabled.
          A direct connection to the router doesn't have such issues. Ping spikes and packet loss happen on both WAN interfaces even when the ISP modems are connected directly to the pfSense/OPNsense hardware box (one modem is Puma 6 affected, with firmware patches installed; the other is not affected).

          No idea what else to do here, it's annoyed the hell out of me already.

          It looks like it's an issue with freebsd itself. Seems to be related to https://forum.netgate.com/topic/151819/2-4-5-high-latency-and-packet-loss-not-in-a-vm/
          I also increased Firewall Maximum States to 1632000 and Firewall Maximum Table Entries to 2000000 but it didn't make any difference at all.

          • R
            Rod-It
            last edited by

            2.4.5 had this issue, and it was supposed to be fixed in p1 for anyone using more than a single core. However, p1 has not fixed it for me; I still suffer the same WAN issues. (Notably, I didn't have them in 2.4.5.)

            I do not see this when directly connected to my ISP's router, but part of the reason for moving away from it was that it's very basic, and pfSense does a lot for me.

            I've yet to find the time to install 2.5 and play with it, but with its possible final release not so far off, and having waited this long, a little longer won't hurt.

            I still think something else is going on under the hood and I've offered logs, I just wouldn't know where to start to troubleshoot this.

            • O
              oiyae @Rod-It
              last edited by oiyae

              @rod-it FYI, on the OPNsense roadmap for 21.1, which is due to be released at the end of January (hopefully), the very first bullet is "Fix stability and reliability issues with regard to vmx(4), vtnet(4), ixl(4), ix(4) and em(4) ethernet drivers." I hope they'll manage to eliminate this issue.
              I went through the alternatives and am about to give OpenWrt a try. The rest of the open-source/free firewalls are far behind pfSense/OPNsense functionality-wise.

              With 2.4.5 I had way too high CPU load.

              • bingo600B
                bingo600 @oiyae
                last edited by

                @oiyae

                I have no issues with my Qotom , see signature

                If you find my answer useful - Please give the post a 👍 - "thumbs up"

                pfSense+ 23.05.1 (ZFS)

                QOTOM-Q355G4 Quad Lan.
                CPU  : Core i5 5250U, Ram : 8GB Kingston DDR3LV 1600
                LAN  : 4 x Intel 211, Disk  : 240G SAMSUNG MZ7L3240HCHQ SSD

                • P
                  pfsensation
                  last edited by pfsensation

                  I'm running ESXi with lots of VMs, similar to the OP, and I'm also using a Virgin Media Superhub 3 in bridged mode connected to a virtualised pfSense box with 4 vCPUs and 6 GB of RAM assigned.

                  @Rod-It On 2.4.5 I experienced some issues with unbound crashing and with pfBlockerNG DNSBL; however, 2.4.5-P1 resolved most of these problems. I'm using FQ_CoDel, which has helped keep my latencies more in check, but the underlying issues with the SH3 and with Virgin Media as an ISP still exist. My area was only recently upgraded to DOCSIS 3.1; even though the modem can't support it, I suspect the upgrade helped with congestion/latency in the area.

                  Here's a quick screenshot of the latencies mapped out by Grafana on my monitoring stack:
                  [Screenshot: Grafana latency graph]
                  Few spikes here and there, but nothing too major.

                  I had a quick skim through the thread, but does taking pfSense out of the equation help at all? @oiyae

                  PS: I'm on Vivid 350 package HUB firmware: 9.1.1912.302

                  • O
                    oiyae @pfsensation
                    last edited by oiyae

                    @pfsensation it does. With a PC connected directly to the SH3 router and MTR running to 8.8.8.8 for a good few hours, there were no issues whatsoever. At the same time, a laptop wired to the pfSense box showed packet drops and latency spikes with MTR to 8.8.8.8 and google.com.

                    Docsis 3.0
                    HW version 5.01
                    SW version 6.12.18.26
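
                    For a side-by-side comparison like the one above, it can help to reduce each run to the same two numbers. A minimal sketch, assuming `ping` output lines that contain `time=<ms>` (as both the FreeBSD and Linux versions print); `ping_stats` is a hypothetical helper, and the count and target in the usage comment are only examples.

```shell
#!/bin/sh
# Summarise a ping run as packet loss and worst round-trip time.
# The number of probes sent is passed in as an argument because the summary
# line printed by ping differs between operating systems.
# usage: ping -c COUNT HOST | ping_stats COUNT
ping_stats() {
    awk -v sent="$1" '
        /time=/ {
            # take the number that follows "time=" (round-trip time in ms)
            split($0, parts, "time=")
            rtt = parts[2] + 0
            recv++
            if (rtt > max) max = rtt
        }
        END {
            loss = (sent > 0) ? (sent - recv) * 100 / sent : 0
            printf "sent=%d recv=%d loss=%.1f%% max_rtt_ms=%.1f\n", sent, recv, loss, max
        }'
}

# Example (assumed invocation; run once directly on the modem/router
# and once from behind the firewall, then compare the two lines):
#   ping -c 100 8.8.8.8 | ping_stats 100
```

                    Running the same command from both positions at the same time makes it easier to tell whether loss is introduced by the firewall hop or is already present upstream.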

                    • P
                      pfsensation @oiyae
                      last edited by

                      Call up Virgin Media and get the SH3 replaced; yours seems to be a much older hardware revision, and it's running much older software than mine. I would start there, as the newer firmware versions have improved the latency issues quite a bit by offloading tasks to the WiFi SoC.

                      Here's the info on mine below

                      Standard specification compliant : DOCSIS 3.0
                      Hardware version : 10
                      Software version : 9.1.1912.302
                      
                      • O
                        oiyae @pfsensation
                        last edited by

                        @pfsensation they'll send me a replacement that supports 1Gb, will see how it goes

                        • R
                          Rod-It @oiyae
                          last edited by

                          @oiyae

                          As far as I know they'll only send an SH4 if you buy the Gig1 package, or if you specifically ask for one and the rep is kind enough to honour it. Even so, the SH4 still uses the Puma chipset.

                          It does mask the problem better, but the problem is still there.

                          If they do send you an SH4, though, I'm going to try that as well.

                          • P
                            pfsensation @Rod-It
                            last edited by

                            For the time being, yeah. They're only giving it out to Gig1 customers or to anyone in a severely high utilisation area who makes enough noise.

                            I've asked several times, bypassed the Indian customer service and spoken to someone in the UK. The general gist I got is that only their Level 2 and higher technical team can order an SH4, and only in specific circumstances (low supply?). However, getting through to one of those guys right now is near impossible.

                            Do let us know how it goes! I've recently extended my Virgin Media contract because no one else in my area supplies more than 60 Mbps. Being a network engineer myself, I'm not a great fan of their network or their hardware, as you can imagine...

                            • A
                              asche @pfsensation
                              last edited by

                              For ESXi deployments, note that FreeBSD, and hence pfSense, disables MSI-X interrupt handling by default, which causes substantially higher CPU load. This could lead to spikes.

                              Try inserting the following into your /boot/loader.conf.local:

                              hw.pci.honor_msi_blacklist=0

                              References:

                              • https://forum.netgate.com/topic/157688/remove-vmware-msi-x-from-the-pci-blacklist
                              • further Google hits
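
                              The loader.conf.local change above can be applied so that repeated runs don't leave duplicate lines. This is only a sketch: `set_loader_tunable` is a hypothetical helper, not something pfSense ships, and it assumes a simple `name=value` file with one tunable per line.

```shell
#!/bin/sh
# Set a loader tunable in a loader.conf-style file without creating duplicates.
# usage: set_loader_tunable FILE NAME VALUE
set_loader_tunable() {
    file=$1 name=$2 value=$3
    touch "$file"
    # Keep every line that does not start with "NAME=", then append the new value.
    awk -v k="$name" 'index($0, k "=") != 1' "$file" > "${file}.tmp"
    printf '%s=%s\n' "$name" "$value" >> "${file}.tmp"
    mv "${file}.tmp" "$file"
}

# On the pfSense VM this would be run as root (assumed invocation):
#   set_loader_tunable /boot/loader.conf.local hw.pci.honor_msi_blacklist 0
```

                              Loader tunables are only read at boot, so a reboot is needed; afterwards `sysctl hw.pci.honor_msi_blacklist` should report the value that was set.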
                              • R
                                Rod-It @asche
                                last edited by

                                @asche

                                That link talks about PCI passthrough; I am running the system fully virtual, with no pass-through.

                                My issue was present in ESXi 6.7U2 and U3. I've since upgraded to ESXi 7.01 and the issue still persists. I also changed my storage from PCIe SSDs to NVMe SSDs.

                                CPU is less than 5% most of the time in my setup.

                                My spikes are only on the WAN too, not the LAN, and have never been LAN side.

                                Thanks for the article though.

                                • A
                                  asche @Rod-It
                                  last edited by

                                  @rod-it look at the Netgate documentation -> tuning -> vmxnet; they give the same advice there (and have for a few months, I believe).

                                  https://docs.netgate.com/pfsense/en/latest/hardware/tune.html#vmware-vmx-4-interfaces

                                  • R
                                    Rod-It @asche
                                    last edited by

                                    @asche

                                    Thank you again, I will look at it more tomorrow. But one has to assume that if this were the cause, both the LAN and the WAN would have these issues, no?

                                    FYI, my server has 4 on-board NICs:

                                    Broadcom NetXtreme BCM5719

                                    In case it helps or is relevant.

                                    I do also have a couple of spare 4 port Intel NICs, I'm half tempted to throw one in the server and see if this makes any difference moving my settings over - specifically the WAN one for now.

                                    My issues also didn't start until around March time of last year

                                    • B
                                      biggsy @Rod-It
                                      last edited by

                                      @rod-it

                                      If you are willing to experiment, try removing the VMX interfaces presented to your pfSense VM and replacing them with E1000e.

                                      Not the recommended solution, I know, but I had a lot of problems with VMX and FreeBSD (both my mail server and pfSense). Changing to E1000e worked for me.

                                      I'm not sure if it's related to the problems I saw but pass-through is enabled automatically when adding VMX interfaces through the ESXi GUI.
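
                                      For anyone who prefers to make the adapter swap outside the GUI, the adapter type is stored in the VM's .vmx file as `ethernetN.virtualDev`. A minimal sketch, assuming the VM is powered off and that adapter index 0 is the one to change; `set_nic_type` is a hypothetical helper and the datastore path in the usage comment is a placeholder.

```shell
#!/bin/sh
# Change the virtual adapter type of one NIC in a .vmx file.
# A backup copy of the original file is kept next to it as FILE.bak.
# usage: set_nic_type VMX_FILE NIC_INDEX NEW_TYPE
set_nic_type() {
    vmx=$1 idx=$2 type=$3
    cp "$vmx" "${vmx}.bak"
    # Rewrite only the quoted value on the matching ethernetN.virtualDev line.
    sed "s/^\(ethernet${idx}\.virtualDev *= *\)\"[^\"]*\"/\1\"${type}\"/" "${vmx}.bak" > "$vmx"
}

# Example (assumed path and index):
#   set_nic_type /vmfs/volumes/datastore1/pfsense/pfsense.vmx 0 e1000e
```

                                      After the change FreeBSD attaches the adapter with the em(4) driver instead of vmx(4), so the interface name changes and the interface has to be reassigned in pfSense.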

                                      • R
                                        Rod-It @biggsy
                                        last edited by

                                        @biggsy

                                        I will try that later if I can; adding, removing and reconfiguring NICs can be done on the fly.

                                        Note though that you are likely referring to DirectPath I/O, which passes through specific functions of the cards, not the cards themselves.

                                        This can of course be disabled if need be.

                                        If I get the chance later today I will change the WAN NIC to E1000e and see how it goes overnight.

                                        • R
                                          Rod-It @Rod-It
                                          last edited by Rod-It

                                          While I will do some testing if time permits, this question was raised back in April, a month after my issues started (March), and the later posts in that thread suggest what many of us suspect: something changed, somewhere. It could be the driver, the ISP, package configuration, or pfSense with ESXi and VMXNET3 specifically, since the common factors for people having issues are:

                                          VM in ESXi
                                          VMXNET3 adapter (in most cases, since this is the default)
                                          ISP in the UK is Virgin Media (SH3 and SH4 known to be buggy)
                                          WAN latencies and packet loss

                                          https://forum.netgate.com/topic/152770/is-e1000e-better-supported-than-vmxnet3-in-pfsense/

                                          The thread above recommends VMXNET3 as the adapter to use.

                                          One poster there is having issues on the LAN side, not the WAN side; however, that was posted for 2.4.5, where other known issues were fixed in P1.

                                          My issues seemed to only start in P1.

                                          It could be (in my case at least) related to the firmware in the ISP's modem vs driver support in the FreeBSD OS.

                                          • P
                                            pfsensation @Rod-It
                                            last edited by pfsensation

                                            Honestly, I think in your case it's just ISP related. In 2.4.5 I did have issues with larger pfBlockerNG lists and unbound. This was patched in P1.

                                            I'm on the same ISP as you, and I've had major issues myself. It seems they cannot do anything right: crappy modem, crappy network, crappy peering (often loss to certain places like Cloudflare). Now, speaking to you and the OP, it seems even the same model of the flawed Puma 6 modem doesn't run the same firmware (I guess they're cocking up firmware updates). I've had to get my modem replaced to improve my situation, plus use things like FQ_CoDel. It's still nowhere near perfect, but it's probably the best you'll get from flawed Puma chipset modems and an overall sketchy ISP.

                                            All in all, it's just a minefield. If possible, I recommend you ditch the ISP and move on to something better. With working from home and online learning more prevalent, no one has time to endlessly faff around with Virgin. I've had no choice until recently; Community Fibre is now coming to my area with full fibre to the premises, and I'm switching to that right away.

                                            PS: I assumed everyone had read the pfSense guides; I've got the /boot/loader.conf.local options recommended by Netgate set up.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.