Netgate Discussion Forum

    pfSense underperforming, high jitter + random packet loss

      NaterGator

      I could use some help in troubleshooting this issue which I've recently uncovered. Some background information on my setup:

      PC is a Windows 8.1 HTPC that is always on and uses a Ceton tuner to record live TV. It has an i5-2500K processor, Asus P8Z68-Pro motherboard, an Intel PRO/1000 PT Dual Port Server Adapter, and 8 GB of RAM. I use Cox internet on an Arris SB8200 with the 150/10 tier. pfSense runs as a Hyper-V VM with exclusive access to the two NIC interfaces.

      For months I've had pfSense running my LAN with 2 VLANs; VLAN 1 for default network connectivity and VLAN 5 for devices that want to connect to the internet via policy based routing through my privacy VPN. As part of troubleshooting this problem I removed the VLAN configuration and simplified back down to 2 NIC interfaces and disabled the privacy VPN. At the same time I've been working with my ISP to fix numerous noise issues in the HFC plant in my area, so I have smokeping running on an AWS instance hitting my CMTS and Modem separately with pings every 30 seconds to check packet loss. The plant noise is gone but the red flag was some lingering low-grade packet loss I was seeing.

      Having said that: the primary issue I'm having is pfSense routing performance. It introduces unexplained jitter and frequent stalls in packet processing (especially when traffic shaping is turned on) to the point that I notice it quite severely during online gaming and VoIP calls.
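For anyone wanting to put numbers on "jitter" and "packet loss" from a ping log, here is a minimal Python sketch. It is only an illustration: the function name `summarize_pings`, the input format (a list of round-trip times in ms, with `None` for a lost probe), and the jitter definition (mean absolute difference between consecutive received replies) are all assumptions, not what smokeping itself computes.

```python
from statistics import mean


def summarize_pings(rtts_ms):
    """Summarize ping samples (hypothetical helper).

    rtts_ms: list of round-trip times in milliseconds; None marks a lost probe.
    Jitter here is the mean absolute difference between consecutive *received*
    replies, which is a simplification chosen for this sketch.
    """
    replies = [r for r in rtts_ms if r is not None]
    lost = len(rtts_ms) - len(replies)
    loss_pct = 100.0 * lost / len(rtts_ms) if rtts_ms else 0.0
    deltas = [abs(b - a) for a, b in zip(replies, replies[1:])]
    return {
        "loss_pct": loss_pct,
        "jitter_ms": mean(deltas) if deltas else 0.0,
        "max_ms": max(replies) if replies else None,
    }


if __name__ == "__main__":
    # Made-up samples, purely for illustration.
    print(summarize_pings([20.0, 22.0, None, 21.0, 60.0]))
```

Running the same summary over logs captured with each firewall in place would give a like-for-like comparison instead of eyeballing graphs.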

      I finally realized it was pfSense after getting frustrated and trying the following which failed to net any change whatsoever:

      1. Starting a new pfSense VM from scratch to rule out a config change I had made
      2. Booting pfSense directly without a hypervisor to rule out Hyper-V getting in the way
      3. Moving to an entirely new x86 machine and running natively to rule out a Z68 or i5-2500K latency issue

      I was finally convinced it was actually pfSense itself that was to blame when I decided to boot up an old Untangle VM I had lying around from back when I decided to try it. The jitter and momentary connection stalls were immediately gone and I haven't had any measurable packet loss while running that VM. To be clear, this is not coming from my ISP or CPE; in the span of about 1 minute I shut down the pfSense VM and booted up the Untangle VM, and it all disappeared.

      Here are some representative tests I ran back to back as quickly as possible to maintain pretty consistent network conditions (click on the download/upload graph label to view phase bufferbloat):
      pfSense 2.4.2, no traffic shaper (1040ms spike during download, single 420ms spike during upload)
      pfSense 2.4.2, codelq traffic shaper (2 small 400ms download spikes, 17 ~300+ms upload spikes)
      Untangle 13.1.0, no traffic shaper
      Untangle 13.1.0, fq_codel shaper

      The fq_codel result on Untangle is, as expected, pretty much perfect. The pfSense results on the other hand look ridiculous, and are far from the worst I've actually collected.
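The spike counts quoted above (for example "2 small 400ms download spikes") can be made repeatable with a small script. This is a rough Python sketch under stated assumptions: `count_spikes`, the 200 ms threshold, and the sample values are hypothetical and are not how dslreports grades bufferbloat.

```python
def count_spikes(latencies_ms, idle_ms, threshold_ms=200.0):
    """Count distinct excursions above idle_ms + threshold_ms (hypothetical helper).

    Consecutive samples above the limit count as one spike, so a long stall
    is not counted once per sample.
    """
    limit = idle_ms + threshold_ms
    spikes = 0
    in_spike = False
    for sample in latencies_ms:
        above = sample > limit
        if above and not in_spike:
            spikes += 1  # a new excursion starts here
        in_spike = above
    return spikes


if __name__ == "__main__":
    # Made-up latency trace in ms, with an assumed 25 ms idle baseline.
    trace = [25, 30, 420, 410, 28, 1040, 31]
    print(count_spikes(trace, idle_ms=25))
```

Using one fixed threshold for every run keeps the pfSense and Untangle results comparable.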

      I've tried going through the pfSense wikis for Low Throughput Troubleshooting and Tuning and Troubleshooting Network Cards. I've tried searching the forums and web to no avail. I'm not that familiar with FreeBSD but nothing immediately stands out to me as a red flag (system / interrupt load appears low, etc). I have all NIC offloading features disabled on both the guest and host (no LRO, TSO, etc). Hell, I even replicated this problem on another machine I had lying around with a fresh default pfSense install.

      I'm really at a loss and could use some help regarding next steps for troubleshooting to bring pfSense performance back into line. I much prefer pfSense's interface/power to Untangle and would like to get back to it ASAP.
      pfsense_steady_packet_loss.png

        Chrismallia

        My results pfsense 2.4.2 with no traffic shaping

        http://www.dslreports.com/speedtest/26817207

          NaterGator

          Thanks for the results Chris. Can you try with Hi-Res bufferbloat enabled?

          I really don't think it is the NIC, but out of sheer "I'm out of other ideas" desperation I ordered an I350T2V2 from Arrow to test.

            Chrismallia

            Sure, that changes the results.

            1. No shaping:

            http://www.dslreports.com/speedtest/26818126

            2. I enabled fq_codel and limiters in pfSense, but with those bufferbloat settings the internet still came to a crawl:

            http://www.dslreports.com/speedtest/26818505

              NaterGator

              Thanks for that. There are maybe some hints of a similar (or the same?) problem in your results, but nothing particularly conclusive or definitive. May I ask what hardware you're running on?

              For those playing along, here's how pfSense is comparing to Untangle on the exact same hardware minutes apart:

                Chrismallia

                This is on a J1900.
                I did another test with the traffic shaper, enabling codel in every queue.
                The internet kept working fine while testing.

                http://www.dslreports.com/speedtest/26819500

                I also have Untangle and will give it a spin.

                  Chrismallia

                  Dude, I tested UT on the same HW. The test errored out a few times at first, then I got these results. pfSense did better with HFSC and Codel.

                  http://www.dslreports.com/speedtest/26821430

                    Chrismallia

                    UT proof

                    utCapture.PNG

                      NaterGator

                      @Chrismallia:

                      Dude, I tested UT on the same HW. The test errored out a few times at first, then I got these results. pfSense did better with HFSC and Codel.

                      http://www.dslreports.com/speedtest/26821430

                      Bizarre. What NIC are you running?

                        Chrismallia

                        The NIC is an Intel dual-port server-grade adapter.

                          NaterGator

                          Tested using a brand new Intel I350T2V2, exactly the same results.

                            Chrismallia

                            My ping spikes up to 300 ms, sure, but it comes back down and I get an A with no interruption to services; same on UT. Can you post your results with the Intel NIC? Was the internet slow while performing the test? Try the traffic shaper with HFSC, enable codel on every queue, and post your results.

                              NaterGator

                              The issue seems to be entirely with ALTQ shaping.

                              I decided to spend the day booted natively into pfSense (home alone, so nobody to be bothered with intermittent internet and no access to the TV) to troubleshoot this.

                              Ultimately, after different iterations of ALTQ shapers with and without codel, I couldn't find a single one that offered even remotely acceptable performance and that didn't introduce gigantic latency / buffering spikes.

                              I decided to try this: https://forum.pfsense.org/index.php?topic=126637.0

                              Lo-and-behold, it worked like a charm. Using dummynet and real fq_codel on limiters gives me results I would expect without the altq insanity.
                              https://www.dslreports.com/speedtest/26865693
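For readers unfamiliar with the dummynet approach, the core of it is a limiter (pipe) plus an fq_codel scheduler attached to that pipe. The Python sketch below only builds the command strings so the shape is visible. The helper name `fq_codel_commands`, the pipe numbers, and the 140/9 Mbit/s rates are assumptions for illustration, not the settings used in this post; queues and the firewall rules that send traffic into them are deliberately left out, and on pfSense the limiters themselves are normally created in the GUI.

```python
def fq_codel_commands(pipe_id, bandwidth_mbit):
    """Build ipfw/dummynet command strings for one direction (hypothetical helper).

    Only the pipe (bandwidth limit) and the fq_codel scheduler attached to it
    are generated; queues and firewall rules are out of scope for this sketch.
    """
    return [
        f"ipfw pipe {pipe_id} config bw {bandwidth_mbit}Mbit/s",
        f"ipfw sched {pipe_id} config pipe {pipe_id} type fq_codel",
    ]


if __name__ == "__main__":
    # Assumed rates slightly below a 150/10 link; adjust for your own line.
    for pipe_id, rate in ((1, 140), (2, 9)):
        for cmd in fq_codel_commands(pipe_id, rate):
            print(cmd)
```

One pipe per direction keeps download and upload shaping independent, which matters on an asymmetric link.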

                              I don't know if I'm the only one experiencing this, but it honestly seems like altq is currently introducing side effects worse than the problems it is supposed to fix.

                                w0w

                                pfSense 2.4.3 alphabeta built on Sat Dec 16 11:23:26 CST 2017,
                                Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz e3c226d2i (2xi210 LAN)
                                tunables
                                kern.ipc.maxsockbuf=256000000
                                hw.igb.rxd="4096"
                                hw.igb.txd="4096"
                                net.inet.tcp.syncache.hashsize=1024
                                net.inet.tcp.syncache.bucketlimit=100
                                net.isr.defaultqlimit=4096
                                net.link.ifqmaxlen=10240
                                hw.igb.rx_process_limit="-1"
                                hw.igb.num_queues=2
                                dev.igb.0.fc=0
                                dev.igb.1.fc=0
                                kern.ipc.nmbjumbo9="20000"
                                kern.ipc.nmbclusters="1000000"
                                WAN is PPPoE 300/300Mbit over gigabit LAN to ISP router (some CISCO with 10G fiber optic connection)

                                FQ_CODEL enabled,  Hi-Res bufferbloat and other settings as posted by NaterGator:

                                http://www.dslreports.com/speedtest/26877901

                                FQ_CODEL enabled,  Hi-Res bufferbloat and 30/30 streams:

                                http://www.dslreports.com/speedtest/26877933

                                FQ_CODEL disabled, Hi-Res bufferbloat and other settings as posted by NaterGator:

                                http://www.dslreports.com/speedtest/26877771

                                FQ_CODEL disabled, Hi-Res bufferbloat and 30/30 streams:

                                http://www.dslreports.com/speedtest/26877806

                                FQ_CODEL disabled, no tunables, Hi-Res bufferbloat and other settings as posted by NaterGator:

                                http://www.dslreports.com/speedtest/26877572

                                FQ_CODEL disabled, no tunables, Hi-Res bufferbloat and 30/30 streams:

                                http://www.dslreports.com/speedtest/26877682

                                I do not see any huge difference, just some fluctuations that are mostly on ISP side I think.

                                If you want me to test the ALTQ shaper, please provide a sample configuration. But really, my experience with ALTQ has not been very good; at the least, it has twice as much bandwidth overhead compared to the IPFW shaper.

                                  NaterGator

                                  Interesting results… I wonder if asymmetric link bandwidth is having a greater influence?

                                  This was my "typical" basic altq test with no limiter/fq_codel: https://i.imgur.com/d1vQLFc.png (only the one shaper on the WAN interface)

                                  I also tried the configuration outlined here: http://www.speedtest.net/insights/blog/maximized-speed-non-gigabit-internet-connection/

                                  Also...go bolts?

                                    w0w

                                    ALTQ CODELQ, NaterGator settings: http://www.dslreports.com/speedtest/27005845 As you can see, dslreports automatically dropped to 18:6 streams.
                                    And for the 30/30 streams we have a problem! Three attempts to start the test got stuck on idle latency testing with spikes (failed due to overall timeout. error:2), and in the end I got this with 24/24: http://www.dslreports.com/speedtest/27006168
                                    And a repeat test with FQ_CODEL and 30/30: http://www.dslreports.com/speedtest/27006586
                                    There is something broken in ALTQ CODELQ…

                                      NaterGator

                                      Thanks for the extra effort and offering some level of confirmation that I'm not totally crazy. I'm not sure if this is an issue I should submit to the pfSense tracker or if this belongs upstream on FreeBSD's end.

                                      FWIW: To reduce variables I use the preferences on the dslreports test to set fixed servers that I know are close by and a fixed number of streams.

                                        Chrismallia

                                        @w0w

                                        You get great results :) using fq_codel. The lowest ping spike I could get was around 150 ms, and only on download; upload is fine. I think the ISP matters, and having symmetrical speeds also makes a difference.

                                          w0w

                                          Chrismallia, yes, it's the ISP; it just has a very good network, at least in my location.
                                          NaterGator, it's FreeBSD, but I don't think anybody cares about ALTQ CODELQ; you have an alternative in HFSC with codel-enabled queues. I think in the next 3-5 years we will see some progress for IPFW or ALTQ. It does not matter which: both need their code rewritten from scratch, because they use 32-bit integers and so do not support modern traffic bandwidths (over 4 Gbit/s).
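The "over 4 Gigs/sec" figure follows from simple arithmetic if the bandwidth is stored as an unsigned 32-bit count of bits per second. That storage detail is an assumption in this sketch, and the helper names are hypothetical.

```python
UINT32_MAX = 2**32 - 1  # largest value an unsigned 32-bit integer can hold


def max_rate_gbit(counter_bits=32):
    """Largest rate in Gbit/s representable by an unsigned counter of bits/s."""
    return (2**counter_bits - 1) / 1e9


def fits_in_uint32(rate_bits_per_s):
    """True if the rate can be stored without overflowing 32 bits."""
    return 0 <= rate_bits_per_s <= UINT32_MAX


if __name__ == "__main__":
    print(round(max_rate_gbit(), 2))       # ceiling for a 32-bit field
    print(fits_in_uint32(300_000_000))     # a 300 Mbit/s link
    print(fits_in_uint32(10_000_000_000))  # a 10 Gbit/s link
```

So a 300 Mbit/s link is far inside the limit, while a 10 Gbit/s link cannot be expressed at all in such a field.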

                                            NaterGator

                                            @w0w:

                                            NaterGator, it's FreeBSD, but I don't think anybody cares about ALTQ CODELQ; you have an alternative in HFSC with codel-enabled queues. I think in the next 3-5 years we will see some progress for IPFW or ALTQ. It does not matter which: both need their code rewritten from scratch, because they use 32-bit integers and so do not support modern traffic bandwidths (over 4 Gbit/s).

                                            Hmm, I do see this issue in HFSC with and without codel enabled. What I'm saying is that any ALTQ-enabled shaping at all triggers the issue.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.