Latency impact 2.4.4p3 > 2.4.5 > 2.5



  • I wondered if this has been observed by other users, or if it's specific to my pfSense firewall.

    Bare metal install:

    • Supermicro X11DSV-8C-TN8F (Intel Xeon D-2146NT CPU @ 2.30GHz, hyper threading disabled)
    • 16GB RAM
    • dual NVMe ZFS SSDs
    • onboard i350 NIC connected to Arris SB8200 DOCSIS 3.1 modem (i.e. not Puma 6 affected)
    • Spectrum 1000/35 Mbps service

    Since earlier this year I've been fighting increased latency. This clearly coincided with the release of 2.4.5, but I've also noticed increased latency on 2.4.5-p1 and 2.4.5-dev. Here's my graph from the start of the year.

    2020year.png

    Having struggled to work remotely under 2.4.5, I upgraded to 2.5-dev, which averaged better but still showed some increases that made videoconferencing challenging. I rolled back to 2.4.4-p3 and life was good again.

    week.png

    I decided to try 2.4.5-p1 and although things are certainly better than 2.4.5, it's noticeably averaging higher latency than 2.4.4-p3.

    All of this is with a largely static configuration, with no significant changes to rule size, packages, etc. I'm curious: is this observed by other users, is Netgate aware, is this the new normal, or can we provide info to help bring this latency back to previous levels?



  • @q54e3w said in Latency impact 2.4.4p3 > 2.4.5 > 2.5:

    2.4.4-p3.

    Still on 2.4.4-p3 because of all the issues I am seeing with squid and the package manager.



  • You can guess where I upgraded to 2.4.5p1...

    91a926f1-4c68-48bb-bd31-89ac650306f4-billede.png



  • Hi, I don't have this problem at all. Working flawlessly here, like it always does :)

    6ff353a1-e301-4df8-8b92-9ebafeaab242-image.png



  • Appreciate you guys following up with some data. I've spent some time comparing my monitoring data but this doesn't show anything, although the dashboard CPU utilization is off the chart for this spec CPU.

    cpu.png

    thisamavedelvssys.png



  • Can you enable HT?



  • CPU utilization does seem quite high at first glance. What packages do you have installed? Also, what does Diagnostics > System Activity show as using all that CPU time?



  • I've experimented with HT on and off over the last few months; it made no noticeable difference.

    top shows pfctl and openvpn being the highest consumers of CPU. pfctl can peg a single core intermittently, OpenVPN less so. When downloading a large file, I would expect OpenVPN to top out at 100% of one core due to its single-threaded nature, not occupy 4 cores. Because the latency increases with traffic (even a 250 Mbps download), my suspicion is there's something going on in the background with 2.4.5 that isn't in 2.4.4p3.

    It's hard to grab an output that shows the peaky/dynamic nature of the top output; open to suggestions on how to, though.

    Packages include: arping, avahi, cron, perf, mtr-nox11, netsnmp, nut, openvpn export and pfblocker-dev. pfBlocker is run with IP blocking only, with PRI1 (16k) and PRI2 (267k) IPv4 blacklists. No DNSBL was enabled during the majority of this window.
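
One possible way to capture the peaky behaviour described above is to record repeated per-process CPU snapshots and summarize them afterwards. The sketch below assumes the snapshots have already been collected (for example, parsed from periodic saves of the System Activity or top output; that parsing is not shown). The function name, the 90% "busy" threshold, and the sample values are all hypothetical illustrations, not measurements from this thread.

```python
from collections import defaultdict


def summarize_cpu(samples, busy_threshold=90.0):
    """Summarize per-command CPU usage across snapshots.

    samples: list of dicts mapping command name -> CPU percent in that snapshot.
    A command missing from a snapshot is treated as 0% for the mean.
    """
    values_by_command = defaultdict(list)
    for snapshot in samples:
        for command, cpu in snapshot.items():
            values_by_command[command].append(cpu)

    n = len(samples)
    summary = {}
    for command, values in values_by_command.items():
        summary[command] = {
            "peak": max(values),
            "mean": sum(values) / n,
            # How many snapshots showed the command at or above the threshold
            "busy_snapshots": sum(1 for v in values if v >= busy_threshold),
        }
    return summary


# Made-up snapshots, for illustration only
samples = [
    {"pfctl": 2.0, "openvpn": 10.0},
    {"pfctl": 100.0, "openvpn": 12.0},
    {"pfctl": 98.0},
    {"pfctl": 0.0, "openvpn": 8.0},
]
summary = summarize_cpu(samples)
for command, stats in sorted(summary.items()):
    print(command, stats)
```

A process that pegs a core only intermittently shows a modest mean but a high peak and a non-zero busy count, which is easier to post than a single screenshot.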



  • Disabled PRI2 for pfBlocker.

    Last 1 / 5 / 15 minute CPU average
    cpu1515.png

    traffic graph
    wan.png

    top
    top.png



  • Hi,
    My suggestion is to do a clean install of your pfSense (make a backup first).
    After the clean install is finished, do not install any packages. Just make the small changes you need to get internet access, then check whether you still have the bad latency. If everything looks normal, start making more changes in your pfSense: Unbound and other changes that don't require installing any packages. If everything still looks good, start installing one package at a time, and check your latency after every package install and configuration. I know this is going to take some time, but it's a good way to figure out when the problem appears.

    You can also start by disabling OpenVPN and pfblocker-dev to see if your latency changes back to normal before you do a clean install. And stay with pfSense v. 2.4.5 p1.
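
The one-change-at-a-time procedure suggested above can be sketched as a simple loop. This is only an illustration under stated assumptions: `apply_step` and `measure_latency_ms` are hypothetical callbacks standing in for manual work on the firewall, the step names are placeholders, and the simulated "package B" culprit is arbitrary, not a finding from this thread.

```python
def find_first_bad_step(steps, apply_step, measure_latency_ms,
                        baseline_ms, tolerance_ms=5.0):
    """Apply steps one at a time and measure latency after each.

    Returns (step, latency) for the first step after which latency exceeds
    baseline_ms + tolerance_ms, or (None, None) if no step does.
    """
    for step in steps:
        apply_step(step)
        latency = measure_latency_ms()
        if latency > baseline_ms + tolerance_ms:
            return step, latency
    return None, None


# Simulated firewall, for demonstration only
applied = []


def apply_step(step):
    applied.append(step)


def measure_latency_ms():
    # Pretend that "package B" adds 30 ms; an arbitrary choice for the demo
    return 12.0 + (30.0 if "package B" in applied else 0.0)


steps = ["base config", "Unbound", "package A", "package B", "package C"]
culprit, latency = find_first_bad_step(
    steps, apply_step, measure_latency_ms, baseline_ms=12.0
)
print(culprit, latency)
```

In practice each measurement would need to run long enough to cover normal traffic, since the latency reported in this thread rises under load.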



  • @q54e3w - just to confirm, are you using 2.4.5-p1 presently and still seeing high CPU usage on pfctl? If yes, did you make any adjustments on 2.4.5 to mitigate the pfctl issue that may need to be undone? Seeing this level of CPU usage perplexes me as I'm using 2.4.5-p1 on a similarly spec'd system (Xeon D-1518 based baremetal install) with ~375K entries across two pfblockerNG-dev IP block lists and everything is working fine. There were high CPU/latency issues with 2.4.5, but those went away for me after upgrading to 2.4.5-p1.



  • @tman222 said in Latency impact 2.4.4p3 > 2.4.5 > 2.5:

    There were high CPU/latency issues with 2.4.5, but those went away for me after upgrading to 2.4.5-p1.

    I agree with @tman222, that is very high CPU usage!!!

    On a similar system (Supermicro, EPYC 3151) with many more installed applications and 200-250 users, I see 7-8%.

    Ergo: there is some more serious problem lurking in the background.



  • @tman222 said in Latency impact 2.4.4p3 > 2.4.5 > 2.5:

    @q54e3w - just to confirm, are you using 2.4.5-p1 presently and still seeing high CPU usage on pfctl? If yes, did you make any adjustments on 2.4.5 to mitigate the pfctl issue that may need to be undone? Seeing this level of CPU usage perplexes me as I'm using 2.4.5-p1 on a similarly spec'd system (Xeon D-1518 based baremetal install) with ~375K entries across two pfblockerNG-dev IP block lists and everything is working fine. There were high CPU/latency issues with 2.4.5, but those went away for me after upgrading to 2.4.5-p1.

    Yes, still on 2.4.5-p1. I made no changes to 2.4.5 as I migrated to 2.5-dev before 2.4.5p1 was released; as you can see from the graphs, 2.5 was better but still not perfect.
    Then, to validate, I did a clean install of 2.4.4-p3 (no rollback strategy in place), reinstalled a bare config, and upgraded to 2.4.5-p1. 2.4.4p3 doesn't exhibit any of this additional latency.

    I am still seeing high pfctl usage, here's the last 48 hours.

    last2days.png

    I was curious if the latency only affected virtual machines but I'm seeing similar on bare metal too.

    Latency vs traffic
    vstraffic.png
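
To put a number on a "latency vs traffic" graph like this one, a correlation coefficient over paired samples is one option. The sketch below computes a plain Pearson correlation; the traffic and latency values are invented for illustration and are not taken from the graphs in this thread.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up paired samples: WAN throughput (Mbps) and gateway latency (ms)
traffic_mbps = [5, 50, 120, 250, 400]
latency_ms = [12, 14, 19, 31, 45]

r = pearson(traffic_mbps, latency_ms)
print(f"correlation between traffic and latency: {r:.3f}")
```

A value near 1 would support the impression that latency climbs with load; a value near 0 would suggest the spikes come from something else, such as a scheduled job.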



  • Hi @q54e3w - just to confirm, the last sequence of installs started with 2.4.4-p3 and then upgraded directly to 2.4.5-p1, correct? 2.4.5-p1 should have fixed the issues with increased latency and CPU usage, so I'm surprised you are still seeing it. Have you tried a fresh standalone install of 2.4.5-p1 yet to see if that makes a difference?



  • You are correct. I haven’t done a clean 2.4.5p1 yet though. I might get a chance this weekend, fingers crossed.
    This is way better under 2.4.5p1 than it was under 2.4.5, which was basically unusable.



  • Clean install and minimal restore done this morning. Will update in a day or two when I've had a chance to gather some data.



  • I've been running a clean install (i.e. not an upgrade of 2.4.4p3) of 2.4.5p1 with a minimal restore of config (too many rules to recreate from total scratch) for a few days now, and things have gradually worsened. The culprit is pfctl, which can be seen consuming 100% of CPU in System Activity.
    I have pfBlocker installed but have it set to run cron once per day at 3am, and it doesn't appear to cause the load.
    I did notice on day 1 of running a clean install that the increase in latency occurred at exactly the time /usr/bin/nice -n20 /etc/rc.update_urltables was scheduled to run.
    Rebooting seems to reset things to some degree but at this point I suspect I have to roll back to 2.4.4p3 as this latency is making video conferencing virtually impossible.

    overview.png

    states.png

    notraffic.png

    memory.png

    cpuload.png
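
The observation that latency rose at the time a scheduled job ran can be checked by comparing average latency in a window before and after that time. This is a minimal sketch under assumptions: the function name, the 30-minute window, the event time, and the latency samples are all hypothetical and only illustrate the comparison.

```python
from datetime import datetime, timedelta
from statistics import mean


def latency_shift(samples, event_time, window=timedelta(minutes=30)):
    """Mean latency after event_time minus mean latency before it.

    samples: list of (datetime, latency_ms) tuples.
    Only samples within `window` on each side of event_time are used.
    """
    before = [ms for t, ms in samples if event_time - window <= t < event_time]
    after = [ms for t, ms in samples if event_time <= t < event_time + window]
    if not before or not after:
        raise ValueError("need samples on both sides of the event")
    return mean(after) - mean(before)


# Hypothetical job time and made-up latency samples taken every 10 minutes
event_time = datetime(2020, 1, 1, 3, 0)
samples = [
    (datetime(2020, 1, 1, 2, 30), 12.0),
    (datetime(2020, 1, 1, 2, 40), 13.0),
    (datetime(2020, 1, 1, 2, 50), 11.0),
    (datetime(2020, 1, 1, 3, 0), 30.0),
    (datetime(2020, 1, 1, 3, 10), 34.0),
    (datetime(2020, 1, 1, 3, 20), 32.0),
]

shift = latency_shift(samples, event_time)
print(f"latency change around the scheduled job: {shift:+.1f} ms")
```

Repeating the comparison across several days would help separate a real effect of the job from a coincidence on one day.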

