[Solved] pfsense is not making sense



  • I have the weirdest issue, which is driving me crazy. I have been working on this for days, trying to isolate what exactly the problem is. Here's where I'm at:

    Running pfsense 2.4.1 - Release (amd64)
    ISP provides 120Mbps down/ 40Mbps up

    I've had pfsense running in the office for months. I originally started with a simple setup, no packages installed. Everything was fine; speed tests were reporting full bandwidth both down and up.

    Since then, the network topology has not changed. I have installed pfsense OS updates along the way, plus Snort, squid (with cache and AV), and pfblocker. I have been running speed tests recently and my upload is consistently fine. The issue is with my download speeds. I can't get above ~97 Mbps. I know this sounds like an auto-negotiation or cabling issue, but I can assure you that's not the problem. Windows shows a 1 Gig connection and I have tried several PCs.
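
    For context on why a ceiling just under 100 Mbps tends to be read as a 100 Mb/s link somewhere in the path, the best-case TCP payload rate can be estimated from Ethernet framing overhead. This is only a rough sketch, assuming a standard 1500-byte MTU and IPv4 + TCP headers without options; the function name is illustrative.

```python
def tcp_goodput_mbps(link_mbps, mtu=1500, ip_tcp_headers=40):
    """Approximate best-case TCP payload rate over Ethernet.

    Assumes IPv4 + TCP with no options (40 bytes of headers) and counts
    the per-frame wire overhead: 14 B Ethernet header + 4 B FCS
    + 8 B preamble + 12 B inter-frame gap = 38 B.
    """
    wire_overhead = 14 + 4 + 8 + 12
    payload = mtu - ip_tcp_headers
    return link_mbps * payload / (mtu + wire_overhead)


print(round(tcp_goodput_mbps(100), 1))   # Fast Ethernet link
print(round(tcp_goodput_mbps(1000), 1))  # Gigabit link
```

    Under these assumptions a 100 Mb/s hop tops out at roughly 95 Mbps of payload, while a true gigabit path leaves plenty of headroom for a 120 Mbps line. Speed-test sites measure in slightly different ways, so the comparison is approximate.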

    I have disabled all the packages mentioned above which I suspected could have caused issues. I even disabled some packages like nut just because I'm running out of ideas. I have tried at least 5 different client PCs on the network and they all have the issue. Here are a few scenarios:

    Client 1 (Windows 10) - This is a client native to the pfsense network and it has the issue on the pfsense network. When connected to a completely different LAN with a different non-pfsense router (Same upstream ISP gateway), I get full 120 Mbps download speeds.

    Client 2 (Windows 7 64 bit) - This is a client native to the pfsense network and it has the issue on the pfsense network. I tried using a second NIC port on this same PC, it then got full 120 Mbps download speeds on the same pfsense network with the exact same cable that was plugged into the original NIC port!

    Client 3 (Windows 7 64 bit) - This client is NOT native to the pfsense network. When I connected it to the pfsense network, it got the full 120 Mbps!

    It's almost as if pfsense is remembering the MAC or the specific NIC and not allowing it to go above a certain speed. I tried rebooting pfsense, but that didn't help. For client 1, I tried removing the static DHCP lease and giving it a new IP. That didn't help. I don't think it has to do with static DHCP leases anyway because client 2 did not have a static lease and still had the issue.

    I'm sure there must be a way to help narrow this down further. Would Wireshark be any help in a situation like this? If so, what do I look for?
    I am trying to avoid reinstalling pfsense from scratch since the office is running off this system. Plus, I would like to understand what's going on and hopefully contribute to the cause with some useful diagnostics.

    Sorry for the long post. If you got this far, thanks for reading!



  • What kind of system do you have?


  • LAYER 8 Global Moderator

    Do you have any sort of limiters or qos setup on pfsense?



  • johnpoz, good question. No, I don't have any QOS or limiters setup.

    kejianshi, my system info is attached.

    Thanks
    Raffi

    ![System info.JPG](/public/imported_attachments/1/System info.JPG)



  • @kejianshi:

    What kind of system do you have?

    In case you were wondering whether this is running on a VM: it's a full install on the actual hardware.

    Raffi



  • No.  I want to know if your processor is a wimp or a brute.

    The services you have running are difficult for weak processors.



  • It's definitely not a high end system, but the processor never breaks a sweat. The CPU usage is almost always close to no usage at all even when I had all those services running during the day when most users were on the network. I have disabled all the services mentioned since then, but no joy.



  • You need to provide system specs including hard drive type and amount of memory as well as squid config and processor type.  As well as interface speeds.

    (I see your cpu is enough and so is ram)

    I'm leaning towards your problem being squid.



  • I have a 120GB Samsung evo SSD installed. The processor and RAM info are in the screen shot I provided.
    Intel(R) Celeron(R) CPU 1017U @ 1.60GHz
    2 CPUs: 1 package(s) x 2 core(s)
    AES-NI CPU Crypto: No

    4GB RAM

    I disabled squid to help isolate the issue and make troubleshooting simpler. Would the squid settings still be a factor even if the service is disabled?

    I only have two NICs on the system. One for WAN and another for LAN. Both interfaces are Gigabit.

    Raffi



  • Is squid still running?  Check processes.



  • I double-checked it; squid is not running. I also attached a screenshot of my services.




  • I don't know.

    Burn the install.  Reinstall.  Test again.



  • lol I wanna do what your avatar is doing right now. I think you're right though, I may have no choice.



  • BTW - I meant check it at a real console.  ps -aux



  • [2.4.1-RELEASE][admin@pfsense.telebyte]/root: ps -aux
    USER      PID  %CPU %MEM    VSZ  RSS TT  STAT STARTED      TIME COMMAND
    root      11 200.0  0.0      0    32  -  RL  16:59  2659:43.17 [idle]
    root        0  0.0  0.0      0  208  -  DLs  16:59      0:00.19 [kernel]
    root        1  0.0  0.0  5024  908  -  ILs  16:59      0:00.01 /sbin/init --
    root        2  0.0  0.0      0    16  -  DL  16:59      0:00.00 [crypto]
    root        3  0.0  0.0      0    16  -  DL  16:59      0:00.00 [crypto retur
    root        4  0.0  0.0      0    32  -  DL  16:59      0:00.01 [cam]
    root        5  0.0  0.0      0    16  -  DL  16:59      0:00.01 [soaiod1]
    root        6  0.0  0.0      0    16  -  DL  16:59      0:00.01 [soaiod2]
    root        7  0.0  0.0      0    16  -  DL  16:59      0:00.01 [soaiod3]
    root        8  0.0  0.0      0    16  -  DL  16:59      0:00.01 [soaiod4]
    root        9  0.0  0.0      0    16  -  DL  16:59      0:00.00 [sctp_iterato
    root      10  0.0  0.0      0    16  -  DL  16:59      0:00.00 [audit]
    root      12  0.0  0.0      0  272  -  WL  16:59      4:41.33 [intr]
    root      13  0.0  0.0      0    32  -  DL  16:59      0:00.00 [ng_queue]
    root      14  0.0  0.0      0    48  -  DL  16:59      0:00.01 [geom]
    root      15  0.0  0.0      0  256  -  DL  16:59      2:36.05 [usb]
    root      16  0.0  0.0      0    16  -  DL  16:59      0:24.10 [pf purge]
    root      17  0.0  0.0      0    16  -  DL  16:59      0:13.27 [rand_harvest
    root      18  0.0  0.0      0    16  -  DL  16:59      0:02.78 [acpi_thermal
    root      19  0.0  0.0      0    16  -  DL  16:59      0:00.32 [acpi_cooling
    root      20  0.0  0.0      0    16  -  DL  16:59      0:00.07 [enc_daemon0]
    root      21  0.0  0.0      0    48  -  DL  16:59      0:04.35 [pagedaemon]
    root      22  0.0  0.0      0    16  -  DL  16:59      0:00.00 [vmdaemon]
    root      23  0.0  0.0      0    16  -  DL  16:59      0:00.00 [pagezero]
    root      24  0.0  0.0      0    16  -  DL  16:59      0:00.40 [bufspacedaem
    root      25  0.0  0.0      0    32  -  DL  16:59      0:02.04 [bufdaemon]
    root      26  0.0  0.0      0    16  -  DL  16:59      0:00.38 [vnlru]
    root      27  0.0  0.0      0    16  -  DL  16:59      0:07.44 [syncer]
    root      60  0.0  0.0      0    16  -  DL  16:59      0:00.08 [md0]
    root      300  0.0  0.7 282676 29264  -  Ss  16:59      0:02.47 php-fpm: mast
    root      338  0.0  0.1  19436  4400  -  INs  16:59      0:00.02 /usr/local/sb
    root      340  0.0  0.1  19436  4216  -  IN  16:59      0:00.00 check_reload_
    root      353  0.0  0.1  9556  5516  -  Ss  16:59      0:00.04 /sbin/devd -q
    root    4772  0.0  0.1  19324  3196  -  Ss  17:00      0:00.37 /usr/local/sb
    root    5504  0.0  0.1  13084  2776  -  IN  00:01      0:00.00 /bin/sh /etc/
    root    5543  0.0  0.0  6172  1928  -  IN  00:01      0:00.00 sleep 81230
    root    7987  0.0  0.2  20348  6116  -  Ss  16:59      0:10.19 /usr/local/sb
    root    8940  0.0  0.1  12696  2392  -  Ss  16:59      0:06.17 /usr/local/sb
    root    12193  0.0  0.2  53488  6968  -  Ss  16:59      0:00.00 /usr/sbin/ssh
    root    12368  0.0  0.1  10580  2180  -  Is  16:59      0:00.00 /usr/local/sb
    root    14985  0.0  0.1  15076  2384  -  Is  16:59      0:11.32 /usr/local/bi
    root    19768  0.0  0.1  13084  2844  -  IN  13:29      0:01.18 /bin/sh /var/
    root    33534  0.0  0.0  8224  2004  -  Is  17:00      0:00.00 /usr/local/bi
    root    33889  0.0  0.0  8224  2020  -  I    17:00      0:00.03 minicron: hel
    root    34129  0.0  0.0  8224  2004  -  Is  17:00      0:00.00 /usr/local/bi
    root    34552  0.0  0.0  8224  2016  -  I    17:00      0:00.00 minicron: hel
    root    34737  0.0  0.0  8224  2004  -  Is  17:00      0:00.00 /usr/local/bi
    root    35020  0.0  0.0  8224  2016  -  I    17:00      0:00.00 minicron: hel
    root    37355  0.0  0.0  6172  1928  -  IN  15:39      0:00.00 sleep 60
    root    37366  0.0  0.2  78836  8140  -  Ss  15:39      0:00.03 sshd: admin@p
    root    48169  0.0  0.2  25416  6724  -  Is  17:00      0:00.00 nginx: master
    root    48399  0.0  0.2  27464  7768  -  I    17:00      0:00.59 nginx: worker
    root    48521  0.0  0.2  27464  8188  -  I    17:00      0:01.90 nginx: worker
    root    48884  0.0  0.1  12496  2368  -  Is  17:00      0:00.50 /usr/sbin/cro
    root    49416  0.0  0.3  24604 12424  -  Ss  17:00      0:04.41 /usr/local/sb
    root    60609  0.0  0.7 282676 29268  -  I    15:37      0:00.00 php-fpm: pool
    root    65254  0.0  0.1  10368  2088  -  Ss  17:00      0:11.20 /usr/sbin/pow
    root    70050  0.0  0.1  10580  2308  -  Ss  17:00      0:00.00 /usr/local/sb
    root    71912  0.0  0.0  10288  2012  -  Is  13:37      0:00.00 /usr/local/sb
    dhcpd  74470  0.0  0.2  16648  7836  -  Ss  15:22      0:00.06 /usr/local/sb
    root    78540  0.0  0.2  41504  7588  -  I    13:34      0:00.00 /usr/local/sb
    root    78860  0.0  0.2  52880  9108  -  Ss  13:34      0:01.14 /usr/local/sb
    unbound 79886  0.0  0.8  64468 33648  -  Ss  09:58      0:17.38 /usr/local/sb
    root    80737  0.0  0.1  10472  2532  -  Ss  17:00      0:09.21 /usr/sbin/sys
    root    68908  0.0  0.1  39432  2836 v0  Is  17:00      0:00.01 login [pam] (
    root    70053  0.0  0.1  13084  2924 v0  I    17:00      0:00.00 -sh (sh)
    root    70341  0.0  0.1  13084  2800 v0  I+  17:00      0:00.00 /bin/sh /etc/
    root    69122  0.0  0.1  10388  2128 v1  Is+  17:00      0:00.00 /usr/libexec/
    root    69382  0.0  0.1  10388  2128 v2  Is+  17:00      0:00.00 /usr/libexec/
    root    69546  0.0  0.1  10388  2128 v3  Is+  17:00      0:00.00 /usr/libexec/
    root    69647  0.0  0.1  10388  2128 v4  Is+  17:00      0:00.00 /usr/libexec/
    root    69652  0.0  0.1  10388  2128 v5  Is+  17:00      0:00.00 /usr/libexec/
    root    69953  0.0  0.1  10388  2128 v6  Is+  17:00      0:00.00 /usr/libexec/
    root    70040  0.0  0.1  10388  2128 v7  Is+  17:00      0:00.00 /usr/libexec/
    root    37841  0.0  0.1  13084  2800  0  Ss  15:39      0:00.00 /bin/sh /etc/
    root    40476  0.0  0.1  13392  3632  0  S    15:39      0:00.01 /bin/tcsh
    root    42749  0.0  0.1  21104  2716  0  R+  15:39      0:00.00 ps -aux



  • The "idle" process is using way too much processor…  (kidding)

    Don't see anything odd.  I'd reinstall and test again.



  • haha tech humor. I'm going to hold off a reinstall for now since it's not a show stopper, but I have a feeling that may be the only option. I'll have to find a good time to get it done.

    Thanks for the help.

    Raffi



  • Yeah -  I'd wait for a good time.  It could take seconds or perhaps minutes to hit the "default settings" button in the console.

    Might work as well as a fresh install.



  • lol good idea, I'll try that first.

    Have you had any experience with a reinstall when an issue came up? I wonder if restoring my config on a fresh install would also "restore" the issue? I guess, I'll only know by trying.



  • Likely so.  I've noticed that when I screw up my settings, save them and then restore them, they are still screwed up.  Maybe it's just me.



  • It turns out it's not my settings. A factory reset didn't help either. Is a factory reset the same as a fresh install? Could there still be some files that are corrupt or not quite right?

    I'm beginning to think it could be due to the jump from 2.3.x to 2.4.0. I think that's also when the FreeBSD version changed to 11? I won't know for sure until I try a fresh install of 2.3.x and see if that fixes it or not.



  • I'd try a fresh install before I blamed the new version.  I think that even a factory reset could leave some stray code, depending on what's been done to it.



  • I'll have to wait for a time when the office is nearly empty before I do a fresh install. I may not be able to get that done for a while since I won't be in the office again till Tuesday. I guess the bit of good news is that it looks like it's not my settings. If it is due to some bit of bad/left over code, doing a fresh install of 2.4.1 will hopefully take care of that. I could run a test right after the install. Then, restore my latest config and it should get me back up and running, hopefully without issues. We shall see… but that is the game plan for now.



  • I just happened to be searching around tonight as I'm embarking on my own pfsense installation.

    Your description seems to somewhat match that of this video on YouTube:  https://www.youtube.com/watch?v=v2rK5F461aM

    He upgraded the processor and problems went away.  You may be underpowered since you turned a bunch of stuff on.

    Roveer



  • Since then, the network topology has not changed. I have installed pfsense OS updates along the way, Snort, squid (with cache and AV), and pfblocker. I have been running speed tests recently and my upload is consistently fine. The issue is with my download speeds. I can't get above ~97 Mbps.

    Snort, Squid, ClamAV and pfBlockerNG mean you were turning your pfSense into a full UTM device, and this
    on a small Atom-based board at 1.6GHz, so it could really be that you simply don't have enough horsepower.

    He upgraded the processor and problems went away.  You may be under powered since you turned a bunch of stuff on.

    It could also be that the memory system gets saturated: too small a footprint or too slow RAM.



  • I'm in alignment with roveer's post, your box is underpowered.

    Per the pfSense hardware requirements page (https://www.pfsense.org/products/#requirements), for your bandwidth you should be running:

    "No less than a modern Intel or AMD CPU clocked at 2.0 GHz. Server class hardware with PCI-e network adapters, or newer desktop hardware with PCI-e network adapters."

    I would also double your RAM at a minimum.



  • His box may technically be underpowered, but it is not showing any unusual load.

    @OP: Run "ps -aux" while you're doing a speedtest. We need to see what's using CPU, if any, under load.
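
    To make the busiest processes easier to spot in that output, something like the following could be used. This is only a sketch in Python, meant to be run on a workstation against pasted `ps -aux` text rather than on the firewall itself; `top_cpu` is an illustrative name, and the sample rows are taken from the listing posted earlier in this thread.

```python
def top_cpu(ps_text, n=5):
    """Return the n busiest (cpu_percent, command) rows from `ps -aux` text.

    The kernel [idle] thread is skipped, since its %CPU is unused capacity.
    """
    rows = []
    for line in ps_text.strip().splitlines()[1:]:  # first line is the header
        fields = line.split(None, 10)  # 11 columns; COMMAND may contain spaces
        if len(fields) < 11:
            continue
        try:
            cpu = float(fields[2])
        except ValueError:
            continue
        command = fields[10].strip()
        if command.startswith("[idle]"):
            continue
        rows.append((cpu, command))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:n]


sample = """\
USER      PID  %CPU %MEM    VSZ  RSS TT  STAT STARTED      TIME COMMAND
root       11 200.0  0.0      0   32  -  RL   16:59  2659:43.17 [idle]
root       12   0.0  0.0      0  272  -  WL   16:59     4:41.33 [intr]
root    48521   0.0  0.2  27464 8188  -  I    17:00     0:01.90 nginx: worker
"""

for cpu, command in top_cpu(sample):
    print(f"{cpu:5.1f}  {command}")
```

    If `[intr]` or a package process climbs toward the top while the speed test is running, that would point at interrupt handling or that package rather than at raw CPU capacity.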


  • LAYER 8 Moderator

    on a small Atom-based board at 1.6GHz, so it could really be that you simply don't have enough horsepower.

    Geez, guys! The Celeron 1017U is an Ivy Bridge generation notebook CPU, not a small-time old-school Atom.

    "No less than a modern Intel or AMD CPU clocked at 2.0 GHz. Server class hardware with PCI-e network adapters, or newer desktop hardware with PCI-e network adapters."

    What for? That recommendation is really old-school, even the pfSense hardware doesn't match that ;) Not even their own SG-2440 would match that description and is described as running IDS and Proxies just fine. I agree with Harvy, the screens don't show high CPU load and if the box should be that underpowered you'd see that in the 5 or 15m load values. The Celeron is a dual core, so a load of 2 would still be acceptable at peaks.
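
    The rule of thumb about load versus core count can be written down as a small check. This is a minimal sketch, assuming a Unix-like host where Python's `os.getloadavg()` is available; the helper names are illustrative and not part of pfSense.

```python
import os


def load_per_core(load_avg, cores):
    """Normalize a load average by the number of CPU cores."""
    return load_avg / cores


def is_saturated(load_avg, cores):
    """True when the run queue exceeds what the cores can service."""
    return load_per_core(load_avg, cores) > 1.0


try:
    one, five, fifteen = os.getloadavg()  # unavailable on some platforms
    cores = os.cpu_count() or 1
    state = "saturated" if is_saturated(five, cores) else "ok"
    print(f"5m load {five:.2f} on {cores} core(s): {state}")
except (AttributeError, OSError):
    print("load average not available on this platform")
```

    With two cores, a 5-minute load of 2.0 normalizes to 1.0 per core and is not flagged; only a sustained load above the core count is.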



  • Thanks for the replies. I wish it were as simple as my hardware being underpowered. I have no beast under the hood, but I have several points to squash that argument.
    1. My CPU load has never been maxed out even under the heaviest of use.
    2. My CPU load is almost always sitting close to 0% usage. The biggest load is probably me accessing the GUI/graphs.
    3. The idle process uses most of the processor.
    4. I disabled all the mentioned services which are known to be a burden and still have the issue.
    5. I did a factory reset and still had the issue.
    6. I have 4GB of newish laptop ram. It is not fully utilized.
    7. There is no use and never has been any use of swap space.

    I did not have this issue when I originally ran the system on 2.3.x, so I'm beginning to think it could be due to the jump to 2.4.x. It could also be that I have a botched install which happened somewhere along the way. I'm pretty sure the factory reset simply restores a config file with all the defaults from a fresh install. It's not re-imaging the partition from a recovery partition. I realized this when I saw my custom WPAD files still in the /usr/local/www/ directory even after the factory reset. I deleted those files as well just to be sure they had no part in the problem, but this made me think, if those files were untouched, what if a potentially corrupted file was also untouched. I think the only thing that makes sense at this point is a fresh install. I'll keep you all posted.

    Thanks.



  • It will be interesting to see what a fresh install does.



  • It sounds like you have a bad Network Card, maybe not necessarily bad, but not a good supported driver.  HAVP and Squid will kill your network speeds if you have a bad or unsupported driver.



  • I noticed you said that Windows shows a 1Gb connection but what does the speed show as connected in pfSense?  Also, anything in the logs?  I've seen where it flaps so that every couple of seconds the link goes down for a couple of milliseconds and comes back causing issues like this.  Doubtful since it is a VM but just an idea.  Since it is a VM, how about just building a second VM and swapping over for a few minutes to test?



  • Thanks for the replies.

    scottdam, a bad NIC/driver could also be a possible reason. I will only know for sure if I do a fresh install. I may also have to try going back to a fresh install of 2.3.x if it is a driver issue with 2.4.0. I did disable all the packages such as squid, snort and pfblocker. That didn't help.

    Stewart, both the interfaces are 1 Gb. pfSense only shows the WAN as "Media 1000baseT <full-duplex,master>" under Status > Interfaces. It doesn't show that same line for the LAN, but I do know it's gigabit. Plus, if they weren't I wouldn't have gotten 120 Mbps when connecting a non-native PC to the network on the same exact cabling. What should I look for in the logs specifically? I don't see anything indicating a dropped connection on the system tab. Would it be there or elsewhere? Do I have to change the verbose mode of the logging to see it maybe? Right now it's set to the default.

    I'm not running a VM, I have it running on actual hardware.



  • Alright… so I spent several hours on this again last night while the office was quiet.

    Here is what I'm 100% sure of now... it is the pfsense box. How did I come to that conclusion? In addition to everything else, my last resort was to disconnect the WAN/LAN cable from pfsense and plug it into my old Netgear which it replaced. With the same exact network topology/IP's, I was getting the full 120 Mbps down. I plugged the pfsense box back in and was getting over 100 Mbps, but still not a solid and consistent 120 Mbps I should be getting.

    What did I do before plugging in the Netgear? I did a fresh install of 2.4.1. I was still not getting full down speeds. I then decided to do a fresh install of 2.3.5, but still no luck. I then swapped out the one NIC I was a little wary of, my USB 3.0 to GbE adapter. I plugged in a brand new one, but still no solution. I know... not the best NIC to be using, but I have no real choice on this box I'm running. Besides, that NIC was giving me my full download speed at one point, so I don't believe that is the issue.

    During all these trials mentioned above, I was using factory default settings with no additional packages installed. The only thing I did was configure the WAN and LAN IP's. The same IP's I've been using forever.

    I have one last idea which I will try hopefully tonight. I have hardware checksum offloading enabled. I'm almost certain my USB 3.0 NIC is not up to par for that feature, and according to the pfsense book that feature is broken in some NICs and will cause problems with corrupted packets and throughput. I'm suspecting I have both problems. So I'm gonna try to disable that, cross my fingers, and then reboot the box.



  • Generally it's a bad idea to use USB NICs; I've never seen anyone have luck with this crap. If it's impossible to install a PCIe Intel card but you have one embedded NIC, then use VLANs and a VLAN-capable switch; otherwise you will need a different hardware setup to make things work as desired.



  • Yea, the USB NIC is not intended to be used the way I'm using it on a firewall. I was hoping I could get away with it. I thought I did for a while, but maybe I was wrong. I'm not giving up on it just yet though. Call me stubborn, but I really want to be able to make use of this tiny PC that was collecting dust, especially since it has the same footprint as my old Netgear so it fits right in. The VLAN approach might be a good solution if it does turn out to be a bad NIC. That could also be a good excuse to justify purchasing a managed switch. I do have another PC lying around with actual PCIe slots. It has a MUCH bigger desktop footprint. If this becomes a real big issue, I may end up switching over to that.



  • After all this you're using USB NICs LULZ.  You can't compare a Netgear (with no USB NIC) to a pfSense with a USB NIC.  That's like tying one hand behind the back of the pfSense.

    The USB NIC is your problem.  Even IF it worked well on a prior version there's no way I'd put my life in the hands of a USB anything (besides a keyboard and mouse or my phone chargers LOL).

    You're getting 100 Mb/s so who cares about the 20 Mb/s…?  Sure it's annoying BUT are you ever hitting all 100 Mb/s being consumed on your network?  Is Internet "slow" because you're missing 20 Mb/s...?  It seems like you're missing 20% of your bandwidth but if you're only consuming say... 50 Mb/s you actually have 50% utilization you're not even using so you're not even missing the 20 Mb/s / 20%.

    I'd stop the madness and do something more productive like drink beer :P



  • The 20 Mbps is what raised the red flag. I'm not losing sleep over the 20 Mbps because like you said, I'm not even using close to full bandwidth. At this point I want to understand why I'm losing it, not because I actually need it. This issue has helped me learn a lot about pfsense (I'm a newbie). The education is well worth the cost of 20 (unused) Mbps and a few forum posts.

    By the way, disabling the hardware checksum offloading didn't help either. I know everyone on these forums hates USB NICs, but hatred alone wouldn't hold up in court. I'm still trying to understand how to definitively diagnose whether it is a NIC issue. There must be dropped/corrupted packets if that's the case. The attached packet graphs are not clear to me. Is the WAN inpass supposed to be close to the LAN outpass? The WAN to LAN would be my downstream. It looks like some packets are not making it out onto the LAN. For example, in the average, I have 983.73 pps coming into the WAN but only 915.54 pps making it out of the LAN. That's roughly 7% loss? Are there other factors such as packets not allowed due to filtering? Or would those go under the in/out block category and have nothing to do with it?
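
    The "roughly 7%" figure can be checked with a few lines. This is only a sketch of the arithmetic using the two averages quoted above; the function and variable names are illustrative.

```python
def drop_fraction(pps_in, pps_out):
    """Fraction of packets seen inbound that were not seen outbound."""
    return (pps_in - pps_out) / pps_in


wan_in_pass = 983.73   # average pps, WAN inpass (from the graph)
lan_out_pass = 915.54  # average pps, LAN outpass (from the graph)

print(f"{drop_fraction(wan_in_pass, lan_out_pass):.1%}")
```

    That works out to just under 7%. The gap is not necessarily loss on the NIC, though: packets addressed to the firewall itself (DNS replies, for example) may be counted on the WAN side without ever being forwarded to the LAN, so the two counters need not match exactly.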

    Thanks all for the help.




  • @raffi30:

    During all these trials mentioned above, I was using factory default settings with no additional packages installed. The only thing I did was configure the WAN and LAN IP's. The same IP's I've been using forever.

    Did you restore your settings as part of the factory default?  Or did you go into the UI and manually create default settings for this test?

    I am very paranoid because of issues I've had in the past doing pfSense upgrades (since 1.2.3).  I input my settings from scratch after each upgrade.  Why?  Paranoia, and I don't have issues after upgrades.

    So I'd be keen to understand if you did minimal settings manually to run the tests or upload your previous settings prior to testing.



  • Hi tim.mcmanus,
    Sorry for not being clear on my post. The settings were factory default because I did a fresh install of 2.4.1. All I did after the fresh install was configure WAN and LAN IP's. I then ran my test again and found no difference. So I then repeated the same process of fresh install with 2.3.5, configured IP's, and ran the test. Neither made any difference.

    After all the tests not making any difference, I decided I might as well upgrade to 2.4.1 again and restore all my settings. I haven't had any new issues with it. I did have to reconfigure my WPAD file (as expected) and also my snort disable.conf SID management file (unexpected).

    Raffi

