Low throughput on 2.x



  • Hello. I run a small network of pfSense boxes on very old PCs (mainly 2GHz P4s with fxp NICs). All 6 of these pfSense boxes have been kept on the latest software and are set up for basic firewall+router+PPTP features (no packages). In other words, I have gained some experience over the last 4 years, but only with basic features.

    A couple of weeks ago a relative of mine asked me to take over the maintenance of his SOHO router, which is pfSense running on an HP DL385 G5 server.
    When I took over, this box had pfSense 1.2.3 along with squid+squidguard+snort and a couple of minor packages.
    My immediate main concern was with how old that pfS version was and Heartbleed immediately crossed my mind.
    So last Wednesday I ran an upgrade to 2.1.2 using the SSH console.
    During the upgrade the box froze. I gave it more than 30 minutes and it just didn't finish the upgrade. A person on site told me that there was nothing on the console and that the CPU fans appeared to be at 100%. There was nothing we could do, so we restarted the server, and I was very much surprised to see that it booted into 2.1.2, keeping the previous configuration with the exception of squid and snort.
    I configured those packages and all seemed to be working fine.

    The next day I started getting complaints of a slow network. Some quick checking showed that internet access through pfSense is currently working at 3.5Mbps DL and 300kbps UL. When I disconnected the pfSense box and connected my laptop directly to the ISP modem, I got 7Mbps DL and 800kbps UL.
    Besides, they say they used to get 10-11MB/s data transfers between hosts (on different VLANs) and right now they're only getting 2-3MB/s.
    Looking at the logs I can't see anything wrong.
    MBUF usage was strangely high, so I applied some tweaks found here https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards because all 6 NICs on this box are either em or bce. MBUF usage is now much lower but network throughput is still low.
    Any ideas/tips?
    Thanks in advance. :)
    Cheers



  • When doing your speed tests, what processes show up consuming your CPU? Diagnostics->"System Activity"

    Copying the data and posting here could be helpful.


  • Netgate Administrator

    1.2.3 was not vulnerable to Heartbleed, too old.  ;) Many other vulnerabilities though.

    Check the CPU usage at the console with 'top -SH'.
    This is a full install running from a harddisk I assume?

    You have probably set up Snort differently, maybe it's running on all interfaces now and previously wasn't?

    11MBps between internal interfaces doesn't seem that great to me.

    Steve



  • Hi Steve. Thank you so much for your reply :)
    @stephenw10:

    1.2.3 was not vulnerable to Heartbleed, too old.  ;) Many other vulnerabilities though.

    I didn't know that, otherwise I would have given it a second thought.
    OTOH I'm sure heartbleed was not the only serious vulnerability to have emerged between 1.2.3 and 2.x.

    @stephenw10:

    Check the CPU usage at the console with 'top -SH'.
    This is a full install running from a harddisk I assume?

    Yes, this is a full install on a 15k SAS hdd.
    I honestly haven't run a top on the CLI but dashboard shows cpu usage 0 or very near 0 and load average is around 0.20 for a dualcore cpu.
    Will run a top and report here a bit later today.

    @stephenw10:

    You have probably set up Snort differently, maybe it's running on all interfaces now and previously wasn't?

    Could be some Snort difference, but I have had the Snort service stopped for some time and the problem is still visible. In fact I just checked and Snort is only running on WAN. As it's supposed to, right?

    @stephenw10:

    11MBps between internal interfaces doesn't seem that great to me.

    No, it isn't. But I'm afraid 11MB/s is the max they can get on that network.
    You see, they have two old managed switches. One for the server room and another for the office open-space. Both are Cisco 26 port - 24x 10/100 and 2x 10/100/1000.
    Since both switches only have 2 gigE ports, what the other guy did was connect opt2 to a VMware ESXi server which runs all (3) servers in the company and connect the pfS lan port to one of the server room switch gigE ports. Then, the other gigE port on that switch is connected to the office switch on a gigE port.
    This way they have gigE between servers and the router (aka pfSense) and gigE between the router and both switches.
    But then all (5) desktops and 1 laptop connect to the office switch on 10/100 ports. And since 100Mbps is roughly 12MB/s, I guess 11MB/s is quite an okay figure. Problem is, they're stuck at 2-3MB/s.
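
    To put a number on that estimate, here is a minimal sketch of the bits-to-bytes conversion. The function name is made up for the example, and it ignores Ethernet/IP/TCP framing overhead, which is one reason a real transfer on a 10/100 port tends to top out somewhat below the raw figure.

```python
def mbps_to_mbytes_per_s(mbps: float) -> float:
    """Convert a line rate in megabits per second to megabytes per second."""
    return mbps / 8.0


# Raw ceiling of a Fast Ethernet (10/100) port, before protocol overhead.
fast_ethernet = mbps_to_mbytes_per_s(100)
print(f"100Mbps is {fast_ethernet} MB/s raw")

# The 2-3MB/s the users currently see, as a fraction of that ceiling.
for observed in (2.0, 3.0):
    print(f"{observed} MB/s is {observed / fast_ethernet:.0%} of the raw ceiling")
```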



  • If you did an in-place upgrade, try backing up the config and re-install/re-apply config.
    Is there anything set up to limit bandwidth?



  • @podilarius:

    If you did an in-place upgrade, try backing up the config and re-install/re-apply config.

    I will most probably do that. In fact I had already thought about it earlier today.
    BTW, this server doesn't have any cd-rom reader. Should I download the memstick image and when it finishes booting up, hit 'I' to start the installer?

    @podilarius:

    Is there anything set up to limit bandwidth?

    No, not really. At first I considered running the traffic shaping wizard because this is only a 7Mbps/800kbps link. But something else came up and then I forgot to do that.
    So, if anything is set up to limit bandwidth, it must have been in the config when it was still running 1.2.3.
    And thanks for your reply :)



  • I had this issue with the traffic shaper enabled. I had to switch to the limiter. I have a 100Mbps connection and the shaper held it to only 50; the limiter doesn't have that problem. I have not tried it again since the 2.1 release. I didn't notice a change in the shaper code. You could also run through the config and see if a bug enabled a feature or something.



  • Great to know that I was doing something unexpected.


  • Netgate Administrator

    Checking the config file is a good call. It's hard to see what might be causing that sort of restriction.
    Bad switch? Bad cable? Check the error/collision count on the Status: Interfaces: page.

    Steve



  • @stephenw10:

    Checking the config file is a good call. It's hard to see what might be causing that sort of restriction.

    Yeah, I know. I've checked the config file but I don't know what I am looking for.
    For all I know, everything seems plausible in the xml file.

    @stephenw10:

    Bad switch?

    Checked. It's not the switch. They have a third switch on site (similar to the other 2 I've mentioned); they configured the trunk port for the VLANs and all the other ports as access ports for the required VLANs, and the problem is still there.

    @stephenw10:

    Bad cable?

    Checked. We had done that already. Sorry for not mentioning it.

    @stephenw10:

    Check the error/collision count on the Status: Interfaces: page.

    Checked. Status: Interfaces does not show any errors or collisions on any of the NICs. It's been like that ever since we last rebooted the box yesterday morning.

    Since all the above failed, I took a good look at https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards.
    This wiki entry mentions bce and em NICs.
    So I copied the wiki fixes for both NICs, added them to /boot/loader.conf.local, rebooted, and apparently the issue is fixed.
    I say apparently because the ADSL modem is synchronized at 7Mbps/800kbps and yet all my speedtests are giving me 5Mbps/500kbps without any other user connected to the router.
    Those still look like very low figures, but it is definitely an improvement.

    Besides that, they were also suffering from a high number of unavailable websites due to a problem in squid :o or should I say in the squid configuration?
    Whatever, I removed squid, squidGuard and snort and now everything seems to be working fine.

    Also, I am preparing a 2nd box that I will temporarily put in place while I format and reinstall pfSense 2.1.3 from the ground up on the original server.
    I will also start with a fresh and new config file. Just in case ;)
    Will let you know how it goes after finishing my tests.



  • Which Opteron CPU is in that DL365?


  • Netgate Administrator

    Ah, nice. You mentioned having looked at that page in your first post so I assumed you'd added the recommended tweaks already. Never assume anything!  ::)

    Steve



  • @podilarius:

    Which Opteron CPU is in that DL365?

    pfSense says it's an Opteron 2216 (dual-core)



  • That should be enough horsepower for a 7Mbps connection.  Are you still seeing only 5Mbps when all the extra services are disabled?


  • Netgate Administrator

    Easily enough power even with all the packages.
    It's rare that you can actually get near the sync speed; there is some overhead in DSL.
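
    As a rough illustration of that overhead: assuming the ADSL line carries traffic in ATM cells (53-byte cells with 48 bytes of payload, which is common for ADSL but is an assumption about this particular line), the sketch below estimates the payload rate left over from a given sync speed. The function name is made up for the example, and further per-packet overhead (AAL5 padding, PPPoE/PPPoA, IP and TCP headers) is ignored.

```python
ATM_CELL_BYTES = 53      # full ATM cell, header included
ATM_PAYLOAD_BYTES = 48   # payload bytes carried per cell


def atm_payload_rate(sync_kbps: float) -> float:
    """Estimate the payload rate (kbps) left after the ATM cell header overhead."""
    return sync_kbps * ATM_PAYLOAD_BYTES / ATM_CELL_BYTES


# Sync speeds reported in this thread: 7Mbps down, 800kbps up.
for label, sync in (("down", 7000), ("up", 800)):
    print(f"{label}: sync {sync}kbps, about {atm_payload_rate(sync):.0f}kbps of payload")
```

    Under these assumptions a 7000kbps sync leaves about 6340kbps of payload, so cell overhead alone would not explain a drop all the way to 5Mbps.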

    Steve



  • True, but I think it would be closer to 7, like 6.5 to 6.9. What is the CPU load when you are testing? Are you doing iperf testing with multiple streams?
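
    For anyone who hasn't used it: iperf measures throughput between a client and a machine running it in server mode (`iperf -s`), and `-P` asks the client for several parallel streams. Below is a minimal sketch that only builds the client command; the helper name is made up for the example and 192.0.2.10 is a placeholder address, not a host from this thread.

```python
import shlex
from typing import List


def iperf_client_cmd(server: str, streams: int = 4, seconds: int = 10) -> List[str]:
    """Build an iperf client command line using several parallel streams."""
    if streams < 1:
        raise ValueError("streams must be at least 1")
    # -c: run as client against this server, -P: parallel streams, -t: duration in seconds
    return ["iperf", "-c", server, "-P", str(streams), "-t", str(seconds)]


# Placeholder server address; replace with the host running "iperf -s".
cmd = iperf_client_cmd("192.0.2.10", streams=4, seconds=10)
print(shlex.join(cmd))
# To actually run the test, pass the list to subprocess.run(cmd, check=True).
```

    Comparing a run with `streams=1` against a run with `streams=4` shows whether a single connection is being held back.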



  • @podilarius:

    Are you doing iperf testing with multiple streams?

    ^ Sorry, don't know what that is :(
    CPU load is usually between 0.5 and 0.8.

    I've since installed 2.1.3 from the ground up with a new config file made from scratch.
    Internet speed remains the same, but inside the LAN (and the several VLANs) it now seems pretty much OK at 11-12MB/s.
    I have tested the ADSL circuit with my PC connected directly to the ADSL modem and it only gives me 5Mbps/500kbps.
    Thanks to everybody who contributed to this topic :)
    Cheers


  • Netgate Administrator

    If there is something throttling your bandwidth it may be doing it on a 'per connection' basis. Many traffic shaping tools work like that. Thus if you open up multiple simultaneous connections you often see an improvement in total throughput. The Speedtest.net web site, for example, loads a local JavaScript client that opens up to 4 connections to get the best bandwidth reading it can.
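
    A toy model of that effect, with made-up numbers: if each connection is capped but the link itself is not, the total grows with the number of connections until the link is full. The 2000kbps per-connection cap is purely hypothetical; only the 7000kbps link speed comes from this thread.

```python
def aggregate_throughput(link_kbps: float, per_conn_cap_kbps: float, connections: int) -> float:
    """Total throughput when every connection is capped individually."""
    if connections < 0:
        raise ValueError("connections must not be negative")
    # Each connection gets at most its cap; the sum can never exceed the link.
    return min(link_kbps, per_conn_cap_kbps * connections)


# 7000kbps link (from the thread) with a hypothetical 2000kbps cap per connection.
for n in (1, 2, 4):
    print(f"{n} connection(s): {aggregate_throughput(7000, 2000, n)}kbps")
```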

    Anyway you seem to be sorted though it's always disappointing to fix something without knowing what the problem was.  ;)

    Steve



  • @stephenw10:

    …though it's always disappointing to fix something without knowing what the problem was.  ;)

    Exactly. I hate fixing something without knowing what the problem was.
    It's like installing Windows again when all that was needed was to change a registry entry.
    But I guess it's better than nothing.
    Now I'm sure I'll have to open up a new thread for squid/squidGuard and then another one for snort.
    Thanks for the support, guys.  ;)
    Cheers

