No matter what I do, through pfSense I'm getting between 190-200Mb down, and between 400-600Mb up..



  • So actually I have this same issue, EXCEPT that I'm on a full 1Gig connection (symmetrical) in my Data Center.

    No matter what I do, through pfSense I'm getting between 190-200Mb down, and between 400-600Mb up..

    If I remove pfSense from the equation I get a solid 982Mb up/down (connecting my laptop directly to the ethernet line provided by the Data Center).

    Originally I thought this was due to Suricata/Snort, so I pulled them from the mix. While that did improve the speed, it was only by 1-2Mb (which actually makes me happy, since it means they weren't causing much of a hit). So I rolled everything back to basics (reverted all tunables, uninstalled all packages), and no matter what, my download is still roughly half of my upload via pfSense. Related to this, I would like to find out why I'm maxing out at 600Mb: I'm using the same processor as Netgate's XG-7100 1U server, with 8GB of RAM, and the CPU is hit hard during my tests (about 40-60% CPU load).


  • Netgate Administrator

    You have any traffic shaping in play?

    Any packages installed?

    Check Status > Interfaces: are they all synced at 1Gb? Any errors showing?

    40-60% cpu load could be two cores at 100%. Try running top -aSH at the CLI to see how that usage is broken down.
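To illustrate that point about aggregate load: on a 4-core box, two cores pinned at 100% read as only 50% overall, which is why the per-thread view from `top -aSH` matters. A quick sketch with hypothetical numbers:

```python
def aggregate_load(per_core_loads):
    """Average per-core utilization into the single figure most dashboards report."""
    return sum(per_core_loads) / len(per_core_loads)

# Two cores saturated, two idle: the dashboard reads 50%,
# even though the workload is already CPU-bound on those cores.
print(aggregate_load([100, 100, 0, 0]))   # 50.0

# A 40-60% aggregate reading on a 4-core CPU is consistent with
# roughly two cores doing nearly all the work.
print(aggregate_load([90, 80, 20, 10]))   # 50.0
```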

    Steve



  • @stephenw10 Nope.. I eliminated all the packages and removed all traffic shaping, so it's pretty much a basic pfSense box.

    All the interfaces are synced at 1Gb (actually Fixed Speed/Duplex since auto-negotiate sometimes causes issues).

    This is what things look like when idle (sending about 1-2Mb of data from some residual streams):

    last pid: 70132;  load averages:  0.11,  0.09,  0.08                                                                                                    up 0+11:42:20  13:24:17
    200 processes: 5 running, 156 sleeping, 39 waiting
    CPU:  0.1% user,  0.0% nice,  0.7% system,  1.0% interrupt, 98.2% idle
    Mem: 17M Active, 113M Inact, 338M Wired, 27M Buf, 7408M Free
    Swap: 16G Total, 16G Free
    
      PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
       11 root     155 ki31     0K    64K CPU2    2 690:14  99.02% [idle{idle: cpu2}]
       11 root     155 ki31     0K    64K RUN     3 688:31  98.55% [idle{idle: cpu3}]
       11 root     155 ki31     0K    64K CPU1    1 684:06  97.76% [idle{idle: cpu1}]
       11 root     155 ki31     0K    64K CPU0    0 673:14  95.98% [idle{idle: cpu0}]
       12 root     -92    -     0K   624K WAIT    0  18:11   2.99% [intr{irq261: igb1:que 0}]
       12 root     -92    -     0K   624K WAIT    1   6:43   1.11% [intr{irq257: igb0:que 1}]
    29579 root      20    0 10196K  5648K select  2   6:51   0.71% /usr/local/sbin/openvpn --config /var/etc/openvpn/server1.conf
    48075 root      20    0 10196K  5620K select  0   4:06   0.64% /usr/local/sbin/openvpn --config /var/etc/openvpn/server2.conf
       12 root     -60    -     0K   624K WAIT    1   2:49   0.39% [intr{swi4: clock (0)}]
       12 root     -92    -     0K   624K WAIT    0   2:11   0.28% [intr{irq256: igb0:que 0}]
    73588 root      52  -10  9588K  5404K select  2   1:02   0.13% /usr/local/sbin/tincd --config=/usr/local/etc/tinc
    64525 root      20    0  7812K  4204K CPU3    3   0:00   0.13% top -aSH
    48682 root      20    0  6600K  2380K bpf     3   0:59   0.12% /usr/local/sbin/filterlog -i pflog0 -p /var/run/filterlog.pid
       20 root     -16    -     0K    16K -       3   0:33   0.09% [rand_harvestq]
    63955 root      20    0  6404K  2560K select  2   0:33   0.07% /usr/sbin/syslogd -s -c -c -l /var/dhcpd/var/run/log -P /var/run/syslog.pid -f /etc/syslog.conf
       12 root     -92    -     0K   624K WAIT    2   1:08   0.07% [intr{irq263: igb1:que 2}]
    80585 root      20    0  6900K  2456K nanslp  1   0:07   0.05% [dpinger{dpinger}]
    81868 root      20    0  6900K  2456K nanslp  3   0:07   0.05% [dpinger{dpinger}]
       12 root     -92    -     0K   624K WAIT    1   0:24   0.03% [intr{irq262: igb1:que 1}]
       19 root     -16    -     0K    16K pftm    0   0:53   0.03% [pf purge]
    81494 root      20    0  6900K  2456K nanslp  1   0:07   0.03% [dpinger{dpinger}]
    81182 root      20    0  6900K  2456K nanslp  3   0:07   0.03% [dpinger{dpinger}]
    23476 root      20    0 12396K 12500K select  2   0:11   0.02% /usr/local/sbin/ntpd -g -c /var/etc/ntpd.conf -p /var/run/ntpd.pid{ntpd}
    56889 unbound   20    0 52768K 25640K kqread  3   0:04   0.02% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
       12 root     -92    -     0K   624K WAIT    3   0:47   0.02% [intr{irq259: igb0:que 3}]
       15 root     -68    -     0K    80K -       1   0:02   0.01% [usb{usbus0}]
    36417 root      20    0 12904K  8148K select  1   0:00   0.01% sshd: admin@pts/0 (sshd)
       12 root     -92    -     0K   624K WAIT    2   0:35   0.01% [intr{irq258: igb0:que 2}]
       12 root     -92    -     0K   624K WAIT    3   0:24   0.01% [intr{irq264: igb1:que 3}]
       12 root     -88    -     0K   624K WAIT    0   0:01   0.01% [intr{irq23: ehci0}]
       15 root     -68    -     0K    80K -       0   0:02   0.01% [usb{usbus0}]
       12 root     -72    -     0K   624K WAIT    3   0:02   0.01% [intr{swi1: netisr 0}]
    81868 root      20    0  6900K  2456K sbwait  1   0:02   0.01% [dpinger{dpinger}]
    80585 root      20    0  6900K  2456K sbwait  3   0:02   0.00% [dpinger{dpinger}]
    81182 root      20    0  6900K  2456K sbwait  3   0:01   0.00% [dpinger{dpinger}]
    81494 root      20    0  6900K  2456K sbwait  3   0:02   0.00% [dpinger{dpinger}]
    81868 root      20    0  6900K  2456K nanslp  3   0:02   0.00% [dpinger{dpinger}]
    81494 root      20    0  6900K  2456K nanslp  0   0:02   0.00% [dpinger{dpinger}]
       21 root     -16    -     0K    48K psleep  3   0:02   0.00% [pagedaemon{dom0}]
    56889 unbound   20    0 52768K 25640K kqread  1   0:03   0.00% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
       25 root      20    -     0K    32K sdflus  2   0:02   0.00% [bufdaemon{/ worker}]
    81182 root      20    0  6900K  2456K nanslp  2   0:02   0.00% [dpinger{dpinger}]
    80585 root      20    0  6900K  2456K nanslp  3   0:02   0.00% [dpinger{dpinger}]
        0 root     -16    -     0K   624K swapin  0   1:09   0.00% [kernel{swapper}]
    

    And this is how things look when I'm running a small speed test (usage goes from 1-2% to 20-30%; it climbs higher when I run data tests from multiple clients, but this capture is from a single client doing a speed test):

    last pid: 31858;  load averages:  0.22,  0.24,  0.16                                                                                                    up 0+11:48:39  13:30:36
    200 processes: 5 running, 156 sleeping, 39 waiting
    CPU:  1.7% user,  0.0% nice,  1.7% system,  9.1% interrupt, 87.5% idle
    Mem: 19M Active, 113M Inact, 338M Wired, 27M Buf, 7407M Free
    Swap: 16G Total, 16G Free
    
    Message from syslogd@MCEFW at Nov  7 13:26:57 ...
    MCEFW php-fpm[340]: /index.php: Successful login for user 'admin' from: 10.10.30.253 (Local Database)
    
      PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
       11 root     155 ki31     0K    64K CPU3    3 694:27  89.60% [idle{idle: cpu3}]
       11 root     155 ki31     0K    64K RUN     0 678:56  87.13% [idle{idle: cpu0}]
       11 root     155 ki31     0K    64K CPU2    2 696:16  86.87% [idle{idle: cpu2}]
       11 root     155 ki31     0K    64K CPU1    1 689:57  84.43% [idle{idle: cpu1}]
       12 root     -92    -     0K   624K WAIT    2   1:14  10.67% [intr{irq263: igb1:que 2}]
       12 root     -92    -     0K   624K WAIT    1   0:35  10.32% [intr{irq262: igb1:que 1}]
       12 root     -92    -     0K   624K WAIT    0  18:28   12.28% [intr{irq261: igb1:que 0}]
       12 root     -92    -     0K   624K WAIT    3   0:31   5.05% [intr{irq264: igb1:que 3}]
       12 root     -92    -     0K   624K WAIT    3   0:53   2.70% [intr{irq259: igb0:que 3}]
       12 root     -92    -     0K   624K WAIT    1   6:51   2.14% [intr{irq257: igb0:que 1}]
      340 root      20    0 89064K 35676K accept  2   0:11   1.48% php-fpm: pool nginx (php-fpm)
       12 root     -92    -     0K   624K WAIT    0   2:23   1.37% [intr{irq256: igb0:que 0}]
      341 root      52    0 88936K 34364K accept  3   0:07   1.25% php-fpm: pool nginx (php-fpm)
    29579 root      20    0 10196K  5648K select  3   6:54   1.16% /usr/local/sbin/openvpn --config /var/etc/openvpn/server1.conf
    11255 root      52    0 88936K 33928K accept  0   0:10   0.75% php-fpm: pool nginx (php-fpm)
    48075 root      20    0 10196K  5620K select  3   4:09   0.48% /usr/local/sbin/openvpn --config /var/etc/openvpn/server2.conf
       12 root     -60    -     0K   624K WAIT    2   2:50   0.36% [intr{swi4: clock (0)}]
       20 root     -16    -     0K    16K -       1   0:33   0.22% [rand_harvestq]
    48682 root      20    0  6600K  2380K bpf     2   1:00   0.21% /usr/local/sbin/filterlog -i pflog0 -p /var/run/filterlog.pid
    73588 root      52  -10  9588K  5404K select  1   1:03   0.17% /usr/local/sbin/tincd --config=/usr/local/etc/tinc
    63955 root      20    0  6404K  2560K select  3   0:33   0.16% /usr/sbin/syslogd -s -c -c -l /var/dhcpd/var/run/log -P /var/run/syslog.pid -f /etc/syslog.conf
       12 root     -92    -     0K   624K WAIT    2   0:37   0.14% [intr{irq258: igb0:que 2}]
    64525 root      20    0  7812K  4220K CPU0    0   0:01   0.13% top -aSH
    21883 root      20    0 23592K  9080K kqread  0   0:00   0.12% nginx: worker process (nginx)
    

    Still very puzzled as to why it's so slow.



  • @MrSassinak said in No matter what I do, through pfSense I'm getting between 190-200Mb down, and between 400-600Mb up..:

    Nope.. I eliminated all the packages and removed all traffic shaping.. So its pretty much a basic pfsense box.

    Is this a box that you can physically factory reset? You say above that you had those packages installed, but now they're removed. I've seen it mentioned here several times that settings and leftover pieces from some of these packages can remain in the system after removal.

    So, if you can, I would factory reset, then start throwing the tests back at it and see what you get.

    Jeff



  • @akuma1x Well, I realized that not everything was cleared out (for example, my firewall rules still had references to the traffic-shaping queues), so I factory reset it as well (after taking a backup, which I then restored) to make sure everything was back sans the stale data.

    And still having the same issue.

    Download is pretty much half of the upload (even though it's a symmetrical pipe), and neither is anywhere close to my SLA speed of 1G. And if I bypass pfSense and just use my Mac or a Windows laptop, I get full line speed.



  • @MrSassinak

    No, I mean completely factory reset it. Don't install a backup, just run it plain vanilla from scratch.

    Do the basic first-run setup wizard screens, then do your testing.

    Jeff



  • @akuma1x I reset everything back to factory (fresh software load.. all my data gone.. sniff) and still have the same problem.


  • Netgate Administrator

    @MrSassinak said in No matter what I do, through pfSense I'm getting between 190-200Mb down, and between 400-600Mb up..:

    All the interfaces are synced at 1Gb (actually Fixed Speed/Duplex since auto-negotiate sometimes causes issues).

    You should never have to set Gigabit Ethernet at fixed speed/duplex. Are you still doing that after the reset?
    Try putting a switch in between the modem and firewall and leave the interfaces at autoselect.

    Steve


  • LAYER 8 Global Moderator

    Yeah, if you're hard coding gig, you're doing it wrong!! And you have something wrong that should be looked at..



  • @stephenw10 I don't HAVE to set the speed/duplex.. I just do it to eliminate it as a variable. It's been running auto-negotiation forever (and it finds the speed correctly). But when there are speed issues, the first thing is always to reduce/eliminate as many variables as possible. (It's why we dumbed the installation down to pretty much factory spec.)


  • LAYER 8 Global Moderator

    Setting speed and duplex on copper gig is not a variable you should ever mess with.. It should always be left at auto.



  • @johnpoz While I do agree in principle, historically I've had occasional issues with Cisco and Broadcom core switches (not with THIS system, mind you, but with others, and eliminating autoneg got rid of those problems).

    But as I'm at the end of my rope here, I'm willing to try just about anything. (This problem has been going on for a while, but as our data needs ramp up, it's becoming more of an issue since we can't take advantage of our SLA speed.)



  • What happens many times when you try to force speed or duplex in Gigabit circuits is the other end of the conversation gets confused because Gigabit links want (just about demand, actually) to be set to auto-negotiate. If one end gets confused about speed or duplex (especially duplex), speeds will suffer tremendously.

    I know where you are coming from by using prior bad experiences with auto-negotiate and thus wanting to hard-code, but that is strongly discouraged in Gigabit land. You can sometimes get away with it in 10/100 setups, but even there these days most everything expects auto-negotiate on copper.
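A simplified model of why the mismatch described above hurts. This reflects the classic 10/100 parallel-detection behavior; on 1000BASE-T a forced port frequently fails to link at all, so treat the function below as an illustrative sketch rather than a description of any specific PHY:

```python
def resolve_link(end_a_forced_full, end_b_autoneg=True):
    """Toy model of the classic duplex-mismatch scenario.

    If one end is forced to full duplex and the other is left on
    auto-negotiation, the auto end can still parallel-detect the link
    speed, but with no negotiation partner it falls back to HALF duplex.
    The result is a link that comes up, then suffers late collisions
    and errors under load.
    """
    if not end_a_forced_full:
        return ("full", "full")   # both auto: negotiation agrees on full duplex
    if end_b_autoneg:
        return ("full", "half")   # classic mismatch: throughput tanks under load
    return ("full", "full")       # both ends forced identically: consistent

# Forced on one side only -> duplex mismatch.
print(resolve_link(end_a_forced_full=True))    # ('full', 'half')
# Auto on both sides -> clean full duplex.
print(resolve_link(end_a_forced_full=False))   # ('full', 'full')
```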


  • LAYER 8 Global Moderator

    It's not principle... it's part of the standard..

    Clause 40 (1000BASE-T) makes special use of Auto-Negotiation and requires additional MII registers. This use is summarized below. Details are provided in 40.5.

    Auto-Negotiation is mandatory for 1000BASE-T (see 40.5.1).
    1000BASE-T requires an ordered exchange of Next Page messages (see 40.5.1.2), or optionally an exchange of an Extended Next Page message
    1000BASE-T parameters are configured based on information provided by the exchange of Next Page messages.
    1000BASE-T uses MASTER and SLAVE to define PHY operations and to facilitate the timing of transmit and receive operations. Auto-Negotiation is used to provide information used to configure MASTER-SLAVE status (see 40.5.2).
    1000BASE-T transmits and receives Next Pages for exchange of information related to MASTER-SLAVE operation. The information is specified in MII registers 9 and 10 (see 32.5.2 and 40.5.1.1), which are required in addition to registers 0-8 as defined in 28.2.4.
    1000BASE-T adds new message codes to be transmitted during Auto-Negotiation (see 40.5.1.3).
    1000BASE-T adds 1000BASE-T full duplex and half duplex capabilities to the priority resolution table (see 28B.3) and MII Extended Status Register (see 22.2.2.4).
    1000BASE-T is defined as a valid value for “x” in 28.3.1 (e.g., link_status_1GigT.) 1GigT represents that the 1000BASE-T PMA is the signal source.
    

    If you're hard setting it, this for sure could cause you issues!

    If you have devices that have issues auto-negotiating gig, then you need to figure out why that is to get gig.. Not hard set it.



  • Folks, while I appreciate this is a hot topic, I think we are moving away from the central issue. Autoneg or not, the speed problem still exists (setting it fixed or autoneg has zero impact on the issue).

    As I mentioned earlier, I have had autoneg on for quite a long time, but as our data needs ramp up, it's becoming more of an issue since we can't take advantage of our SLA speed; hence the fixed setting was only in the interest of eliminating a variable. (I don't control the data center's upstream switch, but I have confirmed channel speed and optimum MTU size with them.)



  • @MrSassinak said in No matter what I do, through pfSense I'm getting between 190-200Mb down, and between 400-600Mb up..:

    Folks, while I appreciate this is a hot topic, I think we are moving away from the central issue. autoneg or not, the problem of speed still exists (setting it fixed or autoneg, has zero impact on the issue).

    As I mentioned earlier, I have had autoneg on for quite a long time but as our data needs are ramping up, its now becoming more of an issue since we can't take advantage our SLA speed hence in the interest of eliminating this as a variable (I don't control the data center upstream switch but I have confirmed with them channel speed, and optimum MTU size).

    You said in an earlier post you had it hard-coded. We assumed you still did.

    Actually, reading through again, you said two different things: in one post you said you had it hard-coded, then later you said it has been set to auto-negotiation forever. Which is it?


  • LAYER 8 Global Moderator

    You cannot actually troubleshoot a speed issue if you're hard coding gig.. You cannot, because now you have thrown in a known problem that could itself be the cause..

    You need to forget the old days.. The only reason you would hard code a gig interface is if you're forcing it to use a lower speed than gig.



  • Also, have you tried swapping NICs, if that is possible? Or swapping which NIC port is in use. It's not out of the realm of possibility that you have a physical hardware issue with a NIC or its port. Don't forget the obvious, such as the cables on the pfSense WAN and LAN connections.

    I once had a speed/connectivity issue caused by a slightly bent gold-finger contact inside an RJ45 port on a Dell motherboard. I was the second-level network support guy and my field premises tech was at the site. We had fiddled with the Cisco switch port, tried a different port, pulled the RJ45 jack from the wall and re-punched the terminations, and swapped the patch cables both in the wiring closet and at the PC. Nada. Then my field guy happened to be peering at the right spot on the rear of the PC and saw the bent pin inside the RJ45 port on the motherboard. Swapped the motherboard and problem solved.



  • @bmeeks Historically it has been set to auto-negotiation. Since the speed became a problem, in my effort to uncover the root cause I've made several changes/modifications/tests (in sequence):

    • Disabling traffic shaping
    • Uninstalling all packages
    • Rolling back all tunables
    • Setting speed/duplex to fixed (once this showed it had no impact, it was set back to auto; I think this is the part that confused people.. it was not left on, but rather set, tested, then reset back to auto)
    • And then reinstalling pfSense and restoring the previous configuration.

    Then per akuma1x, I reset it back to factory (ie: out of the box with ZERO changes other than what's stock from the installation).

    So right now, I have a clean, zero-configuration pfSense 2.4.4p3 install running on an Intel C3558 CPU with 8GB of RAM and a 128GB SSD (NICs are igb), and I am still stuck with 200Mb down and 400Mb up on a 1Gb connection.



  • @MrSassinak said in No matter what I do, through pfSense I'm getting between 190-200Mb down, and between 400-600Mb up..:

    @bmeeks Historically it has been set to autonegotiation. Since the speed has become a problem, in my effort to uncover what is the root cause, I've made several changes/modifications/tests (in sequence):

    • Disabling traffic shaping
    • Uninstalling all packages
    • Namely, rolling back all tunables
    • Setting Speed-Duplex to be fixed (this once it showed it had no impact, it was set back to be autospeed duplex. I think this is the part that confused people.. it was not left on, but it was set, tested, then reset back to auto)
    • And then reinstalling pfsense and then rolling back to previous configuration.

    Then per akuma1x, I reset it back to factory (ie: out of the box with ZERO changes other than what's stock from the installation).

    So right now, I have a clean zero configuration pfsense 2.4.4p3 install running on a Intel C3558 CPU with 8GB of RAM and 128GSSD. (nics are igb), and I am still stuck with 200Mb down and 400Mb up on a 1Gb Connection.

    Look over my post immediately above this one. Don't discount that you may have a hardware issue somewhere.



  • @bmeeks I did try that as well.. I can't swap the NICs themselves (they're part of the motherboard), but I did move from igb0 to igb1 (basically swapped the default LAN and WAN interfaces I used) just as a test (before the factory reset), as well as get the DC guys to test out their line to me..

    I'm pretty sure I'm going to need to bring some peace offerings of the liquid or edible sort next time, since I've been asking them to double check all their connections, speed, configuration, isolation, and even run a new line to me.



  • Do you have a way to run something like an iperf client/server setup so you can test each pfSense interface to a local host? That would let you see if the issue is within the pfSense box or outside.

    For example, test from a laptop or machine directly connected to the LAN to an iperf instance on pfSense. Then do the same on the WAN port (or even better, through pfSense itself from LAN to WAN or vice-versa).

    pfSense itself can most definitely do Gigabit without a sweat. So if iperf indicates the problem is within the pfSense setup, my bet is hardware because the software is known to be capable of gigabit transfers with ease.
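One way to make that LAN-versus-WAN comparison systematic is to run each leg with `iperf3 --json` and compare the measured bitrates against line rate. A rough sketch; the 1Gb line rate, 90% threshold, and the sample result dictionaries are assumptions, only the `end.sum_received.bits_per_second` field comes from iperf3's actual JSON output:

```python
import json

LINE_RATE_MBPS = 1000  # assumed SLA rate for the comparison

def bitrate_mbps(iperf_json: dict) -> float:
    """Pull the end-to-end received bitrate out of iperf3 --json output."""
    bps = iperf_json["end"]["sum_received"]["bits_per_second"]
    return bps / 1e6

def compare_legs(lan: dict, wan: dict) -> str:
    """Flag whichever leg falls well short of line rate."""
    lan_m, wan_m = bitrate_mbps(lan), bitrate_mbps(wan)
    verdict = []
    for name, mbps in (("LAN", lan_m), ("WAN", wan_m)):
        ok = mbps > 0.9 * LINE_RATE_MBPS
        verdict.append(f"{name}: {mbps:.0f} Mbit/s ({'ok' if ok else 'SUSPECT'})")
    return "; ".join(verdict)

# Hypothetical results shaped like the numbers in this thread:
lan_run = {"end": {"sum_received": {"bits_per_second": 935e6}}}
wan_run = {"end": {"sum_received": {"bits_per_second": 200e6}}}
print(compare_legs(lan_run, wan_run))
# LAN: 935 Mbit/s (ok); WAN: 200 Mbit/s (SUSPECT)
```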



  • @bmeeks Let me also add one other historical note..

    Previously, before moving to this hardware, we were running a D525 1.8GHz Atom with 4GB RAM and a 128GB SSD (I know.. way underpowered), but we were getting symmetrical 400Mb. (It used the Intel em driver, not the igb driver.) What prompted the change was ramping up the speed (and since we were rolling out VPN, we needed AES-NI for more efficient connections and speeds).

    At the time, I assumed the reason we could never crack 400Mb was the system (a D525 is fine for minor home use, but a heavily used 1Gb connection with TINC, IPsec, and OpenVPN connections plus several outbound streaming services was just not going to cut it as a pfSense box). But to date, I have yet to crack the 400Mb barrier with pfSense.

    I have not yet tried another system, mostly because I have a preference for pfSense (I've been using it commercially for a long time, since the 1.x days, and before that m0n0wall), so I'm most familiar with it.. and I find it faster than most Linux-derived systems (otherwise Untangle would be on my radar). I love Linux as a server (we have a lot), but for performance I find BSD-based systems to be better. I know my Cisco router can do the full 1G, as can a Ubiquiti EdgeRouter and Vyatta (I tested with all of them in this DC before moving our kit here, so I know the connection itself is good.. I just have to figure out why my beloved pfSense is not doing its job).



  • @MrSassinak said in No matter what I do, through pfSense I'm getting between 190-200Mb down, and between 400-600Mb up..:

    All the interfaces are synced at 1Gb (actually Fixed Speed/Duplex since auto-negotiate sometimes causes issues).

    If you do that, you have to do it at both ends, otherwise you will likely have problems. Whether fixed 1Gb or auto, you must have the same configuration at both ends of the cable. Generally speaking, you should use auto unless you have some specific need for fixed. One example would be fibre, where the SFP tends to be fixed.



  • @MrSassinak said in No matter what I do, through pfSense I'm getting between 190-200Mb down, and between 400-600Mb up..:

    @bmeeks Let me also add one other historical note..

    Previously before moving to this hardware set, we were running a D525 1.8Ghz Atoms with 4GB Ram and 128SSD (I know.. way underpowered) but we were getting symmetrical 400Mb. (they used the intel em driver, not the igb driver). What prompted the change was ramping up the speed (and since we were rolling out VPN, needed AES-NI for more efficient connections and speeds).

    At the time, I assumed the reason we could never crack above 400Mb was the system (D525's are fine for minor home use, but a heavily used 1Gb connection with TINC, IPsec, and OpenVPN connections and several outbound streaming services...that was just not going to cut it as a pfsense box). But to date, I have yet to crack the 400Mb barrier with pfsense.

    I have not yet tried another system (mostly because I have a preference for pfsense (been using it commercially for a long time (since the 1.x days) and before that m0n0wall) so I'm mostly familiar with it.. and I find it to be faster than most linux derived systems (otherwise untangle would be on my radar). Love linux as a server (we have a lot), but for performance, I find BSD based systems to be better. I know my Cisco router can do the full 1G, as well as Ubiquiti Edge Router as well as Vyatta. (I tested with all of them before in this DC before moving our kit here..so I know the speed itself is good.. just have to figure out why my beloved pfsense is not doing its job.)

    Try a test using iperf to check out the interfaces. There are many folks using pfSense for Gigabit symmetrical connections. In fact, during testing of Snort with Inline IPS mode on pfSense-2.5 the Netgate tester recorded 1.8 Gigabits/sec of sustained throughput. Without Snort running, that rose to 3.2 Gigabits/sec. So the software is certainly capable. That was three interfaces running on an SG-5100.

    Of course there could be an issue with a particular driver on the FreeBSD side. And since pfSense is based on FreeBSD any driver problems would show up. Don't know for sure about any identified igb issues, but then I don't keep up with that area.



  • @bmeeks I completely agree.. I did a lot of research before jumping into the forums, and these days a 1G internet connection is passé. It's why I'm very puzzled.. (and why we tried a number of system tunes and other changes). I know the software is capable (based on many accounts), and I believe the hardware should be fine (since it's basically the same hardware Netgate sells for 10Gb connections).

    If I use a public iperf server (from an internal client, through pfSense, to a public iperf server; thankfully HE has one in the same DC I'm in), I get:


    Client connecting to iperf.he.net, TCP port 5201
    TCP window size: 325 KByte (default)

    [ 3] local 10.10.10.160 port 55814 connected with 216.218.227.10 port 5201
    [ ID] Interval Transfer Bandwidth
    [ 3] 0.0- 0.0 sec 109 KBytes 287 Mbits/sec

    But if I do an internal iperf (from an internal client to pfSense on the LAN side):

    Client connecting to 10.10.10.254, TCP port 5201
    TCP window size: 85.0 KByte (default)

    [ 3] local 10.10.10.160 port 48020 connected with 10.10.10.254 port 5201
    [ ID] Interval Transfer Bandwidth
    [ 3] 0.0-10.0 sec 735 MBytes 617 Mbits/sec
    [ 4] 0.00-8.03 sec 886 MBytes 926 Mbits/sec



  • @MrSassinak said in No matter what I do, through pfSense I'm getting between 190-200Mb down, and between 400-600Mb up..:

    @bmeeks I completely agree.. I did a lot of research before jumping in the forums and these days and 1G internet connection is passé. Its why I'm very puzzled.. (and why we tried a number of system tunes and other changes because I know the software is capable (based on many accounts) and I believe the hardware should be fine (since its basically the same hardware Netgate is selling for 10gb connections)

    If I use a public iperf server (from internal client through pfsense to public iperf server (thankfully HE has one in the same DC I'm in), I get:


    Client connecting to iperf.he.net, TCP port 5201
    TCP window size: 325 KByte (default)

    [ 3] local 10.10.10.160 port 55814 connected with 216.218.227.10 port 5201
    [ ID] Interval Transfer Bandwidth
    [ 3] 0.0- 0.0 sec 109 KBytes 287 Mbits/sec

    But if I do an internal iperf (from internal client to pfsense on the LAN side):

    Client connecting to 10.10.10.254, TCP port 5201
    TCP window size: 85.0 KByte (default)

    [ 3] local 10.10.10.160 port 48020 connected with 10.10.10.254 port 5201
    [ ID] Interval Transfer Bandwidth
    [ 3] 0.0-10.0 sec 735 MBytes 617 Mbits/sec
    [ 4] 0.00-8.03 sec 886 MBytes 926 Mbits/sec

    Is there a way for you to put something on the WAN side (another machine perhaps connected to the same switch) and do an iperf test through pfSense from LAN to WAN? The key is to be able to reliably eliminate any dependency on an external host. To really test the pfSense hardware and software you need an iperf endpoint directly on the WAN network and the other endpoint directly on the LAN network.

    With that HE test, I suspect there are other hosts, routers or connections between you and them even if in the same DC. Just trying to make sure you test only pfSense itself. That's the only way to narrow down the problem.

    Do you perhaps have another Intel server NIC that uses say the em driver that you could substitute for testing? It's certainly not impossible for there to be a software issue in the NIC driver for igb. However, if true and widespread I would expect to see a lot of posts here complaining. The igb driver is used by some of the NICs in Netgate's SG-5100.



  • Don't know if anything in this thread directly applies to you, and the hardware is likely some different, but here is a post from last year about igb throughput problems: https://forum.netgate.com/topic/133704/poor-performance-on-igb-driver/43.

    And are the NIC ports on your board genuine Intel chips or are they a clone from another vendor? If not genuine Intel, that may be part of the equation to consider. I found some instances on a Google search where the HP clone of say the Intel i350 didn't perform as well with the igb driver as the native Intel card did in the same box.


  • LAYER 8 Global Moderator

    Yeah you really need to just test pfsense here, and take the internet out of the equation completely..

    Test 1
    boxA --- switch --- boxB

    Can they do gig without pfsense between..

    example

    $ iperf3.exe -c 192.168.9.10
    warning: Ignoring nonsense TCP MSS 0
    Connecting to host 192.168.9.10, port 5201
    [  5] local 192.168.9.101 port 62734 connected to 192.168.9.10 port 5201
    [ ID] Interval           Transfer     Bitrate
    [  5]   0.00-1.00   sec   110 MBytes   922 Mbits/sec
    [  5]   1.00-2.00   sec   113 MBytes   952 Mbits/sec
    [  5]   2.00-3.00   sec   113 MBytes   949 Mbits/sec
    [  5]   3.00-4.00   sec   115 MBytes   966 Mbits/sec
    [  5]   4.00-5.00   sec   113 MBytes   950 Mbits/sec
    [  5]   5.00-6.00   sec   113 MBytes   949 Mbits/sec
    [  5]   6.00-7.00   sec   113 MBytes   950 Mbits/sec
    [  5]   7.00-8.00   sec   113 MBytes   950 Mbits/sec
    [  5]   8.00-9.00   sec   113 MBytes   948 Mbits/sec
    [  5]   9.00-10.00  sec   113 MBytes   950 Mbits/sec
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bitrate
    [  5]   0.00-10.00  sec  1.10 GBytes   949 Mbits/sec                  sender
    [  5]   0.00-10.03  sec  1.10 GBytes   945 Mbits/sec                  receiver
    
    iperf Done.
    

    Now do test with pfsense

    boxA --- wan pfsense lan --- boxB

    What do you get?



  • @johnpoz Yup.. I can do full Gig without pfsense.

    root@coreplex:~# iperf3 -p 5201 -c 10.10.10.141
    Connecting to host 10.10.10.141, port 5201
    [ 4] local 10.10.10.160 port 44144 connected to 10.10.10.141 port 5201
    [ ID] Interval Transfer Bandwidth
    [ 4] 0.00-1.00 sec 112 MBytes 936 Mbits/sec
    [ 4] 1.00-2.00 sec 111 MBytes 935 Mbits/sec
    [ 4] 2.00-3.00 sec 111 MBytes 935 Mbits/sec

    And on the LAN side, from pfSense to the client, I can do full Gig as well:

    Connecting to host 10.10.10.141, port 5201
    [ 5] local 10.10.10.254 port 27396 connected to 10.10.10.141 port 5201
    [ ID] Interval Transfer Bitrate
    [ 5] 0.00-1.00 sec 111 MBytes 927 Mbits/sec
    [ 5] 1.00-2.00 sec 109 MBytes 911 Mbits/sec
    [ 5] 3.00-4.00 sec 111 MBytes 935 Mbits/sec
    [ 5] 4.00-5.00 sec 111 MBytes 930 Mbits/sec
    [ 5] 5.00-6.00 sec 111 MBytes 931 Mbits/sec
    [ 5] 6.00-7.00 sec 110 MBytes 924 Mbits/sec

    Folks, I have run through a lot of these tests before. I know the box is capable of full Gig (I ran these tests with just Linux on the box prior to installing pfSense on it). I know the switch (on the LAN side) is capable of a full 10Gb (yes, 10Gb, not a typo), and I know the connection itself is capable of a full 1Gb (if I remove pfSense and run the line directly to a Mac or Windows laptop, we get the full 1Gb). So the main culprit is pfSense.



  • @MrSassinak said in No matter what I do, through pfSense I'm getting between 190-200Mb down, and between 400-600Mb up..:

    @johnpoz Yup.. I can do full Gig without pfsense.

    root@coreplex:~# iperf3 -p 5201 -c 10.10.10.141
    Connecting to host 10.10.10.141, port 5201
    [ 4] local 10.10.10.160 port 44144 connected to 10.10.10.141 port 5201
    [ ID] Interval Transfer Bandwidth
    [ 4] 0.00-1.00 sec 112 MBytes 936 Mbits/sec
    [ 4] 1.00-2.00 sec 111 MBytes 935 Mbits/sec
    [ 4] 2.00-3.00 sec 111 MBytes 935 Mbits/sec

    And on the Lan side from pfsense to the client, I can do full Gig as well

    Connecting to host 10.10.10.141, port 5201
    [ 5] local 10.10.10.254 port 27396 connected to 10.10.10.141 port 5201
    [ ID] Interval Transfer Bitrate
    [ 5] 0.00-1.00 sec 111 MBytes 927 Mbits/sec
    [ 5] 1.00-2.00 sec 109 MBytes 911 Mbits/sec
    [ 5] 3.00-4.00 sec 111 MBytes 935 Mbits/sec
    [ 5] 4.00-5.00 sec 111 MBytes 930 Mbits/sec
    [ 5] 5.00-6.00 sec 111 MBytes 931 Mbits/sec
    [ 5] 6.00-7.00 sec 110 MBytes 924 Mbits/sec

    Folks, I have run through a lot of these tests before. I know the box is capable of full Gig (I ran these tests with just Linux on the box prior to installing pfSense on it). I know the switch (on the LAN side) is capable of a full 10Gb (yes, 10Gb, not a typo), and I know the connection itself is capable of a full 1Gb (if I remove pfSense and run the line directly to a Mac or Windows laptop, we get the full 1Gb). So the main culprit is pfSense.

    So help us understand the path a bit better. Which IP is which device in the above output? I take it your second test was this way --

    BoxA --> pfSenseLAN port

    and you had an iperf agent on pfSense and BoxA. Is that correct?

    If the above is true, can you do the same thing on the pfSense WAN port? This will nail down whether the issue is something on, say, the WAN port, or whether there truly is a routing bottleneck within the pfSense/FreeBSD network routing code.

    I don't see how we've checked the actual WAN port in the box yet based on the data posted above -- unless I'm missing some detail. You tested the LAN port and it looks fully capable, but what about the WAN port? Can you simply unplug the WAN cable from the firewall and connect your laptop directly to the WAN port with a crossover cable? Give your laptop an IP in the WAN subnet and then run iperf again.

    And just for grins, try disabling the pf firewall engine. That will turn pfSense into essentially a dumb router by taking all of the firewall code out of the path. The setting is under SYSTEM > ADVANCED on the Firewall & NAT tab; it's a checkbox.
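    If you have shell access, the same toggle can be done temporarily from the CLI with the standard pf control utility (a quick sketch; remember to re-enable pf afterwards, and note that disabling pf also disables NAT):

    ```sh
    # Disable the pf firewall engine for the duration of the throughput test
    pfctl -d

    # ...run the iperf test here...

    # Re-enable pf when finished
    pfctl -e

    # Show pf status and state-table counters
    pfctl -s info
    ```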


  • LAYER 8 Global Moderator

    @bmeeks said in No matter what I do, through pfSense I'm getting between 190-200Mb down, and between 400-600Mb up..:

    with a crossover cable?

    Don't need a crossover; gig interfaces all support auto MDI-X.

    Just do the test all the way through pfSense, please, doing NAT and firewalling..



  • @bmeeks said in No matter what I do, through pfSense I'm getting between 190-200Mb down, and between 400-600Mb up..:

    crossover cable?

    Crossover cables are not required with Gb NICs. With 10 & 100 Mb, two pairs were used, one in each direction. With Gb, all 4 pairs are used for both directions.



  • @johnpoz If you look at my earlier comment you will see the result of that.

    From internal client through pfsense (ie: through NAT and firewall) I get this.
    And iperf.he.net is a resource within the same data center as me (on a 10Gb connection)

    Client connecting to iperf.he.net, TCP port 5201
    TCP window size: 325 KByte (default)
    [ 3] local 10.10.10.160 port 55814 connected with 216.218.227.10 port 5201
    [ ID] Interval Transfer Bandwidth
    [ 3] 0.0- 0.0 sec 109 KBytes 287 Mbits/sec

    10.10.10.160 is an internal client (a physical box connected to the 10Gb switch; trying to avoid VMs for these tests)

    .160 ---> 10GB Switch ---> pfSense Lan Port <PFSENSE BOX> pfSense WAN port ---> Data Center CORE Switch -> iperf.he.net



  • @johnpoz Don't need a crossover cable, nor should one be needed. One can reach native speeds with all the requisite hardware when pfSense is not in the path as the FW.



  • @bmeeks I can do the last test when I get back to the DC, but I think I mentioned earlier that if I plug directly into the DC line (no pfSense, just give my laptop a public IP) and start testing, I get native speeds and NEVER have a slowdown. It's only when traffic is going through pfSense that I get slow speeds.

    Before even installing pfSense, I also ran similar tests with Linux on the same hardware (had to make sure it's sound before installing what will be OUR "core" router/FW/VPN server). So I KNOW the hardware is good, at least under Linux: Ubuntu Server 14.04, no tweaks, just patched up to what was current then; I installed iperf (and iperf3) and ran the tests as a burn-in and performance validator.



  • @MrSassinak said in No matter what I do, through pfSense I'm getting between 190-200Mb down, and between 400-600Mb up..:

    @bmeeks I can do the last test when I get back to the DC, but I think I mentioned earlier that if I plug directly into the DC line (no pfSense, just give my laptop a public IP) and start testing, I get native speeds and NEVER have a slowdown. It's only when traffic is going through pfSense that I get slow speeds.

    Before even installing pfSense, I also ran similar tests with Linux on the same hardware (had to make sure it's sound before installing what will be OUR "core" router/FW/VPN server). So I KNOW the hardware is good, at least under Linux: Ubuntu Server 14.04, no tweaks, just patched up to what was current then; I installed iperf (and iperf3) and ran the tests as a burn-in and performance validator.

    The only thing I've not seen proved here thus far, to my satisfaction, is whether the actual physical WAN port on the pfSense box has been verified to pass gigabit traffic. I don't doubt that you are not getting gigabit through the box; we are trying to figure out why. Is it because pfSense itself just can't do it (unlikely, but never impossible)? Or is it because there is some weird issue with just the physical WAN port? I think you said earlier you had swapped physical ports around, but depending on how you are testing through pfSense you might still have had that "bad" port in the mix. We can't see your entire configuration, so we have to make assumptions about some things. For instance, if you have just two ports on the box and you swapped LAN and WAN around, that won't help, as it would just swap the throughput problem from one port to the other. On the other hand, if you have four ports in the box, are only using, say, two of them, and you swap LAN and/or WAN to the other ports, that is a more valid test. Of course, there could still be some backplane issue that the ports share. We wouldn't know that without digging into the details of the motherboard.

    What we are all trying to say is that if you can connect two laptops or two other physical boxes directly to the pfSense NIC ports (one on the LAN and the other on the WAN) and then run an iperf between those two connected machines, that will test just pfSense with no other variables. If that gives the same poor throughput, I would next try with the pf firewall disabled. The last desperate test would be to boot the firewall box from a Linux distro live CD and do the iperf test that way. If suddenly Linux has poor throughput as well, then something has gone weird in the hardware. On the other hand, if Linux works like a champ in the same connection scenario, then that definitely points the finger at pfSense or FreeBSD (more likely FreeBSD, as the core network code is not altered all that much in pfSense).

    And lastly, I am assuming we are still using a plain-vanilla pfSense install with an empty firewall rule set and no imported previous configuration. Just install pfSense, select interfaces, set IP addresses, and then test.



  • @bmeeks Maybe I'm not explaining things clearly, so let me try it this way: my old friends, succinct bullet points and diagrams.

    Current Hardware:
    Supermicro RS-A2SDI-4C4L-FIO Intel Atom C3558 Networking Front I/O 1U Rackmount w/ 4X GbE LAN
    8GB RAM
    128GB SSD

    Services Normally Running on pfSense:
    OpenVPN Server (4 Site-to-Site Client Connections to Japan, Taiwan, HK, and California) - Normally chews up about 2Mb on the pipe (will spike depending on what actions are taking place)
    TINC (Mesh Network) for connecting VPC's (10 of these) - Normally chews up 1Mb on the pipe (will spike depending on what actions are taking place)
    DNS Services - uses about 30Kb average
    NTP (final time source for company) - uses about 5Kb average.

    I have a separate dedicated pfSense instance that handles client-level VPN (running in a VM in our ESX cluster; the performance there is terrible as well, but 200Mb is adequate for client communication)

    Previous tuning was based on the 1Gb tuning from here: https://calomel.org/freebsd_network_tuning.html. (had no impact on performance or stability).

    This is the "typical" configuration that I have running all the time:

    Servers <---> 10GB Cisco LAN SWITCH <---> igb0 [pfSense box] igb1 <---> Data Center Core Switch

    Variants/Tests that I have tried (after each test, then reverted back to the above configuration):

    [pfSense box but running linux] igb1 <---> Data Center Core Switch <---> Test Server (in the DC but not internet connected)
    Purpose: To confirm the hardware is solid. (Did a 24-hour burn-in test with this first, before installing pfSense.)

    Servers <---> 10GB Cisco LAN SWITCH <---> igb1 [pfSense box] igb0 <---> Data Center Core Switch
    Purpose: To see if the slow performance moves from WAN to LAN, which it did not. LAN was still solid at 1Gb speeds, and WAN still had the same performance (roughly 200Mb down and 400Mb up)

    Servers <---> 1GB Dell LAN SWITCH <---> igb1 [pfSense box] igb0 <---> Data Center Core Switch
    Purpose: To see if the slow performance moves from WAN to LAN, which it did not. LAN was still solid at 1Gb speeds, and WAN still had the same performance (roughly 200Mb down and 400Mb up)

    Single Server (Xeon E5) <---> igb1 [pfSense box] igb0 (fixed speed and duplex) <---> Data Center Core Switch (same fixed speed and duplex)
    Purpose: To rule out autoneg as a variable.

    Single Server (Xeon E5) <---> igb1 [pfSense box] igb0 <---> Data Center Core Switch
    Purpose: To streamline the connection and rule out switch communication as a variable.

    Servers <---> 10GB Cisco LAN SWITCH <---> igb2 [pfSense box] igb3 <---> Data Center Core Switch
    Purpose: To rule out the ports as a variable. LAN was still solid at 1Gb speeds, and WAN still had the same performance (roughly 200Mb down and 400Mb up)

    Servers <---> igb0 [pfSense box] igb1 <---> Data Center Core Switch
    Purpose: To remove the switch from the equation to confirm no misconfiguration there.

    MacBook <---> 10GB Cisco LAN SWITCH <---> Data Center Core Switch
    Purpose: To remove pfSense and confirm 1Gb speeds are possible

    MacBook <---> Data Center Core Switch
    Purpose: To remove pfSense and confirm 1Gb speeds are possible

    Windows Laptop <---> 10GB Cisco LAN SWITCH <---> Data Center Core Switch
    Purpose: To remove pfSense and confirm 1Gb speeds are possible (confirming the OS/hardware result was not a fluke)

    Servers <---> 10GB Cisco LAN SWITCH <---> igb0 [pfSense box, all packages and features rolled back to baseline] igb1 <---> Data Center Core Switch
    Purpose: To confirm it's not a misconfiguration of pfSense.

    Servers <---> 10GB Cisco LAN SWITCH <---> igb0 [pfSense box, reset to factory (backup restored)] igb1 <---> Data Center Core Switch
    Purpose: To confirm it's not a misconfiguration of pfSense.

    Servers <---> 10GB Cisco LAN SWITCH <---> igb0 [pfSense box, reset to factory] igb1 <---> Data Center Core Switch
    Purpose: To confirm it's not a misconfiguration of pfSense.

    Servers <---> igb0 [pfSense box, reset to factory] igb1 <---> Data Center Core Switch
    Purpose: To remove the switch from the equation to confirm no misconfiguration there and confirm ports are still not a problem.

    Servers <---> igb0 [backup pfSense box, reset to factory] igb1 <---> Data Center Core Switch
    Purpose: To remove the switch from the equation to confirm no misconfiguration there and confirm ports are still not a problem.

    Single Server (Xeon E5) <---> igb1 [pfSense box, reset to factory] igb0 <---> Data Center Core Switch
    Purpose: To streamline the connection and rule out switch communication as a variable.

    The only test I have not done is with the firewall disabled in pfSense. I will do that test this weekend (at a client site right now) and see if it yields some results.

    I hope this clears up things.


  • Netgate Administrator

    I would try putting a simple unmanaged switch between the pfSense WAN and the datacentre core switch.

    Whilst you were able to see 1Gbps from the WAN to the Coreswitch with Linux running on the hardware the FreeBSD igb driver might be doing something odd, or at least different. We do see that sort of thing occasionally, though usually with crappy SOHO routers.

    Steve



  • @MrSassinak
    Thanks for the details. I will need to read through them carefully and digest what is provided. About to leave for an extended weekend at a college football game, so won't have a chance to get back on this until probably Monday.


Log in to reply