pfSense seems hard limited at 100Mbps, very slow to respond

themrrobert

Hello all,

I have two pfSense boxes; here is the setup.
#1) "gig"
16 logical cores
32 GB RAM
Intel Xeon L5520
CPU frequency: 2261 MHz
Broadcom NetXtreme II
Not sure of the brand
wan0 - gigabit internet
link1 - link to "minor", gigabit link (gigalink)
lan2 - data miners. This is connected via a 100 Mb connection, so it's definitely a bottleneck; the miners could easily exceed 100 Mb. The problem is that even though I have a gigabit connection, traffic that comes in on link1 ("gigalink") still comes back really slowly: very high ping, disconnected sessions, etc.
Rules:
WAN: deny from all
link1: allow all
lan2: allow all
Average bandwidth usage: 95 Mbps down, <1 Mbps up

When pushing 100 Mbps, this server uses 50% CPU (averaged over 10 seconds). Watching top, system usage rises and falls in broad, sine-shaped waves, peaking at 100% (0% idle) and then dropping to 2% (98% idle).

#2) "minor"
Intel Xeon L5520
CPU frequency: 2268 MHz
16 logical cores
32 GB RAM
Intel 82801JI x 6
wan0 - 500 Mb
link1 - link to "gig", the other end of the 1000Tx-FD link (gigalink)
lan2 - medium-sized LAN, using about a /22
dmz3 - websites doing lots of work
Total average bandwidth usage: 45 Mbps down, 20 Mbps up

This system has a complex set of firewall rules that, while optimized, are certainly more complex than "gig"'s. It has about 8 rules allowing inbound WAN traffic, limiters to ensure voice priority, and more.

This system pushes 40-50 Mbps and uses only 9% CPU.
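For a rough sense of the gap, the figures above can be turned into a CPU cost per Mbps. This is a back-of-the-envelope sketch that assumes CPU load scales linearly with throughput and takes 45 Mbps (the midpoint of 40-50 Mbps) for "minor"; the names used are illustrative only.

```python
# Rough CPU cost per Mbps of forwarded traffic on each box, using the
# figures quoted in the post. Assumes CPU load scales linearly with
# throughput, which is a simplification.
boxes = {
    "gig": {"cpu_percent": 50.0, "mbps": 100.0},
    "minor": {"cpu_percent": 9.0, "mbps": 45.0},  # midpoint of 40-50 Mbps
}


def cpu_per_mbps(cpu_percent: float, mbps: float) -> float:
    """Return the CPU percentage consumed per Mbps of traffic."""
    return cpu_percent / mbps


for name, stats in boxes.items():
    print(f"{name}: {cpu_per_mbps(**stats):.2f}% CPU per Mbps")

ratio = cpu_per_mbps(**boxes["gig"]) / cpu_per_mbps(**boxes["minor"])
print(f"gig uses about {ratio:.1f}x more CPU per Mbps than minor")
```

By that crude measure, "gig" spends roughly 2.5 times as much CPU per Mbps as "minor", despite having far simpler rules.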

I can't believe this is correct. What could be causing it?

There's plenty of headroom throughout "minor"'s networks, so it looks like the bottleneck is the CPU on the "gig" router.

Here is a sample ping from one of the networks (192.168.192.1 is the link1 interface on "gig", and 192.168.192.2 is the address of the router "minor"), so you can clearly see the source of this ridiculous lag:

PING 192.168.192.1 (192.168.192.1) 56(84) bytes of data.
64 bytes from 192.168.192.1: icmp_req=1 ttl=63 time=15022 ms
64 bytes from 192.168.192.1: icmp_req=2 ttl=63 time=14016 ms
64 bytes from 192.168.192.1: icmp_req=3 ttl=63 time=13008 ms
64 bytes from 192.168.192.1: icmp_req=4 ttl=63 time=12000 ms
64 bytes from 192.168.192.1: icmp_req=5 ttl=63 time=10992 ms
64 bytes from 192.168.192.1: icmp_req=6 ttl=63 time=9984 ms
64 bytes from 192.168.192.1: icmp_req=7 ttl=63 time=8976 ms
64 bytes from 192.168.192.1: icmp_req=8 ttl=63 time=7968 ms
64 bytes from 192.168.192.1: icmp_req=9 ttl=63 time=6960 ms
64 bytes from 192.168.192.1: icmp_req=10 ttl=63 time=5952 ms
64 bytes from 192.168.192.1: icmp_req=11 ttl=63 time=4944 ms
64 bytes from 192.168.192.1: icmp_req=12 ttl=63 time=3936 ms
64 bytes from 192.168.192.1: icmp_req=13 ttl=63 time=2928 ms
64 bytes from 192.168.192.1: icmp_req=14 ttl=63 time=1920 ms
64 bytes from 192.168.192.1: icmp_req=15 ttl=63 time=912 ms
64 bytes from 192.168.192.1: icmp_req=16 ttl=63 time=0.197 ms
^C
--- 192.168.192.1 ping statistics ---
21 packets transmitted, 16 received, 23% packet loss, time 20108ms
rtt min/avg/max/mdev = 0.197/7470.258/15022.992/4636.801 ms, pipe 15
robertmoss@robert-desktop ~ $ ping 192.168.192.2
PING 192.168.192.2 (192.168.192.2) 56(84) bytes of data.
64 bytes from 192.168.192.2: icmp_req=1 ttl=64 time=0.174 ms
64 bytes from 192.168.192.2: icmp_req=2 ttl=64 time=0.174 ms
64 bytes from 192.168.192.2: icmp_req=3 ttl=64 time=0.153 ms
64 bytes from 192.168.192.2: icmp_req=4 ttl=64 time=0.173 ms
64 bytes from 192.168.192.2: icmp_req=5 ttl=64 time=0.176 ms
64 bytes from 192.168.192.2: icmp_req=6 ttl=64 time=0.188 ms
64 bytes from 192.168.192.2: icmp_req=7 ttl=64 time=0.166 ms
64 bytes from 192.168.192.2: icmp_req=8 ttl=64 time=0.189 ms
^C
--- 192.168.192.2 ping statistics ---
8 packets transmitted, 8 received, 0% packet loss, time 6997ms
rtt min/avg/max/mdev = 0.153/0.174/0.189/0.012 ms
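The RTTs in the first ping drop by roughly one second per request, which looks more like replies being held and then released together than like a slow link. A small sketch, assuming ping's default 1-second send interval and ignoring the 5 requests that were never answered, reconstructs when each reply actually arrived:

```python
# RTTs (ms) of the 16 answered requests to 192.168.192.1, in order.
rtts_ms = [15022, 14016, 13008, 12000, 10992, 9984, 8976, 7968,
           6960, 5952, 4944, 3936, 2928, 1920, 912, 0.197]

INTERVAL_MS = 1000  # assumed: ping's default interval of one request per second


def arrival_times(rtts, interval_ms=INTERVAL_MS):
    """Return the time (ms since the first request was sent) each reply arrived.

    Request number `seq` (0-based) is assumed to be sent at seq * interval_ms,
    so its reply arrives at send time + RTT.
    """
    return [seq * interval_ms + rtt for seq, rtt in enumerate(rtts)]


arrivals = arrival_times(rtts_ms)
spread = max(arrivals) - min(arrivals)
print(f"earliest reply arrived at ~{min(arrivals) / 1000:.1f} s")
print(f"all {len(arrivals)} replies arrived within {spread:.0f} ms of each other")
```

Under that assumption, every reply lands within about 110 ms of the others, roughly 15 s after the first request was sent. That pattern suggests "gig" stopped forwarding or answering for about 15 seconds and then released its backlog in one burst (consistent with the "pipe 15" in the statistics), rather than simply forwarding slowly.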

The NICs have TCP offload, and I'm not even using the Broadcom NICs at present.

I've tried backing up and restoring the config, and that didn't help. I don't know what else to try. I have trouble believing that just 100 Mbps of traffic through almost no rules is causing that much lag without a deeper problem.

Path: computer -> 100 Mb switch -> gigabit link to "minor" -> gigabit link to "gig".
The link at the 100 Mb point is very unsaturated, so that can't possibly be the cause, especially since pinging the previous hop is perfect; the lag appears only when crossing a direct end-to-end connection. I'm sure it's the box, because it lags on all interfaces, and the CPU fluctuation seems to agree. What can I check? It looks like check_reload_filter is causing the CPU spike, but the same process also uses CPU on "minor", and there it doesn't make the CPU rise like that.

Pings are horrible (15,000 ms, 10,000 ms, 9,000 ms, and so on), then for a few seconds everything is under 1 ms (99% idle), then it jumps back up (50% idle or less; it's odd that it's so bad even at 50% idle).

It's odd that it never seems to exceed 100 Mb, especially since I should be able to connect in via another interface and download something to increase the total download speed, but the latency is so terrible that I can't.

themrrobert

More notes:
The "failing" gig router has only 25k active states, while "minor" has 35k.

I can't see any reason for "gig" to be failing. Obviously, the data miners are on "gig", so they are probably making more requests per unit of time, but could that really cause that much system load on pfSense?

I unchecked the "disable hardware offload" settings and clicked Apply; it said the changes were applied successfully. Do I need to reboot for that to take effect?

Also, I noticed a problem with one of the LANs that happened to be on the Broadcom bce1, so I moved it to the Intel NIC, and when I booted I saw this message:
bce1: bce_pulse(): Bootcode lost the driver pulse! (bc_state = 0x0003600E)

Interestingly, when I look at why something travelling across the link was blocked (by clicking the red X in the firewall log), it lists the interface as bce1, even though nothing should be using it.

Those two might not even be related; I just thought I'd mention it.

themrrobert

I have solved it.

The Firewall was being screwed up from all of the retransmissions and errors that were caused by the overloaded 10/100 switches.

Once I enabled a limiter to cap the up/down speed at 76 Mb, everything seems to work fine. In fact, we are actually getting more than 100 Mb downstream now, which seems odd since it can only push out 76 Mb at a time. I'm wondering whether the firewall will get backed up with data in the limiter.

Regardless, even though the firewall now has to manage that traffic, it seems to be doing much better: CPU usage is down and everything looks good.
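On the question of whether the firewall will get backed up with data in the limiter: a limiter with a finite queue can only buffer up to its queue size, and anything beyond that is dropped. The sketch below is a simplified fluid model of such a limiter, not the actual algorithm of dummynet (which pfSense limiters are built on); the function name, the 0.1 s time step, and the 2 Mbit queue limit are hypothetical values chosen for illustration.

```python
# Simplified fluid model of a fixed-rate limiter with a finite queue.
# NOT dummynet's real implementation; all parameters are hypothetical.


def simulate_limiter(arrival_mbps, rate_mbps, queue_limit_mbit, seconds, step=0.1):
    """Return (final backlog, total dropped, total sent), all in Mbit."""
    backlog = dropped = sent = 0.0
    for _ in range(round(seconds / step)):
        backlog += arrival_mbps * step          # traffic offered during this step
        out = min(backlog, rate_mbps * step)    # limiter drains at most rate * step
        backlog -= out
        sent += out
        if backlog > queue_limit_mbit:          # queue full: the excess is dropped
            dropped += backlog - queue_limit_mbit
            backlog = queue_limit_mbit
    return backlog, dropped, sent


# 100 Mbps offered against a 76 Mbps limit, then 60 Mbps (below the limit).
for offered in (100, 60):
    backlog, dropped, sent = simulate_limiter(offered, 76, queue_limit_mbit=2, seconds=10)
    print(f"{offered} Mbps offered: sent {sent:.0f} Mbit, "
          f"dropped {dropped:.0f} Mbit, backlog {backlog:.1f} Mbit")
```

In this model, with more traffic offered than the limit allows, the backlog stays capped at the queue limit and the excess is dropped (TCP senders react to that loss by slowing down); below the limit, nothing queues at all.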

The only thing is that I still don't know where that Broadcom driver error came from. I will reinstall the firewall at some point and see if that fixes it, but for now I just won't use those ports.

kejianshi

"pfSense seems hard limited at 100Mbps"

"The Firewall was being screwed up from all of the retransmissions and errors that were caused by the overloaded 10/100 switches."

And the graph is capped at 100Mbps.

Now, that's funny.

Anyway, can you install some gigabit-capable gear? I think you need it.