PFsense on a Poweredge 1850

vman76

Hello All,

I’m rolling out a pfSense box to handle Internet traffic for a small college dorm. There are around 3,000 users with a lot of YouTube and Netflix usage. We have a Cisco ASA 5550 but are pushing it to its 1.2 Gbps advertised limit when combined with academic traffic.

Our dorm interface usage on the ASA is:
Peak 600Mbps
Peak : 60,000 Packets per Second
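
As a rough sanity check, the two peak figures above imply an average packet size. This assumes both peaks were measured at the same time and in the same direction, which the post does not say. A minimal Python sketch; the function name is made up for illustration:

```python
def avg_packet_bytes(rate_mbps, pps):
    """Average bytes per packet implied by a bit rate (Mbps) and a packet rate (PPS)."""
    bytes_per_second = rate_mbps * 1_000_000 / 8
    return bytes_per_second / pps


# Peak dorm figures from the post: 600 Mbps at 60,000 PPS.
print(avg_packet_bytes(600, 60_000))
```

An average well above 1,000 bytes would fit the later remark that the students mostly use large-packet applications such as video streaming.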

We do 1-to-1 NAT and pass about 8,000 IP addresses through the firewall. The hardware we are using is a Dell PowerEdge 1850 with 4 Intel Pro 10/100/1000 interfaces: dual onboard, dual on a PCI-? card.
http://www.dell.com/downloads/global/products/pedge/en/1850_specs.pdf

Pfsense detected:
CPU Type:
Intel(R) Xeon(TM) CPU 3.00GHz
4 CPUs: 2 package(s) x 1 core(s) x 2 HTT threads
4 GB RAM (we can easily add another 4 GB).

We sent some traffic through the pfSense box over the last 2 weeks. We’ve tested 2 dorm buildings and at peak hit 200 Mbps and 15,000 PPS at only 11% CPU and less than 1 GB of RAM. These 2 buildings account for about 20% of all our expected traffic.

It’s working out nicely and I think it’ll handle the 600 Mbps. Next year we’re upgrading to a 1 Gbps Internet circuit. Does this setup seem sufficient to handle that? I’m hoping someone has built something like this using the same hardware specs.

EDIT:
The inside and outside interfaces are on this PCI Bus:
em0@pci0:3:11:0: class=0x020000 card=0x10128086 chip=0x10108086 rev=0x01 hdr=0x00
cap 01[dc] = powerspec 2 supports D0 D3 current D0
cap 07[e4] = PCI-X 64-bit supports 133MHz, 2048 burst read, 1 split transaction
cap 05[f0] = MSI supports 1 message, 64 bit
em1@pci0:3:11:1: class=0x020000 card=0x10128086 chip=0x10108086 rev=0x01 hdr=0x00
cap 01[dc] = powerspec 2 supports D0 D3 current D0
cap 07[e4] = PCI-X 64-bit supports 133MHz, 2048 burst read, 1 split transaction
cap 05[f0] = MSI supports 1 message, 64 bit

Thanks,
vman

vman76

I figured I'd update this in case someone tries to use the same hardware and does a search on here.

The NIC I'm using is:
Dell Intel PRO1000MT PCI-X Dual Port Network Card Adapter J1679

The firewall held up very well as we added more networks to it on a weekly basis. CPU, memory, and mbufs held up just fine. However, once we came close to 300 Mbps and 30,000 PPS it started getting input errors on the outside, Internet-facing interface. After some investigating, I found that the em0 NIC, which is on the PCI-X bus, was getting overruns and dropping frames:

sysctl dev.em.0 output.

dev.em.0.mac_stats.missed_packets: 6059754
dev.em.0.mac_stats.recv_no_buff: 7508997

Here is a sampling of 1-minute data showing the traffic rate, PPS, and input errors. I gathered these with an SNMP script:

02-28-2014 22:39:01 IN: RATE 369 Mbps PPS: 35252 ERRORS: 550
02-28-2014 22:40:01 IN: RATE 339 Mbps PPS: 32869 ERRORS: 265
02-28-2014 22:41:01 IN: RATE 343 Mbps PPS: 32961 ERRORS: 45
02-28-2014 22:42:01 IN: RATE 396 Mbps PPS: 37093 ERRORS: 767
02-28-2014 22:43:01 IN: RATE 361 Mbps PPS: 34294 ERRORS: 1095
02-28-2014 22:44:01 IN: RATE 306 Mbps PPS: 29744 ERRORS: 194

Once the PPS and Mbps drop, the errors go away:

02-28-2014 23:48:01 IN: RATE 266 Mbps PPS: 26766 ERRORS: 0
02-28-2014 23:49:01 IN: RATE 277 Mbps PPS: 27468 ERRORS: 0
02-28-2014 23:50:01 IN: RATE 236 Mbps PPS: 24109 ERRORS: 0
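
The samples above can be turned into a rough loss ratio per minute. This sketch assumes PPS is the per-second average over the 1-minute interval and ERRORS is the number of input errors counted in that same minute; the post states neither, and the SNMP script itself was not shown. All names here are illustrative.

```python
import re

# Matches lines such as:
# 02-28-2014 22:39:01 IN: RATE 369 Mbps PPS: 35252 ERRORS: 550
SAMPLE_RE = re.compile(
    r"(?P<date>\d{2}-\d{2}-\d{4}) (?P<time>\d{2}:\d{2}:\d{2}) IN: "
    r"RATE (?P<mbps>\d+) Mbps PPS: (?P<pps>\d+) ERRORS: (?P<errors>\d+)"
)


def parse_sample(line):
    """Return a dict for one log line, or None if the line does not match."""
    m = SAMPLE_RE.search(line)
    if m is None:
        return None
    return {
        "time": m["time"],
        "mbps": int(m["mbps"]),
        "pps": int(m["pps"]),
        "errors": int(m["errors"]),
    }


def error_ratio(sample, interval_s=60):
    """Input errors as a fraction of packets offered during the interval.

    Assumption: errored packets are not included in the PPS figure.
    """
    received = sample["pps"] * interval_s
    offered = received + sample["errors"]
    return sample["errors"] / offered if offered else 0.0


sample = parse_sample("02-28-2014 22:39:01 IN: RATE 369 Mbps PPS: 35252 ERRORS: 550")
print(f"{error_ratio(sample):.5%}")
```

Under those assumptions the loss in the worst minutes is a small fraction of a percent, although even that can be visible to TCP flows.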

I don't think this bad boy will handle anywhere near 600 Mbps without a ton of dropped frames, so I have to look at another solution. I have a better Intel PRO/1000 PT quad card but it's PCI-E and my mobo is configured for PCI-X. I have the onboard dual NIC as an option too, but I believe it is also PCI-X and for some reason it will only connect at 100 Mbps to my Cisco 6513 10/100/1000 line card. I tried both hardcoding and autonegotiating on both sides.

If anyone has any ideas or suggestions, I'm up for hearing them.

Edit: Just checked last night's numbers and they were worse:

03-03-2014 23:57:01 IN: RATE 403 Mbps PPS: 38316 ERRORS: 4004
03-03-2014 23:58:01 IN: RATE 417 Mbps PPS: 39887 ERRORS: 3996
03-03-2014 23:59:01 IN: RATE 344 Mbps PPS: 33619 ERRORS: 920

We send half of our subnets through the pfsense, the other half through an ASA 5550. Its stats for comparison are:

03-03-2014 23:30:01 IN: RATE 328 Mbps PPS: 32666 ERRORS: 13
03-03-2014 23:31:01 IN: RATE 353 Mbps PPS: 34750 ERRORS: 0
03-03-2014 23:32:01 IN: RATE 355 Mbps PPS: 34685 ERRORS: 0
03-03-2014 23:33:01 IN: RATE 346 Mbps PPS: 33931 ERRORS: 2
03-03-2014 23:34:01 IN: RATE 353 Mbps PPS: 34645 ERRORS: 0
03-03-2014 23:35:01 IN: RATE 357 Mbps PPS: 34787 ERRORS: 0

The errors on the ASA are also overrun input errors on the NIC just like the pfsense. They start getting high around 450-500 Mbps. The Cisco interfaces are on a PCI-e card.

podilarius

Try the suggestions here for em type interfaces.

https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards

vman76

@podilarius:

Try the suggestions here for em type interfaces.

https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards

Thanks for the reply. The only suggestion I see is:

Certain intel igb cards, especially multi-port cards, can very easily/quickly exhaust mbufs and cause panics, especially on amd64. The following tweaks should help:
In /boot/loader.conf.local - Add the following (or create the file if it does not exist):
kern.ipc.nmbclusters="131072"
hw.igb.num_queues=1
That will increase the amount of network memory buffers, and make the card use one queue instead of multiple queues, to reduce the strain on the system.

In my case, I don't think I'm ever running out of mbufs so do you still think that fix would apply?

vmstat -z | head -1 ; vmstat -z | grep -i mbuf
ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
mbuf_packet: 256, 0, 1053, 995, 33971588716, 0
mbuf: 256, 0, 34, 1128, 39655750379, 0
mbuf_cluster: 2048, 25600, 2049, 759, 6468255267, 0
mbuf_jumbo_page: 4096, 12800, 0, 119, 363049, 0
mbuf_jumbo_9k: 9216, 6400, 0, 0, 0, 0
mbuf_jumbo_16k: 16384, 3200, 0, 0, 0, 0
mbuf_ext_refcnt: 4, 0, 0, 0, 0, 0
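
One way to read the table above is to compare USED against LIMIT and check the FAILURES column for each zone. The sketch below assumes the comma-separated `vmstat -z` layout pasted above (SIZE, LIMIT, USED, FREE, REQUESTS, FAILURES); the function names are invented for illustration.

```python
def parse_vmstat_z(text):
    """Parse 'name: size, limit, used, free, requests, failures' lines into a dict."""
    zones = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # header or unrelated line
        fields = [f.strip() for f in rest.split(",")]
        if len(fields) < 6 or not all(f.isdigit() for f in fields[:6]):
            continue
        size, limit, used, free, requests, failures = (int(f) for f in fields[:6])
        zones[name.strip()] = {
            "size": size, "limit": limit, "used": used,
            "free": free, "requests": requests, "failures": failures,
        }
    return zones


def used_fraction(zone):
    """USED / LIMIT, or None when the zone reports no limit (LIMIT of 0)."""
    if zone["limit"] == 0:
        return None
    return zone["used"] / zone["limit"]


output = """ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
mbuf_packet: 256, 0, 1053, 995, 33971588716, 0
mbuf_cluster: 2048, 25600, 2049, 759, 6468255267, 0
"""
zones = parse_vmstat_z(output)
print(used_fraction(zones["mbuf_cluster"]), zones["mbuf_cluster"]["failures"])
```

With the figures pasted above, mbuf_cluster sits at roughly 8% of its limit with zero failures, which is consistent with the suspicion that mbuf exhaustion is not the problem here.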

vman76

I was able to get my paws on a PowerEdge 1950, which is a newer generation of hardware with the following specs:

Intel(R) Xeon(R) CPU 5130 @ 2.00GHz
4 CPUs: 2 package(s) x 2 core(s)

4 GB RAM and a PCI-e quad-port Intel PRO/1000 PT card. I've got the 1850's config loaded on there and am eager to test it out.

em0@pci0:14:0:0: class=0x020000 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
class = network
subclass = ethernet
cap 01[c8] = powerspec 2 supports D0 D3 current D0
cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
cap 10[e0] = PCI-Express 1 endpoint max data 256(256) link x4(x4)
ecap 0001[100] = AER 1 0 fatal 1 non-fatal 0 corrected
ecap 0003[140] = Serial 1 001517ffff8525cc

em1@pci0:14:0:1: class=0x020000 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
class = network
subclass = ethernet
cap 01[c8] = powerspec 2 supports D0 D3 current D0
cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
cap 10[e0] = PCI-Express 1 endpoint max data 256(256) link x4(x4)
ecap 0001[100] = AER 1 0 fatal 1 non-fatal 0 corrected
ecap 0003[140] = Serial 1 001517ffff8525cc

bryan.paradis

PCI-X should have plenty enough bandwidth to max out that card without issue. There could be something screwing with it in the bios. Did you try the second PCI-X slot?

Also I am not sure but 1850s have different risers for different card configurations. Grabbing the PCI-E riser off ebay might get you to PCI-E but you shouldn't need to anyway!

http://bsdrp.net/documentation/technical_docs/performance

Check out that link. Lots of information there.

vmstat -i

It wouldn't be interrupt related would it?

sysctl dev.em.0

What is the rest of that output?

vman76

@bryan.paradis:

PCI-X should have plenty enough bandwidth to max out that card without issue. There could be something screwing with it in the bios. Did you try the second PCI-X slot?

Also I am not sure but 1850s have different risers for different card configurations. Grabbing the PCI-E riser off ebay might get you to PCI-E but you shouldn't need to anyway!

http://bsdrp.net/documentation/technical_docs/performance

Check out that link. Lots of information there.
vmstat -i
It wouldn't be interrupt related would it?
sysctl dev.em.0
What is the rest of that output?

Hello Bryan,
I agree that the PCI-X should have enough bandwidth too. I don't think the Mbps is the issue. I didn't try the other side of the riser card. I think it's the number of packets and how fast they're coming into the NIC. This firewall is sitting behind a Cisco ASR-1002, which can forward a ton of packets faster than the NIC on the pfSense box can take them in. I think that's why I see the same exact issue (overruns on input) with our ASA 5500, albeit at a higher PPS. Based on tighter sampling, I can see on the ASR that it is forwarding over 100,000 PPS to the firewalls.

We can use this 1850 for something else at the college, so I swapped it for a newer server with a better NIC. I need to get this firewall back in action ASAP, so that seemed like my best option.

That's a great link and it's the one I used to do most of the troubleshooting when I saw the issue.

I'm not sure if it's interrupt related. I spent more time troubleshooting the ASA than the 1850, and these are the 2 scenarios that may be happening here (from the Cisco site):

Software level - The ASA software does not pull the packets off of the interface FIFO queue fast enough. This causes the FIFO queue to fill up and new packets to be dropped.

Hardware level - The rate at which packets come into the interface is too fast, which causes the FIFO queue to fill before the ASA software can pull the packets off. Usually, a burst of packets causes the FIFO queue to fill up to maximum capacity in a short amount of time.
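
The hardware-level scenario can be illustrated with a toy model: a fixed-size receive FIFO drained at a constant rate. This is only a sketch with arbitrary numbers, not a model of the actual ASA or em(4) hardware; the function name and parameters are made up.

```python
def simulate_fifo(arrivals, drain_per_tick, capacity):
    """Return (delivered, dropped) for a fixed-size receive FIFO.

    arrivals       -- packets arriving in each tick
    drain_per_tick -- packets the software can pull off per tick
    capacity       -- FIFO size in packets
    """
    queued = delivered = dropped = 0
    for n in arrivals:
        accepted = min(n, capacity - queued)  # whatever fits in the FIFO
        dropped += n - accepted               # the rest is an overrun
        queued += accepted
        served = min(queued, drain_per_tick)
        queued -= served
        delivered += served
    return delivered, dropped


# Same total load (500 packets over 10 ticks), smooth versus bursty.
smooth = [50] * 10
bursty = [250, 0, 0, 0, 0] * 2
print(simulate_fifo(smooth, drain_per_tick=60, capacity=100))
print(simulate_fifo(bursty, drain_per_tick=60, capacity=100))
```

In this toy run the smooth stream loses nothing while the bursty stream overruns the FIFO, even though the average arrival rate is below the drain rate in both cases. That is the pattern the quoted text describes: drops without the CPU looking busy on average.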

The CPU on the ASA wasn't anywhere near maxed out and the pfSense CPU was also not taxed, so I think it's the latter of the 2 scenarios. I attached a graph of the pfSense CPU during peak usage (350-400 Mbps).

The full output of the sysctl command from this AM before I decommissioned the 1850:

dev.em.0.%desc: Intel(R) PRO/1000 Legacy Network Connection 1.0.4
dev.em.0.%driver: em
dev.em.0.%location: slot=11 function=0
dev.em.0.%pnpinfo: vendor=0x8086 device=0x1010 subvendor=0x8086 subdevice=0x1012 class=0x020000
dev.em.0.%parent: pci3
dev.em.0.nvm: -1
dev.em.0.rx_int_delay: 0
dev.em.0.tx_int_delay: 66
dev.em.0.rx_abs_int_delay: 66
dev.em.0.tx_abs_int_delay: 66
dev.em.0.rx_processing_limit: 100
dev.em.0.flow_control: 3
dev.em.0.mbuf_alloc_fail: 0
dev.em.0.cluster_alloc_fail: 0
dev.em.0.dropped: 0
dev.em.0.tx_dma_fail: 0
dev.em.0.tx_desc_fail1: 0
dev.em.0.tx_desc_fail2: 4
dev.em.0.rx_overruns: 77194
dev.em.0.watchdog_timeouts: 0
dev.em.0.device_control: 1223688777
dev.em.0.rx_control: 32770
dev.em.0.fc_high_water: 47104
dev.em.0.fc_low_water: 45604
dev.em.0.fifo_workaround: 0
dev.em.0.fifo_reset: 0
dev.em.0.txd_head: 49
dev.em.0.txd_tail: 49
dev.em.0.rxd_head: 164
dev.em.0.rxd_tail: 163
dev.em.0.mac_stats.excess_coll: 0
dev.em.0.mac_stats.single_coll: 0
dev.em.0.mac_stats.multiple_coll: 0
dev.em.0.mac_stats.late_coll: 0
dev.em.0.mac_stats.collision_count: 0
dev.em.0.mac_stats.symbol_errors: 0
dev.em.0.mac_stats.sequence_errors: 0
dev.em.0.mac_stats.defer_count: 3567
dev.em.0.mac_stats.missed_packets: 6059754
dev.em.0.mac_stats.recv_no_buff: 7508997
dev.em.0.mac_stats.recv_undersize: 0
dev.em.0.mac_stats.recv_fragmented: 0
dev.em.0.mac_stats.recv_oversize: 0
dev.em.0.mac_stats.recv_jabber: 0
dev.em.0.mac_stats.recv_errs: 0
dev.em.0.mac_stats.crc_errs: 0
dev.em.0.mac_stats.alignment_errs: 0
dev.em.0.mac_stats.coll_ext_errs: 0
dev.em.0.mac_stats.xon_recvd: 3591
dev.em.0.mac_stats.xon_txd: 0
dev.em.0.mac_stats.xoff_recvd: 3591
dev.em.0.mac_stats.xoff_txd: 0
dev.em.0.mac_stats.total_pkts_recvd: 20984938718
dev.em.0.mac_stats.good_pkts_recvd: 20978871785
dev.em.0.mac_stats.bcast_pkts_recvd: 55671
dev.em.0.mac_stats.mcast_pkts_recvd: 42983
dev.em.0.mac_stats.rx_frames_64: 411105803
dev.em.0.mac_stats.rx_frames_65_127: 1531294228
dev.em.0.mac_stats.rx_frames_128_255: 670658750
dev.em.0.mac_stats.rx_frames_256_511: 290321790
dev.em.0.mac_stats.rx_frames_512_1023: 366207236
dev.em.0.mac_stats.rx_frames_1024_1522: 17709283978
dev.em.0.mac_stats.good_octets_recvd: 27173769214521
dev.em.0.mac_stats.good_octets_txd: 2201587061146
dev.em.0.mac_stats.total_pkts_txd: 11657222216
dev.em.0.mac_stats.good_pkts_txd: 11657222216
dev.em.0.mac_stats.bcast_pkts_txd: 3179
dev.em.0.mac_stats.mcast_pkts_txd: 2
dev.em.0.mac_stats.tx_frames_64: 4253849187
dev.em.0.mac_stats.tx_frames_65_127: 5647725507
dev.em.0.mac_stats.tx_frames_128_255: 455801927
dev.em.0.mac_stats.tx_frames_256_511: 188977807
dev.em.0.mac_stats.tx_frames_512_1023: 278759522
dev.em.0.mac_stats.tx_frames_1024_1522: 832108266
dev.em.0.mac_stats.tso_txd: 0
dev.em.0.mac_stats.tso_ctx_fail: 0
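
To put the error counters above in proportion, the dump can be parsed and the missed packets expressed as a share of everything received, along with the share of large frames. This is a sketch: it assumes the `key: value` layout shown above and that total_pkts_recvd includes the missed packets (the gap between total and good packets in this dump is roughly the missed count, but that is an inference, not something the driver documentation is quoted on here). Function names are illustrative.

```python
def parse_counters(text):
    """Parse 'key: integer' sysctl lines into a dict, skipping non-numeric values."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        value = value.strip()
        if sep and value.lstrip("-").isdigit():
            counters[key.strip()] = int(value)
    return counters


def missed_ratio(counters, prefix="dev.em.0.mac_stats."):
    """missed_packets as a fraction of total_pkts_recvd."""
    return counters[prefix + "missed_packets"] / counters[prefix + "total_pkts_recvd"]


def large_frame_share(counters, prefix="dev.em.0.mac_stats."):
    """Share of received frames in the 1024-1522 byte bucket."""
    buckets = {k: v for k, v in counters.items() if k.startswith(prefix + "rx_frames_")}
    return buckets[prefix + "rx_frames_1024_1522"] / sum(buckets.values())
```

For example, feeding the full dump above into `parse_counters` and then calling `missed_ratio` and `large_frame_share` gives the lifetime loss ratio and the proportion of near-full-size frames. Keep in mind these are totals since boot, so they average the bad peak minutes with long error-free periods.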

[Attachment: PF-CPU.jpg]

bryan.paradis

100,000 pps really doesn't seem like much?

A Ubiquiti Edge Router should be able to pound out 10 times that in certain cases.
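
For context on packet rates, the theoretical maximum PPS on gigabit Ethernet depends heavily on frame size. Each frame carries 20 bytes of on-wire overhead (8-byte preamble plus 12-byte inter-frame gap) on top of the frame itself. A small sketch of that arithmetic; the function name is made up:

```python
ETHERNET_OVERHEAD_BYTES = 20  # 8-byte preamble/SFD + 12-byte inter-frame gap


def max_pps(link_bps, frame_bytes):
    """Theoretical packets per second on a link for a given frame size (FCS included)."""
    return link_bps / ((frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8)


for size in (64, 1250, 1518):
    print(size, int(max_pps(1_000_000_000, size)))
```

Headline figures such as 1,000,000 PPS are normally quoted for minimum-size 64-byte frames, where gigabit line rate is about 1.49 million PPS. With the large packets this network carries (600 Mbps at 60,000 PPS works out to about 1,250 bytes per packet, if those peaks coincide), 100,000 PPS is already close to a full gigabit in one direction.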

Did you try turning on polling for the interface?

ifconfig interface polling

http://www.cyberciti.biz/faq/freebsd-device-polling-network-polling-tutorial/

For an idea of the sort of performance potential in that PCI-X NIC, check here:

http://pdos.csail.mit.edu/~rtm/e1000/

There is advice on missed-packet and no-buffer errors at the bottom of this page:

https://nuclearcat.com/mediawiki/index.php/Intel_Gigabit_Performance

and more tuning information

https://calomel.org/freebsd_network_tuning.html

jasonlitka

@bryan.paradis:

A Ubiquiti Edge Router should be able to pound out 10 times that in certain cases.

That's debatable. Just because they said it could doesn't mean it can.

vman76

@bryan.paradis:

100,000 pps really doesn't seem like much?

A Ubiquiti Edge Router should be able to pound out 10 times that in certain cases.

Did you try turning on polling for the interface?
ifconfig interface polling
http://www.cyberciti.biz/faq/freebsd-device-polling-network-polling-tutorial/

For an idea of the sort of performance potential in that PCI-X NIC, check here:

http://pdos.csail.mit.edu/~rtm/e1000/

There is advice on missed-packet and no-buffer errors at the bottom of this page:

https://nuclearcat.com/mediawiki/index.php/Intel_Gigabit_Performance

and more tuning information

https://calomel.org/freebsd_network_tuning.html

Thanks for all the links!

I'm always leery of advertised PPS numbers that aren't taken in production environments. The Cisco 7206 VXR NPE-G1 is also spec'd at 1,000,000 PPS. In our environment, by the time it gets to 150,000 PPS @ 600 Mbps, it'll be dropping as well, especially if any ACLs or features are enabled.

I have the 1950 running and iperf between 2 directly connected hosts shows promising numbers: 960 Mbps and 120,000 PPS with no input errors or drops. The CPU hung around 30% during the test. Production traffic will show its true colors.

packets errs idrops bytes packets errs bytes colls
115k 0 0 116M 115k 0 116M 0
113k 0 0 114M 113k 0 114M 0
115k 0 0 116M 115k 0 116M 0
113k 0 0 114M 113k 0 114M 0
115k 0 0 116M 115k 0 116M 0
114k 0 0 115M 114k 0 115M 0

stephenw10

Real world numbers are always great to have. :)

I would have expected the 1850 to manage substantially more though. I have no numbers to prove it. ::)

Steve

bryan.paradis

@vman76:

Thanks for all the links!

I'm always leery of advertised PPS numbers that aren't taken in production environments. The Cisco 7206 VXR NPE-G1 is also spec'd at 1,000,000 PPS. In our environment, by the time it gets to 150,000 PPS @ 600 Mbps, it'll be dropping as well, especially if any ACLs or features are enabled.

I have the 1950 running and iperf between 2 directly connected hosts shows promising numbers: 960 Mbps and 120,000 PPS with no input errors or drops. The CPU hung around 30% during the test. Production traffic will show its true colors.

packets errs idrops bytes packets errs bytes colls
115k 0 0 116M 115k 0 116M 0
113k 0 0 114M 113k 0 114M 0
115k 0 0 116M 115k 0 116M 0
113k 0 0 114M 113k 0 114M 0
115k 0 0 116M 115k 0 116M 0
114k 0 0 115M 114k 0 115M 0

That is looking better for sure. Mind posting the sysctl output for that guy? Also, what size packets did you use in the test?

@stephenw10:

Real world numbers are always great to have. :)

I would have expected the 1850 to manage substantially more though. I have no numbers to prove it. ::)

Steve

It is just really too low for the 1850 imo.

http://dl.ubnt.com/Tolly212127UbiquitiEdgeRouterLitePricePerformance.pdf is a Tolly report, if you are looking for PPS numbers from another reviewer. These things are wicked fast for what they are. People have FreeBSD running on them already!

stephenw10

Impressive.
The ERL has a custom ASIC to enable it to perform like that. It's not supported by FreeBSD, so if/when pfSense runs on it don't expect those numbers. Currently tops out at 250Mbps.

Steve

vman76

@bryan.paradis:

That is looking better for sure. Mind posting the sysctl output for that guy? Also, what size packets did you use in the test?

Sure, here is the current data. The firewall is now in production and averaging 150 Mbps at 24,000 PPS, with no issues since around noon. I tried various iperf runs but the money spot was this one:

iperf -c -w 65000 -t 600 -P5

Which should use the full Ethernet frame. I tried a bunch of other window sizes and more flows (up to -P 50) along with UDP tests. The above gave me the best results. Looking at the distribution of packets on the last firewall, and at the NetFlow route-cache on our routers, the students use mostly applications with large packets (video streaming, file sharing, etc.). I'd like to have done some more testing but time constraints did not allow it.

dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.3.2
dev.em.0.%driver: em
dev.em.0.%location: slot=0 function=0
dev.em.0.%pnpinfo: vendor=0x8086 device=0x10a4 subvendor=0x8086 subdevice=0x10a4 class=0x020000
dev.em.0.%parent: pci14
dev.em.0.nvm: -1
dev.em.0.debug: -1
dev.em.0.fc: 3
dev.em.0.rx_int_delay: 0
dev.em.0.tx_int_delay: 66
dev.em.0.rx_abs_int_delay: 66
dev.em.0.tx_abs_int_delay: 66
dev.em.0.rx_processing_limit: 100
dev.em.0.eee_control: 0
dev.em.0.link_irq: 0
dev.em.0.mbuf_alloc_fail: 0
dev.em.0.cluster_alloc_fail: 0
dev.em.0.dropped: 0
dev.em.0.tx_dma_fail: 0
dev.em.0.rx_overruns: 0
dev.em.0.watchdog_timeouts: 0
dev.em.0.device_control: 1209795137
dev.em.0.rx_control: 67141634
dev.em.0.fc_high_water: 30720
dev.em.0.fc_low_water: 29220
dev.em.0.queue0.txd_head: 192
dev.em.0.queue0.txd_tail: 192
dev.em.0.queue0.tx_irq: 0
dev.em.0.queue0.no_desc_avail: 0
dev.em.0.queue0.rxd_head: 531
dev.em.0.queue0.rxd_tail: 530
dev.em.0.queue0.rx_irq: 0
dev.em.0.mac_stats.excess_coll: 0
dev.em.0.mac_stats.single_coll: 0
dev.em.0.mac_stats.multiple_coll: 0
dev.em.0.mac_stats.late_coll: 0
dev.em.0.mac_stats.collision_count: 0
dev.em.0.mac_stats.symbol_errors: 0
dev.em.0.mac_stats.sequence_errors: 0
dev.em.0.mac_stats.defer_count: 5793
dev.em.0.mac_stats.missed_packets: 0
dev.em.0.mac_stats.recv_no_buff: 139
dev.em.0.mac_stats.recv_undersize: 0
dev.em.0.mac_stats.recv_fragmented: 0
dev.em.0.mac_stats.recv_oversize: 0
dev.em.0.mac_stats.recv_jabber: 0
dev.em.0.mac_stats.recv_errs: 0
dev.em.0.mac_stats.crc_errs: 0
dev.em.0.mac_stats.alignment_errs: 0
dev.em.0.mac_stats.coll_ext_errs: 0
dev.em.0.mac_stats.xon_recvd: 5929
dev.em.0.mac_stats.xon_txd: 120
dev.em.0.mac_stats.xoff_recvd: 5929
dev.em.0.mac_stats.xoff_txd: 120
dev.em.0.mac_stats.total_pkts_recvd: 397413786
dev.em.0.mac_stats.good_pkts_recvd: 397401928
dev.em.0.mac_stats.bcast_pkts_recvd: 2715
dev.em.0.mac_stats.mcast_pkts_recvd: 1528
dev.em.0.mac_stats.rx_frames_64: 11419946
dev.em.0.mac_stats.rx_frames_65_127: 24122771
dev.em.0.mac_stats.rx_frames_128_255: 5438765
dev.em.0.mac_stats.rx_frames_256_511: 2942593
dev.em.0.mac_stats.rx_frames_512_1023: 13221690
dev.em.0.mac_stats.rx_frames_1024_1522: 340256163
dev.em.0.mac_stats.good_octets_recvd: 504144384891
dev.em.0.mac_stats.good_octets_txd: 70175650866
dev.em.0.mac_stats.total_pkts_txd: 199599490
dev.em.0.mac_stats.good_pkts_txd: 199599248
dev.em.0.mac_stats.bcast_pkts_txd: 1616
dev.em.0.mac_stats.mcast_pkts_txd: 2
dev.em.0.mac_stats.tx_frames_64: 83244952
dev.em.0.mac_stats.tx_frames_65_127: 68946765
dev.em.0.mac_stats.tx_frames_128_255: 3324597
dev.em.0.mac_stats.tx_frames_256_511: 2036340
dev.em.0.mac_stats.tx_frames_512_1023: 3106394
dev.em.0.mac_stats.tx_frames_1024_1522: 38940203
dev.em.0.mac_stats.tso_txd: 0
dev.em.0.mac_stats.tso_ctx_fail: 0
dev.em.0.interrupts.asserts: 106244188
dev.em.0.interrupts.rx_pkt_timer: 39933
dev.em.0.interrupts.rx_abs_timer: 0
dev.em.0.interrupts.tx_pkt_timer: 5731
dev.em.0.interrupts.tx_abs_timer: 11354
dev.em.0.interrupts.tx_queue_empty: 0
dev.em.0.interrupts.tx_queue_min_thresh: 0
dev.em.0.interrupts.rx_desc_min_thresh: 0
dev.em.0.interrupts.rx_overrun: 0

bryan.paradis

@stephenw10:

Impressive.
The ERL has a custom ASIC to enable it to perform like that. It's not supported by FreeBSD, so if/when pfSense runs on it don't expect those numbers. Currently tops out at 250Mbps.

Steve

Yes indeed. It runs a heavily modified Vyatta-based OS on Debian for MIPS (Cavium Octeon). The driver would need to be ported. Still, it's $99.

http://rtfm.net/FreeBSD/ERL/

Performance could be a little better, though it's more than adequate for my home Internet connection. Basic packet passing between two Gigabit hosts seems to top out at about 250Mbits/sec.

https://wiki.freebsd.org/FreeBSD/mips/Octeon

@vman76:

@bryan.paradis:

That is looking better for sure. Mind posting the sysctl output for that guy? Also, what size packets did you use in the test?

Sure, here is the current data. The firewall is now in production and averaging 150 Mbps at 24,000 PPS, with no issues since around noon. I tried various iperf runs but the money spot was this one:

iperf -c -w 65000 -t 600 -P5

Which should use the full Ethernet frame. I tried a bunch of other window sizes and more flows (up to -P 50) along with UDP tests. The above gave me the best results. Looking at the distribution of packets on the last firewall, and at the NetFlow route-cache on our routers, the students use mostly applications with large packets (video streaming, file sharing, etc.). I'd like to have done some more testing but time constraints did not allow it.

Interesting! Thanks for posting.

vman76

Well, it looks like I found the hardware limits of the new server as well. We were able to push about 500 Mbps and 80,000 PPS with no issue. Once we get to 600 Mbps and 100,000 PPS we get input errors (NIC buffer overruns). While doing some real-time troubleshooting, I noticed that the errors occur exactly when one of the 4 CPUs hits 100% on the (kernel em0 queue) process. em0 is my outside interface. So it appears my earlier suspicion applies in this case: the CPU is too busy to pull the packets off the NIC buffer in time and I end up with overruns. The CPU I'm using is an Intel(R) Xeon(R) CPU 5130 @ 2.00GHz, so it looks like I'm going to be searching for another box. I'm doing 1-to-1 NAT on over 5,000 hosts, so I think that might be driving the CPU higher than I expected. The attached pic shows CPU1 at 84%, but "top -P" shows that it gets to 100% when the packet loss occurs.

I'd love to put the Ubiquiti Edgerouter inline and test their PPS claim here since I'm way under 1,000,000 PPS :P (j/k)

Out of curiosity, does anyone know why the RRD graphs don't show individual CPU/core stats? The CPU data there looks like it's the average of all 4 CPUs, which doesn't really help in troubleshooting a problem like this. I did an snmpwalk and found utilization data for all the CPUs, so I'm graphing it separately in Cacti now (HOST-RESOURCES-MIB::hrProcessorLoad.x).
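
To see why an averaged CPU graph hides this kind of problem, per-core utilization can be computed from two samples of per-CPU tick counters. The sketch below assumes FreeBSD's `kern.cp_times` sysctl, which reports five counters per CPU in the order user, nice, system, interrupt, idle. The sample values are invented for illustration and are not taken from this firewall.

```python
import subprocess

CPUSTATES = 5  # FreeBSD order: user, nice, system, interrupt, idle
IDLE = 4


def read_cp_times():
    """Read per-CPU tick counters on FreeBSD (assumes the kern.cp_times sysctl)."""
    out = subprocess.check_output(["sysctl", "-n", "kern.cp_times"], text=True)
    return [int(x) for x in out.split()]


def per_cpu_busy(before, after):
    """Percent busy per CPU between two flat kern.cp_times samples."""
    busy = []
    for i in range(0, len(before), CPUSTATES):
        deltas = [a - b for a, b in zip(after[i:i + CPUSTATES], before[i:i + CPUSTATES])]
        total = sum(deltas)
        busy.append(100.0 * (total - deltas[IDLE]) / total if total else 0.0)
    return busy


# Invented samples: one core saturated by interrupt work, three nearly idle.
before = [0] * 20
after = [0, 0, 10, 90, 0,   0, 0, 5, 0, 95,   0, 0, 10, 0, 90,   0, 0, 5, 0, 95]
busy = per_cpu_busy(before, after)
print(busy, sum(busy) / len(busy), max(busy))
```

In this made-up case the four-core average is 30% while one core is pinned at 100%, which is why a per-core view (or the maximum across cores) is the more useful signal for a single-queue NIC.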

Some data from my troubleshooting is below in case someone spots something. I have a lot of experience troubleshooting networks in general, but I'm very new to BSD, so I could be missing something.

input (Total) output
packets errs idrops bytes packets errs bytes colls
86k 83 0 73M 87k 0 73M 0
100k 155 0 85M 101k 0 85M 0
96k 0 0 82M 97k 0 82M 0
99k 74 0 82M 101k 0 82M 0
96k 0 0 82M 98k 0 82M 0

dev.em.0.mac_stats.missed_packets: 2294752
dev.em.0.mac_stats.recv_no_buff: 4617837
dev.em.0.mac_stats.recv_undersize: 0
dev.em.0.mac_stats.recv_fragmented: 0
dev.em.0.mac_stats.recv_oversize: 0
dev.em.0.mac_stats.recv_jabber: 0
dev.em.0.mac_stats.recv_errs: 0
dev.em.0.mac_stats.crc_errs: 0
dev.em.0.mac_stats.alignment_errs: 0
dev.em.0.mac_stats.coll_ext_errs: 0
dev.em.0.mac_stats.xon_recvd: 9112
dev.em.0.mac_stats.xon_txd: 120
dev.em.0.mac_stats.xoff_recvd: 9112
dev.em.0.mac_stats.xoff_txd: 120
dev.em.0.mac_stats.total_pkts_recvd: 10671726540
dev.em.0.mac_stats.good_pkts_recvd: 10669413564
dev.em.0.mac_stats.bcast_pkts_recvd: 15097
dev.em.0.mac_stats.mcast_pkts_recvd: 9664
dev.em.0.mac_stats.rx_frames_64: 240300603
dev.em.0.mac_stats.rx_frames_65_127: 744037531
dev.em.0.mac_stats.rx_frames_128_255: 281908686
dev.em.0.mac_stats.rx_frames_256_511: 135974542
dev.em.0.mac_stats.rx_frames_512_1023: 172724810
dev.em.0.mac_stats.rx_frames_1024_1522: 9094467392
dev.em.0.mac_stats.good_octets_recvd: 13931850472813
dev.em.0.mac_stats.good_octets_txd: 1173620928614
dev.em.0.mac_stats.total_pkts_txd: 5912173538
dev.em.0.mac_stats.good_pkts_txd: 5912173297
dev.em.0.mac_stats.bcast_pkts_txd: 2117
dev.em.0.mac_stats.mcast_pkts_txd: 2

: vmstat -i
interrupt total rate
irq14: ata0 376 0
irq20: uhci1 437491 0
irq21: uhci0 uhci2+ 541201 0
cpu0: timer 1165155769 1997
irq256: bce0 23965829 41
irq257: mfi0 1297902 2
irq258: em0 2536851814 4350
irq259: em1 2695135942 4621
cpu2: timer 1165155721 1997
cpu3: timer 1165155724 1997
cpu1: timer 1165155721 1997
Total 9918853490 17008

[Attachment: highCPU.jpg]

stephenw10

I don't really have experience at this sort of traffic level but it seems like you should be able to do better than that on those servers. That's just a general impression though. It would be useful to get an opinion from someone more experienced.

Could this be a situation where IP fastforwarding could be usefully enabled? It can cause problems, notably with IPSec.
https://forum.pfsense.org/index.php?topic=57723.0

What hardware offloading options do you have enabled?

Steve

vman76

@stephenw10:

I don't really have experience at this sort of traffic level but it seems like you should be able to do better than that on those servers. That's just a general impression though. It would be useful to get an opinion from someone more experienced.

Could this be a situation where IP fastforwarding could be usefully enabled? It can cause problems, notably with IPSec.
https://forum.pfsense.org/index.php?topic=57723.0

Steve

I thought it could do better too but the numbers say otherwise. I have a simple ruleset of about 5 rules on each interface. I have not loaded any packages. No VPN. I do log everything to syslog but that is a requirement that I can't get away from.

Hmm, interesting option. We will not be using IPSec terminated directly on this box, so that's not an issue. However, students do use VPN clients which will go through the firewall. I have to research it more to see if anything else might break by applying it. With over 3,000 users and every device you can imagine a student might bring into a dorm room, I'm apprehensive about what it might break.

stephenw10

Hmm, I imagine it would break IPSec through the box and probably generate some complaints! It can dramatically increase throughput in some instances, though. There may be other opportunities for tuning as well.

Earlier I said that the ERL had an ASIC to increase throughput but I think that was wrong (I can't edit it now). It looks like it has a closed-source IP forwarding module that can run separately on one of its 8 cores. No chance of a FreeBSD driver, but maybe an equivalent in the future.

Steve

podilarius

The results are somewhat expected. Currently pfSense is using an old version of pf that is single-core only. The only real reason to run pfSense on a multicore machine is so the add-ons can use the other cores while pf filtering is stuck on one.
The faster the clock speed of a single core, the more throughput you will observe. The pfSense hardware sizing guidance has 2 GHz machines topping out at around 500 Mbps. You got it to go a bit higher. I would imagine that you could get a lot more with a 3.6 GHz machine or one overclocked to 4 GHz.
There has been talk about upgrading to the newer pf, but I don't know much about it or even when. Perhaps 2.2 or 2.3. It should have multicore support if based on the newer code. (Note: I am not with ESF and I don't know the plans at all.) Just hoping that we can get to multicore/multithreaded before I need it.