The Router Rumble
-
http://arstechnica.com/gadgets/2016/09/the-router-rumble-ars-diy-build-faces-better-tests-tougher-competition/
Homebrew 2.0—pfSense 2.3.1
_A lot of people piped up about pfSense in the first article, and for good reason. It's a "prosumer" industry standard and has been for a while. pfSense has a ton of bells and whistles including VLAN tagging, QoS, graphing, logging, and more. It's a good performer, too (at least, compared to most options). I installed the latest release, pfSense 2.3.1, on the Homebrew 2.0 hardware to test.
[Chart: Direct connection at the top, two pfSense runs at the bottom. No config changes were made between these two runs.]
pfSense is pretty… tweaky. I've actually been hammering at it on various hardware off and on for a couple of months now, and it's frustratingly inconsistent. The two runs shown above (we're still showing the direct switched run at the top for reference and scale) were both run against the same hardware and the same configs; pfSense was just feeling more cooperative on the second run than the first.
That seems to be the nature of the beast, unfortunately. The first run was particularly abominable. As you can see, it stalled out and failed even in the 1MB filesize test group. In general, you're seeing failures at the 10 concurrency mark in each of the three test groups, along with a failure in the 10KB/100 concurrency test. Overall throughput when it's working is pretty good, though lower than vanilla Ubuntu's was on the same hardware. In particular, performance dips down to around 500mbps across the entire 10K filesize test suite.
The second run with pfSense is noticeably better, but it still falls flat on its face at 10K/10 clients and still exhibits lower throughput than Ubuntu pretty much across the board.
That said, this is still leaps and bounds better than you'll see out of most consumer hardware. If you need or want the advanced features that pfSense brings to the table, you shouldn't hesitate about installing and using it... especially if you "only" have 500mbps or less of WAN to throw in its face!_
-
Yeah I wish we could x-post this in other areas of the forum to get more attention. Does anyone know why they might've had these results?
-
No idea. I wish they would stop and consider that something is wrong and look into correcting it instead of just shrugging their shoulders.
-
Without any way to replicate their tests or results, it's crap. Even if pfSense came out on top it would still be a crap article.
The fact that they tossed it on something and didn't do any tweaking or troubleshooting is exceptionally dumb. And in the case of the APU, they didn't even use an official pfSense image, they used some other image for what seems to be no discernible valid reason.
For the larger connection sizes they probably needed to increase the state table size and/or set the firewall optimization to aggressive, at a minimum.
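In raw pf.conf terms, those two tweaks map to something like the following (the values are illustrative only; on pfSense they're normally set under System > Advanced > Firewall & NAT rather than edited by hand):
# raise the state table limit above the default
set limit states 500000
# expire idle states more aggressively
set optimization aggressive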
I'd like to see their test replicated on hardware we actually sell, rather than something DIY.
-
Without any way to replicate their tests or results, it's crap. Even if pfSense came out on top it would still be a crap article.
The fact that they tossed it on something and didn't do any tweaking or troubleshooting is exceptionally dumb. And in the case of the APU, they didn't even use an official pfSense image, they used some other image for what seems to be no discernible valid reason.
For the larger connection sizes they probably needed to increase the state table size and/or set the firewall optimization to aggressive, at a minimum.
I'd like to see their test replicated on hardware we actually sell, rather than something DIY.
Would love to see you guys do something more constructive than calling it a crap article and ignoring it. Right now I have to wonder if I have my own RCC-VE 2440 set up wrong and am missing some required tweaks. I don't have a stress environment, so I do partially rely on the assumption that the broader pfSense community has helped make sure the defaults are reasonable.
They do list their testing methodology in their previous article, so if there are questions about how to reproduce it, hopefully you have contacted the author:
http://arstechnica.com/gadgets/2016/04/the-ars-guide-to-building-a-linux-router-from-scratch/
Unless you want to call their actual test broken, loading a system and using it without much tweaking is actually a very relevant test for many users. A lot of people looking at building their own system or using open source don't have the hardware to build a stress environment to validate and tune their settings, and normal usage would only occasionally hit the limits and issues this stress testing exposes. What would be dumb is if there were a well-documented set of recommendations for gigabit connections on pfSense and they didn't apply them. If that is the case, hopefully you are working with the author to resolve it.
Have you guys contacted them to see if they would like to test your HW? Or to improve this article and resolve some of the issues you see?
-
We're not ignoring it. We've all read it. But without being able to replicate the test or the results, it's still crap, and using just one metric like that is also crap.
We have covered how to test effectively pretty thoroughly. Their test only covers a flood of rapid connections, which does not relate well at all to real-world behavior. The test apparently uses nginx and apachebench, but aside from the config files, the details are still light.
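For what it's worth, that kind of test boils down to apachebench runs along these lines (the host, file name, and counts here are placeholders, not their actual parameters):
ab -n 10000 -c 10 http://test-server/10k.bin
ab -n 10000 -c 100 http://test-server/10k.bin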
The original article also goes into detail about how they tweaked their firewall box in many ways, none of which they bothered to do with pfSense. So forgive me if I still call it crap, but it's crap. Apples and oranges. Meaningless comparisons, made worse by sloppy procedures.
Sure, pf itself is no speed demon, but one measurement alone does not make or break any firewall.
To pursue replicating the test on our own by guessing at details would only legitimize what was clearly a half-hearted attempt on their part. They need to do better.
If we can replicate a problem and find a fix, we'd be all over it.
-
Speaking of tweaks, I read that setting net.pf.states_hashsize to the smallest power of 2 that is larger than or equal to your state table size can make almost a 3x-4x difference when dealing with lots of states. With a default of 32k entries, this limit may not have been an issue in their tests, but it's possible.
There are also these two settings that can make a difference with MSI-X NICs:
hw.igb.rx_process_limit="-1"
hw.igb.tx_process_limit="-1"
https://forum.pfsense.org/index.php?topic=113496.msg631076#msg631076
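If anyone wants to experiment, these are loader tunables, so on pfSense they would typically go in /boot/loader.conf.local, roughly like this (the hashsize here is just an example sized for a 500k state table, since 2^19 = 524288 is the smallest power of two above 500,000):
# hash table sized to the smallest power of two >= the state table limit
net.pf.states_hashsize="524288"
# -1 removes the per-pass packet limit in the igb(4) driver
hw.igb.rx_process_limit="-1"
hw.igb.tx_process_limit="-1"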
-
I can't wait for the future FreeBSD SMP changes to the network stack. Should make PFSense freaking awesome with multi-core scaling.
-
ARS is a far cry from what it used to be.
That being said, it's still cool to bring this sort of thing into the public narrative. Pfsense rocked my world and it will rock many others' as well.
-
I can't wait for the future FreeBSD SMP changes to the network stack. Should make PFSense freaking awesome with multi-core scaling.
What changes?
pf has been SMP-friendly since FreeBSD 10: https://wiki.freebsd.org/WhatsNew/FreeBSD10#Networking_improvements
-
ARS is a far cry from what it used to be.
Oh? I've been a reader for 10+ years and I haven't noticed any general decline in quality. One author getting this one thing wrong does not mean their entire output is just as poor.
-
I can't wait for the future FreeBSD SMP changes to the network stack. Should make PFSense freaking awesome with multi-core scaling.
What changes?
pf has been SMP-friendly since FreeBSD 10: https://wiki.freebsd.org/WhatsNew/FreeBSD10#Networking_improvements
I was watching an interview with one of the FreeBSD network guys, and he said they're looking at lock-less data structures for the network stack. Instead of a single system-wide state table, there would be a state table per core, allowing each core to read and write knowing nothing else will touch its data structures.
Going along with this, current userland network API calls don't know which core a network queue is attached to. That means if a user thread is running on core 1 but the state for the traffic lives on core 0, you get cross-core state access, which defeats the lock-less design. So they need to add APIs that let userland find out which core a given network flow is attached to and process that flow only on the appropriate thread.
-
@KOM:
ARS is a far cry from what it used to be.
Oh? I've been a reader for 10+ years and I haven't noticed any general decline in quality. One author getting this one thing wrong does not mean their entire output is just as poor.
I'd say in the early 00's I first came across ARS and the information and quality was so far above everything else it seemed like it was run by aliens. My mind was blown. Then as the years went on it seemed to become more and more "accessible" and "watered down".
I bothered to make a forum account in the 10's. Yes, I was so blown away that I didn't even feel like I could communicate with the members for 10 years as well. Long story short, we disagreed about a few things and they browbeat the heck out of me for it. Five years later I was right about the AMD stuff we disagreed on. I nearly got banned for what I had to say about mp3's.
Anyway, I still read the website. I just don't participate in the forums no matter how badly I want to share my perspective. I also don't see it with the same reverence either. It's just another engadget or yahoo tech to me now.
-
I myself prefer meritocratic forums. Not a huge fan of democratic, everyone's opinion matters, everyone gets a trophy forums. But I do frequent support forums where the target audience are the general public and need help.