PfSense 1.2.3 - High load from thread taskq after synflood

cwadge

I'm staging a pair of pfSense 1.2.3 boxes in a lab, in a master/backup configuration, with CARP+pfsync. These are running on some old dual 3.2GHz Xeons with 4GB DDR2 and 4 PCI-X Intel gig-E NICs ('em' driver). As part of my tests, I synflood them for a while with an equivalently spec'd Linux box running hping3, which achieves around 200k packets/second. The load on the active pfSense box climbs to 100% across both CPUs during this time, though legitimate traffic continues to flow and the UI/shell is still responsive. After the flood is turned off, however, "thread taskq" completely saturates one CPU, intermittently flapping between the two CPUs:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME    WCPU COMMAND
   22 root        1   8    -     0K     8K CPU1   0  83:27 100.00% thread taskq
   11 root        1 171 ki31     0K     8K RUN    1  51:58  70.26% idle: cpu1
   12 root        1 171 ki31     0K     8K RUN    0  28:27  34.96% idle: cpu0
   29 root        1 -68    -     0K     8K -      0  16:57   0.00% em2 taskq
   27 root        1 -68    -     0K     8K -      0   5:06   0.00% em0 taskq
   28 root        1 -68    -     0K     8K -      0   1:55   0.00% em1 taskq
…

This taskq CPU thrashing will continue indefinitely unless the system is rebooted, after which it behaves normally. This situation is 100% reproducible in my environment. My hunch is a FreeBSD bug, but I thought it was worth bringing up.
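For anyone who wants to catch this condition automatically rather than by watching top, here is a minimal Python sketch. It assumes the 12-column SMP layout shown in the listing above (PID through COMMAND, with WCPU in the 11th column); the function name `busy_kernel_threads`, the 90% threshold, and the `SAMPLE` text are illustrative choices, not anything pfSense ships. It only parses text you feed it, so how you capture the top output is up to you.

```python
# Hypothetical helper: flag non-idle threads whose WCPU is above a threshold.
# Assumes the 12-column top layout from the post (SMP box, "C" column present).

SAMPLE = """\
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
22 root 1 8 - 0K 8K CPU1 0 83:27 100.00% thread taskq
11 root 1 171 ki31 0K 8K RUN 1 51:58 70.26% idle: cpu1
12 root 1 171 ki31 0K 8K RUN 0 28:27 34.96% idle: cpu0
29 root 1 -68 - 0K 8K - 0 16:57 0.00% em2 taskq
"""


def busy_kernel_threads(top_output, threshold=90.0):
    """Return (command, wcpu) pairs for non-idle rows at or above threshold."""
    hits = []
    for line in top_output.splitlines():
        # Split into at most 12 fields so a command with spaces stays whole.
        fields = line.split(None, 11)
        if len(fields) < 12 or not fields[0].isdigit():
            continue  # header, blank line, or a layout we do not understand
        command = fields[11].strip()
        if command.startswith("idle"):
            continue  # idle threads being busy is the healthy case
        wcpu = float(fields[10].rstrip("%"))
        if wcpu >= threshold:
            hits.append((command, wcpu))
    return hits


if __name__ == "__main__":
    for command, wcpu in busy_kernel_threads(SAMPLE):
        print(f"{command} is at {wcpu:.2f}% WCPU")
```

A single high reading means little while a flood is still running; the symptom described here is the reading staying high after the traffic has stopped, so in practice you would sample several times over a few minutes.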

Incidentally, for anyone running into this in production, is there a way to restart thread taskq without pissing off either FreeBSD or the pfSense stack?

cmb

That's most likely some kind of problem in the em driver. Try that again on 2.0 and if it's the same there, report that problem on both FreeBSD 7.2 and RELENG_8 to net@freebsd.org. You'll have to reboot to fix it.

Must be specific to something about the chipset on your cards, or something else atypical from similar setups I've tested. I've beaten down boxes with 2-6 em NICs and taskq will go to 100% when they're getting killed, but I've never seen it fail to drop all the way back down to 0% when the traffic stops.

cmb

Can you post the exact options you're using with hping3? Curious if I can replicate that if I'm doing exactly what you are. I have used it before against a box with em NICs, but maybe differently, and possibly before we upgraded to FreeBSD 7.2 base.

cwadge

@cmb:

Can you post the exact options you're using with hping3? Curious if I can replicate that if I'm doing exactly what you are. I have used it before against a box with em NICs, but maybe differently, and possibly before we upgraded to FreeBSD 7.2 base.

Sure thing. In fact I've simulated quite a few types of packet storm attacks, but unfortunately pfSense 1.2.3 seems to react the same to any of them. It doesn't seem to matter whether the traffic is spoofed, fragmented, incremented, various flags set or not, etc. At around the 170k pps mark, taskq starts to run away and never recovers. That leads me to suspect it's the sheer number of packets that's triggering the infinite loop. With this in mind, any flood test should do the trick; for example:

#> hping3 -S <pfsense_ip> -p 80 --flood

My initial tests targeted a valid webhost on the trusted side of the PFS box, 1:1 NAT, with pinholes for 80 and 443. CARP IPs on both sides of the PFS box, with pfsync for state sharing on a dedicated interface. Less significantly, the attacker box in this lab was a 64-bit Linux box running Deb 5.04 with a well-tuned network stack and a couple of Intel 82573E's. Exact same CPUs and hardware as the victim machine, but with half the RAM (2GB).

Probably TMI already, but let me know if you need more detail on anything in particular.

sullrich

Do you have a set of different nics you can test with?

Broadcom for example.

cwadge

@sullrich:

Do you have a set of different nics you can test with? Broadcom for example.

That would be a nice control sample, but unfortunately I don't. This is all Supermicro gear of varying vintage and pedigree, which all uses Intel NICs standard. I do have some newer boxes available with Intel 82576's onboard though (uses 'igb' driver) which would probably be worth testing.

cwadge

Just finished a brief round of testing with 1.2.3 and the igb NICs… it fared even worse than the em's.

These boxes are dual quad Nehalem CPUs with 1333MHz DDR3 (more than 4GB, but capped due to 32-bit addressing) and 1MB cache per core. I tested this pair in exactly the same configuration and manner as the aforementioned dual Xeon pair. After a few minutes of synflooding, though I was able to fully saturate 4 out of 8 cores, taskq never got out of control. However, during the synflood the boxes would flap between the primary and secondary CARP members causing brief interruptions of service. Additionally, the web interface would become completely unresponsive for several seconds at a time. This didn't even happen with the dual Xeons in the previous lab. After terminating the synflood, I was barely able to pass any traffic in or out of the CARP pair. As it turns out, it managed to break the geomirror on the primary box and split-brain CARP on both (CARP0 incorrect hash). Wow.

jits

This is good to know and from the results we'll be able to understand what we're dealing with if we come across it.

Does this also affect single boxes as badly? Those operating without CARP.

Is there a resolution for those using 1.2.3 or is 2.0 immune to the effects and will there be a fix for those deployed 1.2.3 versions?

Jits

cwadge

@jits:

This is good to know and from the results we'll be able to understand what we're dealing with if we come across it.

Cool, hope it helps somebody instead of just alienating the pfSense devs. ;) Sorry to pee on the cheerios, folks.

@jits:

Does this also affect single boxes as badly? Those operating without CARP. Is there a resolution for those using 1.2.3 or is 2.0 immune to the effects and will there be a fix for those deployed 1.2.3 versions?

To be truthful I'm not sure. I was evaluating it for a scenario that requires high-availability, so I haven't even tried them as stand-alone firewalls.

Since it's extremely unlikely I can make pfSense work in the scenario for which it was being evaluated, there's a less than stellar chance that I'll be able to make time to run any more labs against it. That said, I'd still be interested in seeing how the following scenarios fare against the same battery of tests:

pfSense 1.2.3 / dual Xeon / 'em' NICs / standalone
pfSense 2.0b / dual Xeon / 'em' NICs / CARP + pfsync
pfSense 2.0b / dual Xeon / 'em' NICs / standalone
pfSense 2.0b / dual E5504 / 'igb' NICs / CARP + pfsync
pfSense 2.0b / dual E5504 / 'igb' NICs / standalone

If anybody has some recent generation server hardware and they want to try some torture tests of their own, I for one would be interested in the results. I'll try and do the same if the opportunity presents itself.

cwadge

@cwadge:

pfSense 2.0b / dual Xeon / 'em' NICs / CARP + pfsync

pfSense 2.0b / dual Xeon / 'em' NICs / standalone

Well, good news on this front. Looks like 2.0 is a huge step forward, probably primarily due to the underlying FreeBSD 8 base. Not only did it not have any breakage as a result of my synflooding, it was able to absorb much more traffic and still remain quite snappy. I haven't tested on the newer Nehalem / igb equipment, but on the last generation hardware the results certainly look promising.

*Edited for grammar

cmb

@jits:

Is there a resolution for those using 1.2.3 or is 2.0 immune to the effects and will there be a fix for those deployed 1.2.3 versions?

The issues here are NIC driver issues, and we don't write or control the drivers. Hence the difference with FreeBSD 7.2 vs. 8 (though there are other general networking improvements between the two as well). When you nail a box with a few hundred thousand pps denial of service attack, it can…deny service. There won't be an update for 1.2.3 because there isn't really a problem here (well, the igb driver has issues, and em and taskq have their issues from time to time, but we can't release a 1.2.x on FreeBSD 8). At some level of DoS traffic you're going to cause problems. What that level is depends on the NIC driver and the OS in general.

The practical implications of this in production are non-existent for virtually every user. If you get hit with a DoS attack that big it's going to more than overfill your Internet pipe (unless you have a gigabit Internet connection), at which point it doesn't matter what your firewall does, you're offline until your ISP can stop the DoS traffic from being sent across your connection. Once it gets to your firewall, it's already consumed all your bandwidth and it's too late.
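To put rough numbers on that, here is a back-of-the-envelope Python sketch. It assumes the flood consists of bare SYNs with no payload, so each packet is padded to the 64-byte Ethernet minimum frame and occupies 84 bytes of wire time once the 8-byte preamble and 12-byte inter-frame gap are counted. It also assumes the uplink is Ethernet-framed, which will not hold for every connection type. The function names are made up for illustration.

```python
# Rough wire-rate arithmetic for a minimum-size packet flood.
# Assumption: every packet is a 64-byte Ethernet frame (payload-free SYN,
# padded), plus 8 bytes of preamble and 12 bytes of inter-frame gap.
WIRE_BYTES_PER_MIN_FRAME = 64 + 8 + 12  # 84 bytes on the wire


def flood_mbps(pps, wire_bytes=WIRE_BYTES_PER_MIN_FRAME):
    """Bandwidth in Mbit/s consumed by `pps` packets per second."""
    return pps * wire_bytes * 8 / 1e6


def max_pps(link_mbps, wire_bytes=WIRE_BYTES_PER_MIN_FRAME):
    """Most packets per second that fit on a link of `link_mbps` Mbit/s."""
    return link_mbps * 1e6 / (wire_bytes * 8)


if __name__ == "__main__":
    for pps in (170_000, 200_000):
        print(f"{pps} pps is about {flood_mbps(pps):.1f} Mbit/s on the wire")
    for link in (100, 1000):
        print(f"a {link} Mbit/s link tops out near {int(max_pps(link))} pps")
```

Under those assumptions the 200k pps from the lab works out to roughly 134 Mbit/s, which is more than a 100 Mbit/s pipe can carry but only a fraction of a gigabit link's limit of about 1.49 million minimum-size packets per second. That fits the point above: on anything slower than gigabit, the pipe fills before the firewall's packet rate becomes the bottleneck.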

cmb

@cwadge:

pfSense 2.0b / dual Xeon / 'em' NICs / CARP + pfsync

pfSense 2.0b / dual Xeon / 'em' NICs / standalone

Well, good news on this front. Looks like 2.0 is a huge step forward, probably primarily due to the underlying FreeBSD 8 base. Not only did it not have any breakage as a result of my synflooding, it was able to absorb much more traffic and still remain quite snappy. I haven't tested on the newer Nehalem / igb equipment, but on the last generation hardware the results certainly look promising.

That's good. It's attributable entirely to the 8 base. I would definitely repeat that for igb; if that driver still has issues, you should post to freebsd-net with info.

cwadge

@cmb:

The practical implications of this in production are non-existent for virtually every user. If you get hit with a DoS attack that big it's going to more than overfill your Internet pipe (unless you have a gigabit Internet connection), at which point it doesn't matter what your firewall does, you're offline until your ISP can stop the DoS traffic from being sent across your connection. Once it gets to your firewall, it's already consumed all your bandwidth and it's too late.

This is usually true enough. In my case, it is indeed a high-bandwidth situation, so the syn smackdown pfSense got in the labs is a real possibility. Don't get me wrong; the pfSense devs have done a great job, and features like XMLRPC for config-sharing in CARP clusters are simply awesome. It just seems that a combination of weak drivers in FreeBSD* and the uniprocessor nature of PF hold it back from scaling well enough for this particular situation.

* Used to be a network engineer for a company that made a layer-7 filtering bridge based on FreeBSD so yeah, I feel your pain. :)