What is the biggest attack in Gbps you have stopped?

Harvy66

FreeBSD is getting more SMP love for its network stack in 11. Each major release seems to have better core scaling for IO in general. There are some major plans to allow the network stack to both receive and send flows stickied to a single core and have flows randomly distributed among the cores.
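As a rough illustration of what "stickied to a single core" with flows spread among the cores could look like, here is a minimal sketch. It is not FreeBSD's implementation: the function names are hypothetical, and `zlib.crc32` stands in for whatever hash the real stack or NIC would use. The only point is that hashing a flow's 5-tuple gives every packet of that flow the same core, while different flows land on different cores.

```python
import zlib
from collections import Counter


def core_for_flow(src_ip, src_port, dst_ip, dst_port, proto, num_cores):
    """Map a flow's 5-tuple to a core index (hypothetical helper).

    The same 5-tuple always hashes to the same core, so a flow stays
    "sticky"; crc32 is only a stand-in for a real flow hash.
    """
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}/{proto}".encode()
    return zlib.crc32(key) % num_cores


def distribute(flows, num_cores):
    """Count how many flows end up on each core."""
    counts = Counter()
    for flow in flows:
        counts[core_for_flow(*flow, num_cores)] += 1
    return counts


if __name__ == "__main__":
    # Made-up flows from documentation address ranges.
    flows = [
        (f"198.51.100.{i % 250 + 1}", 1024 + i, "203.0.113.10", 443, "tcp")
        for i in range(8000)
    ]
    for core, count in sorted(distribute(flows, 8).items()):
        print(f"core {core}: {count} flows")
```

A single-threaded stage such as the NAT code mentioned below would not benefit from this kind of spreading, since every flow still has to pass through the one thread.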

One thing I don't know about when it comes to SMP love is NAT. I know the NAT implementation has been single-threaded for a while. It may get a rewrite once some of the new SMP network stack APIs are finalized. Or just go IPv6.

firewalluser

@Supermule:

Mikrotik, Fortigate, ISA Server and Windows Firewall.

Nothing else we tested passed the tests.

Mikrotik - RouterOS is based on Linux, so try some Linux hacks on it for stability testing.
http://en.wikipedia.org/wiki/MikroTik#RouterOS

Fortigate - FortiOS based on Linux, so as above…
http://en.wikipedia.org/wiki/Fortinet#GPL_violations

Windows ISA Server / Forefront is no longer available, as MS has announced it is dropping it, so support will be gone in time. Try some Windows hacks for stability testing.

This matters because FreeBSD is primarily aimed at stability, although it has pioneered some features yet to be seen in other OS platforms and also holds an unofficial world record for the most data transmitted.
https://www.freebsd.org/advocacy/whyusefreebsd.html

http://www.serverwatch.com/tutorials/article.php/10825_3393051_2/Differentiating-Among-BSD-Distros.htm
"FreeBSD holds the unofficial record for transferring data, having achieved more than 2 Terabytes of data from one server running the OS. It follows from this statistic that FreeBSD is also one of the most stable OSes available."

The last part above is not what you want to hear considering what you are experiencing, but it goes back to my point about tuning.

You can tune a little Ford Fiesta engine to run a 1/4 mile with performance similar to a bigger-engined car, but that Ford Fiesta engine will then have no reliability and will likely explode after completing the 1/4 mile.

I guess what you need to do is define your aims, then select the correct firewall according to those aims.

Supermule

The attack is currently scaled down to 2 Mbit/s and the firewall still dies despite limiting states per second and states per host.

I ran the top command and here is some info.

One core (core 6) is still blasting away at full speed. If you pay for IOPS/CPU in a datacenter, this is not good news.

[Attachments: Advanced_options_rule.PNG, top1.PNG, top.PNG, vmware.PNG]

Supermule

It seems to need it bad.

But when pfSense will have it is another interesting question.

@Harvy66:

FreeBSD is getting more SMP love for its network stack in 11. Each major release seems to have better core scaling for IO in general. There are some major plans to allow the network stack to both receive and send flows stickied to a single core and have flows randomly distributed among the cores.

One thing I don't know about when it comes to SMP love is NAT. I know the NAT implementation has been single-threaded for a while. It may get a rewrite once some of the new SMP network stack APIs are finalized. Or just go IPv6.

Supermule

Last one for today. Still packet loss and an unresponsive GUI. Stateless traffic is around 15 Mbit/s.

I see filterlog consuming a lot of CPU during the attack. There is a big difference CPU-wise in VMware… Core nr. 6 is still blasting away, but that is not attack dependent. If it wasn't running at 100%, the box might have survived.

[Attachments: vmware.PNG, top_before_attack_stateless.PNG, top_attack_stateless.PNG, traffic.PNG]

Supermule

SYN flood recording, stateless, with top running on the console

Youtube Video

SYN flood recording with SYN proxy states and limiters, with top running on the console

Youtube Video

If you wonder why you can't see core nr. 4 in top, you are not the only one.

It runs at 100% in VMware.

[Attachment: core_missing.PNG]

firewalluser

So have you identified the code that's running on the core that maxes out when this happens?

If you haven't, how can you fix the problem?

At the moment you are just reporting symptoms, which, as you can see by the length of the thread, has not been that useful for fixing the problem so far, has it?

Supermule

I haven't got a clue where to begin and where to look.

I can't see what's using the core…

What do you make of this? Core nr. 4 says it's idle at 100%.

[Attachment: core4.PNG]

tim.mcmanus

@firewalluser:

So have you identified the code that's running on the core that maxes out when this happens?

I'm almost certain the issue is with the network driver in FreeBSD, with PF also contributing to it.

When my state table is low (394K) the attack cripples the entire box with the exception of the console. PF alerts that it hit its state table max in the console. I'm not sure why a full state table creates a more significant impact on the box, but it does.

When I increase the state table I get the IRQ warning: the interrupt storm. This disables the interface being attacked, most likely because the interface grabs one CPU/core and fills it with software interrupts. PF takes all packets and puts them through the CPU, and in this case it would/should grab only one CPU. This makes sense because you don't want IRQ polling across all CPUs (I have a link to an excellent article regarding this design; I'll find it in a few). So generating an interrupt storm on any interface should max out one CPU and take that interface down because of the interrupts being generated. The CPU does not have the resources to process legitimate requests because it's overwhelmed with interrupts.

I guess that the network driver would have to include code to drop these packets before they got to the OS/kernel. Once the kernel gets involved in processing these packets, it generates the interrupt storm, bogs down one CPU, and the interface goes down.

I also assume that there is probably some performance tuning I can do in pfSense, but I think the issue is at a lower level than that. If I have time this weekend, I'll spin up a FreeBSD 10.1 box running PF to validate these assumptions, but I have a strong feeling it's the networking driver that's creating this issue by passing every packet off to the kernel and PF for processing.

Supermule

I disabled Device Polling and the 100% usage on core nr. 4 went away instantly.

These are console captures with top -p running, first with no attack and then during the SYN attack.

The first one is with no port forward, and the box is fine.

The second is with a port forward, and it dies.

![top-p_no portforward_SYN flood.PNG](/public/imported_attachments/1/top-p_no portforward_SYN flood.PNG)
![top-p_WITH_portforward_SYN flood.PNG](/public/imported_attachments/1/top-p_WITH_portforward_SYN flood.PNG)

Harvy66

It's possible that when the state table is full and a new packet for yet another new state comes in, if it's all being processed on the same thread, maybe a new state with a full table triggers some sort of "clean up" in an attempt to make room, and this clean up is really expensive to be doing per packet.
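To make that hypothesis concrete, here is a toy model. It is only a sketch of the guess above, not PF's actual code: `ToyStateTable`, its fields, and the "scan everything when full" cleanup are all assumptions made up for illustration. It counts how many table entries get scanned when every packet arriving at a full table triggers a cleanup pass.

```python
class ToyStateTable:
    """Toy state table (hypothetical): a full table triggers a scan per insert."""

    def __init__(self, max_states):
        self.max_states = max_states
        self.states = {}          # flow key -> expiry time
        self.entries_scanned = 0  # total work done by cleanup passes

    def _purge_expired(self, now):
        # Assumed cleanup strategy: walk the whole table looking for expired states.
        self.entries_scanned += len(self.states)
        expired = [key for key, expiry in self.states.items() if expiry <= now]
        for key in expired:
            del self.states[key]

    def insert(self, key, now, ttl=30):
        """Try to add a state; return False if the table is still full."""
        if len(self.states) >= self.max_states:
            self._purge_expired(now)
            if len(self.states) >= self.max_states:
                return False
        self.states[key] = now + ttl
        return True


def flood(table, packets, now=0):
    """Send `packets` new-state packets at one instant; return how many were dropped."""
    dropped = 0
    for i in range(packets):
        if not table.insert(("attacker", i), now):
            dropped += 1
    return dropped


if __name__ == "__main__":
    for max_states in (1_000, 10_000):
        table = ToyStateTable(max_states)
        dropped = flood(table, 5_000)
        print(f"max_states={max_states}: dropped={dropped}, "
              f"entries scanned={table.entries_scanned}")
```

In this model a table larger than the flood never scans anything, while a small table does a full scan for every packet past the limit. Whether PF behaves anything like this would need to be checked against its source or with a profiler.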

Harvy66

I think you mentioned this before, but I just want to make sure. When you say with/without forwarding, do you mean when targeting the forwarded port or just forwarding in general?

If it's forwarding in general, maybe it's NAT that's causing some or all of the issues. NAT is single-threaded. If a new state is coming in and, in order to forward ports, NAT first needs to rewrite the header information before the firewall sees the packet, then we have a single chunk of code acting as gatekeeper to the firewall, and it's single-threaded to boot.

When you have no forwarding rules, NAT doesn't even need to be checked. But if you have one or more rules, NAT has to check every new-state packet that comes in, single-threaded, with lots of locking.

edit: I see syslogd using a lot of CPU, are you logging blocked packets? May want to disable that during the test.

tim.mcmanus

It's also important to note that SM is running pfSense as a VM and I am running it on bare metal. This can impact the way it handles network traffic.

https://lists.freebsd.org/pipermail/freebsd-net/2015-March/041657.html

Supermule

Okay, the script fucked my unbound; it lost its PID and couldn't start… I had to revert to the DNS forwarder to get internet access back...

Tim and Anthony are a great help! Getting closer....

almabes

Tomorrow we should be able to test on a real business class network, instead of my crappy Comcast CPE that dies. A fiber optic Internet connection through a Cisco switch port. I'll lug the VM box up there, and also see what I can scare up for bare metal…hopefully more than an unfurled tinfoil hat.

firewalluser

@Supermule:

I haven't got a clue where to begin and where to look.

I can't see what's using the core…

What do you make of this? Core nr. 4 says it's idle at 100%.

I was going to say DTrace might be a good start, but then I saw this: https://forum.pfsense.org/index.php?topic=94260.0

firewalluser

@tim.mcmanus:

@firewalluser:

So have you identified the code that's running on the core that maxes out when this happens?

I'm almost certain the issue is with the network driver in FreeBSD, with PF also contributing to it.

When my state table is low (394K) the attack cripples the entire box with the exception of the console. PF alerts that it hit its state table max in the console. I'm not sure why a full state table creates a more significant impact on the box, but it does.

What makes you say that?

Difficult to tell, really, without DTrace running, wouldn't you say?

Supermule

Yes. It would be great to have better logging tools built into pfSense.

We are fighting a weird battle right now.

Sometimes it handles the attacks fine, then the same config crashes instantly on the same attack seconds later.

The only difference on my system is that core number 4 hits 100%. When that happens, it goes down and packet loss occurs.

When it doesn't, it can handle the attack. I have 8 cores and I can't see what uses that specific core.

Nullity

@firewalluser:

@Supermule:

I haven't got a clue where to begin and where to look.

I can't see what's using the core…

What do you make of this? Core nr. 4 says it's idle at 100%.

I was going to say DTrace might be a good start, but then I saw this: https://forum.pfsense.org/index.php?topic=94260.0

You can load the kernel modules included with the FreeBSD distribution. I just tried to get it working, but some trouble with the DTrace providers hindered me. After removing the /usr/lib/dtrace dir I got these not-so-clear results.


[2.2.2-RELEASE][admin@pfsense]/usr/lib: /usr/share/dtrace/toolkit/hotkernel
Sampling... Hit Ctrl-C to end.
dtrace: buffer size lowered to 2m
dtrace: aggregation size lowered to 2m
^C
FUNCTION                                                COUNT   PCNT
0xffffffff8035fbe2                                          1   0.0%
0xffffffff80dd0860                                          1   0.0%
0xffffffff80abad44                                          1   0.0%
0xffffffff8035fb5c                                          1   0.0%
0xffffffff8035fab5                                          1   0.0%
0xffffffff8035fbd7                                          1   0.0%
0xffffffff80f46fb6                                          1   0.0%
0xffffffff8035d4b2                                          1   0.0%
0xffffffff8035fbf9                                          1   0.0%
0xffffffff8097f4c0                                          1   0.0%
0xffffffff80d06b71                                          1   0.0%
0xffffffff8035fb8b                                          1   0.0%
0xffffffff80d06c28                                          1   0.0%
0xffffffff8035fac5                                          1   0.0%
0xffffffff8035fb67                                          1   0.0%
0xffffffff80f3275e                                          1   0.0%
0xffffffff8035faa0                                          1   0.0%
0xffffffff80f3712d                                          3   0.0%
0xffffffff80dd48bd                                          3   0.0%
0xffffffff80d069ea                                          8   0.0%
0xffffffff80f3726b                                        105   0.6%
0xffffffff80f2d8e6                                      17886  99.2%

Seems promising.

Edit: May be related: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=185290
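For anyone repeating this, a small sketch for pulling the hot spot out of hotkernel-style output. It assumes the three-column FUNCTION / COUNT / PCNT layout shown above; `parse_hotkernel` and `hottest` are hypothetical helper names, and `SAMPLE` is just a few lines copied from that run. The output above appears to show raw addresses rather than symbol names, so the result is only as readable as the input.

```python
import re

# One data row: identifier, sample count, percentage (e.g. "0xffff...  105   0.6%").
ROW_RE = re.compile(r"^(\S+)\s+(\d+)\s+([\d.]+)%$")

SAMPLE = """\
FUNCTION                                                COUNT   PCNT
0xffffffff80dd48bd                                          3   0.0%
0xffffffff80d069ea                                          8   0.0%
0xffffffff80f3726b                                        105   0.6%
0xffffffff80f2d8e6                                      17886  99.2%
"""


def parse_hotkernel(text):
    """Return (function, count, percent) tuples; non-matching lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if match:
            rows.append((match.group(1), int(match.group(2)), float(match.group(3))))
    return rows


def hottest(rows):
    """Return the row with the highest sample count."""
    return max(rows, key=lambda row: row[1])


if __name__ == "__main__":
    function, count, percent = hottest(parse_hotkernel(SAMPLE))
    print(f"{function}: {count} samples ({percent}%)")
```

The address it reports would still need to be mapped back to a kernel symbol before it says anything about which code path is busy.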

Harvy66

The last thing Supermule said was that the problem only occurs when port forwarding is enabled in NAT. My guess is NAT, unless we can get confirmation that what I read was incorrect.