What is the biggest attack in Gbps you have stopped?

Harvy66

FYI, pfSense by default keeps an established TCP state for 24 hours, even with zero traffic, as long as no FIN packet is seen.
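
As a minimal sketch of what that timeout means for a connection that completes its handshake and then goes silent, the Python model below treats the 86400-second figure from the post as the established-state idle timeout. The function name is hypothetical, and states that have seen a FIN are deliberately left out, since they move to other timeouts this model does not cover.

```python
from typing import Optional

# Idle timeout for an established TCP state, per the 24-hour default cited above.
ESTABLISHED_TIMEOUT_S = 86_400

def established_expiry(last_packet_time: float, fin_seen: bool,
                       timeout: float = ESTABLISHED_TIMEOUT_S) -> Optional[float]:
    """Return the time at which an idle established state would be purged.

    Returns None once a FIN has been seen: the connection is then closing and
    is governed by other timeouts that this sketch does not model.
    """
    if fin_seen:
        return None
    # Each new packet on the connection would reset last_packet_time; with no
    # traffic at all, the state simply sits until the full timeout elapses.
    return last_packet_time + timeout

# A connection whose last packet arrived at t=0 and which never sends a FIN
# keeps its state-table slot for a full day.
print(established_expiry(0.0, fin_seen=False))  # 86400.0
```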

almabes

@Supermule:

The GUI is constantly monitored here by employees.

So yes it is.

I have to agree here. One of the things I do for several clients is proactively monitor their firewalls.

tim.mcmanus

@firewalluser:

OK, but I'd suggest that would be a minority, though I could be wrong; I know I'm not constantly logged in monitoring things.

Do you still log to syslog, though, so you have historical data to look back over and spot any patterns?

I actually use the UI, but I also monitor with OpenNMS. I pulled my system logs after the attack and preserved some Wireshark pcaps. I have Security Onion capturing pcaps on the same mirrored interface, but haven't set up syslog yet. That shouldn't be too tough to do in the scheme of things.

tim.mcmanus

@Supermule:

I have come a BIG step closer to locating the culprit.

Look at the graphs when NTPD is enabled.

It destroys the GUI completely and takes the interfaces offline in the GUI; there is no response from them. The graphs cover a 3-minute attack, and only maybe 10 seconds of it are showing.

What's really interesting is the VMware graph. When it spikes for the last time, the GUI comes back and the CPU graph in the GUI starts working again.

I wonder if NTPD and Apinger together could cause something?

This makes sense, because the kind of attack you hit me with is classified as an NTP attack.

![Screen Shot 2015-05-24 at 10.54.49 PM.png](/public/imported_attachments/1/Screen Shot 2015-05-24 at 10.54.49 PM.png)
![Screen Shot 2015-05-24 at 10.55.16 PM.png](/public/imported_attachments/1/Screen Shot 2015-05-24 at 10.55.16 PM.png)

firewalluser

https://blog.cloudflare.com/understanding-and-mitigating-ntp-based-ddos-attacks/

https://blog.cloudflare.com/technical-details-behind-a-400gbps-ntp-amplification-ddos-attack/

https://www.acunetix.com/blog/articles/ntp-reflection-ddos-attacks/

https://forum.pfsense.org/index.php?topic=71396.0
"If you have appropriate WAN rules to stop the Internet from reaching your firewall's NTP server, then good news, you have nothing to do. However, if you have opened your NTP service up on purpose or if you have overly permissive rules (e.g. "allow all on WAN") and you don't want to change them, you can apply the following fix to change the behavior of the NTP daemon so it will no longer respond to the monlist command:"

https://redmine.pfsense.org/issues/3496
https://redmine.pfsense.org/issues/3384

Supermule

I don't have any NTP ports open on WAN. The only open port is 80.

Supermule

A totally clean install of version 2.2.2, with no packages besides filemanager and no ports open.

No packet loss, and responsiveness was 100%.

Load was not very high. No CPU hit 100%, running 8 cores and 8 GB RAM.


Supermule

Same install, but now with a port forward to port 80.

As soon as one CPU (no. 4) hits 100%, I get packet loss and the gateway goes offline.

There is NOTHING different except a port forward (HTTP). Total load is actually LOWER than with all ports blocked.


Supermule

CONCLUSION:

As soon as you have a working port forward (NOT DISABLED) to a server behind pfSense, and pfSense needs to route it, you're dead.

This happens even with SYN proxy state enabled, only 3 Mbit of traffic passing through, and states never rising above 10%. Even with every service on the firewall disabled, it's still dead.

If you don't have a port forward, then it's fine, but that defeats the main reason for having this setup, IMHO.

It's definitely FreeBSD/PF related, and I don't believe it's in the ESF code unless they altered the Packet Filter code.

I would be glad if anyone would run a base OpenBSD/FreeBSD install with PF enabled as a front end so we can continue testing.

firewalluser

@Supermule, tell me: is the script posted in this thread the one causing the problem you see?

If so, does it also cause the problem you describe here? https://forum.pfsense.org/index.php?topic=87571.msg492268#msg492268

It seems like we are going around in circles at this stage, which is why I ask.

It's also as CMB says here: https://forum.pfsense.org/index.php?topic=87571.msg493401#msg493401
"DDoS is hell on stateful firewalls is the basic summary of this thread. It's not specific to anything in any particular firewall."

The very nature of any stateful firewall will cause an increase in resource use like you are seeing.

With this in mind, what can you do to limit your exposure to the weakness of a stateful firewall?

Lots of suggestions here
http://www.cisco.com/web/about/security/intelligence/guide_ddos_defense.html

Some users hosting a website that provides services/products only to punters in their own country can limit access to the IP addresses assigned to that country. If it's something offered further afield, like across a continent, then rinse and repeat with all of that continent's IP address blocks.
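
The membership check behind that kind of country or continent restriction can be sketched with Python's standard `ipaddress` module. The networks below are IETF documentation prefixes standing in for a real country's allocations, which would have to come from a GeoIP or RIR data source; the `is_allowed` name is hypothetical.

```python
import ipaddress

# Hypothetical allow-list: documentation prefixes stand in for the real
# address blocks assigned to the country (or continent) being served.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def is_allowed(src: str) -> bool:
    """Return True if the source address falls inside any allowed block."""
    addr = ipaddress.ip_address(src)
    # An IPv6 address is never "in" an IPv4 network, so it is rejected here.
    return any(addr in net for net in ALLOWED_NETWORKS)

print(is_allowed("192.0.2.45"))   # True
print(is_allowed("203.0.113.7"))  # False
```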

If it's something flogged globally, then consider putting the website behind the TLD specific to each country, i.e. if in the UK, a website assigned to a .co.uk domain could help, but then you'd need something to redirect the originating IP address to the correct country domain. This approach can lessen, but not 100% eradicate, a DDoS attack of sorts.

Perhaps something that temporarily disables the port forward in real time when CPU activity reaches a threshold might be a way around the problem, avoiding taking out the firewall if the other tuning options, like increasing the default state limit, don't work.

Either way, there are lots of ways to skin the cat!
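
A minimal sketch of that threshold idea, assuming some external means of sampling CPU load and toggling the rule (neither is shown, and `PortForwardGuard` is a hypothetical name, not a pfSense feature). It uses two thresholds rather than one so the forward does not flap on and off while load hovers around a single value. Given the report above that trouble starts when a single core hits 100%, feeding it the busiest core's load rather than the average may matter.

```python
class PortForwardGuard:
    """Hypothetical watchdog: disable a port forward when CPU load crosses a
    high-water mark, and re-enable it only once load falls below a lower
    mark (hysteresis), so the rule does not flap around one threshold."""

    def __init__(self, disable_above: float = 90.0, enable_below: float = 60.0):
        if enable_below >= disable_above:
            raise ValueError("enable_below must be lower than disable_above")
        self.disable_above = disable_above
        self.enable_below = enable_below
        self.forward_enabled = True

    def update(self, cpu_percent: float) -> bool:
        """Feed one CPU sample; return whether the forward should be enabled."""
        if self.forward_enabled and cpu_percent >= self.disable_above:
            self.forward_enabled = False
        elif not self.forward_enabled and cpu_percent <= self.enable_below:
            self.forward_enabled = True
        return self.forward_enabled

guard = PortForwardGuard()
print([guard.update(s) for s in (40, 95, 80, 70, 55, 92)])
# [True, False, False, False, True, False]
```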

Supermule

@firewalluser:

@Supermule, tell me: is the script posted in this thread the one causing the problem you see?

It can be altered to do so.

If so, does it also cause the problem you describe here? https://forum.pfsense.org/index.php?topic=87571.msg492268#msg492268

Yes. It takes more bandwidth to do it. Limiting bandwidth will make pfSense survive with no ports open.

It seems like we are going around in circles at this stage, which is why I ask.

It's also as CMB says here: https://forum.pfsense.org/index.php?topic=87571.msg493401#msg493401
"DDoS is hell on stateful firewalls is the basic summary of this thread. It's not specific to anything in any particular firewall."

The very nature of any stateful firewall will cause an increase in resource use like you are seeing.

With this in mind, what can you do to limit your exposure to the weakness of a stateful firewall?

This takes down a stateless setup as well, with 15 Mbit of traffic. So it's not related to the states.

Some users hosting a website that provides services/products only to punters in their own country can limit access to the IP addresses assigned to that country. If it's something offered further afield, like across a continent, then rinse and repeat with all of that continent's IP address blocks.

If it's something flogged globally, then consider putting the website behind the TLD specific to each country, i.e. if in the UK, a website assigned to a .co.uk domain could help, but then you'd need something to redirect the originating IP address to the correct country domain. This approach can lessen, but not 100% eradicate, a DDoS attack of sorts.

Perhaps something that temporarily disables the port forward in real time when CPU activity reaches a threshold might be a way around the problem, avoiding taking out the firewall if the other tuning options, like increasing the default state limit, don't work.

It doesn't matter. The states themselves are not the issue. Increase the limit and you won't hit it. As soon as you port forward, it's dead. But then you can't host anything behind it.

Either way, there are lots of ways to skin the cat!

IMHO there is no cat to skin currently. As stated, if you port forward, you're dead.

firewalluser

@Supermule:

It doesn't matter. The states themselves are not the issue. Increase the limit and you won't hit it. As soon as you port forward, it's dead. But then you can't host anything behind it.

Is this behaviour, which takes down both stateless and stateful firewalls, seen with the script posted in the message linked below?

https://forum.pfsense.org/index.php?topic=91856.msg523649#msg523649

Supermule

https://forum.pfsense.org/index.php?topic=91856.msg523921#msg523921

Harvy66

@firewalluser:


Real DDoS attacks can't be stopped because they are shoving 100 Gb/s down a 1 Gb pipe. Lots of different IPs does not a DDoS make. 3 Mb/s of traffic hitting a 10 Gb firewall is not what anyone in their right mind would call a DDoS. If a firewall dies, it's because of a slow path that is definitely not O(1).

- Packets hitting the firewall should trigger an O(1) lookup to see if a state exists. If it does, pass; if it does not, go to the next check.
- Packets not passed should trigger an O(n) comparison of the new connection against the firewall rules. If it's allowed, add a state and pass; otherwise, block.

I can't see these two being an issue without some absurdly crazy firewall rules.

NAT sits in there somewhere (I'm not sure where), along with possibly some other routing-related work.
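
The two paths described above can be sketched as a toy stateful filter: an average O(1) dictionary lookup for existing states, with a fallback O(n) first-match scan of the rules. This is an illustrative model under simplified assumptions, not pf's implementation; `ToyStatefulFilter`, the flow-key layout, and the rule format are all hypothetical. One thing it makes visible is that blocked packets never create a state, so each one pays for the rule scan.

```python
from typing import Callable, Dict, List, Tuple

# (src_ip, src_port, dst_ip, dst_port, proto) identifies a flow.
FlowKey = Tuple[str, int, str, int, str]
Rule = Tuple[Callable[[FlowKey], bool], bool]  # (match predicate, allow?)

class ToyStatefulFilter:
    """Toy model: O(1) state lookup, falling back to an O(n) rule scan."""

    def __init__(self, rules: List[Rule]):
        self.rules = rules
        self.states: Dict[FlowKey, bool] = {}
        self.rule_evaluations = 0  # counts slow-path work for illustration

    def handle(self, key: FlowKey) -> bool:
        if key in self.states:             # fast path: average O(1) hash lookup
            return True
        for matches, allow in self.rules:  # slow path: O(number of rules)
            self.rule_evaluations += 1
            if matches(key):               # first matching rule wins in this toy
                if allow:
                    self.states[key] = True  # later packets take the fast path
                return allow
        return False                       # no rule matched: default deny

rules = [(lambda k: k[3] == 80 and k[4] == "tcp", True)]  # allow HTTP only
fw = ToyStatefulFilter(rules)
web = ("203.0.113.5", 40000, "192.0.2.10", 80, "tcp")
ssh = ("203.0.113.5", 40001, "192.0.2.10", 22, "tcp")
for _ in range(3):
    fw.handle(web)  # only the first packet scans the rules
    fw.handle(ssh)  # blocked: no state is created, so every packet scans
print(fw.rule_evaluations)  # 4
```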

Whatever is going on, if I take the 30 Mb/s that I saw Supermule take down my firewall with, and assume I only have one CPU, then it's taking over 100k clock cycles per packet. Since I have a quad core and all 4 cores were getting hosed, and the packets were actually quite large, it was more like 1 million clock cycles per packet. And since my system was not keeping up, the real cost was worse than 1 million cycles per packet.

I'm not sure what kind of slow path warrants 1 million cycles to decide what to do with a packet. 1 million cycles is a lot of work; you can encrypt 2.3 million bits with AES in that time. To put it another way: 1 Gb/s over SMB was about 0.5% CPU on my old 2.67 GHz CPU, which, scaled down to 1 MHz (1 million cycles per second) and then converted into 1500-byte packets, is about 780 packets per second. In the time pfSense processes one of these packets, my Windows box could have transferred 780 packets via SMB.

The funny thing is that it isn't the number of states. Supermule did an attack that hit a forwarded port, which means states were being created. It was up against the 4 million state limit, and yes, CPU was higher than normal, but the system was perfectly stable. A similar attack against blocked ports resulted in the same thing: everything was fine. pfSense handles lots of blocked traffic just fine. Whatever is going on is triggering something other than just blocking traffic.

Let's make an analogy. If a normal person stands on a scale and it says they weigh 75 tons, you don't ask them to take off their shoes. That's about the same magnitude of difference.
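
The back-of-the-envelope estimate above can be written out as below. The post doesn't give the CPU clock, so the 2.5 GHz used here is a hypothetical value; the result is the cycle budget per packet if every core were fully busy, so a box that falls behind is spending more than this on each packet. The example shows how strongly packet size and core count move that budget.

```python
def cycles_per_packet(rate_bps: float, packet_bytes: int,
                      clock_hz: float, cores: int = 1) -> float:
    """Cycles available per packet when `cores` cores at `clock_hz` are fully
    busy handling `rate_bps` of traffic made of `packet_bytes`-byte packets."""
    packets_per_second = rate_bps / (packet_bytes * 8)
    return clock_hz * cores / packets_per_second

ATTACK_BPS = 30e6  # the 30 Mb/s figure from the post
CLOCK_HZ = 2.5e9   # hypothetical clock speed, not given in the post

print(cycles_per_packet(ATTACK_BPS, 1500, CLOCK_HZ))           # 1000000.0
print(cycles_per_packet(ATTACK_BPS, 1500, CLOCK_HZ, cores=4))  # 4000000.0
```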
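
The 780-packet figure can be reproduced if the 0.5% is read as a share of total CPU across 8 logical processors (for example, a quad-core with Hyper-Threading). That reading is an assumption, since the post gives only the clock speed. Read as 0.5% of a single core, the same arithmetic gives roughly 6,200 packets, which only strengthens the comparison. The function name is hypothetical.

```python
def smb_cycles_per_packet(rate_bps: float, cpu_fraction: float,
                          clock_hz: float, logical_cpus: int,
                          packet_bytes: int = 1500) -> float:
    """Cycles spent per packet, given the fraction of total CPU used to move
    `rate_bps` of traffic in `packet_bytes`-byte packets."""
    cycles_per_second = cpu_fraction * clock_hz * logical_cpus
    packets_per_second = rate_bps / (packet_bytes * 8)
    return cycles_per_second / packets_per_second

# Assumption: 0.5% of all 8 logical CPUs at 2.67 GHz.
cost = smb_cycles_per_packet(1e9, 0.005, 2.67e9, logical_cpus=8)
print(round(cost))              # 1282 cycles per SMB packet
print(round(1_000_000 / cost))  # 780 packets per million cycles
```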

Harvy66

I was just thinking: when my 4 million states are full and the firewall is trying to expire states, scanning 4 million states could be a bit of work. I wonder how the attack would do if the state table were set to something small, like 10k states. Expiring states may require an O(n) scan when lots of states are created at about the same time.

Another thought: I have "Firewall Adaptive Timeouts" (System->Advanced->Firewall/NAT) set to 4 million states, which means that by the time my state table gets full, states are being expired instantly. If expiration causes a full scan, or at least triggers one often, then creating states quickly and expiring them just as quickly may be the cause. I don't know. Just a thought.

That could explain why it took a few tens of seconds before I started to feel the hurt of the attack: the issue didn't fully trigger until the state table got full.
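
The pf.conf(5) manual describes adaptive timeouts as a linear scale: once the number of states exceeds `adaptive.start`, every timeout is multiplied by (adaptive.end - states) / (adaptive.end - adaptive.start), reaching zero at `adaptive.end`. The sketch below models that formula, assuming pfSense's adaptive settings map onto those two pf values. The 3 million start value is hypothetical, since the post only mentions the 4 million figure.

```python
def adaptive_timeout(base_timeout: float, states: int,
                     adaptive_start: int, adaptive_end: int) -> float:
    """Scale a pf timeout as pf.conf(5) documents for adaptive timeouts."""
    if states <= adaptive_start:
        return base_timeout  # below the start threshold: no scaling
    if states >= adaptive_end:
        return 0.0           # at the end threshold: states expire immediately
    factor = (adaptive_end - states) / (adaptive_end - adaptive_start)
    return base_timeout * factor

START, END = 3_000_000, 4_000_000  # START is hypothetical
print(adaptive_timeout(86_400, 3_500_000, START, END))  # 43200.0
print(adaptive_timeout(86_400, 4_000_000, START, END))  # 0.0
```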

edit: more thoughts

I wonder what happens if I set the adaptive target to 4 million but set the max to something larger. I'm not sure what pfSense/FreeBSD has to do when the table gets full; it may trigger a bad code path. Maybe I should set the table to something like 5 million max states, but leave the adaptive timeout at 4 million.

edit2:

If the issue involves states, a good test would be to try a few extreme combinations: 1 million max states with a target of 10k (it should never get much past 10k, and shouldn't hit the max state limit), 10k max with a 10k target, 10k max with no adaptive timeouts, etc.

firewalluser

@Supermule So what's different about your setup compared to mine?

edit:
CPU microcode, perhaps? My VM is running on an AMD CPU which has some bugs affecting threading performance. All the others affected who have posted are running Intel CPUs, IIRC, so maybe that's why I don't see the problem?

edit2: This might also be a factor, especially considering the VMware point, which I've had to tweak since it's possible for the time to drift on VMs.
http://en.wikipedia.org/wiki/HPET

Does anyone know if HPET is built into the pfSense builds?
http://www.freebsd.org/cgi/man.cgi?query=hpet&apropos=0&sektion=0&manpath=FreeBSD+9-current&format=html

firewalluser

@Harvy66:


One packet comes in, which causes x-number-of-lines-of-code to run and some memory to be filled.

Consider the time it takes for one packet to be processed by the firewall, then add the time for the state to expire, and before long you could easily fill up the available RAM. You could also swamp the CPU by making it run x-number-of-lines-of-code per incoming packet, with code assigned to one core dominating the CPU, especially if some threaded code has a higher priority than other code.
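
The point about processing time plus expiry time is essentially Little's law: the average number of entries in the table equals the rate at which new states arrive multiplied by how long each one stays. The rates below are hypothetical, chosen only to show the scale; the 4 million limit and 86400-second lifetime are the figures mentioned earlier in the thread.

```python
def steady_state_entries(new_states_per_second: float, lifetime_seconds: float) -> float:
    """Little's law: average table occupancy = arrival rate x time in table."""
    return new_states_per_second * lifetime_seconds

def seconds_to_fill(table_limit: int, new_states_per_second: float) -> float:
    """Time to fill an empty table if no state expires in the meantime."""
    return table_limit / new_states_per_second

# Hypothetical attack rate of 1,000 new states per second:
print(steady_state_entries(1_000, 30))      # 30000 with a 30 s lifetime
print(steady_state_entries(1_000, 86_400))  # 86400000, far above 4 million
# At a hypothetical 100,000 new states per second:
print(seconds_to_fill(4_000_000, 100_000))  # 40.0
```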

What if you could throttle the packets coming in before the states were processed? Would that prevent the firewall from crashing/hanging?
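
One common way to throttle packets before they reach more expensive processing is a token bucket, sketched below under simplified assumptions: timestamps are passed in explicitly, and `TokenBucket` is a hypothetical name, not a pfSense or pf feature. Where such a limiter should run (on the firewall or upstream) is a separate question.

```python
class TokenBucket:
    """Admit at most `rate` packets per second on average, with bursts of up
    to `capacity` packets; everything beyond that is dropped before any
    further (e.g. state-table) work is done."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate=2, capacity=2)
print([bucket.allow(0.0) for _ in range(3)])  # [True, True, False]
print(bucket.allow(0.5))                      # True: 0.5 s refilled one token
```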

almabes

@firewalluser:

What if you could throttle the packets coming in before the states were processed? Would that prevent the firewall from crashing/hanging?

Isn't that the core idea of "a DDoS shouldn't be dealt with by the firewall, but upstream"?

tim.mcmanus

@firewalluser:


Just to note: I was running bare metal with the same predictable results. I posted my machine specs in this thread, but will append them to my signature.

jimp or cmb made a recommendation somewhere in this thread about tuning pfSense settings to better handle DDoS.

I know that with the default state limit of 394,000, the UI and almost every other service seem to lock up, but system utilization under top is minimal at best. The screenshot I posted of the console shows the system utilization with a 394K state table. Increasing the state table makes the box responsive, but the interface being attacked stops responding with 4 Mbit of attack traffic.

tim.mcmanus

I think I have an idea where the issue may be: interrupts

This console screenshot is very telling:

The system starts throttling interrupts, but the CPU utilization for interrupts is conspicuously fixed at 25.0%. Either the code handling interrupts has some challenges, or the system has capped CPU utilization for interrupts at 25%, hit that limit, and cannot process any more.

Does anyone know if interrupt CPU limits are adjustable, and if so, where? According to the console shot, I have some additional headroom I could allocate to interrupts to see if that helps.

See this thread with a similar set of symptoms from 2011: https://forum.pfsense.org/index.php?topic=38589.msg198765#msg198765
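
One alternative reading of a flat 25.0%, offered only as a hypothesis: FreeBSD's `top` summary line reports CPU states averaged over all CPUs, so on a machine with 4 logical CPUs (an assumption about this box), one core spending all its time in interrupt handling would show as exactly 25% interrupt. `top -P` displays per-CPU lines that would distinguish a single saturated core from a configured cap. The averaging itself is simple:

```python
from typing import List

def aggregate_cpu_percent(per_cpu_percent: List[float]) -> float:
    """System-wide figure as the average over all logical CPUs."""
    return sum(per_cpu_percent) / len(per_cpu_percent)

# One core pegged in interrupt handling, three idle (assumed 4-CPU box):
print(aggregate_cpu_percent([100.0, 0.0, 0.0, 0.0]))  # 25.0
```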