What is the biggest attack in Gbps you stopped?

Supermule

With apinger running, the box is useless.

This is the difference between running stateless with apinger and running stateless without it.

The CPU spike on ESXi is 20% or more, and recovery takes about a minute longer.

lan2wan.PNG_thumb

traffic.PNG_thumb

vmware.PNG_thumb

Supermule

Last one for today…

Enabling NTPD also crippled the box.

What's really interesting is the graphs in VMware. The last three are as follows:

1: Stateless, NOT running Apinger or NTPD. No CPU hits 100% and the box is responsive and routes traffic.
2: Stateless, running Apinger but NOT NTPD. One CPU (nr. 4) is at 100% and the box stops routing and loses packets.
3: Stateless, NOT running Apinger but running NTPD. One CPU (nr. 3) is at 100% and the box stops responding and loses packets.

The 1st graph doesn't have the small "bump" at the end of the attack, and the box is responsive all along. With Apinger, NTPD, or both enabled, the box dies and recovery time is long (minutes). Recovery takes longer with Apinger running than with NTPD running.

With SynProxy state, the same pattern can be seen under attack: one of the CPUs runs at 100% and the box is dead.

The last image is a better view of the CPU usage.

The 1st one maxes out and packet loss occurs. The 2nd does not and routes everything fine. The 3rd is initially fine, but as soon as one CPU hits 100% (about halfway into the attack), the box is gone.

traffic.PNG_thumb

lan2wan.PNG_thumb

services.PNG_thumb

vmware.PNG_thumb

maxcpu.PNG_thumb

tim.mcmanus

What I noticed was that pfSense would not let go of states for several minutes (5+). So when I was hit with 4.8M states, I'd still see 2.9M several minutes later. It wasn't until I rebooted the box several hours later that it returned to 3,500 states. IMHO, boxes that are unresponsive after attacks still haven't released their states. Mine, for whatever reason, recovered almost immediately after the attacks ceased.

Here is my hardware:

Intel Core i3-2100 Sandy Bridge dual core
Intel BOXDQ77MK LGA 1155 Intel Q77
4GB RAM
320 GB 7200 RPM HD
2 x Intel EXPI9301CTBLK 10/100/1000 Mbps PCI-Express Network Adapter

Also note that the initial SYN flood significantly burdened the UI but not the console, and when I increased states, the box was fine. The interface that was being attacked was disabled, but the box and the other three interfaces were working perfectly.

almabes

My setup:

Cisco Comcast business CPE -> Cisco SG300 switch -> WAN side of two firewalls, one HW one VM

The VM firewall is running on a Dell Precision 7500 with dual Xeon 5650 processors and 48 GB RAM
ESXi 6.0
Official ESF pfSense 2.2.2 OVA (2 GB RAM and 2 cores)
WAN goes to a Broadcom add-in NIC
LAN comes out of the onboard NIC and plugs into a Catalyst 2948 switch

The hardware firewall is a VK40-TE. It runs the nanoBSD version of pfSense 2.2.2
WAN side plugs into the SG300
LAN into the 2948

I have 5 IPs, so the two firewalls have different WAN interface IP addresses. Additionally, on the VM firewall, I 1:1 NATed a Windows box and opened RDP.

I set up two laptops on the LAN, one wireless, because I wanted to sit on my couch. The other was plugged into the 2948. The wired laptop was configured to use the VM as its gateway. I fired off ping -t www.google.com, ping -t <other firewall IP>, and ping -t <Comcast public CPE IP>.

The results were unexpected.

During the attack, I could ping the WAN interface of the "opposite" firewall and get a 2 ms response, as if nothing was happening. After a few seconds, www.google.com failed to reply, or came back with 2100+ ms replies, through BOTH firewalls. Even the unattacked one. The Comcastic gateway device went from sub-10 ms pings to 300-400 ms.

Even after the attack stopped, the Comcastic gateway failed to route traffic. I had to power cycle it to get back on the grid.

almabes

Did one more test, because I noticed a configuration error in my test setup.
After I reconfigured, I RDP'd over to the test Win2k12 box's public IP, fired up Wireshark and had Supermule attack it.
I set up a ping to www.google.com through the unattacked firewall. It immediately started timing out.

So again, my Comcast gateway quickly crapped on itself, but pfSense didn't break a sweat. I was able to watch the packet capture over the RDP connection through the instance of pfSense under attack.

So, does the Cisco DPC3939B suck rocks, or was it "protecting" me by taking the brunt of the attack?
My states never got above 40k.
CPU hit about 30%, once.
RDP session never blinked.

As soon as the attack stopped, www.google.com was pingable again.

Supermule

It's your Cisco gateway box that crapped itself and took the heat off pfSense.

almabes

@Supermule:

It's your Cisco gateway box that crapped itself and took the heat off pfSense.

I agree. My next step is to hit up a friend of mine that is the owner and chief packet plumber of an ISP. I'll haul the VM box over there and we'll test.

firewalluser

@Supermule:

After deleting the VMware Tools, I disabled Apinger and NTPD.

The graphs on ESXi looked the same as "normal". No jitter from Apinger and NTPD afterwards.

Recovery was instant. Traffic graphs didn't respond well.

Might be relevant for tuning apinger.
https://forum.pfsense.org/index.php?topic=37740.msg194857#msg194857

WRT the GUI going unresponsive during an attack: you guys won't be logged into the GUI in real life anyway. It will probably happen when you are at home cooking on the BBQ, so the unresponsive GUI is perhaps best looked at as a sideshow distraction, not really relevant in the scheme of things.

How are those affected handling the log data? Are you just writing to pfSense's own log, or are you syslogging it out to another machine?

WRT NTPD, I noticed back over Xmas that some of the NTP pools were being dominated by certain countries. I didn't like this, so I moved away from the default NTP pools used by pfSense and chose some of my own. Building our own NTP server might be an option (http://www.satsignal.eu/ntp/Raspberry-Pi-NTP.html), but I should also point out that it might be possible to hijack/affect the GPS signals, based on a hack I read about a while back (which has quite major significance when you think about it), but I can't find the link to it atm.

Supermule

GUI is constantly monitored here by employees.

So yes it is.

firewalluser

OK, but I'd suggest that would be a minority. I could be wrong, though, as I know I'm not constantly logged in monitoring things.

Do you still log to syslog, though, so you have historical data to look back over and spot any patterns?

Harvy66

FYI, pfSense by default keeps an established TCP state for 24 hours, even with zero traffic, as long as no FIN packet arrives.
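To put numbers on that, here is a minimal Python sketch of how long an idle state would linger before it is purged. The timeout table is an assumption: these are the stock pf values as I recall them from pf.conf(5), and pfSense may ship or adapt different ones, so check your own system. The function name `seconds_until_expiry` is made up for illustration.

```python
# Default pf TCP state timeouts in seconds, as recalled from pf.conf(5).
# ASSUMPTION: verify against your own pfSense/FreeBSD release before relying on them.
PF_TCP_TIMEOUTS = {
    "tcp.first": 120,          # state after the first packet
    "tcp.opening": 30,         # before the destination has sent a packet
    "tcp.established": 86400,  # fully established connection (24 hours)
    "tcp.closing": 900,        # after the first FIN
    "tcp.finwait": 45,         # after both FINs
    "tcp.closed": 90,          # after a RST
}


def seconds_until_expiry(phase: str, idle_seconds: int) -> int:
    """Return how many seconds an idle state in `phase` has left before purge."""
    remaining = PF_TCP_TIMEOUTS[phase] - idle_seconds
    return max(remaining, 0)


if __name__ == "__main__":
    # Compare every phase after five idle minutes.
    for phase in PF_TCP_TIMEOUTS:
        left = seconds_until_expiry(phase, idle_seconds=300)
        print(f"{phase:16s} idle 300 s -> {left} s left")
```

If those values hold, handshake-phase states should be gone within a couple of minutes of the flood ending, while anything that reached the established phase can sit in the table for most of a day.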

almabes

@Supermule:

GUI is constantly monitored here by employees.

So yes it is.

I have to agree here. One of the things I do for several clients is proactively monitor their firewalls.

tim.mcmanus

@firewalluser:

OK, but I'd suggest that would be a minority. I could be wrong, though, as I know I'm not constantly logged in monitoring things.

Do you still log to syslog, though, so you have historical data to look back over and spot any patterns?

I actually use the UI but also monitor with OpenNMS. I pulled my system logs after the attack and preserved some Wireshark pcaps. I have Security Onion running to do pcaps on the same mirrored interface, but haven't set up syslog yet. That shouldn't be too tough to do in the scheme of things.

tim.mcmanus

@Supermule:

I have come a BIG step closer to locating the culprit.

Look at the graphs when NTPD is enabled.

It destroys the GUI completely and takes the interfaces offline in the GUI. No response from them. The graphs cover a 3-minute attack, yet only maybe 10 seconds are showing.

What's really interesting is the VMware graph. When it spikes for the last time, the GUI comes back and the CPU graph in the GUI starts working again.

I wonder if NTPD and Apinger together could be causing something?

This makes sense because the kind of attack you hit me with is classified as an NTP attack.

![Screen Shot 2015-05-24 at 10.54.49 PM.png](/public/imported_attachments/1/Screen Shot 2015-05-24 at 10.54.49 PM.png)
![Screen Shot 2015-05-24 at 10.55.16 PM.png](/public/imported_attachments/1/Screen Shot 2015-05-24 at 10.55.16 PM.png)

firewalluser

https://blog.cloudflare.com/understanding-and-mitigating-ntp-based-ddos-attacks/

https://blog.cloudflare.com/technical-details-behind-a-400gbps-ntp-amplification-ddos-attack/

https://www.acunetix.com/blog/articles/ntp-reflection-ddos-attacks/

https://forum.pfsense.org/index.php?topic=71396.0
"If you have appropriate WAN rules to stop the Internet from reaching your firewall's NTP server, then good news, you have nothing to do. However, if you have opened your NTP service up on purpose or if you have overly permissive rules (e.g. "allow all on WAN") and you don't want to change them, you can apply the following fix to change the behavior of the NTP daemon so it will no longer respond to the monlist command:"

https://redmine.pfsense.org/issues/3496
https://redmine.pfsense.org/issues/3384

Supermule

I don't have any NTP ports open on WAN. The only open port is 80.

Supermule

A totally clean install of version 2.2.2 with no packages besides filemanager and no ports open.

No packet loss, and responsiveness was 100%.

Load was not very high. No CPU hit 100%, running 8 cores and 8 GB RAM.

vmware.PNG_thumb

services.PNG_thumb

lan2wan.PNG_thumb

traffic.PNG_thumb

Supermule

Same install, but now with a port forward to port 80.

As soon as one CPU (nr. 4) hits 100%, I get packet loss and the GW goes offline.

There is NOTHING done differently except a port forward (HTTP). Total load is actually LOWER than with all ports blocked.
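The point that total load can be lower while one core is pinned is easy to miss on an averaged graph. A minimal Python sketch of the difference, using made-up per-core samples for an 8-core VM (the numbers are hypothetical and are not read off the graphs in this thread; `pinned_cores` is an illustrative name):

```python
from statistics import mean


def pinned_cores(per_core_pct, threshold=99.0):
    """Return the 1-based numbers of cores at or above `threshold` percent."""
    return [n for n, pct in enumerate(per_core_pct, start=1) if pct >= threshold]


# HYPOTHETICAL samples, chosen only to illustrate the pattern described above.
all_blocked = [45, 50, 40, 55, 48, 52, 47, 43]    # load spread over all cores
port_forward = [20, 15, 25, 100, 18, 22, 16, 24]  # lower total, core 4 pinned

if __name__ == "__main__":
    for label, sample in (("all ports blocked", all_blocked),
                          ("port forward", port_forward)):
        print(f"{label}: mean={mean(sample):.1f}% pinned={pinned_cores(sample)}")
```

Alerting on the maximum per-core value rather than the average would catch the second case, which the average alone would report as the healthier one.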

traffic.PNG_thumb

vmware.PNG_thumb

lan2wan.PNG_thumb

Supermule

CONCLUSION:

As soon as you have a working port forward (NOT DISABLED) to a server behind the firewall and pfSense needs to route it, you're dead.

This holds even with SynProxy state enabled, only 3 Mbit of traffic passing through, and states never rising above 10%. Disabling every service on the firewall makes no difference: still dead.

If you don't have a port forward, then it's fine, but that misses the main reason for having this setup... IMHO.

It's definitely FreeBSD/PF related, and I don't believe it's in the ESF code unless they alter the Packet Filter code.

I would be glad if anyone would run a base OpenBSD/FreeBSD with PF enabled as a frontend so we can continue testing.

firewalluser

@Supermule, tell me, is the script posted in this thread the one causing the problem you see?

If so, does it also cause the problem you describe here? https://forum.pfsense.org/index.php?topic=87571.msg492268#msg492268

I ask because it seems like we are going around in circles at this stage.

It's also like CMB says here https://forum.pfsense.org/index.php?topic=87571.msg493401#msg493401
"DDoS is hell on stateful firewalls is the basic summary of this thread. It's not specific to anything in any particular firewall."

The very nature of any stateful firewall will cause an increase in resource use like you are seeing.

With this in mind, what can you do to limit your exposure to the weakness of a stateful firewall?

Lots of suggestions here
http://www.cisco.com/web/about/security/intelligence/guide_ddos_defense.html

Some users who host a website providing services/products only to punters in their own country can limit access to the IP address blocks assigned to that country. If it's something offered further afield, such as to a continent, then rinse and repeat the above with all of that continent's IP address blocks.
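As a rough illustration of that allow-list idea, here is a Python sketch using the standard `ipaddress` module. The prefixes are placeholders from the reserved documentation ranges, not real country allocations; a real list would come from a regional registry or a GeoIP data set, and on the firewall itself it would more likely live in an alias referenced by a WAN rule than in a script.

```python
import ipaddress

# PLACEHOLDER prefixes from the documentation ranges, standing in for
# "the blocks assigned to my country". Not real allocations.
ALLOWED_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_allowed(source_ip: str) -> bool:
    """Return True if `source_ip` falls inside any allowed prefix."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in ALLOWED_PREFIXES)


if __name__ == "__main__":
    for ip in ("192.0.2.55", "203.0.113.9", "2001:db8::1"):
        print(ip, "allowed" if is_allowed(ip) else "blocked")
```

This only narrows who can reach the port forward. Spoofed or in-country sources would still get through, so it reduces exposure rather than removing it.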

If it's something flogged globally, then consider a website sat behind the TLD specific to each country, i.e. if in the UK, then a website on a .co.uk domain could help, but then you'd need something to redirect the originating IP address to the correct country domain. This approach can lessen, but not 100% eradicate, a DDoS attack of sorts.

Perhaps having something that temporarily disables the port forward in real time when CPU activity reaches a threshold might be a way around the problem, to avoid taking out the firewall if the other tuning options, like increasing the default states et al., don't work.
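A rough Python sketch of what the decision logic for such a breaker might look like. It only decides when the forward should be off; how you would actually disable and re-enable the rule on pfSense is deliberately left out, and the class name, thresholds, and sample counts are all made up for illustration. Two thresholds are used so the forward does not flap on and off around a single value.

```python
class ForwardBreaker:
    """Decide when a port forward should be off, based on CPU samples.

    Trips after `trip_after` consecutive samples at or above `high`,
    and resets after `reset_after` consecutive samples at or below `low`.
    All defaults are illustrative guesses, not tuned values.
    """

    def __init__(self, high=90.0, low=50.0, trip_after=3, reset_after=10):
        self.high = high
        self.low = low
        self.trip_after = trip_after
        self.reset_after = reset_after
        self.disabled = False
        self._hot = 0   # consecutive hot samples while the forward is on
        self._cool = 0  # consecutive cool samples while the forward is off

    def observe(self, cpu_pct: float) -> bool:
        """Feed one CPU sample; return True while the forward should be off."""
        if not self.disabled:
            self._hot = self._hot + 1 if cpu_pct >= self.high else 0
            if self._hot >= self.trip_after:
                self.disabled, self._hot, self._cool = True, 0, 0
        else:
            self._cool = self._cool + 1 if cpu_pct <= self.low else 0
            if self._cool >= self.reset_after:
                self.disabled, self._hot, self._cool = False, 0, 0
        return self.disabled


if __name__ == "__main__":
    breaker = ForwardBreaker(trip_after=3, reset_after=2)
    for pct in (95, 96, 40, 97, 98, 99, 30, 60, 20, 10):
        print(pct, "forward OFF" if breaker.observe(pct) else "forward on")
```

Given the graphs earlier in the thread, the sample fed in should probably be the busiest single core rather than the average, since one pinned core is what takes the box down.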

Either way, there are lots of ways to skin the cat!