What is the biggest attack, in Gbps, that you have stopped?

Supermule

Why is that an issue?

Harvy66

RWIN is how much in-flight data you can have, and at 100Mb/s, 64KB is not much data. At 100Mb/s, the maximum round-trip latency you can have before your bandwidth is reduced is about 5ms. At 10ms, your maximum bandwidth is about 50Mb/s. A system 30ms away from me would get at most roughly 17Mb/s from me if they initiated the connection.
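
To make that arithmetic concrete, here is a minimal Python sketch of the window-limited throughput ceiling (window size divided by round-trip time). It assumes an unscaled 65,535-byte window and treats the latency as round-trip time. The function name is only illustrative.

```python
def max_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on TCP throughput when at most one window is in flight per RTT."""
    bits_in_flight = window_bytes * 8
    return bits_in_flight / (rtt_ms / 1000.0) / 1e6

UNSCALED_WINDOW = 65535  # largest value the 16-bit window field can hold

for rtt_ms in (5, 10, 30):
    print(f"{rtt_ms} ms RTT -> {max_throughput_mbps(UNSCALED_WINDOW, rtt_ms):.1f} Mb/s")
```

The printed values come out slightly above the rounded figures quoted in the post, since 64KB at 100Mb/s is closer to 5.2ms than 5ms.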

Remember, SYN proxy only affects incoming connections, so most users, who only make outgoing connections, won't see this issue. But if you're running any TCP services that listen, they will be affected.

Supermule

Doesn't the client scale the RWIN value by itself, based on the latency, to keep maximum throughput on the connection?

Even with SYN proxy as the state type in the FW?

Harvy66

The old RWin only supported up to 64KB; the newer extended RWin, which supports up to 1GB, is configured during the TCP handshake. Since SYN proxy doesn't allow the receiver to participate in the handshake, SYN proxy assumes 64KB because it can't know whether the target will actually support the newer RWin.

Or something along those lines. The main issue is that the increase from 64KB to 1GB was a band-aid fix. Since the field is only 16 bits, they use a multiplier during negotiation, but only newer TCP stacks understand the multiplier.

http://tools.ietf.org/html/rfc1323

The scale factor is carried in a new TCP option, Window
Scale. This option is sent only in a SYN segment (a segment with
the SYN bit on), hence the window scale is fixed in each direction
when a connection is opened.
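
As a rough illustration of the multiplier, the Python sketch below computes the effective receive window from the 16-bit header field and the shift count carried in the Window Scale option. RFC 1323 caps the shift count at 14, which puts the largest window just under 1 GiB. The function and constant names are hypothetical, not taken from any TCP stack.

```python
MAX_SHIFT = 14  # RFC 1323 limits the window scale shift count to 14

def effective_window(window_field: int, shift: int) -> int:
    """Receive window in bytes, given the 16-bit header field and the negotiated shift."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift count must be between 0 and 14")
    return window_field << shift

print(effective_window(0xFFFF, 0))   # no scaling: what a SYN proxy falls back to
print(effective_window(0xFFFF, 14))  # largest window the option allows
```

Because the shift is fixed when the connection opens, a handshake completed without the option leaves the connection at a shift of 0 for its whole lifetime.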

Supermule

So if one side is using delayed ACKs, or doesn't send them at all, then it fills the stack until it's dropped by RTT?

And thereby it fills the pipe quickly and grinds it to a halt if the PPS is big enough?

Harvy66

If the other side does not send ACKs in a timely fashion, the un-ACKed data gets resent. After n seconds, the TCP connection times out if no ACKs are received. But yes, the sender should not have more outstanding data in flight than the agreed-upon amount. Technically the sender could send more than the RWin, but then it would be a bad actor.

cmb

@Supermule:

Even if it has the "none" setting, it crashes…

Not true, though you do have to set up the rules correctly so that no states are created (in both the in and out directions).

There are a variety of tuning options new to 2.2.1 under System>Advanced, Firewall/NAT, to granularly control timers, which greatly helps with DDoS resiliency. TCP first in particular matters for SYN floods; turning that down to 3-5 seconds or so is probably a good idea if you don't care about people with really poor connectivity.
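
As a rough illustration of why a shorter TCP first timeout helps: the number of half-open states a SYN flood can pin in the state table is approximately the SYN rate multiplied by the timeout. The Python sketch below assumes every spoofed SYN creates a state and none ever completes the handshake. The packet rate is a made-up example, and 120 seconds is assumed here as the stock pf tcp.first value.

```python
def half_open_states(syn_per_second: float, first_timeout_s: float) -> int:
    """Steady-state count of half-open states: arrival rate times time-to-expiry."""
    return int(syn_per_second * first_timeout_s)

SYN_RATE = 50_000  # hypothetical flood rate in SYNs per second

for timeout_s in (120, 5, 3):
    states = half_open_states(SYN_RATE, timeout_s)
    print(f"TCP first = {timeout_s:>3}s -> about {states:,} half-open states")
```

The estimate ignores state table limits and adaptive timeouts, so treat it as an upper-bound intuition rather than a prediction.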

Harvy66

I do not care about those people.. :p

I find life is so much simpler once we stop caring /joke

I'll install 2.2.2 about a week after it goes live.

lowprofile

Nice thread but also some overkill statements from people :D

I was one of the first to locate this issue. Since then I've become much more experienced, and today I have an almost bulletproof setup regarding SYN floods.
Too much custom work to make a how-to guide at this moment, but looking forward to seeing the new corrections in 2.2.2

Will run some tests in the upcoming weeks.

heper

@lowprofile:

Too much custom work to make a how-to guide at this moment, but looking forward to seeing the new corrections in 2.2.2

It would be helpful if you could point out some of the important clues… perhaps the gifted people here could make their own full-blown how-to out of the info you provide.

cmb

Thanks for checking back in, lowprofile. Would definitely appreciate it if you could just share some brief tips from your findings with people here. Enough others are interested that I think they'll run with it, doing more testing and putting together recommendations for specific scenarios. I'd like to put out a guide myself; it's just going to be a bit until I have enough time for that.

@lowprofile:

Too much custom work to make a how-to guide at this moment, but looking forward to seeing the new corrections in 2.2.2

Those new config options actually made it into 2.2.1; there are no changes in that regard from 2.2.1 to 2.2.2. There weren't any "corrections" technically, I guess, as nothing changed by default. We just exposed all those timer values for configuration, since they're greatly helpful in some circumstances.

cmb

@Harvy66:

During part of the test, the incoming bandwidth was around 40Mb/s, and I was still getting packetloss to my Admin interface. The bandwidth DDOS was the only part of the DDOS where PFSense was responding correctly, the other parts of the DDOS that did not consume 100% of the bandwidth left it unstable.

You're using the traffic shaper, that's almost certainly what caused that.

Those messing with this and doing traffic shaping on the same box, all bets are off there. ALTQ is not very fast for the kind of scale abuse we're talking here, and queuing in general really complicates things. If you're looking to handle as big of a DDoS as possible, you don't want to be running traffic shaping.

lowprofile

I am on 2.1.5, and due to the kernel panic error (CARP+Limiter) I haven't upgraded to 2.2.1, but it seems like it may have been fixed in 2.2.2. I will give it a few more days to hear from others.
I'll then upgrade to 2.2.2 and make a how-to guide, since there are too many unnecessary tweaks/changes in my present setup, which also isn't properly documented. It isn't pretty, with all the extra tuning from all over the net (FreeBSD recommendations etc.) that is implemented.

I would rather start from the beginning and make a solid setup on the new 2.2.2, and this will include proper testing with DDoS. If anyone is interested in participating in this testing and tuning, please let me know. I assume SuperMule will be a part of this test.
It requires 2.2.2; we can do a session through Skype. I am located in the GMT+1 timezone. Expect some DDoS: not volume attacks, but SYN floods of at most 60-80mbit.
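
For a sense of scale, a SYN flood of that bandwidth is a lot of packets, because each SYN is tiny. The Python sketch below converts flood bandwidth to packets per second. It assumes the bandwidth is measured on the wire and that each SYN occupies a minimum-size Ethernet frame (64 bytes plus 8 bytes of preamble and 12 bytes of inter-frame gap). How the 60-80mbit figure was actually measured is not stated, so the result is only a ballpark.

```python
# Assumption: each SYN occupies a minimum-size Ethernet frame (64 bytes)
# plus 8 bytes of preamble and 12 bytes of inter-frame gap on the wire.
WIRE_BYTES_PER_SYN = 64 + 8 + 12

def syn_packets_per_second(flood_mbit: float) -> float:
    """Approximate SYN rate for a flood of the given on-wire bandwidth in Mbit/s."""
    return flood_mbit * 1_000_000 / (WIRE_BYTES_PER_SYN * 8)

for mbit in (60, 80):
    print(f"{mbit} Mbit/s -> about {syn_packets_per_second(mbit):,.0f} SYNs per second")
```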

Supermule

I don't use the traffic shaper at all and am affected in exactly the same way as others who use it.

@cmb:

@Harvy66:

During part of the test, the incoming bandwidth was around 40Mb/s, and I was still getting packetloss to my Admin interface. The bandwidth DDOS was the only part of the DDOS where PFSense was responding correctly, the other parts of the DDOS that did not consume 100% of the bandwidth left it unstable.

You're using the traffic shaper, that's almost certainly what caused that.

Those messing with this and doing traffic shaping on the same box, all bets are off there. ALTQ is not very fast for the kind of scale abuse we're talking here, and queuing in general really complicates things. If you're looking to handle as big of a DDoS as possible, you don't want to be running traffic shaping.

Supermule

Once people have upgraded to 2.2.2, we need 3-4 people willing to be tested.

A mix of bare metal and VMs would be perfect.

Same setup both with and without the traffic shaper.

Volunteers can contact me by PM. Attacks will be restricted to 2-10 mins, depending on the wishes of the tested party.

Different attack types (TCP/UDP) will be used. The pipe should preferably be 100 mbit+.

Harvy66

@lowprofile:

Nice thread but also some overkill statements from people :D

I was one of the first to locate this issue. Since then I've become much more experienced, and today I have an almost bulletproof setup regarding SYN floods.
Too much custom work to make a how-to guide at this moment, but looking forward to seeing the new corrections in 2.2.2

Will run some tests in the upcoming weeks.

I'm under the impression that a similar issue can be triggered by UDP, not just TCP. I think SuperMule showed many out of state UDP packets from many IP+port combos can trigger issues without consuming all of the bandwidth.

Supermule

Exactly.

Harvy66

If UDP can cause it, I wonder if ICMP can also cause it, or heck, even a custom protocol. What I'm getting at is that I wonder if it's an issue with the firewall and IP in general, when lots of different IPs are getting blocked.

ledj

Reading this thread with interest.

I might be interested in participating in the testing of pfsense. I've set up a pfsense box which should replace our existing firewall (shorewall/iptables on Linux).

We have the 2 firewalls (the existing one and the new pfsense) on the same 1 Gbit pipe, which is also used for production systems. So a test with a minimal time span would be OK in the middle of the night (timezone GMT+2, since we have to announce this to our customers; probably your daytime :) ...)

A few questions:

Where does the simulated attack originate from?

Is it specially crafted UDP traffic? (A low and slow attack?)

Will the simulated attack affect our primary Linux firewall? (I guess not, since it's not using the full pipe.)

What are your settings for timeouts etc. in pfsense? (The things cmb pointed out. Our Linux firewall is tuned for this after some annoying floods, but I'm new to pfsense, so it would be good to know the recommended settings. It's also not easy to migrate settings when things are named differently. I could read up on this, but it's faster to have recommended settings; maybe there aren't any recommended settings yet?)

The pfsense is on a hardware server with http://ark.intel.com/products/75779/Intel-Xeon-Processor-E5-1620-v2-10M-Cache-3_70-GHz and 32gb ram, so no VM.

Supermule

From everywhere… using spoofed IPs.

No. It can be tailored to use specially crafted packets.

Depending on the packet load on the pipe, it can be.

I have time tonight at 10PM CET.

Send me a port and IP to test. Make sure it responds to ICMP on WAN so I can monitor the response from here and test various setups for the attack.

It will take 2-10 mins depending on the response to the ICMP. If there is no response at all to ping, then it's a quick test; if there is a normal reply, the attack will change, using a different approach, until pfsense doesn't respond.