What is the biggest attack in Gbps you stopped?

tim.mcmanus

So there is nothing we can do then as this is just like a HW failure of sorts then?

Yes and no.

If you can code and want to help recode the FreeBSD network drivers, they could use the help. They've acknowledged that there will be more improvements in the networking layer in v11.

Because pfSense is built on top of FreeBSD, it's only as good as the underlying operating system code. So bugs/deficiencies in the OS code affect the applications running on it. So we can't expect the pfSense team to fix a problem that's not in pfSense, and they are at the mercy of the FreeBSD code releases. I know they regularly collaborate with the FreeBSD team, but they don't run that project.

So it's not hardware per se; there are two layers of code that are maintained by two separate projects.

Here are the two threads Supermule opened up on the FreeBSD forums before they started ignoring him:

https://forums.freebsd.org/threads/dos-and-ddos-attacks.51899/

https://forums.freebsd.org/threads/freebsd-pf-and-syn-ack-flooding.51921/

With any software project, you need to give the developers very detailed information, including any code you're using to trigger the issue, so they can ascertain what the root cause may be. While Supermule provided a lot of data, it wasn't data the developers or forum users found useful, and the same goes for what's in this thread.

So it's possible the revised FreeBSD code resolves something, but until it's tested against this use case appropriately (on bare metal) it's only a guess at best that it resolves this use case.

cmb

@Supermule:

https://lists.freebsd.org/pipermail/freebsd-announce/2015-July/001655.html

Latest update.

That has no relation to things people have been attempting here. It only applies to sessions the system itself answers (and gets all the way to LAST_ACK state, which never happens in any of these tests). No applicability with things it's routing, or NATing, or blocking, or passing but not able to complete a TCP handshake much less get to LAST_ACK.

2.2.4 snapshots have had the patch since the first build after its release, yesterday morning. Release is coming hopefully tomorrow, though not because of that specifically, as it's non-applicable for the vast majority of use cases.

Note the credit to Netflix? Probably something they ran into by coincidence on their FreeBSD CDN boxes (when you're pumping 20-40 Gbps out of a single server across a huge number of connections, you tend to run into any possible TCP bugs), then found the associated potential security impact.

jdillard

@cmb:

Note the credit to Netflix? Probably something they ran into by coincidence on their FreeBSD CDN boxes (when you're pumping 20-40 Gbps out of a single server across a huge number of connections, you tend to run into any possible TCP bugs), then found the associated potential security impact.

As an aside, you can watch Gleb Smirnoff talk about why Netflix decided to use NGINX and FreeBSD to build their own CDN: https://www.youtube.com/watch?v=KP_bKvXkoC4

KOM

So we can't expect the pfSense team to fix a problem that's not in pfSense, and they are at the mercy of the FreeBSD code releases.

While we can't expect them to, they certainly have fixed some FreeBSD bugs and submitted the patches upstream, which were accepted for inclusion by the FreeBSD team.

Supermule

Exactly. But nothing….

Let's wait and see if it works.

Derelict

Why would it?

tim.mcmanus

@Supermule:

Exactly. But nothing….

Let's wait and see if it works.

ivor

@tim.mcmanus:

Here are the two threads Supermule opened up on the FreeBSD forums before they started ignoring him:

https://forums.freebsd.org/threads/dos-and-ddos-attacks.51899/

https://forums.freebsd.org/threads/freebsd-pf-and-syn-ack-flooding.51921/

I don't see their response as ignoring him: https://forums.freebsd.org/threads/freebsd-pf-and-syn-ack-flooding.51921/#post-291316

firewalluser

@tim.mcmanus:

Because pfSense is built on top of FreeBSD, it's only as good as the underlying operating system code. So bugs/deficiencies in the OS code affect the applications running on it.
Snip

I understand all of this.

So it's possible the revised FreeBSD code resolves something, but until it's tested against this use case appropriately (on bare metal) it's only a guess at best that it resolves this use case.

The point I'm angling at is this: unless we have better logging facilities for any and all output from FreeBSD and the pfSense elements sitting on top, we won't be able to spot the problems, will we?

Or am I missing something?

tim.mcmanus

@firewalluser:

The point I'm angling at is this: unless we have better logging facilities for any and all output from FreeBSD and the pfSense elements sitting on top, we won't be able to spot the problems, will we?

Or am I missing something?

The suggestions were made in this thread.

Tracking down the underlying issue, which has yet to be determined, requires dtrace to be installed.

And since this is a FreeBSD issue, per recommendations also in this thread, one would need to install a current release of FreeBSD on bare metal with dtrace installed to capture the appropriate data from the kernel extension/module in question.

This goes far beyond logging and is closer to code debugging, hence the use of dtrace.

Additionally, Supermule has never released the code that causes the issue and adamantly refuses to, so no one can recreate the issue in a lab. So anyone who wants to legitimately troubleshoot the use case and capture data from FreeBSD can't. This stubborn refusal is one of the reasons no developer on these forums or on FreeBSD's forums will help with the issue.

firewalluser

I thought DTrace was getting nowhere because it requires ESF to include it in the version of FreeBSD they use with pfSense?

It looks like v11 will need to be installed with pfSense layered on top, which is something I'm hoping to do on the RPi (as much to see what performance it can or can't manage) once I have a few other jobs out of the way. I was also hoping more progress had been made on DTrace, but I think that's stalled.

Code debugging is still logging in my book. I ship all my apps in full debug mode, as I've spent enough time in the past hunting down problems in code, sometimes my own, sometimes in third-party add-ons, which is made harder when some of it is a black box with no source code provided.

I think he provided a DDoS script which was supposed to cause the problems, although I haven't seen it myself, as my bandwidth is not enough, unless GCHQ or my ISP have some speed restriction in place. I can't comment on the DDoS script other than to say it's not unlike many which can be downloaded from the web.

Harvy66

There is a difference between logging high-level stuff and logging system calls. An OS can be handling thousands of calls per second or more. Logging kernel-level stuff is much more difficult.

tim.mcmanus

@firewalluser:

I thought DTrace was getting nowhere because it requires ESF to include it in the version of FreeBSD they use with pfSense?

Let me back up a bit and summarize some of the results I got while testing. Initially the box and web UI slowed to a crawl while the attack was going on. When I increased my state table to 8M states, the box responded well, but I started getting an IRQ storm alert. That led me to believe that there was an issue in the em network driver in FreeBSD.

In order to validate this assumption, I would need to install FreeBSD on bare metal and re-run the tests. Troubleshooting 101: remove stuff, check for error, profit. If it can be recreated in FreeBSD 10.x, then we can install dtrace there and go forward. No use trying to troubleshoot a potential FreeBSD issue on a pfSense box. Even the folks on the FreeBSD forums will tell you to do a vanilla install on bare metal and then post the results.

I believe it's possible that someone did run a test with FreeBSD and reported the same behavior on this thread. I'm not entirely sure. But that's essentially where the troubleshooting left off.
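As a side note on the 8M-state table mentioned above, here is a rough back-of-the-envelope sketch of what a table that size might cost in RAM. The 1 KiB-per-state figure is an assumption (a commonly quoted rule of thumb, not something measured in this thread), and the function name is made up for illustration.

```python
# Rough memory estimate for a pf state table.
# ASSUMPTION: ~1 KiB of RAM per state entry. This is a rule of thumb,
# not a measured value for any particular pfSense/FreeBSD build.
BYTES_PER_STATE = 1024  # hypothetical per-entry cost


def state_table_ram_gib(max_states: int, bytes_per_state: int = BYTES_PER_STATE) -> float:
    """Return the approximate RAM (in GiB) a completely full state table would occupy."""
    return max_states * bytes_per_state / 2**30


if __name__ == "__main__":
    for states in (1_000_000, 8_000_000):
        print(f"{states:>10,} states ~ {state_table_ram_gib(states):.2f} GiB")
```

Under that assumption a full 8M-state table would sit somewhere in the 7 to 8 GiB range, so raising the limit only helps if the box actually has that much memory to spare.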

@firewalluser:

I think he provided a DDoS script which was supposed to cause the problems, although I haven't seen it myself, as my bandwidth is not enough, unless GCHQ or my ISP have some speed restriction in place. I can't comment on the DDoS script other than to say it's not unlike many which can be downloaded from the web.

Just to be clear, the attack was a DOS, not a DDOS. There was nothing distributed about it unless Supermule has a botnet at his disposal and that's why he's not releasing any code (I doubt it), but I think it's a single script randomizing source IP addresses. It could be that he's downloaded someone else's compiled code and is just using it like a script kiddie, and therefore he doesn't have any source code to release. That might be more probable, but we'll never really know until he provides some transparency.

I stopped working on the issue because I don't have the source to recreate the issue, and therefore I cannot test it in a lab. I provided my external IPs to Supermule, but I never got the same transparency in return. So I walked away.

Guest

First of all, I really don't think that's an easy job: code something new, get it into the next FreeBSD version, and then pfSense benefits in its next version as well. pfSense doesn't take over the code 1:1; it makes many adaptations and changes too.

And what would be the benefit of all this if a few dozen people, like Anonymous, were firing their "super cannon"? Would this also secure our pfSense firewalls? Whether on bare metal or in a VM, that would be the end of any firewall trying to properly handle a load like that.

Would it perhaps be better to have something like a so-called "hedgehog mode", as the bigger vendors offer on larger devices?

For sure the attack driven by @supermule was a mixed one: a script combined with a special SYN flood attack (XSYN script and OVH).

@firewalluser

I think he provided a DDoS script which was supposed to cause the problems, although I haven't seen it myself, as my bandwidth is not enough, unless GCHQ or my ISP have some speed restriction in place. I can't comment on the DDoS script other than to say it's not unlike many which can be downloaded from the web.

You can easily watch it here: This is the XSYN script

firewalluser

I think there's too much focus on the (D)DoS and not enough on the fact that observing a system in general at a greater level of detail can show up new anomalies.

Put another way, until we log in greater detail, how will we spot problems?

DTrace wouldn't exist if there wasn't a need for it, would it?

I know OS debugging can add overhead, but using tools like this one http://www.rohitab.com/apimonitor on the Windows platform has enabled me to spot things which would otherwise have gone unnoticed with the programming language's own debugger, because I could see which APIs get called repeatedly and unnecessarily in OOP code, for example.

That API monitor hooks into all the APIs I choose, and from that I can get metrics which show me things like where my code is slow and where there might be potential problems which can be exploited at the OS level. I've found bugs in programming languages which are over 15 years old, possibly 20, and which could cause any system written in the language to crash.

The thing to bear in mind with all good systems is that they tend to have the original programmer(s) still in place, unlike many of today's OSes, which have been taken over by younger folk as others climb the management ladder or go off elsewhere.

Those newer folk don't have the hidden knowledge that's in the heads of the original programmers. So if we don't have greater levels of logging and detail, would we spot what we may currently be missing?

Syslog is good, but like money, you can never have enough.

Guest

would we spot what we may currently be missing?

Hmm, I'll try to explain; it could be a very simple thing!

From what I've dug out of forum threads here and there, the combined attack would not affect lazy consumer routers, and therefore I think there must be an elemental difference between NAT in FreeBSD and NAT in consumer routers.

NAT on consumer routers does the following: it will not pass anything in from outside that was not requested by somebody or some device on the LAN side. Is that right? ;)

NAT on FreeBSD or pfSense based devices does the same as above, but in a different way, and I really think this is the small piece that makes it so difficult to fix the entire problem: FreeBSD and pfSense let the packets in so they can be inspected and matched against the rules to deny, allow, or perhaps pass them through.

That means that on the lazy consumer routers the packets don't come in, but on the pfSense side they must come in first to be matched against the rules. Could this be the small difference in play here?

Asch Conformity, mainly the blind leading the blind.

But the one-eyed man is king among the blind

Supermule

Its not like that.

firewalluser

@BlueKobold:

NAT on consumer routers does the following: it will not pass anything in from outside that was not requested by somebody or some device on the LAN side. Is that right? ;)

NAT on FreeBSD or pfSense based devices does the same as above, but in a different way, and I really think this is the small piece that makes it so difficult to fix the entire problem.

Good thinking. I couldn't say if it's correct or not, as I have no knowledge of how the different consumer routers work; they might, for example, have different levels of isolation or sandboxing in place. However, from the FreeBSD thread, SYN-ACK seems to be an issue.

I don't know if this would work: altering the number of retries?
https://forum.ivorde.com/quick-temporary-tuning-of-freebsd-under-spoofed-syn-flood-attack-t13632.html

As I can't replicate this at my end, I'm unable to reproduce it, which is half the problem, though that doesn't make fixing bugs of this sort impossible; hence the suggestion to increase the logging at the system level. Maybe something would show up?

Guest

Good thinking. I couldn't say if it's correct or not, as I have no knowledge of how the different consumer routers work; they might, for example, have different levels of isolation or sandboxing in place.

Normally, as far as I know (please correct me if I'm wrong), the NAT mechanism on consumer-grade NAT routers works like this, step by step:

- Rule number one often uses netfilter (SPI) to protect against IP packet fragmentation.
- The second rule is something like "apply rule number one on top of all other rules", followed by NAT.
- From the LAN side, someone or some device requests information, such as a website to open and display.
- The request is sent to the Internet by opening a session tied to an internal IP address.
- When data from the outside (Internet) reaches the WAN interface of the home router, the NAT mechanism purely and only checks whether there is an open session matching this data, and lets it pass or denies it.
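To make the session-matching step concrete, here is a minimal toy sketch of that kind of NAT table. It is not how any real router, pf, or netfilter implements it; the class and method names (`ToyNat`, `outbound`, `inbound`) are made up for illustration, and the addresses are documentation ranges.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flow:
    """One connection opened from the LAN side (hypothetical structure)."""
    proto: str
    lan_ip: str
    lan_port: int
    remote_ip: str
    remote_port: int


@dataclass
class ToyNat:
    """Toy session table: inbound traffic passes only if a LAN host asked for it."""
    wan_ip: str
    next_port: int = 40000
    # key: (proto, wan_port, remote_ip, remote_port) -> Flow
    sessions: dict = field(default_factory=dict)

    def outbound(self, flow: Flow) -> int:
        """A LAN host opens a connection: allocate a WAN port and remember the session."""
        wan_port = self.next_port
        self.next_port += 1
        self.sessions[(flow.proto, wan_port, flow.remote_ip, flow.remote_port)] = flow
        return wan_port

    def inbound(self, proto: str, src_ip: str, src_port: int, dst_port: int):
        """A packet arrives on WAN: return the LAN target, or None to drop it."""
        flow = self.sessions.get((proto, dst_port, src_ip, src_port))
        if flow is None:
            return None  # no matching session, so the packet is dropped
        return (flow.lan_ip, flow.lan_port)
```

Note that even in this toy version the router still has to receive the packet and do a lookup before it can drop it, so an unsolicited flood costs some work per packet either way.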

And on top of this there is perhaps also a small ASIC/FPGA soldered onto the board that helps them handle those rules and actions more smoothly; I'm pretty sure they all have something like this.

I know I'm walking on thin ice here.
For sure this may be done differently from vendor to vendor, but it works really effectively for them, so could pfSense or FreeBSD also solve it the way they do?

And if most people want to keep things the way they are now, that's no problem at all, I think. It need not be a replacement; it could be an extension or just another option alongside the way pfSense handles this today, a second option that each user could choose to enable or not.

However, from the FreeBSD thread, SYN-ACK seems to be an issue.

But perhaps you are a programmer or code writer who can determine exactly where the differences between these two SPI/NAT versions lie?

Put another way, until we log in greater detail, how will we spot problems?

You could sniff, syslog, and debug for many years, with millions of clean code audits on top, but if the mechanism itself (SPI/NAT) is the issue, that is the point we need to look at more closely.

I don't know if this would work: altering the number of retries?

Hey, could this be combined with using syncookies against SYN flood attacks?
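For anyone not familiar with the idea behind syncookies, here is a simplified sketch, not the FreeBSD or pf implementation. The host answering the handshake stores nothing when a SYN arrives; it derives its SYN-ACK sequence number from a keyed hash of the connection details and only creates state once a valid final ACK comes back. Real implementations also pack things like the MSS into the cookie, which this version leaves out. `SECRET`, `SLOT_SECONDS`, and the function names are assumptions made up for the example.

```python
import hashlib
import hmac
import time

SECRET = b"hypothetical-per-boot-secret"  # assumption: random per boot in practice
SLOT_SECONDS = 64                          # assumption: coarse time slot length


def _mac32(src_ip, src_port, dst_ip, dst_port, client_isn, slot):
    """32-bit keyed hash over the connection 4-tuple, client ISN, and time slot."""
    msg = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{client_isn}|{slot}".encode()
    return int.from_bytes(hmac.new(SECRET, msg, hashlib.sha256).digest()[:4], "big")


def make_cookie(src_ip, src_port, dst_ip, dst_port, client_isn, now=None):
    """Sequence number to send in the SYN-ACK; nothing is stored per connection."""
    slot = int((time.time() if now is None else now) // SLOT_SECONDS)
    return _mac32(src_ip, src_port, dst_ip, dst_port, client_isn, slot)


def check_cookie(src_ip, src_port, dst_ip, dst_port, client_isn, ack_number, now=None):
    """Validate the final ACK: its ack number must be cookie + 1 for a recent slot."""
    slot = int((time.time() if now is None else now) // SLOT_SECONDS)
    cookie = (ack_number - 1) % 2**32
    return any(
        cookie == _mac32(src_ip, src_port, dst_ip, dst_port, client_isn, s)
        for s in (slot, slot - 1)  # accept the current and the previous time slot
    )
```

The point is that spoofed SYNs never allocate memory on the answering host, since a spoofed source never sends back the matching ACK. Whether that helps in the scenario discussed here, where the firewall is mostly forwarding or blocking rather than answering, is a separate question.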

I'm not a code writer, nor a FreeBSD or pfSense professional, nor a security expert like many users here in this forum are, and that may lead me to ask some poor questions that drive the more experienced users and the pros wild. But if there is something out there that "they" have and "we" don't, and it works like a charm, you will perhaps excuse my jumping into this discussion.

Supermule

https://technet.microsoft.com/en-us/library/cc756722%28v=ws.10%29.aspx

I think the issue is how NAT is handled and what's done to the traffic in the queue. I don't know how deep this goes, but it could be different things.

I don't see any waiting hardware-wise, so it's internal to FreeBSD/pfSense.

It could be the backlog and the way it's handled when SYN flooded, and also the fact that all traffic is copied to the pf filter and inspected before it's forwarded, rejected, or whatever it's supposed to do…

That would mean we have a bottleneck in the way packets are handled using NAT. The script used spoofed IPs and is a DDoS. At some point the traffic is flushed and the firewall begins to route traffic again, until we hit the bottleneck again and everything halts and becomes unresponsive.

One core suddenly uses 100% of the CPU and it stalls despite having enough resources available.

So it's tied to PPS, how the packets are composed, and what kind of reply the traffic wants from the firewall, not to the overall bandwidth usage.
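To put rough numbers on the PPS-versus-bandwidth point, here is a small sketch that works out the theoretical packet rate of an Ethernet link for a given frame size. It only uses standard Ethernet framing overhead (8 bytes of preamble/SFD plus a 12-byte inter-frame gap); the function name is made up, and the figures are line-rate ceilings, not measurements from this thread.

```python
# Per-frame overhead on the wire that is not part of the frame itself:
# 8 bytes preamble/SFD + 12 bytes inter-frame gap.
PREAMBLE_AND_IFG_BYTES = 20


def max_pps(link_bps: float, frame_bytes: int) -> float:
    """Theoretical maximum frames per second on an Ethernet link."""
    bits_on_wire = (frame_bytes + PREAMBLE_AND_IFG_BYTES) * 8
    return link_bps / bits_on_wire


if __name__ == "__main__":
    for size in (64, 512, 1518):
        print(f"{size:>5}-byte frames on 1 Gbps: {max_pps(1e9, size):,.0f} pps")
```

A flood of minimum-size frames (which is what bare SYN packets amount to) can reach roughly 1.49 million packets per second on a 1 Gbps link, about 18 times the packet rate of the same link filled with full-size frames, so per-packet work can saturate a core long before the link itself is full.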

One can monitor the CPU usage in VMware and see what happens hardware-wise when the traffic drops on the traffic graphs. It follows suit in VMware, and everything is reachable from WAN again.

[Attached images: vmxnet3_vmwareload.png, traffic_drop.PNG, traffic_drop2.PNG]