Netgate Discussion Forum

    What is the biggest attack in Gbps you stopped?

    General pfSense Questions
      tim.mcmanus

      Here is some data from the attack that Supermule, almabes, and I coordinated today.

      The initial SYN flood attack disabled my WAN2 interface and brought my pfSense box to a crawl.  However, I had the console running and saw this:

      This was easy to fix by increasing the number of states. My pfSense installation was set to the default limit of 394,000, so I first increased it to 4,000,000 and then to 8,000,000. After that, the pfSense box responded fine. I have 4 NICs in the box: WAN1, WAN2, LAN1, and LAN2. The attack was on WAN2; all other interfaces and pfSense itself worked normally while WAN2 went down hard.
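For a sense of what raising the limit from 394,000 to 8,000,000 means for memory, here is a rough sketch in Python. It assumes the rule of thumb from the pfSense documentation that each state uses roughly 1 KB of RAM and that the default Firewall Maximum States value is about 10% of RAM; the function names are hypothetical, and the real per-state size varies by version.

```python
# Rough state-table sizing.
# Assumptions: ~1 KB of RAM per pf state (a rule of thumb, not an exact figure)
# and a default state limit of ~10% of RAM.
BYTES_PER_STATE = 1024


def default_state_limit(ram_bytes: int, fraction: float = 0.10) -> int:
    """Approximate default maximum states for a given amount of RAM."""
    return int(ram_bytes * fraction // BYTES_PER_STATE)


def state_table_memory_mib(states: int) -> float:
    """Approximate RAM (MiB) used if the table actually fills to `states`."""
    return states * BYTES_PER_STATE / 2**20


if __name__ == "__main__":
    print(f"Default for 4 GiB: ~{default_state_limit(4 * 2**30):,} states")
    for states in (394_000, 4_000_000, 4_800_000, 8_000_000):
        print(f"{states:>9,} states ~ {state_table_memory_mib(states):,.0f} MiB")
```

Under these assumptions, 4 GiB of RAM gives a default of about 419,000 states (close to the 394,000 reported here, probably because the OS sees somewhat less than the full 4 GB), and a completely full 8,000,000-state table would need around 7.6 GiB. The box in this thread held about 4.8M states on 4 GB, which suggests the 1 KB figure is conservative, so treat the output as an upper bound rather than a prediction.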

      Here is part of the Skype transcript with some real-time metrics:

      
      [5/23/15, 5:03:12 PM] Tim McManus: Ok
      [5/23/15, 5:03:25 PM] Tim McManus: UI is good.
      [5/23/15, 5:03:37 PM] Tim McManus: Wan1 good.
      [5/23/15, 5:03:53 PM] Tim McManus: 4mbit attack.
      [5/23/15, 5:04:21 PM] Tim McManus: 3M states.
      [5/23/15, 5:04:43 PM] Tim McManus: 3.2M states.
      [5/23/15, 5:05:07 PM] Tim McManus: Wan2 dead.
      [5/23/15, 5:05:15 PM] Tim McManus: Wan1 fine.
      [5/23/15, 5:06:42 PM] Tim McManus: Wan2 is crushed. 700ms Rtt ping. Normally 30ms tops.
      [5/23/15, 5:06:58 PM] Tim McManus: But the box is running fine.
      [5/23/15, 5:07:06 PM] Tim McManus: 4M states.
      [5/23/15, 5:07:22 PM] Tim McManus: UI is fine.
      [5/23/15, 5:07:48 PM] Tim McManus: UI is real fast.
      [5/23/15, 5:08:03 PM] Tim McManus: Just bumped states to 8M.
      [5/23/15, 5:08:22 PM] Tim McManus: Ping is now 13 ms.
      [5/23/15, 5:08:36 PM] Tim McManus: 115 me it incoming wan2.
      [5/23/15, 5:08:41 PM] Tim McManus: Mbit
      [5/23/15, 5:09:18 PM] Tim McManus: Wan2 is slower but working fine.
      [5/23/15, 5:10:26 PM] Tim McManus: 4.7M states.
      [5/23/15, 5:10:39 PM] Tim McManus: Rtt is 160ms.
      [5/23/15, 5:10:53 PM] Tim McManus: Wan2 down.
      [5/23/15, 5:11:04 PM] Tim McManus: Back
      [5/23/15, 5:11:09 PM] Tim McManus: Rtt 900 ms.
      [5/23/15, 5:11:28 PM] Tim McManus: 4.8M states.
      [5/23/15, 5:11:56 PM] Tim McManus: 4mbit incoming.
      [5/23/15, 5:12:33 PM] Tim McManus: High latency alert on the UI.
      [5/23/15, 5:13:13 PM] Tim McManus: Wan2 down. Wan1 fine.
      [5/23/15, 5:14:28 PM] Tim McManus: UI has been fine since I increased the state table.
      
      

      The console looked like this through the attack:

      Although the box was crippled as far as the Web UI was concerned, the console worked fine. However, the initial attack took out all of the graphing, and it looks like the historical RRD graphs were also affected: I have a gap from when the attack started until after I rebooted the box. Those services did not come back online on their own. The box was rebooted a couple of hours after the attack, and the graphs show that gap.

      I have additional data and metrics from my OpenNMS monitoring box, as well as my AllGraphs and system logs. The development team is more than welcome to PM me for them, but at this point I'm not going to share them publicly. It seems that the attack saturates the PF state table and significantly impairs the box, but increasing the number of states allows the box to stay up while the interface is taken down.
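Why the state count kept climbing can be estimated with Little's law: the number of states held is roughly the rate at which new states are created times how long each one lives. The sketch below assumes that flood SYNs never complete a handshake, so every state they create (only passed traffic, such as the port forward here, creates states) expires on pf's `tcp.first` timeout, which defaults to 120 seconds. pfSense's firewall optimization setting and pf's adaptive timeouts can change that lifetime, and the state limit itself caps the count; all of that is ignored here.

```python
# Little's law applied to the pf state table.
# Assumption: pf's default tcp.first timeout (120 s) applies to every flood state.
TCP_FIRST_TIMEOUT_S = 120.0


def steady_state_count(new_states_per_s: float,
                       lifetime_s: float = TCP_FIRST_TIMEOUT_S) -> float:
    """Average states held = creation rate x average lifetime."""
    return new_states_per_s * lifetime_s


def implied_rate(observed_states: float,
                 lifetime_s: float = TCP_FIRST_TIMEOUT_S) -> float:
    """Inverse: creation rate needed to sustain an observed state count."""
    return observed_states / lifetime_s


if __name__ == "__main__":
    print(f"4.8M states needs ~{implied_rate(4_800_000):,.0f} new states/s")
    for timeout in (120.0, 30.0, 10.0):
        print(f"40,000 SYN/s with a {timeout:.0f}s timeout -> "
              f"~{steady_state_count(40_000, timeout):,.0f} states")
```

If those assumptions hold, the 4.8M states in the transcript above correspond to roughly 40,000 new states per second, and the steady-state count shrinks in direct proportion to the timeout, which is why shorter timeouts and per-rule state limits can help alongside a larger table.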

      We will probably do some additional testing, but this is my quick summary of what we discovered.

      It is important to note that I have 4 x 1Gb NICs, an i3, and 4GB of RAM. I can list all of the details of this box and the topology in another post if anyone is interested. We were also running Wireshark during the test, and I have some, but not all, of the packet captures from it. It was useful for seeing when an interface was being attacked and by what method.

        tim.mcmanus

        @jimp:

        Can someone give a proper summary of what the problem actually is so others don't have to wade through 20+ pages to find the info?

        Specifically: Is the traffic in question actually passed, or blocked? Is a service on the firewall running pfSense (such as the GUI) exposed to the test source or is the traffic being passed through to an internal host (port forward, routed, etc)? – This is important because a SYN flood to pfSense as a host is completely different than a flood through pfSense as a forwarder/firewall.

        If the traffic is passing through the firewall, using rules to clamp down state limits, or going stateless properly (floating quick OUT rules to pass out with no state along with the pass in rules on the other tabs) may help.

        If the traffic is targeting the firewall itself, then there are things that can be tweaked (syncache parameters, for example), but it's yet another reason the services on the firewall such as the GUI and SSH should not be exposed to the Internet in general. State limits can help there as well.

        Also the "size" of an attack in Mbit/s or Gbit/s is not as important to know as the PPS rate which tends to be the limiting factor when dealing with small packets such as this.

        Some answers to your questions based on today's testing:

        Is the traffic in question actually passed, or blocked? Both. Most of it was blocked. There were several different kinds of attacks, and the SYN flood was blocked.

        Is a service on the firewall running pfSense (such as the GUI) exposed to the test source, or is the traffic being passed through to an internal host (port forward, routed, etc.)? Traffic was aimed at the external IP on ports 80 and 443. No firewall services were exposed on either WAN interface. The external IP of WAN2 forwarded ports 80 and 443 to an internal address running a web server.

        This is important because a SYN flood to pfSense as a host is completely different than a flood through pfSense as a forwarder/firewall. I believe, but cannot be certain, that when my WAN1 interface was attacked with the same SYN flood, we had the same issue. WAN1 does not forward any ports. I didn't have Wireshark running on that interface at the time, but we can always retest.
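jimp's point that packets per second matter more than Mbit/s for small packets can be made concrete with a little arithmetic. This Python sketch assumes every packet is a minimum-size 64-byte Ethernet frame plus 20 bytes of preamble and inter-frame gap on the wire; graphing tools may count bytes differently, so treat the results as estimates.

```python
# Convert an attack's bit rate into packets per second for small frames.
MIN_FRAME_BYTES = 64          # smallest legal Ethernet frame, including FCS
WIRE_OVERHEAD_BYTES = 8 + 12  # preamble/SFD plus minimum inter-frame gap


def packets_per_second(bits_per_second: float,
                       frame_bytes: int = MIN_FRAME_BYTES) -> float:
    """Packets/s carried by `bits_per_second` of `frame_bytes`-sized frames."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return bits_per_second / bits_per_frame


if __name__ == "__main__":
    for label, bps in [("4 Mbit/s", 4e6), ("115 Mbit/s", 115e6), ("1 Gbit/s", 1e9)]:
        print(f"{label:>10}: ~{packets_per_second(bps):,.0f} packets/s")
```

Under that assumption, a saturated 1 Gbit/s link carries about 1.49 million packets per second, and the 115 Mbit/s reported on WAN2 works out to roughly 171,000 packets per second, each of which the firewall has to process.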

          Harvy66

          I don't know if they were the same tests, but my i5 3.2 GHz with an i350-T2 NIC took tens of megabits. I didn't think to have my console on, so I don't know if there was an interrupt storm. The i350 seems to be really good at keeping interrupts low (I don't know how): the interrupt rate is pretty much identical between load and idle during normal usage. I'm not sure about under the attack, though.

            tim.mcmanus

            @Harvy66:

            I specifically wanted to run top during the attack to determine whether the issue was load or a specific process. CPU never hit 50%. I was hit with 118 Mbit attacks, and the other 3 interfaces, as well as the pfSense box itself, were fine.

            But I strongly recommend running the console during a test attack.

              firewalluser

              So it looks like an i5 handles this better than an i3, but I don't know whether the i5 also has the same amount of RAM.

              It's generally useful to post your hardware specs, perhaps in your sig; I've found this useful when debugging problems in programming languages.

              I'll be trying this later today in a couple of VMs (virtual PCs running on VMware), since I can control the number of cores, network speed, and other tweaks, and set up many NICs. I have 32 GB on my dev machine, so I can give the virtual pfSense more RAM and more cores to see whether that, or other hardware such as spinning disks or SSDs, becomes a factor.

              Has anyone gotten anywhere with the SYN flooding and DTrace links I PMed to almabes?

              Capitalism, currently The World's best Entertainment Control System and YOU cant buy it! But you can buy this, or some of this or some of these

              Asch Conformity, mainly the blind leading the blind.

                Supermule

                Tim is running pfSense on bare metal. Almabes is running it in a VM.

                We have tested the two scenarios before, and both were taken offline.

                Almabes's modem died during the test and didn't come back online until he manually rebooted it. He runs Cisco.

                I run dual Xeons:

                http://ark.intel.com/products/33927/Intel-Xeon-Processor-E5420-12M-Cache-2_50-GHz-1333-MHz-FSB

                A little note here that could contain a clue.

                When I disabled the services shown in the picture, the box recovered really quickly in the VM.

                The overall load was lower, as expected, and after the CPU spiked during the attack, the recovery period was significantly shorter without the services running.

                [Attachments: services.PNG, vmware.PNG]

                  Supermule

                  Somewhat interesting findings this morning…

                  I turned the Apinger daemon on again, as the only service, and the box died on me completely when attacked.

                  Recovery time was very long: 2-3 minutes of extra downtime before the box was responding again.

                  [Attachments: services.PNG, vmware.PNG]

                    Supermule

                    Turned off Apinger and turned on Cron.

                    Much more responsive GUI, and the traffic graphs didn't die on me completely this time.

                    Recovered instantly after the attack was stopped.

                    [Attachments: cron.PNG, traffic.PNG, vmware.PNG]

                      Supermule

                      When I enabled NTP, the box's responsiveness became worse.

                      The graphs did not update as quickly, and recovery was a little worse, as seen in the small bend in the right-hand corner of the last CPU graph from VMware: recovery took 15-20 seconds longer.

                      [Attachments: traffic.PNG, services.PNG, vmware.PNG]

                        Supermule

                        Enabled Snort to see if it made things worse.

                        The answer to that is yes and no… The initial phase was really good, since it removed the initial CPU spike and made the load more even (a slower ramp-up). CPU spent a shorter interval at MAX load than in previous attacks.

                        It spread CPU usage more evenly across the 8 cores, and recovery was really good.

                        Traffic graphs didn't fare as well as with only Cron running.

                        [Attachments: services.PNG, traffic.PNG, vmware.PNG]

                          Supermule

                          When Snort AND Cron were enabled, the graphs looked like this…

                          Total load was higher and recovery was fine, but the initial CPU smoothing seen with Snort alone was no longer there.

                          [Attachments: services.PNG, traffic.PNG, vmware.PNG]

                            Supermule

                            Enabling Apinger did the box no good, as expected.

                            More CPU load on top and a minute longer to recover. For what it's worth, the impact didn't seem as big as in the first test, but it is still a (big) trend.

                            Traffic graphs were as unresponsive as the rest of the GUI. They die on the first CPU spike in the GUI.

                            [Attachments: services.PNG, traffic.PNG, vmware.PNG]

                              firewalluser

                              So maybe some services are more resource-hungry and/or less refined in the scheme of things.

                              I wonder if there would be any benefit in having two firewalls in series, where the front-facing first one is stripped of all unnecessary tasks/services and the second, inline firewall runs those services/tasks?

                              It certainly seems like there might not be any one property, service, or task at fault, but maybe a combination of things that affects how well the system stays up.

                              Have you seen the links I PMed almabes about setting up pfSense to reduce or avoid SYN floods? If so, did you give them a go, and how did they perform?

                                Supermule

                                I did not see the links.

                                State table usage is about 1% on the box, and it has limits on how many states can be created per rule.

                                Running SYN proxy state with an allowance of 50 new connections per second.

                                That gives the state table some "air", but it doesn't help much.
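One possible reason a 50-new-connections-per-second allowance helps little against a SYN flood: if it is a per-source limit such as pf's `max-src-conn-rate` rule option (an assumption; the post does not say which setting was used), a flood from many spoofed addresses never trips it. The Python class below is a hypothetical sliding-window model of such a limit, not pf's actual implementation.

```python
from collections import defaultdict, deque


class ConnRateLimiter:
    """Hypothetical model of a per-source new-connection rate limit:
    at most `limit` new connections per `window_s` seconds for each key."""

    def __init__(self, limit: int = 50, window_s: float = 1.0) -> None:
        self.limit = limit
        self.window_s = window_s
        # Maps a key (e.g. a source IP string) to timestamps of accepted connections.
        self._seen = defaultdict(deque)

    def allow(self, key: str, now: float) -> bool:
        """Return True if a new connection from `key` at time `now` is allowed."""
        q = self._seen[key]
        # Drop timestamps that have slid out of the window.
        while q and now - q[0] >= self.window_s:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True


if __name__ == "__main__":
    limiter = ConnRateLimiter(limit=50, window_s=1.0)
    # One source sending 100 SYNs within one second: only the first 50 pass.
    same_src = sum(limiter.allow("198.51.100.7", t / 100) for t in range(100))
    # 100 spoofed sources, one SYN each: none of them reaches its own limit.
    spoofed = sum(limiter.allow(f"203.0.113.{i}", 2.0) for i in range(100))
    print(same_src, spoofed)  # 50 100
```

A single noisy source is cut off at 50, while 100 spoofed sources sending one SYN each all pass, so under this model the state table still fills during a spoofed flood.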

                                [Attachment: synproxy_state.PNG]

                                  Supermule

                                  I have come a BIG step closer to locating the culprit.

                                  Look at the graphs when NTPD is enabled.

                                  It destroys the GUI completely and takes the interfaces offline in the GUI, with no response from them. The graphs cover a 3-minute attack, and only about 10 seconds of it are shown.

                                  What's really interesting is the VMware graph: when it spikes for the last time, the GUI comes back and the CPU graph in the GUI starts working again.

                                  Could NTPD and Apinger together be causing something?

                                  [Attachments: traffic1.PNG, traffic.PNG, vmware.PNG]

                                    Supermule

                                    Deleted the VMware Tools package and tested again.

                                    Did a little better this time with NTPD and Apinger running.

                                    The little spike before the last one is a reboot. Recovery took about a minute longer than usual.

                                    [Attachments: traffic.PNG, services.PNG, vmware.PNG]

                                      Supermule

                                      After deleting VMware Tools, I disabled Apinger and NTPD.

                                      The graphs on ESXi looked the same as "normal", with no jitter from Apinger and NTPD afterwards.

                                      Recovery was instant. Traffic graphs didn't respond well.

                                      [Attachments: services.PNG, traffic.PNG, vmware.PNG]

                                        Supermule

                                        Ping from LAN -> WAN during the flood.

                                        Next I will disable the traffic limits in the SYN proxy settings.

                                        [Attachment: lan2wan.PNG]

                                          Supermule

                                          Running stateless, with no per-rule limits on states per second or other per-rule settings…

                                          Box ran fine and was responsive. A little fallout on the traffic graphs, but not as bad as before.

                                          CPU load is a LOT lower, and only 5 ping packets to Google were dropped.

                                          Instead of being crippled to a halt, it actually routed traffic to the server behind it.

                                          What's a little odd is that the traffic bandwidth doubled, from around 4-5 Mbit/s to around 8-10 Mbit/s, when running stateless compared to SYN proxy state.

                                          [Attachments: lan2wan.PNG, services.PNG, vmware.PNG, traffic.PNG]

                                            Supermule

                                            APINGER running and the box is useless….

                                            This is the difference between running stateless with Apinger and without it.

                                            The CPU spike on ESXi is 20% or more, and recovery takes about a minute longer...

                                            [Attachments: lan2wan.PNG, traffic.PNG, vmware.PNG]

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.