Netgate Discussion Forum

    What is the biggest attack in GBPS you stopped

    General pfSense Questions
      mer

      Those are most likely sysctls; worst case, you can put them in /etc/sysctl.conf so they survive a reboot.
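
For reference, entries in /etc/sysctl.conf take one name=value pair per line. The names below are real FreeBSD sysctls, but both the names and the values are placeholders chosen for illustration, since the tunables under discussion are not named in this post:

```
# /etc/sysctl.conf (illustrative values, not recommendations)
net.inet.tcp.syncookies=1
kern.ipc.somaxconn=1024
```

A value can be tried at runtime first with `sysctl name=value`; that change alone does not persist across a reboot.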

        firewalluser

        @mer:

        Those are most likely sysctls; worst case, you can put them in /etc/sysctl.conf so they survive a reboot.

        If sysctl.conf does need to be edited, then this thread https://forum.pfsense.org/index.php?topic=81174.0 or this thread https://forum.pfsense.org/index.php?topic=94511.0 can help those changes survive a reboot. Even though those threads discuss the syslog.conf and system.inc files, the principle is the same: the filename matters less than the method used to keep the changes.

        Capitalism, currently The World's best Entertainment Control System and YOU cant buy it! But you can buy this, or some of this or some of these

        Asch Conformity, mainly the blind leading the blind.

          Supermule

          For you to enjoy

          top -HSP

          Youtube Video

          top -asSHz1

          Youtube Video

            tim.mcmanus

            @Supermule:

            For you to enjoy

            top -HSP

            Youtube Video

            top -asSHz1

            Youtube Video

            Are you running pfSense in a hypervisor?  If so, you're dealing with an additional abstraction layer.  What does the hypervisor kernel look like?  You should, I think, be able to run top by SSHing into ESXi.  However, the point is that running it on a hypervisor may mask issues, or present them differently, compared with running on bare metal.

              Supermule

              With esxtop running

              http://youtu.be/_dimZ1_DO_o

                tim.mcmanus

                @Supermule:

                With esxtop running

                http://youtu.be/_dimZ1_DO_o

                Your wait time is insane on the hypervisor.

                The way ESXi works is that it needs all 8 cores free before it will allow the CPU in your VM to process data.  So if you've over-subscribed CPUs, your wait time goes through the roof while the VM waits for all 8 cores to become free.

                You can impair a VM considerably because of wait time and CPU availability.  It's one of the reasons why Oracle wants you to test their products on iron and will not support you if you're not using their hypervisor.  If you aren't aware of how wait times work at the hypervisor level, it'll bite you in the ass.

                In this case perhaps the hypervisor kernel is using multiple cores on the CPU to accept data from the NIC.  As you hammer that NIC, CPU wait times go up because the hypervisor kernel has a higher priority than your VM, so your VM is left waiting.

                When we tested FreeBSD and CentOS, my hypervisor was crushed.  It wasn't readily apparent if the issue was with the hypervisor or the VM, but I believe it was a combination of the two.

                If you can, remove the hypervisor layer so all you have to deal with is the hardware and BIOS.

                  Supermule

                  It's accumulated idle time. Not the actual VM waiting…

                  Run, %RUN:

                  • This value represents the percentage of absolute time the virtual machine was running on the system.

                  • If the virtual machine is unresponsive, %RUN may indicate that the guest operating system is busy conducting an operation.

                  • When %RUN is near zero and the virtual machine is unresponsive, it means that the virtual machine is idle, blocked on an operation, or is not scheduled due to resource contention. Look at other values (%WAIT, %RDY, and %CSTP) to identify resource contention.

                  • When %RUN is near the value of the number of vCPUs x 100%, it means that all vCPUs in the virtual machine are busy. This is an indicator that the guest operating system may be stuck in an operational loop. To investigate this issue further, you may need to engage the appropriate operating system vendor for assistance in identifying why the guest operating system is using all of the CPU resources.

                  • If you have engaged the guest operating system vendor, and they have determined that the issue is caused by the VMware Tools or the virtual machine hardware, it may be pertinent to suspend the virtual machine to collect additional diagnostic information.

                  Wait, %WAIT:

                  • This value represents the percentage of time the virtual machine was waiting for some VMkernel activity to complete (such as I/O) before it can continue.

                  • If the virtual machine is unresponsive and the %WAIT value is proportionally higher than %RUN, %RDY, and %CSTP, then it can indicate that the world is waiting for a VMkernel operation to complete.

                  • You may observe that the %SYS is proportionally higher than %RUN. %SYS represents the percentage of time spent by system services on behalf of the virtual machine.

                  • A high %WAIT value can be a result of a poorly performing storage device where the virtual machine is residing. If you are experiencing storage latency and timeouts, it may trigger these types of symptoms across multiple virtual machines residing in the same LUN, volume, or array depending on the scale of the storage performance issue.

                  • A high %WAIT value can also be triggered by latency to any device in the virtual machine configuration. This can include but is not limited to serial pass-through devices, parallel pass-through devices, and USB devices. If the device suddenly stops functioning or responding, it can result in these symptoms. A common cause for a high %WAIT value is ISO files that are accidentally left mounted in the virtual machine and are then either deleted or moved to an alternate location. For more information, see Deleting a datastore from the Datastore inventory results in the error: device or resource busy (1015791).

                  • If there does not appear to be any backing storage or networking infrastructure issue, it may be pertinent to crash the virtual machine to collect additional diagnostic information.

                  Ready, %RDY:

                  • This value represents the percentage of time that the virtual machine is ready to execute commands, but has not yet been scheduled for CPU time due to contention with other virtual machines.

                  • Compare against the Max-Limited, %MLMTD value. This represents the amount of time that the virtual machine was ready to execute, but has not been scheduled for CPU time because the VMkernel deliberately constrained it. For more information, see the Managing Resource Pools section of the vSphere Monitoring and Performance Guide or Resource Management Guide.

                  • If the virtual machine is unresponsive or very slow and %MLMTD is low, it may indicate that the ESX host has limited CPU time to schedule for this virtual machine.

                  Co-stop, %CSTP:

                  • This value represents the percentage of time that the virtual machine is ready to execute commands but that it is waiting for the availability of multiple CPUs as the virtual machine is configured to use multiple vCPUs.

                  • If the virtual machine is unresponsive and %CSTP is proportionally high compared to %RUN, it may indicate that the ESX host has limited CPU resources to simultaneously co-schedule all vCPUs in this virtual machine.

                  • Review the usage of virtual machines running with multiple vCPUs on this host. For example, a virtual machine with four vCPUs may need to schedule 4 pCPUs to do an operation. If there are multiple virtual machines configured in this way, it may lead to CPU contention and resource starvation.

                    tim.mcmanus

                    @Supermule:

                    It's accumulated idle time. Not the actual VM waiting…

                    I disagree.  Try dropping the cores down to two and see if that wait time drops.

                    I actually had a conversation with VMware about this yesterday.  Our SQL farm is getting crushed by wait time due to CPU over-subscription.  We're about to start a VM right-sizing exercise to address the issue.

                      Supermule

                      %RUN = 290.51 (could be 400%), so not all CPUs are busy
                      %WAIT = 408.70 (could be my NAS at home where the VM resides, a Synology 1813+)
                      %RDY = 0.18 (% time a vCPU was ready to be scheduled on a physical processor but couldn’t due to processor contention. Threshold: 10% per vCPU)
                      %MLMTD = 0.00 (% time the VM was ready to run but wasn’t scheduled because it would violate the CPU Limit set. Threshold: 0%)
                      %CSTP = 0.00 (% time a vCPU in an SMP virtual machine is “stopped” from executing, so that another vCPU in the same virtual machine could be run to “catch-up” and make sure the skew between the two virtual processors doesn’t grow too large. Threshold: 3%)

                      It doesn't indicate that the hypervisor subsystem suffers in any way, except maybe storage-wise.
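
To put those VM-level totals in per-vCPU terms, here is a minimal Python sketch. It assumes the VM has 4 vCPUs (implied by "could be 400%") and takes the thresholds exactly as quoted in this post, with only %RDY treated as a per-vCPU figure; that reading is an assumption. The function and variable names are made up for illustration.

```python
# Thresholds as quoted in the post above (percent).
THRESHOLDS = {"%RDY": 10.0, "%MLMTD": 0.0, "%CSTP": 3.0}

def per_vcpu(group_value, vcpus):
    """Average a VM-level esxtop percentage over its vCPUs."""
    return group_value / vcpus

def over_threshold(sample, vcpus):
    """Return the counters that exceed their quoted threshold."""
    flagged = []
    for name, limit in THRESHOLDS.items():
        value = sample[name]
        # Only %RDY is quoted as a per-vCPU threshold (assumption).
        if name == "%RDY":
            value = per_vcpu(value, vcpus)
        if value > limit:
            flagged.append(name)
    return flagged

sample = {"%RUN": 290.51, "%WAIT": 408.70, "%RDY": 0.18,
          "%MLMTD": 0.00, "%CSTP": 0.00}
VCPUS = 4  # assumption: "could be 400%" implies 4 vCPUs

print(round(per_vcpu(sample["%RUN"], VCPUS), 1))  # 72.6
print(over_threshold(sample, VCPUS))              # []
```

With these numbers none of the contention counters is over its threshold, which matches the reading that CPU scheduling is not the obvious bottleneck here.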

                      CTSP_capture.PNG

                        tim.mcmanus

                        Yes, not all CPUs will be busy, but all CPUs allocated to a VM must be present for the VM to compute.

                        For example, if you have 16 cores and 12 are busy, and a VM with 8 cores comes along, it will wait until 8 cores are available to process data.  It cannot use the 4 cores available in the example above; it must find 8 available cores and then compute, whether or not it needs all 8 cores.  It's an allocation model.  So CPU contention can create excessive wait times while CPU utilization is very low.
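
The allocation model described in this post can be written down in a few lines. This is only a sketch of the strict rule as stated here, not of the actual ESXi scheduler, and the function name is hypothetical.

```python
def can_run(total_cores, busy_cores, vm_vcpus):
    """Strict model from the post: a VM runs only if a free physical
    core exists for every one of its vCPUs at the same time."""
    free_cores = total_cores - busy_cores
    return free_cores >= vm_vcpus

# The example from the post: 16 cores, 12 busy, so an 8-vCPU VM must wait.
print(can_run(16, 12, 8))  # False
# Under the same rule, a 2-vCPU VM on that host could run right away.
print(can_run(16, 12, 2))  # True
```

Under this model a smaller VM gets scheduled sooner, which is the reasoning behind the earlier suggestion to drop the core count.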

                          Supermule

                          Yes, but it didn't :)

                          I can reissue 8 cores and test again. :)

                            Supermule

                            Even with 8 cores it doesn't get past 0.33 %CSTP :)

                              Supermule

                              While the attack was running, I rebooted the server behind the firewall that receives the traffic.

                              Packet loss still occurred, so I then shut the server down completely.

                              At the end you can see the traffic settle at around 7.5 Mbit/s and become a straight line.

                              Then packets begin to flow and CPU4 settles below 100%.

                              Youtube Video

                              What you reckon Tim?

                                Supermule

                                This is the IOPS I see on the storage attached as NFS during an attack.

                                iops_SAN.PNG

                                  tim.mcmanus

                                  @Supermule:

                                  What you reckon Tim?

                                  I reckon it's going to be very complicated troubleshooting an apparent kernel issue in FreeBSD while it's running on a hypervisor.

                                    firewalluser

                                    @tim.mcmanus:

                                    Yes, not all CPUs will be busy, but all CPUs allocated to a VM must be present for the VM to compute.

                                    For example, if you have 16 cores and 12 are busy, and a VM with 8 cores comes along, it will wait until 8 cores are available to process data.  It cannot use the 4 cores available in the example above; it must find 8 available cores and then compute, whether or not it needs all 8 cores.  It's an allocation model.  So CPU contention can create excessive wait times while CPU utilization is very low.

                                    Correct. https://communities.vmware.com/message/2275523

                                    If anything, you'd be better off giving a VM the minimum number of cores it needs rather than the most it can use, because that further reduces the switching and time-slicing overhead at the host (ESXi) level. It's the same reason software sometimes runs faster single-threaded on a single core than multithreaded across all cores: the latter introduces more locks and overhead at the software level as well as the CPU level, not to mention having to share the bus to devices such as the NICs or RAM.

                                    Besides, a lot of OSes tend to run faster as a VM than on bare metal. For example, I can install W7x32 as a VM in just 6 minutes 23 seconds, counting from hitting play on the VM for the first time, through the various reboots, assigning a username and password, and the update settings, all the way to reaching the desktop for the very first time. I might be able to shave off a few more seconds by not running other VMs at the same time, but I can't match that time installing W7x32 on the same machine on bare metal, which is a testament to VMware.

                                      tim.mcmanus

                                      @Supermule:

                                      Wait, %WAIT:

                                      • This value represents the percentage of time the virtual machine was waiting for some VMkernel activity to complete (such as I/O) before it can continue.

                                      • If the virtual machine is unresponsive and the %WAIT value is proportionally higher than %RUN, %RDY, and %CSTP, then it can indicate that the world is waiting for a VMkernel operation to complete.

                                      • You may observe that the %SYS is proportionally higher than %RUN. %SYS represents the percentage of time spent by system services on behalf of the virtual machine.

                                      • A high %WAIT value can be a result of a poorly performing storage device where the virtual machine is residing. If you are experiencing storage latency and timeouts, it may trigger these types of symptoms across multiple virtual machines residing in the same LUN, volume, or array depending on the scale of the storage performance issue.

                                      • A high %WAIT value can also be triggered by latency to any device in the virtual machine configuration. This can include but is not limited to serial pass-through devices, parallel pass-through devices, and USB devices. If the device suddenly stops functioning or responding, it can result in these symptoms. A common cause for a high %WAIT value is ISO files that are accidentally left mounted in the virtual machine and are then either deleted or moved to an alternate location. For more information, see Deleting a datastore from the Datastore inventory results in the error: device or resource busy (1015791).

                                      • If there does not appear to be any backing storage or networking infrastructure issue, it may be pertinent to crash the virtual machine to collect additional diagnostic information.

                                      You had an 800%-900% wait time in the video you posted.  Holy crap, that's insane!  See the part above that talks about latency.  This is a smoking gun!

                                      I've been very consistent when I've said that testing pfSense on a VM isn't going to give you tangible data because you'll also need to troubleshoot the hypervisor layer at the same time.

                                      Please, please, please stop wasting your time testing this issue on a hypervisor.  Put pfSense on bare metal and test it there.

                                        dennypage

                                        I couldn't agree more.

                                        @tim.mcmanus:

                                        Please, please, please stop wasting your time testing this issue on a hypervisor.  Put pfSense on bare metal and test it there.

                                          NOYB

                                          @dennypage:

                                          I couldn't agree more.

                                          @tim.mcmanus:

                                          Please, please, please stop wasting your time testing this issue on a hypervisor.  Put pfSense on bare metal and test it there.

                                          Ditto

                                          Too much shot-gunning going on in this effort instead of a methodical systematic approach.

                                          Begin with minimalist install / config (bare metal, no packages, no services, etc.) and work up to point of failure.

                                          If the minimalist install / config fails then go back even further to either earlier pfSense versions, or better yet to FreeBSD itself until the issue does not exist.  Then systematically move forward  adding to that config until the issue appears.

                                            Supermule

                                            Jun 4 09:28:36    check_reload_status: Reloading filter
                                            Jun 4 09:28:36    check_reload_status: Restarting OpenVPN tunnels/interfaces
                                            Jun 4 09:28:36    check_reload_status: Restarting ipsec tunnels
                                            Jun 4 09:28:36    check_reload_status: updating dyndns Yousee
                                            Jun 4 09:06:36    check_reload_status: Reloading filter
                                            Jun 4 09:06:33    check_reload_status: Syncing firewall
                                            Jun 4 09:05:08    check_reload_status: Reloading filter
                                            Jun 4 09:05:06    check_reload_status: Syncing firewall
                                            Jun 4 09:01:12    kernel: em0: promiscuous mode enabled

                                            When this happens, the firewall encounters packet loss.

                                            Disabling promiscuous mode on em0 and em1 solves it and lets the firewall hold up.

                                            Running a cron job every 60 seconds will let you survive a SYN flood.
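
The post does not say what the cron job runs. Assuming it simply clears promiscuous mode on both interfaces, a system crontab entry might look like the sketch below. `ifconfig <interface> -promisc` is the FreeBSD command that clears the flag set by `ifconfig <interface> promisc`; whether it also undoes promiscuous mode switched on by something else (a packet capture, for instance) is not confirmed anywhere in this thread.

```
# /etc/crontab format: minute hour mday month wday user command
# Runs once a minute, which is cron's finest granularity.
*	*	*	*	*	root	/sbin/ifconfig em0 -promisc; /sbin/ifconfig em1 -promisc
```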
