What is the biggest attack in Gbps that you have stopped?
-
So what's the number of CPUs and cores available?
What VMs are you running, and how many CPUs/cores does each need as a minimum?
Possibly the best way to visualise this is as a game of Tetris, but with horizontal blocks of various sizes representing the minimum core requirements of each VM's OS.
If you have a 4-core bare-metal host, your Tetris board is just four blocks wide; if it's 2 CPUs with 8 cores each, your Tetris board is 16 blocks wide.
When the blocks reach the base, that's your hypervisor's timeslice on the physical CPUs/cores.
Your task is to fit the horizontal blocks, each representing a VM's core requirement, so that all of the space gets used up. This is why, in most cases, it's best to give each VM the minimum number of cores/vCPUs it needs.
e.g. say you have 4 OSes and 1 CPU with 4 cores.
OS1 needs 2 cores min but can use 10 cores max.
OS2 needs 4 cores min but can use 8 cores max.
OS3 needs 1 core min but can use 4 cores max.
OS4 needs 1 core min but can use 2 cores max.
What's the most efficient way to set these OSes up as VMs in order to maximise the timeslices for each?
If you went:
OS1 4 cores
OS2 4 cores
OS3 4 cores
OS4 2 cores
then whenever OS4 had its timeslice, you would waste 2 physical cores.
This approach would also make for a clunky setup, because no two OSes ever run in the same timeslice: if any OS needs to talk to another, they all have to wait 4 timeslices before they can get back and process their stuff.
If you went:
OS1 2 cores
OS2 4 cores
OS3 1 core
OS4 1 core
then OS3 & OS4 can run whenever OS1 runs. You now have a "block" where OS1, OS3 and OS4 can all operate in one timeslice, and OS2 can operate in another. ESXi only has to swap between two blocks of OSes, (OS1, OS3 & OS4) and (OS2), which makes for a more responsive setup: the OSes in the first block can communicate between themselves in the same timeslice if need be, so the only extra wait for any communication is the swap from block 1 (OS1, OS3 & OS4) to block 2 (OS2).
Does that make sense, and is it easier to understand?
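If you want to see this co-scheduling pressure for yourself, esxtop on the host exposes it. A minimal sketch, assuming SSH access to the ESXi host:

```sh
# On the ESXi host, over SSH:
esxtop
# Press 'c' for the CPU view (the default), then 'V' to show only VMs.
# %CSTP is the co-stop figure: time vCPUs spend held back waiting to
# be scheduled together. A consistently high %CSTP suggests the VM has
# more vCPUs than it needs, i.e. its Tetris block is too wide.
```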
It's more complicated than that, because next you have RAM requirements to consider as well, but a similar principle applies: if you err towards less RAM, ESXi doesn't have to spend time loading and unloading RAM for each VM running in the timeslice. It's best to have the RAM requirements of the VMs fit within the physical RAM.
So take the 2nd example above:
OS1 2 cores
OS2 4 cores
OS3 1 core
OS4 1 core
You have, let's say, 32GB of physical RAM.
OS1 can use 8GB to 32GB
OS2 can use 4GB to 16GB
OS3 can use 2GB to 4GB
OS4 can use 4GB to 16GB
Say you allocated:
OS1 2 cores/32GB
OS2 4 cores/16GB
OS3 1 core/4GB
OS4 1 core/16GB
Then even though the first block (OS1, OS3 & OS4) can share the physical CPUs, they can't share the physical RAM, as you would need 52GB of it.
But if you went:
OS1 2 cores/16GB
OS2 4 cores/16GB
OS3 1 core/4GB
OS4 1 core/8GB
then the first block (OS1, OS3 & OS4) can share the physical RAM, as the total comes to 28GB, and the 2nd block (OS2) will use 16GB, with 16GB going spare doing nothing.
You'll notice I gave OS1 16GB; this is because even though OS4 can also use 16GB, it's only got 1 core.
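To make the arithmetic explicit, the block totals against the 32GB of physical RAM come out as:

```
First layout:  block 1 = OS1 + OS3 + OS4 = 32 + 4 + 16 = 52GB  -> does not fit in 32GB
Second layout: block 1 = OS1 + OS3 + OS4 = 16 + 4 +  8 = 28GB  -> fits in 32GB
Second layout: block 2 = OS2             = 16GB               -> fits, 16GB spare
```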
However, you then also need to look at what tasks are going to be running on each VM.
Databases love RAM: the more RAM you have, the more of the DB you can load into memory, already sorted into the views that users use most.
MS Exchange is similar, i.e. it's just a big DB, but it offloads a lot of its work to the workstation: Outlook will often hold a copy of what is stored in Exchange, so it only has to access the local disk. Exchange, meanwhile, will keep lots of connections open, like keep-alives for smartphones, so that it can "push" emails & other things to the phones.
Web servers depend on what they are doing; some may front DBs running on the same VM, some may not. But hopefully that gives you a better overview of what's going on at a lower level. :)
-
I have divided the 2 running VMs here at home across 2 different sockets with 4 cores each.
It didn't matter at all until I ran the cron job disabling promiscuous mode.
It's set to reject on the hypervisor vSwitch already…
-
I couldn't agree more.
Please, please, please stop wasting your time testing this issue on a hypervisor. Put pfSense on bare metal and test it there.
IT ISN'T BETTER ON BARE METAL. The problem still exists. I tried several times on my bare-metal Supermicro.
Read the thread and the other threads again. You will see the history.
I am not using pfSense anymore though, so I can no longer test.
-
I have divided the 2 running VMs here at home across 2 different sockets with 4 cores each.
It didn't matter at all until I ran the cron job disabling promiscuous mode.
It's set to reject on the hypervisor vSwitch already…
You know promiscuous mode can mess up some NICs? You'll see a note about this in the packet capture page, amongst other places.
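As a quick sanity check from the pfSense shell, you can see whether a NIC is currently in promiscuous mode from its interface flags. A minimal sketch, assuming the em0 interface mentioned elsewhere in the thread:

```sh
# The flags line contains PROMISC while promiscuous mode is on;
# no output here means the flag is clear.
ifconfig em0 | grep -i promisc
```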
What are the Network checkboxes on the System: Advanced: Networking tab set to?
How did you install the VM?
Did you install from ISO directly on the ESXi server, or did you set pfSense up on a different bare-metal/VMware host (like VMware Workstation), clone it, then move it to the ESXi server in question?
pfSense is one VM; what's the other VM?
-
1: Yes.
2: Picture attached.
3: From ISO directly onto the ESXi host.
4: Home server (Windows 2008 R2).
-
Disabling apinger so the interface doesn't get restarted all the time during an attack.
In the traffic graph you are able to see the drop in traffic after the ifconfig em0 -promisc reload in cron.
That makes the firewall come alive and start routing packets again.
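For reference, a sketch of what such a cron entry could look like (the em0 name is from the command above; the one-minute interval is an assumption):

```sh
# /etc/crontab entry, FreeBSD format: minute hour mday month wday who command
# Clear promiscuous mode on em0 every minute (interval assumed):
*/1  *  *  *  *  root  /sbin/ifconfig em0 -promisc
```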
I have attached screenshots from before and after the reload, taken while running top -HSP.
![top -HSP before reload of -promisc.PNG](/public/imported_attachments/1/top -HSP before reload of -promisc.PNG)
![top -HSP before reload of -promisc.PNG_thumb](/public/imported_attachments/1/top -HSP before reload of -promisc.PNG_thumb)
![top -HSP after reload of -promisc.PNG](/public/imported_attachments/1/top -HSP after reload of -promisc.PNG)
![top -HSP after reload of -promisc.PNG_thumb](/public/imported_attachments/1/top -HSP after reload of -promisc.PNG_thumb)
-
In your pic, the first ticked checkbox is normally unticked by default.
Have you been toggling these?
If so, did you notice any difference?
Have you been through the pfSense 2 on VMware ESXi 5.5 docs to check the settings?
When you run the DDoS, is the home server running as well? I know that's your aim ultimately, but does pfSense perform better without it running?
Do you have the SATA driver installed? Check out airvpn.org/topic/11847-pfsense-performance-configs-on-esxi-vmware/
That might be a lead?
How is your management channel set up?
Apologies if you have posted your VM settings; I don't recall seeing them. If you haven't, can you post them? It's a case of trying to see whether the guest has been set up properly and isn't causing the problem that is making pfSense fail under the DDoS. We can't rule the ESXi VM guest settings out just yet, imo.
Got to go out for a couple of hours now, but it's definitely worth going back over all the settings.
Have you even set up a basic pfSense with minimal settings (no packages, no config changes other than IP address changes for the NICs) to see how that copes with the DDoS?
I think this is a back-to-basics moment, like others have suggested. I know bare metal isn't an option, but making sure the guest is configured right and then installing a basic pfSense installation would be my next move. If that handles the DDoS, I'd pull the XML backups and compare differences, as it's easy to miss something when toggling various settings in situations like this.
Good luck! :)
Disabling apinger so the interface doesn't get restarted all the time during an attack.
In the traffic graph you are able to see the drop in traffic after the ifconfig em0 -promisc reload in cron.
That makes the firewall come alive and start routing packets again.
I have attached screenshots from before and after the reload, taken while running top -HSP.
Although I could only get 2.42Mbps, apinger was still getting out for me, as it only has to ping some IP addresses a couple of hops away, unlike your DDoS traffic, which is coming from all around the world and thus from further away. The network infrastructure would let IP addresses closer to me get through, as the bottlenecks would be further away.
-
In your pic, the first ticked checkbox is normally unticked by default.
Have you been toggling these?
If so, did you notice any difference?
Not seen any difference at all.
Have you been through the pfSense 2 on VMware ESXi 5.5 docs to check the settings?
Yes
When you run the DDoS, is the home server running as well?
Yes.
I know that's your aim ultimately, but does pfSense perform better without it running?
No difference.
Do you have the SATA driver installed?
No, running on a SCSI controller.
How is your management channel set up?
Not understood. Using the vSphere client, console and PuTTY.
Apologies if you have posted your VM settings; I don't recall seeing them. If you haven't, can you post them? It's a case of trying to see whether the guest has been set up properly and isn't causing the problem that is making pfSense fail under the DDoS. We can't rule the ESXi VM guest settings out just yet, imo.
Got to go out for a couple of hours now, but it's definitely worth going back over all the settings.
Enjoy.
Have you even set up a basic pfSense with minimal settings (no packages, no config changes other than IP address changes for the NICs) to see how that copes with the DDoS?
Yes. It didn't do very well.
I think this is a back-to-basics moment, like others have suggested. I know bare metal isn't an option, but making sure the guest is configured right and then installing a basic pfSense installation would be my next move. If that handles the DDoS, I'd pull the XML backups and compare differences, as it's easy to miss something when toggling various settings in situations like this.
I have asked Tim and Almabes if they have any available. Apparently it doesn't make any difference.
Good luck! :)
Thanks :)
-
Any way to monitor cron jobs in real time?
Tried crontab -l, but it says it cannot find any for the user root…
Tried changing it to crontab - admin -l, but that doesn't work either.
I want to have a console running so I can see specifically when cron runs, since it doesn't say anything in the system logs.
This is a real bitch to troubleshoot internally…
-
cat /etc/crontab
The log for cron is typically /var/log/cron.
Looks like there are also a few things run by "minicron"; look at /etc/rc:
rc:cd /tmp && /usr/sbin/cron -s 2>/dev/null
rc:/usr/local/bin/minicron 240 $varrunpath/ping_hosts.pid /usr/local/bin/ping_hosts.sh
rc:/usr/local/bin/minicron 3600 $varrunpath/expire_accounts.pid '/usr/local/sbin/fcgicli -f /etc/rc.expireaccounts'
rc:/usr/local/bin/minicron 86400 $varrunpath/update_alias_url_data.pid '/usr/local/sbin/fcgicli -f /etc/rc.update_alias_url_data'
rc: /usr/local/bin/minicron 60 /var/run/gmirror_status_check.pid /usr/local/sbin/gmirror_status_check.php
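To watch cron activity live, one approach (a sketch; this assumes cron is logging via syslog to /var/log/cron as above, and that on pfSense the log is one of its circular clog files):

```sh
# Plain FreeBSD: follow the cron log as jobs fire
tail -f /var/log/cron

# pfSense keeps circular binary logs, so use clog instead of tail
clog -f /var/log/cron
```
-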
How is your management channel set up?
Not understood. Using the vSphere client, console and PuTTY.
Either the pfSense docs or the AirVPN link mentions setting the management channel up a particular way; might be worth checking out.
Are you still getting the massive waits?
www.yellow-bricks.com/2012/07/17/why-is-wait-so-high/
Your %WAIT figure might be a red herring, as we say. :)
Basically, the link suggests %WAIT includes idle time, and %VMWAIT might be a better figure.
%WAIT is like MS including CPU idle time in the CPU processor load. ::)
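Reading that in esxtop looks roughly like this (a sketch; the %WAIT/%IDLE relationship is as described in the linked article):

```sh
# esxtop CPU view on the ESXi host:
esxtop
# Press 'c' for the CPU view, then 'e' plus the VM's GID to expand it.
# Roughly, %WAIT includes idle time: %WAIT ~ %VMWAIT + %IDLE (+ swap wait),
# so a mostly idle VM shows a huge %WAIT without actually being blocked.
# %VMWAIT excludes idle and is the better "genuinely blocked" number.
```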
-
How is your management channel set up?
Not understood. Using the vSphere client, console and PuTTY.
Either the pfSense docs or the AirVPN link mentions setting the management channel up a particular way; might be worth checking out.
Are you still getting the massive waits?
www.yellow-bricks.com/2012/07/17/why-is-wait-so-high/
Your %WAIT figure might be a red herring, as we say. :)
Basically, the link suggests %WAIT includes idle time, and %VMWAIT might be a better figure.
%WAIT is like MS including CPU idle time in the CPU processor load. ::)
%WAIT is very important.
Network -> NIC -> hypervisor kernel -> VM -> VM NIC -> VM kernel (and then back down the stack to move a packet).
If the hypervisor kernel is consuming resources, the VM isn't going to get any. So your VM wait time will be high, but the VM kernel activity will be low. You could also be dropping packets before they ever reach the VM because of high wait times. This is why enterprises regularly go through right-sizing exercises: you'll see low CPU utilisation on your VMs but very high wait times, and your performance will mysteriously suck.
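The "dropped before they reach the VM" part is visible on the host too. A sketch using esxtop's network view:

```sh
# esxtop network view on the ESXi host:
esxtop
# Press 'n' for the network view and watch the %DRPTX / %DRPRX columns.
# A non-zero %DRPRX on the pfSense VM's port means the hypervisor is
# dropping inbound packets before the guest ever sees them.
```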
-
Have you tried earlier versions of ESXi and/or pfSense?
Are you still on pfSense 2.1, or did you try 2.2?
It might be a build conflict somewhere, maybe even at the BIOS level. I've seen a BIOS make some hardware drag.
Can you try the same setup on different hardware, maybe with a different provider?
Might be worth spinning something up on the Amazon cloud to test, although that's got its own custom build of pfSense, so maybe have a look at those settings to see what differences there are. You might get some clues from that as to which settings are important.
-
Yes… but wait is not the problem.
I can easily move it onto local storage on the host and test again, if you want me to.
Notice the sudden drop in traffic and how the firewall comes alive. WHY??
-
Rather than moving it to local hardware, how about a different online provider? It might be your existing online provider who has the issue; never assume everyone knows what they are talking about. ;D
Always question everything. ;)
-
Have you tried earlier versions of ESXi and/or pfSense?
Yes. All the way back to 1.2.3, and all failed. ESXi from 4.1 to 6.0; ESXi 6.0 actually becomes unstable when testing. m0n0wall has been tested, as well as OPNsense. OPNsense did better. MikroTik wasn't even touched by the attack.
Are you still on pfSense 2.1, or did you try 2.2?
Production is on 2.1.5, and this testing is also done on 2.2.2-REL.
It might be a build conflict somewhere, maybe even at the BIOS level. I've seen a BIOS make some hardware drag.
Can you try the same setup on different hardware, maybe with a different provider?
It's not a provider issue, since I see the same pattern on 10Gbit, 1Gbit and 100Mbit connections.
Might be worth spinning something up on the Amazon cloud to test, although that's got its own custom build of pfSense, so maybe have a look at those settings to see what differences there are. You might get some clues from that as to which settings are important.
Your choice, but then we don't have a clue what's in front of it, or total control over the test environment.
-
My base FreeBSD 10.1 box will be back up and running again by Saturday. I had to repurpose its already repurposed hardware, and I'll have more time to assist with data collection.
This whole thing of having to work for a living can really get in the way of the fun with packets.
-
Wait, as stated, is not the issue.
Moved the VM to a 6-disk RAID10 local setup on an IBM x3650, from an 8-disk RAID10 NAS storage device.
Very little difference in %WAIT.
But the drop in traffic and the FW coming alive are the same. SOMETHING is making it handle differently…
-
Wait, as stated, is not the issue.
Moved the VM to a 6-disk RAID10 local setup on an IBM x3650, from an 8-disk RAID10 NAS storage device.
http://youtu.be/tD5A-kElWw8
Very little difference in %WAIT.
Wait state isn't the issue, correct. But it will mask the underlying kernel problem. You'll never see the IRQ interrupt storm on a VM because it's the hypervisor kernel that's managing the hardware, not the VM kernel.
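On bare-metal FreeBSD you can at least watch the interrupt rates directly; a minimal sketch (inside a VM the same command only shows the virtual devices, which is exactly the masking described above):

```sh
# Cumulative interrupt counts and rates per device/IRQ.
# On bare metal an interrupt storm on the NIC (e.g. the irq line
# for em0) shows up here as a runaway rate.
vmstat -i
```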
-
What happens if you slow the NIC speed right down to its slowest setting so the NIC acts as a throttle?
I don't know if it will be worth even trying half duplex at this stage.
Try the slowest speed with the System: Advanced: Networking, Network Interfaces tick boxes 2, 3 & 4 unticked, so the NIC handles more of the packet processing:
2 = Disable hardware checksum offload
3 = Disable hardware TCP segmentation offload
4 = Disable hardware large receive offload
I don't even know if these checkboxes will have any effect running as a VM in ESXi, so it might be worth setting the NIC speed in ESXi as another test.
This is just a WAG though.
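For what it's worth, the same knobs can be poked by hand from the FreeBSD shell; a sketch, assuming the em0 interface from earlier (the GUI checkboxes remain the supported, persistent way to do this):

```sh
# Show which offload capabilities are currently enabled
ifconfig em0 | grep options

# Turn the hardware offloads off by hand (mirrors ticking boxes 2-4)
ifconfig em0 -txcsum -rxcsum -tso -lro

# Drop the link to its slowest setting as a crude throttle
ifconfig em0 media 10baseT/UTP mediaopt full-duplex
```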