What is the biggest attack, in Gbps, that you have stopped?
-
cat /etc/crontab
The log for cron is typically /var/log/cron. Looks like there are also a few things run by "minicron"; look at /etc/rc:
rc:cd /tmp && /usr/sbin/cron -s 2>/dev/null
rc:/usr/local/bin/minicron 240 $varrunpath/ping_hosts.pid /usr/local/bin/ping_hosts.sh
rc:/usr/local/bin/minicron 3600 $varrunpath/expire_accounts.pid '/usr/local/sbin/fcgicli -f /etc/rc.expireaccounts'
rc:/usr/local/bin/minicron 86400 $varrunpath/update_alias_url_data.pid '/usr/local/sbin/fcgicli -f /etc/rc.update_alias_url_data'
rc: /usr/local/bin/minicron 60 /var/run/gmirror_status_check.pid /usr/local/sbin/gmirror_status_check.php
-
How's your management channel set up?
Not understood. Using the vSphere client, console and PuTTY.
Either the pfSense docs or the AirVPN link mentions having the management channel set up a particular way; might be worth checking out.
Are you still getting the massive waits?
www.yellow-bricks.com/2012/07/17/why-is-wait-so-high/
Your %wait figure might be a red herring, as we say. :)
Basically, the link suggests that %wait includes idle time, and %vmwait might be a better figure.
%wait is like MS including CPU idle time in the processor load figure. ::)
-
How's your management channel set up?
Not understood. Using the vSphere client, console and PuTTY.
Either the pfSense docs or the AirVPN link mentions having the management channel set up a particular way; might be worth checking out.
Are you still getting the massive waits?
www.yellow-bricks.com/2012/07/17/why-is-wait-so-high/
Your %wait figure might be a red herring, as we say. :)
Basically, the link suggests that %wait includes idle time, and %vmwait might be a better figure.
%wait is like MS including CPU idle time in the processor load figure. ::)
%wait is very important.
Network -> NIC -> hypervisor kernel -> VM -> VM NIC -> VM kernel (and then back down the stack to move a packet)
If the hypervisor kernel is consuming resources, the VM isn't going to get any. So your VM wait time will be high, but the VM kernel activity will be low. You could also be dropping packets prior to getting to the VM because of high wait times. This is why enterprises regularly go through right-sizing activities. You'll see low CPU utilization on your VMs but very high wait times, and your performance will mysteriously suck.
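One way to see the idle-vs-blocked distinction in numbers is an esxtop batch capture on the host (`esxtop -b -d 5 > cap.csv`). The sketch below works against an inlined sample in that shape; the column names and values are simplified for illustration, not real esxtop headers:

```shell
# Tiny sample shaped like an esxtop batch CSV (illustrative columns/values)
cat > /tmp/esxtop_sample.csv <<'EOF'
Time,% Wait,% VmWait
11:20:23,396,0
11:20:28,398,1
EOF

# Average both columns: a huge %WAIT alongside a near-zero %VMWAIT
# usually means idle time is being counted, not real blocking
awk -F',' 'NR > 1 { w += $2; v += $3; n++ }
           END { printf "%%WAIT avg=%.2f %%VMWAIT avg=%.2f\n", w/n, v/n }' /tmp/esxtop_sample.csv
```

If %VMWAIT stays near zero while %WAIT is huge, the wait figure is mostly idle time, as the yellow-bricks link suggests.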
-
Have you tried earlier versions of ESXi and/or pfSense?
Are you still on pfSense 2.1, or did you try 2.2?
It might be a build conflict somewhere, maybe even at the BIOS level. I've seen a BIOS make some hardware drag.
Can you try the same setup on different hardware, maybe with a different provider?
Might be worth spinning something up on the Amazon cloud to test, although that's got its own custom build of pfSense, so maybe have a look at those settings to see what differences there are. You might get some clues from that as to which settings are important.
-
Yes… but %wait is not the problem.
I can easily move it onto local storage on the host, if you want me to, and test again.
Notice the sudden drop in traffic, and the firewall comes alive. WHY??
-
Rather than move it to local hardware, how about a different online provider? It might be your existing online provider who has the issue; never assume everyone knows what they are talking about. ;D
Always question everything. ;)
-
Have you tried earlier versions of ESXi and/or pfSense?
Yes. All the way back to 1.2.3, and all failed. ESXi from 4.1 to 6.0; ESXi 6.0 actually becomes unstable when testing. m0n0wall has been tested, as well as OPNsense. OPNsense did better. MikroTik wasn't even touched by the attack.
Are you still on pfSense 2.1, or did you try 2.2?
Production is on 2.1.5, and this testing is also done on 2.2.2-RELEASE.
It might be a build conflict somewhere, maybe even at the BIOS level. I've seen a BIOS make some hardware drag.
Can you try the same setup on different hardware, maybe with a different provider?
It's not a provider issue, since I see the same pattern on 10 Gbit, 1 Gbit and 100 Mbit connections.
Might be worth spinning something up on the Amazon cloud to test, although that's got its own custom build of pfSense, so maybe have a look at those settings to see what differences there are. You might get some clues from that as to which settings are important.
Your choice, but then we don't have a clue what's in front of it, nor total control over the test environment.
-
My base FreeBSD 10.1 box will be back up and running again by Saturday. I had to repurpose its already repurposed hardware, and I'll have more time to assist with data collection.
This whole thing of having to work for a living can really get in the way of the fun with packets.
-
Wait, as stated, is not the issue.
Moved the VM from an 8-disk RAID10 NAS storage device to a 6-disk RAID10 local setup on an IBM x3650.
Very little difference in %WAIT.
But the drop in traffic and the FW coming alive are the same. SOMETHING is making it handle differently…
-
Wait, as stated, is not the issue.
Moved the VM from an 8-disk RAID10 NAS storage device to a 6-disk RAID10 local setup on an IBM x3650.
http://youtu.be/tD5A-kElWw8
Very little difference in %WAIT.
Wait state isn't the issue, correct. But it will mask the underlying kernel problem. You'll never see the interrupt storm on a VM, because it's the hypervisor kernel that's managing the hardware, not the VM kernel.
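On bare metal, an interrupt storm shows up in `vmstat -i` as an absurd rate on one source. A sketch against inlined sample output; the device names and rates here are made up for illustration:

```shell
# Sample vmstat -i style output (illustrative; last column is rate/sec)
cat > /tmp/vmstat_i_sample <<'EOF'
interrupt                          total       rate
irq256: em0                    123456789     250000
irq257: em1                       456789        120
cpu0:timer                       9876543       1000
EOF

# Flag any interrupt source running above 100k/sec - a likely storm
awk 'NR > 1 && $NF + 0 > 100000 { print $1, $NF }' /tmp/vmstat_i_sample
```

Run on the VM, this will look innocent even during the attack, which is exactly the masking effect described above.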
-
What happens if you slow the NIC speed right down to its slowest setting, so the NIC acts as a throttle?
I don't know if it will be worth even trying half duplex at this stage. Try the slowest speed with the System:Advanced:Networking, Network Interfaces tick boxes 2, 3 & 4 unticked, so the NIC handles more of the packet processing:
2 = disable hw checksum offload
3 = disable hw TCP segmentation offload
4 = disable hw large receive offload
I don't even know if these check boxes will have any effect [edit]running as a VM[/edit] in ESXi either, so it might be worth setting the NIC speed in ESXi as another test.
This is just a WAG though.
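For reference, those tick boxes map to standard FreeBSD ifconfig capability flags, so the same experiment can be run from the shell. A sketch only: `em1` is an assumed interface name, and the available media types depend on the driver:

```shell
# The three tick boxes map to FreeBSD ifconfig capabilities
# (em1 is an assumed interface name; adjust to your NIC)
ifconfig em1 txcsum rxcsum   # box 2 unticked: hw checksum offload stays ON
ifconfig em1 tso             # box 3 unticked: hw TCP segmentation offload stays ON
ifconfig em1 lro             # box 4 unticked: hw large receive offload stays ON
# (prefix a flag with '-', e.g. -tso, to disable it, i.e. box ticked)

# Crude throttle: force the slowest media the driver supports
ifconfig em1 media 10baseT/UTP mediaopt full-duplex
```

Changes made this way don't survive a reboot; pfSense normally manages these via the GUI tick boxes above.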
-
It's not about the problem being better on bare metal. It's about reducing the number of variables in the test. A basic principle in problem isolation is to eliminate as many variables as possible. You identify the simplest configuration that demonstrates the problem, and then work with that.
VM infrastructure is a massive variable when you are trying to diagnose an under-load kernel issue.
I couldn't agree more.
Please, please, please stop wasting your time testing this issue on a hypervisor. Put pfSense on bare metal and test it there.
IT ISN'T BETTER ON BARE METAL. The problem still exists. I tried several times on my bare-metal Supermicro.
Read the thread and the other threads again; you will see the history. I am, though, not using pfSense anymore, so I can no longer test.
-
Nothing so far, despite setting everything on ESXi, switch and pfSense…
Playing around with settings on disabling offloading didn't yield anything either.
Not a blip of difference.
What happens if you slow the NIC speed right down to its slowest setting, so the NIC acts as a throttle?
I don't know if it will be worth even trying half duplex at this stage. Try the slowest speed with the System:Advanced:Networking, Network Interfaces tick boxes 2, 3 & 4 unticked, so the NIC handles more of the packet processing:
2 = disable hw checksum offload
3 = disable hw TCP segmentation offload
4 = disable hw large receive offload
I don't even know if these check boxes will have any effect [edit]running as a VM[/edit] in ESXi either, so it might be worth setting the NIC speed in ESXi as another test.
This is just a WAG though.
-
It's not about the problem being better on bare metal. It's about reducing the number of variables in the test. A basic principle in problem isolation is to eliminate as many variables as possible. You identify the simplest configuration that demonstrates the problem, and then work with that.
VM infrastructure is a massive variable when you are trying to diagnose an under-load kernel issue.
I couldn't agree more.
Please, please, please stop wasting your time testing this issue on a hypervisor. Put pfSense on bare metal and test it there.
IT ISN'T BETTER ON BARE METAL. The problem still exists. I tried several times on my bare-metal Supermicro.
Read the thread and the other threads again; you will see the history. I am, though, not using pfSense anymore, so I can no longer test.
Ditto, ditto, ditto…
Please don't make me resurrect the KISS philosophy subject again. It doesn't seem to do well for my karma. ;)
-
I think the inevitable use of DTrace is upon us. :)
I don't know the status of FreeBSD 11, but if it's at least at release-candidate status, is it worth trying to port pfSense onto a FreeBSD 11 build?
Although even if we could, if the problems still showed up, we'd still need DTrace…
Is there really no debugging facility built into pfSense?
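If DTrace does land, even the canonical one-liners would be a useful first pass while the attack traffic is flowing. Standard DTrace syntax, run as root on the box itself; a sketch only, untested here:

```shell
# Count syscalls by process name while the attack is running
dtrace -n 'syscall:::entry { @[execname] = count(); }'

# Sample kernel stacks ~997 times/sec to see where kernel time is going
dtrace -n 'profile-997 /arg0/ { @[stack()] = count(); }'
```

The second one is the interesting one for this thread: if the box is drowning in the network stack or an interrupt path, the hottest kernel stacks will say so directly.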
-
I get this in the system logs as of today…
php-fpm[60486]: /interfaces.php: The command '/usr/local/sbin/dhcpd -user dhcpd -group _dhcp -chroot /var/dhcpd -cf /etc/dhcpd.conf -pf /var/run/dhcpd.pid em1' returned exit code '1', the output was 'Internet Systems Consortium DHCP Server 4.2.6 Copyright 2004-2014 Internet Systems Consortium. All rights reserved. For info, please visit https://www.isc.org/software/dhcp/ Wrote 13 leases to leases file. Can't install new lease database /var/db/dhcpd.leases.1433496021 to /var/db/dhcpd.leases: No such file or directory Listening on BPF/em1/00:50:56:a5:77:03/192.168.10.0/24 Sending on BPF/em1/00:50:56:a5:77:03/192.168.10.0/24 Can't bind to dhcp address: Address already in use Please make sure there is no other dhcp server running and that there's no entry for dhcp or bootp in /etc/inetd.conf. Also make sure you are not running HP JetAdmin software, which includes a bootp server. If you did not get this software from ftp.isc.org, please get the latest from ftp.isc.org and install that befo
And this
php-fpm[85395]: /rc.newwanip: The command '/usr/local/sbin/dhcpd -user dhcpd -group _dhcp -chroot /var/dhcpd -cf /etc/dhcpd.conf -pf /var/run/dhcpd.pid em1' returned exit code '1', the output was 'Internet Systems Consortium DHCP Server 4.2.6 Copyright 2004-2014 Internet Systems Consortium. All rights reserved. For info, please visit https://www.isc.org/software/dhcp/ Wrote 13 leases to leases file. Listening on BPF/em1/00:50:56:a5:77:03/192.168.10.0/24 Sending on BPF/em1/00:50:56:a5:77:03/192.168.10.0/24 Can't bind to dhcp address: Address already in use Please make sure there is no other dhcp server running and that there's no entry for dhcp or bootp in /etc/inetd.conf. Also make sure you are not running HP JetAdmin software, which includes a bootp server. If you did not get this software from ftp.isc.org, please get the latest from ftp.isc.org and install that before requesting help. If you did get this software from ftp.isc.org and have not yet read the README, please read it bef
It comes right after this:
| Jun 5 11:20:28 | check_reload_status: Reloading filter |
| Jun 5 11:20:28 | check_reload_status: Restarting OpenVPN tunnels/interfaces |
| Jun 5 11:20:28 | check_reload_status: Restarting ipsec tunnels |
| Jun 5 11:20:28 | check_reload_status: updating dyndns Yousee |
| Jun 5 11:20:28 | check_reload_status: Reloading filter |
| Jun 5 11:20:28 | check_reload_status: Restarting OpenVPN tunnels/interfaces |
| Jun 5 11:20:28 | check_reload_status: Restarting ipsec tunnels |
| Jun 5 11:20:28 | check_reload_status: updating dyndns Yousee |
| Jun 5 11:20:26 | check_reload_status: Syncing firewall |
| Jun 5 11:20:26 | check_reload_status: Syncing firewall |
| Jun 5 11:20:25 | check_reload_status: Reloading filter |
| Jun 5 11:20:25 | php-fpm[9696]: /rc.start_packages: Reloading Squid for configuration sync |
| Jun 5 11:20:25 | php-fpm[9696]: /rc.start_packages: Restarting/Starting all packages. |
| Jun 5 11:20:23 | lighttpd[33558]: (connections.c.1692) SSL (error): 5 -1 1 Operation not permitted |
| Jun 5 11:20:23 | lighttpd[33558]: (connections.c.619) connection closed: write failed on fd 20 |
| Jun 5 11:20:23 | lighttpd[33558]: (network_openssl.c.118) SSL: 5 -1 1 Operation not permitted |
| Jun 5 11:20:23 | check_reload_status: Reloading filter |
| Jun 5 11:20:23 | check_reload_status: Starting packages |
| Jun 5 11:20:23 | php-fpm[85395]: /rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - 80.197.148.74 -> 80.197.148.74 - Restarting packages. |
| Jun 5 11:20:23 | php-fpm[60486]: /interfaces.php: Removing static route for monitor 81.19.224.67 and adding a new route through 80.197.148.1 |
| Jun 5 11:20:21 | php-fpm[85395]: /rc.newwanip: Resyncing OpenVPN instances for interface WAN. |
| Jun 5 11:20:21 | check_reload_status: updating dyndns wan |
-
Did you set an IP address in ESXi for that NIC?
-
No, it gets that from DHCP on WAN from my ISP. The hypervisor mgmt network is on the LAN.
DHCP is running on LAN, issuing IPs to a Linksys wireless AP in bridge mode.
I do not run HP JetAdmin software on the LAN.
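"Address already in use" on the DHCP port usually means a stale dhcpd instance is still holding the socket when pfSense restarts the service. On FreeBSD, `sockstat -4 -l` shows who owns it; the sketch below runs against inlined sample output (PIDs and addresses are illustrative):

```shell
# Sample sockstat -4 -l output (illustrative)
cat > /tmp/sockstat_sample <<'EOF'
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
dhcpd    dhcpd    12345    7 udp4   192.168.10.1:67       *:*
dhcpd    dhcpd    54321    9 udp4   192.168.10.1:67       *:*
EOF

# List everything bound to the DHCP server port; two or more matches
# means two servers are fighting over it
awk '$6 ~ /:67$/ { print $2, $3, $6 }' /tmp/sockstat_sample
```

Two PIDs on :67, as in this sample, would explain the bind failure appearing right after rc.newwanip restarts packages.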
-
Disabled apinger…
All I get in the logs then is this…
-
This might be an approach for getting DTrace working on pfSense:
https://forum.pfsense.org/index.php?topic=94838.msg527131#msg527131
Different package, but it might work. A bit like how the only way to get RPis working virtually is on Ubuntu with QEMU installed to emulate the ARMs, at which point, by virtue of stonking fast hardware, it becomes a lot quicker to set up FreeBSD and pfSense on RPis. But I digress.