[RESOLVED]: Disconnects seemingly when under load
The IPSEC site-to-site VPN I am running between two sites keeps going down. It doesn't happen every day, but on the days it does, the consistent factor is that it drops under load, while backup copy jobs are being pushed to the remote site. When the IPSEC link goes down, the WAN stays up.
Site info, IPSEC settings and syslogs below;
Site 1 is on a 100Mbps leased line behind the pfSense box in question.
Site 2 is on a 1Gbps leased line behind a SonicWall NSA 3500.
Site 2 is also connected to another remote site via IPSEC which does not have this issue. All sites & connections are ours, nothing is shared.
The pfSense box at Site 1 has;
• 2x 1Gb Intel 82574L NICs, one as WAN and the other as LAN, under which VLAN interfaces have been created and routing is set up between some of the interfaces.
• Dual Core Intel Pentium G2030 @ 3.00GHz
• 4GB RAM
• 300GB HDD
I have been running MultiPing (https://www.multiping.com/) from a laptop on the production VLAN at Site 1. The ping reply times go up around the time the backups or any large files are being pushed across from Site 2. When the IPSEC link fails, it is while a backup copy job is running and after roughly 30 minutes to 1 hour. I also notice that the ping reply times from 2 devices on another VLAN rise slightly while the backup copy jobs / inter-site file transfers are going on. The ping times to Google drop back to normal when the IPSEC link fails.
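To line the latency spikes up against the IPSEC log timestamps, the same kind of data MultiPing graphs could also be captured as timestamped text. A minimal sketch follows, assuming a Unix-style system ping (the `-c` count flag; Windows uses `-n`). The function names, the CSV file name and the example addresses are placeholders, not details from the setup above.

```python
import re
import subprocess
import time
from datetime import datetime

# Matches the round-trip time in typical ping output, e.g. "time=12.3 ms" or "time<1ms".
RTT_PATTERN = re.compile(r"time[=<]\s*([\d.]+)\s*ms", re.IGNORECASE)

def parse_rtt_ms(ping_output):
    """Return the first round-trip time found in ping output, or None if there is none."""
    match = RTT_PATTERN.search(ping_output)
    return float(match.group(1)) if match else None

def ping_once(host, timeout_s=5):
    """Send one echo request with the system ping (Unix-style '-c' flag assumed)."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    return parse_rtt_ms(result.stdout)

def log_pings(hosts, interval_s=5.0, logfile="ping_log.csv"):
    """Append 'timestamp,host,rtt_ms' rows until interrupted with Ctrl+C."""
    with open(logfile, "a") as fh:
        while True:
            for host in hosts:
                rtt = ping_once(host)
                stamp = datetime.now().isoformat(timespec="seconds")
                # An empty rtt field marks a lost or timed-out reply.
                fh.write(f"{stamp},{host},{'' if rtt is None else rtt}\n")
            fh.flush()
            time.sleep(interval_s)

# Example with placeholder addresses (one per subnet of interest):
# log_pings(["10.0.8.1", "10.22.0.1", "10.21.0.1"])
```

Rows with an empty last field mark lost replies, so the moment the tunnel drops shows up as a run of blanks for the Site 2 addresses only.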
MultiPing screenshot during an outage;
(again run from a machine on the production VLAN at Site 1)
The 10.0.8.* IPs are devices at Site 2.
10.22.0.* are devices on the backup VLAN at Site 1.
10.21.0.* are all devices on the production VLAN at Site 1.
IPSEC config / Info
IPSEC settings for Site 1 / pfSense;
IPSEC settings for Site 2 / SonicWall;
System Activity during transfer;
The date and time on all are the same.
Both log files below are under 1MB. The WAN IP addresses have been replaced with SITE1WANIP and SITE2WANIP respectively.
Site 1, IPSEC filtered Syslog - https://www.dropbox.com/s/10awipt5afxxkxl/ipsecSyslog.txt?dl=0
I filtered the log by the process name ‘charon’; happy to change this if a different filter would be more useful.
Site 1, Full Syslog - https://www.dropbox.com/s/lwcgx1j4ezopj7n/pfsenseFullSyslog.txt?dl=0
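The charon filtering and the address substitution mentioned above could be done with something like the following. This is a minimal sketch, not the exact method used for the linked files: the function names are made up for illustration, the syslog layout (a process tag followed by a colon, optionally with a [pid]) is an assumption, and the example addresses are from the documentation ranges rather than the real WAN IPs.

```python
import re

def filter_by_process(lines, process):
    """Yield syslog lines whose process tag is `process` (optional [pid] allowed)."""
    tag = re.compile(r"\s" + re.escape(process) + r"(?:\[\d+\])?:\s")
    for line in lines:
        if tag.search(line):
            yield line

def redact_ips(text, replacements):
    """Replace each exact IPv4 address in `replacements` with its placeholder."""
    for ip, placeholder in replacements.items():
        # Digit lookarounds stop 203.0.113.1 from matching inside 203.0.113.10.
        pattern = r"(?<!\d)" + re.escape(ip) + r"(?!\d)"
        text = re.sub(pattern, lambda _m, p=placeholder: p, text)
    return text

# Placeholder addresses from the documentation ranges, not the real WAN IPs.
REPLACEMENTS = {"203.0.113.1": "SITE1WANIP", "198.51.100.10": "SITE2WANIP"}

def clean_log(lines, process="charon", replacements=REPLACEMENTS):
    """Filter to one process, then redact the WAN addresses in what remains."""
    return [redact_ips(line, replacements)
            for line in filter_by_process(lines, process)]

# Example: produce a shareable charon-only log from the full syslog.
# with open("pfsenseFullSyslog.txt") as src, open("ipsecSyslog.txt", "w") as dst:
#     dst.writelines(clean_log(src))
```

Changing the `process` argument gives a differently filtered log without repeating the redaction step by hand.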
What should be my next ports of call to pin this down?
Is it one of the NICs doing too much, and if so, how can I confirm this? What can I do to resolve it effectively?
If you require any extra info please let me know.
FYI this appears to have been resolved as per below and does not appear to have been load related;
The SonicWall syslog was more revealing: "IKEv2 IPsec proposal does not match: DH Group mismatch" & "VPN Policy: GHtoBH; ESP TFC Padding not Supported". I'm not sure why, but checking "Enable Perfect Forward Secrecy" appears to have fixed it, although "ESP TFC Padding not Supported" still appears in the logs. Article here: https://www.sonicwall.com/en-us/support/knowledge-base/170505666326684.
If anyone can offer an explanation, that would be appreciated. I'm not 100% convinced it is resolved, as I feel it may just have renegotiated and worked. However, this is a finger-in-the-air feeling and does not come from any solid fact!
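One possible explanation, offered as an assumption rather than something confirmed from these logs: in IKEv2 the first CHILD_SA is keyed from the IKE SA during IKE_AUTH, so the DH group in the ESP proposal (the PFS setting) is not compared until the CHILD_SA is rekeyed. A PFS mismatch between the two ends would therefore let the tunnel come up and then fail at the first rekey, which would fit a "DH Group mismatch" message that stops once PFS is enabled. The toy model below only illustrates that timing; the function name is made up and group 14 is an example value, not taken from the configs above.

```python
def child_sa_proposal_accepted(local_pfs_group, remote_pfs_group, is_rekey):
    """Toy model of ESP proposal matching on the DH (PFS) group only.

    None means PFS is disabled on that side. Real negotiation also compares
    encryption and integrity algorithms, which are ignored here.
    """
    if not is_rekey:
        # Initial CHILD_SA: keys come from the IKE SA, so no DH group is compared.
        return True
    # Rekey (CREATE_CHILD_SA): both sides must agree on the DH group, or on none.
    return local_pfs_group == remote_pfs_group

# Hypothetical mismatch: one side uses PFS group 14, the other has PFS disabled.
print(child_sa_proposal_accepted(14, None, is_rekey=False))  # initial setup
print(child_sa_proposal_accepted(14, None, is_rekey=True))   # first rekey
print(child_sa_proposal_accepted(14, 14, is_rekey=True))     # PFS set on both ends
```

If this is what happened, the tunnel should now survive a rekey, so watching the logs at the next Phase 2 lifetime expiry would be a more solid check than waiting for another backup run.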