Intermittent dropping of random connections under high load
I have a dual-firewall (primary/secondary), single-WAN, multi-LAN (4 VLANs) configuration which seems to drop connections under high load.
I say 'seems' and 'random' because, at the moment, I do not have a reliable way of monitoring it (or even proving it is happening).
We have seen the issue via the following symptoms:
- Dropped ssh connections from one VLAN to another (common at high load) - linux-linux
- Failed Java app -> MS SQL connections (different subnets/VLANs), resulting in a 15-minute timeout and an error along the lines of either "Already closed." or "Connection timed out"
- Dropped RDP connections from the WLAN side into a host on one of the VLANs
- Dropped ssh connections from the WLAN side into a host on one of the VLANs (less common, even at high load; possibly due to different settings) - windows-linux
The firewall is the common factor (along with 2 HP switches).
I currently suspect that changing one or more settings on the firewall will resolve the issue.
Version 2.2.3-RELEASE (amd64)
built on Tue Jun 23 16:37:42 CDT 2015
MBuf 131k limit, never exceeds 40k
State table 398000 limit, never exceeds 300k
Memory (4GB) - rarely above 10%
No apparent disk space, swap or CPU issues.
System: Advanced: Firewall and NAT
[ticked] Clear invalid DF bits instead of dropping the packets
[normal] Firewall Optimization Options
[ticked] Disables the PF scrubbing option which can sometimes interfere with NFS and PPTP traffic.
System: Advanced: System Tunables
net.inet.ip.portrange.first Set the ephemeral port range to be lower. default (1024)
net.inet.tcp.blackhole Drop packets to closed TCP ports without returning a RST default (2)
net.inet.udp.blackhole Do not send ICMP port unreachable messages for closed UDP ports default (1)
net.inet.ip.random_id Randomize the ID field in IP packets (default is 0: sequential IP IDs) default (1)
net.inet.tcp.drop_synfin Drop SYN-FIN packets (breaks RFC1379, but nobody uses it anyway) default (1)
net.inet.ip.redirect Enable sending IPv4 redirects default (1)
net.inet6.ip6.redirect Enable sending IPv6 redirects default (1)
net.inet.tcp.syncookies Generate SYN cookies for outbound SYN-ACK packets default (1)
net.inet.tcp.recvspace Maximum incoming/outgoing TCP datagram size (receive) default (65228)
net.inet.tcp.sendspace Maximum incoming/outgoing TCP datagram size (send) default (65228)
net.inet.ip.fastforwarding IP Fastforwarding default (0)
net.inet.tcp.delayed_ack Do not delay ACK to try and piggyback it onto a data packet default (0)
net.inet.udp.maxdgram Maximum outgoing UDP datagram size default (57344)
net.link.bridge.pfil_onlyip Handling of non-IP packets which are not passed to pfil (see if_bridge(4)) default (0)
net.link.bridge.pfil_member Set to 0 to disable filtering on the incoming and outgoing member interfaces. default (1)
net.link.bridge.pfil_bridge Set to 1 to enable filtering on the bridge interface default (0)
net.link.tap.user_open Allow unprivileged access to tap(4) device nodes default (1)
kern.rndtest.verbose Verbosity of the rndtest driver (0: do not display results on console) default ()
kern.randompid Randomize PID's (see src/sys/kern/kern_fork.c: sysctl_kern_randompid()) default (347)
net.inet.ip.intr_queue_maxlen Maximum size of the IP input queue default (1000)
hw.syscons.kbd_reboot Disable CTRL+ALT+Delete reboot from keyboard. default (0)
net.inet.tcp.inflight.enable Enable TCP Inflight mode default ()
net.inet.tcp.log_debug Enable TCP extended debugging default (0)
net.inet.icmp.icmplim Set ICMP Limits default (0)
net.inet.tcp.tso TCP Offload Engine 0
hw.bce.tso_enable TCP Offload Engine - BCE 0
vfs.read_max Cluster read-ahead max block count 32
kern.ipc.maxsockbuf Maximum socket buffer size 4262144
net.inet.ip.process_options Enable IP options processing ([LS]SRR, RR, TS) 0 (0)
kern.random.sys.harvest.interrupt Harvest IRQ entropy 0 (0)
kern.random.sys.harvest.point_to_point Harvest serial net entropy 0 (0)
kern.random.sys.harvest.ethernet Harvest NIC entropy 0 (0)
net.route.netisr_maxqlen maximum routing socket dispatch queue length 1024
net.inet.udp.checksum compute udp checksum 1
net.bpf.zerocopy_enable Enable new zero-copy BPF buffer sessions 1
net.inet.icmp.reply_from_interface ICMP reply from incoming interface for non-local packets 1
vfs.forcesync Do full checks when switchint to RO mount of FS 1
net.inet6.ip6.rfc6204w3 Accept the default router list from ICMPv6 RA messages even when packet forwarding enabled. 1
net.enc.out.ipsec_bpf_mask IPsec output bpf mask 0x0001
net.enc.out.ipsec_filter_mask IPsec output firewall filter mask 0x0001
net.enc.in.ipsec_bpf_mask IPsec input bpf mask 0x0002
net.enc.in.ipsec_filter_mask IPsec input firewall filter mask 0x0002
net.inet.carp.senderr_demotion_factor Send error demotion factor adjustment 0 (0)
net.pfsync.carp_demotion_factor pfsync's CARP demotion factor adjustment 0 (0)
Primarily I am looking to fix the issue, but I would also like to know how to monitor it (even if only to prove that it no longer occurs).
Any help would be gratefully received.
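On the monitoring question, one possible approach is a long-lived TCP probe between two hosts on different VLANs, so that a drop gets a timestamp instead of being noticed by chance. The following is only a rough sketch in Python: it assumes a simple echo service is listening on the far host, and the function names (probe_connection, _recv_exact) and parameters are made up for illustration, not part of pfSense or any existing tool.

```python
import socket
import time


def _recv_exact(sock, nbytes):
    """Read exactly nbytes from the socket or raise if the peer closes."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def probe_connection(host, port, interval=1.0, count=5, timeout=3.0):
    """Hold one TCP connection open to an echo service (assumed to exist)
    and send a heartbeat every `interval` seconds.

    Returns a list of (timestamp, ok, detail) tuples, one per heartbeat,
    stopping at the first failure.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return [(time.time(), False, f"connect failed: {exc}")]

    results = []
    with sock:
        sock.settimeout(timeout)
        for seq in range(count):
            payload = f"ping {seq}\n".encode()
            try:
                sock.sendall(payload)
                reply = _recv_exact(sock, len(payload))
                ok = reply == payload
                detail = "ok" if ok else "unexpected reply"
            except OSError as exc:
                # Covers timeouts, resets and the peer closing mid-stream.
                ok, detail = False, f"dropped: {exc}"
            results.append((time.time(), ok, detail))
            if not ok:
                break
            time.sleep(interval)
    return results
```

Run with a large count from a host on one VLAN against a host on another, the first entry with ok set to False gives a time that can be lined up against the firewall's logs and graphs.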
What's in your system log?
Anything like this?
Dec 14 21:00:36 kernel [zone: pf states] PF states limit reached
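Once the system log is being kept, a small filter can pull out lines like the one above. This is a minimal sketch, assuming the log has been exported as plain text lines; the function name find_state_limit_events is hypothetical, and the match string is taken from the example line rather than from any documented log format.

```python
import re

# Message text copied from the example log line; treat it as an assumption
# about what the kernel prints on this pfSense version.
STATE_LIMIT_RE = re.compile(r"PF states limit reached")


def find_state_limit_events(lines):
    """Return the log lines that report the pf state limit being hit."""
    return [line.rstrip("\n") for line in lines if STATE_LIMIT_RE.search(line)]
```

Any iterable of strings works as input, for example an open file handle on the exported log.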
We weren't keeping the system log (we are now, but the issue hasn't occurred again as the load hasn't been high enough yet). Looking at the graphs, though, the state table never exceeds 75% of its maximum.
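For reference, the 75% figure follows from the numbers in the first post (a peak of about 300k states against the 398,000 limit). A tiny sketch of that arithmetic, with a hypothetical helper name and sample values chosen purely for illustration:

```python
def peak_utilization(samples, limit):
    """Return the highest sampled state count as a fraction of the limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return max(samples) / limit


# Illustrative samples only; 300000 and 398000 are the figures quoted above.
ratio = peak_utilization([120000, 300000, 250000], 398000)
print(f"peak state table utilization: {ratio:.1%}")
```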
I have increased some defaults as they seem like common sense (the blackhole change is to allow the Java/SQL to fail quicker):
Firewall Maximum States 1,000,000 (was 398,000)
net.inet.tcp.blackhole Drop packets to closed TCP ports without returning a RST 1 (was 2)
kern.ipc.nmbclusters 262,144 (was 131,072)
kern.maxfiles 1,000,000 (was 127,587)
kern.maxfilesperproc 500,000 (was 114,822)
kern.ipc.soacceptqueue 1,024 (was 128)
Any other ideas please?