pfSense clears states. Help needed



  • Hello. I'm facing a very strange issue; any help is greatly appreciated.

    There are two pfSense routers (version 2.3.2-RELEASE-p1, though I first hit this issue on 2.2.5/2.2.6) in HA mode. Sometimes one of the routers starts dropping traffic by clearing firewall states. Most of the time it has happened on the MASTER node, but a few days ago it happened on the BACKUP node as well.

    It looks like this: I SSH into the BACKUP node and the connection stalls after a few (up to 10-15) seconds. tcpdump on the BACKUP node console shows incoming TCP packets to port 22, but no replies. I checked the state table (pfctl -ss) and found that it no longer had an entry for my SSH connection.
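
    The state-table check described above can be scripted so that the moment the entry disappears is recorded. This is only a rough sketch: it assumes a Python 3 interpreter is available on the node (it may have to be installed), that `pfctl -ss` prints one state per line with whitespace-separated `address:port` fields, and the function names and addresses are made up for illustration.

```python
import subprocess
import time


def matching_states(pfctl_output, addr, port):
    """Return the lines of `pfctl -ss` text that contain addr:port as a field."""
    needle = f"{addr}:{port}"
    # Compare whole whitespace-separated fields so that port 22
    # does not accidentally match port 2200.
    return [line for line in pfctl_output.splitlines()
            if needle in line.split()]


def watch(addr, port, interval=1.0):
    """Poll the state table until no state for addr:port is left."""
    while True:
        out = subprocess.run(["pfctl", "-ss"], capture_output=True,
                             text=True, check=True).stdout
        if not matching_states(out, addr, port):
            print("state gone at", time.strftime("%H:%M:%S"))
            return
        time.sleep(interval)

# Example (hypothetical address of the BACKUP node), run as root on the console:
# watch("192.168.210.2", 22)
```

    The printed timestamp can then be compared against the system logs and the cron schedule to see whether anything runs at the same moment.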

    After some research I found that the issue goes away when I switch the CARP interface off (by clearing the "Enable" checkbox in the web interface). Re-enabling the interface brings the issue back. Temporarily disabling CARP (on the BACKUP node) instead of switching the interface off doesn't help.
    It also helps if I disable the firewall on the BACKUP node with the "pfctl -d" command. After this the states still get cleared on the MASTER node but remain on the BACKUP.

    At the moment I'm trying to find out which part of the system clears the states on the MASTER node. I tried stopping service daemons, checked the Schedule States (it's ticked) and Killing States (it's cleared) checkboxes, ran "pfctl -xl" (nothing appears in the logs while the states are being cleared), and reconfigured HA from scratch using LAGG instead of plain Ethernet for the SYNC interface. Nothing helps. From time to time the issue disappears, but then it comes back again.

    I thought about trying dtrace, but it appears to be unavailable:
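
    One more thing that might narrow down the timing is sampling the state-table counters from `pfctl -si` and watching for jumps in removals. The sketch below only parses that output; it assumes the "State Table" section lists "current entries", "searches", "inserts" and "removals" followed by a total, and the function names are hypothetical. It shows when states are removed, not which component removes them.

```python
import re


def state_counters(pfctl_si_output):
    """Extract the state-table totals from `pfctl -si` text as a dict of ints."""
    counters = {}
    for name in ("current entries", "searches", "inserts", "removals"):
        # Each counter is expected on its own line: "<name>   <total>   [rate]"
        m = re.search(rf"^\s*{name}\s+(\d+)", pfctl_si_output, re.MULTILINE)
        if m:
            counters[name] = int(m.group(1))
    return counters


def removals_between(before, after):
    """Number of states removed between two samples of state_counters()."""
    return after["removals"] - before["removals"]
```

    Taking a sample every second and logging the difference would show whether the states vanish in one burst or trickle away gradually.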

    dtrace: failed to initialize dtrace: DTrace device not available on system

    Here is the list of running processes on the MASTER node:

    USER    PID  %CPU %MEM  VSZ  RSS TT  STAT STARTED        TIME COMMAND
    root      11 190.0  0.0    0    16  -  RL  12Oct16 35124:28.52 [idle]
    root      12  9.2  0.0    0  176  -  WL  12Oct16  905:49.41 [intr]
    root  34323  1.0  0.5 12636  5388  -  Ss  Tue08PM    3:29.85 /usr/local/sbin/openvpn --config /var/etc/openvpn/server2.conf
    root  87968  0.4  3.1 85624 31468  -  S    4:47PM    0:00.22 php-fpm: pool nginx (php-fpm)
    root    2886  0.2  0.5 12104  4764  -  Ss  Wed05PM    46:43.68 /usr/local/sbin/miniupnpd -f /var/etc/miniupnpd.conf -P /var/run/miniupnpd.pid
    root      15  0.1  0.0    0    8  -  DL  12Oct16    62:55.86 [rand_harvestq]
    root      0  0.0  0.0    0    88  -  DLs  12Oct16    0:03.24 [kernel]
    root      1  0.0  0.1  9060  688  -  ILs  12Oct16    0:00.08 /sbin/init --
    root      2  0.0  0.0    0    8  -  DL  12Oct16    0:00.00 [crypto]
    root      3  0.0  0.0    0    8  -  DL  12Oct16    0:00.00 [crypto returns]
    root      4  0.0  0.0    0    16  -  DL  12Oct16    1:22.12 [cam]
    root      5  0.0  0.0    0    8  -  DL  12Oct16    0:05.17 [fdc0]
    root      6  0.0  0.0    0    8  -  DL  12Oct16    12:37.83 [pf purge]
    root      7  0.0  0.0    0    8  -  DL  12Oct16    0:00.00 [sctp_iterator]
    root      8  0.0  0.0    0    16  -  DL  12Oct16    0:16.04 [pagedaemon]
    root      9  0.0  0.0    0    8  -  DL  12Oct16    0:00.00 [vmdaemon]
    root      10  0.0  0.0    0    8  -  DL  12Oct16    0:00.00 [audit]
    root      13  0.0  0.0    0    16  -  DL  12Oct16    0:00.00 [ng_queue]
    root      14  0.0  0.0    0    24  -  DL  12Oct16    0:00.14 [geom]
    root      16  0.0  0.0    0  200  -  DL  12Oct16    0:25.31 [usb]
    root      17  0.0  0.0    0    8  -  DL  12Oct16    0:54.01 [acpi_thermal]
    root      18  0.0  0.0    0    8  -  DL  12Oct16    0:00.84 [acpi_cooling0]
    root      19  0.0  0.0    0    8  -  DL  12Oct16    0:01.56 [idlepoll]
    root      20  0.0  0.0    0    8  -  DL  12Oct16    0:00.02 [pagezero]
    root      21  0.0  0.0    0    8  -  DL  12Oct16    0:07.42 [bufdaemon]
    root      22  0.0  0.0    0    8  -  DL  12Oct16    10:48.01 [syncer]
    root      23  0.0  0.0    0    8  -  DL  12Oct16    0:05.99 [vnlru]
    root      58  0.0  0.0    0    8  -  DL  12Oct16    0:14.21 [md0]
    root    614  0.0  2.5 81528 25456  -  Ss  12Oct16    1:56.85 php-fpm: master process (/usr/local/lib/php-fpm.conf) (php-fpm)
    root    658  0.0  0.4  9404  4304  -  Is  12Oct16    0:00.56 /sbin/devd -q
    root    1001  0.0  0.2 10108  1892  -  Ss  Tue07PM    0:00.93 /usr/sbin/cron -s
    root    1717  0.0  0.2 10172  1888  -  Is  Wed05PM    0:00.00 /usr/local/sbin/upsmon
    uucp    1982  0.0  0.2 10172  1904  -  S    Wed05PM    0:18.83 /usr/local/sbin/upsmon
    root    7146  0.0  0.7 17644  7136  -  Ss  12:34PM    0:01.56 sshd: root@pts/0 (sshd)
    root  11992  0.0  0.7 15032  6800  -  Is  12Oct16    0:00.02 /usr/sbin/sshd
    root  12058  0.0  0.2 14328  1836  -  Is  12Oct16    0:00.02 /usr/local/sbin/sshlockout_pf 15
    root  12702  0.0  0.2 10148  1896  -  Ss  13Oct16    2:32.17 /usr/sbin/syslogd -s -c -c -l /var/dhcpd/var/run/log -P /var/run/syslog.pid -f /var/etc/syslog.conf
    root  14566  0.0  0.6 12636  5648  -  Ss  Tue08PM    0:30.55 /usr/local/sbin/openvpn --config /var/etc/openvpn/server1.conf
    root  21958  0.0  0.2 10236  1988  -  Ss  12Oct16    1:05.44 /usr/local/sbin/filterlog -i pflog0 -p /var/run/filterlog.pid
    root  24578  0.0  0.2 10424  2072  -  Is  12Oct16    0:00.44 /usr/local/sbin/xinetd -syslog daemon -f /var/etc/xinetd.conf -pidfile /var/run/xinetd.pid
    root  34879  0.0  0.2 10232  1784  -  Is  13Oct16    0:00.02 /usr/local/sbin/sshlockout_pf 15
    root  39239  0.0  0.5 23924  5380  -  Is  12Oct16    0:00.00 nginx: master process /usr/local/sbin/nginx -c /var/etc/nginx-webConfigurator.conf (nginx)
    root  39358  0.0  0.6 23924  6316  -  S    12Oct16    2:52.30 nginx: worker process (nginx)
    root  39634  0.0  0.6 23924  6292  -  S    12Oct16    2:49.24 nginx: worker process (nginx)
    root  40986  0.0  0.2  9948  1572  -  Is  12Oct16    0:00.00 /usr/local/bin/minicron 240 /var/run/ping_hosts.pid /usr/local/bin/ping_hosts.sh
    root  41233  0.0  0.2  9948  1584  -  I    12Oct16    0:00.52 minicron: helper /usr/local/bin/ping_hosts.sh  (minicron)
    root  41512  0.0  0.2  9948  1572  -  Is  12Oct16    0:00.00 /usr/local/bin/minicron 3600 /var/run/expire_accounts.pid /usr/local/sbin/fcgicli -f /etc/rc.expireaccounts
    root  41775  0.0  0.2  9948  1584  -  I    12Oct16    0:00.04 minicron: helper /usr/local/sbin/fcgicli -f /etc/rc.expireaccounts  (minicron)
    root  41902  0.0  0.2  9948  1572  -  Is  12Oct16    0:00.00 /usr/local/bin/minicron 86400 /var/run/update_alias_url_data.pid /usr/local/sbin/fcgicli -f /etc/rc.update_alias_url_data
    root  42380  0.0  0.2  9948  1584  -  I    12Oct16    0:00.00 minicron: helper /usr/local/sbin/fcgicli -f /etc/rc.update_alias_url_data  (minicron)
    dhcpd  42782  0.0  1.2 20440 11944  -  Ss  13Oct16    4:39.87 /usr/local/sbin/dhcpd -user dhcpd -group _dhcp -chroot /var/dhcpd -cf /etc/dhcpd.conf -pf /var/run/dhcpd.pid re0_vlan1
    nobody 44187  0.0  0.4 11340  3948  -  S    Thu01PM    4:42.85 /usr/local/sbin/dnsmasq --all-servers -C /dev/null --rebind-localhost-ok --stop-dns-rebind --listen-address=192.168.210.65 --listen-address=192.168.210.1 --listen-address=172.26.1.1 --listen-address=172.27.1.1 --listen-address=127.0.0.1 --bind-interfaces --edns-packet-max=4096 --rebind-domain-ok=/mydom.com/ --dns-forward-max=5000 --cache-size=10000 --local-ttl=1
    root  46326  0.0  1.0 16800 10208  -  Ss  Thu01PM    0:20.83 /usr/sbin/bsnmpd -c /var/etc/snmpd.conf -p /var/run/snmpd.pid
    root  48965  0.0  0.2 10460  2136  -  IN  Thu01PM    3:59.43 /bin/sh /var/db/rrd/updaterrd.sh
    root  52722  0.0  0.6 28292  6400  -  S<s  12Oct16  20:10.13 /usr/local/bin/ipcad -rds
    root  55485  0.0  0.3 11484  2948  -  Is  12Oct16    0:00.37 /usr/local/libexec/ipsec/starter --daemon charon
    root  55694  0.0  1.4 49008 13984  -  Is  12Oct16    2:53.88 /usr/local/libexec/ipsec/charon --use-syslog
    root  56771  0.0  0.2 10232  1784  -  Is  12Oct16    0:00.02 /usr/local/sbin/sshlockout_pf 15
    root  65258  0.0  0.2  5856  1536  -  IN    4:47PM    0:00.00 sleep 60
    root  85053  0.0  1.7 17108 17144  -  Ss  14Oct16    1:09.52 /usr/local/sbin/ntpd -g -c /var/etc/ntpd.conf -p /var/run/ntpd.pid
    root  91845  0.0  0.2 10388  2060  -  INs  Thu02PM    0:00.02 /usr/local/sbin/check_reload_status
    root  92033  0.0  0.2 10388  1988  -  IN  Thu02PM    0:00.00 check_reload_status: Monitoring daemon of check_reload_status
    root  95968  0.0  0.2 10632  1856  -  Ss  Fri05PM    0:59.31 /usr/local/bin/dpinger -S -r 0 -i TransparentProxy -B 172.26.1.242 -p /var/run/dpinger_TransparentProxy~172.26.1.242~172.26.1.50.pid -u /var/run/dpinger_TransparentProxy~172.26.1.242~172.26.1.50.sock -C /etc/rc.gateway_alarm -d 1 -s 500 -l 2000 -t 60000 -A 1000 -D 500 -L 20 172.26.1.50
    root  96214  0.0  0.2 18824  1992  -  Ss  Fri05PM    0:44.34 /usr/local/bin/dpinger -S -r 0 -i Mat -B 10.11.225.171 -p /var/run/dpinger_Mat~10.11.225.171~10.11.225.169.pid -u /var/run/dpinger_Mat~10.11.225.171~10.11.225.169.sock -C /etc/rc.gateway_alarm -d 1 -s 1000 -l 2000 -t 5000 -A 1000 -D 500 -L 20 80.70.225.169
    root  96600  0.0  0.2 14728  1928  -  Ss  Fri05PM    1:17.26 /usr/local/bin/dpinger -S -r 0 -i Inter -B 12.13.143.12 -p /var/run/dpinger_Inter~12.13.143.12~12.13.143.9.pid -u /var/run/dpinger_Inter~12.13.143.12~12.13.143.9.sock -C /etc/rc.gateway_alarm -d 1 -s 500 -l 2000 -t 60000 -A 1000 -D 500 -L 40 5.17.143.9
    root  56668  0.0  0.2 10060  1676 v0  Is+  12Oct16    0:00.00 /usr/libexec/getty Pc ttyv0
    root    7449  0.0  0.2 10460  2192  0  Is  12:34PM    0:00.01 -sh (sh)
    root    7479  0.0  0.2 10460  2088  0  I    12:34PM    0:00.00 /bin/sh /etc/rc.initial
    root    8469  0.0  0.3 10820  2960  0  S    12:34PM    0:00.17 /bin/tcsh
    root  99127  0.0  0.2 10204  1888  0  R+    4:48PM    0:00.00 ps axuwww

    Crontab:

    1,31    0-5    *      *      *      root    /usr/bin/nice -n20 adjkerntz -a
    1      3      1      *      *      root    /usr/bin/nice -n20 /etc/rc.update_bogons.sh
    */60    *      *      *      *      root    /usr/bin/nice -n20 /usr/local/sbin/expiretable -v -t 3600 sshlockout
    1      1      *      *      *      root    /usr/bin/nice -n20 /etc/rc.dyndns.update
    */60    *      *      *      *      root    /usr/bin/nice -n20 /usr/local/sbin/expiretable -v -t 3600 virusprot
    30      12      *      *      *      root    /usr/bin/nice -n20 /etc/rc.update_urltables
    */60    *      *      *      *      root    /usr/bin/nice -n20 /usr/local/sbin/expiretable -v -t 3600 webConfiguratorlockout

    What else should I look at? Please help!



  • I think by default pfSense clears states when the upstream gateway goes down. If your master thinks the gateway has gone down, even if only because its own link went down, it may nuke all of the states, which may then propagate to the failover node?

    Entirely guessing; I've never used HA nor read up on how to use it.



  • Harvy66, thanks for your reply. I didn't mention it, but my gateways are pretty stable, so that's definitely not the cause. Also, in the example I gave, the state that was cleared belonged to an SSH connection made from the local LAN to the BACKUP node only. No other states were affected.