pfSense becomes unresponsive, forcing a hard reboot

  • The problem machine:
    I have 2 pfSense firewalls (call them FW1a and FW1b).  Each has 1 WAN and 3 LAN interfaces, with CARP VIPs on the WAN and 2 of the LAN interfaces.  I'm using DHCP relay from LAN1 to a set of servers on LAN2.  There's no NAT or VPN involved, and the only packages I have installed are vmtools, OSPF, and zabbix2-agent.

    The problem is that FW1a hangs every few days: all interfaces become unresponsive and the console is frozen.  vCenter says the tools are not running.  CARP works fine and everything fails over to FW1b.  I have to force a reboot to get FW1a up and running again.  Once it boots, everything goes back to normal.  There's no crash log, and the system log shows nothing suspicious.  In fact, the last time I saw the issue, there was nothing in the "General" system log for 1.5 hours before the FW went offline.

    The log entries that are closest to the time of the latest failure are in the DHCP log (within 2 minutes):
    Sep 3 05:13:33 dhclient: Starting delete_old_states()
    Sep 3 05:13:33 dhclient: FAIL

    I see entries like that repeatedly, including at times when nothing goes wrong, so I'm guessing that's not the issue.  I've also noticed that the DHCP relay seems to relay requests over the WAN even though I don't want it to.  I can't find why that's happening, but it's not my biggest problem right now.

    I'm looking for help figuring out what I should be looking at, because I can't find anything that points to a cause.

    More background:
    This issue has occurred on versions 2.1.2 through 2.1.5.  Everything is x64 running in a VMware ESX 5.5 cluster, and all interfaces are E1000.  To rule out a problem with the VM itself, I built a brand new VM and reloaded 2.1.5, and the issue still occurs.  The problem occurred before I installed the zabbix agents; I installed those to monitor the system and find out when it was happening.

    I had one instance where I lost FW1a and didn't know it.  A day later FW1b did the same thing and I lost all connectivity.  I haven't experienced that again, because FW1a has never been down for more than a few hours since I installed monitoring.

    I have a total of 7 pfSense firewalls, and I run those 3 packages on all of them without issue.  The other FWs are doing other things and have other packages installed.

  • @macralf, did you ever find out what caused this? I am facing a similar problem.
