HA firewall crashed and flooded the network
-
Hi,
We have two 8200 firewalls in HA. Today one of the units crashed and started flooding my network and the other firewalls in the same broadcast domain. Is it possible to collect the crash log? If so, where is the file located, and how do I go about sharing it with the developers?
-
If it actually panicked, you would see a crash report on the dashboard when you log in after rebooting it.
Do you know what was being flooded? Did it somehow create a loop?
-
@stephenw10
Hi there, I've checked and there is no crash log after I log in. However, during the fault I could get to the GUI and it did mention a crash log, but clicking on anything did nothing. I did get a packet capture from the active unit:
packetcapture-igc0-20250203115808.pcap
In terms of topology, it's two 8200s, both with dual internet feeds. However, because of how the ISP designed the solution, both feeds sit in the same broadcast domain. There was so much traffic that some of our smaller pfSense/APU firewalls fell over, showing packet loss and high CPU, until the secondary unit was rebooted. Then everything calmed down.
-
So two public IP ranges both in the same subnet for the two WANs?
What is igc0 there? It looks to be an internal interface. If you check the source MAC of the packets in that flood, there are two. Are those the igc0 MACs from each node?
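A minimal sketch for answering the MAC question straight from the capture file. It assumes the file is classic libpcap (microsecond timestamps, not pcapng) with Ethernet framing; the function names are hypothetical. It counts frames sent to ff:ff:ff:ff:ff:ff per source MAC, so the results can be compared against the igc0/igc1 MACs of each node.

```python
import struct
import sys
from collections import Counter

BROADCAST = b"\xff" * 6


def iter_frames(data: bytes):
    """Yield raw frames from a classic libpcap file (microsecond timestamps, not pcapng)."""
    if data[:4] == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written little-endian
    elif data[:4] == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written big-endian
    else:
        raise ValueError("not a classic microsecond libpcap file")
    linktype = struct.unpack_from(endian + "I", data, 20)[0]
    if linktype != 1:  # 1 = LINKTYPE_ETHERNET
        raise ValueError(f"expected Ethernet link type, got {linktype}")
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(data):
        # Per-record header: ts_sec, ts_usec, captured length, original length.
        _ts_sec, _ts_usec, incl_len, _orig_len = struct.unpack_from(endian + "IIII", data, offset)
        offset += 16
        yield data[offset:offset + incl_len]
        offset += incl_len


def count_source_macs(data: bytes, broadcast_only: bool = True) -> Counter:
    """Count frames per source MAC, optionally only those sent to the broadcast MAC."""
    counts = Counter()
    for frame in iter_frames(data):
        if len(frame) < 14:  # shorter than an Ethernet header
            continue
        dst, src = frame[0:6], frame[6:12]
        if broadcast_only and dst != BROADCAST:
            continue
        counts[":".join(f"{b:02x}" for b in src)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for mac, n in count_source_macs(f.read()).most_common():
            print(mac, n)
```

Run it as `python3 count_macs.py packetcapture-igc0-20250203115808.pcap` (the script name is just an example). If only one MAC dominates the broadcast count, that points at a single node rather than both.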
-
@stephenw10
Yes, we have a /24 split into two /25s, with each subnet living on a different router and each router running HSRP to achieve active/active.
igc0 is one of the two WAN interfaces. The source MAC address in the capture is the MAC of the secondary firewall's WAN interface.
igc0 - (cc-feed)
igc1 - (ch-feed)
ending (e3) = the MAC of the ch-feed interface on the secondary firewall
ending (46) = the MAC of the ch-feed interface on the primary firewall
However, the packet capture was from igc0, whereas the above are on igc1.
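For reference, splitting a /24 into two /25s gives each half its own broadcast address, and the upper /25's broadcast address is the same as the whole /24's. A small sketch with Python's ipaddress module, using the documentation prefix 198.51.100.0/24 as a stand-in, since the real range isn't given here:

```python
import ipaddress


def split_block(cidr: str, new_prefix: int = 25):
    """Return (network, broadcast) address pairs for each subnet of cidr at new_prefix."""
    block = ipaddress.ip_network(cidr)
    return [
        (str(net.network_address), str(net.broadcast_address))
        for net in block.subnets(new_prefix=new_prefix)
    ]


if __name__ == "__main__":
    # 198.51.100.0/24 is a placeholder documentation prefix, not the real ISP range.
    for network, broadcast in split_block("198.51.100.0/24"):
        print(f"{network}/25 broadcast {broadcast}")
```

A device that still thinks the whole /24 is on-link would send its subnet broadcast to the same address as the upper /25, which is worth keeping in mind when both feeds share one broadcast domain.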
-
Hmm, so they are not set up in the expected pfSense HA config with CARP and pfsync? Or is that HSRP in the upstream routers?
If I'm understanding that correctly, the looped broadcast packet we see on igc0 is being sent from igc1? But since those are in the same broadcast domain, it would also have appeared on igc1?
However, since it was originally sent from a private IP address, it should never have been there. I assume that IP is from an internal subnet?
-
@stephenw10 CARP with pfsync for the HA, and HSRP for the upstream Cisco routers.
And yes, in terms of the loop and interfaces.
And surprisingly, no, we don't use that address space anywhere: not on any interfaces or NAT rules.
-
Huh, so no clue where it came from?
Do you know if it also appeared on igc1?
I assume pfSense itself sees a /25 subnet on each of those interfaces? But upstream something is set to use /24 for both?
Do you have block private networks set on either WAN? If so I'd expect it to be blocked.
But even if not, broadcast traffic should not be forwarded like that.
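The "Block private networks" option on a WAN covers the RFC 1918 ranges, so a source in one of these should have been dropped on arrival. A minimal sketch of that RFC 1918 membership test, handy for checking the source IPs pulled out of the capture (the function name is hypothetical, and this covers only the RFC 1918 part of what pfSense blocks):

```python
import ipaddress

# The three private IPv4 ranges defined in RFC 1918.
RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(addr: str) -> bool:
    """True if addr falls in an RFC 1918 range; IPv6 addresses never match."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918)


if __name__ == "__main__":
    for src in ("10.1.2.3", "172.32.0.1", "198.51.100.10"):
        print(src, is_rfc1918(src))
```

Note that 172.16.0.0/12 ends at 172.31.255.255, so 172.32.x.x is public; that boundary is a common source of confusion when eyeballing captures.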
-
@stephenw10 No idea whatsoever. The pfSense firewalls and next-hop routers only know about the /23. At some point our ISP, BT, split the subnet, though I'm guessing that's done via BGP.
And yes, I have block private networks (RFC 1918) and bogons enabled on both interfaces on both firewalls. During the fault there was a spike on both WAN interfaces, and another APU firewall fell over as a result of the traffic.
Is it possible to collect the crash log after the reboot? I'm hoping it's stored somewhere.
-
Where was that APU connected?
If you check the monitoring graphs during the flood, I assume you see a very large packet rate increase. On which interfaces?
Do you also see a large number of states increase?
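The RRD monitoring graphs are the main source for this. As a rough cross-check, the capture already taken on igc0 can be bucketed into one-second intervals to find the peak packet rate. A minimal sketch, again assuming a classic libpcap file rather than pcapng (function names are hypothetical):

```python
import struct
import sys
from collections import Counter


def packet_seconds(data: bytes):
    """Yield the whole-second timestamp of each record in a classic libpcap file."""
    if data[:4] == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written little-endian
    elif data[:4] == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written big-endian
    else:
        raise ValueError("not a classic microsecond libpcap file")
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(data):
        ts_sec, _ts_usec, incl_len, _orig_len = struct.unpack_from(endian + "IIII", data, offset)
        yield ts_sec
        offset += 16 + incl_len  # record header plus captured bytes


def packets_per_second(data: bytes) -> Counter:
    """Count packets in each one-second bucket, keyed by Unix timestamp."""
    return Counter(packet_seconds(data))


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        rates = packets_per_second(f.read())
    if rates:
        second, peak = max(rates.items(), key=lambda kv: kv[1])
        print(f"peak {peak} pkt/s at Unix time {second}")
```

This only reflects what the capture saw (and the capture may have dropped packets under load), so it complements rather than replaces the interface graphs.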
-
@stephenw10 It was on the same physical switch as one of the 8200s. Shutting down the WAN seems to fix the issue on the APU. I don't have a way of checking the states during the fault, as I wanted to get everything up and running. I have now applied an ACL on the switch to block RFC 1918-to-RFC 1918 traffic as a precaution, specifically on the WAN interfaces.
-
OK so the APU was also on the WAN side of the HA pair?
You should still have the monitoring graph data (RRD) from the time that would show a spike in firewall states.