Multiple issues, firewall freezes and whole network goes down.
-
@stephenw10 to make it clear, the firewall just freezes; even when connecting directly to the console, no input is registered by the firewall. Until it is rebooted, it is just stuck on something.
-
Hmm, so all 4 of those ports are on-board.
Does it not respond even to ctl+t?
-
@stephenw10 no, it does not respond to anything. I did not try ctrl + t, but I tried ctrl + c, ctrl + alt + del, enter, space and backspace; nothing works.
-
@Laxarus
We have the same hardware but not the 25 Gbps card. Please check the IPMI interface for PCIe or similar errors; we had a faulty Broadcom card some months ago.
-
Sometimes ctl+t is the only thing that will produce a response.
-
@stephenw10 will try ctrl + c, if the same thing happens again (hoping not), I will try to troubleshoot with WAN when I go back (right now I only have remote access).
There is only one constant across all these situations: when WAN goes down, there is a good chance of the firewall crashing or freezing.
And the two bugs that you mentioned are somehow contributing to this when WAN goes down. Hopefully, the next release of pfSense will take care of these bugs.
Thanks for bearing with me until now; I really appreciate it.
@slu thanks for the suggestion. I have checked the maintenance and health logs on the IPMI but there is nothing noteworthy there. It all seems normal.
-
So, I had the same issue again this morning and I still have no idea why this is happening. @stephenw10 I have tried ctrl + t and there was no response to that either.
Any advice on debugging this is very much appreciated.
Full log here, the freeze happened around Sep 16 07:00
system.log.0
-
You need to tune the OVPN_S2S_VPNV4 gateway. It's throwing alarms repeatedly. It's clearly a pretty bad route, because the alarms are legitimate for the default settings. However, reloading the firewall each time it fires is not helping anything. You might just disable the monitoring, or the monitoring action, on that gateway.
But that shouldn't cause it to stop responding. The actual failure appears to happen here:
Sep 16 07:18:45 FIREWALL rc.gateway_alarm[63113]: >>> Gateway alarm: VPNAC_WG (Addr:10.11.0.1 Alarm:1 RTT:91.226ms RTTsd:79.944ms Loss:21%)
Sep 16 07:18:45 FIREWALL check_reload_status[635]: updating dyndns VPNAC_WG
Sep 16 07:18:45 FIREWALL check_reload_status[635]: Restarting IPsec tunnels
Sep 16 07:18:45 FIREWALL check_reload_status[635]: Restarting OpenVPN tunnels/interfaces
Sep 16 07:18:45 FIREWALL check_reload_status[635]: Reloading filter
Sep 16 07:18:45 FIREWALL rc.gateway_alarm[65772]: >>> Gateway alarm: WAN_PPPOE (Addr:10.98.238.224 Alarm:1 RTT:5.947ms RTTsd:11.776ms Loss:21%)
Sep 16 07:18:45 FIREWALL check_reload_status[635]: updating dyndns WAN_PPPOE
Sep 16 07:18:45 FIREWALL check_reload_status[635]: Restarting IPsec tunnels
Sep 16 07:18:45 FIREWALL check_reload_status[635]: Restarting OpenVPN tunnels/interfaces
Sep 16 07:18:45 FIREWALL check_reload_status[635]: Reloading filter
Sep 16 07:18:46 FIREWALL php-fpm[20435]: /rc.openvpn: The command '/sbin/route -n6 get 'default' 2>/dev/null | /usr/bin/egrep 'flags: <.*PROTO.*>'' returned exit code '1', the output was ''
Sep 16 07:18:46 FIREWALL php-fpm[20435]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed IP addresses. Reloading endpoints that may use VPNAC_WG.
Sep 16 07:18:46 FIREWALL php-fpm[20435]: /rc.openvpn: The command '/sbin/route -n6 get 'default' 2>/dev/null | /usr/bin/egrep 'flags: <.*PROTO.*>'' returned exit code '1', the output was ''
Sep 16 07:18:46 FIREWALL php-fpm[20435]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed IP addresses. Reloading endpoints that may use WAN_PPPOE.
Sep 16 07:18:46 FIREWALL php-fpm[51827]: /rc.dyndns.update: phpDynDNS (@.mydomain.org): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
Sep 16 07:18:50 FIREWALL ppp[53627]: [wan_link0] LCP: no reply to 1 echo request(s)
Sep 16 07:19:00 FIREWALL ppp[53627]: [wan_link0] LCP: no reply to 2 echo request(s)
Sep 16 07:19:05 FIREWALL rc.gateway_alarm[23895]: >>> Gateway alarm: MNG_DHCP (Addr:192.168.2.1 Alarm:1 RTT:4.611ms RTTsd:15.937ms Loss:22%)
That is where all gateways start to indicate failures and the PPPoE goes down. Effectively, no traffic is passing from that point.
But there are no lower-level errors; the NICs do not show loss of link, for example.
The firewall is still logging and running scripts, so it doesn't appear to be down, at least until the end of that log.
When did you try to connect? How did you connect?
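
To check claims like "it's throwing alarms repeatedly" or "all gateways start to indicate failures" against the full log, the gateway alarm entries can be tallied offline. The following is only a minimal sketch in Python, meant to be run on a downloaded copy of the log rather than on the firewall. It assumes every alarm entry has exactly the format quoted above and that Alarm:1 means the alarm was raised; the function names are made up for this illustration.

```python
import re
from collections import Counter, defaultdict

# Matches rc.gateway_alarm entries in the format quoted in this thread.
# Assumption: every alarm entry in the log follows this exact layout.
ALARM_RE = re.compile(
    r"(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ "
    r"rc\.gateway_alarm\[\d+\]: >>> Gateway alarm: (?P<gw>\S+) "
    r"\(Addr:(?P<addr>\S+) Alarm:(?P<alarm>\d) "
    r"RTT:(?P<rtt>[\d.]+)ms RTTsd:(?P<rttsd>[\d.]+)ms Loss:(?P<loss>\d+)%\)"
)


def parse_alarms(text):
    """Return one dict per gateway alarm entry found anywhere in text."""
    return [m.groupdict() for m in ALARM_RE.finditer(text)]


def alarms_per_gateway(alarms):
    """Count entries per gateway where Alarm is 1 (assumed to mean 'raised')."""
    return Counter(a["gw"] for a in alarms if a["alarm"] == "1")


def gateways_by_minute(alarms):
    """Group gateways with a raised alarm by timestamp truncated to the minute."""
    groups = defaultdict(set)
    for a in alarms:
        if a["alarm"] == "1":
            groups[a["ts"][:-3]].add(a["gw"])  # drop the ":SS" suffix
    return dict(groups)


# Three entries copied from the excerpt above, used as sample input.
SAMPLE = (
    "Sep 16 07:18:45 FIREWALL rc.gateway_alarm[63113]: >>> Gateway alarm: "
    "VPNAC_WG (Addr:10.11.0.1 Alarm:1 RTT:91.226ms RTTsd:79.944ms Loss:21%)\n"
    "Sep 16 07:18:45 FIREWALL rc.gateway_alarm[65772]: >>> Gateway alarm: "
    "WAN_PPPOE (Addr:10.98.238.224 Alarm:1 RTT:5.947ms RTTsd:11.776ms Loss:21%)\n"
    "Sep 16 07:19:05 FIREWALL rc.gateway_alarm[23895]: >>> Gateway alarm: "
    "MNG_DHCP (Addr:192.168.2.1 Alarm:1 RTT:4.611ms RTTsd:15.937ms Loss:22%)\n"
)

if __name__ == "__main__":
    alarms = parse_alarms(SAMPLE)
    print(alarms_per_gateway(alarms))
    for minute, gateways in sorted(gateways_by_minute(alarms).items()):
        print(minute, sorted(gateways))
```

To use it on the real log, read the file contents into a string and pass that to parse_alarms instead of SAMPLE. A minute in which several unrelated gateways alarm together points at a shared cause rather than one bad route.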
-
@stephenw10 I have further tweaked the ovpn gateway.
Sep 16 07:18:46 FIREWALL php-fpm[20435]: /rc.openvpn: The command '/sbin/route -n6 get 'default' 2>/dev/null | /usr/bin/egrep 'flags: <.*PROTO.*>'' returned exit code '1', the output was ''
Sep 16 07:18:46 FIREWALL php-fpm[20435]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed IP addresses. Reloading endpoints that may use VPNAC_WG.
Sep 16 07:18:46 FIREWALL php-fpm[20435]: /rc.openvpn: The command '/sbin/route -n6 get 'default' 2>/dev/null | /usr/bin/egrep 'flags: <.*PROTO.*>'' returned exit code '1', the output was ''
Sep 16 07:18:46 FIREWALL php-fpm[20435]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed IP addresses. Reloading endpoints that may use WAN_PPPOE.
Here, why is the firewall trying to get a route for IPv6? The tunnel is IPv4 only.
I had to power reset it around 10:46 just to get everything back.
64a4a208-8a92-48ff-8f9e-736090de796f-system3.zip
-
Just to be clear though you are unable to see any response from the firewall even on the local physical firewall?
It's extremely unusual to see it still logging and running scripts at that time but unresponsive at the console. Like I don't think I've ever seen that.
-
@stephenw10 said in Multiple issues, firewall freezes and whole network goes down.:
Like I don't think I've ever seen that
The attached capture shows the unresponsive console.
9979fb96-2531-417e-a50a-dd6d321f8e90-pfsense freeze.zip
-
Ok that looks like the IPMI console? And I assume that usually works as expected?
Is it configured for video as the primary console?
Are you able to test using a physical console?
-
@stephenw10 Yep, normally it works without a problem. I am not sure how it is configured since there are no options to change the behavior.
No, I cannot access the physical console since it is at a remote site.
-
I mean, if you check System > Advanced > Admin Access, is the primary console set as video? Is serial even enabled?
-
@stephenw10 it is set as video console
-
Hmm, very odd then.
I assume other parts of the IPMI are working at that time? It doesn't fail entirely?
-
@stephenw10 I had no issue with IPMI at all. Maybe I should do a clean install of pfSense when I get back to the remote site?
-
Worth trying.
Any chance it's a network flood of some sort? Do you have any logging from attached switches?
I could just about imagine that presenting like that given sufficient packet numbers.
-
@stephenw10 I don't believe so. So far, I can only say that something is happening when pppoe goes down.
I can try to disable bridge mode on the modem and DMZ the firewall, which will let me take the PPPoE out of the equation. However, my WAN IP will not be correct in that case.
-
To me, it looks more like the PPPoE fails because igb0 stops passing traffic, mostly because at least one other NIC also stops passing traffic at that time. They are both igb NICs.
A good test might be to set up some other NIC type and test that at the time. Even better if it's a NIC on a different PCIe bus.