Multiple issues, firewall freezes and whole network goes down.
-
I doubt it's a PPPoE issue. If it were, it would either fail to connect entirely or disconnect consistently. This looks more like something happens to the upstream device and the connection then fails. What's totally unclear, though, is why the firewall stops responding completely. The logs show it just keeps trying to connect. It makes me wonder if it's actually an IPMI issue somehow.
One thing I would do here is change the 'Modem' interface to a static IP with no gateway if you can, assuming you are using it only to access the modem. When it's configured as DHCP, pfSense treats it as a WAN and runs all the link scripts when igb1 bounces.
-
@stephenw10 hmm, I will try that during the weekend. No time to test it right now. Thanks for the suggestion. However, by logic, does it matter whether it is dhcp or static?
According to this recipe, it does recommend "static".
https://docs.netgate.com/pfsense/en/latest/recipes/modem-access.html
Now that I think about it:
My WAN IP is 88.....
and my WAN_PPPOE is 10......
For some reason, they are different and I am not behind CGNAT.
Would enabling "Use non-local gateway" on one of these gateways make a difference? I am not exactly sure what this option does or which gateway it should be enabled on.
-
Yes, use static there if you can. When you set it as dhcp the server passes it a gateway to use and pfSense sets that on the interface turning it into a WAN. Then it triggers all the WAN IP scripts when it bounces.
-
@stephenw10 Okay, I have set it up as static now. Let's see if it survives the scheduled restart of the modem on Monday.
Now that you have reminded me, I have another interface (network) which is not a WAN but has a DHCP gateway. It is connected to a switch (this switch connects all IPMI devices together in one network with elevated privileges). The switch acts as a DHCP server. By this logic, should I also set this one up as static? It is the MNG interface below.
-
You could also change that but it won't be nearly as impactful because it's on a different NIC. It shouldn't lose link at the same time.
-
@stephenw10 It survived Monday, but I will keep monitoring it for changes.
Just curious: is there a way to mark the DHCP gateway as local only instead of WAN, or can this default behavior not be changed?
-
The only way you can do that is to add a gateway separately to an interface config like you might with an internal router for example. But you can't do that for a dynamic interface type like DHCP. The server passes a gateway to the client to use and it is always added to the gateway.
You could maybe override the gateway that is passed in the advanced DHCP options. I'm not sure I've ever tried that.
-
@stephenw10 So far nothing has crashed with the static IP.
I am guessing this is a very rare problem, since you have to have PPPoE DSL (most people use fiber these days) and have configured the network to access the modem behind the WAN via DHCP.
I will continue monitoring. There is no option to mark the gateway as internal network only. Maybe checking the boxes for "Disable Gateway Monitoring", "Disable Gateway Monitoring Action", and "Do not add static route for gateway monitor IP address via the chosen interface" might achieve a similar result, but I don't want to try these for now.
I still have no idea what the exact use case of the "Use non-local gateway" option is.
-
You would have to use the DHCP advanced options field to force the dhcp client to ignore the gateway passed by the server. So adding
supersede routers
in the Option modifiers field should do it. But I would just use a static IP here.
-
@Laxarus Is your traffic shaper required? It could be configured incorrectly.
https://forum.netgate.com/topic/171842/queue-management-algorithms-differences
-
@JonathanLee said in Multiple issues, firewall freezes and whole network goes down.:
https://forum.netgate.com/topic/171842/queue-management-algorithms-differences
My main interface is on 2x25G LAGG and LAGG is not supported with traffic shaper so other than bufferbloat nothing is set there.
-
@stephenw10 After setting the static IP, I did not see this for a long while.
However, today, Jan 9th at 6 AM, the modem performed its weekly scheduled reboot, and the same issue happened again. Everything went down. It stayed down until I performed a hard power reset, since the console over IPMI was unresponsive. At this point, the only suspicious thing is the problem with multicast in the logs, but I don't see how it is relevant to the WAN.
Jan 9 06:00:44 FIREWALL check_reload_status[635]: Linkup starting igb1
Jan 9 06:00:44 FIREWALL kernel: igb1: link state changed to DOWN
Jan 9 06:00:45 FIREWALL php-fpm[78698]: /rc.linkup: Hotplug event detected for MODEM(opt2) static IP address (4: 192.168.0.2)
Jan 9 06:00:45 FIREWALL php-fpm[78698]: /rc.linkup: DEVD Ethernet detached event for opt2
Jan 9 06:00:46 FIREWALL dhcpleases[64223]: Could not deliver signal HUP to process 66132: No such process.
Jan 9 06:00:48 FIREWALL check_reload_status[635]: Linkup starting igb1
Jan 9 06:00:48 FIREWALL kernel: igb1: link state changed to UP
Jan 9 06:00:55 FIREWALL ppp[99256]: caught fatal signal TERM
Jan 9 06:00:55 FIREWALL ppp[99256]: [wan] IFACE: Close event
Jan 9 06:00:55 FIREWALL ppp[99256]: [wan] IPCP: Close event
Jan 9 06:00:55 FIREWALL ppp[99256]: [wan] IPCP: state change Opened --> Closing
Jan 9 06:00:55 FIREWALL ppp[99256]: [wan] IPCP: SendTerminateReq #4
Jan 9 06:00:55 FIREWALL ppp[99256]: [wan] IPCP: LayerDown
Jan 9 06:00:56 FIREWALL php-cgi[91601]: rc.kill_states: rc.kill_states: Removing states for IP {redacted}/32
Jan 9 06:00:58 FIREWALL check_reload_status[635]: Reloading filter
Jan 9 06:00:58 FIREWALL check_reload_status[635]: Reloading filter
Jan 9 06:00:58 FIREWALL php-fpm[592]: /rc.linkup: Hotplug event detected for MODEM(opt2) static IP address (4: 192.168.0.2)
Jan 9 06:00:58 FIREWALL php-fpm[592]: /rc.linkup: DEVD Ethernet attached event for opt2
Jan 9 06:00:58 FIREWALL php-fpm[592]: /rc.linkup: HOTPLUG: Triggering address refresh on opt2 (igb1)
Jan 9 06:00:58 FIREWALL check_reload_status[635]: rc.newwanip starting igb1
Jan 9 06:00:58 FIREWALL ppp[98752]: Multi-link PPP daemon for FreeBSD
Jan 9 06:00:58 FIREWALL ppp[98752]:
Jan 9 06:00:58 FIREWALL ppp[98752]: process 98752 started, version 5.9
Jan 9 06:00:58 FIREWALL ppp[98752]: waiting for process 99256 to die...
Jan 9 06:00:59 FIREWALL ppp[98752]: waiting for process 99256 to die...
Jan 9 06:00:59 FIREWALL php-fpm[14172]: /rc.newwanip: rc.newwanip: Info: starting on igb1.
Jan 9 06:00:59 FIREWALL php-fpm[14172]: /rc.newwanip: rc.newwanip: on (IP address: 192.168.0.2) (interface: MODEM[opt2]) (real interface: igb1).
Jan 9 06:01:00 FIREWALL rc.gateway_alarm[11168]: >>> Gateway alarm: VPNAC_WG (Addr:10.11.0.1 Alarm:1 RTT:0ms RTTsd:0ms Loss:100%)
Jan 9 06:01:00 FIREWALL check_reload_status[635]: updating dyndns VPNAC_WG
Jan 9 06:01:00 FIREWALL check_reload_status[635]: Restarting IPsec tunnels
Jan 9 06:01:00 FIREWALL check_reload_status[635]: Restarting OpenVPN tunnels/interfaces
Jan 9 06:01:00 FIREWALL check_reload_status[635]: Reloading filter
Jan 9 06:01:00 FIREWALL ppp[98752]: waiting for process 99256 to die...
Jan 9 06:01:01 FIREWALL ppp[98752]: waiting for process 99256 to die...
Jan 9 06:01:01 FIREWALL php-cgi[91601]: rc.kill_states: rc.kill_states: Removing states for interface pppoe0
Jan 9 06:01:01 FIREWALL check_reload_status[635]: Rewriting resolv.conf
Jan 9 06:01:01 FIREWALL ppp[99256]: [wan] IFACE: Removing IPv4 address from pppoe0 failed(IGNORING for now. This should be only for PPPoE friendly!): Can't assign requested address
Jan 9 06:01:01 FIREWALL ppp[99256]: [wan] IFACE: Down event
Jan 9 06:01:01 FIREWALL ppp[99256]: [wan] IFACE: Rename interface pppoe0 to pppoe0
Jan 9 06:01:01 FIREWALL ppp[99256]: [wan] IFACE: Set description "WAN"
Jan 9 06:01:01 FIREWALL php-fpm[41860]: /rc.openvpn: The command '/sbin/route -n6 get 'default' 2>/dev/null | /usr/bin/egrep 'flags: <.*PROTO.*>'' returned exit code '1', the output was ''
Jan 9 06:01:01 FIREWALL php-fpm[41860]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed IP addresses. Reloading endpoints that may use VPNAC_WG.
Jan 9 06:01:01 FIREWALL php-fpm[592]: /rc.linkup: The command '/sbin/route -n6 get 'default' 2>/dev/null | /usr/bin/egrep 'flags: <.*PROTO.*>'' returned exit code '1', the output was ''
Jan 9 06:01:01 FIREWALL check_reload_status[635]: Restarting IPsec tunnels
Jan 9 06:01:02 FIREWALL ppp[98752]: waiting for process 99256 to die...
Jan 9 06:01:02 FIREWALL ppp[99256]: [wan] IPCP: SendTerminateReq #5
Jan 9 06:01:02 FIREWALL dhcpleases[60027]: Could not deliver signal HUP to process 68165: No such process.
Jan 9 06:01:03 FIREWALL ppp[98752]: waiting for process 99256 to die...
Jan 9 06:01:03 FIREWALL ppp[99256]: [wan] Bundle: Shutdown
Jan 9 06:01:03 FIREWALL ppp[99256]: [wan_link0] Link: Shutdown
Jan 9 06:01:03 FIREWALL ppp[99256]: process 99256 terminated
Jan 9 06:01:04 FIREWALL rc.gateway_alarm[80528]: >>> Gateway alarm: OVPN_S2S_VPNV4 (Addr:10.25.25.2 Alarm:1 RTT:0ms RTTsd:0ms Loss:100%)
Jan 9 06:01:04 FIREWALL check_reload_status[635]: updating dyndns OVPN_S2S_VPNV4
Jan 9 06:01:04 FIREWALL check_reload_status[635]: Restarting IPsec tunnels
Jan 9 06:01:04 FIREWALL check_reload_status[635]: Restarting OpenVPN tunnels/interfaces
Jan 9 06:01:04 FIREWALL check_reload_status[635]: updating dyndns wan
Jan 9 06:01:04 FIREWALL php-fpm[592]: /rc.linkup: The command '/sbin/ifconfig 'pppoe0' description 'WAN'' returned exit code '1', the output was 'ifconfig: interface pppoe0 does not exist'
Jan 9 06:01:04 FIREWALL php-fpm[592]: /rc.linkup: The command '/sbin/ifconfig 'pppoe0' -staticarp ' returned exit code '1', the output was 'ifconfig: interface pppoe0 does not exist'
Jan 9 06:01:04 FIREWALL php-fpm[592]: /rc.linkup: The command '/usr/sbin/arp -d -i 'pppoe0' -a > /dev/null 2>&1 ' returned exit code '1', the output was ''
Jan 9 06:01:04 FIREWALL ppp[98752]: web: web is not running
Jan 9 06:01:04 FIREWALL ppp[98752]: [wan] Bundle: Interface ng0 created
Jan 9 06:01:04 FIREWALL ppp[98752]: [wan_link0] Link: OPEN event
Jan 9 06:01:04 FIREWALL ppp[98752]: [wan_link0] LCP: Open event
Jan 9 06:01:04 FIREWALL ppp[98752]: [wan_link0] LCP: state change Initial --> Starting
Jan 9 06:01:04 FIREWALL ppp[98752]: [wan_link0] LCP: LayerStart
Jan 9 06:01:04 FIREWALL ppp[98752]: [wan_link0] PPPoE: Connecting to ''
Jan 9 06:01:04 FIREWALL kernel: ng0: changing name to 'pppoe0'
-
Hmm, those logs really don't show an error. The upstream device dropped the link and re-linked 4s later. The PPP session reconnected.
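For anyone who wants to check the link-flap timing in their own system log, here is a rough Python sketch that pairs the kernel "link state changed to DOWN/UP" messages per interface and reports how long the link was down. The function name `link_outages` and the `year` parameter are made up for illustration (syslog lines like these carry no year, so one has to be assumed), and the regex only covers the message format quoted in this thread.

```python
import re
from datetime import datetime

# Matches kernel link-state lines in the format quoted in this thread, e.g.
# "Jan 9 06:00:44 FIREWALL kernel: igb1: link state changed to DOWN"
LINK_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"(\S+): link state changed to (UP|DOWN)"
)


def link_outages(lines, year=2024):
    """Return (interface, down_time, seconds_down) for each DOWN -> UP pair.

    `year` is an assumption: these log lines do not include one.
    """
    down_at = {}   # interface name -> time the link went down
    outages = []
    for line in lines:
        match = LINK_RE.match(line.strip())
        if not match:
            continue  # not a link-state message
        stamp, iface, state = match.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if state == "DOWN":
            down_at[iface] = when
        elif iface in down_at:
            start = down_at.pop(iface)
            outages.append((iface, start, (when - start).total_seconds()))
    return outages


# Two lines taken from the log posted above.
sample = [
    "Jan 9 06:00:44 FIREWALL kernel: igb1: link state changed to DOWN",
    "Jan 9 06:00:48 FIREWALL kernel: igb1: link state changed to UP",
]
for iface, start, seconds in link_outages(sample):
    print(f"{iface} down for {seconds:.0f}s starting {start:%H:%M:%S}")
```

Feeding it the whole log file instead of `sample` would list every flap, which makes it easier to see whether a hang lines up with a link bounce or happens on its own.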
If it stops responding even via IPMI it doesn't seem like a networking issue though. Like it's triggering some other problem.
Do any logs show anything after 6:01:04?
-
@stephenw10 I have attached the related full log file in my previous post.
-
So despite the fact it was still logging connection attempts it stops responding entirely even at the local console? At what time did you try to connect?
-
@stephenw10 around 19:30, I tried to connect to see what is going on but IPMI console display was unresponsive. I had to perform a hard power reset over IPMI.
-
So ~12hrs later? And it was still logging connection attempts?
I have no idea what could cause it to be unresponsive at the console but continue logging like that.
-
@stephenw10 It might be a weird IPMI quirk. I need to confirm the actual display by connecting a monitor directly if and when this happens again (hopefully not). I have not yet updated to the latest version, and there is a chance the update might fix this issue entirely. I don't want to update remotely, so this has to wait.
Other than that, any idea why the whole network goes down when the WAN is interrupted? (It happens sometimes, not every time.)
-
Hmm, could the IPMI module be crashing? Does it have a shared port with the WAN?
-
@stephenw10 no it is a separate interface.