Multiple issues, firewall freezes and whole network goes down.
-
To me it looks more like the PPPoE fails because igb0 stops passing traffic, mostly because at least one other NIC also stops passing traffic at that time. They are both igb NICs.
A good test might be to set up some other NIC type and test with that at the time. Even better if it's a NIC on a different PCIe bus.
-
@stephenw10 I can try ixl2 but 10g is overkill =:)
-
@stephenw10 It's been a while since this last happened because my WAN was pretty stable. However, it happened again, most likely due to the weekly scheduled restart of the modem. The weird thing is that this did not happen on the previous scheduled restarts of the modem.
The show starts at Oct 14 06:00. Until then, everything is normal. Full log attached.
system.log.0
Some extracts:
Oct 14 06:01:01 FIREWALL kernel: igb1: link state changed to DOWN
Oct 14 06:01:02 FIREWALL php-fpm[55721]: /rc.linkup: Hotplug event detected for MODEM(opt2) dynamic IP address (4: dhcp)
Oct 14 06:01:02 FIREWALL php-fpm[55721]: /rc.linkup: DEVD Ethernet detached event for opt2
Oct 14 06:01:03 FIREWALL rc.gateway_alarm[15550]: >>> Gateway alarm: MODEM_DHCP (Addr:192.168.0.1 Alarm:down RTT:0ms RTTsd:0ms Loss:100%)
Oct 14 06:01:03 FIREWALL check_reload_status[635]: updating dyndns MODEM_DHCP
Oct 14 06:01:03 FIREWALL check_reload_status[635]: Restarting IPsec tunnels
Oct 14 06:01:03 FIREWALL check_reload_status[635]: Restarting OpenVPN tunnels/interfaces
Oct 14 06:01:03 FIREWALL check_reload_status[635]: Reloading filter
Oct 14 06:01:05 FIREWALL dhcpleases[36990]: Could not deliver signal HUP to process 613: No such process.
Oct 14 06:01:05 FIREWALL check_reload_status[635]: Linkup starting igb1
Oct 14 06:01:05 FIREWALL kernel: igb1: link state changed to UP
Oct 14 06:01:05 FIREWALL rc.gateway_alarm[40725]: >>> Gateway alarm: WAN_PPPOE (Addr:10.98.238.224 Alarm:1 RTT:0ms RTTsd:0ms Loss:100%)
Oct 14 06:01:05 FIREWALL rc.gateway_alarm[40886]: >>> Gateway alarm: VPNAC_WG (Addr:10.11.0.1 Alarm:1 RTT:0ms RTTsd:0ms Loss:100%)
Oct 14 06:01:05 FIREWALL check_reload_status[635]: updating dyndns WAN_PPPOE
Oct 14 06:01:05 FIREWALL check_reload_status[635]: Restarting IPsec tunnels
Oct 14 06:01:05 FIREWALL check_reload_status[635]: Restarting OpenVPN tunnels/interfaces
Oct 14 06:01:05 FIREWALL check_reload_status[635]: Reloading filter
Oct 14 06:01:05 FIREWALL check_reload_status[635]: Restarting IPsec tunnels
Oct 14 06:01:07 FIREWALL php-fpm[89488]: /rc.dyndns.update: phpDynDNS (@.mydomain.org): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
Oct 14 06:01:07 FIREWALL php-fpm[762]: /rc.openvpn: The command '/sbin/route -n6 get 'default' 2>/dev/null | /usr/bin/egrep 'flags: <.*PROTO.*>'' returned exit code '1', the output was ''
Oct 14 06:01:07 FIREWALL php-fpm[762]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed IP addresses. Reloading endpoints that may use MODEM_DHCP.
Oct 14 06:01:07 FIREWALL php-fpm[83954]: /rc.openvpn: The command '/sbin/route -n6 get 'default' 2>/dev/null | /usr/bin/egrep 'flags: <.*PROTO.*>'' returned exit code '1', the output was ''
Oct 14 06:01:07 FIREWALL php-fpm[83954]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed IP addresses. Reloading endpoints that may use WAN_PPPOE.
Oct 14 06:01:09 FIREWALL rc.gateway_alarm[85414]: >>> Gateway alarm: OVPN_S2S_VPNV4 (Addr:10.25.25.2 Alarm:1 RTT:0ms RTTsd:0ms Loss:100%)
Oct 14 06:01:09 FIREWALL check_reload_status[635]: updating dyndns OVPN_S2S_VPNV4
Oct 14 06:01:09 FIREWALL check_reload_status[635]: Restarting IPsec tunnels
Oct 14 06:01:09 FIREWALL check_reload_status[635]: Restarting OpenVPN tunnels/interfaces
Oct 14 06:01:09 FIREWALL check_reload_status[635]: Reloading filter
Oct 14 06:01:10 FIREWALL php-fpm[762]: /rc.openvpn: The command '/sbin/route -n6 get 'default' 2>/dev/null | /usr/bin/egrep 'flags: <.*PROTO.*>'' returned exit code '1', the output was ''
Oct 14 06:01:10 FIREWALL php-fpm[762]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed IP addresses. Reloading endpoints that may use OVPN_S2S_VPNV4.
Oct 14 06:01:16 FIREWALL ppp[84675]: caught fatal signal TERM
Oct 14 06:01:16 FIREWALL ppp[84675]: [wan] IFACE: Close event
Oct 14 06:01:16 FIREWALL ppp[84675]: [wan] IPCP: Close event
Oct 14 06:01:16 FIREWALL ppp[84675]: [wan] IPCP: state change Opened --> Closing
Oct 14 06:01:16 FIREWALL ppp[84675]: [wan] IPCP: SendTerminateReq #4
Oct 14 06:01:16 FIREWALL ppp[84675]: [wan] IPCP: LayerDown
Oct 14 06:01:16 FIREWALL php-cgi[48940]: rc.kill_states: rc.kill_states: Removing states for IP {redactedIP}/32
Oct 14 06:01:18 FIREWALL check_reload_status[635]: Reloading filter
Oct 14 06:01:18 FIREWALL php-fpm[95879]: /rc.linkup: Hotplug event detected for MODEM(opt2) dynamic IP address (4: dhcp)
Oct 14 06:01:18 FIREWALL php-fpm[95879]: /rc.linkup: DEVD Ethernet attached event for opt2
Oct 14 06:01:18 FIREWALL php-fpm[95879]: /rc.linkup: HOTPLUG: Configuring interface opt2
Oct 14 06:01:20 FIREWALL rc.gateway_alarm[15305]: >>> Gateway alarm: VPNAC_WG (Addr:10.11.0.1 Alarm:1 RTT:0ms RTTsd:0ms Loss:100%)
Oct 14 06:01:20 FIREWALL check_reload_status[635]: updating dyndns VPNAC_WG
Oct 14 06:01:20 FIREWALL check_reload_status[635]: Restarting IPsec tunnels
Oct 14 06:01:20 FIREWALL check_reload_status[635]: Restarting OpenVPN tunnels/interfaces
Oct 14 06:01:20 FIREWALL php-cgi[48940]: rc.kill_states: rc.kill_states: Removing states for interface pppoe0
Oct 14 06:01:20 FIREWALL check_reload_status[635]: Rewriting resolv.conf
Oct 14 06:01:20 FIREWALL ppp[84675]: [wan] IFACE: Removing IPv4 address from pppoe0 failed(IGNORING for now. This should be only for PPPoE friendly!): Can't assign requested address
Oct 14 06:01:20 FIREWALL ppp[84675]: [wan] IFACE: Down event
Oct 14 06:01:20 FIREWALL ppp[84675]: [wan] IFACE: Rename interface pppoe0 to pppoe0
Oct 14 06:01:20 FIREWALL ppp[84675]: [wan] IFACE: Set description "WAN"
Oct 14 06:01:21 FIREWALL ppp[84675]: [wan] IPCP: SendTerminateReq #5
Oct 14 06:01:21 FIREWALL ppp[84675]: [wan_link0] LCP: no reply to 1 echo request(s)
Oct 14 06:01:21 FIREWALL php-fpm[762]: /rc.openvpn: The command '/sbin/route -n6 get 'default' 2>/dev/null | /usr/bin/egrep 'flags: <.*PROTO.*>'' returned exit code '1', the output was ''
Oct 14 06:01:21 FIREWALL php-fpm[762]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed IP addresses. Reloading endpoints that may use VPNAC_WG.
Oct 14 06:01:22 FIREWALL ppp[84675]: [wan] Bundle: Shutdown
Oct 14 06:01:23 FIREWALL ppp[84675]: [wan_link0] Link: Shutdown
Oct 14 06:01:23 FIREWALL ppp[84675]: process 84675 terminated
Oct 14 06:01:24 FIREWALL rc.gateway_alarm[59440]: >>> Gateway alarm: OVPN_S2S_VPNV4 (Addr:10.25.25.2 Alarm:1 RTT:0ms RTTsd:0ms Loss:100%)
Oct 14 06:01:24 FIREWALL check_reload_status[635]: updating dyndns OVPN_S2S_VPNV4
Oct 14 06:01:24 FIREWALL check_reload_status[635]: Restarting IPsec tunnels
Oct 14 06:01:24 FIREWALL check_reload_status[635]: Restarting OpenVPN tunnels/interfaces
Oct 14 06:01:24 FIREWALL check_reload_status[635]: Reloading filter
Oct 14 06:01:25 FIREWALL php-fpm[83954]: /rc.openvpn: The command '/sbin/route -n6 get 'default' 2>/dev/null | /usr/bin/egrep 'flags: <.*PROTO.*>'' returned exit code '1', the output was ''
Oct 14 06:01:25 FIREWALL php-fpm[83954]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed IP addresses. Reloading endpoints that may use OVPN_S2S_VPNV4.
Oct 14 06:01:29 FIREWALL check_reload_status[635]: Linkup starting igb1
Oct 14 06:01:29 FIREWALL kernel: igb1: link state changed to DOWN
Oct 14 06:01:32 FIREWALL check_reload_status[635]: Linkup starting igb1
Oct 14 06:01:33 FIREWALL kernel: igb1: link state changed to UP
Oct 14 06:01:53 FIREWALL check_reload_status[635]: rc.newwanip starting igb1
Oct 14 06:01:53 FIREWALL php-fpm[95879]: /rc.linkup: The command '/sbin/route -n6 get 'default' 2>/dev/null | /usr/bin/egrep 'flags: <.*PROTO.*>'' returned exit code '1', the output was ''
Oct 14 06:01:53 FIREWALL check_reload_status[635]: Restarting IPsec tunnels
Oct 14 06:01:55 FIREWALL php-fpm[55721]: /rc.newwanip: rc.newwanip: Info: starting on igb1.
Oct 14 06:01:55 FIREWALL php-fpm[55721]: /rc.newwanip: rc.newwanip: on (IP address: 192.168.0.2) (interface: MODEM[opt2]) (real interface: igb1).
Oct 14 06:01:55 FIREWALL dhcpleases[26526]: Could not deliver signal HUP to process 13480: No such process.
Oct 14 06:01:55 FIREWALL dhcpleases[49055]: Could not deliver signal HUP to process 13480: No such process.
Oct 14 06:01:56 FIREWALL check_reload_status[635]: updating dyndns opt2
Oct 14 06:01:56 FIREWALL ppp[82148]: Multi-link PPP daemon for FreeBSD
Oct 14 06:01:56 FIREWALL ppp[82148]:
Oct 14 06:01:56 FIREWALL ppp[82148]: process 82148 started, version 5.9
Oct 14 06:01:56 FIREWALL ppp[82148]: web: web is not running
Oct 14 06:01:56 FIREWALL ppp[82148]: [wan] Bundle: Interface ng0 created
Oct 14 06:01:56 FIREWALL ppp[82148]: [wan_link0] Link: OPEN event
Oct 14 06:01:56 FIREWALL ppp[82148]: [wan_link0] LCP: Open event
Oct 14 06:01:56 FIREWALL ppp[82148]: [wan_link0] LCP: state change Initial --> Starting
Oct 14 06:01:56 FIREWALL ppp[82148]: [wan_link0] LCP: LayerStart
Oct 14 06:01:56 FIREWALL ppp[82148]: [wan_link0] PPPoE: Connecting to ''
Oct 14 06:01:56 FIREWALL kernel: ng0: changing name to 'pppoe0'
Oct 14 06:01:59 FIREWALL php-fpm[95879]: /rc.linkup: The command '/sbin/route -n6 get 'default' 2>/dev/null | /usr/bin/egrep 'flags: <.*PROTO.*>'' returned exit code '1', the output was ''
Oct 14 06:01:59 FIREWALL check_reload_status[635]: Restarting IPsec tunnels
Oct 14 06:02:00 FIREWALL dhcpleases[99884]: Could not deliver signal HUP to process 59176: No such process.
Oct 14 06:02:02 FIREWALL check_reload_status[635]: updating dyndns wan
Oct 14 06:02:02 FIREWALL check_reload_status[635]: Reloading filter
Oct 14 06:02:02 FIREWALL check_reload_status[635]: Reloading filter
Oct 14 06:02:02 FIREWALL php-fpm[94330]: /rc.linkup: Hotplug event detected for MODEM(opt2) dynamic IP address (4: dhcp)
Oct 14 06:02:02 FIREWALL php-fpm[94330]: /rc.linkup: DEVD Ethernet detached event for opt2
Oct 14 06:02:03 FIREWALL ppp[82148]: PPPoE: rec'd ACNAME "TT-35-HATAY-ZTE-ZXR-02"
Oct 14 06:02:03 FIREWALL ppp[82148]: [wan_link0] PPPoE: connection successful
Oct 14 06:02:03 FIREWALL ppp[82148]: [wan_link0] Link: UP event
Oct 14 06:02:03 FIREWALL ppp[82148]: [wan_link0] LCP: Up event
Oct 14 06:02:03 FIREWALL ppp[82148]: [wan_link0] LCP: state change Starting --> Req-Sent
Oct 14 06:02:03 FIREWALL ppp[82148]: [wan_link0] LCP: SendConfigReq #1
Oct 14 06:02:03 FIREWALL ppp[82148]: [wan_link0] PROTOCOMP
Oct 14 06:02:03 FIREWALL ppp[82148]: [wan_link0] MRU 1492
Oct 14 06:02:03 FIREWALL ppp[82148]: [wan_link0] MAGICNUM 0x57649364
Oct 14 06:02:03 FIREWALL ppp[82148]: [wan_link0] LCP: rec'd Configure Request #1 (Req-Sent)
Oct 14 06:02:03 FIREWALL ppp[82148]: [wan_link0] MAGICNUM 0x26b4014d
Oct 14 06:02:03 FIREWALL ppp[82148]: [wan_link0] MRU 1492
Oct 14 06:02:03 FIREWALL ppp[82148]: [wan_link0] AUTHPROTO PAP
Oct 14 06:02:03 FIREWALL ppp[82148]: [wan_link0] LCP: SendConfigAck #1
Oct 14 06:02:03 FIREWALL ppp[82148]: [wan_link0] MAGICNUM 0x26b4014d
Oct 14 06:02:03 FIREWALL ppp[82148]: [wan_link0] MRU 1492
Oct 14 06:02:03 FIREWALL ppp[82148]: [wan_link0] AUTHPROTO PAP
Oct 14 06:02:03 FIREWALL ppp[82148]: [wan_link0] LCP: state change Req-Sent --> Ack-Sent
Oct 14 06:02:03 FIREWALL ppp[82148]: [wan_link0] LCP: rec'd Configure Reject #1 (Ack-Sent)
Oct 14 06:02:03 FIREWALL ppp[82148]: [wan_link0] PROTOCOMP
Oct 14 06:02:03 FIREWALL ppp[82148]: [wan_link0] LCP: SendConfigReq #2
Oct 14 06:02:03 FIREWALL ppp[82148]: [wan_link0] MRU 1492
Oct 14 06:02:03 FIREWALL ppp[82148]: [wan_link0] MAGICNUM 0x57649364
This is becoming unbearable for me since I need this remote system to be extra stable. Any help at this point is very much appreciated.
Do I need professional help at this point? Can TAC Lite help with this? I am not shy about spending some $$ to get this fixed. I am even considering switching to OPNsense, but I like pfSense way better and will only switch as a last resort.
-
So it was completely unresponsive again? You had to power cycle it?
You are still using igb1 for WAN. Are you able to try a different NIC?
It's an interesting log. The igb NIC bounces twice, causing a bunch of service restarts, but then remains stable.
The PPP daemon then tries to reconnect but initially it fails with:
Oct 14 06:02:17 FIREWALL ppp[73450]: [wan_link0] PAP: rec'd NAK #1 len: 27
Oct 14 06:02:17 FIREWALL ppp[73450]: [wan_link0] MESG: Authentication failed!
Oct 14 06:02:17 FIREWALL ppp[73450]: [wan_link0] LCP: authorization failed
But then after ~40s:
Oct 14 06:02:45 FIREWALL ppp[73450]: [wan_link0] PAP: rec'd ACK #1 len: 39
Oct 14 06:02:45 FIREWALL ppp[73450]: [wan_link0] MESG: Authentication Successful,Welcome!
Oct 14 06:02:45 FIREWALL ppp[73450]: [wan_link0] LCP: authorization successful
That implies the NIC is passing traffic and the remote server is responding at that point. The PPPoE link comes up correctly.
But only for ~90s, then:
Oct 14 06:03:54 FIREWALL ppp[73450]: [wan_link0] LCP: no reply to 1 echo request(s)
After that the ppp daemon times out and cannot reconnect again. But it keeps trying.
Nothing is logged that looks like it would cause the NIC to stop passing traffic.
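To make a timeline like the one above easier to pull out of a long system.log, something along these lines could help. It is only a rough sketch: the regular expression assumes the syslog layout seen in the extracts in this thread, and the names `LINE_RE`, `PATTERNS` and `key_events` are made up for illustration, not part of pfSense.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    timestamp: str
    process: str
    message: str


# Assumed layout: "Oct 14 06:01:01 FIREWALL kernel: igb1: link state changed to DOWN"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ "
    r"(?P<proc>[\w.\-]+)(?:\[\d+\])?: (?P<msg>.*)$"
)

# Substrings taken from the log lines quoted in this thread.
PATTERNS = (
    "link state changed",
    "Authentication failed",
    "authorization successful",
    "no reply to",
)


def key_events(lines: Iterable[str]) -> Iterator[Event]:
    """Yield only the link, authentication and LCP echo events."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not in the expected syslog layout
        message = match.group("msg")
        if any(pattern in message for pattern in PATTERNS):
            yield Event(match.group("ts"), match.group("proc"), message)


SAMPLE = [
    "Oct 14 06:01:01 FIREWALL kernel: igb1: link state changed to DOWN",
    "Oct 14 06:01:03 FIREWALL check_reload_status[635]: Reloading filter",
    "Oct 14 06:01:05 FIREWALL kernel: igb1: link state changed to UP",
    "Oct 14 06:02:17 FIREWALL ppp[73450]: [wan_link0] MESG: Authentication failed!",
    "Oct 14 06:02:45 FIREWALL ppp[73450]: [wan_link0] LCP: authorization successful",
    "Oct 14 06:03:54 FIREWALL ppp[73450]: [wan_link0] LCP: no reply to 1 echo request(s)",
]

events = list(key_events(SAMPLE))

if __name__ == "__main__":
    for event in events:
        print(event.timestamp, event.process, event.message)
```

To run it against the real file, replace `SAMPLE` with the lines read from the log. The list of substrings is deliberately short and would need extending for other failure modes.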
-
@stephenw10 said in Multiple issues, firewall freezes and whole network goes down.:
So it was completely unresponsive again? You had to power cycle it?
Yeah, exactly the same thing
@stephenw10 said in Multiple issues, firewall freezes and whole network goes down.:
You are still using igb1 for WAN. Are you able to try a different NIC?
I have not been able to return to this site to make this change yet. (I will try to get someone over there to switch the ports as soon as possible.)
Is it possible that there is some configuration mistake on my part for PPPoE?
-
I doubt it's a pppoe issue. If it was it would either fail to connect entirely or disconnect consistently. This seems like something happens to the upstream device. Somehow it then fails. What's totally unclear though is why the firewall stops responding completely. The logs show it just keeps trying to connect. It makes me wonder if it's actually an IPMI issue somehow.
One thing I would do here is change the 'Modem' interface to a static IP with no gateway if you can. Assuming you are using that only to access the modem? When it's configured as dhcp, pfSense treats it as a WAN and runs all the link scripts when igb1 bounces.
-
@stephenw10 hmm, I will try that during the weekend. No time to test it right now. Thanks for the suggestion. However, by logic, does it matter whether it is dhcp or static?
According to this recipe, it does recommend "static".
https://docs.netgate.com/pfsense/en/latest/recipes/modem-access.html
Now that I think about it:
My WAN IP is 88.....
and my WAN_PPPOE is 10......
For some reason, they are different and I am not behind CGNAT.
Will enabling "Use non-local gateway" on one of these gateways make a difference? I am not exactly sure what this option does or on which gateway it should be enabled.
-
Yes, use static there if you can. When you set it as dhcp the server passes it a gateway to use and pfSense sets that on the interface turning it into a WAN. Then it triggers all the WAN IP scripts when it bounces.
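One way to see how much work each bounce triggers is to count the actions logged by check_reload_status. The snippet below is a minimal sketch under the assumption that the log lines look like the extracts posted earlier in the thread; `RELOAD_RE` and `count_reload_actions` are illustrative names, not pfSense functions.

```python
import re
from collections import Counter
from typing import Iterable

# Matches e.g. "... check_reload_status[635]: Restarting IPsec tunnels"
RELOAD_RE = re.compile(r"check_reload_status\[\d+\]: (?P<action>.+)$")


def count_reload_actions(lines: Iterable[str]) -> Counter:
    """Count each distinct action logged by check_reload_status."""
    counts: Counter = Counter()
    for line in lines:
        match = RELOAD_RE.search(line.strip())
        if match:
            counts[match.group("action")] += 1
    return counts


SAMPLE = [
    "Oct 14 06:01:01 FIREWALL kernel: igb1: link state changed to DOWN",
    "Oct 14 06:01:03 FIREWALL check_reload_status[635]: updating dyndns MODEM_DHCP",
    "Oct 14 06:01:03 FIREWALL check_reload_status[635]: Restarting IPsec tunnels",
    "Oct 14 06:01:03 FIREWALL check_reload_status[635]: Restarting OpenVPN tunnels/interfaces",
    "Oct 14 06:01:03 FIREWALL check_reload_status[635]: Reloading filter",
    "Oct 14 06:01:05 FIREWALL check_reload_status[635]: Restarting IPsec tunnels",
]

counts = count_reload_actions(SAMPLE)

if __name__ == "__main__":
    for action, number in counts.most_common():
        print(number, action)
```

Comparing the counts from a modem restart before and after the switch to static should show whether the MODEM interface is still triggering these reloads.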
-
@stephenw10 Okay, I set it up as static now. Let's see if it survives the scheduled restart of the modem on Monday.
Now that you have reminded me, I have another interface (network) which is not a WAN but has a DHCP gateway. It is connected to a switch (this switch connects all IPMI devices together in one network with elevated privileges). The switch acts as a DHCP server. By this logic, should I also set this up as static? The MNG interface below.
-
You could also change that but it won't be nearly as impactful because it's on a different NIC. It shouldn't lose link at the same time.
-
@stephenw10 It survived Monday, but I will keep monitoring it.
Just curious: is there a way to mark the DHCP gateway as local only instead of WAN, or can this default behavior not be changed?
-
The only way you can do that is to add a gateway separately to an interface config like you might with an internal router for example. But you can't do that for a dynamic interface type like DHCP. The server passes a gateway to the client to use and it is always added to the gateway.
You could maybe override the gateway that is passed in the advanced dhcp options. I'm not sure I've ever tried that.
-
@stephenw10 So far nothing has crashed with the static IP.
I am guessing this is a very rare problem, since you have to have PPPoE DSL (most people use fiber these days) and have configured the network to access the modem behind the WAN via DHCP.
I will continue monitoring. There is no option to mark the gateway as internal network only. Maybe checking the boxes for "Disable Gateway Monitoring", "Disable Gateway Monitoring Action", and "Do not add static route for gateway monitor IP address via the chosen interface" might achieve a similar result, but I don't want to try these for now.
I still have no idea about the exact use case of the "Use non-local gateway" option.
-
You would have to use the DHCP advanced options field to force the dhcp client to ignore the gateway passed by the server. So adding
supersede routers
in the Option modifiers field should do it. But I would just use a static IP here.
-
@Laxarus Is your traffic shaper required? It could be configured incorrectly.
https://forum.netgate.com/topic/171842/queue-management-algorithms-differences
-
@JonathanLee said in Multiple issues, firewall freezes and whole network goes down.:
https://forum.netgate.com/topic/171842/queue-management-algorithms-differences
My main interface is on a 2x25G LAGG and the traffic shaper does not support LAGG, so nothing is set there other than for bufferbloat.