How do I debug frequent Gateway alarms
-
Log:
Aug 14 09:21:23 rc.gateway_alarm 99686 >>> Gateway alarm: WAN_DHCP (Addr:109.247.161.1 Alarm:0 RTT:10.005ms RTTsd:3.372ms Loss:10%)
Aug 14 09:20:47 php-fpm 97577 /rc.filter_configure_sync: The gateway: NORDVPN_VPNV4 is invalid or unknown, not using it.
Aug 14 09:20:47 php-fpm 97577 /rc.filter_configure_sync: The gateway: NORDVPN_VPNV4 is invalid or unknown, not using it.
Aug 14 09:20:47 php-fpm 97577 /rc.filter_configure_sync: The gateway: NORDVPN_VPNV4 is invalid or unknown, not using it.
Aug 14 09:20:47 php-fpm 42160 /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN_DHCP.
Aug 14 09:20:47 php-fpm 42160 /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Aug 14 09:20:47 php-fpm 42160 /rc.openvpn: Gateway, none 'available' for inet, use the first one configured. 'WAN_DHCP'
Aug 14 09:20:46 check_reload_status Reloading filter
Aug 14 09:20:46 check_reload_status Restarting OpenVPN tunnels/interfaces
Aug 14 09:20:46 check_reload_status Restarting ipsec tunnels
Aug 14 09:20:46 check_reload_status updating dyndns WAN_DHCP
Aug 14 09:20:46 rc.gateway_alarm 69517 >>> Gateway alarm: WAN_DHCP (Addr:109.247.161.1 Alarm:1 RTT:10.788ms RTTsd:5.979ms Loss:21%)
Aug 14 09:10:40 php-fpm 42160 /rc.filter_configure_sync: The gateway: NORDVPN_VPNV4 is invalid or unknown, not using it.
Aug 14 09:10:40 php-fpm 42160 /rc.filter_configure_sync: The gateway: NORDVPN_VPNV4 is invalid or unknown, not using it.
Aug 14 09:10:40 php-fpm 42160 /rc.filter_configure_sync: The gateway: NORDVPN_VPNV4 is invalid or unknown, not using it.
Aug 14 09:10:40 php-fpm 97577 /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN_DHCP.
Aug 14 09:10:40 php-fpm 97577 /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Aug 14 09:10:40 php-fpm 97577 /rc.openvpn: Gateway, none 'available' for inet, use the first one configured. 'WAN_DHCP'
Aug 14 09:10:39 check_reload_status Reloading filter
Aug 14 09:10:39 check_reload_status Restarting OpenVPN tunnels/interfaces
Aug 14 09:10:39 check_reload_status Restarting ipsec tunnels
Aug 14 09:10:39 check_reload_status updating dyndns WAN_DHCP
Aug 14 09:10:39 rc.gateway_alarm 63748 >>> Gateway alarm: WAN_DHCP (Addr:109.247.161.1 Alarm:0 RTT:9.803ms RTTsd:3.116ms Loss:18%)
Been using pfsense without issues for years. I only made a few changes, but I don't see how any of them could have caused this new error.
The only changes I made to my existing pfSense setup were:
- CPU downgrade to Intel G2120
- Added VLANs
With the new CPU, usage rarely exceeds 1%.
Outside of the VPN and VLANs I don't use any other features of pfsense.
I've tried a factory reset of pfSense and set up the VLANs and VPN from scratch, but this issue always comes back.
I tried another factory reset, this time with no changes to stock settings except disabling IPv6 on LAN and WAN. Waiting to see what this change will do.
-
Those alarms are telling you the pings to your monitor IP (or ISP) are not getting through. Do you have multiple gateways? Under System > Routing, what is the monitor IP set to for the WAN_DHCP gateway?
Downgrading the CPU sounds like you had to disconnect cables in order to do so. Check the cable going from the modem to the WAN port on pfSense. If you can't test it, replace it.
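To see how much loss sits behind each alarm, it can help to pull the numbers out of the log instead of eyeballing it. Below is a minimal Python sketch, assuming the `rc.gateway_alarm` lines look exactly like the ones pasted in this thread. The names `ALARM_RE` and `parse_alarms` are made up for illustration and are not part of pfSense.

```python
import re

# Matches the rc.gateway_alarm lines as pasted in this thread.
# Assumption: other pfSense versions may format these lines differently.
ALARM_RE = re.compile(
    r"(?P<time>\w{3} +\d+ \d{2}:\d{2}:\d{2}) rc\.gateway_alarm \d+ >>> "
    r"Gateway alarm: (?P<gateway>\S+) \(Addr:(?P<addr>\S+) Alarm:(?P<alarm>\d) "
    r"RTT:(?P<rtt>[\d.]+)ms RTTsd:(?P<rttsd>[\d.]+)ms Loss:(?P<loss>\d+)%\)"
)


def parse_alarms(log_text):
    """Return one dict per gateway alarm found in log_text."""
    alarms = []
    for m in ALARM_RE.finditer(log_text):
        alarms.append({
            "time": m.group("time"),
            "gateway": m.group("gateway"),
            "addr": m.group("addr"),
            "alarm": int(m.group("alarm")),
            "rtt_ms": float(m.group("rtt")),
            "rttsd_ms": float(m.group("rttsd")),
            "loss_pct": int(m.group("loss")),
        })
    return alarms


# The three alarm lines from the log posted above.
SAMPLE = """\
Aug 14 09:21:23 rc.gateway_alarm 99686 >>> Gateway alarm: WAN_DHCP (Addr:109.247.161.1 Alarm:0 RTT:10.005ms RTTsd:3.372ms Loss:10%)
Aug 14 09:20:46 rc.gateway_alarm 69517 >>> Gateway alarm: WAN_DHCP (Addr:109.247.161.1 Alarm:1 RTT:10.788ms RTTsd:5.979ms Loss:21%)
Aug 14 09:10:39 rc.gateway_alarm 63748 >>> Gateway alarm: WAN_DHCP (Addr:109.247.161.1 Alarm:0 RTT:9.803ms RTTsd:3.116ms Loss:18%)
"""

if __name__ == "__main__":
    for a in parse_alarms(SAMPLE):
        print(a["time"], a["gateway"], "loss", a["loss_pct"], "%", "rtt", a["rtt_ms"], "ms")
```

In the lines posted here the RTT stays around 10 ms while loss sits between 10% and 21%, so it is the dropped pings, not latency, that the alarms are reporting.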
-
There's just the one gateway under System > Routing.
The monitor IP: 109.247.161.1
I have an Ethernet cable tester so I will try that.
-
@Roy360 said in How do I debug frequent Gateway alarms:
There's just the one gateway under System > Routing.
I have an Ethernet cable tester so I will try that.
If you're going to test the cable, make sure you give it a good tug or two to make sure it isn't an intermittent thing. You could get a false pass if not.
If you have only one gateway, then check "Disable Gateway Monitoring Action" if it isn't already checked. With a single gateway there is nothing to fail over to, so you have no need for that action. That won't solve the actual problem, but leaving it unchecked could make matters worse.
Do you have anything entered in the monitor IP field?
Edit: btw, I'm not necessarily endorsing the use of 8.8.8.8 for the monitor IP. In your case, leaving it blank should mean it pings the gateway itself. If it is blank, then either your gateway is not responding or, like I said, it's the cable.
-
Thanks for the help!
@Raffi_ I've disabled gateway monitoring (the "Disable Gateway Monitoring" option).
I didn't have any IP set under Monitor IP.
The WAN cable was snug and did not give when I gave it a tug.
In the logs I'm seeing a lot of Hotplug events (new), but no gateway alarms so far. The hotplug events might be related to me testing the cables.
I'm using a Supermicro board with Intel NICs so I don't think there's an issue with the NIC. In Microsoft Teams and VoIP, I keep getting disconnected from calls (nothing in the logs that I noticed), so I'm going to put my cable modem back into router mode if replacing all the cables and power cycling everything doesn't work.
Aug 14 11:36:51 check_reload_status Updating all dyndns
Aug 14 11:36:51 check_reload_status Reloading filter
Aug 14 11:36:50 php-fpm 80398 /system_gateways.php: Gateway, none 'available' for inet6, use the first one configured. ''
Aug 14 11:36:48 check_reload_status Syncing firewall
Aug 14 11:34:01 php-fpm 345 /index.php: Successful login for user 'admin' from: 192.168.1.172 (Local Database)
Aug 14 10:53:19 check_reload_status Reloading filter
Aug 14 10:53:19 php-fpm 65280 /rc.newwanip: rc.newwanip: on (IP address: 192.168.1.1) (interface: LAN[lan]) (real interface: em1).
Aug 14 10:53:19 php-fpm 65280 /rc.newwanip: rc.newwanip: Info: starting on em1.
Aug 14 10:53:18 check_reload_status Reloading filter
Aug 14 10:53:18 check_reload_status rc.newwanip starting em1
Aug 14 10:53:18 php-fpm 65280 /rc.linkup: Hotplug event detected for LAN(lan) static IP (192.168.1.1 )
Aug 14 10:53:17 kernel em1: link state changed to UP
Aug 14 10:53:17 check_reload_status Linkup starting em1
Aug 14 10:53:16 check_reload_status Reloading filter
Aug 14 10:53:16 php-fpm 80398 /rc.linkup: Hotplug event detected for LAN(lan) static IP (192.168.1.1 )
Aug 14 10:53:15 kernel em1: link state changed to DOWN
Aug 14 10:53:15 check_reload_status Linkup starting em1
Aug 14 10:53:05 check_reload_status Reloading filter
Aug 14 10:53:05 php-fpm 345 /rc.newwanip: rc.newwanip: on (IP address: 192.168.1.1) (interface: LAN[lan]) (real interface: em1).
Aug 14 10:53:05 php-fpm 345 /rc.newwanip: rc.newwanip: Info: starting on em1.
Aug 14 10:53:04 check_reload_status Reloading filter
Aug 14 10:53:04 check_reload_status rc.newwanip starting em1
Aug 14 10:53:04 php-fpm 345 /rc.linkup: Hotplug event detected for LAN(lan) static IP (192.168.1.1 )
Aug 14 10:53:03 kernel em1: link state changed to UP
Aug 14 10:53:03 check_reload_status Linkup starting em1
Aug 14 10:53:01 check_reload_status Reloading filter
Aug 14 10:53:01 php-fpm 65280 /rc.linkup: Hotplug event detected for LAN(lan) static IP (192.168.1.1 )
Aug 14 10:53:00 kernel em1: link state changed to DOWN
Aug 14 10:53:00 check_reload_status Linkup starting em1
Aug 14 10:52:38 check_reload_status Reloading filter
Aug 14 10:52:38 php-fpm 80398 /rc.newwanip: rc.newwanip: on (IP address: 192.168.1.1) (interface: LAN[lan]) (real interface: em1).
Aug 14 10:52:38 php-fpm 80398 /rc.newwanip: rc.newwanip: Info: starting on em1.
Aug 14 10:52:37 check_reload_status Reloading filter
Aug 14 10:52:37 check_reload_status rc.newwanip starting em1
Aug 14 10:52:37 php-fpm 80398 /rc.linkup: Hotplug event detected for LAN(lan) static IP (192.168.1.1 )
Aug 14 10:52:36 kernel em1: link state changed to UP
Aug 14 10:52:36 check_reload_status Linkup starting em1
Aug 14 10:52:31 check_reload_status Reloading filter
Aug 14 10:52:31 php-fpm 345 /rc.linkup: Hotplug event detected for LAN(lan) static IP (192.168.1.1 )
Aug 14 10:52:30 kernel em1: link state changed to DOWN
Aug 14 10:52:30 check_reload_status Linkup starting em1
Aug 14 10:26:55 nginx 2020/08/14 10:26:55 [error] 63332#100168: send() failed (54: Connection reset by peer)
Aug 14 10:26:55 syslogd kernel boot file is /boot/kernel/kernel
Aug 14 14:26:55 syslogd exiting on signal 15
Aug 14 10:26:55 check_reload_status Syncing firewall
Aug 14 14:25:06 kernel cannot forward src fe80:2::eee:99ff:fe82:a978, dst 2607:fea8:4c60:e3:ec4:7aff:fe75:b40f, nxt 58, rcvif em1, outif em0
Aug 14 10:24:35 check_reload_status Reloading filter
Aug 14 10:24:35 check_reload_status Syncing firewall
Aug 14 14:24:18 kernel cannot forward src fe80:2::ae0d:1bff:fefc:25e7, dst 2607:fea8:4c60:e3:ec4:7aff:fe75:b40f, nxt 58, rcvif em1, outif em0
Aug 14 10:24:01 check_reload_status Syncing firewall
Aug 14 10:23:56 check_reload_status Syncing firewall
Aug 14 14:23:49 kernel cannot forward src fe80:2::c80:2ff:febe:7fca, dst 2607:fea8:4c60:e3:ec4:7aff:fe75:b40f, nxt 58, rcvif em1, outif em0
Aug 14 10:23:46 php-fpm 80398 /interfaces.php: Creating rrd update script
-
@Roy360 said in How do I debug frequent Gateway alarms:
Aug 14 10:53:17 em1: link state changed to UP
....
Aug 14 10:53:15 em1: link state changed to DOWN
Prepare to abandon (stop using) the NIC on the pfSense side (em1), the NIC on the other side, or, if you're lucky, the cable between them.
Treat these UPs and DOWNs as a sign of a bad electrical contact.
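If you want to keep an eye on how often a link bounces, counting the kernel "link state changed" lines per interface is a quick check. This is only a rough Python sketch that assumes the system log format pasted earlier in the thread. `LINK_RE` and `count_link_changes` are hypothetical names, not anything that ships with pfSense.

```python
import re
from collections import Counter

# Matches kernel link-state lines as they appear in the log pasted above.
# Assumption: the interface name is a single word such as em0 or em1.
LINK_RE = re.compile(
    r"(?P<time>\w{3} +\d+ \d{2}:\d{2}:\d{2}) kernel (?P<nic>\w+): "
    r"link state changed to (?P<state>UP|DOWN)"
)


def count_link_changes(log_text):
    """Count UP and DOWN transitions per interface as (nic, state) keys."""
    counts = Counter()
    for m in LINK_RE.finditer(log_text):
        counts[(m.group("nic"), m.group("state"))] += 1
    return counts


# The em1 link-state lines from the log posted above.
SAMPLE = """\
Aug 14 10:53:17 kernel em1: link state changed to UP
Aug 14 10:53:15 kernel em1: link state changed to DOWN
Aug 14 10:53:03 kernel em1: link state changed to UP
Aug 14 10:53:00 kernel em1: link state changed to DOWN
Aug 14 10:52:36 kernel em1: link state changed to UP
Aug 14 10:52:30 kernel em1: link state changed to DOWN
"""

if __name__ == "__main__":
    for (nic, state), n in sorted(count_link_changes(SAMPLE).items()):
        print(nic, state, n)
```

A handful of transitions while you are plugging and unplugging cables is expected. Transitions that keep showing up when nobody is touching anything point back at the NIC, the port on the other end, or the cable.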
-
I meant to give it a tug when hooked up to your Ethernet tester, not pfSense. The tester could give you a false sense that the cable is good when it's actually not. Therefore, stressing the cable a bit might put it in a failing state and give you a more accurate test result. If you're still having issues, it might be easier to replace the cable. That might be worth a try anyway if you have an extra cable around.
Yea, I think those hotplug events were you disconnecting the cable when you were testing.
-
@Raffi_ said in How do I debug frequent Gateway alarms:
I meant to give it a tug when hooked up to your Ethernet tester, not pfSense. The tester could give you a false sense that the cable is good when it's actually not. Therefore, stressing the cable a bit might put it in a failing state and give you a more accurate test result. If you're still having issues, it might be easier to replace the cable. That might be worth a try anyway if you have an extra cable around.
Yea, I think those hotplug events were you disconnecting the cable when you were testing.
Never did I think it could be the cables. With my short runs I assumed cables either worked or they didn't.
Not a single error after I replaced all the cables.