ISP gw stopped responding
-
My ISP (a huge provider) claims there were no issues with the Internet connection last night, but I lost connection to my pfSense and the network for 3-5 minutes. I do see this gateway alarm in the log, and I lost connection to everything on my network.
My pfSense is directly connected to the ISP's equipment in the data center. Since I run ping from an external data center, I could see the ISP's equipment responding on the IP referenced as ISP_BOX below.
What could be wrong? I had around a year of uptime or more on pfSense before the last update to 2.7.2 one week ago; besides that, no config has changed in years. I have had the same gateway config for at least the last 5 years, and I would never mess around with it.
ISP_BOX below is the ISP's equipment on site and is the only active gateway on the system. I do see some ARP activity on the link to the ISP's box.
```
May 13 00:25:04 php-fpm 3496 /rc.openvpn: The command '/sbin/route -n6 get 'default' 2>/dev/null | /usr/bin/egrep 'flags: <.*PROTO.*>'' returned exit code '1', the output was ''
May 13 00:25:03 check_reload_status 3543 Reloading filter
May 13 00:25:03 check_reload_status 3543 Restarting OpenVPN tunnels/interfaces
May 13 00:25:03 check_reload_status 3543 Restarting IPsec tunnels
May 13 00:25:03 check_reload_status 3543 updating dyndns GW_WAN_2
May 13 00:25:03 rc.gateway_alarm 74528 >>> Gateway alarm: GW_WAN_2 (Addr:ISP_BOX Alarm:0 RTT:.341ms RTTsd:1.101ms Loss:5%)
May 13 00:25:03 check_reload_status 3543 Reloading filter
May 13 00:25:03 php-fpm 3899 /rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing.
May 13 00:25:00 sshguard 50155 Now monitoring attacks.
May 13 00:25:00 sshguard 13795 Exiting on signal.
May 13 00:24:46 check_reload_status 3543 Restarting IPsec tunnels
May 13 00:24:42 check_reload_status 3543 Reloading filter
May 13 00:24:42 php-fpm 3496 /rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing.
May 13 00:24:26 check_reload_status 3543 Restarting IPsec tunnels
May 13 00:24:06 kernel arp: ISP_BOX moved from d0:d0:4b:66:6c:75 to 30:fd:65:89:4a:1a on igb0
May 13 00:19:52 bandwidthd 39455 Previouse graphing run not complete... Skipping current run
May 13 00:19:52 bandwidthd 40348 Previouse graphing run not complete... Skipping current run
May 13 00:19:42 check_reload_status 3543 Reloading filter
May 13 00:19:42 php-fpm 42201 /rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing.
May 13 00:16:50 check_reload_status 3543 Restarting IPsec tunnels
May 13 00:16:32 bandwidthd 39455 Previouse graphing run not complete... Skipping current run
May 13 00:16:31 bandwidthd 40348 Previouse graphing run not complete... Skipping current run
May 13 00:15:33 check_reload_status 3543 Reloading filter
May 13 00:15:33 php-fpm 3899 /rc.ipsec: IPSEC: One or more IPsec tunnel gateways have changed. Refreshing.
May 13 00:15:18 php-fpm 3496 /rc.openvpn: The command '/sbin/route -n6 get 'default' 2>/dev/null | /usr/bin/egrep 'flags: <.*PROTO.*>'' returned exit code '1', the output was ''
May 13 00:15:16 check_reload_status 3543 Reloading filter
May 13 00:15:16 check_reload_status 3543 Restarting OpenVPN tunnels/interfaces
May 13 00:15:16 check_reload_status 3543 Restarting IPsec tunnels
May 13 00:15:16 check_reload_status 3543 updating dyndns GW_WAN_2
May 13 00:15:16 rc.gateway_alarm 42537 >>> Gateway alarm: GW_WAN_2 (Addr:ISP_BOX Alarm:1 RTT:.198ms RTTsd:.044ms Loss:22%)
May 13 00:11:59 php 11666 [pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
May 13 00:07:03 php 13902 /usr/local/sbin/acbupload.php: Skipping ACB backup for (system): pfblockerng: saving dnsbl changes.
May 13 00:06:19 php 11666 /usr/local/www/pfblockerng/pfblockerng.php: Beginning configuration backup to https://acb.netgate.com/save
May 13 00:06:19 check_reload_status 3543 Syncing firewall
```
-
@fireix
What sticks out is the gateway alarm, which indicates a connectivity issue between the firewall and the monitored address.
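If it helps to line the alarms up against the ISP's timeline, here is a rough Python sketch that pulls the `rc.gateway_alarm` entries out of the log posted above. It assumes `Alarm:1` means the alarm was raised and `Alarm:0` means it cleared. The function name `parse_alarms` and the `year` default are made up for illustration, since the syslog timestamps carry no year.

```python
import re
from datetime import datetime

# Two alarm lines and the ARP line copied from the log posted above (newest first).
LOG = """\
May 13 00:25:03 rc.gateway_alarm 74528 >>> Gateway alarm: GW_WAN_2 (Addr:ISP_BOX Alarm:0 RTT:.341ms RTTsd:1.101ms Loss:5%)
May 13 00:24:06 kernel arp: ISP_BOX moved from d0:d0:4b:66:6c:75 to 30:fd:65:89:4a:1a on igb0
May 13 00:15:16 rc.gateway_alarm 42537 >>> Gateway alarm: GW_WAN_2 (Addr:ISP_BOX Alarm:1 RTT:.198ms RTTsd:.044ms Loss:22%)
"""

ALARM_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) rc\.gateway_alarm \d+ >>> Gateway alarm: "
    r"(?P<gw>\S+) \(Addr:(?P<addr>\S+) Alarm:(?P<alarm>[01]) "
    r"RTT:(?P<rtt>[\d.]+)ms RTTsd:(?P<rttsd>[\d.]+)ms Loss:(?P<loss>\d+)%\)$"
)


def parse_alarms(text, year=2024):
    """Return gateway alarm events sorted oldest first.

    `year` is an assumption: the log lines themselves do not include one.
    """
    events = []
    for line in text.splitlines():
        m = ALARM_RE.match(line.strip())
        if m is None:
            continue  # not a gateway alarm line
        events.append({
            "time": datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S"),
            "gateway": m["gw"],
            "addr": m["addr"],
            "alarm": m["alarm"] == "1",  # assumed: 1 = raised, 0 = cleared
            "rtt_ms": float(m["rtt"]),
            "rttsd_ms": float(m["rttsd"]),
            "loss_pct": int(m["loss"]),
        })
    return sorted(events, key=lambda e: e["time"])


events = parse_alarms(LOG)
for e in events:
    state = "raised" if e["alarm"] else "cleared"
    print(e["time"].time(), e["gateway"], state, f"loss={e['loss_pct']}%")

# Time between the alarm being raised and being cleared, in seconds.
window = (events[-1]["time"] - events[0]["time"]).total_seconds()
print(window)  # 587.0
```

Note that this window is only the span between the two alarm messages. It is not necessarily how long traffic was actually down, because the alarm fires and clears on loss thresholds.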
Did the link physically go down?
Are you able to log in to the provider's modem and check any basic health stats?
It seems the provider (perhaps correctly) just said there was no problem on their network, but did they check their equipment at your home?
Regardless, the alarm message shows the firewall was losing pings to your modem. Investigate that.
-
@michmoor It is a direct network cable (0.5 meter) from my pfSense wan0-port in the rack to the fiber-box port in the same rack that they provide (a fully managed fiber box/converter). There is not even a single switch in between. It comes with uptime guarantees and all kinds of high-level SLAs, so I have no access to their box. All I have is their word that there were no issues at their end. To them, it just looked like my pfSense went offline for 2-3 minutes. And I can't find anything in the logs that says my box rebooted or did anything weird.
What is weird is that the MAC address changed during that time...
-
@fireix whose MAC address changed, and how do you know?
-
@michmoor After an extra round with the ISP, they have just now admitted that they forgot to inform me of maintenance! So it was my ISP!
"whose MAC address changed, and how do you know?"
In the log I posted above, you can see the ISP's box changed from one MAC address to another. I assume the log line below shows the MAC address of the device connected to my WAN port (igb0). Since their box is directly connected to this port, it can't be anything other than theirs.
May 13 00:24:06 kernel arp: ISP_BOX moved from d0:d0:4b:66:6c:75 to 30:fd:65:89:4a:1a on igb0
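For anyone who wants to spot this kind of event in their own logs, here is a minimal Python sketch that picks apart the `arp: ... moved from ... to ...` line above. The helper name `parse_arp_move` and the `oui_changed` field are just illustrative. The first three octets of a MAC address are the vendor-assigned prefix (OUI), so a different prefix would be consistent with a different physical device or interface answering for the same IP, though that is only a hint and not proof.

```python
import re

MAC = r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}"
ARP_MOVE_RE = re.compile(
    rf"kernel arp: (?P<ip>\S+) moved from (?P<old>{MAC}) to (?P<new>{MAC}) on (?P<iface>\S+)"
)


def parse_arp_move(line):
    """Return the fields of an 'arp: X moved from A to B on IF' line, or None."""
    m = ARP_MOVE_RE.search(line)
    if m is None:
        return None
    info = m.groupdict()
    # Compare the first three octets (the OUI / vendor prefix) of both MACs.
    info["oui_changed"] = info["old"][:8] != info["new"][:8]
    return info


line = "May 13 00:24:06 kernel arp: ISP_BOX moved from d0:d0:4b:66:6c:75 to 30:fd:65:89:4a:1a on igb0"
move = parse_arp_move(line)
print(move["ip"], move["old"], "->", move["new"], "on", move["iface"])
print("vendor prefix changed:", move["oui_changed"])
```

In this case the prefix goes from `d0:d0:4b` to `30:fd:65`, which fits with the ISP doing maintenance on their box.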