WAN repeatedly going down every few days again
-
I had this issue some time back (https://forum.netgate.com/topic/175077/weird-internet-access-issue/13?_=1725234039623). I seemed to have resolved it by cleaning up some unused VPN configurations.
But now the issue has creeped up again for some odd reason.
It's rather annoying as it happens during work or other important times and I have to restore connection manually. I'm pretty much at a loss and have no idea what to do.
These are the Gateway System Logs
Sep 1 09:24:10 dpinger 80377 send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% alarm_hold 10000ms dest_addr 1.1.1.1 bind_addr xxx.xxx.xxx.xxx identifier "WAN_DHCP "
Sep 1 09:24:10 dpinger 38900 exiting on signal 15
Sep 1 09:24:07 dpinger 38900 send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% alarm_hold 10000ms dest_addr 1.1.1.1 bind_addr xxx.xxx.xxx.xxx identifier "WAN_DHCP "
Sep 1 09:24:06 dpinger 3735 exiting on signal 15
Sep 1 09:24:05 dpinger 3735 WAN_DHCP 1.1.1.1: sendto error: 65
Sep 1 09:24:05 dpinger 3735 WAN_DHCP 1.1.1.1: sendto error: 65
Sep 1 09:23:51 dpinger 3735 WAN_DHCP 1.1.1.1: Alarm latency 0us stddev 0us loss 100%
Sep 1 09:23:49 dpinger 3735 send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% alarm_hold 10000ms dest_addr 1.1.1.1 bind_addr xxx.xxx.xxx.xxx identifier "WAN_DHCP "
Sep 1 09:23:49 dpinger 62015 exiting on signal 15
Sep 1 09:22:13 dpinger 62015 WAN_DHCP 1.1.1.1: Alarm latency 4395us stddev 763us loss 21% -
@pp-ng said in WAN repeatedly going down every few days again:
Sep 1 09:22:13 dpinger 62015 WAN_DHCP 1.1.1.1: Alarm latency 4395us stddev 763us loss 21%
That first log there shows the WAN has 21% packet loss.
Is that the full log or filtered for only dpinger?The fact dpinger is restarting there implies something else is happening that isn't shown.
-
That is the full gateway system log from today yea. Is there another log you'd like to see?
It brought the WAN down I guess due to the 1.1.1.1 monitoring? not sure
-
It's the gateway log or the system log? Looks like it might be the gateway log but the system log also contains those entries.
The gateway alarm can trigger a bunch of things but it should not bring the WAN down. I suspect there was more happening here in the main system log. The sendto 65 error there implies the DHCP lease was lost and the gateway was no longer in an existing subnet.
-
@stephenw10 It's the logs from Status > System Logs > Gateway
So when or if it happens again, what things should I look for or do?
Is there more debug logging I can enable to help narrow it down if it happens again?
-
@stephenw10 Also - to get my WAN back to 'online' I went into Interfaces > WAN and just clicked Save and then Activate. I know it runs several scripts or whatever in the background, so not sure which one got me back online, but that did it.
-
Right so check in the main System > General tab in the logs at that time. It will show interfaces losing link, processes restarting, IP address changes etc.
-
@stephenw10 Here you go
Sep 1 09:31:00 sshguard 32697 Now monitoring attacks.
Sep 1 09:31:00 sshguard 44278 Exiting on signal.
Sep 1 09:25:00 php 80793 /usr/local/sbin/acbupload.php: End of configuration backup to https://acb.netgate.com/save (success).
Sep 1 09:24:11 php-fpm 4605 /rc.start_packages: Restarting/Starting all packages.
Sep 1 09:24:11 php-fpm 4605 /rc.start_packages: Restarting/Starting all packages.
Sep 1 09:24:10 check_reload_status 630 Starting packages
Sep 1 09:24:10 php-fpm 21640 /interfaces.php: Creating rrd update script
Sep 1 09:24:10 check_reload_status 630 Reloading filter
Sep 1 09:24:10 check_reload_status 630 Starting packages
Sep 1 09:24:10 check_reload_status 630 Reloading filter
Sep 1 09:24:10 php-fpm 4605 /rc.newwanip: Netgate pfSense Plus package system has detected an IP change or dynamic WAN reconnection - 0.0.0.0 -> xxx.xxx.xxx.xxx - Restarting packages.
Sep 1 09:24:10 php-fpm 21640 /interfaces.php: Removing static route for monitor 1.1.1.1 and adding a new route through xxx.xxx.xxx.xxx
Sep 1 09:24:10 php-fpm 21640 /interfaces.php: Resyncing OpenVPN instances for interface WAN.
Sep 1 09:24:08 php-fpm 4605 /rc.newwanip: Creating rrd update script
Sep 1 09:24:08 php-fpm 4605 /rc.newwanip: Resyncing OpenVPN instances for interface WAN.
Sep 1 09:24:08 check_reload_status 630 updating dyndns wan
Sep 1 09:24:08 check_reload_status 630 Restarting IPsec tunnels
Sep 1 09:24:08 php-fpm 4605 /rc.newwanip: IP Address has changed, killing states on former IP Address 0.0.0.0.
Sep 1 09:24:08 php-fpm 21640 /interfaces.php: The command '/sbin/route -n6 get 'default' 2>/dev/null | /usr/bin/egrep 'flags: <.PROTO.>'' returned exit code '1', the output was ''
Sep 1 09:24:08 php-fpm 4605 /rc.newwanip: The command '/sbin/route -n6 get 'default' 2>/dev/null | /usr/bin/egrep 'flags: <.PROTO.>'' returned exit code '1', the output was ''
Sep 1 09:24:07 php-fpm 4605 /rc.newwanip: Removing static route for monitor 1.1.1.1 and adding a new route through xxx.xxx.xxx.xxx
Sep 1 09:24:07 php-fpm 4605 /rc.newwanip: rc.newwanip: on (IP address: xxx.xxx.xxx.xxx) (interface: WAN[wan]) (real interface: igb0).
Sep 1 09:24:07 php-fpm 4605 /rc.newwanip: rc.newwanip: Info: starting on igb0.
Sep 1 09:24:06 php-fpm 21640 /interfaces.php: Starting rtsold process on wan(igb0)
Sep 1 09:24:06 php-fpm 21640 /interfaces.php: Starting DHCP6 client for interfaces igb0 in DHCP6 without RA mode
Sep 1 09:24:06 php-fpm 21640 /interfaces.php: Accept router advertisements on interface igb0
Sep 1 09:24:06 php-fpm 21640 /interfaces.php: calling interface_dhcpv6_configure.
Sep 1 09:24:06 check_reload_status 630 rc.newwanip starting igb0
Sep 1 09:24:06 php-fpm 21640 /interfaces.php: Error starting gateway monitor for WAN_DHCP
Sep 1 09:24:06 php-fpm 21640 /interfaces.php: The command '/usr/local/bin/dpinger -S -r 0 -i WAN_DHCP -B xxx.xxx.xxx.xxx -p /var/run/dpinger_WAN_DHCP~xxx.xxx.xxx.xxx~1.1.1.1.pid -u /var/run/dpinger_WAN_DHCP~xxx.xxx.xxx.xxx~1.1.1.1.sock -C "/etc/rc.gateway_alarm" -d 1 -s 500 -l 2000 -t 60000 -A 1000 -D 500 -L 20 1.1.1.1 >/dev/null' returned exit code '1', the output was ''
Sep 1 09:24:03 php 1125 /usr/local/sbin/acbupload.php: New alert found: An error occurred while uploading the encrypted Netgate pfSense Plus configuration to https://acb.netgate.com/save ( Unable to resolve acb.netgate.com )
Sep 1 09:24:03 php 1125 /usr/local/sbin/acbupload.php: An error occurred while uploading the encrypted Netgate pfSense Plus configuration to https://acb.netgate.com/save ( Unable to resolve acb.netgate.com ) - Unable to resolve acb.netgate.com
Sep 1 09:24:01 php-fpm 35200 /interfaces.php: Beginning configuration backup to https://acb.netgate.com/save
Sep 1 09:24:01 check_reload_status 630 Syncing firewall
Sep 1 09:24:01 php-fpm 35200 /interfaces.php: Configuration Change: admin@xxx.xxx.xxx.xxx (Local Database): Interfaces settings changed
Sep 1 09:23:52 php-fpm 21640 /rc.openvpn: The command '/sbin/route -n6 get 'default' 2>/dev/null | /usr/bin/egrep 'flags: <.PROTO.>'' returned exit code '1', the output was ''
Sep 1 09:23:51 check_reload_status 630 Updating all dyndns
Sep 1 09:23:51 check_reload_status 630 Reloading filter
Sep 1 09:23:51 check_reload_status 630 Reloading filter
Sep 1 09:23:51 check_reload_status 630 Restarting OpenVPN tunnels/interfaces
Sep 1 09:23:51 check_reload_status 630 Restarting IPsec tunnels
Sep 1 09:23:51 check_reload_status 630 updating dyndns WAN_DHCP
Sep 1 09:23:51 rc.gateway_alarm 8515 >>> Gateway alarm: WAN_DHCP (Addr:1.1.1.1 Alarm:1 RTT:0ms RTTsd:0ms Loss:100%)
Sep 1 09:23:51 php-fpm 33576 /system_gateways.php: The command '/sbin/route -n6 get 'default' 2>/dev/null | /usr/bin/egrep 'flags: <.PROTO.>'' returned exit code '1', the output was ''
Sep 1 09:23:49 php-fpm 33576 /system_gateways.php: Removing static route for monitor 1.1.1.1 and adding a new route through xxx.xxx.xxx.xxx
Sep 1 09:23:47 php-fpm 33576 /system_gateways_edit.php: Beginning configuration backup to https://acb.netgate.com/save
Sep 1 09:23:47 check_reload_status 630 Syncing firewall
Sep 1 09:23:47 php-fpm 33576 /system_gateways_edit.php: Configuration Change: admin@xxx.xxx.xxx.xxx (Local Database): Gateway settings changed
Sep 1 09:23:36 php-fpm 33576 /index.php: Successful login for user 'admin' from: xxx.xxx.xxx.xxx (Local Database)
Sep 1 09:22:14 php-fpm 35200 /rc.openvpn: The command '/sbin/route -n6 get 'default' 2>/dev/null | /usr/bin/egrep 'flags: <.PROTO.>'' returned exit code '1', the output was ''
Sep 1 09:22:13 check_reload_status 630 Reloading filter
Sep 1 09:22:13 check_reload_status 630 Restarting OpenVPN tunnels/interfaces
Sep 1 09:22:13 check_reload_status 630 Restarting IPsec tunnels
Sep 1 09:22:13 check_reload_status 630 updating dyndns WAN_DHCP
Sep 1 09:22:13 rc.gateway_alarm 91060 >>> Gateway alarm: WAN_DHCP (Addr:1.1.1.1 Alarm:1 RTT:4.395ms RTTsd:.763ms Loss:21%) -
Ah, OK so a lot of those changes in the gateway log are because you logged in and resaved the gateway settings.
The initial packet loss itself doesn't trigger anything more than the alarm and the reload that brings.
Packet loss on the WAN is usually actually just packet loss. So some upstream routing or connection issue.
-
@stephenw10 So do we know what the issue could be or what should I do next time it happens?
-
The logs seem to show the WAN just started dropping packets.
Did you login and resave that in order to bring the WAN back up? Does it not come back up if you do not?
-
@stephenw10 to get it back up I went into Interfaces > WAN and saved.
Prior to that I might have gone into System > Routing and also saved, I cannot recall.
Why do these show up? I don't have any VPNs configured? Or is that just something that happens
Sep 1 09:23:51 check_reload_status 630 Restarting OpenVPN tunnels/interfaces
Sep 1 09:23:51 check_reload_status 630 Restarting IPsec tunnels -
Yes those lines are shown when the script runs but don't actually do anything if you don't have tunnels configured.
-
@pp-ng said in WAN repeatedly going down every few days again:
@stephenw10 Also - to get my WAN back to 'online' I went into Interfaces > WAN and just clicked Save and then Activate. I know it runs several scripts or whatever in the background, so not sure which one got me back online, but that did it.
Sounds to me more like a configuration problem on your WAN or your ISP/upstream provider. That somehow smells like you have DHCP on WAN and your box looses its connection because the DHCP address expired or your provider doesn't "know" it anymore. Or you get a quasi static IP from your ISP and configured it as static IP but your ISP needs to hand it out via DHCP.
We had some of that use cases in support here and most of them had that exact problem. Saving WAN config brought them online and after a few hours or days the connection dropped 'cause packages wouldn't go out/in anymore. Switching the WAN e.g. from static to DHCP or configuring it the way your ISP needs/wants it could solve that. Or check the ISP modem or %device% depending on your internet. That a manual "save & apply" from WAN brings you back seems to indicate that a manual performed DHCP restart seems to work, so I'd have a look at the way you get your IP from your ISP.
Cheers :)