Packages Restarting
-
igb1 is the backup WAN?
The issue in the linked thread occurs when an internal interface changes state and is not set to track a WAN for IPv6. So this is not that.
Packages should restart if a WAN changes IP. Anything that uses that IP would need to.
What is igb1 connected to? Why is it losing link to that?
Steve
-
@stephenw10 Yes, igb1 is the backup WAN. It's connected to a Netgear LB1121 (https://www.netgear.com/support/product/lb1121).
I have the Gateway Group set up as a standard Failover, per the docs, so igb1 is Tier 2. Last night, I enabled Gateway Monitoring & Gateway Action with the default Monitoring IP, and rebooted the router and the Netgear modem. Per the log, things calmed down, then about 2:30am it picked up again through about 4am. Since 4am I haven't seen any package restarts.
On a whim, I just deleted two cron jobs (one was a shutdown -r), but neither was scheduled for those times.
My primary Gateway (igb0) is a wISP. The bandwidth gets saturated at times, eventually leading to the failover. That was happening a lot the last several days (holiday weekend, so the wISP gets hammered by its customers).
Would frequent switching between gateways cause this? I've tried bumping up probe times (& related settings). Not really sure how else to deal with the GG switching during these episodes. I've also wondered if the Monitoring IP might be an issue; perhaps 1.1.1.1, 8.8.8.8, 9.9.9.9, etc are throttling the probes.
Other thing is, I didn't notice this until recently & wonder if it's a v2.7 issue.
-
I wouldn't expect to see igb1 actually lose link, whatever the gateway is doing. Does the Netgear modem also log a loss of link?
-
I don't see a way to read logs from the Netgear LTE modem.
The router logs show this activity continuing until about 4am again, then calm. The issue picked up again this morning, I suspect about the time I turned on the Android TV. This morning I decided to disconnect the Netgear LTE modem. So far, no issues.
I'm now wondering if the Primary WAN Latency Threshold & Packet Loss Threshold are too low, currently:
Latency: 350/500
Packet Loss: 15/35
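For anyone following along, here is a rough Python sketch of how I understand those threshold pairs to be applied. It is not pfSense's actual code: I am assuming the first number in each pair is a warning level, the second is the level at which the gateway is treated as down, and that loss is checked before latency. The `Thresholds` class, the `classify` function and the status strings are names I made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Threshold pairs as posted above (assumed meaning: warn / down)."""
    latency_warn_ms: float = 350.0
    latency_down_ms: float = 500.0
    loss_warn_pct: float = 15.0
    loss_down_pct: float = 35.0


def classify(rtt_ms: float, loss_pct: float,
             t: Thresholds = Thresholds()) -> tuple[str, str]:
    """Return an illustrative (status, reason) pair for one probe summary.

    Assumption: loss is evaluated before latency, and a value must
    exceed a threshold (not merely equal it) to trip it.
    """
    if loss_pct > t.loss_down_pct:
        return ("down", "highloss")
    if rtt_ms > t.latency_down_ms:
        return ("down", "highdelay")
    if loss_pct > t.loss_warn_pct:
        return ("online", "loss")
    if rtt_ms > t.latency_warn_ms:
        return ("online", "delay")
    return ("online", "none")
```

With these numbers, a saturated link only needs to push loss past 35% or latency past 500 ms to be treated as down, which is why raising the second number in each pair is the usual first thing to try on a congested WAN.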
So, AndroidTV (or some other device) starts streaming and my wISP bandwidth gets saturated. However, it doesn't make sense that this is happening overnight until 4am.
-
It also shouldn't cause the NIC to actually lose link. That should only happen if the NIC has its settings changed or the other end of the link (the modem) drops it.
-
@stephenw10 Thanks. Since disconnecting the Netgear LTE modem, the system logs have been quieter. So maybe it's the modem or some other hardware issue (that I will narrow down). To further complicate things, in reviewing last night's system log (below), it appears to me that maybe the wISP is losing connection or Quad9 is having issues. Reviewing logs from Sep 1, I see the same behaviour: a GW_Primary alarm at 2:36am with various log entries until about 04:00. I think this has been ongoing for some time, and I have tried different Monitor IPs during that time, so I don't think it's Quad9.
Sep 7 03:49:24 php-fpm 87590 /rc.dyndns.update: phpDynDNS (): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
Sep 7 03:49:23 php-fpm 1107 /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed IP addresses. Reloading endpoints that may use GW_Primary.
Sep 7 03:49:21 check_reload_status 412 Reloading filter
Sep 7 03:49:21 check_reload_status 412 Restarting OpenVPN tunnels/interfaces
Sep 7 03:49:21 check_reload_status 412 Restarting IPsec tunnels
Sep 7 03:49:21 check_reload_status 412 updating dyndns GW_Primary
Sep 7 03:49:21 rc.gateway_alarm 95468 >>> Gateway alarm: GW_Primary (Addr:9.9.9.9 Alarm:0 RTT:23.544ms RTTsd:6.640ms Loss:0%)
Sep 7 03:49:10 php-cgi 95611 notify_monitor.php: Message sent to dev.note@domain.org OK
Sep 7 03:48:59 php-fpm 18413 /rc.dyndns.update: phpDynDNS (): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
Sep 7 03:48:58 php-fpm 22 /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed IP addresses. Reloading endpoints that may use GW_Primary.
Sep 7 03:48:58 php-fpm 22 9.9.9.9|38.XX.XX.Y7|GW_Primary|29.876ms|7.593ms|0.0%|online|none
Sep 7 03:48:58 php-fpm 22 /rc.openvpn: MONITOR: GW_Primary is available now, adding to routing group GWG_Failover
Sep 7 03:48:56 check_reload_status 412 Reloading filter
Sep 7 03:48:56 check_reload_status 412 Restarting OpenVPN tunnels/interfaces
Sep 7 03:48:56 check_reload_status 412 Restarting IPsec tunnels
Sep 7 03:48:56 check_reload_status 412 updating dyndns GW_Primary
Sep 7 03:48:56 rc.gateway_alarm 63076 >>> Gateway alarm: GW_Primary (Addr:9.9.9.9 Alarm:1 RTT:29.821ms RTTsd:7.671ms Loss:0%)
Sep 7 03:47:44 php-fpm 383 /rc.dyndns.update: phpDynDNS (): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
Sep 7 03:47:43 php-fpm 1107 /rc.filter_configure_sync: An error occurred while trying to find the interface got 38.XX.XX.Y6 . The rule has not been added.
Sep 7 03:47:43 php-fpm 3255 /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed IP addresses. Reloading endpoints that may use GW_Primary.
Sep 7 03:47:41 check_reload_status 412 Reloading filter
Sep 7 03:47:41 check_reload_status 412 Restarting OpenVPN tunnels/interfaces
Sep 7 03:47:41 check_reload_status 412 Restarting IPsec tunnels
Sep 7 03:47:41 check_reload_status 412 updating dyndns GW_Primary
Sep 7 03:47:41 rc.gateway_alarm 30240 >>> Gateway alarm: GW_Primary (Addr:9.9.9.9 Alarm:1 RTT:4219.342ms RTTsd:3736.108ms Loss:75%)
Sep 7 03:45:01 php-cgi 82400 rc.filter_configure_sync: An error occurred while trying to find the interface got 38.XX.XX.Y6 . The rule has not been added.
Sep 7 03:30:01 php-cgi 57273 rc.filter_configure_sync: An error occurred while trying to find the interface got 38.XX.XX.Y6 . The rule has not been added.
Sep 7 03:15:02 php-cgi 83850 rc.filter_configure_sync: An error occurred while trying to find the interface got 38.XX.XX.Y6 . The rule has not been added.
Sep 7 03:00:02 php-cgi 53983 rc.filter_configure_sync: An error occurred while trying to find the interface got 38.XX.XX.Y6 . The rule has not been added.
Sep 7 02:45:01 php-cgi 21419 rc.filter_configure_sync: An error occurred while trying to find the interface got 38.XX.XX.Y6 . The rule has not been added.
Sep 7 02:37:03 php-cgi 63727 notify_monitor.php: Could not send the message to dev.note@domain.org -- Error: Failed to connect to ssl://smtp.dreamhost.com:465 [SMTP: Failed to connect socket: php_network_getaddresses: getaddrinfo for smtp.dreamhost.com failed: Address family for hostname not supported (code: -1, response: )]
Sep 7 02:36:28 php-fpm 22 /rc.dyndns.update: phpDynDNS (): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
Sep 7 02:36:27 php-fpm 87590 /rc.filter_configure_sync: An error occurred while trying to find the interface got 38.XX.XX.Y6 . The rule has not been added.
Sep 7 02:36:26 php-fpm 8672 /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed IP addresses. Reloading endpoints that may use GW_Primary.
Sep 7 02:36:26 php-fpm 8672 9.9.9.9|38.XX.XX.Y7|GW_Primary|18.317ms|1.492ms|45%|down|highloss
Sep 7 02:36:26 php-fpm 8672 /rc.openvpn: MONITOR: GW_Primary has packet loss, omitting from routing group GWG_Failover
Sep 7 02:36:25 check_reload_status 412 Reloading filter
Sep 7 02:36:25 check_reload_status 412 Restarting OpenVPN tunnels/interfaces
Sep 7 02:36:25 check_reload_status 412 Restarting IPsec tunnels
Sep 7 02:36:25 check_reload_status 412 updating dyndns GW_Primary
Sep 7 02:36:25 rc.gateway_alarm 29639 >>> Gateway alarm: GW_Primary (Addr:9.9.9.9 Alarm:1 RTT:18.495ms RTTsd:1.552ms Loss:40%)
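To compare nights without eyeballing the whole log, something like the Python sketch below could pull out just the rc.gateway_alarm entries and count raised alarms per hour. The regular expression is written only against the line format in the excerpt above, so treat it as an assumption that other versions log the same layout; `parse_alarms` and `alarms_per_hour` are names I made up, and `SAMPLE` is two lines copied from the excerpt.

```python
import re
from collections import Counter

# Matches entries shaped like the rc.gateway_alarm lines in the excerpt above.
ALARM_RE = re.compile(
    r"(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"rc\.gateway_alarm \d+ >>> Gateway alarm: "
    r"(?P<gw>\S+) \(Addr:(?P<addr>\S+) Alarm:(?P<alarm>\d) "
    r"RTT:(?P<rtt>[\d.]+)ms RTTsd:(?P<rttsd>[\d.]+)ms Loss:(?P<loss>[\d.]+)%\)"
)


def parse_alarms(text):
    """Return one dict per gateway alarm entry found anywhere in text."""
    events = []
    for m in ALARM_RE.finditer(text):
        events.append({
            "ts": m.group("ts"),
            "gateway": m.group("gw"),
            "monitor_ip": m.group("addr"),
            "alarm": int(m.group("alarm")),
            "rtt_ms": float(m.group("rtt")),
            "rttsd_ms": float(m.group("rttsd")),
            "loss_pct": float(m.group("loss")),
        })
    return events


def alarms_per_hour(events):
    """Count entries with Alarm:1, keyed by the two-digit hour."""
    return Counter(e["ts"].split()[-1][:2] for e in events if e["alarm"] == 1)


SAMPLE = (
    "Sep 7 03:49:21 rc.gateway_alarm 95468 >>> Gateway alarm: GW_Primary "
    "(Addr:9.9.9.9 Alarm:0 RTT:23.544ms RTTsd:6.640ms Loss:0%)\n"
    "Sep 7 02:36:25 rc.gateway_alarm 29639 >>> Gateway alarm: GW_Primary "
    "(Addr:9.9.9.9 Alarm:1 RTT:18.495ms RTTsd:1.552ms Loss:40%)\n"
)
```

Because it uses `finditer` over the whole text, it should work whether the entries are one per line or pasted as a single run. Feeding it the logs from Sep 1 and Sep 7 would show whether the alarms cluster in the same 02:00 to 04:00 window.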
-
Yeah, it sounds like maybe that modem has an issue. I would not expect it to lose link on the Ethernet like that. Does it really not log anything? I assume it has some sort of UI you have access to?
Gateway monitoring can be hard to tune for WISPs. You do need that and gateway action enabled, though, if you need to fail over.
-
@stephenw10 The Netgear modem does have a simple UI, but no obvious way to view logs. I don't see anything in its manual either. It may require Netgear support to get involved.
-
Does it show uptime? Any way to know if it rebooted?
-
No, I don't see uptime or any other way to tell if it's rebooted. I've tried different cables. I was going to try swapping out the wall wart and putting a switch between the router and the modem. Haven't done that yet. In the meantime, I'm corresponding with the ISP about the nightly outages.
-
@stephenw10 A factory reset of the modem seems to have cleared up the link issue. The ISP claims the nightly outage is scheduled service.
-