Dual WAN monitoring failing
-
Hmm, that's the first thing that appears? igb2 link state changes to UP?
I expect to see it go down before that and it might show why at that point.
-
@stephenw10 said in Dual WAN monitoring failing:
I expect to see it go down before that and it might show why at that point.
Updated previous post with additional prior log entries. Link state does go down first
Also, I received several email notifications
1:45:41 MONITOR: WAN2_DHCP has packet loss, omitting from routing group gwgBalanced 1.1.1.1|192.169.XX.146|WAN2_DHCP|24.373ms|8.916ms|58%|down|highloss 1:47:01 MONITOR: WAN2_DHCP is available now, adding to routing group gwgBalanced 1.1.1.1|192.169.XX.146|WAN2_DHCP|19.491ms|3.007ms|0.0%|online|none
Received several of this notifications since resetting the gateway earlier today (wISP in horrible weather), though the gateway status is still reporting Online at the moment.
-
Mmm, Ok so it shows the link on igb2 actually goes down and that's the first thing that happens.
What is that NIC actually connecting to? Is that thing losing link?
-
@stephenw10 NIC is connected to a POE power injector then out to a small dish antenna (wISP is using ubiquite gear). Maybe a flaky ethernet cable? However...
The gateway was in Unknown state again this morning. I see what appears to be the same pattern in the logs, at about the same time. I haven't done a detailed comparision of the logs from each night yet.
Speculating that the wISP is doing some late night / early morning maintenace on theire side which causes a service interuption, but when they come back online my router isn't able to detect (ping).
-
Try putting switch between igb2 and the injector so it never loses link as a test.
Can you see the uptime on the wisp client device?
There is a dhclient race condition you might be hitting here if the upstream device reboots.
Steve
-
@stephenw10
Adding the switch allows the gateway to recover. Not sure yet why, without the switch, dpinger seems to lose its ability to ping the given Monitor IP and therefore doesn't report the gateway back online.There is a bit more complication to my set-up, and therefore, for testing, I need to simplify a few things and retest a bit longer.
-
So with the switch dpinger still shows monitoring failing but does it still show the link going down in the logs?
Try a pcap on the WAN, do you see the ping requests leaving still?
-
@stephenw10
I never did resolve this issue entirely, but a few things have changed. I think the link down issue was from a faulty POE injector for the wISP equipment. I still have that switch in place though.That said, WAN2 (wISP) monitoring is not working at all. WAN2 Gateway status is now constantly "Unknown", with RRT, RTTsd, Loss all reporting "pending". Doesn't seem to matter if the Monitor IP is blank or specified. Resetting the router or WAN2 gateway makes no difference.
I can Diagnostics > Ping using Source WAN2, for eample:
PING 8.8.4.4 (8.8.4.4): 56 data bytes 64 bytes from 8.8.4.4: icmp_seq=0 ttl=58 time=20.031 ms 64 bytes from 8.8.4.4: icmp_seq=1 ttl=58 time=19.862 ms 64 bytes from 8.8.4.4: icmp_seq=2 ttl=58 time=19.858 ms --- 8.8.4.4 ping statistics --- 3 packets transmitted, 3 packets received, 0.0% packet loss round-trip min/avg/max/stddev = 19.858/19.917/20.031/0.081 ms
and Traceroute, except I see that initial hop begins with WAN1 (Source is set for WAN2):
1 10.X.X.1 9.732 ms 9.313 ms 9.577 ms 2 100.41.18.20 11.356 ms 11.587 ms 11.522 ms 3 * * * 4 * * * 5 209.85.245.245 19.006 ms 21.789 ms 18.562 ms 6 142.250.232.97 19.071 ms 20.712 ms 19.280 ms 7 8.8.4.4 18.553 ms 17.874 ms 18.492 ms
and pfcap from WAN2
15:45:23.837132 ARP, Request who-has 192.169.51.1 tell 192.169.51.63, length 46 15:45:24.837138 ARP, Request who-has 192.169.51.1 tell 192.169.51.63, length 46 15:45:29.836226 ARP, Request who-has 192.169.51.1 tell 192.169.51.63, length 46 15:45:30.836189 ARP, Request who-has 192.169.51.1 tell 192.169.51.63, length 46 15:45:31.836247 ARP, Request who-has 192.169.51.1 tell 192.169.51.63, length 46 15:45:32.837259 ARP, Request who-has 192.169.51.1 tell 192.169.51.63, length 46 15:45:33.837309 ARP, Request who-has 192.169.51.1 tell 192.169.51.63, length 46 15:45:34.837255 ARP, Request who-has 192.169.51.1 tell 192.169.51.63, length 46 15:45:34.860494 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: UDP, length 300 15:45:36.864170 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: UDP, length 300 15:45:39.837327 ARP, Request who-has 192.169.51.1 tell 192.169.51.63, length 46 15:45:40.837308 ARP, Request who-has 192.169.51.1 tell 192.169.51.63, length 46 15:45:41.837284 ARP, Request who-has 192.169.51.1 tell 192.169.51.63, length 46 15:45:41.884901 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: UDP, length 300 15:45:42.838374 ARP, Request who-has 192.169.51.1 tell 192.169.51.63, length 46 15:45:43.838329 ARP, Request who-has 192.169.51.1 tell 192.169.51.63, length 46 15:45:44.838368 ARP, Request who-has 192.169.51.1 tell 192.169.51.63, length 46 15:45:48.945150 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: UDP, length 300 15:45:49.836463 ARP, Request who-has 192.169.51.1 tell 192.169.51.63, length 46 15:45:50.352587 IP 169.254.138.31.52198 > 255.255.255.255.10002: UDP, length 177 15:45:50.836438 ARP, Request who-has 192.169.51.1 tell 192.169.51.63, length 46 15:45:51.836421 ARP, Request who-has 192.169.51.1 tell 192.169.51.63, length 46 15:45:52.837437 ARP, Request who-has 192.169.51.1 tell 192.169.51.63, length 46 15:45:53.837447 ARP, Request who-has 192.169.51.1 tell 192.169.51.63, length 46 15:45:54.837443 ARP, Request who-has 192.169.51.1 tell 192.169.51.63, length 46 15:45:59.837509 ARP, Request who-has 192.169.51.1 tell 192.169.51.63, length 46 15:46:00.837537 ARP, Request who-has 192.169.51.1 tell 192.169.51.63, length 46 15:46:01.837524 ARP, Request who-has 192.169.51.1 tell 192.169.51.63, length 46 15:46:02.838573 ARP, Request who-has 192.169.51.1 tell 192.169.51.63, length 46 15:46:03.060206 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: UDP, length 300
-
@Ximulate said in Dual WAN monitoring failing:
15:46:00.837537 ARP, Request who-has 192.169.51.1 tell 192.169.51.63, length 46
15:46:01.837524 ARP, Request who-has 192.169.51.1 tell 192.169.51.63, length 46
15:46:02.838573 ARP, Request who-has 192.169.51.1 tell 192.169.51.63, length 46That's the WAN2 gateway IP I assume? If that's not responding to ARP that's a very broken situation.
-
-
@stephenw10
When this happens, the router GUI locks up & netowk devices start losing network connectivity (DHCP?). I removed the modem outright, so no back-up connection. Then, this happened this morning:
https://forum.netgate.com/topic/178428/php-fatal-error/44
I was able to log into the GUI & catch the PHP log. Typically I can't even log in to the GUI when this happens. Maybe there's more than one thing going on. -
Yes, that seems like more than one thing. The notices error has been seen by users in various configs so I doubt it's directly related. It could be some common cause though if you're seeing php stop responding for example. Hmm.