LAN Devices occasionally fail to load WAN data
-
It's a vague and frequently asked question, but I had to come here because I don't know what else to do.
Every 24 hours or so, a device will fail to load content from the WAN/internet (TVs, laptops, IoT devices, both wired and wireless).
- I'm able to reach my netgate 4100
- I performed a factory reset and I'm running version 22.05-RELEASE
- I originally had 2 interfaces bridged, but I switched it to a single interface/port without any luck.
- I thought it might have something to do with DHCP, but it doesn't seem to: there aren't any issues in the logs, and devices are either assigned IPs or using static IPs.
- I also thought it might have something to do with DNS, but nslookup resolves WAN addresses.
- I can even ping google.com and fast.com from an affected device (edit: ping works, but visiting the sites doesn't).
- I had pfBlockerNG enabled with a basic blocklist, but I also disabled that (issue still happens).
So basically, I have connectivity, apart from actually getting "data".
What's unusual is that the problem persists for the device after it re-joins the network (unplugging/re-plugging cables, disconnecting from/reconnecting to the AP). I can "turn it off and back on again" and still have the issue.
What can I do to further investigate?
-
Check your gateway status - if it reads offline, or the Gateway Logs read errors, it is probably a gateway monitor IP issue.
Go to System > Routing, click the Edit icon for your WAN gateway, and set the Gateway Monitoring IP to something public on the internet that will respond to a ping. People typically use either Google's or Cloudflare's DNS servers, as they are fast and reliably reply to pings.
-
I do see some activity that might be of interest in the gateway:
Aug 30 18:40:14 dpinger 779 send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 157.x.x.1 bind_addr 157.x.x.16 identifier "WAN_DHCP "
Aug 30 18:41:09 dpinger 779 exiting on signal 15
Aug 30 18:41:09 dpinger 76936 send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 157.x.x.1 bind_addr 157.x.x.16 identifier "WAN_DHCP "
Aug 30 18:41:09 dpinger 77237 send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 10.0.0.1 bind_addr 10.0.0.150 identifier "LAN4_DHCP "
Aug 30 18:42:29 dpinger 77237 exiting on signal 15
Aug 30 18:42:29 dpinger 76936 exiting on signal 15
Aug 30 18:42:29 dpinger 63946 send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 157.x.x.1 bind_addr 157.x.x.16 identifier "WAN_DHCP "
I also added 8.8.8.8 (Google DNS) as a monitor IP. This IP is different from my other two DNS servers (1.1.1.1 and 9.9.9.9).
-
@mike-3 Those are normal entries in the gateway log. If the gateway were the problem, something like this is what I'd expect to see:
Aug 23 17:47:41 dpinger 78332 WAN_DHCP 8.8.4.4: Clear latency 15748us stddev 17765us loss 10%
But with a much higher loss % than that, and a latency longer than 15 ms, too.
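If you want to scan a long gateway log for lines like that one, here is a rough Python sketch. It assumes the line layout matches the example above, and it takes the 500 ms and 20% thresholds from the latency_alarm and loss_alarm values in the dpinger startup lines posted earlier. The function names are made up for illustration.

```python
import re

# Layout assumed from the single example line in this thread.
# The state word ("Clear" in the example) is matched generically.
ALARM_RE = re.compile(
    r"dpinger \d+ (?P<gw>\S+) (?P<addr>\S+): (?P<state>\w+) "
    r"latency (?P<lat>\d+)us stddev (?P<sd>\d+)us loss (?P<loss>\d+)%"
)


def parse_alarm_line(line):
    """Return a dict of fields for a latency/loss report line, else None."""
    m = ALARM_RE.search(line)
    if m is None:
        return None
    return {
        "gateway": m["gw"],
        "address": m["addr"],
        "state": m["state"],
        "latency_ms": int(m["lat"]) / 1000,  # log reports microseconds
        "stddev_ms": int(m["sd"]) / 1000,
        "loss_pct": int(m["loss"]),
    }


def exceeds_thresholds(row, latency_alarm_ms=500, loss_alarm_pct=20):
    """True if the report is above the alarm values seen in the startup lines."""
    return row["latency_ms"] > latency_alarm_ms or row["loss_pct"] > loss_alarm_pct


example = "Aug 23 17:47:41 dpinger 78332 WAN_DHCP 8.8.4.4: Clear latency 15748us stddev 17765us loss 10%"
row = parse_alarm_line(example)
print(row, exceeds_thresholds(row))
```

Lines that are not latency/loss reports (the startup and "exiting on signal" lines) return None and can simply be skipped.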
-
When it fails what do you actually see? Some error or it just times out?
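To narrow down which step actually times out, something like this rough Python sketch could be run on the affected laptop. It checks DNS, then a TCP connection, then an HTTPS request, and reports the first step that fails. It is not a pfSense tool; the function names, port 443, and the 5 second timeout are all assumptions for illustration.

```python
import socket
import urllib.error
import urllib.request


def resolve(host):
    """Return an IPv4 address for host, or None if the DNS lookup fails."""
    try:
        return socket.gethostbyname(host)
    except OSError:
        return None


def tcp_connect(ip, port=443, timeout=5.0):
    """Return True if a TCP handshake to ip:port completes within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_fetch(host, timeout=5.0):
    """Return True if the server sends back any HTTP response at all."""
    try:
        with urllib.request.urlopen(f"https://{host}/", timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # an error status still means data came back
    except OSError:
        return False  # timeouts, resets, TLS failures, unreachable


def classify(dns_ok, tcp_ok, http_ok):
    """Name the first layer that failed."""
    if not dns_ok:
        return "DNS lookup failed"
    if not tcp_ok:
        return "DNS works, but the TCP connection failed"
    if not http_ok:
        return "TCP connects, but no HTTP response came back"
    return "all checks passed"


def diagnose(host):
    """Run the three checks in order against host and summarize."""
    ip = resolve(host)
    tcp_ok = ip is not None and tcp_connect(ip)
    http_ok = tcp_ok and http_fetch(host)
    return classify(ip is not None, tcp_ok, http_ok)
```

Calling `diagnose("fast.com")` while the problem is happening should show whether the failure is at the connection stage or only once data starts flowing.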
Try to check the state table in pfSense for the failing connection when it happens. Do you see the expected state on LAN and the NAT'd state on WAN? Is there two-way traffic on each state?
@mike-3 said in LAN Devices occasionally fail to load WAN data:
I can "turn it off and back on again" and still have the issue.
You mean pfSense or the client device there?
How do you recover access for the device when this happens? Does it just return eventually?
Steve
-
What's interesting is that it's been over 10 hours and the device (it's a single device at the moment) still can't connect. I tried disconnecting and reconnecting it via Wi-Fi, and DHCP gave the device an IP.
I'm not familiar with the state table and don't know anything about states, but going by the name, it sounds like it could be where the issue is.
-
@stephenw10 said in LAN Devices occasionally fail to load WAN data:
You mean pfSense or the client device there?
The client device, a laptop, and it's a timeout.
-
The state table can be seen in the pfSense GUI under Diag > States.
When you connect out from a client you should see states opened on the LAN and WAN and the WAN state must have NAT. Something like:
There you can see my client pinging 1.1.1.1 and it is responding. You can see 5 packets IN and OUT on both interfaces.
If you try to connect to, for example, www.pfsense.org, you will see a bunch of states opened, but they should all show two-way traffic, like:
If that's not happening then connections will fail.
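As a rough illustration of what "two-way traffic" means here, the Python sketch below flags states where packets were counted in only one direction. The StateRow record and the sample values are hypothetical, typed in by hand for the example. This does not read the state table itself.

```python
from dataclasses import dataclass


@dataclass
class StateRow:
    """One state as you might note it down by hand (hypothetical layout)."""
    interface: str
    description: str
    packets_a: int  # packets counted in one direction
    packets_b: int  # packets counted in the other direction


def one_way_states(rows):
    """Return states where exactly one direction has seen packets."""
    return [r for r in rows if (r.packets_a == 0) != (r.packets_b == 0)]


# Made-up values for illustration only.
rows = [
    StateRow("LAN", "client -> 1.1.1.1 (icmp)", 5, 5),
    StateRow("WAN", "NAT'd client -> 1.1.1.1 (icmp)", 5, 0),
]

for r in one_way_states(rows):
    print(f"{r.interface}: {r.description} has traffic in one direction only")
```

In this made-up sample, the WAN state would be the one to look at: packets leave but nothing comes back.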
Steve
-
OK, this definitely sounds like the place to look to find the cause (I've only researched it at this point).
-
So I looked at the state table, and I couldn't see anything odd. I ran pfctl -sr and found
block drop in log inet6 all label "Default deny rule IPv6" ridentifier 1000000105
and in my firewall log, there are repeating errors that look like:
. . .
Sep 1 01:23:45 LAN1 Default deny rule IPv6 (1000000105)
. . .
I found that looking at this
-
It's common to see blocked IPv6 traffic in networks that are not actively using it, especially on an interface other than LAN. If you didn't add pass rules for IPv6 on LAN1, you will almost certainly see blocked traffic from hosts there probing for v6 servers, usually using un-routable link-local addresses.
We would need to see the logs to know for sure.
Steve
-
I'm still having this issue. Sometimes, visiting sites or using applications (like Zoom and Teams) still starts to time out. Hitting refresh can get around it. Other devices continue to lose connectivity altogether (except that they can still nslookup and ping external sites).
I found this in my gateway logs:
Aug 1 20:36:56 dpinger 85160 WAN_DHCP 157.x.x.37: sendto error: 50
Aug 1 20:36:57 dpinger 85160 WAN_DHCP 157.x.x.37: sendto error: 65
Aug 1 20:36:57 dpinger 85160 WAN_DHCP 157.x.x.37: sendto error: 65
Aug 1 20:36:58 dpinger 85160 WAN_DHCP 157.x.x.37: sendto error: 65
Aug 1 20:36:58 dpinger 85160 WAN_DHCP 157.x.x.37: sendto error: 65
Aug 1 20:36:58 dpinger 85160 exiting on signal 15
Aug 1 22:41:32 dpinger 86740 send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 10.0.0.1 bind_addr 10.0.0.86 identifier "WAN_DHCP "
Aug 1 22:41:37 dpinger 86740 exiting on signal 15
Aug 1 22:41:37 dpinger 9473 send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 10.0.0.1 bind_addr 10.0.0.86 identifier "WAN_DHCP "
Aug 1 22:44:07 dpinger 9473 WAN_DHCP 10.0.0.1: sendto error: 50
Aug 1 22:44:08 dpinger 9473 WAN_DHCP 10.0.0.1: sendto error: 50
Aug 1 22:44:08 dpinger 9473 WAN_DHCP 10.0.0.1: sendto error: 50
Aug 1 22:44:09 dpinger 9473 WAN_DHCP 10.0.0.1: sendto error: 50
Aug 1 22:44:09 dpinger 9473 WAN_DHCP 10.0.0.1: sendto error: 65
Aug 1 22:44:10 dpinger 9473 WAN_DHCP 10.0.0.1: sendto error: 65
Aug 1 22:44:10 dpinger 9473 WAN_DHCP 10.0.0.1: sendto error: 65
Aug 1 22:44:11 dpinger 9473 WAN_DHCP 10.0.0.1: sendto error: 65
I'm not that good at this stuff.
Edit: I'm still investigating this elsewhere, but I thought I would post it here.
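For reference, pfSense is built on FreeBSD, where errno 50 is ENETDOWN ("Network is down") and errno 65 is EHOSTUNREACH ("No route to host"). Assuming the numbers after "sendto error:" are plain FreeBSD errno values, a small Python sketch like this can count them per gateway. The table is hard-coded because Python's own errno module uses the numbering of whatever OS it runs on, and the function names are made up for illustration.

```python
import re
from collections import Counter

# FreeBSD errno values for the two codes seen in this log.
FREEBSD_ERRNO = {
    50: ("ENETDOWN", "Network is down"),
    65: ("EHOSTUNREACH", "No route to host"),
}

# Layout taken from the log lines in this post.
SENDTO_RE = re.compile(
    r"dpinger \d+ (?P<gw>\S+) (?P<addr>\S+): sendto error: (?P<code>\d+)"
)


def summarize(log_text):
    """Count sendto errors by (gateway, monitor address, errno)."""
    counts = Counter()
    for m in SENDTO_RE.finditer(log_text):
        counts[(m["gw"], m["addr"], int(m["code"]))] += 1
    return counts


def describe(code):
    """Return a readable name for an errno, if it is in the table."""
    if code in FREEBSD_ERRNO:
        name, text = FREEBSD_ERRNO[code]
        return f"{name} ({text})"
    return f"errno {code} (not in this table)"


SAMPLE = """\
Aug 1 22:44:08 dpinger 9473 WAN_DHCP 10.0.0.1: sendto error: 50
Aug 1 22:44:09 dpinger 9473 WAN_DHCP 10.0.0.1: sendto error: 50
Aug 1 22:44:09 dpinger 9473 WAN_DHCP 10.0.0.1: sendto error: 65
"""

for (gw, addr, code), n in summarize(SAMPLE).items():
    print(f"{gw} {addr}: {n} x {describe(code)}")
```

If that reading is right, these errors come from the firewall's own network stack when it tries to send the monitor ping (interface down, or no route to the monitor address), and are not the remote end dropping replies.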
-
I was able to resolve this issue while researching some of the error codes above. I cannot specifically comment on the exact solution.