WAN connection dropping intermittently
-
Hi Stephen,
As you suggested, I ran a packet capture on the WAN interface (not in promiscuous mode) on the ICMP protocol. It looks like this when the WAN goes down:
It seems the packets are sent, but with no response. I also noticed that for some reason it starts pinging a different IP after some time. Not just 8.8.8.8, which is the monitoring IP for dpinger, but also an IP that whois claims belongs to Apple?
I also looked a bit more at the logs for when the gateway is reported as down. The outages seem to be exactly 20 minutes apart (or multiples of 20 minutes), if that could signify something:
Thanks!
Alex
-
20 mins sounds like an ARP issue. Check the actual pcap file or change the view type and make sure the MAC address it's sending those to doesn't change.
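As a rough sketch of that check, assuming the capture has already been reduced to (timestamp, destination MAC) pairs for the outgoing pings: the function name, the sample times and the MAC addresses below are all made up for illustration, not taken from this capture.

```python
def mac_changes(frames):
    """Return (timestamp, old_mac, new_mac) for every change of destination MAC.

    frames: iterable of (timestamp, dst_mac) tuples in capture order.
    """
    changes = []
    previous = None
    for timestamp, mac in frames:
        mac = mac.lower()  # compare case-insensitively
        if previous is not None and mac != previous:
            changes.append((timestamp, previous, mac))
        previous = mac
    return changes


# Hypothetical sample data (documentation-range MAC addresses).
sample = [
    ("10:00:00", "00:00:5E:00:53:01"),
    ("10:00:01", "00:00:5e:00:53:01"),
    ("10:20:01", "00:00:5e:00:53:02"),
]

for timestamp, old, new in mac_changes(sample):
    print(f"{timestamp}: destination MAC changed from {old} to {new}")
```

An empty result means the destination MAC stayed the same for the whole capture.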
Those other pings could be from something on the LAN. In a WAN pcap they will have been translated to the WAN address.
The curious thing here is that as I understood it you said that during the outage LAN side clients could still ping 8.8.8.8. Anything upstream should see those identically to the pings from dpinger.
Is that correct?
One possibility is that you have one of those inconvenient ISPs that seem to forget your MAC address! We have seen a few users hit that and work around it by setting a lower ARP timeout. However, that problem breaks all traffic, not just the monitoring pings.
-
@stephenw10 said in WAN connection dropping intermittently:
20 mins sounds like an ARP issue. Check the actual pcap file or change the view type and make sure the MAC address it's sending those to doesn't change.
The destination MAC address remains unchanged before, during and after the connection drops.
@stephenw10 said in WAN connection dropping intermittently:
The curious thing here is that as I understood it you said that during the outage LAN side clients could still ping 8.8.8.8. Anything upstream should see those identically to the pings from dpinger.
Is that correct?
No, when dpinger can't get out, neither can the LAN-side clients. However, other devices placed on the WAN side work.
@stephenw10 said in WAN connection dropping intermittently:
One possibility is that you have one of those inconvenient ISPs that seem to forget your MAC address! We have seen a few users hit that and work around it by setting a lower ARP timeout. However, that problem breaks all traffic, not just the monitoring pings.
It's a relatively small ISP and they've been pretty responsive - I could try asking them if I only knew what to ask :) But wouldn't that behaviour from the ISP have the same impact on other devices connected in place of pfSense?
-
Effectively the ISP gateway device loses your WAN address from its ARP table and doesn't ARP for it. Instead it waits until pfSense renews its own ARP entry for the gateway.
Try setting:
sysctl net.link.ether.inet.max_age=300
That is 1200s by default, 20mins. If that seems to prevent it that confirms it's an ARP issue somewhere.
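To check how well the outage times line up with that timer, here is a minimal sketch. It assumes the gateway-down times have been copied out of the logs as ISO-format strings; the function name, the 30 s tolerance and the sample times are illustrative assumptions, not values from this thread.

```python
from datetime import datetime

ARP_MAX_AGE_S = 1200  # default net.link.ether.inet.max_age, per the post above


def gaps_vs_arp_age(timestamps, period_s=ARP_MAX_AGE_S, tolerance_s=30):
    """Compare gaps between consecutive outages with multiples of period_s.

    Returns a list of (gap_seconds, nearest_multiple, within_tolerance).
    """
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    results = []
    for earlier, later in zip(times, times[1:]):
        gap = (later - earlier).total_seconds()
        multiple = max(1, round(gap / period_s))  # nearest whole multiple, at least 1
        off_by = abs(gap - multiple * period_s)
        results.append((gap, multiple, off_by <= tolerance_s))
    return results


# Hypothetical outage times, not the real log entries.
sample = [
    "2024-01-01T10:00:05",
    "2024-01-01T10:20:03",
    "2024-01-01T11:00:10",
    "2024-01-01T11:07:00",
]

for gap, multiple, ok in gaps_vs_arp_age(sample):
    verdict = "matches" if ok else "does not match"
    print(f"{gap:.0f}s vs {multiple} x {ARP_MAX_AGE_S}s: {verdict}")
```

If nearly every gap matches, that points at the ARP timer. Gaps that land anywhere suggest a different cause.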
-
Thanks Stephen!
I've made that update - I will report back either if it continues dropping or in ~24 hours, by which time it would definitely have dropped without this change.
-
Is this your WAN IP :
?
I thought it was a RFC1918 IP.
With a switch on the WAN side and pfSense getting this 194.x.x.192 as its WAN IP, what IP was used by the PC that was also hooked up to that switch? How did this PC obtain a 'LAN' IP?
-
That is indeed the WAN IP.
The gateway is on the same subnet (just ending in 3 instead of 192). For the laptop on the WAN side I just grabbed another IP in the same subnet (it's a static IP setup so no DHCP), hoping they hadn't locked it down (which it turns out they hadn't).
Like I wrote a couple of responses above, it's a small ISP :-)
Cheers!
Alex
-
Ok, great, but the IP you assigned yourself could already be assigned to someone else.
(and then ARP gets confused, and the other person could experience WAN IP outages ... ^^)
-
True, so I stopped doing that as soon as I had results from the test :-)
That said, there are only a few (<5) other users on this subnet (which seems accurate when I stare at ARP broadcasts), since almost all apartments have their home networks managed directly by the ISP (sitting behind their firewall and gateway), whereas I'm bypassing that.
-
It's been 24 hours and the network has been stable throughout. Incredibly happy and super grateful for your help, Stephen and Gertjan.
Thank you!
Alex
-
Nice! That does imply some ARP issue. You shouldn't really have to do that, but if you do keep it in place you should add it as a System Tunable so it persists:
https://docs.netgate.com/pfsense/en/latest/config/advanced-tunables.html