runaway delay average and std. dev. on WAN
-
Hmm, seems like it might have rebooted? "Unable to allocate local link info" like that generally means pfSense doesn't have an IP in that subnet. So either it lost its DHCP lease or the WAN went down.
Though I'd expect to see some monitoring ping failures if that was the case.
-
Aug 11 16:47:17 kernel arpresolve: can't allocate llinfo for 192.168.1.254 on mvneta2
Aug 11 16:47:20 php-fpm 836 /rc.newwanip: Removing static route for monitor [FIRST HOP] and adding a new route through [WAN GATEWAY]
Aug 11 16:47:21 php-fpm 836 /rc.newwanip: Gateway, NONE AVAILABLE
Aug 11 16:47:22 php-fpm 836 /rc.newwanip: Gateway, NONE AVAILABLE
Aug 11 16:47:22 php-fpm 836 /rc.newwanip: IP Address has changed, killing states on former IP Address 0.0.0.0.
also
Aug 11 16:49:41 php-fpm 836 /rc.newwanip: Netgate pfSense Plus package system has detected an IP change or dynamic WAN reconnection - 0.0.0.0 -> [WAN IP] - Restarting packages.
does this look like the WAN went down and was recovered?
EDIT: extra info, the RG renews the DHCP lease for the pfsense appliance every 24 hours
-
Does the DHCP log show anything for the dhclient at that time?
It should renew without any interruption but clearly it lost an IP entirely at one point.
-
well, my DHCP log is flooded with hundreds of dhcpd entries so the log only goes back to the last hour. Most are DHCPREQUESTs and DHCPACK for LAN devices and their MAC addresses. also dhcp lease renew and ipv6 advertise address entries.
there are no entries for dhclient
-
You can filter that for the dhclient process:
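As a rough offline equivalent, the same filtering can be done on an exported copy of the log. This is only a minimal sketch: it assumes syslog-style lines shaped like the ones quoted in this thread, and the function name `filter_process` and the second and third sample lines are made up for illustration.

```python
import re

def filter_process(lines, process):
    """Return only the log lines whose process field matches `process`.

    Assumes lines shaped like 'Aug 11 16:45:02 dhclient 34520 bound to ...'
    or 'Aug 11 16:45:02 host dhclient[34520]: ...' (assumed formats).
    """
    pattern = re.compile(
        r"(?:^|\s)" + re.escape(process) + r"(?:\[\d+\]:?|\s+\d+\s|:)"
    )
    return [line for line in lines if pattern.search(line)]

# First line is from this thread; the other two are invented examples.
sample = [
    "Aug 11 16:45:02 dhclient 34520 bound to 192.168.1.64 -- renewal in 15 seconds.",
    "Aug 11 16:45:03 dhcpd 1234 DHCPREQUEST for 192.168.10.20",
    "Aug 11 16:45:04 fw dhclient[34520]: DHCPACK from 192.168.1.254",
]

for line in filter_process(sample, "dhclient"):
    print(line)
```

That drops the dhcpd noise from the LAN side and leaves only the WAN client entries.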
-
@stephenw10 thanks! attached are the dhclient logs (forum flagged the pasted logs as spam...)
dhcplogs.txt
-
Hmm, well the only thing there is that at that point the logs show it pulled a private IP:
Aug 11 16:45:02 dhclient 34520 bound to 192.168.1.64 -- renewal in 15 seconds.
That is usually a sign that the modem lost its upstream connection and started handing out IPs itself. So if that did happen here, it implies the line issues were cleared by that upstream link reset/resync.
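If you want to scan a longer dhclient log for other moments like that, here is a minimal sketch that flags any "bound to" line with a private address. It assumes the line format shown in the entry above; `private_leases` is a hypothetical helper name. Note that Python's `is_private` also covers loopback and link-local ranges, not only the RFC 1918 blocks.

```python
import ipaddress
import re

# Matches the address in lines like 'dhclient 34520 bound to 192.168.1.64 -- ...'
BOUND_RE = re.compile(r"bound to (\d{1,3}(?:\.\d{1,3}){3})")

def private_leases(lines):
    """Return (line, address) pairs for 'bound to' entries with a private IP."""
    results = []
    for line in lines:
        match = BOUND_RE.search(line)
        if not match:
            continue
        try:
            addr = ipaddress.ip_address(match.group(1))
        except ValueError:
            continue  # not a valid IPv4 address, skip it
        if addr.is_private:
            results.append((line, str(addr)))
    return results

log = [
    "Aug 11 16:45:02 dhclient 34520 bound to 192.168.1.64 -- renewal in 15 seconds.",
]
for line, addr in private_leases(log):
    print(addr, "<-", line)
```

Any hit on a WAN that normally gets a public address marks a time when the device in front of pfSense was answering DHCP on its own.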
-
@stephenw10 can you please clarify, do you mean it's a sign that the RG lost its upstream connection?
and by line issues, do you mean att -> my house ONT, ONT -> RG, or RG -> pfsense?
if the RG is handing out IPs itself, does that create a problem? (i believe it's possible for me to disable DHCP server in the RG if that could be the source of the issues...) it hasn't handed out any IPs except passing the WAN IP to pfsense.
-
I mean something upstream of the AT&T router/gateway. Those usually only hand out private IPs themselves when they can't connect to the upstream server.
-
@stephenw10 thanks, sorry, do you consider upstream to be in the direction of the ONT or in the direction of my pfsense firewall?
-
Yes, sorry, in the direction of the ONT.
-
@stephenw10 below is my gateway monitoring and i'm definitely experiencing client-side performance issues as of today (random stuff like Amazon loading, Twitch loading, Youtube thumbnails delayed load, etc.)
pings to google and cloudflare as well as facebook, google, twitter all idle around 20+ms (higher than normal) but often spike to 60+ ms.
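To put numbers on "idle around 20 ms but spiking to 60+", a quick sketch of the average and standard deviation over a batch of ping times. The sample values are hypothetical, chosen only to resemble the description above, and this is not necessarily how the pfSense gateway monitor computes its own figures.

```python
import statistics

def summarize_rtt(samples_ms):
    """Return (mean, population std. dev.) of round-trip times in milliseconds."""
    return statistics.fmean(samples_ms), statistics.pstdev(samples_ms)

# Hypothetical samples, not measurements from this thread.
baseline = [20.0, 21.0, 22.0, 21.0, 20.0]
spiky = [20.0, 21.0, 60.0, 22.0, 65.0]

for name, samples in (("baseline", baseline), ("spiky", spiky)):
    mean, stddev = summarize_rtt(samples)
    print(f"{name}: mean={mean:.1f} ms stddev={stddev:.1f} ms")
```

A couple of spikes move the standard deviation far more than the mean, which is why the std. dev. graph is the one that runs away first.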
Going to pull the ethernet from 3100 -> RG, monitor, and update the thread.
-
pulling ethernet from:
- 3100 -> RG: no effect
- RG -> ONT: no effect
pulling power from ONT: no effect
restart RG via the web UI: appears to reset the issue
also definitely had gateway monitoring alarms leading up to this morning.
-
Hmm, well that definitely seems like an issue in the RG then.
-
@stephenw10 seems that way, however i replaced the RG earlier this year and that did not solve the issue.
-
Software issue in the RG firmware then maybe.
Just to confirm you said rebooting the 3100 made no difference? Only rebooting the RG fixes the issue?
-
@stephenw10 i spoke too soon.
restarting the RG did not reset the issue this time. but...
I previously set the system tunable net.isr.dispatch to 'deferred' and removed that 40min ago. it appears that this change may have improved the issue? i'll monitor and report back...
-
An update for anyone who may be experiencing this issue.
This issue is caused by ATT's RG firmware. The latency spikes and jitter are resolved on the BGW320-505 as of firmware 6.30.5.
This issue was somewhat widely discussed at /r/ATTFiber. Shame on ATT for taking 8+ months to release a firmware which fixed it. And I was only able to get the firmware update by working with a redditor who had a high-level engineering contact at ATT, who was able to MANUALLY push the firmware update to my device. Who knows when it would have rolled out to me...
Thank you to @stephenw10 for the help along the way.