PPPoE WAN stability issues with BT
-
BT's PPPoE gateways are often private IPs that don't respond to ping. Mine is here, and yours looks like that too in the log.
What sort of latency/packet loss are you seeing against 1.1.1.1?
If you only have one gateway, you can disable the gateway monitoring action to prevent unnecessary churn.
Steve
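
For anyone wanting to put numbers on that question, here is a minimal Python sketch that runs `ping` against a target and pulls the loss and average latency out of the summary. It assumes a Unix-like `ping` that accepts `-c` and prints the usual "% packet loss" and "min/avg/max" summary lines; the function names are made up for illustration.

```python
import re
import subprocess

# Matches e.g. "15% packet loss" or "0.0% packet loss"
LOSS_RE = re.compile(r"([\d.]+)% packet loss")
# Matches the "= min/avg/max/dev ms" part of the round-trip summary line
RTT_RE = re.compile(r"=\s*([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)\s*ms")


def parse_ping_summary(output):
    """Return (loss_percent, avg_rtt_ms) from ping's summary text.

    avg_rtt_ms is None when there is no round-trip line,
    which is what happens when no replies were received.
    """
    loss = LOSS_RE.search(output)
    if loss is None:
        raise ValueError("no packet loss line found in ping output")
    rtt = RTT_RE.search(output)
    avg = float(rtt.group(2)) if rtt else None
    return float(loss.group(1)), avg


def measure(target="1.1.1.1", count=20):
    """Ping the target and return (loss_percent, avg_rtt_ms)."""
    # Assumption: the local ping binary supports "-c <count>".
    result = subprocess.run(
        ["ping", "-c", str(count), target],
        capture_output=True,
        text=True,
    )
    return parse_ping_summary(result.stdout)
```

Calling `measure()` a few times at quiet and busy periods gives a rough picture of what the gateway monitor would be seeing against the same target.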
-
It's odd, as I was able to use the default gateway monitor IP for a long time and it worked until a few months ago.
As for packet loss and latency, it varies wildly. For example, right now I'm seeing 10ms with 0% packet loss, but earlier today I had 20-30ms with 15% packet loss.
Reckon it's possible that gateway monitoring has something to do with the issues?
To give an idea of how wildly different things can get, here is an image from my Grafana speedtest DB from the last 2 days.
Spencer
-
Mmm, that doesn't look too bad. Some spikes like that when the line is saturated are to be expected.
But if you saw loss hit the 20% default threshold then the gateway would be marked down and the gateway action scripts fired. If you only have one gateway, that only causes unnecessary disruption since there's no failover gateway.
My PPPoE gateway also responded fine to ping for years but stopped some time ago now.
You might try using 8.8.8.8 instead just to see if there's any difference.
Steve
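
As a rough illustration of the threshold logic described above, here is a small Python sketch. Only the 20% high loss threshold comes from this thread. The other defaults (10% low loss, 200ms and 500ms latency) and the use of `>=` rather than `>` are assumptions, and the function is a simplified model, not how pfSense actually implements it.

```python
def gateway_status(loss_pct, latency_ms,
                   loss_low=10.0, loss_high=20.0,
                   latency_low=200.0, latency_high=500.0):
    """Classify a gateway from one loss/latency reading.

    loss_high=20.0 is the default mentioned in the thread;
    the remaining thresholds are assumed values for illustration.
    """
    # Reaching either high threshold marks the gateway down,
    # which is what would fire the gateway action scripts.
    if loss_pct >= loss_high or latency_ms >= latency_high:
        return "down"
    # Reaching either low threshold only raises a warning.
    if loss_pct >= loss_low or latency_ms >= latency_low:
        return "warning"
    return "online"


# Readings quoted earlier in the thread
print(gateway_status(0, 10))    # online
print(gateway_status(15, 30))   # warning
print(gateway_status(20, 30))   # down
```

Under these assumed numbers, the 15% loss reading would only be a warning; it takes loss reaching 20% to mark the gateway down.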
-
Just to add an update to this, the stability has become even worse now.
This is the last 7 days. I'm starting to see a pattern of downtime in the mornings, though this graph doesn't show it very well. At around 6 or 7 in the morning the WAN will drop out and fail to recover until I manually restart the modem or reset the WAN interface on the pfSense box. I find it a little odd that it doesn't recover automatically.
I have also noticed my sync speed dropping consistently day by day.
This is the VDSL status from the Vigor 130. It doesn't seem to have any errors but does seem to have a high SNR margin; as far as I was aware, that should be around 6dB? Those results are also directly from the test socket.
I also set the Vigor to use MPoA instead of PPPoA, but that didn't help either.
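
One way to check a time-of-day pattern like the morning drops described above is to bucket the drop times by hour. This is a minimal Python sketch; the timestamps are made up for illustration and would need to come from your own logs or monitoring data.

```python
from collections import Counter
from datetime import datetime


def drops_by_hour(timestamps):
    """Count WAN drop events per hour of day.

    timestamps: iterable of ISO 8601 strings, e.g. "2023-01-09T06:12:00".
    Returns a dict mapping hour (0-23) to number of drops, sorted by hour.
    """
    hours = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    return dict(sorted(hours.items()))


# Hypothetical drop times, not taken from the real logs
example = [
    "2023-01-09T06:12:00",
    "2023-01-10T06:48:00",
    "2023-01-11T07:05:00",
    "2023-01-11T19:30:00",
]
print(drops_by_hour(example))  # {6: 2, 7: 1, 19: 1}
```

If most drops cluster in the same hour or two, that points at something scheduled on the network rather than random line noise.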
-
I would also expect it to recover. Do you just see it continually trying to connect in the PPP logs? Any error shown or is it just not seeing any responses?
Steve
-
@stephenw10 It seems to be attempting up to 20 times, then looking like it's about to succeed and then failing. Excuse the massive logs, but here is what I had from this morning for over an hour:
https://pastebin.com/6cesuZyK
I would paste it inline but it's 3500 log entries and that's after I filtered it lmao.
-
Ok I didn't read all of that but...
In fact it is reconnecting repeatedly and then disconnecting in a loop. It's not a PPP issue.
Did you disable gateway monitoring action? (assuming you only have one gateway).
Have you set 'State killing on gateway failure'?
Those things in combination could produce what you're seeing and neither is useful in a one WAN setup.
Steve
-
@stephenw10
I have now disabled the gateway monitoring action and I have left state killing on gateway failure at its default, off. I have also now disabled the WAN_DHCP6 gateway as it was offline anyway.
So by the sounds of it, my connection being unstable and triggering the gateway reset due to packet loss is also triggering another error state which gets stuck in a loop.
Spencer
-
Yeah, that's what the logs look like. At least initially.
See what difference that makes. Though I have that exact same setup and it reconnects without issue.
Steve
-
@stephenw10
Do you reckon it's worth contacting BT for a DLM reset to see if that sorts the sync speed issues? Or just wait for the DLM profile to raise the sync speeds over the next week by itself?
-
Yeah, if it's been bouncing the actual line it will be slow for a while but should improve over time. But that shouldn't happen unless the modem was actually rebooting.
-
@stephenw10 Just an update on this: I didn't need to call BT for a DLM reset, it automatically jumped back up this morning. A lot of the line instability was being caused by a device on my network saturating the connection in the mornings, which led to the packet loss alarm and the WAN interface reset issue. The gateway monitoring action is disabled now and I have put an HFSC traffic shaper in place to stop anything saturating the WAN interface to the point of extreme packet loss.
All a little odd to me still, given I have never had to implement a traffic shaper or disable the gateway monitor before to have stability even under saturation, but hey, everything seems to work now.
Many thanks for the help, I'd buy you a beer if I could.
-
Good result.
As an alternative you can just tune the monitoring settings to better match your line. Some WANs have far higher latency under load.
You might also try an FQ_CODEL setup instead of HFSC.
Steve