LCP no reply to echo requests
-
Hmm, that does seem like the upstream server really did stop responding then.
How long did you leave it? I assume it failed to recover by itself even after the extended time?
-
@stephenw10 The failure I had at midnight did recover after about 10 mins, so I assume the session timed out and allowed my side to reconnect.
I suspect it fits the pattern of other people complaining about intermittent failures; it's just that pfSense (and/or I) have more evidence of the symptoms?
The thing is, it's also been rock-solid for most of the time I've had it, then it gets these odd wobbles.
Next time I'll grab all the logs off the box and see what else is happening. Could something cause the firewall/pfBlocker to block ALL incoming traffic for a while?
Paul
-
It could, but not if the state was initiated outbound, as it is with the monitoring pings. And that would have no effect on anything at the LCP layer. pf doesn't see those at all.
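A minimal sketch of how LCP echo keep-alive detection works in general, assuming a fixed echo interval and a fixed number of consecutive unanswered Echo-Requests before the link is declared down. The `KeepAlive` class and the 10 s / 6 failures figures are hypothetical, not pfSense or mpd5 defaults. Because the echoes travel inside the PPP session, nothing pf does can make them go unanswered; only the far end not replying trips this check.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KeepAlive:
    """Hypothetical LCP echo keep-alive model (not mpd5's actual code)."""
    interval: int      # seconds between Echo-Requests (assumed value)
    max_failures: int  # consecutive unanswered requests before giving up

    def detect_down(self, replies: List[bool]) -> Optional[int]:
        """Given whether each successive Echo-Request was answered,
        return the second at which the link is declared down, or None."""
        misses = 0
        for i, answered in enumerate(replies):
            if answered:
                misses = 0  # any Echo-Reply resets the failure counter
                continue
            misses += 1
            if misses >= self.max_failures:
                # Request i went out at i * interval; its reply is given
                # up on when the next request would be due.
                return (i + 1) * self.interval
        return None


if __name__ == "__main__":
    ka = KeepAlive(interval=10, max_failures=6)
    # Three answered echoes, then the far end goes silent.
    print(ka.detect_down([True] * 3 + [False] * 6))  # 90
    # Scattered losses never reach six in a row, so the link stays up.
    print(ka.detect_down([False] * 5 + [True] + [False] * 5))  # None
```

With these assumed numbers the link is dropped 60 s (interval × failures) after the first missed reply, so brief glitches are ridden out while a far end that has genuinely stopped answering is caught.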
-
@stephenw10 So I've been digging through this thread...
It seems something in the interaction between pfSense and the ONT changed between mpd versions, i.e. the old one works, the new one not so well. As I read it, when the link fails pfSense doesn't trigger the ONT to close things down, so it has to wait before it can reconnect, which it does after about 10 mins (from my midnight failure).
If I can fix that, then at least recovery will be quicker and/or automatic?
It still doesn't explain why it's dying, but I'm still pointing the blame at the ISP end?
Not sure how much is done inside the ONT. I assume it's mostly equivalent to a fibre-to-Ethernet bridge, i.e. all the PPPoE stuff is on a server somewhere?
-
I wouldn't expect the ONT to be involved there. Or the immediate remote side. If the session incorrectly stays open it would be between pfSense and the remote PPPoE server which can be some way upstream from the fiber.
-
I figured the ONT was dumb to the PPPoE layer. From reading elsewhere, I assume that this, from the PPP logs:
Name: "acc-aln8.sx"
is likely to be the name of the PPPoE server in a data centre somewhere?
There are plenty of threads on various forums with similar problems, i.e. dropped connections on Openreach-based FTTP with various vendors' kit, and plenty more that go down the mpd5 rabbit hole. That suggests there's a bug somewhere in the FreeBSD stack (mpd5/netgraph/drivers) that causes at least the bad shutdown, and hence the wait for the far end to time out.
There are very few fixes, but one of the easiest for me to try is to stick a switch between the pfSense box and the ONT (I think that's related to the bug being in the Ethernet card driver, though my Intel cards aren't mentioned!). Anyway, I had some free ports on my switch, so I put them in their own VLAN and will see what happens. This also gives me a few other options to try: more extensive packet captures, alternative firewalls, etc.
-
Yes that server is probably at your ISP (or virtual ISP) and not at Openreach directly. And there's a good chance it varies between connections through some sort of load-balancing.
But, yes, try to solve the link failures first. That is certainly exaggerating whatever issues might be present in the PPP layer.
-
@stephenw10 Agreed. The pfSense piece is frustrating. I suspect other (non-BSD, non-mpd) devices mask the ISP behaviour because they reconnect straight away, and the ISP considers a few reconnects a day as just noise: users resetting, local power outages, etc.
pfSense/mpd not reconnecting nicely (it seems the same across other BSD platforms) is the killer, with no obvious fix. Very frustrating…
-
@thondwe Update: I put a switch in between the ONT and the pfSense box (Protectli) and, touch wood/fingers crossed, it's been running fine for 3 days so far.
Possible theories:
- A dodgy plug at the pfSense end? Seems unlikely for an intermittent fault with no packets being dropped.
- The ONT and pfSense have an issue with EEE/802.3az power management? Digging through other threads and some official pfSense documentation, it seems disabling EEE on cards using the Intel igb driver is recommended. Most drop-outs have occurred late evening (bedtime for the kids), so possibly EEE is doing its thing enough to drop the connection? Anyway, the switch doesn't have EEE, so that effectively disables EEE for both devices. I can't confirm that EEE is enabled by default in pfSense; there should be a command to check, but I don't know what it is.
  I can't find any docs for the ONT (BT Nokia) confirming it supports EEE, but I wouldn't be surprised. On the other hand, the SmartHub 6 doesn't seem to list 802.3az as a feature, so perhaps it's not that commonly enabled/available on home routers?
Waiting to see if things carry on working reliably!
-
It could also be that the switch-ONT link is still flapping but pfSense doesn't see that any longer so the problem is no longer causing the same sort of issues.
-
@stephenw10 I guess I need to prove it. Assuming I get a few more days without incident, there are some settings to disable EEE, which I'll try with the old wiring setup to see if that's stable.
The problem is that I've had many months of stability in the past, so unless an incident shows up pretty quickly I may not know.
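A rough way to put numbers on that, assuming drop-outs arrive independently at a steady rate (a Poisson process, which is a simplification). The rate of one drop-out a week is hypothetical; plug in your own count from the logs. The function names are illustrative only.

```python
import math


def p_no_incident(rate_per_day: float, days: float) -> float:
    """Probability of zero drop-outs in `days`, assuming the fault is still
    present and drop-outs arrive as a Poisson process at `rate_per_day`."""
    return math.exp(-rate_per_day * days)


def days_needed(rate_per_day: float, confidence: float = 0.95) -> float:
    """Incident-free days needed before an unchanged fault would have
    produced at least one drop-out with probability `confidence`."""
    return -math.log(1.0 - confidence) / rate_per_day


if __name__ == "__main__":
    weekly = 1 / 7  # assumed: roughly one drop-out a week before the change
    print(f"{p_no_incident(weekly, 3):.2f}")  # 0.65
    print(f"{days_needed(weekly):.1f}")       # 21.0
```

So with that assumed rate, 3 clean days would happen about 65% of the time even if nothing had changed, and about three weeks without a drop is needed for 95% confidence. With months between incidents, the same maths says months of testing.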
FYI, this site seems to describe the innards of the ONTs. Mine is in a BT Openreach-branded box, but otherwise all the lights and sockets match.
Paul
-