pfSense dropping PPPoE following update from old version
-
I recently discovered a pfSense mini-PC at a client's premises that had not been updated since 2.4.5-p1. I updated it via the GUI to 2.6.0, then 2.7.0. Upgrading to 2.7.2 required me to resize the EFI boot partition, but this appeared to go smoothly. However, since the upgrade the pfSense has been dropping its PPPoE connection periodically during business hours. The logs show a simple signal term, followed by several reconnection attempts, eventually succeeding after 15+ attempts. This of course causes considerable disruption for the client.
A simple VDSL bridged modem sits in front of the pfSense. The ISP uses standard North American settings; leaving PPPoE settings at default/auto-negotiate has always worked here. I have tried disabling gateway monitoring, toggling multilink support, disabling bogon blocking, and basic troubleshooting, i.e. staged power cycles etc. It will behave overnight while I am working on it, but disconnect again when the client starts using the connection in the morning. There are two LAN NICs in addition to the WAN.
The ppp.log of the entire disconnect/retry/reconnect sequence can be found here: https://pastebin.com/EdVeYnCW
Thanks for reading; any suggestions or guidance would be appreciated.
-
Got the same issue with Bell PPPoE in Canada since version 2.7.2. I still haven't tried bypassing the Bell Sagemcom router 2000, which is set as a bridge. Next is to try on VL35 without the bridge, but Bell doesn't like that as they can't give us support. Sorry for my English...
-
@mertch well now that's interesting, because it's Bell PPPoE I'm dealing with here as well. Bell of course says nothing is wrong, but then they'd probably still say that if their CO was on fire.
I noticed at some point during the reconnect sequence there appears an error about parameter negotiation failing, so I tried disabling the PPPoE settings that are not needed, leaving those that appear to successfully negotiate in the PPP logs. I also set the MTU and MRU (to 1492). Unsurprisingly, this has done nothing to fix the issue.
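For anyone wondering where 1492 comes from, here is a minimal sketch of the arithmetic, assuming a standard 1500-byte Ethernet MTU on the WAN and IPv4/TCP headers without options. The function names are just for illustration, not anything from pfSense:

```python
# PPPoE adds 8 bytes of overhead inside each Ethernet frame:
# a 6-byte PPPoE header plus a 2-byte PPP protocol ID (RFC 2516).
ETHERNET_MTU = 1500      # assumption: standard Ethernet payload size
PPPOE_HEADER = 6
PPP_PROTOCOL_ID = 2
IPV4_HEADER = 20         # assumption: no IP options
TCP_HEADER = 20          # assumption: no TCP options


def pppoe_mtu(ethernet_mtu=ETHERNET_MTU):
    """Largest IP packet that fits in one PPPoE-encapsulated frame."""
    return ethernet_mtu - PPPOE_HEADER - PPP_PROTOCOL_ID


def tcp_mss(mtu):
    """Largest TCP payload that fits in one IP packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    mtu = pppoe_mtu()
    print("PPPoE MTU/MRU:", mtu)
    print("TCP MSS over PPPoE:", tcp_mss(mtu))
```

So 1492 is simply the usual PPPoE value, which fits with setting it explicitly making no difference here.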
I have found I'm able to cause a disconnect by running iperf tests, maxing out the available bandwidth. However, it's not clear if this is replicating the same issue or simply causing a different one (flooding the connection can cause keepalive packets to be lost or arrive late).
The saga continues...
-
@fauxpaw
What NICs are used on pfSense for modem connection? Which vendor? This load-dependent disconnection can be caused by broken drivers, and it's a known problem, especially for Realtek.
-
Anyone get anywhere on this? I think 2.6 and 2.7.2 have this and it's killing me. I need WireGuard and have some clients on PPPoE.
That's too bad, as I have dozens of pfSense boxes, but if all those on PPPoE have this random failure issue with no response, I guess it's OPNsense?
-
@msmith9xr4 On my side, it was the hub in bridge mode that was causing the issue. I had to plug directly into the ONT and put the pfSense WAN interface on the Bell VLAN. No more issues since then. If someone has an issue with the Hub 4K from Bell, the ONT is built in. Those hubs are faulty. Bell is coming up with a solution, but if you don't want to wait, you need to masquerade as the hub with an XGSPON and community firmware. I found it on Discord.
-
@w0w said in pfSense dropping PPPoE following update from old version:
What NICs are used on pfSense for modem connection? Which vendor? This load-dependent disconnection can be caused by broken drivers, and it's a known problem, especially for Realtek.
While I appreciate the suggestion regarding NIC vendors and potential driver issues, it seems tangential to the problem at hand. The system operated reliably on version 2.4.5-p1, which indicates that the hardware, drivers, and configuration were adequate under that version. Introducing hardware or driver compatibility as the primary suspect without addressing the core changes introduced between versions 2.4.5-p1 and 2.7.2 is unconvincing.
The issue appears to stem from software-level changes, possibly in how the PPPoE stack or network interface interacts under load in the newer versions. A more constructive approach might involve examining changelogs for relevant updates to PPPoE handling or driver changes between these versions and correlating them with the behavior observed in the logs.
Focusing on hardware compatibility is unlikely to yield answers when the core issue lies in behaviour introduced by software updates.
-
This was too disruptive to the client to troubleshoot further. We sent someone out to do a clean install of 2.7, then load the config back in. No issues since.