new if_pppoe Backend - getting HA/CARP to work like in MPD
-
@w0w I created a new version, 0.1.2, of the package, which now checks whether a PPPoE interface is up (IP address present). If it is, and the desired state of that interface is also up, the script will not reload the interface, so it should not break any legitimate connection.
Regarding your issue with the GUI: I can't confirm the bug. I can add as many interfaces as I want and the GUI stays consistent. I am using the pfSense rowhelper in the GUI, so that is more or less standard functionality. Can you give me some more details on when it fails? Also, which browser are you using, and do you have any plugins that mangle the HTML of a page (e.g. AdBlock)?
-
@perrin said in new if_pppoe Backend - getting HA/CARP to work like in MPD:
regarding your issue with the GUI: I can't confirm the bug.
I can't replicate it on the new version. Will test its functionality soon, thanks!
-
Tested, no luck. WAN stays Pending and only acquires IPv4; IPv6 never comes up. It seems you can't just use pppoe0 up. To bring it up correctly, you need to run something like:
/usr/local/sbin/pfSctl -c 'interface reload wan'
Here's my script's logic; see "Bringing PPPoE up" and "Verifying PPPoE" for how it brings the link up and checks it afterwards.
-
Purpose & scope
- A CARP-aware PPPoE watchdog for pfSense: it tracks the node’s CARP role (MASTER/BACKUP) and reacts by starting/validating or stopping PPPoE, and (on BACKUP) restoring missing VIPs (VHID 5).
- All syslog lines use a uniform prefix, PPPoE_script_message00X:, with ascending IDs.
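A minimal sketch of a helper that produces this prefix, assuming three-digit zero-padded IDs (001 to 011, as listed below) and delivery through the standard logger(1) utility. The function names and the pppoe_script syslog tag are hypothetical, not taken from the script.

```sh
#!/bin/sh
# Hypothetical helpers; names and the syslog tag are illustrative only.

# Print "PPPoE_script_message<NNN>: <text>" with a three-digit ID.
# Pass the ID as a plain decimal (7, not 007): printf may read a
# leading zero as an octal number.
format_msg() {
    printf 'PPPoE_script_message%03d: %s\n' "$1" "$2"
}

# Send one formatted line to syslog.
log_msg() {
    logger -t pppoe_script "$(format_msg "$1" "$2")"
}
```

For example, log_msg 7 "no pppoeN, reloading wan" would log a line starting with PPPoE_script_message007: (the message text is illustrative).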
-
Tunables / constants
- LOCKFILE=/var/run/run.sh.lock stores the PID (line 1) and the last known CARP role (line 2).
- PPPOE_IF=pppoe0, LAN_IF=lagg2.
- Role detection greps ifconfig ${LAN_IF} for MASTER vhid 5; the VIP presence check greps for vhid 5.
- PHP_BIN=/usr/local/bin/php.
- The internal flag PPPOE_ALREADY_STARTED tracks whether the script has (re)started PPPoE in this run.
-
Singleton guard
- On start, check_already_running() reads the lockfile; if the recorded PID is alive (ps -p), the script exits to avoid multiple instances.
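A minimal sketch of the lockfile handling, assuming the two-line layout described above. It swaps ps -p for the shell builtin kill -0, which likewise only tests whether the PID exists (for a script running as root the two should behave the same). write_lock and read_lock_role are hypothetical helper names.

```sh
#!/bin/sh
# Lockfile layout from the summary: line 1 = PID, line 2 = last CARP role.
LOCKFILE=${LOCKFILE:-/var/run/run.sh.lock}

# Record this process's PID and the given role (hypothetical helper).
write_lock() {
    printf '%s\n%s\n' "$$" "$1" > "$LOCKFILE"
}

# Print the role stored on line 2; prints nothing if there is no lockfile.
read_lock_role() {
    if [ -f "$LOCKFILE" ]; then
        sed -n '2p' "$LOCKFILE"
    fi
}

# Status 0 if line 1 names a live process other than this one.
# kill -0 only tests that the PID exists; no signal is delivered.
check_already_running() {
    [ -f "$LOCKFILE" ] || return 1
    pid=$(sed -n '1p' "$LOCKFILE")
    [ -n "$pid" ] && [ "$pid" != "$$" ] || return 1
    kill -0 "$pid" 2>/dev/null
}
```

In the start path this would be used as check_already_running && exit, followed by write_lock once the role is known. PIDs can be reused, so a stale lockfile may occasionally point at an unrelated process.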
-
Optional discovery
- find_pppoe_info() grabs the first pppoeN interface and its IPv4 address (kept for parity with older versions; not strictly required elsewhere).
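A minimal sketch of that discovery step, assuming FreeBSD's ifconfig -l (which prints all interface names on one line) and the usual inet line of ifconfig output. The parsing is split into two small functions so it can be tried on sample text; PPPOE_FOUND_IF and PPPOE_FOUND_IP are hypothetical variable names.

```sh
#!/bin/sh
# Print the first pppoeN name from a space-separated interface list.
first_pppoe_if() {
    for ifname in $1; do
        case $ifname in
            pppoe[0-9]*) echo "$ifname"; return 0 ;;
        esac
    done
    return 1
}

# Print the first IPv4 address from `ifconfig <if>` output on stdin
# (inet6 lines are skipped because their first field is "inet6").
first_inet4() {
    awk '$1 == "inet" { print $2; exit }'
}

# Fill the hypothetical PPPOE_FOUND_IF / PPPOE_FOUND_IP variables.
find_pppoe_info_sketch() {
    PPPOE_FOUND_IF=$(first_pppoe_if "$(ifconfig -l)") || return 1
    PPPOE_FOUND_IP=$(ifconfig "$PPPOE_FOUND_IF" | first_inet4)
}
```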
-
Main loop (role monitor)
- start_monitoring():
  - Logs the launch (001) and initializes CUR_ROLE from the lockfile, if present.
  - Every 30 seconds:
    - Derives NEW_ROLE from ifconfig ${LAN_IF} (MASTER vhid 5 → MASTER; otherwise BACKUP).
    - If the role is unchanged → continue silently (no log spam).
    - If the role changed:
      - On MASTER: call handle_master_carp(); log 002.
      - On BACKUP: call handle_non_master_carp(); log 003.
    - Update the lockfile with the current PID and the new role.
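A minimal sketch of the role check and the 30-second loop, assuming the carp line format that FreeBSD's ifconfig prints (carp: MASTER vhid 5 advbase ...). One detail worth noting: a plain grep for "vhid 5" also matches vhid 50 or 51, so the pattern below requires a space or end of line after the number. handle_master_carp and handle_non_master_carp are the script's own functions, assumed to be defined elsewhere; monitor_loop is a hypothetical name and the logging calls are left out.

```sh
#!/bin/sh
LAN_IF=${LAN_IF:-lagg2}
LOCKFILE=${LOCKFILE:-/var/run/run.sh.lock}
POLL_INTERVAL=${POLL_INTERVAL:-30}

# Read `ifconfig ${LAN_IF}` output on stdin and print MASTER or BACKUP.
# "( |$)" stops "vhid 5" from also matching vhid 50, 51, ...
derive_role() {
    if grep -Eq 'MASTER vhid 5( |$)'; then
        echo MASTER
    else
        echo BACKUP
    fi
}

# Poll loop: call a handler only when the role actually changes.
monitor_loop() {
    CUR_ROLE=$( [ -f "$LOCKFILE" ] && sed -n '2p' "$LOCKFILE" )
    while :; do
        NEW_ROLE=$(ifconfig "$LAN_IF" | derive_role)
        if [ "$NEW_ROLE" != "$CUR_ROLE" ]; then
            if [ "$NEW_ROLE" = MASTER ]; then
                handle_master_carp
            else
                handle_non_master_carp
            fi
            CUR_ROLE=$NEW_ROLE
        fi
        # Lockfile: PID on line 1, current role on line 2.
        printf '%s\n%s\n' "$$" "$CUR_ROLE" > "$LOCKFILE"
        sleep "$POLL_INTERVAL"
    done
}
```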
-
MASTER path (handle_master_carp)
- If PPPoE hasn't been (re)started in this run, call handle_pppoe_start().
- Otherwise, log/verify the link (004) via check_pppoe().
-
BACKUP path (handle_non_master_carp)
- Shut PPPoE down if any pppoeN exists:
  - Wait 10s, run ifconfig ${PPPOE_IF} down, set PPPOE_ALREADY_STARTED=false, log 005.
- Ensure VIPs (VHID 5) exist on LAN_IF:
  - If vhid 5 is missing, log 006 and re-install the VIPs by running:
    php -r 'require_once "/etc/inc/interfaces.inc"; interfaces_vips_configure();'
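A minimal sketch of the VIP half of the BACKUP path, assuming the same ifconfig output format as the role check. vip_missing, repair_vips and ensure_vips are hypothetical names; the PHP call is the one quoted above, and the log text is illustrative. As with the role check, the pattern requires a space or end of line after "vhid 5" so that vhid 50 does not count.

```sh
#!/bin/sh
LAN_IF=${LAN_IF:-lagg2}
PHP_BIN=${PHP_BIN:-/usr/local/bin/php}

# Status 0 when the `ifconfig ${LAN_IF}` output on stdin has no vhid 5.
# "( |$)" keeps vhid 50, 51, ... from counting as vhid 5.
vip_missing() {
    ! grep -Eq 'vhid 5( |$)'
}

# Re-create the VIPs with the PHP call quoted in the summary.
repair_vips() {
    "$PHP_BIN" -r 'require_once "/etc/inc/interfaces.inc"; interfaces_vips_configure();'
}

# BACKUP-side check: log (ID 006) and repair only when VHID 5 is gone.
ensure_vips() {
    if ifconfig "$LAN_IF" | vip_missing; then
        logger -t pppoe_script "PPPoE_script_message006: vhid 5 missing on $LAN_IF"
        repair_vips
    fi
}
```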
-
Bringing PPPoE up (handle_pppoe_start)
- Wait 130s to let CARP converge.
- If no pppoeN exists: log 007, run pfSctl -c 'interface reload wan', set PPPOE_ALREADY_STARTED=true.
- If pppoe0 exists and is UP: log 008, do nothing.
- If it exists but is not UP: run ifconfig ${PPPOE_IF} up, log 009.
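A sketch of the three-way decision, assuming "is UP" means the UP flag in the first line of ifconfig output; how the script actually tests this is not shown in the post (the package mentioned at the top of the thread looks for an IP address instead). Function names are hypothetical. Given the observation at the top of this post that a plain up can leave IPv6 unconfigured, routing the ifup case through the pfSctl reload as well might be the safer choice.

```sh
#!/bin/sh
PPPOE_IF=${PPPOE_IF:-pppoe0}

# Status 0 if the <...> flag list of an ifconfig header line contains UP.
has_up_flag() {
    flags=${1#*<}          # drop everything up to the first "<"
    flags=${flags%%>*}     # drop everything from the first ">"
    case ",$flags," in
        *,UP,*) return 0 ;;
        *) return 1 ;;
    esac
}

# $1: interface list (e.g. `ifconfig -l`), $2: first line of `ifconfig pppoe0`.
# Prints reload (case 007), none (case 008) or ifup (case 009).
pppoe_start_action() {
    case " $1 " in
        *" pppoe"[0-9]*) ;;
        *) echo reload; return 0 ;;
    esac
    if has_up_flag "$2"; then echo none; else echo ifup; fi
}

# Carry out the decision.
apply_start_action() {
    case $1 in
        reload) /usr/local/sbin/pfSctl -c 'interface reload wan' ;;
        ifup)   ifconfig "$PPPOE_IF" up ;;
        none)   : ;;
    esac
}
```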
-
Verifying PPPoE (check_pppoe)
- Wait 180s (grace period).
- If no pppoeN is present:
  - Log 010, try pfSctl -c 'interface reload wan'.
  - On success: set PPPOE_ALREADY_STARTED=true; on failure, log 011 and return an error.
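A minimal sketch of check_pppoe's control flow, with list_ifs, reload_wan and log_id as hypothetical hooks so the logic can be exercised without a firewall. One caveat: pfSctl hands the command to check_reload_status, so its exit status may only tell you the request was delivered, not that PPPoE came up; re-checking for a pppoeN afterwards could be the more reliable success test.

```sh
#!/bin/sh
VERIFY_GRACE=${VERIFY_GRACE:-180}   # grace period in seconds, per the summary
PPPOE_ALREADY_STARTED=${PPPOE_ALREADY_STARTED:-false}

# Hooks (hypothetical names); override them to test the logic.
list_ifs()   { ifconfig -l; }
reload_wan() { /usr/local/sbin/pfSctl -c 'interface reload wan'; }
log_id()     { logger -t pppoe_script "$(printf 'PPPoE_script_message%03d' "$1")"; }

check_pppoe() {
    sleep "$VERIFY_GRACE"
    case " $(list_ifs) " in
        *" pppoe"[0-9]*) return 0 ;;   # a pppoeN exists: nothing to do
    esac
    log_id 10
    if reload_wan; then
        PPPOE_ALREADY_STARTED=true
    else
        log_id 11
        return 1
    fi
}
```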
-
Logging policy
- Routine role polls are quiet; logs are emitted only when the CARP role flips, plus the specific action logs (004–011) triggered by that transition.
-
Entry points
- start: runs the singleton check, the optional discovery, then the monitoring loop.
- stop: placeholder (prints a message; no teardown).
- Anything else: prints usage and exits non-zero.
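A sketch of the start/stop dispatch, assuming check_already_running, find_pppoe_info and start_monitoring are defined elsewhere in the script. The exit status for the already-running case and the message texts are guesses; the summary only says the script exits.

```sh
#!/bin/sh
main() {
    case ${1:-} in
        start)
            if check_already_running; then
                echo "already running" >&2
                return 1                  # assumed status
            fi
            find_pppoe_info || :          # optional; failure is not fatal
            start_monitoring
            ;;
        stop)
            echo "stop: nothing to tear down"   # placeholder, no teardown
            ;;
        *)
            echo "usage: $0 {start|stop}" >&2
            return 1
            ;;
    esac
}
# In the real script the last line would be: main "$@"
```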
-
Operational timings (summary)
- Poll interval: 30s.
- MASTER bring-up grace: 130s (CARP settle).
- PPPoE verification grace: 180s.
- BACKUP PPPoE down delay: 10s.
-
Key side effects
- Keeps a persistent record of PID + last role in the lockfile.
- Ensures PPPoE is up and stable on MASTER, down on BACKUP.
- Auto-repairs missing VIPs (VHID 5) on BACKUP via pfSense PHP API.
-
@w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:
/usr/local/sbin/pfSctl -c 'interface reload wan'
This is exactly what the script is doing; see GitHub.
Please note that the IPv6 issue has nothing to do with my script; it seems more related to the general if_pppoe troubles.
-
@perrin said in new if_pppoe Backend - getting HA/CARP to work like in MPD:
this is exactly what the script is doing. See GitHub
But there is something else, I don't know what… a safety timer, maybe. My script almost never fails to get the link up and running, at least now. I will test it more and hopefully give you more feedback this week.
-
@w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:
But there is something else, I don't know… safety timer, maybe.
That might be because your script runs every 30 seconds, so there is a random delay between the CARP state change and the time your script runs.
In my case, the script runs immediately on the CARP change. To test the behavior, you could add a sleep to /usr/local/sbin/pppoe_ha_event (the shell script wrapper, not the PHP file) before the exec line, e.g.:
#!/bin/sh
sleep 5
exec /usr/local/bin/php -q /usr/local/sbin/pppoe_ha_event.php "$@"
Let me know if that changes the behavior on your firewall.
-
@perrin said in new if_pppoe Backend - getting HA/CARP to work like in MPD:
In my case the script runs immediately with the CARP change.
And that's a real problem. CARP can flip/flap on the interface several times per second, so you'll end up running multiple commands and hit a race condition. You need to detect CARP state changes, but without this overreaction: check the state a bit later, and run your commands only once you detect that the final state has really changed. So I don't think a sleep in the event wrapper can solve this problem; each delayed event just gets into the queue.
2025-09-21 14:18:27.478723+03:00 check_reload_status 656 Configuring interface wan
2025-09-21 14:18:27.473171+03:00 pppoe-ha 37352 VHID 5 MASTER - UP wan (pppoe0)
2025-09-21 14:18:27.472332+03:00 pppoe-ha 37352 Handle CARP command for 5@lagg2 - MASTER
2025-09-21 14:18:27.451908+03:00 php-fpm 3406 /rc.carpbackup: HA cluster member "(10.0.90.5@lagg2): (LAN)" has resumed CARP state "BACKUP" for vhid 10
2025-09-21 14:18:27.132013+03:00 check_reload_status 656 Carp master event
2025-09-21 14:18:27.110669+03:00 pppoe-ha 28694 no mappings for VHID 10; ignoring
2025-09-21 14:18:27.110296+03:00 pppoe-ha 28694 Handle CARP command for 10@lagg2 - BACKUP
2025-09-21 14:18:27.073719+03:00 php-fpm 62482 /rc.carpbackup: HA cluster member "(10.0.87.5@ixl3.87): (WIFIAP)" has resumed CARP state "BACKUP" for vhid 9
2025-09-21 14:18:27.073279+03:00 php-fpm 62482 /rc.carpbackup: Suppressing repeat e-mail notification message.
2025-09-21 14:18:26.804464+03:00 check_reload_status 656 Carp backup event
2025-09-21 14:18:26.779787+03:00 pppoe-ha 17333 no mappings for VHID 10; ignoring
2025-09-21 14:18:26.779198+03:00 pppoe-ha 17333 Handle CARP command for 10@lagg2 - INIT
2025-09-21 14:18:26.769846+03:00 php-fpm 3406 /rc.carpbackup: HA cluster member "(10.0.87.5@ixl3.87): (WIFIAP)" has resumed CARP state "BACKUP" for vhid 9
2025-09-21 14:18:26.470904+03:00 php-fpm 51124 /rc.carpbackup: HA cluster member "(10.0.100.155@lagg0): (WAN2)" has resumed CARP state "BACKUP" for vhid 7
2025-09-21 14:18:26.360247+03:00 check_reload_status 656 Carp backup event
2025-09-21 14:18:26.332786+03:00 pppoe-ha 5832 no mappings for VHID 9; ignoring
2025-09-21 14:18:26.332425+03:00 pppoe-ha 5832 Handle CARP command for 9@ixl3.87 - BACKUP
2025-09-21 14:18:26.150680+03:00 php-fpm 3406 /rc.carpbackup: HA cluster member "(10.0.100.155@lagg0): (WAN2)" has resumed CARP state "BACKUP" for vhid 7
2025-09-21 14:18:25.999747+03:00 kernel - carp: 10@lagg2: BACKUP -> MASTER (preempting a slower master)
2025-09-21 14:18:25.999658+03:00 kernel - carp: 7@lagg0: BACKUP -> MASTER (preempting a slower master)
2025-09-21 14:18:25.999518+03:00 kernel - carp: 5@lagg2: BACKUP -> MASTER (preempting a slower master)
2025-09-21 14:18:25.968532+03:00 check_reload_status 656 Carp backup event
2025-09-21 14:18:25.951088+03:00 pppoe-ha 99836 no mappings for VHID 9; ignoring
2025-09-21 14:18:25.950701+03:00 pppoe-ha 99836 Handle CARP command for 9@ixl3.87 - INIT
2025-09-21 14:18:25.857771+03:00 php-fpm 3406 /rc.carpbackup: HA cluster member "(10.0.77.5@lagg2): (LAN)" has resumed CARP state "BACKUP" for vhid 5
2025-09-21 14:18:25.857471+03:00 php-fpm 3406 /rc.carpbackup: Suppressing repeat e-mail notification message.
2025-09-21 14:18:25.701909+03:00 php-fpm 7887 /rc.filter_synchronize: Beginning XMLRPC sync data to https://10.0.88.2:443/xmlrpc.php.
2025-09-21 14:18:25.701214+03:00 php-fpm 7887 /rc.filter_synchronize: XMLRPC versioncheck: 24.1 -- 24.1
2025-09-21 14:18:25.701112+03:00 php-fpm 7887 /rc.filter_synchronize: XMLRPC reload data success with https://10.0.88.2:443/xmlrpc.php (pfsense.host_firmware_version).
2025-09-21 14:18:25.667673+03:00 check_reload_status 656 Carp backup event
2025-09-21 14:18:25.650236+03:00 pppoe-ha 93377 no mappings for VHID 7; ignoring
2025-09-21 14:18:25.649873+03:00 pppoe-ha 93377 Handle CARP command for 7@lagg0 - BACKUP
2025-09-21 14:18:25.545307+03:00 php-fpm 16208 /rc.carpbackup: HA cluster member "(10.0.77.5@lagg2): (LAN)" has resumed CARP state "BACKUP" for vhid 5
2025-09-21 14:18:25.541200+03:00 php-fpm 7887 /rc.filter_synchronize: Beginning XMLRPC sync data to https://10.0.88.2:443/xmlrpc.php.
2025-09-21 14:18:25.368345+03:00 check_reload_status 656 Carp backup event
2025-09-21 14:18:25.351847+03:00 pppoe-ha 92360 no mappings for VHID 7; ignoring
2025-09-21 14:18:25.351495+03:00 pppoe-ha 92360 Handle CARP command for 7@lagg0 - INIT
2025-09-21 14:18:25.076109+03:00 check_reload_status 656 Carp backup event
2025-09-21 14:18:25.046639+03:00 pppoe-ha 89037 VHID 5 BACKUP - DOWN wan (pppoe0)
2025-09-21 14:18:25.045844+03:00 pppoe-ha 89037 Handle CARP command for 5@lagg2 - BACKUP
2025-09-21 14:18:24.772146+03:00 check_reload_status 656 Carp backup event
2025-09-21 14:18:24.743736+03:00 pppoe-ha 86518 VHID 5 INIT - DOWN wan (pppoe0)
2025-09-21 14:18:24.742953+03:00 pppoe-ha 86518 Handle CARP command for 5@lagg2 - INIT
2025-09-21 14:18:24.530618+03:00 kernel - carp: 10@lagg2: INIT -> BACKUP (initialization complete)
2025-09-21 14:18:24.530574+03:00 kernel - carp: 10@lagg2: BACKUP -> INIT (hardware interface up)
2025-09-21 14:18:24.530524+03:00 kernel - carp: 9@ixl3.87: INIT -> BACKUP (initialization complete)
2025-09-21 14:18:24.530480+03:00 kernel - carp: 9@ixl3.87: BACKUP -> INIT (hardware interface up)
2025-09-21 14:18:24.530434+03:00 kernel - carp: 7@lagg0: INIT -> BACKUP (initialization complete)
2025-09-21 14:18:24.530364+03:00 kernel - carp: 7@lagg0: BACKUP -> INIT (hardware interface up)
2025-09-21 14:18:24.530296+03:00 kernel - carp: 5@lagg2: INIT -> BACKUP (initialization complete)
2025-09-21 14:18:24.530178+03:00 kernel - carp: 5@lagg2: BACKUP -> INIT (hardware interface up)
2025-09-21 14:18:24.463858+03:00 check_reload_status 656 Carp backup event
2025-09-21 14:18:24.450717+03:00 check_reload_status 656 Syncing firewall
2025-09-21 14:18:24.322569+03:00 php-fpm 51124 /status_carp.php: Configuration Change: admin@10.0.77.3 (Local Database): Leave CARP maintenance mode
2025-09-21 14:17:52.159134+03:00 pppoe-ha 89953 VHID 5 BACKUP - DOWN wan (pppoe0)
2025-09-21 14:17:52.121279+03:00 pppoe-ha 89953 Reconcile: evaluating 1 mapping(s)
2025-09-21 14:12:34.906579+03:00 php-fpm 18188 /rc.filter_synchronize: XMLRPC reload data success with https://10.0.88.2:443/xmlrpc.php (pfsense.restore_config_section).
2025-09-21 14:12:31.710678+03:00 php-fpm 18188 /rc.filter_synchronize: Beginning XMLRPC sync data to https://10.0.88.2:443/xmlrpc.php.
2025-09-21 14:12:31.710120+03:00 php-fpm 18188 /rc.filter_synchronize: XMLRPC versioncheck: 24.1 -- 24.1
2025-09-21 14:12:31.710030+03:00 php-fpm 18188 /rc.filter_synchronize: XMLRPC reload data success with https://10.0.88.2:443/xmlrpc.php (pfsense.host_firmware_version).
2025-09-21 14:12:31.559469+03:00 php-fpm 18188 /rc.filter_synchronize: Beginning XMLRPC sync data to https://10.0.88.2:443/xmlrpc.php.
2025-09-21 14:12:30.477082+03:00 check_reload_status 656 Syncing firewall
Edit:
I think the following logic would work well for me and anyone else:
1. On the first CARP event for a VHID@iface, start a Collect window of 5-10 s.
2. During the Collect window, just record each new role (MASTER/BACKUP), always keeping only the latest.
3. When the Collect window ends, start a Silence window of 5-10 s.
   - If any new event arrives in this window, restart: go back to step 1 (a new 5-10 s Collect window).
   - If no events arrive for the whole 5-10 s, consider the state settled.
4. Act once on the last recorded role:
   - If last = MASTER → bring PPPoE up (only if not already up).
   - If last = BACKUP → bring PPPoE down (only if not already down).
5. Record the applied role to avoid repeating the same action later.
Safety add-ons (still simple):
- Boot grace: skip everything for the first ~150 s after boot.
- Demotion guard: if net.inet.carp.demotion > 0, postpone the action and re-check later.
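A minimal sketch of the Collect/Silence rule, applied to a recorded list of timestamped events (whole seconds) instead of live CARP hooks, so the windowing can be checked in isolation. The function names are hypothetical, the boot-grace check is left out, and the demotion guard simply reads the FreeBSD sysctl named above.

```sh
#!/bin/sh
# Input on stdin: "TIME ROLE" lines, TIME in whole seconds, ascending.
# Output: "SETTLE_TIME ROLE" once per burst that has settled.
debounce_settle() {
    collect=$1          # Collect window length (s)
    silence=$2          # Silence window length (s)
    active=0            # 1 while a Collect+Silence cycle is open
    collect_end=0
    latest=""
    while read -r t role; do
        [ -n "$t" ] || continue
        if [ "$active" -eq 1 ]; then
            settle_at=$((collect_end + silence))
            if [ "$t" -ge "$settle_at" ]; then
                # Silence window passed with no events: the burst settled.
                echo "$settle_at $latest"
                active=0
            elif [ "$t" -ge "$collect_end" ]; then
                # Event inside the Silence window: restart with a new Collect.
                active=0
            fi
        fi
        if [ "$active" -eq 0 ]; then
            active=1
            collect_end=$((t + collect))
        fi
        latest=$role    # only the most recent role is kept
    done
    # No more events: the open cycle settles after its windows expire.
    if [ "$active" -eq 1 ]; then
        echo "$((collect_end + silence)) $latest"
    fi
}

# Act once per settled role; status 1 if that role was already applied.
APPLIED_ROLE=""
should_apply() {
    [ "$1" != "$APPLIED_ROLE" ] || return 1
    APPLIED_ROLE=$1
}

# Demotion guard: true while net.inet.carp.demotion is above zero
# (treated as 0 if the sysctl cannot be read).
carp_demoted() {
    d=$(sysctl -n net.inet.carp.demotion 2>/dev/null) || d=0
    [ "${d:-0}" -gt 0 ]
}
```

With 5 s windows, a burst of events at 0, 1, 2 and 3 s settles once, at 10 s, on whatever role arrived last; a live version would need one such timer per VHID@iface.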
-
@w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:
And that's a real problem. CARP can flip/flap on the interface several times per second
I don't think this should happen. Normally CARP should be very stable in a working environment. On all the firewalls I manage, CARP interfaces never flap without a reason, and the only reason is a failure of some network device between the firewalls or of one of the firewalls itself.
From the log you sent, with an event every second, there seems to be something wrong with your config. On my firewalls I don't see a single CARP event for days or weeks.
-
@perrin
This is just switching on maintenance mode on the primary, nothing unusual.
-
Hi,
I really appreciate the time you put into this. Thanks for sharing. I have installed the solution. After analyzing the logs, it is clear that:
- A CARP transition is detected.
- The slave starts a PPPoE session successfully at first.
- The ISP rejects authentication with "Too many sessions": it refuses a second PPPoE login because the old session from my master pfSense is still alive.
- The slave keeps retrying repeatedly, but still no luck
(I even waited for 2-3 minutes).
So the slave's WAN is never up.
How can this be fixed or worked around? Add a GUI option for a startup delay on the slave, so that when CARP changes, pfSense waits 20 seconds before starting PPPoE.
MAC spoofing also came to mind, but an ISP can use a variety of signals to track PPPoE sessions:
- PPP username/session state (most important)
- PPPoE session ID on their BRAS
- CPE MAC address / modem association
-
@crl
I have experimented with different variants, and I can say that using a delay is not a good solution, as I mentioned earlier, because the firewall status can change during that delay. The logic needs improvement, but I don’t have enough time to work on it right now.
My script version handles this case much better, but it's slower and not fully synchronized with status changes. The only approach I see is to avoid breaking the connection immediately when the backup status is detected. Instead, register the status and start a time-based trigger that checks the status again before executing: it quits if the status has not changed, or performs the action if it has changed relative to the first registered status. The same applies to the master: monitor it with a time-based trigger synchronized with the first status change, and quit if the status is unchanged, or perform the action and then exit. This sounds simple, but it is not, because we also need to ignore further status changes once the first change is detected, and start again some time after everything has settled. All this makes me think the logic becomes too complicated and needs too much code for what it serves.
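One possible reading of this trigger, sketched under assumptions: each CARP event calls on_carp_event with the role that was last applied, a directory created with mkdir acts as an atomic "trigger pending" flag so later events in the same burst are ignored, and after the delay the live role is compared with the applied one. All names and paths are hypothetical. If the process dies while sleeping, the directory stays behind, and a flap that happens after the re-check is not seen until the next event, which is exactly the restart problem described above.

```sh
#!/bin/sh
LAN_IF=${LAN_IF:-lagg2}
PENDING_DIR=${PENDING_DIR:-/var/run/pppoe_carp_pending}
CONFIRM_DELAY=${CONFIRM_DELAY:-10}

# Current role from ifconfig (same rule as the 30 s poller).
current_role() {
    if ifconfig "$LAN_IF" | grep -Eq 'MASTER vhid 5( |$)'; then
        echo MASTER
    else
        echo BACKUP
    fi
}

# Placeholder action: bring PPPoE up or down for the given role.
apply_role() {
    logger -t pppoe_script "would apply role $1"
}

# Called on every CARP event; $1 = the role that was last applied.
on_carp_event() {
    before=$1
    # mkdir is atomic, so only the first event of a burst takes the
    # trigger; events arriving while it is pending return immediately.
    mkdir "$PENDING_DIR" 2>/dev/null || return 0
    sleep "$CONFIRM_DELAY"
    now=$(current_role)
    # Act only if the role really differs after the delay.
    if [ "$now" != "$before" ]; then
        apply_role "$now"
    fi
    rmdir "$PENDING_DIR"
}
```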
-
@crl said in new if_pppoe Backend - getting HA/CARP to work like in MPD:
ISP rejects authentication with Too many sessions. ISP is refusing a second PPPoE login because the old session from my master pfSense is still alive
- Slave keeps retrying repeatedly but still no luck
(I even waited for 2-3 minutes).
Hi,
The same applies to my ISP. I also get a denied login at first when the slave comes up; in my case, however, the ISP times out the old master session within a few minutes, allowing the slave to connect. Whenever the master fails "badly", it cannot end the session cleanly, and the slave will always be unable to establish a connection for some initial period.
@crl said in new if_pppoe Backend - getting HA/CARP to work like in MPD:
So the slave's WAN is never up.
I did not think about this case when designing the plugin because, from my understanding of PPPoE, LCP keepalive will time out a stale session at the ISP after some time. My ISP does that within seconds. Maybe your ISP has quite a lengthy setting for that timeout.
You could try setting the same MAC address on both firewalls for the PPPoE interface and see if that helps. The session state will definitely still differ, but it may help with your ISP.
The most elegant solution, however, would be to synchronize the PPPoE session ID and configuration values (IP addresses, gateways and so forth) between master and slave and have the slave pick up the current session. But that won't work without patching if_pppoe itself, which might be out of scope...
-
@perrin
How does your HA pair react if you put the master node into maintenance mode via Status → CARP → Enable Persistent Maintenance Mode (or whatever it's called)?
-
@w0w Enabling maintenance mode on the master raises its skew, thus transitioning it from MASTER to BACKUP. pppoe-ha picks up the BACKUP state and disables the interface accordingly.
Since I don't have a problem moving the PPPoE session, the failover works as expected in my case.
Maybe @crl should try that and see
a) whether if_pppoe correctly closes the session on the master before disabling the interface, and
b) whether his backup can correctly establish a new PPPoE session.
-
Please check this workaround:
Github Issue - ISP side 'Too many sessions' keeping backup pfsense's WAN down
It solves only one use case:
- OK: entering and leaving CARP maintenance mode on a manual trigger.
- Solution requested: if a WAN cable is pulled (between the WAN switch and either pfSense device) or if a pfSense machine is down, perform the MASTER --> BACKUP transition and connect PPPoE on the BACKUP. Should the MASTER come back, it should take back the MASTER role and reconnect PPPoE on the MASTER.
-
I tried to summarize what is going on during the switchover experiments. This is one example.
-
@w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:
@crl, @perrin do you both have dual stack pppoe?
In my case, yes: dual stack, v4 and v6.
@crl said in new if_pppoe Backend - getting HA/CARP to work like in MPD:
I tried to summarize what is going on during the switchover experiments. This is one example.
Some of these issues might be related to the configuration and/or the default behavior of pfSense (e.g. when PPPoE fails and you're expecting a CARP switch).
Do these things work as expected when you use the old time-based scripts?
-
Yes, in my setup things work somewhat differently, as you noticed. There are at least a few reasons. Most importantly, every time PPPoE comes up, the VIPs get reconfigured and CARP reinitializes. I suspect this behavior is related to IPv6 and the fact that the LAN uses the Track Interface option to obtain its IPv6 address, but I'm not certain. I'm currently trying to track down the root cause, or perhaps it's an "incompatible" configuration.
How does this behave on your side? As I understand it, bringing up PPPoE does not trigger VIP reconfiguration/CARP initialization for you, right?