new if_pppoe Backend - getting HA/CARP to work like in MPD
-
@w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:
I am not sure is it really necessary to break the connection at all?
What happens on a reconcile is that all interfaces are forced into the state they should be in. For the UP command the script currently does an 'interface reload' (/usr/local/sbin/pfSctl -c 'interface reload <interface>'), whereas for DOWN it does ifconfig <interface> down.
My tests showed that using 'ifconfig <interface> up' as the UP command does not reliably connect the PPPoE interface in every case. That is why I opted for the reload command. The downside is that it does in fact break the connection. I could add a check to see whether the interface is already connected before reloading it, which would fix this behaviour in reconcile. I will probably add that.
@w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:
Overall, I can’t say it’s stable for me—I’m not sure why. I also need to fix another bug. It looks like something is preventing the VIPs from starting when the firewall boots.
Great to hear that it is stable! Regarding the VIPs not coming up: I have no idea where this is coming from; it probably has nothing to do with my script. On both of my firewalls the VIPs come up as expected. But I am only using VIPs as part of CARP.
-
@perrin said in new if_pppoe Backend - getting HA/CARP to work like in MPD:
Great to hear that it is stable!
Unfortunately I have to clarify: no, your variant is not stable on my systems. The bug described above, on an already loaded system with PPPoE up, seems to start as soon as your script initializes, and the link falls into packet loss; I don’t even need to press the button. Also, please look at the “Enable” checkbox and try adding several interfaces. For me it sometimes disappears entirely and sometimes moves around.
-
@w0w I created a new version 0.1.2 of the package, which now checks whether a PPPoE interface is up (IP address present). If it is, and the desired state of that interface is also up, the script will not reload the interface, so it should not break any legitimate connection.
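To make the idea concrete, here is a minimal shell sketch of such a check. It is not the package's actual code (that lives in the PHP script); the function names has_ipv4 and decide_action are made up for illustration, and the functions work on captured ifconfig output so they can be tried without touching an interface.

```sh
#!/bin/sh
# Sketch only: has_ipv4 and decide_action are hypothetical names,
# not functions from the pppoe-ha package.

# Succeed if the given ifconfig output contains an IPv4 address line.
# "inet6" lines do not match because a space must follow "inet".
has_ipv4() {
    printf '%s\n' "$1" | grep -Eq '^[[:space:]]*inet [0-9]+\.[0-9]+\.[0-9]+\.[0-9]+'
}

# Print the action for a desired state ("up" or "down") and ifconfig output:
#   skip   - should be up and already has an address, leave it alone
#   reload - should be up but has no address
#   down   - should be down
decide_action() {
    desired="$1"
    ifconfig_out="$2"
    if [ "$desired" = "up" ]; then
        if has_ipv4 "$ifconfig_out"; then
            echo skip
        else
            echo reload
        fi
    else
        echo down
    fi
}
```

On a firewall this could be called as decide_action up "$(ifconfig pppoe0 2>/dev/null)", running the pfSctl reload only when the answer is "reload".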
Regarding your issue with the GUI: I can't confirm the bug. I can add as many interfaces as I want and the GUI stays consistent. I am using the pfSense rowhelper in the GUI, so that is more or less standard functionality. Can you give me some more details on when that fails? Also, which browser are you using, and do you have any plugins that mangle the HTML of a page (e.g. an ad blocker)?
-
@perrin said in new if_pppoe Backend - getting HA/CARP to work like in MPD:
regarding your issue with the GUI: I can't confirm the bug.
I can't replicate it on the new version. Will test its functionality soon, thanks!
-
Tested, no luck. WAN stays Pending and only acquires IPv4; IPv6 never comes up. It seems you can’t just use pppoe0 up; you need to run something like:
/usr/local/sbin/pfSctl -c 'interface reload wan'
to bring it up correctly.
Here’s my script’s logic; see the parts on bringing PPPoE up and checking it afterwards.
-
Purpose & scope
- A CARP-aware PPPoE watchdog for pfSense: it tracks the node’s CARP role (MASTER/BACKUP) and reacts by starting/validating or stopping PPPoE, and (on BACKUP) restoring missing VIPs (VHID 5).
- All syslog lines use a uniform prefix, PPPoE_script_message00X:, with ascending IDs.
-
Tunables / constants
- LOCKFILE=/var/run/run.sh.lock stores the PID (line 1) and the last known CARP role (line 2).
- PPPOE_IF=pppoe0, LAN_IF=lagg2.
- Role detection is by grepping ifconfig ${LAN_IF} for "MASTER vhid 5"; the VIP presence check greps for "vhid 5".
- PHP_BIN=/usr/local/bin/php.
- Internal flag PPPOE_ALREADY_STARTED tracks whether the script has (re)started PPPoE in this run.
-
Singleton guard
- On start, check_already_running() reads the lockfile; if the recorded PID is alive (ps -p), the script exits to avoid multiple instances.
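A rough sketch of how such a guard can look in plain sh. LOCKFILE and check_already_running are the names from the description; write_lock and read_last_role are hypothetical helpers added for illustration. Note that the liveness test here uses kill -0 instead of the ps -p call mentioned above, simply because it is a shell builtin.

```sh
#!/bin/sh
# Sketch only. write_lock and read_last_role are hypothetical helper names.
LOCKFILE="${LOCKFILE:-/var/run/run.sh.lock}"

# Store the PID on line 1 and the last known CARP role on line 2.
write_lock() {
    printf '%s\n%s\n' "$1" "$2" > "$LOCKFILE"
}

# Print the role recorded on line 2 (prints nothing if there is no lockfile).
read_last_role() {
    [ -f "$LOCKFILE" ] && sed -n '2p' "$LOCKFILE"
}

# Succeed only if the lockfile names a PID that is still alive.
# kill -0 delivers no signal; it only checks that the process exists.
check_already_running() {
    [ -f "$LOCKFILE" ] || return 1
    pid=$(sed -n '1p' "$LOCKFILE")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;   # missing or non-numeric PID
    esac
    kill -0 "$pid" 2>/dev/null
}
```

In the start path this would be used roughly as: if check_already_running succeeds, exit; otherwise call write_lock "$$" with the current role and carry on.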
-
Optional discovery
- find_pppoe_info() grabs the first pppoeN interface and its IPv4 address (kept for parity with older versions; not strictly required elsewhere).
-
Main loop (role monitor)
- start_monitoring():
  - Logs launch (001) and initializes CUR_ROLE from the lockfile if present.
  - Every 30 seconds:
    - Derives NEW_ROLE from ifconfig ${LAN_IF} (MASTER vhid 5 → MASTER; otherwise BACKUP).
    - If the role is unchanged → continue silently (no log spam).
    - If the role changed:
      - On MASTER: call handle_master_carp(); log 002.
      - On BACKUP: call handle_non_master_carp(); log 003.
    - Update the lockfile with the current PID and the new role.
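The role detection and the "act only on change" step can be sketched like this. The function names role_from_ifconfig and next_action are made up for illustration; the handler names they print are the ones from the description. One deliberate difference from a plain grep for "MASTER vhid 5": the pattern below also requires that no digit follows, so a VHID such as 50 is not mistaken for 5.

```sh
#!/bin/sh
# Sketch only: role_from_ifconfig and next_action are hypothetical names.
VHID="${VHID:-5}"

# Print MASTER if the ifconfig output shows this VHID as MASTER, else BACKUP.
role_from_ifconfig() {
    if printf '%s\n' "$1" | grep -Eq "MASTER vhid ${VHID}([^0-9]|\$)"; then
        echo MASTER
    else
        echo BACKUP
    fi
}

# Print which handler to call for a transition from $1 (current role)
# to $2 (new role), or "none" when the role did not change.
next_action() {
    cur="$1"
    new="$2"
    if [ "$cur" = "$new" ]; then
        echo none
    elif [ "$new" = MASTER ]; then
        echo handle_master_carp
    else
        echo handle_non_master_carp
    fi
}
```

Inside the 30-second loop this would be fed with role_from_ifconfig "$(ifconfig lagg2)", and the lockfile would be rewritten after each pass.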
-
MASTER path (handle_master_carp)
- If PPPoE hasn’t been (re)started in this run, call handle_pppoe_start().
- Otherwise, log/verify link (004) via check_pppoe().
-
BACKUP path (handle_non_master_carp)
- Shut PPPoE down if any pppoeN exists:
  - Wait 10s, ifconfig ${PPPOE_IF} down, set PPPOE_ALREADY_STARTED=false, log 005.
- Ensure VIPs (VHID 5) exist on LAN_IF:
  - If vhid 5 is missing, log 006 and re-install VIPs by running: php -r 'require_once "/etc/inc/interfaces.inc"; interfaces_vips_configure();'
-
Bringing PPPoE up (handle_pppoe_start)
- Wait 130s to let CARP converge.
- If no pppoeN exists:
  - Log 007, run pfSctl -c 'interface reload wan', set PPPOE_ALREADY_STARTED=true.
- If pppoe0 exists and is UP:
  - Log 008, do nothing.
- If it exists but is not UP: ifconfig ${PPPOE_IF} up, log 009.
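The three-way decision above, as a small sketch. The function name pppoe_start_action is hypothetical, and it only prints which branch applies instead of running pfSctl or ifconfig, so it can be tried against saved ifconfig output. The 130s wait and the logging are left out.

```sh
#!/bin/sh
# Sketch only: pppoe_start_action is a hypothetical name.
# Input: the output of "ifconfig pppoe0" (empty if the interface is missing).
# Output: reload (interface missing), none (exists and UP), up (exists, not UP).
pppoe_start_action() {
    out="$1"
    if [ -z "$out" ]; then
        echo reload   # would run: pfSctl -c 'interface reload wan'
    elif printf '%s\n' "$out" | grep -Eq 'flags=[0-9a-fA-F]+<([^>]*,)?UP(,[^>]*)?>'; then
        echo none     # already UP, nothing to do
    else
        echo up       # would run: ifconfig ${PPPOE_IF} up
    fi
}
```

Example call on the firewall: pppoe_start_action "$(ifconfig pppoe0 2>/dev/null)".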
-
Verifying PPPoE (check_pppoe)
- Wait 180s (grace period).
- If no pppoeN is present:
  - Log 010, try pfSctl -c 'interface reload wan'.
  - On success: set PPPOE_ALREADY_STARTED=true; on failure log 011 and return error.
-
Logging policy
- Routine role polls are quiet; logs emit only when the CARP role flips, plus the specific action logs (004–011) triggered by that transition.
-
Entry points
- start: runs singleton check, optional discovery, then the monitoring loop.
- stop: placeholder (prints a message; no teardown).
- Otherwise: prints usage and exits non-zero.
-
Operational timings (summary)
- Poll interval: 30s.
- MASTER bring-up grace: 130s (CARP settle).
- PPPoE verification grace: 180s.
- BACKUP PPPoE down delay: 10s.
-
Key side effects
- Keeps a persistent record of PID + last role in the lockfile.
- Ensures PPPoE is up and stable on MASTER, down on BACKUP.
- Auto-repairs missing VIPs (VHID 5) on BACKUP via pfSense PHP API.
-
-
@w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:
/usr/local/sbin/pfSctl -c 'interface reload wan'
this is exactly what the script is doing. See GitHub.
Please note that the IPv6 stuff has nothing to do with my script; it seems to be related to the general if_pppoe troubles.
-
@perrin said in new if_pppoe Backend - getting HA/CARP to work like in MPD:
this is exactly what the script is doing. See GitHub
But there is something else, I don't know… safety timer, maybe. My script almost never fails to get the thing up and running, at least now. I will test it more and will give you more feedback this week, I hope.
-
@w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:
But there is something else, I don't know… safety timer, maybe.
Might be, since your script runs every 30 seconds, so there is some random delay between the CARP state change and the time your script runs.
In my case the script runs immediately with the CARP change. To test the behavior you could add a sleep in /usr/local/sbin/pppoe_ha_event (the shell script wrapper, not the PHP) before the exec line, e.g.:
#!/bin/sh
sleep 5
exec /usr/local/bin/php -q /usr/local/sbin/pppoe_ha_event.php "$@"
Let me know if that changes the behavior on your firewall.
-
@perrin said in new if_pppoe Backend - getting HA/CARP to work like in MPD:
In my case the script runs immediately with the CARP change.
And that’s a real problem. CARP can flip/flap on the interface several times per second, so you’ll end up running multiple commands and hit a race condition. You need to detect CARP state changes, but you don’t need to react to every one of them: check the state a bit later, and run your commands only once you are sure the final state has changed. So I don't think the event sleep timer can solve this problem; the events simply queue up.
2025-09-21 14:18:27.478723+03:00 check_reload_status 656 Configuring interface wan
2025-09-21 14:18:27.473171+03:00 pppoe-ha 37352 VHID 5 MASTER - UP wan (pppoe0)
2025-09-21 14:18:27.472332+03:00 pppoe-ha 37352 Handle CARP command for 5@lagg2 - MASTER
2025-09-21 14:18:27.451908+03:00 php-fpm 3406 /rc.carpbackup: HA cluster member "(10.0.90.5@lagg2): (LAN)" has resumed CARP state "BACKUP" for vhid 10
2025-09-21 14:18:27.132013+03:00 check_reload_status 656 Carp master event
2025-09-21 14:18:27.110669+03:00 pppoe-ha 28694 no mappings for VHID 10; ignoring
2025-09-21 14:18:27.110296+03:00 pppoe-ha 28694 Handle CARP command for 10@lagg2 - BACKUP
2025-09-21 14:18:27.073719+03:00 php-fpm 62482 /rc.carpbackup: HA cluster member "(10.0.87.5@ixl3.87): (WIFIAP)" has resumed CARP state "BACKUP" for vhid 9
2025-09-21 14:18:27.073279+03:00 php-fpm 62482 /rc.carpbackup: Suppressing repeat e-mail notification message.
2025-09-21 14:18:26.804464+03:00 check_reload_status 656 Carp backup event
2025-09-21 14:18:26.779787+03:00 pppoe-ha 17333 no mappings for VHID 10; ignoring
2025-09-21 14:18:26.779198+03:00 pppoe-ha 17333 Handle CARP command for 10@lagg2 - INIT
2025-09-21 14:18:26.769846+03:00 php-fpm 3406 /rc.carpbackup: HA cluster member "(10.0.87.5@ixl3.87): (WIFIAP)" has resumed CARP state "BACKUP" for vhid 9
2025-09-21 14:18:26.470904+03:00 php-fpm 51124 /rc.carpbackup: HA cluster member "(10.0.100.155@lagg0): (WAN2)" has resumed CARP state "BACKUP" for vhid 7
2025-09-21 14:18:26.360247+03:00 check_reload_status 656 Carp backup event
2025-09-21 14:18:26.332786+03:00 pppoe-ha 5832 no mappings for VHID 9; ignoring
2025-09-21 14:18:26.332425+03:00 pppoe-ha 5832 Handle CARP command for 9@ixl3.87 - BACKUP
2025-09-21 14:18:26.150680+03:00 php-fpm 3406 /rc.carpbackup: HA cluster member "(10.0.100.155@lagg0): (WAN2)" has resumed CARP state "BACKUP" for vhid 7
2025-09-21 14:18:25.999747+03:00 kernel - carp: 10@lagg2: BACKUP -> MASTER (preempting a slower master)
2025-09-21 14:18:25.999658+03:00 kernel - carp: 7@lagg0: BACKUP -> MASTER (preempting a slower master)
2025-09-21 14:18:25.999518+03:00 kernel - carp: 5@lagg2: BACKUP -> MASTER (preempting a slower master)
2025-09-21 14:18:25.968532+03:00 check_reload_status 656 Carp backup event
2025-09-21 14:18:25.951088+03:00 pppoe-ha 99836 no mappings for VHID 9; ignoring
2025-09-21 14:18:25.950701+03:00 pppoe-ha 99836 Handle CARP command for 9@ixl3.87 - INIT
2025-09-21 14:18:25.857771+03:00 php-fpm 3406 /rc.carpbackup: HA cluster member "(10.0.77.5@lagg2): (LAN)" has resumed CARP state "BACKUP" for vhid 5
2025-09-21 14:18:25.857471+03:00 php-fpm 3406 /rc.carpbackup: Suppressing repeat e-mail notification message.
2025-09-21 14:18:25.701909+03:00 php-fpm 7887 /rc.filter_synchronize: Beginning XMLRPC sync data to https://10.0.88.2:443/xmlrpc.php.
2025-09-21 14:18:25.701214+03:00 php-fpm 7887 /rc.filter_synchronize: XMLRPC versioncheck: 24.1 -- 24.1
2025-09-21 14:18:25.701112+03:00 php-fpm 7887 /rc.filter_synchronize: XMLRPC reload data success with https://10.0.88.2:443/xmlrpc.php (pfsense.host_firmware_version).
2025-09-21 14:18:25.667673+03:00 check_reload_status 656 Carp backup event
2025-09-21 14:18:25.650236+03:00 pppoe-ha 93377 no mappings for VHID 7; ignoring
2025-09-21 14:18:25.649873+03:00 pppoe-ha 93377 Handle CARP command for 7@lagg0 - BACKUP
2025-09-21 14:18:25.545307+03:00 php-fpm 16208 /rc.carpbackup: HA cluster member "(10.0.77.5@lagg2): (LAN)" has resumed CARP state "BACKUP" for vhid 5
2025-09-21 14:18:25.541200+03:00 php-fpm 7887 /rc.filter_synchronize: Beginning XMLRPC sync data to https://10.0.88.2:443/xmlrpc.php.
2025-09-21 14:18:25.368345+03:00 check_reload_status 656 Carp backup event
2025-09-21 14:18:25.351847+03:00 pppoe-ha 92360 no mappings for VHID 7; ignoring
2025-09-21 14:18:25.351495+03:00 pppoe-ha 92360 Handle CARP command for 7@lagg0 - INIT
2025-09-21 14:18:25.076109+03:00 check_reload_status 656 Carp backup event
2025-09-21 14:18:25.046639+03:00 pppoe-ha 89037 VHID 5 BACKUP - DOWN wan (pppoe0)
2025-09-21 14:18:25.045844+03:00 pppoe-ha 89037 Handle CARP command for 5@lagg2 - BACKUP
2025-09-21 14:18:24.772146+03:00 check_reload_status 656 Carp backup event
2025-09-21 14:18:24.743736+03:00 pppoe-ha 86518 VHID 5 INIT - DOWN wan (pppoe0)
2025-09-21 14:18:24.742953+03:00 pppoe-ha 86518 Handle CARP command for 5@lagg2 - INIT
2025-09-21 14:18:24.530618+03:00 kernel - carp: 10@lagg2: INIT -> BACKUP (initialization complete)
2025-09-21 14:18:24.530574+03:00 kernel - carp: 10@lagg2: BACKUP -> INIT (hardware interface up)
2025-09-21 14:18:24.530524+03:00 kernel - carp: 9@ixl3.87: INIT -> BACKUP (initialization complete)
2025-09-21 14:18:24.530480+03:00 kernel - carp: 9@ixl3.87: BACKUP -> INIT (hardware interface up)
2025-09-21 14:18:24.530434+03:00 kernel - carp: 7@lagg0: INIT -> BACKUP (initialization complete)
2025-09-21 14:18:24.530364+03:00 kernel - carp: 7@lagg0: BACKUP -> INIT (hardware interface up)
2025-09-21 14:18:24.530296+03:00 kernel - carp: 5@lagg2: INIT -> BACKUP (initialization complete)
2025-09-21 14:18:24.530178+03:00 kernel - carp: 5@lagg2: BACKUP -> INIT (hardware interface up)
2025-09-21 14:18:24.463858+03:00 check_reload_status 656 Carp backup event
2025-09-21 14:18:24.450717+03:00 check_reload_status 656 Syncing firewall
2025-09-21 14:18:24.322569+03:00 php-fpm 51124 /status_carp.php: Configuration Change: admin@10.0.77.3 (Local Database): Leave CARP maintenance mode
2025-09-21 14:17:52.159134+03:00 pppoe-ha 89953 VHID 5 BACKUP - DOWN wan (pppoe0)
2025-09-21 14:17:52.121279+03:00 pppoe-ha 89953 Reconcile: evaluating 1 mapping(s)
2025-09-21 14:12:34.906579+03:00 php-fpm 18188 /rc.filter_synchronize: XMLRPC reload data success with https://10.0.88.2:443/xmlrpc.php (pfsense.restore_config_section).
2025-09-21 14:12:31.710678+03:00 php-fpm 18188 /rc.filter_synchronize: Beginning XMLRPC sync data to https://10.0.88.2:443/xmlrpc.php.
2025-09-21 14:12:31.710120+03:00 php-fpm 18188 /rc.filter_synchronize: XMLRPC versioncheck: 24.1 -- 24.1
2025-09-21 14:12:31.710030+03:00 php-fpm 18188 /rc.filter_synchronize: XMLRPC reload data success with https://10.0.88.2:443/xmlrpc.php (pfsense.host_firmware_version).
2025-09-21 14:12:31.559469+03:00 php-fpm 18188 /rc.filter_synchronize: Beginning XMLRPC sync data to https://10.0.88.2:443/xmlrpc.php.
2025-09-21 14:12:30.477082+03:00 check_reload_status 656 Syncing firewall
Edit:
I think the following logic would work well for me and for anyone else:
1. On the first CARP event for a (VHID@iface), start a Collect window of 5-10 s.
2. During the Collect window, just record each new role (MASTER/BACKUP), always keeping only the latest.
3. After Collect ends, start a Silence window of 5-10 s.
   - If any new event arrives in this window, restart: go back to step 1 (new Collect window).
   - If no events arrive for the whole Silence window, consider the state settled.
4. Act once on the last recorded role:
   - If last = MASTER → bring PPPoE up (only if not already up).
   - If last = BACKUP → bring PPPoE down (only if not already down).
5. Record the applied role to avoid repeating the same action later.
Safety add-ons (still simple):
- Boot grace: skip everything for the first ~150 s after boot.
- Demotion guard: if net.inet.carp.demotion > 0, postpone the action and re-check later.
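To show what I mean, here is a rough sketch of the collect/silence logic as a replay over a list of timestamped events, not a live daemon. The names debounce and _settle are made up, the input is assumed to be "seconds role" lines sorted by time and containing only MASTER/BACKUP, and the boot grace and demotion guard are left out.

```sh
#!/bin/sh
# Sketch only: debounce and _settle are hypothetical names.

# Print "settle_time role" unless that role was already applied.
_settle() {
    if [ "$last" != "$applied" ]; then
        echo "$silence_end $last"
        applied="$last"
    fi
}

# Usage: debounce COLLECT_SECONDS SILENCE_SECONDS < events
# Reads "time role" lines and prints one line per settled role change.
debounce() {
    collect="$1"
    silence="$2"
    state=idle
    last=""
    applied=""
    collect_end=0
    silence_end=0
    while read -r t role; do
        # The previous burst went quiet before this event: act on it first.
        if [ "$state" = active ] && [ "$t" -gt "$silence_end" ]; then
            _settle
            state=idle
        fi
        # First event of a burst, or an event inside the Silence window:
        # (re)start the Collect window from this event.
        if [ "$state" = idle ] || [ "$t" -gt "$collect_end" ]; then
            collect_end=$((t + collect))
            silence_end=$((collect_end + silence))
            state=active
        fi
        last="$role"   # keep only the latest role
    done
    # End of input: the last burst settles at the end of its Silence window.
    if [ "$state" = active ]; then
        _settle
    fi
}
```

For example, feeding it four flips within four seconds (BACKUP, MASTER, BACKUP, MASTER) with both windows set to 5 yields a single action on MASTER, which is the behavior I'm after.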
@w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:
And that’s a real problem. CARP can flip/flap on the interface several times per seconds
I don't think this should happen. Normally CARP should be very stable in a working environment. On all the firewalls I manage, CARP interfaces never flap without a reason, and the only reasons are a failure of some network device in between the firewalls or of one of the firewalls themselves.
Judging from the log you sent, with an event every second, something seems to be wrong with your config. On my firewalls I don't see a single CARP event in days or weeks.
-
@perrin
This is just switching on maintenance mode on the primary, nothing unusual.