Netgate Discussion Forum

    new if_pppoe Backend - getting HA/CARP to work like in MPD

    42 Posts 3 Posters 3.8k Views 4 Watching
    • w0wW Offline
      w0w @perrin
      last edited by

      @perrin said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

      great to hear that it is stable!

      Unfortunately I have to clarify: no, your variant is not stable on my systems. On an already loaded system with PPPoE up, the bug described above seems to start as soon as your script initializes, and the link starts dropping packets; I don’t even need to press the button. Also, please look at the “Enable” checkbox and try adding several interfaces. For me it sometimes disappears entirely and sometimes moves around.

      • P Offline
        perrin @w0w
        last edited by

        @w0w I created a new version, 0.1.2, of the package, which now checks whether a PPPoE interface is already up (IP address present). If it is, and the desired state of that interface is also up, the script will not reload the interface, so it should not break any legitimate connection.
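
The check described above could be sketched in shell roughly as follows. This is not the package's code (the package itself is PHP); `should_skip_reload` is a hypothetical helper name, and the sketch assumes that "up" simply means an IPv4 address shows in the `ifconfig` output passed on stdin.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the pppoe-ha package.
# Usage: ifconfig pppoe0 | should_skip_reload up
# Exit status 0 = skip the reload, 1 = go ahead with the usual handling.
should_skip_reload() {
    desired=$1                          # desired state: "up" or "down"
    # Only an interface that should be up can be left alone.
    [ "$desired" = up ] || return 1
    # Assumption: an "inet <digits>" line means the PPPoE session is established.
    grep -q '^[[:space:]]*inet [0-9]'
}
```

On the firewall this would be called as `ifconfig pppoe0 | should_skip_reload up && exit 0` before any reload is attempted.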

        Regarding your issue with the GUI: I can't confirm the bug. I can add as many interfaces as I want and the GUI stays consistent. I am using the pfSense rowhelper in the GUI, so that is more or less standard functionality. Can you give me some more details on when that fails? Also, which browser are you using, and do you have any plugins in use that mangle the HTML of a page (e.g. adblock)?

        • w0wW Offline
          w0w @perrin
          last edited by

          @perrin said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

          regarding your issue with the GUI: I can't confirm the bug.

          I can't replicate it on the new version. I'll test its functionality soon, thanks!

          • w0wW Offline
            w0w
            last edited by w0w

            Tested, no luck. WAN stays Pending and only acquires IPv4; IPv6 never comes up. It seems you can’t just use pppoe0 up; you need to run something like:
            /usr/local/sbin/pfSctl -c 'interface reload wan'
            to bring it up correctly.
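
One way to check the result after the reload is to look at which address families actually show up on the interface. The sketch below is an illustration only: `addr_families` is a hypothetical helper, it reads `ifconfig pppoe0` output on stdin, and it assumes the ISP assigns a global IPv6 address to the WAN itself (with prefix-delegation-only setups the WAN may legitimately carry only a link-local address, so this check would not apply).

```shell
#!/bin/sh
# Hypothetical helper: prints "v4", "v6", "v4 v6" or "none" depending on
# which address families appear in the ifconfig output read from stdin.
addr_families() {
    out=$(cat)
    fams=""
    if printf '%s\n' "$out" | grep -q '^[[:space:]]*inet [0-9]'; then
        fams="v4"
    fi
    # Link-local (fe80::) addresses exist without any ISP assignment, so skip them.
    if printf '%s\n' "$out" | grep '^[[:space:]]*inet6 ' | grep -qv 'inet6 fe80:'; then
        fams="${fams:+$fams }v6"
    fi
    echo "${fams:-none}"
}
```

Running `ifconfig pppoe0 | addr_families` before and after the reload would show whether IPv6 came up along with IPv4.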

            Here’s my script’s logic; see the “Bringing PPPoE up” and “Verifying PPPoE” items in particular.

            • Purpose & scope

              • A CARP-aware PPPoE watchdog for pfSense: it tracks the node’s CARP role (MASTER/BACKUP) and reacts by starting/validating or stopping PPPoE, and (on BACKUP) restoring missing VIPs (VHID 5).
              • All syslog lines use a uniform prefix PPPoE_script_message00X: with ascending IDs.
            • Tunables / constants

              • LOCKFILE=/var/run/run.sh.lock — stores PID (line 1) and last known CARP role (line 2).
              • PPPOE_IF=pppoe0, LAN_IF=lagg2.
              • Role detection is by grepping ifconfig ${LAN_IF} for MASTER vhid 5; VIP presence check greps for vhid 5.
              • PHP_BIN=/usr/local/bin/php.
              • Internal flag PPPOE_ALREADY_STARTED tracks whether the script has (re)started PPPoE in this run.
            • Singleton guard

              • On start, check_already_running() reads the lockfile; if the recorded PID is alive (ps -p), the script exits to avoid multiple instances.
            • Optional discovery

              • find_pppoe_info() grabs the first pppoeN interface and its IPv4 address (kept for parity with older versions; not strictly required elsewhere).
            • Main loop (role monitor)

              • start_monitoring():

                • Logs launch (001) and initializes CUR_ROLE from the lockfile if present.

                • Every 30 seconds:

                  • Derives NEW_ROLE from ifconfig ${LAN_IF} (MASTER vhid 5 → MASTER; otherwise BACKUP).

                  • If role unchanged → continue silently (no log spam).

                  • If role changed:

                    • On MASTER: call handle_master_carp(); log 002.
                    • On BACKUP: call handle_non_master_carp(); log 003.
                  • Update lockfile with current PID and the new role.

            • MASTER path (handle_master_carp)

              • If PPPoE hasn’t been (re)started in this run, call handle_pppoe_start().
              • Otherwise, log/verify link (004) via check_pppoe().
            • BACKUP path (handle_non_master_carp)

              1. Shut PPPoE down if any pppoeN exists:

                • Wait 10s, ifconfig ${PPPOE_IF} down, set PPPOE_ALREADY_STARTED=false, log 005.
              2. Ensure VIPs (VHID 5) exist on LAN_IF:

                • If vhid 5 is missing, log 006 and re-install VIPs by running:

                  • php -r 'require_once "/etc/inc/interfaces.inc"; interfaces_vips_configure();'
            • Bringing PPPoE up (handle_pppoe_start)

              • Wait 130s to let CARP converge.

              • If no pppoeN exists:

                • Log 007, run pfSctl -c 'interface reload wan', set PPPOE_ALREADY_STARTED=true.
              • If pppoe0 exists and is UP:

                • Log 008, do nothing.
              • If it exists but is not UP:

                • ifconfig ${PPPOE_IF} up, log 009.
            • Verifying PPPoE (check_pppoe)

              • Wait 180s (grace period).

              • If no pppoeN is present:

                • Log 010, try pfSctl -c 'interface reload wan'.
                • On success: set PPPOE_ALREADY_STARTED=true; on failure log 011 and return error.
            • Logging policy

              • Routine role polls are quiet; logs emit only when the CARP role flips, plus the specific action logs (004–011) triggered by that transition.
            • Entry points

              • start: runs singleton check, optional discovery, then the monitoring loop.
              • stop: placeholder (prints a message; no teardown).
              • Otherwise: prints usage and exits non-zero.
            • Operational timings (summary)

              • Poll interval: 30s.
              • MASTER bring-up grace: 130s (CARP settle).
              • PPPoE verification grace: 180s.
              • BACKUP PPPoE down delay: 10s.
            • Key side effects

              • Keeps a persistent record of PID + last role in the lockfile.
              • Ensures PPPoE is up and stable on MASTER, down on BACKUP.
              • Auto-repairs missing VIPs (VHID 5) on BACKUP via pfSense PHP API.
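
A minimal shell sketch of the role detection, singleton guard and lockfile handling described above. It is not the actual script: the function bodies are assumptions reconstructed from the description, `role_from_ifconfig`, `write_lock` and `poll_once` are hypothetical names, and `kill -0` stands in for the `ps -p` liveness check.

```shell
#!/bin/sh
# Sketch only: constants mirror the post, function bodies are assumptions.
LOCKFILE=${LOCKFILE:-/var/run/run.sh.lock}
LAN_IF=${LAN_IF:-lagg2}
VHID=${VHID:-5}

# Print MASTER or BACKUP based on `ifconfig $LAN_IF` output read from stdin.
# The trailing "( |$)" keeps "vhid 5" from also matching "vhid 50".
role_from_ifconfig() {
    if grep -Eq "MASTER vhid ${VHID}( |\$)"; then
        echo MASTER
    else
        echo BACKUP
    fi
}

# Return 0 if the PID on line 1 of the lockfile belongs to a live process.
check_already_running() {
    [ -r "$LOCKFILE" ] || return 1
    pid=$(sed -n 1p "$LOCKFILE")
    case $pid in
        ''|*[!0-9]*) return 1 ;;    # empty or non-numeric: treat as stale
    esac
    kill -0 "$pid" 2>/dev/null       # stand-in for the `ps -p` check
}

# Store PID (line 1) and last known role (line 2).
write_lock() {
    printf '%s\n%s\n' "$$" "$1" > "$LOCKFILE"
}

# One poll: prints the new role only when it differs from the stored one,
# and updates the lockfile in that case. Prints nothing when unchanged.
poll_once() {
    new_role=$(role_from_ifconfig)
    cur_role=$(sed -n 2p "$LOCKFILE" 2>/dev/null)
    if [ "$new_role" != "$cur_role" ]; then
        echo "$new_role"
        write_lock "$new_role"
    fi
}
```

In the 30-second loop this would be used as `ifconfig "$LAN_IF" | poll_once`, dispatching to handle_master_carp or handle_non_master_carp only when something is printed.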

            • P Offline
              perrin @w0w
              last edited by

              @w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

              /usr/local/sbin/pfSctl -c 'interface reload wan'

              this is exactly what the script is doing. See GitHub.

              Please note that the IPv6 issue has nothing to do with my script; it seems to be related to the general if_pppoe troubles.

              • w0wW Offline
                w0w @perrin
                last edited by

                @perrin said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                this is exactly what the script is doing. See GitHub

                👍

                But there is something else, I don't know… safety timer, maybe. My script almost never fails to get the thing up and running, at least now. I will test it more and will give you more feedback this week, I hope.

                • P Offline
                  perrin @w0w
                  last edited by

                  @w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                  But there is something else, I don't know… safety timer, maybe.

                  That might be because your script runs every 30 seconds, so there is a random delay between the CARP state change and the time your script runs.
                  In my case the script runs immediately on the CARP change.

                  To test the behavior you could add some sleep in /usr/local/sbin/pppoe_ha_event (the shell script wrapper, not the php) prior to the exec line, e.g.:

                  #!/bin/sh
                  sleep 5
                  exec /usr/local/bin/php -q /usr/local/sbin/pppoe_ha_event.php "$@"
                  

                  let me know if that changes the behavior on your firewall

                  • w0wW Offline
                    w0w @perrin
                    last edited by w0w

                    @perrin said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                    In my case the script runs immediately with the CARP change.

                    And that’s a real problem. CARP can flip/flap on the interface several times per second, so you’ll end up running multiple commands and hit a race condition. You need to detect CARP state changes, but without this overreaction: check the state a bit later, and run your commands only once you detect that the final state has really changed. So I don’t think the sleep timer in the event wrapper can solve this problem; each event is just delayed and still ends up in the queue.

                    2025-09-21 14:18:27.478723+03:00 	check_reload_status 	656 	Configuring interface wan
                    2025-09-21 14:18:27.473171+03:00 	pppoe-ha 	37352 	VHID 5 MASTER - UP wan (pppoe0)
                    2025-09-21 14:18:27.472332+03:00 	pppoe-ha 	37352 	Handle CARP command for 5@lagg2 - MASTER
                    2025-09-21 14:18:27.451908+03:00 	php-fpm 	3406 	/rc.carpbackup: HA cluster member "(10.0.90.5@lagg2): (LAN)" has resumed CARP state "BACKUP" for vhid 10
                    2025-09-21 14:18:27.132013+03:00 	check_reload_status 	656 	Carp master event
                    2025-09-21 14:18:27.110669+03:00 	pppoe-ha 	28694 	no mappings for VHID 10; ignoring
                    2025-09-21 14:18:27.110296+03:00 	pppoe-ha 	28694 	Handle CARP command for 10@lagg2 - BACKUP
                    2025-09-21 14:18:27.073719+03:00 	php-fpm 	62482 	/rc.carpbackup: HA cluster member "(10.0.87.5@ixl3.87): (WIFIAP)" has resumed CARP state "BACKUP" for vhid 9
                    2025-09-21 14:18:27.073279+03:00 	php-fpm 	62482 	/rc.carpbackup: Suppressing repeat e-mail notification message.
                    2025-09-21 14:18:26.804464+03:00 	check_reload_status 	656 	Carp backup event
                    2025-09-21 14:18:26.779787+03:00 	pppoe-ha 	17333 	no mappings for VHID 10; ignoring
                    2025-09-21 14:18:26.779198+03:00 	pppoe-ha 	17333 	Handle CARP command for 10@lagg2 - INIT
                    2025-09-21 14:18:26.769846+03:00 	php-fpm 	3406 	/rc.carpbackup: HA cluster member "(10.0.87.5@ixl3.87): (WIFIAP)" has resumed CARP state "BACKUP" for vhid 9
                    2025-09-21 14:18:26.470904+03:00 	php-fpm 	51124 	/rc.carpbackup: HA cluster member "(10.0.100.155@lagg0): (WAN2)" has resumed CARP state "BACKUP" for vhid 7
                    2025-09-21 14:18:26.360247+03:00 	check_reload_status 	656 	Carp backup event
                    2025-09-21 14:18:26.332786+03:00 	pppoe-ha 	5832 	no mappings for VHID 9; ignoring
                    2025-09-21 14:18:26.332425+03:00 	pppoe-ha 	5832 	Handle CARP command for 9@ixl3.87 - BACKUP
                    2025-09-21 14:18:26.150680+03:00 	php-fpm 	3406 	/rc.carpbackup: HA cluster member "(10.0.100.155@lagg0): (WAN2)" has resumed CARP state "BACKUP" for vhid 7
                    2025-09-21 14:18:25.999747+03:00 	kernel 	- 	carp: 10@lagg2: BACKUP -> MASTER (preempting a slower master)
                    2025-09-21 14:18:25.999658+03:00 	kernel 	- 	carp: 7@lagg0: BACKUP -> MASTER (preempting a slower master)
                    2025-09-21 14:18:25.999518+03:00 	kernel 	- 	carp: 5@lagg2: BACKUP -> MASTER (preempting a slower master)
                    2025-09-21 14:18:25.968532+03:00 	check_reload_status 	656 	Carp backup event
                    2025-09-21 14:18:25.951088+03:00 	pppoe-ha 	99836 	no mappings for VHID 9; ignoring
                    2025-09-21 14:18:25.950701+03:00 	pppoe-ha 	99836 	Handle CARP command for 9@ixl3.87 - INIT
                    2025-09-21 14:18:25.857771+03:00 	php-fpm 	3406 	/rc.carpbackup: HA cluster member "(10.0.77.5@lagg2): (LAN)" has resumed CARP state "BACKUP" for vhid 5
                    2025-09-21 14:18:25.857471+03:00 	php-fpm 	3406 	/rc.carpbackup: Suppressing repeat e-mail notification message.
                    2025-09-21 14:18:25.701909+03:00 	php-fpm 	7887 	/rc.filter_synchronize: Beginning XMLRPC sync data to https://10.0.88.2:443/xmlrpc.php.
                    2025-09-21 14:18:25.701214+03:00 	php-fpm 	7887 	/rc.filter_synchronize: XMLRPC versioncheck: 24.1 -- 24.1
                    2025-09-21 14:18:25.701112+03:00 	php-fpm 	7887 	/rc.filter_synchronize: XMLRPC reload data success with https://10.0.88.2:443/xmlrpc.php (pfsense.host_firmware_version).
                    2025-09-21 14:18:25.667673+03:00 	check_reload_status 	656 	Carp backup event
                    2025-09-21 14:18:25.650236+03:00 	pppoe-ha 	93377 	no mappings for VHID 7; ignoring
                    2025-09-21 14:18:25.649873+03:00 	pppoe-ha 	93377 	Handle CARP command for 7@lagg0 - BACKUP
                    2025-09-21 14:18:25.545307+03:00 	php-fpm 	16208 	/rc.carpbackup: HA cluster member "(10.0.77.5@lagg2): (LAN)" has resumed CARP state "BACKUP" for vhid 5
                    2025-09-21 14:18:25.541200+03:00 	php-fpm 	7887 	/rc.filter_synchronize: Beginning XMLRPC sync data to https://10.0.88.2:443/xmlrpc.php.
                    2025-09-21 14:18:25.368345+03:00 	check_reload_status 	656 	Carp backup event
                    2025-09-21 14:18:25.351847+03:00 	pppoe-ha 	92360 	no mappings for VHID 7; ignoring
                    2025-09-21 14:18:25.351495+03:00 	pppoe-ha 	92360 	Handle CARP command for 7@lagg0 - INIT
                    2025-09-21 14:18:25.076109+03:00 	check_reload_status 	656 	Carp backup event
                    2025-09-21 14:18:25.046639+03:00 	pppoe-ha 	89037 	VHID 5 BACKUP - DOWN wan (pppoe0)
                    2025-09-21 14:18:25.045844+03:00 	pppoe-ha 	89037 	Handle CARP command for 5@lagg2 - BACKUP
                    2025-09-21 14:18:24.772146+03:00 	check_reload_status 	656 	Carp backup event
                    2025-09-21 14:18:24.743736+03:00 	pppoe-ha 	86518 	VHID 5 INIT - DOWN wan (pppoe0)
                    2025-09-21 14:18:24.742953+03:00 	pppoe-ha 	86518 	Handle CARP command for 5@lagg2 - INIT
                    2025-09-21 14:18:24.530618+03:00 	kernel 	- 	carp: 10@lagg2: INIT -> BACKUP (initialization complete)
                    2025-09-21 14:18:24.530574+03:00 	kernel 	- 	carp: 10@lagg2: BACKUP -> INIT (hardware interface up)
                    2025-09-21 14:18:24.530524+03:00 	kernel 	- 	carp: 9@ixl3.87: INIT -> BACKUP (initialization complete)
                    2025-09-21 14:18:24.530480+03:00 	kernel 	- 	carp: 9@ixl3.87: BACKUP -> INIT (hardware interface up)
                    2025-09-21 14:18:24.530434+03:00 	kernel 	- 	carp: 7@lagg0: INIT -> BACKUP (initialization complete)
                    2025-09-21 14:18:24.530364+03:00 	kernel 	- 	carp: 7@lagg0: BACKUP -> INIT (hardware interface up)
                    2025-09-21 14:18:24.530296+03:00 	kernel 	- 	carp: 5@lagg2: INIT -> BACKUP (initialization complete)
                    2025-09-21 14:18:24.530178+03:00 	kernel 	- 	carp: 5@lagg2: BACKUP -> INIT (hardware interface up)
                    2025-09-21 14:18:24.463858+03:00 	check_reload_status 	656 	Carp backup event
                    2025-09-21 14:18:24.450717+03:00 	check_reload_status 	656 	Syncing firewall
                    2025-09-21 14:18:24.322569+03:00 	php-fpm 	51124 	/status_carp.php: Configuration Change: admin@10.0.77.3 (Local Database): Leave CARP maintenance mode
                    2025-09-21 14:17:52.159134+03:00 	pppoe-ha 	89953 	VHID 5 BACKUP - DOWN wan (pppoe0)
                    2025-09-21 14:17:52.121279+03:00 	pppoe-ha 	89953 	Reconcile: evaluating 1 mapping(s)
                    2025-09-21 14:12:34.906579+03:00 	php-fpm 	18188 	/rc.filter_synchronize: XMLRPC reload data success with https://10.0.88.2:443/xmlrpc.php (pfsense.restore_config_section).
                    2025-09-21 14:12:31.710678+03:00 	php-fpm 	18188 	/rc.filter_synchronize: Beginning XMLRPC sync data to https://10.0.88.2:443/xmlrpc.php.
                    2025-09-21 14:12:31.710120+03:00 	php-fpm 	18188 	/rc.filter_synchronize: XMLRPC versioncheck: 24.1 -- 24.1
                    2025-09-21 14:12:31.710030+03:00 	php-fpm 	18188 	/rc.filter_synchronize: XMLRPC reload data success with https://10.0.88.2:443/xmlrpc.php (pfsense.host_firmware_version).
                    2025-09-21 14:12:31.559469+03:00 	php-fpm 	18188 	/rc.filter_synchronize: Beginning XMLRPC sync data to https://10.0.88.2:443/xmlrpc.php.
                    2025-09-21 14:12:30.477082+03:00 	check_reload_status 	656 	Syncing firewall 
                    
                    

                    Edit:

                    I think the following logic would work for me and for anyone else:

                    1. On the first CARP event for a given VHID@iface, start a Collect window of 5-10 s.
                       • During the Collect window, just record each new role (MASTER/BACKUP) and always keep only the latest.
                    2. After Collect ends, start a Silence window of 5-10 s.
                       • If any new event arrives in this window, restart: go back to step 1 (a new 5-10 s Collect window).
                       • If no events arrive for the whole 5-10 s, consider the state settled.
                    3. Act once on the last recorded role:
                       • If last = MASTER → bring PPPoE up (only if not already up).
                       • If last = BACKUP → bring PPPoE down (only if not already down).
                    4. Record the applied role to avoid repeating the same action later.

                    Safety add-ons (still simple):
                       • Boot grace: skip everything for the first ~150 s after boot.
                       • Demotion guard: if net.inet.carp.demotion > 0, postpone action and re-check later.
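
The Collect/Silence scheme above can be sketched as an offline replay over a list of events. This is an illustration under stated assumptions, not a finished daemon: `settle_events` is a hypothetical function, events arrive on stdin as "seconds role" lines in ascending order, both windows have a fixed length passed as arguments, and the boot grace and demotion guard are left out.

```shell
#!/bin/sh
# settle_events COLLECT SILENCE
# Reads "seconds role" lines from stdin and prints one "settle_time role"
# line for each settled window whose role differs from the last applied one.
settle_events() {
    collect=$1
    silence=$2
    start=""      # start of the current Collect window (empty = none open)
    latest=""     # latest role recorded in the current window
    applied=""    # role that was acted on most recently
    while read -r ts role; do
        [ -n "$ts" ] || continue
        if [ -z "$start" ]; then
            start=$ts                       # first event opens a Collect window
        elif [ "$ts" -lt $((start + collect)) ]; then
            :                               # still collecting: just record below
        elif [ "$ts" -lt $((start + collect + silence)) ]; then
            start=$ts                       # event during Silence: restart Collect
        else
            # The previous window ended quietly, so its last role is settled.
            if [ "$latest" != "$applied" ]; then
                echo "$((start + collect + silence)) $latest"
                applied=$latest
            fi
            start=$ts                       # this event opens a new window
        fi
        latest=$role
    done
    # No more events: the open window settles after Collect + Silence.
    if [ -n "$start" ] && [ "$latest" != "$applied" ]; then
        echo "$((start + collect + silence)) $latest"
    fi
}
```

Fed the three VHID 5 transitions from the log above as `24 INIT`, `25 BACKUP`, `27 MASTER` with 5-second windows, the replay prints a single line, `34 MASTER`, so PPPoE would be brought up once instead of being toggled three times.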

                    • P Offline
                      perrin @w0w
                      last edited by

                      @w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                      And that’s a real problem. CARP can flip/flap on the interface several times per second

                      I don't think this should happen. Normally CARP should be very stable in a working environment. On all the firewalls I manage, CARP interfaces never flap without a reason, and the only reason is a failure of some network device in between the firewalls or of one of the firewalls itself.

                      Judging from the log you sent, with an event every second, there seems to be something wrong with your config. On my firewalls I don't see a single CARP event in days or weeks.

                      • w0wW Offline
                        w0w @perrin
                        last edited by w0w

                        @perrin
                        This is just switching on maintenance mode on the primary, nothing unusual.

                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.