Netgate Discussion Forum

    new if_pppoe Backend - getting HA/CARP to work like in MPD

      w0w @perrin

      @perrin
      If the PPPoE interface is already up and the selected VIP is MASTER, this is what I see in the logs when pressing "run reconcile now":

      2025-09-19 19:04:02.698352+03:00 	rc.gateway_alarm 	94407 	>>> Gateway alarm: WAN_PPPOE (Addr:212xxxx Alarm:1 RTT:.537ms RTTsd:.089ms Loss:22%)
      2025-09-19 19:03:53.092946+03:00 	kernel 	- 	pppoe0: link state changed to UP
      2025-09-19 19:03:51.202925+03:00 	kernel 	- 	Limiting ICMPv6 destination unreachable output from 116 to 103 packets/sec
      2025-09-19 19:03:50.152914+03:00 	kernel 	- 	Limiting ICMPv6 destination unreachable output from 111 to 98 packets/sec
      2025-09-19 19:03:49.102945+03:00 	kernel 	- 	Limiting ICMPv6 destination unreachable output from 106 to 100 packets/sec
      2025-09-19 19:03:48.052233+03:00 	kernel 	- 	pppoe0: link state changed to DOWN
      2025-09-19 19:03:47.887051+03:00 	php-fpm 	4863 	/rc.interfaces_wan_configure: calling interface_dhcpv6_configure.
      2025-09-19 19:03:47.846816+03:00 	kernel 	- 	pppoe0: link state changed to DOWN
      2025-09-19 19:03:47.842883+03:00 	kernel 	- 	if_pppoe: pppoe0: failed to clear IP address: 49
      2025-09-19 19:03:46.631320+03:00 	check_reload_status 	680 	Configuring interface wan
      2025-09-19 19:03:46.626441+03:00 	pppoe-ha 	84394 	VHID 5 MASTER - UP wan (pppoe0)
      2025-09-19 19:03:46.600614+03:00 	pppoe-ha 	84394 	Reconcile: evaluating 1 mapping(s) 
      

      I am not sure whether it is really necessary to break the connection at all. This ended up with
      [image: b2e798d2-ea0e-4fa7-8e69-818228197050-image.png] for both IPv4 and IPv6.

      Overall, I can’t say it’s stable for me, and I’m not sure why. I also need to fix another bug: it looks like something is preventing the VIPs from starting when the firewall boots. That’s why I used

       $PHP_BIN -r 'require_once "/etc/inc/interfaces.inc"; interfaces_vips_configure();'
      
        perrin @w0w

        @w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

        I am not sure whether it is really necessary to break the connection at all.

        What happens on a reconcile is that all interfaces are forced into the state they should be in. For the UP command the script currently does an 'interface reload' (/usr/local/sbin/pfSctl -c 'interface reload <interface>'), whereas for the DOWN command it does an ifconfig <interface> down.

        My tests showed that using 'ifconfig <interface> up' as the UP command would not reliably connect the PPPoE interface in every case. That is why I opted to use the reload command. The downside is that it does in fact break the connection. To fix this behaviour in reconcile, I could add a check to see whether the interface is already connected before reloading it. I might add that.
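A minimal sketch of the state-to-command mapping described above, assuming (from the log lines "VHID 5 MASTER - UP", "VHID 5 BACKUP - DOWN" and "VHID 5 INIT - DOWN") that MASTER maps to the UP command and BACKUP/INIT map to the DOWN command. The function name command_for_state is hypothetical and is not taken from the package; the sketch only prints the command rather than running it.

```sh
#!/bin/sh
# Hypothetical helper: print (do not run) the command for a CARP state.
# $1 = CARP state, $2 = pfSense interface name (e.g. wan), $3 = device (e.g. pppoe0)
command_for_state() {
    case "$1" in
        MASTER)
            # UP command: a full interface reload, which may drop a live session
            printf "/usr/local/sbin/pfSctl -c 'interface reload %s'\n" "$2"
            ;;
        BACKUP|INIT)
            # DOWN command: just take the device down
            printf 'ifconfig %s down\n' "$3"
            ;;
        *)
            # Unknown state: signal an error and print nothing
            return 1
            ;;
    esac
}
```

For example, command_for_state MASTER wan pppoe0 prints the pfSctl reload line for wan, while command_for_state BACKUP wan pppoe0 prints the ifconfig down line for pppoe0.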

        @w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

        Overall, I can’t say it’s stable for me, and I’m not sure why. I also need to fix another bug: it looks like something is preventing the VIPs from starting when the firewall boots.

        Great to hear that it is stable! Regarding the VIPs not coming up: I have no idea where this is coming from; it probably has nothing to do with my script. On both of my firewalls the VIPs come up as expected. But I am only using VIPs as part of CARP.

          w0w @perrin

          @perrin said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

          Great to hear that it is stable!

          Unfortunately I have to clarify: no, your variant does not work stably on my systems. The bug described above, on an already running system with PPPoE up, seems to start as soon as your script initializes, and the link falls into packet loss; I don’t even need to press the button. Also, please look at the “Enable” checkbox and try adding several interfaces. For me it sometimes disappears entirely and sometimes moves around.

            perrin @w0w

            @w0w I created a new version 0.1.2 of the package, which now checks whether a PPPoE interface is up (IP address present). If it is, and the desired state of that interface is also up, the script will not reload the interface, so it should not break any legitimate connection.
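The "already up, so skip the reload" check could look roughly like the sketch below. It is only an illustration in shell: the package's own check is not shown in this thread, and the helper names has_ipv4 and should_reload are hypothetical. It assumes "up" means that ifconfig reports an inet (IPv4) address on the device.

```sh
#!/bin/sh
# Hypothetical helper: succeeds if the ifconfig output on stdin contains an
# IPv4 address line ("inet ..."); "inet6" lines do not match.
has_ipv4() {
    grep -q '^[[:space:]]*inet '
}

# Hypothetical helper: succeeds only when a reload is warranted, i.e. the
# desired state is "up" and the device has no IPv4 address yet.
# $1 = desired state (up|down), $2 = device name (e.g. pppoe0)
should_reload() {
    [ "$1" = "up" ] || return 1
    if ifconfig "$2" 2>/dev/null | has_ipv4; then
        return 1    # already connected: leave the session alone
    fi
    return 0
}
```

A caller would then run the reload only if should_reload up pppoe0 succeeds. Note that this sketch looks at IPv4 only, so an interface that has IPv4 but is still missing IPv6 would be treated as up.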

            Regarding your issue with the GUI: I can't confirm the bug. I can add as many interfaces as I want and the GUI stays consistent. I am using the pfSense rowhelper in the GUI, so that is more or less standard functionality. Can you give me some more details on when it fails? Also, which browser are you using, and do you have any plugins that mangle a page's HTML (e.g. an ad blocker)?

              w0w @perrin

              @perrin said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

              regarding your issue with the GUI: I can't confirm the bug.

              I can't replicate it on the new version. Will test its functionality soon, thanks!

                w0w

                Tested, no luck. WAN stays Pending and only acquires IPv4; IPv6 never comes up. It seems you can’t just use ifconfig pppoe0 up; you need to run something like:
                /usr/local/sbin/pfSctl -c 'interface reload wan'
                to bring it up correctly.

                Here’s my script’s logic; see "Bringing PPPoE up" and "Verifying PPPoE" below.

                • Purpose & scope

                  • A CARP-aware PPPoE watchdog for pfSense: it tracks the node’s CARP role (MASTER/BACKUP) and reacts by starting/validating or stopping PPPoE, and (on BACKUP) restoring missing VIPs (VHID 5).
                  • All syslog lines use a uniform prefix PPPoE_script_message00X: with ascending IDs.
                • Tunables / constants

                  • LOCKFILE=/var/run/run.sh.lock — stores PID (line 1) and last known CARP role (line 2).
                  • PPPOE_IF=pppoe0, LAN_IF=lagg2.
                  • Role detection is by grepping ifconfig ${LAN_IF} for MASTER vhid 5; VIP presence check greps for vhid 5.
                  • PHP_BIN=/usr/local/bin/php.
                  • Internal flag PPPOE_ALREADY_STARTED tracks whether the script has (re)started PPPoE in this run.
                • Singleton guard

                  • On start, check_already_running() reads the lockfile; if the recorded PID is alive (ps -p), the script exits to avoid multiple instances.
                • Optional discovery

                  • find_pppoe_info() grabs the first pppoeN interface and its IPv4 address (kept for parity with older versions; not strictly required elsewhere).
                • Main loop (role monitor)

                  • start_monitoring():

                    • Logs launch (001) and initializes CUR_ROLE from the lockfile if present.

                    • Every 30 seconds:

                      • Derives NEW_ROLE from ifconfig ${LAN_IF} (MASTER vhid 5 → MASTER; otherwise BACKUP).

                      • If role unchanged → continue silently (no log spam).

                      • If role changed:

                        • On MASTER: call handle_master_carp(); log 002.
                        • On BACKUP: call handle_non_master_carp(); log 003.
                      • Update lockfile with current PID and the new role.

                • MASTER path (handle_master_carp)

                  • If PPPoE hasn’t been (re)started in this run, call handle_pppoe_start().
                  • Otherwise, log/verify link (004) via check_pppoe().
                • BACKUP path (handle_non_master_carp)

                  1. Shut PPPoE down if any pppoeN exists:

                    • Wait 10s, ifconfig ${PPPOE_IF} down, set PPPOE_ALREADY_STARTED=false, log 005.
                  2. Ensure VIPs (VHID 5) exist on LAN_IF:

                    • If vhid 5 is missing, log 006 and re-install VIPs by running:

                      • php -r 'require_once "/etc/inc/interfaces.inc"; interfaces_vips_configure();'
                • Bringing PPPoE up (handle_pppoe_start)

                  • Wait 130s to let CARP converge.

                  • If no pppoeN exists:

                    • Log 007, run pfSctl -c 'interface reload wan', set PPPOE_ALREADY_STARTED=true.
                  • If pppoe0 exists and is UP:

                    • Log 008, do nothing.
                  • If it exists but is not UP:

                    • ifconfig ${PPPOE_IF} up, log 009.
                • Verifying PPPoE (check_pppoe)

                  • Wait 180s (grace period).

                  • If no pppoeN is present:

                    • Log 010, try pfSctl -c 'interface reload wan'.
                    • On success: set PPPOE_ALREADY_STARTED=true; on failure log 011 and return error.
                • Logging policy

                  • Routine role polls are quiet; logs emit only when the CARP role flips, plus the specific action logs (004–011) triggered by that transition.
                • Entry points

                  • start: runs singleton check, optional discovery, then the monitoring loop.
                  • stop: placeholder (prints a message; no teardown).
                  • Otherwise: prints usage and exits non-zero.
                • Operational timings (summary)

                  • Poll interval: 30s.
                  • MASTER bring-up grace: 130s (CARP settle).
                  • PPPoE verification grace: 180s.
                  • BACKUP PPPoE down delay: 10s.
                • Key side effects

                  • Keeps a persistent record of PID + last role in the lockfile.
                  • Ensures PPPoE is up and stable on MASTER, down on BACKUP.
                  • Auto-repairs missing VIPs (VHID 5) on BACKUP via pfSense PHP API.
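The role detection and the "act only on change" part of the loop described above can be sketched as follows. This is not the actual script: the function names carp_role and poll_once are hypothetical, and the trailing space in the grep pattern is an added assumption so that "vhid 5" does not also match "vhid 50".

```sh
#!/bin/sh
# Derive the CARP role for one VHID from ifconfig output on stdin.
# Anything other than "MASTER vhid N" counts as BACKUP, as in the description.
# $1 = VHID
carp_role() {
    if grep -q "MASTER vhid $1 "; then
        echo MASTER
    else
        echo BACKUP
    fi
}

# One poll step: compare the new role with the one stored on line 2 of the
# lockfile, report a transition if there is one, then rewrite the lockfile
# with the current PID (line 1) and the role (line 2).
# $1 = lockfile path, $2 = newly detected role
poll_once() {
    old=$(sed -n '2p' "$1" 2>/dev/null)
    if [ "$old" != "$2" ]; then
        echo "role changed: ${old:-unknown} -> $2"
    fi
    printf '%s\n%s\n' "$$" "$2" > "$1"
}

# Intended use (not executed here), with the tunables from the description:
#   while :; do
#       poll_once "$LOCKFILE" "$(ifconfig "$LAN_IF" | carp_role 5)"
#       sleep 30
#   done
```

In a real script the echo in poll_once would be replaced by the MASTER or BACKUP handler; it is kept as plain output here so the transition logic is easy to check in isolation.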

                  perrin @w0w

                  @w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                  /usr/local/sbin/pfSctl -c 'interface reload wan'

                  This is exactly what the script is doing. See GitHub.

                  Please note that the IPv6 issues have nothing to do with my script; they seem to be related to the general if_pppoe troubles.

                    w0w @perrin

                    @perrin said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                    this is exactly what the script is doing. See GitHub

                    👍

                    But there is something else, I don't know… safety timer, maybe. My script almost never fails to get the thing up and running, at least now. I will test it more and will give you more feedback this week, I hope.

                      perrin @w0w

                      @w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                      But there is something else, I don't know… safety timer, maybe.

                      That might be because your script runs every 30 seconds, so there is some random delay between the CARP state change and the time your script runs.
                      In my case the script runs immediately with the CARP change.

                      To test the behavior you could add some sleep in /usr/local/sbin/pppoe_ha_event (the shell script wrapper, not the php) prior to the exec line, e.g.:

                      #!/bin/sh
                      sleep 5
                      exec /usr/local/bin/php -q /usr/local/sbin/pppoe_ha_event.php "$@"
                      

                      Let me know if that changes the behavior on your firewall.

                        w0w @perrin

                        @perrin said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                        In my case the script runs immediately with the CARP change.

                        And that’s a real problem. CARP can flip or flap on the interface several times per second, so you’ll end up running multiple commands and hit a race condition. You need to detect CARP state changes, but you don’t need to react to every one of them: check the state a bit later, and run your commands only once you detect that the final state has really changed. So I don’t think the sleep timer in the event wrapper can solve this problem; the events just queue up.

                        2025-09-21 14:18:27.478723+03:00 	check_reload_status 	656 	Configuring interface wan
                        2025-09-21 14:18:27.473171+03:00 	pppoe-ha 	37352 	VHID 5 MASTER - UP wan (pppoe0)
                        2025-09-21 14:18:27.472332+03:00 	pppoe-ha 	37352 	Handle CARP command for 5@lagg2 - MASTER
                        2025-09-21 14:18:27.451908+03:00 	php-fpm 	3406 	/rc.carpbackup: HA cluster member "(10.0.90.5@lagg2): (LAN)" has resumed CARP state "BACKUP" for vhid 10
                        2025-09-21 14:18:27.132013+03:00 	check_reload_status 	656 	Carp master event
                        2025-09-21 14:18:27.110669+03:00 	pppoe-ha 	28694 	no mappings for VHID 10; ignoring
                        2025-09-21 14:18:27.110296+03:00 	pppoe-ha 	28694 	Handle CARP command for 10@lagg2 - BACKUP
                        2025-09-21 14:18:27.073719+03:00 	php-fpm 	62482 	/rc.carpbackup: HA cluster member "(10.0.87.5@ixl3.87): (WIFIAP)" has resumed CARP state "BACKUP" for vhid 9
                        2025-09-21 14:18:27.073279+03:00 	php-fpm 	62482 	/rc.carpbackup: Suppressing repeat e-mail notification message.
                        2025-09-21 14:18:26.804464+03:00 	check_reload_status 	656 	Carp backup event
                        2025-09-21 14:18:26.779787+03:00 	pppoe-ha 	17333 	no mappings for VHID 10; ignoring
                        2025-09-21 14:18:26.779198+03:00 	pppoe-ha 	17333 	Handle CARP command for 10@lagg2 - INIT
                        2025-09-21 14:18:26.769846+03:00 	php-fpm 	3406 	/rc.carpbackup: HA cluster member "(10.0.87.5@ixl3.87): (WIFIAP)" has resumed CARP state "BACKUP" for vhid 9
                        2025-09-21 14:18:26.470904+03:00 	php-fpm 	51124 	/rc.carpbackup: HA cluster member "(10.0.100.155@lagg0): (WAN2)" has resumed CARP state "BACKUP" for vhid 7
                        2025-09-21 14:18:26.360247+03:00 	check_reload_status 	656 	Carp backup event
                        2025-09-21 14:18:26.332786+03:00 	pppoe-ha 	5832 	no mappings for VHID 9; ignoring
                        2025-09-21 14:18:26.332425+03:00 	pppoe-ha 	5832 	Handle CARP command for 9@ixl3.87 - BACKUP
                        2025-09-21 14:18:26.150680+03:00 	php-fpm 	3406 	/rc.carpbackup: HA cluster member "(10.0.100.155@lagg0): (WAN2)" has resumed CARP state "BACKUP" for vhid 7
                        2025-09-21 14:18:25.999747+03:00 	kernel 	- 	carp: 10@lagg2: BACKUP -> MASTER (preempting a slower master)
                        2025-09-21 14:18:25.999658+03:00 	kernel 	- 	carp: 7@lagg0: BACKUP -> MASTER (preempting a slower master)
                        2025-09-21 14:18:25.999518+03:00 	kernel 	- 	carp: 5@lagg2: BACKUP -> MASTER (preempting a slower master)
                        2025-09-21 14:18:25.968532+03:00 	check_reload_status 	656 	Carp backup event
                        2025-09-21 14:18:25.951088+03:00 	pppoe-ha 	99836 	no mappings for VHID 9; ignoring
                        2025-09-21 14:18:25.950701+03:00 	pppoe-ha 	99836 	Handle CARP command for 9@ixl3.87 - INIT
                        2025-09-21 14:18:25.857771+03:00 	php-fpm 	3406 	/rc.carpbackup: HA cluster member "(10.0.77.5@lagg2): (LAN)" has resumed CARP state "BACKUP" for vhid 5
                        2025-09-21 14:18:25.857471+03:00 	php-fpm 	3406 	/rc.carpbackup: Suppressing repeat e-mail notification message.
                        2025-09-21 14:18:25.701909+03:00 	php-fpm 	7887 	/rc.filter_synchronize: Beginning XMLRPC sync data to https://10.0.88.2:443/xmlrpc.php.
                        2025-09-21 14:18:25.701214+03:00 	php-fpm 	7887 	/rc.filter_synchronize: XMLRPC versioncheck: 24.1 -- 24.1
                        2025-09-21 14:18:25.701112+03:00 	php-fpm 	7887 	/rc.filter_synchronize: XMLRPC reload data success with https://10.0.88.2:443/xmlrpc.php (pfsense.host_firmware_version).
                        2025-09-21 14:18:25.667673+03:00 	check_reload_status 	656 	Carp backup event
                        2025-09-21 14:18:25.650236+03:00 	pppoe-ha 	93377 	no mappings for VHID 7; ignoring
                        2025-09-21 14:18:25.649873+03:00 	pppoe-ha 	93377 	Handle CARP command for 7@lagg0 - BACKUP
                        2025-09-21 14:18:25.545307+03:00 	php-fpm 	16208 	/rc.carpbackup: HA cluster member "(10.0.77.5@lagg2): (LAN)" has resumed CARP state "BACKUP" for vhid 5
                        2025-09-21 14:18:25.541200+03:00 	php-fpm 	7887 	/rc.filter_synchronize: Beginning XMLRPC sync data to https://10.0.88.2:443/xmlrpc.php.
                        2025-09-21 14:18:25.368345+03:00 	check_reload_status 	656 	Carp backup event
                        2025-09-21 14:18:25.351847+03:00 	pppoe-ha 	92360 	no mappings for VHID 7; ignoring
                        2025-09-21 14:18:25.351495+03:00 	pppoe-ha 	92360 	Handle CARP command for 7@lagg0 - INIT
                        2025-09-21 14:18:25.076109+03:00 	check_reload_status 	656 	Carp backup event
                        2025-09-21 14:18:25.046639+03:00 	pppoe-ha 	89037 	VHID 5 BACKUP - DOWN wan (pppoe0)
                        2025-09-21 14:18:25.045844+03:00 	pppoe-ha 	89037 	Handle CARP command for 5@lagg2 - BACKUP
                        2025-09-21 14:18:24.772146+03:00 	check_reload_status 	656 	Carp backup event
                        2025-09-21 14:18:24.743736+03:00 	pppoe-ha 	86518 	VHID 5 INIT - DOWN wan (pppoe0)
                        2025-09-21 14:18:24.742953+03:00 	pppoe-ha 	86518 	Handle CARP command for 5@lagg2 - INIT
                        2025-09-21 14:18:24.530618+03:00 	kernel 	- 	carp: 10@lagg2: INIT -> BACKUP (initialization complete)
                        2025-09-21 14:18:24.530574+03:00 	kernel 	- 	carp: 10@lagg2: BACKUP -> INIT (hardware interface up)
                        2025-09-21 14:18:24.530524+03:00 	kernel 	- 	carp: 9@ixl3.87: INIT -> BACKUP (initialization complete)
                        2025-09-21 14:18:24.530480+03:00 	kernel 	- 	carp: 9@ixl3.87: BACKUP -> INIT (hardware interface up)
                        2025-09-21 14:18:24.530434+03:00 	kernel 	- 	carp: 7@lagg0: INIT -> BACKUP (initialization complete)
                        2025-09-21 14:18:24.530364+03:00 	kernel 	- 	carp: 7@lagg0: BACKUP -> INIT (hardware interface up)
                        2025-09-21 14:18:24.530296+03:00 	kernel 	- 	carp: 5@lagg2: INIT -> BACKUP (initialization complete)
                        2025-09-21 14:18:24.530178+03:00 	kernel 	- 	carp: 5@lagg2: BACKUP -> INIT (hardware interface up)
                        2025-09-21 14:18:24.463858+03:00 	check_reload_status 	656 	Carp backup event
                        2025-09-21 14:18:24.450717+03:00 	check_reload_status 	656 	Syncing firewall
                        2025-09-21 14:18:24.322569+03:00 	php-fpm 	51124 	/status_carp.php: Configuration Change: admin@10.0.77.3 (Local Database): Leave CARP maintenance mode
                        2025-09-21 14:17:52.159134+03:00 	pppoe-ha 	89953 	VHID 5 BACKUP - DOWN wan (pppoe0)
                        2025-09-21 14:17:52.121279+03:00 	pppoe-ha 	89953 	Reconcile: evaluating 1 mapping(s)
                        2025-09-21 14:12:34.906579+03:00 	php-fpm 	18188 	/rc.filter_synchronize: XMLRPC reload data success with https://10.0.88.2:443/xmlrpc.php (pfsense.restore_config_section).
                        2025-09-21 14:12:31.710678+03:00 	php-fpm 	18188 	/rc.filter_synchronize: Beginning XMLRPC sync data to https://10.0.88.2:443/xmlrpc.php.
                        2025-09-21 14:12:31.710120+03:00 	php-fpm 	18188 	/rc.filter_synchronize: XMLRPC versioncheck: 24.1 -- 24.1
                        2025-09-21 14:12:31.710030+03:00 	php-fpm 	18188 	/rc.filter_synchronize: XMLRPC reload data success with https://10.0.88.2:443/xmlrpc.php (pfsense.host_firmware_version).
                        2025-09-21 14:12:31.559469+03:00 	php-fpm 	18188 	/rc.filter_synchronize: Beginning XMLRPC sync data to https://10.0.88.2:443/xmlrpc.php.
                        2025-09-21 14:12:30.477082+03:00 	check_reload_status 	656 	Syncing firewall 
                        
                        

                        Edit:

                        I think the following logic would work well for me and anyone else:

                        1. On the first CARP event for (VHID@iface), start a Collect window of 5-10 s.
                        2. During the Collect window, just record each new role (MASTER/BACKUP). Always keep only the latest.
                        3. After Collect ends, start a Silence window of 5-10 s.
                          • If any new event arrives in this window, restart: go back to step 1 (new Collect window of 5-10 s).
                          • If no events arrive for the whole Silence window, we consider the state settled.
                        4. Act once on the last recorded role:
                          • If last = MASTER → bring PPPoE up (only if not already up).
                          • If last = BACKUP → bring PPPoE down (only if not already down).
                        5. Record the applied role to avoid repeating the same action later.

                        Safety add-ons (still simple):
                          • Boot grace: skip everything for the first ~150 s after boot.
                          • Demotion guard: if net.inet.carp.demotion > 0, postpone action and re-check later.
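The Collect/Silence idea can be tried offline against recorded events with a sketch like the one below. It does not touch any interface; it only reads "<epoch-seconds> <role>" lines and prints when, and with which role, the state would be considered settled. The function name debounce and the input format are assumptions made for this illustration, and the boot grace and demotion guard are left out.

```sh
#!/bin/sh
# Reads "<epoch-seconds> <role>" lines (sorted by time) on stdin and prints
# "<settle-time> <role>" each time the state counts as settled.
# $1 = Collect window in seconds, $2 = Silence window in seconds
debounce() {
    collect=$1
    silence=$2
    collect_end=''   # end of the current Collect window; empty means idle
    last=''          # latest role recorded in the current episode
    while read -r ts role; do
        # The previous episode stayed quiet for a full Silence window: settled.
        if [ -n "$collect_end" ] && [ "$ts" -ge $((collect_end + silence)) ]; then
            echo "$((collect_end + silence)) $last"
            collect_end=''
        fi
        # First event, or an event inside the Silence window: (re)start Collect.
        if [ -z "$collect_end" ] || [ "$ts" -ge "$collect_end" ]; then
            collect_end=$((ts + collect))
        fi
        last=$role   # always keep only the latest role
    done
    # End of input: the last episode settles once its Silence window has passed.
    if [ -n "$collect_end" ]; then
        echo "$((collect_end + silence)) $last"
    fi
}
```

Fed with a burst like the INIT, BACKUP, MASTER sequence for VHID 5 in the log above, all falling within a few seconds, this yields a single settled MASTER result instead of three separate reactions. A live version would additionally compare the settled role with the last applied role before acting.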

                          perrin @w0w

                          @w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                          And that’s a real problem. CARP can flip or flap on the interface several times per second

                          I don't think this should happen. Normally CARP should be very stable in a working environment. On all the firewalls I manage, CARP interfaces never flap without a reason, and the only reasons are a failure of some network device in between the firewalls or of one of the firewalls itself.

                          From the log you sent, with an event every second, there seems to be something wrong with your config. On my firewalls I don't see a single CARP event in days or weeks.

                            w0w @perrin

                            @perrin
                            This is just switching on maintenance mode on the primary, nothing unusual.

                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.