Netgate Discussion Forum

    new if_pppoe Backend - getting HA/CARP to work like in MPD

    Development
    • w0w @perrin

      @perrin
      How does your HA pair react if you put the master node into maintenance mode via Status → CARP (failover) → Enter Persistent CARP Maintenance Mode?

      • perrin @w0w

        @w0w Enabling maintenance mode on the master raises its advertisement skew, which makes it transition from MASTER to BACKUP. pppoe-ha picks up the BACKUP state and disables the interface accordingly.
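
A rough sketch of why raising the skew demotes the master, assuming the usual CARP rule that the node advertising most frequently wins, with an advertisement interval of advbase + advskew/256 seconds. The node names, the secondary's skew of 100, and the maintenance skew of 254 are illustrative assumptions, not values taken from this thread:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CarpNode:
    name: str          # hypothetical node label
    advbase: int = 1   # base advertisement interval in seconds
    advskew: int = 0   # skew, 0..254; higher means "less preferred"

    def advert_interval(self) -> float:
        # CARP advertises every advbase + advskew/256 seconds.
        return self.advbase + self.advskew / 256


def elect_master(nodes: Iterable[CarpNode]) -> CarpNode:
    # The node with the shortest advertisement interval becomes MASTER.
    return min(nodes, key=lambda n: n.advert_interval())


MAINTENANCE_SKEW = 254  # assumption: maintenance mode pushes the skew to the maximum

primary = CarpNode("fw1", advskew=0)
secondary = CarpNode("fw2", advskew=100)

before = elect_master([primary, secondary]).name
primary.advskew = MAINTENANCE_SKEW  # simulate entering maintenance mode on fw1
after = elect_master([primary, secondary]).name
```

With these assumed values, `before` names the primary and `after` names the secondary, which is the MASTER to BACKUP transition that pppoe-ha then reacts to.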

        Since I don't have a problem moving the PPPoE session, in my case the failover works as expected.

        Maybe @crl should try that and see

        a) if if_pppoe correctly closes the session on the master prior to disabling the interface and
        b) if his backup can correctly establish a new PPPoE session

        • crl @perrin

          Please check this workaround:
          Github Issue - ISP side 'Too many sessions' keeping backup pfsense's WAN down

          It solves only one use case:
          - OK: entering and leaving CARP maintenance mode when triggered manually.

          - Solution requested: if a WAN cable is pulled (between the WAN switch and either of the pfSense devices) or if the pfSense machine is down, perform the MASTER --> BACKUP transition and connect PPPoE on the BACKUP. Should the MASTER come back, it should take back the MASTER role and reconnect PPPoE on the MASTER.

          • crl @crl

            I tried to summarize what is going on during the switchover experiments. This is one example.

            2a61333b-245d-4e7b-8640-dfe047400ef5-image.png

            • w0w @crl

              @crl
              This 2:20 looks familiar to me...
              @crl, @perrin do you both have dual stack pppoe?

              • perrin @w0w

                @w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                @crl, @perrin do you both have dual stack pppoe?

                In my case yes, dual stack with IPv4 and IPv6.

                @crl said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                I tried to summarize what is going on during the switchover experiments. This is one example.

                2a61333b-245d-4e7b-8640-dfe047400ef5-image.png

                Some of these issues might be related to the configuration and/or the default behavior of pfSense (e.g. when PPPoE fails and you're expecting a CARP switch).
                Do these things work as expected when you are using the old time-based scripts?

                • w0w @perrin

                  @perrin

                  Yes, in my setup things work somewhat differently, as you noticed. There are at least a few reasons. Most importantly, every time PPPoE comes up, the VIPs get reconfigured and CARP reinitializes. I suspect this behavior is related to IPv6 and the fact that the LAN uses the Track Interface option to obtain its IPv6 address, but I’m not certain. I’m currently trying to track down the root cause—or perhaps it’s an “incompatible” configuration.

                  How does this behave on your side? As I understand it, bringing up PPPoE does not trigger VIP reconfiguration/CARP initialization for you, right?

                  • perrin @w0w

                    @w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                    @perrin

                    How does this behave on your side? As I understand it, bringing up PPPoE does not trigger VIP reconfiguration/CARP initialization for you, right?

                    No, with my config no VIP reconfiguration takes place when PPPoE comes up. In my case PPPoE runs in a VLAN on the provider side and I've added the CARP VIP on the "physical" interface, i.e. without a VLAN tag. A CARP transition is only triggered when a firewall or the interface goes down, which in my case is exactly what I expect.

                    In my case I am running two Proxmox hosts each running a virtual pfSense, one being master one being slave.
                    The most common reason I need failover to happen is when we are rebooting one of the Proxmox hosts due to software upgrades. In this case the master pfSense would be shut down cleanly and the slave takes over all interfaces with the PPPoE being one of them.

                    • w0w @perrin

                      @perrin said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                      In my case I am running two Proxmox hosts each running a virtual pfSense, one being master one being slave.

                      I am running the same configuration. Looks like I have found something related to this VIP reconfiguration issue. I will do some tests and report back if I find anything else.

                      • w0w

                        I've experimented a lot with the code; here is what I did to make it work with the “buggy” config: pppoe_ha_event.php.

                        The biggest difference is that we shouldn’t run pfSctl -c 'interface reload <friendly>' (e.g., wan) if the PPPoE interface already exists. We only do that if, for some reason, the interface doesn’t exist. The shell script does the same, by the way.
                        Changes:

                        • MASTER bring-up path updated: on MASTER we now first try ifconfig <real pppoeX> up if the PPPoE interface already exists; if it doesn’t, we fall back to pfSctl -c 'interface reload <friendly>' (e.g., wan). (Original only triggered the pfSctl reload path.)
                        • CARP event suppression window: after switching to MASTER, the script temporarily ignores further CARP events (~60 seconds total in two 30s steps) to prevent flapping during stabilization.
                        • Staged targeted reconciles: after ~30s (still MASTER) run a focused reconcile; after another ~30s run a safety reconcile. These checks act only if state truly differs (see next point).
                        • Smarter reconcile rules: if MASTER and PPPoE already has a valid IPv4 P2P or global IPv6 address, do nothing; if BACKUP, ensure the real PPPoE iface is down.
                        • BACKUP/INIT handling refined: on BACKUP we bring the real PPPoE interface down. On INIT we first re-read the actual CARP state and only bring the real PPPoE interface down if the current state is truly BACKUP. In effect the INIT state itself is ignored; only BACKUP brings pppoeX down.
                        • Quiet periodic health check: every 5 minutes, perform a low-noise reconcile (skipped during the suppression window) to keep the state honest if an event was missed for some reason. This feature is currently broken, and I don't think it is needed anyway.
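
The bring-up and reconcile rules in the list above can be sketched as a pure decision function. This is not the code from pppoe_ha_event.php; `CarpState`, `plan_command`, and all parameter names are hypothetical, and the function only returns the command it would run (or `None`) instead of executing anything:

```python
from enum import Enum
from typing import List, Optional


class CarpState(Enum):
    MASTER = "MASTER"
    BACKUP = "BACKUP"
    INIT = "INIT"


def plan_command(
    event: CarpState,      # state reported by the CARP event
    actual: CarpState,     # state obtained by re-reading CARP status
    real_if: str,          # real interface, e.g. "pppoe0"
    friendly_if: str,      # friendly name, e.g. "wan"
    iface_exists: bool,    # does the real PPPoE interface already exist?
    has_address: bool,     # valid IPv4 P2P or global IPv6 address present?
) -> Optional[List[str]]:
    # On INIT, trust the re-read state rather than the event itself.
    state = actual if event is CarpState.INIT else event

    if state is CarpState.MASTER:
        if iface_exists and has_address:
            return None  # already connected, nothing to do
        if iface_exists:
            return ["ifconfig", real_if, "up"]
        # Fallback only when the interface does not exist yet.
        return ["pfSctl", "-c", f"interface reload {friendly_if}"]

    if state is CarpState.BACKUP:
        return ["ifconfig", real_if, "down"] if iface_exists else None

    return None  # still INIT after re-reading: ignore


cmd = plan_command(CarpState.MASTER, CarpState.MASTER, "pppoe0", "wan",
                   iface_exists=True, has_address=False)
```

Keeping the decision separate from the execution makes it easy to compare this variant with one that treats INIT exactly like BACKUP, by changing only the first line of the function.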

                        @perrin
                        I apologize for the possibly clunky AI-assisted code changes—I hope it works for you too. For now it’s been running quite stably on my side. Failover is instant and stable. Thank you for bringing it to life in a more acceptable form than what I had.

                        • zjamali @w0w

                          @w0w Can these changes be merged into the original Git repo so I can test them?

                          • w0w @zjamali

                            @zjamali
                            1000033679.jpg
                            Go to Diagnostics > Edit File, select
                            /usr/local/sbin/pppoe_ha_event.php
                            and replace the content of the file with the one stored in the archive.

                            • perrin @w0w

                              Thanks for updating the script and testing.

                              @w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                              MASTER bring-up path updated: on MASTER we now first try ifconfig <real pppoeX> up if the PPPoE interface already exists; if it doesn’t, we fall back to pfSctl -c 'interface reload <friendly>' (e.g., wan). (Original only triggered the pfSctl reload path.)

                              Does that work in your case? In my tests, running ifconfig xxx up did not connect the interface. Can you confirm that ifconfig up is sufficient in your case?

                              @w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                              BACKUP/INIT handling refined: on BACKUP/INIT we bring the real PPPoE interface down. On INIT we first re-read actual CARP state; only bring the PPPoE real iface down if the current state is truly BACKUP. Actually ignores init state, only backup brings pppoeX down.

                              I remember that ignoring the INIT state caused a problem that led to both firewalls trying to connect to PPPoE; that is why I handled INIT the same way as BACKUP, to prevent an unclear state.

                              @w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                              Smarter reconcile rules: if MASTER and PPPoE already has a valid IPv4 P2P or global IPv6 address, do nothing; if BACKUP, ensure the real PPPoE iface is down.

                              That should already be the current functionality of the function get_pppoe_status.

                              @w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                              Staged targeted reconciles: after ~30s (still MASTER) run a focused reconcile; after another ~30s run a safety reconcile. These checks act only if state truly differs (see next point).

                              I'd really love to get around any time-delay-based method. Time delays are never accurate under all circumstances and can cause issues with different configurations. The way it is implemented is quite stable, using a file to sync the script calls, but it would be much cleaner if we could avoid running background tasks during a failover. I'd like to handle the devd events as purely as pfSense itself does internally, with plain pfSctl calls.

                              Can we try to understand why the time delay in your configuration is the better approach as compared to the pure event based approach?

                              • w0w @perrin

                                @perrin said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                                does that work in your case?

                                At first it didn’t work because the status changed from INIT to BACKUP within just a few milliseconds, and every time the PHP script followed it and brought pppoeX down.

                                It seems this caused if_pppoe or some pfSense code to get stuck in an unknown state; sometimes I even noticed that the IPv6 address remained on the interface.

                                Now it is working just fine, reconnecting in just seconds.

                                @perrin said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                                I remember that ignoring INIT state caused a problem which leads to both firewalls trying to connect to PPPoE, that is why I handled INIT in the same way as BACKUP to prevent an unclear state.

                                It looks like it never happened to me, but maybe I need more tests to be done.

                                @perrin said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                                That should already be the current functionality of the function get_pppoe_status.

                                Yep, possible that this part is unnecessary or AI just listed one of my earliest changes for some reason. Will check it later.

                                @perrin said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                                I'd really love to get around any time-delay-based method. Time delays are never accurate under all circumstances and can cause issues with different configurations. The way it is implemented is quite stable, using a file to sync the script calls, but it would be much cleaner if we could avoid running background tasks during a failover. I'd like to handle the devd events as purely as pfSense itself does internally, with plain pfSctl calls.

                                Can we try to understand why the time delay in your configuration is the better approach as compared to the pure event based approach?

                                I think this is an incorrect description of what actually happens. The script still handles devd events as before. However, when a master event occurs, it brings pppoex up without delay. Then, it ignores devd events for 30 seconds to give the system some time to stabilize, and afterwards checks the status.

                                If any devd events were missed during this time, we simply repeat the reconciliation process and again ignore events for 30 seconds to allow the system to stabilize. After that, the script continues listening for events.

                                This logic can definitely be improved.

                                In my case, I can't just listen to events continuously, because after connecting to the ISP, I receive a backup status for a very short time. This causes the firewall to enter a continuous loop of connecting and disconnecting.
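
The suppression logic described above (act on MASTER immediately, ignore events for 30 seconds, then check whether anything was missed and repeat if so) can be sketched roughly as follows. `SuppressionGate` and its method names are hypothetical and do not come from the actual script; the clock is injectable so the behaviour can be checked without waiting:

```python
import time
from typing import Callable


class SuppressionGate:
    """Ignore CARP events for a fixed window after a MASTER transition."""

    def __init__(self, window: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window = window
        self.clock = clock
        self.ignore_until = float("-inf")  # no window open initially
        self.missed = 0                    # events dropped in the current window

    def accept(self, state: str) -> bool:
        """Return True if the event should be handled right now."""
        now = self.clock()
        if now < self.ignore_until:
            self.missed += 1               # remember that something was dropped
            return False
        if state == "MASTER":
            # Handle MASTER without delay, then open the suppression window.
            self.ignore_until = now + self.window
            self.missed = 0
        return True

    def close_window(self) -> bool:
        """Call once the window has elapsed.

        Returns True if events were missed; in that case the caller should
        reconcile again, and a fresh window is opened.
        """
        now = self.clock()
        if now < self.ignore_until:
            return False                   # window still running
        had_missed = self.missed > 0
        self.missed = 0
        if had_missed:
            self.ignore_until = now + self.window
        return had_missed


# Simulated timeline using a fake clock.
t = [0.0]
gate = SuppressionGate(window=30.0, clock=lambda: t[0])
handled_master = gate.accept("MASTER")   # handled immediately
t[0] = 0.05
handled_blip = gate.accept("BACKUP")     # short BACKUP blip is ignored
t[0] = 30.0
needs_second_pass = gate.close_window()  # blip was missed, so reconcile again
```

In this sketch the brief BACKUP status that arrives right after the ISP connection comes up is dropped rather than acted on, which is what breaks the connect/disconnect loop; the cost is that a genuine BACKUP transition inside the window is only noticed at the next reconcile.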

                                Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.