Netgate Discussion Forum

    new if_pppoe Backend - getting HA/CARP to work like in MPD

    Development
    60 Posts 4 Posters 6.3k Views 5 Watching
      perrin @w0w

      @w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

      And that’s a real problem. CARP can flip/flap on the interface several times per second

      I don't think this should happen. Normally CARP is very stable in a working environment. On all the firewalls I manage, CARP interfaces never flap without a reason, and the reason is always a failure of some network device in between the firewalls or of one of the firewalls itself.

      From the log you sent, with an event each second, there seems to be something wrong with your config. On my firewalls I don't see a single CARP event for days or weeks.

        w0w @perrin

        @perrin
        This is just switching on maintenance mode on the primary, nothing unusual.

          crl

          Hi,
          I really appreciate the time you put into this. Thanks for sharing.

          I have installed the solution. After analyzing the logs, the sequence is clear:

          • CARP transition detected
          • Slave starts the PPPoE session, successfully at first
          • ISP rejects authentication with "Too many sessions": it refuses a second PPPoE login because the old session from my master pfSense is still alive
          • Slave keeps retrying but still has no luck (I even waited 2-3 minutes)

          So the slave's WAN is never up.

          How to fix or work around this? Add a GUI option for a startup delay on the slave, so that when CARP changes, pfSense waits 20 seconds before starting PPPoE.

          MAC spoofing also came to mind, but an ISP can use a variety of signals to track PPPoE sessions:

          • PPP username/session state (most important)
          • PPPoE session ID on their BRAS
          • CPE MAC address / modem association

            w0w @crl

            @crl
            I have experimented with different variants and, as I mentioned earlier, a delay is not a good solution, because the firewall status can change during that delay. The logic needs improvement, but I don’t have enough time to work on it right now.
            My script version handles this case much better, but it’s slower and not fully synchronized with status changes.

            The only approach I see is to avoid breaking the connection immediately when the backup status is detected. Instead, register the status and start a time-based trigger that checks the status again before executing: it quits if the registered change no longer holds, or proceeds with the action if it does. The same applies to the master: monitor it with a time-based trigger synchronized with the first status change, and either quit if nothing has effectively changed or perform the action and then exit. This sounds simple but it is not, because we also need to ignore further status changes after the first one is detected, and start listening again some time later once everything has settled. All of this makes me think the logic becomes too complicated and needs too much code for what this implementation does.
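
The "register, wait, re-check" idea above could be sketched roughly as follows. This is an illustration in Python only (the plugin itself is a PHP script), and it assumes one particular reading of the description: the first change is registered, later events inside the window are ignored, and the action runs only if the registered state still holds when the timer fires. The names `Debouncer`, `settle_seconds`, `on_event`, and `on_timer` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Debouncer:
    """Act on a CARP state change only if it still holds after a settle period."""
    settle_seconds: float            # hypothetical tuning value, not from the plugin
    applied: str = "MASTER"          # state the PPPoE side currently reflects
    pending: Optional[str] = None    # first registered new state, if any
    deadline: float = 0.0            # time at which the pending state is re-checked

    def on_event(self, state: str, now: float) -> None:
        # Register only the first change; later events in the window are ignored.
        if self.pending is None and state != self.applied:
            self.pending = state
            self.deadline = now + self.settle_seconds

    def on_timer(self, current_state: str, now: float,
                 act: Callable[[str], None]) -> bool:
        """Re-check once the window has elapsed; return True if an action ran."""
        if self.pending is None or now < self.deadline:
            return False
        registered, self.pending = self.pending, None
        if current_state != registered:
            return False             # the change did not persist (a flap): quit
        act(current_state)           # the change persisted: apply it
        self.applied = current_state
        return True
```

With `settle_seconds=5.0`, a BACKUP event that has reverted to MASTER by the time the timer fires triggers nothing, while a BACKUP that is still in place after the window is applied once. The timestamps are passed in explicitly so the logic can be exercised without real timers.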

              perrin @crl

              @crl said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

              ISP rejects authentication with Too many sessions. ISP is refusing a second PPPoE login because the old session from my master pfSense is still alive
              -Slave keeps retrying repeatedly but still no luck
              (I even waited for 2-3 minutes).

              Hi,
              The same applies to my ISP. I also get a denied login at first when the slave comes up, but in my case the ISP times out the old master session within a few minutes, allowing the slave to connect.

              Whenever the master fails "badly", it is unable to end the session cleanly, and the result is always that the slave cannot establish a connection for some initial period.

              @crl said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

              So the slave's WAN is never up.

              I did not think about this case when designing the plugin because, from my understanding of PPPoE, there is something called LCP keepalive which times out a stale session at the ISP after some time. My ISP does that within seconds. Maybe your ISP has set that timeout quite long.

              You could try to set the same MAC address on both firewalls for the PPPoE interface and see if that helps. The session definitely is still in a different state but maybe it helps with your ISP.

              The most elegant solution, however, would be to synchronize the PPPoE session ID and configuration values (IP addresses, gateways and so forth) between master and slave and have the slave pick up the current session. But that won't work without patching if_pppoe itself, which might be out of scope...

                w0w @perrin

                @perrin
                How does your HA pair react if you put the master node into maintenance mode via Status → CARP → Enable Persistent Maintenance Mode (or whatever it’s called)?

                  perrin @w0w

                  @w0w Enabling Maintenance Mode on the master raises its skew, thus transitioning it from MASTER to BACKUP. pppoe-ha picks up the BACKUP state and disables the interface accordingly.

                  Since I don't have a problem moving the PPPoE session, in my case the failover works as expected.

                  Maybe @crl should try that and see:

                  a) if if_pppoe correctly closes the session on the master prior to disabling the interface and
                  b) if his backup can correctly establish a new PPPoE session
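
For context on "raises its skew": per FreeBSD's carp(4), a node sends advertisements every advbase + advskew/256 seconds, and, roughly speaking, the node that advertises most frequently ends up as MASTER. A small illustrative calculation in Python follows; the function name `advert_interval` is hypothetical, and the values used (advbase 1, skew 0 versus 254) are assumptions chosen for the example rather than settings taken from this thread.

```python
def advert_interval(advbase: int, advskew: int) -> float:
    """CARP advertisement interval in seconds: advbase + advskew/256."""
    return advbase + advskew / 256


# Assumed example values: a node with skew 0 versus a node with a raised skew.
normal = advert_interval(1, 0)      # 1.0 s
raised = advert_interval(1, 254)    # just under 2 s

# The node with the raised skew advertises less often, so its peer takes over.
print(normal, raised, normal < raised)
```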

                    crl @perrin

                    Please check this workaround:
                    Github Issue - ISP side 'Too many sessions' keeping backup pfsense's WAN down

                    It solves only one use case:

                    • OK: entering and leaving CARP maintenance mode on a manual trigger
                    • Solution requested: if a WAN cable is pulled (between the WAN switch and any of the pfSense devices) or if a pfSense machine is down, perform the MASTER → BACKUP transition and connect PPPoE on the BACKUP. Should the MASTER come back, it shall take back the MASTER role and reconnect PPPoE on the MASTER.

                      crl @crl

                      I tried to summarize what is going on during the switchover experiments. This is one example.

                      2a61333b-245d-4e7b-8640-dfe047400ef5-image.png

                        w0w @crl

                        @crl
                        This 2:20 looks familiar to me...
                        @crl, @perrin do you both have dual stack pppoe?

                          perrin @w0w

                          @w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                          @crl, @perrin do you both have dual stack pppoe?

                          In my case yes, dual stack: IPv4 and IPv6.

                          @crl said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                          I tried to summarize what is going on during the switchover experiments. This is one example.

                          2a61333b-245d-4e7b-8640-dfe047400ef5-image.png

                          Some of these issues might be related to the configuration and/or default behavior of pfSense (e.g. when PPPoE fails and you're expecting a CARP switch).
                          Do these things work as expected when you are using the old time-based scripts?

                            w0w @perrin

                            @perrin

                            Yes, in my setup things work somewhat differently, as you noticed. There are at least a few reasons. Most importantly, every time PPPoE comes up, the VIPs get reconfigured and CARP reinitializes. I suspect this behavior is related to IPv6 and the fact that the LAN uses the Track Interface option to obtain its IPv6 address, but I’m not certain. I’m currently trying to track down the root cause—or perhaps it’s an “incompatible” configuration.

                            How does this behave on your side? As I understand it, bringing up PPPoE does not trigger VIP reconfiguration/CARP initialization for you, right?

                              perrin @w0w

                              @w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                              @perrin

                              How does this behave on your side? As I understand it, bringing up PPPoE does not trigger VIP reconfiguration/CARP initialization for you, right?

                              No, with my config no VIP reconfiguration takes place when PPPoE comes up. In my case PPPoE runs in a VLAN from the provider side, and I've added the CARP VIP on the "physical" interface, i.e. without a VLAN tag. A CARP transition therefore only triggers when a firewall goes down or the interface goes down, which in my case is exactly what I expect it to do.

                              In my case I am running two Proxmox hosts each running a virtual pfSense, one being master one being slave.
                              The most common reason I need failover to happen is when we are rebooting one of the Proxmox hosts due to software upgrades. In this case the master pfSense would be shut down cleanly and the slave takes over all interfaces with the PPPoE being one of them.

                                w0w @perrin

                                @perrin said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                                In my case I am running two Proxmox hosts each running a virtual pfSense, one being master one being slave.

                                I am running the same configuration. Looks like I have found something related to this VIP reconfiguration issue. I will do some tests and report back if I find anything else.

                                  w0w

                                  I've experimented a lot with the code; here is what I did to make it work with my “buggy” config: pppoe_ha_event.php.

                                  The biggest difference is that we shouldn’t run pfSctl -c 'interface reload <friendly>' (e.g., wan) if the PPPoE interface already exists. We only do that if, for some reason, the interface doesn’t exist. The shell script does the same, by the way.
                                  Changes:

                                  • MASTER bring-up path updated: on MASTER we now first try ifconfig <real pppoeX> up if the PPPoE interface already exists; if it doesn’t, we fall back to pfSctl -c 'interface reload <friendly>' (e.g., wan). (Original only triggered the pfSctl reload path.)
                                  • CARP event suppression window: after switching to MASTER, the script temporarily ignores further CARP events (~60 seconds total in two 30s steps) to prevent flapping during stabilization.
                                  • Staged targeted reconciles: after ~30s (still MASTER) run a focused reconcile; after another ~30s run a safety reconcile. These checks act only if state truly differs (see next point).
                                  • Smarter reconcile rules: if MASTER and PPPoE already has a valid IPv4 P2P or global IPv6 address, do nothing; if BACKUP, ensure the real PPPoE iface is down.
                                  • BACKUP/INIT handling refined: on BACKUP we bring the real PPPoE interface down. On INIT we first re-read the actual CARP state and only bring the real PPPoE iface down if the current state is truly BACKUP. In effect the INIT state itself is ignored; only BACKUP brings pppoeX down.
                                  • Quiet periodic health check: every 5 minutes, perform a low-noise reconcile (skipped during the suppression window) to keep the state honest if an event was missed for some reason. This feature is currently broken, and I don't think it is needed anyway.
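
The bring-up and reconcile rules in the list above can be condensed into a single decision function. The following Python sketch only illustrates that decision table; it is not the code from pppoe_ha_event.php. The function name `plan_action` and the example interface names `pppoe0` and `wan` are assumptions, and the function merely returns the command string it would choose instead of running anything.

```python
from typing import Optional


def plan_action(carp_state: str, iface_exists: bool, has_address: bool,
                real_if: str = "pppoe0", friendly: str = "wan") -> Optional[str]:
    """Return the command the listed rules call for, or None for 'do nothing'."""
    if carp_state == "MASTER":
        if iface_exists and has_address:
            return None                                  # already connected
        if iface_exists:
            return f"ifconfig {real_if} up"              # preferred bring-up path
        return f"pfSctl -c 'interface reload {friendly}'"  # fallback: iface missing
    if carp_state == "BACKUP":
        # Ensure the real PPPoE interface is down (nothing to do if it is absent).
        return f"ifconfig {real_if} down" if iface_exists else None
    return None                                          # INIT and others: ignored


print(plan_action("MASTER", iface_exists=True, has_address=False))
print(plan_action("MASTER", iface_exists=False, has_address=False))
print(plan_action("BACKUP", iface_exists=True, has_address=True))
```

Here `has_address` stands for "a valid IPv4 P2P or global IPv6 address is present", as in the reconcile rule; how that is detected is left out of the sketch.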

                                  @perrin
                                  I apologize for the possibly clunky AI-assisted code changes—I hope it works for you too. For now it’s been running quite stably on my side. Failover is instant and stable. Thank you for bringing it to life in a more acceptable form than what I had.

                                    zjamali @w0w

                                    @w0w Can these changes be merged into the original git repo so I can test them?

                                      w0w @zjamali

                                      @zjamali
                                      1000033679.jpg
                                      Go to Diagnostics → Edit File and select
                                      /usr/local/sbin/pppoe_ha_event.php
                                      You can just replace the content of the file with the one stored in the archive.

                                        perrin @w0w

                                        Thanks for updating the script and testing.

                                        @w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                                        MASTER bring-up path updated: on MASTER we now first try ifconfig <real pppoeX> up if the PPPoE interface already exists; if it doesn’t, we fall back to pfSctl -c 'interface reload <friendly>' (e.g., wan). (Original only triggered the pfSctl reload path.)

                                        Does that work in your case? In my tests, running ifconfig xxx up did not connect the interface. Can you confirm that ifconfig up is sufficient in your case?

                                        @w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                                        BACKUP/INIT handling refined: on BACKUP/INIT we bring the real PPPoE interface down. On INIT we first re-read actual CARP state; only bring the PPPoE real iface down if the current state is truly BACKUP. Actually ignores init state, only backup brings pppoeX down.

                                        I remember that ignoring the INIT state caused a problem where both firewalls tried to connect to PPPoE. That is why I handled INIT in the same way as BACKUP, to prevent an unclear state.

                                        @w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                                        Smarter reconcile rules: if MASTER and PPPoE already has a valid IPv4 P2P or global IPv6 address, do nothing; if BACKUP, ensure the real PPPoE iface is down.

                                        That is already the current functionality of the function get_pppoe_status.

                                        @w0w said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                                        Staged targeted reconciles: after ~30s (still MASTER) run a focused reconcile; after another ~30s run a safety reconcile. These checks act only if state truly differs (see next point).

                                        I'd really love to get around any time-delay-based method. Time delays are never accurate under all circumstances and can cause issues with different configurations. The way it is implemented is quite stable, using a file for syncing the script calls, but it would be much cleaner if we could avoid running background tasks during a failover. I'd like to handle the devd events as purely as pfSense itself does internally, with pure pfSctl calls.

                                        Can we try to understand why the time delay in your configuration is the better approach as compared to the pure event based approach?

                                          w0w @perrin

                                          @perrin said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                                          does that work in your case?

                                          At first it didn’t work because the status changed from INIT to BACKUP within just a few milliseconds, and every time the PHP script followed it and brought pppoeX down.

                                          It seems this caused if_pppoe or some pfSense code to get stuck in an unknown state; sometimes I even noticed that the IPv6 address remained on the interface.

                                          Now it is working just fine, reconnecting in just seconds.

                                          @perrin said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                                          I remember that ignoring INIT state caused a problem which leads to both firewalls trying to connect to PPPoE, that is why I handled INIT in the same way as BACKUP to prevent an unclear state.

                                          It looks like it never happened to me, but maybe I need more tests to be done.

                                          @perrin said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                                          That is already the current functionality of the function get_pppoe_status.

                                          Yep, possible that this part is unnecessary or AI just listed one of my earliest changes for some reason. Will check it later.

                                          @perrin said in new if_pppoe Backend - getting HA/CARP to work like in MPD:

                                          I'd really love to get around any time-delay-based method. Time delays are never accurate under all circumstances and can cause issues with different configurations. The way it is implemented is quite stable, using a file for syncing the script calls, but it would be much cleaner if we could avoid running background tasks during a failover. I'd like to handle the devd events as purely as pfSense itself does internally, with pure pfSctl calls.

                                          Can we try to understand why the time delay in your configuration is the better approach as compared to the pure event based approach?

                                          I think this is an incorrect description of what actually happens. The script still handles devd events as before. However, when a master event occurs, it brings pppoex up without delay. Then, it ignores devd events for 30 seconds to give the system some time to stabilize, and afterwards checks the status.

                                          If any devd events were missed during this time, we simply repeat the reconciliation process and again ignore events for 30 seconds to allow the system to stabilize. After that, the script continues listening for events.

                                          This logic can definitely be improved.

                                          In my case, I can't just listen to events continuously, because after connecting to the ISP, I receive a backup status for a very short time. This causes the firewall to enter a continuous loop of connecting and disconnecting.
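
The suppression behavior described in this post (act on MASTER immediately, ignore devd events for 30 seconds, then reconcile and stay quiet again if anything was missed) could look roughly like this. It is a Python sketch for illustration only, not the logic of pppoe_ha_event.php; the class `EventGate` and its method names are hypothetical, and time is passed in explicitly so the behavior can be followed without real timers.

```python
class EventGate:
    """Suppress CARP events for a fixed window after a MASTER transition."""

    def __init__(self, window: float = 30.0):
        self.window = window          # length of the quiet period in seconds
        self.quiet_until = 0.0        # events before this time are ignored
        self.missed = False           # True if an event was ignored in the window

    def on_event(self, state: str, now: float) -> bool:
        """Return True if the event should be handled, False if suppressed."""
        if now < self.quiet_until:
            self.missed = True
            return False
        if state == "MASTER":
            # Handle MASTER right away, then open the quiet window.
            self.quiet_until = now + self.window
        return True

    def on_window_end(self, now: float) -> bool:
        """Call after the window; True means reconcile and stay quiet once more."""
        if now < self.quiet_until or not self.missed:
            return False
        self.missed = False
        self.quiet_until = now + self.window
        return True


gate = EventGate(window=30.0)
print(gate.on_event("MASTER", now=0.0))   # handled, window opens
print(gate.on_event("BACKUP", now=0.5))   # suppressed (the short BACKUP blip)
print(gate.on_window_end(now=30.0))       # an event was missed: reconcile again
```

The short BACKUP status received right after connecting is exactly the kind of event the window is meant to absorb; without it, the script would follow the blip and tear the session down again.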

                                            w0w

                                            https://github.com/woffko/pfSense-pppoe-ha/blob/main/pfSense-pkg-pppoe-ha/stage/usr/local/sbin/pppoe_ha_event.php

                                            Slightly improved code and logic.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.