Netgate Discussion Forum

    Traffic is not re-routed over secondary internet connection (PPPoE) once it returns from being down.

    Plus 22.05 Development Snapshots (Retired)
    32 Posts 4 Posters 2.6k Views
    • B
      BNetworker
      last edited by BNetworker

      (I recently updated to 22.05b hoping this was resolved, and re-tested multi-WAN after updating to 22.05.b.20220524.0600 today.)

      Short version: Traffic is not re-routed over the secondary internet connection (PPPoE) once it returns from being down.

      Config:

      • Primary Internet: Cable (default gateway, primary for internal clients, backup for guest network)
      • Secondary: PPPoE DSL (primary for guest network, backup for internal clients)

      Everything works well and routes as expected until the PPPoE connection fails. When that secondary connection goes down, traffic moves to the primary (Cable) connection, and that part works well. But traffic never appears to route over the secondary again once the PPPoE link comes back up. This is true even for new connections/clients that have zero states in the firewall before connecting.

      I was hoping that this fix in 22.05b would resolve it: Fixed: Services are not restarted when PPP interfaces connect #12811 (https://redmine.pfsense.org/issues/12811).

      The behavior appears to be the same. If I save a rule in the firewall, or bounce the interface again, the traffic immediately goes out the secondary again. It just does not happen automatically.

      Netgate 6100
      22.05-BETA (amd64)
      22.05.b.20220524.0600

      Is there any more info I can provide to narrow this one down?

      • w0wW
        w0w @BNetworker
        last edited by

        @bnetworker
        I have run into the same issue several times. The last time I checked, pfSense showed PPPoE as the default gateway, but traffic from the client PCs was still going over the backup line. I cannot reproduce the issue manually: when I interrupt PPPoE by unplugging the parent interface, everything works as expected after it is plugged back in.

        • B
          BNetworker
          last edited by BNetworker

          If I'm reading this right, this is a known issue:

          • https://redmine.pfsense.org/issues/855

          With changes coming:

          • https://redmine.pfsense.org/issues/12092#note-15

          And discussion:

          • https://forum.netgate.com/topic/170690/new-state-killing-mechanisms-12092

          "This does not include killing states when a gateway recovers (e.g. when WAN1 recovers, kill states on WAN2) though the underlying changes in pf bring that closer to reality (#855)"

          My question: all of this concerns existing states. If I put a new client on the network after WAN2 has recovered, it has no states, so I would expect its connections to be routed out WAN2, but they are not.

          • B
            BNetworker
            last edited by

            After WAN2 comes back online, going to Status / Filter Reload and reloading also resolves it immediately, and new connections go out WAN2.

            • B
              BNetworker
              last edited by

              @w0w is right. Whether this works appears to depend on how the gateway goes down. This is different from the state killing mechanisms.

              If the WAN2 interface goes down completely and returns, the logs show that it automatically does a "check_reload_status 634 Reloading filter" and routes new connections properly out WAN2.

              If the connection goes down via packet loss, though, and WAN2 is marked as down, the filter is not automatically reloaded when it returns (nothing is mentioned in the logs), and traffic for new connections still goes out WAN1.

              Is this expected?

              • stephenw10S
                stephenw10 Netgate Administrator
                last edited by

                I would expect it to reload the filter since that's the mechanism by which the route is updated for new connections.

                There are two places you can check:

                The generated pf rules file, /tmp/rules.debug, shows a list of gateway and gateway group aliases near the top of the file. For a failover group, the gateway shown there is updated and should reflect the current gateway.

                The running ruleset. Run pfctl -vsr; the rules shown will have 'route-to' applied via the assigned gateway, and that should be updated.
                If that doesn't change when the PPPoE WAN comes back up, traffic will not be routed via it.
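
For anyone who wants to compare the running ruleset before and after a failover, here is a minimal Python sketch. It is not part of pfSense and is meant to be run against saved `pfctl -vsr` output, for example on a workstation. It assumes rules use the single-target `route-to (iface addr)` form and carry a `label "gw:NAME"` entry; the function name `route_to_by_group` is made up for illustration.

```python
import re

def route_to_by_group(pfctl_output):
    """Collect (gateway label, interface, address) from rules that use route-to."""
    found = []
    for line in pfctl_output.splitlines():
        # Single-target form only, e.g. "route-to (pppoe0 67.x.x.x)".
        route = re.search(r'route-to \(\s*(\S+)\s+(\S+?)\s*\)', line)
        # Assumed label format carrying the gateway or gateway group name.
        label = re.search(r'label "gw:([^"]+)"', line)
        if route and label:
            found.append((label.group(1), route.group(1), route.group(2)))
    return found

# Sample input: one rule line in the style of pfctl -vsr output, plus a counter line.
sample_rules = (
    'pass in quick on igc1 route-to (ix3 73.x.x.x) inet from 192.168.5.0/24 '
    'to any flags S/SA keep state label "gw:WAN2WAN1" ridentifier 1653587590\n'
    '  [ Evaluations: 3289      Packets: 737967    Bytes: 552196864   States: 45    ]\n'
)

for group, iface, addr in route_to_by_group(sample_rules):
    print(f"{group} currently routes via {iface} ({addr})")
```

If a group whose preferred tier is the PPPoE WAN still reports the other interface after the link is back, the ruleset has not been reloaded.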

                Steve

                • B
                  BNetworker
                  last edited by

                  @stephenw10 - Thank you! That is super helpful. Yeah, it appears that it can't be reloading the filter after the WAN2 PPPoE comes back up:

                  When Working OK, before Failure of WAN2:

                  pass in quick on igc1 route-to (pppoe0 67.x.x.x) inet from 192.168.5.0/24 to any flags S/SA keep state label "USER_RULE: Default allow LAN to any rule" label "id:1653587590" label "gw:WAN2WAN1" ridentifier 1653587590
                    [ Evaluations: 3155      Packets: 711983    Bytes: 541613398   States: 11    ]
                    [ Inserted: uid 0 pid 45541 State Creations: 1235  ]
                  
                  # Gateways
                  GWWAN_DHCP = " route-to ( ix3 73.x.x.x ) "
                  GWWAN2_PPPOE = " route-to ( pppoe0 67.x.x.x ) "
                  GWWAN1WAN2 = "  route-to { ( ix3 73.x.x.x )  }  "
                  GWWAN2WAN1 = "  route-to { ( pppoe0 67.x.x.x )  }  "
                  

                  After WAN2 recovery, no filter reloading:

                  pass in quick on igc1 route-to (ix3 73.x.x.x) inet from 192.168.5.0/24 to any flags S/SA keep state label "USER_RULE: Default allow LAN to any rule" label "id:1653587590" label "gw:WAN2WAN1" ridentifier 1653587590
                    [ Evaluations: 3289      Packets: 737967    Bytes: 552196864   States: 45    ]
                    [ Inserted: uid 0 pid 361 State Creations: 45    ]
                  
                  # Gateways
                  GWWAN_DHCP = " route-to ( ix3 73.x.x.x ) "
                  GWWAN2_PPPOE = "  "
                  GWWAN1WAN2 = "  route-to { ( ix3 73.x.x.x )  }  "
                  GWWAN2WAN1 = "  route-to { ( ix3 73.x.x.x )  }  "
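
The two "# Gateways" snapshots above can also be compared mechanically. This is a rough Python sketch, assuming the `GW... = "..."` macro format shown in these excerpts; `first_targets` and `changed_gateways` are hypothetical helper names, not pfSense functions.

```python
import re

def first_targets(rules_text):
    """Return {macro_name: interface of the first route-to target, or None}."""
    result = {}
    for line in rules_text.splitlines():
        match = re.match(r'^\s*(GW\w+)\s*=\s*"(.*)"\s*$', line)
        if match:
            target = re.search(r'\(\s*(\S+)\s+\S+\s*\)', match.group(2))
            result[match.group(1)] = target.group(1) if target else None
    return result

def changed_gateways(before_text, after_text):
    """List (macro, before_iface, after_iface) for macros whose target differs."""
    before, after = first_targets(before_text), first_targets(after_text)
    return [(name, before[name], after.get(name))
            for name in before if before[name] != after.get(name)]

before = '''
GWWAN_DHCP = " route-to ( ix3 73.x.x.x ) "
GWWAN2_PPPOE = " route-to ( pppoe0 67.x.x.x ) "
GWWAN1WAN2 = "  route-to { ( ix3 73.x.x.x )  }  "
GWWAN2WAN1 = "  route-to { ( pppoe0 67.x.x.x )  }  "
'''
after = '''
GWWAN_DHCP = " route-to ( ix3 73.x.x.x ) "
GWWAN2_PPPOE = "  "
GWWAN1WAN2 = "  route-to { ( ix3 73.x.x.x )  }  "
GWWAN2WAN1 = "  route-to { ( ix3 73.x.x.x )  }  "
'''

for name, old, new in changed_gateways(before, after):
    print(f"{name}: {old} -> {new}")
```

With these two snapshots it reports that GWWAN2_PPPOE lost its target and that GWWAN2WAN1 moved from pppoe0 to ix3, which matches what is visible by eye.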
                  
                  • B
                    BNetworker
                    last edited by BNetworker

                    Interesting stuff here: no filter reload is logged after WAN2 comes back up. What caught my eye is the line "WAN2_PPPOE is available now, adding to routing group WAN1WAN2". I don't see an equivalent line for the WAN2WAN1 gateway group.

                    A manual filter reload fixed it again. Everything back to normal.

                    May 26 17:22:44	ppp	27297	[opt1] IPCP: state change Ack-Sent --> Opened
                    May 26 17:22:44	ppp	27297	[opt1] IPCP: LayerUp
                    May 26 17:22:44	ppp	27297	[opt1] 174.x.x.x -> 67.x.x.x
                    May 26 17:22:44	check_reload_status	634	rc.newwanip starting pppoe0
                    May 26 17:22:44	ppp	27297	[opt1] IFACE: Up event
                    May 26 17:22:44	ppp	27297	[opt1] IFACE: Rename interface ng0 to pppoe0
                    May 26 17:22:44	ppp	27297	[opt1] IFACE: Add description "WAN2"
                    May 26 17:22:45	php-fpm	16062	/rc.newwanip: rc.newwanip: Info: starting on pppoe0.
                    May 26 17:22:45	php-fpm	16062	/rc.newwanip: rc.newwanip: on (IP address: 174.x.x.x) (interface: WAN2[opt1]) (real interface: pppoe0).
                    May 26 17:22:46	php-fpm	16062	/rc.newwanip: MONITOR: WAN2_PPPOE is available now, adding to routing group WAN1WAN2
                    May 26 17:22:46	php-fpm	16062	67.x.x.x|174.x.x.x|WAN2_PPPOE|7.753ms|0.132ms|0.0%|online|none
                    May 26 17:22:46	php-fpm	16062	/rc.newwanip: Gateway, NONE AVAILABLE
                    May 26 17:22:46	php-fpm	16062	/rc.newwanip: IP Address has changed, killing states on former IP Address 174.x.x.x.
                    May 26 17:22:46	php-fpm	16062	/rc.newwanip: Resyncing OpenVPN instances for interface WAN2.
                    May 26 17:22:47	php-fpm	16062	/rc.newwanip: Creating rrd update script
                    May 26 17:22:47	php-fpm	16062	/rc.newwanip: Netgate pfSense Plus package system has detected an IP change or dynamic WAN reconnection - 174.x.x.x -> 174.x.x.x - Restarting packages.
                    May 26 17:22:47	check_reload_status	634	Starting packages
                    May 26 17:22:47	php	21531	notify_monitor.php: Message sent to xxxxxxxx@gmail.com OK
                    May 26 17:22:48	php-fpm	79431	/rc.start_packages: Restarting/Starting all packages.
                    

                    I waited a while and nothing changed, so I did the manual filter reload:

                    May 26 18:59:27	check_reload_status	634	Reloading filter
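
To check a longer log for this pattern, something like the following Python sketch could help. It assumes tab-separated lines in the format pasted above (timestamp, process, PID, message), a placeholder year of 2022 because the log omits it, and an arbitrary 60-second window; `parse_line` and `reload_followed` are illustrative names only.

```python
from datetime import datetime

def parse_line(line):
    """Split a tab-separated log line into (timestamp, process, message)."""
    stamp, process, _pid, message = line.strip().split("\t", 3)
    # The log omits the year, so a fixed placeholder year is assumed.
    when = datetime.strptime("2022 " + stamp, "%Y %b %d %H:%M:%S")
    return when, process, message

def reload_followed(log_text, window_seconds=60):
    """For each 'rc.newwanip starting' event, report whether a
    'Reloading filter' entry follows within window_seconds."""
    events = [parse_line(l) for l in log_text.splitlines() if l.count("\t") >= 3]
    starts = [t for t, p, m in events
              if p == "check_reload_status" and m.startswith("rc.newwanip starting")]
    reloads = [t for t, p, m in events
               if p == "check_reload_status" and m == "Reloading filter"]
    return [(s, any(0 <= (r - s).total_seconds() <= window_seconds for r in reloads))
            for s in starts]

log = "\n".join([
    "May 26 17:22:44\tcheck_reload_status\t634\trc.newwanip starting pppoe0",
    "May 26 17:22:47\tcheck_reload_status\t634\tStarting packages",
    "May 26 18:59:27\tcheck_reload_status\t634\tReloading filter",
])

for started, ok in reload_followed(log):
    print(started, "reload logged" if ok else "NO reload within window")
```

For the excerpt above the only reload is the manual one more than an hour and a half later, so the reconnect is flagged as having no reload within the window.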
                    
                    • jimpJ
                      jimp Rebel Alliance Developer Netgate
                      last edited by

                      Based on the log messages from rc.newwanip, the code path it took did run filter_configure_sync(), but maybe that was too early in the process and it needs another run after the later items are reconfigured.

                      You could try a change like this to see if it makes a difference:

                      diff --git a/src/etc/rc.newwanip b/src/etc/rc.newwanip
                      index 34aa4c602d..ae2c68fa38 100755
                      --- a/src/etc/rc.newwanip
                      +++ b/src/etc/rc.newwanip
                      @@ -279,9 +279,7 @@ if (!is_ipaddr($oldip) || ($curwanip != $oldip) || file_exists("{$g['tmp_path']}
                              if (empty($config['interfaces'][$interface]['ipaddrv6'])) {
                                      unlink_if_exists("{$g['tmp_path']}/{$interface}_upstart6");
                              }
                      -} else {
                      -       /* signal filter reload */
                      -       filter_configure();
                       }
                       
                      +filter_configure();
                       ?>
                      

                      Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

                      Need help fast? Netgate Global Support!

                      Do not Chat/PM for help!

                      • jimpJ
                        jimp Rebel Alliance Developer Netgate
                        last edited by

                        I just tested this in my lab on 2.7.0 snapshots and even without the above change it seems to work for me. After a PPPoE interface reconnected I checked the rules and it has the interface back in the gateway groups it should be in as the preferred tier.

                        • B
                          BNetworker
                          last edited by

                          @jimp - Thanks for the additional info. I made the suggested change:

                          unlink_if_exists("{$g['tmp_path']}/{$interface}_upstart6");
                          	}
                          
                          }
                          
                          +filter_configure();
                          ?>
                          

                          And I'm super happy to report that the filter did reload, and WAN2 came up in the gateways. Clients are routing NEW traffic out WAN2 as expected now:

                          May 27 08:34:35	ppp	27297	[opt1] IPCP: LayerUp
                          May 27 08:34:35	ppp	27297	[opt1] 174.x.x.x -> 67.x.x.x
                          May 27 08:34:35	check_reload_status	634	rc.newwanip starting pppoe0
                          May 27 08:34:35	ppp	27297	[opt1] IFACE: Up event
                          May 27 08:34:35	ppp	27297	[opt1] IFACE: Rename interface ng0 to pppoe0
                          May 27 08:34:35	ppp	27297	[opt1] IFACE: Add description "WAN2"
                          May 27 08:34:36	php-fpm	16062	/rc.newwanip: rc.newwanip: Info: starting on pppoe0.
                          May 27 08:34:36	php-fpm	16062	/rc.newwanip: rc.newwanip: on (IP address: 174.x.x.x) (interface: WAN2[opt1]) (real interface: pppoe0).
                          May 27 08:34:38	php-fpm	16062	/rc.newwanip: MONITOR: WAN2_PPPOE is available now, adding to routing group WAN1WAN2
                          May 27 08:34:38	php-fpm	16062	67.x.x.x|174.x.x.x|WAN2_PPPOE|7.966ms|0.128ms|0.0%|online|none
                          May 27 08:34:38	php-fpm	16062	/rc.newwanip: Gateway, NONE AVAILABLE
                          May 27 08:34:38	php-fpm	16062	/rc.newwanip: IP Address has changed, killing states on former IP Address 174.x.x.x.
                          May 27 08:34:38	php-fpm	16062	/rc.newwanip: Resyncing OpenVPN instances for interface WAN2.
                          May 27 08:34:38	php-fpm	16062	/rc.newwanip: Creating rrd update script
                          May 27 08:34:38	php-fpm	16062	/rc.newwanip: Netgate pfSense Plus package system has detected an IP change or dynamic WAN reconnection - 174.x.x.x.x -> 174.x.x.x - Restarting packages.
                          May 27 08:34:38	check_reload_status	634	Starting packages
                          May 27 08:34:38	check_reload_status	634	Reloading filter
                          
                          # Gateways
                          GWWAN_DHCP = " route-to ( ix3 73.x.x.1 ) "
                          GWWAN2_PPPOE = " route-to ( pppoe0 67.x.x.10 ) "
                          GWWAN1WAN2 = "  route-to { ( ix3 73.x.x.1 )  }  "
                          GWWAN2WAN1 = "  route-to { ( pppoe0 67.x.x.10 )  }  "
                          
                          • jimpJ
                            jimp Rebel Alliance Developer Netgate
                            last edited by

                            Interesting. I'm curious why it works OK for me here in my lab without that change.

                            Without knowing more about why it helps I'm hesitant to commit the change as-is. Though it should be reasonably safe from what I can see.

                            • B
                              BNetworker
                              last edited by BNetworker

                              @jimp - As you can see, I had accidentally left the + in:

                              +filter_configure();
                              

                              Funny thing is it still resolved the issue. Not sure if it still ran the command, or if it was removing the other code that allowed it to work. I took out the + and tested, still works.

                              • jimpJ
                                jimp Rebel Alliance Developer Netgate
                                last edited by

                                The + in that context is fairly harmless. It would affect the return value of the function, but the return value isn't checked, so it's just tossed out.

                                I made https://redmine.pfsense.org/issues/13228 to track this for the next release. For now you can add that as an entry in the System Patches package and set it to auto-apply.

                                • B
                                  BNetworker
                                  last edited by

                                  Thanks @jimp - Will do. Let me know if you need any more testing, or can think of a way to further troubleshoot / debug.

                                  • B
                                    BNetworker
                                    last edited by

                                    Looking through the code, it must be matching this section:

                                    if (!is_ipaddr($oldip) || ($curwanip != $oldip) || file_exists("{$g['tmp_path']}/{$interface}_upstart4") ||
                                       (!is_ipaddrv4($config['interfaces'][$interface]['ipaddr']) && ($config['interfaces'][$interface]['ipaddr'] != 'dhcp'))) {
                                    

                                    I say that because we get this in the log, which comes from below that if statement:

                                    May 27 08:34:38	php-fpm	16062	/rc.newwanip: IP Address has changed, killing states on former IP Address 174.x.x.107.
                                    

                                    The filter reload that is called then is:

                                    filter_configure_sync();
                                    

                                    Since we are matching that section, we would skip this else and not actually do the filter_configure():

                                    } else {
                                    	/* signal filter reload */
                                    	filter_configure();
                                    

                                    Is the filter_configure_sync(); functionally the same as the filter_configure(); we manually put in?
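
To make the branch logic described in this post concrete, here is a toy Python model. It is a simplification based only on the excerpts and diff quoted in this thread, not the actual rc.newwanip code. The names `newwanip_actions`, `branch_matched`, and `patched` are invented for illustration, and "later reconfiguration steps" stands in for whatever runs after the synchronous reload.

```python
def newwanip_actions(branch_matched, patched):
    """Toy model of the end of rc.newwanip as discussed in this thread.

    Returns the filter-related steps in the order they would be requested.
    """
    actions = []
    if branch_matched:
        # The big if-condition matched: reload synchronously,
        # then carry on with the later reconfiguration work.
        actions.append("filter_configure_sync")
        actions.append("later reconfiguration steps")
    elif not patched:
        # Original else branch: reached only when the if-condition did not match.
        actions.append("filter_configure")
    if patched:
        # Patched script: the else is removed and a reload is always requested last.
        actions.append("filter_configure")
    return actions

print(newwanip_actions(branch_matched=True, patched=False))
print(newwanip_actions(branch_matched=True, patched=True))
```

In this model the unpatched script never requests a reload after the later steps when the if-condition matches, while the patched one always ends with one.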

                                    • jimpJ
                                      jimp Rebel Alliance Developer Netgate
                                      last edited by

                                      Both methods end up running filter_configure_sync() but one is directly running the function and the other sends the event through the event queue which can introduce a little delay before it gets executed.
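
As a rough illustration of that difference (direct call versus queued event), here is a small Python toy. `ToyReloader` and its methods are invented stand-ins and do not reflect how pfSense implements either function; it only shows how queuing can change the order in which a reload runs relative to other work.

```python
from collections import deque

class ToyReloader:
    """Illustrative stand-in, not pfSense code: a direct reload versus
    a reload request that waits in an event queue."""

    def __init__(self):
        self.queue = deque()
        self.trace = []

    def reload_sync(self):
        # Direct call: the reload happens before the caller continues.
        self.trace.append("reload")

    def reload_queued(self):
        # Queued call: only a request is recorded; the reload runs later.
        self.queue.append("reload")

    def step(self, name):
        # Any other work the calling script does.
        self.trace.append(name)

    def drain(self):
        # Stands in for the event-queue consumer handling pending requests.
        while self.queue:
            self.trace.append(self.queue.popleft())

direct = ToyReloader()
direct.reload_sync()
direct.step("reconfigure services")
direct.drain()

queued = ToyReloader()
queued.reload_queued()
queued.step("reconfigure services")
queued.drain()

print(direct.trace)  # ['reload', 'reconfigure services']
print(queued.trace)  # ['reconfigure services', 'reload']
```

In the toy, the queued reload lands after the other work, while the direct one runs first.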

                                      • B
                                        BNetworker
                                        last edited by BNetworker

                                        As a test, in rc.newwanip, I put everything back to default, then changed line 222 from

                                        filter_configure_sync();
                                        

                                        to

                                        filter_configure();
                                        

                                        Leaving the else at the bottom in place, it also functions correctly. In the logs I see the filter reloading much sooner, but it still works, so I'm not sure it's a timing issue. Maybe there is some other issue with the filter_configure_sync(); call.

                                        check_reload_status	634	Reloading filter
                                        
                                        • jimpJ
                                          jimp Rebel Alliance Developer Netgate
                                          last edited by

                                          IIRC it has to call filter_configure_sync() on that code path because some of the functions called after it need the data it updates to be in place before they run. When using filter_configure(), the reload may happen afterward, which leads to other problems.

                                          Doing it again at the end is probably the safest way to handle it without (re)introducing other hard to chase down problems.

                                          • B
                                            BNetworker @jimp
                                            last edited by

                                            @jimp Sure, that makes sense. I was just hoping to give you as much info as possible to try and narrow it down. I'm not too sure where to go from here to help find the root cause.

                                            It would be interesting if @w0w could reproduce the (temp) fix with his setup as well.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.