Traffic is not re-routed over secondary internet connection (PPPOE), once it returns from being down.
-
(I recently updated to 22.05b hoping this was resolved and re-tested multi-Wan after updating to 22.05.b.20220524.0600 today. )
Short version: Traffic is not re-routed over secondary internet connection PPPOE, once it returns from being down.
Config:
- Primary Internet Cable (Default Gateway, primary for internal clients, backup for guest network)
- Secondary PPPOE DSL (Primary for guest network, backup for internal clients)
Everything works well, routes as expected, until the PPPOE connection fails and traffic is routed out the primary (Cable) connection. Once that secondary connection goes down, it will move traffic to the primary connection, that works well. But it appears to never route over the secondary when it (the PPPOE) comes back up. This is true even for new connections/clients that have zero states in the firewall before connection.
I was hoping that the fix in 22.05b would fix that: Fixed: Services are not restarted when PPP interfaces connect #12811. (https://redmine.pfsense.org/issues/12811).
It appears the same. If I save a rule in the firewall, or bounce the interface again, the traffic immediately goes out the secondary again. Just not doing it automatically.
Netgate 6100
22.05-BETA (amd64)
22.05.b.20220524.0600Is there any more info I can provide to narrow this one down?
-
@bnetworker
I have run into the same issue several times and the last time I've checked that pfSense showing PPPoE as default gateway, but on the client PCs traffic still going to the backup line. Can not reproduce this issue manually, when interrupting PPPoE by unplugging parent interface โ everything works as expected after it's plugged back on. -
If I'm reading this right, this is a known issue:
- https://redmine.pfsense.org/issues/855
With changes coming:
- https://redmine.pfsense.org/issues/12092#note-15
And discussion:
- https://forum.netgate.com/topic/170690/new-state-killing-mechanisms-12092
"This does not include killing states when a gateway recovers (e.g. when WAN1 recovers, kill states on WAN2) though the underlying changes in pf bring that closer to reality (#855)"
My question is, this is all regarding current states. If I put a new client on the network, once the WAN 2 has recovered, in that case, with no states, I'm assuming that connection should be routed out WAN 2, but is not.
-
After the WAN2 comes back online, if I go to Status / Filter Reload, and reload, this will also immediately resolve it, and the new connection goes out of WAN2.
-
@w0w is right. This appears to be something not working correctly, depending on how the gateway goes down. This is different than the state killing mechanisms.
If the interface goes down completely for WAN2 and returns, (logs show) it automatically does a
check_reload_status 634 Reloading filter
, and routes properly out WAN2 for new connections.If the connection goes down via packet loss though, and WAN2 is marked as down, upon return, the filter is not auto reloaded (Not mentioned in logs) and traffic for new connections is still going out the WAN1.
Is this expected?
-
I would expect it to reload the filter since that's the mechanism by which the route is updated for new connections.
There are two places you can check:
In the generated pf rules file
/tmp/rules.debug
that shows a list of gateways and gateway groups aliases near to the top of the file. In a failover group the gateway shown there is updated and should reflect the current gateway.In the running ruleset. Run
pfctl -vsr
the rules shown will have 'route-to' applied via the assigned gateway and that should be updated.
If that doesn't change when the PPPoE WAN comes back up traffic will not be routed via it.Steve
-
@stephenw10 - Thank you! That is super helpful. Yeah, it appears that it can't be reloading the filter after the WAN2 PPPoE comes back up:
When Working OK, before Failure of WAN2:
pass in quick on igc1 route-to (pppoe0 67.x.x.x) inet from 192.168.5.0/24 to any flags S/SA keep state label "USER_RULE: Default allow LAN to any rule" label "id:1653587590" label "gw:WAN2WAN1" ridentifier 1653587590 [ Evaluations: 3155 Packets: 711983 Bytes: 541613398 States: 11 ] [ Inserted: uid 0 pid 45541 State Creations: 1235 ]
# Gateways GWWAN_DHCP = " route-to ( ix3 73.x.x.x ) " GWWAN2_PPPOE = " route-to ( pppoe0 67.x.x.x ) " GWWAN1WAN2 = " route-to { ( ix3 73.x.x.x ) } " GWWAN2WAN1 = " route-to { ( pppoe0 67.x.x.x ) } "
After WAN2 recovery, no filter reloading:
pass in quick on igc1 route-to (ix3 73.x.x.x) inet from 192.168.5.0/24 to any flags S/SA keep state label "USER_RULE: Default allow LAN to any rule" label "id:1653587590" label "gw:WAN2WAN1" ridentifier 1653587590 [ Evaluations: 3289 Packets: 737967 Bytes: 552196864 States: 45 ] [ Inserted: uid 0 pid 361 State Creations: 45 ]
# Gateways GWWAN_DHCP = " route-to ( ix3 73.x.x.x ) " GWWAN2_PPPOE = " " GWWAN1WAN2 = " route-to { ( ix3 73.x.x.x ) } " GWWAN2WAN1 = " route-to { ( ix3 73.x.x.x ) } "
-
Interesting stuff here, no filter reload logged after WAN2 comes back up. What caught my eye is that "WAN2_PPPOE is available now, adding to routing group WAN1WAN2". I don't see any equivalent line for WAN2WAN1 gateway group.
A manual filter reload fixed it again. Everything back to normal.
May 26 17:22:44 ppp 27297 [opt1] IPCP: state change Ack-Sent --> Opened May 26 17:22:44 ppp 27297 [opt1] IPCP: LayerUp May 26 17:22:44 ppp 27297 [opt1] 174.x.x.x -> 67.x.x.x May 26 17:22:44 check_reload_status 634 rc.newwanip starting pppoe0 May 26 17:22:44 ppp 27297 [opt1] IFACE: Up event May 26 17:22:44 ppp 27297 [opt1] IFACE: Rename interface ng0 to pppoe0 May 26 17:22:44 ppp 27297 [opt1] IFACE: Add description "WAN2" May 26 17:22:45 php-fpm 16062 /rc.newwanip: rc.newwanip: Info: starting on pppoe0. May 26 17:22:45 php-fpm 16062 /rc.newwanip: rc.newwanip: on (IP address: 174.x.x.x) (interface: WAN2[opt1]) (real interface: pppoe0). May 26 17:22:46 php-fpm 16062 /rc.newwanip: MONITOR: WAN2_PPPOE is available now, adding to routing group WAN1WAN2 May 26 17:22:46 php-fpm 16062 67.x.x.x|174.x.x.x|WAN2_PPPOE|7.753ms|0.132ms|0.0%|online|none May 26 17:22:46 php-fpm 16062 /rc.newwanip: Gateway, NONE AVAILABLE May 26 17:22:46 php-fpm 16062 /rc.newwanip: IP Address has changed, killing states on former IP Address 174.x.x.x. May 26 17:22:46 php-fpm 16062 /rc.newwanip: Resyncing OpenVPN instances for interface WAN2. May 26 17:22:47 php-fpm 16062 /rc.newwanip: Creating rrd update script May 26 17:22:47 php-fpm 16062 /rc.newwanip: Netgate pfSense Plus package system has detected an IP change or dynamic WAN reconnection - 174.x.x.x -> 174.x.x.x - Restarting packages. May 26 17:22:47 check_reload_status 634 Starting packages May 26 17:22:47 php 21531 notify_monitor.php: Message sent to xxxxxxxx@gmail.com OK May 26 17:22:48 php-fpm 79431 /rc.start_packages: Restarting/Starting all packages.
I waited a while, nothing changed, so I did the manual filter reload
May 26 18:59:27 check_reload_status 634 Reloading filter
-
Based on the log messages from
rc.newwanip
the code path it was taking did run afilter_configure_sync()
but maybe it was too early in the process and it needs another after reconfiguring the later items.You could try a change like this to see if it makes a difference:
diff --git a/src/etc/rc.newwanip b/src/etc/rc.newwanip index 34aa4c602d..ae2c68fa38 100755 --- a/src/etc/rc.newwanip +++ b/src/etc/rc.newwanip @@ -279,9 +279,7 @@ if (!is_ipaddr($oldip) || ($curwanip != $oldip) || file_exists("{$g['tmp_path']} if (empty($config['interfaces'][$interface]['ipaddrv6'])) { unlink_if_exists("{$g['tmp_path']}/{$interface}_upstart6"); } -} else { - /* signal filter reload */ - filter_configure(); } +filter_configure(); ?>
-
I just tested this in my lab on 2.7.0 snapshots and even without the above change it seems to work for me. After a PPPoE interface reconnected I checked the rules and it has the interface back in the gateway groups it should be in as the preferred tier.
-
@jimp - Thanks for the additional info. I made the suggested change:
unlink_if_exists("{$g['tmp_path']}/{$interface}_upstart6"); } } +filter_configure(); ?>
And i'm super happy to report that the filter did reload, and the WAN2 came up in gateways. Clients are routing NEW traffic out WAN2 as expected now! :
May 27 08:34:35 ppp 27297 [opt1] IPCP: LayerUp May 27 08:34:35 ppp 27297 [opt1] 174.x.x.x -> 67.x.x.x May 27 08:34:35 check_reload_status 634 rc.newwanip starting pppoe0 May 27 08:34:35 ppp 27297 [opt1] IFACE: Up event May 27 08:34:35 ppp 27297 [opt1] IFACE: Rename interface ng0 to pppoe0 May 27 08:34:35 ppp 27297 [opt1] IFACE: Add description "WAN2" May 27 08:34:36 php-fpm 16062 /rc.newwanip: rc.newwanip: Info: starting on pppoe0. May 27 08:34:36 php-fpm 16062 /rc.newwanip: rc.newwanip: on (IP address: 174.x.x.x) (interface: WAN2[opt1]) (real interface: pppoe0). May 27 08:34:38 php-fpm 16062 /rc.newwanip: MONITOR: WAN2_PPPOE is available now, adding to routing group WAN1WAN2 May 27 08:34:38 php-fpm 16062 67.x.x.x|174.x.x.x|WAN2_PPPOE|7.966ms|0.128ms|0.0%|online|none May 27 08:34:38 php-fpm 16062 /rc.newwanip: Gateway, NONE AVAILABLE May 27 08:34:38 php-fpm 16062 /rc.newwanip: IP Address has changed, killing states on former IP Address 174.x.x.x. May 27 08:34:38 php-fpm 16062 /rc.newwanip: Resyncing OpenVPN instances for interface WAN2. May 27 08:34:38 php-fpm 16062 /rc.newwanip: Creating rrd update script May 27 08:34:38 php-fpm 16062 /rc.newwanip: Netgate pfSense Plus package system has detected an IP change or dynamic WAN reconnection - 174.x.x.x.x -> 174.x.x.x - Restarting packages. May 27 08:34:38 check_reload_status 634 Starting packages May 27 08:34:38 check_reload_status 634 Reloading filter
# Gateways GWWAN_DHCP = " route-to ( ix3 73.x.x.1 ) " GWWAN2_PPPOE = " route-to ( pppoe0 67.x.x.10 ) " GWWAN1WAN2 = " route-to { ( ix3 73.x.x.1 ) } " GWWAN2WAN1 = " route-to { ( pppoe0 67.x.x.10 ) } "
-
Interesting. I'm curious why it works OK for me here in my lab without that change.
Without knowing more about why it helps I'm hesitant to commit the change as-is. Though it should be reasonably safe from what I can see.
-
@jimp - As you can see I had accidently left the + in
+filter_configure();
Funny thing is it still resolved the issue. Not sure if it still ran the command, or if it was removing the other code that allowed it to work. I took out the + and tested, still works.
-
The
+
in that context is fairly harmless, it would affect the return value of the function but the return value isn't checked so it's just tossed out.I made https://redmine.pfsense.org/issues/13228 to track this for the next release. For now you can add that in a system patches package entry and set it to auto-apply.
-
Thanks @jimp - Will do. Let me know if you need any more testing, or can think of a way to further troubleshoot / debug.
-
Looking through the code, It must be matching this section:
if (!is_ipaddr($oldip) || ($curwanip != $oldip) || file_exists("{$g['tmp_path']}/{$interface}_upstart4") || (!is_ipaddrv4($config['interfaces'][$interface]['ipaddr']) && ($config['interfaces'][$interface]['ipaddr'] != 'dhcp'))) {
Cause we get this in the log, which is from below that if statement:
May 27 08:34:38 php-fpm 16062 /rc.newwanip: IP Address has changed, killing states on former IP Address 174.x.x.107.
The filter reload that is called then is:
filter_configure_sync();
Since we are matching that section, we would skip this else and not actually do the filter_configure():
} else { /* signal filter reload */ filter_configure();
Is the filter_configure_sync(); functionally the same as the filter_configure(); we manually put in?
-
Both methods end up running
filter_configure_sync()
but one is directly running the function and the other sends the event through the event queue which can introduce a little delay before it gets executed. -
as a test, in rc.newwanip, I put it all back to default, then changed line 222 from
filter_configure_sync();
to
filter_configure();
Leaving the else at the bottom, and it also funtions correctly. In the logs I see the filter reloading much sooner, but it still works. So i'm not sure it's a timing issue. Maybe another issue it has with the filter_configure_sync(); command
check_reload_status 634 Reloading filter
-
IIIRC it has to call
filter_configure_sync()
on that code path because some of the functions called after it need the data it updates to be done before they run. When usingfilter_configure()
it may happen after which leads to other problems.Doing it again at the end is probably the safest way to handle it without (re)introducing other hard to chase down problems.
-