Traffic is not re-routed over secondary internet connection (PPPOE), once it returns from being down.
-
@jimp - Thanks for the additional info. I made the suggested change:
		unlink_if_exists("{$g['tmp_path']}/{$interface}_upstart6");
	}
}
+filter_configure();
?>
And I'm super happy to report that the filter did reload, and WAN2 came up in the gateways. Clients are routing NEW traffic out WAN2 as expected now!
May 27 08:34:35 ppp 27297 [opt1] IPCP: LayerUp
May 27 08:34:35 ppp 27297 [opt1] 174.x.x.x -> 67.x.x.x
May 27 08:34:35 check_reload_status 634 rc.newwanip starting pppoe0
May 27 08:34:35 ppp 27297 [opt1] IFACE: Up event
May 27 08:34:35 ppp 27297 [opt1] IFACE: Rename interface ng0 to pppoe0
May 27 08:34:35 ppp 27297 [opt1] IFACE: Add description "WAN2"
May 27 08:34:36 php-fpm 16062 /rc.newwanip: rc.newwanip: Info: starting on pppoe0.
May 27 08:34:36 php-fpm 16062 /rc.newwanip: rc.newwanip: on (IP address: 174.x.x.x) (interface: WAN2[opt1]) (real interface: pppoe0).
May 27 08:34:38 php-fpm 16062 /rc.newwanip: MONITOR: WAN2_PPPOE is available now, adding to routing group WAN1WAN2
May 27 08:34:38 php-fpm 16062 67.x.x.x|174.x.x.x|WAN2_PPPOE|7.966ms|0.128ms|0.0%|online|none
May 27 08:34:38 php-fpm 16062 /rc.newwanip: Gateway, NONE AVAILABLE
May 27 08:34:38 php-fpm 16062 /rc.newwanip: IP Address has changed, killing states on former IP Address 174.x.x.x.
May 27 08:34:38 php-fpm 16062 /rc.newwanip: Resyncing OpenVPN instances for interface WAN2.
May 27 08:34:38 php-fpm 16062 /rc.newwanip: Creating rrd update script
May 27 08:34:38 php-fpm 16062 /rc.newwanip: Netgate pfSense Plus package system has detected an IP change or dynamic WAN reconnection - 174.x.x.x.x -> 174.x.x.x - Restarting packages.
May 27 08:34:38 check_reload_status 634 Starting packages
May 27 08:34:38 check_reload_status 634 Reloading filter
# Gateways
GWWAN_DHCP = " route-to ( ix3 73.x.x.1 ) "
GWWAN2_PPPOE = " route-to ( pppoe0 67.x.x.10 ) "
GWWAN1WAN2 = " route-to { ( ix3 73.x.x.1 ) } "
GWWAN2WAN1 = " route-to { ( pppoe0 67.x.x.10 ) } "
-
Interesting. I'm curious why it works OK for me here in my lab without that change.
Without knowing more about why it helps I'm hesitant to commit the change as-is. Though it should be reasonably safe from what I can see.
-
@jimp - As you can see, I had accidentally left the + in:
+filter_configure();
Funny thing is it still resolved the issue. Not sure if it still ran the command, or if it was removing the other code that allowed it to work. I took out the + and tested, still works.
-
The + in that context is fairly harmless. It would affect the return value of the function, but the return value isn't checked, so it's just tossed out.
I made https://redmine.pfsense.org/issues/13228 to track this for the next release. For now you can add that change as a System Patches package entry and set it to auto-apply.
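To illustrate why the stray + is harmless: in PHP a leading unary plus only applies to the value the call returns, so the function still runs and the converted result is discarded. A minimal sketch, assuming a hypothetical stand-in named fake_filter_configure() (not the real pfSense function, whose return value may differ):

```php
<?php
// Hypothetical stand-in for filter_configure(); it only records that it ran.
$calls = 0;

function fake_filter_configure(): int
{
    global $calls;
    $calls++;      // side effect: proves the function body executed
    return 0;      // assumed numeric return value, for illustration only
}

// As an expression statement, the unary plus is applied to the return
// value and the result is then thrown away.
+fake_filter_configure();

// The same call without the stray plus.
fake_filter_configure();

echo "calls: {$calls}\n";
```

Both statements invoke the function, so the counter ends at 2 either way.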
-
Thanks @jimp - Will do. Let me know if you need any more testing, or can think of a way to further troubleshoot / debug.
-
Looking through the code, it must be matching this section:
if (!is_ipaddr($oldip) || ($curwanip != $oldip) || file_exists("{$g['tmp_path']}/{$interface}_upstart4") || (!is_ipaddrv4($config['interfaces'][$interface]['ipaddr']) && ($config['interfaces'][$interface]['ipaddr'] != 'dhcp'))) {
Because we get this in the log, which comes from below that if statement:
May 27 08:34:38 php-fpm 16062 /rc.newwanip: IP Address has changed, killing states on former IP Address 174.x.x.107.
The filter reload that is called then is:
filter_configure_sync();
Since we are matching that section, we would skip this else and not actually do the filter_configure():
} else {
	/* signal filter reload */
	filter_configure();
Is the filter_configure_sync(); functionally the same as the filter_configure(); we manually put in?
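The code path described above can be sketched as a small model. This is only an illustration under assumptions: every function here is a hypothetical stand-in that logs what it would do, and none of it is the real rc.newwanip code.

```php
<?php
// Simplified model of the branch discussed above. All names are
// hypothetical stand-ins; this is not the real rc.newwanip code.
$log = [];

function model_filter_configure_sync(array &$log): void
{
    $log[] = 'filter_configure_sync';
}

function model_filter_configure(array &$log): void
{
    $log[] = 'filter_configure';
}

function model_newwanip(bool $ipChanged, bool $reloadAtEnd, array &$log): void
{
    if ($ipChanged) {
        // Path taken in the log above: "IP Address has changed, killing states ..."
        $log[] = 'kill states on former IP';
        model_filter_configure_sync($log);
        $log[] = 'resync VPNs, restart packages';
    } else {
        /* signal filter reload */
        model_filter_configure($log);
    }

    if ($reloadAtEnd) {
        // The workaround: one more reload request after everything else.
        model_filter_configure($log);
    }
}

model_newwanip(true, true, $log);
print_r($log);
```

With the IP-changed branch taken, the else branch is skipped, so the only filter_configure() call is the one added at the end.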
-
Both methods end up running filter_configure_sync(), but one runs the function directly and the other sends the event through the event queue, which can introduce a little delay before it gets executed.
-
As a test, I put rc.newwanip back to default, then changed line 222 from
filter_configure_sync();
to
filter_configure();
With the else at the bottom left in place, it also functions correctly. In the logs I see the filter reloading much sooner, but it still works, so I'm not sure it's a timing issue. Maybe there is another issue with the filter_configure_sync(); call:
check_reload_status 634 Reloading filter
-
IIRC it has to call filter_configure_sync() on that code path because some of the functions called after it need the data it updates to be in place before they run. When using filter_configure(), the reload may happen afterward, which leads to other problems.
Doing it again at the end is probably the safest way to handle it without (re)introducing other hard-to-chase-down problems.
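The ordering concern can be shown with a toy model of "run now" versus "queue for later". It is a sketch under assumptions: the closures, the counter, and the queue are hypothetical and only imitate the behaviour described here, not the actual pfSense event handling.

```php
<?php
// Toy model of "run now" versus "queue for later". Names are hypothetical
// and the queue only imitates the behaviour described above.
$rulesVersion = 0;   // stands in for the data the filter reload updates
$queue = [];         // pending events, processed after the script's own steps
$seen = [];          // rules version each later step observed

$reloadNow = function () use (&$rulesVersion): void {
    $rulesVersion++;                   // direct call: takes effect immediately
};
$reloadQueued = function () use (&$queue, $reloadNow): void {
    $queue[] = $reloadNow;             // deferred: takes effect when the queue drains
};
$laterStep = function (string $name) use (&$seen, &$rulesVersion): void {
    $seen[$name] = $rulesVersion;      // a step that depends on fresh rules
};

$reloadNow();                  // synchronous reload on the IP-changed path
$laterStep('after sync');      // sees version 1
$reloadQueued();               // extra reload requested at the end
$laterStep('after queued');    // still sees version 1: the event has not run yet

foreach ($queue as $event) {   // the event queue is processed afterwards
    $event();
}
$laterStep('after drain');     // sees version 2

print_r($seen);
```

Keeping the synchronous call where it is and adding a queued reload at the end gives later steps fresh data while still forcing one final reload.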
-
@bnetworker
Can you provide steps to reproduce this issue?
I am asking because I have had this issue several times but could not find what triggers it. It does not happen every time the PPPoE connection goes down, even when it's an ISP failure or similar.
-
The way I can trigger it (100% of the time) here is to drop (unplug) the DSL line going into the modem/bridge, then plug it back in. It will renegotiate, and then I'm stuck with the blank gateway. As you said, if you drop Ethernet (from the modem/bridge to the Netgate box), it functions correctly.
-
@bnetworker
I have plain PPPoE, no modem, just an Ethernet cable. I'll try some other methods tomorrow, I hope, and let you know.
-
@bnetworker
No, I cannot reproduce this on 22.05.b.20220524.1701. What build do you have now?
-
22.05.b.20220524.0600, but I've had this issue on every recent version. So it may be a difference in config that is causing the issue. My setup is:
DSL -> Modem in Bridge Mode (Carrier VLAN setup here) -> pfSense (Auth here)
-
@bnetworker
How did you configure the default gateway? Mine is configured as a group, using tiers to prioritize which one is the primary.
-
@w0w -
Yes, the overall default gateway is my primary gateway group, WAN1WAN2, with WAN1 having Tier 1 priority and WAN2 Tier 2.
But in the firewall rules for INSIDE, I have explicitly set the WAN1WAN2 gateway group as the gateway. The Guest network explicitly uses WAN2WAN1.
Now that the filters are reloading at the end of the rc.newwanip, I've had zero failover issues. It's been working great.
-
@bnetworker
I have a similar configuration, and whatever I've tried, I don't have this re-routing issue on the latest build without any patching.
-
@w0w - It would be interesting to see whether "Filter Reload" shows up in your logs when your PPPoE returns. Mine does not, until I put in the manual workaround.
-
@bnetworker
Looks like yes... but not sure...
When it's going down:
Jun 4 07:11:46 php-fpm 61963 /rc.openvpn: MONITOR: WAN_PPPOE has packet loss, omitting from routing group WAN_FAIL_BACK
Jun 4 07:11:46 check_reload_status 47693 Reloading filter
When it's UP:
Jun 4 07:17:25 check_reload_status 47693 Reloading filter
Jun 4 07:17:25 check_reload_status 47693 Restarting OpenVPN tunnels/interfaces
Jun 4 07:17:25 check_reload_status 47693 Restarting IPsec tunnels
Jun 4 07:17:25 check_reload_status 47693 updating dyndns HENETV6_TUNNELV6
Jun 4 07:17:25 rc.gateway_alarm 16761 >>> Gateway alarm: HENETV6_TUNNELV6 (Addr:x001:xx0:27:191::1 Alarm:1 RTT:0.000ms RTTsd:0.000ms Loss:100%)
Jun 4 07:17:23 php-fpm 93505 /rc.newwanip: Removing static route for monitor 8.8.8.8 and adding a new route through x0.0.x00.1
Jun 4 07:17:22 php-fpm 93505 /rc.newwanip: Default gateway setting Interface HENETV6_TUNNELV6 Gateway as default.
Jun 4 07:17:22 php-fpm 93505 ---xxx.xxx.xxx.xxx---|---xxx.xxx.xxx.xxx---|WAN_PPPOE|0.254ms|0.022ms|0.0%|online|none
Jun 4 07:17:22 php-fpm 93505 /rc.newwanip: MONITOR: WAN_PPPOE is available now, adding to routing group WAN_FAIL_BACK
Jun 4 07:17:22 kernel gif0: link state changed to UP
Jun 4 07:17:22 kernel gif0: link state changed to DOWN
*********
Jun 4 07:17:20 ppp 17338 [wan] IFACE: Up event
Jun 4 07:17:20 check_reload_status 47693 rc.newwanip starting pppoe0
*********
This "Reloading filter" message appears several times, not just for PPPoE but also for IPv6 tunneling, IPsec, OpenVPN (Resyncing OpenVPN instances for interface WAN) and so on, and I have other "spam" in the logs too, such as Snort. So sometimes it's very difficult to understand exactly what happened.