Is this normal or is something broken? WAN failure with specific configuration.
-
Setup:
pfSense with no packages installed other than "RRD_Summary".
VPN client connection to a commercial VPN provider.
Unbound in forwarding mode to an external DNS provider.
Gateways are WAN_DHCP and OPT1_VPNV4:
- both gateways set to "disable gateway monitoring action" and "do not kill states on gateway failure"
- default gateway set to "WAN_DHCP"
Misc settings are set to: "Don't kill states from firewall itself" and "do not kill states on gateway failure".
"Do not create rules when gateway is down" is checked.
The behavior I am experiencing is that if I create a policy routing rule with WAN_DHCP as the gateway, pfSense becomes unstable.
Definition of unstable:
WAN drops off and refuses to route to the default gateway "WAN_DHCP" address. However, for a period of time other internet IPs are still accessible.
Eventually, all internet traffic stops, and ARP requests to the gateway show as incomplete.
Soon after that the web console and SSH become inaccessible.
Resolutions to bring it back online:
One or more of these work, depending on how far things have progressed:
- navigate to the WAN interface and, without changing anything, simply save and apply the settings
- restart the DHCP client via SSH with killall and dhclient <interface> commands
- ifconfig down and ifconfig up the WAN interface (sometimes works with the LAN interface as well)
- reboot the pfSense box
If I remove all policy routing rules with WAN_DHCP as the gateway, things are stable with no routing issues, and everything automatically recovers when my crappy ISP drops stuff.
This appears to be a routing issue, as I am using IP addresses rather than DNS names in all of my testing.
However, I am at best a novice at network administration.
I have currently figured out how to configure a stable environment, but I am curious whether this is expected behavior or if there is some sort of bug.
Anyone have any thoughts related to this?
I am happy to supply additional information if needed.
-
Anything logged when this starts to happen? Check the System, Routing and gateway logs.
What pfSense version are you running?
Steve
-
@stephenw10
The monitoring log shows packet loss on both gateways as it implodes. There is nothing else in the logs at all; what you do see is from minutes or hours before the event.
As the event progresses in severity, you get entries about dpinger not having routes because ARP is incomplete, but I believe that to be a symptom of an issue that starts earlier.
You see normal entries as it restarts after one of the above actions, but nothing before the action that stands out.
I checked the routes and they are all still there (at least until I lose SSH and the web console, then who knows).
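In case it helps anyone repeat that check, here is a rough Python sketch of how saved output from netstat -rn and arp -an could be scanned for the default route and for incomplete ARP entries. The sample text, interface names and addresses are made up for illustration, and the line layouts are assumed to match what FreeBSD prints, so treat it as a starting point only.

```python
import re

def default_routes(netstat_output):
    """Return (gateway, interface) pairs for every 'default' route line."""
    routes = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed column order: Destination, Gateway, Flags, Netif
        if len(fields) >= 4 and fields[0] == "default":
            routes.append((fields[1], fields[3]))
    return routes

# Assumed line shape: "? (192.0.2.1) at <mac or (incomplete)> on <iface> ..."
ARP_LINE = re.compile(r"\((?P<ip>[^)]+)\) at (?P<mac>\S+) on (?P<iface>\S+)")

def incomplete_arp(arp_output):
    """Return (ip, interface) pairs whose ARP entry is still unresolved."""
    unresolved = []
    for line in arp_output.splitlines():
        match = ARP_LINE.search(line)
        if match and match.group("mac") == "(incomplete)":
            unresolved.append((match.group("ip"), match.group("iface")))
    return unresolved

# Hypothetical sample output, not taken from the affected box.
SAMPLE_NETSTAT = """\
Destination        Gateway            Flags     Netif Expire
default            192.0.2.1          UGS        igc0
192.0.2.0/24       link#1             U          igc0
"""

SAMPLE_ARP = """\
? (192.0.2.1) at (incomplete) on igc0 expired [ethernet]
? (192.168.1.10) at 00:11:22:33:44:55 on igc1 expires in 1187 seconds [ethernet]
"""

if __name__ == "__main__":
    print("default routes:", default_routes(SAMPLE_NETSTAT))
    print("incomplete ARP:", incomplete_arp(SAMPLE_ARP))
```

Running it against output captured every few seconds during a failure would show whether the default route is still present at the moment the gateway's ARP entry goes incomplete.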
---- server information ----
Version 24.03-RELEASE (amd64)
built on Mon May 13 7:17:00 CDT 2024
FreeBSD 15.0-CURRENT
CPU Type: 13th Gen Intel(R) Core(TM) i5-1340P
16 CPUs
AES-NI CPU Crypto: Yes (active)
IPsec-MB Crypto: Yes (inactive)
QAT Crypto: No
Load average: 0.05, 0.06, 0.06
CPU usage: 1%
Memory usage: 2% of 65022 MiB
SWAP usage: 0% of 1024 MiB
-
Hmm, so the gateways disappear from the ARP table?
What are you using for monitor IPs on the gateways?
How exactly are you applying the policy routing?
-
Initially, the gateway IP's ARP entry is fine, then eventually it goes to "incomplete".
The immediate indication that it is going to fail is that dpinger stops getting replies for WAN_DHCP. Sometimes the VPN will reconnect and dpinger will go green on the VPN while it is red on WAN_DHCP. Then eventually it all fails.
Using the ISP gateway as the monitoring IP is the fastest reporter of "there are about to be issues", but I also tried the ISP DNS and 8.8.8.8.
Failure was slower to be reported with these, but the end result was unchanged.
I switched around trying to find a monitoring IP that was stable; I believe it is currently 8.8.8.8 on WAN_DHCP and the internal 10.x.x.x address of the VPN DNS server for OPT1_VPNV4.
No monitoring IP had any real effect on stability; it just changed how quickly a report of impending failure surfaced.
Policy routing:
- traffic to specific destination hosts is routed out WAN_DHCP
- traffic from specific source hosts on LAN is routed out WAN_DHCP
- then I redirect traffic from specific ports or from specific LAN hosts out through the VPN (OPT1_VPNV4) only. There is a drop rule for these ports/hosts afterwards so they can't go directly out WAN_DHCP; this is the reason for enabling "Do not create rules when gateway is down".
- a DNS rule to route to OPT1_VPNV4 from the Pi-hole server on LAN, with a rule directly below to allow out WAN_DHCP (so I can get initial DNS functionality direct out until the VPN comes online)
- the last rule in the list routes all remaining traffic out OPT1_VPNV4 (catch-all LAN to WAN rule)
These are the same policy routing rules I am using now with the stable configuration. The only difference is that I do not specify a gateway on the WAN_DHCP rules; I leave it as default and it is stable.
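To make the rule order easier to reason about, here is a small Python model of how I understand the LAN rules to be evaluated: top down, first match wins, and a rule whose gateway is down is either skipped entirely (the checked setting) or kept without its gateway. This is only a sketch under those assumptions; the host addresses and rule names are hypothetical and it is not how pf itself is implemented.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rule:
    name: str
    action: str                    # "pass" or "block"
    src: Optional[str] = None      # None matches any source host
    dst: Optional[str] = None      # None matches any destination host
    port: Optional[int] = None     # None matches any destination port
    gateway: Optional[str] = None  # None means "follow the routing table"

def evaluate(rules, packet, gateways_up, skip_rules_when_down=True):
    """Return (action, gateway, rule name) for the first matching rule."""
    for rule in rules:
        if rule.src is not None and rule.src != packet["src"]:
            continue
        if rule.dst is not None and rule.dst != packet["dst"]:
            continue
        if rule.port is not None and rule.port != packet["port"]:
            continue
        if rule.gateway and rule.gateway not in gateways_up:
            if skip_rules_when_down:
                continue  # the whole rule is left out of the ruleset
            # otherwise the rule stays but loses its gateway
            return (rule.action, None, rule.name)
        return (rule.action, rule.gateway, rule.name)
    return ("block", None, "default deny")

# Hypothetical addresses standing in for the real hosts.
RULES = [
    Rule("dest host via WAN", "pass", dst="203.0.113.10", gateway="WAN_DHCP"),
    Rule("LAN host via WAN", "pass", src="192.168.1.20", gateway="WAN_DHCP"),
    Rule("VPN-only host via VPN", "pass", src="192.168.1.30", gateway="OPT1_VPNV4"),
    Rule("VPN-only host drop", "block", src="192.168.1.30"),
    Rule("Pi-hole DNS via VPN", "pass", src="192.168.1.2", port=53, gateway="OPT1_VPNV4"),
    Rule("Pi-hole DNS direct", "pass", src="192.168.1.2", port=53),
    Rule("catch-all via VPN", "pass", gateway="OPT1_VPNV4"),
]

if __name__ == "__main__":
    vpn_host = {"src": "192.168.1.30", "dst": "198.51.100.7", "port": 443}
    print(evaluate(RULES, vpn_host, {"WAN_DHCP", "OPT1_VPNV4"}))
    print(evaluate(RULES, vpn_host, {"WAN_DHCP"}))
    print(evaluate(RULES, vpn_host, {"WAN_DHCP"}, skip_rules_when_down=False))
```

With the VPN down, the VPN-only host falls through to the drop rule only when the skip setting is on; with it off, the same traffic would pass out the default route, which is the leak the drop rule is meant to prevent.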
-
Something new I see in my logs now that I did not see with the specific WAN_DHCP gateway on the rules:
I see a bunch of these, which don't seem to be a problem, just new.
Sharing in case it means anything.
Jul 16 09:08:12 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
Jul 16 09:08:11 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
Jul 16 09:08:11 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
Jul 16 09:08:10 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
Jul 16 09:08:10 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
Jul 16 09:08:09 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
Jul 16 09:08:09 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
Jul 16 09:08:08 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
-
Hmm, those are all rules on the LAN I assume?
And with the WAN policy rules in place you still have a default route via the WAN too?
-
@stephenw10
Yes, all LAN rules.
Yes, the default route was changed from automatic to WAN_DHCP, just like it is now.
And if I run netstat -rn I can clearly see that the WAN interface is the default route.
The WAN tab has just one rule, opening an external port inbound.
(There is no specific allow rule for outbound traffic, but traffic evidently does go out, so I assume some sort of automatic rule is involved.)
No floating rules.
One rule on OPT1 allowing OPT1 address, any port, any destination, to gateway OPT1_VPNV4.
-
Hmm, that duplicate echo seems odd. This feels like some upstream ARP issue but I have no idea how policy routing could cause that.
I would try to replicate it, then check the states at the time to make sure traffic is still being sent out of the correct interface.
If nothing obvious appears there, then run a pcap on WAN to see what (if anything) is leaving there.
-
@stephenw10 the duplicate echo is with the current, stable configuration, the one that works. I believe it is an ISP issue.
When I had the WAN gateway on the rules I never actually saw a duplicate reply, but then again things were unstable and just occasionally crashed.
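If it is useful for comparing the two configurations, a quick Python sketch like the one below could tally the duplicate replies per gateway and per second from an exported gateway log. The regular expression is an assumption based only on the log lines quoted earlier in this thread, so it may need adjusting for other log formats.

```python
import re
from collections import Counter

# Pattern assumed from lines such as:
# "Jul 16 09:08:12 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received"
DUP = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+dpinger\s+\d+\s+"
    r"(?P<gw>\S+)\s+(?P<ip>\S+):\s+duplicate echo reply received"
)

def count_duplicates(lines):
    """Return (per_target, per_second) counters of duplicate echo replies."""
    per_target = Counter()
    per_second = Counter()
    for line in lines:
        match = DUP.match(line.strip())
        if match:
            per_target[(match.group("gw"), match.group("ip"))] += 1
            per_second[match.group("ts")] += 1
    return per_target, per_second

# The eight lines posted earlier in the thread.
SAMPLE_LOG = """\
Jul 16 09:08:12 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
Jul 16 09:08:11 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
Jul 16 09:08:11 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
Jul 16 09:08:10 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
Jul 16 09:08:10 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
Jul 16 09:08:09 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
Jul 16 09:08:09 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
Jul 16 09:08:08 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
""".splitlines()

if __name__ == "__main__":
    targets, seconds = count_duplicates(SAMPLE_LOG)
    for (gateway, ip), count in targets.items():
        print(gateway, ip, count)
    for timestamp, count in sorted(seconds.items()):
        print(timestamp, count)
```

A steady one or two duplicates per second on a single monitor IP would at least be consistent with something upstream replaying replies, rather than with anything in the rule set.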