Netgate Discussion Forum

    Is this normal or is something broken? wan failure with specific configuration.

    General pfSense Questions
    11 Posts 2 Posters 376 Views
    • M
      mikek
      last edited by mikek

      setup:
      pfSense with no packages installed other than "RRD_Summary".
      VPN client connection to a commercial VPN provider.
      Unbound in forwarding mode to an external DNS provider.

      gateways are WAN_DHCP and OPT1_VPNV4

      • both gateways set to "disable gateway monitoring action" and "do not kill states on gateway failure"
      • default gateway set to "WAN_DHCP"

      Misc settings are set to: "Don't kill states from firewall itself" and "do not kill states on gateway failure"

      "Do not create rules when gateway is down" is checked on.

      The behavior I am experiencing is that if I create a policy routing rule with WAN_DHCP as the gateway, pfSense becomes unstable.
      By unstable I mean:
      The WAN drops off and pfSense refuses to route to the default gateway (WAN_DHCP) address. However, for a period of time other internet IPs are still accessible.
      Eventually, all internet traffic stops and the ARP entry for the gateway shows as incomplete.
      Soon after that, the web console and SSH become inaccessible.

      Resolutions to bring it back online:
      One or more of these work, depending on how far the failure has progressed.

      1. Navigate to the WAN interface and, without changing anything, save and apply the settings.
      2. Restart the DHCP client via SSH with killall and dhclient <interface>.
      3. ifconfig down and ifconfig up the WAN interface (sometimes works with the LAN interface as well).
      4. Reboot the pfSense box.

      If I remove all policy routing rules with WAN_DHCP as the gateway,
      things are stable, with no routing issues, and pfSense recovers automatically when my crappy ISP drops stuff.

      This appears to be a routing issue, as I am using IP addresses in all of my testing as opposed to DNS names.
      However, I am at best a novice at network administration.

      I have currently figured out how to configure a stable environment, but I am curious whether this is expected behavior or some sort of bug.

      Anyone have any thoughts related to this?
      I am happy to supply additional information if needed.

      • stephenw10S
        stephenw10 Netgate Administrator
        last edited by

        Anything logged when this starts to happen? Check the System, Routing and gateway logs.

        What pfSense version are you running?

        Steve

          • M
            mikek @stephenw10
            last edited by

            @stephenw10
            The monitoring log reports packet loss on both gateways as it implodes. Other than that there is nothing in the logs at all; the most recent entries are from minutes or hours before the event.

            As the event progresses in severity, you get entries related to dpinger not having routes due to ARP being incomplete, but I believe that to be a symptom of the issue that starts earlier.

            You see the normal entries as it restarts from one of the above actions. Nothing before the action stands out, though.

            I checked the routes and they are all still there (at least until I lose SSH and the web console, then who knows).

            ---- server information ----
            Version 24.03-RELEASE (amd64)
            built on Mon May 13 7:17:00 CDT 2024
            FreeBSD 15.0-CURRENT

            CPU Type 13th Gen Intel(R) Core(TM) i5-1340P
            16 CPUs
            AES-NI CPU Crypto: Yes (active)
            IPsec-MB Crypto: Yes (inactive)
            QAT Crypto: No

            Load average: 0.05, 0.06, 0.06
            CPU usage: 1%
            Memory usage: 2% of 65022 MiB
            SWAP usage: 0% of 1024 MiB

            • stephenw10S
              stephenw10 Netgate Administrator
              last edited by

              Hmm, so the gateways disappear from the ARP table?

              What are you using for monitor IPs on the gateways?

              How exactly are you applying the policy routing?

              • M
                mikek
                last edited by

                Initially, the ARP entry for the gateway IP is fine, then eventually it goes to "incomplete".
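
One way to catch the moment the gateway entry flips to "incomplete" is to save the output of arp -an periodically and scan it. The following is a minimal sketch, assuming FreeBSD-style arp -an output; the addresses and interface names (igb0, igb1) in the sample are placeholders, not values from this thread.

```python
import re

# Matches FreeBSD-style `arp -an` lines such as:
#   ? (192.0.2.1) at (incomplete) on igb0 expired [ethernet]
ARP_LINE = re.compile(r"\((?P<ip>[0-9.]+)\) at (?P<mac>\S+) on (?P<iface>\S+)")


def incomplete_arp_entries(arp_output: str) -> list[tuple[str, str]]:
    """Return (ip, interface) pairs whose ARP entry is unresolved."""
    entries = []
    for line in arp_output.splitlines():
        match = ARP_LINE.search(line)
        if match and match.group("mac") == "(incomplete)":
            entries.append((match.group("ip"), match.group("iface")))
    return entries


# Hypothetical sample output; not captured from the system in this thread.
SAMPLE = """\
? (192.0.2.1) at (incomplete) on igb0 expired [ethernet]
? (192.168.1.1) at 00:11:22:33:44:55 on igb1 permanent [ethernet]
"""

print(incomplete_arp_entries(SAMPLE))
```

If the WAN gateway address shows up in the returned list while the link is still up, that would line up with the failure described here.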

                The immediate indication that it is going to fail is that dpinger stops getting replies for WAN_DHCP. Sometimes the VPN will reconnect and dpinger will go green on the VPN while it is red on WAN_DHCP. Then eventually it all fails.

                Using the ISP gateway as the monitoring IP is the fastest reporter of "there are about to be issues", but I also tried the ISP DNS and 8.8.8.8.
                Failure was slower to be reported with these, but the end result was unchanged.

                I switched around trying to find a monitoring IP that was stable. I believe it is currently set to 8.8.8.8 on WAN_DHCP and the internal 10.x.x.x address of the VPN DNS server for OPT1_VPNV4.

                No monitoring IP had any real effect on stability; it just changed how quickly a report of impending failure surfaced.

                Policy routing:
                traffic to specific destination hosts is routed out WAN_DHCP
                and
                traffic from specific source hosts on LAN is routed out WAN_DHCP.

                Then I redirect traffic from specific ports or from specific LAN hosts out through the VPN (OPT1_VPNV4) only. There is a drop rule for these ports/hosts afterwards so they can't go directly out WAN_DHCP; this is the reason for enabling "Do not create rules when gateway is down".

                There is a DNS rule to route to OPT1_VPNV4 from the Pi-hole server on LAN, with a rule directly below it to allow out WAN_DHCP (so I can get initial DNS functionality directly out until the VPN comes online).

                The last rule in the list routes all remaining traffic out OPT1_VPNV4 (catch-all LAN to WAN rule).

                These are the same policy routing rules I am using now with the stable configuration. The only difference is that I do not explicitly set the gateway on the WAN_DHCP rules; I leave it as default and it is stable.
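
The rule order described above can be reasoned about with a small model. This is only a sketch of first-match, top-down evaluation with rules skipped when their gateway is down (which is how "Do not create rules when gateway is down" is described to behave); it is not how pf is implemented, and every address and rule name below is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    name: str
    action: str                    # "pass" or "block"
    gateway: Optional[str] = None  # None: follow the routing table
    src: Optional[str] = None      # None: any source
    dst: Optional[str] = None      # None: any destination
    dport: Optional[int] = None    # None: any port


def first_match(rules, src, dst, dport, gateways_up):
    """Return the first matching rule, skipping rules whose gateway is down."""
    for rule in rules:
        if rule.gateway is not None and rule.gateway not in gateways_up:
            continue  # models "Do not create rules when gateway is down"
        if rule.src is not None and rule.src != src:
            continue
        if rule.dst is not None and rule.dst != dst:
            continue
        if rule.dport is not None and rule.dport != dport:
            continue
        return rule
    return Rule("implicit default deny", "block")


# Hypothetical addresses standing in for the hosts described in the post.
LAN_RULES = [
    Rule("dest host via WAN", "pass", "WAN_DHCP", dst="203.0.113.10"),
    Rule("source host via WAN", "pass", "WAN_DHCP", src="192.168.1.20"),
    Rule("VPN-only host", "pass", "OPT1_VPNV4", src="192.168.1.30"),
    Rule("drop VPN-only host", "block", src="192.168.1.30"),
    Rule("Pi-hole DNS via VPN", "pass", "OPT1_VPNV4", src="192.168.1.2", dport=53),
    Rule("Pi-hole DNS via WAN", "pass", "WAN_DHCP", src="192.168.1.2", dport=53),
    Rule("catch-all via VPN", "pass", "OPT1_VPNV4"),
]

both_up = {"WAN_DHCP", "OPT1_VPNV4"}
vpn_down = {"WAN_DHCP"}

print(first_match(LAN_RULES, "192.168.1.30", "198.51.100.7", 443, both_up).name)
print(first_match(LAN_RULES, "192.168.1.30", "198.51.100.7", 443, vpn_down).name)
print(first_match(LAN_RULES, "192.168.1.2", "198.51.100.53", 53, vpn_down).name)
```

In this model the VPN-only host falls through to the drop rule when the VPN is down, and the Pi-hole falls back to the WAN rule, which matches the intent described in the post.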

                • M
                  mikek @mikek
                  last edited by

                  Something new I see in my logs now that I did not see with the specific WAN_DHCP gateway on the rules:
                  I see a bunch of these, which don't seem to be a problem, just new.
                  Sharing in case it means anything.

                  Jul 16 09:08:12 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
                  Jul 16 09:08:11 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
                  Jul 16 09:08:11 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
                  Jul 16 09:08:10 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
                  Jul 16 09:08:10 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
                  Jul 16 09:08:09 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
                  Jul 16 09:08:09 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
                  Jul 16 09:08:08 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
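
To see how often the duplicates occur, the log lines can be tallied per gateway and second. This is a minimal sketch that assumes the line layout shown in the excerpt above; the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches lines in the format shown in the log excerpt above.
DUPLICATE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+dpinger\s+\d+\s+"
    r"(?P<gateway>\S+)\s+(?P<target>\S+):\s+duplicate echo reply received$"
)


def duplicates_per_second(log_text: str) -> Counter:
    """Count duplicate echo replies keyed by (gateway, timestamp)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = DUPLICATE.match(line.strip())
        if match:
            counts[(match.group("gateway"), match.group("stamp"))] += 1
    return counts


# The eight lines quoted in the post.
LOG = """\
Jul 16 09:08:12 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
Jul 16 09:08:11 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
Jul 16 09:08:11 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
Jul 16 09:08:10 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
Jul 16 09:08:10 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
Jul 16 09:08:09 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
Jul 16 09:08:09 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
Jul 16 09:08:08 dpinger 54030 WAN_DHCP 8.8.8.8: duplicate echo reply received
"""

for (gateway, stamp), count in sorted(duplicates_per_second(LOG).items()):
    print(gateway, stamp, count)
```

Comparing the per-second count against the probe interval configured on the gateway would show whether every probe is being answered twice or only some of them.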

                  • stephenw10S
                    stephenw10 Netgate Administrator
                    last edited by stephenw10

                    Hmm, those are all rules on the LAN I assume?

                    And with the WAN policy rules in place you still have a default route via the WAN too?

                    • M
                      mikek @stephenw10
                      last edited by mikek

                      @stephenw10
                      Yes, all LAN rules.

                      Yes, the default gateway was changed from automatic to WAN_DHCP, just like it is now.
                      And if I run netstat -rn I can clearly see that the WAN interface is the default route.
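
For repeated checks during a failure, the default route can be pulled out of saved netstat -rn output. A minimal sketch, assuming the usual FreeBSD column layout (Destination, Gateway, Flags, Netif); the sample table below is invented and the addresses are placeholders.

```python
def default_routes(netstat_output: str) -> list[tuple[str, str]]:
    """Return (gateway, interface) for every 'default' row in netstat -rn output."""
    routes = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Destination Gateway Flags Netif [Expire]
        if len(fields) >= 4 and fields[0] == "default":
            routes.append((fields[1], fields[3]))
    return routes


# Hypothetical sample; not taken from the system in this thread.
SAMPLE = """\
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            192.0.2.1          UGS        igb0
192.0.2.0/24       link#1             U          igb0
192.168.1.0/24     link#2             U          igb1
"""

print(default_routes(SAMPLE))
```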

                      The WAN rules have just one rule, opening an external port inbound
                      (no specific allow rule for outbound traffic), but traffic evidently does go out (I assume some sort of automatic rule is in effect).

                      No floating rules.

                      One rule on OPT1 allowing OPT1 address, any port, any destination, to gateway OPT1_VPNV4.

                      • stephenw10S
                        stephenw10 Netgate Administrator
                        last edited by

                        Hmm, that duplicate echo seems odd. This feels like some upstream ARP issue but I have no idea how policy routing could cause that.

                        I would try to replicate it, then check the states at the time to make sure traffic is still being sent out of the correct interface.

                        If nothing obvious appears there, then run a pcap on WAN to see what (if anything) is leaving there.
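
The state check could be sketched roughly as follows: save the output of pfctl -ss while the problem is happening and count, per interface, the states that involve a given test address. This assumes the common IPv4 line layout of pfctl -ss (interface, protocol, endpoints, state); the interface names and addresses in the sample are placeholders.

```python
from collections import Counter


def states_by_interface(pfctl_ss_output: str, address: str) -> Counter:
    """Count states per interface that involve the given IPv4 address."""
    counts = Counter()
    for line in pfctl_ss_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # Endpoints look like ip:port or (ip:port); strip the port and parentheses.
        hosts = {
            field.strip("()").rsplit(":", 1)[0]
            for field in fields[2:]
            if field not in ("->", "<-")
        }
        if address in hosts:
            counts[fields[0]] += 1
    return counts


# Hypothetical sample; not captured from the system in this thread.
SAMPLE = """\
igb1 tcp 192.168.1.50:51514 -> 198.51.100.7:443       ESTABLISHED:ESTABLISHED
igb0 tcp 192.0.2.10:40112 (192.168.1.50:51514) -> 198.51.100.7:443       ESTABLISHED:ESTABLISHED
ovpnc1 udp 10.8.0.2:5353 -> 10.8.0.1:53       MULTIPLE:SINGLE
"""

print(states_by_interface(SAMPLE, "198.51.100.7"))
```

If a destination that should leave via WAN only shows states on the VPN interface (or none on WAN at all), that would point at the policy rules rather than at something upstream.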

                        • M
                          mikek @stephenw10
                          last edited by mikek

                          @stephenw10 The duplicate echo is with the stable configuration, the current one that works. I believe it is an ISP issue.

                          When I had the WAN gateway on the rules, I never actually saw duplicate replies, but again things were unstable and occasionally crashed.

                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.