Netgate Discussion Forum

    Intermittent loss of WAN routing

    General pfSense Questions
      AGawthrope @michmoor

      @michmoor Hi. With 'Gateway Monitoring' enabled and 'Gateway Monitoring Action' disabled, I'm seeing the intermittent WAN routing issue described, i.e. generating an ARP Request appears to bring the reception of traffic on the WAN interface back to life.

      Andrew

        michmoor LAYER 8 Rebel Alliance @AGawthrope

        @AGawthrope What IP are you monitoring? Something on the internet, or your provider's gateway (a public IP, I would assume)?

        Firewall: NetGate,Palo Alto-VM,Juniper SRX
        Routing: Juniper, Arista, Cisco
        Switching: Juniper, Arista, Cisco
        Wireless: Unifi, Aruba IAP
        JNCIP,CCNP Enterprise

          AGawthrope @michmoor

          @michmoor The service provider's gateway: the one known to pfSense+ as its default gateway, obtained via DHCP.

            AGawthrope @AGawthrope

            A little further info from an analysis of the 'dpinger' log over the last six days. The problem has occurred each day and healed itself after 20 mins of outage. The times in the table are those from the log (i.e. 'Status|System Logs|System|Gateways'), so I'm not surprised that the difference between the first Alarm latency message and the corresponding Clear latency message isn't exactly 20 mins.

            Each outage lasts very close to 20 mins, i.e. the ARP refresh timeout.

            Edit: I should add that there were no alarm messages logged between the 1st and 3rd of June.

            [Screenshot 2024-06-06 at 17.44.12.png: table of dpinger alarm and clear times]
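The comparison in this post (outage length versus the 1200 sec ARP timeout) can be sketched as below. It is a minimal sketch, assuming BSD-syslog style timestamps such as "Jun 4 10:02:11" (the year is not in the stamp, so it is passed in) and a hypothetical 90 sec tolerance; the timestamps in the example are made up for illustration and are not taken from the screenshot.

```python
from datetime import datetime

ARP_MAX_AGE_S = 1200  # ARP table timeout configured on pfSense+ per this thread

def outage_seconds(alarm: str, clear: str, year: int = 2024) -> float:
    """Seconds between a dpinger Alarm latency line and its matching Clear latency line.

    Assumes BSD-syslog style stamps such as "Jun 4 10:02:11"; the year is not part
    of the stamp, so it is supplied separately (assumption).
    """
    fmt = "%Y %b %d %H:%M:%S"
    start = datetime.strptime(f"{year} {alarm}", fmt)
    end = datetime.strptime(f"{year} {clear}", fmt)
    return (end - start).total_seconds()

def near_arp_timeout(duration_s: float, max_age_s: int = ARP_MAX_AGE_S,
                     tolerance_s: int = 90) -> bool:
    """True if an outage lasted within tolerance_s (hypothetical value) of the ARP timeout."""
    return abs(duration_s - max_age_s) <= tolerance_s

# Hypothetical timestamps for illustration only:
d = outage_seconds("Jun 4 10:02:11", "Jun 4 10:22:05")
print(d, near_arp_timeout(d))  # 1194.0 True
```

The tolerance absorbs the fact that the logged Alarm/Clear lines only bracket the outage approximately, as noted above.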

              stephenw10 Netgate Administrator

              If you see outgoing traffic but no replies, that sounds like an ARP issue upstream. Perhaps something else is sometimes using your IP address.

              One thing you can try is setting net.link.ether.inet.max_age to something shorter than 1200 (seconds) and seeing what difference that makes.

                AGawthrope @stephenw10

                @stephenw10 Thank you. Possibly, but that feels a little esoteric. My earlier investigations looked at the provision of an IP address, and the ISP DHCP server always provides me with the same IPv4 address. If it were also providing it to another device, then I'd expect some variance from time to time.

                Would I be correct in thinking that I could set the max_age variable via the System Tunables page?

                Thanks
                Andrew

                  stephenw10 Netgate Administrator

                  Yes you can set that as a system tunable.

                  It doesn't have to be the ISP handing out your address via DHCP. Just some other device sending ARP packets with your IP. Potentially.
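One way to test the "another device sending ARP with your IP" idea is to look for an IP address claimed by more than one MAC address in captured ARP traffic. The sketch below is a minimal illustration, assuming the ARP sender IP/MAC pairs have already been extracted from a capture taken on the WAN side; the input format and addresses are hypothetical (documentation ranges).

```python
from collections import defaultdict

def find_ip_conflicts(observations):
    """Map each IP claimed by more than one MAC to the set of MACs seen for it.

    observations: iterable of (sender_ip, sender_mac) pairs taken from captured
    ARP packets (hypothetical input format). MACs are compared case-insensitively.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        macs_by_ip[ip].add(mac.lower())
    # Only IPs with two or more distinct MACs indicate a possible duplicate address.
    return {ip: macs for ip, macs in macs_by_ip.items() if len(macs) > 1}

obs = [
    ("192.0.2.10", "00:00:5E:00:53:01"),  # our WAN address, our MAC
    ("192.0.2.1", "00:00:5e:00:01:01"),   # ISP virtual gateway
    ("192.0.2.10", "00:00:5e:00:53:99"),  # same IP from a different MAC
]
print(find_ip_conflicts(obs))
```

An empty result over a capture that spans an outage would count against the duplicate-address theory for that window.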

                    AGawthrope @stephenw10

                    @stephenw10 Thanks for that, understood. Getting anything sensible from the ISP regarding duplicate IPs is a non-starter; their technical 'support' haven't even heard of IPv6! I'll hold off changing the ARP timeout for a few days, as I'm keen to see if there is any pattern to the problem.

                    I'm also keen to hear what others may suggest.

                    Thanks
                    Andrew

                      AGawthrope @AGawthrope

                      I wanted to post an update to close off this thread for now.

                      Further analysis has identified that the problem only occurs Monday through Friday, during working hours. This alone makes me suspect that an ISP-triggered event is the root cause.
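The weekday/working-hours pattern can be checked mechanically once the outage start times are known. This is a minimal sketch, assuming the start times are available as `datetime` values and that "working hours" means 09:00 to 17:00 local time (both assumptions; adjust to suit).

```python
from collections import Counter
from datetime import datetime

def in_working_hours(ts: datetime, start_hour: int = 9, end_hour: int = 17) -> bool:
    """True for Monday to Friday (weekday() 0-4) between start_hour and end_hour."""
    return ts.weekday() < 5 and start_hour <= ts.hour < end_hour

def summarise(outage_starts):
    """Count outage start times inside vs. outside the assumed working hours."""
    return Counter("working hours" if in_working_hours(t) else "other"
                   for t in outage_starts)

# Hypothetical start times for illustration (Tuesday morning, Saturday morning):
print(summarise([datetime(2024, 6, 4, 10, 30), datetime(2024, 6, 8, 10, 30)]))
```

A strongly one-sided count over several weeks supports the idea of a scheduled or human-driven trigger on the ISP side, though it does not prove it.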

                      Nominally, my pfSense+ WAN interface receives an ARP Request from the ISP virtual gateway/router (the client-facing interface of a VRRP group router) every 60 secs. pfSense+ is configured with a 1200 sec ARP table timeout. Because pfSense+ relearns the MAC address of the ISP gateway/router from these 60 sec exchanges, it does not originate its own ARP Request every 1200 secs. So when the intermittent problem occurs (which it still does) and ALL packets from the ISP virtual gateway/router, including its ARP Requests, cease, the ARP table entry in pfSense+ ages for 19 to 20 mins until it expires. At that point pfSense+ originates an ARP Request and nominal service/routing is fully restored, including the 60 sec reception of ARP Requests from the ISP.
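The aging behaviour described above explains why every outage lands between 19 and 20 mins. The sketch below is a minimal one-second-step simulation of that model, assuming (as described in this post) that each ISP ARP Request refreshes the pfSense+ entry and that expiry of the entry is what triggers the recovering ARP Request; the function name is hypothetical.

```python
def simulate(silence_start_s: int, isp_interval_s: int = 60, max_age_s: int = 1200) -> int:
    """Seconds from the ISP going silent until pfSense+ originates its own ARP Request.

    Model: the entry is learned at t=0; every ISP ARP Request sent before the
    silence begins refreshes it; once the requests stop, the entry expires
    max_age_s after the last refresh and pfSense+ re-ARPs, restoring service.
    """
    expires_at = max_age_s
    t = 0
    while True:
        if t < silence_start_s and t % isp_interval_s == 0:
            expires_at = t + max_age_s      # ISP ARP Request refreshes the entry
        if t >= expires_at:
            return t - silence_start_s      # entry expired: pfSense+ re-ARPs
        t += 1

print(simulate(61), simulate(120))  # 1199 1140
```

Depending on where in the 60 sec cycle the silence begins, the outage lasts between max_age - 60 and max_age seconds, i.e. 1140 to 1200 sec, which matches the 19 to 20 min outages in the dpinger log.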

                      If any traffic is sent to the ISP gateway/router during the period when no traffic is being received from the ISP, before the pfSense+ ARP table entry for the gateway/router times out, no response is received.

                      On expiry of the ARP table entry in pfSense+, or by forcing pfSense+ to originate a new ARP Request (for any ISP host on the same subnet as the pfSense+ WAN interface that is not already in the pfSense+ ARP table), nominal service is immediately restored with the resumption of all incoming traffic, including the 60 sec ISP ARP Requests.
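Forcing a fresh ARP Request needs a target on the WAN subnet that is not already in the ARP table. The sketch below picks one, assuming FreeBSD-style `arp -an` output (lines like `? (192.0.2.1) at ... on igc0 ...`); any address listed, including `(incomplete)` entries, is treated as known. The addresses and interface name are hypothetical.

```python
import ipaddress
import re

# Pulls the IPv4 address out of FreeBSD-style `arp -an` lines (assumed format).
ARP_IP = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3})\)")

def arp_ips(arp_output: str) -> set:
    """Every IPv4 address that already has an entry (complete or not) in the ARP table."""
    return set(ARP_IP.findall(arp_output))

def pick_probe_target(wan_cidr: str, arp_output: str, exclude=()):
    """First host address in the WAN subnet with no ARP entry, or None if none is left.

    Pinging such an address should make pfSense+ originate a new ARP Request.
    """
    known = arp_ips(arp_output) | set(exclude)
    for host in ipaddress.ip_interface(wan_cidr).network.hosts():
        if str(host) not in known:
            return str(host)
    return None

sample = """? (192.0.2.1) at 00:00:5e:00:01:01 on igc0 expires in 1185 seconds [ethernet]
? (192.0.2.10) at 00:00:5e:00:53:01 on igc0 permanent [ethernet]
? (192.0.2.2) at (incomplete) on igc0 expired [ethernet]
"""
print(pick_probe_target("192.0.2.10/24", sample, exclude=("192.0.2.10",)))
```

Skipping incomplete entries is a deliberate choice: an address that is already pending resolution may not produce a new ARP Request straight away.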

                      For now, I'm thinking that something is occurring on the ISP side which stops them originating ARP Requests, and they quickly lose layer-2 addressing knowledge of my pfSense+ interface. Hence, when pfSense+ transmits an ARP Request, the ISP is able to relearn the Ethernet and IPv4 addresses of the pfSense+ WAN interface and is thus able to start passing traffic again.

                      As trying to communicate this to the ISP will be hell, I plan to explore all possible causes on my side. So my next step is to monitor all traffic passing between the Netgate 4100 and the ISP ONT on a separate computer with a promiscuous interface. If, when the problem occurs, I don't see incoming traffic, then I'm confident it's an ISP problem; if I do see incoming traffic, then I know it's a pfSense+ problem.
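With a capture from the monitoring computer, the key question is whether inbound packets from the ISP stop during an outage. A minimal sketch for spotting such silences follows, assuming the arrival times (in seconds) of packets received from the ISP side have already been exported from the capture; the 90 sec threshold is a hypothetical choice set just above the ~60 sec ISP ARP Request interval.

```python
def find_gaps(timestamps, max_gap_s: float = 90.0):
    """Return (start, end) pairs where no inbound packet was seen for > max_gap_s.

    timestamps: arrival times in seconds of packets received from the ISP side
    (assumed to be exported from the capture beforehand).
    """
    ts = sorted(timestamps)
    # Compare each packet with the next one; a long interval means silence on the wire.
    return [(a, b) for a, b in zip(ts, ts[1:]) if b - a > max_gap_s]

# Hypothetical arrivals: ARP Requests every 60 s, then ~20 min of silence.
print(find_gaps([0, 60, 120, 1320, 1380]))  # [(120, 1320)]
```

A gap on the wire that lines up with a dpinger alarm points at the ISP; no gap on the wire while pfSense+ reports loss points at the firewall.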

                      @stephenw10 suggested reducing the ARP timeout period. I'm confident this would reduce the duration during which no traffic is received, but to provide a usable workaround the timeout would need to be very short and would thus create a high volume of ARP traffic.
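The trade-off can be put into rough numbers. This sketch assumes the stall ends no later than the entry's expiry, and, as an upper bound, that pfSense+ originates one ARP Request per expiry (which is what would happen if max_age were set below the ISP's 60 sec request interval, so the ISP's requests no longer keep the entry fresh). The figures are per ARP entry and purely arithmetic.

```python
SECONDS_PER_DAY = 86_400

def max_age_tradeoff(max_age_s: int) -> tuple:
    """Rough (worst-case stall in s, upper bound on self-originated ARP Requests per day)."""
    worst_stall_s = max_age_s                         # stall ends no later than entry expiry
    requests_per_day = SECONDS_PER_DAY // max_age_s   # assumes one ARP Request per expiry
    return worst_stall_s, requests_per_day

for age in (1200, 300, 60, 10):
    stall, per_day = max_age_tradeoff(age)
    print(f"max_age={age:>4}s  worst-case stall ~{stall}s  up to ~{per_day} ARP Requests/day")
```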
                      For now I'm running a script which pings the ISP default router/gateway and, when no response is received, pings a different host on the same subnet that is not in the pfSense+ ARP table. This restores service within the ping period (1 sec), making it a usable workaround.
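The workaround script itself is not posted, so the following is only a sketch of the same idea in Python. It assumes FreeBSD/pfSense `ping` flags (`-c 1` for one packet, `-t` for an overall timeout in seconds; Linux would use `-w`), and that each probe host has no ARP entry when it is used. Rotating through several hypothetical candidate hosts is an assumption, meant to avoid reusing a host whose entry has not expired yet.

```python
import itertools
import subprocess
import time

def ping_once(host: str, timeout_s: int = 1) -> bool:
    """One ICMP echo via the system ping (FreeBSD flags); True if a reply arrived."""
    result = subprocess.run(["ping", "-c", "1", "-t", str(timeout_s), host],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0

def watchdog_step(gateway: str, probes, ping=ping_once):
    """If the gateway does not answer, ping the next probe host and return it.

    Each probe host is assumed to have no ARP entry yet, so pinging it makes
    pfSense+ originate a fresh ARP Request. Returns None when the gateway replied.
    """
    if ping(gateway):
        return None
    probe = next(probes)
    ping(probe)
    return probe

def run(gateway: str, probe_hosts, interval_s: float = 1.0) -> None:
    """Check the gateway once per interval_s, forever."""
    probes = itertools.cycle(probe_hosts)
    while True:
        watchdog_step(gateway, probes)
        time.sleep(interval_s)
```

Usage would look like `run("192.0.2.1", ["192.0.2.50", "192.0.2.51", "192.0.2.52"])`, with the addresses replaced by real ones from the WAN subnet. Passing `ping` as a parameter keeps the decision logic testable without sending any packets.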

                      Andrew

                        stephenw10 Netgate Administrator

                        Fun! If you just have something pinging continually against another host, does that also fail? Since it would already be in the ARP table, if it's a real host.

                          AGawthrope @stephenw10

                          @stephenw10 Yes, indeed :-). When pinging something continually and the problem occurs, it will fail until pfSense+ ages and renews the ARP table entry or, as with my script, until any ARP Request containing the layer-2 and layer-3 addresses of the pfSense+ WAN interface is transmitted to the ISP.
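What makes any such ARP Request useful here is that its sender fields carry the WAN interface's own MAC and IPv4 address, which is what lets the ISP side relearn them (per the hypothesis above). The sketch below only builds the frame bytes following the RFC 826 layout, to show where those fields sit; it does not send anything, and the MAC and IP addresses are hypothetical documentation values.

```python
import ipaddress
import struct

def arp_request_frame(src_mac: str, src_ip: str, target_ip: str) -> bytes:
    """Build a broadcast Ethernet frame carrying an ARP Request (RFC 826 layout)."""
    mac = bytes.fromhex(src_mac.replace(":", ""))
    spa = ipaddress.IPv4Address(src_ip).packed
    tpa = ipaddress.IPv4Address(target_ip).packed
    # Ethernet header: broadcast destination, our MAC as source, EtherType 0x0806 (ARP).
    eth = b"\xff" * 6 + mac + struct.pack("!H", 0x0806)
    # ARP header: htype 1 (Ethernet), ptype 0x0800 (IPv4), hlen 6, plen 4, oper 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender MAC/IP are the WAN interface's; target MAC is unknown (zeros).
    arp += mac + spa + b"\x00" * 6 + tpa
    return eth + arp

frame = arp_request_frame("00:00:5e:00:53:01", "192.0.2.10", "192.0.2.1")
print(len(frame), frame.hex())
```

The frame is 42 bytes before the NIC pads it to the Ethernet minimum; bytes 22 to 31 are the sender MAC and IP that the ISP can learn from.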

                          Thanks @stephenw10.

                          Andrew

                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.