Netgate Discussion Forum

    Public WAN VIP failing after 20 minutes

    HA/CARP/VIPs
    15 Posts 4 Posters 2.7k Views
    • jimp Rebel Alliance Developer Netgate

      20 minutes is a common ARP table timeout. Could be an IP conflict or something upstream that isn't correctly picking up the CARP VIP's MAC from ARP.

      • BCSE

        So this could be something on the ISP side?

        At the moment we have 5 LAN subnets. All of them have VIPs and seem to work fine. All of the WAN and LAN traffic passes through an HP-1810 switch, mostly on VLANs. I can see the VIP MAC addresses on the switch. As far as I can see, nothing conflicts.

        The ISP side is a /29 subnet. We currently use 2 IPs on the actual WAN interfaces and one for the VIP.

        Hardware = PC Engines APU1D4

        I removed the WAN VIP and configured it again with one of the other IP addresses in the /29 range but that didn't help.

        • cmb

          An IP or MAC conflict (or some other issue) is the likely cause. Save and apply sends a gratuitous ARP, which will clear up some problems along those lines for a period of time.

          Since it happens on multiple IPs, assuming you don't have those IPs assigned elsewhere, try changing the VHID so the virtual MAC changes.
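For readers wondering why changing the VHID changes the MAC: CARP builds its virtual MAC from the VRRP prefix 00:00:5e:00:01 plus the VHID as the final octet. A minimal sketch of that mapping follows; the function name `carp_virtual_mac` and the sample VHIDs are illustrative, not taken from pfSense.

```python
def carp_virtual_mac(vhid: int) -> str:
    """Return the virtual MAC that corresponds to a CARP VHID (1-255)."""
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be between 1 and 255")
    # Fixed prefix 00:00:5e:00:01, with the VHID as the last octet in hex.
    return f"00:00:5e:00:01:{vhid:02x}"


# Illustrative VHIDs only: each one yields a different virtual MAC.
for vhid in (1, 10, 202):
    print(vhid, carp_virtual_mac(vhid))
```

If a MAC conflict were the cause, picking a VHID that nothing else on the segment uses should move the VIP to a MAC nobody else answers for.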

          • BCSE

            I already changed the VHID (from 1 to 10) when I reconfigured the WAN VIP and changed the IP address. Sorry for not mentioning that. It still fails after 20 minutes. I tried changing the VHID again, to 99, and hit the save button. Again it failed after 20 minutes.

            So I removed the VIP again. I changed the WAN IP of the 2nd FW to the IP address I had used for the WAN VIP. That one is still working.
            I recreated the VIP with a random unused VHID (202) and the IP address that I had been using as the 2nd FW's WAN IP. Again it failed after 20 minutes.

            Old config
            FW1 WAN IP xx.xx.xx.82/29 <- Working
            FW2 WAN IP xx.xx.xx.83/29 <- Working
            VIP WAN IP xx.xx.xx.84/29 also tried 85 & 86 <- All failing after 20 min.

            Current config
            FW1 WAN IP xx.xx.xx.82/29 <- Working
            FW2 WAN IP xx.xx.xx.84/29 <- Working
            VIP WAN IP xx.xx.xx.83/29 <- Failing after 20 min.

            IP addresses are only assigned to the FWs. I have tried the complete /29 range by now. Each switch to a different VHID changed the MAC address as well. Still not working… :(
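As a quick sanity check on how little room a /29 leaves, the sketch below enumerates one with Python's `ipaddress` module. The real range is redacted in this thread, so 203.0.113.80/29 (a documentation prefix) is used purely as a stand-in.

```python
import ipaddress

# Stand-in for the redacted ISP range; 203.0.113.0/24 is reserved for documentation.
net = ipaddress.ip_network("203.0.113.80/29")
hosts = list(net.hosts())

print("network:  ", net.network_address)
print("broadcast:", net.broadcast_address)
print("usable:   ", [str(h) for h in hosts])
```

A /29 has six usable host addresses. Assuming one of them belongs to the ISP gateway, five remain for the two firewalls and any VIPs, which fits the .82 to .86 addresses listed above.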

            • cmb

              Run a packet capture filtered on the affected IP. What happens?

              • BCSE

                Wireshark

                1 0.000000 PcEngine_XX:XX:X8 Broadcast ARP 42 Gratuitous ARP for XX.XX.XX.83 (Request)
                2 0.001040 PcEngine_XX:XX:Xc Broadcast ARP 60 Gratuitous ARP for XX.XX.XX.83 (Request)

                I removed all the ping requests and replies here.

                2445 1198.885005 CiscoSpv_XX:XX:Xb Broadcast ARP 60 Who has XX.XX.XX.83?  Tell XX.XX.XX.81
                2446 1198.885029 PcEngine_XX:XX:X8 CiscoSpv_XX:XX:Xb ARP 42 XX.XX.XX.83 is at 00:00:XX:XX:XX:Xb

                The same capture from the pfSense packet capture field.

                09:18:47.564989 ARP, Request who-has XX.XX.XX.83 tell XX.XX.XX.83, length 28
                09:18:47.566029 ARP, Request who-has XX.XX.XX.83 tell XX.XX.XX.83, length 46

                09:38:46.449994 ARP, Request who-has XX.XX.XX.83 tell XX.XX.XX.81, length 46
                09:38:46.450018 ARP, Reply XX.XX.XX.83 is-at 00:00:XX:XX:XX:Xb, length 28

                • cmb

                  I mean a capture from when it's not working. It sounds like it was working fine at that point?

                  • BCSE

                    At the time of the capture it had been working for 20 minutes, as you can see. At 09:38:46 the interface went down.

                    Here is a capture from this morning. I didn't edit and save the VIP, so the interface was still down.

                    07:19:19.562103 ARP, Request who-has XX.XX.XX.83 tell XX.XX.XX.81, length 46
                    07:19:19.562128 ARP, Reply XX.XX.XX.83 is-at 00:00:XX:XX:XX:Xb, length 28
                    07:39:19.688892 ARP, Request who-has XX.XX.XX.83 tell XX.XX.XX.81, length 46
                    07:39:19.688914 ARP, Reply XX.XX.XX.83 is-at 00:00:XX:XX:XX:Xb, length 28
                    07:59:19.398768 ARP, Request who-has XX.XX.XX.83 tell XX.XX.XX.81, length 46
                    07:59:19.398793 ARP, Reply XX.XX.XX.83 is-at 00:00:XX:XX:XX:Xb, length 28
                    08:19:19.255317 ARP, Request who-has XX.XX.XX.83 tell XX.XX.XX.81, length 46
                    08:19:19.255341 ARP, Reply XX.XX.XX.83 is-at 00:00:XX:XX:XX:Xb, length 28

                    • cmb

                      That's much more telling. You're not getting anything coming in on that IP. And 20 minutes is definitely the upstream ARP cache timeout. You're replying to the ARP requests correctly. That confirms the problem resides where I said it did previously, with an IP or MAC conflict. Having changed the VHID, it's probably not the MAC. My best guess is something else is replying to that ARP request as well, which you won't see from that perspective. If you have access to the next hop router, check its ARP cache when it's not working. If you don't, have your ISP check it and tell you what MAC they're showing.
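The 20-minute pattern can be checked directly from the capture. This is a minimal sketch that takes the four ARP request timestamps from this morning's capture and computes the gap between consecutive requests; the helper name `intervals_seconds` is illustrative.

```python
from datetime import datetime

# Timestamps of the ARP requests from XX.XX.XX.81, copied from the capture above.
stamps = [
    "07:19:19.562103",
    "07:39:19.688892",
    "07:59:19.398768",
    "08:19:19.255317",
]


def intervals_seconds(timestamps):
    """Return the gaps in seconds between consecutive HH:MM:SS.ffffff stamps."""
    times = [datetime.strptime(t, "%H:%M:%S.%f") for t in timestamps]
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


gaps = intervals_seconds(stamps)
for gap in gaps:
    print(f"{gap:.1f} s ({gap / 60:.2f} min)")
```

Every gap lands within a second of 1200 s, which is consistent with the upstream device re-ARPing for the VIP each time a 20-minute cache entry expires.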

                      • BCSE

                        Thanks for pointing us in the right direction. We tested the WAN interfaces a bit more on the WAN side by placing a machine on that side. The VIP seems to work fine.

                        So it looks like the ISP's modem is not working properly. We Googled some more and found another topic with the same problem, and even the same ISP (UPC Netherlands).

                        https://forum.pfsense.org/index.php?topic=66838.0

                        'Problem almost resolved'. Testing with the script from that topic…

                        • cmb

                          Ah that's fun. Your modem is broken, that behavior is in violation of RFC 826.

                          • BCSE

                            I added an IP Alias for testing. That one keeps working. I don't understand why an IP Alias keeps working and a CARP IP does not.

                            • cmb

                              Because of the difference in the way ARP is answered between them. Both ways are perfectly valid, but with broken CPE the CARP way can be problematic.
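One visible difference is already in the Wireshark capture earlier in the thread: the ARP reply for the CARP VIP leaves with the physical PC Engines MAC as the Ethernet source, while the ARP payload advertises the virtual MAC. With an IP Alias both fields would carry the interface's own MAC. Whether this mismatch is exactly what trips up the modem is an assumption, not something confirmed here. The sketch below parses a raw Ethernet/ARP frame and reports whether the two addresses differ; all MACs and IPs in it are hypothetical placeholders.

```python
import struct


def mac_str(raw: bytes) -> str:
    """Format 6 raw bytes as a colon-separated MAC string."""
    return ":".join(f"{b:02x}" for b in raw)


def arp_sender_mismatch(frame: bytes):
    """Return (ethernet_src, arp_sender_mac, differs) for an Ethernet/IPv4 ARP frame."""
    _dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != 0x0806:
        raise ValueError("not an ARP frame")
    # ARP body: htype, ptype, hlen, plen, oper, sender MAC/IP, target MAC/IP (28 bytes).
    _h, _p, _hl, _pl, _op, sha, _spa, _tha, _tpa = struct.unpack(
        "!HHBBH6s4s6s4s", frame[14:42]
    )
    return mac_str(src), mac_str(sha), src != sha


def build_arp_reply(eth_src: bytes, sender_mac: bytes, router_mac: bytes) -> bytes:
    """Build a 42-byte ARP reply using hypothetical documentation addresses."""
    vip, gateway = bytes([203, 0, 113, 83]), bytes([203, 0, 113, 81])
    header = router_mac + eth_src + struct.pack("!H", 0x0806)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)
    return header + arp + sender_mac + vip + router_mac + gateway


# Hypothetical addresses: a NIC MAC, a CARP virtual MAC, and the upstream router's MAC.
nic = bytes.fromhex("020000000008")
vmac = bytes.fromhex("00005e00010b")
router = bytes.fromhex("02000000000b")

carp_frame = build_arp_reply(nic, vmac, router)   # CARP-style reply
alias_frame = build_arp_reply(nic, nic, router)   # IP Alias-style reply

print(arp_sender_mismatch(carp_frame))
print(arp_sender_mismatch(alias_frame))
```

Feeding frames exported from a real capture into `arp_sender_mismatch` would show the same split between the CARP reply and the alias reply.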

                              • cr_hyland

                                We also have exactly this issue with UPC Ireland.

                                No resolution yet, no matter what we tried.

                                Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.