Public WAN VIP failing after 20 minutes
-
Run a packet capture filtered on the affected IP. What happens?
-
Wireshark
1 0.000000 PcEngine_XX:XX:X8 Broadcast ARP 42 Gratuitous ARP for XX.XX.XX.83 (Request)
2 0.001040 PcEngine_XX:XX:Xc Broadcast ARP 60 Gratuitous ARP for XX.XX.XX.83 (Request)
I removed all the ping requests and replies here.
2445 1198.885005 CiscoSpv_XX:XX:Xb Broadcast ARP 60 Who has XX.XX.XX.83? Tell XX.XX.XX.81
2446 1198.885029 PcEngine_XX:XX:X8 CiscoSpv_XX:XX:Xb ARP 42 XX.XX.XX.83 is at 00:00:XX:XX:XX:Xb
The same capture from the pfSense packet capture page:
09:18:47.564989 ARP, Request who-has XX.XX.XX.83 tell XX.XX.XX.83, length 28
09:18:47.566029 ARP, Request who-has XX.XX.XX.83 tell XX.XX.XX.83, length 46
09:38:46.449994 ARP, Request who-has XX.XX.XX.83 tell XX.XX.XX.81, length 46
09:38:46.450018 ARP, Reply XX.XX.XX.83 is-at 00:00:XX:XX:XX:Xb, length 28
-
I mean a capture from when it's not working. It sounds like it was working fine at that point?
-
At the time of the capture it had been working for 20 minutes, as you can see. At 09:38:46 the interface goes down.
Here is a capture from this morning. I didn't edit and save the VIP, so the interface was still down.
07:19:19.562103 ARP, Request who-has XX.XX.XX.83 tell XX.XX.XX.81, length 46
07:19:19.562128 ARP, Reply XX.XX.XX.83 is-at 00:00:XX:XX:XX:Xb, length 28
07:39:19.688892 ARP, Request who-has XX.XX.XX.83 tell XX.XX.XX.81, length 46
07:39:19.688914 ARP, Reply XX.XX.XX.83 is-at 00:00:XX:XX:XX:Xb, length 28
07:59:19.398768 ARP, Request who-has XX.XX.XX.83 tell XX.XX.XX.81, length 46
07:59:19.398793 ARP, Reply XX.XX.XX.83 is-at 00:00:XX:XX:XX:Xb, length 28
08:19:19.255317 ARP, Request who-has XX.XX.XX.83 tell XX.XX.XX.81, length 46
08:19:19.255341 ARP, Reply XX.XX.XX.83 is-at 00:00:XX:XX:XX:Xb, length 28
-
That's much more telling. You're not getting anything coming in on that IP, and 20 minutes is definitely the upstream ARP cache timeout. You're replying to the ARP requests correctly. That confirms the problem is where I said it was previously: an IP or MAC conflict. Since you've changed the VHID, it's probably not the MAC. My best guess is that something else is also replying to that ARP request, which you won't see from that vantage point. If you have access to the next-hop router, check its ARP cache when it's not working. If you don't, have your ISP check it and tell you which MAC they're seeing.
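To check the 20-minute figure against the capture above, here is a minimal Python sketch that measures the gap between the upstream router's ARP requests and lists which MACs answer for each IP. The helper names are hypothetical, and the parser assumes the tcpdump-style line format shown in this thread. As noted, a capture taken on pfSense only shows pfSense's own reply, so the conflict check is only meaningful on a capture from a point that sees every reply, for example a machine on the WAN side.

```python
import re
from datetime import datetime

# Lines copied from the pfSense capture in this thread (addresses masked as posted).
CAPTURE = """\
07:19:19.562103 ARP, Request who-has XX.XX.XX.83 tell XX.XX.XX.81, length 46
07:19:19.562128 ARP, Reply XX.XX.XX.83 is-at 00:00:XX:XX:XX:Xb, length 28
07:39:19.688892 ARP, Request who-has XX.XX.XX.83 tell XX.XX.XX.81, length 46
07:39:19.688914 ARP, Reply XX.XX.XX.83 is-at 00:00:XX:XX:XX:Xb, length 28
07:59:19.398768 ARP, Request who-has XX.XX.XX.83 tell XX.XX.XX.81, length 46
07:59:19.398793 ARP, Reply XX.XX.XX.83 is-at 00:00:XX:XX:XX:Xb, length 28
08:19:19.255317 ARP, Request who-has XX.XX.XX.83 tell XX.XX.XX.81, length 46
08:19:19.255341 ARP, Reply XX.XX.XX.83 is-at 00:00:XX:XX:XX:Xb, length 28
"""

# Assumed tcpdump-style formats: "<time> ARP, Request who-has <ip> tell <ip>, ..."
REQUEST = re.compile(r"^(\S+) ARP, Request who-has (\S+) tell (\S+),")
REPLY = re.compile(r"^(\S+) ARP, Reply (\S+) is-at (\S+),")


def parse_time(stamp):
    # Time of day only; a capture spanning midnight is not handled here.
    return datetime.strptime(stamp, "%H:%M:%S.%f")


def request_intervals(lines, target_ip):
    """Seconds between successive ARP requests asking for target_ip."""
    times = [parse_time(m.group(1)) for m in map(REQUEST.match, lines)
             if m and m.group(2) == target_ip]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


def macs_per_ip(lines):
    """Map each IP seen in an ARP reply to the set of MACs claiming it."""
    seen = {}
    for m in map(REPLY.match, lines):
        if m:
            seen.setdefault(m.group(2), set()).add(m.group(3))
    return seen


if __name__ == "__main__":
    lines = CAPTURE.splitlines()
    gaps = request_intervals(lines, "XX.XX.XX.83")
    print("gaps (s):", [round(g, 1) for g in gaps])
    for ip, macs in macs_per_ip(lines).items():
        status = "CONFLICT" if len(macs) > 1 else "ok"
        print(ip, sorted(macs), status)
```

Every gap in this capture comes out at roughly 1200 seconds, which is what you would expect if the request is driven by a 20-minute cache expiry on the router.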
-
Thanks for pointing us in the right direction. We tested the WAN interface a bit more by placing a machine on the WAN side. From there, the VIP seems to work fine.
So it looks like the ISP's modem is not working properly. We Googled some more and found another topic with the same problem, even with the same ISP (UPC Netherlands).
https://forum.pfsense.org/index.php?topic=66838.0
'Problem almost resolved'. Testing with the script from that topic…
-
Ah, that's fun. Your modem is broken; that behavior violates RFC 826.
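For reference, the packet-reception algorithm from RFC 826 can be sketched in Python as below. This is a simplified model, assuming a single hardware type (Ethernet) and a single protocol (IPv4), with hypothetical class names and documentation-range addresses. The point it illustrates is that a compliant host learns the sender's MAC from the ARP payload (ar$sha) and must accept a reply addressed to it. The thread doesn't say exactly which step the UPC modem gets wrong, so this only shows what correct behavior looks like.

```python
from dataclasses import dataclass

REQUEST, REPLY = 1, 2  # ares_op$REQUEST / ares_op$REPLY


@dataclass
class ArpPacket:
    op: int
    sha: str  # sender hardware address (ar$sha)
    spa: str  # sender protocol address (ar$spa)
    tha: str  # target hardware address (ar$tha)
    tpa: str  # target protocol address (ar$tpa)


class ArpNode:
    """Hypothetical host implementing RFC 826 reception for IPv4 over Ethernet."""

    def __init__(self, my_mac, my_ip):
        self.my_mac = my_mac
        self.my_ip = my_ip
        self.table = {}  # sender protocol address -> sender hardware address

    def receive(self, pkt):
        """Process one ARP packet; return a reply packet or None."""
        merge = False
        if pkt.spa in self.table:
            # Update an existing entry from the ARP payload (ar$sha),
            # not from the Ethernet header's source address.
            self.table[pkt.spa] = pkt.sha
            merge = True
        if pkt.tpa != self.my_ip:
            return None  # not for us; only existing entries were refreshed
        if not merge:
            self.table[pkt.spa] = pkt.sha
        if pkt.op == REQUEST:
            # Swap sender/target fields, putting our own addresses in the sender fields.
            return ArpPacket(REPLY, self.my_mac, self.my_ip, pkt.sha, pkt.spa)
        return None
```

Usage: if a router built like this at 192.0.2.81 receives a reply from 192.0.2.83 carrying a CARP-style MAC in ar$sha, it stores that MAC in its table and sends subsequent traffic for 192.0.2.83 there.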
-
We added an IP Alias for testing, and that one keeps working. We don't understand why an IP Alias keeps working and a CARP IP doesn't.
-
Because of the difference in how ARP is answered for each. Both ways are perfectly valid, but with broken CPE the CARP way can be problematic.
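The difference in how the two answer ARP can be sketched in Python as below. This assumes the standard CARP virtual MAC format 00:00:5e:00:01:<VHID> (the prefix CARP shares with VRRP) and, based on the Wireshark reply earlier in the thread (sent from PcEngine_XX:XX:X8 but saying "is at 00:00:XX:XX:XX:Xb"), that the Ethernet source stays the NIC's own MAC. Function names and addresses are hypothetical. For an IP Alias the two addresses agree; for a CARP VIP they differ, which is valid ARP but is the kind of reply a broken CPE may mishandle.

```python
def carp_virtual_mac(vhid):
    """CARP virtual MAC: 00:00:5e:00:01:<VHID>, with the VHID as two hex digits."""
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be between 1 and 255")
    return "00:00:5e:00:01:%02x" % vhid


def arp_reply_addresses(nic_mac, vhid=None):
    """Return (Ethernet source, ar$sha) for an ARP reply.

    vhid=None models an IP Alias, which answers with the NIC's own MAC.
    An integer VHID models a CARP VIP, which advertises the virtual MAC.
    Assumption: the Ethernet source is the NIC MAC in both cases,
    as in the Wireshark capture posted in this thread.
    """
    sha = nic_mac if vhid is None else carp_virtual_mac(vhid)
    return nic_mac, sha


if __name__ == "__main__":
    nic = "02:00:00:00:00:08"  # hypothetical NIC MAC
    print("IP Alias:", arp_reply_addresses(nic))
    print("CARP VIP:", arp_reply_addresses(nic, vhid=11))
```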
-
We also have exactly this issue with UPC Ireland.
No resolution as of yet, no matter what we've tried.