WAN VIP Troubles



  • After the latest 2.4.4 update, I've had some concerns about the reliability of my single pfSense box, so I set up a second one and walked through the HA setup documentation. I created a dedicated VLAN for pfSync and XMLRPC sync, configured pf1 as 10.0.<subnet>.2/24 and pf2 as 10.0.<subnet>.3/24, and then added a CARP address to each subnet at 10.0.<subnet>.1/24. (Side note: in my network, subnets and VLANs use the same ID, i.e. 10.0.<subnet/vlan>.0/24.)

    At this point the WAN side was still on DHCP, so I could only get noisy failovers, but they worked: if pf1 went down, pf2 became CARP master, and a noisy recovery followed.

    I switched ISPs so that I could get some affordable static addresses. They provided three addresses from a /24 pool, and setting up the static addresses and switching providers went relatively smoothly.

    I was super stoked to add a CARP address to the WAN interface, but once I did, things stopped rolling along so smoothly. As soon as the address was added, traffic started taking a crap. I ran some diagnostics and came to the assumption that I must be squatting on an in-use VRID, although this is a weak area for me, so that assumption means very little.

    I contacted my ISP and they said that they block absolutely nothing at all and they also said, "You are the only customer on this CMTS that is using VRRP. Any range between 1-255 for your VRID would be acceptable."

    I was able to reproduce this by shutting down pf2 and spinning up a pfSense install from scratch with nothing beyond the basic setup (no internal redundancy, a dedicated test VLAN), then adding the static addresses to WAN along with a CARP VIP. With that simplified verification done, I brought pf2 back up and disabled both pfSync and XMLRPC sync for the virtual addresses. The goal: keep the network running on pf1 and test on pf2 (this part seems to be working).

    As soon as I add the CARP VIP to pf2, I see about 70-90% packet loss when pinging with "WAN" as the source on pf2's Diagnostics > Ping page, but 0% loss with "VIP" as the source. If I update the outbound NAT rules to cover VLAN 31 [*], I see its traffic leave but never return. If I run wget against a server, I can see the packets reach the web server and the replies leave it, but they never show back up on my side of the modem.

    [*] A test VLAN/subnet that I moved from pf1 to pf2.

    This is what I'm trying to achieve: https://i.imgur.com/GGzXLs0.png
    (I have the cellular modem disabled while I troubleshoot this issue)

    This is a bit of an older picture of what it physically looks like: https://i.imgur.com/THXtGhw.jpg
    (Most notably, pf1, pf2, and the modem all sit on a dedicated WAN VLAN.)

    I've done some more reading and am now setting up more packet captures to re-test. I'll update with anything interesting I find. I'd love some advice on what to look for, or on anything else I may have screwed up.
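One way to reason about the VRID-collision theory: on pfSense (FreeBSD CARP), a CARP VIP with VHID n uses the virtual MAC 00:00:5e:00:01:nn (n in hex), the same address IPv4 VRRP uses for VRID n, and both protocols send advertisements to 224.0.0.18 as IP protocol 112. The minimal Python sketch below, assuming that MAC mapping holds for the pfSense version in use, converts a VHID to its virtual MAC and back. The helper names are hypothetical, not a pfSense API.

```python
"""Map a CARP VHID / VRRP VRID to its virtual MAC address, and back.

Assumption: FreeBSD CARP (as used by pfSense) and IPv4 VRRP both use the
IANA prefix 00:00:5e:00:01, with the VHID/VRID as the last byte.
"""
from typing import Optional

VIRTUAL_MAC_PREFIX = "00:00:5e:00:01"


def vhid_to_mac(vhid: int) -> str:
    """Return the virtual MAC for a VHID/VRID in the valid range 1-255."""
    if not 1 <= vhid <= 255:
        raise ValueError(f"VHID must be 1-255, got {vhid}")
    return f"{VIRTUAL_MAC_PREFIX}:{vhid:02x}"


def mac_to_vhid(mac: str) -> Optional[int]:
    """Return the VHID encoded in a virtual MAC, or None if it is not one."""
    # Accept "00-00-5E-00-01-0A" style as well as colon-separated input.
    normalized = mac.strip().lower().replace("-", ":")
    prefix, _, last = normalized.rpartition(":")
    if prefix != VIRTUAL_MAC_PREFIX or len(last) != 2:
        return None
    try:
        vhid = int(last, 16)
    except ValueError:
        return None
    return vhid if vhid >= 1 else None


if __name__ == "__main__":
    print(vhid_to_mac(1))                    # 00:00:5e:00:01:01
    print(vhid_to_mac(200))                  # 00:00:5e:00:01:c8
    print(mac_to_vhid("00-00-5E-00-01-0A"))  # 10
```

Any frame on the WAN VLAN that carries your VHID's virtual MAC but does not come from pf1 or pf2 would point to a collision; if none shows up, the collision theory looks less likely.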


  • Netgate

    Going to have to slow down and take things one at a time.

    When you put your primary WAN interface on one address, your secondary WAN interface on another, and your CARP VIP on the third, is the Primary the CARP MASTER, and the Secondary the CARP BACKUP?

    You can run a packet capture on your WAN to see whether anyone else is using CARP/VRRP. That's pretty much the only way to know.
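To act on that suggestion, here is a minimal Python sketch that scans a saved WAN capture for CARP/VRRP advertisements (IP protocol 112) and tallies them by source MAC, source IP, and VHID/VRID. It assumes a classic libpcap file with Ethernet framing (for example, one written by `tcpdump -w`; pcapng is not handled), and the function names and `carp_scan.py` filename are hypothetical. The second header byte is the VHID in CARP and the VRID in VRRP, so the sketch does not try to tell the two protocols apart.

```python
import struct
import sys
from collections import Counter
from typing import Iterator, Optional, Tuple

LINKTYPE_ETHERNET = 1
ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_VLAN = 0x8100
IPPROTO_CARP_VRRP = 112  # CARP and VRRP share IP protocol number 112


def read_pcap_frames(data: bytes) -> Iterator[bytes]:
    """Yield raw link-layer frames from a classic (non-pcapng) pcap file."""
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"  # little-endian file (micro- or nanosecond timestamps)
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    (linktype,) = struct.unpack(endian + "I", data[20:24])
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError(f"unsupported link type {linktype}")
    offset = 24  # skip the global header
    while offset + 16 <= len(data):
        # Record header: ts_sec, ts_usec, captured length, original length.
        _, _, incl_len, _ = struct.unpack(endian + "IIII", data[offset:offset + 16])
        offset += 16
        yield data[offset:offset + incl_len]
        offset += incl_len


def parse_advert(frame: bytes) -> Optional[Tuple[str, str, int]]:
    """Return (source MAC, source IP, VHID/VRID) for a CARP/VRRP packet, else None."""
    if len(frame) < 14:
        return None
    src_mac = ":".join(f"{b:02x}" for b in frame[6:12])
    (ethertype,) = struct.unpack("!H", frame[12:14])
    offset = 14
    if ethertype == ETHERTYPE_VLAN and len(frame) >= 18:  # skip one 802.1Q tag
        (ethertype,) = struct.unpack("!H", frame[16:18])
        offset = 18
    if ethertype != ETHERTYPE_IPV4 or len(frame) < offset + 20:
        return None
    ihl = (frame[offset] & 0x0F) * 4  # IPv4 header length in bytes
    if frame[offset + 9] != IPPROTO_CARP_VRRP or len(frame) < offset + ihl + 2:
        return None
    src_ip = ".".join(str(b) for b in frame[offset + 12:offset + 16])
    # In CARP and VRRPv2/v3 the second header byte is the VHID/VRID.
    vhid = frame[offset + ihl + 1]
    return src_mac, src_ip, vhid


def summarize(path: str) -> Counter:
    """Count advertisements per (MAC, IP, VHID) in a pcap file."""
    with open(path, "rb") as fh:
        data = fh.read()
    seen: Counter = Counter()
    for frame in read_pcap_frames(data):
        advert = parse_advert(frame)
        if advert is not None:
            seen[advert] += 1
    return seen


if __name__ == "__main__" and len(sys.argv) > 1:
    for (mac, ip, vhid), count in summarize(sys.argv[1]).most_common():
        print(f"VHID/VRID {vhid:3d} from {ip:15s} ({mac}) x{count}")
```

Usage would be something like `python carp_scan.py wan.pcap`. Since only the current master sends advertisements, seeing your VHID from an address other than pf1's or pf2's WAN address would suggest a collision, while seeing it from both pf1 and pf2 would suggest both boxes consider themselves MASTER.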