CARP fails over, but downloads don't resume / sessions drop

  • As the subject says, failover itself works fine. I run a ping, pull a cable, and CARP fails over to the 2nd box (both are identical SG-8860s). But when I test a download with curl, it does not resume/continue, and that isn't what I expected. From other forum posts, the most likely causes seem to be 1) I'm not syncing session/state (but I've double-checked, and the checkbox is selected), or 2) outbound NAT is not using a VIP. However, this test was to a 1:1 NAT host (in the 'dmz'), so I don't see how that could be a possibility either. Is there anything else I can check, or that I might be missing? I will say this setup has two WAN interfaces and quite a few VIPs (something like 90 each); I'm not sure whether that plays a role. I just recently saw that you can pick one CARP VIP as the main one and make the others IP Aliases with that CARP VIP as their parent. Happy to try that if people think it might help.

    And since I bought these units from the pfSense store, I could burn a support ticket on this. I'm willing to: right now we're limping along with the current FW 'solution' (cough), and I want to get this done and into place very soon.

  • LAYER 8 Netgate

    The number of states on the backup unit should roughly mirror the master unit.

    You can also run Diagnostics > States on the backup unit to be sure the state you're testing is there.

    You need to make sure outbound NAT is using the WAN CARP VIP and the LAN side hosts are using the LAN CARP VIP as their default gateway.

    I tested this many, many times, for example while watching live video streams. It's not always totally hitless, but the state sync seems to work really well.
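    The two checks above (state counts roughly match, and the specific state under test exists on the backup) can be sketched as a comparison of two state tables. This is a minimal illustration, assuming you have already exported the states from each unit into simple (protocol, source, destination) entries; the `State` type, the `sync_report` function, and the 5% tolerance are all hypothetical and not part of pfSense.

```python
from typing import Iterable, NamedTuple


class State(NamedTuple):
    """Hypothetical, simplified view of one firewall state entry."""
    proto: str
    src: str  # "address:port"
    dst: str  # "address:port"


def sync_report(master: Iterable[State], backup: Iterable[State],
                tolerance: float = 0.05) -> dict:
    """Compare the master and backup state tables.

    `tolerance` is an assumed threshold: how far (as a fraction of the
    master count) the backup count may drift before it is flagged.
    """
    m, b = set(master), set(backup)
    missing = m - b  # states the backup never received
    if m:
        drift = abs(len(m) - len(b)) / len(m)
    else:
        drift = 0.0 if not b else 1.0
    return {
        "master_count": len(m),
        "backup_count": len(b),
        "counts_close": drift <= tolerance,
        "missing_on_backup": sorted(missing),
    }
```

    For example, if the curl download state shows up on the master but not in `missing_on_backup`'s complement on the backup, failover will drop it even though CARP itself moves correctly.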

  • Thanks, I'll check Diag > States. The LAN CARP IP is confirmed as the hosts' gateway, but since this is 1:1 NAT, I'm fairly sure outbound traffic is using the 1:1 NAT address and not the WAN CARP IP. The docs specifically state that 1:1 NAT overrides manual outbound NAT, which is what confuses me, because it seems like that would be the issue.

  • LAYER 8 Netgate

    If you're going to 1:1 NAT it should work as long as the inside global address is a CARP VIP or a stacked IP alias. Missed the 1:1 NAT info in the OP. Sorry.
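    With around 90 VIPs per WAN, it may help to check every 1:1 NAT external (inside global) address against that rule mechanically. The sketch below is hypothetical: it assumes you have listed the CARP VIP addresses and a mapping from each IP alias address to its parent (either a CARP VIP address or a plain interface name such as "wan"), and it flags any 1:1 NAT address that would not move with a CARP failover.

```python
import ipaddress
from typing import Iterable, Mapping


def _norm(value: str) -> str:
    """Normalise an IP string; leave non-IP values (e.g. "wan") unchanged."""
    try:
        return str(ipaddress.ip_address(value))
    except ValueError:
        return value


def non_carp_nat_addresses(
    nat_external: Iterable[str],
    carp_vips: Iterable[str],
    alias_parent: Mapping[str, str],
) -> list:
    """Return 1:1 NAT external addresses that are neither a CARP VIP nor
    an IP alias whose (hypothetical) parent is a CARP VIP."""
    carp = {_norm(a) for a in carp_vips}
    parents = {_norm(k): _norm(v) for k, v in alias_parent.items()}
    bad = []
    for addr in nat_external:
        n = _norm(addr)
        if n in carp:
            continue  # the address is itself a CARP VIP
        if parents.get(n) in carp:
            continue  # IP alias stacked on a CARP VIP
        bad.append(addr)  # would stay behind on failover
    return bad
```

    An alias whose parent is the bare WAN interface, or an address with no VIP at all, ends up in the returned list.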

  • pfsync has to be enabled on both the primary and the secondary. In this situation it's usually not enabled on one of them, or an incorrect interface is specified on one or both, or the firewall rules don't allow pfsync.

    If everything works fine after failover, but you lose states in the process, your NAT, VIP, etc. config is fine.
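    The three usual causes above can be written down as a small checklist. This is a sketch over a hypothetical summary of each node's settings (the `Node` class, `PROTO_NUMBERS` table, and function names are illustrative, not a pfSense API). The one hard fact it encodes is that pfsync is its own IP protocol (number 240), so a pass rule for TCP/UDP does not match it. It also assumes both nodes name the sync interface the same way, as is typical with identical hardware.

```python
from dataclasses import dataclass, field

PFSYNC_PROTO = 240  # IANA-assigned IP protocol number for pfsync

# Hypothetical mapping of rule protocol choices to the IP protocol
# numbers they match; None means "any protocol".
PROTO_NUMBERS = {"tcp": {6}, "udp": {17}, "tcp/udp": {6, 17},
                 "pfsync": {PFSYNC_PROTO}, "any": None}


@dataclass
class Node:
    """Hypothetical summary of one firewall's HA sync settings."""
    name: str
    state_sync_enabled: bool
    sync_interface: str
    # (interface, protocol) pairs for pass rules; protocol is a PROTO_NUMBERS key
    pass_rules: list = field(default_factory=list)


def allows_pfsync(node: Node) -> bool:
    """True if some pass rule on the sync interface matches protocol 240."""
    for iface, proto in node.pass_rules:
        if iface != node.sync_interface:
            continue
        numbers = PROTO_NUMBERS[proto]
        if numbers is None or PFSYNC_PROTO in numbers:
            return True
    return False


def pfsync_problems(primary: Node, secondary: Node) -> list:
    """List the likely reasons states are not being synchronized."""
    problems = []
    for node in (primary, secondary):
        if not node.state_sync_enabled:
            problems.append(f"{node.name}: state synchronization disabled")
        if not allows_pfsync(node):
            problems.append(f"{node.name}: no pass rule for pfsync on {node.sync_interface}")
    if primary.sync_interface != secondary.sync_interface:
        problems.append("sync interface differs between nodes")
    return problems
```

    A rule that allows only "tcp/udp" on the sync interface is reported as a problem, while "pfsync" or "any" passes.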

  • Fixed! Thanks Derelict and cmb/Chris… it was the latter (firewall rules). On the pfsync interface, I had allowed TCP/UDP, and not "pfsync" as the protocol, from … Changed that, and it worked right away (curl'ing a smallish .tgz file).

    I appreciate the responses; sometimes you just need a second pair of eyes on what you're doing (wrong).

  • Aaaaand… I broke it again: same behavior. I'm not sure exactly how I did that. I was putting some Snort config together, but even suspecting that and disabling it, I still get the no-resume behavior (testing from one of the WAN sides).

    Interestingly, if I reverse the scenario (start downloading a file from the LAN side, then pull a cable), the download does resume. So is it somehow related to the WAN side, or to the number of VIPs/1:1 NATs I have? I ask because WAN, DMZ, and LAN are all using CARP VIPs. I'll do some more testing, but yes, FW2 does have that HTTP connection in Diag > States, so states are syncing.