AON using carp address fills log with dropped return packets

hcoin

My firewall logs are flooded with dropped TCP:SA and TCP:R packets whose source is always the HTTP or IMAP port of a commonly visited internet website, and whose destination address is always the CARP VIP on the WAN interface. This started when I changed the outbound NAT rule to use the WAN CARP VIP instead of the WAN interface address.

If I reset the manual outbound NAT rule to use the interface address (WAN), the log goes back to looking normal.

The internal clients aren't complaining so some recovery process must be masking the effects. But the logs are being flooded and I prefer to not simply ignore so many dropped packets without understanding what's going on first.

Any ideas?

(Setup is basic pfSense HA: WAN <-> primary/backup PF <-> LAN)

hcoin

When there is a carp VIP on the wan side that is being used as the NAT translation address for outgoing packets, shouldn't there be a rule like:

pass out route-to ( <wan interface name> <wan gateway ip> ) inet from <carp vip> to ! <wan subnet> flags S/SA keep state allow-opts

Similar to the rule for the interface ip itself? So that when the packets come back aimed at the carp vip they don't get dropped?

Yes? Maybe? What?

cmb

@hcoin:

When there is a carp VIP on the wan side that is being used as the NAT translation address for outgoing packets, shouldn't there be a rule like:

pass out route-to ( <wan interface name> <wan gateway ip> ) inet from <carp vip> to ! <wan subnet> flags S/SA keep state allow-opts

Similar to the rule for the interface ip itself? So that when the packets come back aimed at the carp vip they don't get dropped?

No.

Some level of dropped return traffic is normal, a flood of it that continues is a problem. Without knowing more about the circumstances it's hard to guess.

hcoin

Here's an average two seconds, with only a couple of idle computers with browsers up refreshing pages and an email check. The CARP VIP on the WAN is … 196. Change one single AON NAT rule from the CARP VIP to the interface address (... 192) and there's only an entry in the log every 5-10 seconds or so: the usual break-in attempts, etc.

23.1.4.74:80 xx.xx.xx.196:23314 TCP:SA
74.125.225.130:80 xx.xx.xx.196:47371 TCP:SA
74.125.225.136:80 xx.xx.xx.196:65389 TCP:SA
23.1.4.74:80 xx.xx.xx.196:33947 TCP:SA
74.125.225.130:80 xx.xx.xx.196:47371 TCP:SA
74.125.225.136:80 xx.xx.xx.196:65389 TCP:R
74.125.225.136:80 xx.xx.xx.196:65389 TCP:R
74.125.225.136:80 xx.xx.xx.196:65389 TCP:R
74.125.225.136:80 xx.xx.xx.196:65389 TCP:R
74.125.225.136:80 xx.xx.xx.196:65389 TCP:R
74.125.225.136:80 xx.xx.xx.196:65389 TCP:R
74.125.225.136:80 xx.xx.xx.196:65389 TCP:R
65.126.84.115:443 xx.xx.xx.196:19536 TCP:R
65.126.84.115:443 xx.xx.xx.196:19536 TCP:R
74.125.225.136:80 xx.xx.xx.196:65389 TCP:R
74.125.225.136:80 xx.xx.xx.196:65389 TCP:R
74.125.225.136:80 xx.xx.xx.196:65389 TCP:R
74.125.225.136:80 xx.xx.xx.196:65389 TCP:R
65.55.7.141:443 xx.xx.xx.196:14540 TCP:R
65.55.7.141:443 xx.xx.xx.196:14540 TCP:R
107.22.237.84:80 xx.xx.xx.196:45342 TCP:R
65.55.7.141:443 xx.xx.xx.196:14540 TCP:R
65.55.7.141:443 xx.xx.xx.196:14540 TCP:R
65.55.7.141:443 xx.xx.xx.196:14540 TCP:R
65.55.7.141:443 xx.xx.xx.196:14540 TCP:SA
107.22.237.84:80 xx.xx.xx.196:45342 TCP:R
107.22.237.84:80 xx.xx.xx.196:45342 TCP:R
69.171.237.32:80 xx.xx.xx.196:44632 TCP:SA
107.22.237.84:80 xx.xx.xx.196:45342 TCP:SA
184.73.228.94:80 xx.xx.xx.196:7486 TCP:R
184.73.228.94:80 xx.xx.xx.196:7486 TCP:R

Setup is 2 WAN (one tier 2, the other tier 1) <-> 2 PF boxes, release, typical HA with private LAN for pfsync <-> LAN, DMZ.

If I read the above correctly, the NAT/routing system isn't keeping track of the packets it sent via the CARP VIP on the WAN, so when the replies come in they are just blocked. Users aren't noticing much because retries appear to succeed. But the logs are so flooded as to be useless, and response times must be suffering. Change the NAT rule from the CARP VIP to the native interface address and it all goes back to normal.

It must be obvious but as it's the default rule that's generating all the above blocks I'm sure missing it. Could there be some setting on an outgoing rule that tells it not to keep state info about anything sent via carp on the wan some of the time? Hardly seems possible.
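
To put numbers on a flood like the one above, here is a minimal Python sketch that tallies dropped entries per connection and flag type. It assumes each log line has exactly the three whitespace-separated fields shown in the excerpt (source, destination, flags); the function name `tally_drops` is made up for illustration, and the sample lines are copied from the excerpt.

```python
from collections import Counter

def tally_drops(log_text):
    """Count dropped entries per (source, destination, flags) triple."""
    counts = Counter()
    for line in log_text.splitlines():
        parts = line.split()
        # Assumed format: "<src ip:port> <dst ip:port> TCP:<flags>"
        if len(parts) != 3 or not parts[2].startswith("TCP:"):
            continue  # skip blank or unrecognised lines
        src, dst, flags = parts
        counts[(src, dst, flags.split(":", 1)[1])] += 1
    return counts

sample = """\
23.1.4.74:80 xx.xx.xx.196:23314 TCP:SA
74.125.225.130:80 xx.xx.xx.196:47371 TCP:SA
74.125.225.130:80 xx.xx.xx.196:47371 TCP:SA
74.125.225.136:80 xx.xx.xx.196:65389 TCP:R
74.125.225.136:80 xx.xx.xx.196:65389 TCP:R
"""

# Print the busiest connections first.
for (src, dst, flags), n in tally_drops(sample).most_common():
    print(f"{n:3d}  {src} -> {dst}  [{flags}]")
```

Many repeats of the same source, destination and port pair would suggest the remote server retransmitting to a firewall that never accepts the reply, rather than many unrelated probes.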

cmb

That looks like you have asymmetric routing somehow: your egress traffic is going out a different path than your ingress traffic comes in. Possibly an IP conflict, or something else going on there.
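
As a rough illustration of why asymmetric routing produces exactly this kind of log entry, here is a toy Python model of a stateful filter. It is only a sketch: the `Firewall` class and its methods are invented for the example, real pf state tracking is far more involved, and the model ignores pfsync entirely. The endpoints are taken from the log excerpt earlier in the thread.

```python
# Toy model only; this is not how pf is implemented.
class Firewall:
    def __init__(self, name):
        self.name = name
        self.states = set()  # (local endpoint, remote endpoint) pairs

    def send_out(self, local, remote):
        """Record state for an outbound connection leaving through this box."""
        self.states.add((local, remote))

    def receive(self, remote, local):
        """Pass a reply only if this box holds the matching state."""
        return "pass" if (local, remote) in self.states else "drop"

box_a = Firewall("box_a")
other_path = Firewall("other_path")

# Symmetric case: the connection leaves and returns through the same box.
box_a.send_out("xx.xx.xx.196:47371", "74.125.225.130:80")
symmetric = box_a.receive("74.125.225.130:80", "xx.xx.xx.196:47371")

# Asymmetric case: the connection leaves by another path, but the reply
# arrives at box_a, which has no state for it.
other_path.send_out("xx.xx.xx.196:65389", "74.125.225.136:80")
asymmetric = box_a.receive("74.125.225.136:80", "xx.xx.xx.196:65389")

print(symmetric, asymmetric)
```

In the asymmetric case the reply is dropped even though the connection itself is legitimate, which is the pattern of blocked SYN-ACK and RST replies shown in the log.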

hcoin

Major clue. Maybe it's about VIP ARPs surrounding master/slave transitions, or maybe pfsync issues between the primary and backup routers to do with VIP addresses.

cmb's comment about 'asymmetric' seems on target diagnostically. It was what I thought at first too, and I went half crazy trying to track packets through the master router (the one owning the CARP VIPs for the LAN and WAN). As the previous postings show, I had no joy at all. The logs showed that presumed outbound TCP connections using the CARP VIP on the WAN led to replies from the cloud web servers, which were then dropped as if the outgoing state had never been registered. Change the outbound NAT rule on the master to use the interface address and all is good, logs normal. There was no clue to be had anywhere within the master pf box.

So, if the asymmetric routes cmb mentions weren't happening in the master pf box, where all the traffic should be consolidated, were they possibly happening in the box that was supposed to be just the quiet backup, with all its CARP VIPs in backup mode according to the dashboard?

I turned off the slave PF box. In theory that should be a difference that makes no difference: taking a 'backup' CARP interface offline on the WAN and LAN shouldn't matter at all. But it did: the logs on the master are all normal while the backup is off.

Bring the 'backup' back online, with all CARP VIPs still showing 'backup' on the backup and 'master' on the primary, and the log flood noted above resumes on the primary/master pf box.

I notice that neither the master nor the backup pf box's ARP table has an entry for the LAN or WAN CARP VIP. Not the wrong entry, no entry. I did

arp -s <vip ip> <vip link level addr> temp pub
for the LAN and WAN VIPs on the primary and backup pf boxes; the log flood continued as before.

I disabled the 'WAN' interface on the BACKUP box and the log flood stops. Re-enable it and back it comes.

I disconnect the 'WAN' ethernet cable on the backup box (but don't disable the interface in PF), and the log on the master floods with LAN dropped packet complaints like

dropped: 74.125.142.16:993 192.168.29.102:64248 TCP:PA
some web addr lan client

What's going on? Hope these clues help.

hcoin

The current theory is that this effect is connected with the LAN being bridged to an OpenVPN TAP interface, even when the OpenVPN server has no connections.