pfSense stops sending traffic after upgrade
-
@stephenw10 Just on the LAN side, despite LAN and WAN going into the same X550-T2 NIC. There are entries in the ARP table for the WAN side, but the LAN one just shows as incomplete.
Running a pcap on the pfSense box, I can see us ARPing for the gateway and getting replies, but I don't see us replying when we're asked for our own MAC address.
-
Hmm, but you do see ARP requests for the CARP VIP coming into the LAN?
-
Running a pcap on the pfSense box, I can see ARP requests and responses for the upstream gateway, but only requests with no response for the CARP VIP.
Bit of background: as this system is running captive portal for a widely dispersed network, clients are not L2-adjacent with the LAN. There are routers running VRRP as the gateway on both the LAN and WAN sides. But this hasn't changed and has worked for many years, starting with 2.4.x and upgrading since.
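For anyone wanting to check the same thing on a longer capture, here is a rough sketch that lists ARP targets which were asked for but never answered. It assumes the capture was dumped as text with something like `tcpdump -n arp` and that the lines look like the usual "Request who-has ... tell ..." and "Reply ... is-at ..." output; the function name and the sample addresses (documentation range 192.0.2.0/24) are made up for illustration.

```python
import re
from collections import Counter

# Assumed tcpdump -n text format, for example:
#   12:00:01.000000 ARP, Request who-has 192.0.2.1 tell 192.0.2.4, length 28
#   12:00:01.000200 ARP, Reply 192.0.2.1 is-at 00:00:5e:00:01:01, length 46
REQUEST = re.compile(r"ARP, Request who-has (\S+).* tell ([^,\s]+)")
REPLY = re.compile(r"ARP, Reply (\S+) is-at ([^,\s]+)")


def unanswered_arp_targets(lines):
    """Return {target_ip: request_count} for targets with no reply in the capture."""
    requests = Counter()
    answered = set()
    for line in lines:
        match = REQUEST.search(line)
        if match:
            requests[match.group(1)] += 1
            continue
        match = REPLY.search(line)
        if match:
            answered.add(match.group(1))
    return {ip: count for ip, count in requests.items() if ip not in answered}


# Hypothetical sample: the gateway answers, the CARP VIP (192.0.2.10) does not.
SAMPLE = """\
12:00:01.000000 ARP, Request who-has 192.0.2.1 tell 192.0.2.4, length 28
12:00:01.000200 ARP, Reply 192.0.2.1 is-at 00:00:5e:00:01:01, length 46
12:00:02.000000 ARP, Request who-has 192.0.2.10 tell 192.0.2.1, length 46
12:00:03.000000 ARP, Request who-has 192.0.2.10 tell 192.0.2.1, length 46
"""

if __name__ == "__main__":
    print(unanswered_arp_targets(SAMPLE.splitlines()))
```

This only looks at whether any reply for a target exists anywhere in the capture, so it will not catch a VIP that answered early on and then went quiet.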
-
Hmm, I'm not aware of anything known that would do that.
I assume the CARP traffic itself between the nodes continues as normal? The correct nodes remain as master/backup?
-
CARP does appear to be working correctly: the correct node is master, and when I reboot it everything fails over to the backup and then fails back.
On the CARP status page each node only lists itself under State Creator Host IDs, but that doesn't appear to be affecting it working.
-
Hmm, and does resaving the interface recover it from this state? Or reconnecting the cable?
-
As I'm remote I've mostly ended up rebooting as it's easier. But on one occasion I did log into the connected switch and downed and upped the ports which brought it back to life.
-
Hmm, seeing nothing specific that would apply here but the fact you're running captive portal seems suspicious. Especially because 2.7.0 was the first version that used pf instead of ipfw for captive portal.
Do you have captive portal running on the LAN interface directly? Where the CARP VIP is?
-
Yes, captive portal is on the LAN interface with CARP.
Since it seems to stop talking ARP, I was wondering whether the issue is lower down. Presumably the jump from FreeBSD 12 to 14 involved some changes down in the drivers?
-
Well, yes, a lot of things changed, including drivers. But also a lot of people are running CARP without issue. Far fewer are running CARP and captive portal.
Check the generated ruleset in /tmp/rules.debug.
You should see entries for both captive portal and CARP. CARP should always be passed unless it's a reflected packet from itself. The fact it only fails after some time implies something must be expiring or changing somewhere. If it was just captive portal it would simply be blocked immediately.
I assume you don't see anything in the firewall logs when it stops?
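If it helps, the ordering check described above can be sketched roughly like this. It is only an illustration: it assumes the captive portal block rule can be recognised by a `cpzoneid` table or tag name and that the rules in question use `quick` (so pf stops at the first match and order matters). The function names and the trimmed sample rules are hypothetical.

```python
import re


def first_index(lines, pattern):
    """Index of the first line matching pattern, or None if nothing matches."""
    regex = re.compile(pattern)
    for index, line in enumerate(lines):
        if regex.search(line):
            return index
    return None


def carp_passed_before_portal_block(ruleset_text):
    """True if a 'pass quick ... proto carp' rule precedes the portal block rule."""
    lines = [line.strip() for line in ruleset_text.splitlines()]
    # Ignore blank lines and comment lines.
    lines = [line for line in lines if line and not line.startswith("#")]
    carp_pass = first_index(lines, r"^pass\b.*\bquick\b.*\bproto carp\b")
    portal_block = first_index(lines, r"^block\b.*\bquick\b.*cpzoneid")
    if carp_pass is None:
        return False
    return portal_block is None or carp_pass < portal_block


# Hypothetical, trimmed-down sample of the rules of interest.
SAMPLE_RULES = """\
block in quick proto carp from (self) to any
pass quick proto carp no state
block in quick on ix0 from any to ! <cpzoneid_2_cpips> ! tagged cpzoneid_2_auth
"""

if __name__ == "__main__":
    # On the firewall itself you would read /tmp/rules.debug instead.
    print(carp_passed_before_portal_block(SAMPLE_RULES))
```

A static check like this says nothing about why it only fails after some time; it just rules out the simplest ordering problem.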
-
The ruleset in rules.debug looks okay.
CARP rules
block in quick proto carp from (self) to any ridentifier 1000000201
pass quick proto carp ridentifier 1000000202 no state
Captive Portal
pass in quick on ix0 proto tcp from any to <cpzoneid_2_cpips> port 8002 ridentifier 13004 keep state(sloppy)
pass out quick on ix0 proto tcp from 192.76.7.4 port 8002 to any flags any ridentifier 13005 keep state(sloppy)
block in quick on ix0 from any to ! <cpzoneid_2_cpips> ! tagged cpzoneid_2_auth ridentifier 13006
ix0 is the LAN interface.
Nothing in system.log around the time it's broken. The gateway.log at the time it's broken is full of lines like:
dpinger 76248 - - LAN_GW <ip address>: sendto error: 64
But I think that's somewhat expected? The networking has broken so it's not able to ping the gateway.
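As a side note on that error number, here is a small sketch for summarising the dpinger lines per gateway. It assumes the log lines follow the shape quoted above, and it maps the number using FreeBSD's errno values (64 is EHOSTDOWN there), which would fit the gateway not resolving via ARP. The table is a hand-picked subset, and the function name and sample address are hypothetical.

```python
import re
from collections import Counter

# A few FreeBSD errno values relevant here. Python's errno module is not used
# on purpose, because the numbers differ between operating systems.
FREEBSD_ERRNO = {
    50: "ENETDOWN",
    51: "ENETUNREACH",
    55: "ENOBUFS",
    64: "EHOSTDOWN",
    65: "EHOSTUNREACH",
}

DPINGER = re.compile(r"dpinger\s+\d+\s+-\s+-\s+(\S+)\s+(.+?): sendto error: (\d+)")


def summarize_dpinger_errors(lines):
    """Count sendto errors per (gateway name, errno name)."""
    counts = Counter()
    for line in lines:
        match = DPINGER.search(line)
        if not match:
            continue
        gateway, number = match.group(1), int(match.group(3))
        name = FREEBSD_ERRNO.get(number, f"errno {number}")
        counts[(gateway, name)] += 1
    return counts


# Hypothetical sample; 192.0.2.1 stands in for the real gateway address.
SAMPLE_LOG = """\
dpinger 76248 - - LAN_GW 192.0.2.1: sendto error: 64
dpinger 76248 - - LAN_GW 192.0.2.1: sendto error: 64
"""

if __name__ == "__main__":
    for (gateway, name), count in summarize_dpinger_errors(SAMPLE_LOG.splitlines()).items():
        print(gateway, name, count)
```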
-
Hmm, yes the fact it's ARPing for the LAN side gateway and the gateway is responding but it's NOT in the pfSense table does seem to point at the NIC not passing traffic. At least inbound.
Yet it appears in a packet capture so the driver is seeing it.