strange connectivity errors in HA

prakti

Hi experts,

I have a pfSense Plus HA cluster. The two members are connected directly via fiber for CARP / pfsync. Both devices are also connected redundantly to the LAN side (lagg0.x) and to the WAN side (lagg1.200). On the WAN side there is only one VIP for the public IP; on the LAN side there are 6 VIPs for the transfer networks to the different LAN areas.

In VLAN 200, the ISP routers also speak HSRP for their virtual gateway address. HA, CARP and the VIPs are working fine. The pfSense nodes seem to be doing what they should.

Now the problem:

Via Diagnostics -> Ping / Traceroute, each firewall member can reach e.g. 8.8.8.8, the public IPs of the ISP routers, or the other member's physical IP correctly, IF I don't manually set a source interface. If I set the source NIC (or VLAN IP interface), the second HA member can't reach (ping / traceroute) anything outside: not the first HA member, not the ISP routers, nothing.

If I do the same on the first HA member, everything works EXCEPT reaching the public IP of the second HA member.

While the primary HA member is rebooting, the second member takes over the master role for all VIPs and can ping everything outside (without a source interface set), but no other traffic reaches the WAN side.

It seems I can't see the wood for the trees :-/
Sorry for my German English ;-)

Any ideas where to start?

prakti

Some more hints after debugging the problem above:
After setting System -> Advanced -> Firewall & NAT -> Disable Firewall AND running "pfctl -d", the problem still exists.

From the console on the second member I can ping the provider gateway. But after adding the source address or source interface parameter to ping or traceroute, there are no answers. :-(

With the firewall / packet filter disabled, I can be sure that this is not a firewall rule or NAT problem, correct?
The NICs used in this firewall are sfxge0: Solarflare SFN7122F SFP+ Server Adapter

viragomann

@prakti said in strange connectivity errors in HA:

If I set the source NIC (or VLAN IP interface), the second HA member can't reach (ping / traceroute) anything outside.

Can you give more detail on this, please?

Which IP? The interface IP or the CARP VIP?
How are the interfaces and virtual IPs configured on both nodes?

prakti

@viragomann
Thank you very much for your reply. From the second member, I'm testing "source pinging" with the interface IP. The interfaces are VLANs, trunked to the pfSense nodes as link aggregations (lagg1.x for the WAN trunk, lagg0.xxxx for the LAN trunk).

Our IP range from Versatel is 83.x.x.48/28 ...
HSRP address of Versatel is 83.x.x.49
1st router of Versatel is 83.x.x.50
2nd router of Versatel is 83.x.x.51

1st of my pfSense nodes is 83.x.x.60
2nd of my pfSense nodes is 83.x.x.61
VIPs of both are
83.x.x.53
83.x.x.54
83.x.x.55

Without setting an internal source address, the .x.61 can ping everything above, 8.8.8.8 and everything else.

The internal (LAN) transfer segments look like this:
1st of my pfSense nodes is 172.23.0.2
2nd of my pfSense nodes is 172.23.0.3
VIP 172.23.0.1
For example, the addressing to the core network (Extreme Networks Virtual Fabric, VSP7400 platform) in this case:
core switch 1 172.23.0.250
core switch 2 172.23.0.251
core switch 3 172.23.0.252
core switch 4 172.23.0.253
core virtual (vrrp) 172.23.0.254

This pattern continues for five more transfer networks, for example:
1st of my pfSense nodes is 172.23.1.2
2nd of my pfSense nodes is 172.23.1.3
VIP 172.23.1.1
and the addressing to the core network (Extreme Virtual Fabric) in this case:
core switch 1 172.23.1.250
core switch 2 172.23.1.251
core switch 3 172.23.1.252
core switch 4 172.23.1.253
core virtual (vrrp) 172.23.1.254
etc ....

And the problem looks like this:

traceroute -n -s 172.23.3.3 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8) from 172.23.3.3, 64 hops max, 40 byte packets
1 * * *
2 * * *
3 * * *
4 * * *
5 * * *

The result is the same when tracing to 83.x.x.60 (1st firewall member) or 83.x.x.49 with a source IP set :-/

I'm a bit desperate

viragomann

@prakti said in strange connectivity errors in HA:

traceroute -n -s 172.23.3.3 8.8.8.8

So which device is this source IP assigned to?

Note that you can only use IPs assigned to the respective pfSense node itself. You cannot use an arbitrary internal IP, nor a CARP VIP or any other VIP that hooks onto a CARP VIP, while the node is in backup state.

prakti

@viragomann said in strange connectivity errors in HA:

So which device is this source IP assigned to?

172.23.3.3 is (one of) the internal IP addresses of the VLAN NIC on the second HA member.
172.23.3.2 is the address of the first HA member, and
172.23.3.1 is the VIP for this VLAN

"172.23.3.1/24 (vhid: 8)"

So "traceroute -s" should work?

viragomann

@prakti
Yes, it's the same here.

I investigated this by sniffing the WAN traffic. The reason was found immediately.
If you use an internal IP to ping a public host, the outbound NAT rule is applied to the traffic, which translates the source to the CARP WAN VIP.
Hence responses go to the master node and the backup doesn't get a reply.
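The effect described above can be illustrated with a small toy model. This is only a sketch, not pf's actual logic: the 203.0.113.x addresses stand in for the redacted 83.x.x.x range, the 172.23.0.0/16 NAT source network is an assumption, and the function names are made up for illustration.

```python
# Toy model of outbound NAT on a CARP backup node. Not pf's real logic.
# Assumption: 203.0.113.x stands in for the redacted 83.x.x.x addresses.
import ipaddress

CARP_WAN_VIP = "203.0.113.53"    # shared WAN VIP, owned by the CARP master
BACKUP_WAN_IP = "203.0.113.61"   # WAN interface address of the backup node
# Assumed outbound NAT sources: the internal transfer networks.
INTERNAL_NETS = [ipaddress.ip_network("172.23.0.0/16")]


def translated_source(src_ip: str) -> str:
    """Return the source address a packet carries when it leaves the WAN."""
    addr = ipaddress.ip_address(src_ip)
    if any(addr in net for net in INTERNAL_NETS):
        return CARP_WAN_VIP      # outbound NAT rewrites internal sources
    return src_ip                # e.g. the node's own WAN address


def reply_receiver(src_ip: str, master: str = "fw1") -> str:
    """Return which node receives the reply to a packet sent by fw2."""
    if translated_source(src_ip) == CARP_WAN_VIP:
        return master            # replies to the VIP land on the master
    return "fw2"


print(reply_receiver("172.23.3.3"))    # internal source, NATed to the VIP
print(reply_receiver(BACKUP_WAN_IP))   # own WAN address, no translation
```

In this model, a ping from the backup with an internal source address is answered towards the master, which matches the "no reply" symptom seen with `traceroute -s 172.23.3.3`.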

prakti

@viragomann
Hi viragomann,

Thank you very much for your time and investigation. Your answer was very important in bringing me back onto the correct path for debugging. The reason why clients couldn't reach the internet was an inconsistent pfBlockerNG configuration between the two HA members. I had ignored errors like this:

/rc.filter_configure_sync: New alert found: Unresolvable source alias 'pfB_BinaryDefense_v4' for rule 'NAT Allow HTTPS_2_xxxxxxxx'
Dec 14 16:17:17 svrfw02 php-fpm[32037]: /rc.filter_configure_sync: New alert found: Unresolvable source alias 'pfB_DNSBLIP_v4' for rule 'NAT Allow HTTP_2_xxxxxxxx'
Dec 14 16:17:17 svrfw02 php-fpm[32037]: /rc.filter_configure_sync: New alert found: Unresolvable source alias 'pfB_DNSBLIP_v4' for rule 'NAT Allow HTTPS_2_xxxxxxxx'
Dec 14 16:17:18 svrfw02 php-fpm[32037]: /rc.filter_configure_sync: New alert found: There were error(s) loading the rules: /tmp/rules.debug:299: syntax error - The line in question reads [299]: rdr on lagg1.808 inet proto tcp from ! to 83.x.x.54 port 443 -> $SERVER_xxxxxxxx

After fixing this, switching between the CARP members works correctly.
Again, thank you for your assistance!
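For anyone hitting the same thing: alerts like the ones quoted above are easy to overlook, so a small script can pull them out of a saved system log on each node for comparison. This is a rough sketch that assumes the messages look exactly like the lines in this thread; the function name and the idea of feeding it an exported log are my own, not a pfSense feature.

```python
# Sketch: extract unresolvable-alias alerts and ruleset load errors from
# log lines. The patterns are based only on the messages quoted in this thread.
import re

# Only "source" appears in the thread; \w+ is an assumption to catch variants.
ALIAS_RE = re.compile(r"Unresolvable \w+ alias '([^']+)' for rule '([^']+)'")
RULES_ERR_RE = re.compile(r"error\(s\) loading the rules: (\S+):(\d+):")


def scan_filter_log(lines):
    """Collect unresolvable aliases and rule-load errors from log lines."""
    aliases = {}        # alias name -> list of rule descriptions using it
    load_errors = []    # (rules file, line number) tuples
    for line in lines:
        m = ALIAS_RE.search(line)
        if m:
            aliases.setdefault(m.group(1), []).append(m.group(2))
        m = RULES_ERR_RE.search(line)
        if m:
            load_errors.append((m.group(1), int(m.group(2))))
    return aliases, load_errors


sample = [
    "/rc.filter_configure_sync: New alert found: Unresolvable source alias "
    "'pfB_BinaryDefense_v4' for rule 'NAT Allow HTTPS_2_xxxxxxxx'",
    "/rc.filter_configure_sync: New alert found: There were error(s) loading "
    "the rules: /tmp/rules.debug:299: syntax error",
]
aliases, load_errors = scan_filter_log(sample)
print(aliases)
print(load_errors)
```

If the set of missing aliases differs between the two nodes, the package configuration is probably not in sync; a ruleset that fails to load on the backup would explain why traffic stops after a failover.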