100% Packet loss on primary firewall with HA Enabled (PFSync/CARP/NAT)
-
We have this setup:
Firewall 1: WAN: 192...201/24
Firewall 2: WAN: 192...202/24
CARP WAN: 192...2/24
CARP WAN2: 192...3/24
ISP Gateway: 192...1/24
Reason for two CARP for WAN is due to rules on ISP end, specific traffic needs to go out via .2
I'll power up the second firewall later in the day, although the change of subnets for the CARP might be the fix.
-
After making the tweaks to the CARP subnets I'm still left in the same situation, one switch is permanently sat with 100% packet loss, the other 0%
Any logs I can provide that'll help diagnose the issue?
-
@jgzowski said in 100% Packet loss on primary firewall with HA Enabled (PFSync/CARP/NAT):
After making the tweaks to the CARP subnets I'm still left in the same situation, one switch is permanently sat with 100% packet loss, the other 0%
What do you mean one switch?
What exactly are you testing and how?
-
@derelict Sorry, I meant to say firewall.
I disabled monitoring of gateway as it does seem to function as expected.
Issue I'm seeing now though within the logs is:
A communications error occurred while attempting to call XMLRPC method host_firmware_version
Configuration from primary isn't replicating to the secondary.
Followed the instructions exactly and have double checked them now many times. Only have sync settings set on the primary, firewall rules for sync port set up.
-
@jgzowski said in 100% Packet loss on primary firewall with HA Enabled (PFSync/CARP/NAT):
@derelict Sorry, I meant to say firewall.
I disabled monitoring of gateway as it does seem to function as expected.
Does or does not? Because it works fine.
Issue I'm seeing now though within the logs is:
A communications error occurred while attempting to call XMLRPC method host_firmware_version
That works fine too. Can you ping the other side that you're syncing to? Can you use Diagnostics > Test Port to reach it on your webgui port? Is the admin password the same as is set in the sync settings?
Configuration from primary isn't replicating to the secondary.
Followed the instructions exactly and have double checked them now many times. Only have sync settings set on the primary, firewall rules for sync port set up.
If it was done exactly as documented it would be working. I'd check everything again.
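For anyone following along without GUI access, the Test Port check above can be roughly approximated with a few lines of Python. This is only a sketch: `can_connect` is a made-up helper name, not a pfSense tool, and the address and port in the usage comment are the sync peer and HTTPS webgui port that come up later in this thread.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example usage (assumed values, run from the primary's sync network):
#   can_connect("10.200.0.2", 443)
```

A successful connect only shows that the port is reachable. It says nothing about whether the sync credentials are accepted.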
-
Both firewalls are working. The packet loss issue still seems to be there, although I disabled monitoring of it, since as soon as the primary drops the secondary works and vice versa.
The primary managed to update the configuration on the secondary after the secondary had a reboot. Since the reboot, though, it's back to doing:
A communications error occurred while attempting to call XMLRPC method restore_config_section: @ 2019-01-30 16:56:19
A communications error occurred while attempting to call XMLRPC method host_firmware_version: @ 2019-01-30 16:56:37
Port test from SYNCPORT:
Port test to host: 10.200.0.2 Port: 443 successful
Using HTTPS for webgui
Firewall logs from SYNCPORT:
SYNCPORT tcp 10.200.0.1:40286 -> 10.200.0.2:443 FIN_WAIT_2:FIN_WAIT_2 0 / 0 0 B / 0 B
SYNCPORT pfsync 10.200.0.1 -> 10.200.0.2 MULTIPLE:MULTIPLE 21.66 K / 577 23.69 MiB / 460 KiB
SYNCPORT tcp 10.200.0.1:40286 -> 10.200.0.2:443 FIN_WAIT_2:FIN_WAIT_2 4 / 3 216 B / 164 B
--- EDIT
It now seems to be working, have not changed anything else but it works.
-
Do you have State killing on gateway failure enabled in System > Advanced, Miscellaneous?
-
no, should i?
-
No. Not unless you know you need it. It is commonly the cause of the XMLRPC sync state being killed, resulting in errors like you are seeing.
There has to be a reason for what you are seeing. What are the rules on the sync interfaces on both nodes?
Are you just using the admin user/password for this or did you create another user?
Are you familiar with packet capturing? Capturing HTTPS traffic on the sync interfaces might yield a clue.
-
Rules on both firewalls for the syncport:
States   Protocol  Source        Port  Destination  Port  Gateway  Queue  Schedule  Description  Actions
0 / 0 B  IPv4+6 *  SYNCPORT net  *     *            *     *        none
I'm using the default user admin with the same password on each firewall.
Recorded a full packet capture, looked at it in Wireshark and can't see anything glaringly obvious. The servers are talking to each other, passing key exchange/handshake followed by many SYN,ACK and Application Data. Communication is going both ways, ending with a FIN, ACK from the primary server and an ACK from the secondary.
-
Think I've solved it. Had a NAT Outbound rule for any traffic to anywhere to use NAT Address CARP.
Added a mapping for source of the LAN and another for source of the SYNCPORT, and instructed the SYNCPORT one not to use NAT.
Also made changes to DNS Resolver so that all interfaces resolve to the NAT CARP, as DNS was set to 8.8.8.8 and 8.8.4.4
-
Why would sync interface traffic ever have to go out the WAN?
Yes, outbound NAT with source any is almost never right - especially to a CARP VIP.
Traffic from localhost should NAT to the interface address.
Traffic from inside hosts should:
- Use the local interface CARP VIP as their default gateway
- Have outbound NAT to the WAN CARP VIP set.
Traffic from the sync interface should never need internet access.
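
To make the advice above concrete, here is a toy model of top-down, first-match outbound NAT evaluation on the WAN interface. It is not how pf is implemented, just a sketch under assumptions: the subnets (192.168.1.0/24 for LAN, 10.200.0.0/24 for SYNCPORT) and the 203.0.113.x WAN addresses are placeholders rather than values from the real config, and `translate` is a hypothetical helper.

```python
from ipaddress import ip_address, ip_network

# Placeholder addresses (assumptions, not taken from the real config).
WAN_CARP_VIP = "203.0.113.2"
WAN_INTERFACE_ADDR = "203.0.113.201"
NO_NAT = None  # marks a "do not NAT" mapping

# The original catch-all mapping: any source -> WAN CARP VIP.
catch_all_rules = [
    ("0.0.0.0/0", WAN_CARP_VIP),
]

# Mappings following the advice above, evaluated top down.
specific_rules = [
    ("127.0.0.0/8", WAN_INTERFACE_ADDR),  # the firewall's own traffic
    ("10.200.0.0/24", NO_NAT),            # SYNCPORT: never translated
    ("192.168.1.0/24", WAN_CARP_VIP),     # LAN hosts
]


def translate(src, rules):
    """Return the source address after the first matching mapping."""
    addr = ip_address(src)
    for network, target in rules:
        if addr in ip_network(network):
            return src if target is NO_NAT else target
    return src  # no mapping matched: leave the source untouched


for src in ("127.0.0.1", "10.200.0.1", "192.168.1.50"):
    print(src, "->", translate(src, catch_all_rules), "|", translate(src, specific_rules))
```

With the catch-all mapping every source, including the firewall itself and the sync interface, is rewritten to the CARP VIP, which only the current master holds. With the narrower mappings only LAN hosts use the VIP.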