CARP Failover not Working on Manual Outbound NAT
-
I have just ordered two "Mercury II 1U Server Six Ethernet pfSense Appliance" units from Hacom. I have been using pfSense in other locations for a few years now, but nowhere near the depth of what I am trying to do now. I work in the medical field and am planning to implement these new firewalls shortly, but I want to make sure everything is tested and working before I put them into production. I have been following the book very closely and still seem to be running into trouble. My ultimate design is to set up an environment with two pfSense firewalls, with CARP failover as well as multi-WAN failover. At the moment I am just trying to set up CARP with one WAN. My primary WAN is a Comcast business line. I have each firewall set up for CARP, and the master and backup statuses are showing up correctly. However, when I switch to manual outbound NAT, the Internet stops working. Once I switch back to automatic, it works perfectly. I have two CARP IPs that the firewalls share, one on LAN and one on WAN. Each firewall interface also has its own WAN and LAN IP. If I haven't provided enough detail, just let me know what you need and I will get it for you. I would really appreciate your help because I'm starting to pull my hair out.
-
Are you sure that the CARP VIP you have on WAN is working properly?
Can you add a firewall rule to allow ICMP echo requests to the WAN CARP VIP and then try to ping it from somewhere remote on the Internet?
If it works on auto, the traffic would just be leaving by the WAN IP. But if your manual outbound NAT rules use the WAN CARP VIP, then it may not be working properly when trying to talk to the outside world.
-
Without adding any rules I could not ping either of the WAN interfaces or the CARP IP. Once the rule was added, however, I could ping all three. Something interesting happened, though: I looked at the firewall logs and only the second firewall showed ICMP being blocked on the CARP IP. Correct me if I am wrong, but shouldn't the primary firewall show that unless the secondary is in master mode due to a failover? It almost seems like the CARP IP is being handled by the secondary and not the primary firewall.
-
That is correct, only the master should show that traffic.
If you look at Status > CARP, does the main firewall really show as master for all of the CARP VIPs?
-
Yes sir. Here are the screenshots. No offense for the blur-outs, but since this is a medical facility I cannot publish our public IPs.
-
Are there any CARP transitions logged under Status > System Logs?
Any bridging going on?
-
Firewall1's Logs
Dec 27 16:36:06 php: : Processing start -
Dec 27 16:36:06 php: : Processing -
Dec 27 16:38:26 check_reload_status: reloading filter
Dec 27 16:38:28 php: : Beginning XMLRPC sync to http://172.16.1.3:80.
Dec 27 16:38:29 php: : XMLRPC sync successfully completed with http://172.16.1.3:80.
Dec 28 08:53:34 check_reload_status: reloading filter
Dec 28 08:53:36 php: : Beginning XMLRPC sync to http://172.16.1.3:80.
Dec 28 08:53:36 php: : XMLRPC sync successfully completed with http://172.16.1.3:80.
Dec 28 08:55:56 check_reload_status: reloading filter
Dec 28 08:55:58 php: : Beginning XMLRPC sync to http://172.16.1.3:80.
Dec 28 08:55:59 php: : XMLRPC sync successfully completed with http://172.16.1.3:80.
Dec 28 09:00:00 check_reload_status: check_reload_status is starting
Dec 28 09:01:27 syslogd: exiting on signal 15
Dec 28 09:01:27 syslogd: kernel boot file is /boot/kernel/kernel
Dec 28 09:02:39 kernel: carp0: incorrect hash
Dec 28 09:02:39 kernel: carp1: incorrect hash
Dec 28 09:05:45 dnsmasq[633]: reading /etc/resolv.conf
Dec 28 09:05:45 dnsmasq[633]: using nameserver 8.8.4.4#53
Dec 28 09:05:45 dnsmasq[633]: using nameserver 8.8.8.8#53
Dec 28 09:05:45 dnsmasq[633]: exiting on receipt of SIGTERM
Dec 28 09:05:46 dnsmasq[37190]: started, version 2.45 cachesize 150
Dec 28 09:05:46 dnsmasq[37190]: compile time options: IPv6 GNU-getopt BSD-bridge ISC-leasefile no-DBus no-I18N TFTP
Dec 28 09:05:46 dnsmasq[37190]: reading /etc/resolv.conf
Dec 28 09:05:46 dnsmasq[37190]: using nameserver 8.8.4.4#53
Dec 28 09:05:46 dnsmasq[37190]: using nameserver 8.8.8.8#53
Dec 28 09:05:46 dnsmasq[37190]: read /etc/hosts - 2 addresses
Dec 28 09:05:51 check_reload_status: webConfigurator restart in progress
Dec 28 09:05:58 php: : Creating rrd update script
Dec 28 09:05:58 check_reload_status: reloading filter
Dec 28 09:06:01 php: : Beginning XMLRPC sync to https://172.16.1.3:443.
Dec 28 09:07:16 php: : A communications error occured while attempting XMLRPC sync with username admin https://172.16.1.3:443.
Dec 28 09:07:16 php: : New alert found: A communications error occured while attempting XMLRPC sync with username admin https://172.16.1.3:443.
Dec 28 09:07:16 php: : Beginning XMLRPC sync to https://172.16.1.3:443.
Dec 28 09:08:31 php: : A communications error occured while attempting XMLRPC sync with username admin https://172.16.1.3:443.
Dec 28 09:08:31 php: : New alert found: A communications error occured while attempting XMLRPC sync with username admin https://172.16.1.3:443.
Dec 28 09:14:56 dnsmasq[37190]: reading /etc/resolv.conf
Dec 28 09:14:56 dnsmasq[37190]: using nameserver 8.8.4.4#53
Dec 28 09:14:56 dnsmasq[37190]: using nameserver 8.8.8.8#53
Dec 28 09:14:56 dnsmasq[37190]: exiting on receipt of SIGTERM
Dec 28 09:14:57 dnsmasq[38400]: started, version 2.45 cachesize 150
Dec 28 09:14:57 dnsmasq[38400]: compile time options: IPv6 GNU-getopt BSD-bridge ISC-leasefile no-DBus no-I18N TFTP
Dec 28 09:14:57 dnsmasq[38400]: reading /etc/resolv.conf
Dec 28 09:14:57 dnsmasq[38400]: using nameserver 8.8.4.4#53
Dec 28 09:14:57 dnsmasq[38400]: using nameserver 8.8.8.8#53
Dec 28 09:14:57 dnsmasq[38400]: read /etc/hosts - 2 addresses
Dec 28 09:15:02 check_reload_status: webConfigurator restart in progress
Dec 28 09:15:10 php: : Creating rrd update script
Dec 28 09:15:10 check_reload_status: reloading filter
Dec 28 09:15:12 php: : Beginning XMLRPC sync to http://172.16.1.3:80.
Dec 28 09:15:12 php: : XMLRPC sync successfully completed with http://172.16.1.3:80.
The XMLRPC communications errors (displayed in red) only appear because I had changed the GUI to HTTPS. Once I switched back, it returned to normal; I was just making sure I could switch to HTTPS.
Firewall2's Logs
Dec 27 16:33:58 check_reload_status: reloading filter
Dec 27 16:35:49 kernel: carp0: BACKUP -> MASTER (preempting a slower master)
Dec 27 16:35:49 kernel: carp0: link state changed to UP
Dec 27 16:35:51 kernel: em0: link state changed to DOWN
Dec 27 16:35:51 kernel: carp1: link state changed to DOWN
Dec 27 16:35:59 kernel: carp1: INIT -> BACKUP
Dec 27 16:35:59 kernel: em0: link state changed to UP
Dec 27 16:35:59 kernel: carp1: link state changed to DOWN
Dec 27 16:36:01 check_reload_status: rc.linkup starting
Dec 27 16:36:02 kernel: carp0: MASTER -> BACKUP (more frequent advertisement received)
Dec 27 16:36:02 kernel: carp0: link state changed to DOWN
Dec 27 16:36:02 php: : Processing em0 - start
Dec 27 16:36:02 php: : Processing start -
Dec 27 16:36:02 php: : Processing -
Dec 27 16:36:02 kernel: carp1: link state changed to UP
Dec 27 16:36:04 kernel: carp1: MASTER -> BACKUP (more frequent advertisement received)
Dec 27 16:36:04 kernel: carp1: link state changed to DOWN
Dec 27 16:38:32 check_reload_status: reloading filter
Dec 28 01:10:00 check_reload_status: check_reload_status is starting
Dec 28 08:53:40 check_reload_status: reloading filter
Dec 28 08:56:03 check_reload_status: reloading filter
Dec 28 09:01:25 syslogd: exiting on signal 15
Dec 28 09:01:25 syslogd: kernel boot file is /boot/kernel/kernel
Dec 28 09:01:49 kernel: carp0: link state changed to DOWN
Dec 28 09:01:49 kernel: carp1: link state changed to DOWN
Dec 28 09:02:29 syslogd: exiting on signal 15
Dec 28 09:02:29 syslogd: kernel boot file is /boot/kernel/kernel
Dec 28 09:02:39 kernel: carp0: INIT -> MASTER (preempting)
Dec 28 09:02:39 kernel: carp0: link state changed to UP
Dec 28 09:02:39 kernel: carp1: INIT -> MASTER (preempting)
Dec 28 09:02:39 kernel: carp1: link state changed to UP
Dec 28 09:02:39 kernel: carp0: link state changed to DOWN
Dec 28 09:02:39 kernel: carp0: INIT -> MASTER (preempting)
Dec 28 09:02:39 kernel: carp0: link state changed to UP
Dec 28 09:02:39 kernel: carp1: link state changed to DOWN
Dec 28 09:02:39 kernel: carp1: INIT -> MASTER (preempting)
Dec 28 09:02:39 kernel: carp1: link state changed to UP
Dec 28 09:02:39 kernel: carp0: MASTER -> BACKUP (more frequent advertisement received)
Dec 28 09:02:39 kernel: carp0: link state changed to DOWN
Dec 28 09:02:39 kernel: carp0: link state changed to DOWN
Dec 28 09:02:39 kernel: carp0: INIT -> MASTER (preempting)
Dec 28 09:02:39 kernel: carp0: link state changed to UP
Dec 28 09:02:39 kernel: carp1: link state changed to DOWN
Dec 28 09:02:39 kernel: carp1: INIT -> MASTER (preempting)
Dec 28 09:02:39 kernel: carp1: link state changed to UP
Dec 28 09:02:40 kernel: carp1: MASTER -> BACKUP (more frequent advertisement received)
Dec 28 09:02:40 kernel: carp1: link state changed to DOWN
Dec 28 09:02:40 kernel: carp0: MASTER -> BACKUP (more frequent advertisement received)
Dec 28 09:02:40 kernel: carp0: link state changed to DOWN
Dec 28 09:15:16 check_reload_status: reloading filter
-
You can switch to HTTPS as long as you do it on both boxes. The protocol just has to match between the two.
The logs look OK; a couple of flips like that are normal at boot time.
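If you want a quicker way to read logs like these, here is a rough Python sketch that tallies the CARP state changes per VIP interface and reports the last state each one settled in. It assumes the kernel lines look exactly like the ones pasted above; the function name `summarize_carp` and the sample list are made up for illustration and are not part of pfSense.

```python
import re

# Matches kernel CARP state changes such as
# "Dec 28 09:02:39 kernel: carp0: INIT -> MASTER (preempting)".
# Assumption: the log format is the one shown in this thread.
TRANSITION = re.compile(r"kernel: (carp\d+): (\w+) -> (\w+)")


def summarize_carp(log_lines):
    """Return {interface: {"flips": count, "last": final_state}}."""
    summary = {}
    for line in log_lines:
        match = TRANSITION.search(line)
        if not match:
            continue  # link-state changes and unrelated lines are skipped
        iface, _old_state, new_state = match.groups()
        entry = summary.setdefault(iface, {"flips": 0, "last": None})
        entry["flips"] += 1
        entry["last"] = new_state
    return summary


# A few lines copied from the Firewall2 log above.
sample = [
    "Dec 28 09:02:39 kernel: carp0: INIT -> MASTER (preempting)",
    "Dec 28 09:02:39 kernel: carp0: link state changed to UP",
    "Dec 28 09:02:39 kernel: carp1: INIT -> MASTER (preempting)",
    "Dec 28 09:02:40 kernel: carp1: MASTER -> BACKUP (more frequent advertisement received)",
    "Dec 28 09:02:40 kernel: carp0: MASTER -> BACKUP (more frequent advertisement received)",
]

if __name__ == "__main__":
    for iface, info in sorted(summarize_carp(sample).items()):
        print(iface, info["flips"], "transitions, last state:", info["last"])
```

What matters is the last state on each box: a backup whose VIPs all end in BACKUP after boot is what you would expect, regardless of how many flips came before.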
-
I just rebooted both firewalls for fresh startup. What should we do from here?
-
I'd run a packet capture on both master and backup looking for traffic to that CARP VIP, then run the external ping again and confirm which box is actually getting the traffic.
If that works and the master is getting the traffic, you may have errors in your manual outbound NAT rules, so post screenshots of those.
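To compare the two captures, something like the following Python sketch could count the echo requests each box saw for the VIP. It assumes you saved each capture as tcpdump-style text (one packet per line); the addresses are documentation placeholders, and `count_echo_requests` is a hypothetical helper, not a pfSense tool.

```python
import re

# Assumption: each capture line looks like tcpdump's text output, e.g.
# "09:02:39.123456 IP 198.51.100.7 > 203.0.113.10: ICMP echo request, id 1, seq 1, length 64"
ECHO_REQUEST = re.compile(r"IP (\S+) > (\S+): ICMP echo request")


def count_echo_requests(capture_lines, vip):
    """Count ICMP echo requests whose destination is the given VIP."""
    count = 0
    for line in capture_lines:
        match = ECHO_REQUEST.search(line)
        if match and match.group(2) == vip:
            count += 1
    return count


# Placeholder data: 203.0.113.10 stands in for the WAN CARP VIP.
WAN_VIP = "203.0.113.10"
firewall1_capture = [
    "09:02:39.100000 IP 198.51.100.7 > 203.0.113.11: ICMP echo request, id 1, seq 1, length 64",
]
firewall2_capture = [
    "09:02:40.100000 IP 198.51.100.7 > 203.0.113.10: ICMP echo request, id 1, seq 1, length 64",
    "09:02:41.100000 IP 198.51.100.7 > 203.0.113.10: ICMP echo request, id 1, seq 2, length 64",
    "09:02:41.100100 IP 203.0.113.10 > 198.51.100.7: ICMP echo reply, id 1, seq 2, length 64",
]

if __name__ == "__main__":
    print("firewall1:", count_echo_requests(firewall1_capture, WAN_VIP))
    print("firewall2:", count_echo_requests(firewall2_capture, WAN_VIP))
```

Whichever box shows a non-zero count is the one the upstream network is delivering the VIP's traffic to; ideally that is only the CARP master.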
-
Packet capture shows that all traffic is being received on the second firewall. Currently NAT is set to Automatic; however, that shouldn't affect which firewall receives the pings.
-
I have just confirmed with packet capture that firewall1 is receiving all the LAN CARP traffic, just not the WAN.
-
That is rather odd, especially if it is showing as backup. Are these devices plugged into the rear of your cable modem/router? If so you might try plugging them into a separate switch and then uplinking to that, just to eliminate that as a possible cause.
-
Currently I have a connection going from my Comcast modem to a 24-port Netgear switch, then from the switch into the two firewalls. My LAN is plugged into a separate 24-port switch.
-
Alright, I have deleted the WAN CARP IP and re-added it, and it looks like the primary firewall is now receiving its traffic.
-
Never mind, it has flopped back over to firewall2. Why would firewall2 be receiving all the traffic?
-
Typically that only happens when it takes over as a CARP master.
The only other times I've seen similar CARP craziness was with some really broken switches.
Any way you can try another (perhaps different brand) switch on the WAN side as a test to see if the problem goes away?
-
I have switched from a Netgear switch to a Linksys switch. I have also changed the interface IPs and the CARP IP. We will see what this delivers.
-
Also, could it be a problem with CARP not syncing right? I have changed the virtual IPs on firewall1 and the changes have yet to show up on firewall2.
-
No, it wouldn't have much to do with a sync failure.
CARP heartbeats happen on the interfaces where the VIPs reside, i.e. a CARP VIP on WAN sends its heartbeats on WAN.
XMLRPC sync happens over the sync interface; it only handles configuration.
pfsync also happens only over the sync interface; it only synchronizes states (insertions, deletions, etc.).
So a problem with CARP on WAN is nearly always a problem with the switch or connectivity on WAN.