CARP failing over when rc.newwanip runs?
-
I'm having an issue with my CARP setup: it fails over to the other unit seemingly at random.
I have two units on AWS, each running CARP on its WAN interface, with the AWS HA plugin. When a failover like this happens, the other unit takes over the route table, but when CARP fails back to the original master, the route table is not changed back. As a result, traffic stops going out and stops passing over the VPN tunnels.
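To confirm which unit the route table actually points at after a failback, the routes can be checked for a network interface other than the current master's. This is a minimal, hypothetical helper: it assumes the input is shaped like an EC2 DescribeRouteTables response (`RouteTables`, `Routes`, `NetworkInterfaceId`), and all route table and ENI IDs in it are made up for illustration.

```python
from typing import Any


def routes_not_via(response: dict[str, Any], expected_eni: str) -> list[tuple[str, str, str]]:
    """Return (route table ID, destination, target ENI) for every route that
    targets a network interface other than expected_eni.

    `response` is assumed to look like an EC2 DescribeRouteTables response.
    """
    mismatches = []
    for table in response.get("RouteTables", []):
        for route in table.get("Routes", []):
            eni = route.get("NetworkInterfaceId")
            if eni is None:
                continue  # target is a gateway ("local", IGW, NAT, ...), not an ENI
            if eni != expected_eni:
                dest = route.get("DestinationCidrBlock") or route.get("DestinationIpv6CidrBlock", "?")
                mismatches.append((table.get("RouteTableId", "?"), dest, eni))
    return mismatches


if __name__ == "__main__":
    # Hypothetical IDs: the default route still points at the backup unit's ENI.
    sample = {
        "RouteTables": [
            {
                "RouteTableId": "rtb-example",
                "Routes": [
                    {"DestinationCidrBlock": "10.200.0.0/16", "GatewayId": "local"},
                    {"DestinationCidrBlock": "0.0.0.0/0", "NetworkInterfaceId": "eni-backup-example"},
                ],
            }
        ]
    }
    for table_id, dest, eni in routes_not_via(sample, "eni-master-example"):
        print(f"{table_id}: {dest} -> {eni} (expected eni-master-example)")
```

In practice the response could come from boto3's `describe_route_tables()` on an EC2 client; the route table ID and the master's ENI ID are placeholders you would fill in from the AWS console.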
In system.log on the master, this seems to happen when rc.newwanip runs. I'm not sure what this script does or what triggers it, and I don't see any interface up/down events in the logs.
Master Unit:
Sep 3 11:08:47 pfSense-AZa kernel: ena0: ioctl promisc/allmulti
Sep 3 11:08:48 pfSense-AZa check_reload_status[458]: Carp backup event
Sep 3 11:08:48 pfSense-AZa check_reload_status[458]: rc.newwanip starting ena0
Sep 3 11:08:48 pfSense-AZa kernel: carp: 1@ena0: MASTER -> INIT (hardware interface up)
Sep 3 11:08:48 pfSense-AZa kernel: ena0: promiscuous mode disabled
Sep 3 11:08:49 pfSense-AZa php-fpm[14697]: /rc.carpbackup: HA cluster member "(192.0.2.101@ena0): (WAN)" has resumed CARP state "BACKUP" for vhid 1
Sep 3 11:08:49 pfSense-AZa php-fpm[14697]: /rc.newwanip: rc.newwanip: Info: starting on ena0.
Sep 3 11:08:49 pfSense-AZa php-fpm[14697]: /rc.newwanip: rc.newwanip: on (IP address: 10.200.247.175) (interface: WAN[wan]) (real interface: ena0).
Sep 3 11:08:49 pfSense-AZa check_reload_status[458]: Carp backup event
Sep 3 11:08:49 pfSense-AZa php-fpm[14697]: /rc.newwanip: waiting for pfsync...
Sep 3 11:08:49 pfSense-AZa kernel: ena0: ioctl promisc/allmulti
Sep 3 11:08:49 pfSense-AZa kernel: ena0: promiscuous mode enabled
Sep 3 11:08:49 pfSense-AZa kernel: carp: 1@ena0: INIT -> BACKUP (initialization complete)
Sep 3 11:08:49 pfSense-AZa kernel: carp: demoted by 0 to 0 (pfsync bulk start)
Sep 3 11:08:50 pfSense-AZa kernel: carp: demoted by 0 to 0 (pfsync bulk done)
Sep 3 11:08:50 pfSense-AZa php-fpm[95703]: /rc.carpbackup: HA cluster member "(192.0.2.101@ena0): (WAN)" has resumed CARP state "BACKUP" for vhid 1
Sep 3 11:08:52 pfSense-AZa check_reload_status[458]: Carp master event
Sep 3 11:08:52 pfSense-AZa kernel: carp: 1@ena0: BACKUP -> MASTER (preempting a slower master)
Sep 3 11:08:53 pfSense-AZa php-fpm[95703]: /rc.carpmaster: HA cluster member "(192.0.2.101@ena0): (WAN)" has resumed CARP state "MASTER" for vhid 1
Sep 3 11:08:55 pfSense-AZa php-fpm[95703]: /rc.carpmaster: Couldn't get parameters for vhid on interface ena1
Sep 3 11:08:55 pfSense-AZa php-fpm[95703]: /rc.carpmaster: Couldn't determine advbase and advskew for vhid on interface ena1
Sep 3 11:08:56 pfSense-AZa kernel: arpresolve: can't allocate llinfo for 10.200.2.1 on ena1
Sep 3 11:08:56 pfSense-AZa kernel: arpresolve: can't allocate llinfo for 10.200.2.1 on ena1
Sep 3 11:09:21 pfSense-AZa php-fpm[14697]: /rc.newwanip: pfsync done in 31 seconds.
Sep 3 11:09:21 pfSense-AZa php-fpm[14697]: /rc.newwanip: Configuring CARP settings finalize...
Sep 3 11:09:21 pfSense-AZa check_reload_status[458]: Reloading filter
Sep 3 11:09:21 pfSense-AZa check_reload_status[458]: Reloading filter
Sep 3 11:09:52 pfSense-AZa php-cgi[36489]: aws_highavail_periodic: New alert found: Resource eipalloc-0a117cb1c304b1f63 has been modified by a lower priority master,
Sep 3 11:09:52 pfSense-AZa php-cgi[36489]: troubleshooting of CARP vhid wan@1 may be necessary.
Sep 3 11:09:52 pfSense-AZa php-cgi[36489]: The resource has been restored to the expected state.
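To line up rc.newwanip runs against CARP state changes without the surrounding noise, a small parser can pull out just those events in order. This is a sketch, assuming the BSD-style one-entry-per-line format with timestamps like "Sep 3 11:08:48" as in the log above; the regular expressions are my own, not anything shipped with pfSense.

```python
import re

# BSD-syslog line: "Sep 3 11:08:48 host process[pid]: message"
LINE_RE = re.compile(r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) (?P<proc>[^:]+): (?P<msg>.*)$")
# Kernel CARP transition: "carp: 1@ena0: MASTER -> INIT (hardware interface up)"
CARP_RE = re.compile(r"carp: (?P<vhid>\d+)@(?P<iface>\S+): (?P<old>\w+) -> (?P<new>\w+) \((?P<reason>[^)]*)\)")
# check_reload_status queuing the script: "rc.newwanip starting ena0"
NEWWANIP_RE = re.compile(r"rc\.newwanip starting (?P<iface>\S+)")


def carp_timeline(log_text: str) -> list[tuple[str, str]]:
    """Return (timestamp, description) for rc.newwanip triggers and CARP
    state transitions, in the order they appear in the log."""
    events = []
    for raw in log_text.splitlines():
        line = LINE_RE.match(raw.strip())
        if not line:
            continue  # not a syslog-formatted line
        msg = line["msg"]
        if carp := CARP_RE.search(msg):
            events.append((line["ts"], f"vhid {carp['vhid']}@{carp['iface']}: "
                                       f"{carp['old']} -> {carp['new']} ({carp['reason']})"))
        elif trig := NEWWANIP_RE.search(msg):
            events.append((line["ts"], f"rc.newwanip triggered on {trig['iface']}"))
    return events


if __name__ == "__main__":
    sample = """\
Sep 3 11:08:48 pfSense-AZa check_reload_status[458]: rc.newwanip starting ena0
Sep 3 11:08:48 pfSense-AZa kernel: carp: 1@ena0: MASTER -> INIT (hardware interface up)
Sep 3 11:08:49 pfSense-AZa kernel: carp: 1@ena0: INIT -> BACKUP (initialization complete)
Sep 3 11:08:52 pfSense-AZa kernel: carp: 1@ena0: BACKUP -> MASTER (preempting a slower master)"""
    for ts, event in carp_timeline(sample):
        print(ts, event)
```

Run against a full copy of system.log, this makes it easy to check whether every MASTER -> INIT transition is preceded by an rc.newwanip trigger.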
On the secondary unit, a "master timed out" event switches it from BACKUP to MASTER, and then it changes right back to BACKUP.
Backup Unit:
Sep 3 11:08:52 pfSense-AZb check_reload_status[458]: Carp master event
Sep 3 11:08:52 pfSense-AZb check_reload_status[458]: Carp backup event
Sep 3 11:08:52 pfSense-AZb kernel: carp: 1@ena0: BACKUP -> MASTER (master timed out)
Sep 3 11:08:52 pfSense-AZb kernel: carp: 1@ena0: MASTER -> BACKUP (more frequent advertisement received)
Sep 3 11:08:53 pfSense-AZb php-fpm[406]: /rc.carpmaster: HA cluster member "(192.0.2.101@ena0): (WAN)" has resumed CARP state "MASTER" for vhid 1
Sep 3 11:08:53 pfSense-AZb php-fpm[57097]: /rc.carpbackup: HA cluster member "(192.0.2.101@ena0): (WAN)" has resumed CARP state "BACKUP" for vhid 1
Sep 3 11:08:55 pfSense-AZb php-fpm[406]: /rc.carpmaster: Couldn't get parameters for vhid on interface ena1
Sep 3 11:08:55 pfSense-AZb php-fpm[406]: /rc.carpmaster: Couldn't determine advbase and advskew for vhid on interface ena1
Sep 3 11:08:55 pfSense-AZb php-fpm[406]: /rc.carpmaster: Couldn't get parameters for vhid on interface ena1
Sep 3 11:08:55 pfSense-AZb php-fpm[406]: /rc.carpmaster: Couldn't determine advbase and advskew for vhid on interface ena1
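Since the interesting part is how the two units' transitions interleave between 11:08:48 and 11:08:52, merging both logs onto one timeline can help. A minimal sketch, assuming both units' clocks are in sync and both logs use the BSD syslog timestamps shown above (which carry no year, so a placeholder year is used only for sorting); the keyword filter is a hypothetical choice.

```python
import re
from datetime import datetime

# "Sep 3 11:08:52 host rest-of-line"; BSD syslog timestamps carry no year.
LINE_RE = re.compile(r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (\S+) (.*)$")
# Hypothetical filter: kernel CARP messages, check_reload_status CARP events,
# and rc.newwanip triggers.
KEYWORDS = ("carp:", "Carp ", "rc.newwanip starting")


def merged_carp_events(*logs: str, year: int = 2000) -> list[str]:
    """Merge several units' logs into one time-ordered list of CARP-related lines."""
    rows = []
    for log_text in logs:
        for raw in log_text.splitlines():
            m = LINE_RE.match(raw.strip())
            if not m or not any(k in m[3] for k in KEYWORDS):
                continue
            # The year is a placeholder used only for sorting (2000 is a leap
            # year, so "Feb 29" still parses).
            ts = datetime.strptime(f"{year} {m[1]}", "%Y %b %d %H:%M:%S")
            rows.append((ts, m[2], m[3]))
    # sort() is stable: lines with the same timestamp keep their input order,
    # so within one second the first log's lines come before the second's.
    rows.sort(key=lambda row: row[0])
    return [f"{ts:%b %d %H:%M:%S} {host:<12} {msg}" for ts, host, msg in rows]


if __name__ == "__main__":
    aza = "Sep 3 11:08:52 pfSense-AZa kernel: carp: 1@ena0: BACKUP -> MASTER (preempting a slower master)"
    azb = ("Sep 3 11:08:52 pfSense-AZb kernel: carp: 1@ena0: BACKUP -> MASTER (master timed out)\n"
           "Sep 3 11:08:52 pfSense-AZb kernel: carp: 1@ena0: MASTER -> BACKUP (more frequent advertisement received)")
    print("\n".join(merged_carp_events(aza, azb)))
```

Syslog timestamps only have one-second resolution, so lines from different units that share a second should be read as concurrent; their relative order in the output reflects only the order the logs were passed in.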
Here is a picture of my VIP.
I've tried different values for the advertising base, but that doesn't resolve the issue.
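The two kernel messages on the backup unit come down to comparing advertisement intervals: a BACKUP that hears nothing for its master-down interval declares "master timed out", a MASTER that hears a peer advertising more frequently steps down ("more frequent advertisement received"), and a BACKUP with preemption enabled takes over from a slower master ("preempting a slower master"). The model below is a simplified sketch, assuming FreeBSD carp(4) semantics as I understand them (advertisement interval = advbase + advskew/256 seconds; master-down interval = 3 * advbase + advskew/256) and ignoring the demotion counter, which the logs above show staying at 0. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CarpVhid:
    """Simplified model of one unit's settings for a single CARP VHID."""
    name: str
    advbase: int  # base advertisement interval, in seconds
    advskew: int  # added skew, in 1/256ths of a second (higher = less preferred)

    def adv_interval(self) -> float:
        """Seconds between this unit's advertisements while it is MASTER."""
        return self.advbase + self.advskew / 256

    def master_down_timeout(self) -> float:
        """Seconds of silence a BACKUP waits before declaring "master timed out"."""
        return 3 * self.advbase + self.advskew / 256


def preferred_master(a: CarpVhid, b: CarpVhid) -> Optional[str]:
    """Name of the unit that advertises more frequently, or None on an exact
    tie (tie handling is not modeled here)."""
    if a.adv_interval() < b.adv_interval():
        return a.name
    if b.adv_interval() < a.adv_interval():
        return b.name
    return None


if __name__ == "__main__":
    # Hypothetical values; the real ones are in the VIP settings for vhid 1.
    primary = CarpVhid("pfSense-AZa", advbase=1, advskew=0)
    secondary = CarpVhid("pfSense-AZb", advbase=1, advskew=100)
    print("preferred master:", preferred_master(primary, secondary))
    print(f"secondary takes over after {secondary.master_down_timeout():.2f}s without advertisements")
```

With these made-up numbers the secondary waits about 3.4 seconds of silence before declaring the master timed out. For comparison, the posted logs show AZa leaving MASTER at 11:08:48 and AZb logging "master timed out" at 11:08:52.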
The only pattern I can find is that rc.newwanip runs right before the issue every time. The WAN interface is set to DHCP since it's on AWS, but its IP never changes because it's assigned statically in the AWS console. I was thinking of setting the WAN interface to a static IP instead of DHCP as a test, but I'll need to do that after hours, and I don't believe it should be affecting CARP anyway. I may just have a misconfiguration that I'm not seeing.
Thanks in advance for the help.