Closing this. Thanks for pointing me in the direction of testing ping on the CARP VIP. That ended up being the issue. It turns out our ISP had somehow taken back one of our 3 IPs. We got them to put it back on our account, and now we are back to normal. I can ping from that CARP VIP, and port forwarding now works with the CARP VIP as the Destination Address.
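For anyone landing here with the same symptom, the source-address ping test can be scripted roughly like this. This is only a sketch: it assumes the FreeBSD `ping` that pfSense ships (`-S` sets the source address, `-c` the packet count), and the helper names and the 203.0.113.x / 198.51.100.x addresses are made-up placeholders, not values from this thread.

```python
import ipaddress
import subprocess


def build_ping_command(source_ip: str, target: str, count: int = 3) -> list[str]:
    """Build a FreeBSD-style ping command that sends from a given source address.

    Hypothetical helper: assumes FreeBSD ping, where -S selects the source
    address and -c the number of echo requests.
    """
    # Reject anything that is not a valid IPv4 address before building the command.
    ipaddress.IPv4Address(source_ip)
    if count < 1:
        raise ValueError("count must be at least 1")
    return ["ping", "-S", source_ip, "-c", str(count), target]


def ping_from(source_ip: str, target: str, count: int = 3) -> bool:
    """Run the ping and return True if the command exits with status 0."""
    result = subprocess.run(
        build_ping_command(source_ip, target, count),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

Running `ping_from("203.0.113.10", "198.51.100.1")` on the current CARP master, with the first argument replaced by the real VIP, should return `False` if upstream no longer routes that address, which is what happened here.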
Thanks @JeGr. I've now installed Filer and I can definitely see the use in it for restoring/syncing my script files. I can probably also use it for /etc/pfSense-devd.conf. But that raises the next problem: what happens when the Netgate team updates this file? The "latest" and correct version would get overwritten by my file in Filer. Out of curiosity I checked the file on GitHub; it was indeed updated 2 months ago, and those changes are in the file on my routers. So that means it will definitely change with an upcoming upgrade.
Is there no other/better way to force the maintenance mode or execute the devd actions without modifying a system file?
I have some layer 2 errors on the switch (spanning-tree). I will try to fix the errors and provide feedback as soon as possible, but I only have "downtime" on Friday to test my configs.
The failover itself works fine when entering maintenance mode, but the VPN tunnels do not come up. They should, and they do when the tunnels are terminated with other vendors. This situation only occurs with AWS. Moreover, pfSense should initiate the connection. AWS never brings the VPN tunnels up. When I use policy-based VPN (with the traffic initiated from behind the firewall), it works fine. Moreover, the same setup I have now, with VTI interfaces and route-based VPNs, was configured on VyOS, which switches the tunnels over automatically in case of failover.
Ah thanks :) That clears it up pretty much. Never actually ran into that issue besides static mappings and that is no problem in a cluster that I'm aware of ;)
@Derelict Just wanted to let you know it's looking a lot better now. I think it was just that lingering interface, which should have been down, that caused the issue (which then caused others).
Thanks for coming back so quickly on a Sunday. FYI, I've now hit another known Intel 10G issue, which I'll post once I've re-read the previous ones.