Secondary router not passing traffic


  • Hi all,

    I have two servers, A and B, connected through a switch, both running XCP-ng/XenServer 8.1.

    Both servers have identical hardware and the same overall configuration. Each server has its own /32 IP, which is used only for management and the API.

    All the networks (a public subnet and the internal networks) are delivered point-to-point via VLANs.

    I configured CARP HA by following the tutorials. I have gone through the config step by step several times, and as far as I can tell everything is configured correctly:

    • Status > CARP shows the same pfSync nodes on both sides
    • Configuration changes made on the primary are reflected on the secondary in under a second
    • I can ping without a problem
    • All the IPs are present on both routers
    • The site-to-site IPsec VPN uses the CARP WAN IP
    • The NAT rules use the CARP WAN IP

    When I test failover, either with Status > CARP > Enter Persistent CARP Maintenance Mode or by shutting down the primary router:

    • The secondary instantly changes from Backup to Master on all CARP IPs
    • No traffic reaches the outside, I can't access the VMs, and the IPsec VPN goes down

    When the primary router comes back up:

    • It automatically takes the Master role, and the secondary goes back to Backup on all CARP IPs
    • Traffic starts flowing normally
    • The VPN quickly recovers

    One more weird behaviour:
    Whenever I reboot the secondary router, its certificate shows as untrusted and I have to accept it again. This happens only after a reboot, and only on the secondary router.

    I'd truly appreciate some help, because I have no clue where to look next.

    System Logs on the secondary router (all the logs below are from the secondary firewall) - newest entries on top:

    Aug 27 16:40:37	kernel		arp: 1.2.3.146 moved from 78:20:94:35:f8:98 to 82:90:7c:4d:88:60 on xn0
    Aug 27 16:40:31	kernel		arp: 1.2.3.146 moved from 82:90:7c:4d:88:60 to 78:20:94:35:f8:98 on xn0
    Aug 27 16:40:07	kernel		arp: 1.2.3.146 moved from 78:20:94:35:f8:98 to 82:90:7c:4d:88:60 on xn0
    Aug 27 16:39:40	kernel		arp: 1.2.3.146 moved from 82:90:7c:4d:88:60 to 78:20:94:35:f8:98 on xn0
    Aug 27 16:38:53	kernel		arp: 1.2.3.146 moved from 78:20:94:35:f8:98 to 82:90:7c:4d:88:60 on xn0
    Aug 27 16:38:17	kernel		arp: 1.2.3.146 moved from 82:90:7c:4d:88:60 to 78:20:94:35:f8:98 on xn0
    

    1.2.3.146 is the upstream gateway

    When I enable persistent maintenance mode, this is what shows in the log on the secondary router - newest entries on top:

    Aug 27 16:40:52	php-fpm	60821	/rc.carpmaster: HA cluster member "(172.16.140.254@xn2): (DMZ)" has resumed CARP state "MASTER" for vhid 5
    Aug 27 16:40:51	php-fpm	12134	/rc.carpmaster: HA cluster member "(1.2.3.155@xn0): (WAN)" has resumed CARP state "MASTER" for vhid 58
    Aug 27 16:40:51	php-fpm	60821	/rc.carpmaster: HA cluster member "(1.2.3.153@xn0): (WAN)" has resumed CARP state "MASTER" for vhid 56
    Aug 27 16:40:51	php-fpm	12134	/rc.carpmaster: HA cluster member "(1.2.3.152@xn0): (WAN)" has resumed CARP state "MASTER" for vhid 55
    Aug 27 16:40:51	php-fpm	348	/rc.carpmaster: HA cluster member "(1.2.3.151@xn0): (WAN)" has resumed CARP state "MASTER" for vhid 54
    Aug 27 16:40:51	php-fpm	60821	/rc.carpmaster: HA cluster member "(1.2.3.150@xn0): (WAN)" has resumed CARP state "MASTER" for vhid 53
    Aug 27 16:40:51	php-fpm	348	/rc.carpmaster: HA cluster member "(1.2.3.147@xn0): (WAN)" has resumed CARP state "MASTER" for vhid 50
    Aug 27 16:40:51	php-fpm	348	/rc.carpmaster: HA cluster member "(172.16.200.254@xn4): (DATA)" has resumed CARP state "MASTER" for vhid 20
    Aug 27 16:40:51	php-fpm	12134	/rc.carpmaster: HA cluster member "(172.16.190.254@xn3): (ADM)" has resumed CARP state "MASTER" for vhid 13
    Aug 27 16:40:51	php-fpm	60821	/rc.carpmaster: HA cluster member "(172.16.110.254@xn1): (LAN)" has resumed CARP state "MASTER" for vhid 1
    Aug 27 16:40:50	check_reload_status		Reloading filter
    Aug 27 16:40:50	php-fpm	347	/xmlrpc.php: Resyncing OpenVPN instances.
    Aug 27 16:40:50	php-fpm	347	/xmlrpc.php: Gateway, none 'available' for inet6, use the first one configured. ''
    Aug 27 16:40:50	php-fpm	348	/rc.carpmaster: HA cluster member "(1.2.3.159@xn0): (WAN)" has resumed CARP state "MASTER" for vhid 62
    Aug 27 16:40:50	php-fpm	12134	/rc.carpmaster: HA cluster member "(1.2.3.158@xn0): (WAN)" has resumed CARP state "MASTER" for vhid 61
    Aug 27 16:40:50	php-fpm	348	/rc.carpmaster: HA cluster member "(1.2.3.157@xn0): (WAN)" has resumed CARP state "MASTER" for vhid 60
    Aug 27 16:40:50	php-fpm	12134	/rc.carpmaster: HA cluster member "(1.2.3.156@xn0): (WAN)" has resumed CARP state "MASTER" for vhid 59
    Aug 27 16:40:50	php-fpm	348	/rc.carpmaster: HA cluster member "(1.2.3.154@xn0): (WAN)" has resumed CARP state "MASTER" for vhid 57
    Aug 27 16:40:49	kernel		carp: 62@xn0: BACKUP -> MASTER (preempting a slower master)
    Aug 27 16:40:49	kernel		carp: 61@xn0: BACKUP -> MASTER (preempting a slower master)
    Aug 27 16:40:49	kernel		carp: 60@xn0: BACKUP -> MASTER (preempting a slower master)
    Aug 27 16:40:49	kernel		carp: 5@xn2: BACKUP -> MASTER (preempting a slower master)
    Aug 27 16:40:49	kernel		carp: 59@xn0: BACKUP -> MASTER (preempting a slower master)
    Aug 27 16:40:49	kernel		carp: 58@xn0: BACKUP -> MASTER (preempting a slower master)
    Aug 27 16:40:49	kernel		carp: 57@xn0: BACKUP -> MASTER (preempting a slower master)
    Aug 27 16:40:49	kernel		carp: 56@xn0: BACKUP -> MASTER (preempting a slower master)
    Aug 27 16:40:49	kernel		carp: 55@xn0: BACKUP -> MASTER (preempting a slower master)
    Aug 27 16:40:49	kernel		carp: 54@xn0: BACKUP -> MASTER (preempting a slower master)
    Aug 27 16:40:49	kernel		carp: 53@xn0: BACKUP -> MASTER (preempting a slower master)
    Aug 27 16:40:49	kernel		carp: 50@xn0: BACKUP -> MASTER (preempting a slower master)
    Aug 27 16:40:49	kernel		carp: 20@xn4: BACKUP -> MASTER (preempting a slower master)
    Aug 27 16:40:49	kernel		carp: 1@xn1: BACKUP -> MASTER (preempting a slower master)
    Aug 27 16:40:49	kernel		carp: 13@xn3: BACKUP -> MASTER (preempting a slower master)
    Aug 27 16:40:49	check_reload_status		Syncing firewall
    Aug 27 16:40:49	check_reload_status		Carp master event
    Aug 27 16:40:49	check_reload_status		Carp master event
    Aug 27 16:40:49	check_reload_status		Carp master event
    Aug 27 16:40:49	check_reload_status		Carp master event
    Aug 27 16:40:49	check_reload_status		Carp master event
    Aug 27 16:40:49	check_reload_status		Carp master event
    Aug 27 16:40:49	check_reload_status		Carp master event
    Aug 27 16:40:49	check_reload_status		Carp master event
    Aug 27 16:40:49	check_reload_status		Carp master event
    Aug 27 16:40:49	check_reload_status		Carp master event
    Aug 27 16:40:49	check_reload_status		Carp master event
    Aug 27 16:40:49	check_reload_status		Carp master event
    Aug 27 16:40:49	check_reload_status		Carp master event
    Aug 27 16:40:49	check_reload_status		Carp master event
    Aug 27 16:40:49	check_reload_status		Carp master event
    

    And when I disable maintenance mode on the master - newest entries on top:

    Aug 27 16:41:34	php-fpm	64435	/rc.carpbackup: HA cluster member "(1.2.3.158@xn0): (WAN)" has resumed CARP state "BACKUP" for vhid 61
    Aug 27 16:41:34	php-fpm	348	/rc.carpbackup: HA cluster member "(1.2.3.153@xn0): (WAN)" has resumed CARP state "BACKUP" for vhid 56
    Aug 27 16:41:34	php-fpm	8955	/rc.carpbackup: HA cluster member "(1.2.3.152@xn0): (WAN)" has resumed CARP state "BACKUP" for vhid 55
    Aug 27 16:41:34	php-fpm	60821	/rc.carpbackup: HA cluster member "(1.2.3.151@xn0): (WAN)" has resumed CARP state "BACKUP" for vhid 54
    Aug 27 16:41:34	php-fpm	12134	/rc.carpbackup: HA cluster member "(1.2.3.150@xn0): (WAN)" has resumed CARP state "BACKUP" for vhid 53
    Aug 27 16:41:34	php-fpm	64435	/rc.carpbackup: HA cluster member "(1.2.3.147@xn0): (WAN)" has resumed CARP state "BACKUP" for vhid 50
    Aug 27 16:41:33	check_reload_status		Reloading filter
    Aug 27 16:41:33	php-fpm	20724	/xmlrpc.php: Gateway, none 'available' for inet6, use the first one configured. ''
    Aug 27 16:41:33	php-fpm	347	/rc.carpbackup: HA cluster member "(1.2.3.159@xn0): (WAN)" has resumed CARP state "BACKUP" for vhid 62
    Aug 27 16:41:33	php-fpm	60821	/rc.carpbackup: HA cluster member "(1.2.3.157@xn0): (WAN)" has resumed CARP state "BACKUP" for vhid 60
    Aug 27 16:41:33	php-fpm	347	/rc.carpbackup: HA cluster member "(1.2.3.156@xn0): (WAN)" has resumed CARP state "BACKUP" for vhid 59
    Aug 27 16:41:33	php-fpm	348	/rc.carpbackup: HA cluster member "(1.2.3.155@xn0): (WAN)" has resumed CARP state "BACKUP" for vhid 58
    Aug 27 16:41:33	php-fpm	12134	/rc.carpbackup: HA cluster member "(1.2.3.154@xn0): (WAN)" has resumed CARP state "BACKUP" for vhid 57
    Aug 27 16:41:33	php-fpm	60821	/rc.carpbackup: HA cluster member "(172.16.140.254@xn2): (DMZ)" has resumed CARP state "BACKUP" for vhid 5
    Aug 27 16:41:33	php-fpm	12134	/rc.carpbackup: HA cluster member "(172.16.200.254@xn4): (DATA)" has resumed CARP state "BACKUP" for vhid 20
    Aug 27 16:41:33	php-fpm	64435	/rc.carpbackup: HA cluster member "(172.16.190.254@xn3): (ADM)" has resumed CARP state "BACKUP" for vhid 13
    Aug 27 16:41:33	php-fpm	348	/rc.carpbackup: HA cluster member "(172.16.110.254@xn1): (LAN)" has resumed CARP state "BACKUP" for vhid 1
    Aug 27 16:41:32	kernel		ifa_maintain_loopback_route: deletion failed for interface xn4: 3
    Aug 27 16:41:32	kernel		ifa_maintain_loopback_route: deletion failed for interface xn3: 3
    Aug 27 16:41:32	kernel		ifa_maintain_loopback_route: deletion failed for interface xn2: 3
    Aug 27 16:41:32	kernel		ifa_maintain_loopback_route: deletion failed for interface xn1: 3
    Aug 27 16:41:32	kernel		ifa_maintain_loopback_route: deletion failed for interface xn0: 3
    Aug 27 16:41:32	kernel		ifa_maintain_loopback_route: deletion failed for interface xn0: 3
    Aug 27 16:41:32	kernel		ifa_maintain_loopback_route: deletion failed for interface xn0: 3
    Aug 27 16:41:32	kernel		ifa_maintain_loopback_route: deletion failed for interface xn0: 3
    Aug 27 16:41:32	kernel		ifa_maintain_loopback_route: deletion failed for interface xn0: 3
    Aug 27 16:41:32	kernel		ifa_maintain_loopback_route: deletion failed for interface xn0: 3
    Aug 27 16:41:32	kernel		ifa_maintain_loopback_route: deletion failed for interface xn0: 3
    Aug 27 16:41:32	kernel		ifa_maintain_loopback_route: deletion failed for interface xn0: 3
    Aug 27 16:41:32	kernel		ifa_maintain_loopback_route: deletion failed for interface xn0: 3
    Aug 27 16:41:32	kernel		ifa_maintain_loopback_route: deletion failed for interface xn0: 3
    Aug 27 16:41:32	kernel		ifa_maintain_loopback_route: deletion failed for interface xn0: 3
    Aug 27 16:41:32	kernel		carp: 62@xn0: MASTER -> BACKUP (more frequent advertisement received)
    Aug 27 16:41:32	kernel		carp: 61@xn0: MASTER -> BACKUP (more frequent advertisement received)
    Aug 27 16:41:32	kernel		carp: 60@xn0: MASTER -> BACKUP (more frequent advertisement received)
    Aug 27 16:41:32	kernel		carp: 5@xn2: MASTER -> BACKUP (more frequent advertisement received)
    Aug 27 16:41:32	kernel		carp: 59@xn0: MASTER -> BACKUP (more frequent advertisement received)
    Aug 27 16:41:32	kernel		carp: 58@xn0: MASTER -> BACKUP (more frequent advertisement received)
    Aug 27 16:41:32	kernel		carp: 57@xn0: MASTER -> BACKUP (more frequent advertisement received)
    Aug 27 16:41:32	kernel		carp: 56@xn0: MASTER -> BACKUP (more frequent advertisement received)
    Aug 27 16:41:32	kernel		carp: 55@xn0: MASTER -> BACKUP (more frequent advertisement received)
    Aug 27 16:41:32	kernel		carp: 54@xn0: MASTER -> BACKUP (more frequent advertisement received)
    Aug 27 16:41:32	kernel		carp: 53@xn0: MASTER -> BACKUP (more frequent advertisement received)
    Aug 27 16:41:32	kernel		carp: 50@xn0: MASTER -> BACKUP (more frequent advertisement received)
    Aug 27 16:41:32	kernel		carp: 20@xn4: MASTER -> BACKUP (more frequent advertisement received)
    Aug 27 16:41:32	kernel		carp: 1@xn1: MASTER -> BACKUP (more frequent advertisement received)
    Aug 27 16:41:32	kernel		carp: 13@xn3: MASTER -> BACKUP (more frequent advertisement received)
    Aug 27 16:41:32	check_reload_status		Syncing firewall
    Aug 27 16:41:32	check_reload_status		Carp backup event
    Aug 27 16:41:32	check_reload_status		Carp backup event
    Aug 27 16:41:32	check_reload_status		Carp backup event
    Aug 27 16:41:32	check_reload_status		Carp backup event
    Aug 27 16:41:32	check_reload_status		Carp backup event
    Aug 27 16:41:32	check_reload_status		Carp backup event
    Aug 27 16:41:32	check_reload_status		Carp backup event
    Aug 27 16:41:32	check_reload_status		Carp backup event
    Aug 27 16:41:32	check_reload_status		Carp backup event
    Aug 27 16:41:32	check_reload_status		Carp backup event
    Aug 27 16:41:32	check_reload_status		Carp backup event
    Aug 27 16:41:32	check_reload_status		Carp backup event
    Aug 27 16:41:32	check_reload_status		Carp backup event
    Aug 27 16:41:32	check_reload_status		Carp backup event
    Aug 27 16:41:32	check_reload_status		Carp backup event
    

    But no traffic goes through the secondary.
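
For anyone who wants to tally the kernel CARP lines in logs like the ones above instead of reading them by eye, here is a minimal sketch. It assumes the log has been pasted into a string; the names `CARP_EVENT`, `carp_transitions` and `count_by_interface` are made up for this example and are not part of pfSense. It only summarizes what the log already says and does not diagnose anything.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   carp: 62@xn0: BACKUP -> MASTER (preempting a slower master)
CARP_EVENT = re.compile(
    r"carp: (?P<vhid>\d+)@(?P<ifname>\S+): "
    r"(?P<old>[A-Z]+) -> (?P<new>[A-Z]+) \((?P<reason>[^)]*)\)"
)


def carp_transitions(log_text):
    """Return a list of (vhid, ifname, old_state, new_state, reason) tuples."""
    events = []
    for line in log_text.splitlines():
        m = CARP_EVENT.search(line)
        if m:
            events.append(
                (int(m["vhid"]), m["ifname"], m["old"], m["new"], m["reason"])
            )
    return events


def count_by_interface(events):
    """Count how many VHIDs entered each state, per interface."""
    return Counter((ifname, new) for _, ifname, _, new, _ in events)


# A few lines copied from the logs in this thread (tabs replaced by spaces).
SAMPLE = """\
Aug 27 16:41:32 kernel carp: 62@xn0: MASTER -> BACKUP (more frequent advertisement received)
Aug 27 16:40:49 kernel carp: 62@xn0: BACKUP -> MASTER (preempting a slower master)
Aug 27 16:40:49 kernel carp: 5@xn2: BACKUP -> MASTER (preempting a slower master)
Aug 27 16:40:49 kernel carp: 13@xn3: BACKUP -> MASTER (preempting a slower master)
"""

if __name__ == "__main__":
    for (ifname, state), n in sorted(count_by_interface(carp_transitions(SAMPLE)).items()):
        print(f"{ifname}: {n} VHID(s) -> {state}")
```

Run against the full log, this makes it quick to confirm that every VHID on every interface changed state together, which is what the entries above show.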


  • Is "Synchronize states" checked on both routers under System > High Avail. Sync? Without state sync, existing connections would break on failover, but new connections should still work.

    Possibly something upstream doesn't like the IP changes? Have you looked at the switch and layer 2 section of the HA troubleshooting docs: https://docs.netgate.com/pfsense/en/latest/book/highavailability/high-availability-troubleshooting.html#other-switch-and-layer-2-issues
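
On the layer 2 question, the "arp: ... moved from ... to ..." lines in the secondary's system log are worth quantifying: they show the upstream gateway's IP alternating between two MAC addresses on xn0. A rough sketch for counting those moves per IP follows. It assumes the log text is available as a string; `ARP_MOVE` and `summarize_arp_moves` are names invented for this example, not pfSense tooling.

```python
import re
from collections import defaultdict

# Matches kernel lines such as:
#   arp: 1.2.3.146 moved from 78:20:94:35:f8:98 to 82:90:7c:4d:88:60 on xn0
ARP_MOVE = re.compile(
    r"arp: (?P<ip>\S+) moved from (?P<old>[0-9a-f:]{17}) "
    r"to (?P<new>[0-9a-f:]{17}) on (?P<ifname>\S+)"
)


def summarize_arp_moves(log_text):
    """Return {(ip, ifname): {"moves": count, "macs": set of MACs seen}}."""
    summary = defaultdict(lambda: {"moves": 0, "macs": set()})
    for line in log_text.splitlines():
        match = ARP_MOVE.search(line)
        if match is None:
            continue
        entry = summary[(match["ip"], match["ifname"])]
        entry["moves"] += 1
        entry["macs"].update((match["old"], match["new"]))
    return dict(summary)


# Two lines copied from the log in this thread (tabs replaced by spaces).
SAMPLE = """\
Aug 27 16:40:37 kernel arp: 1.2.3.146 moved from 78:20:94:35:f8:98 to 82:90:7c:4d:88:60 on xn0
Aug 27 16:40:31 kernel arp: 1.2.3.146 moved from 82:90:7c:4d:88:60 to 78:20:94:35:f8:98 on xn0
"""

if __name__ == "__main__":
    for (ip, ifname), info in summarize_arp_moves(SAMPLE).items():
        print(ip, ifname, info["moves"], sorted(info["macs"]))
```

An IP that keeps moving between the same two MACs is a useful detail to include when asking the upstream provider or checking the switch configuration.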