Packet loss when secondary is online.
-
I have two routers, both SG-8860s running 2.4.3-RELEASE-p1.
I have CARP set up (on WAN, LAN, and a guest VLAN for guest wireless isolation), along with HA sync.
Recently, odd things have started happening on both the LAN and WAN sides. During business hours, when the LAN and WAN are in use, packet loss spikes from its usual 0-1% (typically pegged at 0%) to 20% or more. We have some Synology NASes that are also set up in HA; they begin to have heartbeat/sync issues and can't communicate reliably with each other. When I look at the switches, the activity-light pattern reminds me of a broadcast storm. I've sniffed the network for broadcast/multicast traffic, and there is a lot of it, but it doesn't look like a loop. Obviously I can't enable broadcast storm protection, because that kills CARP.
I started by taking the backup pfSense router out of the picture entirely: I physically disconnected its LAN and WAN but left the sync interface online. Packet loss immediately dropped back to where it should be. Then I plugged LAN and WAN back in and put #2 into CARP maintenance mode to make sure CARP was out of the picture. WAN on #2 started receiving packets, and packet loss started dropping on #2 and rising on #1. Interesting: even with CARP disabled, WAN packets still reach #2.
Both routers are connected to the same modem, and on #2 I'm using MAC address cloning to make it look like #1. Why? Because my ISP will only bind my public IPs to a single MAC address, so I have them all bound to #1, and #2 has to pretend to be #1 when failing over.
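One plausible reason #2 keeps seeing WAN traffic in maintenance mode (an assumption, not something confirmed in this thread): if the modem's ports behave like an ordinary learning switch, it forwards frames for a given MAC to whichever port it last saw that MAC on as a source. With both routers sharing the same WAN MAC, any frame #2 sends from that MAC would pull inbound traffic toward #2 until #1 sends again. The toy model below is a minimal sketch of that behavior; the MAC address, port numbers, and event sequence are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LearningSwitch:
    """Toy learning switch: remembers the last port each source MAC was seen on."""
    mac_table: dict = field(default_factory=dict)

    def receive(self, src_mac: str, in_port: int) -> None:
        # Learning step: a newer sighting overwrites any previous entry for this MAC.
        self.mac_table[src_mac] = in_port

    def forward(self, dst_mac: str):
        # Known MAC -> that single port; unknown MAC -> flood (represented as None).
        return self.mac_table.get(dst_mac)


SHARED_MAC = "00:08:a2:00:00:01"  # hypothetical cloned MAC used by both routers
PRIMARY_PORT, BACKUP_PORT = 1, 2  # hypothetical modem ports for #1 and #2

switch = LearningSwitch()
deliveries = []

# Hypothetical sequence: "tx" = a router sends a frame from the shared MAC,
# "rx" = an inbound frame from the ISP addressed to the shared MAC.
events = [
    ("tx", PRIMARY_PORT),
    ("rx", None),
    ("tx", BACKUP_PORT),
    ("rx", None),
    ("rx", None),
    ("tx", PRIMARY_PORT),
    ("rx", None),
]

for kind, port in events:
    if kind == "tx":
        switch.receive(SHARED_MAC, port)
    else:
        deliveries.append(switch.forward(SHARED_MAC))

lost = sum(1 for p in deliveries if p != PRIMARY_PORT)
print(deliveries)  # [1, 2, 2, 1]
print(f"{lost}/{len(deliveries)} inbound frames went to the backup")
```

In this model, every inbound frame delivered to port 2 is effectively lost from #1's point of view, which would look like intermittent packet loss on the primary that scales with how often the backup transmits.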
Now, as a reader, you probably wouldn't be surprised that this is falling apart, except that I set it up almost a year ago and it has worked perfectly until now. Plenty of failover testing has gone very well; I was surprised myself that it worked so well, given that I had to use MAC cloning and share a modem.
So here's my question.
Is this possibly related to the latest release? I only upgraded to it about two weeks ago.
Why does #2 receive WAN packets even with CARP in maintenance mode?
Is it possible to bring up the WAN interface on #2 only when CARP promotes it to master?
Could this be solved by adding a second modem? I'd still have to do the MAC cloning, but I'm wondering whether having two lines running into one modem is part of the issue.
Should I talk to my ISP about their multi-homing setup and see whether we can bind those IPs to multiple MACs?
Basically, I don't know where to look. On the LAN side, maybe all of the congestion is (or was) caused by the network not knowing which router is the primary and having to resend packets to a different IP. I'm not sure; that's a vague guess at best.
-
@ash-0 said in Packet loss when secondary is online.:
Both routers are connected to the same modem, and in fact with #2, I'm using MAC address cloning to make it look like #1. Why? Because my ISP will only bind my public IPs to a single MAC address, so I have them all bound to #1 and I have to pretend to be #1 when I'm failing over to #2.
I would consider that ISP service to be incompatible with CARP/VRRP/HA in that case.
-
Just wanted to post an update on this.
I found out something interesting. Multi-path homing, at least the way my ISP does it, allows only a single MAC address to be bound to an IP address.
Every so often, some kind of auditing process runs on the ISP side. During initial setup, I cloned the primary router's MAC onto the secondary; that way, if the primary went down, the secondary would look identical to the primary from the ISP's perspective, and voilà: packets flow smoothly and CARP does its thing.
That audit process breaks this arrangement, though, so it doesn't work. In other words, it wasn't an equipment failure so much as luck that it worked at all in the first place.
-
@derelict Correct.