Network goes down after pfsense comes up.

nuro

Hi All

I suspect this might be a hardware issue, so here goes:
We have been replacing our Linux gateways with pfsense. Pfsense has exceeded our expectations at every turn, so thanks to all the developers :)

Recently we have had a massive setback. We are running the primary gateway on a Sun x2100 (4 NICs), with a generic dual-NIC machine as backup (using CARP+pfsync).

This is how we configure the machines:
Multiple NICs > lagg0 > VLANs on the lagg. The NICs are all connected to a Cisco Catalyst 2960 switch, using an EtherChannel.
pfSense and the switch are both configured for LACP. The sync interface has its own VLAN.
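
For anyone debugging a similar lagg/LACP setup, here is a rough sketch of how the member-port state could be checked from the pfSense side. It assumes the usual FreeBSD `ifconfig lagg0` output, where each member shows up as a `laggport:` line with flags such as ACTIVE, COLLECTING and DISTRIBUTING. The interface names (em0, em1, em2) and the sample output are made up for illustration and are not from the machines described here.

```python
import re

# Flags a healthy LACP member port is expected to carry in `ifconfig laggN` output.
REQUIRED_FLAGS = {"ACTIVE", "COLLECTING", "DISTRIBUTING"}

# Matches lines such as: "laggport: em0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>"
LAGGPORT_RE = re.compile(r"^\s*laggport:\s+(\S+)\s+flags=[0-9a-fA-F]+<([^>]*)>")


def unhealthy_lagg_ports(ifconfig_output):
    """Return {port: [missing flags]} for member ports lacking a required flag."""
    problems = {}
    for line in ifconfig_output.splitlines():
        match = LAGGPORT_RE.match(line)
        if not match:
            continue
        port = match.group(1)
        flags = {flag for flag in match.group(2).split(",") if flag}
        missing = REQUIRED_FLAGS - flags
        if missing:
            problems[port] = sorted(missing)
    return problems


# Hypothetical sample output (assumed format, invented interface names).
SAMPLE = """\
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
\tlaggproto lacp
\tlaggport: em0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
\tlaggport: em1 flags=18<COLLECTING,DISTRIBUTING>
\tlaggport: em2 flags=0<>
"""

if __name__ == "__main__":
    for port, missing in unhealthy_lagg_ports(SAMPLE).items():
        print(f"{port}: missing {', '.join(missing)}")
```

A port that never reaches all three flags has probably not finished negotiating with the switch, so it would be worth comparing against the EtherChannel state on the Cisco side.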

Now, the primary and backup machines have been running fine for 2 weeks. Yesterday, out of the blue, our whole network was brought to a grinding halt, with 50%+ packet loss. By pure chance, I killed the primary gateway, and when the backup machine kicked in, the network returned to normal. If I switch the primary machine back on, the network dies again.
Both machines have the same config, as they are running XML sync. If I enable only one of the 4 ports for the primary machine (not always the same one), it still happens. Luckily (and thanks to pfsense), the backup gateway is running smoothly.

Dmesg shows nothing out of the ordinary. I'm at a bit of a loss :(

Any ideas?

TIA

nuro

I forgot to mention: the link graph on the Cisco shows thousands of packet errors within 2 or 3 minutes of the port being re-activated.

jasonlitka

Are both pfSense systems plugged into the same Cisco switch? If so, are they in the same physical block? Most switches use multiple controllers inside and I've had a couple flake out and either not establish links or experience high packet loss but only in a certain 8- or 16-port block (out of 48 ports).

EDIT: Also, try fixing both the switch ports and your pfSense interfaces to 100 Mbit/s full duplex (100 FD) if either is set to auto-negotiate. I recently had an issue with a new carrier where I could not get more than 2Mbit/s upstream. As soon as I changed the ports it jumped to 20Mbit/s (what I'm paying for).
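
To see what each interface has actually settled on before and after forcing the speed, something like the sketch below could help. It assumes the `media:` line format that FreeBSD `ifconfig` normally prints, for example `media: Ethernet autoselect (1000baseT <full-duplex>)` for an auto-negotiating port, or `media: Ethernet 100baseTX <full-duplex>` for a forced one. The helper name `describe_media` is made up for this example.

```python
import re

# Assumed FreeBSD ifconfig formats:
#   media: Ethernet autoselect (1000baseT <full-duplex>)   auto-negotiated
#   media: Ethernet 100baseTX <full-duplex>                forced setting
MEDIA_RE = re.compile(
    r"media:\s+Ethernet\s+(?P<configured>\S+)"
    r"(?:\s+<(?P<configured_opts>[^>]*)>)?"
    r"(?:\s+\((?P<active>\S+?)(?:\s+<(?P<active_opts>[^>]*)>)?\))?"
)


def describe_media(line):
    """Summarise one ifconfig media line, or return None if it is not one."""
    match = MEDIA_RE.search(line)
    if match is None:
        return None
    autoneg = match.group("configured") == "autoselect"
    # With autoselect the negotiated result is in parentheses; otherwise the
    # configured media is the forced setting.
    speed = match.group("active") if autoneg else match.group("configured")
    opts = match.group("active_opts") if autoneg else match.group("configured_opts")
    if opts and "full-duplex" in opts:
        duplex = "full"
    elif opts and "half-duplex" in opts:
        duplex = "half"
    else:
        duplex = "unknown"
    return {"autoneg": autoneg, "speed": speed, "duplex": duplex}


if __name__ == "__main__":
    print(describe_media("\tmedia: Ethernet autoselect (1000baseT <full-duplex>)"))
    print(describe_media("\tmedia: Ethernet 100baseTX <full-duplex>"))
```

If one end reports half duplex while the other is forced to full, that mismatch is a common cause of climbing error counters on a port, so both ends should be set the same way.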

nuro

The two servers are on separate switches, but the nics for each machine are on the same block. I'm going to try your suggestion and shuffle things around a bit, and set the port to 100 FD. Thanks.