Duplicate WAN IP Takes down pfSense. Is this plausible?

DigitalPlumberNZ

Hi all

Forgive my vagueness and the lack of logging. This relates to an employment disciplinary issue, and I either don't have the logs in question or am forbidden from releasing them as a condition of even getting the logs.

That out of the way, I happened to be working in a rack housing my employer's network equipment when the firewall, a Netgate C2758 running pfSense 2.3.1, threw its toys. I am being blamed for causing the outage because I was installing pfSense on a physically adjacent C2758 around the time the outage manifested: a C2758 with no connected NICs aside from the IPMI interface.

The presentation was a failure to forward traffic, resulting in loss of service. As best we could tell, this affected traffic inbound on any interface, physical (WAN) and VLAN alike, even on interfaces where the inbound rules amount to "allow any/any". I could originate ping traffic outbound from the console on the firewall, but nothing would pass through it. A reboot left the firewall unable to come back up, apparently with a corrupted filesystem: when it finally boots (after 30+ minutes) you cannot log in as any user, and if it's rebooted into single-user mode and fsck is run, there are many, maaaaaaaaaaany broken things.

My employer has supplied the system logs to an external contractor (if you're reading this Steven, hi!), who has announced that the cause of the outage was that a duplicate WAN IP address appeared on the network at 08:14, the time when external services were observed to fail (an OpenVPN user reported a connection drop, IPsec tunnels started retrying and failing, etc.).
At least, I think that's what he's saying: in response to the question "From the logs can you tell what happened around the time that [the firewall] became unstable?", the consultant answered "It seems very apparent that the configuration was overwritten and duplicate IP address appeared causing loss of connectivity."

How I managed to overwrite the config when, by the consultant's own evidence, I did not perform a console login to the firewall until 08:31, well, I don't know. But I retrieved a copy of the config.xml from the dead firewall and used it to get the newly installed firewall up and running, so whatever I supposedly did to overwrite it, I clearly did very poorly. The firewall also continued to log traffic for more than 20 minutes after the outage initially presented; hooray for remote syslog.

I know this is horribly confused and confusing, and the situation is not in any way aided by my employer constantly adjusting their narrative. Based on his other answers, the consultant appears to have been told that I was there to go from one operational firewall to a full CARP pair, when my employer has been told multiple times that my sole task on the visit was to install pfSense and get IPMI running so I could complete configuration from the office.

So, with that somewhat vague background, and ignoring the difficulty of a duplicate IP appearing from a system that was installed out-of-the-box (literally zero configuration performed) and not connected to the network, is it at all plausible that a duplicate WAN IP address appearing on the network would cause pfSense to stop forwarding traffic from any source to any destination and then lock up so hard that the box had to be taken out of service?

If you're answering, please indicate your level of expertise. I'm hoping to wave this discussion about to demonstrate that what's claimed is simply not credible.

Derelict

Yes. Causing a duplicate IP address for WAN address will cause just about all traffic to stop, depending on the nature of the traffic.

DigitalPlumberNZ

@Derelict:

Yes. Causing a duplicate IP address for WAN address will cause just about all traffic to stop, depending on the nature of the traffic.

Including traffic on other networks that have no relationship to the WAN? e.g. traffic between networks on OPT1 and OPT2. That sounds pretty severe.

DigitalPlumberNZ

@Derelict:

Yes. Causing a duplicate IP address for WAN address will cause just about all traffic to stop, depending on the nature of the traffic.

Also, what of the hard lock-up? That does not sound like a reasonable response to detecting a duplicate WAN IP address.

Derelict

Doubt any of that actually happened. But it depends on the nature of the traffic as I said before. Creating an IP address conflict for WAN address is not something you want to do on any router/firewall.

DigitalPlumberNZ

@Derelict:

Doubt any of that actually happened. But it depends on the nature of the traffic as I said before. Creating an IP address conflict for WAN address is not something you want to do on any router/firewall.

It absolutely happened, from the loss of forwarding to the hard lock-up on reboot. Traffic from networks inside the firewall could not reach other networks inside the firewall, such as coming in on one VLAN and destined for another VLAN on a different physical port. "Everything is broken" is a quote from a fellow system admin when he rang me. It was very definitely not just traffic related to the WAN.

So, playing make-believe if you really feel you have to, does the scenario I describe sound like something that could be triggered by detection of a duplicate IP address on the WAN interface? Or is it more likely to have been caused by a failure of the system itself?

Derelict

No. A duplicate IP address on WAN should not cause that sort of failure. Something else is likely at play there. Given the information provided that could be anything. It sounds like something you were doing went wrong. The duplicate IP address might just be a symptom of whatever else was done.

DigitalPlumberNZ

@Derelict:

No. A duplicate IP address on WAN should not cause that sort of failure. Something else is likely at play there. Given the information provided that could be anything. It sounds like something you were doing went wrong. The duplicate IP address might just be a symptom of whatever else was done.

Absolute honest truth: I installed pfSense on a box that was totally disconnected apart from its IPMI interface, and even on a separate power phase from the firewall that went down. Configuration performed on the new install before I had to press it into service was zero beyond changing the admin password. There was one very brief connection from igb3 to a switch port that was access VLAN 1 (a VLAN otherwise unused on our network, and not configured on the firewalls) to confirm port identity, and that not until about 08:18, at least four minutes after people started reporting loss of service.

I just want it to go away. It sucks being blamed for breaking a system I didn't touch. I was utterly paranoid about the possibility of the new install conflicting, no matter how unlikely, so made sure that all network cables were disconnected prior to starting and then verified switch config before connecting igb3.

The system logs are being couriered to me, so hopefully I'll soon be able to see what the consultant saw. All I have right now is syslog, which doesn't show anything untoward, plus his report.

DigitalPlumberNZ

Got the logs. No idea how the consultant reached his conclusion, as system.log has no entries until a couple of hours after the outage started. In fact, none of the logs with entries around the time of the outage show anything vaguely helpful beyond documenting the precise time things began going south: the timing-out of OpenVPN connections and the retry attempts for IPsec.
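For anyone checking the same thing, a minimal sketch of pulling out only the entries that fall inside the outage window (say 08:00 to 08:45) from each log file. It assumes the logs are plain-text BSD-syslog lines such as `Jun  4 08:14:03 fw kernel: ...`; pfSense of this era keeps its local logs in a circular (clog) format, so those files would need dumping to text first. Because syslog stamps carry no year, the year is a caller-supplied assumption, and the function names are hypothetical.

```python
import re
from datetime import datetime

# BSD-syslog prefix, e.g. "Jun  4 08:14:03 fw kernel: ..." (no year in the stamp).
STAMP = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2})\s")


def parse_stamp(line, year):
    """Return the line's timestamp as a datetime, or None if it has no syslog prefix.

    The year is an assumption supplied by the caller, because syslog omits it.
    """
    m = STAMP.match(line)
    if not m:
        return None
    return datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")


def entries_in_window(lines, start, end, year):
    """Yield (timestamp, line) for every line stamped between start and end inclusive."""
    for line in lines:
        ts = parse_stamp(line, year)
        if ts is not None and start <= ts <= end:
            yield ts, line.rstrip("\n")
```

Running this over each file (passing the open file object as `lines`) and printing the results shows at a glance which logs actually cover the window and which, like system.log here, start too late to say anything about it.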

Anyone know what a duplicate IP address looks like in the logs?
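As far as I know, pfSense relies on the FreeBSD kernel here, which logs an ARP conflict along the lines of `arp: <MAC> is using my IP address <IP> on <interface>!`, and a changed neighbour entry as `arp: <IP> moved from <old MAC> to <new MAC> on <interface>`. The exact wording may vary between FreeBSD releases, so treat the patterns below as assumptions to confirm against the version in use. This is a minimal sketch that scans plain-text log lines for both messages; `find_arp_events` is a hypothetical helper name.

```python
import re

# Wording based on FreeBSD's kernel ARP messages (which pfSense builds on); the exact
# text may differ between FreeBSD releases, so treat these patterns as assumptions.
DUPLICATE = re.compile(
    r"arp: (?P<mac>[0-9a-f]{2}(?::[0-9a-f]{2}){5}) is using my IP address "
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) on (?P<iface>\S+?)!",
    re.IGNORECASE,
)
MOVED = re.compile(
    r"arp: (?P<ip>\d{1,3}(?:\.\d{1,3}){3}) moved from (?P<old>[0-9a-f:]{17}) "
    r"to (?P<new>[0-9a-f:]{17}) on (?P<iface>\S+)",
    re.IGNORECASE,
)


def find_arp_events(lines):
    """Return (kind, fields, line) for each duplicate-IP or ARP-moved kernel message."""
    events = []
    for line in lines:
        for kind, pattern in (("duplicate", DUPLICATE), ("moved", MOVED)):
            m = pattern.search(line)
            if m:
                events.append((kind, m.groupdict(), line.rstrip("\n")))
    return events
```

An empty result over the relevant logs means the firewall itself never recorded such a message. That is consistent with no conflict being seen on its interfaces, though not proof on its own, since a conflict is only logged if the firewall actually receives ARP traffic claiming its address.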