Perplexing Problem with PFSense
-
@itworxnz Different physical houses? With different grounds? I’d expect copper linking them to ground them together putting voltage on the wires. That can burn ports out. Use fiber between buildings.
Aside from that I’ve seen a bad NIC or switch port cause a switch to go wonky.
-
Realtek NICs that are throwing watchdog timeouts will almost inevitably stop responding at some point.
You could try the alternative driver. But you'd be better off using something with Intel NICs.
Steve
-
The Switches are managed?
Do you use Loop Protection or Spanning Tree?
It could be a loop; the resulting broadcast storm would knock your gateway off the network.
And SteveITS's point about the ground is very important!
-
@ITWorxNZ I had a similar problem but it was some years ago and eventually an upgrade of the device in question solved the issue. It turned out to be a laptop which would occasionally go bananas and completely flood the network. Disconnecting or rebooting it solved it until next time.
So I'm thinking @NOCling has a good point on the Loop issue, I would investigate that. Is it always the same houses that you need to disconnect?
-
Yes, what actually fails when this happens?
Can any house ping any other house?
-
@steveits Hmm, that is interesting about the grounding. Will put that on the list to look into. We've changed switches and routers and the problem still happens.
-
@stephenw10 Tried the 197 driver, one of the NICs didn't like it. But I tend to discount the timeouts, as we got them on the Intel NIC machine too, and the time it happens doesn't correspond with the outages.
-
@itworxnz Wouldn't grounding just fault the involved Eth port, not the entire switch? Which means if it's something from one of the houses, at max it would create a problem for that house, not everyone else?
-
(Replies all done as one post. Thanks to all)
The network just goes unresponsive. Can't even ping the PFsense box.
No loop protection or spanning tree, the switches are all unmanaged.
It's always the same two blocks out of seven that seem to cause it, but everyone is affected.
But now I'm thinking that if we bypass the wired connections in the wall, that will more or less prove that we have a CAT5 fault somewhere and it's time to replace wires. That's about the only thing we haven't done.
Today I tried to change the DNS settings from a forwarder to a resolver, but I couldn't make it work. Might have to talk to the ISP.
Is there a guide somewhere to correctly interpreting the output from pftop?
-
@gblenn said in Perplexing Problem with PFSense:
@itworxnz Wouldn't grounding just fault the involved Eth port, not the entire switch? Which means if it's something from one of the houses, at max it would create a problem for that house, not everyone else?
I should clarify I wasn’t necessarily saying that was OP’s problem, but it can cause devices to be flaky or burn out, especially around electrical storms.
Reference for OP: https://www.cablinginstall.com/cable/article/16465312/ground-potentials-and-damage-to-lan-equipment
-
@itworxnz said in Perplexing Problem with PFSense:
It's always the same two blocks out of seven that seem to cause it, but everyone is affected.
If the router/gateway went down everyone would be affected, but the different hosts in the same subnet would still be able to connect to each other. Can we assume that isn't the case?
When this happens if you run a pcap somewhere do you see anything incoming?
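A minimal sketch of how a capture taken during an outage could be checked for a broadcast flood. It assumes the capture was saved in the classic pcap format (not pcapng) with Ethernet framing; the function name `broadcast_share` is made up for illustration and is not part of any tool mentioned in this thread.

```python
import struct

BROADCAST_MAC = b"\xff" * 6


def broadcast_share(pcap_bytes):
    """Return (total_frames, broadcast_frames) for a classic pcap capture.

    Assumption: classic microsecond pcap with Ethernet link type (1).
    """
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written little-endian
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written big-endian
    else:
        raise ValueError("not a classic microsecond pcap file")

    # The link type is the last 4 bytes of the 24-byte global header.
    (link_type,) = struct.unpack_from(endian + "I", pcap_bytes, 20)
    if link_type != 1:
        raise ValueError("capture is not Ethernet")

    offset = 24
    total = 0
    broadcast = 0
    while offset + 16 <= len(pcap_bytes):
        # Per-record header: ts_sec, ts_usec, captured length, original length.
        _sec, _usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", pcap_bytes, offset
        )
        offset += 16
        frame = pcap_bytes[offset:offset + incl_len]
        offset += incl_len
        total += 1
        # The destination MAC is the first 6 bytes of an Ethernet frame.
        if frame[:6] == BROADCAST_MAC:
            broadcast += 1
    return total, broadcast
```

If nearly every frame in the capture is broadcast, that points towards a storm rather than a single dead device. What counts as "too many" depends on the network, so treat the ratio as a hint, not a verdict.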
-
@itworxnz If it's a cable fault and the problem occurs intermittently, it has to be something physically affecting the cable, doesn't it? Temperature variations, vibrations or movement somehow...? But even if that happens, how does that create a loop? One cable can't really make a loop, unless there is an additional cable which is broken, but sometimes "springs to life" activating the loop?
It's always the same two blocks out of seven that seem to cause it, but everyone is affected.
Assuming it is a loop fault with two blocks involved you would need a direct connection between those two block switches somehow. The way you built it, cables run from master switch to each of the block switches, and that's it, right?
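To make the star-topology argument concrete, here is a small sketch that checks a list of cable runs for a loop using union-find. The device names (`master`, `block1` and so on) are placeholders, not the real layout.

```python
def has_loop(links):
    """Return True if the undirected links contain a cycle (a layer 2 loop)."""
    parent = {}

    def find(node):
        # Locate the representative of node's group, shortening paths as we go.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            # Both ends are already connected, so this link closes a loop.
            return True
        parent[root_a] = root_b
    return False


# Hypothetical layout: one master switch feeding seven block switches.
star = [("master", f"block{i}") for i in range(1, 8)]
print(has_loop(star))                           # pure star
print(has_loop(star + [("block3", "block4")]))  # extra block-to-block link
```

A pure star never reports a loop. It takes a second path between two devices that are already connected, such as a block-to-block cable or both ends of one patch lead in the same switch, before the check returns True.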
So if you didn't put a cable connecting two blocks directly, it would have to be the tenants making it happen somehow? Two houses in different blocks, sharing connection on the main subnet (wan side of house routers).
-
@steveits said in Perplexing Problem with PFSense:
Different physical houses? With different grounds? I’d expect copper linking them to ground them together putting voltage on the wires. That can burn ports out. Use fiber between buildings.
While I agree it's a bad idea to run copper between buildings, there is no direct electrical connection between the NIC and the cable. NICs have a transformer that passes the signal but blocks low frequencies, such as power. Also, twisted pair Ethernet was designed to share cables with telephones, where there could be 90V 20Hz ringing current.
For a bit of history, check out StarLAN.
-
@itworxnz said in Perplexing Problem with PFSense:
That's about the only thing we haven't done.
Have you tried monitoring with something like Wireshark or Packet Capture to see what's actually on the wire? Do the LEDs show anything unusual?
-
Just a thought... the houses in these two blocks, do all of them have routers? If not, how are the ones that don't connecting? I assume they have a switch and perhaps also APs, which may be meshed (and hardwired), and that could create a loop, couldn't it?
Regardless, changing the block switches to managed switches with loop protection and spanning tree is probably a good idea.
-
@gblenn Excuse me, I think I said the wrong thing. It's two houses in the same block. If either one of them has their connection plugged back in, the internet fails for everyone. But... sometimes, one of them works. Then they both work. Then a week later, the problem happens again. Is quite frustrating.
Haven't tried Wireshark, will look into that.
Some houses have routers (yes, double-natting, but people want their own wireless). Others just have switches to share ethernet around the house.
I think I'm going to try stringing CAT5 out the windows. Not very elegant, but if the problem goes away, we can replace the internal cables as they're obviously the problem.
If it doesn't go away... I'll be hammering my punching bag.
-
@itworxnz Ok, but that's a good thing then, only two houses to focus on. That's way better than 8...
What makes these two houses different? No routers perhaps? If that's the case, adding routers will likely isolate the problem from the rest of the network.
If not, then my guess is there's a problem close to the block switch. Perhaps a fault in the patch panel where those two houses come in to connect into the block switch? Or there is a problem in the switch itself (unlikely?)
I don't really see how a faulty cable, or two, could create a loop which could be the cause of broadcast storms. They still only connect to one single interface at each end. Unless they both connect to some switch in between which you are unaware of...
-
@gblenn Afraid not, they both have routers - that have been both swapped out or factory reset and reconfigured. The block switch has been replaced too. No change.
I'm assuming a patch panel fault too, so bypassing those wires will prove it once and for all. I'm only really guessing it's a loop condition, but the problem is exactly like one.
If the bypassing works, we'll get a cable guy in and replace what's behind the patch panels. But after considering all the replies here, the problem does not appear to be PFSense.
-
@itworxnz Yes, simple enough to bypass and test without any long cable runs.
BTW, out of curiosity, are the two houses connecting next to each other in the panel?
I changed my patch to a Keystone version when I made some changes at home. Super flexible and simple to use and virtually no risk messing up the cabling since you can use factory patch cables back and front... no need for a cable guy...
-
@stephenw10 said in Perplexing Problem with PFSense:
@itworxnz said in Perplexing Problem with PFSense:
It's always the same two blocks out of seven that seem to cause it, but everyone is affected.
If the router/gateway went down everyone would be affected, but the different hosts in the same subnet would still be able to connect to each other. Can we assume that isn't the case?
Still need that question answered to determine what sort of problem you are dealing with. And I would still do this:
When this happens if you run a pcap somewhere do you see anything incoming?
This doesn't seem like a bad cable to me or a bad switch port. Those would only affect devices connected to them. For something to take down the entire subnet across multiple switches, such that no traffic can move across the network at all, it pretty much has to be a flood of some sort.
But if things can still ping other local hosts just not the local gateway I'd be looking for a rogue dhcp server or something doing ARP poisoning perhaps.
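The two cases above can be written down as a small decision helper. This is only a sketch of the reasoning: the function name and the result strings are invented for illustration, and the inputs are whatever you observe by pinging the gateway and a neighbouring host during an outage.

```python
def triage(gateway_reachable, peers_reachable):
    """Map two ping results taken during an outage to a likely suspect."""
    if gateway_reachable and peers_reachable:
        return "no fault visible from this host"
    if peers_reachable:
        # Local hosts still talk to each other; only the gateway is lost.
        return "gateway only: look for a rogue DHCP server or ARP poisoning"
    if not gateway_reachable:
        # Nothing moves on the subnet at all.
        return "whole subnet: look for a flood, such as a loop or a misbehaving device"
    # Case not discussed in the thread: gateway answers but peers do not.
    return "peers only: check the peers themselves or client isolation"


print(triage(gateway_reachable=False, peers_reachable=True))
print(triage(gateway_reachable=False, peers_reachable=False))
```

It is worth running the pings from more than one house, since a single host's view can be misleading.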
You should really be using VLANs to separate these user groups out. That would prevent something like a rogue dhcp server affecting everyone.
Steve