Extremely Frustrating Outages
-
The WAN is the only internet connection. OPT1 is reserved for administrative connections, in case something happens to the LAN it gives us a backup. Just DHCP and open firewall rules in case we plug into it. What am I looking for in the WAN capture? While at the site I mentioned above we found the issue to be a device connecting to the wifi, I've been made aware that it is happening to another client of ours so I'm just beginning this all over again. I'm not aware of any possible way that something inside of the network can cause the firewall and modem to lose communication so this is new to me. Thanks for the assistance!
-
The only thing I could imagine causing a problem from inside is something spoofing the MAC or IP maybe. Even then it would prevent the inside clients connecting but should not stop pfSense seeing the modem.
One thing we do see relatively often is a rogue DHCP server. A router being used as an access point that decided to go back to being a router, for example. Or I once saw a situation with a cell phone configured in hotspot mode that stole DHCP clients when it was brought into the office. Tough to troubleshoot because it only happened when all the employees were present.
That can appear as really weird behavior.
I only asked about OPT1 because you said you tested using that port as WAN and it was OK. But presumably it has a different MAC so might pull a different IP?
Steve
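
One way to check for a rogue DHCP server is to look at which server answers in captured DHCP replies. Below is a minimal sketch, assuming you already have the raw BOOTP/DHCP payloads (for example extracted from a pcap). It reads the Server Identifier (option 54) from each reply and lists any server not on an allowed list. The helper `fake_offer`, the function names, and the addresses are made up for illustration.

```python
import ipaddress

MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of the DHCP options field


def dhcp_server_id(payload: bytes):
    """Return the Server Identifier (option 54) from a raw DHCP payload, or None."""
    # The fixed BOOTP header is 236 bytes, followed by the 4-byte magic cookie.
    if len(payload) < 240 or payload[236:240] != MAGIC_COOKIE:
        return None
    i = 240
    while i < len(payload):
        code = payload[i]
        if code == 255:  # End option
            break
        if code == 0:  # Pad option has no length byte
            i += 1
            continue
        if i + 1 >= len(payload):
            break
        length = payload[i + 1]
        value = payload[i + 2:i + 2 + length]
        if code == 54 and len(value) == 4:
            return str(ipaddress.IPv4Address(value))
        i += 2 + length
    return None


def rogue_servers(payloads, allowed):
    """Server identifiers seen in the replies that are not in the allowed set."""
    seen = {dhcp_server_id(p) for p in payloads}
    return sorted(s for s in seen if s is not None and s not in allowed)


def fake_offer(server_ip: str) -> bytes:
    """Hypothetical helper: build a minimal DHCPOFFER payload for demonstration."""
    options = (bytes([53, 1, 2])  # message type = OFFER
               + bytes([54, 4]) + ipaddress.IPv4Address(server_ip).packed
               + bytes([255]))
    return bytes(236) + MAGIC_COOKIE + options


if __name__ == "__main__":
    replies = [fake_offer("192.168.1.1"), fake_offer("192.168.43.1")]
    print(rogue_servers(replies, allowed={"192.168.1.1"}))
```

Any address this prints is a server handing out leases that you did not expect, which is worth tracing by MAC address on the switch.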
-
@stephenw10 said in Extremely Frustrating Outages:
The only thing I could imagine causing a problem from inside is something spoofing the MAC or IP maybe. Even then it would prevent the inside clients connecting but should not stop pfSense seeing the modem.
Which is exactly my thoughts. How could something in the network prevent the firewall from seeing the modem? How could something in the network prevent even other devices from seeing the modem? That's what makes it so frustrating as it just doesn't seem possible.
One thing we do see relatively often is a rogue DHCP server. A router being used as an access point that decided to go back to being a router, for example. Or I once saw a situation with a cell phone configured in hotspot mode that stole DHCP clients when it was brought into the office. Tough to troubleshoot because it only happened when all the employees were present.
That can appear as really weird behavior.
I've seen plenty of cases of a router resetting and breaking a network. I usually find those by spotting devices with addresses outside of the network's scope, or by unplugging the router and ARPing the IP. Never thought of a phone hotspot mucking up DHCP, though. Regardless, all those would do is stop internal traffic from getting out.
I only asked about OPT1 because you said you tested using that port as WAN and it was OK. But presumably it has a different MAC so might pull a different IP?
Steve
I see. I plugged in OPT1 purely from a Layer 1 perspective to see if some kind of voltage on the line (since Spectrum was blaming voltage feedback) would be causing the issue. However this is happening, everything appears fine inside of the network. The modem just becomes unresponsive, dropping packets and facing very high latency. To me, that would indicate a modem issue but, at least at 1 client, it isn't.
-
OK, run a pcap on the WAN when this is happening. Try to access the modem. See what's happening in the capture. Is the modem actually taking that long to respond? Errors? Re-transmissions?
Steve
-
@stephenw10 said in Extremely Frustrating Outages:
OK, run a pcap on the WAN when this is happening. Try to access the modem. See what's happening in the capture. Is the modem actually taking that long to respond? Errors? Re-transmissions?
Steve
This site is remote to me so I'm a little limited on what I can do. I was remotely connected into the firewall when they started having issues again. I managed to get a pcap but couldn't connect to a PC to try to log into the modem as the service was too bad. By the time I got in the service had corrected itself. I'm not sure what I'm looking for in the pcap, though. Normally I'd go pick it apart by protocol to diagnose an SMB, FTP or SIP issue. I do see a lot of Protocol=QUIC, Info=.....Len=55[Malformed Packet]. Not sure how normal that is.
-
@stewart said in Extremely Frustrating Outages:
QUIC
Here's what QUIC is. If you're getting malformed packets, that tends to indicate a hardware issue nearby. Malformed packets shouldn't be passing through routers or switches, as they'd be caught with the CRC check. What MAC address are they coming from? That would indicate the failing hardware.
-
Hardware offloading in the NIC can make the checksum appear invalid in a pcap.
I would disable all hardware offloading anyway in System > Advanced > Networking.
Steve
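
To see why offloading matters here: with transmit checksum offload the capture is usually taken before the NIC fills in the checksum, so outgoing packets can look bad in the pcap even though they are fine on the wire. A minimal sketch of the check a capture tool performs on an IPv4 header follows. The helper `build_header` and the addresses are made up for illustration, and TCP/UDP checksums additionally cover a pseudo-header that is not shown.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum (the algorithm described in RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header_ok(header: bytes) -> bool:
    """A header whose checksum field is correct sums to zero."""
    return internet_checksum(header) == 0


def build_header(fill_checksum: bool) -> bytes:
    """Hypothetical helper: a 20-byte IPv4 header, optionally with its checksum."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20, 0, 0, 64, 17, 0,  # checksum field left at zero
        bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]),
    )
    if fill_checksum:
        csum = internet_checksum(header)
        header = header[:10] + struct.pack("!H", csum) + header[12:]
    return header


if __name__ == "__main__":
    print(ipv4_header_ok(build_header(fill_checksum=True)))
    print(ipv4_header_ok(build_header(fill_checksum=False)))
```

The second header stands in for what a capture sees when the checksum is left for the NIC to compute: the packet is reported as invalid even though nothing is wrong with the hardware.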
-
@jknott said in Extremely Frustrating Outages:
@stewart said in Extremely Frustrating Outages:
QUIC
Here's what QUIC is. If you're getting malformed packets, that tends to indicate a hardware issue nearby. Malformed packets shouldn't be passing through routers or switches, as they'd be caught with the CRC check. What MAC address are they coming from? That would indicate the failing hardware.
I see the Malformed Packets coming into my pfSense box from the modem MAC address but I also see them leaving my pfSense box going into the modem MAC address. That would indicate that Wireshark is saying that packets coming and going are all malformed. Perhaps that is due to the Hardware offloading that @stephenw10 was mentioning?
-
I've now checked the Disable hardware checksum offload box.
I did manage to get another packet capture. There are hundreds, if not more, of:
-TCP Retransmissions
-TCP Dup ACK
-TCP Out of Order
-TCP Previous segment not captured
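
For anyone unsure what those labels mean, here is a rough sketch of how segments in one direction of a single TCP flow could be classified from their sequence numbers. This is a deliberately simplified illustration and not Wireshark's actual analysis, which also uses timing and ACK information. The function name and sample numbers are made up, and sequence number wraparound is ignored.

```python
from collections import Counter


def classify_segments(segments):
    """Label each (seq, length) segment, given in capture order, for one direction of one flow."""
    labels = []
    next_expected = None  # highest seq + length seen so far
    seen = set()
    for seq, length in segments:
        end = seq + length
        if next_expected is None or seq == next_expected:
            label = "in order"
        elif seq > next_expected:
            # A gap: data before this segment never showed up in the capture.
            label = "previous segment not captured"
        elif (seq, length) in seen:
            # Exactly the same bytes were already seen once.
            label = "retransmission"
        else:
            # Older data arriving late and filling a gap.
            label = "out of order"
        seen.add((seq, length))
        next_expected = end if next_expected is None else max(next_expected, end)
        labels.append(label)
    return labels


if __name__ == "__main__":
    sample = [(1, 100), (101, 100), (301, 100), (201, 100), (101, 100)]
    print(Counter(classify_segments(sample)))
```

Large counts of the last three labels mean segments are being lost or delayed somewhere on the path, which matches what the list above shows.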
-
You mean throw a switch in there with port mirroring into a PC and run wireshark on there?
-
Yes, just in case the pfSense NIC is the source. If the errors appear in Packet Capture but not in Wireshark, that's likely the cause.
-
@jknott said in Extremely Frustrating Outages:
Yes, just in case the pfSense NIC is the source. If the errors appear in Packet Capture but not in Wireshark, that's likely the cause.
In the first site that had this issue, that's what I thought as a possibility so I swapped the firewall. Can't say for sure that it's the same as this site but at the last site it didn't help. The errors persisted across 2 firewalls.
-
Here's a snippet from when things are bad.
-
Can you upload the capture?
-
@jknott I can tomorrow, but wouldn't want it public? How should I send it to you?
-
Mmm, that pcap is pretty ugly though.
No packet loss on the WAN when this happens?
Almost looks like asymmetry. I'd still be looking for something on the wifi providing an alternate route somehow.
Steve
-
@stephenw10 said in Extremely Frustrating Outages:
Mmm, that pcap is pretty ugly though.
No packet loss on the WAN when this happens?
Almost looks like asymmetry. I'd still be looking for something on the wifi providing an alternate route somehow.
Steve
From inside the network I ping:
Switch - No packet loss
LAN IP - No packet loss
WAN IP - Some packet loss when the logs show services restarting due to the gateway going up and down.
Gateway IP (modem) - Similar packet loss but also high latency during the issues.
From outside the network I ping:
Gateway IP (modem) - Packet loss and high ping
WAN IP - Packet loss and high ping
In the case of the first client I also had a laptop plugged directly into the modem with a spare public IP assigned to it. During the issues I would see:
Gateway IP (modem) - Packet loss and high ping
WAN IP - Packet loss, which I believe is due to the interface restarting as the gateway goes up and down.
If something on the wifi were causing an alternate route, how could that keep me from pinging the modem remotely? It would just mess up the packets inside the network, no?
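
The hop-by-hop comparison above can be sketched in code. This is only an illustration: it summarizes loss and average latency per target and reports the first hop, ordered nearest to farthest, that looks unhealthy. The thresholds, function names, and sample round-trip times are all made up, not measurements from these sites.

```python
from statistics import mean


def summarize(rtts_ms):
    """Return (loss percent, average RTT in ms) for a list of probes; None marks a lost reply."""
    if not rtts_ms:
        raise ValueError("need at least one probe")
    answered = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(answered)) / len(rtts_ms)
    avg = mean(answered) if answered else None
    return loss_pct, avg


def first_bad_hop(results, max_loss_pct=5.0, max_avg_ms=100.0):
    """results: list of (name, rtts_ms) ordered nearest to farthest; thresholds are assumptions."""
    for name, rtts in results:
        loss, avg = summarize(rtts)
        if loss > max_loss_pct or (avg is not None and avg > max_avg_ms):
            return name
    return None


if __name__ == "__main__":
    inside = [
        ("Switch", [1.0, 1.0, 1.0, 1.0]),
        ("LAN IP", [1.0, 1.0, 1.0, 1.0]),
        ("Gateway IP (modem)", [250.0, None, 400.0, None]),
    ]
    print(first_bad_hop(inside))
```

If the first unhealthy hop is the same from inside, from outside, and from a laptop plugged straight into the modem, the fault is at or beyond that hop rather than in the LAN.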
-
Indeed it would not.
From that description it looks far more like an upstream issue. A failing modem or whatever that is connected to.
Steve