Frequent internet loss - need help figuring out where and why? Maybe pfSense, Modem, ISP, or all 3?
-
@mmiller7 said in Frequent internet loss - need help figuring out where and why? Maybe pfSense, Modem, ISP, or all 3?:
re0: watchdog timeout
That's a very common issue with Realtek crap NICs, you can try to use the official Realtek driver (hint: look into the Hardware section) or better yet switch to Intel NICs.
-
I agree with Grimson. The Realtek NICs can be very dodgy to work with. You should make sure that you have disabled all three of:
Hardware Checksum Offloading
Hardware TCP Segmentation Offloading
Hardware Large Receive Offloading
at the bottom of System/Advanced/Networking.
The "supersede interface-mtu 0" fix remains necessary for you if you were having the arpresolve/llinfo errors and frequent drops without it. The fix is now referenced in the upgrade guide for 2.4.4 for cases where the advanced options section has been touched.
You can look through the network card tuning recommendations, and try variations on the MSI/MSIX fixes you see there by adapting them for (re) cards.
https://www.netgate.com/docs/pfsense/hardware/tuning-and-troubleshooting-network-cards.html
For example, adding something like these in /boot/loader.conf.local
net.inet.tcp.tso=0
hw.pci.enable_msix=0
hw.pci.enable_msi=0
hw.re.tso_enable=0
Take a look through the forums, and you will see that many people have problems with Realtek hardware.
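After editing /boot/loader.conf.local it is easy to miss a typo or leave a line commented out. Here is a minimal sketch, in Python, that checks a copy of the file's text for the four tunables listed above. The function names are hypothetical and made up for illustration, and the parser assumes simple key=value lines with optional double quotes and # comments.

```python
# Tunables suggested in the post above, with the values it proposes.
EXPECTED = {
    "net.inet.tcp.tso": "0",
    "hw.pci.enable_msix": "0",
    "hw.pci.enable_msi": "0",
    "hw.re.tso_enable": "0",
}


def parse_loader_conf(text):
    """Return a dict of key -> value from loader.conf-style text.

    Assumes one key=value pair per line; anything after '#' is a comment.
    """
    tunables = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        tunables[key.strip()] = value.strip().strip('"')
    return tunables


def missing_tunables(text, expected=EXPECTED):
    """List expected tunables that are absent or set to a different value."""
    found = parse_loader_conf(text)
    return sorted(k for k, v in expected.items() if found.get(k) != v)
```

Passing the file contents to missing_tunables() returns an empty list when all four settings are present. It only checks the file, not whether the running kernel picked the values up after a reboot.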
I hope this helps.
-
@grimson The Zotac is a NUC-style low-power mini box, so the NICs can't be changed and it has no expansion slots. I did try a USB NIC already (AX88179) and it had the exact same problem. Actually I think everything I own that isn't a laptop has Realtek NICs on the motherboard.
Both my WAN and LAN are Realtek chips. re1 (the LAN) never seems to blink (metaphorically, that is); it's only re0 (WAN) that is blowing up. The re1 LAN carries even more traffic because it has 3 VLANs going thru it, whereas re0 has no VLANs.
-
@bfeitell said in Frequent internet loss - need help figuring out where and why? Maybe pfSense, Modem, ISP, or all 3?:
I agree with Grimson. The Realtek NICs can be very dodgy to work with. You should make sure that you have disabled all three of:
Hardware Checksum Offloading
Hardware TCP Segmentation Offloading
Hardware Large Receive Offloading
at the bottom of System/Advanced/Networking.
All 3 of those were already disabled
The "supersede interface-mtu 0" fix remains necessary for you if you were having the arpresolve/llinfo errors and frequent drops without it. The fix is now referenced in the upgrade guide for 2.4.4 for cases where the advanced options section has been touched.
I'll leave it in - I think it did (slightly) help my speeds even if it didn't help my reliability. I was seeing that prior to 2.4.4 (I hoped upgrading would help things).
You can look through the network card tuning recommendations, and try variations on the MSI/MSIX fixes you see there by adapting them for (re) cards.
https://www.netgate.com/docs/pfsense/hardware/tuning-and-troubleshooting-network-cards.html
For example, adding something like these in /boot/loader.conf.local
net.inet.tcp.tso=0
hw.pci.enable_msix=0
hw.pci.enable_msi=0
hw.re.tso_enable=0
Take a look through the forums, and you will see that many people have problems with Realtek hardware.
I hope this helps.
I'll look thru those and see what I can add.
I'm still wondering about the modem. Does anyone think it could be going bad, given the errors that keep jumping up to high numbers shortly before it dies? I just don't understand why pfSense would crash when the modem stops passing data for a while. And since it ran stable for over a year and is only now unstable, a hardware incompatibility seems like an odd explanation.
-
@mmiller7 said in Frequent internet loss - need help figuring out where and why? Maybe pfSense, Modem, ISP, or all 3?:
@grimson The Zotac is a NUC-style low-power mini box, so the NICs can't be changed and it has no expansion slots. I did try a USB NIC already (AX88179) and it had the exact same problem.
Then it is simply a bad choice for a pfSense installation.
@mmiller7 said in Frequent internet loss - need help figuring out where and why? Maybe pfSense, Modem, ISP, or all 3?:
Actually I think everything I own that isn't a laptop has Realtek NICs on the motherboard.
And those are all consumer grade devices, primarily intended to run Windows, where the Realtek NICs work halfway decently (in a consumer use-case). pfSense is based on FreeBSD and designed to run on enterprise grade hardware.
The Realtek drivers from FreeBSD are pretty bad. The FreeBSD drivers from Realtek themselves are a bit better, but far from the quality of Intel (or Broadcom) drivers. If you want a stable and reliable pfSense installation you need to switch hardware. That's simply how it is; if you don't believe me, check the hardware section and the FreeBSD forums. If you don't believe them and still insist on using Realtek NICs, you'll have to live with crashes and issues related to those interfaces.
Those are the facts, and for me there is no reason to discuss this any further.
-
There may be some issues -- but I can tell you there are also plenty of people successfully using this same Zotac box with pfSense based on the reviews I was reading and multiple others I personally know who are using pfSense on it with no problems. Also the fact it worked for well over a year without any problems for me, seemingly it can't be that bad if I'm only starting to see issues in the past month with the same configuration. It also doesn't explain why the SAME chipset is working totally fine with the LAN interface even when the WAN crashes out.
I've used other FreeBSD based "appliances" including Monowall and FreeNAS (pre-0.8). I know there can be issues with drivers, I have seen it. This doesn't fit the pattern I've seen with other incompatible devices though. In all those cases it was unstable or slow out of the box, rather than working for a long period of time and then blowing up.
-
Had another drop-out tonight even with the extra options in there tweaking stuff.
At this point I'm going to try a new modem (one that supports 32x8 channels vs 16x4) and see if that helps any. When I called my ISP the tech dug around a bit and he thought it could be I'm just dropping offline because of too many errors from an over-saturated node. I do see I'm up to 24x3 channels bonded vs 16x3 with the old modem, maybe the extra few channels will help if they over-subscribed the network.
-
If you're seeing those watchdog errors on the re NIC then the only solution that has been reported to work is the alternative driver. Lots of users with Zotac boxes have hit that issue. I wouldn't bother doing anything else until you try that:
https://forum.netgate.com/topic/135850/official-realtek-driver-binary-1-95-for-2-4-4-release
Steve
-
Both NICs are the same model. If it's a driver issue, why would only one be affected?
re1 has far more traffic (routing between VLANs including IP-cameras to servers) than re0...yet re0 is the only one that seems to choke?
-
@mmiller7 said in Frequent internet loss - need help figuring out where and why? Maybe pfSense, Modem, ISP, or all 3?:
SB6193
Your modem is probably an SB6183. Correct me if I'm wrong.
I'm not a big fan of Arris products anymore. Do you have another modem to try?
6183s can get really hot. If they get too hot, I've seen them start to error out. Not every one, not every customer, but enough that we do not keep them in service for our customers anymore.
You probably mentioned but who is your ISP and what region are you in? Edit- found it.. Metrocast/Atlantic Broadband
-
Modem was an SB6183, that's correct. My thread-starter post has screenshots attached of the modem status/config/log pages (192.168.100.1) with the signals, errors and logs.
And yes, it got what I consider to be "very hot": I measured the exterior of the case at over 120F with an IR thermometer. I tried having a case fan blow thru it, which brought the errors down from several thousand to several hundred, but it was still generating errors after a while, especially in the evenings.
At one point I even found some forum suggesting cellular interference. I tried disconnecting my femtocell (which is the only way to get usable cell service in this area) to rule that out, since it sits near where the coax comes into the house. That did not look like it made an appreciable difference.
The only other modem I already owned is an ancient one that I used for maybe 6-8 years in college, but it's too old to be supported (it may not even be DOCSIS 2.0; I only had like 3Mbps back then). Saturday evening I finally got mad at the SB6183 and replaced it with an Arris SB6190 (which also has a fan blowing thru the case slots). When I called the ISP to have them register the new modem, the technician I spoke to thought it could just be that at peak times the node I'm on is over capacity and throwing errors.
There could be some credit to the cable tech's theory on node saturation. As I think about it, "most of the time" when I've seen errors piling up before dropping offline, it has started to blow up around 10PM local time. In my tests I have been able to nearly saturate both upload and download (and a 150-200Mbps downlink is not easy to saturate) for 8-9 hours while I'm at work, just running an infinite loop, and it has never been offline when I got home.
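One way to put the "it usually starts around 10PM" impression on firmer footing is to log a timestamp for each dropout and bucket them by hour of day. A minimal sketch follows, assuming the timestamps are collected somewhere as ISO 8601 strings in local time. The function names are hypothetical, and how the dropouts get logged (failed pings, modem log entries, etc.) is left open.

```python
from collections import Counter
from datetime import datetime


def dropouts_by_hour(timestamps):
    """Count dropout events per local hour of day (0-23).

    `timestamps` is an iterable of ISO 8601 strings,
    e.g. "YYYY-MM-DDTHH:MM:SS".
    """
    hours = Counter()
    for stamp in timestamps:
        hours[datetime.fromisoformat(stamp).hour] += 1
    return hours


def peak_hours(timestamps, top=3):
    """Return the `top` hours with the most dropouts, busiest first."""
    return [hour for hour, _ in dropouts_by_hour(timestamps).most_common(top)]
```

If the node really is over-subscribed at peak times, the counts should cluster in the evening hours. A flat spread across the day would point back toward the modem or the NIC instead.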
Since I really want a separate stand-alone modem that isn't an ISP-managed "all in one router" it looks like Arris is about the only modem available.
Side-note: when I called the ISP for a modem swap I had an unusual experience. I got a tech who was not only partially familiar with Linux setups but was also a ham radio operator, and I was able to have an intelligent conversation about why I wanted to swap modems, what concerns I had about the instability, and how I'd checked signals (he also verified from his end that signals look good when it is not dropped offline). He also described the management system he used to set up my new modem as "archaic", for whatever that is worth. It does not have the ability to just "plug n play" a new modem like some ISPs where you can log in and self-register, and there's no captive portal. The only way to swap modems is to call tech support and have them replace the head-end modem config MAC address from their end.
I wish I had other ISP options or fiber...only thing my parents have had on FiOS is the "North American Fiber-Seeking Backhoe" eating the optical cable every year or two. But alas, out here I have exactly one ISP and unusable cellular, too many trees for satellite (and I rent so no cutting them down).
-
I’ve been having a very similar problem. I’m using a SB6183 as well, but with a PCEngines APU2 as my firewall.
-
Just wanted to post an update. While it's only been 4 days since I got the new modem, so far I have not had any more lockups/dropouts, even pushing 200GB per day of transfer (I've been trying to run frequent speed tests, several pings, plus normal traffic). Also, while I have some channels with "corrected" frames on the modem, it's only 10 or so at most and 0 uncorrectable (down from many thousands).
On pfSense, Status > Monitoring reports only 0.21% maximum packet loss and 0% average, and my ping has stayed below 50ms even under load and immediately returns to <10ms when the load lets up.
The dmesg output shows no unexpected messages, no flapping, no "watchdog" errors and no "llinfo" errors. It seems stable once again. Hopefully I didn't just jinx it.
EDIT: 7 days now going strong.
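For anyone wanting to repeat the dmesg check described above without reading the whole output by eye, here is a rough Python sketch that tallies the two message types discussed in this thread from a saved copy of the output. It assumes the watchdog lines look like the "re0: watchdog timeout" quoted earlier and simply counts any line containing "llinfo". The function name is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as "re0: watchdog timeout" and captures the interface.
WATCHDOG = re.compile(r"^(\w+): watchdog timeout")


def summarize_dmesg(text):
    """Count watchdog timeouts per interface and llinfo errors overall."""
    counts = Counter()
    for line in text.splitlines():
        match = WATCHDOG.match(line.strip())
        if match:
            counts[("watchdog", match.group(1))] += 1
        elif "llinfo" in line:
            counts[("llinfo", None)] += 1
    return counts
```

An empty Counter from summarize_dmesg() corresponds to the "no watchdog errors, no llinfo errors" result reported above. Non-zero counts keyed by interface would show whether re0 alone is affected.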