Frequent internet loss - need help figuring out where and why? Maybe pfSense, Modem, ISP, or all 3?
-
EDITED TO CORRECT POINTER TO CORRECT FIELD IN "Lease Requirements and Requests"
Try the fix I discussed in the prior post I referenced. You are probably hitting the same issue I saw.
"In the "Lease Requirements and Requests" section for WAN DHCP in the field "Option modifiers" add the text without quotes: "supersede interface-mtu 0""
-
I shall give that a shot! Do I need a reboot/reload after that change is made?
It will probably be at least 2 weeks before I declare it "fixed", but I may know sooner if it dies. Interesting that they would set such a small MTU; I've never really seen anything under 1500 used "in the wild".
-
Please report back your results, especially with regard to whether the 'arpresolve can't allocate llinfo for $GATEWAY' errors go away with the fix in place. The granted lease details won't change, but the default pfSense MTU of 1500 should be in effect after the fix.
-
I don't see that field listed where I expected it to be, under Interfaces > WAN.
Am I missing something? This is 2.4.4.
-
Yes, scroll down a bit more. "Option modifiers" appears below in the "Lease Requirements and Requests" section.
I would reboot, but saving and applying the fix should resolve things. You can check the MTU before and after the fix at the command prompt with 'ifconfig re0'.
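If you would rather script the before/after check, here is a minimal Python sketch that pulls the MTU out of ifconfig output. It assumes the usual layout where the interface line contains "mtu N"; the helper names (parse_mtu, current_mtu) and the sample line are made up for illustration, and it needs Python 3 wherever you run it.

```python
import re
import subprocess


def parse_mtu(ifconfig_output):
    """Return the MTU found in ifconfig-style text, or None if there is none."""
    match = re.search(r"\bmtu\s+(\d+)", ifconfig_output)
    return int(match.group(1)) if match else None


def current_mtu(interface="re0"):
    """Run ifconfig for one interface and return its MTU (ifconfig must be on PATH)."""
    result = subprocess.run(
        ["ifconfig", interface], capture_output=True, text=True, check=True
    )
    return parse_mtu(result.stdout)


# Illustrative sample line (an assumption, not copied from a real box).
SAMPLE = "re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 576"

if __name__ == "__main__":
    mtu = parse_mtu(SAMPLE)
    note = "smaller than the 1500 default" if mtu is not None and mtu < 1500 else "ok"
    print(f"mtu={mtu} ({note})")
```

Run current_mtu("re0") once before applying the option modifier and once after; if the second value is still 576, the modifier is probably not being picked up.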
-
Ah, I found it as you said under the "Lease Requirements and Requests" heading. I was looking for a keyword heading "Advanced" or checkbox "Advanced". I'll do a reboot for good measure, it seems reasonable enough.
I will certainly report back either way - if it is still having problems or showing those lines or if I think it's fixed after a while.
Thanks!
-
I have edited the post above, and my prior post to reflect the correct location of "Option modifiers" in the "Lease Requirements and Requests" section.
I really think this fix needs to be documented, as the underlying problem causes all sorts of flakiness beyond DHCP renewal/ARP quirks, including the failure of certain web sites, like newyorker.com, to load.
-
Interesting note - prior to the reboot after applying that I had intermittent connectivity and odd entries in my dmesg output.
I think after a reboot it settled out but in case you are interested here is what it showed:
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
re0: link state changed to UP
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
re0: link state changed to DOWN
re0: link state changed to UP
nd6_setmtu0: new link MTU on re0 (576) is too small for IPv6
re0: link state changed to DOWN
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
re0: link state changed to UP
re0: link state changed to DOWN
re0: link state changed to UP
re0: link state changed to DOWN
nd6_setmtu0: new link MTU on re0 (576) is too small for IPv6
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
re0: link state changed to UP
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
re0: link state changed to DOWN
re0: link state changed to UP
nd6_setmtu0: new link MTU on re0 (576) is too small for IPv6
re0: link state changed to DOWN
re0: link state changed to UP
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
re0: link state changed to DOWN
re0: link state changed to UP
nd6_setmtu0: new link MTU on re0 (576) is too small for IPv6
re0: link state changed to DOWN
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
re0: link state changed to UP
re0: link state changed to DOWN
re0: link state changed to UP
re0: link state changed to DOWN
nd6_setmtu0: new link MTU on re0 (576) is too small for IPv6
re0: link state changed to UP
re0: link state changed to DOWN
re0: link state changed to UP
nd6_setmtu0: new link MTU on re0 (576) is too small for IPv6
re0: link state changed to DOWN
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
re0: link state changed to UP
re0: link state changed to DOWN
re0: link state changed to UP
nd6_setmtu0: new link MTU on re0 (576) is too small for IPv6
re0: link state changed to DOWN
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
re0: link state changed to UP
re0: link state changed to DOWN
re0: link state changed to UP
nd6_setmtu0: new link MTU on re0 (576) is too small for IPv6
re0: link state changed to DOWN
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
re0: link state changed to UP
re0: link state changed to DOWN
re0: link state changed to UP
nd6_setmtu0: new link MTU on re0 (576) is too small for IPv6
re0: link state changed to DOWN
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
re0: link state changed to UP
re0: link state changed to DOWN
re0: link state changed to UP
nd6_setmtu0: new link MTU on re0 (576) is too small for IPv6
re0: link state changed to DOWN
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
re0: link state changed to UP
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
re0: link state changed to DOWN
re0: link state changed to UP
nd6_setmtu0: new link MTU on re0 (576) is too small for IPv6
re0: link state changed to DOWN
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
AFAIK my ISP does not support IPv6 and I don't appear to have any IPv6 connectivity. I'm assuming it just got confused with changing things until it was rebooted.
-
Yes, I think this pretty clearly shows you hit this 576 MTU oddity. For my friend it broke The New Yorker website, and iHeart Radio. I was more concerned with the firewall unpredictably falling off the net. It turned out that the two problems had the same root cause of a too small MTU.
-
Not sure why the forum thinks I'm spamming when I post the output of ifconfig, but it does.
After a reboot, ifconfig re0 is still showing mtu 576.
Any thoughts? Maybe I need to manually enter 1500 in the WAN settings?
-
Please post a screen shot of the "Lease Requirements and Requests" as you have it filled out. If you have the supersede statement in there correctly, it may be a quirk of the (re) driver interacting with dhclient. Setting 1500 explicitly for the MTU might fix it, but I would try a cold boot after confirming you have the incantation entered correctly. In the meanwhile, I will double check how I have it set on my friend's machine...
-
Just re-re-reread your post and you said without quotes. Maybe I should have gone to bed while it was still yesterday. :)
Took out my " " from the option modifiers, rebooted AGAIN, and now I see mtu 1500 in ifconfig re0!
Seems like a good sign that it's at least now doing what you (and I) were expecting.
-
Excellent! Please follow up on whether this fixes it. I think that Netgate needs to put this in the formal documentation. It is a sneaky little quirk from upstream. pfSense and dhclient are doing the right thing by following the DHCP lease parameters they are issued, but the cable modem hardware from the ISP is giving out bad settings for setting up the connection.
-
Negative success, just had a total outage tonight. Had to reboot pfSense to get it to come back online.
Origin="GenuineIntel" Id=0x406c3 Family=0x6 Model=0x4c Stepping=3
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Features2=0x43d8e3bf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,TSCDLT,AESNI,RDRAND>
AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
AMD Features2=0x101<LAHF,Prefetch>
Structured Extended Features=0x2282<TSCADJ,SMEP,ERMS,NFPUSG>
VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
TSC: P-state invariant, performance statistics
padlock0: No ACE support.
aesni0: <AES-CBC,AES-XTS,AES-GCM,AES-ICM> on motherboard
re1: link state changed to DOWN
vlan0: changing name to 're1.2'
vlan1: changing name to 're1.3'
re0: link state changed to DOWN
re1: link state changed to UP
re1.2: link state changed to UP
re1.3: link state changed to UP
re0: link state changed to UP
tun1: changing name to 'ovpns1'
ovpns1: link state changed to UP
tun2: changing name to 'ovpns2'
ovpns2: link state changed to UP
pflog0: promiscuous mode enabled
ugen0.5: <vendor 0x8087 product 0x07dc> at usbus0 (disconnected)
ugen0.5: <vendor 0x8087 product 0x07dc> at usbus0
re0: link state changed to DOWN
re0: link state changed to UP
ovpns1: link state changed to DOWN
ovpns1: link state changed to UP
ovpns2: link state changed to DOWN
ovpns2: link state changed to UP
re0: watchdog timeout
re0: link state changed to DOWN
re0: link state changed to UP
ovpns1: link state changed to DOWN
ovpns1: link state changed to UP
ovpns2: link state changed to DOWN
ovpns2: link state changed to UP
ugen0.5: <vendor 0x8087 product 0x07dc> at usbus0 (disconnected)
ugen0.5: <vendor 0x8087 product 0x07dc> at usbus0
ugen0.5: <vendor 0x8087 product 0x07dc> at usbus0 (disconnected)
ugen0.5: <vendor 0x8087 product 0x07dc> at usbus0
ugen0.2: <American Power Conversion Back-UPS ES 750 FW841.I3 .D USB FWI3> at usbus0 (disconnected)
ugen0.2: <American Power Conversion Back-UPS ES 750 FW841.I3 .D USB FWI3> at usbus0
re0: watchdog timeout
re0: link state changed to DOWN
re0: link state changed to UP
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
ovpns1: link state changed to DOWN
ovpns1: link state changed to UP
ovpns2: link state changed to DOWN
ovpns2: link state changed to UP
ugen0.5: <vendor 0x8087 product 0x07dc> at usbus0 (disconnected)
ugen0.5: <vendor 0x8087 product 0x07dc> at usbus0
re0: watchdog timeout
re0: link state changed to DOWN
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
re0: link state changed to UP
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
ovpns1: link state changed to DOWN
ovpns1: link state changed to UP
ovpns2: link state changed to DOWN
ovpns2: link state changed to UP
re0: watchdog timeout
re0: link state changed to DOWN
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
re0: link state changed to UP
ovpns1: link state changed to DOWN
ovpns1: link state changed to UP
ovpns2: link state changed to DOWN
ovpns2: link state changed to UP
re0: watchdog timeout
re0: link state changed to DOWN
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
re0: link state changed to UP
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
re0: watchdog timeout
re0: link state changed to DOWN
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
re0: link state changed to UP
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
re0: watchdog timeout
re0: link state changed to DOWN
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
re0: link state changed to UP
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
re0: watchdog timeout
re0: link state changed to DOWN
re0: link state changed to UP
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
re0: watchdog timeout
re0: link state changed to DOWN
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
re0: link state changed to UP
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
re0: watchdog timeout
re0: link state changed to DOWN
re0: link state changed to UP
re0: watchdog timeout
re0: link state changed to DOWN
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
re0: link state changed to UP
re0: watchdog timeout
re0: link state changed to DOWN
re0: link state changed to UP
re0: watchdog timeout
re0: link state changed to DOWN
re0: link state changed to UP
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
re0: watchdog timeout
re0: link state changed to DOWN
re0: link state changed to UP
re0: watchdog timeout
re0: link state changed to DOWN
re0: link state changed to UP
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
re0: watchdog timeout
re0: link state changed to DOWN
re0: link state changed to UP
re0: watchdog timeout
re0: link state changed to DOWN
re0: link state changed to UP
arpresolve: can't allocate llinfo for 74.214.49.1 on re0
re0: watchdog timeout
re0: link state changed to DOWN
re0: link state changed to UP
re0: watchdog timeout
re0: link state changed to DOWN
re0: link state changed to UP
re0: watchdog timeout
re0: link state changed to DOWN
Any other thoughts? I see my modem errors are creeping up again (tho only in the few-hundreds this time, not up to 1000 yet)
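To get a feel for how often each message shows up, a rough Python sketch like this can tally them from saved dmesg output. The patterns are just the messages seen in this thread; the category names and the tally function are my own invention, and it flattens whitespace first so wrapped lines still match.

```python
import re

# Message patterns taken from the dmesg output in this thread.
# The dictionary keys are hypothetical labels chosen for readability.
PATTERNS = {
    "arpresolve_llinfo": r"arpresolve: can't allocate llinfo for \S+ on re0",
    "watchdog_timeout": r"re0: watchdog timeout",
    "link_down": r"re0: link state changed to DOWN",
    "link_up": r"re0: link state changed to UP",
    "mtu_too_small": r"nd6_setmtu0: new link MTU on re0 \(\d+\) is too small for IPv6",
}


def tally(dmesg_text):
    """Count how many times each known message appears in the text."""
    # Collapse runs of whitespace so wrapped or one-per-line logs both work.
    flat = " ".join(dmesg_text.split())
    return {name: len(re.findall(rx, flat)) for name, rx in PATTERNS.items()}


# Tiny made-up excerpt, deliberately wrapped mid-message.
SAMPLE = """re0: watchdog timeout re0: link state changed to DOWN
re0: link state changed to UP arpresolve:
can't allocate llinfo for 74.214.49.1 on re0"""

if __name__ == "__main__":
    for name, count in tally(SAMPLE).items():
        print(f"{name}: {count}")
```

Comparing the counts from before and after a change should show whether the watchdog timeouts and the arpresolve errors move together or separately.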
-
@mmiller7 said in Frequent internet loss - need help figuring out where and why? Maybe pfSense, Modem, ISP, or all 3?:
re0: watchdog timeout
That's a very common issue with Realtek crap NICs, you can try to use the official Realtek driver (hint: look into the Hardware section) or better yet switch to Intel NICs.
-
I agree with Grimson. The Realtek NICs can be very dodgy to work with. You should make sure that you have disabled all three of:
Hardware Checksum Offloading
Hardware TCP Segmentation Offloading
Hardware Large Receive Offloading
These are at the bottom of System/Advanced/Networking.
The "supersede interface-mtu 0" fix remains necessary for you if you were having the arpresolve/llinfo errors and frequent drops without it. The fix is now referenced in the upgrade guide for 2.4.4 for cases where the advanced options section has been touched.
You can look through the network card tuning recommendations, and try variations on the MSI/MSIX fixes you see there by adapting them for (re) cards.
https://www.netgate.com/docs/pfsense/hardware/tuning-and-troubleshooting-network-cards.html
For example, adding something like these in /boot/loader.conf.local
net.inet.tcp.tso=0
hw.pci.enable_msix=0
hw.pci.enable_msi=0
hw.re.tso_enable=0
Take a look through the forums, and you will see that many people have problems with Realtek hardware.
I hope this helps.
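If you want to check what is already set before editing, here is a small Python sketch that parses loader.conf-style text and lists which of the example tunables above are missing or set differently. The function names are hypothetical, the parsing is simplified (name=value lines, # comments, optional double quotes), and the values are only the examples from this post, not a tested recommendation.

```python
# Example tunables from this post; treat them as suggestions to try.
SUGGESTED = {
    "net.inet.tcp.tso": "0",
    "hw.pci.enable_msix": "0",
    "hw.pci.enable_msi": "0",
    "hw.re.tso_enable": "0",
}


def parse_loader_conf(text):
    """Parse simple name=value lines, ignoring comments and blank lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def missing_tunables(text, wanted=SUGGESTED):
    """Return the wanted tunables that are absent or have another value."""
    have = parse_loader_conf(text)
    return {k: v for k, v in wanted.items() if have.get(k) != v}


# Made-up file contents for illustration.
EXAMPLE_CONF = 'net.inet.tcp.tso=0\nhw.pci.enable_msi="0"  # MSI off\n'

if __name__ == "__main__":
    for key, value in missing_tunables(EXAMPLE_CONF).items():
        print(f"{key}={value}")
```

To run it against the real file, pass in the contents of /boot/loader.conf.local instead of EXAMPLE_CONF.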
-
@grimson The Zotac is a NUC-style low power mini box so the NICs can't be changed and it has no expansion slots, I did try a USB NIC already (AX88179) and it had the exact same problem. Actually I think everything I own that isn't a laptop has Realtek NICs on the motherboard.
Both my WAN and LAN are Realtek chips. re1 (the LAN) never seems to blink (metaphorically, that is); it's only re0 (WAN) that is blowing up. The re1 LAN has even more throughput because it's got 3 VLANs going thru it, whereas re0 has no VLANs.
-
@bfeitell said in Frequent internet loss - need help figuring out where and why? Maybe pfSense, Modem, ISP, or all 3?:
I agree with Grimson. The Realtek NICs can be very dodgy to work with. You should make sure that you have disabled all three of:
Hardware Checksum Offloading
Hardware TCP Segmentation Offloading
Hardware Large Receive Offloading
These are at the bottom of System/Advanced/Networking.
All 3 of those were already disabled
The "supersede interface-mtu 0" fix remains necessary for you if you were having the arpresolve/llinfo errors and frequent drops without it. The fix is now referenced in the upgrade guide for 2.4.4 for cases where the advanced options section has been touched.
I'll leave it in - I think it did (slightly) help my speeds even if it didn't help my reliability. I was seeing that prior to 2.4.4 (I hoped upgrading would help things).
You can look through the network card tuning recommendations, and try variations on the MSI/MSIX fixes you see there by adapting them for (re) cards.
https://www.netgate.com/docs/pfsense/hardware/tuning-and-troubleshooting-network-cards.html
For example, adding something like these in /boot/loader.conf.local
net.inet.tcp.tso=0
hw.pci.enable_msix=0
hw.pci.enable_msi=0
hw.re.tso_enable=0
Take a look through the forums, and you will see that many people have problems with Realtek hardware.
I hope this helps.
I'll look thru those and see what I can add.
I'm still wondering about the modem - anyone think it could be going bad, given those errors that keep jumping up to high numbers shortly before it dies? I just don't understand why pfSense would crash if the modem stops passing data for a while. And since it ran stable for over a year and is only now unstable, it seems odd that this would be a hardware incompatibility.
-
@mmiller7 said in Frequent internet loss - need help figuring out where and why? Maybe pfSense, Modem, ISP, or all 3?:
@grimson The Zotac is a NUC-style low power mini box so the NICs can't be changed and it has no expansion slots, I did try a USB NIC already (AX88179) and it had the exact same problem.
Then it is simply a bad choice for a pfSense installation.
@mmiller7 said in Frequent internet loss - need help figuring out where and why? Maybe pfSense, Modem, ISP, or all 3?:
Actually I think everything I own that isn't a laptop has Realtek NICs on the motherboard.
And those are all consumer grade devices, primarily intended to run Windows, where the Realtek NICs work halfway decently (in a consumer use-case). pfSense is based on FreeBSD and designed to run on enterprise grade hardware.
The Realtek drivers from FreeBSD are pretty bad, the FreeBSD drivers from Realtek themselves are a bit better, but far from the quality of Intel (or Broadcom) drivers. If you want a stable and reliable pfSense installation you need to switch hardware. That's simply how it is, if you don't believe me check the hardware section and the FreeBSD forums. If you don't believe them and still insist on using Realtek NICs you'll have to live with crashes and issues related to those interfaces.
Those are the facts, and for me there is no reason to discuss this any further.
-
There may be some issues, but I can tell you there are also plenty of people successfully using this same Zotac box with pfSense, based on the reviews I was reading and multiple others I personally know who are using pfSense on it with no problems. It also worked for well over a year without any problems for me, so it can't be that bad if I'm only starting to see issues in the past month with the same configuration. It also doesn't explain why the SAME chipset is working totally fine with the LAN interface even when the WAN crashes out.
I've used other FreeBSD based "appliances" including Monowall and FreeNAS (pre-0.8) - I know there can be issues with drivers, I have seen it. This doesn't fit the pattern I've seen with other incompatible devices though - in all those cases it would be unstable or slow out of the box, rather than working for a long period of time and then blowing up.