Internal NIC crashes down / no buffer space available
-
Losing connectivity with external switch on Hyper-V
I have installed the 2.3.1 release as a Hyper-V guest on Server 2012 R2. WAN is working fine with an External Switch, but LAN is on a second NIC (connected to the LAN). The LAN keeps losing connectivity, and most of the time I have to restart the LAN interface or reboot the pfSense guest OS. I have even tried using an internal switch on the LAN side and the issue still exists. I am sure it is not my NIC; it has something to do with the Hyper-V settings or pfSense.
I ran a similar setup in a test lab on VMware Workstation and it works like a charm there. Any solutions, guys?
-
Is your Windows host fully up-to-date with patches? Have you downloaded the most recent network drivers for your NIC? Have you disabled all power saving settings in your NIC configuration?
-
For those experiencing 'No buffer space available' followed by full NIC failure on the WAN side when running pfSense in Hyper-V, try the following. It worked for me:
-
pfSense version: 2.4.5-RELEASE-p1
-
Hyper-V versions tested: Hyper-V Server 2019 (Core), Windows Server 2019 w/desktop experience and Hyper-V role, Windows Server 2016 w/desktop experience and Hyper-V role
-
Cable Internet Speed: 200/10
-
For the USB NIC: I validated that it did not matter whether it was hooked to USB 3.x or 2.x; the same disconnect issues occurred. I also validated there were no thermal issues; it was maybe lukewarm to the touch (tried 2 different adapters, 2 different chipsets, same issues).
-
Drivers: Updated every driver and installed all Windows updates. In the end this did not even matter, but it's still a good idea.
Services running on pfSense: I have pfBlockerNG running, DHCP server, Snort (non-blocking), DNSBL with the resolver, and I redirect my domain DNS queries back to my internal DCs for private AD DNS routing.
-
Avg 24-hour CPU/memory usage: 7% / 13% (no change even when the issue was occurring)
-
Correlating errors: Resolver: 'No buffer space available' - Gateways: 'dpinger WAN_DHCP 1.2.3.4: Alarm latency 10331us stddev 2932us loss 21%' [this triggered the default gateway action and caused the issue with Hyper-V NIC comms]
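If you want to check your own logs for the same gateway alarm, here is a minimal Python sketch that pulls the numbers out of a dpinger alarm message. The pattern follows the message exactly as quoted above; real syslog lines may carry extra prefixes (timestamp, PID), which is why it searches rather than matching the whole line. The names DpingerAlarm and parse_alarm are just illustrative.

```python
import re
from typing import NamedTuple, Optional


class DpingerAlarm(NamedTuple):
    gateway: str      # gateway name, e.g. WAN_DHCP
    address: str      # monitored IP address
    latency_us: int   # average latency in microseconds
    stddev_us: int    # latency standard deviation in microseconds
    loss_pct: int     # packet loss in percent


# Pattern modelled on the alarm text quoted in this thread (an assumption
# about the format; adjust it if your log lines look different).
ALARM_RE = re.compile(
    r"dpinger (?P<gateway>\S+) (?P<address>\S+): Alarm "
    r"latency (?P<latency>\d+)us stddev (?P<stddev>\d+)us loss (?P<loss>\d+)%"
)


def parse_alarm(line: str) -> Optional[DpingerAlarm]:
    """Return the parsed alarm, or None if the line is not a dpinger alarm."""
    m = ALARM_RE.search(line)
    if m is None:
        return None
    return DpingerAlarm(
        gateway=m["gateway"],
        address=m["address"],
        latency_us=int(m["latency"]),
        stddev_us=int(m["stddev"]),
        loss_pct=int(m["loss"]),
    )


sample = "dpinger WAN_DHCP 1.2.3.4: Alarm latency 10331us stddev 2932us loss 21%"
print(parse_alarm(sample))
```

In the sample the latency is only about 10 ms, so the 21% packet loss is the figure that stands out.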
Fix for me:
-
Make sure the Hyper-V host's power plan (Power Options) is set to High performance. If you are using a USB NIC on the WAN side, also make sure to disable the 'USB selective suspend' setting (advanced settings --> USB settings).
-
I recommend turning VMQ off in Hyper-V and in the NIC settings (if available). I cannot see it being needed with pfSense, and it might be tricky to get working correctly (if at all). If you have a more advanced scenario where you need vRSS mapping the VMQs to distribute the packet load across CPUs, then maybe it's worth diving into.
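The VMQ step can also be done from PowerShell on the host instead of the GUI. Below is a rough Python sketch that only builds (and by default just prints) the two PowerShell commands. Set-VMNetworkAdapter with -VmqWeight 0 and Disable-NetAdapterVmq are standard Hyper-V / NetAdapter cmdlets; the VM name 'pfSense' and the adapter name 'Ethernet 2' are placeholders I made up, so substitute your own.

```python
import subprocess
from typing import List


def vmq_off_commands(vm_name: str, host_adapter: str) -> List[List[str]]:
    """Build PowerShell invocations that turn VMQ off for one VM and one host NIC."""
    return [
        # A VMQ weight of 0 disables VMQ on the VM's virtual network adapters.
        ["powershell", "-NoProfile", "-Command",
         f"Set-VMNetworkAdapter -VMName '{vm_name}' -VmqWeight 0"],
        # Disable VMQ on the physical adapter bound to the virtual switch.
        ["powershell", "-NoProfile", "-Command",
         f"Disable-NetAdapterVmq -Name '{host_adapter}'"],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; only execute when dry_run is False (elevated host shell)."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder names; replace with your VM and host adapter names.
    run_all(vmq_off_commands("pfSense", "Ethernet 2"))
```

Run it as-is to see the commands, and pass dry_run=False only from an elevated session on the Hyper-V host once the names are right.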
-
This was the key for me with Hyper-V: in pfSense, make sure to turn off the Gateway Monitoring Action here: System --> Routing --> Gateways --> Edit --> check the box 'Disable Gateway Monitoring Action'. Without this I would get around 20-24 hours at most before the gateway alarm action would kick in (probably from junk latency on the cable provider's side) and suspend the NIC, which would then never come back. I had to reboot, and then everything worked fine for another 20-24 hours.
Note: I've tried Proxmox and ESXi and did not experience this issue, so it appears to be Hyper-V specific.
-