Internal NIC crashes / no buffer space available
-
You're welcome. It's been four days now and still no crash for me.
I'll note that I also changed the size of the receive and transmit buffers under the Advanced tab > Performance Options, to 1024 and 2048 respectively. I don't think this is what did it, but it's another thing you could try if disabling the Energy Efficient Ethernet option didn't do it.
-
Well, that was short-lived. My interface started crashing again on really fast transfers tonight. Same symptoms as before, it just took a few more days to start happening. Guess it wasn't the energy-efficient Ethernet setting after all.
Back to the drawing board…
-
Gimli, I have the exact same problem as you - did you find any solution back at the drawing board? :)
-
I haven't had a lot of time to do any more testing but I'm starting to think it may be a bug in the FreeBSD driver for the Hyper-V virtual NIC. I have a different box on which I installed pfSense natively (i.e. not as a VM) with the same NIC and I don't see the issue on that one.
-
Gimli, thanks for your reply. I tried another NIC (X552/X557-AT) instead of the I350 - unfortunately it's the same error :(
So I guess you are correct about it being a FreeBSD/Hyper-V issue :(
-
Alright, here's an update on this issue.
For the last few weeks I haven't experienced the problem, but I don't think I fixed it - it's more of a workaround. I added a cron job on the pfSense box that resets, at midnight every day, the interface that usually goes down under heavy usage. It appears that cycling the interface down/up before it crashes keeps it from crashing. The cycle is so fast that it doesn't even break active connections; it just delays them for a few milliseconds.
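For anyone wanting to replicate the workaround, a crontab entry along these lines would do the nightly cycle. This is a sketch: the interface name hn0 is an assumption (Hyper-V virtual NICs show up as hn0, hn1, … on FreeBSD), so substitute your own, and on pfSense the Cron package is the usual way to manage such entries.

```shell
# /etc/crontab entry (sketch): cycle the Hyper-V NIC (hn0 assumed)
# at midnight every day, before heavy traffic can wedge it.
0   0   *   *   *   root   /sbin/ifconfig hn0 down && /sbin/ifconfig hn0 up
```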
-
We had the same problem, our setup:
Windows Server 2012 (without R2) - Hyper-V host
pfSense 2.2.6-RELEASE (amd64)
NIC: HP NC382i DP Multifunction Gigabit Server Adapter
The problem was resolved by installing all Windows updates and updating the NIC driver.
Hope this information will be helpful.
-
What time frame are you talking about, Vorland? My servers have always been up to date on updates and drivers. Maybe it's one of the December updates that fixed it.
I'll disable my cron job for a while to see if it comes back.
-
I installed all updates on 2016-01-06. I guess the updated NIC drivers resolved the issue.
-
I also had this issue running FreeNAS on top of ESXi. The only information I could find hinted that I needed to stop using the VMXNET NIC, due to an issue with the FreeBSD driver, and switch back to the Intel virtual NIC. I have not had a crash since.
-
Hi Guys,
Has anyone found a solution to this issue beyond restarting the interface using a cron?
-
I've disabled my cron job since the 22nd and haven't experienced the issue since. I don't think it's a question of drivers as I've had the same Intel drivers since last October (they're the latest) but I think the December patches from Microsoft may have fixed it.
-
Thanks! I pushed the last round of MS updates last night, will see how it goes.
Cheers!
-
I've been having the same issue with the connection dropping when running a speed test. It'll only crash on the upload though.
After about three days of pulling my hair out, I ended up disabling the Energy Efficient Ethernet setting on both NICs and also turning off Flow Control. I also changed kern.ipc.nmbclusters to 1000000. I tried each of these settings on its own with no luck, but all three together seem to have made a difference.
All week I couldn't run one speedtest without dropping my WAN connection and last night I ran a test every 15 minutes for about two hours with no drops.
Now that I'm at work, I'm a little more hesitant to remote in and run one for fear of it dropping again, but hopefully in a few days I'll have a bit more confidence in it.
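For reference, here is one way the nmbclusters change above could be made persistent. This is a sketch, not the poster's exact steps - the value mirrors the post, and on pfSense the same tunable can instead be set in the GUI under System > Advanced > System Tunables.

```shell
# /boot/loader.conf.local (sketch): raise the mbuf cluster limit at boot
# to help avoid "no buffer space available" during bursty transfers.
kern.ipc.nmbclusters="1000000"
```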
-
I have run into the same issue; for now I have my network running on a backup router until I can resolve it. What's weird is that when it drops, I can still ping certain external IPs. My network adapter is a BCM5716 - I tried updating the drivers and still have the same problem.
Anyone seeing anything in the logs on the hyper-V host?
It seems that the failure starts at the same time Event Viewer logs the error: "The network link is down. Check to make sure the network cable is properly connected." I originally thought it was a faulty cable, but after switching cables three times I'm guessing it's something else.
-
All I can recommend at this point is to make sure that you're running the latest version of pfSense, your host is fully up-to-date with Microsoft patches and NIC driver updates and that you disable the Energy Saving mode(s) in your host's NIC configuration.
If that all fails, try with a different NIC.
-
Losing connectivity with external switch on Hyper-V
I have installed the 2.3.1 release as a Hyper-V guest on Server 2012 R2. WAN is working fine with an External Switch, and LAN is connected through another NIC (connected to the LAN). The LAN keeps losing connectivity, and most of the time I have to restart the LAN interface or reboot the pfSense guest OS. I have even tried using an Internal Switch on the LAN side and the issue still exists. It is definitely not my NIC; it is something to do with the Hyper-V settings or pfSense.
I ran a similar setup in a test lab on VMware Workstation and it works like a charm there. Any solutions, guys?
-
Is your Windows host fully up-to-date with patches? Have you downloaded the most recent network drivers for your NIC? Have you disabled all power saving settings in your NIC configuration?
-
For those experiencing the 'No buffer space available' error followed by full NIC failure on the WAN side when running pfSense in Hyper-V, try the following - it worked for me:
-
pfSense version: 2.4.5-RELEASE-p1
-
Hyper-V versions tested: Hyper-V Server 2019 (Core), Windows Server 2019 w/desktop experience and Hyper-V role, Windows Server 2016 w/desktop experience and Hyper-V role
-
Cable Internet Speed: 200/10
-
For the USB NIC - I validated that it did not matter whether it was hooked to USB 3.x or 2.x; the same disconnect issues occurred. I also verified there were no thermal issues - it was maybe lukewarm to the touch (I tried 2 different adapters with 2 different chipsets - same issues).
-
Drivers: I updated every driver and installed all Windows updates - in the end this didn't even matter, but it's still a good idea.
Services running on pfSense: I have pfBlockerNG running, DHCP server, Snort (non-blocking), DNSBL with the resolver, and I redirect my domain DNS queries back to my internal DCs for private AD DNS routing.
-
Avg 24-hour CPU/memory usage: 7% / 13% (no change even while the issue was occurring)
-
Correlating errors: Resolver: 'No buffer space available' - Gateways: 'dpinger WAN_DHCP 1.2.3.4: Alarm latency 10331us stddev 2932us loss 21%' [this triggered the default gateway action, which causes the issue with the Hyper-V NIC comms]
Fix for me:
-
Make sure you have the Hyper-V host's power options set to High Performance. If you are using a USB NIC on the WAN side, also make sure to disable the 'USB selective suspend' setting (Advanced settings --> USB settings).
-
I recommend turning VMQ off in Hyper-V and in the NIC settings (if available). I can't see it being needed with pfSense, and it might be tricky to get working correctly (if at all). If you have a more advanced scenario where you need vRSS to map the VMQs and distribute the packet load across CPUs, then maybe it's worth diving into.
-
This was the key for me with Hyper-V: in pfSense, make sure to turn off the gateway monitoring action here: System --> Routing --> Gateways --> Edit --> check the box 'Disable Gateway Monitoring Action'. Without this, I would get around 20-24 hours max before the gateway alarm action would kick off (probably from junk latency on the cable provider's side), suspend the NIC, and then it would never come back - I had to reboot, and then everything worked fine for another 20-24 hours.
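If you want to confirm the checkbox actually persisted after saving, you can grep the pfSense config from a shell. This is a sketch under an assumption: I believe recent pfSense versions record that checkbox as an action_disable element inside the gateway_item in config.xml (the sample file below is hypothetical, stood in for /cf/conf/config.xml).

```shell
# Hypothetical gateway_item fragment, shaped like what pfSense writes to
# /cf/conf/config.xml when 'Disable Gateway Monitoring Action' is checked.
cat > /tmp/gw.xml <<'EOF'
<gateway_item>
  <name>WAN_DHCP</name>
  <action_disable></action_disable>
</gateway_item>
EOF
# A non-zero match count means the monitoring-action flag is set:
grep -c 'action_disable' /tmp/gw.xml
```

Against the live firewall you would run the same grep on /cf/conf/config.xml instead of the sample file.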
Note: I've tried Proxmox and ESXi and did not experience this issue, so it appears to be Hyper-V specific.
-