Watchdog timeout on queue 0

megapearl

Hello,

I see lots of the following errors in the log:

kernel vmx0: watchdog timeout on queue 0

Can anyone tell me what this is? I see it on both the stable and the development release of pfSense.
I'm running on VMware (ESXi 6) using vmxnet3 (vmx0 is my LAN).
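
To gauge how often the message appears and on which interface, the log lines can be tallied with a short script. This is only a minimal sketch in Python: it assumes the lines contain the text shown above (`<ifname>: watchdog timeout on queue <n>`), and the name `count_watchdog_timeouts` is made up for illustration.

```python
import re
from collections import Counter

# Matches e.g. "kernel vmx0: watchdog timeout on queue 0".
# Assumption: the interface name is the token right before the colon.
WATCHDOG_RE = re.compile(r"(\w+): watchdog timeout on queue (\d+)")


def count_watchdog_timeouts(lines):
    """Return a Counter keyed by (interface, queue) for matching log lines."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "kernel vmx0: watchdog timeout on queue 0",
        "kernel vmx0: watchdog timeout on queue 0",
        "kernel: some unrelated message",
    ]
    for (ifname, queue), n in count_watchdog_timeouts(sample).items():
        print(f"{ifname} queue {queue}: {n} timeout(s)")
```

Pass it the lines of whatever file holds your system log, for example `count_watchdog_timeouts(open(path))`, where `path` is a placeholder for that file.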

I tried disabling hardware offloading by checking the following options under System/Advanced/Networking:

CHECKED Disable hardware checksum offload
CHECKED Disable hardware TCP segmentation offload
CHECKED Disable hardware large receive offload

But the error still comes back.
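
One way to double-check that the offloads are really off on the interface is to look at the `options=` line that `ifconfig vmx0` prints. The following is a rough sketch in Python, not a pfSense tool: the helper name `enabled_offloads` is hypothetical, and the sample output is illustrative rather than captured from a real system.

```python
import re

# Capability names FreeBSD's ifconfig prints for the offloads in question.
OFFLOAD_CAPS = {
    "RXCSUM", "TXCSUM", "RXCSUM_IPV6", "TXCSUM_IPV6", "TSO4", "TSO6", "LRO",
}


def enabled_offloads(ifconfig_output):
    """Return the offload capabilities still listed on the options= line."""
    match = re.search(r"options=\w+<([^>]*)>", ifconfig_output)
    if not match:
        return set()
    return OFFLOAD_CAPS & set(match.group(1).split(","))


if __name__ == "__main__":
    # Illustrative text only; the hex value is not from a real interface.
    sample = "vmx0: flags=8843<UP,RUNNING> mtu 1500\n\toptions=8<VLAN_MTU,TSO4,LRO>"
    print(sorted(enabled_offloads(sample)))
```

An empty result suggests the checkboxes took effect; any name that remains points at an offload that is still active on that interface.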

Are there any other options I can try?
Hope someone can help me out.

Regards.
Donald.

jimp

Very little detail in your message, but it's possible you might be hitting a known issue on 2.3: https://redmine.pfsense.org/issues/6296

megapearl

Still having this issue (on 3 different ESXi hosts; the VMs having this issue, including pfSense, are all BSD variants).

What I've done:

Updated all ESXi hosts to v6.0U2
Updated pfSense to dev branch 2.3.3, now on FreeBSD 10.3-RELEASE-p7

What else can I try?

jimp

If you are seeing it on all BSD variants, then the odds are it's nothing you can address in pfSense. I and many others are using vmx NICs with success and no errors of the sort you're showing. It's possible it's still something related to your ESX installation or the hardware behind it.

Personally I have close to a dozen pfSense and FreeBSD VMs that run on ESX 24/7 without such problems.

If it happens on plain FreeBSD, it may also be an issue in the FreeBSD vmx drivers that is beyond our control. If you can repeat it there reliably, you might want to raise the issue on a FreeBSD forum directly.

RulerOf

I just experienced this for the first time, and it wasn't precipitated by anything in particular. The interface stopped processing traffic and it took me a while to figure that out. I saw the following in my system log:

	kernel		vtnet2: watchdog timeout on queue 0

This was on pfSense 2.4.3-RELEASE-p1 running on KVM with paravirtual NICs defined like this in the VM definition XML:

<interface type='bridge'>
  <mac address='12:23:34:45:56:67'/>
  <source bridge='brteam0.1111'/>
  <target dev='vnet1'/>
  <model type='virtio'/>
  <alias name='net3'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</interface>

Disabling and reenabling the interface resolved the issue. I'll edit this post if this becomes a recurring problem.

Erutan409

@rulerof Has the issue popped up again after doing what you outlined? I'm encountering this all of a sudden, and the interval between occurrences seems random. It started (it seems) after I added a 3rd NIC to the VM to connect to a DVR. I disabled the NIC (in pfSense) and the issue seemed to subside. That was about 10 days ago. Then it just occurred again. Not sure what's going on, but I'll look into implementing your steps.

RulerOf

@erutan409 it happened one more time in the last month, and just like before I really don't have any explanation as to why. Sorry :(

Erutan409

I resolved this issue, running pfSense as a VM in vSphere, by changing the NICs assigned to the VM from VMXNET 3 to E1000e. Apparently, there's an issue with grouping more than two VMXNET 3 NICs together in the same environment. It has been up and running for about a month without the issue recurring.

RulerOf

@Erutan409 if this happens to me again, I'll change my virtual NICs over to fully-emulated hardware. I hate to have to do that, but I'm glad it seems to be a viable fix!

clifford64

Was there ever a fix for this? I have the latest version of pfSense running on an ESXi host. I have a 4-port Intel gigabit NIC and was using two of its ports for LAN and WAN. That caused the watchdog timeout on queue 0 to happen every couple of hours. I have since switched to the onboard motherboard NIC for LAN and use the quad-port card for WAN. That seemed to fix it, but then it happened again today. Any idea why this is happening?

Erutan409

@clifford64 I mentioned in my last response what my fix was. If your environment is similar, I'd suspect it's the type of virtual adapter you're using. And I'm not sure it's something that is reconcilable within pfSense. It might be in the OS, itself.

Erutan409

So, I may have found a fix for my configuration - accounting for bufferbloat:

vSphere - 6.7 - VMXNET3 on all 4 network adapters I have configured on the VM
Internet speed - 250 Mbps/30 Mbps
pfSense version - latest (2.4.4_3)

After following this guide, I seem to have avoided the issue. Before, I could reproduce it without fail: I'd set a couple of downloads going on my PS4 to max out the download speed during the night, and the problem would be there when I woke up in the morning.

The Internet would get sluggish and eventually stop working altogether. Then, I'd log into vSphere and see the issue in the VM console.

But, after following that guide, it seems like my Internet is overall stable. I'll update if it still happens in the future. Hoping this IS the fix.

RulerOf

It happened again. Took over a year for the problem to recur.

I've switched the problematic interface to the e1000 type. I'll reply back if I experience the issue again.
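
For anyone wanting to make the same change, the switch amounts to changing the `<model type=.../>` element of the interface in the libvirt domain XML. The sketch below does that edit in Python on a trimmed copy of the interface definition posted earlier in the thread; the helper name `set_nic_model` is hypothetical, and how the edited XML gets applied to the VM (for example through `virsh edit`) is left out.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the interface definition from earlier in the thread.
INTERFACE_XML = """
<interface type='bridge'>
  <mac address='12:23:34:45:56:67'/>
  <source bridge='brteam0.1111'/>
  <model type='virtio'/>
</interface>
"""


def set_nic_model(interface_xml, model):
    """Return the interface XML with its <model type=...> set to `model`."""
    root = ET.fromstring(interface_xml)
    model_el = root.find("model")
    if model_el is None:
        # No model element present: add one rather than fail.
        model_el = ET.SubElement(root, "model")
    model_el.set("type", model)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_nic_model(INTERFACE_XML, "e1000"))
```

Going back to the paravirtual NIC later is the same call with `"virtio"`.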

Erutan409

So, I'm not sure where I found this [potential] solution, but switching to those interfaces didn't seem to fix the problem, either. It seems that having the VM sync its time with the vSphere host was causing the issue, as the host would lose A LOT of time over the span of a month or two. We're talking close to a minute.

I disabled the time sync setting and had pfSense keep itself synced, reverted back to VMXNET interfaces and I've been running fine for 4+ months now. I haven't updated pfSense for a couple years, FWIW.
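
To put the drift in perspective, it can be expressed in parts per million. This is a back-of-the-envelope sketch in Python; the inputs are assumptions that read "close to a minute" as 60 seconds and "a month or two" as 30 to 60 days, and `drift_ppm` is a made-up helper name.

```python
def drift_ppm(seconds_lost, days):
    """Average clock drift in parts per million over the given period."""
    elapsed_seconds = days * 86_400
    return seconds_lost / elapsed_seconds * 1_000_000


if __name__ == "__main__":
    # Assumed figures: about 60 s lost over roughly 30 to 60 days.
    for days in (30, 60):
        print(f"{days} days: about {drift_ppm(60, days):.1f} ppm")
```

Under those assumptions the drift works out to somewhere between about 12 and 23 ppm.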

Same Internet activity I've had, plus working from home (COVID forced me to), hasn't reproduced the issue.

RulerOf

@Erutan409 I've been running with it for a couple of days and discovered that my pfSense box has significantly increased CPU load, and services behind that particular interface feel throttled.

Again, I'm running on KVM and I don't think it has any such paravirtual time synchronization. I run NTP on the host and have pfSense update from the host ntpd once an hour.

In the meantime, I've managed to break my pfSense install by powering it off at the wrong point, so I'm going to reinstall the latest version from ISO and switch back to PV NICs. I'll update here again if I learn anything.