ESXi pfSense: update from 2.2.6 (everything working) to 2.3 gives network issues

Twistor

I updated to 2.3 this weekend with all the "disable hardware xxx offloading" boxes unticked (so offloading was enabled).
I had no issues in 2.2.6, but in 2.3 the internal network initially stopped working after about 2 hours, while the WAN connection still replied (from the outside).
Even the ipsec tunnel stays up, so only one interface stops working.
I then disabled the hardware offloading options one at a time, and with each one I disabled it took longer for the interface to stop working.
Now all offloading is disabled, and it still stopped working after 1.5 days.

On the console you see this message over and over: vmx1 watchdog timeout
It's not the message with "resetting" at the end; I will check again for the full message when it occurs.

Installing a fresh pfSense 2.3 and importing the config changed nothing.

ESXi version 6u1

Interfaces (all vmxnet3):

vswitch1: vmx0 WAN, vlan 3
vswitch1: vmx1 LAN, vlan 1
vswitch2: vmx2 DMZ, vlan 2
vswitch2: vmx2 Guest, vlan 4

Everything worked fine in 2.2.6. vmx0 and vmx1 are on the same vswitch but use different physical connections.

Twistor

I have another pfSense installation (ESXi 5.5) where I had the same problem: one interface stopped working.
It only happened 2 days after the update.
No vlans, just 2 interfaces, each on a different vswitch. I can't be sure about the message on the console because I do not have console access on that VM.

A reboot "fixes" both pfSense VMs.
I'm not sure how to get it working again through console

Twistor

The message is: vmx1: watchdog timeout on queue 0

Also, a reboot does not complete cleanly: it hangs after shutting down ipsec and I have to reset the VM manually.
See screenshot attached

FYI: if I click reboot when the interface has not locked up, it reboots just fine.

![pfsense reboot.png](/public/imported_attachments/1/pfsense reboot.png)

rlrobs

look at this topic –> https://forum.pfsense.org/index.php?topic=81929.0

Twistor

You might be right that it's a bug related to ipv6.
I think IPv6 was the last thing I turned off on the WAN interface, and I haven't had the issue since.
The weird thing is, it's the LAN interface that is locking up…

That, or it was the 100GB transfer over ipsec that made it lock up, which thankfully had just finished before the last occurrence (I had to restart about 8 times).

Is there a description of the bug in question?

Twistor

@rlrobs:

look at this topic –> https://forum.pfsense.org/index.php?topic=81929.0

Like I said, the error message is not "kernel: em2: Watchdog timeout – resetting" but "vmx1: watchdog timeout on queue 0".
Unless you mean this is the same cause / error?
I had no issues before 2.3 so I don't believe it to be the same.
I could be wrong of course...
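
To make the difference between the two messages concrete, here is a minimal Python sketch that tells them apart. The patterns are inferred only from the two lines quoted in this thread, and the function name `classify_watchdog` is made up for illustration; real log lines may carry other prefixes or wording.

```python
import re

# Patterns inferred from the two messages quoted in this thread (an assumption,
# not an exhaustive description of what either driver can log).
EM_RESET = re.compile(r"\b(em\d+): Watchdog timeout\s*(?:--|–|-)\s*resetting")
VMX_QUEUE = re.compile(r"\b(vmx\d+): watchdog timeout on queue (\d+)")

def classify_watchdog(line):
    """Return (kind, interface, queue), or None if the line matches neither form."""
    m = VMX_QUEUE.search(line)
    if m:
        return ("vmx-queue-timeout", m.group(1), int(m.group(2)))
    m = EM_RESET.search(line)
    if m:
        # The em-style message quoted here carries no queue number.
        return ("em-reset", m.group(1), None)
    return None

print(classify_watchdog("vmx1: watchdog timeout on queue 0"))
print(classify_watchdog("kernel: em2: Watchdog timeout – resetting"))
```

This only separates the two message shapes; it says nothing about whether they share a root cause.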

heper

I have around 4 ESXi boxes that got updated to 2.3 on day 1. Some of them run legacy e1000, others run vmxnet.
None of them have the symptoms you describe.
But it's possible that there is a regression in certain scenarios that I'm not hitting and you are.

Is there anything special about your configuration? Is this an upgrade or a clean install (mine are all upgrades from 2.2.x)? Any traffic shaping running?

Twistor

I might try e1000 when I get the chance. My secondary box locked up again this morning.

No fancy config, no shaping or carp.
At first it was an upgrade and now a clean install but with a config import.

darioj

Same problem here. After upgrading pfSense from 2.2.4 to 2.3 and running for a few days, the vmx2 NIC stopped working and I had to reboot.

Interrupts grow to 50% while the system log fills with timeouts:

kernel vmx2: watchdog timeout on queue 0
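
For anyone comparing notes, a rough Python sketch like the one below can tally these timeouts per interface and queue from saved log lines. The pattern is inferred from the single line quoted above, the helper name `count_timeouts` is hypothetical, and the sample lines are made up for illustration rather than taken from a real system log.

```python
import re
from collections import Counter

# Pattern inferred from the log line quoted above; any syslog prefix
# (timestamp, hostname) is assumed to vary and is deliberately not matched.
TIMEOUT = re.compile(r"\b(vmx\d+): watchdog timeout on queue (\d+)")

def count_timeouts(lines):
    """Count watchdog timeouts per (interface, queue) pair."""
    counts = Counter()
    for line in lines:
        m = TIMEOUT.search(line)
        if m:
            counts[(m.group(1), int(m.group(2)))] += 1
    return counts

# Illustrative input only, not real log data.
sample = [
    "kernel vmx2: watchdog timeout on queue 0",
    "kernel vmx2: watchdog timeout on queue 0",
    "kernel: some unrelated message",
]
print(count_timeouts(sample))
```

Feeding it the lines of an exported system log would show whether the timeouts are confined to one interface and queue.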

heper

https://redmine.pfsense.org/issues/6296

cmb

@heper:

https://redmine.pfsense.org/issues/6296

This is almost certainly the cause.

Don't switch to e1000; that won't make it any better. It's best to be on vmxnet3.