PfSense nics in ESXi running half-duplex

node-nine_inc

Hello all.

Seem to be hitting a strange behavior that I have yet to find any conclusive discussions about while grokking about via various search engines.

I have an ESXi7 host running a number of VMs. This problem is ONLY occurring on my pfsense VMs. My Linux VMs and windows VMs are all autosensing full-duplex without any input or effort. These are all communicating via a Distributed vSwitch, not that that should matter.

While debugging why I am experiencing intermittent connection timeouts both to and through my pfsense VMs, I noticed that the vNICs under pfsense are ALL running in half-duplex.

Both the VMs as well as the external Cisco switch I have connected for ingress/egress to the ESXi host are reporting the duplex-mismatches. Every one of my pfsense VMs is running lldp/cdp and is showing up in the duplex-mismatch errors.

ex>
==========
[2.5.2-RELEASE][root@XXX]/root: ifconfig | grep vmx
vmx0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500

vmx1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500

vmx2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
(this is the failover/state-sync interface, hence no PROMISC)

vmx3: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500

vmx4: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500

==========
3560s1#
Sep 3 11:23:16 PST: %CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered on GigabitEthernet0/15 (not half duplex), with XXX vmx0 (half duplex).
Sep 3 11:23:16 PST: %CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered on GigabitEthernet0/14 (not half duplex), with XXX vmx1 (half duplex).
==========

I have tried both the recommended VMXNET3 and E1000 vNIC types, but both behave the same regarding their autodetected duplex.

The physical NICs on the ESXi host are all fine. Proper duplex/speed and bundling are great at the physical layer. This seems to be a problem between the DvSwitch and the pfsense VM interfaces...so all virtualized traffic and components.

Unfortunately this is not just a cosmetic trouble. I am experiencing CARP flapping, experiencing data flows through the firewall dying (even with carp suspended and running single-node), and experiencing ssh sessions to the firewalls hanging. Data flows through the firewall are not even leaving the DvSwitch...so all the communication is "in the box" and between VMs.

Anybody else run into this or have thoughts as to potential solutions?

mr.rosh

@node-nine_inc whats the physical nic make, and what physical hardware. running not supported hardware with ESXi 7 could be a possible issue.

biggsy

@node-nine_inc

I have no idea what "SIMPLEX" actually means in the ifconfig output but I'm quite sure it doesn't mean half-duplex.

What is the output from a plain ifconfig? In particular, what does the line beginning "media: " say?

thatsysadmin

@node-nine_inc
I'm running into this on ESXi 7u2 as well.

posto587

Also having issues on ESXI 7.2

Checked ifconfig:

vmx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=e100bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6>
	ether 00:
	inet6 fe80:%vmx0 prefixlen 64 scopeid 0x1
	media: Ethernet autoselect
	status: active
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

but like biggsy noted simplex does not mean half-duplex:

SIMPLEX
The interface cannot hear its own transmissions. This is a read-only flag that is set by the driver.

edsiadmin

I'm having this same problem in ESXi 6.5 with standard vSwitches. Same Duplex issues two different VMware clusters. Seeing it in the cisco logs because LLDP/CDP is turned on. CARP seems to work just fine for me after enabling promiscuous mode in the vSwitch. PFSense 2.6.0. Intel 82599 NIC. I can see this in the logs on our Cisco 6509 and Nexus 5K switches depending which hypervisor is running the VM. 6509s connect to the hypervisors with a standard LACP port channel, the Nexus switches are a vPC LACP bond. I also do not have any other gear throwing these errors. I can see this issue on both stand alone and clustered PFSense VMs.