Changing interface "names" affects throughput
-
I'm running pfsense virtualized on Proxmox, but also have an APU2 sitting as a backup device. On the VM, my interfaces show up as vtnet0, vtnet1 and so on.
On the APU2 they show as idg0, idg1 etc.
This creates a bit of a headache when changing configurations since I get a mismatch if I try to restore a backup from one to the "other" machine. I tried solving this by installing the Shellcmd package and issuing the following "earlyshellcmd" entries:
ifconfig vtnet0 name igb0
ifconfig vtnet1 name igb1
and so on. This works, to the extent that I can load a configuration file downloaded from my appliance and the setup seems 100% OK.
The problem is that it seems to severely hit performance. I'm on a gigabit fiber and typically get 930-940 down/up. With this setup I barely get 200 down and 1-2Mbit up !!!
What's happening? Isn't it possible to change interface names like this without killing performance? Any suggestions?
-
Don't do that?
I've never tried to do that so I'm not sure what would happen. From your description though it sounds like the ifconfig might be setting or unsetting a speed/duplex option on the interface causing a mismatch.
The expected workflow here is to restore the config with the mismatch. The webgui will ask you to re-assign them which isn't usually an issue. Then it reboots into the config.
Also I assume you mean igb0, igb1 etc?
Steve
-
@stephenw10
I absolutely mean igb0, igb1, my mistake! Corrected now... :) Yes, the webgui does ask to reassign, which is how I got the VM up and running the first time. It's just an extra step I had hoped to get rid of.
I have also been experimenting a bit using a 4G router on a second WAN for failover. Now I started thinking that I could instead set up both units in HA Sync mode, with one connecting via my Fiber and the other over 4G... Or some such config.
But when looking into how this should be done, it seems you need to have two units that are more or less exactly the same... ?
So I was thinking that different interface names might be a showstopper??
About that mismatch you suspect is happening, could there be some way to resolve it? Some reload option that might solve the performance issue perhaps??
-
Yeah, for an HA setup you need matching interfaces for state sync to function. Really you want identical hardware for both nodes. You can work around that to some extent using single interface laggs so both nodes can use lagg0, lagg1 etc but I would only recommend that if there is no other option.
However, you would not set up two WANs on two separate nodes like that anyway. To do a dual-WAN HA setup you need both WANs connected to both nodes. That means both need a /29 subnet so they can share the L2 on each WAN.
Check the ifconfig output after changing the interface names. Look at the connect speed/type.
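For reference, a single-interface failover lagg can be created from the shell roughly like this (a sketch only; vtnet0 is a placeholder for your actual NIC, and on pfSense you would normally create laggs via the webgui under Interfaces rather than by hand):

```shell
# Create a lagg with a single member so both HA nodes
# end up with the same interface name (lagg0)
ifconfig lagg0 create
ifconfig lagg0 laggproto failover laggport vtnet0
ifconfig lagg0 up
```
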
Steve
-
I suppose I could set up another pfSense VM on my second hypervisor, which would match the HW for all parts that matter. But I thought I'd use the APU since it's sitting there in the rack.
I think I need to dig a bit deeper into this, like looking into setting up HA. But more importantly perhaps, trying to find out what is really happening when changing the name like that. It's not like I made some strange hack or something...
-
Well renaming the interfaces like that isn't supported. The result of it is untested/undefined.
-
Yes probably not tested, or at least reported, until now perhaps?
So, running ifconfig reveals some differences between the two setups.
Working setup on VM (no name change):
vtnet0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
description: WAN
options=800b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,LINKSTATE>
nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
Not working on VM (name changed to igb0):
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
description: WAN
options=6c07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
Don't know if this means anything though. And everything else in the output is exactly the same.
Looking at the APU2, where the name actually is igb0, the list looks like this:
options=e527bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
Could it be a driver issue perhaps? Does it load based on the name??
-
Ok, yeah so it's enabled all the hardware off-loading features that igb has by default but the actual NIC probably doesn't support them or worse supports them very badly.
Try disabling those. pfSense disables all but checksum offload by default, but by running that command outside the pfSense scripts none of that is applied.
Steve
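For a quick test, those offload flags can also be cleared manually from the shell after the rename (a sketch; igb0 here stands for the renamed interface, and the flag list should be adjusted to match what ifconfig actually shows as enabled):

```shell
# Clear the hardware offloads the rename appears to have enabled
ifconfig igb0 -rxcsum -txcsum -tso -lro
```
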
-
Yes, clearly driver related...
Running pfSense as a VM, with default interface naming (vtnet0, 1...), I have no issues with any form of HW acceleration enabled (System/Advanced/Networking).
Same thing on the APU2E4 which runs fine with all HW acceleration enabled, almost reaching the same speeds as the VM, with some effort... (i210 NIC's).
On a VM with names changed as above, I get only 1-2 Mbit/s UL when Hardware Checksum Offloading is enabled (not disabled in GUI). The other HW accel options don't seem to affect performance in any noticeable way.
The host machine is equipped with a PCI card listed as having 4 Intel i350 NICs. However, when setting it up the first time I had trouble passing through only some of the NICs on the card, keeping the others for other VMs, and therefore ended up assigning them as VirtIO devices. Having no trouble reaching gigabit speeds anyway, I stayed with that setting.
Now I have just tested changing the VM NICs to "Intel E1000" on the host machine, and this also works with all HW acceleration enabled. This is true even when changing the interface names to igb (from em0, 1 etc. this time).
There is a clear performance hit due to the emulation on the host, but I get the exact same performance regardless of interface names and/or HW acceleration on/off.
So to conclude:
Assigning interfaces as VirtIO devices works fine, leaving them named vtnet0, 1... for pfSense.
Changing the names to igb0, 1... works fine as long as Hardware Checksum Offloading is turned OFF. "Providing" a different NIC (e.g. through emulation) also works fine, even if the interface name is changed.
Would be interesting to understand what "device" is actually used by Proxmox when selecting VirtIO (paravirtualized). There is clearly something that messes things up.
I'm guessing that if I could manage to actually pass through the 2(3) NIC's, it will also work, with excellent performance since they are i350's? Perhaps worth trying again...
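One way to see which device the guest actually gets is pciconf from the pfSense shell (illustrative only; the exact output and grep pattern will vary, but VirtIO NICs typically show up as a Red Hat "Virtio network device"):

```shell
# List PCI devices with vendor/device descriptions and
# show the entries around anything network-related
pciconf -lv | grep -B3 network
```
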
-
Yes, if you can pass through the i350 successfully then both configs will be igbX and all the problems go away.
-
I thought with some changes I made (like using the Q35 machine type instead of i440fx) that I could pass individual NICs to pfSense. But even though they show up as different IOMMU groups, the entire card gets passed to the VM.
Unfortunately, this means that I can't use the fourth port for anything else... I could put all the other VMs on the mobo port, but it basically means I would lose one port.
I made some tests though, and things turn out as expected. The card shows up with igb0, 1, 2 and 3 and things work fine from a performance perspective, also with HW offloading activated.
I might do some more testing, but it seems it's either renaming and shutting off HW offloading (including large receive offload), which lets me experiment with sync, or keeping things default and reconfiguring interfaces whenever I copy config.xml. No HA sync in that case though...
Actually, it would be nice if one could select not just one, but any combination of "restore areas" in Backup & Restore. That would make things easier... Then I could reload everything but interfaces...
-
Never thought the solution could be so simple...
It turns out that the only thing I had to do to be able to pass each individual NIC to one VM was to NOT tick the box "All functions"... ?!?!?
I thought, and probably everyone else I have watched or read, that you have to tick this in order to get all the functionality the i350 could offer. Apparently it means all the functions (i.e. all the i350 NICs) on the card, passed together. A bit misleading, but hey, now it works.
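If anyone wants the CLI equivalent, the Proxmox qm tool appears to make the same distinction through the PCI address format (a sketch; VM ID 100 and address 0000:04:00 are placeholders for your actual VM and card):

```shell
# Pass only function 0 of the card (one NIC),
# equivalent to leaving "All functions" unticked:
qm set 100 -hostpci0 0000:04:00.0

# Alternatively, omitting the .function passes every
# function on the device, i.e. "All functions" ticked:
qm set 100 -hostpci0 0000:04:00
```
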
So now I finally have things exactly as I had hoped for, with three i350 NICs nicely configured on my pfsense VM (igb0, igb1 and igb2) with HW acceleration enabled.
Just like it is on the APU2...
-
It might have a PCI bridge on the card that it tries to pass?
Anyway, that's a much better solution than renaming the interfaces to force them to match.
-
The method is the same for any PCI device you want to pass through. In the documentation, which I now suddenly understand, they have command line examples using a graphics card with a sound chipset on it. They may show up as separate devices (IOMMU groups), but you might want to pass them both together... hence "All functions"...
Anyway, as you say, now I have a much better solution than renaming devices. Which is actually the way I tried to set it up a long time ago...