VLANs seem to be mostly broken with Intel SR-IOV VF
-
I have an Intel 82599ES NIC and created several virtual functions for it with SR-IOV. One of them was passed through to a VM where pfSense is running.
Then on this ixv0 device I created a VLAN and assigned it to WAN.
The problem is that most of the time the pfSense boot hangs on "Configuring VLAN interfaces..." for a very long time, then hangs on WAN as well, and when it finally boots, WAN doesn't work at all no matter what I try.
However, I noticed that occasionally it does work; when that happens, I see the following line in the OS boot logs right next to "Configuring VLAN interfaces...":
[fib_algo] inet.0 (bsearch4#20) rebuild_fd_flm: switching algo to radix4_lockless
When VLANs hang, it doesn't show up until around the end of the boot, or even after that.
Not sure what, but something seems broken. Anyone experienced anything like this before?
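For reference, this is roughly the manual FreeBSD equivalent of what the GUI sets up, useful for testing whether tagging works on the VF outside of pfSense's boot sequence. VLAN ID 10 is a placeholder; the real tags come from the switch configuration:

```shell
# Create a VLAN interface on top of the SR-IOV VF (ixv0) by hand.
# VLAN ID 10 is a placeholder for the actual WAN tag.
ifconfig ixv0 up
ifconfig vlan0 create vlan 10 vlandev ixv0
ifconfig vlan0 up
dhclient vlan0   # or assign a static address as appropriate
```

If this works interactively but not at boot, that would point at boot ordering rather than the driver itself.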
-
@nazar-pc I used to get errors like that a lot with my bare metal static block of IPs. I was able to create a VM for each static IPv4 address and had it all working.
It eventually crashed, but I think each VM may have needed a different localhost IP. Or I had DNS errors.
Recently, when getting radix lockless errors, I would try forcing the routing algorithm to bnet(?) via a sysctl.
The Proxmox VM with public IPs was pretty neat, and I wish I knew now what I did then. I was able to pause my virtual routers individually. BUT, I don't recommend that. Maybe in the far future.
You may need to remove vlan1 from your network entirely. And changing localhost IPs may not be a bad way to go; the range is 127.0.0.1-127.255.255.255, but I cannot say whether that is changeable in pfSense.
It is also possible that the PCI hardware address and MSI-X mapping are creating internal routing conflicts. Like, if your whitebox mobo has a built-in WiFi chip or USB interfaces or a PS/2 interface, everything is normally autodetected by the kernel.
You may want to set up firewall rules in the parent hypervisor too. Proxmox lets you segregate VMs.
-
@nazar-pc also, maybe VLANs on the WAN should only be done with a type-2 hypervisor, and you may need to tag the VLANs in the hypervisor itself too. The number of virtual cores on the VM and how they are mapped to the NIC matter too.
https://forums.freebsd.org/threads/fib_algo-inet-0-bsearch4-20-rebuild_fd_flm-switching-algo-to-radix4_lockless.91474/
Also, if this helps: IOMMU groups get complicated.
https://pve.proxmox.com/wiki/PCI_Passthrough
-
@nazar-pc the VLANs KLD module has dependencies too, and, at least in OPNsense, you can delete it and have the OPNsense kernel errors tell you what those dependencies are on reboot (I haven't tried that in pfSense).
Also, maybe try disabling IPv6 and making sure no IPv6 multicast reaches the machine. In FreeBSD and OPNsense the command would be
ifconfig ixv0 inet6 ifdisabled
pfSense doesn't seem to work with the ifdisabled command on my system on bare metal, but in your VM, maybe.
-
@nazar-pc If your ixv0 interface is InfiniBand based, maybe try
kldload infiniband.ko
as a boot shell or early shell command. There are other modules for SFP+ which may load automagically on official pfSense hardware but not in a VM; you may have to load them manually, BUT as to what they are, I don't know.
https://shop.netgate.com/products/sfp-10gbase-sr-transceiver
If you have pfSense+ in a VM, L2 firewall rules may help too.
-
@nazar-pc you may also want to look at these VLAN tunables:
dev.xxx.X.vlan_only
Require that incoming frames must have a VLAN tag on them that matches one that is configured for the NIC. Normally, both frames that have a matching VLAN tag and frames that have no VLAN tag are accepted. Defaults to 0.
dev.xxx.X.vlan_strip
When non-zero the NIC strips VLAN tags on receive. Defaults to 0.
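If tunables like those exist for the VF driver, they would typically be checked and then set as boot-time tunables. The `dev.ixv.0` prefix below is an assumption for the ixv VF driver and unit 0; verify the actual names on the running system first:

```shell
# Check which vlan-related tunables the driver actually exposes --
# the dev.ixv.0 prefix here is an assumption, not confirmed.
sysctl -a | grep -E 'dev\.ixv\.0\..*vlan'

# If present, persist them across reboots in /boot/loader.conf.local:
#   dev.ixv.0.vlan_only=0
#   dev.ixv.0.vlan_strip=1
```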
-
Honestly I'm not following what you're trying to say, @HLPPC, these messages look like AI-generated hallucination to me
-
@nazar-pc I am suggesting features available in BSD systems which might make your custom SR-IOV setup work with routed VLANs. Pre-coded, without having to code it yourself. And Linux VM features.
There are different routing algorithms that pfSense can switch to with radix trees. They affect hardware/software. The sysctl is net.route.algo. It is exposed in C code, and may be available both as a system call for compiled programs and as an administrator command for interactive use and scripting.
BUT, for everyone else not caring about the infinite permutations of configuration in routing, we lock it down with everyone's official settings and hope it generates, forwards, rejects, drops, denies, NATs, and accepts packets correctly. And we firewall it. Good luck with netmap though.
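For the record, the fib algorithm from the boot log message can be inspected and pinned from a FreeBSD/pfSense shell. A minimal sketch, assuming FreeBSD's fib_algo sysctl names (the list of available algorithms varies by release; radix4_lockless is the value from the OP's log):

```shell
# Show the active IPv4 fib lookup algorithm and the candidates
sysctl net.route.algo.inet.algo
sysctl net.route.algo.inet.algo_list

# Pin one explicitly (add the line to /etc/sysctl.conf to persist),
# e.g. the one the OP's boot log switches to:
sysctl net.route.algo.inet.algo=radix4_lockless
```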
-
@nazar-pc https://man.freebsd.org/cgi/man.cgi?route
And none of that is AI generated. I search for crap until my Wi-Fi works and my LAN runs smoothly. Usually. VMs were great for switching IPv4 public addresses on the fly. If I used AI to summarize all of that, maybe it'd make more sense.
You asked a lunatic question involving SFP+s in an unknown VM with an unknown CPU and mobo and whether or not it is official hardware. I gave you lunatic answers trying to make it work.
I definitely disable DHCP in these setups, and having it enabled even in the VM may cause issues too.
-
@nazar-pc also, the VMs may need to do some encryption with the cloud, and auto-configure your interface drivers. And maybe each VM with CPU encryption keys is a little off depending on the setup. Or whether there is TPM passthrough with other VMs.
Like, if you've only purchased one instance of pfSense+, can you clone it with 5 public IPs? I have, and it worked for a bit pre-pfSense-monetization, but maybe it caused issues.
I eventually used VIPs on bare metal instead.
-
@nazar-pc There are also NTP files that sync the kernel and BIOS time, which affects interfaces. Disabling NTP, and maybe not sending kiss-of-death packets to the VM, could help. Wake-on-LAN/magic packets may need to be blocked too. Killing syslog PIDs could also reduce interference.
-
Here are Proxmox and NTP instructions too:
PVE
https://tinyurl.com/ru7jn2c8
NTP burst issues
https://tinyurl.com/6fwfuezx
-
https://docs.netgate.com/pfsense/en/latest/network/broadcast-domains.html
The fewer broadcast domains the better, I think, at least from the VM's perspective. Or the hypervisor's.
-
Could be a checksum offloading issue too. Disable hardware checksum offload when using VirtIO with Proxmox VE.
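Even with a passed-through VF rather than VirtIO, offload can be toggled per interface from a pfSense shell to rule this out (in the GUI it lives under the hardware offload checkboxes in System > Advanced > Networking). A sketch, using ixv0, the interface name from the OP's post:

```shell
# Temporarily disable checksum, TSO and LRO offload on the VF to
# rule out offload bugs; these settings revert on reboot.
ifconfig ixv0 -rxcsum -txcsum -tso -lro

# Confirm which offload options remain enabled
ifconfig ixv0 | grep options
```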
-
@HLPPC said in VLANs seems to be mostly broken with Intel SR-IOV VF:
You asked a lunatic question involving SFP+s in an unknown VM with an unknown CPU and mobo and whether or not it is official hardware. I gave you lunatic answers trying to make it work.
I appreciate the effort, but you can always ask clarifying questions about the setup if I missed something important; I'm happy to clarify.
I have an Intel NIC with two SFP+ ports, as mentioned in the very first post, that supports SR-IOV. The VM is just a simple KVM-based one on a Linux host, with a virtual function device assigned to the VM running pfSense. I don't have InfiniBand, Wi-Fi, IPv6, cloud, TPM, or some of the other seemingly random things you have mentioned. I have no idea what NTP and WoL have to do with any of this either.
The interface works fine without VLANs and also works with VLANs until reboot, but when VLANs are added it hangs on boot and interfaces are not working after that.
So as far as I'm concerned there are no hardware issues here, and no driver issues either; it is just something pfSense-specific (or maybe FreeBSD in general) that is problematic when it comes to VLANs, specifically at boot time. Maybe the ordering of stuff at boot is off or something.
-
@nazar-pc said in VLANs seems to be mostly broken with Intel SR-IOV VF:
Honestly I'm not following what you're trying to say, @HLPPC, these messages look like AI-generated hallucination to me
+1 on that... I'm seeing similar in other threads unfortunately.
But regarding your problem, you mention pfsense running as a VM. So you create these "virtual functions" of your NIC in the hypervisor? What hypervisor are you running and how is your setup exactly?
Are you saying that you are using the physical port for more VM's than pfsense, and for other things than WAN? -
@Gblenn said in VLANs seems to be mostly broken with Intel SR-IOV VF:
But regarding your problem, you mention pfsense running as a VM. So you create these "virtual functions" of your NIC in the hypervisor? What hypervisor are you running and how is your setup exactly?
I'm creating the virtual functions with udev on the Linux host like this:
ACTION=="add", SUBSYSTEM=="net", KERNEL=="intel-ocp-0", ATTR{device/sriov_numvfs}="2"
By the time VM starts they already exist like "normal" PCIe devices. VM is created with libvirt and I just take such PCIe device and assign it to the VM. pfSense mostly treats them as normal-ish Intel NICs as far as I can see.
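To confirm on the host that the rule actually fired and the VFs exist before the VM boots, something like this works (intel-ocp-0 is the PF name from the rule above):

```shell
# How many VFs the PF currently exposes (the rule above sets 2)
cat /sys/class/net/intel-ocp-0/device/sriov_numvfs

# The VFs appear as ordinary PCIe functions; 82599 VFs show up as
# "82599 Ethernet Controller Virtual Function"
lspci | grep -i 'virtual function'
```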
@Gblenn said in VLANs seems to be mostly broken with Intel SR-IOV VF:
Are you saying that you are using the physical port for more VM's than pfsense, and for other things than WAN?
The physical port I have is connected to a switch on the other end. The switch wraps 2 WANs into VLANs, and I want to extract both WAN and WAN2 from the virtual function in pfSense. In this particular case the physical function may or may not be used on the host, but it is mostly irrelevant to what is happening to the specific virtual function I'm assigning to the VM, to the best of my knowledge.
As mentioned, this whole setup works. I boot the VM, create VLANs, assign them to WAN and WAN2, and everything works as expected. It's just that when I reboot, it hangs, times out, and the VLANs are "dead" in pfSense.
-
@nazar-pc Aha, but do you really need pfSense to be "involved" with the VLANs on the switch? In fact, do you even need VLANs on the switch at all??
I guess it depends on your ISP and what type of connection you have, of course. I have two public IPs from the same ISP, and in my case it's the MAC on each respective WAN that determines which IP is offered to which port. But even if that doesn't work for you, which it doesn't if it is two different ISPs, couldn't you limit the VLAN to just be something between the switch and libvirt?
I run Proxmox and set IDs on some ports to "tunnel" some traffic between individual ports in my network. So that VLAN ID is not used or even known by pfSense at all; it's only for the switches and e.g. one single VM...
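One concrete way to keep the tags out of pfSense entirely, since the VFs are created on a Linux host: have the PF driver program the VLAN tag into the VF itself, so the guest only ever sees untagged frames. A sketch (VF index 0 and VLAN ID 10 are placeholders; intel-ocp-0 is the PF name from the udev rule earlier in the thread):

```shell
# Push VLAN tagging for this VF down into the NIC: egress frames get
# tagged and ingress is filtered/untagged in hardware, so pfSense
# never needs a VLAN interface on ixv0 at all.
ip link set intel-ocp-0 vf 0 vlan 10

# Verify the per-VF VLAN assignment
ip link show intel-ocp-0
```

The catch is one tag per VF, so WAN and WAN2 would each need their own VF; the OP's udev rule already creates two, so each WAN could get its own passed-through VF with its own hardware tag.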
-
as a system tunable consider
hw.ix.unsupported_sfp=1 (or whatever options your Intel card's driver has)
maybe try
sysctl -a | grep (your intel driver)
pciconf -lvvv
ifconfig -vvv
and then consider disabling MSI-X in the VM if it is on. BTW, ifdisabled disables duplicate address detection with IPv6, and others have had success in FreeBSD VMs by disabling it; it isn't a robot suggestion. Dual stack sucks sometimes, and pfSense HAS to be dual-stack compliant to partner with AWS; hence it is forced to be enabled.
I haven't actually seen an Intel card driver show up by itself in a VM, or tried passthrough.
https://man.freebsd.org/cgi/man.cgi?query=iovctl&sektion=8&manpath=freebsd-release-ports
There might be a setup where bridging the WAN helps it out in the VM.
I am just throwing BSD at you to see if it helps. Because, you know, it is the reason it isn't :) There are certainly more efficient ways of doing things.