Proxmox 5.1 and hanging pfsense
-
Thanks for the answer.
I wrote to Proxmox and currently they do not have any reported problems with pfSense.
I also have more machines running on Proxmox, and I only have problems with pfSense. It looks like a problem with it …
-
About a week ago I upgraded a customer's Proxmox cluster from 4.4 to version 5.2, and after that was done I also upgraded the pfSense VM from 2.3 to 2.4.3-RELEASE-p1, since I was on site anyway.
Then the trouble began. One out of 7 pfSense interfaces suddenly stops forwarding traffic and the subnet is cut off. All virtual NICs are E1000. Some of them are plain bridged interfaces, some of them use the VLAN awareness of the bridge on the Proxmox layer; VLANs are not activated inside pfSense. The problematic interface is OPT1, and the underlying Proxmox interface is a plain bridge (vmbr1) where the physical interface eth2 is connected directly to the switch on an untagged 1 Gb port.
After long hours of analysing and testing I found out that I only need to disable the interface, apply, and re-enable it to get it working again. But the problem reoccurs after a varying amount of time.
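For what it's worth, the same disable/apply/re-enable cycle can also be done from the pfSense console; the interface name below (em1) is only a placeholder, check ifconfig for the one actually backing OPT1:

# rough console equivalent of disabling and re-enabling the interface in the GUI
ifconfig em1 down
sleep 2
ifconfig em1 up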
Unfortunately I had also set up a new Cisco switch and changed some network settings to use VLANs on the Proxmox layer, so I was chasing the problem on the switch, bonding, Proxmox bridge and pfSense layers. I found that the problem can only be in pfSense, because other VMs on the same Proxmox/Linux bridge can communicate, and the VLANs that run over the bond work fine inside and outside of pfSense. The interface that is having trouble is physically connected without bonding from the Cisco switch to Proxmox eth2 (vmbr1), without any VLAN tagging on either layer. The problematic interface uses the bnx2 driver on Linux, but as I said, the vmbr1-connected VMs can always talk to each other, also across different Proxmox hosts, so switch and bridge communication is OK. I also disabled the pf packet filter, to no avail.
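In case anyone wants to reproduce that test, the pf toggle I mean is just the standard pfctl switches, run from the pfSense shell:

pfctl -d     # disable the pf packet filter
pfctl -e     # re-enable it
pfctl -si    # show filter state and counters while testing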
The funny thing is that if I ping the "dead" pfSense interface's IP (opt1_address) from another subnet, it responds. If I try to ping a host in the dead subnet from the pfSense GUI I get 100% packet loss, and the hosts in the dead subnet cannot ping the firewall either. I have no clue what is happening here.
-
Update: now the OPT2 (em driver) interface also "froze"; it has the same pfSense config as OPT1. On the Proxmox layer it is slightly different (vmbr2 bridge on eth4 with the e1000e driver). It seems this always happens when there is "heavy" traffic on the interface (moving GBs of data to and from the NAS/file server). I still had my old/second pfSense VM with the same firewall config (also already updated to 2.4.3) residing on another Proxmox node with different physical hardware. I changed all 7 VM interfaces to use VirtIO NICs, then shut down the problematic instance and started the VirtIO one.
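For reference, the NIC model of an existing VM can also be switched from the Proxmox CLI with qm set; the VM ID 100, the MAC address and vmbr1 below are placeholders for your own values:

qm set 100 --net0 virtio=DE:AD:BE:EF:00:01,bridge=vmbr1
qm config 100 | grep ^net    # verify the model was changed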
After that I have had no outage for the last 24 hours, but the bandwidth going through the firewall seems very poor. I have to analyze further, but don't have the time at the moment...
Could it be that the packet filter is misbehaving and stops working/blocks traffic under certain conditions?
-
@bolek2000 I use Proxmox, pfSense and VirtIO and I have no problems with hanging etc.
Probably you forgot to disable hardware offload for VirtIO, that will cause terrible performance.
-
Thanks for the suggestion, but I have had the offloading features disabled from the GUI -> Advanced -> Networking from the beginning (also before the upgrade). I'm not sure what the sysctl values should look like, so I'm posting them:
net.inet.tcp.tso: 1
hw.hn.tso_maxlen: 65535
hw.vtnet.tso_disable: 0
dev.vtnet.0.txq0.tso: 0
dev.vtnet.0.tx_tso_offloaded: 0
dev.vtnet.0.tx_tso_not_tcp: 0
dev.vtnet.0.tx_tso_bad_ethtype: 0
Luckily, after doing some more testing I get around 30 MB/s transfer rate through the firewall copying data from VM to VM (Samba on RAID 6 with VirtIO SCSI), while having 60 MB/s without the firewall with the machines on the raw hardware with GBit network (without the Virt/Samba layers). The problem reported yesterday by the customer was some other issue, I guess.
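For comparison, the offload state can also be read straight off the interface flags on the guest (vtnet0 below is just an example name):

ifconfig vtnet0 | grep options
sysctl net.inet.tcp.tso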
After changing to VirtIO, the network is still stable, so for me the issue is RESOLVED. So be aware if you upgrade your VM to 2.4.3 with E1000 NICs. I also realized that the CPU consumption on the problematic "E1000 VM" was peaking at 90% (3 vCPUs), while the VirtIO VM now has a maximum of 30% (4 vCPUs) with comparable load (copying big amounts of data around while having the normal Internet traffic and so on in the background...)
-
There are some posts here about this too. It seems that turning off GRO might be a solution if you can't change from E1000 to VirtIO.
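If anyone wants to try that, something along these lines on the Proxmox host should do it (eth2 is the interface from the earlier posts, adjust to your own setup):

ethtool -K eth2 gro off
ethtool -k eth2 | grep generic-receive-offload    # verify it is really off
# to make it persistent, add a line like this to the bridge stanza in /etc/network/interfaces:
#   pre-up ethtool -K eth2 gro off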
-
@muppet actually - ...
@muppet said in Proxmox 5.1 and hanging pfsense:
@bolek2000 I use Proxmox, pfSense and VirtIO and I have no problems with hanging etc.
Probably you forgot to disable hardware offload for VirtIO, that will cause terrible performance.
Possibly you forgot to read my earlier post, where I already stated that disabling hardware offload for VirtIO helped performance a little but did NOT resolve the freezing problem...
I am glad it's not happening to you - but it is well documented as an issue with (probably) FreeBSD over KVM, not specifically pfSense and Proxmox.
I believe the issue may be related to bridging or bonding multiple NICs and/or VLAN tagging. Would you post your interfaces file (sanitized) to compare with mine? (Posted above already.)
-
root@orbit:~# lspci
00:00.0 Host bridge: Intel Corporation Broadwell-U Host Bridge -OPI (rev 09)
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 6000 (rev 09)
00:03.0 Audio device: Intel Corporation Broadwell-U Audio Controller (rev 09)
00:14.0 USB controller: Intel Corporation Wildcat Point-LP USB xHCI Controller (rev 03)
00:16.0 Communication controller: Intel Corporation Wildcat Point-LP MEI Controller #1 (rev 03)
00:1b.0 Audio device: Intel Corporation Wildcat Point-LP High Definition Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation Wildcat Point-LP PCI Express Root Port #1 (rev e3)
00:1c.1 PCI bridge: Intel Corporation Wildcat Point-LP PCI Express Root Port #2 (rev e3)
00:1c.2 PCI bridge: Intel Corporation Wildcat Point-LP PCI Express Root Port #3 (rev e3)
00:1c.4 PCI bridge: Intel Corporation Wildcat Point-LP PCI Express Root Port #5 (rev e3)
00:1d.0 USB controller: Intel Corporation Wildcat Point-LP USB EHCI Controller (rev 03)
00:1f.0 ISA bridge: Intel Corporation Wildcat Point-LP LPC Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation Wildcat Point-LP SATA Controller [AHCI Mode] (rev 03)
00:1f.3 SMBus: Intel Corporation Wildcat Point-LP SMBus Controller (rev 03)
01:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
02:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
04:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
auto lo
iface lo inet loopback

iface enp2s0 inet manual
#LAN Interface

iface enp1s0 inet manual
#WAN Interface

iface enp3s0 inet manual

iface enp4s0 inet manual

auto vmbr0
iface vmbr0 inet static
    address 192.168.0.254
    netmask 255.255.255.0
    gateway 192.168.0.1
    bridge_ports enp2s0
    bridge_stp off
    bridge_fd 0
    pre-up ip link set enp2s0 mtu 9000
    pre-up ethtool -G enp2s0 rx 1024 tx 1024
    pre-up ethtool -K enp2s0 tx off gso off
    post-up ethtool -K vmbr0 tx off gso off
#LAN Interface Bridge

auto vmbr1
iface vmbr1 inet manual
    bridge_ports enp1s0
    bridge_stp off
    bridge_fd 0
    pre-up ip link set enp1s0 mtu 9000
    pre-up ethtool -G enp1s0 rx 1024 tx 1024
    pre-up ethtool -K enp1s0 tx off gso off
    post-up ethtool -K vmbr1 tx off gso off
#WAN Interface Bridge
root@orbit:~# modinfo igb
filename:       /lib/modules/4.15.18-2-pve/kernel/drivers/net/ethernet/intel/igb/igb.ko
version:        5.3.5.18
license:        GPL
description:    Intel(R) Gigabit Ethernet Linux Driver
author:         Intel Corporation, <e1000-devel@lists.sourceforge.net>
Hope this helps.
-
@muppet
Yes, that gives me a number of things to try...
You're using a couple of parameters which are different from mine - I will experiment... The other differences are that I am bonding multiple NICs and I am also using the vlan_aware directive...
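For context, a bonded, VLAN-aware bridge with the same ethtool tweaks would look roughly like this; interface names, bond mode and the bridge name are placeholders, not an actual config from this thread:

auto bond0
iface bond0 inet manual
    bond-slaves enp3s0 enp4s0
    bond-miimon 100
    bond-mode 802.3ad
    pre-up ethtool -K enp3s0 tx off gso off
    pre-up ethtool -K enp4s0 tx off gso off

auto vmbr2
iface vmbr2 inet manual
    bridge_ports bond0
    bridge_stp off
    bridge_fd 0
    bridge_vlan_aware yes
    post-up ethtool -K vmbr2 tx off gso off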
Thank you!
I'll do a little more testing and post results.
-
I was on vacation for a few days, YAY... so I'm back on this finally.
Note that I am not setting tx off and gso off in the interfaces file, but I am in the pfSense GUI - that was the difference between your interfaces file and mine. However, when I show my NIC settings via ethtool on the Proxmox host OS, it does show that GSO is off (and that there are 0 tx messages) for the interface and for the bridge, so I'm not very hopeful that setting it in the interfaces file will change anything. But I will put these settings in my interfaces file, which will take effect the next time I reboot the host - probably overnight in the next few days, since I don't want to reboot all my guests at this time.
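For the record, the check I'm referring to is just this on the Proxmox host (bond0 and vmbr0 stand in for whatever interface and bridge you use):

ethtool -k bond0 | grep -E 'tx-checksumming|tcp-segmentation|generic-segmentation'
ethtool -k vmbr0 | grep -E 'tx-checksumming|tcp-segmentation|generic-segmentation'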
Sincere thanks for your kind responses.
-
I know this topic is quite old, but I'm experiencing the same thing on multiple Proxmox hosts with pfSense. Did you ever find a solution?