2.3.1-p1 Unstable on Hyper-V (packet loss)
-
We run a very stable 2.2.6 release on Hyper-V. When we upgraded to 2.3.0 it would freeze hard periodically. We understand that was fixed in 2.3.1. Great! We upgraded to 2.3.1-p1 but now see extreme packet loss that comes in spurts on all interfaces (it comes, then goes away for a few minutes). The loss is significant enough to break RDP or SSH sessions across the firewall. We can, however, keep a clean ping running from the WAN to the pfSense appliance itself. We quickly had to roll back to the 2.2.6 snapshot, as the firewall is in production use.
I am opening this forum post in case others have this problem; maybe we can find a pattern or cause.
Our pfSense environment uses a lot of features. We do VLAN pass-through/trunking via a single NIC on Hyper-V, and we use IPsec, OpenVPN and Snort.
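To help spot a pattern in the "spurts", here is a minimal Python sketch that groups lost ping probes into bursts. It is only an illustration: the function name, the (timestamp, reply received) sample format and the 5-second gap threshold are all assumptions, not anything pfSense provides.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Burst:
    start: float  # timestamp of the first lost probe in the burst
    end: float    # timestamp of the last lost probe in the burst
    lost: int     # number of lost probes in the burst


def find_loss_bursts(samples: Iterable[Tuple[float, bool]],
                     max_gap: float = 5.0) -> List[Burst]:
    """Group lost probes into bursts.

    samples: (timestamp_seconds, reply_received) pairs in time order.
    Lost probes no more than max_gap seconds apart join the same burst.
    """
    bursts: List[Burst] = []
    for ts, ok in samples:
        if ok:
            continue  # only lost probes matter here
        if bursts and ts - bursts[-1].end <= max_gap:
            # Close enough to the previous loss: extend the current burst.
            bursts[-1].end = ts
            bursts[-1].lost += 1
        else:
            # Too far from the previous loss (or first loss seen): new burst.
            bursts.append(Burst(start=ts, end=ts, lost=1))
    return bursts
```

Feeding it probes logged through the firewall from both the 2.2.6 and the 2.3.x VM would show whether the bursts line up with load or recur at regular intervals.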
-
We switched from pfSense 2.2.6 to 2.3.1-p1 on Hyper-V, hoping the apinger/dpinger change would solve some problems we had when switching gateways in groups.
With 2.3.1-p1 we are seeing the same extreme packet loss now.
Our pfSense environment is quite simple: 2 Gateway groups, 1 OpenVPN connection.
It has 2 dedicated NICs (1 for each WAN connection) and 1 NIC shared with other VMs for LAN.
1 gateway group for OpenVPN traffic, the other gateway group with inverted tiers for internet access.
It looks like when I put some traffic on one of the WAN connections, both tend to get a lot of packet loss, and remote desktop access via the OpenVPN connection (and the other connection too) gets very laggy.
I'll roll back to 2.2.6 today and check whether this fixes the problem. Maybe I just didn't notice it under 2.2.6.
-
Do let me know if you become stable on your 2.2.6 rollback. I have a feeling things will go back to normal. Also check (at least for testing) that VMQ is disabled on Hyper-V for all vNICs on the pfSense VM. With some manufacturers' hardware, VMQ causes packet loss. This was a known issue for us on certain Dell servers that use Broadcom (I think?) chipsets.
-
We have/had VMQ disabled on all interfaces with 2.2.6 and with 2.3.1-p1.
With 2.3.1-p1 the packet loss started every morning when users began logging in to the MS Remote Desktop server via the OpenVPN connection, or whenever a bit more traffic occurred.
Now, back on 2.2.6, everything seems to work fine.
This morning the connection remained stable without any packet loss during the usual rdp login time.
And putting some heavy load onto the WAN connections isn't causing any packet loss.
-
I just found another case of this yesterday where I had to revert this. Completely different network, WAN, building, server, etc. Exact same behavior we are describing. Reverting to 2.2.6 again resolved the problem.
Please let this forum post serve as a warning to Hyper-V users. Do not upgrade to 2.3.x until this serious issue can be diagnosed and resolved. Stay on 2.2.6, which appears to be extremely stable on Hyper-V.
Phil
-
Well, I have been on 2.3 since its release on Hyper-V 2012 R2... and everything has worked perfectly.
So the issue certainly is not universal. It could be dependent on packages installed, and VM configuration I suppose.
-
May I ask, is your traffic substantial? We did not notice it at our first upgrade location as traffic was casual. We just had some drops but no one noticed until we ramped up traffic.
Phil
-
Substantial is all relative, of course.
I would call mine not substantial though. The link is 300 Mbit down, 20 Mbit up.
I regularly do 250 Mbit down sustained, but only for short times (10-20 minutes), and my total number of simultaneous users is low (50 maybe).
The pfSense box is also doing inter-VLAN routing, but again, only ~50 nodes.
-
Those who are having issues, what Windows version?
It's certainly not a universal problem with Hyper-V, but from the sounds of it there must be something to it in some edge case.
-
Both of my cases are Hyper-V on Windows 2012 R2. They are both managed under System Center 2012 (SCVMM). They both use Dell hardware. One is using NIC trunking, but the other is not. Both have IPsec tunnels. One of my locations is a branch office; I can clone the 2.2.6 VM and upgrade the clone to do parallel testing if you want to look at this further. The other unit is in a data center handling very critical traffic. But if we find the cause on one, then no doubt the fix will apply to us globally.
Phil C
-
My case is Hyper-V on Windows 2012 R2 (Datacenter), using HP hardware (ProLiant ML350 G6).
1xNIC "HP NC382T PCIe DP" (2 Ports - 1.Port NIC Team#1 Hyper-V Host, 2.Port NIC Team#2 Hyper-V VMs)
1xNIC "HP NC326i PCIe Dual Port" (2 Ports - 1.Port NIC Team#1 Hyper-V Host, 2.Port NIC Team#2 Hyper-V VMs)
1xNIC "Intel(R) PRO/1000 PT" (2 Ports - 1.Port = WAN1, 2.Port = OPT1)
The pfSense VM uses Team#2 for its LAN interface, Intel Port 1 for WAN1, Intel Port 2 for OPT1.
VMQ is disabled on all VMs/interfaces.
-
Same problem here after upgrading to 2.3.1.
Running Server 2012 (not R2) with 3 network cards.
Watching video streams is a mess: they constantly get interrupted, and remote sessions break too.
Updating to 2.3.1-p5 made no difference.
-
No movement here. Tried some dev releases, no change so far.
Is there a way to get back to 2.2.6?
I couldn't find the download. I have a 2.2.4 image; can it be upgraded to 2.2.6 rather than to the latest release?
Can I restore a 2.3.1 backup to 2.2.6?
Thanks for your support.
-
I had the same issues with PCI passthrough on ESXi 5.1 and a dual-NIC Intel PCI-E card (82575EB): awful latency and packet loss.
I removed the PCI passthrough, added the NICs to a virtual switch and used virtual NICs instead, and everything is back to normal.
I had the same issue with Hyper-V Server 2012 R2 on a Supermicro with 2x 10GB onboard NICs and thought it was a port negotiation problem. I switched to virtual NICs and the problem was gone.
But it might not be related to PCI passthrough for all of you.
Are you guys using PCI passthrough?
-
You can update to or reinstall 2.2.6 and restore your config. I ran into this problem when I tried to upgrade from 2.2.2 to 2.2.6 and could not find the update, as 2.3.1 was the only one available. So I updated to 2.3.1 and the firewall would not even boot. They must have made some major changes, as I had always been able to upgrade between versions before. I also do not think they tested in Hyper-V to check compatibility.
Luckily I did a snapshot before upgrading so I was able to restore back.
2.2.6 update: https://atxfiles.pfsense.org/mirror/updates/old/pfSense-Full-Update-2.2.6-RELEASE-amd64.tgz
2.2.6 full: https://portal.pfsense.org/firmware/2.2.6/
-
I also do not think they tested in Hyper-V to check compatibility.
Not true or even close to it. We fully verified Hyper-V and Azure. Microsoft themselves even tested 2.3 as well to approve it for Azure.
If it didn't boot, it's probably because of the drive type change from old versions that made the fstab invalid, so it needed updating.
-
Sorry, I may not have been clear. I meant that the upgrade process may not have been tested. If it was, is there any documentation that explains what needs to be done when upgrading from 2.2.x to 2.3 in Hyper-V so that you do not get the mount error?
-
https://doc.pfsense.org/index.php/Upgrade_Guide#Disk_Driver_Changes
You should be fine just running ufslabels.sh prior to the upgrade. Otherwise, manually specify the appropriate drive at the mountroot prompt, e.g. ufs:/dev/da0s1a, replacing da0 as needed.
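As a rough pre-upgrade check, here is a minimal Python sketch that lists fstab entries still pointing at raw device nodes (the kind a disk driver change can invalidate) rather than at labels. The function name is hypothetical, and it only recognises the standard FreeBSD GEOM label directories; it does not replace ufslabels.sh or the Upgrade Guide.

```python
from typing import List

# Directories where FreeBSD's GEOM label classes expose labelled providers.
LABEL_PREFIXES = ("/dev/ufs/", "/dev/ufsid/", "/dev/gpt/", "/dev/label/")


def raw_device_entries(fstab_text: str) -> List[str]:
    """Return the device field of every fstab entry that names a raw device node."""
    raw = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        device = line.split()[0]
        # A /dev/ path outside the label directories depends on the driver name.
        if device.startswith("/dev/") and not device.startswith(LABEL_PREFIXES):
            raw.append(device)
    return raw
```

Run against the contents of /etc/fstab, an empty result suggests the entries are already label-based; anything it returns is a device name that may change after the upgrade.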
-
Thank you! ;D
-
I have the same situation with pfSense 2.3.2 on KVM (Proxmox PVE) with virtio NIC drivers. I use two WANs with routing groups. Both have significant packet loss. One of these WAN interfaces sometimes switches to offline and stays in that status. I have a second pfSense on an APU board, with CARP, that has the same issue.
I use following services:
- Dual WAN with three routing groups
- OpenVPN
- CARP
- Captive Portal
- Free Radius
- Watch Dog
I did some investigating and found the following behaviours that differ from 2.2.6:
- In syslog I find "check_reload_status" entries with "reloading filter". This interrupts the traffic and causes packet loss. This reload is absolutely unnecessary.
- Every few minutes there is a process "xinetd" with "readjusting service 6969-udp" even if TFTP-Proxy isn't activated. This service doesn't stop.
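To see how often the filter reloads happen, here is a minimal Python sketch that counts matching syslog lines per hour. It assumes the classic BSD syslog timestamp at the start of each line ("Mon DD HH:MM:SS"); the function name is hypothetical, and lines without such a timestamp are counted under "unknown".

```python
import re
from collections import Counter
from typing import Iterable

# Captures "Mon DD HH" from a BSD-style syslog timestamp (assumed format).
TS = re.compile(r"^([A-Z][a-z]{2} [ \d]\d \d{2}):\d{2}:\d{2} ")


def reloads_per_hour(lines: Iterable[str]) -> Counter:
    """Count check_reload_status "reloading filter" lines per hour bucket."""
    counts: Counter = Counter()
    for line in lines:
        if "check_reload_status" not in line:
            continue
        if "reloading filter" not in line.lower():
            continue
        match = TS.match(line)
        counts[match.group(1) if match else "unknown"] += 1
    return counts
```

Comparing the hourly counts with the times users report dropped sessions would show whether the reloads actually coincide with the packet loss.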
I tried switching off "Flush all states when a gateway goes down" to avoid states being killed when an interface is briefly marked offline. But if the interface doesn't come up again, users are cut off from internet access: the switch from tier 1 to tier 2 happens, but the existing routing states are not killed.
So it's really unusable and I have to go back to 2.2.6 also for the moment. But how can we find out if 2.3.x will be ok?
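The failover trade-off described above can be illustrated with a toy Python model. This is only a sketch of the reasoning, not how pfSense or pf are implemented; all class and method names are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Gateway:
    name: str
    tier: int
    online: bool = True


@dataclass
class ToyRouter:
    gateways: List[Gateway]
    flush_on_down: bool = True  # stands in for the "flush all states" option
    states: Dict[str, str] = field(default_factory=dict)  # connection -> gateway name

    def pick_gateway(self) -> Optional[Gateway]:
        """Lowest-tier gateway that is currently online, if any."""
        online = [g for g in self.gateways if g.online]
        return min(online, key=lambda g: g.tier) if online else None

    def open_connection(self, conn_id: str) -> None:
        """New connections are pinned to the gateway chosen at creation time."""
        gw = self.pick_gateway()
        if gw is not None:
            self.states[conn_id] = gw.name

    def gateway_down(self, name: str) -> None:
        for g in self.gateways:
            if g.name == name:
                g.online = False
        if self.flush_on_down:
            self.states.clear()  # every connection must be re-established

    def stuck_connections(self) -> List[str]:
        """Connections whose state still points at an offline gateway."""
        down = {g.name for g in self.gateways if not g.online}
        return [c for c, gw in self.states.items() if gw in down]
```

With flush_on_down=False, a connection opened before gateway_down("WAN1") shows up in stuck_connections() while new connections use the tier 2 gateway; with flush_on_down=True nothing is stuck, but every existing state is dropped even for a brief outage.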