Very slow traffic from other VMs through pfSense on XenServer
-
Only the tx-offload setting needs to be changed; rx-offload is fixed on.
I can confirm the problem and the fix with a Debian Wheezy/Xen 4.1.4 dom0.
Adding ethtool -K ${dev} tx off to the online section of vif-bridge did the trick.
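As a rough sketch of what that hook might look like (the function name and the DRY_RUN switch are my own illustrative assumptions, not part of the stock Xen script; only the ethtool command itself comes from the post above):

```shell
#!/bin/sh
# Sketch only: a helper that could be called from the "online" branch of
# the vif-bridge script. disable_tx_offload and DRY_RUN are hypothetical
# names introduced for this example.

disable_tx_offload() {
    dev="$1"
    if [ -z "$dev" ]; then
        echo "disable_tx_offload: no interface given" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it, to check it first.
        echo "ethtool -K $dev tx off"
    else
        ethtool -K "$dev" tx off
    fi
}

# Inside vif-bridge this would be called as: disable_tx_offload "${dev}"
```

Running it with DRY_RUN=1 first is a cheap way to see which interface the script would touch before changing anything.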
The issue hadn't been submitted to freebsd-bugs so far; now it is:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197344
-
Interesting: this only appears to apply to virtual interfaces.
My pfSense VM is running on Xen 4.2 (CentOS 6.6 dom0) and has no speed issues, but I'm using PCI passthrough to give 2 dedicated hardware NICs (off a dual-port Intel card) to pfSense for LAN/WAN (so that DMZ/intranet are physically separate too).
-
Thanks johnkeates for putting that up here. It really helped me sort this out.
One thing to note: disabling tx offload with ethtool -K does not persist across guest reboots or live migration, because the dom-id and the assigned vif change, while xe vif-param-set other-config:ethtool-tx="off" does persist.
Is there any downside to using the vif-param-set option, or are the two basically equivalent?
-
@johnkeates:
You only need to disable checksum offloading on the hypervisor side of pfSense's interface.
Any interface that does DomU-DomU communication on pfSense's side produces un-checksummed packets, which get dropped by PF in BSD.
sudo ethtool -K $interface tx off
where $interface is the VIF on the Xen Dom0 side, is enough. Setting TX off on the bridge forces the Dom0 to calculate ALL checksums on ALL packets, no matter where they come from or where they are going. This is not a smart idea, since it creates a lot of calculations where they might not be needed. So if the pfSense DomU is on vif123.0, you run:
sudo ethtool -K vif123.0 tx off
Sorry, noob question here.
I am using a Xen implementation on an unRAID distribution. When you say Dom0 side, are you talking about the VIF that is spun up with the pfSense VM? When I run ifconfig to list my interfaces, I don't really know how to identify the interface you are referring to.
Sorry for the noob question again.
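For anyone with the same question: Xen names the dom0-side interfaces vif<domid>.<devid>, which is where the vif123.0 in the quote comes from. A minimal sketch of working out the name, assuming the xl toolstack and a guest called "pfsense" (both are assumptions, and vif_name is a made-up helper):

```shell
#!/bin/sh
# Sketch: build the dom0-side VIF name for a guest.
# Xen names backend interfaces vif<domid>.<devid>; vif_name is a
# hypothetical helper written for this example.

vif_name() {
    domid="$1"        # numeric domain ID of the guest, as shown by `xl list`
    devid="${2:-0}"   # index of the guest's virtual NIC, starting at 0
    echo "vif${domid}.${devid}"
}

# Example, assuming the xl toolstack and a guest named "pfsense":
#   domid=$(xl domid pfsense)
#   sudo ethtool -K "$(vif_name "$domid" 0)" tx off
```

The domain ID changes every time the guest is restarted, so the name has to be looked up again after each reboot.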
-
It's all here:
https://forum.pfsense.org/index.php?topic=85797.msg475906#msg475906
I recently rebuilt my test stack, and all I did was turn off tx and rx on every NIC. That is still probably more than is necessary, but it worked.
-
Thank you for taking the time to explain this. I turned TX off on the pfSense vif and all was good. Happy days!
-
Hello all…
Thanks for the information; it sure helped us solve this, but I have some more information that wasn't clear to me from the posts here.
This issue only seems to apply where pf is communicating with hosts on the same Xen host (dom0).
We use XenServer 6.2, FWIW. We have two Xen dom0s, and pf was NATing for two services: one on dom0-a and one on dom0-b.
pf itself was located on dom0-b.
The dom0-a service worked perfectly after the update to 2.2.2; the dom0-b service did not.
For people new to XenServer, and for completeness, we used:
xe vm-list
# then find the uuid of your pf vm
xe vif-list vm-uuid={uuid of the vm from above}
# note the uuid of the vif you want to change (the vif's uuid, not the network's)!
# for each vif you can check the status:
xe vif-param-get uuid={uuid of vif} param-name=other-config
xe vif-param-set uuid={uuid of vif} other-config:ethtool-tx="off"
For what it's worth, I was able to turn off tx on only the LAN interface (which NATs for the dom0-b service).
I tried turning offload off for the WAN interface too, but did not need to keep it off; WAN traffic seems to get a proper checksum as it leaves the dom0 through the physical NIC.
Once complete, you need to reboot the pf VM. The setting will persist across reboots.
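If the VM has several VIFs, the same commands can be generated in a loop. This is only a sketch: set_tx_off_cmds is a hypothetical helper, and it relies on xe's --minimal flag printing the UUIDs as a comma-separated list. It prints the commands rather than running them, so they can be reviewed first.

```shell
#!/bin/sh
# Sketch: print one `xe vif-param-set` command per VIF UUID.
# set_tx_off_cmds is a made-up helper name for this example.

set_tx_off_cmds() {
    # $1: comma-separated VIF UUIDs, as printed by `xe vif-list --minimal`
    for vif in $(echo "$1" | tr ',' ' '); do
        echo "xe vif-param-set uuid=$vif other-config:ethtool-tx=off"
    done
}

# Example (replace {uuid of the vm} with the value from `xe vm-list`):
#   set_tx_off_cmds "$(xe vif-list vm-uuid={uuid of the vm} --minimal)"
```

Note that this touches every VIF of the VM, which is more than the LAN-only change described above; trim the list if you only want specific interfaces.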
Hope that helps someone else :-)
Mitch
-
I've been running pfSense 2.2 on XenServer 6.2 for a while with the mentioned offloads disabled, and it's been working great. I believe that since I upgraded to XenServer 6.5 (or possibly since I upgraded to 6.5 SP1), pfSense only works as before on one specific host in the pool. I have 3 hosts in the pool; when pfSense is running on 2 of them it is very slow, but on the 3rd host it works fine.
How come?
-
Without knowing your network I can only guess… but see if this makes sense.
What I found was that if pfSense was routing traffic for VMs on other systems (outside the Xen box itself), then things worked: offload behaved as expected, because the checksum is added at the NIC as the data leaves the Xen server.
When I was routing traffic that stayed within the virtual network on the same Xen host, that's when it didn't work, until I disabled the offloads. In my opinion you only need to disable them on the paths where you see the performance issues, but you have to think it through.
Cheers.
-
The stack in the diagram in my sig is all on XenServer 6.5. Works fine as long as the checksumming is turned off.
-
Well, this issue occurs when traffic flows from external machines through the pfSense WAN interface to resources on the internal LAN.
The host where this works has different hardware (including different NICs) than the other two hosts in the pool. So when I migrate or restart pfSense on host 1 or 2, I don't get through the firewall from the outside (i.e. it's so slow that it doesn't work). But with pfSense on host 3 it works as expected.
Before, it worked on all 3 hosts. Now pfSense is not protected against host failure.
-
What are the eth specs when it's failing? And is it a live migration or a shutdown-boot migration?
If you want to protect against failure, it's better to use pfSense's failover options instead of hypervisor-based failover.
-
I think he was trying to do that, but he perceived one pfSense to work and two others not to work.
I'll try to explain it another way… the interface (if any) which transmits traffic to machines on the same physical Xen server needs to have tx checksums turned off, as I noted in my post. That's the only interface affected.
If you have a pf on Xen and it does not route for any hosts on the same Xen box, you don't see any problem.
This would affect any traffic to which checksums are applicable (all of it, I think?), so I imagine it would affect CARP traffic too IF your pf boxes were on the same network; if they are on different boxes the CARP traffic will be fine.
Just turn off the tx checksums for all the pfSense interfaces if you don't understand what I mean. The method I described survives rebooting and only affects the pf VMs you apply the changes to.
Hope that clarifies. Cheers.
-
Perhaps my explanation was not so clear. The offload settings mentioned here have been applied on all interfaces of pf from the start, when I was running it on XenServer 6.2. That fixed the problem then, and pf worked perfectly fine on all 3 hosts. It was like living in a dream where the streets were paved with gold and there was free candy for everyone.
After upgrading to XS 6.5/SP1, pf only works on 1 host. It doesn't matter if I live migrate or shut down and restart on another host. It ONLY works on "host 3".
I am only running 1 instance of pfSense, and sure, it may be better to run 2 or more in an HA setup, but that's not really the question here. I had a fine working setup, but not anymore. The candy is all gone, and the only change is that XS has been upgraded.
In reply to johnkeates: I don't know what eth specs I should look into…?
-
Use xe to get all the vif specs from the working pf hypervisor and from one non-functional hypervisor, as well as the ethtool parameters for both.
We're looking for other variables that might mess with the in-memory transport, because that's where VirtIO related issues seem to lie.
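For the ethtool side of that comparison, something like the following sketch could pull out the one value this thread keeps coming back to. It assumes the usual `ethtool -k` output format with a top-level "tx-checksumming: on|off" line; tx_checksum_state is a made-up helper name.

```shell
#!/bin/sh
# Sketch: read `ethtool -k <dev>` output on stdin and print the
# tx-checksumming state. tx_checksum_state is a hypothetical helper.

tx_checksum_state() {
    awk -F': *' '$1 == "tx-checksumming" { print $2; exit }'
}

# Example on each hypervisor (vif123.0 is just the name used earlier
# in this thread; substitute the pfSense VIF on that host):
#   ethtool -k vif123.0 | tx_checksum_state
#   xe vif-param-list uuid={uuid of vif}
```

Comparing the value between the working host and a failing one shows quickly whether the offload setting actually followed the VM.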
If you could post those 4 outputs, it'd help us diagnose.
-
My bad…
I noticed that the interfaces on the 2 failing XenServer hosts had been reordered for some reason. Correcting this solved my problem, so it was not related to pfSense.
I am thankful for your efforts to help out, and I apologize for confusing you!
-
Glad you got it fixed!
-
Just to keep this updated.
This problem still happens on XenServer 7.0 with pfSense 2.3.1.
-
Yep, until it's fixed in upstream FreeBSD it won't get fixed, ever.
-
Just figured I'd update this thread on these issues. It looks like FreeBSD 11 is adding dom0 support for Xen, so hopefully these issues will be fixed. I'm just getting a virtualized setup going, with 32-bit support ending soon, so I may try pfSense 2.4 to see how it works out of the box with Xen.
Here is a link to the freebsd support, though it will be experimental at this stage:
https://wiki.freebsd.org/Xen