Very slow traffic from other VM's through pfSense on XenServer

corotte

Ok

Did the above fix and it finally works.

Thanks folks !

dsiminiuk

My Internet speed normally is 20 Mb/s down and 2 Mb/s up.

I deployed pfSense 2.2-RELEASE X64 in XenServer 6.5

Without modification, the pfSense 2.2 would only muster 5 Mb/s down, and 0.06 Mb/s up. Painful.

I applied the changes to the LAN side VIF and the upload speed went back to full 2 Mb/s. The WAN speed did not improve.

I applied the changes to the WAN side VIF and the download speed went back up to 20 Mb/s.

Eureka!

Andy_

It's just the tx-offload setting that needs to be changed, rx-offload is fixed-on.

I can confirm the problem and fix with Debian Wheezy/Xen 4.1.4 dom0.

ethtool -K ${dev} tx off in vif-bridge online did the trick.
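
A minimal sketch of how that call could be wrapped before adding it to the hook, assuming the hotplug script exposes the interface name in a variable called dev (as in the command above). The helper name disable_tx_offload and the DRY_RUN switch are made up for illustration and are not part of the stock vif-bridge script:

```shell
#!/bin/sh
# Hypothetical helper, not shipped with Xen.
# With DRY_RUN=1 it only prints the command, otherwise it runs ethtool.
disable_tx_offload() {
    target="$1"
    if [ -z "$target" ]; then
        echo "usage: disable_tx_offload <vif>" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "ethtool -K $target tx off"
    else
        ethtool -K "$target" tx off
    fi
}

# Intended call site (assumption): inside the "online" branch of vif-bridge
#   disable_tx_offload "$dev"
```

Running it once with DRY_RUN=1 shows exactly which interface would be touched before the hook is enabled for real.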

The issue wasn't submitted to freebsd-bugs so far, now it is:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197344

A Former User

Interesting - only appears to apply to virtual interfaces.

My pfSense VM is running in xen 4.2 (Centos 6.6 dom0) and has no speed issues, but I'm using pci-passthrough to give 2 dedicated hardware NICs (off a dual-port Intel card) to pfSense for LAN/WAN (so that DMZ/intranet are physically separate too).

bananaboy

Thanks johnkeats for putting that up here. It really helped me sort this out.

One thing to note is that disabling tx offload using ethtool -K does not persist across guest reboots or live migration, because the dom-id and the assigned vif change, while xe vif-param-set other-config:ethtool-tx="off" does.

Is there any downside to using the vif-param-set option, or are the two basically equivalent?
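
Whichever method is used, the current state can be read back in Dom0 after a reboot or migration to see whether it stuck. A small sketch, assuming ethtool -k prints a line of the form "tx-checksumming: on" or "tx-checksumming: off" for the interface; the function name tx_checksum_state is made up for illustration:

```shell
#!/bin/sh
# Reads `ethtool -k <dev>` output on stdin and prints the value of the
# top-level tx-checksumming line (for example "on" or "off").
tx_checksum_state() {
    awk -F': *' '$1 == "tx-checksumming" { print $2; exit }'
}

# Example (not run here); vif123.0 is a placeholder interface name:
#   ethtool -k vif123.0 | tx_checksum_state
```

If this prints "on" again after the guest has been restarted, the ethtool -K change was lost and has to be reapplied to the new vif.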

bennymundz

@johnkeates:

You only need to disable checksum offloading on the hypervisor side of pfSense's interface.

Any interface that does DomU-DomU communication on pfSense's side produces un-checksummed packets which get dropped by PF in BSD.

sudo ethtool -K $interface tx off

where $interface is the VIF on the Xen Dom0 side is enough. Setting TX off on the bridge forces the Dom0 to calculate ALL checksums on ALL packets no matter where they come from or where they are going. This is not a smart idea since it creates a lot of calculations where they might not be needed. So if the pfSense DomU is on vif123.0 you run: sudo ethtool -K vif123.0 tx off

Sorry noob question here,

I am using a Xen implementation on an unRAID distribution. When you say Dom0 side, are you talking about the VIF that is spun up with the pfSense VM? When I run ifconfig to list my interfaces, I don't really know how to identify the interface you are referring to.

Sorry for the noob question again

Derelict

It's all here:

https://forum.pfsense.org/index.php?topic=85797.msg475906#msg475906

I recently rebuilt my test stack and all I did was turn off tx and rx on every NIC, which is still probably more than is necessary, but it worked.

bennymundz

@johnkeates:

You only need to disable checksum offloading on the hypervisor side of pfSense's interface.

Any interface that does DomU-DomU communication on pfSense's side produces un-checksummed packets which get dropped by PF in BSD.

sudo ethtool -K $interface tx off

where $interface is the VIF on the Xen Dom0 side is enough. Setting TX off on the bridge forces the Dom0 to calculate ALL checksums on ALL packets no matter where they come from or where they are going. This is not a smart idea since it creates a lot of calculations where they might not be needed. So if the pfSense DomU is on vif123.0 you run: sudo ethtool -K vif123.0 tx off

Thank you for taking the time to explain this. I turned TX off on the pfSense vif and all was good. Happy days
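
For anyone else unsure which interface that is: on the Dom0 side the guest's interfaces are usually named vif<domain ID>.<device index>, which is where vif123.0 in the quoted example comes from. A minimal sketch of building that name, assuming the xl toolstack and a guest called "pfsense" (both are assumptions, adjust for your setup); the helper name vif_name is made up for illustration:

```shell
#!/bin/sh
# Build the Dom0-side interface name for a guest NIC.
# $1 = domain ID of the guest, $2 = device index of the NIC (0 = first NIC).
vif_name() {
    printf 'vif%s.%s\n' "$1" "$2"
}

# Example (not run here): look up the domain ID, then turn tx offload off
# on the guest's first NIC.
#   domid=$(xl domid pfsense)
#   ethtool -K "$(vif_name "$domid" 0)" tx off
```

The domain ID changes every time the guest is started, so the name has to be looked up again after each reboot.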

BBMitch

Hello all…

Thanks for the information - sure helped us solve this but I have some more information that wasn't clear to me from all posted here.

This issue only seems to apply where Pf is communicating with hosts within the same xen host (dom0).

We use xenserver 6.2 fwiw. We have two xen dom0 - pf was natting for two services - one on dom0-a and one on dom0-b

pf itself was located on dom0-b
The dom0-a service worked perfectly after the update to 2.2.2 - the dom0-b service did not.

For people new to xenserver / for completeness, we used:
xe vm-list
#then find the uuid of your pf vm
xe vif-list vm-uuid={uuid of the vm from above}
#note the uuid of the vif - not the network you want to change!
#for each vif you can check the status:
xe vif-param-get uuid={uuid of vif} param-name=other-config
xe vif-param-set uuid={uuid of vif} other-config:ethtool-tx="off"

For what it's worth I was able to turn off tx on only the LAN interface (which nats for the dom0-b service).

I tried but did not need to keep offload off for the WAN interface which seems to get proper checksum as it leaves the dom0 through the physical nic.

Once complete you need to reboot the pf VM. The setting will persist across reboots.
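
The xe steps above can be sketched as a small loop for a VM with several VIFs. This is only an illustration under stated assumptions: the function name set_tx_off_for_vifs and the DRY_RUN switch are made up, and "pfsense" is a placeholder name-label. It relies on xe returning a comma-separated UUID list when --minimal is given:

```shell
#!/bin/sh
# Takes a comma-separated list of VIF UUIDs (the format `xe ... --minimal`
# returns). With DRY_RUN=1 it prints one xe command per VIF, otherwise it
# runs them.
set_tx_off_for_vifs() {
    vif_list="$1"
    old_ifs="$IFS"
    IFS=','
    for vif_uuid in $vif_list; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "xe vif-param-set uuid=$vif_uuid other-config:ethtool-tx=off"
        else
            xe vif-param-set uuid="$vif_uuid" other-config:ethtool-tx="off"
        fi
    done
    IFS="$old_ifs"
}

# Example (not run here):
#   vm_uuid=$(xe vm-list name-label=pfsense --minimal)
#   set_tx_off_for_vifs "$(xe vif-list vm-uuid="$vm_uuid" --minimal)"
```

This turns tx offload off on every VIF of the VM, which is more than strictly needed according to the notes above, so pass only the UUIDs you actually want changed if you prefer to limit it to the LAN side.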

Hope that helps someone else :-)

Mitch

Gr1pen

I've been running pfsense 2.2 on XenServer 6.2 for a while with the mentioned offloads disabled and it's been working great. I believe since I upgraded to XenServer 6.5 (or when I upgraded to 6.5 SP1) pfsense only works as before on one specific host in the pool. I have 3 hosts in the pool and when pfsense is running on 2 of them it is very slow, but on the 3rd host it works fine.

How come?

BBMitch

Without knowing your network I can only guess… but see if this makes sense.

What I found was that if the pfsense was routing traffic for vm's on other systems (outside the xen box itself) then things worked - the offload worked as expected as the offload is added at the nic as the data leaves the xen server.

When I was routing traffic that stayed within the virtual network on the same Xen host, it didn't work until I disabled the offloads. In my opinion you only need to disable them on the paths where you see the performance issues, but you have to think it through.

Cheers.

Derelict

The stack in the diagram in my sig is all on XenServer 6.5. Works fine as long as the checksumming is turned off.

Gr1pen

Well, this issue is when traffic flows from external machines through pfsense wan-interface to resources on the internal lan.

The host where this works has different hardware (including different NICs) than the other two hosts in the pool. So when I migrate or restart pfsense on host 1 or 2, I don't get through the firewall from the outside (i.e. it's so slow that it doesn't work). But with pfsense on host 3 it works as expected.

Before it worked on all 3 hosts. Now the pfsense is not protected against host failure.

Guest

@Gr1pen:

Well, this issue is when traffic flows from external machines through pfsense wan-interface to resources on the internal lan.

The host where this works has different hardware (including different NICs) than the other two hosts in the pool. So when I migrate or restart pfsense on host 1 or 2, I don't get through the firewall from the outside (i.e. it's so slow that it doesn't work). But with pfsense on host 3 it works as expected.

Before it worked on all 3 hosts. Now the pfsense is not protected against host failure.

What are the eth specs when it's failing? And is it a live migration or a shutdown-boot migration?
If you want to protect against failure, it's better to use pfSense's failover options instead of hypervisor-based failover.

BBMitch

I think he was trying to do that but he perceived one pfsense to work and two others not to work.

I'll try to explain it another way… the interface (if any) which transmits traffic to machines on the same physical Xen server needs to have tx checksums turned off, as I noted in my post. That's the only interface affected.

If you have a pf on xen and it does not route for any hosts on the same xen box you don't see any problem.

This would affect any traffic to which checksums would be applicable (all I think?) - so it would affect carp traffic too I imagine IF your pf boxes were on the same network - if they are on different boxes the carp traffic will be fine.

Just turn off the tx checksums for all the pfsense interfaces if you don't understand what I mean - the method I described survives rebooting and only affects the pf VMs you apply the changes to.

Hope that clarifies. Cheers.

Gr1pen

Perhaps my explanation was not so clear. The offload settings mentioned here have been applied on all interfaces of pf from the start, when I was running it on XenServer 6.2. That fixed the problem then and pf worked perfectly fine on all 3 hosts. It was like living in a dream where the streets were paved with gold and there was free candy for everyone.

After upgrading to XS 6.5/SP1 pf only works on 1 host. It doesn't matter if I live migrate or shut down and restart on another host. It ONLY works on "host 3".

I am only running 1 instance of pfsense and sure, it may be better running 2 or more in a HA setup, but that's not really the question here. I had a fine working setup. But not anymore. The candy is all gone and the only change is XS, which has been upgraded.

In reply to johnkeates: I don't know what eth spec I should look into…?

Guest

@Gr1pen:

In reply to johnkeates: I don't know what eth spec I should look into…?

Use XE to get all the vif specs from the working pf hypervisor and one non-functional hypervisor, as well as ethtool parameters for both.
We're looking for other variables that might mess with the in-memory transport, because that's where VirtIO related issues seem to lie.
If you could post those 4 outputs it'd help us diagnose.

Gr1pen

My bad…

I noticed that the interfaces on the 2 failing XenServer hosts were reordered for some reason. Correcting this solved my problem, hence it was not related to pfsense.

I am thankful for your effort to help out and apologize for confusing you!

Guest

@Gr1pen:

My bad…

I noticed that the interfaces on the 2 failing XenServer hosts were reordered for some reason. Correcting this solved my problem, hence it was not related to pfsense.

I am thankful for your effort to help out and apologize for confusing you!

Glad you got it fixed!

viniciusferrao

Just to keep this updated.

This problem still happens on XenServer 7.0 with pfSense 2.3.1.