Very slow traffic from other VMs through pfSense on XenServer



  • I have 2 XenServers, one with XenServer 6.2 and one with XenServer Creedence beta 3.

    Both have a pfSense 2.2 RC as router/firewall, a couple of Ubuntu Linux VMs and a Windows VM.

    Traffic through both the physical XenServer box and the virtual pfSense firewall runs at the expected speeds.

    But traffic from the other VMs on the same XenServer through pfSense out to the WAN/internet is very, very slow.
    It is so bad that they cannot update themselves with apt-get.

    When I test with iperf from a Linux VM through pfSense's WAN, the speed is 3,82 KBits/sec.
    The VMs and pfSense are connected by an internal single-server network (as OPT1), and tests from a Linux VM against an iperf server running on pfSense show gigabit speed.

    One of the pfSense VMs has xen-tools installed. The other does not. I cannot see any improvement with the tools installed.

    One of the XenServers can get several public IP addresses. On that one I have now installed VMs with both an IPCop firewall and a Zentyal firewall.
    When one of those new firewall VMs is the default gateway for the ordinary VMs on the XenServer, their WAN/internet speed is normal.

    Does anybody with experience of XenServer as a hypervisor have a direction I could experiment in to get traffic from VMs on the same XenServer through pfSense up to a useful performance?



  • Try disabling hardware checksum offloading under System>Advanced, Networking. TSO and LRO should also be disabled, though they likely already are since that's the default for those.

    Which type of NIC is showing up in the VM? re0, em0, xn0?



  • Sorry.

    Tried disabling hardware checksum offloading. The other 2 were disabled by default.

    It did not fix the problem.

    NICs in the pfSense VM are xn0 to xn3



  • New test with pfSense 2.1.
    Here internet traffic from other VMs on the same XenServer is normal.

    The problem seems to be new in pfSense 2.2.



  • 2.1x wouldn't have xn NICs, it's specific to that. Can you force it to e1000 NICs on 2.2 and see?



  • 2.1x wouldn't have xn NICs, it's specific to that. Can you force it to e1000 NICs on 2.2 and see?

    On my 2.1.5 the NICs are called re. Can you give me some hints on where and how to change the driver?



  • Hi,

    I have the same problem with 2.2 RC (XenServer 6.2, SP1016, different platforms and NICs). The problem is the offload engine. If you route traffic between virtual hosts, you get TCP retransmissions, and only a few sessions survive.

    You have to disable the offload function on the VIF on the XenServer.
    First identify the UUID of the VIFs:

    xe vm-vif-list uuid=VMUUID

    And disable the offload settings:

    xe vif-param-set uuid=VIFUUID other-config:ethtool-gso="off"
    xe vif-param-set uuid=VIFUUID other-config:ethtool-ufo="off"
    xe vif-param-set uuid=VIFUUID other-config:ethtool-tso="off"
    xe vif-param-set uuid=VIFUUID other-config:ethtool-sg="off"
    xe vif-param-set uuid=VIFUUID other-config:ethtool-tx="off"
    xe vif-param-set uuid=VIFUUID other-config:ethtool-rx="off"

    Then shut down and start the VM.
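    The six settings above differ only in the option name, so they can be generated in a loop. This is a minimal sketch and not part of the original post: print_offload_cmds is a hypothetical helper name, and it only prints the commands for review instead of running them.

```shell
# Hypothetical helper: print the six vif-param-set commands for one VIF.
# $1 = UUID of the VIF (VIFUUID in the post above)
print_offload_cmds() {
    vif_uuid=$1
    # Same option names as in the commands listed in the post
    for opt in gso ufo tso sg tx rx; do
        printf 'xe vif-param-set uuid=%s other-config:ethtool-%s="off"\n' \
            "$vif_uuid" "$opt"
    done
}

# Example: review the commands for a placeholder UUID
print_offload_cmds VIFUUID
```

    After checking the printed lines, they can be pasted into a dom0 shell, with VIFUUID replaced by the real VIF UUID.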

    And now the disadvantage: without the offload engine, TCP throughput over the vswitch falls to gigabit level. With offload I reach over 371 MBps with fetch, downloading the xencenter.iso from dom0 via HTTP; without it, 98 MBps.

    So if anyone has a better solution, bring it on!


  • Netgate

    This all worked for me on the test stack I use which is now all 2.2-RELEASE.  I don't really care about performance much in this application, but before I did this it was useless.  Thanks much.



  • First identify the uuid of the VIF's:
    xe vm-vif-list uuid=VMUUID

    And disable the offload settings:
    xe vif-param-set uuid=VIFUUID other-config:ethtool-gso="off"
    xe vif-param-set uuid=VIFUUID other-config:ethtool-ufo="off"
    xe vif-param-set uuid=VIFUUID other-config:ethtool-tso="off"
    xe vif-param-set uuid=VIFUUID other-config:ethtool-sg="off"
    xe vif-param-set uuid=VIFUUID other-config:ethtool-tx="off"
    xe vif-param-set uuid=VIFUUID other-config:ethtool-rx="off"

    shutdown / start the VM

    I used this on both a XenServer 6.5 and a 6.2 later upgraded to 6.5. On both it has given the other VMs internet access again.

    I ran the xe commands on a XenServer private network, so I hope the speed degradation will only occur on traffic that involves that network.
    I think both the pfSense VM and the other VMs need to be restarted to get useful speed.



  • @phadm:

    You have to disable the offload function at the VIF at the XenServer.
    First identify the uuid of the VIF's:

    Which VIF? Local or WAN or both?

    Thanks,
    Florian


  • Netgate

    I did it on all.



  • This helped me too. I only did this for my LAN port.

    In my setup it seemed to be sufficient to execute:
    xe vif-param-set uuid=VIFUUID other-config:ethtool-tx="off"
    xe vif-param-set uuid=VIFUUID other-config:ethtool-rx="off"



  • @jpenninkhof:

    This helped me too. I only did this for my LAN port.

    In my setup it seemed to be sufficient to execute:
    xe vif-param-set uuid=VIFUUID other-config:ethtool-tx="off"
    xe vif-param-set uuid=VIFUUID other-config:ethtool-rx="off"

    I can confirm that the LAN port should be enough. On a related note, did someone install the XenServer Tools in the VM?



  • Hi,

    I updated my XenServer 6.2 to 6.5 a few days ago with my pfSense 2.1.5 VM, with no issues.

    Then I updated pfSense to 2.2 WITH XENTOOLS (xe-guest-utilities 6.0.2_3) and got the same issue!

    I installed the Xen tools using this method: http://blog.feld.me/posts/2014/07/pfsense-on-citrix-xenserver/ (Thanks feld!)

    It looks like the issue remains even with the Xen tools :/

    Can anyone confirm?


  • Netgate

    Yes.  It's broken.



  • Damn!

    But a question remains: was it working well in the snapshots? Was it working well with a previous version of the Xen tools?

    In this thread
    https://forum.pfsense.org/index.php?topic=86827.0
    it looks like an issue with the xn NIC.
    Maybe a previous version would work?


  • Netgate

    No.

    Just disable the tx/rx like in the above until FreeBSD and/or Citrix fixes it.



  • Ok

    I did the above fix and it finally works.

    Thanks folks !



  • My Internet speed normally is 20 Mb/s down and 2 Mb/s up.

    I deployed pfSense 2.2-RELEASE X64 in XenServer 6.5

    Without modification, the pfSense 2.2 would only muster 5 Mb/s down, and 0.06 Mb/s up. Painful.

    I applied the changes to the LAN side VIF and the upload speed went back to the full 2 Mb/s. The download speed did not improve.

    I applied the changes to the WAN side VIF and the download speed went back up to 20 Mb/s.

    Eureka!



  • It's just the tx-offload setting that needs to be changed, rx-offload is fixed-on.

    I can confirm the problem and fix with Debian Wheezy/Xen 4.1.4 dom0.

    Adding "ethtool -K ${dev} tx off" to the online section of vif-bridge did the trick.

    The issue wasn't submitted to freebsd-bugs so far, now it is:
    https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197344



  • Interesting - only appears to apply to virtual interfaces.

    My pfSense VM is running in Xen 4.2 (CentOS 6.6 dom0) and has no speed issues, but I'm using PCI passthrough to give 2 dedicated hardware NICs (off a dual-port Intel card) to pfSense for LAN/WAN (so that DMZ/intranet are physically separate too).



  • Thanks johnkeates for putting that up here. It really helped me sort this out.

    One thing to note is that disabling tx offload using ethtool -K does not persist across guest reboots or live migration, because the dom-id and the assigned vif change, while xe vif-param-set other-config:ethtool-tx="off" does.

    Is there any downside to using the vif-param-set option, or are the two basically equivalent?



  • @johnkeates:

    You only need to disable checksum offloading on the hypervisor side of pfSense's interface.

    Any interface that does DomU-DomU communication on pfSense's side produces un-checksummed packets which get dropped by PF in BSD.

    sudo ethtool -K $interface tx off

    where $interface is the VIF on the Xen Dom0 side, is enough. Setting TX off on the bridge forces the Dom0 to calculate ALL checksums on ALL packets, no matter where they come from or where they are going. This is not a smart idea since it creates a lot of calculations where they might not be needed. So if the pfSense DomU is on vif123.0 you run: sudo ethtool -K vif123.0 tx off

    Sorry, noob question here.

    I am using a Xen implementation on an unRAID distribution. When you say Dom0 side, are you talking about the VIF that is spun up with the pfSense VM? When I run ifconfig to list my interfaces, I don't really know how to identify the interface you are referring to.

    Sorry for the noob question again.
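    For readers with the same question: on a Xen dom0 the backend interface of a guest is conventionally named vif<domid>.<index>, unless a custom vifname is configured. A minimal sketch of how the name is put together follows. vif_name is a hypothetical helper, and it assumes the domain ID has been read from the ID column of "xl list".

```shell
# Hypothetical helper: build the dom0-side interface name for a guest NIC.
# $1 = domain ID of the pfSense guest (ID column of "xl list")
# $2 = index of the guest's network device (0 for the first NIC)
vif_name() {
    printf 'vif%s.%s\n' "$1" "$2"
}

# Example: print the ethtool command for the first NIC of domain 123.
# Only prints the command; run it in dom0 after checking the name
# against the interfaces that ifconfig lists.
printf 'ethtool -K %s tx off\n' "$(vif_name 123 0)"
```

    The domain ID changes whenever the guest is restarted, so the name has to be looked up again after each reboot.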


  • Netgate

    It's all here:

    https://forum.pfsense.org/index.php?topic=85797.msg475906#msg475906

    I recently just rebuilt my test stack and all I did was the tx and rx on every NIC which is still probably more than is necessary but it worked.



  • @johnkeates:

    You only need to disable checksum offloading on the hypervisor side of pfSense's interface.

    Any interface that does DomU-DomU communication on pfSense's side produces un-checksummed packets which get dropped by PF in BSD.

    sudo ethtool -K $interface tx off

    where $interface is the VIF on the Xen Dom0 side, is enough. Setting TX off on the bridge forces the Dom0 to calculate ALL checksums on ALL packets, no matter where they come from or where they are going. This is not a smart idea since it creates a lot of calculations where they might not be needed. So if the pfSense DomU is on vif123.0 you run: sudo ethtool -K vif123.0 tx off

    Thank you for taking the time to explain this. I turned TX off on the pfSense vif and all was good. Happy days.



  • Hello all…

    Thanks for the information - sure helped us solve this but I have some more information that wasn't clear to me from all posted here.

    This issue only seems to apply where Pf is communicating with hosts within the same xen host (dom0).

    We use xenserver 6.2 fwiw. We have two xen dom0 - pf was natting for two services - one on dom0-a and one on dom0-b

    pf itself was located on dom0-b
    The dom0-a service worked perfectly after the update to 2.2.2 - the dom0-b service did not.

    For people new to xenserver / for completeness, we used:
    xe vm-list
    #then find the uuid of your pf vm
    xe vif-list vm-uuid={uuid of the vm from above}
    #note the uuid of the vif you want to change (the vif's uuid, not the network's)
    #for each vif you can check the current settings:
    xe vif-param-get uuid={uuid of vif} param-name=other-config
    #then turn off tx offload on that vif:
    xe vif-param-set uuid={uuid of vif} other-config:ethtool-tx="off"

    For what it's worth I was able to turn off tx on only the LAN interface (which nats for the dom0-b service).

    I tried but did not need to keep offload off for the WAN interface which seems to get proper checksum as it leaves the dom0 through the physical nic.

    Once complete, you need to reboot the pf VM. The setting will persist across reboots.
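    The lookup and the setting can also be combined in one small function. This is a sketch under assumptions, not the exact commands used above: disable_tx_offload is a hypothetical name, it relies on the --minimal option of xe to get the VIF UUIDs as one comma-separated line, and it changes every VIF of the VM rather than only the LAN one.

```shell
# Hypothetical helper: turn off tx offload on all VIFs of one VM.
# Meant to run in dom0, where the xe CLI is available.
# $1 = UUID of the pf VM (from "xe vm-list")
disable_tx_offload() {
    vm_uuid=$1
    # --minimal prints the VIF UUIDs as a single comma-separated line
    vifs=$(xe vif-list vm-uuid="$vm_uuid" --minimal) || return 1
    old_ifs=$IFS
    IFS=,
    for vif in $vifs; do
        IFS=$old_ifs
        xe vif-param-set uuid="$vif" other-config:ethtool-tx=off
    done
    IFS=$old_ifs
}

# Usage (in dom0): disable_tx_offload {uuid of the vm}
```

    To limit the change to a single interface, as described above for the LAN side, run the vif-param-set line by hand with that VIF's UUID instead.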

    Hope that helps someone else :-)

    Mitch



  • I've been running pfSense 2.2 on XenServer 6.2 for a while with the mentioned offloads disabled and it's been working great. I believe that since I upgraded to XenServer 6.5 (or since I upgraded to 6.5 SP1), pfSense only works as before on one specific host in the pool. I have 3 hosts in the pool; when pfSense runs on 2 of them it is very slow, but on the 3rd host it works fine.

    How come?



  • Without knowing your network I can only guess… but see if this makes sense.

    What I found was that if the pfsense was routing traffic for vm's on other systems (outside the xen box itself) then things worked - the offload worked as expected as the offload is added at the nic as the data leaves the xen server.

    When I was routing traffic that was contained by the virtual network on the same xen host, that's when it didn't work - until I disabled the offloads - you only need to disable on the paths which you see the performance issues in my opinion - but you have to think it through.

    Cheers.


  • Netgate

    The stack in the diagram in my sig is all on XenServer 6.5.  Works fine as long as the checksumming is turned off.



  • Well, this issue occurs when traffic flows from external machines through the pfSense WAN interface to resources on the internal LAN.

    The host where this works has different hardware (including different NICs) than the other two hosts in the pool. So when I migrate or restart pfSense on host 1 or 2, I can't get through the firewall from the outside (i.e. it's so slow that it doesn't work). But with pfSense on host 3 it works as expected.

    Before, it worked on all 3 hosts. Now pfSense is not protected against host failure.



  • @Gr1pen:

    Well, this issue occurs when traffic flows from external machines through the pfSense WAN interface to resources on the internal LAN.

    The host where this works has different hardware (including different NICs) than the other two hosts in the pool. So when I migrate or restart pfSense on host 1 or 2, I can't get through the firewall from the outside (i.e. it's so slow that it doesn't work). But with pfSense on host 3 it works as expected.

    Before, it worked on all 3 hosts. Now pfSense is not protected against host failure.

    What are the eth specs when it's failing? And is it a live migration or a shutdown-boot migration?
    If you want to protect against failure, it's better to use pfSense's failover options instead of hypervisor-based failover.



  • I think he was trying to do that, but he perceived one pfSense to work and two others not to work.

    I'll try to explain it another way: the interface (if any) which transmits traffic to machines on the same physical Xen server needs to have tx checksums turned off, as I noted in my post. That's the only interface affected.

    If you have a pf on Xen and it does not route for any hosts on the same Xen box, you don't see any problem.

    This would affect any traffic to which checksums apply (all, I think?), so I imagine it would affect CARP traffic too IF your pf boxes were on the same network. If they are on different boxes, the CARP traffic will be fine.

    Just turn off the tx checksums for all the pfSense interfaces if you don't understand what I mean. The method I described survives rebooting and only affects the pf VMs you apply the changes to.

    Hope that clarifies. Cheers.



  • Perhaps my explanation was not so clear. The offload settings mentioned here have been applied on all interfaces of pf from the start, when I was running it on XenServer 6.2. That fixed the problem then and pf worked perfectly fine on all 3 hosts. It was like living in a dream where the streets were paved with gold and there was free candy for everyone.

    After upgrading to XS 6.5/SP1, pf only works on 1 host. It doesn't matter if I live migrate or shut down and restart on another host. It ONLY works on "host 3".

    I am only running 1 instance of pfSense, and sure, it may be better to run 2 or more in an HA setup, but that's not really the question here. I had a fine working setup. But not anymore. The candy is all gone and the only change is that XS has been upgraded.

    In reply to johnkeates: I don't know which eth specs I should look into…?



  • @Gr1pen:

    In reply to johnkeates: I don't know which eth specs I should look into…?

    Use XE to get all the vif specs from the working pf hypervisor and one non-functional hypervisor, as well as ethtool parameters for both.
    We're looking for other variables that might mess with the in-memory transport, because that's where VirtIO related issues seem to lie.
    If you could post those 4 outputs it'd help us diagnose.
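    One possible way to collect those outputs on each host is sketched below. collect_vif_specs is a hypothetical helper, and it assumes both xe and ethtool are available in dom0; "ethtool -k" (lowercase k) only reads the current offload settings and changes nothing.

```shell
# Hypothetical helper: dump VIF settings and offload flags for comparison.
# $1 = UUID of the pfSense VM
# $2 = dom0-side interface name of one pfSense VIF, e.g. vif12.0
collect_vif_specs() {
    echo "== xe vif-list for VM $1 =="
    xe vif-list vm-uuid="$1" params=all
    echo "== ethtool -k $2 =="
    ethtool -k "$2"
}

# Usage (in dom0, once on the working host and once on a failing one):
#   collect_vif_specs {uuid of the vm} vif12.0 > specs-$(hostname).txt
```

    Comparing the two files with diff should show any setting that differs between the working and the failing host.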



  • My bad…

    I noticed that the interfaces on the 2 failing XenServer hosts had been reordered for some reason. Correcting this solved my problem, hence it was not related to pfSense.

    I am thankful for your effort to help out and apologize for confusing you!



  • @Gr1pen:

    My bad…

    I noticed that the interfaces on the 2 failing XenServer hosts had been reordered for some reason. Correcting this solved my problem, hence it was not related to pfSense.

    I am thankful for your effort to help out and apologize for confusing you!

    Glad you got it fixed!



  • Just to keep this updated.

    This problem still happens on XenServer 7.0 with pfSense 2.3.1.



  • @viniciusferrao:

    Just to keep this updated.

    This problem still happens on XenServer 7.0 with pfSense 2.3.1.

    Yep, until it's fixed in upstream FreeBSD it won't get fixed, ever.



  • @johnkeates:

    @viniciusferrao:

    Just to keep this updated.

    This problem still happens on XenServer 7.0 with pfSense 2.3.1.

    Yep, until it's fixed in upstream FreeBSD it won't get fixed, ever.

    Just figured I'd update this thread on these issues. It looks like FreeBSD 11 is adding dom0 support for Xen, so hopefully these issues will be fixed. I'm just getting a virtualized setup going, with support for 32-bit ending soon, so I may try pfSense 2.4 to see how it works out of the box with Xen.

    Here is a link to the freebsd support, though it will be experimental at this stage:

    https://wiki.freebsd.org/Xen



  • @johnkeates:

    @gothicman02:

    @johnkeates:

    @viniciusferrao:

    Just to keep this updated.

    This problem still happens on XenServer 7.0 with pfSense 2.3.1.

    Yep, until it's fixed in upstream FreeBSD it won't get fixed, ever.

    Just figured I'd update this thread on these issues. It looks like FreeBSD 11 is adding dom0 support for Xen, so hopefully these issues will be fixed. I'm just getting a virtualized setup going, with support for 32-bit ending soon, so I may try pfSense 2.4 to see how it works out of the box with Xen.

    Here is a link to the freebsd support, though it will be experimental at this stage:

    https://wiki.freebsd.org/Xen

    I suppose that could actually fix the netback/netfront problems because it will be BSD on the other end too. Interesting.

    Yes, very. Although there is still some work to do. I got the latest 2.4 snapshot running (as of March 18th) with FreeBSD 11.0-p8 under XenServer 7.1 with all patches, and the issues with checksum offloading still exist. Disabling it still fixes the issue, though only on the rx and tx side, but I do believe there is a slight performance drop, as others have said here. I haven't tested local file transfers yet, but I do notice a slight drop in internet bandwidth. I'll do more testing when I have time.