Unbelievably bad performance



  • I am getting horrendous performance with the RC in XenServer 6.5.

    I took a snapshot of a running 2.1.5 pfSense VM, then upgraded to today's 2.2 RC. Performance is AWFUL. I can't pass any traffic except ICMP from the other VMs behind the firewall VM on the same hypervisor.

    The config is literally this simple: NAT 80 and 443 to a VM. That's it. No packages installed, nothing. Pretty much as simple as it gets. Single WAN, single LAN.

    Here's a packet capture.. it's weird.

    http://i.imgur.com/VLIosF1.png

    I tried telnet just as a simple test (http://pastebin.com/Pp9sbyVy) - I was staring at console and immediately "quit" as soon as it connected. That's how long it took, when it worked.

    Is anyone else experiencing this? I am getting it with and without xen tools.



  • On a related/unrelated note, the interfaces show up as "manual" only. Not sure if that's a freebsd-on-xen bug or not, haven't touched it much yet. Will play with that some more in the next few days perhaps.


  • LAYER 8 Global Moderator

    where did you take that sniff?  Why are you seeing 65.98.6.46 sending to 65.98.6.38 ?  I would assume that is on the same segment..  is that pfsense lan 65.98.6.46?

    Then see same IP sending traffic to a 10 address?

    Then also seeing traffic from a 202 to that 6.38 address?

    Could you give a basic layout with pfsense interfaces and who is trying to talk to who?  And where you did this sniff?



  • EDIT:

    Found the ML thread (http://lists.pfsense.org/pipermail/list/2014-December/007785.html)

    If the VM is left unbootable after upgrading from 2.1.5 to 2.2, type ufs:/dev/ada0s1a at the mountroot prompt and it will boot. 2.1 saw the disk as ad0s1a, and 2.2 sees it as ada0s1a. Once it boots, just fix your fstab to match.
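A minimal sketch of the fstab fix in Python, assuming the only change needed is the ad to ada prefix with the same unit number (as in this thread, where ad0s1a becomes ada0s1a). The function name `rewrite_fstab` and the sample fstab lines are made up for illustration; check the device names on your own system before writing anything back to /etc/fstab.

```python
import re


def rewrite_fstab(text: str) -> str:
    """Rename legacy /dev/adN... device nodes to /dev/adaN... (same unit number assumed)."""
    # "/dev/ad" must be followed directly by a digit, so an entry that already
    # reads /dev/ada0... is left alone and the rewrite is safe to run twice.
    return re.sub(r"/dev/ad(\d)", r"/dev/ada\1", text)


# Hypothetical fstab contents from a 2.1.x install.
old_fstab = (
    "/dev/ad0s1a\t/\tufs\trw\t1\t1\n"
    "/dev/ad0s1b\tnone\tswap\tsw\t0\t0\n"
)
new_fstab = rewrite_fstab(old_fstab)
print(new_fstab)
```

Running it on the output a second time changes nothing, which makes it harmless to apply to a file that was already fixed by hand.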

    @johnpoz:

    where did you take that sniff?  Why are you seeing 65.98.6.46 sending to 65.98.6.38 ?  I would assume that is on the same segment..  is that pfsense lan 65.98.6.46?

    Then see same IP sending traffic to a 10 address?

    Then also seeing traffic from a 202 to that 6.38 address?

    Could you give a basic layout with pfsense interfaces and who is trying to talk to who?  And where you did this sniff?

    1. 65.98.6.46 is another box in the same rack I am working with (my console machine).

    2. Capture was from pfsense itself

    3. It is internet facing, 65.98.6.38 is the WAN IP for the test.

    Internet --> Hypervisor --> (xn1) pfSense VM (xn0) <-- VMs

    xn0 is a "single server private network" in XenServer, so it is not connected to any physical interface(s)


  • LAYER 8 Global Moderator

    ok so you sniffed on wan of pfsense that 6.38 address and saw no answer.  So from that its hard to tell if something just didn't answer or you don't have rules setup.  So you forward that to something inside - did you sniff on the lan side of the pfsense box (xn0)?  Did pfsense not send the traffic to where you were forwarding it?

    can you post your wan and forwarding rules?



  • @johnpoz:

    ok so you sniffed on wan of pfsense that 6.38 address and saw no answer.  So from that its hard to tell if something just didn't answer or you don't have rules setup.  So you forward that to something inside - did you sniff on the lan side of the pfsense box (xn0)?  Did pfsense not send the traffic to where you were forwarding it?

    can you post your wan and forwarding rules?

    here you go.

    i set up a brand new VM with just these rules (attached) with same slowness

    it passed some of the telnet requests, after 45-90 seconds, and others timed out. see the above pastebin link in initial post

    i ran the packet capture in the web ui which said it would check all interfaces for it.

    ![Screen Shot 2015-01-15 at 1.19.36 PM.png](/public/imported_attachments/1/Screen Shot 2015-01-15 at 1.19.36 PM.png)
    ![Screen Shot 2015-01-15 at 1.19.27 PM.png](/public/imported_attachments/1/Screen Shot 2015-01-15 at 1.19.27 PM.png)


  • LAYER 8 Global Moderator

    Ok so you're forwarding to 10.166.109.1

    I see traffic to that.. But it never answers..  So pfsense sent the traffic to 10.166.109.1 - but it never answers.. so pfsense seems to be forwarding correctly.

    Look at the image you posted..  for every syn and retrans of the syn that hits 65.98.6.38, you see traffic sent to 10.166.109.1 from 65.98.6.46, that was the sender to 65.98.6.38.  That looks to be the forward to me.

    So whatever is supposed to be listening on 80 is not, or it's firewalled and doesn't allow from 65.98, etc..  Maybe 109.1 is the wrong IP? Maybe there is something wrong with your vm setup so that it's not getting the traffic.  Or maybe 109.1 doesn't have a gateway?  So it doesn't know how to send traffic back to the 65.98 network?  You could always sniff on the 10.166.109.1 box to check.  But it sure looks like pfsense is doing what you asked it to do.



  • @johnpoz:

    Ok so you're forwarding to 10.166.109.1

    I see traffic to that.. But it never answers..  So pfsense sent the traffic to 10.166.109.1 - but it never answers.. so pfsense seems to be forwarding correctly.

    Look at the image you posted..  for every syn and retrans of the syn that hits 65.98.6.38, you see traffic sent to 10.166.109.1 from 65.98.6.46, that was the sender to 65.98.6.38.  That looks to be the forward to me.

    So whatever is supposed to be listening on 80 is not, or it's firewalled and doesn't allow from 65.98, etc..  Maybe 109.1 is the wrong IP? Maybe there is something wrong with your vm setup so that it's not getting the traffic.  Or maybe 109.1 doesn't have a gateway?  So it doesn't know how to send traffic back to the 65.98 network?  You could always sniff on the 10.166.109.1 box to check.  But it sure looks like pfsense is doing what you asked it to do.

    I had this exact configuration on a virtual machine running 2.1. As a matter of fact, I created this configuration on 2.1, upgraded to 2.2, and it no longer works. So I do believe there is more to it than meets the eye.

    In addition, I have confirmed from one virtual machine to another behind the firewall that the web server is listening, responds properly to requests, and has the correct gateway set.

    Pardon me if there are typos in here, I am using voice dictation at the moment.


  • Netgate Administrator

    Was there not some issue with the Xen nic drivers? Was the 2.1.X vm using xn nics?

    https://forum.pfsense.org/index.php?topic=84255.0

    Steve


  • LAYER 8 Global Moderator

    "upgraded to 2.2, and it no longer works."

    All I can tell you from the sniff you posted is the traffic looks to have been sent on.  Did it go out the right interface?  I am not sure from that sniff.. But clearly the packets were forwarded to the IP.  For example the top 2, you see the syn to 65.98.6.38, and then .000066 seconds later a packet sent to 10.166.109.1

    This tells me pfsense forwarded the packet - but I can not tell from the picture what interface that was captured on, if I could see the mac address for example I would know what interface it left on, etc.

    From what I see in the sniff, the problem is either with the 109.1 box getting the packet after it left pfsense, or with the answer.  Lots of things could cause that - but then again I can not be sure from the image that the packet went out the correct interface.  What kind of filter did you use for the sniff?  I don't see any sort of broadcast traffic or other traffic that would validate that pfsense is seeing any traffic from 109.1 at all?



  • @johnpoz:

    "upgraded to 2.2, and it no longer works."

    All I can tell you from the sniff you posted is the traffic looks to have been sent on.  Did it go out the right interface?  I am not sure from that sniff.. But clearly the packets were forwarded to the IP.  For example the top 2, you see the syn to 65.98.6.38, and then .000066 seconds later a packet sent to 10.166.109.1

    This tells me pfsense forwarded the packet - but I can not tell from the picture what interface that was captured on, if I could see the mac address for example I would know what interface it left on, etc.

    From what I see in the sniff, the problem is either with the 109.1 box getting the packet after it left pfsense, or with the answer.  Lots of things could cause that - but then again I can not be sure from the image that the packet went out the correct interface.  What kind of filter did you use for the sniff?  I don't see any sort of broadcast traffic or other traffic that would validate that pfsense is seeing any traffic from 109.1 at all?

    I told it to capture 80 only. I'll capture *.


  • LAYER 8 Global Moderator

    Do 2 distinct captures.. It's easier to read that way.. Do one on the wan and one on the lan.. I just use tcpdump from an ssh connection to do it.

    Or post up the actual capture so can see the mac - so you can validate it forwarded it out the correct interface.
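A rough sketch of the two-capture approach, assuming xn1 is WAN and xn0 is LAN as in the layout posted earlier, and using hypothetical output paths under /tmp. It only builds the tcpdump command lines (the standard -i, -s and -w options plus a port filter) so they can be started together from two ssh sessions. The saved capture files keep the Ethernet headers, so the MAC addresses can be checked afterwards.

```python
import shlex
from typing import List


def capture_command(interface: str, outfile: str, port: int = 80) -> List[str]:
    """Build a tcpdump argument list for one interface.

    -i selects the interface, -s 0 asks for full-size packets,
    -w writes the raw packets to a file instead of printing them.
    """
    return ["tcpdump", "-i", interface, "-s", "0", "-w", outfile, "port", str(port)]


# Interface roles are taken from the thread; the file names are placeholders.
captures = {
    "wan": capture_command("xn1", "/tmp/wan.cap"),
    "lan": capture_command("xn0", "/tmp/lan.cap"),
}

for name, args in captures.items():
    # Print a copy-and-paste friendly command line for each capture.
    print(name, ":", " ".join(shlex.quote(a) for a in args))
```

Starting both commands before generating the test traffic and stopping them together keeps the two files comparable.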



  • @johnpoz:

    Do 2 distinct captures.. Its easier to read that way.. Do one on the wan and one on the lan.. I just use tcpdump from ssh connection to do it.

    Or post up the actual capture so can see the mac - so you can validate it forwarded it out the correct interface.

    Can't post the capture here. I'll upload them somewhere in a couple.



  • @johnpoz:

    Do 2 distinct captures.. Its easier to read that way.. Do one on the wan and one on the lan.. I just use tcpdump from ssh connection to do it.

    Or post up the actual capture so can see the mac - so you can validate it forwarded it out the correct interface.

    http://douglashaber.com/dump/WANCapture.cap
    http://douglashaber.com/dump/LANCapture.cap


  • Netgate Administrator

    Just to confirm, you've definitely not fallen foul of the driver change issue I linked to? I can't really see why it would affect you since you're not using VLANs or anything other than a standard config but it's worth checking.

    Steve



  • @stephenw10:

    Just to confirm, you've definitely not fallen foul of the driver change issue I linked to? I can't really see why it would affect you since you're not using VLANs or anything other than a standard config but it's worth checking.

    Steve

    I missed your question. Probably.

    It was not xn in 2.1.5, it was re(4)

    Hrmm.. found this on the ML:

    http://lists.freebsd.org/pipermail/freebsd-xen/2014-April/002065.html

    Maybe FreeBSD 10 just does not play nice on Xen.

    Edit 2 - more quirks involving XS..

    http://lists.freebsd.org/pipermail/freebsd-xen/2014-February/002010.html


  • Netgate Administrator

    Hmm, well that's interesting. You specified Realtek emulation in the Xen config then I assume? I'm unfamiliar with Xen.
    I would try removing the paravirtualised NIC support in Xen so that pfSense goes back to using the re driver and see if that makes any difference. Additionally I would set it to emulate Intel NICs rather than Realtek.
    As I say though I can't really see why the xn driver should be causing problems in your basic setup. Try removing all the hardware offloading options in System: Advanced: Networking:

    Steve



  • @stephenw10:

    Hmm, well that's interesting. You specified Realtek emulation in the Xen config then I assume? I'm unfamiliar with Xen.
    I would try removing the paravirtualised NIC support in Xen so that pfSense goes back to using the re driver and see if that makes any difference. Additionally I would set it to emulate Intel NICs rather than Realtek.
    As I say though I can't really see why the xn driver should be causing problems in your basic setup. Try removing all the hardware offloading options in System: Advanced: Networking:

    Steve

    Realtek is the default with XenServer. Switching to Intel emulation requires some hackery I am not ready to be doing yet. I don't want to change Xen necessarily.

    EDIT: By hackery, I mean just a small change really (http://www.netservers.co.uk/articles/open-source-howtos/citrix_e1000_gigabit) but I also have other VM's running, and don't want to change too much.

    I found this, which is interesting..

    ssh from the Windows PV host to the FreeBSD PV DomU host appears to work
    fine. Attempting to 'route' traffic from the Windows PV host 'through' the
    FreeBSD PV DomU fails - pings go, DNS goes, initial TCP 'setups' go - but
    stuff dies thereafter (i.e. may be packet size related or something).

    Sounds pretty much like my issue (re: http not working) even though as another poster mentioned, requests are there.

    http://lists.freebsd.org/pipermail/freebsd-xen/2014-February/002018.html


  • LAYER 8 Global Moderator

    ok this looks different than before..

    So looks like you're getting back the syn,ack..  But then when you send a get, a 404 is sent back..  But then that is not working..

    GET / HTTP/1.1
    Host: 65.98.6.38
    Connection: keep-alive
    Cache-Control: max-age=0
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
    User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36
    Accept-Encoding: gzip, deflate, sdch
    Accept-Language: en-US,en;q=0.8

    HTTP/1.1 404 Not Found
    Date: Fri, 16 Jan 2015 13:45:34 GMT
    Server: Apache/2.2.22 (Debian)

    Then on the lan side you don't see the get??  Something really odd going on here..

    From your wan sniff you can see that 404 was sent, but then you see retrans on the get and 404.  But on the lan side not even seeing the get..  Were these sniffs taken at the same time?

    edit: Ok looks like these were taken at different times..  wan goes from 7:45:31 to 7:47:14  But lan is from 7:47:31 to 7:49:16…  You really need to take the captures at the same time.. And it wouldn't hurt to have a sniff running over the same time period on the webserver.
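To make the timing point concrete, here is a small sketch that checks whether two capture windows overlap at all. The start and end times are the ones quoted above for the WAN and LAN captures; the helper names `parse_time` and `overlap_seconds` are made up for this example.

```python
from datetime import datetime


def parse_time(text: str) -> datetime:
    """Parse an H:MM:SS time of day (the date part is irrelevant here)."""
    return datetime.strptime(text, "%H:%M:%S")


def overlap_seconds(a_start, a_end, b_start, b_end) -> float:
    """Return how many seconds two time windows have in common (0.0 if none)."""
    latest_start = max(a_start, b_start)
    earliest_end = min(a_end, b_end)
    return max(0.0, (earliest_end - latest_start).total_seconds())


wan = (parse_time("7:45:31"), parse_time("7:47:14"))
lan = (parse_time("7:47:31"), parse_time("7:49:16"))

# The LAN capture starts 17 seconds after the WAN capture ends,
# so no packet can appear in both files.
print(overlap_seconds(*wan, *lan))
```

With these two windows the overlap is zero, which is why the GET seen on the WAN side cannot be matched to anything in the LAN capture.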




  • @johnpoz:

    ok this looks different than before..

    So looks like you're getting back the syn,ack..  But then when you send a get, a 404 is sent back..  But then that is not working..

    GET / HTTP/1.1
    Host: 65.98.6.38
    Connection: keep-alive
    Cache-Control: max-age=0
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
    User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36
    Accept-Encoding: gzip, deflate, sdch
    Accept-Language: en-US,en;q=0.8

    HTTP/1.1 404 Not Found
    Date: Fri, 16 Jan 2015 13:45:34 GMT
    Server: Apache/2.2.22 (Debian)

    Then on the lan side you don't see the get??  Something really odd going on here..

    From your wan sniff you can see that 404 was sent, but then you see retrans on the get and 404.  But on the lan side not even seeing the get..  Were these sniffs taken at the same time?

    1. The 404 is to be expected. I wanted something simple to be spit back for testing purposes, rather than a several-MB webpage, which is what would be on it in production. There is nothing to be served on the webserver now.

    2. Very close. A couple of seconds apart max. I'll work on a set taken at exactly the same time.


  • LAYER 8 Global Moderator

    no they are not a couple of seconds apart.. they are completely different time frames.  See my edit.



  • @johnpoz:

    no they are not a couple of seconds apart.. they are completely different time frames.  See my edit.

    I'll run a new set, same time. Hang on.



  • Same URLs. Same time. Literally within 1-2 seconds this time, as quick as I could move the cursor and hit go.

    No webserver capture in this group, though

    EDIT: let me see if I can do it again and turn up verbosity on pfsense, its capture is way way less verbose for the LAN interface than my tcpdump was for the WAN


  • LAYER 8 Global Moderator

    well wan is going to see all the noise of a typical wan connection ;)  I would expect to see lots of noise ;)



  • @johnpoz:

    well wan is going to see all the noise of a typical wan connection ;)  I would expect to see lots of noise ;)

    I forgot to take off the default limit of 100 packets on the pf capture.  :-X

    Redoing now



  • @johnpoz:

    well wan is going to see all the noise of a typical wan connection ;)  I would expect to see lots of noise ;)

    Correctly done dumps are there now.



  • Are you using xentools on this vm?

    http://blog.feld.me/posts/2014/07/pfsense-on-citrix-xenserver/

    I've played with a 2.2 beta version on xen server with ~800mbit throughput IIRC.


  • LAYER 8 Global Moderator

    Ok so looking at these dumps..

    You have two connections coming in to 80, one from source port 43293 and another on 27618 both from this 67.81.220.99 IP

    You see the syn,ack back and then the ack from the 43293 connection.  But you never see the ack from the syn,ack sent to 27618

    You also see a get, an ack to that and then sending of the 404..  Clearly you can see the stuff pfsense gets on its wan it sends on to the lan.  Stuff it sees on the lan it sends out the wan.

    I see pfsense doing what it is supposed to do, it forwards on the packets..  But then on the wan side it seems that box is not getting the responses that were sent, so it sends retrans..  And on the lan side it doesn't get the response it expected so it retrans.

    Looks to me like you have an issue with communication on the wan side..

    So you see the get come in on wan, you see it sent on to the lan, you see the lan ack back, you see it send the 404..  But then you see inbound from 220.99 saying hey I'm going to resend this get because I never got an ack..  And it clearly didn't get the 404 that was sent.

    Pfsense from your sniff clearly put it on the wire - but seems to be getting lost..  And 220.99 is not getting it.




  • The LAN capture has broken TCP checksums on all the retransmitted traffic. Not on everything though, and not null checksums (which would be the scenario where it's capturing before the NIC's checksum offloading adds the checksum), which suggests that's the likely cause. Have you disabled hardware checksum offloading under System>Advanced, Networking tab? Probably best to reboot afterwards.
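For reference, a sketch of how a TCP checksum can be verified by hand, which is the check behind the "broken checksum" flags in a capture viewer. It uses the standard ones' complement Internet checksum over the IPv4 pseudo-header plus the TCP segment. The addresses and source port are borrowed from this thread, but the segment itself is synthetic and the function names are made up for illustration.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """Ones' complement sum of 16-bit words, complemented (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pseudo_header(src_ip: str, dst_ip: str, tcp_length: int) -> bytes:
    """IPv4 pseudo-header: source, destination, zero byte, protocol 6, TCP length."""
    return socket.inet_aton(src_ip) + socket.inet_aton(dst_ip) + struct.pack("!BBH", 0, 6, tcp_length)


def tcp_checksum_ok(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """A segment with a valid checksum field sums to zero after complementing."""
    return internet_checksum(pseudo_header(src_ip, dst_ip, len(segment)) + segment) == 0


# Synthetic 20-byte SYN header with the checksum field (bytes 16-17) zeroed.
src, dst = "67.81.220.99", "65.98.6.38"
header = struct.pack("!HHIIHHHH", 43293, 80, 1, 0, (5 << 12) | 0x02, 65535, 0, 0)
checksum = internet_checksum(pseudo_header(src, dst, len(header)) + header)
good = header[:16] + struct.pack("!H", checksum) + header[18:]
bad = good[:-1] + b"\x01"  # corrupt the last byte without fixing the checksum

print(tcp_checksum_ok(src, dst, good), tcp_checksum_ok(src, dst, bad))
```

A zero checksum field on outbound packets usually just means the capture happened before the NIC filled it in; a non-zero value that fails this check is the more suspicious case described above.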



  • @marcelloc:

    Are you using xentools on this vm?

    http://blog.feld.me/posts/2014/07/pfsense-on-citrix-xenserver/

    I've played with a 2.2 beta version on xen server with ~800mbit throughput IIRC.

    I had/have same issue tools or not.

    edit: throughput on the pfsense VM itself has been perfect this entire time. no slowness at all. it's only VM's behind the VM.

    @johnpoz:

    Ok so looking at these dumps..

    You have two connections coming in to 80, one from source port 43293 and another on 27618 both from this 67.81.220.99 IP

    You see the syn,ack back and then the ack from the 43293 connection.  But you never see the ack from the syn,ack sent to 27618

    You also see a get, an ack to that and then sending of the 404..  Clearly you can see the stuff pfsense gets on its wan it sends on to the lan.  Stuff it sees on the lan it sends out the wan.

    I see pfsense doing what it is supposed to do, it forwards on the packets..  But then on the wan side it seems that box is not getting the responses that were sent, so it sends retrans..  And on the lan side it doesn't get the response it expected so it retrans.

    Looks to me like you have an issue with communication on the wan side..

    So you see the get come in on wan, you see it sent on to the lan, you see the lan ack back, you see it send the 404..  But then you see inbound from 220.99 saying hey I'm going to resend this get because I never got an ack..  And it clearly didn't get the 404 that was sent.

    Pfsense from your sniff clearly put it on the wire - but seems to be getting lost..  And 220.99 is not getting it.

    Not sure where the issue is then, if it is "WAN side", since every other box connected to that hand off from the datacenter is experiencing no issues whatsoever, and as previously stated, FreeBSD 10 (or I guess pfSense 2.2) is the only thing experiencing issue. The same exact WAN uplink/cable/etc in the same hypervisor can do full line rate in the other VM's.

    @cmb:

    The LAN capture has broken TCP checksums on all the retransmitted traffic. Not on everything though, and not null checksums (which would be the scenario where it's capturing before the NIC's checksum offloading adds the checksum), which suggests that's the likely cause. Have you disabled hardware checksum offloading under System>Advanced, Networking tab? Probably best to reboot afterwards.

    I did disable it, but haven't tried rebooting. Trying now.



  • @cmb:

    The LAN capture has broken TCP checksums on all the retransmitted traffic. Not on everything though, and not null checksums (which would be the scenario where it's capturing before the NIC's checksum offloading adds the checksum), which suggests that's the likely cause. Have you disabled hardware checksum offloading under System>Advanced, Networking tab? Probably best to reboot afterwards.

    Disabled, and rebooted. No change.


  • Netgate Administrator

    @Douglas:

    throughput on the pfsense VM itself has been perfect this entire time. no slowness at all. it's only VM's behind the VM.

    How are you testing the 'throughput' on the pfSense VM?

    Steve



  • @stephenw10:

    @Douglas:

    throughput on the pfsense VM itself has been perfect this entire time. no slowness at all. it's only VM's behind the VM.

    How are you testing the 'throughput' on the pfSense VM?

    Steve

    I suppose I should have been more specific. The WAN connection is a 100mbps handoff from the datacenter.

    I added a third interface (OPT1) to the VM and put it on a separate second LAN so I could "speak" to the pfSense VM and run iperf against it. I was able to run iperf and push significant traffic, without any delay, on both the OPT1 and WAN interfaces.

    I can also access port 80 on the pfSense VM if I forward it for "OOB" on the WAN.

    I was also able to pull down a few gigabyte-sized files to the pfSense VM (or rather, to /dev/null) at the full 100Mbps, with no delays or disconnects.


  • LAYER 8 Global Moderator

    I didn't mean to say it was a WAN connection problem - what I meant is that pfsense is putting it on its wan interface - and for some reason the wan device is not seeing it.  Your pfsense is a VM..  It seems to me you've got a problem in that system on the wan side..

    Again - from pfsense's point of view all the packets it sees on its wan interface are being forwarded to lan, the lan answers and those answers are sent out its wan..  You clearly have an issue between the wan guy requesting the data and where it's being requested from.

    But from your sniff pfsense was doing what it was supposed to do..  It's possible there is an issue in this driver under xen...  But you can clearly see the problem from the sniffs.. You need to investigate that..  Can you sniff on the physical interface of your xen host to see if you're actually seeing the traffic pfsense says it put on the wire?



  • @johnpoz:

    I didn't mean to say it was a WAN connection problem - what I meant is that pfsense is putting it on its wan interface - and for some reason the wan device is not seeing it.  Your pfsense is a VM..  It seems to me you've got a problem in that system on the wan side..

    Again - from pfsense's point of view all the packets it sees on its wan interface are being forwarded to lan, the lan answers and those answers are sent out its wan..  You clearly have an issue between the wan guy requesting the data and where it's being requested from.

    But from your sniff pfsense was doing what it was supposed to do..  It's possible there is an issue in this driver under xen...  But you can clearly see the problem from the sniffs.. You need to investigate that..  Can you sniff on the physical interface of your xen host to see if you're actually seeing the traffic pfsense says it put on the wire?

    I sure can. I will do so. Just need to figure out how to get the brand new citrix repo's working as they are not yet. :)

    In order to work with you and others, do I need to capture the LAN side as well, for the trio of items? Hypervisor/pfSense/web VM?


  • LAYER 8 Global Moderator

    In a perfect world trying to track this down.. I would sniff at the physical interface of your host, on both pfsense interfaces and then at the VM interface.

    This gives us full path..  And allows us to validate that inbound packets are getting all the way to the vm client behind pfsense - it answers and then pfsense sends that back and it goes out the physical interface of the hypervisor host..



  • @johnpoz:

    In a perfect world trying to track this down.. I would sniff at the physical interface of your host, on both pfsense interfaces and then at the VM interface.

    This gives us full path..  And allows us to validate that inbound packets are getting all the way to the vm client behind pfsense - it answers and then pfsense sends that back and it goes out the physical interface of the hypervisor host..

    http://douglashaber.com/dump/hypervisor.cap
    http://douglashaber.com/dump/WANCapture.cap
    http://douglashaber.com/dump/LANCapture.cap

    warning - hypervisor capture is pretty big


  • LAYER 8 Global Moderator

    Ok followed one connection - see attached.

    Physical on the left, vm pfsense on the right

    So you see the syn come in from 6.46 to pfsense 6.38 saying hey I want to talk to you from port 38877 to your port 80

    So you see the syn,ack back and then the ack to the syn - typical handshake..

    Now 6.46 sends get some html shit..  you see an ack back that says ok got your get.. Then pfsense sends the 404..  It never gets an ack back from 6.46 for the 404..  So it sends the 404 again, and again -  that is the retrans.

    So clearly pfsense put that on its virtual interface..  And as you can see on the left it's also on the physical HOST interface..  So why does 6.46 never send back an ack??  Did he not get it??  Your issue is between the physical interface of the host, and that 6.46 box..  Pfsense is doing exactly what it's been asked to do..

    I see the 404 go out on the physical capture.. So why does 6.46 not ack??  Did he get it and ack, and then that ack got lost.. It never shows up on the physical… Can you sniff on the 6.46 host??
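The "follow one connection and look for the repeats" step can be sketched in a few lines. This is a deliberately simplified illustration, not what a capture viewer actually does: it treats any data-carrying segment with the same addresses, ports, sequence number and length as an earlier one as a retransmission, and it ignores zero-length segments, so a retransmitted bare SYN would be missed. The addresses and port 38877 come from the connection above; the sequence numbers and lengths are invented.

```python
def find_retransmissions(segments):
    """Return data segments that repeat an earlier (src, sport, dst, dport, seq, length)."""
    seen = set()
    retransmitted = []
    for seg in segments:
        length = seg[5]
        if length == 0:
            continue  # pure ACKs legitimately reuse a sequence number
        if seg in seen:
            retransmitted.append(seg)
        seen.add(seg)
    return retransmitted


# Hypothetical trace shaped like the connection described above.
trace = [
    ("65.98.6.46", 38877, "65.98.6.38", 80, 1, 400),  # GET
    ("65.98.6.38", 80, "65.98.6.46", 38877, 1, 0),    # ACK of the GET
    ("65.98.6.38", 80, "65.98.6.46", 38877, 1, 500),  # 404 response
    ("65.98.6.38", 80, "65.98.6.46", 38877, 1, 500),  # 404 sent again
    ("65.98.6.38", 80, "65.98.6.46", 38877, 1, 500),  # and again
]

resent = find_retransmissions(trace)
print(len(resent), "retransmitted segments")
```

Running the same check on the capture from each vantage point (physical host, pfSense WAN, pfSense LAN, web VM) shows where along the path the repeats start.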



  • Netgate Administrator

    This thread is a great example in diagnostics.  :)

    However it does seem hard to explain why it should have worked perfectly under pfSense 2.1.5 and not 2.2 if the error exists outside the host box.  :-\

    Have you read this: https://forum.pfsense.org/index.php?topic=85797.msg475906#msg475906

    I would be disabling the paravirtualised drivers for the pfSense VM to test that.

    Steve



  • @stephenw10:

    I would be disabling the paravirtualised drivers for the pfSense VM to test that.

    Yeah, forcing the VM to e1000 would be ideal and likely would fix the issue. From some brief searching though it doesn't appear easy, if possible at all, to force Xen to present a specific NIC to the VM. Ugly, every other hypervisor handles that far, far better.