Unbelievably bad performance
-
Well, the WAN is going to see all the noise of a typical WAN connection ;) I would expect to see lots of noise ;)
Correctly done dumps are there now.
-
Are you using xentools on this vm?
http://blog.feld.me/posts/2014/07/pfsense-on-citrix-xenserver/
I've played with a 2.2 beta version on xen server with ~800mbit throughput IIRC.
-
OK, so looking at these dumps..
You have two connections coming in to port 80, one from source port 43293 and another from 27618, both from this 67.81.220.99 IP.
You see the syn,ack back and then the ack from the 43293 connection. But you never see the ack for the syn,ack sent to 27618.
You also see a GET, an ack to that and then the sending of the 404.. Clearly you can see that the stuff pfSense gets on its WAN it sends on to the LAN. Stuff it sees on the LAN it sends out the WAN.
I see pfSense doing what it is supposed to do, it forwards on the packets.. But then on the WAN side it seems that box is not getting the responses that were sent, so it sends retrans.. And on the LAN side it doesn't get the response it expected, so it retrans.
Looks to me like you have an issue with communication on the WAN side..
So you see the GET come in on the WAN, you see it sent on to the LAN, you see the LAN ack back, you see it send the 404.. But then you see inbound from 220.99 saying hey, I'm going to resend this GET because I never got an ack.. And it clearly didn't get the 404 that was sent.
pfSense, from your sniff, clearly put it on the wire - but it seems to be getting lost.. And 220.99 is not getting it.
-
The LAN capture has broken TCP checksums on all the retransmitted traffic. Not on everything, though, and they are not null checksums (which would be the scenario where it's capturing before the NIC's checksum offloading adds the checksum), which suggests the bad checksums are the likely cause. Have you disabled hardware checksum offloading under System > Advanced, Networking tab? Probably best to reboot afterwards.
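For anyone who wants to check this by hand rather than trusting the sniffer's flag, here is a rough Python sketch of the standard Internet checksum (RFC 1071) applied to a raw TCP segment over IPv4. It sorts the checksum field into the three cases mentioned above: valid, null, or broken. The function names are made up for illustration, and it assumes you have already pulled the TCP header plus payload and the two IP addresses out of the capture.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit big-endian words, complemented (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def tcp_checksum_status(src_ip: str, dst_ip: str, segment: bytes) -> str:
    """Classify the checksum field of a raw TCP segment (header + payload, IPv4).

    Returns "valid", "null" (field is zero and does not verify) or "broken".
    """
    if len(segment) < 20:
        raise ValueError("segment shorter than a minimal TCP header")
    stored = struct.unpack("!H", segment[16:18])[0]
    # IPv4 pseudo-header: source, destination, zero byte, protocol 6, TCP length
    pseudo = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, 6, len(segment))
    )
    zeroed = segment[:16] + b"\x00\x00" + segment[18:]
    expected = internet_checksum(pseudo + zeroed)
    if stored == expected:
        return "valid"
    if stored == 0:
        return "null"
    return "broken"
```

If the retransmitted segments come back as "broken" while the rest are "valid", that would match what the LAN capture appears to show.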
-
I had/have the same issue, tools or not.
Edit: throughput on the pfSense VM itself has been perfect this entire time. No slowness at all. It's only the VMs behind the pfSense VM.
Not sure where the issue is then, if it is "WAN side", since every other box connected to that handoff from the datacenter is experiencing no issues whatsoever, and as previously stated, FreeBSD 10 (or I guess pfSense 2.2) is the only thing experiencing issues. The same exact WAN uplink/cable/etc. in the same hypervisor can do full line rate in the other VMs.
@cmb:
I did disable it, but haven't tried rebooting. Trying now.
-
@cmb:
Disabled, and rebooted. No change.
-
throughput on the pfsense VM itself has been perfect this entire time. no slowness at all. it's only VM's behind the VM.
How are you testing the 'throughput' on the pfSense VM?
Steve
-
I suppose I should have been more specific. The WAN connection is a 100 Mbps handoff from the datacenter.
I added a third interface (OPT1) to the VM and attached it to a separate second LAN so I could "speak" to the pfSense VM and run iperf to it. I was able to run iperf and, without any delay, push significant traffic on both the OPT1 and WAN interfaces.
I can also access port 80 on the pfSense VM if I forward it for "OOB" on the WAN.
I was also able to pull down a few gigabyte-sized files to the pfSense VM (or rather, to /dev/null) at the full 100 Mbps, with no delays, disconnects, or other problems.
-
I didn't mean to say it was a WAN connection problem - what I meant is that pfSense is putting it on its WAN interface, and for some reason the WAN device is not seeing it. Your pfSense is a VM.. It seems to me you've got a problem in that system on the WAN side..
Again, from pfSense's point of view all the packets it sees on its WAN interface are being forwarded to the LAN, the LAN answers, and those answers are sent out its WAN.. So you clearly have an issue between the WAN guy requesting the data and where it's being requested from.
But from your sniff pfSense was doing what it was supposed to do.. It's possible there is an issue in this driver under Xen... But you can clearly see the problem from the sniffs.. You need to investigate that.. Can you sniff on the physical interface of your Xen host to see if you're actually seeing the traffic pfSense says it put on the wire?
-
I sure can. I will do so. Just need to figure out how to get the brand new Citrix repos working, as they are not yet. :)
In order to work with you and others, do I need to capture the LAN side as well, for the trio of items? Hypervisor/pfSense/web VM?
-
In a perfect world, trying to track this down.. I would sniff at the physical interface of your host, on both pfSense interfaces, and then at the VM interface.
This gives us the full path.. And allows us to validate that inbound packets are getting all the way to the VM client behind pfSense - it answers, and then pfSense sends that back and it goes out the physical interface of the hypervisor host..
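Once you have all the captures, the comparison boils down to something like the Python sketch below: for a given packet, walk the capture points in path order and report the first one where it is missing. Everything in it is illustrative. The capture point names and packet keys are made up, and it assumes you have already reduced each capture to a set of keys that survive NAT (here a TCP sequence number and payload length, which only holds if nothing on the path rewrites sequence numbers).

```python
from typing import Dict, Hashable, List, Optional, Set


def first_gap(
    packet: Hashable,
    path: List[str],
    captures: Dict[str, Set[Hashable]],
) -> Optional[str]:
    """Return the first capture point on the path where the packet is absent.

    Returns None if the packet was seen at every point.
    """
    for point in path:
        if packet not in captures[point]:
            return point
    return None


# Hypothetical outbound path: web VM -> pfSense LAN -> pfSense WAN -> host NIC
OUTBOUND_PATH = ["web_vm", "pfsense_lan", "pfsense_wan", "host_physical"]

# Hypothetical packet keys: (TCP sequence number, payload length)
captures = {
    "web_vm": {(1001, 512), (1513, 512)},
    "pfsense_lan": {(1001, 512), (1513, 512)},
    "pfsense_wan": {(1001, 512), (1513, 512)},
    "host_physical": {(1001, 512)},
}

for key in sorted(captures["web_vm"]):
    gap = first_gap(key, OUTBOUND_PATH, captures)
    print(key, "seen everywhere" if gap is None else "missing at " + gap)
```

A packet that is "seen everywhere" on the outbound path but never acknowledged points past the host's physical interface. A gap at one of the pfSense interfaces would point at the VM or its drivers instead.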
-
http://douglashaber.com/dump/hypervisor.cap
http://douglashaber.com/dump/WANCapture.cap
http://douglashaber.com/dump/LANCapture.cap
Warning - the hypervisor capture is pretty big.
-
OK, followed one connection - see attached.
Physical on the left, VM pfSense on the right.
So you see the syn come in from 6.46 to pfSense 6.38 saying hey, I want to talk to you from port 38877 to your port 80.
Then you see the syn,ack back and then the ack to that - typical handshake..
Now 6.46 sends a GET for some HTML.. you see the ack back that says OK, got your GET.. Then the 404 is sent.. The sender never gets an ack back from 6.46 for the 404.. So it sends the 404 again, and again - that is the retrans.
So clearly pfSense put that on its virtual interface.. And as you can see on the left, it's also on the physical HOST interface.. So why does 6.46 never send back an ack?? Did he not get it?? Your issue is between the physical interface of the host and that 6.46 box.. pfSense is doing exactly what it's been asked to do..
I see the 404 go out on the physical capture.. So why does 6.46 not ack?? Did he get it and send an ack, and then that ack got lost? It never shows up on the physical... Can you sniff on the 6.46 host??
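If you'd rather not eyeball every stream, the "same data sent again and again" pattern can be picked out with a few lines of Python. This is only a sketch under stated assumptions: the `Segment` tuple and `find_retransmissions` are names made up here, the packets are assumed to be already parsed out of the capture in time order, and the addresses in the sample data are documentation placeholders, not the real ones from the dumps.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    src: str
    sport: int
    dst: str
    dport: int
    seq: int     # TCP sequence number
    length: int  # TCP payload bytes


def find_retransmissions(segments: List[Segment]) -> List[Segment]:
    """Return data segments whose flow, seq and length were already seen earlier."""
    seen = set()
    repeats = []
    for seg in segments:
        if seg.length == 0:
            continue  # pure ACKs legitimately reuse a sequence number
        if seg in seen:
            repeats.append(seg)
        else:
            seen.add(seg)
    return repeats


# Hypothetical stream: one GET, then the 404 sent three times
client, server = "198.51.100.46", "192.0.2.38"
stream = [
    Segment(client, 38877, server, 80, 1, 120),  # GET
    Segment(server, 80, client, 38877, 1, 300),  # 404
    Segment(server, 80, client, 38877, 1, 300),  # 404 again
    Segment(server, 80, client, 38877, 1, 300),  # and again
]

for seg in find_retransmissions(stream):
    print("retransmission:", seg)
```

It only catches exact resends. Partial or coalesced retransmissions would need overlap checks on the sequence ranges.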
-
This thread is a great example in diagnostics. :)
However it does seem hard to explain why it should have worked perfectly under pfSense 2.1.5 and not 2.2 if the error exists outside the host box. :-\
Have you read this: https://forum.pfsense.org/index.php?topic=85797.msg475906#msg475906
I would be disabling the paravirtualised drivers for the pfSense VM to test that.
Steve
-
I would be disabling the paravirtualised drivers for the pfSense VM to test that.
Yeah, forcing the VM to e1000 would be ideal and likely would fix the issue. From some brief searching though it doesn't appear easy, if possible at all, to force Xen to present a specific NIC to the VM. Ugly, every other hypervisor handles that far, far better.
-
This is a known issue in upstream FreeBSD 10 after they incorporated the Xen paravirtualized drivers in the standard kernel. It's not exactly pfSense's fault.
Yeah, forcing the VM to e1000 would be ideal and likely would fix the issue. From some brief searching though it doesn't appear easy, if possible at all, to force Xen to present a specific NIC to the VM. Ugly, every other hypervisor handles that far, far better.
It's definitely possible. There's a wrapper script for QEMU in
```
/opt/xensource/libexec/qemu-dm-wrapper
```
Anyways, I've been experiencing the same network performance issues in pfSense 2.2 snapshots, both on XenServer 6.2 and XenServer Creedence RC. However, I haven't found any way to remove or blacklist drivers _in the kernel_ the way one would on Linux (e.g. rmmod or adding bootloader parameters). So the only workaround I've found to revert to emulated NICs is to recompile the BSD kernel without the PVHVM drivers. I've [written instructions here](https://code.dingcorp.com/frederick.ding/pfsense-tools/wikis/removing-pvhvm), tested a few weeks ago, though recompiling a kernel is a convoluted process.
-
So how is it that these drivers cause the packets to show up on the physical NIC of the host, but not get answered?? While I can see how drivers can cause problems in virtualization.. from the sniffs it sure looks like the info is put on the physical NIC.. Is there something wrong with the info put on the wire? Mangled packets that the other side doesn't like and doesn't see?? I did not look that deep into it - I was just following the stream.. If the other side actually saw the traffic, then yeah, we would have to look deeper into why the packet is there but it's not seeing it, etc..
-
I agree with you that it looks like there's no reply and hence an external problem. The 404 response is reaching the client correctly though?
However in light of the known issues with the xn(4) drivers in FreeBSD 10 it seems unproductive to continue without testing a standard NIC driver, even if it's re(4). This fits the fact it worked fine under 2.1.5 also.
Steve
-
So how is it that these drivers cause the packets to show up on the physical NIC of the host, but not get answered??
I'm pretty confident, judging by the packet captures, that it's because some packets are ending up with bad checksums, so it doesn't matter that they're getting there; they're dropped for that reason.
It's definitely possible. There's a wrapper script for QEMU in
```
/opt/xensource/libexec/qemu-dm-wrapper
```
Ah good, thanks for the tip, at least it's possible and hopefully that'll help others.
-
But the invalid checksum is most likely due to it just being offloaded, etc. I see that so much in sniffs that I have even turned off checking for it.