Strange timed throughput loss on esxi 6.0/vmxnet3 pfsense 2.2.4
http://i.imgur.com/4j1AllV.png Vertical scale is in KB/s
The above graph shows the issue I'm dealing with. I connected a host directly to the ISP and found this isn't an issue there. When I use E1000 NICs, I get more stable performance; however, the throughput is slightly lower and latency increases.
Is there any explanation for this since it doesn't seem to be hypervisor load related? (there's just pfsense and vcenter on this machine)
nexus 1000v dvswitch
Direct IO enabled
What is that supposed to be a graph of? What is the scale? Is Y Mbps while X is what? Seconds, minutes, hours? What are you doing in that graph?
Probably pps / t.
Vertical scale is in KB/s
Sorry, I should have been more specific.
Horizontal is 500 seconds in 1-second samples
Vertical is in kilobytes/s
This is taken from the receiving host that is downloading across the internet.
Just to clarify, using e1000 removes the consistent dips in throughput.
Where are you grabbing that graph? Is the receiving client a VM as well, or a regular machine? And you're saying you see this behavior if you use vmx3 in pfsense, and if you use e1000 in pfsense you see a nice straight line with no dips?
And what are you saying the latency difference and speed difference are using e1000 vs vmx3? You're only talking 1.5MBps there.. Curious what you see in difference when using vmx3 vs e1000?
I could switch pfsense back over to vmx3, but I went with e1000 because vmx3 doesn't provide speed and duplex correctly when using the ladvd package, which causes cisco to log lots of duplex mismatches on the ports.. I could find no difference in performance between e1000 and vmx3 that really mattered.
The graph is from an office computer downloading from a server behind pfsense, across the internet.
I will provide a graph with e1000 to compare soon; however, I see 1.2-1.5MB/s on vmx3 and 900KB/s-1.2MB/s on e1000.
On e1000 I do see a steady line across the screen, and I'm not sure why vmx3 has these consistent dips. Other Linux VMs on the same hypervisor using vmx3 do not have this issue.
"Other Linux VMs on the same hypervisor using vmx3 do not have this issue."
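To compare these figures with the link speeds quoted in Mbps elsewhere in the thread, a rough conversion helps. This is only a minimal sketch: it assumes decimal units (1 MB = 10^6 bytes), reads the e1000 range as 900 KB/s to 1.2 MB/s, and the function name is purely illustrative.

```python
def mbytes_per_s_to_mbits(mb_per_s: float) -> float:
    """Convert megabytes per second to megabits per second (decimal units assumed)."""
    return mb_per_s * 8


# Ranges quoted in the post; the e1000 lower bound is taken as 0.9 MB/s (assumption).
ranges = [("vmxnet3", 1.2, 1.5), ("e1000", 0.9, 1.2)]

for label, low, high in ranges:
    low_mbit = mbytes_per_s_to_mbits(low)
    high_mbit = mbytes_per_s_to_mbits(high)
    print(f"{label}: {low_mbit:.1f}-{high_mbit:.1f} Mbit/s")
```

Under those assumptions the vmxnet3 range works out to roughly 9.6-12 Mbit/s and the e1000 range to roughly 7.2-9.6 Mbit/s.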
Wouldn't they be going through pfsense as well? So if pfsense was using vmx3 I would assume they would see the same thing. Are you saying that if pfsense is using e1000 and your other VMs are using vmx3, they do not show these dips?
I mean from a Linux vmxnet3 VM server to an internal LAN PC on the physical network. VM to VM is also unaffected, even across hypervisors in the same cluster.
INTERNET===Modem---gi1/0/23+switch+gi1/0/24---hypervisor>>>pfsense vmxnet3>>>hypervisor---gi1/0/1+switch+gi1/0/2---gi1/1+ASA+gi1/2---gi1/0/3+switch+gi1/0/4---hypervisor>>>Test server
This has issues
INTERNET===Modem---gi1/0/23+switch+gi1/0/24---hypervisor>>>pfsense e1000>>>hypervisor---gi1/0/1+switch+gi1/0/2---gi1/1+ASA+gi1/2---gi1/0/3+switch+gi1/0/4---hypervisor>>>Test server
This does not have the issues
This does not have the issues
One further detail: uploading from the LAN to the internet, such as to Google Drive, I get a constant 10Mbps upload rate without dips on all three configurations.
The issue manifests only when downloading a file from the test server from a remote site. This affects production servers as well.
I've also migrated the pfsense VM to different hypervisors with similar but different model Supermicro motherboards and the issue persists.
This is also not an issue with the PC I produced the graph on; I've reproduced it from multiple PCs on different business and residential ISP connections.
Well, easy enough to try and duplicate your issue. I will fire up a clean pfsense 2.2.4 VM using vmx3 on esxi 6 update 1 in the morning and download a large file from one of my servers.. And see what happens.
I'm looking forward to what you find, because I'd rather find it's due to something that I'm doing so I can fix it. I have all limiters and QoS turned off, just a few firewall NAT rules.
I got side tracked this morning.. Real work gets in the way of play time sometimes.
Ok got new clean pf 2.2.4 64bit vm setup using vmx3 for both its lan and wan
I just connected its wan to my wlan physical network, and its lan to my lan physical network. Then that 192.168.2.0/24 network I connected a laptop (win 7) with a wire, fired up HFS (simple web server) created a big ass zip file 3.6GB and then grabbed it from my windows 7 pc on the lan.
Not seeing the behavior that you were seeing.. So that is just the network tab of task manager; I was getting about 7ish MB per sec transfer from the laptop webserver.. Could prob tweak that and get more. But as you see it's not showing any sort of drop off like you were seeing.. I then fired up an iperf real quick and ran it for 2 min, seeing 144Mbps and none of that behavior.
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-120.00 sec 2.02 GBytes 144 Mbits/sec sender
[ 4] 0.00-120.00 sec 2.02 GBytes 144 Mbits/sec receiver
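As a sanity check on the iperf summary above, the bandwidth can be recomputed from the transfer size and the interval. This is a rough sketch that assumes iperf reports GBytes in binary units (1024^3 bytes) and Mbits in decimal units (10^6 bits); the helper name is made up for illustration.

```python
def mbits_per_sec(gbytes: float, seconds: float) -> float:
    """Average rate in Mbit/s for a transfer of `gbytes` over `seconds`.

    Assumes GBytes are binary (1024**3 bytes) and Mbit are decimal (1e6 bits).
    """
    total_bits = gbytes * 1024**3 * 8
    return total_bits / seconds / 1e6


# Figures from the iperf summary lines: 2.02 GBytes in 120 seconds.
rate = mbits_per_sec(2.02, 120.0)
print(f"{rate:.1f} Mbit/s")
```

The result lands within a fraction of a Mbit/s of the reported 144 Mbits/sec; the small gap is consistent with 2.02 GBytes being a rounded display value.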
I will do some more testing, and can just use my normal setup that is using e1000 to compare. But as you can see, I'm not seeing the drop off you were seeing.
Now I might figure out why simple iperf was only 144Mbps.. But then again this is a VM running on an older N40L microserver.. But the speeds I was getting were a lot faster than what you were getting.. And my internet is only 80Mbps, so if pfsense can do 144 routed/natted and firewalled I am fine ;)
Could you try to download a large file for about 10 minutes and use these settings in windows performance monitor?
I don't think you grabbed enough of a sample, since the total Windows task manager graph is only ~60 seconds on the X axis.
What? No, that first picture is 7-some minutes of download.. Not sure where you got the idea it's only 60 seconds? Each division on that x axis is like 5 seconds. But sure, be happy to run it with performance monitor.
Going to have to use a higher Y scale since I was seeing 7MBps not 1500KB…
Ok, couldn't sleep so fired up some testing.. So I fired up performance monitor and started a download, didn't notice any sort of drop offs or pattern. But then I noticed you never posted your scale factor.. So not really sure exactly what speed you were seeing, etc.
So I limited the webserver I was using for the test to a slow speed, 100K. I then adjusted the scale and also just sampled every 2 seconds to try and get a nice straight line.. And right away I thought hey, might be on to something.. Clearly using the vmx3 I was seeing a pattern of drop off in the speed. Not anywhere near as pronounced or extended as what you were showing..
So then I connected through my normal pfsense vm that is using e1000 vnics.. Noticing the same exact pattern of drop offs.. Hmmm so I put the laptop running HFS webserver on the same vlan as my workstation, and connected just through switch (sg300) and now still seeing drop offs, but way more often??? So wth?? Something in the webserver I would take it when limiting speeds?? Notice that first graph average hair under 8MBps.. Then in the other graphs I limited the hfs webserver to 100k and you can see averages out right about there on all 3 tests..
So while I am very curious why there is clearly a pattern of drop off in speed vs just a nice steady-state line right at the limit set.. I am not seeing any sort of specific problem with vmx3 drivers having drop offs like what you showed. I will do some more playing - need to figure out why it's not a nice straight line at the speed I set in the limit ;) But my guess is the drop offs you're seeing are something in the network in your connection to the web server you're downloading from, or something in the web server itself?? I don't see any sort of difference between using vmx3 or e1000 to be honest.. Other than for me I see speed in pfsense vs autoselect and I don't see logging of duplex mismatch on my cisco switch using cdp.
edit: Ok so doing a test with iperf.. I don't see any sort of pattern of drop off; tested through the switch to make sure there were no drop offs. Then moved the laptop back to the other segment and routed traffic thru the clean pfsense VM using vmx3 drivers.. There is no drop off pattern this way..
Thanks a lot for your help eliminating a possible vector, I really appreciate your time. I still have the challenge of locating the source, and I'm at a bit of a loss because I'm not sure where to start tracking this down, especially after I cloned this VM to another host in the vcenter cluster and reproduced it there.
I'm going to try to test out a Linux based router like vyos, since I don't have any of these issues on any of my Windows or Linux based VMs..
I would like it to be based off of CPU use, because that would be easy to identify, but the hypervisor host is just being tickled with the load and I have a 2.4 GHz reservation for the pfsense VM.
I tested across some different points
Remote computer > WAN IP = no issue (11Mbps)
Remote computer > LAN IP (NAT'd) = no issue (11Mbps)
LAN IP >---ASA---Switch>Test Server = no issue (900Mbps)
LAN PC > Test Server = no issue (900Mbps)
Something is causing the problem when packets cross the pfsense router between the WAN interface and the LAN interface, at almost precise 2 minute intervals.
Since I feel confident it's not the network infrastructure given the above results, would you have any suggestions I could investigate further? It's also a rolling 2 minute interval: if I start a download it will sometimes start off slow, increase speed, and repeat the pattern; other times it will download at full speed for ~2 minutes before the dip I've been experiencing.
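One way to check whether the dips really recur on a fixed period is to export the per-second throughput samples (for example from performance monitor) and measure the spacing between dip starts. The following is only a minimal sketch: the 50%-of-median threshold, the function names, and the synthetic data are all assumptions for illustration, not measurements from this thread.

```python
from statistics import median


def find_dip_starts(samples, threshold_ratio=0.5):
    """Return the indices where throughput first drops below threshold_ratio * median."""
    if not samples:
        return []
    cutoff = median(samples) * threshold_ratio
    starts = []
    in_dip = False
    for i, value in enumerate(samples):
        below = value < cutoff
        if below and not in_dip:
            starts.append(i)  # first sample of a new dip
        in_dip = below
    return starts


def dip_intervals(starts, sample_period_s=1.0):
    """Seconds between consecutive dip starts."""
    return [(b - a) * sample_period_s for a, b in zip(starts, starts[1:])]


# Synthetic example only: 1400 KB/s with a 10-second dip to 300 KB/s every 120 seconds.
samples = [300 if (t % 120) >= 110 else 1400 for t in range(500)]
starts = find_dip_starts(samples)
print(starts)
print(dip_intervals(starts))
```

If the intervals from real samples cluster tightly around one value, that points toward something timer-driven rather than random congestion; widely scattered intervals would suggest looking elsewhere.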
P.S. My Wan Firewall rules are at the top of the list for fastest response.
What would wan firewall rules have to do with a client talking from lan to wan??
Guess you could always sniff and see what is going on.
I mean the WAN firewall rules for iperf to allow a remote PC to talk to the test server behind the LAN are at the top of the list.
Just to clarify, the test points I used indicate that when testing from an internet host to the WAN IP, there's no issue.
Testing from the pfsense LAN IP to the test server, there are no issues.
The issue occurs when an internet host connects across the pfsense router. Since you've confirmed there aren't any inherent issues with vmxnet3, I guess my new question would be for some recommendations on how to probe for a cause, since this is new territory for me.
I'm factory resetting the pfsense install to defaults and will be trying from a bare minimum setup as a starting point like you suggested you were working with.