Netgate Discussion Forum

    Unbelievably bad performance

    General pfSense Questions
    • D
      Douglas Haber

      @cmb:

      The LAN capture has broken TCP checksums on all the retransmitted traffic. Not on everything though, and not null checksums (which would be the scenario where it's capturing before the NIC's checksum offloading adds the checksum), which suggests that's the likely cause. Have you disabled hardware checksum offloading under System>Advanced, Networking tab? Probably best to reboot afterwards.

      Disabled, and rebooted. No change.

      • stephenw10S
        stephenw10 Netgate Administrator

        @Douglas:

        Throughput on the pfSense VM itself has been perfect this entire time. No slowness at all. It's only the VMs behind the pfSense VM.

        How are you testing the 'throughput' on the pfSense VM?

        Steve

        • D
          Douglas Haber

          @stephenw10:

          @Douglas:

          Throughput on the pfSense VM itself has been perfect this entire time. No slowness at all. It's only the VMs behind the pfSense VM.

          How are you testing the 'throughput' on the pfSense VM?

          Steve

          I suppose I should have been more specific. The WAN connection is a 100 Mbps handoff from the datacenter.

          I added a third interface (OPT1) to the VM and put it on a separate second LAN so I could "speak" to the pfSense VM and run iperf against it. I was able to run iperf and, without any delay, push significant traffic on both the OPT1 and WAN interfaces.

          I can also access port 80 on the pfSense VM if I forward it for "OOB" access on the WAN.

          I was also able to pull down a few gigabyte-sized files to the pfSense VM (or rather, to /dev/null) at the full 100 Mbps, with no delays, disconnects, or other problems.
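
As a rough cross-check of numbers like these, here is a minimal Python sketch that converts a transfer size and duration into average throughput and compares it with the 100 Mbps handoff. The function names, byte count, and timing are made up for illustration; they are not measurements from this thread.

```python
def throughput_mbps(bytes_transferred: int, seconds: float) -> float:
    """Average throughput in megabits per second (1 Mbit = 10**6 bits)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return bytes_transferred * 8 / seconds / 1_000_000


def link_utilisation(bytes_transferred: int, seconds: float,
                     link_mbps: float = 100.0) -> float:
    """Fraction of the nominal link rate that the transfer achieved."""
    return throughput_mbps(bytes_transferred, seconds) / link_mbps


if __name__ == "__main__":
    # Hypothetical example: a 2 GB download that took 170 seconds.
    size_bytes = 2_000_000_000
    elapsed = 170.0
    print(f"{throughput_mbps(size_bytes, elapsed):.1f} Mbps, "
          f"{link_utilisation(size_bytes, elapsed):.0%} of a 100 Mbps link")
```

A result close to the link rate on the firewall itself, next to poor results from hosts behind it, points at the forwarding path rather than the uplink.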

          • johnpozJ
            johnpoz LAYER 8 Global Moderator

            I didn't mean to say it was a WAN connection problem. What I meant is that pfSense is putting the traffic on its WAN interface, and for some reason the WAN device is not seeing it. Your pfSense is a VM, so it seems to me you have a problem in that system on the WAN side.

            Again, from pfSense's point of view, all the packets it sees on its WAN interface are being forwarded to the LAN, the LAN host answers, and those answers are sent out its WAN. So you clearly have an issue between the WAN client requesting the data and the host it's being requested from.

            But from your sniff, pfSense was doing what it was supposed to do. It's possible there is an issue in this driver under Xen, but you can clearly see the problem from the sniffs, and you need to investigate that. Can you sniff on the physical interface of your Xen host to see if you're actually seeing the traffic pfSense says it put on the wire?

            An intelligent man is sometimes forced to be drunk to spend time with his fools
            If you get confused: Listen to the Music Play
            Please don't Chat/PM me for help, unless mod related
            SG-4860 24.11 | Lab VMs 2.7.2, 24.11

            • D
              Douglas Haber

              @johnpoz:

              I didn't mean to say it was a WAN connection problem. What I meant is that pfSense is putting the traffic on its WAN interface, and for some reason the WAN device is not seeing it. Your pfSense is a VM, so it seems to me you have a problem in that system on the WAN side.

              Again, from pfSense's point of view, all the packets it sees on its WAN interface are being forwarded to the LAN, the LAN host answers, and those answers are sent out its WAN. So you clearly have an issue between the WAN client requesting the data and the host it's being requested from.

              But from your sniff, pfSense was doing what it was supposed to do. It's possible there is an issue in this driver under Xen, but you can clearly see the problem from the sniffs, and you need to investigate that. Can you sniff on the physical interface of your Xen host to see if you're actually seeing the traffic pfSense says it put on the wire?

              I sure can. I will do so. I just need to figure out how to get the brand-new Citrix repos working, as they aren't yet. :)

              In order to work with you and others, do I need to capture the LAN side as well, for the trio of items? Hypervisor/pfSense/web VM?

              • johnpozJ
                johnpoz LAYER 8 Global Moderator

                In a perfect world, to track this down, I would sniff at the physical interface of your host, on both pfSense interfaces, and then at the VM interface.

                This gives us the full path. It allows us to validate that inbound packets are getting all the way to the VM client behind pfSense, that the client answers, and that pfSense then sends the answer back and it goes out the physical interface of the hypervisor host.
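
The full-path comparison can be sketched in Python. This is only an illustration under stated assumptions: each capture has already been reduced to a set of packet keys that survive NAT unchanged (how you derive such a key is up to you), and the capture-point names and keys below are hypothetical, not taken from the captures in this thread.

```python
from typing import Dict, Hashable, Iterable, List, Optional


def first_missing_point(path: List[str],
                        captures: Dict[str, Iterable[Hashable]],
                        packet: Hashable) -> Optional[str]:
    """Walk the capture points in path order and return the first one where
    `packet` was not seen, or None if it was seen at every point."""
    for point in path:
        if packet not in set(captures[point]):
            return point
    return None


if __name__ == "__main__":
    # Hypothetical inbound path: hypervisor NIC -> pfSense WAN -> pfSense LAN -> web VM.
    inbound_path = ["hypervisor_phys", "pfsense_wan", "pfsense_lan", "web_vm"]
    # Hypothetical packet keys; any hashable, NAT-invariant identifier works.
    captures = {
        "hypervisor_phys": {"pkt-1", "pkt-2", "pkt-3"},
        "pfsense_wan": {"pkt-1", "pkt-2", "pkt-3"},
        "pfsense_lan": {"pkt-1", "pkt-2"},
        "web_vm": {"pkt-1", "pkt-2"},
    }
    for pkt in ("pkt-1", "pkt-3"):
        print(pkt, "first missing at:", first_missing_point(inbound_path, captures, pkt))
```

Running the same check on the reversed path for reply packets shows whether the answer makes it back out of the hypervisor's physical interface.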


                • D
                  Douglas Haber

                  @johnpoz:

                  In a perfect world, to track this down, I would sniff at the physical interface of your host, on both pfSense interfaces, and then at the VM interface.

                  This gives us the full path. It allows us to validate that inbound packets are getting all the way to the VM client behind pfSense, that the client answers, and that pfSense then sends the answer back and it goes out the physical interface of the hypervisor host.

                  http://douglashaber.com/dump/hypervisor.cap
                  http://douglashaber.com/dump/WANCapture.cap
                  http://douglashaber.com/dump/LANCapture.cap

                  Warning: the hypervisor capture is pretty big.

                  • johnpozJ
                    johnpoz LAYER 8 Global Moderator

                    OK, I followed one connection; see attached.

                    Physical on the left, pfSense VM on the right.

                    You see the SYN come in from 6.46 to pfSense at 6.38, saying "hey, I want to talk to you from port 38877 to your port 80".

                    Then you see the SYN,ACK back and the ACK to that SYN,ACK: a typical handshake.

                    Now 6.46 sends a GET for some HTML. You see the ACK back that says "OK, got your GET", and then the server sends the 404. The server never gets an ACK back from 6.46 for the 404, so it sends the 404 again, and again. Those are the retransmissions.

                    So clearly pfSense put that on its virtual interface, and as you can see on the left, it's also on the physical host interface. So why does 6.46 never send back an ACK? Did he not get it? Your issue is between the physical interface of the host and that 6.46 box. pfSense is doing exactly what it's been asked to do.

                    I see the 404 go out on the physical capture. So why does 6.46 not ACK it? Did he get it and send an ACK, and then that ACK got lost? It never shows up on the physical capture. Can you sniff on the 6.46 host?
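
To make the "sent again, and again, with no ACK" pattern concrete, here is a minimal Python sketch that flags data segments resent before the peer has acknowledged them. It assumes relative sequence numbers (as a capture tool typically displays them) and ignores sequence wraparound; the `Segment` type, the host labels, and all numbers in the example are hypothetical and not read from the attached capture.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    sender: str   # label of the host that sent this segment
    seq: int      # relative sequence number of the first payload byte
    length: int   # payload length in bytes (0 for a pure ACK)
    ack: int      # acknowledgement number carried by this segment


def find_unacked_retransmissions(segments: List[Segment], sender: str) -> List[Segment]:
    """Return data segments from `sender` that repeat an earlier (seq, length)
    while the peer's highest ACK still does not cover them."""
    seen = set()
    highest_ack_from_peer = 0
    retransmissions = []
    for seg in segments:
        if seg.sender != sender:
            # Track how far the other side has acknowledged our data.
            highest_ack_from_peer = max(highest_ack_from_peer, seg.ack)
            continue
        if seg.length == 0:
            continue  # pure ACKs carry no data to retransmit
        key = (seg.seq, seg.length)
        if key in seen and highest_ack_from_peer < seg.seq + seg.length:
            retransmissions.append(seg)
        seen.add(key)
    return retransmissions


if __name__ == "__main__":
    trace = [
        Segment("client", seq=1, length=100, ack=1),    # GET request
        Segment("server", seq=1, length=0, ack=101),    # ACK of the GET
        Segment("server", seq=1, length=300, ack=101),  # 404 response
        Segment("server", seq=1, length=300, ack=101),  # 404 resent
        Segment("server", seq=1, length=300, ack=101),  # 404 resent again
    ]
    print(len(find_unacked_retransmissions(trace, "server")), "unacknowledged retransmissions")
```

If the client's ACK for the response appears in the trace before the repeat, the repeat is not flagged, which separates a lost response from a lost ACK only when you also have a capture from the client side.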

                    oknoackback.png


                    • stephenw10S
                      stephenw10 Netgate Administrator

                      This thread is a great example of diagnostics.  :)

                      However it does seem hard to explain why it should have worked perfectly under pfSense 2.1.5 and not 2.2 if the error exists outside the host box.  :-\

                      Have you read this: https://forum.pfsense.org/index.php?topic=85797.msg475906#msg475906

                      I would be disabling the paravirtualised drivers for the pfSense VM to test that.

                      Steve

                      • C
                        cmb

                        @stephenw10:

                        I would be disabling the paravirtualised drivers for the pfSense VM to test that.

                        Yeah, forcing the VM to e1000 would be ideal and likely would fix the issue. From some brief searching though it doesn't appear easy, if possible at all, to force Xen to present a specific NIC to the VM. Ugly, every other hypervisor handles that far, far better.

                        • F
                          frederickding

                          This is a known issue in upstream FreeBSD 10 after they incorporated the Xen paravirtualized drivers in the standard kernel. It's not exactly pfSense's fault.

                          Yeah, forcing the VM to e1000 would be ideal and likely would fix the issue. From some brief searching though it doesn't appear easy, if possible at all, to force Xen to present a specific NIC to the VM. Ugly, every other hypervisor handles that far, far better.

                          It's definitely possible. There's a wrapper script for QEMU in `/opt/xensource/libexec/qemu-dm-wrapper`.

                          
                          Anyways, I've been experiencing the same network performance issues in pfSense 2.2 snapshots, both on XenServer 6.2 and XenServer Creedence RC.
                          
                          However, I haven't found any way to remove or blacklist drivers _in the kernel_ the way one would on Linux (e.g. rmmod or adding bootloader parameters). So, the only workaround I've found, to revert to emulated NICs, is to recompile the BSD kernel without PVHVM drivers. I've [written instructions here](https://code.dingcorp.com/frederick.ding/pfsense-tools/wikis/removing-pvhvm), tested a few weeks ago, though it's a convoluted process to recompile a kernel.
                          • johnpozJ
                            johnpoz LAYER 8 Global Moderator

                            So how is it that these drivers cause the packets to show up on the physical NIC of the host but not get answered? While I can see how drivers can cause problems in virtualization, from the sniffs it sure looks like the info is put on the physical NIC. Is there something wrong with the info put on the wire, such as mangled packets that the other side doesn't like and doesn't see? I did not look that deep into it; I was just following the stream. If the other side actually saw the traffic, then yeah, we would have to look deeper into why the packet is there but not being acted on.


                            • stephenw10S
                              stephenw10 Netgate Administrator

                              I agree with you that it looks like there's no reply and hence an external problem. The 404 response is reaching the client correctly though?

                              However, in light of the known issues with the xn(4) drivers in FreeBSD 10, it seems unproductive to continue without testing a standard NIC driver, even if it's re(4). That would also fit the fact that it worked fine under 2.1.5.

                              Steve

                              • C
                                cmb

                                @johnpoz:

                                So how is it that these drivers cause the packets to show up on the physical NIC of the host but not get answered?

                                I'm pretty confident judging by the packet captures it's because some packets are ending up with bad checksums, so it doesn't matter that they're getting there, they're dropped for that reason.

                                @frederickding:

                                It's definitely possible. There's a wrapper script for QEMU in `/opt/xensource/libexec/qemu-dm-wrapper`.

                                Ah good, thanks for the tip, at least it's possible and hopefully that'll help others.

                                • johnpozJ
                                  johnpoz LAYER 8 Global Moderator

                                  But the invalid checksum is most likely due to it just being offloaded, etc. I see that so much in sniffs that I have even turned off checking for it.


                                  • C
                                    cmb

                                    @johnpoz:

                                    But the invalid checksum is most likely due to it just being offloaded, etc. I see that so much in sniffs that I have even turned off checking for it.

                                    That's true most of the time when you see bad checksums, but not when they're inconsistent. If hardware checksum offloading were at fault, everything would have bad checksums, and some of those packets have valid checksums. Also, where hardware checksum offloading is to blame, the checksum is almost always 0 in the capture, which isn't the case here either.
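
The distinction drawn here (zero checksum, valid checksum, or a non-zero wrong checksum) can be sketched in Python with the standard Internet checksum from RFC 1071. This is an illustrative heuristic only: `classify_checksum` is a hypothetical helper, it expects the caller to pass the bytes the checksum covers (for TCP, the pseudo-header followed by the segment), and treating a zero field as "probably not filled in yet" simply follows the rule of thumb in the post above.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: one's-complement of the one's-complement sum of
    the data taken as big-endian 16-bit words (odd length is zero-padded)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def classify_checksum(covered: bytes, checksum_offset: int) -> str:
    """Classify the 16-bit checksum stored at `checksum_offset` inside
    `covered` as 'zero', 'valid', or 'invalid'."""
    stored = int.from_bytes(covered[checksum_offset:checksum_offset + 2], "big")
    if stored == 0:
        return "zero"  # heuristic: field likely not filled in at capture time
    zeroed = (covered[:checksum_offset] + b"\x00\x00"
              + covered[checksum_offset + 2:])
    return "valid" if internet_checksum(zeroed) == stored else "invalid"


if __name__ == "__main__":
    # Made-up 10-byte buffer with the checksum field at offset 4.
    blank = bytes([0x45, 0x00, 0x00, 0x1C, 0x00, 0x00, 0x12, 0x34, 0xAB, 0xCD])
    good = blank[:4] + internet_checksum(blank).to_bytes(2, "big") + blank[6:]
    corrupted = good[:-1] + bytes([good[-1] ^ 0x01])
    for name, buf in (("blank", blank), ("good", good), ("corrupted", corrupted)):
        print(name, classify_checksum(buf, 4))
```

A capture where every outbound packet classifies as "zero" looks like offloading; a mix of "valid" and "invalid" on the same flow is the inconsistent pattern described above.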

                                    • johnpozJ
                                      johnpoz LAYER 8 Global Moderator

                                      Good points. I will keep that in mind when looking at future sniffs ;)


                                      • B
                                        beyondcrazy

                                        I'm seeing issues very similar to the OP's, using KVM via Proxmox 3.3, running on an AMD FX-8350 system with a quad-port Intel NIC.

                                        2.1.5 is running perfectly. Upgrading to the very latest 2.2 RC seems to migrate fine, but after boot it won't pass any traffic except ICMP.

                                        I have tried both the paravirtualized NIC drivers and the e1000 driver. No change.

                                        I did try a bare-bones install of the 2.2 RC in a new VM using the e1000 driver, and with a very minimal configuration it did appear to work correctly. So it seems that some aspect of the migrated configuration is causing problems. I haven't had a chance yet to figure out which part.

                                        I will probably try disabling hardware checksum offloading first (it's easy), and if that doesn't fix it, start removing components of the existing config to see what is causing the issue.

                                        It's a moderately simple pfSense system config. No modules, no VLANs. It does have one WAN and two LAN ports (running as emX), multiple port forwards, schedules, and logging. It's running as a pure firewall appliance, so DNS/DHCP, SIP/Asterisk, VPN/strongSwan, etc. all run on different internal hosts.

                                        If necessary I can certainly build the whole config again…

                                        • C
                                          cmb

                                          @johnkeates:

                                          I posted this in a different thread, I hope it's okay to semi-double post

                                          You're more than welcome to cross-post solutions in however many threads are relevant.  :) There are probably a dozen different threads around here on this same root issue, and many people only follow specific threads, so they may otherwise miss a fix for the same problem posted in a different thread.

                                          pf does have a history of breaking checksums in certain areas, though I can't say I've seen any of that recently outside of this particular issue with Xen. From the sound of your description, it's probably a combination of pf+xn. You can take our /tmp/rules.debug file, copy it over to stock FreeBSD, run kldload pf && pfctl -f rules.debug (assuming the stock system has the same NICs), and see what happens. I'm definitely curious about the results.

                                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.