    Performance drop through NAT Proxy IP

      Cloverleaf

      Hi All,

      I'm evaluating replacing production firewalls with a pair of pfSense boxes, and I've got a simple test whose lab results are confusing me.  I've got:

      2 PF boxes (pretty beefy CPUs, a few gigs of RAM) with 6 Intel NICs, only 4 hooked up, WAN, DMZ, LAN, CARP
      1 Backend server, plugged into the DMZ and LAN network, running apache
      1 client server, plugged into the DMZ and WAN networks
      2 cheap gigabit switches, making up the WAN and DMZ ports.
      1 100Mbit switch, making up the LAN ports
      CARP interfaces are connected directly to each other

      LAN, WAN, and DMZ are all different subnets.  LAN is basically unused for this test.

      I use the apache bench program, ab, from the client server, and access the backend apache with 10000 sequential requests to a small file through the DMZ IP (skipping the firewalls).  This takes under 4 seconds with a 721 Kbytes/sec transfer rate.  When I do the test through a Proxy-ARP NAT IP, the rate drops to 35 Kbytes/sec, with the time being spent in the connect phase.
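
      For reference, the two runs were along these lines (the addresses and file name here are just placeholders, not the real lab values):

      # direct to the backend's DMZ address, bypassing the firewalls
      ab -n 10000 http://192.0.2.10/small.html

      # the same request count through the Proxy-ARP NAT VIP on the WAN side
      ab -n 10000 http://198.51.100.10/small.html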

      I expected a drop in performance, of course, but not something quite so horrific.  I'm sure there's something I'm doing wrong or some setting that's off, but I haven't seen anything in the logs complaining and the CPU looks fine.  The connect slowdown seems to happen after just a few thousand requests, so I don't think I'm maxing out the state tables.  Any ideas of where to check would be fantastic.

        GruensFroeschli

        How many thousands are you talking about?
        The default max for the state table is 10'000.
        You can change it under "advanced".
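
        If you want to double-check, you can also watch the state count from a shell on pfSense:

        # number of entries currently in the state table
        pfctl -ss | wc -l

        # or the summary counters (look for "current entries")
        pfctl -si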

        I don't know about a performance problem with PARP VIPs, since I usually use CARP VIPs.
        Have you tried using CARP-type VIPs instead of PARP?

        @http://forum.pfsense.org/index.php/topic:

        A description of what the differences between the 3 types of VIPs are:
        @http://forum.pfsense.org/index.php/topic:

        For the different virtual IP types:

        CARP

        • Can be used by the firewall itself to run services or be forwarded
        • Generates Layer2 traffic for the VIP
        • Can be used for clustering (master firewall and standby failover firewall)
        • The VIP has to be in the same subnet as the real interface's IP

        ProxyARP

        • Can not be used by the firewall itself, but can be forwarded
        • Generates Layer2 traffic for the VIP
        • The VIP can be in a different subnet than the real interface's IP

        Other

        • Can be used if the provider routes your VIP to you anyway without needing Layer2 messages
        • Can not be used by the firewall itself, but can be forwarded
        • The VIP can be in a different subnet than the real interface's IP

        We do what we must, because we can.

        Asking questions the smart way: http://www.catb.org/esr/faqs/smart-questions.html

          Cloverleaf

          Well, the test was supposed to be 10,000 requests.  Originally I was going to ramp it up until I bumped against the state table limit and see what that error state looked like, but I can "see" that it's messed up before I even get close to the 10K number.  ab prints a status message every 1/10th of the way through a run, so I could see that the time per 1,000 requests suddenly got dramatically slower around request 3,000, so it shouldn't have bumped against that limit yet.

          As for CARP vs. PARP: I really wanted to use CARP, since we've got dual firewalls and I'll need it eventually, but ab would crap out very quickly with the error:

          apr_recv: No route to host (113)

          When I changed to PARP this went away.  I don't think it's ab, as I can run the test behind the firewall OK.  I'm going to also swap the WAN switch and the DMZ switch to rule out a bad switch, but I'm thinking that will probably not do anything.
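
          To rule out an ARP problem with the CARP VIP, something like this on the client should show whether the VIP is resolving at all (the address is a placeholder):

          # on the client: does the CARP VIP resolve to the firewalls' shared CARP MAC?
          arp -an | grep 192.0.2.50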

          Another thing I was thinking of trying is to nuke the PFs and set up just one of them… skip the replication, pretend it's a single box, and see if that helps.  But I kind of doubt that will do anything either, as the CARP interface wasn't showing much utilization during the test.

          The strange thing is the connect pause.  Here's the end of the report generated by ab for a run that goes through PF:

          Connection Times (ms)
                        min mean[+/-sd] median    max
          Connect:        0    6  251.1      0  21000
          Processing:     0    0    4.0      0    202
          Waiting:        0    0    0.0      0      1
          Total:          0    7  251.2      0  21000

          Percentage of the requests served within a certain time (ms)
            50%      0
            66%      0
            75%      0
            80%      0
            90%      0
            95%      0
            98%      0
            99%      1
          100%  21000 (longest request)

          The vast majority of the connects (and everything else) are OK… so much so that the median is still 0 ms, but those maximum-time requests crush everything else.
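
          If it helps to see what those stalls look like on the wire, this is roughly how I'd watch the handshakes on the WAN side to check whether the slow connects are retransmitting SYNs (em1 is just a placeholder for the WAN NIC):

          # capture SYN (and SYN-ACK) packets on the WAN interface
          tcpdump -ni em1 'tcp[tcpflags] & (tcp-syn) != 0'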

            RyanG

            Hey Cloverleaf,
            Did you ever get anywhere with this? I'm having the exact same issue and can't for the life of me think of what I'm doing wrong.  I even tried turning the firewall off, so it seems to be just the routing portion of pfSense that's causing it.

            Best,
            Ryan

              Cloverleaf

              Ah, forgot to respond on this one… three things:

              1)  I had messed around a bunch with the same firewall pair before starting to do performance testing and I suspect things were a little dirty under the hood.  I ended up nuking the firewalls back to the base state and things looked better, but not perfect.

              2)  My apache settings were a little weak.  I made sure I was logging to /dev/null, bumped the apache threads (I was using the worker model) up to a higher number, and kept an eye on vmstat on the system.  It was surprisingly easy to overload the system I was using.
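
              For anyone curious, the knobs involved are the worker MPM settings and the access log, roughly along these lines (the values are only illustrative, not a recommendation for any particular box):

              # worker MPM tuning (illustrative values only)
              <IfModule mpm_worker_module>
                  ServerLimit          16
                  StartServers          4
                  MaxClients          400
                  ThreadsPerChild      25
                  MaxRequestsPerChild   0
              </IfModule>

              # send the access log to /dev/null while benchmarking
              CustomLog /dev/null combined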

              3)  ab never really panned out for me; I had a hard time getting it to scale well.  I ended up using curl-loader http://curl-loader.sourceforge.net/ from multiple machines, and running multiple apaches behind pfSense.  The documentation was a bit sparse, but the results were more consistent and I could crush the servers behind pf.  Ironically, I wasn't able to max out pf itself; I would have needed a few more servers behind it to do that.  I was doing about 20,000 connection attempts per second when I had to stop.  The requests were pulling a tiny "Hello World" HTML file, so this was mostly opening and closing sockets with very little data in between.  I think my firewalls were at about 55-60% CPU.  I also did a bandwidth test where I pulled a 50K file over and over and was able to max the gig link without pfSense breaking a sweat, but that's really more a test of the NIC than of the software anyway.
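
              In case it saves someone else the digging: curl-loader is driven by a batch config file passed with -f (e.g. curl-loader -f ./hello.conf).  Going from memory of the bundled conf-examples, a minimal file looks roughly like the sketch below; double-check the key names against the examples shipped in the tarball.

              # minimal curl-loader batch file (key names from memory; values are placeholders)
              BATCH_NAME=hello
              CLIENTS_NUM_MAX=500
              INTERFACE=eth0
              NETMASK=24
              IP_ADDR_MIN=10.0.0.10
              IP_ADDR_MAX=10.0.0.10
              CYCLES_NUM=-1
              URLS_NUM=1
              URL=http://10.0.1.1/hello.html
              URL_SHORT_NAME="hello"
              REQUEST_TYPE=GET
              TIMER_URL_COMPLETION=0
              TIMER_AFTER_URL_SLEEP=0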
