Scaling pfSense to 50,000+ Users



  • Good Evening,

    Wondering how well pfSense will scale with 50,000-60,000 concurrent users and roughly 500,000 concurrent connections.

    Is there anything we will need to do to pfSense to allow it to handle such a load?

    Bandwidth is not high (200 Mbps); it's the number of active connections we have that can go through the roof.

    At the same time we will be running NAT/PAT and VPN services.

    Regards

    J



  • This will have to be server class with lots of memory.
    According to http://www.pfsense.org/index.php?option=com_content&task=view&id=52&Itemid=49, 10,000 states take up 10MB of memory, so 1,000 states take up about 1MB. If you have 500,000 concurrent connections, you would need 500MB of memory. I would instead plan for 60,000 users at about 100 states per user, which gives 6,000,000 states, so you would need about 6GB of memory. Since memory is cheap, get 16GB (4x4GB) and use the 64-bit edition; that should be enough to handle 16 million states concurrently. You will need fast processors for the VPN, or get one of the VPN encryption cards. Let us know how it goes if you do it.
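    The arithmetic above can be sketched in a few lines of shell; the figures are the thread's own examples, and the ~1KB-per-state ratio comes from the page cited above:

```shell
# Rough state-table RAM estimate: ~10 MB per 10,000 states, i.e. ~1 KB per state.
users=60000
states_per_user=100
states=$((users * states_per_user))   # 6,000,000 states for 60k users
ram_mb=$((states / 1000))             # ~1 KB per state -> MB of RAM
echo "$states states -> ~${ram_mb} MB of state-table RAM"
```

    This is only the state table itself; leave headroom for the OS, packages, and VPN daemons on top of it.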



  • While I personally share podilarius's penchant for server class hardware, in this case, I would recommend it for the stability factor more than performance.

    pfSense isn't highly multi-threaded, so having multiple CPU sockets to gather as many cores as possible to be addressable by a single OS isn't quite as helpful here.  Lots of memory, yes, but (higher end) desktops can easily have 16GB of RAM these days.

    It's going to depend a lot on how much VPN you're talking about, and whether you really need it hosted by the same OS instance.  In theory (only theory, because I haven't personally done this; I'm sure someone will come in and tell us if my idea is crazy stuff), you could build a fairly large VM host, say 32GB of RAM and 2 sockets of quad- or hex-core CPUs, and run a few instances of pfSense: one for routing, with a lot of RAM (16GB or so), and a few for VPN.  This effectively multi-threads your VPN, at the cost of figuring out how you want to load balance them.  Of course, you could do this with multiple hardware pfSense instances as well.

    Now, back to the hardware conversation: please don't take the earlier comment as a recommendation away from server class hardware, only that the greater parallelism of server hardware probably won't actually help you here.  I would 100% recommend server hardware for reliability.  When you're routing for 50k+ users, I'm sure there's some kind of downside to providing a flaky router.  At that point I'd also be looking at doubling up your hardware and using CARP for redundancy.  The main thing I was getting at is that these don't have to be such big enterprise servers, just reliable, and it might be easier to justify 2 (or more) of them when they're small(er).  Maybe 2x 16GB servers for routing and a single 16GB server as a VM host for your VPN instances.  200Mb is a lot for VPN, though certainly not outrageous; I'm not sure how well it scales for a large number of users, though.



  • I'd split the NAT and VPN to two different (server-class) machines.


  • Banned

    Run two separate instances of pfSense, or route the VPN to a separate VPN server.



  • For the VPN it's not particularly heavy usage: 100 Mbps max, mostly around 10-20 Mbps.

    We have server class H/W for this already: four identical machines ready for failover.



  • @jnex26:

    For the VPN it's not particularly heavy usage: 100 Mbps max, mostly around 10-20 Mbps.

    We have server class H/W for this already: four identical machines ready for failover.

    If you can provide the basic hardware specs/models, we can possibly give you some more relevant info/opinions/recommendations.



  • I can't help but giggle at this thread.  If you're an admin with the hardware to support 50,000+ users and a 1/2 million concurrent connections, then you can afford to pay the support fee to make sure pfSense can handle the load.  Contact pfSense technical support directly and have your credit card ready. ;D


  • Netgate Administrator

    Whilst I agree with what you are saying, I can completely see why the original poster asked here first.
    It would seem like a waste to take out a support contract if the answer to the question had been 'not at all'.  ;) (Though I'm sure in fact they wouldn't take your money to tell you that!)
    No harm in getting a few free opinions first.

    Steve


  • Rebel Alliance Developer Netgate

    Keep in mind that each actual client connection results in two states - one in, one out.  (client -> firewall, firewall -> wan or vice versa) so if you really need 500,000 simultaneous client connections, plan on enough RAM for 1,000,000+ states.
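    If you outgrow the default state-table ceiling, pf exposes it as a one-line tunable; a sketch (the value below is illustrative, and on pfSense this is normally set through the web GUI's "Firewall Maximum States" field rather than by hand-editing pf.conf):

```
# pf.conf sketch -- value is illustrative, not a sizing recommendation
set limit states 1200000   # headroom above the ~1,000,000 expected states
```

    You can check the current limits with `pfctl -sm` and the live state count with `pfctl -si`.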



  • @podilarius:

    at about 100 states per user

    My users average about 120 states each.

    To the OP, what you want to do isn't terribly difficult, and not even all that expensive, but I would echo the others, pay the $600 and get yourself a support contract.  I've used all of 30 minutes of mine but that 30 minutes was worth the cost.

    If this is a critical system you should consider using two boxes & CARP for failover.  It's worth it just for the "anytime" maintenance window.



  • Don't forget newer Intel CPUs (Sandy Bridge and later) have AES-NI, which significantly speeds up AES encryption. Something you will want to make sure you have if you want a decent amount of VPN capability.
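    One way to check for the instruction set on a FreeBSD-based box (a sketch; it assumes the boot dmesg buffer is available at the usual path, where AESNI shows up in the CPU's Features2 line):

```shell
# Look for AESNI in the boot-time CPU feature flags (FreeBSD).
if grep -q AESNI /var/run/dmesg.boot 2>/dev/null; then
  msg="AES-NI present"
else
  msg="AES-NI not detected (or not a FreeBSD system)"
fi
echo "$msg"
```

    You can also compare `openssl speed -evp aes-128-cbc` against plain `openssl speed aes-128-cbc` to see whether hardware acceleration is actually being picked up.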



  • @extide:

    Don't forget newer Intel CPUs (Sandy Bridge and later) have AES-NI, which significantly speeds up AES encryption. Something you will want to make sure you have if you want a decent amount of VPN capability.

    As far as I know, that's not supported yet.



  • Also keep in mind, the pf process is still in a giant-locked single thread. Multiple cores won't help there.



  • Yup, which is why I recommend that most people go with a dual- or quad-core with as high a clock frequency as possible, unless they're going to be using Snort, Squid, VPN, etc., where you'd be able to use more than 2 cores.

    There have been quite a few people here who've bought expensive dual quads or hexes with low clock speeds, only to find out that they're slower than an i3 or i5.



  • @Jason:

    @extide:

    Don't forget newer Intel CPUs (Sandy Bridge and later) have AES-NI, which significantly speeds up AES encryption. Something you will want to make sure you have if you want a decent amount of VPN capability.

    As far as I know, that's not supported yet.

    It may only be supported in 2.1, and in that case it will be out very soon.



  • @Jason:

    Yup, which is why I recommend that most people go with a dual- or quad-core with as high a clock frequency as possible, unless they're going to be using Snort, Squid, VPN, etc., where you'd be able to use more than 2 cores.

    There have been quite a few people here who've bought expensive dual quads or hexes with low clock speeds, only to find out that they're slower than an i3 or i5.

    I would not recommend multiple physical multi-core processors for a physical pfSense box.  If you're stuck with single or even 2 threads, having 2 separate processor dies can slow individual processes down if there aren't enough threads to occupy the logical CPUs.  Many OSs will cycle threads between logical CPUs, which is fine on a single physical multi-core processor, since the cache is local to the cores; but when a thread cycles between sockets, the cache has to transfer over the CPU bus (or be pulled from main memory).  While it can do that fairly quickly, it's not helping.  Some multi-core designs share a cache, but even if they don't, the cores can usually snoop into each other's cache very fast.

    For a pfSense box, I would recommend not having multiple physical CPUs; multiple sockets on the motherboard are fine, just don't populate the second one.  Not only would you be spending extra on a CPU that you're not effectively using, both in the initial purchase and in the power to run it, you may actually be slowing the machine down.

    Of course, if it's a VM host, cores are handy so the extra sockets may be beneficial, but that goes the other way from the original question, here.

    I'm right behind your clock speed recommendation, though.  A fast dual or quad core is probably the sweet spot for a high volume pfSense install.  "Server" grade boxes will use a Xeon, but that's not going to buy you much here.  The main things a Xeon really gives you are usually an option for more cache, better multiple physical CPU support, and the possibility of more cores, none of which is going to make a huge difference for you.  It'll be hard to get away from a Xeon in a "serious" OEM server, like a Dell or HP, but you can find lower end servers with "desktop" type CPUs for less; they just might not have the options for dual power supplies and such, which I would want for a router with a bunch of people behind it.

    I don't think the box needs to be parallel large, just singularly fast.  Same with other components: a smaller amount of fast RAM rather than gobs of slower RAM (when I say smaller, I mean in the 16GB to 32GB range, rather than the 192GB+ side).  You don't need a bunch of disk spindles; it's not a file server or a database, and I don't think any data transfer tasks wait on disk reads or writes (after boot-up, of course).  As long as there is enough RAM it shouldn't hit swap (much), so a single or mirrored SSD is good.

    Fast NICs on a fast bus; but, and this may sound counterintuitive, maybe limit smaller alternative networks to 100Mb physical links (DMZ, guest wireless, etc.) to reduce the potential routing load.  Not a hard recommendation, just something to keep in mind: "faster always" is not always helpful, and there are other (if low-fi) ways to introduce bandwidth limits.  It takes CPU on your pfSense box to do bandwidth throttling, so offload it to "hardware".  That's especially true for something like a guest wireless network with a bunch of APs that could easily introduce a lot of bandwidth at times.  Same with a DMZ network, or a dev network that's islanded; well, no offense to devs, but y'all do some wacky stuff sometimes and generate a lot of not-always-legit bandwidth. Sorry, you get 100Mb.

    If you don't have any of that, and you're really doing simple routing for 50k people, consider breaking it up into multiple routers.  Even if large amounts of people need to be on the same broadcast network(s), you can still run multiple DHCP servers with overlapping subnets, each serving a portion of the range, with each DHCP scope pointing to a different router.  You've effectively load balanced your routing without fancy hardware.  With 4 medium machines, each pair in an active/passive CARP pool, you've got some pretty robust routing for fairly cheap.  Need more?  Add more and re-organize your DHCP scopes.  You could do this with some fairly cheap 1U [insert OEM or whitebox manufacturer/reseller here] single core servers.  Many OEM servers have at least 2 good Gb NICs, some 4.  Hell, there are 1/2U servers, or more accurately, 2 servers in 1U, that might work well for this.  I would just be sure to separate the CARP pools across chassis so that if a whole chassis went down you don't lose both the primary and secondary of a CARP pool.
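    The scope-splitting idea above could be sketched with ISC dhcpd, for example; all addresses here are hypothetical, and the point is just that both servers describe the same subnet but hand out disjoint ranges with different default gateways:

```
# Server A -- dhcpd.conf (hypothetical addressing)
subnet 10.0.0.0 netmask 255.255.0.0 {
    range 10.0.16.1 10.0.127.254;
    option routers 10.0.0.1;      # CARP VIP of router pair 1
}

# Server B -- dhcpd.conf
subnet 10.0.0.0 netmask 255.255.0.0 {
    range 10.0.128.1 10.0.255.254;
    option routers 10.0.0.2;      # CARP VIP of router pair 2
}
```

    Whichever server answers a client first wins the lease, so the client population splits roughly evenly across the two gateways without any dedicated load-balancing gear.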

    Hell, you could put together a proof of concept with a few desktops, if you have 'em handy.  It's very common to find older desktops for very cheap.  Off the top of my head, you could get old Dell GX260's; I see 'em fairly often for $50 or less in local "old-stuff" / PC recycler shops.  Toss in some Intel Pro/100 PCI NICs for the WAN side; the onboard should be an Intel Pro/1000 (maybe Broadcom, mine are Intel).  They probably have at least 1GB of RAM, maybe 2, which should be enough for some light proof of concept testing.  Oh, and these should have Hyper-Threading P4's, around 2.8GHz.

    For the most part, that's mainly testing failover and load balancing scenarios.  The DHCP servers should hand out leases on a first-to-respond basis, so, assuming the machines are roughly similar, they should balance themselves decently (if a machine is getting hit hard, odds are DHCP is going to respond slower anyway, so the idle machine should service the call).

    If you want more throughput testing, Dell GX280's have PCI-Express x16 slots and SATA, which means if you get the dual or quad port Intel cards now, you'll be able to simply migrate them to your production environment; same with SSDs.  Other popular options are HP DC7600's (P4 or Pentium D, 4GB of RAM, PCI-Express x16), HP DC7700's (Pentium D or C2D, 8GB, PCI-E x16), and HP DC7800's (Pentium D, C2D, or Quad, 8GB, PCI-E x16).  All of those are constantly available on eBay for between $70 and $200 shipped (C2Q might be a bit more).

    Congratulations if you've read this far ;)

