CPU to Saturate 150mbit up and down simultaneously via VPN?



  • Hey all,

    I think the title says it all.  What kind of system can handle this?

    With all the alarming privacy issues going on here in the US currently I am looking to add a VPN service to my home network.

    I know my current super low power AMD Geode based APU2C4 embedded build likely won't be able to cut it, so I am looking at another fresh pfSense build.  (this is the danger of sizing your hardware too close to your current needs, leaving little room for added capacity/demands)

    My desire is to be able to fully load 150mbit up and down simultaneously via VPN without the router/CPU being the bottleneck.

    What kind of hardware would this take?  I am a home builder, and I have no problem putting a system together myself, but if one of the Netgate devices is competitive from a price perspective, I'd be open to this as well.

    If at all possible, I am also looking to not have whatever I build be a power hog.  I have plenty of old multi ghz desktop CPU's and motherboards kicking around, but I'd rather not pull 200W from the wall for just my router.

    I'd appreciate any input!

    Thanks,
    Matt



  • Get something around an i3-7100. It's more than you need for 300Mbps OpenVPN, but that gives you room to grow. It's fairly power efficient if you're not running it at max. (And if you do end up maxing it out, then a low-powered chip wasn't going to work anyway.) Or if you want to wait, embedded solutions based on goldmont might work but they're still thin on the ground and we don't have a lot of real-world results. Silvermont/Airmont based embedded processors aren't going to hit 300Mbps of OpenVPN.


  • Banned

    Here is the performance I'm getting on an old i5-2400 for reference. FWIW that CPU has a passmark of 5878, single thread 1740 (single thread will determine your per-instance performance on OpenVPN).

    https://forum.pfsense.org/index.php?topic=127667.msg704656#msg704656

    Obviously what I'm using is overkill; if I had it to do over again I wouldn't buy what I bought, but I didn't know any better at the time.

    Keep in mind this is just for reference, you will get much better performance with what you described.  My system is also running suricata, pfBlockerNG & DNSBL.

    This is also encrypting at AES-256-CBC, which I also don't recommend. AES-128-GCM is more than enough for your privacy and significantly more efficient.

    So I'm guessing that you would see <12% CPU usage on my system just by switching to AES-128-GCM, even while still running those packages; without the extra packages it should be lower still.
    https://calomel.org/aesni_ssl_performance.html

    If I'm right on those estimates then I think a J3355B or J3455-ITX would meet your needs. The AES-NI is different, and goldmont is updated but slower than modern core series processors, but probably at least on par with my 6-year-old i5-2400.

    Passmark is certainly not the best reference system but gives a ballpark idea of performance comparison:

    |               | i5-2400 | J3355B     | J3455-ITX  |
    | multi-thread  | 5878    | 1324 (22%) | 2135 (36%) |
    | single-thread | 1740    | 884 (50%)  | 782 (44%)  |

    As another reference point the i3-7100 VAMike suggested beats out my i5-2400 in multi and single threaded scores by a small margin and is far more modern, which matters. His recommendation will absolutely, undoubtedly work for you without a hitch. He's also a very knowledgeable user and I'd be interested in his input on my suggestion.

    If you are looking for a lower power / cheaper solution, read on.

    Obviously this is hack math for a lot of reasons, but still should serve as a ballpark.

    If you adjust my approximation of 12% for AES-128-GCM @130Mbps for your 300Mbps, you get ~28% on my system.
    Again adjusting that for a J3355B on a single core would be ~56%, or 28% of the total two cores.
    On a J3455-ITX, ~62% or 16% of the total four cores.

    Again, these are hack estimates but they both point in the general direction that you should be able to do this on a passively cooled modern celeron.
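
The scaling in this post can be reproduced with a few lines of arithmetic. This is only a sketch of the poster's own "hack math": it assumes CPU load grows linearly with throughput and inversely with the single-thread PassMark score, and the function name `scaled_load` is made up for illustration. The inputs (12% at 130Mbps, the 300Mbps target, and the single-thread scores) are the figures quoted above.

```python
# Rough CPU-load scaling using only the figures quoted in the post.
# Assumption: load is linear in throughput and inversely proportional to
# single-thread PassMark score. Real OpenVPN behavior may differ.

def scaled_load(base_load_pct, base_mbps, target_mbps, single_thread_ratio=1.0):
    """Estimate load (% of one core) at target_mbps on a CPU whose
    single-thread speed is single_thread_ratio times the reference CPU."""
    return base_load_pct * (target_mbps / base_mbps) / single_thread_ratio

I5_SINGLE = 1740      # i5-2400 single-thread score from the table
J3355B_SINGLE = 884   # J3355B single-thread score from the table
J3455_SINGLE = 782    # J3455-ITX single-thread score from the table

i5_load = scaled_load(12, 130, 300)
j3355b_load = scaled_load(12, 130, 300, J3355B_SINGLE / I5_SINGLE)
j3455_load = scaled_load(12, 130, 300, J3455_SINGLE / I5_SINGLE)

for name, load in [("i5-2400", i5_load), ("J3355B", j3355b_load), ("J3455-ITX", j3455_load)]:
    print(f"{name}: roughly {load:.0f}% of one core at 300Mbps")
```

Using the unrounded score ratios, the results land within a couple of points of the ~28%, ~56% and ~62% given in the post, which rounds at each step.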



  • @VAMike:

    Get something around an i3-7100. It's more than you need for 300Mbps OpenVPN, but that gives you room to grow. It's fairly power efficient if you're not running it at max. (And if you do end up maxing it out, then a low-powered chip wasn't going to work anyway.) Or if you want to wait, embedded solutions based on goldmont might work but they're still thin on the ground and we don't have a lot of real-world results. Silvermont/Airmont based embedded processors aren't going to hit 300Mbps of OpenVPN.

    Good info, thanks.

    If the 7100 at 3.9GHz is a bit overkill, I wonder if the 7100T would suffice.  The 35W TDP makes cooling and power options lower cost.

    Heck, maybe a Haswell chip could suffice and be lower cost.  Is the fact that these new i3's have HT an important part of the consideration?  Maybe a non-HT dual core Haswell could do the trick and save a bunch of money in the process.

    It doesn't hurt that socket 1150 motherboards are still cheaper than 1151 boards, and I already have more DDR3 RAM than I know what to do with, but no spare DDR4 RAM



  • @pfBasic:

    Thanks, good info.  I wonder this as well.

    So are the OpenVPN calculations not multithreaded?


  • Banned

    No, OpenVPN is single threaded. You can create a gateway group to spread across multiple cores (and this is effective for real world performance and easy to do), but any single instance of OpenVPN will be limited by single thread performance.



  • Can you elaborate on how to set up an OpenVPN gateway group please?



  • @pfBasic:

    Here is the performance I'm getting on an old i5-2400 for reference. FWIW that CPU has a passmark of 5878, single thread 1740 (single thread will determine your per-instance performance on OpenVPN).

    https://forum.pfsense.org/index.php?topic=127667.msg704656#msg704656

    So about 30% CPU for ~150Mbps, and he's looking for 300Mbps, so about 60% of the i5-2400. If your estimate below is right that the J3355 is about half the speed of the i5-2400, that means he needs about 120% of the J3355. Maybe a bit less for AES-GCM, but the bottom line will probably be that the J3355's limit is right around the target speed.

    This is also encrypting at AES-256-CBC, which I also don't recommend. AES-128-GCM is more than enough for your privacy and significantly more efficient.

    So I'm guessing that you would see <12% CPU usage on my system just by switching to AES-128-GCM, even while still running those packages; without the extra packages it should be lower still.

    I definitely recommend AES-128-GCM over AES-256-CBC, but it actually doesn't make all that much difference in OpenVPN performance–the bottlenecks lie elsewhere.

    Passmark is certainly not the best reference system but gives a ballpark idea of performance comparison:

    I'm skeptical that it offers much insight at all. First, because the results I've seen are all over the map for a given CPU, and also because they're not focused on the specific problem of OpenVPN (so it's an apples to oranges comparison). I'd guess that a goldmont CPU might handle the task, but without some real world experience wouldn't present it as more than a guess. Definitely the math isn't reliable across architectures like this.

    Again, these are hack estimates but they both point in the general direction that you should be able to do this on a passively cooled modern celeron.

    One problem here is that intel has used "pentium" and "celeron" to cover a heck of a lot of different architectures at this point. Right now you can buy a "pentium" or a "celeron" which is a goldmont mobile CPU or one that's a kaby lake desktop, and those have very different performance characteristics. A skylake or kaby lake pentium or celeron will behave similarly to the i3-7100, with the performance for this application basically scaling with clock speed. Picking just one is tough, because there's always another one that's a little faster for another couple of bucks or a little slower for a couple of bucks less. Probably any of them will hit the 300Mbps target, the i3-7100 is just a point that will have some headroom but isn't on the really steep part of the price curve, but you could also go with a G3950 for half the cost. But switching to a goldmont "pentium" or "celeron" you move from "probably can do this" to "maybe can do this"; it's a completely different architecture, so instead of just looking at the lower clock speed you're also looking at a lower IPC.



  • @mattlach:

    If the 7100 at 3.9GHz is a bit overkill, I wonder if the 7100T would suffice.  The 35W TDP makes cooling and power options lower cost.

    The 7100 has the fan in the box, and I haven't noticed that a few watts makes much of a difference in power options. If you are trying to go fanless that's a whole different story, but that wasn't in the original ask. :) On most of these newer CPUs you'll be idle most of the time and the fan will also be idling. The low TDP chips are important mainly if you want to ensure that they never get too hot because you can't/don't want to put a fan on. They'll draw the same power at idle, but are throttled from getting too hot (at the cost of peak performance). I personally use a fanless router, but I'm not trying to run multiple hundred megabits of OpenVPN on it–that really changes the requirements.

    Heck, maybe a Haswell chip could suffice and be lower cost.  Is the fact that these new i3's have HT an important part of the consideration?  Maybe a non-HT dual core Haswell could do the trick and save a bunch of money in the process.

    The kaby lake celeron option is about $50. If you can get a similar haswell for a bunch less, go for it. I'd guess you'd end up in the same ballpark, so I'd just get the newer architecture, which is probably about as much more efficient as it is more expensive. But if you get a steal on the haswell it should work fine; if the parts box RAM is worth enough to tip the scale, that's that.



  • @VAMike:

    The 7100 has the fan in the box, and I haven't noticed that a few watts makes much of a difference in power options. If you are trying to go fanless that's a whole different story, but that wasn't in the original ask. :) On most of these newer CPUs you'll be idle most of the time and the fan will also be idling. The low TDP chips are important mainly if you want to ensure that they never get too hot because you can't/don't want to put a fan on. They'll draw the same power at idle, but are throttled from getting too hot (at the cost of peak performance). I personally use a fanless router, but I'm not trying to run multiple hundred megabits of OpenVPN on it–that really changes the requirements.

    The fan doesn't bother me in the slightest.  This thing is going to reside in my basement next to my noisy ProCurve switch, my KVM server and my patch panel, where noise is not a concern.    The plan - however - was to use a PicoPSU-type power supply.  The 54W TDP - using those online PSU calculators of questionable accuracy - winds up being right on the hairy edge of what the PicoPSU model I had in mind can provide.

    I like the PicoPSU's as they seem to have much less overhead at idle when measured at the wall with my Kill-A-Watt than a similar PC with a traditional ATX power supply does.  Not quite sure why that is though.

    @VAMike:

    The kaby lake celeron option is about $50. If you can get a similar haswell for a bunch less, go for it. I'd guess you'd end up in the same ballpark, so I'd just get the newer architecture, which is probably about as much more efficient as it is more expensive. But if you get a steal on the haswell it should work fine; if the parts box RAM is worth enough to tip the scale, that's that.

    Yeah, it's less the price of the CPU, and more the price of the other accessories that winds up being better with Haswell.  Socket 1150 motherboards tend to be cheaper than Socket 1151 models, and then there's the fact that I won't need to buy RAM at all.  I have unused DDR3 sticks up to my armpits, but as of yet I don't have anything DDR4 (or DDR3L for that matter) in my house.

    @VAMike:

    So about 30% CPU for ~150Mbps, and he's looking for 300Mbps, so about 60% of the i5-2400. If your estimate below is right that the J3355 is about half the speed of the i5-2400, that means he needs about 120% of the J3355. Maybe a bit less for AES-GCM, but the bottom line will probably be that the J3355's limit is right around the target speed.

    Hmm.  So ~30% of a i5-2400 for 150Mbps, so I'd need 60% of that same i5-2400.

    If these results are an accurate predictor for OpenVPN type of workloads (which I am not convinced they are due to the special instruction sets like AES-NI) Haswell gained about 13% IPC over Sandy.

    The i5-2400 turbo's up to 3.4Ghz, so divide by 1.13, and multiply by 0.6 I ought to need an absolute bare minimum of 1.8Ghz out of the Haswell arch.

    Add a safety margin over that, and even a 2.8Ghz Haswell Celeron G1840 ought to do the trick.  Question is, is that cutting it too close…
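
The clock-speed estimate above is easy to check. The sketch below uses only the numbers from this post (3.4GHz turbo on the i5-2400, 60% load at 300Mbps, a 13% IPC gain for Haswell over Sandy Bridge, and the 2.8GHz G1840); the function name `min_clock_ghz` is hypothetical, and the model assumes throughput scales linearly with clock speed times IPC, which the poster himself is not sure holds for OpenVPN.

```python
# Minimum Haswell clock needed, following the reasoning in the post.
# Assumption: OpenVPN throughput scales linearly with clock speed and IPC.

def min_clock_ghz(ref_clock_ghz, ref_load_fraction, ipc_gain):
    """Clock a newer core needs to match ref_load_fraction of the reference
    core, given the newer core's IPC advantage (1.13 means 13% more IPC)."""
    return ref_clock_ghz * ref_load_fraction / ipc_gain

needed = min_clock_ghz(ref_clock_ghz=3.4, ref_load_fraction=0.60, ipc_gain=1.13)
headroom = 2.8 / needed  # Celeron G1840 clock relative to the bare minimum

print(f"bare minimum: {needed:.2f} GHz")
print(f"G1840 headroom factor: {headroom:.2f}x")
```

Under these assumptions the bare minimum comes out at about 1.8GHz, and a 2.8GHz part has roughly one and a half times that.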


  • Banned

    Thanks for your input Mike, it's always good stuff! I happen to have a J3355B that I use with LibreELEC, and I also have a spare HDD and a PRO/1000. I think I'll run pfSense on it tonight and see how it handles VPN. I'll report back!


  • Banned

    @mattlach:

    I like the PicoPSU's as they seem to have much less overhead at idle when measured at the wall with my Kill-A-Watt than a similar PC with a traditional ATX power supply does.  Not quite sure why that is though.

    It's likely due to a combination of the picoPSU's not having a fan which takes power to run, and the fact that PSU's tend to be inefficient when only drawing a small percentage of their peak power. So even a 300W ATX PSU is running at 10% of peak with a 30W pfSense box, while a 60W AC/DC converter (this is what really matters for efficiency with a picoPSU) is running at 50%.

    On my LibreELEC J3355B I saw a drop of about 8W switching from an old ATX PSU to a pico PSU. Not enough to warrant buying one if you already have an ATX PSU lying around, but maybe enough if you don't. I did it because I didn't want to hear the fan though.


  • Banned

    I posted another thread on the J3355B performance.
    https://forum.pfsense.org/index.php?topic=127793.msg705046#msg705046

    It was pretty impressive IMO.

    On the tests I ran it looked like the CPU scaled fairly linearly with VPN throughput. If that's true then this CPU would work for your 300Mbps application @ AES-128-CBC.

    $55 for low power SoC.


  • Banned

    @authenticx:

    Can you elaborate on how to set up an OpenVPN gateway group please?

    Sure, you simply set up 2+ OpenVPN clients. I would recommend pointing these clients at different servers if you can; this helps mitigate the effects of a server going down or slowing down.

    Go to Interfaces, assign each client an interface, enable the interface, then save and apply.

    Go to System/Routing/Gateway Groups.
    Create a new gateway group. Select all of the clients that you want to work simultaneously as Tier 1; you can optionally select fallback clients as Tier 2+. Fallbacks become active when all gateways in the higher tier are down.

    Finally, go to your Firewall rules.
    For any rule that passes traffic you want to force through the VPN, edit it, select advanced settings, and under Gateway select the gateway group you created.
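
The tier behavior described in this post (all Tier 1 members share the load, and a lower tier takes over only when every gateway above it is down) can be illustrated with a small sketch. This is not pfSense code; the gateway names `VPN_A`, `VPN_B`, `VPN_C` and the function `active_gateways` are hypothetical and only model the selection rule as the post states it.

```python
# Toy model of gateway-group tier selection as described in the post.
# Not pfSense's implementation; names and data structures are illustrative.

def active_gateways(tiers, is_up):
    """Return the gateways that carry traffic.

    tiers: dict mapping gateway name -> tier number (1 is most preferred)
    is_up: dict mapping gateway name -> True if the gateway is reachable
    """
    for tier in sorted(set(tiers.values())):
        members = [gw for gw, t in tiers.items() if t == tier and is_up[gw]]
        if members:
            # Every live gateway in the best available tier is used at once.
            return members
    return []  # nothing is up

tiers = {"VPN_A": 1, "VPN_B": 1, "VPN_C": 2}

all_up = active_gateways(tiers, {"VPN_A": True, "VPN_B": True, "VPN_C": True})
one_down = active_gateways(tiers, {"VPN_A": False, "VPN_B": True, "VPN_C": True})
tier1_down = active_gateways(tiers, {"VPN_A": False, "VPN_B": False, "VPN_C": True})

print(all_up, one_down, tier1_down)
```

With two clients in Tier 1, each OpenVPN process can land on its own core, which is the point of the gateway group in this thread.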



  • @mattlach:

    AMD Geode based APU2C4

    Just to clarify, the APU2C4 isn't AMD Geode based, it's on much more powerful Jaguar cores.  And that said, with four of them I'd expect it to be possible to aggregate multiple OpenVPN connections to equal 150Mbps, as others have suggested.  I say possible because it might be, not because I'd advise it.  But if I were in OP's situation I'd at least try it.

    Just didn't want anyone to get the idea that the APU2C4 is the same as the old APU systems, which were (are) based on very old Geode CPUs.



  • @whosmatt:

    @mattlach:

    AMD Geode based APU2C4

    Just to clarify, the APU2C4 isn't AMD Geode based, it's on much more powerful Jaguar cores.  And that said, with four of them I'd expect it to be possible to aggregate multiple OpenVPN connections to equal 150Mbps, as others have suggested.  I say possible because it might be, not because I'd advise it.  But if I were in OP's situation I'd at least try it.

    Note that the requirement was 150Mbps bidirectional; most of the test numbers are single stream–roughly 300Mbps equivalent. Dicey on an APU2, I think, even with multiple streams.

    Just didn't want anyone to get the idea that the APU2C4 is the same as the old APU systems, which were (are) based on very old Geode CPUs.

    Just to correct the correction, the geodes were in the older ALIX series; pcengines "APU" was a bobcat core and performance-wise was much closer to the APU2 except that it lacks AES-NI and has half the cores. (Confusing naming, as "APU" is AMD's name for a line covering 8 different cores over 5+ years.)



  • @VAMike:

    Just to correct the correction, the geodes were in the older ALIX series; pcengines "APU" was a bobcat core and performance-wise was much closer to the APU2 except that it lacks AES-NI and has half the cores. (Confusing naming, as "APU" is AMD's name for a line covering 8 different cores over 5+ years.)

    Ah, yes. I meant ALIX.



  • @VAMike:

    Get something around an i3-7100. It's more than you need for 300Mbps OpenVPN, but that gives you room to grow. It's fairly power efficient if you're not running it at max. (And if you do end up maxing it out, then a low-powered chip wasn't going to work anyway.) Or if you want to wait, embedded solutions based on goldmont might work but they're still thin on the ground and we don't have a lot of real-world results. Silvermont/Airmont based embedded processors aren't going to hit 300Mbps of OpenVPN.

    So, I took your advice and went with an i3-7100.

    This is my first socket 1151 chip, so there are a lot of unfamiliar BIOS options.

    Anyone know if "Intel Speed Shift Technology" is compatible with the version of BSD pfSense is built on?

    edit:

    I also have to admit I am VERY impressed with this little chip.

    I haven't installed pfSense yet, but I am doing some testing in Ubuntu 16.10.

    Using the PicoPSU-80 and 60W power brick kit from Mini-Box.com I'm idling on the desktop pulling only 7.1W from the wall (as measured on my Kill-A-Watt).

    That's about the same power as my PcEngines low power Quad Core Jaguar at idle.

    When I load up the chip with mprime (linux version of Prime95) it peaks at about 46W at the wall.

    And that's at 3.9Ghz 2C/4T.

    Even the stock Intel cooler (which just BARELY fit inside the M350 case once the drive brackets were removed) doesn't spin up much during load testing.

    Very impressed.

    The ASRock H270M-ITX/ac is also a great little Mini-ITX board with dual Intel NIC's to pair with it.



  • @VAMike:

    Yeah, my bad, I got my chips confused.  Definitely have the PCEngines APU2C4, which is 4 Jaguar cores at 1Ghz I believe  (or was it 1.2?)



  • @mattlach:

    Yeah, my bad, I got my chips confused.  Definitely have the PCEngines APU2C4, which is 4 Jaguar cores at 1Ghz I believe  (or was it 1.2?)

    2nd gen Jaguar core at 1 GHz according to the document below and pcengines. Likely limited to 1GHz due to the design of their cooling solution.

    https://www.amd.com/Documents/AMDGSeriesSOCProductBrief.pdf

    CPU: AMD Embedded G series GX-412TC, 1 GHz quad Jaguar core with 64 bit and AES-NI support, 32K data + 32K instruction cache per core, shared 2MB L2 cache.



  • @mattlach:

    Wow.  That's really good to know.  Yeah, those M350 cases are tiny, but they kind of stand alone in the market, and are perfect for a mini ITX pfSense system provided your NICs are onboard.  I have one but it's for a MythTV frontend.  Thanks for the info.



  • @whosmatt:

    Any time!

    And it gets better.  I killed Xorg and the idle wattage measured at the wall went down to 6.2W!

    Full specs if anyone else is interested (links to where I bought them, you may find better prices elsewhere):

    And that's it.  Total: 393.31  (less for me, since I already had a few of the parts left over from other projects).

    The CPU comes with a cooler.  Before you assemble everything, it looks like it won't fit in the M350 enclosure, but it does (just barely), as long as you don't use the 2.5" drive brackets.  (Use an M.2, USB drive or SATA DOM.)

    I also pulled out the mini-WLAN card (you loosen two screws on the bottom of the board and it comes right out).  I wasn't using it, and I figured I'd rather not have it wasting power.  I also disabled everything in the BIOS I wasn't planning on using, and enabled all power saving states except suspend to RAM, as the router needs to be operating 24/7.

    I used a fan profile on the board.  The CPU puts out so little power that it seems to stay at the cooler's minimum fan speed most of the time.  Granted it is pretty cold in my basement right now.

    (Warmer temps will result in higher fan speeds which will drive up power consumption noticeably.  At this low power use the fans use a surprisingly large percentage of the power)

    I'm very happy thus far.

    Just stay away from the USB3 ports.  pfSense doesn't seem to like those at all, and the installers will fail unless booted from one of the USB2 ports.



  • So,

    After installing pfSense, my power use at idle went up a little bit to about 8W (compared to 6.2W in Ubuntu).

    Part of this may be due to my "Hiadaptive" power setting, or maybe FreeBSD 10.3 isn't quite as good at power management as Ubuntu is at this point.

    Either way, still good results.

    Here are some comparative OpenSSL numbers.

    First the PcEngines APU2C4:

    
    [2.3.1-RELEASE][root@pfSense.localdomain]/root: openssl speed -elapsed -evp aes-128-ecb
    You have chosen to measure elapsed time instead of user CPU time.
    Doing aes-128-ecb for 3s on 16 size blocks: 23413097 aes-128-ecb's in 3.00s
    Doing aes-128-ecb for 3s on 64 size blocks: 18438085 aes-128-ecb's in 3.00s
    Doing aes-128-ecb for 3s on 256 size blocks: 7473361 aes-128-ecb's in 3.00s
    Doing aes-128-ecb for 3s on 1024 size blocks: 2115520 aes-128-ecb's in 3.01s
    Doing aes-128-ecb for 3s on 8192 size blocks: 279464 aes-128-ecb's in 3.00s
    OpenSSL 1.0.1s-freebsd  1 Mar 2016
    built on: date not available
    options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) 
    compiler: clang
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-ecb     124869.85k   393345.81k   637726.81k   720221.92k   763123.03k
    

    Now the i3-7100:

    [2.3.3-RELEASE][admin@router.localdomain]/var/log: openssl speed -elapsed -evp aes-128-ecb
    You have chosen to measure elapsed time instead of user CPU time.
    Doing aes-128-ecb for 3s on 16 size blocks: 242729953 aes-128-ecb's in 3.00s
    Doing aes-128-ecb for 3s on 64 size blocks: 207367303 aes-128-ecb's in 3.01s
    Doing aes-128-ecb for 3s on 256 size blocks: 69510589 aes-128-ecb's in 3.00s
    Doing aes-128-ecb for 3s on 1024 size blocks: 17831161 aes-128-ecb's in 3.00s
    Doing aes-128-ecb for 3s on 8192 size blocks: 2219499 aes-128-ecb's in 3.00s
    OpenSSL 1.0.1s-freebsd  1 Mar 2016
    built on: date not available
    options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) 
    compiler: clang
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
    aes-128-ecb    1294559.75k  4412345.31k  5931570.26k  6086369.62k  6060711.94k
    

    Looks like an average of about an order of magnitude improvement across the board.
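
To put a number on "about an order of magnitude", the throughput rows from the two `openssl speed` runs above can be divided block size by block size. The values below are copied from the summary lines of those outputs (in 1000s of bytes per second); the variable names are just for illustration.

```python
# Per-block-size speedup of the i3-7100 over the APU2C4, using the
# aes-128-ecb summary rows from the two openssl speed runs above.

apu2c4 = {16: 124869.85, 64: 393345.81, 256: 637726.81, 1024: 720221.92, 8192: 763123.03}
i3_7100 = {16: 1294559.75, 64: 4412345.31, 256: 5931570.26, 1024: 6086369.62, 8192: 6060711.94}

# Ratio of i3-7100 throughput to APU2C4 throughput at each block size.
ratios = {size: i3_7100[size] / apu2c4[size] for size in apu2c4}
mean_ratio = sum(ratios.values()) / len(ratios)

for size, ratio in ratios.items():
    print(f"{size:>5} byte blocks: {ratio:.1f}x")
print(f"mean speedup: {mean_ratio:.1f}x")
```

The ratio is highest at small block sizes and falls off toward 8192-byte blocks, with the mean a little under 10x. Keep in mind this measures raw AES throughput on one core, not OpenVPN throughput.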

    Now if I could only get OpenVPN to work, I'd be happy.



  • To follow up on this, I did some testing using this benchmark method suggested by Ira (note the time you want is the one with the U, I think)

    My old APU2C4 - according to this test - is able to handle ~45Mbps in OpenVPN.

    The i3-7100 using the same method appears to be able to handle ~425Mbps.

    Since OpenVPN uses one thread per connection, the way I interpret this is that it can support up to 425Mbps per core.

    Since - by necessity - up and down are on separate threads, and I have two cores, I could max it out at 425/425 at the same time.



  • @pfBasic:

    This is also encrypting at AES-256-CBC, which I also don't recommend. AES-128-GCM is more than enough for your privacy and significantly more efficient.

    I have some questions regarding your statements about different ciphers.

    1.)  Are GCM ciphers even compatible with pfSense?

    [2.3.3-RELEASE][admin@router.localdomain]/root: openvpn --show-ciphers
    The following ciphers and cipher modes are available
    for use with OpenVPN.  Each cipher shown below may be
    used as a parameter to the --cipher option.  The default
    key size is shown as well as whether or not it can be
    changed with the --keysize directive.  Using a CBC mode
    is recommended. In static key mode only CBC mode is allowed.
    
    AES-128-CBC  (128 bit key, 128 bit block)
    AES-128-CFB  (128 bit key, 128 bit block, TLS client/server mode only)
    AES-128-CFB1  (128 bit key, 128 bit block, TLS client/server mode only)
    AES-128-CFB8  (128 bit key, 128 bit block, TLS client/server mode only)
    AES-128-OFB  (128 bit key, 128 bit block, TLS client/server mode only)
    AES-192-CBC  (192 bit key, 128 bit block)
    AES-192-CFB  (192 bit key, 128 bit block, TLS client/server mode only)
    AES-192-CFB1  (192 bit key, 128 bit block, TLS client/server mode only)
    AES-192-CFB8  (192 bit key, 128 bit block, TLS client/server mode only)
    AES-192-OFB  (192 bit key, 128 bit block, TLS client/server mode only)
    AES-256-CBC  (256 bit key, 128 bit block)
    AES-256-CFB  (256 bit key, 128 bit block, TLS client/server mode only)
    AES-256-CFB1  (256 bit key, 128 bit block, TLS client/server mode only)
    AES-256-CFB8  (256 bit key, 128 bit block, TLS client/server mode only)
    AES-256-OFB  (256 bit key, 128 bit block, TLS client/server mode only)
    CAMELLIA-128-CBC  (128 bit key, 128 bit block)
    CAMELLIA-128-CFB  (128 bit key, 128 bit block, TLS client/server mode only)
    CAMELLIA-128-CFB1  (128 bit key, 128 bit block, TLS client/server mode only)
    CAMELLIA-128-CFB8  (128 bit key, 128 bit block, TLS client/server mode only)
    CAMELLIA-128-OFB  (128 bit key, 128 bit block, TLS client/server mode only)
    CAMELLIA-192-CBC  (192 bit key, 128 bit block)
    CAMELLIA-192-CFB  (192 bit key, 128 bit block, TLS client/server mode only)
    CAMELLIA-192-CFB1  (192 bit key, 128 bit block, TLS client/server mode only)
    CAMELLIA-192-CFB8  (192 bit key, 128 bit block, TLS client/server mode only)
    CAMELLIA-192-OFB  (192 bit key, 128 bit block, TLS client/server mode only)
    CAMELLIA-256-CBC  (256 bit key, 128 bit block)
    CAMELLIA-256-CFB  (256 bit key, 128 bit block, TLS client/server mode only)
    CAMELLIA-256-CFB1  (256 bit key, 128 bit block, TLS client/server mode only)
    CAMELLIA-256-CFB8  (256 bit key, 128 bit block, TLS client/server mode only)
    CAMELLIA-256-OFB  (256 bit key, 128 bit block, TLS client/server mode only)
    SEED-CBC  (128 bit key, 128 bit block)
    SEED-CFB  (128 bit key, 128 bit block, TLS client/server mode only)
    SEED-OFB  (128 bit key, 128 bit block, TLS client/server mode only)
    
    The following ciphers have a block size of less than 128 bits, 
    and are therefore deprecated.  Do not use unless you have to.
    
    BF-CBC  (128 bit key by default, 64 bit block)
    BF-CFB  (128 bit key by default, 64 bit block, TLS client/server mode only)
    BF-OFB  (128 bit key by default, 64 bit block, TLS client/server mode only)
    CAST5-CBC  (128 bit key by default, 64 bit block)
    CAST5-CFB  (128 bit key by default, 64 bit block, TLS client/server mode only)
    CAST5-OFB  (128 bit key by default, 64 bit block, TLS client/server mode only)
    DES-CBC  (64 bit key, 64 bit block)
    DES-CFB  (64 bit key, 64 bit block, TLS client/server mode only)
    DES-CFB1  (64 bit key, 64 bit block, TLS client/server mode only)
    DES-CFB8  (64 bit key, 64 bit block, TLS client/server mode only)
    DES-EDE-CBC  (128 bit key, 64 bit block)
    DES-EDE-CFB  (128 bit key, 64 bit block, TLS client/server mode only)
    DES-EDE-OFB  (128 bit key, 64 bit block, TLS client/server mode only)
    DES-EDE3-CBC  (192 bit key, 64 bit block)
    DES-EDE3-CFB  (192 bit key, 64 bit block, TLS client/server mode only)
    DES-EDE3-CFB1  (192 bit key, 64 bit block, TLS client/server mode only)
    DES-EDE3-CFB8  (192 bit key, 64 bit block, TLS client/server mode only)
    DES-EDE3-OFB  (192 bit key, 64 bit block, TLS client/server mode only)
    DES-OFB  (64 bit key, 64 bit block, TLS client/server mode only)
    DESX-CBC  (192 bit key, 64 bit block)
    IDEA-CBC  (128 bit key, 64 bit block)
    IDEA-CFB  (128 bit key, 64 bit block, TLS client/server mode only)
    IDEA-OFB  (128 bit key, 64 bit block, TLS client/server mode only)
    RC2-40-CBC  (40 bit key by default, 64 bit block)
    RC2-64-CBC  (64 bit key by default, 64 bit block)
    RC2-CBC  (128 bit key by default, 64 bit block)
    RC2-CFB  (128 bit key by default, 64 bit block, TLS client/server mode only)
    RC2-OFB  (128 bit key by default, 64 bit block, TLS client/server mode only)
    RC5-CBC  (128 bit key by default, 64 bit block)
    RC5-CFB  (128 bit key by default, 64 bit block, TLS client/server mode only)
    RC5-OFB  (128 bit key by default, 64 bit block, TLS client/server mode only)
    

    I don't see any GCM modes in that list.

    2.)  I did some performance testing using the method suggested by Ira, and I found an almost negligible difference in performance between AES-256-CBC and AES-128-CBC.  If that is the case, why not just use 256 bit as it is stronger?

    Would appreciate your input.


  • Banned

    AES-GCM is supported in 2.4, if you're interested you can use 2.4.0 BETA which I have had zero stability issues with for home use.

    I couldn't tell you what is happening behind the scenes with AES-NI & OpenVPN between 128 bit and 256 bit encrypt/decrypt. And yes, there is often not a massive performance gain by switching from AES 256 to 128 CBC, but it is normally big enough to matter. https://forum.pfsense.org/index.php?topic=127793.msg705162#msg705162

    CBC to GCM should be a significant jump.

    While I like the benchmark that Ira put together, take it with a grain of salt for real world performance. I doubt that it works well across different architectures, different versions of AES-NI, different versions of AES. I can tell you that it is not correct for my i5-2400.

    I can tell you that I did real world testing (not by any means rigorous) on a J3355B, switching from AES-256-CBC to AES-128-CBC (my VPN provider does not support GCM), and the difference was dramatic. https://forum.pfsense.org/index.php?topic=127793.msg705046#msg705046
    That doesn't mean the same applies to your CPU though, but only real world tests can tell you that.

    And saying that 256 bit encryption is better/more secure than 128 bit encryption is arguable.
    As far as anyone knows AES-128 has no known vulnerabilities and is effectively uncrackable. The same is true of AES-256.
    Most likely no one will ever attempt to decrypt your data at any encryption level. If they do, no one can brute force AES-128 so there's no known value of using stronger encryption.
    If a (not third world) state level entity wants to hack you, they will and they won't care what encryption you are using. So don't bother trying.

    A very rough analogue would be that if you want to protect your crystal ball collection during a bombing raid, keeping them in a 50' thick concrete bunker is more secure than keeping them in a 25' thick concrete bunker. Even the best crystal ball thieves only have hand grenades so they will effectively never penetrate the 25' bunker and those guys aren't even interested in your crystal balls.
    But then the US Gov't has the GBU-57A/B, so if they ever feel like getting into your crystal ball collection they most certainly can :).
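
    To put a number on "no one can brute force AES-128", here is a back-of-the-envelope sketch in Python. The guess rate is a made-up, deliberately generous assumption, and the function name is hypothetical; only the key sizes come from the discussion above.

```python
# Back-of-the-envelope brute-force estimate.
# Assumption: a hypothetical attacker testing 10**18 keys per second; the
# rate is illustrative only and not based on any real system.

SECONDS_PER_YEAR = 365 * 24 * 3600
GUESSES_PER_SECOND = 10**18  # hypothetical


def years_to_search(key_bits, guesses_per_second=GUESSES_PER_SECOND):
    """Expected years to find a key, trying half the keyspace on average."""
    expected_guesses = 2 ** (key_bits - 1)
    return expected_guesses / (guesses_per_second * SECONDS_PER_YEAR)


for bits in (128, 256):
    print(f"AES-{bits}: about {years_to_search(bits):.1e} years")
```

    Even at that assumed rate the 128-bit search takes on the order of 10^12 years, which is the sense in which the thicker bunker adds nothing practical against brute force.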



  • @pfBasic:

    AES-GCM is supported in 2.4, if you're interested you can use 2.4.0 BETA which I have had zero stability issues with for home use.

    I couldn't tell you what is happening behind the scenes with AES-NI & OpenVPN between 128 bit and 256 bit encrypt/decrypt. And yes, there is often not a massive performance gain by switching from AES 256 to 128 CBC, but it is normally big enough to matter. https://forum.pfsense.org/index.php?topic=127793.msg705162#msg705162

    CBC to GCM should be a significant jump.

    While I like the benchmark that Ira put together, take it with a grain of salt for real world performance. I doubt that it works well across different architectures, different versions of AES-NI, different versions of AES. I can tell you that it is not correct for my i5-2400.

    I can tell you that I did real world testing (not by any means rigorous) on a J3355B, switching from AES-256-CBC to AES-128-CBC (my VPN provider does not support GCM), and the difference was dramatic. https://forum.pfsense.org/index.php?topic=127793.msg705046#msg705046
    That doesn't mean the same applies to your CPU though, but only real world tests can tell you that.

    Thanks.  I appreciate the input on that.

    Do you know what about GCM it is that makes it so much faster?  Is it a weaker cipher, or just more efficient somehow?


  • Banned

    No, not really. The buzz words I've read say that it has better parallelization than CBC, but I don't really know what that means. It also includes the authentication portion of the VPN whereas CBC does not (it handles it with some level of SHA encryption).

    I've never read anything that suggests it's in any way weaker than CBC, in fact everything I've read suggests that it might be more secure. But that's probably more secure in a similar sense to my above analogy about bunkers and bombs.

    There are definitely users on this board that could answer your question though.



  • @mattlach:

    Do you know what about GCM it is that makes it so much faster?  Is it a weaker cipher, or just more efficient somehow?

    It's stronger. By combining the encryption and authentication (instead of having, e.g., a separate SHA MAC) it can be more efficiently pipelined in the CPU. Intel also added the PCLMULQDQ instructions mainly to speed up GCM (so it gets an additional hardware assist on newer CPUs).



  • To add to this, AES-256-GCM = AES-256-CTR and SHA256 combined.



  • @Pippin:

    To add to this, AES-256-GCM = AES-256-CTR and SHA256 combined.

    N.b., there is zero reason to use AES-256 on your home VPN rather than AES-128.



  • I ordered parts for a 7700K pfSense router.  Probably a little overkill, but I wanted to future proof and honestly, that CPU isn't very expensive considering. Spent about 700 total, but should be decent.  Got an Intel quad NIC also.


  • Banned

    @psulions5:

    I ordered parts for a 7700K pfSense router.  Probably a little overkill, but I wanted to future proof and honestly, that CPU isn't very expensive considering. Spent about 700 total, but should be decent.  Got an Intel quad NIC also.

    Make sure you get a pair of GTX-1080Ti's in SLI to go with that, they can really be leveraged for outstanding IDS/IPS throughput.

    And I'd suggest a pair of 1TB Samsung 960 PRO's in a ZFS mirror so it doesn't bottleneck your logs.



  • I mean I could do that, but my gaming machine already has that :p



  • @VAMike:

    @Pippin:

    To add to this, AES-256-GCM = AES-256-CTR and SHA256 combined.

    N.b., there is zero reason to use AES-256 on your home VPN rather than AES-128.

    There is not a lot done with "good reason" in this world…
    Depends on needs and "craziness"... also in the home ;)

    B.t.w., the default cipher that OpenVPN 2.4 or higher selects is AES-256-GCM, when both sides are on OpenVPN 2.4.
    See --ncp in OpenVPN manual 2.4



  • @Pippin:

    B.t.w., the default cipher that OpenVPN 2.4 or higher selects is AES-256-GCM, when both sides are on OpenVPN 2.4.
    See --ncp in OpenVPN manual 2.4

    Defaults are made to be changed



  • @mattlach:

    Just stay away from the USB3 ports.  pfSense doesn't seem to like those at all, and the installers will fail unless booted from one of the USB2 ports.

    From the mobo quick installation guide pdf page 3, "CAUTION: For operating system installation, be sure to plug your USB flash drive into the USB 2.0 Ports (USB12)."



  • @mattlach:

    @whosmatt:

    @mattlach:

    I also have to admit I am VERY impressed with this little chip.

    I haven't installed pfSense yet, but I am doing some testing in Ubuntu 16.10.

    Using the PicoPSU-80 and 60W power brick kit from Mini-Box.com I'm idling on the desktop pulling only 7.1W from the wall (as measured on my Kill-A-Watt).

    That's about the same power as my PcEngines low power Quad Core Jaguar at idle.

    When I load up the chip with mprime (linux version of Prime95) it peaks at about 46W at the wall.

    And that's at 3.9GHz 2C/4T.

    Even the stock Intel cooler (which just BARELY fit inside the M350 case once the drive brackets were removed) doesn't spin up much during load testing.

    Very impressed.

    The ASRock H270M-ITX/ac is also a great little Mini-ITX board with dual Intel NIC's to pair with it.

    Wow.  That's really good to know.  Yeah, those M350 cases are tiny, but they kind of stand alone in the market, and are perfect for a mini ITX pfSense system provided your NICs are onboard.  I have one but it's for a MythTV frontend.  Thanks for the info.

    Any time!

    And it gets better.  I killed Xorg and the idle wattage measured at the wall went down to 6.2W!

    Full specs if anyone else is interested (links to where I bought them, you may find better prices elsewhere):

    And that's it.  Total: 393.31  (less for me, since I already had a few of the parts left over from other projects).

    The CPU comes with a cooler.  Before you assemble everything, it looks like it won't fit in the M350 enclosure, but it does (just barely), as long as you don't use the 2.5" drive brackets.  (use an M2, USB drive or SATA DOM)

    I also pulled out the mini-Wlan card (you loosen two screws on the bottom of the board and it comes right out).  I wasn't using it, and I figured I'd rather not have it wasting power.  Also disabled everything in BIOS I wasn't planning on using, and enabled all power saving states, except suspend to RAM, as the router needs to be operating 24/7.

    I used a fan profile on the board.  The CPU puts out so little power that it seems to stay at the cooler's minimum fan speed most of the time.  Granted it is pretty cold in my basement right now.

    (Warmer temps will result in higher fan speeds which will drive up power consumption noticeably.  At this low power use the fans use a surprisingly large percentage of the power)

    I'm very happy thus far.

    Just stay away from the USB3 ports.  pfSense doesn't seem to like those at all, and the installers will fail unless booted from one of the USB2 ports.

    So I finally had time to get this working (a month and a half later), as I had trouble getting PIA VPN working the first time around.

    Now that it is up and running I can definitely say that the i3-7100 is overkill by more than I expected.

    @VAMike:

    @Pippin:

    To add to this, AES-256-GCM = AES-256-CTR and SHA256 combined.

    N.b., there is zero reason to use AES-256 on your home VPN rather than AES-128.

    I did wind up going with AES-256-CBC and SHA256 just because I could as my router is overkill, but honestly, I didn't notice much (any?) CPU load difference between the two, so might as well use the stronger one, even if it might not be necessary.

    Anyway, with AES-256-CBC and SHA256, loading up the connection in one direction (it peaks at about 135Mbit, due to my traffic shaping rules) I only get about 9-10% load on the CPU.  So, under a theoretical full load in both directions I ought to hit 18-20% somewhere.

    I'm glad to have some room to grow should anything change, but this little i3-7100 has definitely outperformed my expectations.
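
    The extrapolation above can be written out as a small Python sketch. It assumes CPU load scales linearly with total VPN throughput, which is a simplification (a later follow-up in this thread reports higher load with many connections); the function name is hypothetical and the inputs are the figures measured above.

```python
# Linear extrapolation of router CPU load from one measured data point.
# Assumption: load scales linearly with total VPN throughput, which is a
# simplification; connection count and packet size also matter.

def projected_load(measured_load_pct, measured_mbps, target_mbps):
    """Scale a measured CPU load percentage to a different throughput."""
    return measured_load_pct * target_mbps / measured_mbps


# Measured: roughly 9-10% CPU at about 135 Mbit in one direction.
low = projected_load(9, 135, 2 * 135)
high = projected_load(10, 135, 2 * 135)
print(f"both directions at 135 Mbit: {low:.0f}-{high:.0f}% CPU")

# Same model applied to the original 150 up + 150 down goal.
goal_low = projected_load(9, 135, 300)
goal_high = projected_load(10, 135, 300)
print(f"150 up + 150 down: {goal_low:.0f}-{goal_high:.0f}% CPU")
```

    Treat the result as a lower bound on real load rather than a prediction.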



  • @mattlach:

    I did wind up going with AES-256-CBC and SHA256 just because I could as my router is overkill, but honestly, I didn't notice much (any?) CPU load difference between the two, so might as well use the stronger one, even if it might not be necessary.

    I also use AES-256 and SHA256 on my PIA tunnels and have never noticed a tangible performance difference between the two.    I'm still on AES-128 and SHA1 on my personal OpenVPN server, mostly because I set it up that way years ago and haven't felt the need to change.  SHA1 is approaching deprecation anyhow as far as I'm aware.  Anyway, thanks for the update.



  • @mattlach:

    I did wind up going with AES-256-CBC and SHA256 just because I could as my router is overkill, but honestly, I didn't notice much (any?) CPU load difference between the two, so might as well use the stronger one, even if it might not be necessary.

    Anyway, with AES-256-CBC and SHA256, loading up the connection in one direction (it peaks at about 135Mbit, due to my traffic shaping rules) I only get about 9-10% load on the CPU.  So, under a theoretical full load in both directions I ought to hit 18-20% somewhere.

    I'm glad to have some room to grow should anything change, but this little i3-7100 has definitely outperformed my expectations.

    @whosmatt:

    I also use AES-256 and SHA256 on my PIA tunnels and have never noticed a tangible performance difference between the two.    I'm still on AES-128 and SHA1 on my personal OpenVPN server, mostly because I set it up that way years ago and haven't felt the need to change.  SHA1 is approaching deprecation anyhow as far as I'm aware.  Anyway, thanks for the update.

    I should follow up with the fact that since my initial tests (just speedtest.net) I have succeeded in getting the CPU load up much higher.

    I was under the impression that OpenVPN CPU load was really just dependent on raw throughput, but that doesn't seem to be the case.  More connections at the same bandwidth use more CPU, it would seem.

    Downloaded a new Ubuntu ISO today using rtorrent, which resulted in downstream maxed, and a little upstream.  This was about 38% CPU on the router.  Still very respectable, but I wanted to update you guys in case someone takes my earlier results too seriously.

