Netgate Discussion Forum

    AES-NI performance

    Hardware
      Chucko

      Adding -elapsed to the above command only changed results by ~2%.

      Here's the multi-threaded result:

      
      openssl speed -multi 2 -evp aes-256-cbc
      Forked child 0
      Forked child 1
      +DT:aes-256-cbc:3:16
      +DT:aes-256-cbc:3:16
      +R:3311914:aes-256-cbc:3.000000
      +DT:aes-256-cbc:3:64
      +R:3377542:aes-256-cbc:3.000000
      +DT:aes-256-cbc:3:64
      +R:886867:aes-256-cbc:3.000000
      +DT:aes-256-cbc:3:256
      +R:913678:aes-256-cbc:3.000000
      +DT:aes-256-cbc:3:256
      +R:226698:aes-256-cbc:3.000000
      +DT:aes-256-cbc:3:1024
      +R:233562:aes-256-cbc:3.000000
      +DT:aes-256-cbc:3:1024
      +R:57329:aes-256-cbc:3.000000
      +DT:aes-256-cbc:3:8192
      +R:58852:aes-256-cbc:3.000000
      +DT:aes-256-cbc:3:8192
      +R:7285:aes-256-cbc:3.000000
      +R:7406:aes-256-cbc:3.000000
      Got: +H:16:64:256:1024:8192 from 0
      Got: +F:22:aes-256-cbc:17663541.33:18919829.33:19344896.00:19568298.67:19892906.67 from 0
      Got: +H:16:64:256:1024:8192 from 1
      Got: +F:22:aes-256-cbc:18013557.33:19491797.33:19930624.00:20088149.33:20223317.33 from 1
      OpenSSL 1.0.1s-freebsd  1 Mar 2016
      built on: date not available
      options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) 
      compiler: clang
      evp              35677.10k    38411.63k    39275.52k    39656.45k    40116.22k
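
The final `evp` line of a `-multi` run appears to be the sum of the per-child `+F` lines, which report bytes per second for each block size. A minimal Python sketch that checks this against the output above; the function name `sum_multi_output` is hypothetical and the parsing assumes the `Got: +F:...` line format shown here:

```python
import re

def sum_multi_output(lines):
    """Sum the per-child '+F' throughput lines from `openssl speed -multi N`.

    Returns a list of totals in bytes/second, one per block size.
    """
    totals = None
    for line in lines:
        # Per-child results look like:
        #   Got: +F:22:aes-256-cbc:<16B>:<64B>:<256B>:<1024B>:<8192B> from 0
        match = re.search(r"\+F:\d+:[^:]+:([\d.:]+) from \d+", line)
        if not match:
            continue  # skip '+H' header lines and everything else
        values = [float(v) for v in match.group(1).split(":")]
        totals = values if totals is None else [t + v for t, v in zip(totals, values)]
    return totals or []

# Lines copied from the two-child run above.
sample = [
    "Got: +H:16:64:256:1024:8192 from 0",
    "Got: +F:22:aes-256-cbc:17663541.33:18919829.33:19344896.00:19568298.67:19892906.67 from 0",
    "Got: +H:16:64:256:1024:8192 from 1",
    "Got: +F:22:aes-256-cbc:18013557.33:19491797.33:19930624.00:20088149.33:20223317.33 from 1",
]
totals = sum_multi_output(sample)
print(" ".join(f"{t / 1000:.2f}k" for t in totals))
# prints: 35677.10k 38411.63k 39275.52k 39656.45k 40116.22k
```

The printed totals match the `evp` row, so the `-multi 2` figure is the combined throughput of both children, not a per-core number.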
      
      
        Chucko

        And for more perspective, my NAS4Free box running FreeBSD 11.0-RELEASE. This is a Core 2 Quad Q9550 @ 2.83 GHz.

        
        nas4free ~/ chucko~$ openssl speed -elapsed -evp aes-256-cbc
        You have chosen to measure elapsed time instead of user CPU time.
        Doing aes-256-cbc for 3s on 16 size blocks: 28607257 aes-256-cbc's in 3.01s
        Doing aes-256-cbc for 3s on 64 size blocks: 8038838 aes-256-cbc's in 3.00s
        Doing aes-256-cbc for 3s on 256 size blocks: 2078627 aes-256-cbc's in 3.00s
        Doing aes-256-cbc for 3s on 1024 size blocks: 521836 aes-256-cbc's in 3.00s
        Doing aes-256-cbc for 3s on 8192 size blocks: 65551 aes-256-cbc's in 3.00s
        OpenSSL 1.0.2j-freebsd  26 Sep 2016
        built on: date not available
        options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) 
        compiler: clang
        The 'numbers' are in 1000s of bytes per second processed.
        type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
        aes-256-cbc     152175.75k   171495.21k   177376.17k   178120.02k   178997.93k
        nas4free ~/ chucko~$ openssl speed -multi 4 -evp aes-256-cbc
        Forked child 0
        Forked child 1
        Forked child 2
        +DT:aes-256-cbc:3:16
        +DT:aes-256-cbc:3:16
        +DT:aes-256-cbc:3:16
        +DT:aes-256-cbc:3:16
        Forked child 3
        +R:28661984:aes-256-cbc:3.000000
        +R:28561131:aes-256-cbc:3.007813
        +R:28616238:aes-256-cbc:3.000000
        +DT:aes-256-cbc:3:64
        +DT:aes-256-cbc:3:64
        +DT:aes-256-cbc:3:64
        +R:28653210:aes-256-cbc:3.007813
        +DT:aes-256-cbc:3:64
        +R:8221475:aes-256-cbc:3.054688
        +R:8216875:aes-256-cbc:3.054688
        +R:8222598:aes-256-cbc:3.054688
        +R:8199168:aes-256-cbc:3.054688
        +DT:aes-256-cbc:3:256
        +DT:aes-256-cbc:3:256
        +DT:aes-256-cbc:3:256
        +DT:aes-256-cbc:3:256
        +R:2088535:aes-256-cbc:3.000000
        +R:2088077:aes-256-cbc:3.000000
        +R:2081254:aes-256-cbc:3.000000
        +DT:aes-256-cbc:3:1024
        +R:2087901:aes-256-cbc:3.000000
        +DT:aes-256-cbc:3:1024
        +DT:aes-256-cbc:3:1024
        +DT:aes-256-cbc:3:1024
        +R:526763:aes-256-cbc:3.007813
        +R:526629:aes-256-cbc:3.007813
        +R:526698:aes-256-cbc:3.007813
        +DT:aes-256-cbc:3:8192
        +R:525146:aes-256-cbc:3.007813
        +DT:aes-256-cbc:3:8192
        +DT:aes-256-cbc:3:8192
        +DT:aes-256-cbc:3:8192
        +R:65963:aes-256-cbc:3.000000
        +R:65715:aes-256-cbc:3.000000
        +R:65940:aes-256-cbc:3.000000
        +R:65937:aes-256-cbc:3.000000
        Got: +H:16:64:256:1024:8192 from 0
        Got: +F:22:aes-256-cbc:151930379.97:171784102.96:177600341.33:178784250.68:180122965.33 from 0
        Got: +H:16:64:256:1024:8192 from 1
        Got: +F:22:aes-256-cbc:152420192.42:172251465.98:178221653.33:179334753.08:179445760.00 from 1
        Got: +H:16:64:256:1024:8192 from 2
        Got: +F:22:aes-256-cbc:152863914.67:172155089.51:178167552.00:179312624.04:180051968.00 from 2
        Got: +H:16:64:256:1024:8192 from 3
        Got: +F:22:aes-256-cbc:152619936.00:172274994.41:178182570.67:179289133.22:180060160.00 from 3
        OpenSSL 1.0.2j-freebsd  26 Sep 2016
        built on: date not available
        options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) 
        compiler: clang
        evp             609834.42k   688465.65k   712172.12k   716720.76k   719680.85k
        nas4free ~/ chucko~$ 
        
        
          VAMike

          @Chucko:

          Adding -elapsed to the above command only changed results by ~2%.

          Yeah, without AES-NI, cryptodev isn't in play. While -elapsed gives a less accurate result when using openssl's internal crypto routines, the two numbers should be pretty close.

            douggmc

            Hopefully this is of value/help to others:

            Quad Core Celeron J1900 Bay Trail 2.0GHz
            (specifically this "Chinese" appliance: https://www.amazon.com/Firewall-micro-appliance-Gigabit-pfSense/dp/B01JHJGG5M)

            The CPU has no AES-NI, so there is no real difference between these two tests (based on what I've read in this thread):

            
            openssl speed -evp aes-256-cbc
            Doing aes-256-cbc for 3s on 16 size blocks: 5619317 aes-256-cbc's in 3.01s
            Doing aes-256-cbc for 3s on 64 size blocks: 1475355 aes-256-cbc's in 3.01s
            Doing aes-256-cbc for 3s on 256 size blocks: 373757 aes-256-cbc's in 2.99s
            Doing aes-256-cbc for 3s on 1024 size blocks: 94034 aes-256-cbc's in 3.01s
            Doing aes-256-cbc for 3s on 8192 size blocks: 11800 aes-256-cbc's in 3.00s
            OpenSSL 1.0.1s-freebsd  1 Mar 2016
            built on: date not available
            options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
            compiler: clang
            The 'numbers' are in 1000s of bytes per second processed.
            type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
            aes-256-cbc      29891.85k    31392.49k    31977.20k    32013.57k    32221.87k
            
            

            and

            
            openssl speed -elapsed -evp aes-256-cbc
            You have chosen to measure elapsed time instead of user CPU time.
            Doing aes-256-cbc for 3s on 16 size blocks: 5627119 aes-256-cbc's in 3.00s
            Doing aes-256-cbc for 3s on 64 size blocks: 1472526 aes-256-cbc's in 3.01s
            Doing aes-256-cbc for 3s on 256 size blocks: 375127 aes-256-cbc's in 3.01s
            Doing aes-256-cbc for 3s on 1024 size blocks: 94726 aes-256-cbc's in 3.02s
            Doing aes-256-cbc for 3s on 8192 size blocks: 11769 aes-256-cbc's in 3.00s
            OpenSSL 1.0.1s-freebsd  1 Mar 2016
            built on: date not available
            options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
            compiler: clang
            The 'numbers' are in 1000s of bytes per second processed.
            type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
            aes-256-cbc      30011.30k    31332.29k    31927.69k    32082.50k    32137.22k
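
To put a number on "no difference", here is a rough Python check of the two result rows above (values copied from the tables; the variable names are just for illustration):

```python
# Throughput rows in 1000s of bytes/s, copied from the two J1900 runs above.
cpu_time = [29891.85, 31392.49, 31977.20, 32013.57, 32221.87]  # default timing
elapsed = [30011.30, 31332.29, 31927.69, 32082.50, 32137.22]   # with -elapsed

# Relative difference per block size, in percent.
diffs = [abs(e - c) / c * 100 for c, e in zip(cpu_time, elapsed)]
print(f"largest difference: {max(diffs):.2f}%")
# prints: largest difference: 0.40%
```

On this box the two timing modes agree to well under 1% at every block size.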
            
            
              meruem

              In case it helps anyone

              System Specs


              • ASRock H270M-ITX/ac

              • Intel(R) Core(TM) i5-7500

              • Adaptive {PowerD}

              uname

              
              [2.4.0-BETA][admin@pfsense.localdomain]/root: uname -a
              FreeBSD pfsense.localdomain 11.0-RELEASE-p10 FreeBSD 11.0-RELEASE-p10 #75 51c8a24f312(RELENG_2_4): Fri May 12 19:55:27 CDT 2017     
              root@buildbot2.netgate.com:/builder/ce/tmp/obj/builder/ce/tmp/FreeBSD-src/sys/pfSense  amd64
              
              

              dmesg cpu

              
              [2.4.0-BETA][admin@pfsense.localdomain]/: dmesg | grep CPU
              CPU: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz (3408.16-MHz K8-class CPU)
              FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
              cpu0: <ACPI CPU> on acpi0
              cpu1: <ACPI CPU> on acpi0
              cpu2: <ACPI CPU> on acpi0
              cpu3: <ACPI CPU> on acpi0
              SMP: AP CPU #1 Launched!
              SMP: AP CPU #2 Launched!
              SMP: AP CPU #3 Launched!
              coretemp0: <CPU On-Die Thermal Sensors> on cpu0
              coretemp1: <CPU On-Die Thermal Sensors> on cpu1
              coretemp2: <CPU On-Die Thermal Sensors> on cpu2
              coretemp3: <CPU On-Die Thermal Sensors> on cpu3
              

              pciconf -lv

              
              [2.4.0-BETA][admin@pfsense.localdomain]/: pciconf -lv
              hostb0@pci0:0:0:0:      class=0x060000 card=0x591f1849 chip=0x591f8086 rev=0x05 hdr=0x00
                  vendor     = 'Intel Corporation'
                  class      = bridge
                  subclass   = HOST-PCI
              pcib1@pci0:0:1:0:       class=0x060400 card=0x19011849 chip=0x19018086 rev=0x05 hdr=0x01
                  vendor     = 'Intel Corporation'
                  device     = 'Skylake PCIe Controller (x16)'
                  class      = bridge
                  subclass   = PCI-PCI
              vgapci0@pci0:0:2:0:     class=0x030000 card=0x59121849 chip=0x59128086 rev=0x04 hdr=0x00
                  vendor     = 'Intel Corporation'
                  class      = display
                  subclass   = VGA
              xhci0@pci0:0:20:0:      class=0x0c0330 card=0xa2af1849 chip=0xa2af8086 rev=0x00 hdr=0x00
                  vendor     = 'Intel Corporation'
                  class      = serial bus
                  subclass   = USB
              none0@pci0:0:20:2:      class=0x118000 card=0xa2b11849 chip=0xa2b18086 rev=0x00 hdr=0x00
                  vendor     = 'Intel Corporation'
                  class      = dasp
              none1@pci0:0:22:0:      class=0x078000 card=0xa2ba1849 chip=0xa2ba8086 rev=0x00 hdr=0x00
                  vendor     = 'Intel Corporation'
                  class      = simple comms
              ahci0@pci0:0:23:0:      class=0x010601 card=0xa2821849 chip=0xa2828086 rev=0x00 hdr=0x00
                  vendor     = 'Intel Corporation'
                  class      = mass storage
                  subclass   = SATA
              pcib2@pci0:0:28:0:      class=0x060400 card=0xa2921849 chip=0xa2928086 rev=0xf0 hdr=0x01
                  vendor     = 'Intel Corporation'
                  class      = bridge
                  subclass   = PCI-PCI
              pcib3@pci0:0:28:5:      class=0x060400 card=0xa2951849 chip=0xa2958086 rev=0xf0 hdr=0x01
                  vendor     = 'Intel Corporation'
                  class      = bridge
                  subclass   = PCI-PCI
              pcib4@pci0:0:29:0:      class=0x060400 card=0xa2981849 chip=0xa2988086 rev=0xf0 hdr=0x01
                  vendor     = 'Intel Corporation'
                  class      = bridge
                  subclass   = PCI-PCI
              isab0@pci0:0:31:0:      class=0x060100 card=0xa2c41849 chip=0xa2c48086 rev=0x00 hdr=0x00
                  vendor     = 'Intel Corporation'
                  class      = bridge
                  subclass   = PCI-ISA
              none2@pci0:0:31:2:      class=0x058000 card=0xa2a11849 chip=0xa2a18086 rev=0x00 hdr=0x00
                  vendor     = 'Intel Corporation'
                  class      = memory
              none3@pci0:0:31:4:      class=0x0c0500 card=0xa2a31849 chip=0xa2a38086 rev=0x00 hdr=0x00
                  vendor     = 'Intel Corporation'
                  class      = serial bus
                  subclass   = SMBus
              em0@pci0:0:31:6:        class=0x020000 card=0x15b81849 chip=0x15b88086 rev=0x00 hdr=0x00
                  vendor     = 'Intel Corporation'
                  device     = 'Ethernet Connection (2) I219-V'
                  class      = network
                  subclass   = ethernet
              igb0@pci0:1:0:0:        class=0x020000 card=0x00018086 chip=0x15218086 rev=0x01 hdr=0x00
                  vendor     = 'Intel Corporation'
                  device     = 'I350 Gigabit Network Connection'
                  class      = network
                  subclass   = ethernet
              igb1@pci0:1:0:1:        class=0x020000 card=0x00018086 chip=0x15218086 rev=0x01 hdr=0x00
                  vendor     = 'Intel Corporation'
                  device     = 'I350 Gigabit Network Connection'
                  class      = network
                  subclass   = ethernet
              igb2@pci0:1:0:2:        class=0x020000 card=0x00018086 chip=0x15218086 rev=0x01 hdr=0x00
                  vendor     = 'Intel Corporation'
                  device     = 'I350 Gigabit Network Connection'
                  class      = network
                  subclass   = ethernet
              igb3@pci0:1:0:3:        class=0x020000 card=0x00018086 chip=0x15218086 rev=0x01 hdr=0x00
                  vendor     = 'Intel Corporation'
                  device     = 'I350 Gigabit Network Connection'
                  class      = network
                  subclass   = ethernet
              igb4@pci0:3:0:0:        class=0x020000 card=0x15391849 chip=0x15398086 rev=0x03 hdr=0x00
                  vendor     = 'Intel Corporation'
                  device     = 'I211 Gigabit Network Connection'
                  class      = network
                  subclass   = ethernet
              nvme0@pci0:4:0:0:       class=0x010802 card=0xa801144d chip=0xa804144d rev=0x00 hdr=0x00
                  vendor     = 'Samsung Electronics Co Ltd'
                  class      = mass storage
                  subclass   = NVM
              
              


              aesni unloaded


              {-engine omitted} versus {-engine=cryptodev}

              
              [2.4.0-BETA][admin@pfsense.localdomain]/root: kldunload aesni
              [2.4.0-BETA][admin@pfsense.localdomain]/root: openssl speed -evp aes-256-cbc
              Doing aes-256-cbc for 3s on 16 size blocks: 150632064 aes-256-cbc's in 2.99s
              Doing aes-256-cbc for 3s on 64 size blocks: 41237969 aes-256-cbc's in 3.00s
              Doing aes-256-cbc for 3s on 256 size blocks: 10550741 aes-256-cbc's in 3.00s
              Doing aes-256-cbc for 3s on 1024 size blocks: 2695765 aes-256-cbc's in 2.99s
              Doing aes-256-cbc for 3s on 8192 size blocks: 335120 aes-256-cbc's in 3.00s
              OpenSSL 1.0.2k-freebsd  26 Jan 2017
              built on: date not available
              options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
              compiler: clang
              The 'numbers' are in 1000s of bytes per second processed.
              type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
              aes-256-cbc     805468.58k   879743.34k   900329.90k   922556.95k   915101.01k
              
              
              
              [2.4.0-BETA][admin@pfsense.localdomain]/root: kldunload aesni
              [2.4.0-BETA][admin@pfsense.localdomain]/: openssl speed -evp aes-256-cbc -engine cryptodev
              engine "cryptodev" set.
              Doing aes-256-cbc for 3s on 16 size blocks: 146575420 aes-256-cbc's in 2.99s
              Doing aes-256-cbc for 3s on 64 size blocks: 41172378 aes-256-cbc's in 3.00s
              Doing aes-256-cbc for 3s on 256 size blocks: 10626707 aes-256-cbc's in 3.00s
              Doing aes-256-cbc for 3s on 1024 size blocks: 2699103 aes-256-cbc's in 2.99s
              Doing aes-256-cbc for 3s on 8192 size blocks: 332528 aes-256-cbc's in 3.00s
              OpenSSL 1.0.2k-freebsd  26 Jan 2017
              built on: date not available
              options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
              compiler: clang
              The 'numbers' are in 1000s of bytes per second processed.
              type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
              aes-256-cbc     783776.66k   878344.06k   906812.33k   923699.29k   908023.13k
              
              

              {-engine omitted} versus {-engine=cryptodev} && {-elapsed}

              
              [2.4.0-BETA][admin@pfsense.localdomain]/root: kldunload aesni
              [2.4.0-BETA][admin@pfsense.localdomain]/root: openssl speed -evp aes-256-cbc -elapsed
              You have chosen to measure elapsed time instead of user CPU time.
              Doing aes-256-cbc for 3s on 16 size blocks: 148406148 aes-256-cbc's in 3.01s
              Doing aes-256-cbc for 3s on 64 size blocks: 41268481 aes-256-cbc's in 3.00s
              Doing aes-256-cbc for 3s on 256 size blocks: 10574324 aes-256-cbc's in 3.00s
              Doing aes-256-cbc for 3s on 1024 size blocks: 2695729 aes-256-cbc's in 3.00s
              Doing aes-256-cbc for 3s on 8192 size blocks: 334470 aes-256-cbc's in 3.00s
              OpenSSL 1.0.2k-freebsd  26 Jan 2017
              built on: date not available
              options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
              compiler: clang
              The 'numbers' are in 1000s of bytes per second processed.
              type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
              aes-256-cbc     789443.61k   880394.26k   902342.31k   920142.17k   913326.08k
              
              
              
              [2.4.0-BETA][admin@pfsense.localdomain]/root: kldunload aesni
              [2.4.0-BETA][admin@pfsense.localdomain]/: openssl speed -evp aes-256-cbc -elapsed -engine cryptodev
              engine "cryptodev" set.
              You have chosen to measure elapsed time instead of user CPU time.
              Doing aes-256-cbc for 3s on 16 size blocks: 146175678 aes-256-cbc's in 3.00s
              Doing aes-256-cbc for 3s on 64 size blocks: 41289379 aes-256-cbc's in 3.00s
              Doing aes-256-cbc for 3s on 256 size blocks: 10663194 aes-256-cbc's in 3.00s
              Doing aes-256-cbc for 3s on 1024 size blocks: 2674432 aes-256-cbc's in 3.00s
              Doing aes-256-cbc for 3s on 8192 size blocks: 334106 aes-256-cbc's in 3.00s
              OpenSSL 1.0.2k-freebsd  26 Jan 2017
              built on: date not available
              options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
              compiler: clang
              The 'numbers' are in 1000s of bytes per second processed.
              type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
              aes-256-cbc     779603.62k   880840.09k   909925.89k   912872.79k   912332.12k
              
              

              aesni loaded


              {-engine omitted} versus {-engine=cryptodev}

              
              [2.4.0-BETA][admin@pfsense.localdomain]/root: kldload aesni
              [2.4.0-BETA][admin@pfsense.localdomain]/root: openssl speed -evp aes-256-cbc
              Doing aes-256-cbc for 3s on 16 size blocks: 1792739 aes-256-cbc's in 0.34s
              Doing aes-256-cbc for 3s on 64 size blocks: 1996478 aes-256-cbc's in 0.35s
              Doing aes-256-cbc for 3s on 256 size blocks: 1750550 aes-256-cbc's in 0.21s
              Doing aes-256-cbc for 3s on 1024 size blocks: 1202918 aes-256-cbc's in 0.25s
              Doing aes-256-cbc for 3s on 8192 size blocks: 296024 aes-256-cbc's in 0.05s
              OpenSSL 1.0.2k-freebsd  26 Jan 2017
              built on: date not available
              options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
              compiler: clang
              The 'numbers' are in 1000s of bytes per second processed.
              type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
              aes-256-cbc      83443.85k   363447.73k  2124519.35k  4927152.13k 44343380.26k
              
              
              
              [2.4.0-BETA][admin@pfsense.localdomain]/root: kldload aesni
              [2.4.0-BETA][admin@pfsense.localdomain]/: openssl speed -evp aes-256-cbc -engine cryptodev
              engine "cryptodev" set.
              Doing aes-256-cbc for 3s on 16 size blocks: 1821618 aes-256-cbc's in 0.41s
              Doing aes-256-cbc for 3s on 64 size blocks: 2000941 aes-256-cbc's in 0.28s
              Doing aes-256-cbc for 3s on 256 size blocks: 1770129 aes-256-cbc's in 0.23s
              Doing aes-256-cbc for 3s on 1024 size blocks: 1193860 aes-256-cbc's in 0.15s
              Doing aes-256-cbc for 3s on 8192 size blocks: 299654 aes-256-cbc's in 0.03s
              OpenSSL 1.0.2k-freebsd  26 Jan 2017
              built on: date not available
              options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
              compiler: clang
              The 'numbers' are in 1000s of bytes per second processed.
              type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
              aes-256-cbc      70390.07k   455325.24k  1933452.90k  8235874.63k 78552498.18k
              
              

              {-engine omitted} versus {-engine=cryptodev} && {-elapsed}

              
              [2.4.0-BETA][admin@pfsense.localdomain]/root: kldload aesni
              [2.4.0-BETA][admin@pfsense.localdomain]/root: openssl speed -evp aes-256-cbc -elapsed
              You have chosen to measure elapsed time instead of user CPU time.
              Doing aes-256-cbc for 3s on 16 size blocks: 1945418 aes-256-cbc's in 3.00s
              Doing aes-256-cbc for 3s on 64 size blocks: 2012669 aes-256-cbc's in 3.00s
              Doing aes-256-cbc for 3s on 256 size blocks: 1750631 aes-256-cbc's in 3.00s
              Doing aes-256-cbc for 3s on 1024 size blocks: 1200128 aes-256-cbc's in 3.00s
              Doing aes-256-cbc for 3s on 8192 size blocks: 298092 aes-256-cbc's in 3.00s
              OpenSSL 1.0.2k-freebsd  26 Jan 2017
              built on: date not available
              options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
              compiler: clang
              The 'numbers' are in 1000s of bytes per second processed.
              type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
              aes-256-cbc      10375.56k    42936.94k   149387.18k   409643.69k   813989.89k
              
              
              
              [2.4.0-BETA][admin@pfsense.localdomain]/root: kldload aesni
              [2.4.0-BETA][admin@pfsense.localdomain]/: openssl speed -evp aes-256-cbc -elapsed -engine cryptodev
              engine "cryptodev" set.
              You have chosen to measure elapsed time instead of user CPU time.
              Doing aes-256-cbc for 3s on 16 size blocks: 1907305 aes-256-cbc's in 3.01s
              Doing aes-256-cbc for 3s on 64 size blocks: 2009783 aes-256-cbc's in 3.00s
              Doing aes-256-cbc for 3s on 256 size blocks: 1773813 aes-256-cbc's in 3.00s
              Doing aes-256-cbc for 3s on 1024 size blocks: 1205382 aes-256-cbc's in 3.00s
              Doing aes-256-cbc for 3s on 8192 size blocks: 296249 aes-256-cbc's in 3.00s
              OpenSSL 1.0.2k-freebsd  26 Jan 2017
              built on: date not available
              options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
              compiler: clang
              The 'numbers' are in 1000s of bytes per second processed.
              type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
              aes-256-cbc      10145.87k    42875.37k   151365.38k   411437.06k   808957.27k
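
The summary rows can be reproduced from the "Doing ..." lines: operations times block size, divided by the reported time. A small Python sketch using the 8192-byte figures from the aesni-loaded runs above (the helper name `throughput_k` is made up for illustration). It shows why the runs without -elapsed report such huge numbers: the timer only recorded about 0.05s of user CPU time for a 3s run.

```python
def throughput_k(count, block_size, seconds):
    """Throughput in openssl's 'k' unit (1000s of bytes per second)."""
    return count * block_size / seconds / 1000

# aesni loaded, 8192-byte blocks, figures copied from the runs above.
user_time = throughput_k(296024, 8192, 0.05)  # default: user CPU time
elapsed = throughput_k(298092, 8192, 3.00)    # -elapsed: wall-clock time

print(f"user CPU time: {user_time:.2f}k")
print(f"elapsed:       {elapsed:.2f}k")  # 813989.89k, matching the table
print(f"ratio:         {user_time / elapsed:.0f}x")
```

The user-CPU-time figure will not match the table exactly because the displayed 0.05s is rounded, but the order of magnitude is the same. Both runs did roughly the same amount of work in 3 wall-clock seconds, so the -elapsed row is the one to compare against other systems here.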
              
              
                LucaTo

                AMD Athlon™ 5350 APU with Radeon(tm) R3
                4 CPUs: 1 package(s) x 4 core(s)
                AES-NI CPU Crypto: Yes (active)

                openssl speed -evp aes-256-cbc
                Doing aes-256-cbc for 3s on 16 size blocks: 52378144 aes-256-cbc's in 3.00s
                Doing aes-256-cbc for 3s on 64 size blocks: 17296394 aes-256-cbc's in 3.00s
                Doing aes-256-cbc for 3s on 256 size blocks: 5031667 aes-256-cbc's in 3.00s
                Doing aes-256-cbc for 3s on 1024 size blocks: 1307810 aes-256-cbc's in 3.00s
                Doing aes-256-cbc for 3s on 8192 size blocks: 165573 aes-256-cbc's in 3.00s
                OpenSSL 1.0.2k-freebsd  26 Jan 2017
                built on: date not available
                options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
                compiler: clang
                The 'numbers' are in 1000s of bytes per second processed.
                type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
                aes-256-cbc     279350.10k   368989.74k   429368.92k   446399.15k   452124.67k
                
                  kejianshi

                  You know what?  I still don't know what is good or bad or what these results mean to me in the real world:

                  openssl speed -evp aes-256-cbc -elapsed
                  You have chosen to measure elapsed time instead of user CPU time.
                  Doing aes-256-cbc for 3s on 16 size blocks: 50744813 aes-256-cbc's in 3.00s
                  Doing aes-256-cbc for 3s on 64 size blocks: 13939575 aes-256-cbc's in 3.00s
                  Doing aes-256-cbc for 3s on 256 size blocks: 3914297 aes-256-cbc's in 3.00s
                  Doing aes-256-cbc for 3s on 1024 size blocks: 1010884 aes-256-cbc's in 3.00s
                  Doing aes-256-cbc for 3s on 8192 size blocks: 127631 aes-256-cbc's in 3.00s
                  OpenSSL 1.0.2g  1 Mar 2016
                  built on: reproducible build, date unspecified
                  options:bn(64,64) rc4(8x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
                  compiler: cc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DMD32_REG_T=int -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
                  The 'numbers' are in 1000s of bytes per second processed.
                  type            16 bytes    64 bytes    256 bytes  1024 bytes  8192 bytes
                  aes-256-cbc    270639.00k  297377.60k  334020.01k  345048.41k  348517.72k

                  cpuid | grep -i aes
                        AES instruction                        = true
                        AES instruction                        = true
                        AES instruction                        = true
                        AES instruction                        = true
                        AES instruction                        = true
                        AES instruction                        = true
                        AES instruction                        = true
                        AES instruction                        = true

                  Interestingly enough, I ran the same test on a VM running on an i7 Q70 that has no AES acceleration at all, and the numbers were about half of what the AES-accelerated chip did.
                  The first test ran on an 8-core AMD 8150 and the second (values are all approx. half) ran on a very old, wimpy i7 quad core with no AES-NI.

                  I would expect the AMD to run 2 or 3 times faster even if it had no AES-NI.  Basically I don't feel these tests mean very much, and the only way to gauge performance is an actual throughput test using VPN traffic.

                    VAMike

                    @kejianshi:

                    type            16 bytes    64 bytes    256 bytes  1024 bytes  8192 bytes
                    aes-256-cbc    270639.00k  297377.60k  334020.01k  345048.41k  348517.72k
                    […]
                    Interestingly enough, I ran the same test on a VM running on an i7 Q70 that has no AES acceleration at all, and the numbers were about half of what the AES-accelerated chip did.
                    The first test ran on an 8-core AMD 8150 and the second (values are all approx. half) ran on a very old, wimpy i7 quad core with no AES-NI.

                    I would expect the AMD to run 2 or 3 times faster even if it had no AES-NI.  Basically I don't feel these tests mean very much, and the only way to gauge performance is an actual throughput test using VPN traffic.

                    The number of cores is irrelevant; it's a single-threaded test. (It's also worth pointing out that your Bulldozer-era chip isn't really 8 cores; it's 4 cores that have a multi-thread implementation similar to Intel's hyperthreading, and the early releases weren't tuned very well.) I don't have any numbers for the FX-8150, but it's an old CPU, so your results aren't necessarily unreasonable. I have tested Bulldozer-based Opterons and I'd have expected your results to be a bit higher based on clock speed, but I don't have the data points to know how the results should scale on the desktop chips of that line. I would double-check that you have the cryptodev checkbox turned off, because that will slow things down, but that might be as good as it gets.

                    It's important to remember that AES-NI implementations have evolved a lot over the years, so there's a whole lot more to performance than its simple presence. You are correct that the openssl speed results alone aren't going to predict OpenVPN performance, but they are a datapoint that can help predict performance relative to other known systems, and can help establish a ceiling on performance. (E.g., a system that can only perform AES-256-CBC at 30MByte/s is never going to get more than 240Mbit/s of VPN, and much less in practice.)
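
The ceiling arithmetic in that last example is just a unit conversion. A quick Python sketch, assuming the "k" in the openssl output means 1000 bytes per second as the tool's own header states (the function name is hypothetical):

```python
def vpn_ceiling_mbit(openssl_k):
    """Convert an openssl speed figure (1000s of bytes/s) to Mbit/s."""
    bytes_per_second = openssl_k * 1000
    return bytes_per_second * 8 / 1_000_000

# The 30 MByte/s example from the post above.
print(vpn_ceiling_mbit(30000.0))  # 240.0

# The J1900 8192-byte result posted earlier in the thread (32221.87k).
print(round(vpn_ceiling_mbit(32221.87), 1))
```

This is only an upper bound on cipher throughput; as noted above, real VPN throughput will be much lower once packet handling, authentication, and everything else is included.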

                      kejianshi

                      The AES test is single threaded?  Is openssl also single threaded during normal use?

                      • V
                        VAMike
                        last edited by

                        @kejianshi:

                        The AES test is single threaded?  Is openssl also single threaded during normal use?

                        Yes, as is OpenVPN (what you probably mean to be asking about).

                        • K
                          kejianshi
                          last edited by

                          Nope - I know that openvpn is single threaded in that each instance gets a single thread.

                          What I'm wondering is whether multiple instances of openvpn, which result in multiple openvpn threads, each also result in multiple threads of openssl?

                          For example: do 4 openvpn instances rely on a single instance of openssl doing the crypto, or on 4 threads?

                          • V
                            VAMike
                            last edited by

                            @kejianshi:

                            Nope - I know that openvpn is single threaded in that each instance gets a single thread.

                            What I'm wondering is whether multiple instances of openvpn, which result in multiple openvpn threads, each also result in multiple threads of openssl?

                            The "openssl" command-line utility is single threaded unless you pass -multi (which produces output that is pretty meaningless and hard to compare across platforms; just don't do that). The ssl library is single threaded within a process. If you run multiple instances of openvpn you are running multiple independent processes, not threads, and each process can use a different core.
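To make the process-versus-thread point concrete, here is a minimal Python sketch. It is not how OpenVPN is started; the worker is a hypothetical stand-in that hashes data in a single thread (the standard library has no AES), and `run_instances` is a name invented for this example. Each child is a separate OS process with its own PID, so the scheduler is free to place them on different cores.

```python
import subprocess
import sys

# Stand-in for one single-threaded openvpn instance (assumption: any
# CPU-bound loop will do for the illustration).
WORKER = (
    "import hashlib, os\n"
    "h = hashlib.sha256()\n"
    "for _ in range(2000):\n"
    "    h.update(b'x' * 8192)\n"
    "print(os.getpid())\n"
)


def run_instances(n: int) -> list[int]:
    """Start n independent worker processes and return the PID each reports."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER],
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in range(n)
    ]
    # All children are started before any is waited on, so they run concurrently.
    return [int(p.communicate()[0]) for p in procs]


if __name__ == "__main__":
    pids = run_instances(4)
    print(len(set(pids)), "separate processes:", pids)
```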

                            You didn't answer whether the cryptodev stuff was disabled in the gui.

                            • K
                              kejianshi
                              last edited by

                              Yes - cryptodev is disabled and AES-NI is enabled.  The pfsense VM gets about the same scores as the physical machine also, which is pretty nice to see.

                              I was only in the box to test why it's getting random crashes, so I was just playing around and running processes to stress the machine while waiting for the crash.

                              And it died…  I think the power supply is failing.  Going to have to get that replaced before I can further study the mysteries of AES-NI on the AMD 8150.

                              • J
                                jazzl0ver
                                last edited by jazzl0ver

                                Hi all,

                                Version 	2.4.3-RELEASE-p1 (amd64) 
                                CPU Type 	Intel(R) Xeon(R) CPU X5650 @ 2.67GHz 24 CPUs: 2 package(s) x 6 core(s) x 2 hardware threads
                                AES-NI CPU Crypto: Yes (active) 
                                

                                I performed several tests with the following commands:

                                openssl speed -evp aes-128-cbc -elapsed
                                openssl speed -evp aes-128-gcm -elapsed
                                

                                with different Cryptographic Hardware and Kernel PTI settings (+PTI means Kernel PTI is enabled):

                                +------------------------+--------------------------+--------------------------+--------------+--------------+-----------------+-----------------+
                                |                        | AES-NI + Cryptodev + PTI | AES-NI + Cryptodev - PTI | AES-NI + PTI | AES-NI - PTI | Cryptodev + PTI | Cryptodev - PTI |
                                +------------------------+--------------------------+--------------------------+--------------+--------------+-----------------+-----------------+
                                | aes-128-cbc 16 bytes   |                     7189 |                     7794 |       612843 |       612249 |          605915 |          588186 |
                                | aes-128-cbc 8192 bytes |                   568785 |                   591544 |       765053 |       763943 |          763748 |          764321 |
                                | aes-128-gcm 16 bytes   |                   243029 |                   243885 |       238457 |       251084 |          250158 |          229928 |
                                | aes-128-gcm 8192 bytes |                   942211 |                   943865 |       944693 |       943185 |          944543 |          946034 |
                                +------------------------+--------------------------+--------------------------+--------------+--------------+-----------------+-----------------+
                                
                                

                                The router was rebooted after changing each setting.

                                Can anybody explain the very small values in the aes-128-cbc 16 bytes test, as well as the noticeably smaller values in the aes-128-cbc 8192 bytes test, when both AES-NI and Cryptodev are enabled?

                                Thanks in advance!

                                • stephenw10S
                                  stephenw10 Netgate Administrator
                                  last edited by

                                  I suspect that when both are enabled, the AES-NI module registers itself as a crypto device in the framework for AES-CBC and openssl tries to use it. That results in massive additional switching, especially for small packets.
                                  Though there is a load of misinformation surrounding this and I have managed to get it wrong before!

                                  Perhaps more interesting is that you seem to be seeing a better result with PTI enabled in some cases there. I have no explanation for that.

                                  Steve

                                  • J
                                    jazzl0ver @stephenw10
                                    last edited by

                                    @stephenw10 , thanks for your prompt reply!

                                    What is the best Cryptographic Hardware setting then? The router mainly serves as a proxy (haproxy) and openvpn server.
                                    And why does the option "AES-NI and Cryptodev" even exist if it degrades the performance?

                                    Regarding better results with PTI enabled - they look more like a measurement error.

                                    • stephenw10S
                                      stephenw10 Netgate Administrator
                                      last edited by

                                      Cryptodev exists because there are other cryptographic accelerators in use on other hardware, though almost everything easily available is now relatively ancient and surpassed by general software encryption on modern CPUs.
                                      The AES-NI module exists because some code was not written or compiled to use the AES instructions directly, and it provides a way to access them.

                                      Personally I use AES-NI only.

                                      Steve

                                      • V
                                        VAMike @stephenw10
                                        last edited by

                                        @stephenw10 said in AES-NI performance:

                                        Perhaps more interesting is that you seem to be seeing a better result with PTI enabled in some cases there. I have no explanation for that.

                                        Run-to-run variation. The effects of PTI should be minimal for this sort of workload. Note that the AES-NI and the cryptodev columns are effectively identical (they're executing the same code), yet they show significant differences in some cases, which are just testing artifacts. Likewise, the AES-GCM results should be identical across all three settings, PTI or non-PTI, but there's noise between runs and not enough samples to average. The only significant result is the performance of AES-NI CBC with cryptodev--don't do that!
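One way to check the "noise versus real effect" reading is to put numbers on it. The Python sketch below uses the figures from the table posted earlier in the thread; the column order and the helper name `summarize` are choices made for this example. For each test it compares the mean of the four single-setting columns against the mean of the two "AES-NI + Cryptodev" columns, and also reports how far the single-setting columns spread among themselves.

```python
from statistics import mean

# Column order (taken from the posted table):
# both+PTI, both-PTI, AES-NI+PTI, AES-NI-PTI, Cryptodev+PTI, Cryptodev-PTI
RESULTS = {
    "aes-128-cbc 16 bytes":   [7189, 7794, 612843, 612249, 605915, 588186],
    "aes-128-cbc 8192 bytes": [568785, 591544, 765053, 763943, 763748, 764321],
    "aes-128-gcm 16 bytes":   [243029, 243885, 238457, 251084, 250158, 229928],
    "aes-128-gcm 8192 bytes": [942211, 943865, 944693, 943185, 944543, 946034],
}


def summarize(values):
    """Return (slowdown, spread) for one table row.

    slowdown: mean of the single-setting columns divided by the mean of the
              two combined "AES-NI + Cryptodev" columns.
    spread:   (max - min) / mean across the single-setting columns, a rough
              stand-in for run-to-run noise.
    """
    combined = mean(values[:2])
    single = values[2:]
    slowdown = mean(single) / combined
    spread = (max(single) - min(single)) / mean(single)
    return slowdown, spread


if __name__ == "__main__":
    for test, values in RESULTS.items():
        slowdown, spread = summarize(values)
        print(f"{test:<24} combined slowdown x{slowdown:6.2f}   spread {spread:5.1%}")
```

With these inputs, the two CBC rows show a slowdown far larger than the spread, while for the GCM rows the apparent difference is smaller than the spread between columns that should be identical. A single run per cell is still too few samples to say more than that.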

                                        @jazzl0ver said in AES-NI performance:

                                        And why does the option "AES-NI and Cryptodev" even exist if it degrades the performance?

                                        Bad UI design, basically. And a lot of really misinformed people running tests which confused a lot of other people.

                                        • stephenw10S
                                          stephenw10 Netgate Administrator
                                          last edited by

                                          Re-reading this thread is.... painful.

                                          Steve

                                          • J
                                            jazzl0ver
                                            last edited by

                                            Just to confirm: if I leave only AES-NI selected under Cryptographic Hardware, won't this affect OpenVPN performance, given that its Hardware Crypto setting is BSD cryptodev engine? Or will I have to change that to No Hardware Crypto Acceleration (since it will still use OpenSSL's internal AES-NI code)?

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.