Netgate Discussion Forum

    PFsense on a Poweredge 1850

      vman76

      I was able to get my paws on a PowerEdge 1950, which is a newer generation of hardware, with the following specs:

      Intel(R) Xeon(R) CPU 5130 @ 2.00GHz
      4 CPUs: 2 package(s) x 2 core(s)

      4 GB RAM and a PCIe quad-port Intel PRO/1000 PT card. I've got the 1850's config loaded on there and am eager to test it out.

      em0@pci0:14:0:0:        class=0x020000 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
          class      = network
          subclass  = ethernet
          cap 01[c8] = powerspec 2  supports D0 D3  current D0
          cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
          cap 10[e0] = PCI-Express 1 endpoint max data 256(256) link x4(x4)
      ecap 0001[100] = AER 1 0 fatal 1 non-fatal 0 corrected
      ecap 0003[140] = Serial 1 001517ffff8525cc

      em1@pci0:14:0:1:        class=0x020000 card=0x10a48086 chip=0x10a48086 rev=0x06 hdr=0x00
          class      = network
          subclass  = ethernet
          cap 01[c8] = powerspec 2  supports D0 D3  current D0
          cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
          cap 10[e0] = PCI-Express 1 endpoint max data 256(256) link x4(x4)
      ecap 0001[100] = AER 1 0 fatal 1 non-fatal 0 corrected
      ecap 0003[140] = Serial 1 001517ffff8525cc

        bryan.paradis

        PCI-X should have plenty of bandwidth to max out that card without issue. There could be something interfering with it in the BIOS. Did you try the second PCI-X slot?

        Also, I am not sure, but 1850s have different risers for different card configurations. Grabbing the PCI-E riser off eBay might get you to PCI-E, but you shouldn't need to anyway!

        http://bsdrp.net/documentation/technical_docs/performance

        Check out that link. Lots of information there.

        vmstat -i
        

        It wouldn't be interrupt related would it?

        sysctl dev.em.0
        

        What is the rest of that output?

          vman76

          @bryan.paradis:

          Hello Bryan,
          I agree that PCI-X should have enough bandwidth too. I don't think the Mbps is the issue. I didn't try the other side of the riser card. I think it's the number of packets and how fast they're coming into the NIC. This firewall sits behind a Cisco ASR-1002, which can forward a ton of packets faster than the NIC on the pfSense box can take them in. I think that's why I see the exact same issue (overruns on input) with our ASA 5500, albeit at a higher PPS. Based on tighter sampling, I can see that the ASR is forwarding over 100,000 pps to the firewalls.

          We can use this 1850 for something else at the college, so I swapped it for a newer server with a better NIC. I need to get this firewall back in action ASAP, so that seemed like my best option.

          That's a great link and it's the one I used to do most of the troubleshooting when I saw the issue.

          I'm not sure if it's interrupt related. I spent more time troubleshooting the ASA than the 1850. These are the two scenarios that may be happening here (from the Cisco site):

          Software level - The ASA software does not pull the packets off of the interface FIFO queue fast enough. This causes the FIFO queue to fill up and new packets to be dropped.

          Hardware level - The rate at which packets come into the interface is too fast, which causes the FIFO queue to fill before the ASA software can pull the packets off. Usually, a burst of packets causes the FIFO queue to fill up to maximum capacity in a short amount of time.

          The CPU on the ASA wasn't anywhere near maxed out, and the pfSense CPU was also not taxed, so I think it's the latter of the two scenarios. I attached a graph of the pfSense CPU during peak usage (350-400 Mbps).

          The full output of the sysctl command from this morning, before I decommissioned the 1850:

          dev.em.0.%desc: Intel(R) PRO/1000 Legacy Network Connection 1.0.4
          dev.em.0.%driver: em
          dev.em.0.%location: slot=11 function=0
          dev.em.0.%pnpinfo: vendor=0x8086 device=0x1010 subvendor=0x8086 subdevice=0x1012 class=0x020000
          dev.em.0.%parent: pci3
          dev.em.0.nvm: -1
          dev.em.0.rx_int_delay: 0
          dev.em.0.tx_int_delay: 66
          dev.em.0.rx_abs_int_delay: 66
          dev.em.0.tx_abs_int_delay: 66
          dev.em.0.rx_processing_limit: 100
          dev.em.0.flow_control: 3
          dev.em.0.mbuf_alloc_fail: 0
          dev.em.0.cluster_alloc_fail: 0
          dev.em.0.dropped: 0
          dev.em.0.tx_dma_fail: 0
          dev.em.0.tx_desc_fail1: 0
          dev.em.0.tx_desc_fail2: 4
          dev.em.0.rx_overruns: 77194
          dev.em.0.watchdog_timeouts: 0
          dev.em.0.device_control: 1223688777
          dev.em.0.rx_control: 32770
          dev.em.0.fc_high_water: 47104
          dev.em.0.fc_low_water: 45604
          dev.em.0.fifo_workaround: 0
          dev.em.0.fifo_reset: 0
          dev.em.0.txd_head: 49
          dev.em.0.txd_tail: 49
          dev.em.0.rxd_head: 164
          dev.em.0.rxd_tail: 163
          dev.em.0.mac_stats.excess_coll: 0
          dev.em.0.mac_stats.single_coll: 0
          dev.em.0.mac_stats.multiple_coll: 0
          dev.em.0.mac_stats.late_coll: 0
          dev.em.0.mac_stats.collision_count: 0
          dev.em.0.mac_stats.symbol_errors: 0
          dev.em.0.mac_stats.sequence_errors: 0
          dev.em.0.mac_stats.defer_count: 3567
          dev.em.0.mac_stats.missed_packets: 6059754
          dev.em.0.mac_stats.recv_no_buff: 7508997
          dev.em.0.mac_stats.recv_undersize: 0
          dev.em.0.mac_stats.recv_fragmented: 0
          dev.em.0.mac_stats.recv_oversize: 0
          dev.em.0.mac_stats.recv_jabber: 0
          dev.em.0.mac_stats.recv_errs: 0
          dev.em.0.mac_stats.crc_errs: 0
          dev.em.0.mac_stats.alignment_errs: 0
          dev.em.0.mac_stats.coll_ext_errs: 0
          dev.em.0.mac_stats.xon_recvd: 3591
          dev.em.0.mac_stats.xon_txd: 0
          dev.em.0.mac_stats.xoff_recvd: 3591
          dev.em.0.mac_stats.xoff_txd: 0
          dev.em.0.mac_stats.total_pkts_recvd: 20984938718
          dev.em.0.mac_stats.good_pkts_recvd: 20978871785
          dev.em.0.mac_stats.bcast_pkts_recvd: 55671
          dev.em.0.mac_stats.mcast_pkts_recvd: 42983
          dev.em.0.mac_stats.rx_frames_64: 411105803
          dev.em.0.mac_stats.rx_frames_65_127: 1531294228
          dev.em.0.mac_stats.rx_frames_128_255: 670658750
          dev.em.0.mac_stats.rx_frames_256_511: 290321790
          dev.em.0.mac_stats.rx_frames_512_1023: 366207236
          dev.em.0.mac_stats.rx_frames_1024_1522: 17709283978
          dev.em.0.mac_stats.good_octets_recvd: 27173769214521
          dev.em.0.mac_stats.good_octets_txd: 2201587061146
          dev.em.0.mac_stats.total_pkts_txd: 11657222216
          dev.em.0.mac_stats.good_pkts_txd: 11657222216
          dev.em.0.mac_stats.bcast_pkts_txd: 3179
          dev.em.0.mac_stats.mcast_pkts_txd: 2
          dev.em.0.mac_stats.tx_frames_64: 4253849187
          dev.em.0.mac_stats.tx_frames_65_127: 5647725507
          dev.em.0.mac_stats.tx_frames_128_255: 455801927
          dev.em.0.mac_stats.tx_frames_256_511: 188977807
          dev.em.0.mac_stats.tx_frames_512_1023: 278759522
          dev.em.0.mac_stats.tx_frames_1024_1522: 832108266
          dev.em.0.mac_stats.tso_txd: 0
          dev.em.0.mac_stats.tso_ctx_fail: 0

          PF-CPU.jpg

            bryan.paradis

            100,000 pps really doesn't seem like much?

            A Ubiquiti Edge Router should be able to pound out 10 times that in certain cases.

            Did you try turning on polling for the interface?

            ifconfig interface polling
            

            http://www.cyberciti.biz/faq/freebsd-device-polling-network-polling-tutorial/

            For an idea of the sort of performance potential in that PCI-X NIC, check here:

            http://pdos.csail.mit.edu/~rtm/e1000/

            There is advice on missed-packet and no-buffer errors at the bottom of this page:

            https://nuclearcat.com/mediawiki/index.php/Intel_Gigabit_Performance

            And more tuning information:

            https://calomel.org/freebsd_network_tuning.html

              jasonlitka

              @bryan.paradis:

              A Ubiquiti Edge Router should be able to pound out 10 times that in certain cases.

              That's debatable. Just because they said it could doesn't mean it can.

              I can break anything.

                vman76

                @bryan.paradis:

                Thanks for all the links!

                I'm always leery of advertised PPS numbers that weren't measured in production environments. The Cisco 7206 VXR NPE-G1 is also spec'd at 1,000,000 PPS. In our environment, by the time it gets to 150,000 PPS @ 600 Mbps, it'll be dropping as well, especially if any ACLs or features are enabled.

                I have the 1950 running, and iperf between 2 directly connected hosts shows promising numbers: 960 Mbps and 120,000 PPS with no input errors or drops. The CPU hung around 30% during the test. Production traffic will show its true colors.

                   packets  errs idrops      bytes    packets  errs      bytes colls
                      115k     0      0       116M       115k     0       116M     0
                      113k     0      0       114M       113k     0       114M     0
                      115k     0      0       116M       115k     0       116M     0
                      113k     0      0       114M       113k     0       114M     0
                      115k     0      0       116M       115k     0       116M     0
                      114k     0      0       115M       114k     0       115M     0
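As a rough cross-check of the table above against the iperf figures, the following Python sketch (not from the original post) converts one row into packets per second, Mbit/s, and average bytes per packet. It assumes the rows are one-second samples and that the `k`/`M` suffixes are decimal multipliers; neither is stated in the post, and the helper names are made up for illustration.

```python
def parse_scaled(token):
    """Convert a humanized value such as '115k' or '116M' to a number.

    Assumption: suffixes are decimal (k = 1e3, M = 1e6, G = 1e9).
    """
    scale = {"k": 1e3, "M": 1e6, "G": 1e9}
    if token[-1] in scale:
        return float(token[:-1]) * scale[token[-1]]
    return float(token)


def summarize(row):
    """Summarize the input side of one row.

    Column order: in packets, errs, idrops, bytes, out packets, errs, bytes, colls.
    Assumption: each row covers exactly one second.
    """
    fields = row.split()
    in_pkts = parse_scaled(fields[0])
    in_bytes = parse_scaled(fields[3])
    return {
        "pps": in_pkts,
        "mbps": in_bytes * 8 / 1e6,
        "avg_bytes": in_bytes / in_pkts if in_pkts else 0.0,
    }


stats = summarize("115k 0 0 116M 115k 0 116M 0")
print(f"{stats['pps']:.0f} pps, {stats['mbps']:.0f} Mbit/s, "
      f"{stats['avg_bytes']:.0f} bytes/packet")
```

Under those assumptions the first row works out to a bit over 900 Mbit/s at roughly 1,000 bytes per packet, which is in line with the 960 Mbps and 120,000 PPS quoted above.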

                  stephenw10 Netgate Administrator

                  Real world numbers are always great to have.  :)

                  I would have expected the 1850 to manage substantially more though. I have no numbers to prove it.  ::)

                  Steve

                    bryan.paradis

                    @vman76:

                    That is looking better for sure. Mind posting the sysctl output for that box? Also, what packet sizes did you use in the test?

                    @stephenw10:

                    Real world numbers are always great to have.  :)

                    I would have expected the 1850 to manage substantially more though. I have no numbers to prove it.  ::)

                    Steve

                    That is really just too low for the 1850, IMO.

                    http://dl.ubnt.com/Tolly212127UbiquitiEdgeRouterLitePricePerformance.pdf It is a Tolly report; I am still looking for PPS numbers from another reviewer. These things are really wicked fast for what they are. People have FreeBSD running on them already!

                      stephenw10 Netgate Administrator

                      Impressive.
                      The ERL has a custom ASIC to enable it to perform like that. It's not supported by FreeBSD, so if/when pfSense runs on it, don't expect those numbers. FreeBSD on it currently tops out at about 250 Mbps.

                      Steve

                        vman76

                        @bryan.paradis:

                        That is looking better for sure. mind posting the sysctl for that guy? Also what size packets are you using or were using in the test?

                        Sure, here is the current data. The firewall is now in production and averaging 150 Mbps @ 24,000 PPS with no issues since around noon. I tried various iperf runs, but the money spot was this one:

                        iperf -c <server> -w 65000 -t 600 -P 5

                        Which should use the full Ethernet frame. I tried a bunch of other window sizes and more flows (up to -P 50), along with UDP tests. The above gave me the best results. Looking at the distribution of packets on the last firewall, and at the router's NetFlow route-cache, the students mostly use applications with large packets (video streaming, file sharing, etc.). I'd like to have done some more testing, but time constraints did not allow it.

                        dev.em.0.%desc: Intel(R) PRO/1000 Network Connection 7.3.2
                        dev.em.0.%driver: em
                        dev.em.0.%location: slot=0 function=0
                        dev.em.0.%pnpinfo: vendor=0x8086 device=0x10a4 subvendor=0x8086 subdevice=0x10a4 class=0x020000
                        dev.em.0.%parent: pci14
                        dev.em.0.nvm: -1
                        dev.em.0.debug: -1
                        dev.em.0.fc: 3
                        dev.em.0.rx_int_delay: 0
                        dev.em.0.tx_int_delay: 66
                        dev.em.0.rx_abs_int_delay: 66
                        dev.em.0.tx_abs_int_delay: 66
                        dev.em.0.rx_processing_limit: 100
                        dev.em.0.eee_control: 0
                        dev.em.0.link_irq: 0
                        dev.em.0.mbuf_alloc_fail: 0
                        dev.em.0.cluster_alloc_fail: 0
                        dev.em.0.dropped: 0
                        dev.em.0.tx_dma_fail: 0
                        dev.em.0.rx_overruns: 0
                        dev.em.0.watchdog_timeouts: 0
                        dev.em.0.device_control: 1209795137
                        dev.em.0.rx_control: 67141634
                        dev.em.0.fc_high_water: 30720
                        dev.em.0.fc_low_water: 29220
                        dev.em.0.queue0.txd_head: 192
                        dev.em.0.queue0.txd_tail: 192
                        dev.em.0.queue0.tx_irq: 0
                        dev.em.0.queue0.no_desc_avail: 0
                        dev.em.0.queue0.rxd_head: 531
                        dev.em.0.queue0.rxd_tail: 530
                        dev.em.0.queue0.rx_irq: 0
                        dev.em.0.mac_stats.excess_coll: 0
                        dev.em.0.mac_stats.single_coll: 0
                        dev.em.0.mac_stats.multiple_coll: 0
                        dev.em.0.mac_stats.late_coll: 0
                        dev.em.0.mac_stats.collision_count: 0
                        dev.em.0.mac_stats.symbol_errors: 0
                        dev.em.0.mac_stats.sequence_errors: 0
                        dev.em.0.mac_stats.defer_count: 5793
                        dev.em.0.mac_stats.missed_packets: 0
                        dev.em.0.mac_stats.recv_no_buff: 139
                        dev.em.0.mac_stats.recv_undersize: 0
                        dev.em.0.mac_stats.recv_fragmented: 0
                        dev.em.0.mac_stats.recv_oversize: 0
                        dev.em.0.mac_stats.recv_jabber: 0
                        dev.em.0.mac_stats.recv_errs: 0
                        dev.em.0.mac_stats.crc_errs: 0
                        dev.em.0.mac_stats.alignment_errs: 0
                        dev.em.0.mac_stats.coll_ext_errs: 0
                        dev.em.0.mac_stats.xon_recvd: 5929
                        dev.em.0.mac_stats.xon_txd: 120
                        dev.em.0.mac_stats.xoff_recvd: 5929
                        dev.em.0.mac_stats.xoff_txd: 120
                        dev.em.0.mac_stats.total_pkts_recvd: 397413786
                        dev.em.0.mac_stats.good_pkts_recvd: 397401928
                        dev.em.0.mac_stats.bcast_pkts_recvd: 2715
                        dev.em.0.mac_stats.mcast_pkts_recvd: 1528
                        dev.em.0.mac_stats.rx_frames_64: 11419946
                        dev.em.0.mac_stats.rx_frames_65_127: 24122771
                        dev.em.0.mac_stats.rx_frames_128_255: 5438765
                        dev.em.0.mac_stats.rx_frames_256_511: 2942593
                        dev.em.0.mac_stats.rx_frames_512_1023: 13221690
                        dev.em.0.mac_stats.rx_frames_1024_1522: 340256163
                        dev.em.0.mac_stats.good_octets_recvd: 504144384891
                        dev.em.0.mac_stats.good_octets_txd: 70175650866
                        dev.em.0.mac_stats.total_pkts_txd: 199599490
                        dev.em.0.mac_stats.good_pkts_txd: 199599248
                        dev.em.0.mac_stats.bcast_pkts_txd: 1616
                        dev.em.0.mac_stats.mcast_pkts_txd: 2
                        dev.em.0.mac_stats.tx_frames_64: 83244952
                        dev.em.0.mac_stats.tx_frames_65_127: 68946765
                        dev.em.0.mac_stats.tx_frames_128_255: 3324597
                        dev.em.0.mac_stats.tx_frames_256_511: 2036340
                        dev.em.0.mac_stats.tx_frames_512_1023: 3106394
                        dev.em.0.mac_stats.tx_frames_1024_1522: 38940203
                        dev.em.0.mac_stats.tso_txd: 0
                        dev.em.0.mac_stats.tso_ctx_fail: 0
                        dev.em.0.interrupts.asserts: 106244188
                        dev.em.0.interrupts.rx_pkt_timer: 39933
                        dev.em.0.interrupts.rx_abs_timer: 0
                        dev.em.0.interrupts.tx_pkt_timer: 5731
                        dev.em.0.interrupts.tx_abs_timer: 11354
                        dev.em.0.interrupts.tx_queue_empty: 0
                        dev.em.0.interrupts.tx_queue_min_thresh: 0
                        dev.em.0.interrupts.rx_desc_min_thresh: 0
                        dev.em.0.interrupts.rx_overrun: 0

                          bryan.paradis

                          @stephenw10:

                          Impressive.
                          The ERL has a custom ASIC to enable it to perform like that. It's not supported by FreeBSD, so if/when pfSense runs on it don't expect those numbers. Currently tops out at 250Mbps.

                          Steve

                          Yes indeed. It is a heavily modified Vyatta-based OS on Debian for a Cavium MIPS SoC. The driver would need to be ported. Still, at $99...

                          http://rtfm.net/FreeBSD/ERL/

                          Performance could be a little better, though it's more than adequate for my home Internet connection. Basic packet passing between two Gigabit hosts seems to top out at about 250Mbits/sec.

                          https://wiki.freebsd.org/FreeBSD/mips/Octeon

                          @vman76:

                          Interesting! Thanks for posting.

                            vman76

                            Well, it looks like I found the hardware limits of the new server as well. We were able to push about 500 Mbps and 80,000 PPS with no issue. Once we get to 600 Mbps and 100,000 PPS, we get input errors (NIC buffer overruns). While doing some real-time troubleshooting, I noticed that the errors occur exactly when one of the 4 CPUs hits 100% in the (kernel em0 queue) process. em0 is my outside interface. So it appears my earlier suspicion applies in this case: the CPU is too busy to pull the packets off the NIC buffer in time, and I end up with overruns. The CPU I'm using is an Intel(R) Xeon(R) CPU 5130 @ 2.00GHz, so it looks like I'm going to be searching for another box. I'm doing 1:1 NAT on over 5,000 hosts, so I think that might be driving the CPU higher than I expected. The attached pic shows CPU1 at 84%, but "top -P" shows that it gets to 100% when the packet loss occurs.
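For a sense of scale, here is a back-of-the-envelope Python sketch (not part of the original post) of the per-packet budget when a single 2.00 GHz core handles all em0 receive work at 100,000 PPS. The single-core assumption follows from the observation above that one CPU pegs at 100%; the function names are made up, and the sketch says nothing about what filtering and NAT actually cost per packet.

```python
def cycles_per_packet(clock_hz, pps):
    """Upper bound on CPU cycles available per packet on one core."""
    return clock_hz / pps


def microseconds_per_packet(pps):
    """Time between packet arrivals at a steady rate, in microseconds."""
    return 1e6 / pps


# Figures from the post: Xeon 5130 @ 2.00 GHz, errors starting near 100,000 PPS.
budget = cycles_per_packet(2.0e9, 100_000)
gap_us = microseconds_per_packet(100_000)
print(f"{budget:.0f} cycles and {gap_us:.0f} microseconds per packet")
```

If the core is saturated at that rate, everything done per packet (interrupt handling, pf rule evaluation, 1:1 NAT, forwarding) is together consuming about that budget.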

                            I'd love to put the Ubiquiti EdgeRouter inline and test their PPS claim here, since I'm way under 1,000,000 PPS :P (j/k)

                            Out of curiosity, does anyone know why the RRD graphs don't show individual CPU/core stats? The CPU data there looks like it's the average of all 4 CPUs, which doesn't really help in troubleshooting a problem like this. I did an snmpwalk and found utilization data for all the CPUs, so I'm graphing it separately in Cacti now (HOST-RESOURCES-MIB::hrProcessorLoad.x).
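To illustrate why an averaged CPU graph can hide this kind of bottleneck, here is a small Python sketch (not part of the original post) that parses `hrProcessorLoad` lines in the usual net-snmp `snmpwalk` text format and compares the busiest core with the average. The sample values are invented for illustration and are not measurements from this firewall; the function name is also made up.

```python
import re

# Hypothetical snmpwalk output; the load values are invented, not measured.
SAMPLE = """\
HOST-RESOURCES-MIB::hrProcessorLoad.1 = INTEGER: 12
HOST-RESOURCES-MIB::hrProcessorLoad.2 = INTEGER: 84
HOST-RESOURCES-MIB::hrProcessorLoad.3 = INTEGER: 9
HOST-RESOURCES-MIB::hrProcessorLoad.4 = INTEGER: 15
"""

LINE_RE = re.compile(r"hrProcessorLoad\.(\d+) = INTEGER: (\d+)")


def per_core_load(text):
    """Map each hrProcessorLoad index to its load percentage."""
    return {int(index): int(load) for index, load in LINE_RE.findall(text)}


loads = per_core_load(SAMPLE)
busiest = max(loads, key=loads.get)
average = sum(loads.values()) / len(loads)
print(f"average {average:.0f}%, busiest index {busiest} at {loads[busiest]}%")
```

With numbers like these, the average looks comfortable while one core is close to its limit, which is the pattern described above.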

                            Some data from my troubleshooting is below in case someone spots something. I have a lot of experience troubleshooting networks in general, but I'm very new to BSD, so I could be missing something.

                                        input        (Total)           output
                               packets  errs idrops      bytes    packets  errs      bytes colls
                                   86k    83      0        73M        87k     0        73M     0
                                  100k   155      0        85M       101k     0        85M     0
                                   96k     0      0        82M        97k     0        82M     0
                                   99k    74      0        82M       101k     0        82M     0
                                   96k     0      0        82M        98k     0        82M     0

                            dev.em.0.mac_stats.missed_packets: 2294752
                            dev.em.0.mac_stats.recv_no_buff: 4617837
                            dev.em.0.mac_stats.recv_undersize: 0
                            dev.em.0.mac_stats.recv_fragmented: 0
                            dev.em.0.mac_stats.recv_oversize: 0
                            dev.em.0.mac_stats.recv_jabber: 0
                            dev.em.0.mac_stats.recv_errs: 0
                            dev.em.0.mac_stats.crc_errs: 0
                            dev.em.0.mac_stats.alignment_errs: 0
                            dev.em.0.mac_stats.coll_ext_errs: 0
                            dev.em.0.mac_stats.xon_recvd: 9112
                            dev.em.0.mac_stats.xon_txd: 120
                            dev.em.0.mac_stats.xoff_recvd: 9112
                            dev.em.0.mac_stats.xoff_txd: 120
                            dev.em.0.mac_stats.total_pkts_recvd: 10671726540
                            dev.em.0.mac_stats.good_pkts_recvd: 10669413564
                            dev.em.0.mac_stats.bcast_pkts_recvd: 15097
                            dev.em.0.mac_stats.mcast_pkts_recvd: 9664
                            dev.em.0.mac_stats.rx_frames_64: 240300603
                            dev.em.0.mac_stats.rx_frames_65_127: 744037531
                            dev.em.0.mac_stats.rx_frames_128_255: 281908686
                            dev.em.0.mac_stats.rx_frames_256_511: 135974542
                            dev.em.0.mac_stats.rx_frames_512_1023: 172724810
                            dev.em.0.mac_stats.rx_frames_1024_1522: 9094467392
                            dev.em.0.mac_stats.good_octets_recvd: 13931850472813
                            dev.em.0.mac_stats.good_octets_txd: 1173620928614
                            dev.em.0.mac_stats.total_pkts_txd: 5912173538
                            dev.em.0.mac_stats.good_pkts_txd: 5912173297
                            dev.em.0.mac_stats.bcast_pkts_txd: 2117
                            dev.em.0.mac_stats.mcast_pkts_txd: 2
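
As a quick sanity check on the counters above, the following sketch works out the share of received packets that were missed and the average size of a good packet. The values are copied from the sysctl output; the parsing helper is illustrative only. These look like cumulative counters, so the result averages over quiet periods as well and says nothing about loss at peak.

```python
# Counter values copied from the dev.em.0.mac_stats output above.
SYSCTL_TEXT = """\
dev.em.0.mac_stats.missed_packets: 2294752
dev.em.0.mac_stats.recv_no_buff: 4617837
dev.em.0.mac_stats.total_pkts_recvd: 10671726540
dev.em.0.mac_stats.good_pkts_recvd: 10669413564
dev.em.0.mac_stats.good_octets_recvd: 13931850472813
"""

def parse_sysctl(text):
    """Map the last component of each sysctl name to its integer value."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[name.rsplit(".", 1)[-1]] = int(value)
    return stats

stats = parse_sysctl(SYSCTL_TEXT)

# Share of all received packets that the NIC reported as missed.
missed_pct = 100 * stats["missed_packets"] / stats["total_pkts_recvd"]

# Average size of a good received packet, in bytes.
avg_bytes = stats["good_octets_recvd"] / stats["good_pkts_recvd"]

print(f"missed: {missed_pct:.4f}% of received packets")
print(f"average good packet: {avg_bytes:.0f} bytes")
```

Taking the same two counters at the start and end of a busy interval and differencing them would give a loss figure for the peak itself.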

                            : vmstat -i
                            interrupt                          total      rate
                            irq14: ata0                          376          0
                            irq20: uhci1                      437491          0
                            irq21: uhci0 uhci2+              541201          0
                            cpu0: timer                  1165155769      1997
                            irq256: bce0                    23965829        41
                            irq257: mfi0                    1297902          2
                            irq258: em0                  2536851814      4350
                            irq259: em1                  2695135942      4621
                            cpu2: timer                  1165155721      1997
                            cpu3: timer                  1165155724      1997
                            cpu1: timer                  1165155721      1997
                            Total                        9918853490      17008

                            highCPU.jpg

                            • stephenw10S
                              stephenw10 Netgate Administrator
                              last edited by

                              I don't really have experience at this sort of traffic level but it seems like you should be able to do better than that on those servers. That's just a general impression though. It would be useful to get an opinion from someone more experienced.

                              Could this be a situation where IP fastforwarding could be usefully enabled? It can cause problems, notably with IPSec.
                              https://forum.pfsense.org/index.php?topic=57723.0

                              What hardware offloading options do you have enabled?

                              Steve

                              • V
                                vman76
                                last edited by

                                @stephenw10:

                                I don't really have experience at this sort of traffic level but it seems like you should be able to do better than that on those servers. That's just a general impression though. It would be useful to get an opinion from someone more experienced.

                                Could this be a situation where IP fastforwarding could be usefully enabled? It can cause problems, notably with IPSec.
                                https://forum.pfsense.org/index.php?topic=57723.0

                                Steve

                                I thought it could do better too but the numbers say otherwise. I have a simple ruleset of about 5 rules on each interface. I have not loaded any packages. No VPN. I do log everything to syslog but that is a requirement that I can't get away from.

                                Hmm, interesting option. We will not be using IPSec terminated directly on this box, so that's not an issue. However, students do use VPN clients, which will go through the firewall. I have to research it more to see if anything else might break by applying it. With over 3,000 users and every device you can imagine a student might bring into a dorm room, I'm apprehensive about what it might break.

                                • stephenw10S
                                  stephenw10 Netgate Administrator
                                  last edited by

                                  Hmm, I imagine it would break IPSec through the box and probably generate some complaints! It can dramatically increase throughput in some instances, though. There may be other opportunities for tuning as well.

                                  Earlier I said that the ERL had an ASIC to increase throughput, but I think that was wrong (I can't edit it now). It looks like it has a closed-source IP forwarding module that can run separately on one of its 8 cores. No chance of a FreeBSD driver, but maybe an equivalent in the future.

                                  Steve

                                  • P
                                    podilarius
                                    last edited by

                                    The results are somewhat expected. Currently pfSense is using an old pf that is single core only. The only real reason to run pfSense on a multicore machine is so the add-ons can use the other cores while pf filtering is stuck on one.
                                    The faster the clock speed of a single core, the more throughput you will observe. The pfSense hardware sizing guide has 2GHz machines topping out at around 500Mbps. You got it to go a bit higher. I would imagine that you could get a lot more if you had a 3.6GHz machine or an overclocked machine at 4GHz.
                                    There has been talk about upgrading to the newer pf, but I don't know much about it or even when. Perhaps 2.2 or 2.3. It should have multicore support if based on the newer code. (Note: I am not with ESF and I don't know the plans at all.) Just hoping that we can get to multicore/multithreaded before I need it.
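
As a rough illustration of the clock-speed argument, here is a sketch that extrapolates from the 2GHz / 500Mbps sizing figure. It assumes throughput scales linearly with single-core clock speed on the same architecture, which is a simplification and not a measured result; the function name is made up for this example.

```python
# Naive linear extrapolation from the sizing figure quoted above.
# Assumption: throughput scales in proportion to single-core clock speed and
# nothing else (architecture, bus and NIC are ignored). An estimate only.

def scaled_throughput_mbps(clock_ghz, ref_clock_ghz=2.0, ref_mbps=500.0):
    """Estimate throughput at clock_ghz from a reference clock and rate."""
    return ref_mbps * clock_ghz / ref_clock_ghz

for ghz in (2.0, 3.0, 3.6, 4.0):
    print(f"{ghz:.1f} GHz -> ~{scaled_throughput_mbps(ghz):.0f} Mbps")
```

Treat the output as an upper-level guess at best, since differences between CPU generations can outweigh the clock speed.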

                                    • V
                                      vman76
                                      last edited by

                                      @podilarius:

                                      The results are somewhat expected. Currently pfSense is using an old pf that is single core only. The only real reason to run pfSense on a multicore machine is so the add-ons can use the other cores while pf filtering is stuck on one.
                                      The faster the clock speed of a single core, the more throughput you will observe. The pfSense hardware sizing guide has 2GHz machines topping out at around 500Mbps. You got it to go a bit higher. I would imagine that you could get a lot more if you had a 3.6GHz machine or an overclocked machine at 4GHz.
                                      There has been talk about upgrading to the newer pf, but I don't know much about it or even when. Perhaps 2.2 or 2.3. It should have multicore support if based on the newer code. (Note: I am not with ESF and I don't know the plans at all.) Just hoping that we can get to multicore/multithreaded before I need it.

                                      I looked at the CPU requirements and saw that 3 GHz was recommended, but it doesn't mention anything about the CPU architecture. The Dell 1850 at the beginning of this thread had a 3 GHz Xeon, but on an older architecture (800 FSB). My current 2 GHz (1333 FSB) is pushing twice the traffic, so it gets kind of tricky comparing the older CPUs with the newer models.

                                      Do you know what the name of the actual pf process is so I could monitor it? I see that the kernel process is the one taking up all the CPU, and it is spread across 2 cores (cpu1 em0, cpu2 em1 in my last screenshot). Is that the OS itself pulling packets off the NIC before the packet filtering process? I'm used to the Cisco ASAs, where I would look at the dispatcher process for filtering CPU usage. Not sure what the equivalent is here.

                                      Lastly, do you know what the "top" command equivalent to Diagnostics -> System Activity is? The closest I got to it was "top -P", but it didn't show me as much detail as the System Activity menu.

                                      Thanks for your patience with my newb questions.

                                      • P
                                        podilarius
                                        last edited by

                                        I agree it doesn't mention that, but if you went with a 1950 with faster proc, you might do well.
                                        Not sure about the top command, but you can do a ps -ef while that is running and it would probably tell you.

                                        • stephenw10S
                                          stephenw10 Netgate Administrator
                                          last edited by

                                          top -SH

                                          The hardware guide is a little outdated, as you've found.

                                          Steve

                                          • A
                                            Aluminum
                                            last edited by

                                            From the little bit of reading I've done, it basically comes down to how many interrupts per second the core talking to that device can handle, so clock speed is judge, jury and executioner.
                                            (And since newer architectures have improved IPC over time, I would think that might include interrupts as well, but I'm not sure.)

                                            The HFT guys apparently have the same problems that busy networks do, which makes sense, as both are doing tons of small random I/O.

                                            From what I understand, if even a 4.x GHz core cannot handle your workload and you can't spread it across other cores, the next step is to offload it to specialty hardware. That definitely explains some of those odd high-clocked dual-core Xeon models out there.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.