Netgate Discussion Forum
    Suricata InLine with igb NICs

    • NollipfSense
      NollipfSense @boobletins
      last edited by

      @boobletins said in Suricata InLine with igb NICs:

      @nollipfsense

      So here are some initial suggestions. Please keep in mind that I've been working on this for ~1 week (in other words: not long), and I'm not a FreeBSD, pfSense, or Suricata expert.

      Start by making a backup of your configuration.

      Do these first:
      My understanding is that flow control should be off on any netmap interface. You have bi-directional flow control enabled:

      dev.igb.0.fc: 3
      

      Disable flow control on all active interfaces using system tunables. Set dev.igb.0.fc=0 (and dev.igb.1.fc=0)

      Actively set energy efficient ethernet to disabled:
      dev.igb.0.eee_disabled=1

      Actively force the IPv6 TX checksum offload (TXCSUM_IPV6) off, along with the other hardware offloads, by adding the following to config.xml in a shellcmd tag:

      ifconfig igb0 -rxcsum -rxcsum6 -txcsum -txcsum6 -lro -tso -vlanhwtso
      

      (see above in this thread for a link on where/how to do that).

      Edit:
      To be clear: anywhere I have a command that says "igb0" or "igb.0" you will want to duplicate that for igb1 and any other interface you're running netmap on.

      So you will need two shellcmd lines in config.xml, and two new system tunables for flow control (plus the EEE ones). A quick way to verify everything from a shell is sketched just below.
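      For that sanity check, something like the following from a shell (Diagnostics > Command Prompt or SSH) should work -- this is only a verification sketch, assuming igb0/igb1 are your netmap interfaces:

      # flow control should now read 0, and eee_disabled should read 1
      sysctl dev.igb.0.fc dev.igb.1.fc
      sysctl dev.igb.0.eee_disabled dev.igb.1.eee_disabled
      # the options line should no longer list TXCSUM/RXCSUM/TSO/LRO
      ifconfig igb0 | grep options
      ifconfig igb1 | grep options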


      Consider changing later:

      Set the rx processing limit:
      dev.igb.0.rx_processing_limit=-1

      It looks like your txd and rxd are both set to 1024 currently, I suggest you move those to 4096:
      hw.igb.txd=4096
      hw.igb.rxd=4096

      By changing your txd and rxd we may need to revisit your netmap buf/ring (memory settings).

      We may also revisit your interrupt and queue settings.

      Boobletins, I will need to revisit this later. For now I'm happy with just adjusting buf_size to 4096 and disabling IPv6; I haven't gotten any alerts since, and my Internet will be down for a while because I'm moving.

      pfSense+ 23.09 Lenovo Thinkcentre M93P SFF Quadcore i7 dual Raid-ZFS 128GB-SSD 32GB-RAM PCI-Intel i350-t4 NIC, -Intel QAT 8950.
      pfSense+ 23.09 VM-Proxmox, Dell Precision Xeon-W2155 Nvme 500GB-ZFS 128GB-RAM PCIe-Intel i350-t4, Intel QAT-8950, P-cloud.

      • B
        boobletins @newUser2pfSense
        last edited by boobletins

        So you're running netmap/IPS mode on igb0 (LAN), igb1 (OPT?), and igb3 (WAN)?

        What type of CPU is in the machine (how many cores, and is hyper-threading enabled)? How much RAM?

        Are you saturating all 3 active interfaces? Or just 2?

        Start by making a backup of your configuration.

        First disable flow control (as discussed above):
        You have the following on all igb interfaces, which means bi-directional flow control is enabled:

        dev.igb.0.fc: 3
        

        Change to fc=0 on all netmap interfaces in System Tunables. This takes Ethernet flow control out of the picture in favor of higher-level flow control (TCP), which is less likely to mess with buffering and clog things up.
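        If you want to flip them all at once from a shell before adding the permanent System Tunables entries, a loop like this should do it -- a sketch only, assuming four igb interfaces, and the change does not persist across reboots:

        # turn Ethernet flow control off on igb0-igb3 for the current boot
        for i in 0 1 2 3; do
          sysctl dev.igb.$i.fc=0
        done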

        Let's look at what generates this particular netmap error:
        From http://web.mit.edu/freebsd/head/sys/dev/netmap/netmap.c

        /*
         * put a copy of the buffers marked NS_FORWARD into an mbuf chain.
         * Take packets from hwcur to ring->head marked NS_FORWARD (or forced)
         * and pass them up. Drop remaining packets in the unlikely event
         * of an mbuf shortage.
         */
        static void
        netmap_grab_packets(struct netmap_kring *kring, struct mbq *q, int force)
        {
        	u_int const lim = kring->nkr_num_slots - 1;
        	u_int const head = kring->ring->head;
        	u_int n;
        	struct netmap_adapter *na = kring->na;
        
        	for (n = kring->nr_hwcur; n != head; n = nm_next(n, lim)) {
        		struct mbuf *m;
        		struct netmap_slot *slot = &kring->ring->slot[n];
        
        		if ((slot->flags & NS_FORWARD) == 0 && !force)
        			continue;
        		if (slot->len < 14 || slot->len > NETMAP_BUF_SIZE(na)) {
        			RD(5, "bad pkt at %d len %d", n, slot->len);
        			continue;
        		}
        		slot->flags &= ~NS_FORWARD; // XXX needed ?
        		/* XXX TODO: adapt to the case of a multisegment packet */
        		m = m_devget(NMB(na, slot), slot->len, 0, na->ifp, NULL);
        
        		if (m == NULL)
        			break;
        		mbq_enqueue(q, m);
        	}
        }
        

        I'm no C expert, but as I read this code there are 2 ways to generate your error in netmap:

        1. a slot is of size less than 14
        2. a slot is of size greater than the netmap buffer can handle

        I don't know what the magic number 14 represents -- presumably the 14-byte Ethernet header, i.e. a minimum frame size we can't control. If that's the case, then the bad pkt error is generated from packets that are actually bad.

        That's not what you have. The error is telling us the slot index in the ring (the first number) and the length of the slot (e.g. slot 777 with len 2154).

        So this is a memory issue. The error would be better off saying something like "dropped a packet because it was too short or too large!" -- but that would be useful to others and is thus verboten ;)
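        If you want to see how often netmap has been hitting this path on your box, the message lands in the kernel log, so something like this should show it (just a sketch):

        # list the buffers netmap refused to forward, then compare the len values against the buffer size
        dmesg | grep netmap_grab_packets
        sysctl dev.netmap.buf_size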

        edited: Removed incorrect speculation. Skip to my latest post.

          • B
            boobletins @boobletins
            last edited by boobletins

            @boobletins said in Suricata InLine with igb NICs:

            I guess it depends on what NETMAP_BUF_SIZE(na) is returning. It should be either the available memory for netmap buffers, or the available kernel buffers (for the host adapter).

            From: https://github.com/luigirizzo/netmap/blob/master/sys/dev/netmap/netmap_kern.h

            #define NETMAP_BUF_SIZE(_na)	((_na)->na_lut.objsize)
            
            ...
            
            struct netmap_adapter {
            	...
            
            	struct netmap_lut {
            		struct lut_entry *lut;
            		struct plut_entry *plut;
            		uint32_t objtotal;	/* max buffer index */
            		uint32_t objsize;	/* buffer size */
            	};
            
            
            	/* memory allocator (opaque)
            	 * We also cache a pointer to the lut_entry for translating
            	 * buffer addresses, the total number of buffers and the buffer size.
            	 */
             	struct netmap_mem_d *nm_mem;
            	struct netmap_mem_d *nm_mem_prev;
            	struct netmap_lut na_lut;
            

            It's returning the netmap adapter's buffer size (na_lut.objsize).

            Let's see.

            Your dev.netmap.buf_size is 2048, and the lengths of the slots it was trying to process were all > 2048 when the errors were generated.

            That makes a certain kind of sense. Why were the slots larger?

            Wait. What's your MTU set to on these interfaces? Is it > 2048? Check this with 'ifconfig igb0' for each interface.

            Some sanity checks when enabling netmap would save people a lot of headaches. If your MTU is 10000 and your dev.netmap.buf_size=2048, then netmap will always choke.

            Know that if you set dev.netmap.buf_size to some obscenely high number to cover an equally high MTU, netmap will preallocate all of that memory and sit on it.
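            A quick way to line up MTU against buf_size across all the igb ports (a sketch, assuming igb0-igb3):

            # every mtu printed here should be comfortably below dev.netmap.buf_size
            for i in 0 1 2 3; do
              ifconfig igb$i | grep -o 'mtu [0-9]*'
            done
            sysctl dev.netmap.buf_size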

            • N
              newUser2pfSense
              last edited by newUser2pfSense

              boobletins...Presently I'm using Inline IPS Mode, and I only have Suricata running on my WAN, which is igb3. I'm also using igb0 and igb1 for my WLAN and LAN.

              CPU:
              Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
              Current: 4000 MHz, Max: 4001 MHz
              8 CPUs: 1 package(s) x 4 core(s) x 2 hardware threads
              AES-NI CPU Crypto: Yes (active)

              Memory:
              64 Gig

              System Tunables additions:

              Tunable Name             Description                         Value
              dev.igb.0.fc             disable flow control                0
              dev.igb.1.fc             disable flow control                0
              dev.igb.2.fc             disable flow control                0
              dev.igb.3.fc             disable flow control                0
              dev.igb.0.eee_disabled   disable energy efficient ethernet   1
              dev.igb.1.eee_disabled   disable energy efficient ethernet   1
              dev.igb.2.eee_disabled   disable energy efficient ethernet   1
              dev.igb.3.eee_disabled   disable energy efficient ethernet   1

              config.xml additions (shellcmd tags):

              <shellcmd>ifconfig igb0 -rxcsum -rxcsum6 -txcsum -txcsum6 -lro -tso -vlanhwtso</shellcmd>
              <shellcmd>ifconfig igb1 -rxcsum -rxcsum6 -txcsum -txcsum6 -lro -tso -vlanhwtso</shellcmd>
              <shellcmd>ifconfig igb2 -rxcsum -rxcsum6 -txcsum -txcsum6 -lro -tso -vlanhwtso</shellcmd>
              <shellcmd>ifconfig igb3 -rxcsum -rxcsum6 -txcsum -txcsum6 -lro -tso -vlanhwtso</shellcmd>
              <shellcmd>ifconfig em0 -rxcsum -rxcsum6 -txcsum -txcsum6 -lro -tso -vlanhwtso</shellcmd>

              igb0,1,2,3 all have an MTU of 1500, which I believe is the default. I haven't set any values for this myself.

              • B
                boobletins @newUser2pfSense
                last edited by

                @newuser2pfsense said in Suricata InLine with igb NICs:

                boobletins...Presently I'm using Inline IPS Mode and I only have Suricata running on my WAN and that's igb3. I'm using igb0 and igb1 as well for my WLAN and LAN.
                dev.igb.3.fc disable flow control 0

                Previously you had dev.igb.3.fc=3. Does the "bad pkt" error persist with dev.igb.3.fc=0?

                Just to confirm, could you double check and paste me the full output from

                ifconfig igb3
                

                Please paste any additional system tunables you've set via the UI, and your full loader.conf.local (minus any sensitive data).

                Please then manually double check and paste the output from these commands:

                sysctl -a | grep nmbclusters
                sysctl -a | grep msi
                sysctl -a | grep num_queues
                dmesg | grep igb3
                

                The above isn't busy work; I'm asking you to confirm manually because I found that some settings didn't take effect when I set them in loader.conf.local. I needed to put some of them in the UI System Tunables instead.

                We have more settings to tinker with; I made a bunch of changes before the errors went away, but I'm trying to narrow down the issue before throwing a bunch of new settings at you. I'm pretty confident we can get this working on your igb since it's been working on mine with 0 errors for over a week now.
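                A way to spot which loader.conf.local entries actually stuck (a sketch -- it assumes the same names are readable at runtime via sysctl, which isn't true of every loader tunable) is to compare the file against the live values:

                # what the file asks for...
                grep -v '^#' /boot/loader.conf.local
                # ...vs what the kernel is actually using
                sysctl hw.igb.num_queues hw.igb.enable_msix kern.ipc.nmbclusters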

                • N
                  newUser2pfSense
                  last edited by

                  boobletins...I keep getting the following error message from the page when posting the information you requested; frustrating to say the least:

                  Error
                  Post content was flagged as spam by Akismet.com

                  I'll do what I can to get the information in.

                  The errors do persist:
                  408.786592 [1071] netmap_grab_packets bad pkt at 186 len 2154
                  950.583865 [1071] netmap_grab_packets bad pkt at 433 len 2154
                  530.551894 [1071] netmap_grab_packets bad pkt at 810 len 2147
                  530.547133 [1071] netmap_grab_packets bad pkt at 807 len 2147
                  360.440859 [1071] netmap_grab_packets bad pkt at 728 len 2154
                  764.263927 [1071] netmap_grab_packets bad pkt at 311 len 2154

                  • B
                    boobletins
                    last edited by boobletins

                    Ok -- I tried to thumbs-up some of your posts hoping that will help with Akismet.

                    I am interested in those results -- mostly because I think something is putting packets into your hardware buffers that are greater than 2048 bytes. They also seem to be consistently in the 2100 range. I can't explain what is doing that, or why, if your MTU is actually 1500. Maybe there's some kind of overhead with VLAN tagging, QoS, etc. that I'm not aware of.

                    The why doesn't really matter if all you want is a fix. If you raise netmap's buf_size (and the packet sizes stay below the new maximum), then the errors should disappear.

                    Currently your dev.netmap.buf_size is set to 2048. If you, for example, double that to 4096, then all of the current errors would be covered by the larger buffer size (do this in the UI under System Tunables).

                    Since I don't understand how you're getting packets that are > 2048 with an MTU of 1500, I can't promise the error won't come back with even larger numbers, but that change would cover all of the errors you've pasted so far.

                    As I say above, you may get additional errors by changing dev.netmap.buf_size -- let me know if that's the case.

                    For the record: I have an MTU of 1500 and a dev.netmap.buf_size of 1920 is enough to prevent errors.
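                    For reference, the change itself is just the one tunable. The System Tunables entry is the persistent way to do it; the shell equivalent below is only a sketch, and my assumption is it only takes effect once Suricata re-opens the netmap interface:

                    # allow netmap slots to hold frames up to ~4k
                    sysctl dev.netmap.buf_size=4096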

                    • B
                      boobletins
                      last edited by

                      Maybe don't change this unless you run into other issues, but the remarks at the link below suggest that hyperthreading (which you have enabled) may limit your throughput.

                      https://calomel.org/freebsd_network_tuning.html

                      # Disable Hyper Threading (HT), also known as Intel's proprietary simultaneous
                      # multithreading (SMT) because implementations typically share TLBs and L1
                      # caches between threads which is a security concern. SMT is likely to slow
                      # down workloads not specifically optimized for SMT if you have a CPU with more
                      # than two(2) real CPU cores. Secondly, multi-queue network cards are as much
                      # as 20% slower when network queues are bound to both real CPU cores and SMT
                      # virtual cores due to interrupt processing collisions.
                      #
                      machdep.hyperthreading_allowed="0"  # (default 1, allow Hyper Threading (HT))
                      

                      That last sentence seems to apply in your situation. They note they've used this config with an i350. I don't see a lot of netmap-specific configuration in there, so YMMV.

                      This is unrelated to the "bad pkt" error.
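                      If you do decide to try it, the change would just be that loader tunable plus a reboot -- a sketch, assuming you keep custom loader settings in /boot/loader.conf.local as discussed above:

                      # disable SMT so igb queues bind only to physical cores (takes effect at next boot)
                      echo 'machdep.hyperthreading_allowed="0"' >> /boot/loader.conf.local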

                      • stephenw10
                        stephenw10 Netgate Administrator @boobletins
                        last edited by

                        @boobletins said in Suricata InLine with igb NICs:

                        Ok -- I tried to thumbs-up some of your posts hoping that will help with Akismet.

                         It should. Users with a reputation of 5 or more should never see Akismet.
                         I voted a few posts too, so that is now the case.

                        Steve

                        • N
                          newUser2pfSense
                          last edited by newUser2pfSense

                          boobletins/stephenw10...I tried posting again but unfortunately I keep getting:

                          Post content was flagged as spam by Akismet.com

                          I apologize, I keep trying to post.

                          I'll try posting a little at a time again if it will let me.

                          • N
                            newUser2pfSense
                            last edited by

                            ifconfig igb3 [I redacted out IP/MAC addresses]:

                            igb3: flags=28943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,PPROMISC> metric 0 mtu 1500
                            options=1000b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,NETMAP>
                            ether
                            hwaddr
                            inet6 %igb3 prefixlen 64 scopeid 0x4
                            inet netmask broadcast
                            nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
                            media: Ethernet autoselect (1000baseT <full-duplex>)
                            status: active

                            • N
                              newUser2pfSense
                              last edited by

                              The only other System Tunable I changed was from here:
                              https://www.netgate.com/docs/pfsense/hardware/tuning-and-troubleshooting-network-cards.html?highlight=tuning
                              net.inet.ip.intr_queue_maxlen (Maximum size of the IP input queue) = 3000
                              I believe it was originally set to 1000. I just never changed it back. I can change it back if need be.
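                              One way to judge whether that queue length needs to stay raised (a sketch, based on the related drop counter) is to check whether drops are actually occurring:

                              # if this counter isn't climbing, the default maxlen is probably fine
                              sysctl net.inet.ip.intr_queue_drops
                              sysctl net.inet.ip.intr_queue_maxlen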

                              Although I've kept some of the tunables in my loader.conf.local file for testing, I've commented them out with #, so nothing there should be loading:

                              #hw.igb.rxd="1024"
                              #hw.igb.txd="1024"
                              #hw.igb.enable_aim=1
                              #hw.igb.num_queues=0
                              #kern.ipc.nmbclusters="1000000"
                              #hw.pci.enable_msi=0
                              #hw.igb.max_interrupt_rate="32000"
                              #hw.igb.fc_setting=0
                              #hw.igb.txd=4096
                              #hw.igb.rxd=4096

                              • N
                                newUser2pfSense
                                last edited by

                                sysctl -a | grep nmbclusters -
                                kern.ipc.nmbclusters: 4076726

                                sysctl -a | grep msi -
                                hw.ixl.enable_msix: 1
                                hw.sdhci.enable_msi: 1
                                hw.puc.msi_disable: 0
                                hw.pci.honor_msi_blacklist: 1
                                hw.pci.msix_rewrite_table: 0
                                hw.pci.enable_msix: 1
                                hw.pci.enable_msi: 1
                                hw.mfi.msi: 1
                                hw.malo.pci.msi_disable: 0
                                hw.ix.enable_msix: 1
                                hw.igb.enable_msix: 1
                                hw.em.enable_msix: 1
                                hw.cxgb.msi_allowed: 2
                                hw.bce.msi_enable: 1
                                hw.aac.enable_msi: 1
                                machdep.disable_msix_migration: 0

                                sysctl -a | grep num_queues -
                                hw.ix.num_queues: 0
                                hw.igb.num_queues: 0

                                • N
                                  newUser2pfSense
                                  last edited by

                                  dmesg | grep igb3 [I redacted out IP/MAC addresses] -

                                  igb3: link state changed to UP
                                  igb3: permanently promiscuous mode enabled
                                  igb3: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0xe000-0xe01f mem 0xdf000000-0xdf0fffff,0xdf600000-0xdf603fff irq 19 at device 0.3 on pci2
                                  igb3: Using MSIX interrupts with 9 vectors
                                  igb3: Ethernet address:
                                  igb3: Bound queue 0 to cpu 0
                                  igb3: Bound queue 1 to cpu 1
                                  igb3: Bound queue 2 to cpu 2
                                  igb3: Bound queue 3 to cpu 3
                                  igb3: Bound queue 4 to cpu 4
                                  igb3: Bound queue 5 to cpu 5
                                  igb3: Bound queue 6 to cpu 6
                                  igb3: Bound queue 7 to cpu 7
                                  igb3: netmap queues/slots: TX 8/4096, RX 8/4096
                                  igb3: link state changed to UP
                                  igb3: link state changed to DOWN
                                  igb3: link state changed to UP
                                  igb3: permanently promiscuous mode enabled
                                  igb3: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0xe000-0xe01f mem 0xdf000000-0xdf0fffff,0xdf600000-0xdf603fff irq 19 at device 0.3 on pci2
                                  igb3: Using MSIX interrupts with 9 vectors
                                  igb3: Ethernet address:
                                  igb3: Bound queue 0 to cpu 0
                                  igb3: Bound queue 1 to cpu 1
                                  igb3: Bound queue 2 to cpu 2
                                  igb3: Bound queue 3 to cpu 3
                                  igb3: Bound queue 4 to cpu 4
                                  igb3: Bound queue 5 to cpu 5
                                  igb3: Bound queue 6 to cpu 6
                                  igb3: Bound queue 7 to cpu 7
                                  igb3: netmap queues/slots: TX 8/1024, RX 8/1024
                                  igb3: link state changed to UP
                                  igb3: link state changed to DOWN
                                  igb3: link state changed to UP
                                  igb3: permanently promiscuous mode enabled
                                  igb3: link state changed to DOWN
                                  igb3: link state changed to UP
                                  igb3: link state changed to DOWN
                                  igb3: link state changed to UP
                                  igb3: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0xe000-0xe01f mem 0xdf000000-0xdf0fffff,0xdf600000-0xdf603fff irq 19 at device 0.3 on pci2
                                  igb3: Using MSIX interrupts with 9 vectors
                                  igb3: Ethernet address:
                                  igb3: Bound queue 0 to cpu 0
                                  igb3: Bound queue 1 to cpu 1
                                  igb3: Bound queue 2 to cpu 2
                                  igb3: Bound queue 3 to cpu 3
                                  igb3: Bound queue 4 to cpu 4
                                  igb3: Bound queue 5 to cpu 5
                                  igb3: Bound queue 6 to cpu 6
                                  igb3: Bound queue 7 to cpu 7
                                  igb3: netmap queues/slots: TX 8/1024, RX 8/1024
                                  igb3: link state changed to UP
                                  igb3: link state changed to DOWN
                                  igb3: link state changed to UP
                                  igb3: permanently promiscuous mode enabled
                                  igb3: link state changed to DOWN
                                  igb3: link state changed to UP
                                  igb3: link state changed to DOWN
                                  igb3: link state changed to UP
                                  igb3: link state changed to DOWN
                                  igb3: link state changed to UP
                                  igb3: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0xe000-0xe01f mem 0xdf000000-0xdf0fffff,0xdf600000-0xdf603fff irq 19 at device 0.3 on pci2
                                  igb3: Using MSIX interrupts with 9 vectors
                                  igb3: Ethernet address:
                                  igb3: Bound queue 0 to cpu 0
                                  igb3: Bound queue 1 to cpu 1
                                  igb3: Bound queue 2 to cpu 2
                                  igb3: Bound queue 3 to cpu 3
                                  igb3: Bound queue 4 to cpu 4
                                  igb3: Bound queue 5 to cpu 5
                                  igb3: Bound queue 6 to cpu 6
                                  igb3: Bound queue 7 to cpu 7
                                  igb3: netmap queues/slots: TX 8/1024, RX 8/1024
                                  igb3: link state changed to UP
                                  igb3: link state changed to DOWN
                                  igb3: link state changed to UP
                                  igb3: link state changed to DOWN
                                  igb3: link state changed to UP
                                  igb3: permanently promiscuous mode enabled
                                  igb3: link state changed to DOWN
                                  arpresolve: can't allocate llinfo for on igb3
                                  igb3: link state changed to UP
                                  igb3: link state changed to DOWN
                                  igb3: link state changed to UP
                                  igb3: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0xe000-0xe01f mem 0xdf000000-0xdf0fffff,0xdf600000-0xdf603fff irq 19 at device 0.3 on pci2
                                  igb3: Using MSIX interrupts with 9 vectors
                                  igb3: Ethernet address:
                                  igb3: Bound queue 0 to cpu 0
                                  igb3: Bound queue 1 to cpu 1
                                  igb3: Bound queue 2 to cpu 2
                                  igb3: Bound queue 3 to cpu 3
                                  igb3: Bound queue 4 to cpu 4
                                  igb3: Bound queue 5 to cpu 5
                                  igb3: Bound queue 6 to cpu 6
                                  igb3: Bound queue 7 to cpu 7
                                  igb3: netmap queues/slots: TX 8/1024, RX 8/1024
                                  igb3: link state changed to UP
                                  igb3: link state changed to DOWN
                                  igb3: link state changed to UP
                                  igb3: link state changed to DOWN
                                  igb3: link state changed to UP
                                  igb3: permanently promiscuous mode enabled
                                  igb3: link state changed to DOWN
                                  igb3: link state changed to UP
                                  arpresolve: can't allocate llinfo for on igb3
                                  

                                  I added dev.netmap.buf_size to System Tunables and set the value to 4096. I restarted pfSense and then pushed as much traffic through it as I could. I didn't get any netmap_grab_packets errors. I'm now wondering if there is a maximum netmap buffer size.

                                  I look forward to doing whatever change/testing we can to find a solution. Thanks for the continued help!

                                  • B
                                    boobletins @newUser2pfSense
                                    last edited by

                                    @newuser2pfsense said in Suricata InLine with igb NICs:

                                    I'm now wondering if there is a maximum netmap buffer size.

                                     With 64 GB of RAM you should be able to take that tunable very high, but there isn't a need unless the bad pkt error returns with larger length values. I'd wait and see; otherwise you are locking up memory for no reason.

                                    I can't explain how you're getting packets of size ~2100 with an MTU of 1500. Maybe JUMBO_MTU allows for that (I don't know). It could also be that something else on your network has a larger MTU setting. I'm not sure how FreeBSD handles those situations. If you're interested you can check any switches and clients to see and adjust accordingly.

                                    If the system is handling as much throughput as you can throw at it, then I'd leave everything alone for now.

                                    If you run into throughput or interrupt issues, then consider disabling hyperthreading. The dmesg output indicates that you're binding queues to virtual and hardware cores which may become an issue depending on how hard you're saturating the interfaces.
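                                     If you want to see how much memory netmap is actually sitting on for a given buf_size, dumping the whole tunable tree gives a rough picture (a sketch -- I'm assuming the buffer-count tunables live under the same dev.netmap tree as buf_size, which may vary by FreeBSD version):

                                     # total buffer-pool memory is roughly buf_size multiplied by the buffer count reported here
                                     sysctl dev.netmap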

                                    • bmeeks
                                      bmeeks
                                      last edited by

                                      Once the applicable system tuneables are nailed down and some "good" typical values are established, this thread should be made a "sticky post" or else a new single "sticky post" created containing the relevant settings. The netmap bad packets error has plagued a lot of Suricata Inline IPS Mode users.

                                      • N
                                        newUser2pfSense
                                        last edited by

                                        boobletins...I'll let it run for a while with all of the tweaks we've made and check it periodically for any netmap_grab_packets errors.

                                        bmeeks...I agree.

                                        • N
                                          newUser2pfSense
                                          last edited by newUser2pfSense

                                          I let my system run for just over a week and I noticed this evening that I couldn't access the interwebs for some reason. I restarted my pfSense computer and everything seemed to go back to normal. I then noticed a few minutes ago the following on the console:

                                          kernel 492.136807 [1071] netmap_grab_packets bad pkt at 878 len 4939
                                          kernel 490.136919 [1071] netmap_grab_packets bad pkt at 667 len 4939
                                          kernel 489.136703 [1071] netmap_grab_packets bad pkt at 933 len 4939
                                          kernel 488.636876 [1071] netmap_grab_packets bad pkt at 875 len 4939
                                          kernel 488.435620 [1071] netmap_grab_packets bad pkt at 806 len 4939
                                          kernel 488.235492 [1071] netmap_grab_packets bad pkt at 766 len 4939

                                           Interesting. I guess I'm going to have to bump up my dev.netmap.buf_size from 4096 to a larger value. I have 64 GB of RAM in my pfSense computer, so maybe I'll bump it up to 8192 and see how that works. Has anyone had a related experience after tuning their system?

                                          Update - Since changing the buffer size to 8192, I've noticed webpages load a tad slower.

                                          • B
                                            boobletins
                                            last edited by

                                             I still periodically see packets larger than my MTU and dev.netmap.buf_size. I haven't been able to track down the source. After tuning, it's down to something like once per week, often without any interface hiccup.

                                            I opened a support question here: https://redmine.openinfosecfoundation.org/issues/2720 -- but so far there's no information. I don't think it's a Suricata issue --
                                            I'm no expert, but I don't see anything in the Suricata netmap code that would be adding length to packets.

                                            It's possible that this type of noise is always there but the netmap configuration is more sensitive to violations of mtu/buf_size.

                                            Really the error message just indicates that a packet was dropped because it exceeded the available buffer length. I believe the interface flap after that is due to the watchdog cycling the interface because it sees high packet loss (or latency). Packets are presumably dropped all the time by the OS and we're only aware of them because we're looking for netmap errors now.

                                            For the record: my logs show the last errors on 12/6 with the same packet size you have above:

                                            kernel: 338.512666 [1071] netmap_grab_packets       bad pkt at 1054 len 4939
                                            kernel: 338.714285 [1071] netmap_grab_packets       bad pkt at 1073 len 4939
                                            kernel: 338.914864 [1071] netmap_grab_packets       bad pkt at 1089 len 4939
                                            kernel: 339.423360 [1071] netmap_grab_packets       bad pkt at 1203 len 4939
                                            kernel: 340.414473 [1071] netmap_grab_packets       bad pkt at 1484 len 4939
                                            kernel: 342.414619 [1071] netmap_grab_packets       bad pkt at 1542 len 4939
                                            kernel: 346.414451 [1071] netmap_grab_packets       bad pkt at 2009 len 4939
                                            

                                            The same size strikes me as a little odd -- what's putting packets of that exact size on the wire? They happen so rarely now that I don't want to run a pcap for weeks to catch them. I don't see any particularly odd traffic at the time in my logs (though of course the bad packets are dropped, so if they're all bad nothing would show up).

                                             I'd be curious to know the output of "sysctl -a | grep missed_packets" -- or more precisely, I'd be curious whether you note those numbers now and compare them after "bad pkt" errors, to see if the NIC counters are still being incremented under netmap or if we lose that reporting. If they are still accurately incremented on a packet miss, then we should be able to compare inline to legacy mode to see if there's any significant increase in packet loss with netmap mode. I suspect there isn't; it's just louder about its misses.
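                                             A rough way to do that comparison, reusing the same grep (a sketch only -- the exact counter names depend on the igb driver version):

                                             # snapshot the counters, wait for a "bad pkt" burst, then snapshot again and compare
                                             sysctl -a | grep missed_packets > /tmp/missed.before
                                             # ...later, after errors appear in the log...
                                             sysctl -a | grep missed_packets > /tmp/missed.after
                                             diff /tmp/missed.before /tmp/missed.after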
