Random Crash HWCUR



  • I'm experiencing random crashes on my Supermicro D-1541 with a Chelsio T520 since I turned on Suricata Inline IPS. The hardware is identical to the XG-1541. Suricata is running on the LAN interface. I have both offloading and flow control disabled, but the crashes occur at random; sometimes the box runs for a few days before it crashes. Any suggestions on where I should look?

    pfSense version: 2.4.4-RELEASE-p3 (amd64)

    cat /var/log/system.log | grep netmap

    May 31 17:54:01 pfSense kernel: 441.657927 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff802b6587500
    May 31 17:54:02 pfSense kernel: 442.278581 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 86 m 0xfffff80182ebf000
    May 31 17:54:02 pfSense kernel: 442.278649 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 86 m 0xfffff80182b3cb00
    May 31 17:54:02 pfSense kernel: 442.473351 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff803df986600
    May 31 17:54:02 pfSense kernel: 442.532143 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 385 m 0xfffff802b64a2b00
    May 31 17:54:02 pfSense kernel: 442.533664 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 398 m 0xfffff802b6c88400
    May 31 17:54:02 pfSense kernel: 442.561443 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 385 m 0xfffff8021b1c2a00
    May 31 17:54:02 pfSense kernel: 442.562488 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 398 m 0xfffff80076f92a00
    May 31 17:54:02 pfSense kernel: 442.697901 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff803df986600
    May 31 17:54:02 pfSense kernel: 442.767366 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 398 m 0xfffff8021b7a9d00
    May 31 17:54:03 pfSense kernel: 442.823697 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff802b6919100
    May 31 17:54:03 pfSense kernel: 443.006911 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 86 m 0xfffff80182efa600
    May 31 17:54:03 pfSense kernel: 443.091577 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 385 m 0xfffff802b6c66300
    May 31 17:54:03 pfSense kernel: 443.273222 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 86 m 0xfffff80182d14800
    May 31 17:54:03 pfSense kernel: 443.314848 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 86 m 0xfffff803dfb2e500
    May 31 17:54:03 pfSense kernel: 443.340055 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff803dfa09100
    May 31 17:54:03 pfSense kernel: 443.340112 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 86 m 0xfffff80182bc8800
    May 31 17:54:03 pfSense kernel: 443.389527 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff803dfa09100
    May 31 17:54:03 pfSense kernel: 443.425848 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 86 m 0xfffff8037b2dcd00
    May 31 17:54:03 pfSense kernel: 443.473259 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff801439fc200
    May 31 17:54:03 pfSense kernel: 443.617594 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 342 m 0xfffff80102421100
    May 31 17:54:04 pfSense kernel: 444.011783 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 86 m 0xfffff80182b41e00
    May 31 17:54:04 pfSense kernel: 444.101411 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff802b6588e00
    May 31 17:54:04 pfSense kernel: 444.271959 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 86 m 0xfffff80182cb8500
    May 31 17:54:04 pfSense kernel: 444.339886 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff80182aff200
    May 31 17:54:04 pfSense kernel: 444.339990 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 86 m 0xfffff80182b8c200
    May 31 17:54:04 pfSense kernel: 444.389273 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff803df986900
    May 31 17:54:04 pfSense kernel: 444.435202 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 86 m 0xfffff802b69bfd00
    May 31 17:54:04 pfSense kernel: 444.457457 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 105 m 0xfffff80102149300
    May 31 17:54:04 pfSense kernel: 444.475727 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff80182dcc700
    May 31 17:54:05 pfSense kernel: 444.915665 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 385 m 0xfffff801024bd700
    May 31 17:54:05 pfSense kernel: 445.012837 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 86 m 0xfffff80182df3300
    May 31 17:54:05 pfSense kernel: 445.103947 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff802b6997800
    May 31 17:54:05 pfSense kernel: 445.340198 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff80182bafc00
    May 31 17:54:05 pfSense kernel: 445.390028 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff802b6e89900
    May 31 17:54:05 pfSense kernel: 445.398337 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 1414 m 0xfffff80107149b00
    May 31 17:54:05 pfSense kernel: 445.398359 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 1414 m 0xfffff802b64a2800
    May 31 17:54:05 pfSense kernel: 445.398376 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 1414 m 0xfffff803df82d300
    May 31 17:54:05 pfSense kernel: 445.398485 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 1414 m 0xfffff8010705e200
    May 31 17:54:05 pfSense kernel: 445.450096 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 86 m 0xfffff80182cac200
    May 31 17:54:05 pfSense kernel: 445.456614 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 1414 m 0xfffff802b676aa00
    May 31 17:54:06 pfSense kernel: 446.106651 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff80182ce4e00
    May 31 17:54:06 pfSense kernel: 446.128464 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff802b69a2900
    May 31 17:54:06 pfSense kernel: 446.167366 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 1414 m 0xfffff801071e2500
    May 31 17:54:06 pfSense kernel: 446.167388 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 1414 m 0xfffff801071e3d00
    May 31 17:54:06 pfSense kernel: 446.167405 [2925] netmap_transmit           cxl1                                                                                                   full hwcur 80 hwtail 17 qlen 62 len 1414 m 0xfffff8021b850200
    
    
    /boot/loader.conf.local
    
    net.inet.tcp.tso=0
    hw.igb.fc_setting=0
    dev.igb.0.fc=0
    dev.igb.1.fc=0
    dev.ix.0.fc=0
    dev.ix.1.fc=0
    dev.cxl.0.fc=0
    dev.cxl.1.fc=0
    
    
    ifconfig cxl1
    cxl1: flags=28943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,PPROMISC> metric 0 mtu 1500
            options=8c00b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,LINKSTATE>
    
     sysctl -a | grep netmap
    device  netmap
    dev.netmap.ixl_rx_miss_bufs: 0
    dev.netmap.ixl_rx_miss: 0
    dev.netmap.iflib_rx_miss_bufs: 0
    dev.netmap.iflib_rx_miss: 0
    dev.netmap.iflib_crcstrip: 1
    dev.netmap.bridge_batch: 1024
    dev.netmap.default_pipes: 0
    dev.netmap.priv_buf_num: 4098
    dev.netmap.priv_buf_size: 2048
    dev.netmap.buf_curr_num: 163840
    dev.netmap.buf_num: 163840
    dev.netmap.buf_curr_size: 2048
    dev.netmap.buf_size: 2048
    dev.netmap.priv_ring_num: 4
    dev.netmap.priv_ring_size: 20480
    dev.netmap.ring_curr_num: 200
    dev.netmap.ring_num: 200
    dev.netmap.ring_curr_size: 36864
    dev.netmap.ring_size: 36864
    dev.netmap.priv_if_num: 1
    dev.netmap.priv_if_size: 1024
    dev.netmap.if_curr_num: 100
    dev.netmap.if_num: 100
    dev.netmap.if_curr_size: 1024
    dev.netmap.if_size: 1024
    dev.netmap.generic_rings: 1
    dev.netmap.generic_ringsize: 1024
    dev.netmap.generic_mit: 100000
    dev.netmap.admode: 0
    dev.netmap.fwd: 0
    dev.netmap.flags: 0
    dev.netmap.adaptive_io: 0
    dev.netmap.txsync_retry: 2
    dev.netmap.no_pendintr: 1
    dev.netmap.mitigate: 1
    dev.netmap.no_timestamp: 0
    dev.netmap.verbose: 0
    dev.netmap.ix_rx_miss_bufs: 0
    dev.netmap.ix_rx_miss: 0
    dev.netmap.ix_crcstrip: 0
    
    
    sysctl -a | grep msi
    hw.ixl.enable_msix: 1
    hw.sdhci.enable_msi: 1
    hw.puc.msi_disable: 0
    hw.pci.honor_msi_blacklist: 1
    hw.pci.msix_rewrite_table: 0
    hw.pci.enable_msix: 1
    hw.pci.enable_msi: 1
    hw.mfi.msi: 1
    hw.malo.pci.msi_disable: 0
    hw.ix.enable_msix: 1
    hw.igb.enable_msix: 1
    hw.em.enable_msix: 1
    hw.cxgb.msi_allowed: 2
    hw.bce.msi_enable: 1
    hw.aac.enable_msi: 1
    machdep.disable_msix_migration: 0
    
    sysctl -a | grep rss
    device  wlan_rssadapt
    hw.bxe.udp_rss: 0
    hw.ix.enable_rss: 1
    dev.cxl.1.rss_size: 16
    dev.cxl.0.rss_size: 16
    
    
    cat /var/log/system.log | grep sig
    May 31 17:57:07 pfSense syslogd: Logging subprocess 8418 (exec /usr/local/sbin/sshguard) exited due to signal 15.
    May 31 17:57:25 pfSense syslogd: exiting on signal 15
    
    
    cat /var/log/suricata/suricata_*/suricata.log | grep -m 1 "signatures processed"
    28/5/2019 -- 23:56:33 - <Info> -- 12117 signatures processed. 409 are IP-only rules, 4557 are inspecting packet payload, 7710 inspect application layer, 102 are decoder event only
    
    


  • @rznetg said in Random Crash HWCUR:

    VLAN_HWCSUM,VLAN_HWTSO

    The first thing that comes to mind is that these are still enabled on your interface.

    The guide at https://forum.netgate.com/topic/138613/configuring-pfsense-netmap-for-suricata-inline-ips-mode-on-em-igb-interfaces
    suggests disabling those options. It is written for em/igb, but it may apply to your NIC as well.
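
As a rough illustration of the check described above, the sketch below scans an `options=` line copied from `ifconfig` output and lists the flags that look like checksum, TSO, or LRO offloads. The function name `list_offloads` is hypothetical, and the name matching is only a heuristic. The `ifconfig` flags in the trailing comment exist in FreeBSD's ifconfig(8), but whether the cxl driver honors them under netmap is not confirmed anywhere in this thread.

```shell
#!/bin/sh
# list_offloads (hypothetical helper): print the offload-related flags
# found in an ifconfig "options=...<A,B,C>" line, separated by spaces.
list_offloads() {
    flags=${1#*<}       # drop everything up to and including '<'
    flags=${flags%>*}   # drop the trailing '>'
    found=""
    old_ifs=$IFS
    IFS=,
    for f in $flags; do
        case $f in
            # Heuristic: checksum, TSO and LRO offloads by name.
            *CSUM*|*TSO*|LRO) found="${found:+$found }$f" ;;
        esac
    done
    IFS=$old_ifs
    printf '%s\n' "$found"
}

# The options line posted for cxl1:
list_offloads 'options=8c00b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,LINKSTATE>'
# prints: VLAN_HWCSUM VLAN_HWTSO

# To try turning them off at runtime (not persistent across reboots):
#   ifconfig cxl1 -vlanhwtso -vlanhwcsum
```

An empty result would mean none of the listed flag names remain in the options line.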



  • It would be helpful to see the entire suricata.log file for the interface after the crash but before you restart Suricata. When you restart Suricata, the existing suricata.log file is truncated to zero length and a fresh log is started, so being able to see what's in the log from the crashed process could be helpful. Next time you notice Suricata not running, grab the contents of the suricata.log file for the interface (on the LOGS VIEW tab) before you restart Suricata. The posted snippet with just the signatures summary does not give me much useful information for figuring out what might cause a crash.

    I did not see any indication of a Suricata crash in the system log snippet you posted. I'm not saying it did not crash, but there is no trace in the log snippet posted. pfSense uses a circular log format whereby older entries get overwritten by newer ones. Are you sure that the log info you posted coincided with the time when Suricata crashed? Maybe try increasing the number of displayed system log entries in pfSense.

    Netmap implementation in the Suricata 4.x tree is still based on mostly original code that used a different method of opening a netmap pipe. The 5.x Suricata version that is now in BETA has a completely rewritten netmap section that leverages the newer netmap user library function calls to open netmap connections. Also, netmap has matured somewhat in FreeBSD 12. So taken together I'm hoping for a better netmap experience for Suricata users with supported NIC hardware once pfSense-2.5 and Suricata 5.x both go to general release.
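
Because the log is truncated on restart, one way to follow the advice above is to copy it aside first. The following is a minimal sketch, not part of the Suricata package: `save_log` and the `/root/suricata-logs` destination are hypothetical names, and the interface directory under `/var/log/suricata/` is left as the wildcard used in the original post.

```shell
#!/bin/sh
# save_log (hypothetical helper): copy a log file into dest_dir with a
# timestamp suffix and print the path of the copy.
# Returns non-zero if the source file does not exist or the copy fails.
save_log() {
    src=$1
    dest_dir=$2
    [ -f "$src" ] || return 1
    mkdir -p "$dest_dir" || return 1
    dest="$dest_dir/$(basename "$src").$(date +%Y%m%d-%H%M%S)"
    cp -p "$src" "$dest" && printf '%s\n' "$dest"
}

# Hypothetical usage, run before restarting Suricata:
#   for f in /var/log/suricata/suricata_*/suricata.log; do
#       save_log "$f" /root/suricata-logs
#   done
```

With more than one Suricata interface, the copies would share a base name, so a per-interface destination directory would be needed to avoid overwriting.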



  • @bmeeks Thanks for the reply. The system locks up completely, even at the console, once netmap errors begin filling the screen, which forces me to reboot the device manually. I will try to recover the suricata.log the next time it crashes, or upgrade to 2.5 and see if it runs better.



  • @rznetg said in Random Crash HWCUR:

    @bmeeks Thanks for the reply. The system locks up completely, even at the console, once netmap errors begin filling the screen, which forces me to reboot the device manually. I will try to recover the suricata.log the next time it crashes, or upgrade to 2.5 and see if it runs better.

    If it locks up the box, then recovering the log file would be pretty much impossible as pfSense will auto-start the packages when it reboots and there is code in the Suricata startup script that wipes the existing log file. I added this a few versions back because the log file just grew and grew with each Suricata restart. That eventually led to disk space issues and to problems even viewing the log in the GUI as it became too large to load into memory.

    Remember that pfSense-2.5 is still in development, and while it is pretty stable, it might develop an issue as changes are added and tested between now and its official release. If you are running just a home firewall, it would be much less risky to run 2.5 as opposed to a business setup where I would stick with the 2.4 release software.

    It really does sound like your problem is with netmap itself. That lock-up symptom is one I've read about during my netmap research on Google. There are some NIC driver fixes and even some kernel fixes for netmap in FreeBSD 12.0, and those will be in pfSense-2.5. But there may also be some issues with netmap within the Suricata binary itself, and those are fixed only in the 5.0-BETA version that's out now for testing. So updating your system to only one of these changes (say pfSense-2.5) may not solve the problem. It might take both updates (Suricata 5.0 and pfSense-2.5).

    My suggestion would be to just switch over to Legacy Mode blocking for now and wait for Suricata 5.0 and pfSense 2.5 to both get to RELEASE state. I know Suricata will make that state this year, and I assume the same for pfSense, but I have no inside knowledge about either one.
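
For readers trying to interpret the `netmap_transmit ... full hwcur 80 hwtail 17 qlen 62` lines in the first post, here is a small sketch of the arithmetic that appears to be involved. It rests on two assumptions not stated in the thread: that the kernel's fullness test is roughly "slots between hwcur and hwtail, plus queued mbufs, reaches ring size minus one" (as recalled from FreeBSD's netmap source), and that the ring has 1024 slots. The function names are hypothetical.

```shell
#!/bin/sh
# ring_busy (hypothetical): slots counted from hwcur up to hwtail,
# wrapping around the ring. Args: hwcur hwtail num_slots
ring_busy() {
    busy=$(( $2 - $1 ))
    if [ "$busy" -lt 0 ]; then
        busy=$(( busy + $3 ))
    fi
    printf '%s\n' "$busy"
}

# ring_is_full (hypothetical): exit status 0 if busy + qlen reaches
# num_slots - 1. Args: hwcur hwtail qlen num_slots
ring_is_full() {
    b=$(ring_busy "$1" "$2" "$4")
    [ $(( b + $3 )) -ge $(( $4 - 1 )) ]
}

# Values from the posted log, with an ASSUMED ring size of 1024 slots:
ring_busy 80 17 1024                      # prints: 961
ring_is_full 80 17 62 1024 && echo full   # prints: full
```

Under those assumptions 961 + 62 = 1023, which is exactly the threshold for a 1024-slot ring. The unchanging hwcur and hwtail values across every log line would then suggest that the ring was no longer being drained at all, which fits the lock-up described.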

