Random Crash HWCUR
-
I'm experiencing random crashes on my Supermicro D-1541 with a Chelsio T520 after I turned on Suricata Inline IPS. The hardware is identical to the XG-1541. Suricata is running on the LAN interface. I have both offloading and flow control disabled, but the crashes occur at random; sometimes it can go on for a few days before it crashes. Any suggestions on where I should check?
pfSense version: 2.4.4-RELEASE-p3 (amd64)
cat /var/log/system.log | grep netmap
May 31 17:54:01 pfSense kernel: 441.657927 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff802b6587500
May 31 17:54:02 pfSense kernel: 442.278581 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 86 m 0xfffff80182ebf000
May 31 17:54:02 pfSense kernel: 442.278649 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 86 m 0xfffff80182b3cb00
May 31 17:54:02 pfSense kernel: 442.473351 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff803df986600
May 31 17:54:02 pfSense kernel: 442.532143 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 385 m 0xfffff802b64a2b00
May 31 17:54:02 pfSense kernel: 442.533664 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 398 m 0xfffff802b6c88400
May 31 17:54:02 pfSense kernel: 442.561443 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 385 m 0xfffff8021b1c2a00
May 31 17:54:02 pfSense kernel: 442.562488 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 398 m 0xfffff80076f92a00
May 31 17:54:02 pfSense kernel: 442.697901 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff803df986600
May 31 17:54:02 pfSense kernel: 442.767366 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 398 m 0xfffff8021b7a9d00
May 31 17:54:03 pfSense kernel: 442.823697 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff802b6919100
May 31 17:54:03 pfSense kernel: 443.006911 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 86 m 0xfffff80182efa600
May 31 17:54:03 pfSense kernel: 443.091577 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 385 m 0xfffff802b6c66300
May 31 17:54:03 pfSense kernel: 443.273222 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 86 m 0xfffff80182d14800
May 31 17:54:03 pfSense kernel: 443.314848 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 86 m 0xfffff803dfb2e500
May 31 17:54:03 pfSense kernel: 443.340055 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff803dfa09100
May 31 17:54:03 pfSense kernel: 443.340112 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 86 m 0xfffff80182bc8800
May 31 17:54:03 pfSense kernel: 443.389527 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff803dfa09100
May 31 17:54:03 pfSense kernel: 443.425848 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 86 m 0xfffff8037b2dcd00
May 31 17:54:03 pfSense kernel: 443.473259 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff801439fc200
May 31 17:54:03 pfSense kernel: 443.617594 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 342 m 0xfffff80102421100
May 31 17:54:04 pfSense kernel: 444.011783 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 86 m 0xfffff80182b41e00
May 31 17:54:04 pfSense kernel: 444.101411 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff802b6588e00
May 31 17:54:04 pfSense kernel: 444.271959 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 86 m 0xfffff80182cb8500
May 31 17:54:04 pfSense kernel: 444.339886 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff80182aff200
May 31 17:54:04 pfSense kernel: 444.339990 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 86 m 0xfffff80182b8c200
May 31 17:54:04 pfSense kernel: 444.389273 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff803df986900
May 31 17:54:04 pfSense kernel: 444.435202 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 86 m 0xfffff802b69bfd00
May 31 17:54:04 pfSense kernel: 444.457457 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 105 m 0xfffff80102149300
May 31 17:54:04 pfSense kernel: 444.475727 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff80182dcc700
May 31 17:54:05 pfSense kernel: 444.915665 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 385 m 0xfffff801024bd700
May 31 17:54:05 pfSense kernel: 445.012837 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 86 m 0xfffff80182df3300
May 31 17:54:05 pfSense kernel: 445.103947 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff802b6997800
May 31 17:54:05 pfSense kernel: 445.340198 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff80182bafc00
May 31 17:54:05 pfSense kernel: 445.390028 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff802b6e89900
May 31 17:54:05 pfSense kernel: 445.398337 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 1414 m 0xfffff80107149b00
May 31 17:54:05 pfSense kernel: 445.398359 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 1414 m 0xfffff802b64a2800
May 31 17:54:05 pfSense kernel: 445.398376 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 1414 m 0xfffff803df82d300
May 31 17:54:05 pfSense kernel: 445.398485 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 1414 m 0xfffff8010705e200
May 31 17:54:05 pfSense kernel: 445.450096 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 86 m 0xfffff80182cac200
May 31 17:54:05 pfSense kernel: 445.456614 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 1414 m 0xfffff802b676aa00
May 31 17:54:06 pfSense kernel: 446.106651 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff80182ce4e00
May 31 17:54:06 pfSense kernel: 446.128464 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 42 m 0xfffff802b69a2900
May 31 17:54:06 pfSense kernel: 446.167366 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 1414 m 0xfffff801071e2500
May 31 17:54:06 pfSense kernel: 446.167388 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 1414 m 0xfffff801071e3d00
May 31 17:54:06 pfSense kernel: 446.167405 [2925] netmap_transmit cxl1 full hwcur 80 hwtail 17 qlen 62 len 1414 m 0xfffff8021b850200
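
A dump like the one above is easier to read once it is condensed. The following is only a rough sketch, not part of pfSense or netmap: `summarize_netmap_full` is a hypothetical helper name, and it assumes the message layout shown in this log (`netmap_transmit <ifname> full hwcur N hwtail N ...`). It counts the messages per interface and how many distinct hwcur/hwtail pairs appear.

```shell
#!/bin/sh
# Hypothetical helper: summarize "netmap_transmit <if> full ..." kernel
# messages read from stdin. Assumes the field layout seen in this thread.
summarize_netmap_full() {
    awk '
    /netmap_transmit/ && / full / {
        for (i = 1; i < NF; i++) {
            if ($i == "netmap_transmit") ifname = $(i + 1)
            if ($i == "hwcur")           cur    = $(i + 1)
            if ($i == "hwtail")          tail   = $(i + 1)
        }
        count[ifname]++
        key = ifname SUBSEP cur SUBSEP tail
        # Count each hwcur/hwtail pair only once per interface
        if (!(key in seen)) { seen[key] = 1; pairs[ifname]++ }
    }
    END {
        for (n in count)
            printf "%s messages=%d distinct_hwcur_hwtail=%d\n", n, count[n], pairs[n]
    }'
}

# Usage, reading the log the same way as in this post:
#   grep netmap /var/log/system.log | summarize_netmap_full
```

If the number of distinct pairs stays at 1 while the message count keeps climbing, the logged hwcur and hwtail values did not change during that window, which is what every line in this dump shows (hwcur 80, hwtail 17).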
/boot/loader.conf.local
net.inet.tcp.tso=0
hw.igb.fc_setting=0
dev.igb.0.fc=0
dev.igb.1.fc=0
dev.ix.0.fc=0
dev.ix.1.fc=0
dev.cxl.0.fc=0
dev.cxl.1.fc=0

ifconfig cxl1
cxl1: flags=28943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,PPROMISC> metric 0 mtu 1500
	options=8c00b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,LINKSTATE>

sysctl -a | grep netmap
device netmap
dev.netmap.ixl_rx_miss_bufs: 0
dev.netmap.ixl_rx_miss: 0
dev.netmap.iflib_rx_miss_bufs: 0
dev.netmap.iflib_rx_miss: 0
dev.netmap.iflib_crcstrip: 1
dev.netmap.bridge_batch: 1024
dev.netmap.default_pipes: 0
dev.netmap.priv_buf_num: 4098
dev.netmap.priv_buf_size: 2048
dev.netmap.buf_curr_num: 163840
dev.netmap.buf_num: 163840
dev.netmap.buf_curr_size: 2048
dev.netmap.buf_size: 2048
dev.netmap.priv_ring_num: 4
dev.netmap.priv_ring_size: 20480
dev.netmap.ring_curr_num: 200
dev.netmap.ring_num: 200
dev.netmap.ring_curr_size: 36864
dev.netmap.ring_size: 36864
dev.netmap.priv_if_num: 1
dev.netmap.priv_if_size: 1024
dev.netmap.if_curr_num: 100
dev.netmap.if_num: 100
dev.netmap.if_curr_size: 1024
dev.netmap.if_size: 1024
dev.netmap.generic_rings: 1
dev.netmap.generic_ringsize: 1024
dev.netmap.generic_mit: 100000
dev.netmap.admode: 0
dev.netmap.fwd: 0
dev.netmap.flags: 0
dev.netmap.adaptive_io: 0
dev.netmap.txsync_retry: 2
dev.netmap.no_pendintr: 1
dev.netmap.mitigate: 1
dev.netmap.no_timestamp: 0
dev.netmap.verbose: 0
dev.netmap.ix_rx_miss_bufs: 0
dev.netmap.ix_rx_miss: 0
dev.netmap.ix_crcstrip: 0

sysctl -a | grep msi
hw.ixl.enable_msix: 1
hw.sdhci.enable_msi: 1
hw.puc.msi_disable: 0
hw.pci.honor_msi_blacklist: 1
hw.pci.msix_rewrite_table: 0
hw.pci.enable_msix: 1
hw.pci.enable_msi: 1
hw.mfi.msi: 1
hw.malo.pci.msi_disable: 0
hw.ix.enable_msix: 1
hw.igb.enable_msix: 1
hw.em.enable_msix: 1
hw.cxgb.msi_allowed: 2
hw.bce.msi_enable: 1
hw.aac.enable_msi: 1
machdep.disable_msix_migration: 0

sysctl -a | grep rss
device wlan_rssadapt
hw.bxe.udp_rss: 0
hw.ix.enable_rss: 1
dev.cxl.1.rss_size: 16
dev.cxl.0.rss_size: 16

cat /var/log/system.log | grep sig
May 31 17:57:07 pfSense syslogd: Logging subprocess 8418 (exec /usr/local/sbin/sshguard) exited due to signal 15.
May 31 17:57:25 pfSense syslogd: exiting on signal 15

cat /var/log/suricata/suricata_*/suricata.log | grep -m 1 "signatures processed"
28/5/2019 -- 23:56:33 - <Info> -- 12117 signatures processed. 409 are IP-only rules, 4557 are inspecting packet payload, 7710 inspect application layer, 102 are decoder event only
-
@rznetg said in Random Crash HWCUR:
VLAN_HWCSUM,VLAN_HWTSO
The first thing that comes to mind is that these are still enabled on your interface.
Here, https://forum.netgate.com/topic/138613/configuring-pfsense-netmap-for-suricata-inline-ips-mode-on-em-igb-interfaces, they suggest disabling those options. It's written for em/igb, but it may apply to you as well.
-
It would be helpful to see the entire suricata.log file for the interface after the crash but before you restart Suricata. When you restart Suricata, the existing suricata.log file is truncated to zero-length and a fresh log is started. So being able to see what's in the log from the crashed process will potentially be helpful. Next time you notice Suricata not running, grab the contents of the suricata.log file for the interface (on the LOGS VIEW tab) before you restart Suricata. The posted snippet with just the signatures summary is not giving me much useful information for figuring what might cause a crash.
I did not see any indication of a Suricata crash in the system log snippet you posted. I'm not saying it did not crash, but there is no trace in the log snippet posted. pfSense uses a circular log format whereby older entries get overwritten by newer ones. Are you sure that the log info you posted coincided with the time when Suricata crashed? Maybe try increasing the number of displayed system log entries in pfSense.
The netmap implementation in the Suricata 4.x tree is still based mostly on the original code, which used a different method of opening a netmap pipe. The 5.x Suricata version that is now in BETA has a completely rewritten netmap section that leverages the newer netmap user library function calls to open netmap connections. Also, netmap has matured somewhat in FreeBSD 12. So taken together, I'm hoping for a better netmap experience for Suricata users with supported NIC hardware once pfSense-2.5 and Suricata 5.x both go to general release.
-
@bmeeks Thanks for the reply. The system would lock up completely, even at the console, when the netmap errors began filling up the screen, which led me to reboot the device manually. I will try to recover the suricata.log next time it crashes, or upgrade to 2.5 and see if it runs better.
-
@rznetg said in Random Crash HWCUR:
@bmeeks Thanks for the reply. The system would lock up completely, even at the console, when the netmap errors began filling up the screen, which led me to reboot the device manually. I will try to recover the suricata.log next time it crashes, or upgrade to 2.5 and see if it runs better.
If it locks up the box, then recovering the log file would be pretty much impossible as pfSense will auto-start the packages when it reboots and there is code in the Suricata startup script that wipes the existing log file. I added this a few versions back because the log file just grew and grew with each Suricata restart. That eventually led to disk space issues and to problems even viewing the log in the GUI as it became too large to load into memory.
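
Since the startup script wipes the log on reboot, one conceivable workaround is to copy the log somewhere else at intervals while Suricata is still running. This is only a sketch under assumptions, not something the Suricata package provides: `snapshot_log` is a hypothetical helper, and the destination directory in the usage comment is made up for illustration. The source path pattern is the one used earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: copy a log file to a timestamped file in a backup
# directory so a copy survives the log being truncated on restart.
# Usage: snapshot_log SRC DEST_DIR
snapshot_log() {
    src=$1
    dest_dir=$2
    [ -s "$src" ] || return 0            # nothing to save if missing or empty
    mkdir -p "$dest_dir" || return 1
    stamp=$(date +%Y%m%d-%H%M%S)
    # -p keeps the original modification time on the copy
    cp -p "$src" "$dest_dir/$(basename "$src").$stamp"
}

# Example run from a periodic job (destination path is an assumption):
# for f in /var/log/suricata/suricata_*/suricata.log; do
#     snapshot_log "$f" "/root/suricata-log-snapshots/$(basename "$(dirname "$f")")"
# done
```

Snapshots accumulate with every run, so old copies would need to be pruned by hand or by a separate cleanup step to avoid the same disk space problem.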
Remember that pfSense-2.5 is still in development, and while it is pretty stable, it might develop an issue as changes are added and tested between now and its official release. If you are running just a home firewall, it would be much less risky to run 2.5 as opposed to a business setup where I would stick with the 2.4 release software.
It really does sound like your problem is with netmap itself. That lock-up symptom is one I've read about during my netmap research on Google. There are some NIC driver fixes and even some kernel fixes for netmap in FreeBSD 12.0, and those will be in pfSense-2.5. But there may also be some issues with netmap within the Suricata binary itself, and those are fixed only in the 5.0-BETA version that's out now for testing. So updating your system to only one of these changes (say pfSense-2.5) may not solve the problem. It might take both updates (Suricata 5.0 and pfSense-2.5).
My suggestion would be to just switch over to Legacy Mode blocking for now and wait for Suricata 5.0 and pfSense 2.5 to both get to RELEASE state. I know Suricata will make that state this year, and I assume the same for pfSense, but I have no inside knowledge about either one.
-
Just to revive an old topic...
I am seeing the same thing with the igb driver (E1000 NICs).
Just not in abundance. The netmap errors go away if you manually select autoselect, or any setting other than the default, on the interface page.
Start with the interface showing the netmap errors. Choose autoselect and the errors go away instantly.