Suricata breaks DNS resolution after the update bogons cron job runs



  • Hi community, I'm looking for some advice on how to further troubleshoot what’s happening with my setup.

    Build:
    2.3.4-RELEASE-p1 (amd64)

    Basic config:
    WAN (PPPOE) & LAN (DHCP)

    DNS Resolver (Unbound) configured with my ISP’s & Google’s DNS servers in General Setup, with DNS Query Forwarding enabled

    Packages:

    Suricata 4.0.0_1

    pfBlockerNG 2.1.1_10

    Symptoms: my internet randomly goes down because DNS stops resolving. It started after I updated to the latest Suricata and pfSense builds. After several days of monitoring I’ve managed to work out what triggers the event: it’s when the scheduled task to update bogons runs (currently set to once per day). If I manually run

    /usr/bin/nice -n20 /etc/rc.update_bogons.sh
    

    it triggers the outage and nothing can be resolved. If I stop Suricata on the WAN interface, it all comes back up again. Unbound appears to be okay during this event, so something is happening with Suricata on the WAN interface.

    • When this happens, nothing appears in the Alerts or Blocks tabs of Suricata

    • It still happens if I disable blocking in Suricata

    • I’ve cleared any suppress lists

    • I’ve deleted the Suricata WAN interface & recreated it
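The manual reproduction described above can be scripted so the moment DNS stops answering gets a timestamp. This is a minimal sketch, assuming a Python 3 interpreter is available on the firewall (stock pfSense may not ship one, so treat it as illustrative); `probe_dns` and `watch` are hypothetical helper names and the probe hostnames are arbitrary. Because the log below shows the script can first log "is sleeping for" a random interval, the sketch starts it in the background with `subprocess.Popen` instead of waiting for it to finish.

```python
# Hypothetical helper: start the bogons update, then poll DNS to timestamp the outage.
# Assumes Python 3 on the firewall; the probe hostnames are arbitrary examples.
import socket
import subprocess
import time
from typing import Callable, Dict, Iterable, Optional


def probe_dns(names: Iterable[str],
              resolve: Callable[[str], str] = socket.gethostbyname) -> Dict[str, bool]:
    """Map each hostname to True if it resolves, False if the lookup fails."""
    results = {}
    for name in names:
        try:
            resolve(name)
            results[name] = True
        except OSError:  # socket.gaierror and socket.herror are OSError subclasses
            results[name] = False
    return results


def watch(names, rounds: int = 30, interval: float = 10.0,
          resolve=socket.gethostbyname, sleep=time.sleep) -> Optional[int]:
    """Poll up to `rounds` times; return the first round in which every lookup failed."""
    for i in range(rounds):
        if not any(probe_dns(names, resolve).values()):
            return i
        sleep(interval)
    return None


if __name__ == "__main__":
    # Same command as in the post, started in the background because the
    # script may sleep for a random interval before doing any work.
    subprocess.Popen(["/usr/bin/nice", "-n20", "/etc/rc.update_bogons.sh"])
    failed = watch(["www.pfsense.org", "www.google.com"])
    if failed is None:
        print("DNS kept resolving for the whole watch window")
    else:
        print(time.strftime("%H:%M:%S"), "all lookups failing, round", failed)
```

Injecting `resolve` and `sleep` keeps the probing logic testable without touching the network or the real script.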

    The system logs show some interesting things when I run the cron job to update bogons; see below.

    After the cron job runs with Suricata active on the WAN interface:

    
    Sep 9 09:51:07	kernel		altq: packet for pppoe0 does not have pkthdr
    Sep 9 09:51:07	kernel		altq: packet for pppoe0 does not have pkthdr
    Sep 9 09:51:07	kernel		altq: packet for pppoe0 does not have pkthdr
    Sep 9 09:51:07	kernel		altq: packet for pppoe0 does not have pkthdr
    Sep 9 09:51:07	kernel		altq: packet for pppoe0 does not have pkthdr
    Sep 9 09:49:49	kernel		589.698009 [ 274] generic_find_num_queues called, in txq 0 rxq 0
    Sep 9 09:49:49	kernel		589.689966 [ 266] generic_find_num_desc called, in tx 1024 rx 1024
    Sep 9 09:49:49	kernel		589.682638 [ 799] generic_netmap_dtor Restored native NA 0
    Sep 9 09:49:49	kernel		589.675318 [ 274] generic_find_num_queues called, in txq 0 rxq 0
    Sep 9 09:49:49	kernel		589.667326 [ 266] generic_find_num_desc called, in tx 1024 rx 1024
    Sep 9 09:49:49	kernel		589.649068 [ 799] generic_netmap_dtor Restored native NA 0
    Sep 9 09:49:49	kernel		589.641543 [ 274] generic_find_num_queues called, in txq 0 rxq 0
    Sep 9 09:49:49	kernel		589.633620 [ 266] generic_find_num_desc called, in tx 1024 rx 1024
    Sep 9 09:48:58	root		rc.update_bogons.sh is sleeping for 60318
    Sep 9 09:48:58	root		rc.update_bogons.sh is starting up.
    

    If I stop Suricata, it all comes back online:

    
    Sep 9 09:54:08	php-fpm	64950	/suricata/suricata_interfaces_edit.php: End of portal.pfsense.org configuration backup (success).
    Sep 9 09:54:02	php-fpm	64950	/suricata/suricata_interfaces_edit.php: Beginning https://portal.pfsense.org configuration backup.
    Sep 9 09:54:02	check_reload_status		Syncing firewall
    Sep 9 09:54:01	kernel		841.013438 [ 799] generic_netmap_dtor Restored native NA 0
    Sep 9 09:54:00	php-fpm	64950	/suricata/suricata_interfaces_edit.php: [Suricata] Suricata STOP for WAN(pppoe0)...
    Sep 9 09:53:57	kernel		altq: packet for pppoe0 does not have pkthdr
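Both excerpts are listed newest first, so it can help to line the events up chronologically. A minimal Python sketch, assuming the tab-separated "date, process, PID, message" layout shown above and a single-day excerpt; `classify` and `timeline` are hypothetical helper names and the labels are made up for this thread. The `generic_*` kernel messages appear to come from netmap's emulated ("generic") adapter code, which is why they share one label.

```python
# Sketch: turn pasted pfSense System log lines into a chronological event list.
# Assumes each line is "Mon D HH:MM:SS<TAB>process<TAB>pid<TAB>message", as in
# the excerpts above, and that all lines come from the same day.
import re
from typing import Iterable, List, Tuple

LINE_RE = re.compile(r"^\w{3}\s+\d{1,2}\s+(\d{2}:\d{2}:\d{2})\t[^\t]*\t[^\t]*\t(.*)$")


def classify(message: str) -> str:
    """Label the messages that matter for this thread; everything else is 'other'."""
    if "rc.update_bogons.sh is starting up" in message:
        return "bogons-start"
    if "Suricata STOP" in message:
        return "suricata-stop"
    if "does not have pkthdr" in message:
        return "altq-pkthdr"
    if "generic_netmap" in message or "generic_find_num" in message:
        return "netmap-generic"
    return "other"


def timeline(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (time, label) pairs, oldest first, skipping unparsed and 'other' lines."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        label = classify(match.group(2))
        if label != "other":
            events.append((match.group(1), label))
    # Zero-padded HH:MM:SS strings sort chronologically within one day.
    return sorted(events)
```

Fed the last four lines of the first excerpt, this should yield the bogons start at 09:48:58, the netmap messages at 09:49:49 and the altq errors at 09:51:07.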
    
    

    I'm a bit lost as to what is happening. I currently have Unbound set up for quite a high level of logging, and when it goes down I can see it receiving the DNS queries but no responses coming back.



  • What blocking mode are you using with Suricata when you enable blocking? Is it Inline IPS Mode perhaps? If so, my first theory is that you have a compatibility issue between your NIC driver and Netmap. Netmap is extraordinarily picky about having perfect support from the NIC hardware driver. The fact that the kernel messages in your log mention "netmap" is the clue I'm working from.

    If your NIC driver and Netmap do not play well together (and more NICs don't play well than do play well), then weird stuff begins to happen all the way up to a kernel crash.

    So if I'm right and you are attempting to use Inline IPS Mode, switch over to Legacy Mode blocking instead and let it run for a while to see if the problem repeats.

    Bill
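Following the netmap theory, one quick check is which adapter mode netmap is allowed to use. A minimal sketch, assuming this build exposes the `dev.netmap.admode` sysctl described in the FreeBSD netmap(4) man page (0 = best available, 1 = native only, 2 = emulated only) and that Python 3 is available; from a shell, `sysctl -n dev.netmap.admode` reads the same value. `describe_admode` and `read_admode` are hypothetical helper names.

```python
# Sketch: report netmap's adapter-mode setting.
# Assumes the dev.netmap.admode sysctl exists on this FreeBSD/pfSense build.
import subprocess

# Meanings as described for dev.netmap.admode in netmap(4).
ADMODE = {
    0: "best effort: native netmap if the driver supports it, otherwise emulated (generic)",
    1: "native only: fail if the driver lacks netmap support",
    2: "emulated (generic) only",
}


def describe_admode(value: int) -> str:
    """Translate the numeric sysctl value into a readable description."""
    return ADMODE.get(value, f"unknown admode value {value}")


def read_admode() -> int:
    """Read the current value via sysctl(8); raises if the sysctl is missing."""
    out = subprocess.run(["sysctl", "-n", "dev.netmap.admode"],
                         capture_output=True, text=True, check=True)
    return int(out.stdout.strip())


if __name__ == "__main__":
    print(describe_admode(read_admode()))
```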



  • @bmeeks:

    Hi, thanks for the reply and advice; I think you are onto something. Yes, I'm using Inline IPS Mode. I've experimented with Promiscuous Mode on/off with little to no effect. My hardware is a PC Engines APU2 with Intel i210AT NICs. I do recall making some changes to System/Advanced/Networking back when I was running Snort instead of Suricata, and I've noticed that the following are disabled:

    • Hardware Checksum Offloading Disabled

    • Hardware TCP Segmentation Offloading Disabled

    • Hardware Large Receive Offloading Disabled

    I'll experiment with Legacy Mode and see how it does. I was hoping to use Inline IPS Mode if possible, but perhaps it's not going to play nice with my hardware config.