Suricata & DNS resolution failing after the update bogons cron job runs
-
Hi community, I'm looking for some advice to further troubleshoot what's happening to my setup.
Build:
2.3.4-RELEASE-p1 (amd64)
Basic config:
WAN (PPPoE) & LAN (DHCP)
DNS Resolver (bind) configured with my ISP's and Google's DNS servers in the general setup, with DNS query forwarding enabled
Packages:
Suricata 4.0.0_1
pfBlockerNG 2.1.1_10
Symptoms: my internet randomly goes down because DNS is unable to resolve. It started when I updated to the latest Suricata and pfSense builds. After several days of monitoring I've managed to work out what triggers the event: the scheduled task that updates bogons, currently set to run once per day. If I manually run
/usr/bin/nice -n20 /etc/rc.update_bogons.sh
it triggers the outage and nothing can be resolved. If I stop Suricata on the WAN interface, it all comes back up again. Unbound appears to be okay during this event, so something is happening with Suricata on the WAN interface.
• When this happens, nothing appears in the Alerts or Blocks tabs of Suricata
• It still happens if I disable blocking in Suricata
• I've cleared any suppress lists
• I’ve deleted the Suricata WAN interface & recreated it
The system logs show some interesting things when I run the cron job to update bogons; see below.
After the cron job is run with Suricata active on the WAN interface:
Sep 9 09:51:07 kernel altq: packet for pppoe0 does not have pkthdr
Sep 9 09:51:07 kernel altq: packet for pppoe0 does not have pkthdr
Sep 9 09:51:07 kernel altq: packet for pppoe0 does not have pkthdr
Sep 9 09:51:07 kernel altq: packet for pppoe0 does not have pkthdr
Sep 9 09:51:07 kernel altq: packet for pppoe0 does not have pkthdr
Sep 9 09:49:49 kernel 589.698009 [ 274] generic_find_num_queues called, in txq 0 rxq 0
Sep 9 09:49:49 kernel 589.689966 [ 266] generic_find_num_desc called, in tx 1024 rx 1024
Sep 9 09:49:49 kernel 589.682638 [ 799] generic_netmap_dtor Restored native NA 0
Sep 9 09:49:49 kernel 589.675318 [ 274] generic_find_num_queues called, in txq 0 rxq 0
Sep 9 09:49:49 kernel 589.667326 [ 266] generic_find_num_desc called, in tx 1024 rx 1024
Sep 9 09:49:49 kernel 589.649068 [ 799] generic_netmap_dtor Restored native NA 0
Sep 9 09:49:49 kernel 589.641543 [ 274] generic_find_num_queues called, in txq 0 rxq 0
Sep 9 09:49:49 kernel 589.633620 [ 266] generic_find_num_desc called, in tx 1024 rx 1024
Sep 9 09:48:58 root rc.update_bogons.sh is sleeping for 60318
Sep 9 09:48:58 root rc.update_bogons.sh is starting up.
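For anyone wanting to compare runs, a pasted system log like the one above can be tallied with a few lines of Python. This is only a rough sketch: the function names and the category labels ("altq", "netmap", "bogons", "other") are my own invention, and it assumes every entry begins with a "Mon D HH:MM:SS" timestamp as in the excerpt.

```python
import re
from collections import Counter

# Zero-width match at the start of each entry, e.g. "Sep 9 09:51:07 ".
ENTRY_START = re.compile(r"(?=\b[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} )")


def split_entries(raw):
    """Split pasted syslog text into one string per entry."""
    return [part.strip() for part in ENTRY_START.split(raw) if part.strip()]


def classify(entry):
    """Label an entry by a keyword it contains (labels are illustrative)."""
    lowered = entry.lower()
    if "altq" in lowered:
        return "altq"
    if "netmap" in lowered:
        return "netmap"
    if "rc.update_bogons.sh" in lowered:
        return "bogons"
    return "other"


def summarize(raw):
    """Count entries per label for a quick before/after comparison."""
    return Counter(classify(entry) for entry in split_entries(raw))
```

Running summarize() on the log captured with Suricata active and again on one captured with it stopped gives a quick way to see whether the altq and netmap messages only show up in the first case.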
If I stop Suricata, it all comes back online:
Sep 9 09:54:08 php-fpm 64950 /suricata/suricata_interfaces_edit.php: End of portal.pfsense.org configuration backup (success).
Sep 9 09:54:02 php-fpm 64950 /suricata/suricata_interfaces_edit.php: Beginning https://portal.pfsense.org configuration backup.
Sep 9 09:54:02 check_reload_status Syncing firewall
Sep 9 09:54:01 kernel 841.013438 [ 799] generic_netmap_dtor Restored native NA 0
Sep 9 09:54:00 php-fpm 64950 /suricata/suricata_interfaces_edit.php: [Suricata] Suricata STOP for WAN(pppoe0)...
Sep 9 09:53:57 kernel altq: packet for pppoe0 does not have pkthdr
I'm a bit lost as to what is happening. I currently have bind set up with quite a high level of logging, and when it goes down I can see it receiving the DNS queries but no responses.
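To pin down exactly when resolution stops and resumes relative to the cron job, a small probe run from a LAN client could timestamp each lookup. This is a minimal sketch, not a tested diagnostic: the function names are hypothetical, it relies on the client's normal system resolver via socket.getaddrinfo, and any caching on the client may hide a short outage.

```python
import socket
import time
from datetime import datetime


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if the hostname resolves, False on a resolver error."""
    try:
        resolver(hostname, None)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def watch(hostname, checks, interval_s=5.0,
          resolver=socket.getaddrinfo, sleep=time.sleep):
    """Probe `checks` times and return a list of (timestamp, ok) pairs."""
    results = []
    for i in range(checks):
        ok = can_resolve(hostname, resolver)
        stamp = datetime.now().isoformat(timespec="seconds")
        results.append((stamp, ok))
        if i < checks - 1:
            sleep(interval_s)  # wait between probes, not after the last one
    return results
```

Starting something like watch("example.com", checks=60) just before running the bogons update by hand, then printing the pairs, would give timestamps to line up against the system log entries.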
-
What blocking mode are you using with Suricata when you enable blocking? Is it Inline IPS Mode perhaps? If so, my first theory is you have a compatibility issue with your NIC driver and Netmap. Netmap is extraordinarily picky about having perfect support from the NIC hardware driver. The fact your error messages from the kernel mention "netmap" is the clue I'm working from.
If your NIC driver and Netmap do not play well together (and more NICs don't play well than do play well), then weird stuff begins to happen all the way up to a kernel crash.
So if I'm right and you are attempting to use Inline IPS Mode, switch over to Legacy Mode blocking instead and let it run for a while to see if the problem repeats.
Bill
-
Hi, thanks for the reply and advice. I think you are onto something: yes, I'm using 'Inline IPS Mode'. I've experimented with Promiscuous Mode on and off, with little to no effect. My hardware is a PC Engines APU2 with Intel i210AT NICs. I do recall making some changes under System/Advanced/Networking back when I had Snort rather than Suricata, and I have noticed that the following things are disabled:
• Hardware Checksum Offloading: Disabled
• Hardware TCP Segmentation Offloading: Disabled
• Hardware Large Receive Offloading: Disabled
I'll experiment with Legacy Mode and see how it does. I was hoping to use Inline IPS Mode if possible, but perhaps it's not going to play nice with my hardware config.
-