SG-3100 on 2.4.4 Rebooting every day at almost same time.



  • An SG-3100 on 2.4.4 started rebooting every day at almost the same time, around 7 in the morning.

    I'm not even sure how to go about diagnosing this; there's nothing in the system log that would indicate an issue.

    Feb 10 07:00:00	php		[pfBlockerNG] Starting cron process.
    Feb 10 07:00:02	kernel		arp: 10.0.185.23 moved from  xx on mvneta1
    Feb 10 07:00:03	kernel		arp: 10.0.185.23 moved from  xx on mvneta1
    Feb 10 07:01:33	kernel		arp: 10.0.185.23 moved from xx on mvneta1
    Feb 10 07:01:54	php		[pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
    Feb 10 07:02:04	kernel		arp: 10.0.185.23 moved from xx on mvneta1
    Feb 10 07:03:04	kernel		arp: 10.0.185.23 moved from xx on mvneta1
    Feb 10 07:24:54	syslogd		kernel boot file is /boot/kernel/kernel
    Feb 10 07:24:54	kernel		Copyright (c) 1992-2018 The FreeBSD Project.
    

    Packages installed on top of vanilla:

    • acme
    • Suricata
    • pfBlockerNG
    • OpenVPN Export

    Temperature
    58°C

    Load average
    0.58, 0.58, 0.48

    CPU usage
    28%

    Memory usage
    26% of 2028 MiB

    Disk usage:
    /
    5% of 28GiB - ufs
    /var/run
    3% of 3.4MiB - ufs in RAM

    Not sure what's next.
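
    One way to start is to pin down the reboot window from the log itself. The sketch below is only an illustration: it assumes the "Mon DD HH:MM:SS" prefix and the "kernel boot file is" boot line seen in the excerpt above, and the function name and the year (syslog lines carry none) are assumptions. It reports the last entry before each boot and the first entry after it.

```python
import re
from datetime import datetime

# Matches the "Mon DD HH:MM:SS" prefix used in the excerpt above.
STAMP = re.compile(r"^\s*([A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s")
# First syslogd line after a boot, as it appears in the excerpt.
BOOT_MARKER = "kernel boot file is"


def find_reboot_windows(lines, year=2019):
    """Return (last_entry_before_boot, first_entry_after_boot) pairs.

    The year is an assumption supplied by the caller, because the
    syslog timestamps do not include one.
    """
    windows = []
    previous = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # skip lines without a recognisable timestamp
        stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
        if BOOT_MARKER in line and previous is not None:
            windows.append((previous, stamp))
        previous = stamp
    return windows


# Three lines copied from the excerpt above (tabs separate the columns).
sample = [
    "Feb 10 07:03:04\tkernel\t\tarp: 10.0.185.23 moved from xx on mvneta1",
    "Feb 10 07:24:54\tsyslogd\t\tkernel boot file is /boot/kernel/kernel",
    "Feb 10 07:24:54\tkernel\t\tCopyright (c) 1992-2018 The FreeBSD Project.",
]

for before, after in find_reboot_windows(sample):
    print(f"last entry {before:%H:%M:%S}, first boot entry {after:%H:%M:%S}, gap {after - before}")
```

    Running the same function over several days of logs would show whether the gap always opens shortly after the 07:00 pfBlockerNG cron run.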


  • LAYER 8 Netgate

    I would disable Suricata and see if the reboots stop. Please let us know if that is the case.



  • @derelict

    Done. Stopped it on both the WAN and DMZ (where it was running). Will report back.

    I had Suricata running in Legacy Mode with blocking, ET's Open and Snort Subscriber Ruleset (on Connectivity).


  • Netgate Administrator

    If you can predict when it happens, you could hook up a serial console and log the output at that point. If it's rebooting because of a kernel panic, you would see something logged there.

    Steve
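
    As a rough sketch of the logging side of that suggestion: the function below copies lines from any file-like stream to a log and prefixes each with a timestamp, so console output can later be lined up against the reboot time. The function name is hypothetical, and the stream is assumed to be whatever object wraps the serial port on the logging host; the device path and port settings are not given in this thread, so the demo uses an in-memory stand-in with placeholder text.

```python
import io
from datetime import datetime


def log_console(stream, sink, clock=datetime.now):
    """Copy console lines from stream to sink, prefixing each with a timestamp.

    Returns the number of lines copied.
    """
    count = 0
    for raw in stream:
        line = raw.rstrip("\r\n")
        sink.write(f"{clock().isoformat(timespec='seconds')} {line}\n")
        sink.flush()  # flush per line so output is not lost if the logger is interrupted
        count += 1
    return count


# Demo: an in-memory stream stands in for the serial device, and a fixed
# clock keeps the output reproducible.
demo_in = io.StringIO("first console line\r\nsecond console line\r\n")
demo_out = io.StringIO()
copied = log_console(demo_in, demo_out, clock=lambda: datetime(2019, 2, 11, 7, 0, 0))
print(demo_out.getvalue(), end="")
```

    A terminal program with session logging would serve the same purpose; the point is only that each console line gets a wall-clock time.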



  • Didn't reboot this morning with Suricata off. Will re-enable and see if the behaviour returns to re-confirm.

    Connecting a serial console is a challenge at the moment as I'm remote, but if the behaviour returns I'll switch back to Snort until there's a Suricata update.


  • Netgate Administrator

    Sounds good. I'll be very interested to see this result. I've been hammering Suricata on a 3100 here and failed to trigger any problems.

    Steve



  • @stephenw10 Out of curiosity: I have a 3100 and run Suricata, but I have problems where Suricata will crash and won't start (stale .pid). I've seen some discussion about it but there doesn't seem to be a resolution to this other than deleting the file and restarting. It crashes whenever I make adjustments to pfBlocker or the OpenVPN server and only if Suricata is in blocking mode. Is there going to be a fix for this? It's rather annoying.


  • Netgate Administrator

    Hmm, I used to hit that a while ago, but I thought it was fixed. I certainly haven't hit it in some time.

    That's definitely a software issue though, so it would be better to ask about it in the IDS/IPS board.

    Steve



  • @bhjitsense said in SG-3100 on 2.4.4 Rebooting every day at almost same time.:

    @stephenw10 Out of curiosity: I have a 3100 and run Suricata, but I have problems where Suricata will crash and won't start (stale .pid). I've seen some discussion about it but there doesn't seem to be a resolution to this other than deleting the file and restarting. It crashes whenever I make adjustments to pfBlocker or the OpenVPN server and only if Suricata is in blocking mode. Is there going to be a fix for this? It's rather annoying.

    Specifically, what kind of pfBlocker or OpenVPN server adjustment are you making? Is there anything about the crash in the pfSense system log? Look for a Signal 10 error or fault in the pfSense system log (not in the Suricata log). Let me know what you find.



  • @bmeeks Yes, they are sig 10 errors. I recreated this one by disabling pfBlocker. I'm currently remote, but I imagine if I restart the OpenVPN server, the same thing would occur.



  • @bhjitsense said in SG-3100 on 2.4.4 Rebooting every day at almost same time.:

    @bmeeks Yes, they are sig 10 errors. I recreated this one by disabling pfBlocker. I'm currently remote, but I imagine if I restart the OpenVPN server, the same thing would occur.

    Do the Signal 10 errors reference the Suricata process or something else? As @stephenw10 mentioned in his post, I did make some changes recently to attempt mitigation of the Suricata Signal 10 errors. Those are actually coming from unaligned memory access operations.

    And to make sure I understand: with Suricata and pfBlocker both running, stopping (disabling) pfBlocker will trigger a Suricata Signal 10 error? And the same thing happens with OpenVPN Server?



  • @bmeeks Yes, the exact error is pid 58394 (suricata), uid 0, exited on signal 10 (core dumped)

    That's correct, disabling pfBlocker (as an example) causes this error. Same with OVPN server.
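
    For anyone checking their own system log for the same thing, here is a minimal sketch that picks out such lines and reports which process died on which signal. The pattern is modelled only on the line quoted above; accepting either a comma or a colon before "exited" is an assumption, as are the function and variable names. The small name table uses FreeBSD signal numbering, where signal 10 is SIGBUS.

```python
import re

# Modelled on the line quoted above:
#   pid 58394 (suricata), uid 0, exited on signal 10 (core dumped)
EXIT_LINE = re.compile(
    r"pid (?P<pid>\d+) \((?P<name>[^)]+)\), uid (?P<uid>\d+)[,:] "
    r"exited on signal (?P<signal>\d+)(?P<core> \(core dumped\))?"
)

# A few FreeBSD signal numbers; other platforms number SIGBUS differently.
FREEBSD_SIGNALS = {6: "SIGABRT", 10: "SIGBUS", 11: "SIGSEGV"}


def find_signal_exits(lines, process=None):
    """Return one dict per 'exited on signal' line, optionally for one process name."""
    hits = []
    for line in lines:
        match = EXIT_LINE.search(line)
        if not match:
            continue
        if process is not None and match["name"] != process:
            continue
        number = int(match["signal"])
        hits.append({
            "pid": int(match["pid"]),
            "process": match["name"],
            "signal": number,
            "signal_name": FREEBSD_SIGNALS.get(number, "unknown"),
            "core_dumped": match["core"] is not None,
        })
    return hits


quoted = ["pid 58394 (suricata), uid 0, exited on signal 10 (core dumped)"]
for hit in find_signal_exits(quoted, process="suricata"):
    print(hit)
```

    Filtering by process name answers the question of whether the Signal 10 entries belong to Suricata or to something else.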



  • @bhjitsense said in SG-3100 on 2.4.4 Rebooting every day at almost same time.:

    @bmeeks Yes, the exact error is pid 58394 (suricata), uid 0, exited on signal 10 (core dumped)

    That's correct, disabling pfBlocker (as an example) causes this error. Same with OVPN server.

    Okay, thanks! That gives me a possible hint at the problem area. You also said you were using Legacy Mode blocking. This points to the issue being within the custom blocking module I wrote for Suricata. Let me examine that code in greater detail to see where the unaligned access might be triggered. One thing the custom blocking module does is monitor all firewall interface IP addresses for changes. Toggling something like OpenVPN Server (and perhaps pfBlocker) will cause the interfaces to cycle and trigger this monitoring thread within Suricata.
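
    To illustrate the kind of check such a monitor performs, here is a small sketch that compares two snapshots of interface addresses and reports what changed. It is not the blocking module's code, which is not shown in this thread; the function name, the snapshot layout, the "ovpns1" interface name, and the documentation-range addresses are all assumptions made for the example.

```python
def diff_addresses(before, after):
    """Compare two {interface: set_of_addresses} snapshots.

    Returns {interface: {"added": [...], "removed": [...]}} for every
    interface whose address set differs between the snapshots.
    """
    changes = {}
    for iface in sorted(set(before) | set(after)):
        old = before.get(iface, set())
        new = after.get(iface, set())
        if old != new:
            changes[iface] = {
                "added": sorted(new - old),
                "removed": sorted(old - new),
            }
    return changes


# Hypothetical snapshots taken before and after a VPN server is stopped.
before = {"mvneta2": {"192.0.2.10"}, "ovpns1": {"198.51.100.1"}}
after = {"mvneta2": {"192.0.2.10"}}

changes = diff_addresses(before, after)
print(changes)
```

    In this example only the interface that disappeared is reported, which is the sort of event that toggling a VPN server would be expected to produce.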



  • @bhjitsense said in SG-3100 on 2.4.4 Rebooting every day at almost same time.:

    @bmeeks Yes, the exact error is pid 58394 (suricata), uid 0, exited on signal 10 (core dumped)

    That's correct, disabling pfBlocker (as an example) causes this error. Same with OVPN server.

    Can you please give me one more piece of information -- this time from the suricata.log for the interface. Toggle either OpenVPN Server or pfBlocker to trigger the Suricata Signal 10 error. Then immediately go to the LOGS VIEW tab and select the suricata.log file for the interface and post the last few lines of that file. I'm expecting to see some lines noting that Suricata has detected an IP address change on an interface. You are free to obfuscate the IP addresses if you wish, but I want to know if some of those logging lines are present and what they say. That will help me narrow down precisely which function is the likely culprit.



  • @bmeeks

    These are the last several lines before I triggered a crash.

    12/2/2019 -- 12:20:11 - <Info> -- Using 1 live device(s).
    12/2/2019 -- 12:20:11 - <Info> -- using interface mvneta2
    12/2/2019 -- 12:20:11 - <Info> -- Running in 'auto' checksum mode. Detection of interface state will require 1000 packets.
    12/2/2019 -- 12:20:11 - <Info> -- Set snaplen to 1518 for 'mvneta2'
    12/2/2019 -- 12:20:11 - <Info> -- RunModeIdsPcapAutoFp initialised
    12/2/2019 -- 12:20:11 - <Notice> -- all 3 packet processing threads, 4 management threads initialized, engine started.
    12/2/2019 -- 12:21:10 - <Info> -- No packets with invalid checksum, assuming checksum offloading is NOT used
    


  • @bhjitsense said in SG-3100 on 2.4.4 Rebooting every day at almost same time.:

    @bmeeks

    These are the last several lines before I triggered a crash.

    12/2/2019 -- 12:20:11 - <Info> -- Using 1 live device(s).
    12/2/2019 -- 12:20:11 - <Info> -- using interface mvneta2
    12/2/2019 -- 12:20:11 - <Info> -- Running in 'auto' checksum mode. Detection of interface state will require 1000 packets.
    12/2/2019 -- 12:20:11 - <Info> -- Set snaplen to 1518 for 'mvneta2'
    12/2/2019 -- 12:20:11 - <Info> -- RunModeIdsPcapAutoFp initialised
    12/2/2019 -- 12:20:11 - <Notice> -- all 3 packet processing threads, 4 management threads initialized, engine started.
    12/2/2019 -- 12:21:10 - <Info> -- No packets with invalid checksum, assuming checksum offloading is NOT used
    

    Was there anything added to the suricata.log file after you triggered the crash, or is what you posted from after the crash? I was expecting to see one or more lines with info about an interface IP change being detected.

    If the log was from before the crash, then I need the last few lines of the log after the crash but BEFORE you restart Suricata. Upon a restart Suricata wipes the suricata.log file and starts a new one.

    And can you verify one more condition for me? With blocking disabled, will it still crash when you toggle the state of pfBlocker or OpenVPN Server?
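
    Since a restart wipes the file, one option is to copy the tail of the log somewhere safe right after the crash. The sketch below keeps the last few lines of a file and writes them to a second file. The function name and file names are hypothetical, and the demo builds a stand-in log in a temporary directory because the real location of suricata.log is not given in this thread.

```python
import tempfile
from collections import deque
from pathlib import Path


def save_log_tail(log_path, copy_path, count=20):
    """Write the last `count` lines of log_path to copy_path and return them."""
    with open(log_path, "r", errors="replace") as handle:
        # A bounded deque keeps only the most recent lines while reading.
        tail = list(deque(handle, maxlen=count))
    Path(copy_path).write_text("".join(tail))
    return tail


# Demo with a stand-in log file of 30 numbered lines.
with tempfile.TemporaryDirectory() as workdir:
    log = Path(workdir) / "suricata.log"
    log.write_text("".join(f"line {n}\n" for n in range(1, 31)))
    kept = save_log_tail(log, Path(workdir) / "suricata.log.after-crash", count=5)
    print("".join(kept), end="")
```

    Taking the copy before restarting Suricata means the lines written just before the crash are still available afterwards.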



  • @bmeeks This was after the crash but before a restart. However, it was only running for a few minutes before I triggered the crash again. For clarity, I just triggered it again. The logs are the same. The last logs were about 30 minutes ago (the exact same as I submitted above), then I triggered the crash. Nothing new was recorded in that log file at the time of the crash.

    When blocking is disabled, the crashes never seem to happen and I can't trigger them.