Netgate Discussion Forum

    Multiple pfSense firewalls Deployed in a DC environment crash on the same day

    General pfSense Questions
    15 Posts 4 Posters 1.2k Views
    • jimp Rebel Alliance Developer Netgate

      It's possible, but unlikely, that the cause is software.

      Repeated crashes with multiple signals are almost always an indicator of a hardware issue.

      • shaunjstokes

        This is very odd.

        The secondary firewalls are identical to the primaries in terms of hardware and have now been stable with an uptime of > 62 days.

        The primary firewalls had the same uptimes (> 61 days), and each crashed once, like clockwork, exactly 12 hours apart.

        All had been stable for around 3 years before the 2.4.4-p1 upgrade.

        This behaviour seems very strange. If it were a hardware problem, I would expect the crashes to repeat across all firewalls, not just the primaries, to happen more than once, and not to fall at a precise 12-hour interval.

        We will continue to monitor.

        • shaunjstokes

          No further problems thus far, so it doesn't seem like a hardware issue.

          The primary pfSense firewalls at both DCs have remained stable since the incident. Neither of the secondary pfSense firewalls was affected, even though all use identical hardware.

          • bmeeks

            Do NOT use Service Watchdog with Snort or with Suricata! Service Watchdog is not designed to work with applications like Snort and Suricata that open multiple copies of themselves (one per configured interface). Also, Service Watchdog does not understand that Snort and Suricata restart themselves after rules updates. Service Watchdog sees a process missing and immediately calls its shell script to restart it. The problem is Snort or Suricata is already in the middle of restarting itself. This leads to all kinds of issues.

            I am assuming you are using Service Watchdog for Snort since you only list pfBlockerNG and Snort as installed packages.

            • shaunjstokes

              We had been using Service Watchdog with Snort but have now removed it as per your recommendation.

              We originally had issues with Snort crashing occasionally: it would never restart on its own, though it otherwise worked as expected. Service Watchdog was added as a workaround, but that was more than 3 years ago. It's possible this is no longer an issue, given the vast number of changes and updates since then.

              We will continue to monitor.

              • bmeeks @shaunjstokes

                Okay. I misinterpreted your most recent post to the thread to mean there might still be problems (meaning you had ruled out hardware but there could still be a software issue).

                I looked into making the necessary changes to Service Watchdog so it could work with Snort and Suricata, but the required changes were quite massive and having the compatibility would likely only help a very small number of users. Based on that, I dropped the initiative.

                • shaunjstokes

                  There have been no hardware issues that we can find, so I believe software is the most likely cause. Given what you've said about Service Watchdog, it's possible it was in some way related to the crashes in this incident. If Service Watchdog tried to start Snort while Snort was updating, it may have crashed, which would explain some of what we see in the dumps. As updates happen at set intervals, it could also explain why the crashes were precisely 12 hours apart.

                  It's possible Service Watchdog is no longer needed if Snort is now stable, but we can't be sure, so we will just have to monitor. If Snort does still occasionally crash, what might be useful is an option within Service Watchdog such as a 10 or even 20 minute delay, so that an application has to be continuously down for 10 or 20 minutes before Service Watchdog initiates the start.
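
The delay idea can be sketched roughly as follows. This is only an illustration in Python, not an existing Service Watchdog option: the function name `should_restart`, the `down_since` dictionary and the 600-second default are all hypothetical. Note that "continuously down" is only inferred from consecutive checks, so a service that comes up and dies again between two checks would not reset the timer.

```python
def should_restart(service, is_running, now, down_since, grace_seconds=600):
    """Decide whether a watchdog should restart `service` (hypothetical helper).

    service:       name of the monitored service
    is_running:    result of the current health check
    now:           current time in seconds (any monotonic clock)
    down_since:    dict of service name -> time it was first seen down;
                   updated in place between calls
    grace_seconds: how long the service must stay down before a restart
    """
    if is_running:
        # Healthy again: forget any earlier outage so the timer starts over.
        down_since.pop(service, None)
        return False
    # Remember the first check at which the service was seen down.
    first_seen_down = down_since.setdefault(service, now)
    return now - first_seen_down >= grace_seconds


state = {}
print(should_restart("snort", False, 0, state))     # just went down
print(should_restart("snort", False, 300, state))   # down for 5 minutes
print(should_restart("snort", False, 600, state))   # down for 10 minutes
```

With a grace period like this, a service that is briefly absent while it restarts itself would come back before the timer expires and would not be touched.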

                  • Gertjan

                    Just thinking out loud: same hardware, same software, same moment.
                    What about a DDoS issue?
                    The master breaks down and the slave takes over the load. Software and hardware are the same, so the result is the same.

                    Edit: and where are the logs?

                    • bmeeks @shaunjstokes

                      Snort should be pretty stable these days. About the only thing that might take it down (really, it would just prevent startup after a rules update) is a bad rule. This has happened a few times over the last few years. A rule syntax error gets introduced via rules update and that prevents Snort from restarting after the update.

                      I suspect Service Watchdog played a role in your Snort crashes. You could try greatly extending the delay in Service Watchdog, but that only fixes one problem. Another larger one pops up if you run Snort on multiple interfaces. In that case, there is a separate Snort process for each interface. That fools the check done by Service Watchdog when it checks if Snort is running. Service Watchdog simply does an equivalent of:

                      ps -ax | grep snort
                      

                      If it gets a response, it assumes Snort is good. The problem here is Snort may have crashed on the WAN but be running on the LAN, or crashed on one VLAN but running on others. Service Watchdog doesn't understand how to look for multiple Snort instances and match them to the configured interfaces.
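
A per-interface check could look something like the sketch below. It is not how Service Watchdog or the Snort package works. It assumes each Snort process shows its interface on the command line through Snort's `-i` option, and the function name `missing_snort_interfaces`, the interface names and the config path are made up for illustration. The input is a list of command lines, for example the command column of a process listing.

```python
import os


def missing_snort_interfaces(command_lines, expected_interfaces):
    """Return the expected interfaces that have no Snort process.

    command_lines:       one string per process, holding its command line
    expected_interfaces: interfaces Snort is configured to run on
    Assumption: every Snort instance was started with "-i <interface>".
    """
    running = set()
    for command in command_lines:
        argv = command.split()
        # Match on the executable name only, so "grep snort" is ignored.
        if not argv or os.path.basename(argv[0]) != "snort":
            continue
        # Take the argument that follows the first "-i", if there is one.
        if "-i" in argv[:-1]:
            running.add(argv[argv.index("-i") + 1])
    return [iface for iface in expected_interfaces if iface not in running]


# Hypothetical listing: Snort is alive on em1 (LAN) but gone from em0 (WAN).
commands = [
    "/usr/local/bin/snort -D -q -i em1 -c /hypothetical/snort.conf",
    "grep snort",
]
print(missing_snort_interfaces(commands, ["em0", "em1"]))
```

A plain `grep snort` would report success for this listing, while the per-interface comparison reports the WAN interface as missing.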

                      • shaunjstokes

                        Everything is monitored, and there were no abnormal fluctuations in traffic etc., unless this happened between polling periods, which is possible. Technically, though, if the software crashed because of a DDoS, that would be a problem to overcome in the software.

                        • bmeeks

                          This thread has given me an idea for a new "health" feature, though. I might be able to put some checks into a cron task and then let the Snort GUI itself (through that cron task) send the admin an email if a configured Snort instance crashes. I could add some configuration info onto the GLOBAL SETTINGS tab. I will consider this for a future update.
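
Purely as an illustration of the notification half of that idea, and not a preview of any actual Snort package feature, a cron-driven check might build its alert along these lines. The function name `build_crash_alert`, the hostname and the addresses are hypothetical; the message is only constructed here, not sent.

```python
from email.message import EmailMessage


def build_crash_alert(hostname, crashed_interfaces, sender, recipient):
    """Build an alert email for Snort instances that are not running.

    Returns None when nothing has crashed, so the caller can skip sending.
    All names and addresses passed in are supplied by the caller.
    """
    if not crashed_interfaces:
        return None
    interfaces = ", ".join(crashed_interfaces)
    message = EmailMessage()
    message["Subject"] = f"Snort not running on {hostname}: {interfaces}"
    message["From"] = sender
    message["To"] = recipient
    message.set_content(
        f"No Snort process was found for these configured interfaces "
        f"on {hostname}: {interfaces}\n"
    )
    return message


alert = build_crash_alert(
    "fw1.example.com", ["em0"], "snort@example.com", "admin@example.com"
)
print(alert["Subject"])
```

A cron task could run a check like this every few minutes and hand any non-empty message to whatever mail mechanism the firewall already has configured.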

                          • shaunjstokes

                            The health feature would be a good idea. It's been over a month now and Snort has been stable without Service Watchdog; the problems we had with Snort in the earlier versions of pfSense no longer appear to be present.

                            At this stage I suspect the crash may have been the result of a conflict between Snort and Service Watchdog, possibly while Snort was updating.

                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.