SG-3100 seems to overheat and crash

  • I'm not entirely sure how to begin triaging this issue, but after several months of trouble-free use, our SG-3100 has started crashing increasingly often under heavy CPU usage.

    We had Suricata running on both the LAN and DMZ without issue. No changes were made to the configuration, but recently the box has stopped responding on multiple occasions.

    If I uninstall Suricata and the CPU usage never exceeds 30-40%, the unit is fine, but as soon as I put it under heavy load it stops responding and usually requires a reboot. I regularly see the temperature sitting above 65C, and a Suricata rule reload (or rules refresh) guarantees a good minute or two of 100% CPU if the device is already under load.

    It's in a cool basement so heat isn't really an issue.

    WAN is 200/50 (Mb).

    If I leave it long enough for the CPU usage to fall, it seems to "come back", but the data is anecdotal.

  • Netgate Administrator

    65°C really isn't that hot for the SG-3100; I would not expect that to be an issue.

    When it stops responding does it stop responding entirely? Even on the console?

    You might try pressing Ctrl+T at the console if you can. That can sometimes respond when nothing else does and should show whatever process is currently running.

    When it stops, what do you do to bring it back? You can enable the watchdog in System > Advanced > Miscellaneous to have it reboot itself if it completely locks up.


  • @stephenw10 Thanks for the input.

    Simply put, I was WAYYYY off. ;)

    I should have come back here and updated. I conflated a few issues:

    1. Device responsiveness
    2. Heat
    3. Suricata reloads

    It turned out our ISP was having issues, but they were incredibly intermittent. The GUI becomes unresponsive when the WAN link is down, and because the link was flapping so much, the unresponsiveness would come and go. As I flailed about, mistaking this for a device problem, I further increased the load on the device by toggling Suricata blocking and rulesets.

    Basically: correlation, not causation.

    I have yet to try the console on this unit (having previously used a dedicated self-built machine). Will get that up in case I need to chase down something similar.
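
One way to separate "the device is down" from "the WAN is down" is to record both at the same moments and compare. Below is a minimal Python sketch of that bookkeeping. It assumes you already have periodic probe results (for example, a ping to the firewall's LAN address and a ping to some upstream host); the `Sample` type and both function names are hypothetical, and treating a LAN reply as "device alive" is only a rough proxy.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    t: float       # seconds since the start of the observation window
    lan_ok: bool   # the firewall's LAN address answered the probe
    wan_ok: bool   # a host beyond the WAN answered the probe


def classify(samples: List[Sample]) -> Dict[str, int]:
    """Count samples per state; a failed LAN probe takes precedence."""
    counts = {"healthy": 0, "wan_down": 0, "device_unresponsive": 0}
    for s in samples:
        if not s.lan_ok:
            counts["device_unresponsive"] += 1
        elif not s.wan_ok:
            counts["wan_down"] += 1
        else:
            counts["healthy"] += 1
    return counts


def count_wan_flaps(samples: List[Sample]) -> int:
    """Count how many times the WAN probe result changed between samples."""
    flaps = 0
    for prev, cur in zip(samples, samples[1:]):
        if prev.wan_ok != cur.wan_ok:
            flaps += 1
    return flaps
```

If most bad samples land in `wan_down` and the flap count is high, the upstream link is the more likely culprit; a cluster of `device_unresponsive` samples would point back at the box itself.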

  • Netgate Administrator

    Always a good idea to check the console on the embedded devices. Some things can show there that show nowhere else.

    It's also a good idea to confirm you can access the console now, so that you know you can if you need it later. 😉

