Spontaneous corruption?

  • Quick question I haven't been able to find an answer for. Multiple times now, on an SG-2440 and on a device purchased on Alibaba, both running 2.4.4, I have woken up in the morning to lackluster internet performance. I'm on DSL with the modem bridged and pfSense handling the PPPoE connection, so I disconnect and reconnect and, if that doesn't work, I reboot the router via SSH. The problem is that the router never comes back. No wireless internet (not even LAN access, because DHCP is provided by pfSense). Nothing. My pfSense box is on a UPS and there was no power outage. I could shoot myself for not taking a pic of the terminal, but pfSense ultimately was unable to find the files/partition needed to boot. The easiest and quickest fix was to simply reinstall and restore a backup, since no internet means lots of complaints. I'm curious whether anyone else has had this issue and knows what causes it or how I can avoid it. Fortunately I've been around each time it has bombed out, but my luck will end one day and I'll be on a business trip or something and be forced to walk my 13-year-old through the process. Thoughts from Netgate or other users?

  • Rebel Alliance Global Moderator

    I have had an SG-2440 at a remote location for over 2 years now. Zero issues with it. The only time it has been rebooted is when updating pfSense.

    What version are you running on it? You got it off Alibaba? Why not buy it direct from Netgate? I doubt that was a legit sale.

  • Most of the answers to the questions you have asked are already in the post. I've purchased two SG-2440s, both from Netgate. As for the Alibaba purchase, it's a Qotom or whatever with a quad-core Atom with AES-NI, 4 GB of RAM, and a 32 GB mSATA drive. Given that people build their own boxes out of old PCs and Dell servers, I'm challenged to understand the relevance of your question or concern. The Alibaba box did not come preloaded with pfSense. I loaded and configured it myself. No reason to start the copyright infringement shitstorm.

  • Rebel Alliance Global Moderator

    Sorry, I didn't see the "and a device" part. My bad.

    You're saying you're having this issue on all of them? Or just the Alibaba purchase?

    I only have the one SG-2440, but it's been running 24/7/365 for a couple of years now. Never any issues at all; the only time it has ever needed a reboot was on upgrade. Rock solid performance.

    It just uses DHCP for WAN via a cable connection, and it isn't running any packages like Squid, pfBlocker, or IDS/IPS. Pretty vanilla install. You're not running arpwatch or Unbound in TLS forwarding mode, are you? Unbound has a memory leak when using TLS forwarding. And I did run into some issues with arpwatch where my SG-4860 would just lock up hard; I couldn't even console into it and would have to reboot.

  • I've had the same problem on both devices. The 2440 did it a couple of times and the Qotom box just did it two days ago. I have only the pfBlockerNG and OpenVPN Client Export packages installed. WAN (em0) is a PPPoE DSL connection. I have a main LAN as well as two VLANs that I use (all on em1). I also have a site-to-site IPsec connection to access the network in my home in the US, an OpenVPN client connection for PBR, and an OpenVPN server for remote access when I'm TDY or back in the US. This issue occurs with no warning or indicators, and I hadn't made any configuration changes for days. Super perplexing. Next time it occurs I can take a pic of the console display. Out of luck this time. I assume any logs were blown away when I reinstalled. I know I've seen similar issues from those whose devices get power cycled due to a power outage with no UPS, but that's not my case at all. I even did a test of the UPS just to make sure it holds if the power goes out. 45 minutes later, I ended the test with more than half the battery left. Is this the first anyone has heard of this issue?

  • You're claiming that this issue, which I have not heard of before, is happening to three different boxes: two popular models from Netgate and a third product from a completely unrelated company with completely different hardware.

    In order for the same issue to occur, there must be a commonality. It could be the 2440, but no one else is complaining of this issue. It could be pfSense itself, but again, no one else is complaining.

    This leads me to believe it must be a common configuration that you're adding. Of course a configuration should not cause these kinds of issues, but my guess is it's something that can cause the filesystem to become corrupted. Two ideas come to mind. One is that the drive is getting filled up, as can happen with Squid, and some interaction is pushing the FS to its limit. This scares me, but it is possible that on a full SSD wear leveling corrupts a block. Another is that the devices are using SSDs and you have some extra logging, causing the drives to burn out or otherwise get corrupted from repeated writing. Low-end SSDs do not like the I/O pattern that logging has.

    I am not saying that these are the issues, but with nothing else to go on, I figured I'd contribute to the brainstorming.

  • Rebel Alliance Netgate Administrator

    First step is to see why this is happening to you. I agree it's most likely related to the configuration/local setup of the devices.

    Before you reboot the device, navigate to Status > System Logs and take note of the information there. Try expanding it to a few thousand entries; there will be something noted in this area about what is causing your lack of internet access.

    I would suggest getting a console session going; our SG-2440 Manual has a great guide on gaining access to the console.

    What packages do you have installed on your device? If you run df -h from the command prompt, how much disk space is there?
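    If you want to keep an eye on disk space between incidents rather than checking by hand, here is a minimal sketch of the same check in Python. It assumes Python 3 is available wherever you run it, which nothing in this thread confirms for a stock pfSense install. The function names and the 90% warning threshold are made up for illustration, and the percentage can differ slightly from what df -h reports because df accounts for reserved blocks.

```python
import shutil


def usage_report(total, used, warn_percent=90.0):
    """Return percent used and whether it crosses the (hypothetical) threshold."""
    percent = 100.0 * used / total if total else 0.0
    return {"percent_used": round(percent, 1), "warn": percent >= warn_percent}


def check_path(path="/", warn_percent=90.0):
    """Look up real usage for the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage_report(usage.total, usage.used, warn_percent)


if __name__ == "__main__":
    # Check the root filesystem; pass another mount point to check a different one.
    print(check_path("/"))
```

    Run from cron or by hand, a "warn" value of True would be the cue to look at what is filling the drive before the next reboot.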

    The more information you can get to us, the better someone can assist you.