pfSense on Soekris - troubleshooting advice request



  • My company has been using pfSense on a Soekris Net 5501 for several months.  It has been stable and happy, until yesterday when it became non-responsive.  By non-responsive I mean:

    • would not respond to HTTP requests on the LAN interface
    • DHCPD would not provide leases upon request
    • Firewall allowed no traffic through the appliance

    I rebooted the machine (unplugged the Soekris power supply, waited 30 seconds, and plugged it back in). It came up and worked for about 10 minutes, then became unresponsive again.  Luckily I had a second Soekris box with pfSense loaded on hand and was able to swap it in.

    I'm up and running on the backup hardware using the same ISP, wiring, and internal switches as before.  Because things are working again, I feel the issue must be either the Soekris Net 5501 itself or something about my pfSense configuration on it.  I'm trying to figure out whether I have a hardware issue (e.g. the Ethernet port for the LAN interface went bad) or a software config problem, but I'm at a loss as to how to distinguish one from the other.

    I do not export the pfSense logs to an external logging server, and because I have only a CF drive in the Soekris my logs contain only the most recent entries.  Rebooting the machine is usually enough to move "interesting" log entries out of the buffer, and when the box fails I can't see the web GUI to read the new log entries.

    The Net 5501 has four Ethernet ports, of which three are in use (LAN, ISP1, ISP2 – I run a failover config).  Perhaps I should move LAN to the fourth Ethernet port to see whether Ethernet port 1 is failing or has failed.

    Has anyone had experience with this sort of troubleshooting?  If so, please feel free to offer advice.

    David



  • I think a useful first step would be to determine whether it is just a "LAN port lockup", a system-wide lockup, or something else.

    If it's a "LAN port lockup" then the console serial port should be responsive (and in fact may have told you something about what is going on) and pings to any of the other NICs should generate a response.

    There are many possibilities (including temperature-sensitive faults in memory or the power supply) so, in general, anything you can do to identify system parts that still work could help determine what is causing the "lockup". In a way, it's good that it is apparently so readily reproduced.
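
    As a rough sketch of that check, something like the following could run on a test machine and record which interfaces still answer pings when the box fails. The interface addresses are placeholders, not the real config, and it assumes the test machine can reach each interface and that the firewall rules allow ICMP on it (pfSense does not answer pings on WAN interfaces unless a rule permits it).

```python
import platform
import subprocess
import time

# Hypothetical addresses: replace with the real IP of each pfSense interface.
INTERFACES = {
    "LAN": "192.168.1.1",
    "ISP1": "203.0.113.1",
    "ISP2": "198.51.100.1",
}


def ping_once(host, timeout_s=2):
    """Return True if a single ICMP echo to host gets a reply."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def classify(status):
    """Summarize a mapping of interface name -> bool (True = answered)."""
    if all(status.values()):
        return "all interfaces responding"
    if not any(status.values()):
        return "no interface responding (system-wide lockup or power?)"
    down = sorted(name for name, up in status.items() if not up)
    return "partial failure, not responding: " + ", ".join(down)


def monitor(interval_s=10):
    """Print one timestamped status line per interval until Ctrl-C."""
    while True:
        status = {name: ping_once(ip) for name, ip in INTERFACES.items()}
        print(time.strftime("%Y-%m-%d %H:%M:%S"), classify(status), flush=True)
        time.sleep(interval_s)
```

    Calling monitor() and redirecting the output to a file gives a timeline: if only LAN drops out while the other NICs keep answering, that points towards the port or its driver rather than the whole system.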



  • It sounds like I should do something along the lines of:

    • set up test environment

    • connect to serial port to read console

    • connect a computer to LAN port

    • fire up Soekris and wait for failure to occur

    • check console messages

    I'll give that a try when I get a few minutes at work.  The "powers that be" have this weird desire that I focus on billable work, rather than internal support work :-)
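
    Since the box may sit for a while before it fails, it could help to keep a timestamped copy of whatever the console prints. A minimal sketch, assuming the serial terminal program's output can be piped to standard input; the log file name is made up:

```python
import io
import sys
import time


def stamp_lines(src, dst, clock=time.time):
    """Copy each line from src to dst with a timestamp prefix; return the line count."""
    count = 0
    for line in src:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(clock()))
        dst.write(f"{stamp} {line.rstrip()}\n")
        dst.flush()  # keep the file current in case the session dies
        count += 1
    return count


def main(log_path="soekris-console.log"):  # hypothetical file name
    # Serial consoles can emit stray bytes, so decode leniently.
    console = io.TextIOWrapper(sys.stdin.buffer, encoding="utf-8", errors="replace")
    with open(log_path, "a", encoding="utf-8") as log:
        stamp_lines(console, log)
```

    Calling main() at the end of a script and piping the console session into it leaves the last messages before the hang on disk, with times that can be lined up against when the LAN stopped responding.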



  • @jupiters_spot:

    My company has been using pfSense on a Soekris Net 5501 for several months.  It has been stable and happy, until yesterday when it became non-responsive.  By non-responsive I mean:

    • would not respond to HTTP requests on the LAN interface
    • DHCPD would not provide leases upon request
    • Firewall allowed no traffic through the appliance

    If you connect to the serial console, is there any reaction? If the system just 'hangs', it's a good idea to replace the Soekris power supply. Even if the Soekris boots initially, a failing power supply can hang the system without leaving much diagnostic output, which makes it a very frustrating problem to debug.

    You can use any DC power supply between 7 and 28 volts; I usually use my IBM laptop power supply (16 volts).

