Troubleshooting help needed



  • Hi guys,

    I've been using pfSense since February and I'm quite happy with it. Unfortunately, since February I've also been experiencing very strange problems. The box is located in a different country, so I can't see what's happening on its screen. My only option is to check the logs and the web interface.

    Back in February my configuration was not very stable, which is why I didn't post anything. Now I'm sure this is not a configuration issue but some kind of hardware problem.

    The problem:

    The box works fine for 8-10 days, and then at some point I can't reach my WAN. No ping, no SSH, no Internet via LAN - nothing, but the box is alive. After a cold restart everything is OK for another 8-10 days, and so on…

    Can you please help me troubleshoot this problem? I've checked all the logs and no faults are evident. Moreover, nothing in them shows at what time the box became unresponsive. In the summer I installed a 12 cm fan, and my system temperature stays at 36-38 degrees C all the time, so I'm guessing it's not an overheating issue.

    My hardware configuration is:

    Motherboard: GIGABYTE GA-C1037UN-EU (Intel® Dual-core Celeron® 1037U processor (1.8 GHz))
    Motherboard Specs: http://www.gigabyte.com/products/product-page.aspx?pid=4747#sp

    Case: E-mini I5 Black, Brushed Aluminium, 120W DC/DC + 12V/5A adapter, Mini-ITX
    RAM: KINGSTON 2GB DDR3 1600 HYPER X (Part Number: KHX1600C9D3B1K2)
    SSD/DOM: 16GB Apacer SDM4-M APSDM016G15AN-CCM 22pin 90° Industrial S-ATA DOM

    My NIC chipset is: Realtek 8111F.

    pfSense installed: 2.1.5-RELEASE (amd64)

    I'm looking forward to hearing from you!

    Regards,
    Nick



  • @Nikolay_Zhelev:

    The box works fine for 8-10 days, and then at some point I can't reach my WAN. No ping, no SSH, no Internet via LAN - nothing, but the box is alive. After a cold restart everything is OK for another 8-10 days, and so on…

    How are you sure the box is alive?  Can someone at the remote site test it via the LAN?  See if the web gui is available on the LAN, or at least responds to pings on the LAN nic.

    If the WAN goes down, there should be something logged about that when it goes down.  Can you explain more about the logs, i.e. after the WAN disappears, is normal activity still logged?  Or is there a complete gap with nothing logged until the cold restart?  The latter would point to a hardware problem, a complete lock-up.
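    To pin down when the box went silent, the longest pause between consecutive log timestamps can be located with a small script. This is only a rough sketch, assuming the system log has first been dumped to plain text (on pfSense of this era the logs are circular files, typically read with `clog`) and that each line starts with a classic syslog timestamp. The function name `largest_gap` and the default year are made up for illustration; syslog lines of this format do not record the year, so it has to be supplied.

```python
from datetime import datetime, timedelta


def largest_gap(lines, year=2014):
    """Return (gap, before, after) for the longest silence between log lines.

    `year` is an assumption: classic syslog timestamps omit it.
    """
    stamps = []
    for line in lines:
        try:
            # The first 15 characters look like "Oct  3 14:07:55".
            stamps.append(
                datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
            )
        except ValueError:
            # Skip lines that do not start with a timestamp.
            continue

    best = (timedelta(0), None, None)
    for before, after in zip(stamps, stamps[1:]):
        if after - before > best[0]:
            best = (after - before, before, after)
    return best
```

    The `before` value is the last entry written ahead of the silence, which approximates the time of the hang; `after` is normally the first line of the next boot. A log that spans a year boundary would need extra handling.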

    You might consider running memtest for a few days, and if that passes, run a cpu loading program for a few days.

    PS - Please explain your power setup.  Is it really a DC-DC converter?  12 V * 5 A = 60 W, not 120 W.
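    The arithmetic behind that question can be written out as a tiny sketch. It uses only the figures from the case description (a 12V/5A adapter feeding a 120W DC/DC stage); the function names are hypothetical, and it assumes the chain can deliver no more than its weakest stage, ignoring conversion losses.

```python
def adapter_watts(volts, amps):
    """Rated output power of the external adapter: P = V * I."""
    return volts * amps


def usable_watts(adapter_w, converter_w):
    """Power available to the board is capped by the weaker stage."""
    return min(adapter_w, converter_w)


brick = adapter_watts(12, 5)       # 12 V * 5 A adapter from the case listing
budget = usable_watts(brick, 120)  # 120 W is the DC/DC rating from the listing
print(brick, budget)
```

    With these numbers the adapter, not the 120W converter, sets the limit at 60 W.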



  • Are you able to communicate with whatever device is connected to the WAN port of the pfSense machine? And verify that the interface is functional? Is there a chance that your Internet connection is PPPoE or similar and restarting the box is logging it back in?



  • Hi charliem,

    Thank you for your reply!

    My box is located in my home country, in my mother’s apartment. When the problem occurred I thought the electricity had gone out, but I called my mother, she checked, and the box was powered on but not responding. The next time the problem appears, I’ll ask my mother to try to log in to the box via the web interface.

    Regarding the logging: there is a complete gap until the cold restart takes place. That’s why I’m looking for help, because nothing is logged that would give me a clue.

    Regarding the power: the DC/DC conversion circuit is located inside the case. It's DC/DC because an external AC/DC adapter (laptop type) is connected to the box. That's how the case is designed; I've never changed anything.

    @ember1205

    Thank you for your reply as well. My ISP provides my Internet connection via PPPoE, but they are quite reliable; I've never had any problems with them in the past.

    Regards,
    Nick



  • @Nikolay_Zhelev:

    Regarding the logging: there is a complete gap until the cold restart takes place. That’s why I’m looking for help, because nothing is logged that would give me a clue.

    If no normal stuff is logged until restart, then it's almost certain the board is locked up.

    Regarding the power: the DC/DC conversion circuit is located inside the case. It's DC/DC because an external AC/DC adapter (laptop type) is connected to the box. That's how the case is designed; I've never changed anything.

    I'd be suspicious of your DC-DC power supply, and/or heat buildup in the case.  I have no experience with DC –> ATX adapters.  Can you run it temporarily with a standard ATX power supply? Also perhaps try running it with the case open?



  • Hi charliem,

    I can confirm that there is a log gap after the system hangs.

    Regarding the DC-DC converter: I have installed a 12 cm fan in the box and enabled the thermal monitoring function in pfSense. All temperatures are around 36 degrees, so I'm guessing it's not a thermal issue.

    The question is: How can I troubleshoot my problem remotely via SSH or WebGUI, since I'm away from my box?

    Thanks for all the attention!

    Regards,
    Nikolay



  • @Nikolay_Zhelev:

    I can confirm that there is a log gap after the system hangs.

    The question is: How can I troubleshoot my problem remotely via SSH or WebGUI, since I'm away from my box?

    Well, I'm not sure what you expect to monitor or troubleshoot remotely.  We've already established that it's probably a hardware issue, one that locks the machine suddenly and stops any further activity.

    When I'm faced with a hardware problem, I start replacing hardware, and I'd start with your DC-DC converter and your external 12V power supply.  As you say, you've been dealing with this since February, and it's not a configuration issue.



  • Thanks charliem, as soon as I get home I'll start hardware troubleshooting.

    BTW, the RAM module I'm using is not on my motherboard manufacturer's supported list, but it's the same brand (Kingston) and I think it's even better. Could that be the root cause of my problem?

    I'm using KHX1600C9D3B1K2, only one 2 GB stick. It runs at 1600 MHz, and my motherboard does support 1600 MHz.

    The manufacturer lists KVR1333D3N9/4G, which runs at 1333 MHz.

    The brand is the same, so I'm guessing the quality of the modules is the same.

    Thanks.

    Regards,
    Nick



  • I'd also try with a new PSU.

    My current pfSense box has also been working fine with a DC>ATX converter for a year now, but I had problems in the past with a different DC>ATX PSU model. You should just grab a PSU from an old PC for testing.



  • Dear all,

    It's been a while since my last post, but I wanted to make sure I've solved the problem.

    The system hang was caused by a faulty PSU. Thanks a lot for all your suggestions.

    I've replaced the AC-DC power brick with an FSP 12V 12.5A 150W unit and the DC-to-DC PSU with a MiniBox 12V 160W picoPSU-160-XT.

    Now the system is quite stable.

    Good luck to all of you and all the best,
    Nick



  • Now the system is quite stable.

    If not, it might be worth putting an APC UPS in front of the pfSense box.