Newbie looking for a helping hand - system becomes unresponsive



  • First off thanks for any and all help!

    I'm a new user looking for help diagnosing an issue. The system boots and works fine for a random amount of time before becoming unresponsive. The system log pasted below shows what was going on just before it became unresponsive. Any help diagnosing the issue would be greatly appreciated.

    http://pastebin.com/kZZqyBfi

    System Specs:
    CPU: AMD Athlon™ II X2 245 Processor
    real memory = 4294967296 (4096 MB)
    WAN (nfe0): NVIDIA nForce MCP61 Networking Adapter
    LAN (msk0): Marvell Yukon 88E8053 Gigabit Ethernet

    pfSense is installed on an HDD.

    Dec 30 20:30:13	php: /index.php: Successful webConfigurator login for user 'admin' from 192.168.1.155
    Dec 30 20:30:13	php: /index.php: Successful webConfigurator login for user 'admin' from 192.168.1.155
    Dec 30 14:30:13	sshlockout[24580]: sshlockout/webConfigurator v3.0 starting up
    Dec 30 20:31:02	kernel: msk0: watchdog timeout
    Dec 30 20:31:02	kernel: msk0: link state changed to DOWN
    Dec 30 20:31:02	check_reload_status: Linkup starting msk0
    Dec 30 14:31:05	php: : Hotplug event detected for lan but ignoring since interface is configured with static IP (192.168.1.1)
    Dec 30 20:31:06	check_reload_status: Linkup starting msk0
    Dec 30 20:31:06	kernel: msk0: link state changed to UP
    Dec 30 14:31:09	php: : Hotplug event detected for lan but ignoring since interface is configured with static IP (192.168.1.1)
    Dec 30 20:31:09	check_reload_status: rc.newwanip starting msk0
    Dec 30 14:31:11	php: : rc.newwanip: Informational is starting msk0.
    Dec 30 14:31:11	php: : rc.newwanip: on (IP address: 192.168.1.1) (interface: lan) (real interface: msk0).
    Dec 30 20:31:11	apinger: Exiting on signal 15.
    Dec 30 20:31:12	check_reload_status: Reloading filter
    Dec 30 14:31:12	apinger: Starting Alarm Pinger, apinger(53481)
    Dec 30 20:31:50	syslogd: exiting on signal 15
    


  • From the looks of that, I'd guess you're hitting a problem in the msk driver. Upgrading to 2.1 would be the first thing I would try since it has newer drivers. It also may be that the NIC, its cable or switch port has problems.
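
    To get a feel for how often the driver is tripping, something like the following could tally watchdog timeouts per interface. This is only a rough sketch: `count_watchdog_timeouts` is a made-up helper name, and it assumes the log lines look like the ones pasted above.

```shell
#!/bin/sh
# Hypothetical helper: reads syslog-style lines on stdin and prints
# "<interface> <count>" for every NIC that logged a watchdog timeout.
count_watchdog_timeouts() {
    awk '/kernel: [a-z]+[0-9]+: watchdog timeout/ {
             # The interface name is the field right after "kernel:".
             for (i = 1; i <= NF; i++) {
                 if ($i == "kernel:") {
                     nic = $(i + 1)
                     sub(/:$/, "", nic)   # strip the trailing colon
                     n[nic]++
                 }
             }
         }
         END { for (nic in n) print nic, n[nic] }'
}
```

    On pfSense releases of this era the system log is, as far as I know, a circular log read with `clog`, so the call would probably look like `clog /var/log/system.log | count_watchdog_timeouts`. Treat that invocation as an assumption and adjust it to however your install exposes the log.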


  • Netgate Administrator

    The new drivers didn't help in the Watchguard box with that same NIC.
    Instead try disabling MSI for that interface. Put:

    hw.msk.msi_disable=1
    

    In the file: /boot/loader.conf.local
    You will probably have to create that file.

    Steve
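
    If you would rather do it from a shell than upload a file, a small sketch like this would add the line only when it is not already there. `add_msk_tunable` is a hypothetical helper name, not something that ships with pfSense, and the file path is passed in so it can be tried on a scratch file first.

```shell
#!/bin/sh
# Hypothetical helper: append hw.msk.msi_disable=1 to the given
# loader.conf-style file unless a line for that tunable already exists.
add_msk_tunable() {
    conf="$1"
    # Create the file if it does not exist yet.
    [ -f "$conf" ] || : > "$conf"
    if grep -q '^hw\.msk\.msi_disable=' "$conf"; then
        echo "hw.msk.msi_disable already present in $conf"
    else
        echo 'hw.msk.msi_disable=1' >> "$conf"
        echo "added hw.msk.msi_disable=1 to $conf"
    fi
}
```

    Run as root, `add_msk_tunable /boot/loader.conf.local` would create the file if needed and add the setting once. Loader tunables are read at boot, so a reboot is needed before it takes effect.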



  • @cmb:

    From the looks of that, I'd guess you're hitting a problem in the msk driver. Upgrading to 2.1 would be the first thing I would try since it has newer drivers. It also may be that the NIC, its cable or switch port has problems.

    I'm not going to rule out anything. Thanks for the reply. Is 2.1 a beta release? I only see 2.0.2 on the download page, and the system says I'm on the most recent version. (Found it!) I'm going to try the less invasive suggestion above first. I've also replaced the cable with a newer one, which I know is bad troubleshooting, but since the box works perfectly for a random amount of time I see the cable as the least likely of the possible problems, and a newer cable rarely hurts anyway. ;-)



  • @stephenw10:

    The new drivers didn't help in the Watchguard box with that same NIC.
    Instead try disabling MSI for that interface. Put:

    hw.msk.msi_disable=1
    

    In the file: /boot/loader.conf.local
    You will probably have to create that file.

    Steve

    Thanks for the reply. I've implemented this using the filer package to create/upload the file. We'll see what happens.



  • @AHrubik:

    Thanks for the reply. I've implemented this using the filer package to create/upload the file. We'll see what happens.

    That seems to have worked! It's been stable for over 48 hours, which is much longer than it ever managed before. Do you mind explaining (Reader's Digest version) what you had me do there?



  • That disables MSI in the NIC driver, which I guess either the driver or the hardware has problems with. MSI is:
    http://en.wikipedia.org/wiki/Message_Signaled_Interrupts



  • @cmb:

    That disables MSI in the NIC driver, which I guess either the driver or the hardware has problems with. MSI is:
    http://en.wikipedia.org/wiki/Message_Signaled_Interrupts

    Thank you both for your time and help. As of this post it's still up and running without issue.

