Kernel: msk1: watchdog timeout



  • Hi, sometimes my WAN connection fails with this error in the system logs:
    Jun 15 08:03:49 kernel: msk1: watchdog timeout
    Jun 15 08:03:49 kernel: msk1: link state changed to DOWN
    Jun 15 08:03:49 check_reload_status: Linkup starting msk1
    Jun 15 08:03:51 check_reload_status: Linkup starting msk1
    Jun 15 08:03:51 kernel: msk1: link state changed to UP

    It stays in this loop until I reboot the pfSense box.

    I've read some old posts, but they all refer to version 1.
    What could the problem be? The strange thing is that the LAN interface stays up: I can log in to the web interface (which is where I reboot the machine from) and all LAN services (DNS, DHCP) are OK, but the WAN is down.
    I don't think it's a driver problem, because the LAN and WAN interfaces use the same onboard network chip (Commel 673).
    Thanks


  • Netgate Administrator

    I've seen those errors on my box. They usually happen only when I stress the interface with a lot of traffic. There should be code in 2.1 to prevent it from happening. Some people have reported success by disabling MSI or MSI-X (message-signaled interrupts). There is a tunable for the msk driver:

    
    hw.msk.msi_disable=1
    
    

    Or you can disable it for everything:

    
    hw.pci.enable_msi=0
    hw.pci.enable_msix=0
    
    

    These lines should be put in /boot/loader.conf.local; create the file if it doesn't exist.

    Steve
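    Creating the file by hand with any editor is all that's needed. As a convenience, here is a minimal sketch of a hypothetical `set_loader_tunable` helper (not part of pfSense or FreeBSD) written for POSIX /bin/sh. It assumes one `name=value` entry per line, creates the file if it's missing, and replaces an existing line for the same tunable instead of adding a duplicate.

```shell
#!/bin/sh
# Hypothetical helper (not part of pfSense): set_loader_tunable FILE NAME VALUE
# Writes NAME=VALUE into a loader config file, creating the file if needed
# and replacing any earlier line for the same tunable, so re-running is safe.
set_loader_tunable() {
    file=$1 name=$2 value=$3
    tmp="$file.tmp.$$"
    [ -e "$file" ] || : > "$file"     # create the file if it doesn't exist
    # Keep every line whose name (text before the first "=") differs from NAME.
    awk -F= -v n="$name" '$1 != n' "$file" > "$tmp" || return 1
    printf '%s=%s\n' "$name" "$value" >> "$tmp"
    # Copy back with cat rather than mv so the file keeps its original mode.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example (on the pfSense box):
#   set_loader_tunable /boot/loader.conf.local hw.msk.msi_disable 1
```

    The loader only reads these files at boot, so nothing changes until the box is rebooted.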



  • Thanks, I'll try that!



  • Hi, I've created the file, but how can I verify that it was applied correctly?
    Tonight I'll see if it works.


  • Netgate Administrator

    You can use the sysctl command to check from the console:

    
    [2.0.1-RELEASE][root@pfsense.fire.box]/root(1): sysctl hw.pci
    hw.pci.usb_early_takeover: 1
    hw.pci.honor_msi_blacklist: 1
    hw.pci.enable_msix: 1
    hw.pci.enable_msi: 1
    hw.pci.do_power_resume: 1
    hw.pci.do_power_nodriver: 0
    hw.pci.enable_io_modes: 1
    hw.pci.default_vgapci_unit: -1
    hw.pci.host_mem_start: 2147483648
    hw.pci.mcfg: 1
    hw.pci.irq_override_mask: 57080
    
    

    Steve
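    If you only care about the two MSI settings, you can pass just those names to sysctl instead of the whole hw.pci tree. The sketch below is a hypothetical `msi_status` function (the name is made up for illustration; POSIX sh plus awk) that reads sysctl's `name: value` output and exits 0 only when both hw.pci.enable_msi and hw.pci.enable_msix are 0, which makes the check quick to repeat after a reboot.

```shell
#!/bin/sh
# Hypothetical helper: msi_status
# Reads "name: value" lines, as printed by sysctl(8), on stdin and prints the
# two MSI settings. Exit status 0 = MSI and MSI-X both disabled,
# 1 = at least one is still enabled or missing from the input.
msi_status() {
    awk -F': ' '
        $1 == "hw.pci.enable_msi"  { msi = $2 }
        $1 == "hw.pci.enable_msix" { msix = $2 }
        END {
            printf "enable_msi=%s enable_msix=%s\n", msi, msix
            exit !(msi == "0" && msix == "0")
        }'
}

# Example (on the pfSense console):
#   sysctl hw.pci.enable_msi hw.pci.enable_msix | msi_status && echo "MSI off"
```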



    I ran the command and the output is exactly the same as yours, so I think /boot/loader.conf.local isn't working (in fact it crashed again last night).
    How is that possible? I created /boot/loader.conf.local with these lines:

    hw.pci.enable_msi=0
    hw.pci.enable_msix=0
    

    Do I also need to change /boot/loader.conf or some other file?


  • Netgate Administrator

    Hmm,
    Some tunables can be set by adding them to the table in the webGUI under System: Advanced: System Tunables.
    Just create new entries for hw.pci.enable_msi etc.
    However, I had thought these needed to be set as boot loader entries.
    In fact, that is the instruction given here: http://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards
    Does your file look exactly like the example given in the docs?

    Steve



  • @Gabri.91:

    I ran the command and the output is exactly the same as yours, so I think /boot/loader.conf.local isn't working (in fact it crashed again last night).

    /boot/loader.conf.local is processed only at boot time. Hence your change to it won't take effect until the system is rebooted.

    When you wrote that /boot/loader.conf.local isn't working, did you mean (assuming you rebooted so the changed file would take effect):
    1. the change didn't stop the crash?
    2. the change didn't seem to affect the sysctl values?



  • @wallabybob:

    /boot/loader.conf.local is processed only at boot time. Hence your change to it won't take effect until the system is rebooted.
    When you wrote that /boot/loader.conf.local isn't working, did you mean (assuming you rebooted so the changed file would take effect):
    1. the change didn't stop the crash?
    2. the change didn't seem to affect the sysctl values?

    Of course I rebooted the pfSense box.
    I mean both: the change didn't stop the crash, and it didn't seem to affect the sysctl values.



  • Is there no solution? The problem is still there.


  • Netgate Administrator

    I have a patched msk driver, which I made for the Firebox NICs, that you could try. I also modified the LED register settings, but that won't affect operation. See:
    http://forum.pfsense.org/index.php/topic,20095.msg273691.html#msg273691
    I have not yet had any reports of it working or not.

    Steve



  • OK, I can try that, but why aren't the loader.conf.local settings being loaded?
    I'd prefer to get that working before trying a patched driver.


  • Netgate Administrator

    Some random weirdness!  ::)
    Seriously I have no idea, it should work. Permissions problem?
    Try putting them in /boot/loader.conf instead. That file will definitely be read and acted upon. It may be overwritten by a later firmware update, but it will prove the method.

    Steve
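    Before falling back to loader.conf, a quick look at the .local file itself may help spot a typo or formatting problem. This is a minimal sketch of a hypothetical `check_loader_conf` function for POSIX sh: it reports a missing file, carriage returns (for example from a file edited on Windows), and non-comment lines that don't look like `name=value`. Whether any of these explains the problem in this thread is an assumption; a clean result does not prove the loader reads the file.

```shell
#!/bin/sh
# Hypothetical helper: check_loader_conf FILE
# Basic sanity checks on a loader config file. It only catches obvious
# mistakes; it cannot prove that the boot loader actually reads the file.
check_loader_conf() {
    file=$1
    if [ ! -f "$file" ]; then
        echo "missing: $file"
        return 1
    fi
    # Flag CR characters and any non-blank, non-comment line that
    # doesn't start with name= (quoted or unquoted values are both allowed).
    awk '
        /\r/ { printf "line %d: carriage return\n", NR; bad = 1; next }
        /^[ \t]*(#|$)/ { next }
        !/^[A-Za-z_][A-Za-z0-9_.]*=/ { printf "line %d: not name=value: %s\n", NR, $0; bad = 1 }
        END { exit bad }' "$file"
}

# Example (on the pfSense console):
#   check_loader_conf /boot/loader.conf.local && echo "looks OK"
```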



  • OK, after putting them in loader.conf, sysctl shows the settings were loaded correctly:

    hw.pci.enable_msix: 0
    hw.pci.enable_msi: 0
    

    I'll let you know if it solves the problem.



  • With these settings it never crashes and it's perfect! I'd suggest trying them before changing the driver or trashing your firewall. :)


  • Netgate Administrator

    Good to know. Thanks for reporting back.
    Disabling MSI like that will presumably increase the CPU interrupt load. Have you noticed any increased load?

    Steve



  • I don't know, because it's a Pentium M @ 1.7 GHz and a normal ADSL/Hyperlan connection will never put a heavy load on it.
    I'll let you know in the next few weeks/months, when I add another NIC for a DMZ and the CPU is more loaded.


  • Netgate Administrator

    It would be interesting to see if you can achieve the same thing using instead:

    hw.msk.msi_disable=1
    

    This would disable MSI only for the msk NICs rather than for all PCI devices, which is a cleaner solution.

    Steve



  • I'd prefer not to touch something that works well. ;)
    If it helps, I haven't seen any added delay or slowdown in ping, speed tests, browsing, etc. since changing the values.

