Yet another sk0 (Dlink DGE-530T) driver issue on pfSense 2.0



  • Hi folks,

    I've been suffering with pfSense 2.0 (inc betas) dropping my LAN interface after a week or so of working properly ever since upgrading from 1.2.x.  When pfSense drops the nic I can't ping via that nic.

    I have now tried 4 gigabit network cards - 3 Dlink DGE-530T and 1 SMC 9452TX gigabit nic which looks almost the same as the dlink cards.  They all have the Marvell Yukon chipset and all use the sk driver.  3 of the nics are brand new.  I had no issues with 1.2.x with the original Dlink nic.

    I have discovered that many others have had the same issue (some with DOWN/UP cycles), and that an ifconfig sk0 down / ifconfig sk0 up will bring the nic back to life.

    So there definitely looks like there is an issue with this sk driver.

    It might be fixed in FreeBSD 8.2+, the question is how can I get a new updated driver working on my pfSense 2.0 box?  Alternatively, how can I get the 1.2.x driver working instead?

    I understand the newer driver might have to be compiled to get it to work, but I don't know how to do this.

    Can someone please give a guide on how to try newer drivers?

    Alternatively, how can I try the driver from 1.2.x?

    To clarify, by driver I mean this: http://www.freebsd.org/cgi/man.cgi?query=sk&apropos=0&sektion=0&manpath=FreeBSD+8.1-RELEASE&arch=default&format=html

    Others with similar issues:
    http://forum.pfsense.org/index.php/topic,41215.0.html
    http://forum.pfsense.org/index.php/topic,42942.0.html
    http://forum.pfsense.org/index.php/topic,40147.0.html
    http://forum.pfsense.org/index.php/topic,42865.0.html

    Probably more, but that should illustrate the point.

    While waiting for assistance to change the driver, is there a way I can get pfSense to automatically execute a script doing the ifconfig sk0 down/up cycle when the nic gets dropped (it shows up in the logs as a hotplug event)?  apinger works for my WAN interfaces, is there a way to make it work for the LAN interface too?

    In summary I'm looking for:
    1. Way to use a newer sk driver
    2. Way to use the old sk driver from 1.2.x
    3. Workaround script / apinger reconfiguration that cycles the nic automatically.

    Thank you very much in advance,

    Best Regards,

    Vent

    PS. my first post, apologies in advance if I have done something wrong or stepped on the wrong toes or posted to the wrong forum etc, please forgive me!


  • Netgate Administrator

    There's a lot more to running a newer driver than just compiling it unfortunately. This is made all the harder by this:
    @FreeBSD:

    The miibus(4) has been rewritten for the generic IEEE 802.3 annex 31B full duplex flow control support. The alc(4), bge(4), bce(4), cas(4), fxp(4), gem(4), jme(4), msk(4), nfe(4), re(4), stge(4), and xl(4) drivers along with atphy(4), bmtphy(4), brgphy(4), e1000phy(4), gentbi(4), inphy(4), ip1000phy(4), jmphy(4), nsgphy(4), nsphyter(4), and rgephy(4) have been updated to support flow control via this facility

    Though I note that sk(4) is not in that list.

    You could wait for the first builds of 2.1 on FreeBSD 9. No time frame for that though.

    Have you tried any tunables? Do your logs show any errors when it stops responding?

    Steve



  • Hi Steve,

    Yes, I'm beginning to realise there is a lot more to it than simply adding a new file somewhere in the system and getting it to work, I've been reading your posts!

    The first indication of something going wrong is there's a hotplug event detected for lan but ignoring as static ip.

    When I saw that I did as others did by turning off all power management I could find in the bios, but that didn't help.

    From the man page it looks like the only tunables are to disable jumbo frames and I don't use those, so don't think that would help.  The other thing would be to force full duplex gigabit with slave / master options, but honestly, I'm not holding out much hope of these fixing the issue.

    I'm beginning to think the sk driver is just unusable and/or unreliable for pfSense 2.0 and the only realistic option is to just go and buy a different card altogether or revert back to 1.2.x

    How about helping out with option 3 and that is to get apinger / other script to cycle the nic down/up? That might provide a workaround in the meantime?

    Many thanks for your response and time,

    Vent


  • Netgate Administrator

    There is also:

    dev.skc.%d.int_mod
        This variable controls interrupt moderation.  The accepted range
        is 10 to 10000.  The default value is 100 microseconds.  The
        interface has to be brought down and up again before a change
        takes effect.

    Interestingly the default value under the Linux sk98lin driver is 500µs.

    I have been using the sk(4) driver with 2.0 on my test box with no problems at all, unlike the msk(4) driver which freezes all the time!  ::)

    Steve



  • Hi,

    I think if the interface was cycling between DOWN and UP the interrupt moderation tuning parameter might be helpful, but in my case the interface goes down and stays down, so even a long time might not work.

    It's worth a try though, where / how can I configure this?

    Secondly, I have been looking at /etc/rc.linkup and it looks to have the exact event I'm seeing in my logs:

    function handle_argument_group($iface, $argument2) {
    	global $config;	
    
    	$ipaddr = $config['interfaces'][$iface]['ipaddr'];
    	if (is_ipaddr($ipaddr) || empty($ipaddr)) {
    		log_error("Hotplug event detected for {$iface} but ignoring since interface is configured with static IP ({$ipaddr})");
    		interfaces_staticarp_configure($iface);
    		$iface = get_real_interface($iface);
    		interfaces_bring_up($iface);
    		if ($argument2 == "start" || $argument2 == "up")
    			send_event("interface newip {$iface}");
    

    There are other interesting functions in there to bring the interface down or up depending on the second variable.

    I might not be reading this correctly, but it looks like it tries to bring the interface up after logging the error.  This is what I'm looking for if it does do that, but how do I get this to run?  It's almost as if the script is doing the opposite of ignoring the event as it says in the log line and goes ahead and tries to bring the interface up anyway.

    I am confused!

    This is the first time I'm looking at pfSense code and I'm way too new to this to try and play with the code, so can you or anyone offer any advice or suggestions?

    Thanks in advance,

    Vent


  • Netgate Administrator

    @Ventolin:

    It's worth a try though, where / how can I configure this?

    You configure it using the sysctl command:

    sysctl dev.skc.0.int_mod
    

    Should give the current value.

    sysctl dev.skc.0.int_mod=500
    

    Sets it to 500.

    To set it permanently you can add this as a new value in System: Advanced: System Tunables.
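
    As a rough sketch of the steps above, the value can be checked against the range given in the sk(4) man page excerpt quoted earlier in the thread (10 to 10000) before applying it, followed by the down/up cycle the man page says is needed. The helper name set_int_mod is made up for this example, and it only prints the commands rather than running them, so nothing changes until you paste them in as root:

    ```shell
    #!/bin/sh
    # set_int_mod is a hypothetical helper, not a pfSense or FreeBSD command.
    # It only prints the commands so they can be reviewed before being run.

    set_int_mod() {
        # $1 = controller unit (0 for skc0), $2 = interface (e.g. sk0),
        # $3 = interrupt moderation value in microseconds
        if [ "$3" -lt 10 ] || [ "$3" -gt 10000 ]; then
            echo "int_mod must be between 10 and 10000" >&2
            return 1
        fi
        echo "sysctl dev.skc.$1.int_mod=$3"
        # Per the man page, the change only takes effect after a down/up cycle.
        echo "ifconfig $2 down"
        echo "ifconfig $2 up"
    }

    set_int_mod 0 sk0 500
    ```

    Bear in mind that cycling the LAN interface from a remote session will briefly cut that session off.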

    Re-writing parts of pfSense is beyond my usual level of tinkering ;) It shouldn't have to do this. If the interface were actually going down and then coming back up it would reload no problem. Adding code to reset the interface when it sees a hotplug event would cause flapping if you actually unplugged the cable. Perhaps if you posted some of your logs it would be helpful. Also the relevant sections from the boot log:

     cat /var/log/dmesg.boot | grep sk0
    

    And maybe:

    pciconf -lv | grep -A 4 skc
    

    Steve



  • Hi,

    Thanks Steve for your help again.

    sysctl dev.skc.0.int_mod
    

    returns 100

    sysctl -w dev.skc.0.int_mod=500
    

    sets it to 500
    (quick man sysctl was needed for correct syntax :P)

    From the logs, this is what I get…

    Dec  1 18:51:17 myfirewall check_reload_status: Linkup starting sk0
    Dec  1 18:51:17 myfirewall kernel: sk0: link state changed to DOWN
    Dec  1 18:51:24 myfirewall php: : Hotplug event detected for lan but ignoring since interface is configured with static IP (192.168.0.3)
    

    After that, nothing until I got home and typed:

    ifconfig sk0 up
    

    and got this in the logs:

    Dec  1 22:25:30 myfirewall check_reload_status: Reloading filter
    Dec  1 22:26:31 myfirewall check_reload_status: Reloading filter
    Dec  1 22:32:06 myfirewall check_reload_status: Linkup starting sk0
    Dec  1 22:32:06 myfirewall kernel: sk0: link state changed to UP
    Dec  1 22:32:17 myfirewall php: : Hotplug event detected for lan but ignoring since interface is configured with static IP (192.168.0.3)
    Dec  1 22:32:17 myfirewall check_reload_status: rc.newwanip starting sk0
    Dec  1 22:32:29 myfirewall php: : rc.newwanip: Informational is starting sk0.
    Dec  1 22:32:29 myfirewall php: : rc.newwanip: on (IP address: 192.168.0.3) (interface: lan) (real interface: sk0).
    

    and then some stuff about routing, apinger restarting, dnsmasq starting etc.
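
    To get a quick list of when the link changed state, the kernel lines can be filtered out of the log. This is only a sketch: link_events is a made-up helper name, and it assumes the log arrives as plain text on stdin in the same layout as the lines above.

    ```shell
    #!/bin/sh
    # link_events is a hypothetical helper: it reads syslog-style lines on stdin
    # and prints the timestamp and new state of each link change for one interface.

    link_events() {
        # $1 = interface name, e.g. sk0
        awk -v ifn="$1:" '
            $5 == "kernel:" && $6 == ifn && /link state changed to/ {
                print $1, $2, $3, $NF
            }'
    }
    ```

    It would be used along the lines of `clog /var/log/system.log | link_events sk0`, assuming the system log on your install is a circular log that has to be read with clog; with a plain-text log, cat would do instead.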

    From dmesg.boot:

    skc0: <Marvell Gigabit Ethernet> port 0xd400-0xd4ff mem 0xd8000000-0xd8003fff irq 10 at device 9.0 on pci0
    skc0: Marvell Yukon Gigabit Ethernet rev. (0x1)
    sk0: <Marvell Semiconductor, Inc. Yukon> on skc0
    miibus0: <MII bus> on sk0
    e1000phy0: <Marvell 88E1011 Gigabit PHY> PHY 0 on miibus0
    e1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
    skc0: [ITHREAD]
    skc1: <D-Link DGE-530T Gigabit Ethernet> port 0xe000-0xe0ff mem 0xd8004000-0xd8007fff irq 11 at device 15.0 on pci0
    skc1: DGE-530T Gigabit Ethernet Adapter rev. (0x1)
    sk1: <Marvell Semiconductor, Inc. Yukon> on skc1
    miibus3: <MII bus> on sk1
    e1000phy1: <Marvell 88E1011 Gigabit PHY> PHY 0 on miibus3
    e1000phy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
    skc1: [ITHREAD]
    skc2: <D-Link DGE-530T Gigabit Ethernet> port 0xe400-0xe4ff mem 0xd8008000-0xd800bfff irq 10 at device 17.0 on pci0
    skc2: DGE-530T Gigabit Ethernet Adapter rev. (0x1)
    sk2: <Marvell Semiconductor, Inc. Yukon> on skc2
    miibus4: <MII bus> on sk2
    e1000phy2: <Marvell 88E1011 Gigabit PHY> PHY 0 on miibus4
    e1000phy2:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
    skc2: [ITHREAD]
    

    I have 3 gigabit nics in there and 2 10/100 nics (not Marvell/Yukon), a total of 5 nics.
    sk0 is currently a brand new SMC 9452TX gigabit card, as mentioned before.  I put the extra cards in there for quick testing of other nics to see if I could isolate the problem but all 3 cards do the same thing (even when installed 1 at a time) which led me to suspect the sk driver.

    From pciconf:

    skc0@pci0:0:9:0:        class=0x020000 card=0xb45210b8 chip=0x432011ab rev=0x12 hdr=0x00
        class      = network
        subclass   = ethernet
    skc1@pci0:0:15:0:       class=0x020000 card=0x4c001186 chip=0x4c001186 rev=0x11 hdr=0x00
        class      = network
        subclass   = ethernet
    skc2@pci0:0:17:0:       class=0x020000 card=0x4c001186 chip=0x4c001186 rev=0x11 hdr=0x00
        class      = network
        subclass   = ethernet
    
    

    If a script were triggered by the hotplug event, I wouldn't mind the consequence of the interface 'flapping' when I unplug the network cable; I only ever do that when the box is off and I'm changing network cards.  Very much the lesser of 2 evils.

    Thank you very much for your help,

    Regards,

    Vent


  • Netgate Administrator

    Hmm, well I'm at the end of my diagnostic skills I'm afraid.
    My own interfaces:

    
    skc0: <Marvell Gigabit Ethernet> port 0xc000-0xc0ff mem 0xd042c000-0xd042ffff irq 16 at device 0.0 on pci5
    skc0: Marvell Yukon Lite Gigabit Ethernet rev. (0x9)
    sk0: <Marvell Semiconductor, Inc. Yukon> on skc0
    e1000phy4: <Marvell 88E1011 Gigabit PHY> PHY 0 on miibus4
    
    skc0@pci0:5:0:0:	class=0x020000 card=0x43201148 chip=0x432011ab rev=0x13 hdr=0x00
    

    Slightly different from yours and working perfectly. Not much help to you though!

    Steve



  • Hi,

    Just to report sysctl -w dev.skc.0.int_mod=500 didn't work, though I got 20 days uptime this time.

    Have updated to pfsense 2.0.1 now, maybe that will be better.

    Would really like to know how to write that script.

    Would be even better if the sk driver was updated or fixed.  How do I make a bug report?

    To be honest, I'm either going to put a cheap realtek nic in or build a second pfsense box and use CARP failover between them.

    Will report back if any further issues.


  • Netgate Administrator

    It's unlikely a bug report on 2.0.X would be acted on at this point since builds of 2.1 on FreeBSD 9 are now getting close. CMB recently posted 'by the end of the year'.
    These will be the first builds for testing only and will probably have many bugs but will have much newer drivers. And by running these and reporting the bugs you will be helping out everyone.  :)

    Steve

    Edit: However now I can't find where I read that! A tweet maybe? Maybe it wasn't CMB.  ::) Fairly sure I did read it though!



  • Hi,

    Well 2.0.1 lasted 2 days of uptime, so the sk bug is still there, perhaps unsurprisingly.

    So, I did some digging and learnt how to write a shell script which simply pings an IP on the LAN 3 times (with a delay of 1 second) and then does an ifconfig sk0 down then ifconfig sk0 up if the ping fails.  There's a 2 second wait between the down and up commands in case the driver needs a bit of time to work.

    I then hooked the script up to cron so it could run every minute.

    So, here's a quick guide on how to keep your breaking sk driver working.

    1. Make a new file called ifcheck.sh in /usr/local/bin

    2. copy and paste the following code in:

    #!/bin/sh
    #set logfile here or uncomment second line for no logging
    LOGFILE=/tmp/ifcheck.log
    #LOGFILE=/dev/null
    
    #Set primary interface/ip to check here
    IF1=sk0
    IP1=192.168.0.10
    
    #add more interfaces/ips here
    #IF2=rl0
    #IP2=192.168.1.254
    
    #uncomment next line for debugging
    #echo $(date) "pinging interfaces..." >> $LOGFILE
    
    ping -c 3 -t 1 $IP1 > /dev/null 2>&1 || (echo $(date) "$IF1 DOWN, bouncing..." >> $LOGFILE && /sbin/ifconfig $IF1 down && sleep 2 && /sbin/ifconfig $IF1 up && echo $(date) "$IF1 set to UP" >> $LOGFILE)
    #ping -c 3 -t 1 $IP2 > /dev/null 2>&1 || (echo $(date) "$IF2 DOWN, bouncing..." >> $LOGFILE && /sbin/ifconfig $IF2 down && sleep 2 && /sbin/ifconfig $IF2 up && echo $(date) "$IF2 set to UP" >> $LOGFILE)
    
    

    3. reconfigure the script to your chosen interface name and ip address - you can uncomment the second ping and variables to test another interface and extend to test more.

    I used the Easy Editor ee from the console shell prompt to write the script instead of the Virtually Impossible vi editor because I didn't want to take 10 years to master vi to do a simple edit when I could spend 5 seconds doing the same thing in ee :P

    4. from a shell prompt type the following to make the script executable:

    chmod 755 /usr/local/bin/ifcheck.sh
    

    5. from the web interface add the cron package

    6. Add a new entry to the cron table:
    minute */1
    hour *
    mday *
    month *
    wday *
    who root
    command /usr/local/bin/ifcheck.sh

    7. Save

    8. To test, uncomment the debugging line like this:

    #uncomment next line for debugging
    echo $(date) "pinging interfaces..." >> $LOGFILE
    

    This will write to a log file: /tmp/ifcheck.log

    With these cron settings the script will fire every minute.  Once you're satisfied the script is working, comment the debug line out again so that only errors go to the log file, which also saves space.
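
    For anyone who finds the one-liner hard to read, here is a sketch of the same logic split into small functions. The names DRY_RUN, PROBE, run, bounce_iface and check_iface are made up for this example and are not part of pfSense. DRY_RUN defaults to 1, so the ifconfig commands are only printed until you switch it to 0.

    ```shell
    #!/bin/sh
    # Sketch only: DRY_RUN, PROBE, run, bounce_iface and check_iface are
    # illustrative names introduced here, not part of pfSense.

    DRY_RUN=${DRY_RUN:-1}                 # 1 = print commands, 0 = really run them
    PROBE=${PROBE:-"ping -c 3 -t 1"}      # same ping options as the original script

    run() {
        if [ "$DRY_RUN" = "1" ]; then
            echo "would run: $*"
        else
            "$@"
        fi
    }

    bounce_iface() {
        # $1 = interface name; stop at the first command that fails
        run /sbin/ifconfig "$1" down &&
            run sleep 2 &&
            run /sbin/ifconfig "$1" up
    }

    check_iface() {
        # $1 = interface name, $2 = IP address expected to answer via that interface
        if $PROBE "$2" > /dev/null 2>&1; then
            return 0                      # ping succeeded, nothing to do
        fi
        echo "$1 DOWN, bouncing..."
        bounce_iface "$1" && echo "$1 set to UP"
    }
    ```

    To use it from cron you would add a line such as `check_iface sk0 192.168.0.10` at the end and set DRY_RUN=0. One thing to be aware of: FreeBSD's ping(8) documents -t as an overall timeout in seconds, so `-c 3 -t 1` may give up before all three packets have gone out; raising -t is something to try if the interface gets bounced too eagerly.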

    You could probably use:

    /usr/bin/nice -n20 /usr/local/bin/ifcheck.sh

    in the cron entry, which I think will make the process run with a lower priority, but I haven't tested that yet.
    
    The script should be self explanatory.  If the ping fails, ping exits with a non-zero status, which triggers everything after the || to run.
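
    A tiny demonstration of that behaviour, using `true` and `false` as stand-ins for a ping that succeeds or fails (report is just a made-up helper name for this example):

    ```shell
    #!/bin/sh
    # "true" exits with status 0 and "false" with status 1, like a successful
    # or failed ping. report is a hypothetical helper for this demonstration.

    report() {
        # Run the given command; the part after || only runs if it exits non-zero.
        "$@" > /dev/null 2>&1 || (echo "check failed" && echo "recovery steps ran")
    }

    report true     # prints nothing
    report false    # prints "check failed" then "recovery steps ran"
    ```

    Inside the brackets each && only carries on if the previous command succeeded, so in the real script a failed ifconfig down would stop the rest of the chain.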
    
    OK, standard disclaimer, if it breaks your system sorry about that, use at your own risk.
    
    To write the script I did quite a bit of googling and took bits of code from other slightly more complicated scripts and kept things simple.
    
    If anyone has any suggestions to make it better, please just add your thoughts and/or improvements, it will be very welcome.  Feel free to use / modify as you wish.
    
    Thanks everyone, esp stephenw10.
    
    Vent


  • OK, this version logs to the syslog as well as the logfile:

    #!/bin/sh
    #set logfile here or uncomment second line for no logging
    LOGFILE=/tmp/ifcheck.log
    #LOGFILE=/dev/null
    
    #Set primary interface/ip to check here
    IF1=sk0
    IP1=192.168.0.10
    
    #add more interfaces/ips here
    #IF2=rl0
    #IP2=192.168.1.254
    
    #uncomment next line for debugging
    #echo $(date) "Pinging interfaces..." >> $LOGFILE
    #logger -t ifcheck Pinging interfaces...
    
    ping -c 3 -t 1 $IP1 > /dev/null 2>&1 || (logger -t ifcheck $IF1 DOWN, bouncing... && echo $(date) "$IF1 DOWN, bouncing..." >> $LOGFILE && /sbin/ifconfig $IF1 down && sleep 2 && /sbin/ifconfig $IF1 up && logger -t ifcheck $IF1 set to UP && echo $(date) "$IF1 set to UP" >> $LOGFILE)
    #ping -c 3 -t 1 $IP2 > /dev/null 2>&1 || (logger -t ifcheck $IF2 DOWN, bouncing... && echo $(date) "$IF2 DOWN, bouncing..." >> $LOGFILE && /sbin/ifconfig $IF2 down && sleep 2 && /sbin/ifconfig $IF2 up && logger -t ifcheck $IF2 set to UP && echo $(date) "$IF2 set to UP" >> $LOGFILE)
    
    


  • This version only logs to the Syslog and no log file:

    #!/bin/sh
    
    #Set primary interface/ip to check here
    IF1=sk0
    IP1=192.168.0.10
    
    #add more interfaces/ips here
    #IF2=rl0
    #IP2=192.168.1.254
    
    #uncomment next line for debugging
    #logger -t ifcheck Pinging interfaces...
    
    ping -c 3 -t 1 $IP1 > /dev/null 2>&1 || (logger -t ifcheck $IF1 DOWN, bouncing... && /sbin/ifconfig $IF1 down && sleep 2 && /sbin/ifconfig $IF1 up && logger -t ifcheck $IF1 set to UP)
    #ping -c 3 -t 1 $IP2 > /dev/null 2>&1 || (logger -t ifcheck $IF2 DOWN, bouncing... && /sbin/ifconfig $IF2 down && sleep 2 && /sbin/ifconfig $IF2 up && logger -t ifcheck $IF2 set to UP)
    
    

    Hope that helps someone,

    Regards,

    Vent



  • I found this post as I have the exact same problem. The posted solution works for my needs.

    Thanks Ventolin!!! ;D ;D

