Dual Atheros="hardware failure" & "device timeout" errors.



  • I've recently been testing WRAP.2C (v1.11) units running pfSense RC3e-embedded with two mini-PCI wireless cards installed (a random mix of CM9 & EMP-8602): I set ath0 on wan to 2.4 GHz, and ath1 on opt1 to 5.8 GHz 802.11a ad-hoc.  I noticed that when I boot up in the presence of other nearby, similarly configured units, I occasionally get errors on ath1 like this in the logs: "hardware failure on ath1; resetting", which repeats every so often; I've also seen it spam the serial console.  Note, this doesn't happen every time I boot, just sometimes.  And if I don't see this message in the logs after booting, then the units function normally until the next reboot/power-cycle, when I roll the dice again.  All other gui settings on opt1/ath1 are the defaults.

    A second problem occurs when two atheros cards are installed; on ath0 (set to wan, 2.4 GHz, ad-hoc, ofdm=cts/rts, others=defaults) I get this error in the logs: "kernel: ath0: device timeout", which may coincide with connections dropping (or lagging, ~3000ms ping) for 12-40 sec.  This happens on every unit, on interface ath0 (wan/2.4 GHz) only, at intervals ranging from a few minutes to several hours.  I tested one unit with ofdm=off and got significantly fewer of these errors (virtually zero), but link quality to distant nodes was much worse on average.

    However, when I test with only one atheros card using the exact same gui configuration (of either interface), I do not encounter these errors.

    Thanks, -pc
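
For comparing units or settings (for example ofdm=cts/rts against ofdm=off), it may help to count the two messages per interface rather than eyeballing the log. The following is a minimal sketch, not a tested pfSense tool: the function name count_ath_errors is hypothetical, and it assumes the log lines contain the interface name together with the exact phrases quoted above ("device timeout", "hardware failure").

```shell
#!/bin/sh
# Sketch: count Atheros driver errors for one interface in log text on stdin.
# Assumption: log lines look like the messages quoted in this post, e.g.
#   kernel: ath0: device timeout
#   kernel: hardware failure on ath1; resetting
# count_ath_errors is a hypothetical helper name, not part of pfSense.

count_ath_errors() {
    # $1 = interface name (e.g. ath0); log text is read from stdin
    awk -v ifc="$1" '
        index($0, ifc) && /device timeout/   { t++ }
        index($0, ifc) && /hardware failure/ { h++ }
        END { printf "%s timeouts=%d hwfail=%d\n", ifc, t, h }
    '
}

# Example use (assumes the system log can be dumped as plain text first):
#   dmesg | count_ath_errors ath0
#   dmesg | count_ath_errors ath1
```

Running it once per interface before and after a settings change gives a rough number to compare, which is easier to report than "significantly fewer".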



  • I've seen this with older pfSense builds as well.  I could never pin it down to any particular cause, but when it happens, all wireless traffic stops until the interface resets…which can sometimes take hours if left to itself.



  • Is there a good way to get around this error without making too drastic a change?



  • Just updated the first post with a second, perhaps unrelated, atheros device/driver error.



  • The only time I have seen issues with atheros cards was when the bios had not been updated to
    1.11 on the wrap. I'm not sure whether it would be the same on other hardware, but this is my experience with it.



  • I found that problem while running in the ad-hoc mode, switched it to access point and it disappeared.



  • I verified all wrap units have v1.11 bios.

    I could try changing ath1/5.x from ad-hoc to ap-client mode,
    but ath0/2.4x must remain ad-hoc because it's part of a mesh (olsr) network.

    Since this appears to be an Atheros driver issue, perhaps
    I should post this problem to the FreeBSD Mobile List?
    ( http://lists.freebsd.org/pipermail/freebsd-mobile/ )
    Although, I'm not certain I'm capable of convincing
    them it's a driver issue and not a pfSense or power issue.
    So, I'm going to wait before posting there and test
    again with a more powerful wrap power supply, see:
    http://forum.pfsense.org/index.php/topic,2647.msg15534.html#msg15534

    Thanks, -pc



  • @pcatiprodotnet:

    I verified all wrap units have v1.11 bios.

    I could try changing ath1/5.x from ad-hoc to ap-client mode,
    but ath0/2.4x must remain ad-hoc because it's part of a mesh (olsr) network.

    Since this appears to be an Atheros driver issue, perhaps
    I should post this problem to the FreeBSD Mobile List?
    ( http://lists.freebsd.org/pipermail/freebsd-mobile/ )
    Although, I'm not certain I'm capable of convincing
    them it's a driver issue and not a pfSense issue.

    Thanks, -pc

    Yes that sounds good.  I don't know what there is to convince … We are running 6.1 with no wireless modifications.



  • I was able to virtually eliminate this problem with the following quick-n-dirty shell scripts.
    It's assumed that you stop+start olsr on bootup via /usr/local/etc/rc.d/olsrd.sh
    These scripts will periodically reset the wireless interfaces, plus restart olsr every three days because it has a slow memory leak if using olsr's MID functionality.

    #!/bin/sh
    # Save file as: /usr/local/etc/rc.d/batch.sh
    # Required: chmod 555 /usr/local/etc/rc.d/batch.sh
    /root/ath0.sh &
    /root/ath1olsr.sh &

    #!/bin/sh
    # Save file as: /root/ath0.sh
    # Required: chmod 555 /root/ath0.sh
    ps -x >/tmp/batchath0.txt
    if grep -qi "sleep 3600" /tmp/batchath0.txt
    then echo "found myself already batched."
    else
        sleep 3600
    fi
    ps -x >/tmp/batchath0c.txt
    if grep -qi "sleep 3600" /tmp/batchath0c.txt
    then echo "found myself already batched."
    else
        # this forces an interface reset.
        ifconfig ath0 burst
        #ifconfig ath0 down
        #ifconfig ath0 up
        sleep 2
        /root/ath0.sh &
    fi
    # if olsr crashed/halted then restart it.
    ps -x >/tmp/batchath0b.txt
    if grep -qi "olsrd" /tmp/batchath0b.txt
    then echo "found olsr doesn't need restart."
    else
        olsrd -f /var/etc/olsrd.conf &
    fi

    #!/bin/sh
    # Save file as: /root/ath1olsr.sh
    # Required: chmod 555  /root/ath1olsr.sh
    # Required: chmod 555  /usr/local/etc/rc.d/olsrd.sh
    ps -x >/tmp/batchath1.txt
    if grep -qi "sleep 300000" /tmp/batchath1.txt
    then echo "found myself already batched."
    else
        sleep 300000
    fi
    ps -x >/tmp/batchath1.txt
    if grep -qi "sleep 300000" /tmp/batchath1.txt
    then echo "found myself already batched."
    else
        # this forces an interface reset on backbone (5.x/etc).
        #ifconfig ath1 burst
        #ifconfig ath1 down
        #ifconfig ath1 up
        #sleep 2
        # this restarts olsr because it has a slow memory leak. Or, reboot.
        /usr/local/etc/rc.d/olsrd.sh
        /root/ath1olsr.sh &
        #reboot
    fi
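
The scripts above avoid double-starting by grepping the ps output for their own sleep command. One possible alternative is a lock directory, since mkdir fails if the directory already exists. This is only a sketch under stated assumptions: LOCKDIR, INTERVAL, RESET_CMD and the function names are hypothetical, the default reset command is simply the "ifconfig ath0 burst" used above, and a lock left behind by a killed script would have to be removed by hand.

```shell
#!/bin/sh
# Sketch: single-instance periodic interface reset using a lock directory
# instead of grepping ps output. All names below are hypothetical.

LOCKDIR="${LOCKDIR:-/tmp/athreset.lock}"        # assumed lock location
INTERVAL="${INTERVAL:-3600}"                    # seconds between resets
RESET_CMD="${RESET_CMD:-ifconfig ath0 burst}"   # same reset as in ath0.sh

acquire_lock() {
    # mkdir fails if the directory already exists, so only one caller wins
    mkdir "$LOCKDIR" 2>/dev/null
}

release_lock() {
    rmdir "$LOCKDIR" 2>/dev/null
}

reset_loop() {
    # $1 = number of resets to perform; empty means loop forever
    count="${1:-}"
    acquire_lock || { echo "already running" >&2; return 1; }
    while [ -z "$count" ] || [ "$count" -gt 0 ]; do
        sleep "$INTERVAL"
        $RESET_CMD
        [ -n "$count" ] && count=$((count - 1))
    done
    release_lock
}

# Example use from a boot script (runs until killed):
#   reset_loop &
```

Compared with the originals, this keeps one long-running loop instead of having the script relaunch itself, and it does not cover the olsrd restart, which would still need its own check.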

