Dual Atheros: "hardware failure" & "device timeout" errors.
-
I've recently been testing WRAP.2C (v1.11) units running pfSense RC3e-embedded with two mini-PCI wireless cards installed (a random mix of CM9 and EMP-8602). I set ath0 on wan to 2.4 GHz, and ath1 on opt1 to 5.8 GHz 802.11a ad-hoc. When I boot up in the presence of other nearby, similarly configured units, I occasionally get errors on ath1 like this in the logs: "hardware failure on ath1; resetting", which repeats every so often; I've also seen it spam the serial console. This doesn't happen on every boot, just sometimes. If I don't see this message in the logs after booting, the units function normally until the next reboot/power-cycle, when I roll the dice again. All other GUI settings on opt1/ath1 are the defaults.
A second problem occurs when two Atheros cards are installed: on ath0 (set to wan, 2.4 GHz, ad-hoc, ofdm=cts/rts, others=defaults) I get this error in the logs: "kernel: ath0: device timeout", which may coincide with connections dropping (or lagging to ~3000 ms ping) for 12-40 seconds. This happens on every unit, on interface ath0 (wan/2.4 GHz) only, at intervals ranging from a few minutes to several hours. I tested one unit with ofdm=off and got significantly fewer of these errors (virtually zero), but link quality to distant nodes was much worse on average.
However, when I test with only one Atheros card using the exact same GUI configuration (of either interface), I do not encounter these errors.
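To compare units or settings (for example ofdm=cts/rts versus off), it may help to count how often each message appears per interface. The following is only a sketch: the function name count_ath_errors is made up for illustration, it matches just the two messages quoted above, and it assumes the log text is piped in on stdin.

```shell
#!/bin/sh
# Sketch: tally the two Atheros messages quoted in this thread, per interface.
# count_ath_errors is an illustrative name, not a pfSense or FreeBSD tool.
count_ath_errors() {
    awk '
        /device timeout/ || /hardware failure/ {
            # Pull out the first athN token on the line, if any.
            if (match($0, /ath[0-9]+/)) {
                ifname = substr($0, RSTART, RLENGTH)
                kind = ($0 ~ /device timeout/) ? "device_timeout" : "hardware_failure"
                count[ifname " " kind]++
            }
        }
        END { for (k in count) print k, count[k] }
    ' | sort
}
```

One possible use is `dmesg | count_ath_errors`, run at intervals on each unit; where the messages end up on a given build (kernel message buffer or system log) is an assumption to check locally.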
Thanks, -pc
-
I've seen this with older pfSense builds as well. I could never pin it down to any particular cause, but when it happens, all wireless traffic stops until the interface resets…which can sometimes take hours if left to itself.
-
Is there a good way to get around this error without making too drastic a change?
-
Just updated the first post with a second, perhaps unrelated, Atheros device/driver error.
-
The only time I have seen issues with Atheros cards was when the BIOS had not been updated to 1.11 on the WRAP. I'm not sure whether the same would apply to other hardware, but this is my experience with it.
-
I found that problem while running in ad-hoc mode; I switched it to access point and it disappeared.
-
I verified all wrap units have v1.11 bios.
I could try changing ath1/5.x from ad-hoc to ap-client mode,
but ath0/2.4x must remain ad-hoc because it's part of a mesh (olsr) network.
Since this appears to be an Atheros driver issue, perhaps
I should post this problem to the FreeBSD Mobile List?
( http://lists.freebsd.org/pipermail/freebsd-mobile/ )
Although, I'm not certain I'm capable of convincing
them it's a driver issue and not a pfSense or power issue.
So, I'm going to wait before posting there and test
again with a more powerful wrap power supply, see:
http://forum.pfsense.org/index.php/topic,2647.msg15534.html#msg15534
Thanks, -pc
-
Yes that sounds good. I don't know what there is to convince … We are running 6.1 with no wireless modifications.
-
I was able to virtually eliminate this problem with the following quick-n-dirty shell scripts.
It's assumed that you stop+start olsr on bootup via /usr/local/etc/rc.d/olsrd.sh
These scripts periodically reset the wireless interfaces, and restart olsr every three days because it has a slow memory leak when using olsr's MID functionality.

#!/bin/sh
# Save file as: /usr/local/etc/rc.d/batch.sh
# Required: chmod 555 /usr/local/etc/rc.d/batch.sh
/root/ath0.sh &
/root/ath1olsr.sh &

#!/bin/sh
# Save file as: /root/ath0.sh
# Required: chmod 555 /root/ath0.sh
ps -x >/tmp/batchath0.txt
if grep -qi "sleep 3600" /tmp/batchath0.txt
then echo "found myself already batched."
else
  sleep 3600
fi
ps -x >/tmp/batchath0c.txt
if grep -qi "sleep 3600" /tmp/batchath0c.txt
then echo "found myself already batched."
else
  # this forces an interface reset.
  ifconfig ath0 burst
  #ifconfig ath0 down
  #ifconfig ath0 up
  sleep 2
  /root/ath0.sh &
fi
# if olsr crashed/halted then restart it.
ps -x >/tmp/batchath0b.txt
if grep -qi "olsrd" /tmp/batchath0b.txt
then echo "found olsr doesn't need restart."
else
  olsrd -f /var/etc/olsrd.conf &
fi

#!/bin/sh
# Save file as: /root/ath1olsr.sh
# Required: chmod 555 /root/ath1olsr.sh
# Required: chmod 555 /usr/local/etc/rc.d/olsrd.sh
# Note: 300000 seconds is roughly 3.5 days.
ps -x >/tmp/batchath1.txt
if grep -qi "sleep 300000" /tmp/batchath1.txt
then echo "found myself already batched."
else
  sleep 300000
fi
ps -x >/tmp/batchath1.txt
if grep -qi "sleep 300000" /tmp/batchath1.txt
then echo "found myself already batched."
else
  # this forces an interface reset on backbone (5.x/etc).
  #ifconfig ath1 burst
  #ifconfig ath1 down
  #ifconfig ath1 up
  #sleep 2
  # this restarts olsr because it has a slow memory leak. Or, reboot.
  /usr/local/etc/rc.d/olsrd.sh
  /root/ath1olsr.sh &
  #reboot
fi
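
For anyone adapting these, the same periodic reset could probably be written as a single loop guarded by a PID file, instead of scripts that re-launch themselves and grep ps output for their own sleep. This is an untested sketch, not what is running on the units above: IFACE, INTERVAL, PIDFILE and the function names are made up for illustration, and it reuses the "ifconfig ath0 burst" reset and the olsrd restart command from the scripts above.

```shell
#!/bin/sh
# Sketch only: single-loop variant of the periodic reset scripts.
# IFACE, INTERVAL and PIDFILE are illustrative names, not pfSense settings.
IFACE="${IFACE:-ath0}"
INTERVAL="${INTERVAL:-3600}"
PIDFILE="${PIDFILE:-/var/run/athreset.pid}"

# True if the PID recorded in $PIDFILE belongs to a live process.
already_running() {
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

# True if the process listing on stdin mentions olsrd.
# The [o] keeps this grep from matching its own command line.
olsrd_listed() {
    grep -q "[o]lsrd"
}

reset_loop() {
    if already_running; then
        echo "reset loop already running."
        return 0
    fi
    echo $$ > "$PIDFILE"
    while :; do
        sleep "$INTERVAL"
        ifconfig "$IFACE" burst   # same reset trigger as in the scripts above
        sleep 2
        # if olsr crashed/halted then restart it.
        if ! ps -x | olsrd_listed; then
            olsrd -f /var/etc/olsrd.conf &
        fi
    done
}

# Uncomment to start the loop when this file is run as a script:
# reset_loop
```

The script itself would be started in the background (for example from batch.sh with a trailing &) so that $$ is the loop's own PID. A stale PID file whose number has been reused by another process would make already_running report a false positive, so this guard is only as good as that assumption.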