The ongoing atheros pain

  • Here's what's been going on: I have a tplink atheros card bridged to LAN (em0). It all works fine, except that I have periodic issues where one or more hosts on the WLAN become inaccessible. I currently have 3 WLAN clients: a wireless bridge to my home office, a wireless USB adapter on my tivo in the living room, and a wireless bridge at the neighbor's house.

    I have had this up and running for 2 months or so. I was running the 7.1 driver that came with the older RC and it was fine. When I updated to a later snapshot (and got the 7.2 driver), I started seeing the wigginess. I have had 5 incidents where the neighbor's WLAN went off the air. I bought a ralink card to replace the atheros based card, so of course, for the last 3 weeks all has been fine :)

    Last night the neighbor went off the air again. I am pretty sure it is not his unit, since the tivo also did. Oddly, the linksys wireless bridge upstairs stayed visible. Taking the adapter down with ifconfig down and back up does NOT fix the problem (which surprised me, and makes me wonder if it is not really a wifi issue?) Rebooting is the only reliable fix I have at this point (there may be others, but without knowing the root cause, I can't speculate further.) Unfortunately, I was not home, so I had the neighbor go in and hit reset on the pfsense box; otherwise I could have checked whether the two down clients were still associated. I am loath to replace the wifi card if I am not 100% sure it is wireless-related. Any thoughts?

  • I also have a TP-Link atheros card bridged to LAN. Three Windows laptops and a Linux netbook use the wireless interface. My perception is that the netbook wireless connection drops out (generally) at least a couple of times a day and stays out. A reset of the netbook wireless connection generally fixes it. The Windows laptops seem to keep running and running EXCEPT for one laptop which is often involved in an online game. Its user has been coming to me asking if the network is down. When I have investigated, it's always been because pfSense is rebooting.

    It's my belief that generally the wireless problems on my network are netbook specific.

    I have observed in the past that my pfsense system reboots more frequently in hot weather. It gets quite hot in summer because it's next to a western window which gets direct sun in the summer afternoons.

    I'd say my atheros wireless link is very reliable.

  • Hmmm, thanks for the input. It doesn't sound like we have the same issue, then :( I used to have a buffalo-tech AP and the neighbor's client bridge, my office bridge and the tivo never had a single issue. I took down the buffalo (re-used elsewhere) and put the tp-link into the pfsense, and that is when the issues started. I have had it wedge 3 times in the same day, and go 3 weeks without a glitch. Every failure has involved the neighbor's bridge, and 1-2 of them also the tivo. My office bridge (a linksys wrt54g) has never failed. So, I think there probably IS some kind of dependency on the remote host, but what? Ugh!

  • Nasty!

    Startup reports my wireless NIC as
    ath0: <Atheros 5212> mem 0xee000000-0xee00ffff irq 12 at device 8.0 on pci0
    ath0: [ITHREAD]
    ath0: WARNING: using obsoleted if_watchdog interface
    ath0: Ethernet address: 00:19:e0:68:31:4b
    ath0: mac 7.9 phy 4.5 radio 5.6

    Is yours similar? (Maybe the last line is significant as some sort of "version"?)

  • No difference here :(

    ath0: <Atheros 5212> mem 0xfeae0000-0xfeaeffff irq 21 at device 10.0 on pci2
    ath0: [ITHREAD]
    ath0: WARNING: using obsoleted if_watchdog interface
    ath0: Ethernet address: 00:25:86:d3:85:ce
    ath0: mac 7.9 phy 4.5 radio 5.6

  • You might compare the output of "sysctl dev.ath" on both your systems, there may be differences there.

    Also if you're brave, test out a snapshot of pfSense 2.0-BETA1, the driver support for wireless cards is supposed to be a bit better.
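
One way to do the comparison suggested above is to save the output on each box and print only the lines that differ. The following is a minimal sketch, not a tested tool: the function name `ath_sysctl_diff` and the file names are made up for illustration, and it assumes a POSIX shell with `sort`, `comm` and `mktemp` available.

```shell
# Hypothetical helper: show only the "sysctl dev.ath" lines that differ
# between two saved dumps. Lines found only in the second file are
# printed with a leading tab (that is how comm lays out its columns).
ath_sysctl_diff() {
    a=$(mktemp /tmp/athdiff.XXXXXX) || return 1
    b=$(mktemp /tmp/athdiff.XXXXXX) || { rm -f "$a"; return 1; }
    sort "$1" > "$a"
    sort "$2" > "$b"
    comm -3 "$a" "$b"    # -3 hides lines common to both files
    rm -f "$a" "$b"
}

# Example use (file names are placeholders):
#   sysctl dev.ath > ath-mine.txt        # run on each pfSense box
#   ath_sysctl_diff ath-mine.txt ath-theirs.txt
```

Lines such as %location and %parent describe the PCI slot, so they will differ between any two machines and can be ignored.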

  • Here's mine:

    sysctl dev.ath

    dev.ath.0.%desc: Atheros 5212
    dev.ath.0.%driver: ath
    dev.ath.0.%location: slot=10 function=0
    dev.ath.0.%pnpinfo: vendor=0x168c device=0x0013 subvendor=0x168c subdevice=0x2051 class=0x020000
    dev.ath.0.%parent: pci2
    dev.ath.0.smoothing_rate: 95
    dev.ath.0.sample_rate: 10
    dev.ath.0.countrycode: 156
    dev.ath.0.regdomain: 32924
    dev.ath.0.slottime: 9
    dev.ath.0.acktimeout: 48
    dev.ath.0.ctstimeout: 48
    dev.ath.0.softled: 0
    dev.ath.0.ledpin: 0
    dev.ath.0.ledon: 0
    dev.ath.0.ledidle: 2700
    dev.ath.0.txantenna: 0
    dev.ath.0.rxantenna: 1
    dev.ath.0.diversity: 1
    dev.ath.0.txintrperiod: 5
    dev.ath.0.diag: 0
    dev.ath.0.tpscale: 0
    dev.ath.0.tpc: 0
    dev.ath.0.tpack: 63
    dev.ath.0.tpcts: 63
    dev.ath.0.fftxqmin: 2
    dev.ath.0.fftxqmax: 50
    dev.ath.0.monpass: 24

    As far as 2.0 goes, I'm happy it's now in Beta. Unfortunately, while I am adventurous, my pfsense gateway is a production box (well, as much as a home office network can be production - LOL.) I might try upgrading in a few weeks. In theory, I should be able to go from 1.2.3 Release to 2.0?

  • You should be able to upgrade from 1.2.3, yes. If you keep a backup and install media handy, a test shouldn't hurt much. If it breaks you can be back up on 1.2.3 in under 10 minutes in most cases (Thanks to PFI).

  • True, I forgot about PFI.  I may give this a couple of weeks and see how things are.

  • My output from # sysctl dev.ath is the same as posted by danswartz.

  • I have a new theory: this has nothing directly to do with wireless, but is somehow related to bridging. Keep in mind that in my configuration, I have not just wireless hosts on the other end, but other bridges (specifically the two APs that are in dd-wrt client-bridge mode.) I am wondering if the FreeBSD bridge code is getting upset at something and refusing to pass traffic any longer? One of the wireless clients is an endpoint (the tivo) and that is going off the air too, but if the pfsense box's bridge is getting wedged, that might be enough to take all 3 down? Unfortunately, my knowledge of bridging is not that great, so I am not sure how and where to proceed. A data point: using ifconfig down and up on ath0 does not fix this. Nor does doing so on bridge0. However, hitting "save" on the ath0 config page DID bring back 2 of the 3 wireless clients. I had to reboot to get the 3rd one back. I wonder if this is some artifact of STP or something?

  • First hang in over 2 weeks.  As usual, the neighbor's AP went offline.  Interestingly, wireless status shows it still associated.  Power cycling their end did not help.  I tried a hunch and went to the bridged WLAN interface in pfsense and hit "SAVE", without changing anything.  Bingo, their AP came back online.  From my limited reading of the code, the php code in question deletes any existing bridge using that interface and creates a new one.  I know for sure taking the wifi interface down and up does not help, so I am more sure than ever that it is some odd bridging quirk.  Now that I know a less intrusive fix than rebooting the gateway, I don't care as much, since I am moving in 6 months or so, at which time I will have a spare AP to use.
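
Since saving the interface page reportedly tears the bridge down and builds a new one, roughly the same thing could be tried by hand from a shell. This is only a sketch of that idea using standard FreeBSD `ifconfig` bridge commands. It is not pfSense's actual code, the helper name `rebuild_bridge` is made up, and `bridge0`, `em0` and `ath0` are simply the interface names mentioned in this thread. It prints the commands unless `RUN=1` is set, so they can be reviewed first.

```shell
# Hypothetical helper: destroy a bridge and re-create it with the given
# members. By default it only prints the commands; set RUN=1 to execute
# them (needs root on the FreeBSD/pfSense box).
rebuild_bridge() {
    bridge=$1
    shift
    # run: execute the command when RUN=1, otherwise just echo it
    run() { if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$@"; fi; }
    run ifconfig "$bridge" destroy
    run ifconfig "$bridge" create
    for member in "$@"; do
        run ifconfig "$bridge" addm "$member"
    done
    run ifconfig "$bridge" up
}

# Example use (dry run, prints five ifconfig commands):
#   rebuild_bridge bridge0 em0 ath0
```

Anything changed this way may be overwritten the next time pfSense applies its own interface configuration, so treat it as a diagnostic rather than a fix.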

  • This is interesting.  I have had the remote AP go offline 3 times now in 2 days (twice today.)  Resetting the ath0 interface (by hitting save on the wlan page) "fixes" it.  I was looking at athstats and saw this:

    3065     switched default/rx antenna

    Wow, that is a lot of switching, no? Googling for similar issues has shown articles with only a handful of switches. What I am wondering is: if it is switching to a different connector than the one with the good antenna (for whatever reason), that might drop the signal strength, no? It occurs to me that the other few clients are all in the same house, while the problematic AP is next door at the neighbor's house, so it would be more susceptible to rx/tx strength issues? I tried setting dev.ath.0.diversity to zero with sysctl, but I noticed this:

    dev.ath.0.txantenna: 0
    dev.ath.0.rxantenna: 1

    The rx antenna seems to change occasionally even after I disabled the diversity. Does anyone have any ideas on this?

    (update) more research indicates antenna number 0 means "both", so the right choice would be 1 or 2, it seems.  I've tried setting both to 1 and we'll see.  It would be nice to know what is going on here.  I am pretty close to pulling the card and going back to a separate access point :(
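
To keep an eye on the antenna switching between incidents, the counter can be pulled out of the athstats output and compared over time. A small sketch, assuming athstats prints the counter in the form quoted above. The function name `antenna_switches` is made up, and the value 1 for the antennas is just the choice described in the post, not a recommendation.

```shell
# Hypothetical helper: read athstats output on stdin and print the number
# from the "switched default/rx antenna" line (prints nothing if absent).
antenna_switches() {
    awk '/switched default\/rx antenna/ { print $1 }'
}

# Example use on the pfSense box:
#   athstats | antenna_switches
#
# Disabling diversity and pinning both antennas to 1, as tried in the post
# (run as root; the OID names are the ones listed by "sysctl dev.ath"):
#   sysctl dev.ath.0.diversity=0
#   sysctl dev.ath.0.txantenna=1
#   sysctl dev.ath.0.rxantenna=1
```

If the count keeps climbing after diversity is disabled, that would match the observation that the rx antenna still changes.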

  • The (unresolved) (hopefully) final chapter: this is NOT an atheros issue.  The ralink card I got to replace it fared no better.  Within 10 minutes of booting the gateway, the remote AP/bridge went off the air - "fixed" the same way.  I finally gave up and got a $50 netgear router and am now bridge-free.  I just wish I knew what the issue was (for closure).

  • What card(s) were you attempting to bridge originally when the issue was persisting? Are you bridging the netgear with anything? Or is it in AP mode? Or is it attached to your network on a switch as an AP?

  • It was a tp-link atheros card and bridged with the ethernet lan card.  The current netgear is just acting as an AP and all is well.
