Wireless slowdown after a few days on latest releases.



  • I have many WRAP.2C units running various versions of pfSense with olsr on an active production mesh.
    It seems that pfSense units flashed with anything newer than "RELENG_1_SNAPSHOT-07-09-2006-embedded" experience wireless slowdown/sluggishness and high or dropped pings after being online for a few days to a week, until they are rebooted; after that they work fast and smooth again for a few days.  The units running snap7 never need rebooting; however, they sometimes lock up when I bring an RC2 node online nearby which has olsr active on multiple interfaces (2 wireless + ethernet).
    I wish I had better information about this.  Are there any commands I could type to help diagnose when they get sluggish?
    Thanks, -Pete



  • Run top and see what it looks like from a shell.



  • Just rebooted, but I'll try that when they start slowing again.
    I did watch the gui System page on a "slowed" unit and cpu was ~30%.  Normally, cpu is ~2%.
    So cpu wasn't maxed, but something was taking more than it should.  Or, maybe a memory leak?
    -pc



  • Good question.  Top should give us a pretty good indication.



  • RC2 rebooted again 3 times within 20mins this morning; but, now it's been up for a few hours.
    RELENG_1_SNAPSHOT-07-09-2006-embedded has been up for week+.

    I got this in the logs of RC2..
    kernel: pid 277 (fsck_ufs), uid 0: exited on signal 8 (core dumped)
    But, I also get the above message on the units that work great.
    Does this indicate a need to re-flash?
    I also get this every now and then…
    kernel: ath1: device timeout
    Even though the wireless keeps functioning.

    Is there a shell command to monitor and always display the last few system log entries?

    Thanks, -Pete



  • @pcatiprodotnet:

    RC2 rebooted again 3 times within 20mins this morning; but, now it's been up for a few hours.
    RELENG_1_SNAPSHOT-07-09-2006-embedded has been up for week+.

    There is no difference FreeBSD wise….  Nothing has changed except pfSense code which would not be causing this.

    @pcatiprodotnet:

    I got this in the logs of RC2..
    kernel: pid 277 (fsck_ufs), uid 0: exited on signal 8 (core dumped)
    But, I also get the above message on the units that work great.
    Does this indicate a need to re-flash?
    I also get this every now and then…
    kernel: ath1: device timeout
    Even though the wireless keeps functioning.

    Is there a shell command to monitor and always display the last few system log entries?

    No, enable remote syslogging to a workstation.  Use Kiwi's syslog daemon or something else to assist.

    It almost sounds like a bad power supply to me, but it's hard to say, really.
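For anyone without a syslog daemon handy, the remote-logging suggestion above can be approximated with a few lines of Python on the workstation. This is only a rough sketch, not a replacement for a real syslog server: it assumes the pfSense box has been set up to send syslog over UDP to this machine, and the names `decode_pri`, `format_datagram`, and `serve` are made up for illustration. Binding to port 514 usually needs root/administrator rights.

```python
import socket

# Severity names in numeric order 0..7, as defined for BSD syslog (RFC 3164).
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def decode_pri(pri):
    """Split a syslog PRI value into (facility, severity)."""
    return pri // 8, pri % 8


def format_datagram(data, sender):
    """Turn one raw syslog datagram into a printable line (hypothetical helper)."""
    text = data.decode("utf-8", errors="replace").rstrip("\r\n\x00")
    # A BSD syslog packet normally starts with "<PRI>"; fall back to raw text if not.
    if text.startswith("<") and ">" in text[:6]:
        pri_str, rest = text[1:].split(">", 1)
        if pri_str.isdigit():
            facility, severity = decode_pri(int(pri_str))
            return f"{sender} fac={facility} {SEVERITIES[severity]}: {rest}"
    return f"{sender} {text}"


def serve(host="0.0.0.0", port=514):
    """Print every syslog datagram received until interrupted (Ctrl-C)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            data, (addr, _port) = sock.recvfrom(4096)
            print(format_datagram(data, addr), flush=True)
```

Calling `serve()` prints each message as it arrives, so the last entries before a reboot stay visible on the workstation even after the unit goes down.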



  • Or maybe heat?  Are the units placed somewhere in direct sunlight, and/or is it hotter now that it is summer?



  • Thanks for suggestions to check.
    It just started slowing again after running fine for ~5-6hrs, and I got this odd kernel message in the log (fyi, time is off)…

    Aug 10 03:14:35 dnsmasq[1037]: DHCPACK(ath1) 10.130.1.101 00:e0:98:e3:4f:e1 D7YMYK11
    Aug 10 03:15:26 kernel: rix 255 (0) bad ratekbps 0 mode 3rix 255 (0) bad ratekbps 0 mode 3rix 255 (0) bad ratekbps 0 mode 3rix 255 (0) bad ratekbps 0 mode 3rix 255 (0) bad ratekbps 0 mode 3
    Aug 10 03:15:41 kernel: rix 255 (0) bad ratekbps 0 mode 3rix 255 (0) bad ratekbps 0 mode 3rix 255 (0) bad ratekbps 0 mode 3rix 255 (0) bad ratekbps 0 mode 3rix 255 (0) bad ratekbps 0 mode 3
    Aug 10 03:15:57 kernel:
    Aug 10 03:15:57 kernel: arplookup 10.130.4.1 failed: host is not on local network

    I wasn't getting these kernel messages earlier when it was working fast.

    It doesn't appear that anything has changed in "top" from when it was fast; sometimes "system" goes up to 40% but not for long…

    last pid: 28510;  load averages:  0.21,  0.19,  0.17        up 0+05:15:26  03:22:04
    32 processes:  1 running, 31 sleeping
    CPU states:  2.7% user,  0.0% nice,  3.9% system,  3.9% interrupt, 89.5% idle
    Mem: 11M Active, 12M Inact, 17M Wired, 13M Buf, 76M Free
    Swap:

      PID USERNAME THR PRI NICE   SIZE    RES STATE   TIME  WCPU COMMAND
     1038 root       1   8    0  1956K  1660K nanslp  6:56 0.98% olsrd
      993 root       1   8   20  1680K  1188K wait    0:15 0.00% sh
     7659 root       1  96    0  5592K  2612K select  0:13 0.00% sshd
     1097 root       1   8   20  1180K   564K nanslp  0:08 0.00% check_reload_status
      828 root       1   4    0  3252K  2364K kqread  0:06 0.00% lighttpd
     1037 nobody     1  96    0  1332K  1016K select  0:04 0.00% dnsmasq
    27650 root       1  96    0  2276K  1500K RUN     0:04 0.00% top
     1017 root       1   8  -88  1328K   796K nanslp  0:04 0.00% watchdogd
      833 root       1   8    0  8944K  4552K wait    0:02 0.00% php
      498 root       1 -58    0  3664K  1536K bpf     0:01 0.00% tcpdump
      248 root       1  96    0  1352K   984K select  0:01 0.00% syslogd
      946 proxy      1   4    0   656K   412K kqread  0:00 0.00% pftpx
     1019 root       1   8    0  1304K   984K nanslp  0:00 0.00% cron
      829 root       1   8    0  8944K  4552K wait    0:00 0.00% php
     7799 root       1  20    0  3928K  2532K pause   0:00 0.00% tcsh
     1103 root       1   8    0  1568K  1240K wait    0:00 0.00% login
     7692 root       1   8    0  1640K  1132K wait    0:00 0.00% sh



  • Any idea what rix is?  Never heard of that..



  • It seems to run fast for ~5-8 hours, then gets sluggish for a few more hours, then reboots after that, like clockwork.
    Here is, presumably, the last "top" before it rebooted; it appears ok…

    last pid: 58198;  load averages:  0.08,  0.07,  0.07    up 0+14:26:31  17:55:56
    32 processes:  1 running, 31 sleeping
    CPU states:  1.9% user,  0.4% nice,  2.3% system,  3.1% interrupt, 92.3% idle
    Mem: 11M Active, 10M Inact, 16M Wired, 12M Buf, 79M Free
    Swap:

      PID USERNAME THR PRI NICE   SIZE    RES STATE   TIME  WCPU COMMAND
     1038 root       1   8    0  2544K  2172K nanslp 19:25 1.12% olsrd
     1447 root       1  96    0  2332K  1556K RUN     5:19 0.00% top
      993 root       1   8   20  1740K  1248K wait    0:43 0.00% sh
     1426 root       1  96    0  5592K  2608K select  0:42 0.00% sshd
     1068 root       1   8   20  1180K   564K nanslp  0:21 0.00% check_reload_st
     1017 root       1   8  -88  1328K   796K nanslp  0:10 0.00% watchdogd
     1036 nobody     1  96    0  1332K  1016K select  0:06 0.00% dnsmasq
      828 root       1   4    0  3104K  2196K kqread  0:06 0.00% lighttpd
      498 root       1 -58    0  3664K  1536K bpf     0:04 0.00% tcpdump
      946 proxy      1   4    0   656K   412K kqread  0:01 0.00% pftpx
      248 root       1  96    0  1352K   984K select  0:01 0.00% syslogd
     1019 root       1   8    0  1304K   984K nanslp  0:01 0.00% cron
      833 root       1   8    0  8944K  4552K wait    0:00 0.00% php
      829 root       1   8    0  8944K  4564K wait    0:00 0.00% php
     1446 root       1  20    0  3924K  2524K pause   0:00 0.00% tcsh
     1075 root       1   8    0  1568K  1236K wait    0:00 0.00% login
     1430 root       1   8    0  1640K  1132K wait    0:00 0.00% sh

    What is the path+name of the system log file?  I could leave an ssh window open with "tail -f" on it.

    There is no difference FreeBSD wise....  Nothing has changed except pfSense code which would not be causing this.

    Ok.  The only other difference I can think of is that this unit has two Atheros-based cards in it; all other production units have just one.  I've tested both cards individually, and they both function for days.  However, you'll find this second card is likely unprecedented...
    http://www.ubnt.com/super_range9.php4

    Or maybe heat.

    It could be that these two mini-pci wireless cards combined are generating too much heat.
    One is a 400mW 2.4GHz card, and the other is a 700mW 900MHz card.  Although, after rebooting
    it does run smooth and fast again for ~5-8 more hours.

    Thank you for the help, -Pete
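To follow up on the earlier memory-leak question, the two "top" snapshots posted above can be compared per process. The following is a rough sketch only: the helper names `parse_top` and `res_growth` are made up, it only understands sizes given in K as in these outputs, and just three rows from each snapshot are pasted in. Some PIDs differ between the snapshots, so they may not come from the same boot, and the comparison is suggestive at best.

```python
def parse_top(text):
    """Return {command: total RES in KB} from pasted top process rows."""
    totals = {}
    for line in text.strip().splitlines():
        fields = line.split()
        # Expect: PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
        if len(fields) < 11 or not fields[0].isdigit():
            continue  # skip headers and anything that is not a process row
        res_kb = int(fields[6].rstrip("K"))  # assumption: RES is always in K
        totals[fields[10]] = totals.get(fields[10], 0) + res_kb
    return totals


def res_growth(before, after):
    """Return {command: RES change in KB} for commands whose RES changed."""
    return {
        cmd: after[cmd] - before[cmd]
        for cmd in before
        if cmd in after and after[cmd] != before[cmd]
    }


# Rows copied from the snapshot taken at 5h15m uptime.
EARLY = """
1038 root 1 8 0 1956K 1660K nanslp 6:56 0.98% olsrd
828 root 1 4 0 3252K 2364K kqread 0:06 0.00% lighttpd
1017 root 1 8 -88 1328K 796K nanslp 0:04 0.00% watchdogd
"""

# Rows copied from the snapshot taken at 14h26m uptime.
LATE = """
1038 root 1 8 0 2544K 2172K nanslp 19:25 1.12% olsrd
828 root 1 4 0 3104K 2196K kqread 0:06 0.00% lighttpd
1017 root 1 8 -88 1328K 796K nanslp 0:10 0.00% watchdogd
"""

growth = res_growth(parse_top(EARLY), parse_top(LATE))
print(growth)  # {'olsrd': 512, 'lighttpd': -168}
```

On these numbers olsrd's resident size is about 512K larger in the later snapshot, which is small next to the 76-79M reported free, so the snapshots do not obviously point to memory exhaustion.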



  • @pcatiprodotnet:

    It could be that these two mini-pci wireless cards combined are generating too much heat.
    One is a 400mW 2.4GHz card, and the other is a 700mW 900MHz card.  Although, after rebooting
    it does run smooth and fast again for ~5-8 more hours.

    Can the WRAP handle the power requirements for those cards?



  • That was my other thought: are the power supplies up to snuff?



  • Can the WRAP handle the power requirements for those cards.
    are the power supplies up to snuff.

    If it's power, would turning down the wireless tx Power in the gui help?

    Here are the PoE specs:

    Switching Power Supply, 18v, 0.84a, 15w
    http://www.netgate.com/product_info.php?cPath=24_55&products_id=258

    Also,

    The 400mW 2.4GHz mini-pci card…
    http://www.netgate.com/product_info.php?cPath=26_34&products_id=279

    The 700mW 900MHz mini-pci card...
    http://www.ubnt.com/super_range9.php4

    The WRAP.2C
    http://www.netgate.com/product_info.php?products_id=241

    [disclaimer: links provided for information; items were not necessarily purchased there]

    [update]
    I disabled the Opt1 interface (the 900MHz card), and it's been running smooth for 24hrs.
    Next, I'll try the same setup on different hardware; then I may resort to 1 card per WRAP.



  • Uhm, normally the cards use more power at lower speeds, and turning down the tx power might mean a lower speed.
    So basically you could be shooting yourself in the foot by doing that.

    SR-9 spec:

    Current Consumption @3.3V

    Transmit
    1-24 Mbps    1000mA, +/-100mA
    36 Mbps       900mA, +/-100mA
    48 Mbps       850mA, +/-100mA
    54 Mbps       800mA, +/-100mA

    Receive       400mA, +/-50mA

    This is similar for most wireless cards, however the SR-9 consumes a lot of power.
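To put the spec figures in watts, the currents can be multiplied by the 3.3V rail and set against the 18v, 0.84a PoE supply quoted earlier in the thread. This is a back-of-the-envelope sketch using nominal values only: it ignores the +/- tolerances, regulator losses, the WRAP board's own draw, and the second card (whose consumption is not given here). The variable and function names are made up for illustration.

```python
# Figures taken from the thread; everything else is simple P = V * I arithmetic.
SUPPLY_VOLTS = 18.0
SUPPLY_AMPS = 0.84
CARD_VOLTS = 3.3

# Nominal SR-9 current draw in mA per mode, from the spec quoted above.
SR9_CURRENT_MA = {
    "tx 1-24 Mbps": 1000,
    "tx 36 Mbps": 900,
    "tx 48 Mbps": 850,
    "tx 54 Mbps": 800,
    "rx": 400,
}


def watts(volts, milliamps):
    """Power in watts for a given voltage and current in milliamps."""
    return volts * milliamps / 1000.0


supply_watts = SUPPLY_VOLTS * SUPPLY_AMPS
sr9_watts = {mode: watts(CARD_VOLTS, ma) for mode, ma in SR9_CURRENT_MA.items()}

print(f"supply rating: {supply_watts:.2f} W")
for mode, w in sr9_watts.items():
    share = 100 * w / supply_watts
    print(f"{mode:>13}: {w:.2f} W ({share:.0f}% of supply rating)")
```

At the lowest rates the SR-9 alone works out to roughly 3.3 W while transmitting, against about 1.3 W while receiving, which is why dropping to a lower rate can raise consumption rather than reduce it.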



  • Total output of a WRAP with two wireless cards is about 22W,
    with 8 watts of real power from one Atheros card,
    so if you're using 400mW cards then you will be more.

    Also make sure your BIOS is up to date; this can cause issues with the mini-PCI bus and Atheros cards.



  • Even today, when AR5004x is beginning to go EOL, I still think cards like the CM9 are the way to go. It's a shame that Atheros stopped production of these wonderful chipsets in favour of the cheap AR5006x. I know I'll miss the CM9 a lot…

