Wireless slowdown after a few days on latest releases.
-
I have many WRAP.2C units running various versions of pfSense with olsr on an active production mesh.
It seems that pfSense units flashed with anything newer than "RELENG_1_SNAPSHOT-07-09-2006-embedded" experience wireless slowdown/sluggishness and high or dropped pings after being online for a few days to a week, until rebooted; after that they work fast and smooth again for a few days. The units running snap7 never need rebooting; however, they sometimes lock up when I bring an RC2 node online nearby that has olsr active on multiple interfaces (2 wireless + ethernet).
I wish I had better information about this. Are there any commands I could type to help diagnose when they get sluggish?
Thanks, -Pete -
Run top and see what it looks like from a shell.
-
Just rebooted, but I'll try that when they start slowing again.
I did watch the gui System page on a "slowed" unit and cpu was ~30%. Normally, cpu is ~2%.
So cpu wasn't maxed, but something was taking more than it should. Or, maybe a memory leak?
-pc -
Good question. Top should give us a pretty good indication.
-
RC2 rebooted again 3 times within 20mins this morning; but, now it's been up for a few hours.
RELENG_1_SNAPSHOT-07-09-2006-embedded has been up for week+.
I got this in the logs of RC2..
kernel: pid 277 (fsck_ufs), uid 0: exited on signal 8 (core dumped)
But, I also get the above message on the units that work great.
Does this indicate a need to re-flash?
I also get this every now and then…
kernel: ath1: device timeout
Even though the wireless keeps functioning.
Is there a shell command to monitor and always display the last few system log entries?
Thanks, -Pete
-
RC2 rebooted again 3 times within 20mins this morning; but, now it's been up for a few hours.
RELENG_1_SNAPSHOT-07-09-2006-embedded has been up for week+.
There is no difference FreeBSD wise…. Nothing has changed except pfSense code which would not be causing this.
I got this in the logs of RC2..
kernel: pid 277 (fsck_ufs), uid 0: exited on signal 8 (core dumped)
But, I also get the above message on the units that work great.
Does this indicate a need to re-flash?
I also get this every now and then…
kernel: ath1: device timeout
Even though the wireless keeps functioning.
Is there a shell command to monitor and always display the last few system log entries?
No, enable remote syslogging to a workstation. Use Kiwi's syslog daemon or something else to assist.
It almost sounds like a bad power supply to me, but it's hard to say, really.
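For anyone without a ready-made syslog daemon on the workstation, the remote-syslogging suggestion above can be approximated with a few lines of Python. This is only a minimal sketch, assuming the router sends plain BSD-style syslog datagrams over UDP to port 514 (binding to that port normally needs administrator rights); `parse_pri` and `serve` are names made up for this illustration.

```python
import socket


def parse_pri(datagram: bytes):
    """Split a BSD-syslog style datagram into (facility, severity, message).

    The leading "<N>" encodes facility * 8 + severity. If no such prefix
    is present, facility and severity are returned as None.
    """
    text = datagram.decode("utf-8", errors="replace").rstrip("\n")
    if text.startswith("<") and ">" in text[:6]:
        end = text.index(">")
        digits = text[1:end]
        if digits.isdigit():
            pri = int(digits)
            return pri // 8, pri % 8, text[end + 1:]
    return None, None, text


def serve(host="0.0.0.0", port=514):
    """Print every syslog datagram received on host:port (runs until interrupted)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    while True:
        data, (addr, _) = sock.recvfrom(4096)
        facility, severity, message = parse_pri(data)
        print(f"{addr} fac={facility} sev={severity} {message}", flush=True)


# serve()  # uncomment to start listening
```

Leaving this running in a terminal gives a live view of the router's log, and the messages leading up to a reboot stay on the workstation screen.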
-
Or maybe heat? Are the units placed somewhere in direct sunlight, and/or is it hotter now that it is summer?
-
Thanks for suggestions to check.
It just started slowing again after running fine for ~5-6hrs, and I got this odd kernel message in the log (fyi, time is off)…
Aug 10 03:14:35 dnsmasq[1037]: DHCPACK(ath1) 10.130.1.101 00:e0:98:e3:4f:e1 D7YMYK11
Aug 10 03:15:26 kernel: rix 255 (0) bad ratekbps 0 mode 3rix 255 (0) bad ratekbps 0 mode 3rix 255 (0) bad ratekbps 0 mode 3rix 255 (0) bad ratekbps 0 mode 3rix 255 (0) bad ratekbps 0 mode 3
Aug 10 03:15:41 kernel: rix 255 (0) bad ratekbps 0 mode 3rix 255 (0) bad ratekbps 0 mode 3rix 255 (0) bad ratekbps 0 mode 3rix 255 (0) bad ratekbps 0 mode 3rix 255 (0) bad ratekbps 0 mode 3
Aug 10 03:15:57 kernel:
Aug 10 03:15:57 kernel: arplookup 10.130.4.1 failed: host is not on local network
I wasn't getting these kernel messages earlier when it was working fast.
It doesn't appear that anything has changed in "top" from when it was fast; sometimes "system" goes up to 40% but not for long…
last pid: 28510; load averages: 0.21, 0.19, 0.17 up 0+05:15:26 03:22:04
32 processes: 1 running, 31 sleeping
CPU states: 2.7% user, 0.0% nice, 3.9% system, 3.9% interrupt, 89.5% idle
Mem: 11M Active, 12M Inact, 17M Wired, 13M Buf, 76M Free
Swap:
PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
1038 root 1 8 0 1956K 1660K nanslp 6:56 0.98% olsrd
993 root 1 8 20 1680K 1188K wait 0:15 0.00% sh
7659 root 1 96 0 5592K 2612K select 0:13 0.00% sshd
1097 root 1 8 20 1180K 564K nanslp 0:08 0.00% check_reload_status
828 root 1 4 0 3252K 2364K kqread 0:06 0.00% lighttpd
1037 nobody 1 96 0 1332K 1016K select 0:04 0.00% dnsmasq
27650 root 1 96 0 2276K 1500K RUN 0:04 0.00% top
1017 root 1 8 -88 1328K 796K nanslp 0:04 0.00% watchdogd
833 root 1 8 0 8944K 4552K wait 0:02 0.00% php
498 root 1 -58 0 3664K 1536K bpf 0:01 0.00% tcpdump
248 root 1 96 0 1352K 984K select 0:01 0.00% syslogd
946 proxy 1 4 0 656K 412K kqread 0:00 0.00% pftpx
1019 root 1 8 0 1304K 984K nanslp 0:00 0.00% cron
829 root 1 8 0 8944K 4552K wait 0:00 0.00% php
7799 root 1 20 0 3928K 2532K pause 0:00 0.00% tcsh
1103 root 1 8 0 1568K 1240K wait 0:00 0.00% login
7692 root 1 8 0 1640K 1132K wait 0:00 0.00% sh
-
Any idea what rix is? Never heard of that..
-
It seems to run fast for ~5-8 hours, then gets sluggish for a few more hours, then reboots after that, like clockwork.
Here is, presumably, a last "top" before it rebooted; It appears ok…
last pid: 58198; load averages: 0.08, 0.07, 0.07 up 0+14:26:31 17:55:56
32 processes: 1 running, 31 sleeping
CPU states: 1.9% user, 0.4% nice, 2.3% system, 3.1% interrupt, 92.3% idle
Mem: 11M Active, 10M Inact, 16M Wired, 12M Buf, 79M Free
Swap:
PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
1038 root 1 8 0 2544K 2172K nanslp 19:25 1.12% olsrd
1447 root 1 96 0 2332K 1556K RUN 5:19 0.00% top
993 root 1 8 20 1740K 1248K wait 0:43 0.00% sh
1426 root 1 96 0 5592K 2608K select 0:42 0.00% sshd
1068 root 1 8 20 1180K 564K nanslp 0:21 0.00% check_reload_st
1017 root 1 8 -88 1328K 796K nanslp 0:10 0.00% watchdogd
1036 nobody 1 96 0 1332K 1016K select 0:06 0.00% dnsmasq
828 root 1 4 0 3104K 2196K kqread 0:06 0.00% lighttpd
498 root 1 -58 0 3664K 1536K bpf 0:04 0.00% tcpdump
946 proxy 1 4 0 656K 412K kqread 0:01 0.00% pftpx
248 root 1 96 0 1352K 984K select 0:01 0.00% syslogd
1019 root 1 8 0 1304K 984K nanslp 0:01 0.00% cron
833 root 1 8 0 8944K 4552K wait 0:00 0.00% php
829 root 1 8 0 8944K 4564K wait 0:00 0.00% php
1446 root 1 20 0 3924K 2524K pause 0:00 0.00% tcsh
1075 root 1 8 0 1568K 1236K wait 0:00 0.00% login
1430 root 1 8 0 1640K 1132K wait 0:00 0.00% sh
What is the path+name of the system log file? I could leave an ssh window open with "tail -f" on it.
There is no difference FreeBSD wise.... Nothing has changed except pfSense code which would not be causing this.
Ok. The only other difference I can think of is that this unit has two Atheros-based cards in it; all other production units have just one. I've tested both cards individually, and they both function for days. However, you'll find this second card is likely unprecedented...
http://www.ubnt.com/super_range9.php4
Or maybe heat.
It could be these two mini-pci wireless cards combined are generating too much heat.
One is 400mW 2.4ghz, and the other is 700mW 900mhz card. Although, after rebooting
it does run smooth and fast again for ~5-8 more hours.
Thank you for the help, -Pete
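On the memory-leak question raised earlier, the two "top" snapshots posted in this thread can be compared per process. The sketch below is only an illustration: it copies four rows from each snapshot, and the helper names (`parse_size_kb`, `resident_by_command`, `growth`) are invented here. Note that the uptimes and clock times suggest the two snapshots come from different boots, so this compares resident memory at about 5 hours of uptime against about 14 hours, not one continuous run.

```python
# Rows copied from the two top outputs posted above (subset only).
FIRST = """
1038 root 1 8 0 1956K 1660K nanslp 6:56 0.98% olsrd
7659 root 1 96 0 5592K 2612K select 0:13 0.00% sshd
828 root 1 4 0 3252K 2364K kqread 0:06 0.00% lighttpd
1037 nobody 1 96 0 1332K 1016K select 0:04 0.00% dnsmasq
"""
SECOND = """
1038 root 1 8 0 2544K 2172K nanslp 19:25 1.12% olsrd
1426 root 1 96 0 5592K 2608K select 0:42 0.00% sshd
828 root 1 4 0 3104K 2196K kqread 0:06 0.00% lighttpd
1036 nobody 1 96 0 1332K 1016K select 0:06 0.00% dnsmasq
"""


def parse_size_kb(field):
    """Convert a top size field such as '1660K' or '11M' to kilobytes."""
    units = {"K": 1, "M": 1024}
    return int(field[:-1]) * units[field[-1]]


def resident_by_command(top_rows):
    """Map COMMAND -> RES in KB. Duplicate command names overwrite each other."""
    result = {}
    for row in top_rows.strip().splitlines():
        cols = row.split()
        # Columns: PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
        result[cols[10]] = parse_size_kb(cols[6])
    return result


def growth(before, after):
    """Return RES change in KB for every command present in both snapshots."""
    b, a = resident_by_command(before), resident_by_command(after)
    return {cmd: a[cmd] - b[cmd] for cmd in b.keys() & a.keys()}


for command, delta in sorted(growth(FIRST, SECOND).items()):
    print(f"{command:10s} {delta:+d} KB")
```

In these rows only olsrd is noticeably larger in the later snapshot, and both snapshots report more than 70M free, so the numbers posted so far do not obviously point to memory exhaustion.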
-
It could be these two mini-pci wireless cards combined are generating too much heat.
One is 400mW 2.4ghz, and the other is 700mW 900mhz card. Although, after rebooting
it does run smooth and fast again for ~5-8 more hours.
Can the WRAP handle the power requirements for those cards?
-
That was my other thought: are the power supplies up to snuff?
-
Can the WRAP handle the power requirements for those cards.
are the power supplies up to snuff.
If it's power, would turning down the wireless tx Power in the gui help?
Here are the PoE specs:
Switching Power Supply, 18v, 0.84a, 15w
http://www.netgate.com/product_info.php?cPath=24_55&products_id=258
Also,
The 400mW 2.4GHz mini-pci card…
http://www.netgate.com/product_info.php?cPath=26_34&products_id=279
The 700mW 900MHz mini-pci card...
http://www.ubnt.com/super_range9.php4
The WRAP.2C
http://www.netgate.com/product_info.php?products_id=241
[disclaimer: links provided for information; items were not necessarily purchased there]
[update]
I disabled Opt1 (the 900mhz card) interface, and it's been running smooth for 24hrs.
Next, I'll try the same setup on different hardware; then I may resort to 1 card per WRAP.
-
Uhm, normally the cards use more power at lower speeds, and turning down power might mean lower speed.
So basically you could be shooting yourself in the foot by doing that.
SR-9 spec:
Current Consumption @3.3V
Transmit
1-24 Mbps 1000mA, +/-100mA
36 Mbps 900mA, +/-100mA
48 Mbps 850mA, +/-100mA
54 Mbps 800mA, +/-100mA
Receive 400mA, +/-50mA
This is similar for most wireless cards, however the SR-9 consumes a lot of power.
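To put the SR-9 figures above next to the PoE supply rating quoted earlier in the thread (18v, 0.84a), here is a rough back-of-the-envelope calculation. It is a sketch only: it uses nominal currents, ignores regulator efficiency and cable loss, and leaves out the WRAP board itself and the 2.4GHz card because no current figures for them were posted. All constant and function names are made up for this example.

```python
# Figures taken from the posts above.
SUPPLY_VOLTS = 18.0
SUPPLY_AMPS = 0.84
CARD_VOLTS = 3.3
SR9_TX_MA = {"1-24 Mbps": 1000, "36 Mbps": 900, "48 Mbps": 850, "54 Mbps": 800}
SR9_TX_TOLERANCE_MA = 100
SR9_RX_MA = 400


def watts(volts, milliamps):
    """Power in watts from a voltage and a current given in milliamps."""
    return volts * milliamps / 1000.0


supply_watts = SUPPLY_VOLTS * SUPPLY_AMPS
print(f"Supply rating: {supply_watts:.2f} W")

for rate, ma in SR9_TX_MA.items():
    nominal = watts(CARD_VOLTS, ma)
    worst = watts(CARD_VOLTS, ma + SR9_TX_TOLERANCE_MA)
    share = 100.0 * nominal / supply_watts
    print(f"SR-9 tx {rate:9s}: {nominal:.2f} W nominal, "
          f"{worst:.2f} W worst case, {share:.0f}% of supply rating")

print(f"SR-9 receive    : {watts(CARD_VOLTS, SR9_RX_MA):.2f} W nominal")
```

On these numbers the SR-9 alone draws roughly 2.6 to 3.3 W at 3.3V while transmitting. Whether the board's 3.3V rail can deliver that alongside a second card is a separate question that the supply wattage by itself does not answer.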
-
Total output of a WRAP with two wireless cards is about 22W,
with 8 watts real power from one Atheros card,
so if you're using 400mW cards then you will be more
Also make sure your BIOS is up to date; an outdated BIOS can cause issues with the mini-PCI bus and Atheros cards.
-
Even today, when the AR5004x is approaching EOL, I still think cards like the CM9 are the way to go. It's a shame that Atheros stopped production of these wonderful chipsets in favour of the cheap AR5006x. I know I'll miss the CM9 a lot…