Troubleshooting a pfSense box

nambi

I have a problem with my pfSense box and I need to troubleshoot it, but I don't know what to look for.

Here is the situation: when surfing the net I get very slow responses. When I search Google it takes a very long time for results to come up, and when I click on a result there is a long lag before the page loads. Yet when I do a speed test, the speed is good.

Sometimes the system lags so much that Windows 7 reports, on the network icon, that the computer has lost internet access. The problem appeared suddenly about 2 weeks ago. At first I thought it was my ISP, but I have now hooked up a backup WatchGuard and I don't have this issue with it.

I pieced the pfSense box together from old parts, so now I'm wondering whether it is a fault in the HD, the memory, the NIC, etc.

Does anyone have any advice on what I should be looking for?

Thanks

tommyboy180

When you did a speed test what was your latency rate? You may have good download and upload speeds but latency is what really matters overall.

When you browse the web or download files does your pfsense box have an increased HDD read/write activity?
How old are we talking? Above 800 MHz?

nambi

I will put the box back in service when people are logged out. The box is a Celeron 2 GHz with 1 GB RAM (maybe 512 MB, I can check).

When it's back in service I will check the latency, I'm using speedtest.net

Where can I check the HDD read/write activity? Is it under the system overview?

when idle but disconnected from the WAN I see

cpu about 3% idle
memory 24%
swap usage 0%
disk usage 54%

wallabybob

@nambi:

when idle but disconnected from the WAN I see

cpu about 3% idle

This suggests it's busy doing something (but what?). From the shell command prompt (either on the console or over ssh)

# vmstat -i

will show interrupt counters (high interrupt rate can consume a lot of CPU) and

# top -S

will show which processes (-S to include "system" processes) are the main consumers of CPU time.

For comparison, my pfSense box runs on an 800 MHz VIA C3 CPU and the graphs (from the web GUI: Status -> RRD Graphs, System tab) typically show under 10% CPU utilisation, with occasional short peaks of 40% to 50%.

Here is a sample from my system:

vmstat -i

interrupt                  total    rate
irq0: clk               43678453    1000
irq1: atkbd0                 134       0
irq4: sio0                     1       0
irq8: rtc                5590373     127
irq10: rl0 ehci0          148172       3
irq11: vr0 uhci0         2551386      58
irq12: ath0 uhci1         853616      19
irq14: ata0                81868       1
irq15: ata1                  106       0
Total                   52904109    1211

top -S

last pid: 20608; load averages: 0.07, 0.05, 0.01 up 0+12:08:22 07:15:47
103 processes: 3 running, 75 sleeping, 10 zombie, 15 waiting
CPU: 0.0% user, 0.0% nice, 0.4% system, 1.6% interrupt, 98.1% idle
Mem: 27M Active, 14M Inact, 26M Wired, 80K Cache, 17M Buf, 157M Free
Swap: 260M Total, 260M Free

PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
11 root 1 171 ki31 0K 8K RUN 702:42 100.00% idle: cpu0
20604 root 1 44 0 3524K 1768K RUN 0:00 0.20% top
12 root 1 -32 - 0K 8K WAIT 1:23 0.00% swi4: clock sio
28 root 1 -68 - 0K 8K WAIT 1:09 0.00% irq11: vr0 uhci0
14 root 1 -44 - 0K 8K WAIT 0:41 0.00% swi1: net
1284 root 1 8 20 3492K 1432K wait 0:31 0.00% sh
26 root 1 -68 - 0K 8K - 0:30 0.00% ath0 taskq
25 root 1 -68 - 0K 8K WAIT 0:19 0.00% irq12: ath0 uhci1
27 root 1 -68 - 0K 8K WAIT 0:15 0.00% irq10: rl0 ehci0
31 root 1 8 - 0K 8K usbtsk 0:11 0.00% usbtask-dr
15 root 1 44 - 0K 8K RUN 0:09 0.00% yarrow
37 root 1 0 - 0K 8K tzpoll 0:08 0.00% acpi_thermal
4 root 1 -8 - 0K 8K - 0:06 0.00% g_up
49 root 1 20 - 0K 8K syncer 0:06 0.00% syncer
5 root 1 -8 - 0K 8K - 0:05 0.00% g_down
21 root 1 -24 - 0K 8K WAIT 0:03 0.00% swi6: Giant taskq
3 root 1 -8 - 0K 8K - 0:03 0.00% g_event
1486 root 1 -58 0 5264K 2420K bpf 0:03 0.00% bandwidthd
854 root 1 4 0 44808K 18880K accept 0:03 0.00% php
1488 root 1 -58 0 5264K 2320K bpf 0:03 0.00% bandwidthd
1449 root 1 -58 0 5264K 2196K bpf 0:03 0.00% bandwidthd
1487 root 1 -58 0 5264K 2340K bpf 0:03 0.00% bandwidthd
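
If you would rather not eyeball the interrupt table, the same check can be scripted off the box. This is only a rough sketch: the function names are made up for illustration, and the default threshold of 5000 interrupts per second is an arbitrary assumption, not a pfSense or FreeBSD recommendation. It parses pasted `vmstat -i` output and lists the sources whose rate is at or above the threshold.

```python
def parse_vmstat_i(text):
    """Parse pasted `vmstat -i` output into {source: (total, rate)}."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows end in two integers (total, rate); the header row does not.
        if len(parts) >= 3 and parts[-2].isdigit() and parts[-1].isdigit():
            rows[" ".join(parts[:-2])] = (int(parts[-2]), int(parts[-1]))
    return rows


def busy_sources(rows, threshold=5000):
    """Return sources with rate >= threshold, busiest first.

    The threshold is an assumed, arbitrary cut-off; pick one that suits
    your hardware. The "Total" summary row is skipped.
    """
    hits = [(rate, name) for name, (_, rate) in rows.items()
            if name != "Total" and rate >= threshold]
    return [name for rate, name in sorted(hits, reverse=True)]
```

Paste the command output into a string, pass it to `parse_vmstat_i`, and hand the result to `busy_sources`. An empty list means no source is above the threshold you chose.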

nambi

Here is my pfSense box hooked up to the LAN with the normal load on it.

Name firewallc****
Version 1.2.3-RELEASE
built on Sun Dec 6 23:21:36 EST 2009
Platform pfSense
Uptime
State table size
Show states
MBUF Usage 198 /645
CPU usage 12%
Memory usage 32%
SWAP usage 0%
Disk usage 55%

How do I test the latency?

vmstat -i

$ vmstat -i
interrupt                  total    rate
irq0: clk               59358197    1000
irq1: atkbd0                  14       0
irq5: vr0                1023074      17
irq6: fdc0                     2       0
irq8: rtc                7596142     127
irq11: ste0 uhci0*       1027182      17
irq14: ata0               118493       1
irq15: ata1                   88       0
Total                   69123192    1164

top -S

$ top -S
last pid: 45189; load averages: 0.18, 0.25, 0.19 up 0+16:29:59 09:39:50
117 processes: 3 running, 91 sleeping, 8 zombie, 15 waiting

Mem: 83M Active, 170M Inact, 64M Wired, 56M Buf, 141M Free
Swap: 512M Total, 512M Free

PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
11 root 1 171 ki31 0K 8K RUN 941:21 97.56% idle: cpu0
41069 root 1 -8 0 40712K 13412K piperd 0:19 1.27% php
33 root 1 -68 - 0K 8K WAIT 5:32 0.39% irq5: vr0
25 root 1 -68 - 0K 8K WAIT 4:45 0.20% irq11: ste0 uhci0*
606 root 1 4 0 5144K 3032K kqread 0:15 0.20% lighttpd
12 root 1 -32 - 0K 8K WAIT 3:10 0.00% swi4: clock sio
14 root 1 -44 - 0K 8K WAIT 1:38 0.00% swi1: net
46 root 1 20 - 0K 8K syncer 1:23 0.00% syncer
3276 proxy 1 4 0 61288K 9240K sbwait 0:46 0.00% squidGuard
4 root 1 -8 - 0K 8K - 0:15 0.00% g_up
5 root 1 -8 - 0K 8K - 0:13 0.00% g_down
15 root 1 44 - 0K 8K - 0:11 0.00% yarrow
3277 proxy 1 4 0 61288K 9152K sbwait 0:08 0.00% squidGuard
20 root 1 8 - 0K 8K - 0:08 0.00% thread taskq
3 root 1 -8 - 0K 8K - 0:06 0.00% g_event
26 root 1 -64 - 0K 8K WAIT 0:06 0.00% irq14: ata0
2014 root 1 8 0 3492K 1380K wait 0:06 0.00% sh
2034 root 1 -58 0 5260K 2364K bpf 0:05 0.00% bandwidthd

Thank You.

nambi

here is the usage graph,

Why is it that my "4 hour, 1 min average" graph doesn't show anything, nor the "16 hour, 1 min average" and "2 day, 5 min average" graphs?

[Attachment: pfgraph.jpg]

wallabybob

The top output is not consistent with your claim of 3% idle CPU. How did you come to that conclusion?

The ping command can be used to estimate latency, e.g.
```
ping -c 20 host
```
The shell command
```
# netstat -i
```
will return counters for each interface, including error counters. (Delays might be caused by corrupted or lost packets.)
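
To compare several ping runs without reading every line, the summary at the end of each run can be pulled out with a short script. The sketch below is an illustration only: the function name and the dictionary keys are made up, and it assumes the summary looks like the FreeBSD `ping` summary (a "% packet loss" line followed by a "min/avg/max/stddev" line) or the Linux "mdev" variant.

```python
import re


def parse_ping_summary(text):
    """Extract packet loss and round-trip statistics from ping output.

    Returns a dict with "loss_pct" (None if the loss line is missing).
    The round-trip keys are present only when ping printed a
    min/avg/max line, which it omits when every packet was lost.
    """
    loss = re.search(r"([\d.]+)% packet loss", text)
    rtt = re.search(
        r"min/avg/max/(?:stddev|mdev) = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)",
        text,
    )
    summary = {"loss_pct": float(loss.group(1)) if loss else None}
    if rtt:
        keys = ("min_ms", "avg_ms", "max_ms", "stddev_ms")
        summary.update(zip(keys, map(float, rtt.groups())))
    return summary
```

A large gap between `avg_ms` and `max_ms`, or a non-zero `loss_pct`, is the sort of thing worth a closer look. Keep in mind that some hosts simply do not answer ping, so 100% loss to one host is not proof of a fault.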

nambi

Thank you for all the help.

"The top output is not consistent with your claim of 3% idle CPU. How did you come to that conclusion?"

This was what I normally see when I log into the box and go to /status/system

Here are my ping results and my netstat results. I think my pings look slow; what would cause this?

$ ping -c 20 google.com
PING google.com (173.194.32.104): 56 data bytes
64 bytes from 173.194.32.104: icmp_seq=0 ttl=56 time=11.843 ms
64 bytes from 173.194.32.104: icmp_seq=1 ttl=56 time=11.548 ms
64 bytes from 173.194.32.104: icmp_seq=2 ttl=56 time=11.493 ms
64 bytes from 173.194.32.104: icmp_seq=3 ttl=56 time=10.810 ms
64 bytes from 173.194.32.104: icmp_seq=4 ttl=56 time=12.730 ms
64 bytes from 173.194.32.104: icmp_seq=5 ttl=56 time=11.083 ms
64 bytes from 173.194.32.104: icmp_seq=6 ttl=56 time=11.527 ms
64 bytes from 173.194.32.104: icmp_seq=7 ttl=56 time=10.569 ms
64 bytes from 173.194.32.104: icmp_seq=8 ttl=56 time=11.711 ms
64 bytes from 173.194.32.104: icmp_seq=9 ttl=56 time=11.244 ms
64 bytes from 173.194.32.104: icmp_seq=10 ttl=56 time=11.023 ms
64 bytes from 173.194.32.104: icmp_seq=11 ttl=56 time=11.252 ms
64 bytes from 173.194.32.104: icmp_seq=12 ttl=56 time=11.989 ms
64 bytes from 173.194.32.104: icmp_seq=13 ttl=56 time=11.579 ms
64 bytes from 173.194.32.104: icmp_seq=14 ttl=56 time=11.830 ms
64 bytes from 173.194.32.104: icmp_seq=15 ttl=56 time=12.268 ms
64 bytes from 173.194.32.104: icmp_seq=16 ttl=56 time=32.353 ms
64 bytes from 173.194.32.104: icmp_seq=17 ttl=56 time=11.585 ms
64 bytes from 173.194.32.104: icmp_seq=18 ttl=56 time=11.318 ms
64 bytes from 173.194.32.104: icmp_seq=19 ttl=56 time=11.428 ms

--- google.com ping statistics ---
20 packets transmitted, 20 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 10.569/12.559/32.353/4.566 ms

$ ping -c 20 microsoft.com
PING microsoft.com (207.46.232.182): 56 data bytes

--- microsoft.com ping statistics ---
20 packets transmitted, 0 packets received, 100.0% packet loss

$ ping -c 20 yahoo.com
PING yahoo.com (69.147.125.65): 56 data bytes
64 bytes from 69.147.125.65: icmp_seq=0 ttl=54 time=28.310 ms
64 bytes from 69.147.125.65: icmp_seq=1 ttl=54 time=26.832 ms
64 bytes from 69.147.125.65: icmp_seq=2 ttl=54 time=27.967 ms
64 bytes from 69.147.125.65: icmp_seq=3 ttl=54 time=26.968 ms
64 bytes from 69.147.125.65: icmp_seq=4 ttl=54 time=26.622 ms
64 bytes from 69.147.125.65: icmp_seq=5 ttl=54 time=26.420 ms
64 bytes from 69.147.125.65: icmp_seq=6 ttl=54 time=26.669 ms
64 bytes from 69.147.125.65: icmp_seq=7 ttl=54 time=26.749 ms
64 bytes from 69.147.125.65: icmp_seq=8 ttl=54 time=26.511 ms
64 bytes from 69.147.125.65: icmp_seq=9 ttl=54 time=26.527 ms
64 bytes from 69.147.125.65: icmp_seq=10 ttl=54 time=26.749 ms
64 bytes from 69.147.125.65: icmp_seq=11 ttl=54 time=26.329 ms
64 bytes from 69.147.125.65: icmp_seq=12 ttl=54 time=27.077 ms
64 bytes from 69.147.125.65: icmp_seq=13 ttl=54 time=27.306 ms
64 bytes from 69.147.125.65: icmp_seq=14 ttl=54 time=26.584 ms
64 bytes from 69.147.125.65: icmp_seq=15 ttl=54 time=26.323 ms
64 bytes from 69.147.125.65: icmp_seq=16 ttl=54 time=27.092 ms
64 bytes from 69.147.125.65: icmp_seq=17 ttl=54 time=27.042 ms
64 bytes from 69.147.125.65: icmp_seq=18 ttl=54 time=26.741 ms
64 bytes from 69.147.125.65: icmp_seq=19 ttl=54 time=42.767 ms

--- yahoo.com ping statistics ---
20 packets transmitted, 20 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 26.323/27.679/42.767/3.496 ms

$ ping -c 20 apple.com
PING apple.com (17.112.152.57): 56 data bytes
64 bytes from 17.112.152.57: icmp_seq=0 ttl=243 time=79.368 ms
64 bytes from 17.112.152.57: icmp_seq=1 ttl=243 time=79.317 ms
64 bytes from 17.112.152.57: icmp_seq=2 ttl=243 time=78.288 ms
64 bytes from 17.112.152.57: icmp_seq=3 ttl=243 time=79.355 ms
64 bytes from 17.112.152.57: icmp_seq=4 ttl=243 time=78.635 ms
64 bytes from 17.112.152.57: icmp_seq=5 ttl=243 time=79.386 ms
64 bytes from 17.112.152.57: icmp_seq=6 ttl=243 time=78.455 ms
64 bytes from 17.112.152.57: icmp_seq=7 ttl=243 time=78.654 ms
64 bytes from 17.112.152.57: icmp_seq=8 ttl=243 time=78.982 ms
64 bytes from 17.112.152.57: icmp_seq=9 ttl=243 time=78.985 ms
64 bytes from 17.112.152.57: icmp_seq=10 ttl=243 time=78.456 ms
64 bytes from 17.112.152.57: icmp_seq=11 ttl=243 time=79.444 ms
64 bytes from 17.112.152.57: icmp_seq=12 ttl=243 time=78.749 ms
64 bytes from 17.112.152.57: icmp_seq=13 ttl=243 time=79.249 ms
64 bytes from 17.112.152.57: icmp_seq=14 ttl=243 time=78.750 ms
64 bytes from 17.112.152.57: icmp_seq=15 ttl=243 time=78.992 ms
64 bytes from 17.112.152.57: icmp_seq=16 ttl=243 time=79.403 ms
64 bytes from 17.112.152.57: icmp_seq=17 ttl=243 time=80.230 ms
64 bytes from 17.112.152.57: icmp_seq=18 ttl=243 time=78.417 ms
64 bytes from 17.112.152.57: icmp_seq=19 ttl=243 time=79.180 ms

--- apple.com ping statistics ---
20 packets transmitted, 20 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 78.288/79.015/80.230/0.463 ms

$ netstat -i
Name   Mtu   Network       Address              Ipkts Ierrs    Opkts Oerrs  Coll
ste0   1500  <Link#1>      00:05:5d:fb:cf:2e   732766     0   968330     0     0
ste0   1500  192.168.1.0   firewallcanada       16722     -   440684     -     -
ste0   1500  fe80:1::205:5 fe80:1::205:5dff:        0     -        1     -     -
vr0    1500  <Link#2>      00:0a:e6:2d:f3:9f   969580     0   948426     0     0
vr0    1500  fe80:2::20a:e fe80:2::20a:e6ff:        0     -        2     -     -
lo0    16384 <Link#3>                            5453     0     5453     0     0
lo0    16384 your-net      localhost           286242     -      821     -     -
lo0    16384 ::1           ::1                      0     -        0     -     -
lo0    16384 fe80:3::1     fe80:3::1                0     -        0     -     -
enc0   1536  <Link#4>                          314093     0   225432     0     0
pfsyn  1460  <Link#5>                               0     0        0     0     0
pflog  33204 <Link#6>                               0     0     5249     0     0
ng0    1492  <Link#7>                          968505     0   947352     0     0
ng0    1492  70.25.56.48/3 PTR                1060915     -   758238     -     -
ng0    1492  fe80:7::205:5 fe80:7::205:5dff:        0     -        2     -     -
tun0   1500  <Link#8>                               0     0        3     0     0
tun0   1500  fe80:8::205:5 fe80:8::205:5dff:        0     -        2     -     -
tun0   1500  192.168.200.1 192.168.200.1            0     -        0     -     -

wallabybob

@nambi:

"The top output is not consistent with your claim of 3% idle CPU. How did you come to that conclusion?"

This was what I normally see when I log into the box and go to /status/system

I presume you are logging in through the web GUI rather than the console or ssh session. On the web page I see a CPU Usage field there, not a CPU idle field. Did you really mean "cpu about 3% idle" rather than "cpu usage about 3%"?

The ping times to apple.com and yahoo.com are quite consistent. The ping times to google.com are more variable (10 ms to 32 ms), but I wouldn't expect that variability to be very noticeable.

The netstat output doesn't show any received errors.

I wonder if your DNS is slow. This could explain why your web surfing is "slow" but download speeds are "high". (When you click on a web page link, it's generally necessary to ask your DNS server to translate the host name to an IP address.) What DNS do you use? (People generally end up using their ISP's DNS.)
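
One way to test the slow-DNS theory from a LAN client is to time the name lookups on their own, separately from page loading. The following is a minimal sketch, not a definitive tool: the function names are hypothetical, it goes through whatever resolver the client is configured to use (so it measures the whole chain, including any forwarder on the firewall), and repeated lookups of the same name may be answered from a cache and look faster than a first lookup.

```python
import socket
import time


def time_lookup(hostname, resolve=socket.getaddrinfo):
    """Time one name lookup; return (seconds, ok)."""
    start = time.perf_counter()
    try:
        resolve(hostname, 80)
        ok = True
    except OSError:  # socket.gaierror is a subclass of OSError
        ok = False
    return time.perf_counter() - start, ok


def slowest_lookups(hostnames, resolve=socket.getaddrinfo):
    """Return (hostname, milliseconds, ok) tuples, slowest first.

    `resolve` is injectable so the timing logic can be exercised
    without touching the network.
    """
    results = []
    for name in hostnames:
        seconds, ok = time_lookup(name, resolve)
        results.append((name, seconds * 1000.0, ok))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

For example, `slowest_lookups(["google.com", "yahoo.com", "apple.com"])` gives a slowest-first list. Lookups that take hundreds of milliseconds, or that fail and only succeed on retry, would point at the DNS servers rather than at the firewall hardware.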

tommyboy180

Forgot about DNS.

Try setting your DNS servers to OpenDNS with IP 208.67.222.222 and 208.67.220.220.
I have used OpenDNS for years now and their service has always been 100% reliable, not to mention very fast.

nambi

Thank you for the info. At first I was using my ISP's DNS (Bell), but I had a lot of timeouts, so I switched to Google's:

8.8.8.8
and
8.8.4.4

I will try OpenDNS's servers, but doesn't OpenDNS filter content?

Thanks, will try. Today the net seems responsive.

"The top output is not consistent with your claim of 3% idle CPU. How did you come to that conclusion?"
I must have read the system page wrong, I do apologize.

And yes, I meant "CPU usage": when the system was idle, the "CPU usage" seemed to be at about 3%.

My graphs do not display anything for "4 hour, 1 min average", "16 hour, 1 min average" or "2 day, 5 min average".

dszp

OpenDNS filters only spyware or virus-laden webpages by default (along with phishing sites I believe, found at least by their sister site, phishtank.com), if you just start using them. To do additional content filtering by category for instance, you have to create an account and register your IP address(es) and set the settings.

nambi

Using the OpenDNS servers seems to have fixed the lag: pages are quick to respond and the system seems back to normal.

Thank you all for helping me diagnose this issue.

rugby

@David:

OpenDNS filters only spyware or virus-laden webpages by default (along with phishing sites I believe, found at least by their sister site, phishtank.com), if you just start using them. To do additional content filtering by category for instance, you have to create an account and register your IP address(es) and set the settings.

They also do DNS redirecting and sometimes have incorrect records. Be very careful using their servers, as you won't end up where you wanted to go sometimes.

tommyboy180

Do you have an example URL? I have never seen a redirect to anything other than the OpenDNS search engine, and only for URLs that don't exist.

dszp

It's been a while and I'm not sure if they still do it, but the other thing I've seen from OpenDNS is that they've (transparent) proxied Google in the past, to resolve a specific issue they were having. I discovered this when I was using it and Google wouldn't load for anyone (a proxy issue on their end) but everything else worked. I forget what the exact reason was (there was a technical reason that went along with the way they were doing "shortcuts" or something), and like I said I don't know if they still do this. I haven't had any issues with that or any other part of the service for a long time, and I use them pretty regularly.