Need help finding why memory and swap are full
-
400+ processes is way too many. Looks like lightsquid is having a problem.
What packages are you running? What hardware are you running on? What sort of network is it in front of?
Steve
Edit: I see you've added some of that.
-
Yeah, the forum chopped off part of my post and I didn't notice until just now. I apologize for that.
It's running on a Pentium III machine with 1 GB RAM, an 80 GB HDD, and 4 network cards. I'm not really sure how to answer the network question. It's the firewall and DNS forwarder for a network of several managed switches of mixed manufacture, two Windows domain controllers, an Exchange server, a file server, a backup server, a NAS, and about 50 workstations. Only one person uses the OpenVPN package for a VPN connection. The network is split into three VLANs. I apologize if this isn't what you were asking for.
-
That's pretty much exactly what I was asking.
You have one NIC for each VLAN and one for WAN? In which case, are your switches handling the VLANs? There are no VLAN interfaces in the pfSense box? It's probably not relevant but it's best to get an idea of what you're dealing with.
Are you using lightsquid? Have you ever used it? If the squid and squidguard logs aren't present it's going to have a hard time, so you should at least disable it.
Steve
-
Yes, one NIC for each VLAN and WAN. The VLANs are configured as interfaces on the firewall. Most traffic uses one VLAN. One is set up to segregate a few public computers. The other currently isn't used. Oddly enough, we never experienced LAN problems when the box went haywire. We only had an issue with port 80 traffic until we rebooted the machine that pfSense is loaded on. Yes, we were using lightsquid for reporting. I have just started a package removal for lightsquid; it's moving very slowly, but it is moving.
-
In your system activity output you have 5 interfaces listed: rl0-2, dc0 and xl0. Is one of those unassigned? Irq16 seems to be causing an interrupt storm, dc0 is using that IRQ. Something else may be though, what do you see from 'vmstat -i'?
Steve
-
Under the Interfaces tab I get (assign), LAN, VLAN3, VLAN4, WAN. The three VLANs and the WAN have the enabled check box checked. I cannot currently get the Interfaces (assign) page to load. Below are the results of the Status>Interfaces page. I apologize if this isn't what you were referring to.
WAN interface (xl0)
  Status: up
  MAC address: 00:60:97:a1:8d:a4
  IP address: xx.xx.109.126
  Subnet mask: 255.255.255.240
  Gateway: Windstream xx.xx.109.113
  ISP DNS servers: 127.0.0.1, xx.xx.222.222, xx.xx.220.220
  Media: 100baseTX <full-duplex>
  In/out packets: 1942932/1926379 (1.27 GB/747.06 MB)
  In/out packets (pass): 1926379/1844701 (1.27 GB/747.06 MB)
  In/out packets (block): 16553/0 (1.14 MB/0 bytes)
  In/out errors: 4/0
  Collisions: 0

LAN interface (rl0)
  Status: up
  MAC address: 00:50:ba:5d:3f:9f
  IP address: 192.168.241.254
  Subnet mask: 255.255.255.0
  Media: 100baseTX <full-duplex>
  In/out packets: 7062023361/1488130 (211.12 GB/779.59 MB)
  In/out packets (pass): 1488130/1418930 (711.83 MB/779.59 MB)
  In/out packets (block): 7060535231/0 (210.42 GB/0 bytes)
  In/out errors: 75180/0
  Collisions: 0

VLAN3 interface (rl1_vlan3)
  Status: up
  MAC address: 00:50:ba:ba:a1:85
  IP address: 192.168.12.254
  Subnet mask: 255.255.255.0
  Media: 100baseTX <full-duplex>
  In/out packets: 369100/352325 (37.78 MB/436.87 MB)
  In/out packets (pass): 351807/404665 (36.49 MB/436.84 MB)
  In/out packets (block): 17293/518 (1.29 MB/28 KB)
  In/out errors: 0/108
  Collisions: 52

VLAN4 interface (rl2_vlan4)
  Status: up
  MAC address: 00:e0:29:6f:8e:3c
  IP address: 192.168.11.254
  Subnet mask: 255.255.255.0
  Media: 100baseTX <full-duplex>
  In/out packets: 23020/23012 (5.14 MB/29.92 MB)
  In/out packets (pass): 23012/29450 (5.14 MB/29.92 MB)
  In/out packets (block): 8/0 (506 bytes/0 bytes)
  In/out errors: 0/0
  Collisions: 0
Here is the output of the command you asked me to run.
$ vmstat -i
interrupt                          total       rate
irq1: atkbd0                           8          0
irq5: uhci0 uhci1             2980212854      14796
irq6: fdc0                             2          0
irq14: ata0                      4424609         21
irq15: ata1                           68          0
irq16: dc0                     729307791       3620
irq17: xl0 rl2                1512276573       7508
irq18: rl0                    1257810302       6244
irq19: rl1                    2026324816      10060
cpu0: timer                    402790186       1999
cpu1: timer                    402791167       1999
Total                         9315938376      46252
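For anyone comparing interrupt sources in output like the above, here is a minimal Python sketch that parses `vmstat -i` text and ranks the sources by rate. The sample data is copied from this post; the function names `parse_vmstat_i` and `rank_by_rate` are made up for illustration, and the regular expression assumes the three-column layout shown here, which may differ on other FreeBSD versions.

```python
import re

# Sample copied from the vmstat -i output posted in this thread.
VMSTAT_OUTPUT = """\
interrupt                          total       rate
irq1: atkbd0                           8          0
irq5: uhci0 uhci1             2980212854      14796
irq6: fdc0                             2          0
irq14: ata0                      4424609         21
irq15: ata1                           68          0
irq16: dc0                     729307791       3620
irq17: xl0 rl2                1512276573       7508
irq18: rl0                    1257810302       6244
irq19: rl1                    2026324816      10060
cpu0: timer                    402790186       1999
cpu1: timer                    402791167       1999
Total                         9315938376      46252
"""

# "<source>: <device> [<device> ...] <total> <rate>"
# The header and the Total line have no colon, so they do not match.
ROW_RE = re.compile(
    r"^(?P<source>\S+):\s+(?P<devices>.+?)\s+(?P<total>\d+)\s+(?P<rate>\d+)$"
)


def parse_vmstat_i(text):
    """Return one dict per interrupt source (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if match is None:
            continue  # skip header, Total, and blank lines
        rows.append({
            "source": match["source"],
            "devices": match["devices"].split(),
            "total": int(match["total"]),
            "rate": int(match["rate"]),
        })
    return rows


def rank_by_rate(rows):
    """Sort interrupt sources from highest to lowest rate."""
    return sorted(rows, key=lambda row: row["rate"], reverse=True)


if __name__ == "__main__":
    for row in rank_by_rate(parse_vmstat_i(VMSTAT_OUTPUT)):
        devices = " ".join(row["devices"])
        print(f"{row['source']:<6} {devices:<12} {row['rate']:>6}/s")
```

On the sample above, the highest rate belongs to irq5 (uhci0 uhci1), followed by irq19 (rl1). Whether a given rate is abnormal depends on the hardware and the traffic, so treat the ranking as a starting point rather than a diagnosis.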
-
Interesting. So two things:
You're not using dc0 but that looks to be what's causing the interrupt flood. Is that an on board NIC perhaps? You may be able to disable it in the BIOS.
Although you have separate NICs for your local VLANs, the actual tagged VLAN traffic is being trunked through to the pfSense NICs. This isn't necessary, since you could handle the VLAN tagging/untagging in the switch(es), which is potentially less problematic. However, it's obviously been working fine for you so I wouldn't change it now.
The interrupt rate on rl1 (VLAN3) is significantly higher than the other NICs; is that where most of your clients are?
The fact that squid and squidguard didn't start correctly is not a good sign. Possibly your HD is failing; if that were the case there would be evidence of it in the system log. Is there anything in the squid log to indicate why it didn't start?
Steve
-
I will have to reboot the firewall after hours and inspect the bios.
Most of the clients are on LAN. VLAN3 is the vlan for the couple public computers I mentioned earlier.
I'm not finding any useful information inside the squid logs located in /var/squid/log/. The only logs located there are the access.log and cache.log logs. Below is a list of all of the log files and directories relating to squid I knew how to find:
[2.0.1-RELEASE][admin@kit-pfsense.thekitcheninc.org]/root(1): cd /var/squid/
[2.0.1-RELEASE][admin@kit-pfsense.thekitcheninc.org]/var/squid(2): ls
acl     cache   log     logs
[2.0.1-RELEASE][admin@kit-pfsense.thekitcheninc.org]/var/squid(3): cd /var/squid/log
[2.0.1-RELEASE][admin@kit-pfsense.thekitcheninc.org]/var/squid/log(4): ls
access.log      access.log.2    access.log.5    cache.log.0     cache.log.3     cache.log.6
access.log.0    access.log.3    access.log.6    cache.log.1     cache.log.4
access.log.1    access.log.4    cache.log       cache.log.2     cache.log.5
[2.0.1-RELEASE][admin@kit-pfsense.thekitcheninc.org]/var/squid/log(5): cd /var/squid/logs
[2.0.1-RELEASE][admin@kit-pfsense.thekitcheninc.org]/var/squid/logs(6): ls
cache.log
[2.0.1-RELEASE][admin@kit-pfsense.thekitcheninc.org]/var/squid/logs(7): cd /var/squid/cache
[2.0.1-RELEASE][admin@kit-pfsense.thekitcheninc.org]/var/squid/cache(8): ls
00      05      0A      0F
01      06      0B      swap.state
02      07      0C      swap.state.clean
03      08      0D
04      09      0E
[2.0.1-RELEASE][admin@kit-pfsense.thekitcheninc.org]/var/squid/cache(9): cd /var/log
[2.0.1-RELEASE][admin@kit-pfsense.thekitcheninc.org]/var/log(10): ls
apinger.log     lastlog                 portalauth.log  system.log
dhcpd.log       lighttpd.error.log      ppp.log         userlog
dmesg.boot      lighttpd.log            pptps.log       vpn.log
filter.log      ntpd.log                relayd.log      wireless.log
ipsec.log       openvpn.log             slbd.log
l2tps.log       poes.log                squidGuard.log
I think I remember seeing part of a message mentioning sectors on the console after reboot but the interrupt storm was spamming the console so much I couldn't make out much more than the word sectors. Unfortunately I cleared the log after disabling system console logging in case the log getting spammed by the interrupt storm was dragging the system down. I realize in hindsight that was really dumb. However checking the HEALTH and SMART information of the drive through pfsense shows no failures and a passing grade on the SMART assessment.
I did find this in the Portal Auth logs:
Nov 22 21:30:40 squid[12747]: Squid Parent: child process 13229 exited due to signal 9
Nov 22 18:13:54 squid[57530]: Squid Parent: child process 59181 started
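Signal 9 in the first line is SIGKILL, which a process cannot catch or ignore, so the log entry by itself does not say what sent it. As a rough sketch, the Python below scans syslog-style lines for squid children that exited on a signal and translates the number into a name. The two sample lines are the ones quoted above; `find_signal_exits` and `signal_name` are hypothetical helper names, and the pattern assumes the exact "Squid Parent: child process N exited due to signal N" wording seen here.

```python
import re
import signal

# The two log lines quoted in this post.
LOG_LINES = [
    "Nov 22 21:30:40 squid[12747]: Squid Parent: child process 13229 exited due to signal 9",
    "Nov 22 18:13:54 squid[57530]: Squid Parent: child process 59181 started",
]

# Assumes the message wording shown in the sample lines above.
EXIT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+squid\[\d+\]: "
    r"Squid Parent: child process (?P<child>\d+) "
    r"exited due to signal (?P<signo>\d+)$"
)


def signal_name(signo):
    """Map a signal number to its name, falling back to the bare number."""
    try:
        return signal.Signals(signo).name
    except ValueError:
        return f"signal {signo}"


def find_signal_exits(lines):
    """Return (timestamp, child pid, signal name) for each signal exit."""
    exits = []
    for line in lines:
        match = EXIT_RE.match(line.strip())
        if match:
            exits.append((
                match["stamp"],
                int(match["child"]),
                signal_name(int(match["signo"])),
            ))
    return exits


if __name__ == "__main__":
    for stamp, child, name in find_signal_exits(LOG_LINES):
        print(f"{stamp}: child {child} exited on {name}")
```

Run against a full log, the timestamps of any SIGKILL exits can then be compared with what the system log shows at the same moments.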
-
Just wanted to follow up on this. I ended up updating the firewall to the latest full release. I then reinstalled the squid and squidGuard packages. Thankfully pfSense backs up the package configs before reinstalling them, so I didn't need to change anything. I also disabled the serial and parallel ports in the BIOS to get rid of the interrupt storm. Everything is running perfectly now.
-
Thanks for following up, many don't. :)
Good to hear you sorted it.
Steve