2.0 RC1 CPU at 100% after 1-4 days
-
(EDIT: seems to happen as soon as a few hours after booting)
2.0-RC1 (i386)
built on Sat Feb 26 15:30:26 EST 2011
Hello,
I have a 32-bit pfSense 2.0 RC1 install on an (older) AMD Sempron with two Trendnet Gigabit NICs, running off a hard drive. The connection is 1.5 Mbit DSL, nothing spectacular. CPU usage is usually around 3-4%.
For some reason, after 3-4 days, I'll notice my network acting slowly or strangely (but still working), only to log on to the console, bring up top, and find it laden with a ton of inetd and nc processes maxing out the CPU. The RRD graphs seem to go blank at the point where the CPU goes nuts. I'm running the SMP kernel, and I tried the other kernel during my initial troubleshooting (I first thought the problem was caused by running off a USB stick, so I switched over to a hard drive).
During this time the box is barely usable, so the last time this happened (this morning) I just snapped a picture of the screen. Other than that, the only "weird" thing I see after I reboot is six "Bump sched buckets to 64 (was 0)" messages.
I've disabled pretty much everything in BIOS.
vmstat -i
interrupt                total   rate
irq0: clk              2154812    999
irq1: atkbd0                18      0
irq8: rtc               275810    127
irq10: re0 uhci0+       218310    101
irq11: re1 atapci0*     219425    101
irq14: ata0               5462      2
Total                  2873837   1333
Not sure what to check next, so I wanted to get some advice from the forum, e.g. what I should be checking for in the logs. Anything strange stand out? I couldn't find any open issues for 2.0 that cover this, but perhaps I wasn't searching for the right thing. (considered posting in Hardware but wasn't sure because this was specific to 2.0) Thanks for looking!
Attached: image of top during strange behavior
-
Try updating pfSense; you're using an older build. pfSense 2.0-RC1 snapshots are updated constantly, sometimes more than once a day. I have an AMD Athlon K7 1.3 GHz with 512 MB of RAM from 2001, and in just under a month and a half of running 2.0-RC1 I've never seen this problem.
-
I'll try updating and keep an eye on it. Since this is newer than 1.5 months, I can't help but suspect some hardware configuration issue.
(I'd feel more confident about updating if this were a known issue that had since been resolved; the closest thing I found was a bug in which the rate service (for traffic graphs) caused high CPU usage after a few days.)
-
Top reports
674 processes, 262 running, 377 zombie
That you have so many inetd processes and zombie processes suggests there might be an issue with the interaction between inetd and a child process.
On my system
more /var/etc/inetd.conf
tftp-proxy dgram udp wait root /usr/libexec/tftp-proxy tftp-proxy -v
suggesting tftp is the only thing inetd is likely to start.
What uses inetd on your system? The pfSense shell command
clog /var/log/system.log | grep inetd
MIGHT provide some hints.
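To see which commands account for the zombies that top reports, a rough tally per command name can help. The sketch below is only an illustration: it assumes two-column input (process state, then command name) of the kind `ps -axo stat,comm` is expected to print on FreeBSD, and the function name `count_zombies` is made up for this example.

```sh
# Tally zombie processes per command name.
# Reads "STAT COMMAND" lines on stdin; a state starting with Z marks a zombie.
# The header line is ignored because "STAT" does not start with Z.
count_zombies() {
    awk '$1 ~ /^Z/ { z[$2]++ } END { for (c in z) print c, z[c] }' | sort
}

# Assumed usage on the firewall itself:
#   ps -axo stat,comm | count_zombies
```

If inetd dominates the tally, that points back at whatever inetd is being asked to spawn, which is what the inetd.conf question is getting at.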
-
Thanks for the suggestions.
It happened again overnight, so only about 8 hours of uptime before it happened again. Another thing of note is that there are a ton of "nc" (netcat) processes along with all the zombie inetd processes.
I grepped the system log for inetd and didn't see any messages containing it (prior to rebooting). Unfortunately I didn't realize that system.log is entirely wiped during a reboot, so I'll make sure to scp it over beforehand when it happens again.
The only "non-standard" packages I have running are Snort and OpenVPN.
Something weird I noticed in the system.log is that each of the snort log entries is duplicated, such as
Mar 24 09:10:04 snort[52667]: --== Initialization Complete ==--
Mar 24 09:10:04 snort[52667]: --== Initialization Complete ==--
I disabled Snort from the webConfigurator and the system didn't recover, but I have left it disabled for the time being to assist with the troubleshooting.
When (if) it happens again, I'll make sure I get the system.log to try to correlate any messages with when the system freaks out.
-
What is in your /var/etc/inetd.conf?
Do you have anything attempting to use tftp or any other service in /var/etc/inetd.conf?
-
Looks like tftp and the firewall rules (first three octets edited to xxx) for a few of my external IPs
tftp-proxy dgram udp wait root /usr/libexec/tftp-proxy tftp-proxy -v
19000 stream tcp nowait/0 nobody /usr/bin/nc nc -w 2000 10.0.0.26 xxx.xxx.xxx.86 25
19001 stream tcp nowait/0 nobody /usr/bin/nc nc -w 2000 10.0.0.26 xxx.xxx.xxx.86 25
19002 stream tcp nowait/0 nobody /usr/bin/nc nc -w 2000 10.0.0.26 xxx.xxx.xxx.86 25
19003 stream tcp nowait/0 nobody /usr/bin/nc nc -w 2000 10.0.0.26 xxx.xxx.xxx.86 25
19004 stream tcp nowait/0 nobody /usr/bin/nc nc -w 2000 10.0.0.4 xxx.xxx.xxx.85 443
19005 stream tcp nowait/0 nobody /usr/bin/nc nc -w 2000 10.0.0.4 xxx.xxx.xxx.85 222
19006 stream tcp nowait/0 nobody /usr/bin/nc nc -w 2000 10.0.0.26 xxx.xxx.xxx.86 222
19007 stream tcp nowait/0 nobody /usr/bin/nc nc -w 2000 10.0.0.4 xxx.xxx.xxx.85 54
19007 dgram udp nowait/0 nobody /usr/bin/nc nc -u -w 2000 10.0.0.4 xxx.xxx.xxx.85 54
19008 stream tcp nowait/0 nobody /usr/bin/nc nc -w 2000 10.0.0.4 xxx.xxx.xxx.85 54
19008 dgram udp nowait/0 nobody /usr/bin/nc nc -u -w 2000 10.0.0.4 xxx.xxx.xxx.85 54
19009 stream tcp nowait/0 nobody /usr/bin/nc nc -w 2000 10.0.0.4 xxx.xxx.xxx.85 54
19009 dgram udp nowait/0 nobody /usr/bin/nc nc -u -w 2000 10.0.0.4 xxx.xxx.xxx.85 54
19010 stream tcp nowait/0 nobody /usr/bin/nc nc -w 2000 10.0.0.115 xxx.xxx.xxx.91 80
-
On my pfSense:
/usr/bin/nc -w 2000 10.0.0.26 205.126.89.86 25
nc: port number invalid: 205.126.89.86
Also when I tried to match up the nc command in inetd.conf against the FreeBSD man page for nc it seemed to me that the command didn't match the template in the man page.
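The mismatch can be checked mechanically. nc expects one host followed by a port, whereas the entries posted above pass two addresses before the port, so the second address ends up where the port should be. The sketch below scans inetd.conf-style lines and reports nc entries whose count of positional arguments is not two. It is a simplification under stated assumptions: it treats `-w` as the only option that takes a value, treats every other dash-prefixed word as a bare flag, and the function name `flag_bad_nc` is hypothetical.

```sh
# Report inetd.conf entries that run /usr/bin/nc with a number of
# positional arguments other than two (host and port).
# Fields: 1=service 2=socket type 3=protocol 4=wait 5=user 6=program 7=argv[0]
flag_bad_nc() {
    awk '$6 == "/usr/bin/nc" {
        n = 0
        for (i = 8; i <= NF; i++) {
            if ($i == "-w") { i++; continue }   # skip -w and its timeout value
            if ($i ~ /^-/) continue             # assume other options are bare flags
            n++
        }
        if (n != 2) print $1 "/" $3 ": " n " positional arguments"
    }'
}

# Assumed usage:
#   flag_bad_nc < /var/etc/inetd.conf
```

Run against the entries posted above, every nc line would be reported with three positional arguments, while the tftp-proxy line is skipped because it does not run nc.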
I'm running 2.0-RC1-IPv6 (i386)
built on Sun Mar 20 02:20:38 EDT 2011
-
Thanks for investigating; I'll go ahead with the upgrade tonight and see if that changes anything. I haven't updated it yet, so hopefully it'll go smoothly.
-
Thanks for investigating; I'll go ahead with the upgrade tonight
It's probably a good thing to upgrade snapshot builds from time to time, especially when you come across problems. I just want to be clear that I wasn't suggesting you upgrade. After the investigation I just reported, I fully expect the version I'm running would display similar symptoms to your system if I had a similar inetd.conf and traffic activating its nc entries.
Do you have any idea what parts of your configuration are responsible for those nc entries in inetd.conf?
-
Right, I figured I'd upgrade anyway. No huge hopes that it'll solve this issue, but perhaps something with nc changed?
I haven't done anything exotic, just set up some NAT port forwarding via Web Configurator, which I assume was what added those nc lines to inetd.conf.
-
Upgraded to
2.0-RC1 (i386)
built on Thu Mar 24 13:58:11 EDT 2011
and disabled Snort for the time being. Thanks for the input so far; if it happens again I'll be sure to copy down the logs for more info before rebooting it.
-
I haven't done anything exotic, just set up some NAT port forwarding via Web Configurator, which I assume was what added those nc lines to inetd.conf.
I have a number of port forward rules defined under Firewall -> NAT on the Port Forward tab, and I don't have any nc entries in /etc/inetd.conf. Do you have a different type of port forward?
-
Can you tell me if you have any aliases referenced on port forward rules?
-
Yes, I have aliases defined for most of my firewall rules. For some aliases, I specified both the internal and external IPs.
Under Firewall->Aliases, I have a few entries similar to
Name | Values
mailserver | 10.0.0.4, xxx.xxx.xxx.85
Then in Firewall->NAT I created rule(s) using the aliases, like:
WAN TCP * * xxx.xxx.xxx.85 25 (SMTP) mailserver 25 (SMTP) Mail Server
-
OK, try the latest snapshot. I fixed an issue with the generated configs in the backend.
If you do not want to wait for the next snapshot, the change is https://rcs.pfsense.org/projects/pfsense/repos/mainline/commits/650b573bd8a435449178385a2d132f7f0002d309
-
OK, thanks! Other than upgrading my snapshot, should I remove/re-add my firewall rules? Delete the nc entries from inetd.conf?
-
For the time being, I've removed the aliases from my setup; once things are stable I'll turn them back on.
-
You should not make any changes to your firewall other than upgrading.
It would be good to get feedback on whether it solves your issue, since it's better to fix it now than to go through the hoops again after 2.0.
-
OK. Updated, with aliases enabled, so far so good over the weekend. Out of town this week so hopefully it will behave; I'll report on hopefully successful results then.