2.0 RC1 CPU at 100% after 1-4 days

Coinbird

(EDIT: seems to happen as soon as a few hours after booting)

2.0-RC1 (i386)
built on Sat Feb 26 15:30:26 EST 2011

Hello,
I have a 32-bit pfSense 2.0 RC1 install on an (older) AMD Sempron, with two Trendnet Gigabit NICs, running off a hard drive. Connection is 1.5 Mbit DSL, nothing spectacular. CPU usage is usually around 3-4%.
For some reason, after 3-4 days, I'll notice my network acting slowly/strangely (but still working), only to log onto the console, bring up top, and find it laden with a ton of inetd and nc processes maxing out the CPU. RRD graphs seem to go blank at the point where the CPU goes nuts. I'm running the SMP kernel, and I tried the other kernel during my initial troubleshooting of the problem (which I thought was because I was running off a USB stick, so I switched over to a hard drive).
During this time the box is barely usable, so the last time this happened (this morning) I just snapped a picture of the screen. Other than that, the only "weird" thing I see after I reboot is six "Bump sched buckets to 64 (was 0)" messages.
I've disabled pretty much everything in BIOS.

vmstat -i
interrupt                total   rate
irq0: clk              2154812    999
irq1: atkbd0                18      0
irq8: rtc               275810    127
irq10: re0 uhci0+       218310    101
irq11: re1 atapci0*     219425    101
irq14: ata0               5462      2
Total                  2873837   1333

Not sure what to check next, so I wanted to get some advice from the forum, e.g. what I should be checking for in the logs. Anything strange stand out? I couldn't find any open issues for 2.0 that cover this, but perhaps I wasn't searching for the right thing. (considered posting in Hardware but wasn't sure because this was specific to 2.0) Thanks for looking!

Attached: image of top during strange behavior


NiteSnow

Try updating pfSense; you're using an older build. pfSense 2.0-RC1 is updated constantly, sometimes more than once a day. I have an AMD Athlon K7 1.3GHz from 2001 with 512MB of RAM, and I've never seen this problem in just under a month and a half of running 2.0-RC1.

Coinbird

I'll try updating and keep an eye on it. Since this is newer than 1.5 months, I can't help but suspect some hardware configuration issue.
(I'd feel more confident about it if this was a known issue before that was resolved; the closest thing I found was a bug related to rate service (for traffic graphs) causing high CPU usage after a few days.)

wallabybob

Top reports

674 processes, 262 running, 377 zombie

That you have so many inetd processes and zombie processes suggests there might be an issue with the interaction between inetd and a child process.
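To put numbers on what top shows, the process table can be summarized per command. The following is only a rough sketch: summarize_states is a made-up helper name, and it assumes input lines of the form "STAT COMMAND" where a STAT starting with Z means zombie. How ps names a zombie's command can differ between systems, so treat the grouping as approximate.

```shell
#!/bin/sh
# Summarize process states per command from "STAT COMMAND" lines on stdin.
# A STAT value beginning with Z marks a zombie (exited, not yet reaped by its parent).
summarize_states() {
    awk '
        { total[$2]++ }
        $1 ~ /^Z/ { zombie[$2]++ }
        END {
            for (cmd in total)
                printf "%s total=%d zombie=%d\n", cmd, total[cmd], zombie[cmd] + 0
        }
    ' | sort
}

# Possible use on a live system (commented out so the file can be sourced safely):
# ps -ax -o stat= -o comm= | summarize_states
```

A large zombie count next to inetd would point at children that exit without being reaped, which fits the numbers in the top summary.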

On my system

more /var/etc/inetd.conf

tftp-proxy dgram udp wait root /usr/libexec/tftp-proxy tftp-proxy -v

suggesting tftp is the only thing inetd is likely to start.

What uses inetd on your system? The pfSense shell command

clog /var/log/system.log | grep inetd

MIGHT provide some hints.

Coinbird

Thanks for the suggestions.
It happened again overnight, this time after only about 8 hours of uptime. Another thing of note is that there are a ton of "nc" (netcat) processes along with all the zombie inetd processes.
I grepped the system log for inetd and didn't see any messages containing it (prior to rebooting.) Unfortunately I didn't realize the system.log was entirely wiped during a reboot, so I'll make sure to scp it over beforehand when it happens again.

The only "non-standard" packages I have running are snort and OpenVPN. Something weird I noticed in the system.log is that each of the snort log entries is duplicated, such as


Mar 24 09:10:04 	snort[52667]: --== Initialization Complete ==--
Mar 24 09:10:04 	snort[52667]: --== Initialization Complete ==--

I disabled snort from the webconfigurator and the system didn't recover, but I have disabled it for the time being to assist with the troubleshooting.
When (if) it happens again, I'll make sure I get the system.log to try and correlate any messages with when the system freaks out.
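For the correlation step, one option is to cut the saved log down to the minutes around the CPU spike. This is a minimal sketch, assuming syslog-style lines that begin with "Mon DD HH:MM:SS" as in the snort entries above; log_window is a hypothetical helper name, and it only handles a window within a single day.

```shell
#!/bin/sh
# Print syslog-style lines for one day whose time falls within [start, end].
# Arguments: month, day, start time, end time (times as HH:MM:SS).
log_window() {
    awk -v mon="$1" -v day="$2" -v start="$3" -v end="$4" '
        $1 == mon && $2 == day && $3 >= start && $3 <= end
    '
}

# Possible use on the firewall, before rebooting (commented out):
# clog /var/log/system.log | log_window Mar 24 09:00:00 09:15:00
```

The HH:MM:SS fields are compared as strings, which orders correctly because they are fixed-width and zero-padded.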

wallabybob

What is in your /var/etc/inetd.conf?

Do you have anything attempting to use tftp or any other service in /var/etc/inetd.conf?

Coinbird

Looks like tftp and the firewall rules (first three octets edited to xxx) for a few of my external IPs


tftp-proxy	dgram	udp	wait		root	/usr/libexec/tftp-proxy	tftp-proxy -v
19000	stream	tcp	nowait/0	nobody	/usr/bin/nc	nc -w 2000   10.0.0.26  xxx.xxx.xxx.86 25
19001	stream	tcp	nowait/0	nobody	/usr/bin/nc	nc -w 2000   10.0.0.26  xxx.xxx.xxx.86 25
19002	stream	tcp	nowait/0	nobody	/usr/bin/nc	nc -w 2000   10.0.0.26  xxx.xxx.xxx.86 25
19003	stream	tcp	nowait/0	nobody	/usr/bin/nc	nc -w 2000   10.0.0.26  xxx.xxx.xxx.86 25
19004	stream	tcp	nowait/0	nobody	/usr/bin/nc	nc -w 2000   10.0.0.4  xxx.xxx.xxx.85 443
19005	stream	tcp	nowait/0	nobody	/usr/bin/nc	nc -w 2000   10.0.0.4  xxx.xxx.xxx.85 222
19006	stream	tcp	nowait/0	nobody	/usr/bin/nc	nc -w 2000   10.0.0.26  xxx.xxx.xxx.86 222
19007	stream	tcp	nowait/0	nobody	/usr/bin/nc	nc -w 2000   10.0.0.4  xxx.xxx.xxx.85 54
19007	dgram	udp	nowait/0	nobody	/usr/bin/nc	nc -u -w 2000   10.0.0.4  xxx.xxx.xxx.85 54
19008	stream	tcp	nowait/0	nobody	/usr/bin/nc	nc -w 2000   10.0.0.4  xxx.xxx.xxx.85 54
19008	dgram	udp	nowait/0	nobody	/usr/bin/nc	nc -u -w 2000   10.0.0.4  xxx.xxx.xxx.85 54
19009	stream	tcp	nowait/0	nobody	/usr/bin/nc	nc -w 2000   10.0.0.4  xxx.xxx.xxx.85 54
19009	dgram	udp	nowait/0	nobody	/usr/bin/nc	nc -u -w 2000   10.0.0.4  xxx.xxx.xxx.85 54
19010	stream	tcp	nowait/0	nobody	/usr/bin/nc	nc -w 2000   10.0.0.115  xxx.xxx.xxx.91 80

wallabybob

On my pfSense:

/usr/bin/nc -w 2000 10.0.0.26 205.126.89.86 25

nc: port number invalid: 205.126.89.86

Also when I tried to match up the nc command in inetd.conf against the FreeBSD man page for nc it seemed to me that the command didn't match the template in the man page.

I'm running 2.0-RC1-IPv6 (i386)
built on Sun Mar 20 02:20:38 EDT 2011
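The mismatch can be checked across the whole file rather than one entry at a time. Below is a rough sketch, not anything pfSense ships: check_nc_entries is a made-up name, it assumes the usual inetd.conf layout (server program in field 6, its arguments from field 7 on), and it assumes -w is the only nc option here that takes a value. It flags any nc entry where something after the host is not a numeric port or port range.

```shell
#!/bin/sh
# Report inetd.conf entries whose nc arguments after the host are not numeric ports.
check_nc_entries() {
    awk '
        $6 ~ /\/nc$/ {
            npos = 0; bad = ""
            for (i = 8; i <= NF; i++) {
                if ($i == "-w") { i++; continue }   # -w takes a value (timeout), skip both
                if ($i ~ /^-/) continue             # assumption: other options take no value
                npos++                              # first positional is the host
                if (npos > 1 && $i !~ /^[0-9]+(-[0-9]+)?$/) bad = $i
            }
            if (bad != "")
                printf "line %d: service %s: not a port: %s\n", NR, $1, bad
        }
    '
}

# Possible use (commented out):
# check_nc_entries < /var/etc/inetd.conf
```

Run against the entries posted above, every nc line would be reported, since each has a second address where nc expects a port.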

Coinbird

Thanks for investigating; I'll go ahead with the upgrade tonight and see if that changes anything. I haven't updated it yet, so hopefully it'll go smoothly.

wallabybob

@Coinbird:

Thanks for investigating; I'll go ahead with the upgrade tonight

It's probably a good thing to upgrade snapshot builds from time to time, especially when you come across problems. I just want to be clear that I wasn't suggesting you upgrade. After the investigation I just reported, I fully expect the version I'm running would display similar symptoms to your system if I had a similar inetd.conf and I had traffic activating the nc entries in inetd.conf.

Do you have any idea what parts of your configuration are responsible for those nc entries in inetd.conf?

Coinbird

Right, I figured I'd upgrade anyway. No huge hopes that it'll solve this issue, but perhaps something with nc changed?

I haven't done anything exotic, just set up some NAT port forwarding via Web Configurator, which I assume was what added those nc lines to inetd.conf.

Coinbird

Upgraded to
2.0-RC1 (i386)
built on Thu Mar 24 13:58:11 EDT 2011

and disabled Snort for the time being. Thanks for the input so far; if it happens again I'll be sure to copy down the logs for more info before rebooting it.

wallabybob

@Coinbird:

I haven't done anything exotic, just set up some NAT port forwarding via Web Configurator, which I assume was what added those nc lines to inetd.conf.

I have a number of port forward rules defined under Firewall -> NAT (Port Forward tab), and I don't have any nc entries in /etc/inetd.conf. Do you have a different type of port forward?

eri--

Can you tell me if you have any aliases referenced on port forward rules?

Coinbird

Yes, I have aliases defined for most of my firewall rules. For some aliases, I specified both the internal and external IPs.

Under Firewall->Aliases, I have a few entries similar to
Name | Values
mailserver | 10.0.0.4, xxx.xxx.xxx.85

Then in Firewall->NAT I created rule(s) using the aliases, like:
WAN TCP * * xxx.xxx.xxx.85 25 (SMTP) mailserver 25 (SMTP) Mail Server

eri--

OK, try the latest snapshot. I fixed an issue with the generated configs in the backend.
If you do not want to wait for the next snapshot, the change is https://rcs.pfsense.org/projects/pfsense/repos/mainline/commits/650b573bd8a435449178385a2d132f7f0002d309

Coinbird

OK, thanks! Other than upgrading my snapshot, should I remove/re-add my firewall rules? Delete the nc entries from inetd.conf?

Coinbird

For the time being, I've removed the aliases from my setup; once things are stable I'll turn them back on.

eri--

You should not make any changes to your firewall other than upgrading.

It would be good to get feedback on whether this solves your issue, since it's better to fix it now than to go through the hoops again after 2.0.

Coinbird

OK. Updated, with aliases enabled, so far so good over the weekend. Out of town this week so hopefully it will behave; I'll report on hopefully successful results then.