pfSense dies a horrible death on multiple machines



  • I had pfSense installed on an 800MHz Pentium III machine with 256MB of RAM and two Ethernet interfaces, one onboard and one PCI card. The PCI card was connected to the cable modem, the onboard one to the LAN.

    One day I noticed my Internet connection slowing down from its normal 15Mbit/s to about 2.5Mbit/s. I logged into the router's webGUI to see if I could find what was using the bandwidth.

    The webGUI started spitting out errors, lots of them. I tried a reboot; after that I could not access the webGUI at all, though the connection did come back up, still running at roughly 2.5Mbit/s.

    I took some spare parts lying around and built an entirely different machine, this one with an Athlon 64 3800+ and 2GB of RAM. All completely different parts, down to the case and PSU.

    I downloaded a fresh copy of pfSense from the site, burned it to a new CD, and installed it.

    I set it up with just the basics (LAN/WAN interfaces, DHCP, and a few port forwards), hooked it up, and everything was fine for a day.

    Today I woke up to find my connection slowing to a halt again, and on logging into the webGUI I get this:

    Warning: shell_exec(): Unable to execute '/sbin/sysctl -n kern.cp_time' in /usr/local/www/includes/functions.inc.php on line 65
    Warning: shell_exec(): Unable to execute '/sbin/sysctl -n kern.cp_time' in /usr/local/www/includes/functions.inc.php on line 67
    Warning: exec(): Unable to fork [/sbin/sysctl -n vm.stats.vm.v_page_count vm.stats.vm.v_inactive_count vm.stats.vm.v_cache_count vm.stats.vm.v_free_count] in /usr/local/www/includes/functions.inc.php on line 152
    Warning: Division by zero in /usr/local/www/includes/functions.inc.php on line 157
    Warning: exec(): Unable to fork [/sbin/sysctl -n kern.boottime] in /usr/local/www/includes/functions.inc.php on line 39
    Warning: shell_exec(): Unable to execute '/sbin/pfctl -si %

    And many more similar errors. When I try to SSH into the machine, I get this:

    ╰╼[~]$ ssh -l admin -p 25254 10.8.8.1
    ssh_exchange_identification: Connection closed by remote host

    This is the same thing that happened to the first box, and I'm at a complete loss as to what could be wrong. I haven't changed anything beyond the basic setup options I mentioned, yet it's happening on a completely different machine with a new copy of pfSense.

    Help?



  • Is the other machine an old box too?

    I suspect your RAM and/or hard drive is about to kick the bucket.

    Burn a copy of some Linux live CD (the first one that comes to mind is Ubuntu), and once the boot menu pops up, select Memtest and run it for a couple of hours. If you get errors, you need new RAM. If not, I'm pretty sure your hard disk needs replacing.



  • Which version of pfSense? Same version in both installs?

    What are the FreeBSD interface names (e.g. em0, fxp0, xl0, …)?



  • @ericab:

    The RAM is good; it came out of a PC that has been used as a media center for a while now and never had any issues whatsoever. The HD is brand new, since I didn't have a spare one of those. When I took down the first box I ran memtest on it, installed XP and ran prime95 for hours, and did some HD diagnostics. Nothing in the first machine was faulty, and everything in the current machine seems to be good as well.

    @wallabybob:

    Same version, yes: 1.2.3-RELEASE. Interface names on the first machine were rl0 and xl0; on the second machine they're fxp0 and fxp1.



  • Well, I just dragged out a monitor and keyboard and hooked them up to poke around on the console, and I saw this: http://img338.imageshack.us/img338/246/1001016c.jpg

    I think I found the problem, but I'm not sure why it's happening or how to fix it. :/



  • It appears you have some process that starts from time to time but doesn't terminate correctly, hence the available process "slots" are gradually used up. It would be useful to know which process and then determine why it isn't terminating correctly.

    Please peruse the pfSense system logs looking for error reports from processes.

    Do you have any packages installed?

    How long does the system stay up before this happens?

    Since your system seems able to stay up for a day (24 hours?), how about rebooting at the end of the day, then in the morning ssh to it and give the shell command ps ax. Look for duplicate processes (some will be valid), repeat hourly, and compare with the previous output, looking for increasing numbers of duplicate processes. This won't help if the problem is provoked by a particular "unusual" event that triggers a flurry of processes which don't terminate correctly.

    You could also scan the forums for topics discussing maxproc to see if anyone else has seen a similar problem.
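
    The hourly ps ax comparison suggested above can be partly automated. What follows is a minimal Python sketch, assuming each snapshot is saved to a text file (e.g. ps ax > /root/ps-0800.txt, a hypothetical path) and the files are compared on any machine with Python 3, since stock pfSense 1.2.3 may not ship an interpreter. It assumes FreeBSD's default ps ax column layout (PID, TT, STAT, TIME, COMMAND) and groups processes by the first word of COMMAND; the function names are illustrative, not part of pfSense.

```python
from collections import Counter


def command_counts(ps_output):
    """Count processes by command name in FreeBSD `ps ax` output.

    Assumes the default layout PID TT STAT TIME COMMAND, so COMMAND is
    everything from the fifth whitespace-separated field onward.
    """
    counts = Counter()
    for line in ps_output.strip().splitlines()[1:]:  # skip header row
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue  # blank or malformed line
        # Keep only the executable (first word of COMMAND) so instances
        # that differ only in their arguments are grouped together.
        counts[fields[4].split()[0]] += 1
    return counts


def growing_commands(earlier, later, min_increase=1):
    """List (command, count_before, count_after) for commands whose
    instance count rose by at least `min_increase`, largest rise first."""
    before = command_counts(earlier)
    after = command_counts(later)
    grown = [
        (cmd, before.get(cmd, 0), n)
        for cmd, n in after.items()
        if n - before.get(cmd, 0) >= min_increase
    ]
    grown.sort(key=lambda t: t[2] - t[1], reverse=True)
    return grown
```

    For example, growing_commands(open('ps-0800.txt').read(), open('ps-1200.txt').read()) lists each command whose instance count rose between the two snapshots; a command that keeps climbing across several snapshots is the likely culprit.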



  • @wallabybob:

    Well, I did some more research and poking around on the router, and I've found the problem and fixed it. The issue was caused by NAT reflection. I have a few servers running here and quite a few people on my LAN, all accessing the local servers via their domain names, and it seems the sheer number of connections was causing NAT reflection to spawn more processes than the system could handle. I disabled NAT reflection and am working on getting split DNS set up now. Thanks for the advice, though, guys. :)
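
    Because the failure here was a slow exhaustion of process slots, a periodic headroom check against the kern.maxproc sysctl could flag it before the webGUI and SSH stop being able to fork. A rough Python sketch, assuming sysctl and ps are on PATH and a Python 3 interpreter is available (which may not be true on stock pfSense 1.2.3); the process count taken from ps ax is approximate, the 10% threshold is an arbitrary illustrative choice, and the per-user limit kern.maxprocperuid is ignored.

```python
import subprocess


def process_headroom(process_count, maxproc):
    """Return the fraction of the process table still free (0.0 = full)."""
    if maxproc <= 0:
        raise ValueError("maxproc must be positive")
    # Clamp at zero in case the count is momentarily above the limit.
    return max(0.0, (maxproc - process_count) / maxproc)


def read_live_counts():
    """Return (current process count, kern.maxproc) on a FreeBSD host."""
    maxproc = int(subprocess.run(
        ["sysctl", "-n", "kern.maxproc"],
        capture_output=True, text=True, check=True,
    ).stdout)
    ps_out = subprocess.run(
        ["ps", "ax"], capture_output=True, text=True, check=True,
    ).stdout
    # One line per process after the header row (approximate).
    process_count = len(ps_out.strip().splitlines()) - 1
    return process_count, maxproc


def check_process_table(threshold=0.10):
    """Print a warning when less than `threshold` of the slots are free."""
    count, maxproc = read_live_counts()
    free = process_headroom(count, maxproc)
    if free < threshold:
        print(f"WARNING: {count} of {maxproc} process slots in use")
    return free
```

    Calling check_process_table() every few minutes (from cron, say) and watching for a steadily falling return value between reboots would point to the same kind of leak described in this thread.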

