Resource problem with 1.2.3?
-
I am pulling my hair out over this one. I've got pfSense 1.2.3 running on an old HP desktop with two NICs in it (one onboard, one PCI). I run this exact same hardware at other offices. Starting a couple of months ago it started freaking out: when you log in and navigate around the GUI, it just spits back a bunch of gibberish (well, I'm sure it actually means something, but to me it's gibberish). When you reboot, the problem goes away, but only for a couple of hours. I figured the box was messed up, so I swapped it out with another one that I keep as a cold standby. It was up for a little while with no problem, then BAM… gibberish on that one too. So I built a third box, shipped it out, and had them plug it in. Good for a couple of hours, then gibberish. So I have now had three different boxes do this. Everything seems to work OK, with the exception of IPsec: I have one tunnel configured on that box, and it won't come up once the gibberish starts. Here are a couple of examples:
Here is the output from clog -f /var/log/system.log:
Jun 9 13:10:57 scgw1 inetd[509]: fork: Resource temporarily unavailable
Jun 9 13:10:58 scgw1 kernel: maxproc limit exceeded by uid 0, please see tuning(7) and login.conf(5).
Jun 9 13:11:21 scgw1 last message repeated 5 times
Jun 9 13:11:22 scgw1 inetd[509]: fork: Resource temporarily unavailable
Jun 9 13:11:25 scgw1 kernel: maxproc limit exceeded by uid 0, please see tuning(7) and login.conf(5).
Jun 9 13:11:39 scgw1 last message repeated 3 times
Jun 9 13:11:39 scgw1 kernel:
Jun 9 13:11:39 scgw1 inetd[509]: fork: Resource temporarily unavailable
Jun 9 13:11:42 scgw1 kernel: maxproc limit exceeded by uid 0, please see tuning(7) and login.conf(5).
Jun 9 13:11:47 scgw1 kernel: maxproc limit exceeded by uid 0, please see tuning(7) and login.conf(5).

And here is a top -SH:
Any ideas?
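For reference, the limits the kernel log is complaining about can be checked from a shell on the box. This is a hedged sketch (pfSense 1.2.3 is FreeBSD-based; the sysctl name below is the FreeBSD one):

```shell
# Compare the current process count against the limits from the
# "maxproc limit exceeded" message.
ulimit -u          # per-user process limit for this shell
ps -ax | wc -l     # rough current process count
# On FreeBSD you could also check the global cap:
#   sysctl kern.maxproc
```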
-
I suspect 1892 processes is way too many, and that you have some sort of leak where dead processes don't go away and release the resources they were using. Once it gets into this state, it's probably difficult to see what is going on, because issuing a new command generally requires creating a new process.
I suggest you reboot and, from time to time, issue the command ps -ax; watch the process count and see if you can detect an accumulation of dead processes of the same name.
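A minimal sketch of that monitoring step: group the process list by command name so an accumulation of identically named processes stands out immediately, rather than eyeballing raw ps -ax output.

```shell
# Top 10 command names by process count, plus the total.
ps -axo comm | sort | uniq -c | sort -rn | head -10
echo "total: $(ps -ax | wc -l) processes"
```

Run it every few minutes after the reboot; whichever name climbs steadily is your leak.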
-
Probably too many connections through reflection; you'll have to disable it if that's the case.
-
I'll schedule a reboot tonight and turn reflection off. That's kind of ridiculous, no? If it can't handle reflection properly, why is it an option?
-Seth
-
Because of the way reflection currently works, if you have a lot of open connections from local systems to your port forwards, there will be a lot of nc (netcat) processes - one per connection, I think.
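If that's the theory, one quick way to confirm it (a hedged diagnostic; this assumes the helpers show up in the process list under the command name nc):

```shell
# Count nc processes; watch this number track the count of
# reflected connections. grep -c exits non-zero on zero matches,
# so fall back to printing anyway.
ps -axo comm | grep -c '^nc$' || true
```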
-
It did turn out to be reflection. I still don't understand why though. I have one internal resource that is accessed via reflection by one user and it is only occasionally. If pfSense can't handle that properly, why is it an option?
-
> It did turn out to be reflection. I still don't understand why though. I have one internal resource that is accessed via reflection by one user and it is only occasionally. If pfSense can't handle that properly, why is it an option?
Without seeing your exact NAT and interface configuration, we can only speculate. It works fine for most users. There may be something particular about the way you set it up that was problematic, or something problematic with the client or server software holding open connections.
-
It takes thousands of simultaneous connections to get to that point; if you're seeing that, you have that many connections through reflection. We generally advise against using reflection at all, but it's a fine solution in most circumstances, just not when you get to higher numbers of connections that need to be reflected. High-throughput environments do it "right", i.e. split DNS.
Efonne does have a branch in git that does reflection in pf, which gets rid of the nc scalability issues. That's for 2.0 only, and may have other drawbacks as it hasn't been nearly as widely tested.
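For reference, the split-DNS alternative mentioned above amounts to having internal clients resolve the public hostname directly to the private address, so traffic never needs to be reflected at all. In dnsmasq terms (pfSense's DNS forwarder), a host override looks roughly like this; the hostname and address are made up for illustration:

```
# Illustrative dnsmasq host override: LAN clients resolving the
# public name get the internal address directly, no reflection needed.
address=/server.example.com/192.168.1.50
```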