Very slow response with large number of states
-
I'm running the pfSense 1.2-RC2 snapshot. The primary purpose of this setup is to support a high-performance Web crawler that runs many concurrent connections (currently between 50 and 100, and many more once I get more bandwidth). The crawler is running on a single machine.
The crawler runs fine for a few minutes and then performance begins to degrade quickly until almost every web request times out. When this happens, the number of states is pegged at 5,000. Connections from any machine on the network out to the Internet are very slow, and often time out. Bandwidth usage when this happens is minimal (less than 2 megabits per second on a 10+ megabit line).
Also when this happens, connections to the pfSense box take a very long time and sometimes time out. That is, visiting the Web interface at 192.168.1.1 will time out.
I read in another thread that it might be due to packets queuing up in the modem. However, if I shut down the crawler, I still have trouble making connections outside. If I clear the state table, connections become very fast again. Processor usage while the crawler is running is usually in the 50% range, with some forays up into the high 80% range.
I'm running pfSense on an AMD Geode processor at 1 GHz, with one gigabyte of RAM and a disk of 100 GB or more (I don't recall how large it is). The current setup has three WAN connections that are load balanced, and a LAN connection. Each of the WAN connections is to a 10-megabit cable modem.
Are there known issues with running pfSense with many thousands of states? Is there some way I can force the states to be removed from the table more quickly? The crawler certainly never has more than a hundred connections open, and it does close a connection when it's done.
I currently have sticky connections turned on. Would turning it off help solve this problem?
Anything else I'm missing?
-
It sounds like your state table is maxed out. What do you have it set to? Does it show as being full or near full?
-
On the System->Advanced functions page, the Firewall Maximum States value is blank, which the description on the page says will give me a state table of 10,000 entries. Odd that my number of states never goes above 5,000.
Also, I have the optimization option set to "normal." I'm wondering if "aggressive" would be better. The description says that the aggressive setting "can drop legitimate connections." In what situations would that happen?
Looking at the states in the state table, I see that most of the apparently orphaned states are in the FIN_WAIT_2 state. That's very odd, as I'm pretty careful about closing my connections on the client. But I'm beginning to wonder if my code or the Windows networking layer isn't handling things correctly.
-
It looks like I've solved the problem. The combination of turning off "keep alive" on my client connections and setting the pfSense optimization option to "aggressive" has things running very well. Processor usage is in the 50% range, and states stay between 3K and 4K. Not too bad considering that I'm doing more than 70 URLs per second.
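For readers wondering what turning off "keep alive" looks like on the client side, here is a minimal sketch in Python. The thread does not show the crawler's code or say what language it is written in, so this is only an illustration: the names build_request and fetch_once are hypothetical. It sends a "Connection: close" header so the server closes the connection after one response, then reads until end of stream and closes the socket.

```python
import socket


def build_request(host, path="/"):
    """Build a minimal HTTP/1.1 GET that asks the server not to keep the
    connection alive. (Hypothetical helper, not from the thread.)"""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def fetch_once(host, path="/", port=80, timeout=10.0):
    """Fetch one URL over a fresh connection and return the raw response
    bytes (status line, headers, and body)."""
    # The 'with' block closes the socket even if an error occurs.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            # With "Connection: close" the server ends the response by
            # closing its side, so an empty read means we are done.
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

Whether this alone shortens the time a state lingers depends on the firewall's timeouts, which is presumably why the "aggressive" setting was also part of the fix.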
-
Also see the Full Install tuning section located here: http://devwiki.pfsense.org/Tuning
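As a rough sanity check on the numbers reported in this thread, the steady-state size of a state table is approximately the rate of new states multiplied by how long each state lingers. The sketch below applies that to the reported figures (3K to 4K states at about 70 URLs per second, and the earlier ceiling of 5,000). It assumes each fetched URL creates exactly one state entry, which the thread does not confirm, and the function names are hypothetical.

```python
def implied_state_lifetime(states, new_states_per_second):
    """Average seconds a state stays in the table, given a steady-state
    count and an arrival rate (count = rate * lifetime)."""
    return states / new_states_per_second


def steady_state_count(new_states_per_second, lifetime_seconds):
    """Expected number of states in the table at steady state."""
    return new_states_per_second * lifetime_seconds


# Figures reported in the thread; one state per URL is an assumption.
rate = 70
low = implied_state_lifetime(3000, rate)
high = implied_state_lifetime(4000, rate)

# Longest average lifetime that keeps the table under the 5,000 ceiling
# observed earlier in the thread, at the same rate.
max_lifetime = implied_state_lifetime(5000, rate)

print(f"implied lifetime: {low:.0f} to {high:.0f} seconds")
print(f"table reaches 5,000 once lifetime exceeds {max_lifetime:.0f} seconds")
```

Under that assumption, states are lingering for somewhat under a minute on average, and anything that lets them linger much beyond 70 seconds at this rate would fill a 5,000-entry table again.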