pfSense keeps going down after it goes live



  • Hi,

    I have installed pfSense 1.2.2 and it works fine in testing. It ran for a week in the lab without any problem, serving one inactive notebook for testing purposes.

    However, once I put the pfSense firewall in the data center, it keeps going down every few hours. The symptoms: SSH dies first, followed by the WebGUI a few hours later. A few more hours later, the firewall is totally inaccessible, and so are all the servers behind it. I am running the firewall in transparent bridged mode.

    When packet filtering is on, all SSH/service access to the servers behind the firewall is extremely slow. The services speed is restored when i disable packet filtering on the WebGUI.

    The hardware for the pfSense firewall is as follows:
    P4 3.0D
    1GB DDR1
    1 X on board NIC 10/100
    1 X external NIC 10/100

    What could the problem be?



  • Chances are good that you are experiencing firewall state exhaustion.  Look at your RRD Graphs for the past few days and see if you are maxing out your firewall states.  If so, increase the max states.  System -> Advanced
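
    Besides the RRD graphs, the same check can be done from a shell on the firewall. Below is a minimal Python sketch, assuming you have saved the text printed by `pfctl -s info` and `pfctl -s memory`. The sample text and the helper name `state_usage` are made up for illustration and are not taken from the poster's firewall; the exact output layout may differ between versions.

```python
import re
from typing import Tuple

# Illustrative text only, shaped like `pfctl -s info` and `pfctl -s memory`
# output. The numbers are invented, not captured from a real firewall.
SAMPLE_INFO = """\
State Table                          Total             Rate
  current entries                     9500
  searches                         1234567           12.3/s
"""

SAMPLE_LIMITS = """\
states        hard limit    10000
src-nodes     hard limit    10000
frags         hard limit     5000
"""


def state_usage(info_text: str, limits_text: str) -> Tuple[int, int, float]:
    """Return (current states, state limit, fraction of the limit in use)."""
    cur = re.search(r"^[ \t]*current entries[ \t]+(\d+)", info_text, re.MULTILINE)
    lim = re.search(r"^[ \t]*states[ \t]+hard limit[ \t]+(\d+)", limits_text, re.MULTILINE)
    if cur is None or lim is None:
        raise ValueError("could not find state count or state limit in the text")
    current, limit = int(cur.group(1)), int(lim.group(1))
    if limit == 0:
        raise ValueError("state limit is zero")
    return current, limit, current / limit


if __name__ == "__main__":
    current, limit, ratio = state_usage(SAMPLE_INFO, SAMPLE_LIMITS)
    print(f"{current}/{limit} states in use ({ratio:.0%})")
```

    If the ratio sits near 1.0 during the slow periods, state exhaustion is a likely suspect; if it stays near zero, look elsewhere.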



  • @submicron:

    Chances are good that you are experiencing firewall state exhaustion.  Look at your RRD Graphs for the past few days and see if you are maxing out your firewall states.  If so, increase the max states.  System -> Advanced

    The problem occurs even if I select "Disable packet filtering", which I believe means the firewall states aren't used; please correct me if I'm wrong.
    The firewall is serving up to 5 Mbit/s of bandwidth for about 20 servers behind it. Do I need to upgrade the hardware?



  • In your first post you said:

    "The services speed is restored when i disable packet filtering on the WebGUI."

    Now you're saying the opposite.  Please be more clear.  Your hardware should be perfectly adequate to handle the bandwidth you are describing.  Again, it sounds exactly like state exhaustion.
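
    For a sense of scale on the hardware side, here is a back-of-the-envelope Python sketch of how much RAM a larger state table would need. It assumes roughly 1 KiB of RAM per state, which is a commonly cited rule of thumb and not a measured value for this box; the function names are hypothetical.

```python
# Assumption: about 1 KiB of RAM per firewall state (rule of thumb only).
KIB_PER_STATE = 1.0


def state_table_ram_mib(max_states: int, kib_per_state: float = KIB_PER_STATE) -> float:
    """Estimate the RAM, in MiB, used by a completely full state table."""
    return max_states * kib_per_state / 1024


def fraction_of_ram(max_states: int, ram_mib: float = 1024.0) -> float:
    """Estimate what share of total RAM a full state table would occupy."""
    return state_table_ram_mib(max_states) / ram_mib


if __name__ == "__main__":
    # 10000 is the limit mentioned in this thread; 100000 is a hypothetical increase.
    for states in (10_000, 100_000):
        mib = state_table_ram_mib(states)
        share = fraction_of_ram(states)  # default assumes the 1 GB box described above
        print(f"{states} states: about {mib:.1f} MiB ({share:.1%} of 1 GiB)")
```

    Under that assumption, even a tenfold increase in max states would use only a small share of 1 GB, so memory is unlikely to be the limiting factor here.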



  • @submicron:

    In your first post you said:

    "The services speed is restored when i disable packet filtering on the WebGUI."

    Now you're saying the opposite.  Please be more clear.  Your hardware should be perfectly adequate to handle the bandwidth you are describing.  Again, it sounds exactly like state exhaustion.

    Umm… Let me make it clear.

    Packet Filtering ON:

    • Connections to the servers are slow. SSH takes 20-30 seconds to authenticate. I can't SSH to some of the servers.

    • After a few hours of operation, the WebGUI hangs, followed by SSH to the pfSense box, and lastly the entire server hangs.

    Packet Filtering OFF:

    • Connections to the servers are fast. SSH authentication to the servers responds quickly, and the speed is the same as before I plugged the firewall in.

    • After a few hours of operation, the WebGUI hangs, followed by SSH to the pfSense box, and lastly the entire server hangs.

    While packet filtering is OFF, the state usage is always 0/10000. Do I still need to increase the number of states? If yes, what number should I increase it to?



  • Hummm.

    Nothing visible in the RRD graphs? Processor usage? Is the number of tasks stable?
    While you can still SSH in, what does 'top' say?

    I'll bet some task (thread) is eating all cycles….
    Nothing special in 'dmesg'?



  • @Gertjan:

    Hummm.

    Nothing visible in the RRD graphs? Processor usage? Is the number of tasks stable?
    While you can still SSH in, what does 'top' say?

    I'll bet some task (thread) is eating all cycles….
    Nothing special in 'dmesg'?

    'top' shows very low processor and memory usage. Average processor usage is about 0.1.

    I'll bet some task (thread) is eating all cycles….  --> How do I check for this?



  • pfSense has now been up without any problem for 48 hours with "Disable packet filtering" enabled. Whenever I enable the firewall, connections to the servers are very slow, especially SSH authentication. Has anybody experienced this before?



  • Not wishing to sound too condescending, but if you're responsible for firewalling 20 servers at 5Mb/s, you'd be wise to invest a bit more time in future in realistic load testing.

    A single "inactive" lappy hardly mimics the deployment environment.

