pfSense Freeze



  • Hi,
    I am running pfSense 2.0.3-RELEASE (FreeBSD 8.1-RELEASE-p13), platform nanobsd (4g), on an OPNsense appliance.
    It freezes at random, and it is then impossible to connect via the serial port or the web interface.
    I can't ping the router or the WAN either.
    Everything goes back to normal only after a reboot!
    Does anyone know what it might be? How can I debug it?

    Thank you


  • Netgate Administrator

    No crash reports I take it?
    Are you doing anything unusual with it? No pattern to the crashes at all?

    If it really is crashing hard and it really is random it's probably a hardware problem. I'd test the RAM and check the cooling solution. The PSU is also a likely suspect.

    Steve



  • @stephenw10:

    No crash reports I take it?
    Are you doing anything unusual with it? No pattern to the crashes at all?

    If it really is crashing hard and it really is random it's probably a hardware problem. I'd test the RAM and check the cooling solution. The PSU is also a likely suspect.

    Steve

    I second that. I've run pfSense on a Pentium III laptop, on Soekris boards, and on a Dell Xeon 1U server in production at my old job as a backup firewall to a Cisco ASA. All of them were bulletproof and never crashed. It's probably hardware.



  • I suggest trying the CD version. The crashes look like a compatibility problem, which might be one of the side effects of nanoBSD.


  • Netgate Administrator

    If you bought it new I'd ask the Appliance Shop guys who enjoy a good support reputation (and support the project  :)) though I've never had anything from them myself.

    Steve



  • @stephenw10:

    No crash reports I take it?
    Are you doing anything unusual with it? No pattern to the crashes at all?

    If it really is crashing hard and it really is random it's probably a hardware problem. I'd test the RAM and check the cooling solution. The PSU is also a likely suspect.

    Steve

    I haven't identified any pattern to the crashes (not yet).
    Nothing special about the router: it's configured with dual WAN, trigger level: Packet Loss or High Latency.
    No problem with the cooling, since it's installed in an air-conditioned server room.
    No problem with the PSU, since it's plugged into a UPS.
    I'll test the RAM this weekend.

    Thank you for your support!



  • @stephenw10:

    If you bought it new I'd ask the Appliance Shop guys who enjoy a good support reputation (and support the project  :)) though I've never had anything from them myself.

    Steve

    Yes, it's new, and I can confirm that their support is very good. They told me to check the forum for a solution while they check with their engineers!
    Thank you anyway  :)


  • Netgate Administrator

    Which OPNsense appliance do you have?

    This isn't a common problem certainly. I have a nanobsd box running multiwan here and it's completely stable.
    Try connecting a serial console to it to catch the crash or monitor whatever else might be happening.

    Steve



  • @stephenw10:

    Which OPNsense appliance do you have?

    This isn't a common problem certainly. I have a nanobsd box running multiwan here and it's completely stable.
    Try connecting a serial console to it to catch the crash or monitor whatever else might be happening.

    Steve

    We have "OPNsense 5 port Ghz rack edition - 19" pfSense appliance"
    I installed a syslog server (I found nothing in it after a crash), and pfSense is now monitored by Nagios (but Nagios doesn't check continuously, only at intervals).
    I haven't checked whether I can reach my pfSense from the WAN (I'm not sure it will work, but I'll try next time)!
    When the crash happened I tried connecting via the serial port but got no output.

    I'll check the RAM and the CF card and let you know the result!

    Thank you  8)
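
Since interval-based checks can miss the moment the box stops answering, a small continuous probe run from another machine on the LAN may help narrow down when each freeze starts. The following is a minimal sketch, not something from the thread: it assumes a Linux-style `ping` (where `-W` is a timeout in seconds), and the function names and the address in the usage note are hypothetical.

```python
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=1):
    """Return True if a single ICMP echo to host gets a reply (system ping)."""
    # -c 1: send one probe; -W: reply timeout, in seconds on Linux iputils ping.
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def watch(host, probe=ping_once, interval_s=2.0, max_probes=None):
    """Probe host repeatedly and print a timestamped line on every state change.

    Runs forever when max_probes is None; returns the list of observed states.
    """
    events = []
    previous = None
    count = 0
    while max_probes is None or count < max_probes:
        is_up = probe(host)
        if is_up != previous:
            stamp = datetime.now().isoformat(timespec="seconds")
            state = "UP" if is_up else "DOWN"
            print(f"{stamp} {host} {state}")
            events.append(state)
            previous = is_up
        count += 1
        time.sleep(interval_s)
    return events
```

Calling `watch("192.168.1.1")` (a placeholder LAN address) prints one line each time the router goes from reachable to unreachable or back, which gives a timestamp to compare against the syslog server and the UPS logs.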


  • Netgate Administrator

    If you have a serial console connected when it crashes (or doesn't) it may spew something out that doesn't make it to the syslog server.
    Those boxes appear to have a DC power supply. Are you using a DC UPS or an external power brick and standard UPS? It probably also has an internal DC-DC power supply. Glitches in any of those power parts could cause the system to halt.

    Steve
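
To keep whatever the console prints before a halt, the serial output can be appended to a file with a timestamp per line on a spare machine. This is a rough sketch using only the Python standard library; it assumes the serial device is already configured (baud rate and so on) outside the script, and the function names and paths are hypothetical.

```python
from datetime import datetime


def timestamp_lines(stream, now=datetime.now):
    """Yield each non-empty line from stream, prefixed with the time it was read."""
    for raw in stream:
        line = raw.rstrip("\r\n")
        if line:
            yield f"{now().isoformat(timespec='seconds')} {line}"


def capture(device_path, log_path):
    """Append timestamped console output from device_path to log_path."""
    # Assumption: the port settings were applied beforehand by other means.
    with open(device_path, "r", errors="replace") as console, \
            open(log_path, "a") as log:
        for entry in timestamp_lines(console):
            log.write(entry + "\n")
            log.flush()  # keep the most recent lines on disk
```

For example, `capture("/dev/ttyUSB0", "console.log")` (both names are placeholders) would leave the last lines printed before a freeze in `console.log`, even if nothing reaches the syslog server.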



  • Perhaps you could try disabling device polling, and look there for other hardware options that might be causing the problem.



  • @stephenw10:

    If you have a serial console connected when it crashes (or doesn't) it may spew something out that doesn't make it to the syslog server.
    Those boxes appear to have a DC power supply. Are you using a DC UPS or an external power brick and standard UPS? It probably also has an internal DC-DC power supply. Glitches in any of those power parts could cause the system to halt.

    Steve

    It's an external power brick on the building's UPS. I can't keep the serial console connected all the time, but when I connect after a crash/freeze I get no output from it!
    Thank you



  • @ilaurens:

    Perhaps you could try disabling device polling, and look there for other hardware options that might be causing the problem.

    Device polling is already disabled, but thanks for the tip.



  • On my pfSense I have configured load balancing and failover with two WANs.
    I was getting these notifications by email: "MONITOR: WAN1GW has packet loss, removing from routing group"*

    I added a new rule under Firewall > LAN (source LAN, destination any, port any, gateway WAN1).
    Since then (4 days ago), the router has been running without any issues!

    I'm not sure this is the fix, but I'll keep you updated!

    *WAN1GW: this is the gateway of WAN1


  • Netgate Administrator

    So you added a rule to force all traffic down one WAN, disabling load balancing and failover. That implies that trying to use the second WAN was causing the box big problems. Perhaps it's misconfigured?
    If it really was suffering packet loss and that's normal for your connection you can tune the parameters used in System: Routing: Edit Gateway: then select the advanced button.

    Steve



  • @stephenw10:

    So you added a rule to force all traffic down one WAN, disabling load balancing and failover. That implies that trying to use the second WAN was causing the box big problems. Perhaps it's misconfigured?
    If it really was suffering packet loss and that's normal for your connection you can tune the parameters used in System: Routing: Edit Gateway: then select the advanced button.

    Steve

    I kept my load balancing/failover rules. I only added this rule at the bottom in case the gateway is removed from the group, as mentioned in the email notification.
    The advanced options are already configured on both gateways: latency thresholds from 300 to 1000 and packet loss thresholds from 30 to 50.
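
To make the effect of such a threshold pair concrete, here is a minimal sketch of how a low/high pair could behave if it acts as a hysteresis band: the alarm is raised above the high value and cleared only below the low value. That behaviour is an assumption for illustration (the thread does not confirm how pfSense applies the values), the units are presumably milliseconds and percent, and all names are hypothetical.

```python
def update_alarm(alarm_on, value, low, high):
    """Hysteresis: raise the alarm above high, clear it below low, else keep state."""
    if value > high:
        return True
    if value < low:
        return False
    return alarm_on


def replay(samples, low, high):
    """Return the alarm state after each sample, starting with the alarm off."""
    state = False
    history = []
    for value in samples:
        state = update_alarm(state, value, low, high)
        history.append(state)
    return history


# Latency samples against the 300/1000 pair quoted above.
print(replay([120, 450, 1200, 700, 350, 250], low=300, high=1000))
# → [False, False, True, True, True, False]
```

Under this assumption, a link that regularly spikes past the high value and then hovers between the two values would stay marked as bad until it drops below the low value again.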


  • Netgate Administrator

    Are your WAN connections really that lossy/laggy?

    Adding a rule to catch traffic not caught by the loadbalance rule shouldn't do anything. Even if both WANs are removed from the group the rule would still catch traffic and pass it to the group. Hmm.

    Steve



  • @stephenw10:

    Are your WAN connections really that lossy/laggy?

    Adding a rule to catch traffic not caught by the loadbalance rule shouldn't do anything. Even if both WANs are removed from the group the rule would still catch traffic and pass it to the group. Hmm.

    Steve

    Sometimes, yes, the connection is very laggy/lossy.



  • I found the following errors in the logs:

    "apinger: Error while feeding rrdtool: Broken pipe
    ..
    lighttpd[30497]: (connections.c.305) SSL: 1 error:140760FC:SSL routines:SSL23_GET_CLIENT_HELLO:unknown protocol

    lighttpd[30497]: (connections.c.305) SSL: 1 error:1407609C:SSL routines:SSL23_GET_CLIENT_HELLO:http request
    "

    I am not sure whether this has anything to do with my issue!


  • Netgate Administrator

    Both those errors are 'normal' though they look scary.  :)
    The RRD tool error happens when the interface first comes up, it's not a problem. I believe it has been fixed in 2.1.
    The lighttpd error shows that someone tried to connect to the webgui on an http connection when it's configured for only https. That error has also always been present but lighttpd errors have only recently been added to the main logging system so you wouldn't have seen it in previous pfSense versions. It's nothing to worry about, the box redirects you to https anyway.

    Steve
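
When searching the logs for the real cause, it can help to filter out the messages described above as harmless so that anything unusual stands out. The following is only a sketch: the patterns are taken from the log lines quoted in this thread, while the labels and function names are hypothetical.

```python
import re

# Messages described in this thread as harmless; the labels are illustrative only.
BENIGN_PATTERNS = [
    (re.compile(r"apinger: Error while feeding rrdtool: Broken pipe"),
     "rrdtool feed error"),
    (re.compile(r"SSL23_GET_CLIENT_HELLO:(http request|unknown protocol)"),
     "non-HTTPS request to the webgui"),
]


def classify(line):
    """Return a label if the line matches a known harmless message, else None."""
    for pattern, label in BENIGN_PATTERNS:
        if pattern.search(line):
            return label
    return None


def interesting(lines):
    """Keep only the lines that do not match any known harmless message."""
    return [line for line in lines if classify(line) is None]
```

Running `interesting()` over the lines collected by the syslog server would leave only entries that are not already explained, which is where a clue about the freeze is more likely to be.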

