Random pfSense 2 lockups



  • I have 5 wifi sites running pfSense with captive portal, mostly on 2.1 Beta1 (as of today). In the past few months one of the sites has been having random lockups. These range anywhere from 30 minutes to 2 days. When this began, I was running 2.0.1. I have pretty much been chasing the problem thinking it was bad hardware. So far I have replaced the switch, replaced the NICs, replaced the server itself, and done a complete reload. I have also upgraded to 2.1 Beta1, with the exact same issues. (I did this in steps to try to pinpoint the cause, not all at once.)

    My old hardware:

    Dell Dimension 2400, Celeron 2.53 GHz, 2 GB RAM, 1 dual Intel 100e and 2 3Com 3c905B NICs, 80 GB IDE HD, running NAT

    New hardware:

    Dell Precision 670, dual Xeon 3.06 GHz, 1 GB RAM, 2 dual Intel 100e NICs, 80 GB IDE HD, running NAT

    There are never any errors in the logs, it just locks hard,  nothing on the screen, always blank, and the only way to unlock it is to pull the power.

    I have been reading other posts describing similar symptoms, and I think I have tracked this down to possible interrupt starvation caused by BitTorrent clients running on customers' machines. So last night I changed the network settings to use polling and blocked BitTorrent with layer 7. I would prefer not to block anything for the customers, since they are paying, but I also can't keep having the server crash and take out 20 other customers' internet as well.

    I would like some input on this: is there any way to let everything run by using different NICs or other hardware, or will BitTorrent traffic always cause crashes?

    Also, just to add on: I never ran out of states. At the time of the crash I had roughly 2900 / 97000 states, and mbuf usage is also always around 1562 / 25600.
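
For reference, those figures work out to very low utilisation. A quick sketch of the arithmetic in Python, using only the numbers quoted above (the function name `utilisation` is just an illustrative helper):

```python
def utilisation(used: int, limit: int) -> float:
    """Return used/limit as a percentage."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * used / limit


# Figures quoted in the post: states in use / state limit, mbufs in use / mbuf limit.
states_pct = utilisation(2900, 97000)
mbuf_pct = utilisation(1562, 25600)

print(f"states: {states_pct:.1f}% of limit")
print(f"mbufs:  {mbuf_pct:.1f}% of limit")
```

Both values are in the single digits, which fits the observation that neither table was close to exhaustion.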

    Thanks  :)

    Dickie



  • @never-enuff:

    I have 5 wifi sites running pfSense with captive portal, mostly on 2.1 Beta1 (as of today). In the past few months one of the sites has been having random lockups. These range anywhere from 30 minutes to 2 days.

    Do you mean someone power cycles the box after an interval between 30 mins and  2 days OR that it has been observed to recover by itself after that interval?

    Unless you have a turbo Internet connection, I doubt that the torrents themselves would cause the problem. However, a bunch of torrents MIGHT saturate the Internet connection, effectively locking out other users attempting to access the Internet. It's possible that saturation might expose a software bug.

    @never-enuff:

    There are never any errors in the logs, it just locks hard,  nothing on the screen, always blank, and the only way to unlock it is to pull the power.

    What does the system do that causes you to describe it as "locks hard"? For example, SSH to the machine times out, CAPS LOCK light on console keyboard doesn't respond to taps of the CAPS LOCK key, browser times out on attempts to access the web GUI, no response to shell command typed on the console etc?

    It might help to give more information about that system: internet connections and speed, connections to local users and speeds etc.
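
One way to capture that kind of evidence is to probe the firewall from another machine on the LAN and keep a timestamped log. Below is a minimal sketch in Python, not a tested monitoring tool. The helper names, the port choices (22 for SSH, 443 for the web GUI) and the address in the usage note are assumptions for illustration. It checks TCP reachability only and does not send ICMP pings.

```python
import socket
import time
from datetime import datetime
from typing import Dict


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def probe_once(host: str, ports: Dict[str, int]) -> Dict[str, bool]:
    """Probe every named port once and return name -> reachable."""
    return {name: tcp_port_open(host, port) for name, port in ports.items()}


def format_result(when: datetime, result: Dict[str, bool]) -> str:
    """Render one log line, e.g. '2012-12-20T08:00:00 ssh=up webgui=DOWN'."""
    states = " ".join(
        f"{name}={'up' if ok else 'DOWN'}" for name, ok in sorted(result.items())
    )
    return f"{when.isoformat(timespec='seconds')} {states}"


def monitor(host: str, ports: Dict[str, int], interval: float = 30.0) -> None:
    """Log one probe result every `interval` seconds until interrupted."""
    while True:
        print(format_result(datetime.now(), probe_once(host, ports)), flush=True)
        time.sleep(interval)
```

Calling something like `monitor("192.168.1.1", {"ssh": 22, "webgui": 443})` (a hypothetical LAN address) and redirecting the output to a file would show when each service stopped answering, and whether they all stopped at the same moment.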

    @never-enuff:

    Also, just to add on: I never ran out of states. At the time of the crash I had roughly 2900 / 97000 states, and mbuf usage is also always around 1562 / 25600.

    If the system "locks hard" how did you determine that?



  • Well to follow up on this issue.

    When I said "hard locks", I meant that the keyboard locked and the monitor froze. Basically the whole server froze: no SSH, no traffic, no ping replies. It would happen no matter how long the server had been running. I could boot it up and within 30 minutes it was locked, or sometimes it would run for 2 days before locking.

    How I managed to track down the problem was by changing one of the dual NICs, which split WAN and WAN2 onto separate NICs. Not long after doing this, I saw that WAN was not passing any traffic, and sure enough it was locked, but this time it didn't stop the system; WAN2 kept on carrying the load. What I did then was put a switch between the NIC and the Comcast business modem, and I haven't had any lockups since. I tried this after some heavy googling and finding other people having issues with Comcast modems.

    Thanks

    Dickie



  • Just a thought.

    1. Try running memtest86 from any Linux distro boot disk on your pfSense machine. This will take just several minutes to do, and will at least eliminate the possibility of a defective memory stick.

    2. When the pfSense machine has booted and run for just 15 minutes, try running top from the shell and see if anything looks wonky in the output. Maybe you will see some oddity.

    Also, in your initial post did you say you are running 5 pfSense machines, and all 5 are experiencing these lockups in a similar fashion?

    Take Care,
    Barry



  • @brcisna:

    Just a thought.

    1. Try running memtest86 from any Linux distro boot disk on your pfSense machine. This will take just several minutes to do, and will at least eliminate the possibility of a defective memory stick.

    Tried that, twice actually last week, No problems.

    @brcisna:

    2. When the pfSense machine has booted and run for just 15 minutes, try running top from the shell and see if anything looks wonky in the output. Maybe you will see some oddity.

    Nothing odd in the syslog at all nor anything in the running processes. Even when it locked up, it just locked and stopped everything including syslog.  To make sure the system wasn't hacked, I even did a full wipe and reload of the system.

    @brcisna:

    Also, in your initial post did you say you are running 5 pfSense machines, and all 5 are experiencing these lockups in a similar fashion?

    Only on one, the other 4 are perfect.

    @brcisna:

    Take Care,
    Barry

    Thanks for the ideas. Since putting the switch in, there hasn't been a single lockup, and it is going on 5 days now. So maybe it's a flaky Comcast modem, and the switch is dealing with it better than a direct connection to the NIC... maybe? Oh, and by the way, the NIC interfaces never showed any errors, before or after the switch install. The counters were always 0.

    Thanks

    Dickie  :-)
    Happy Holidays

