Primary pfSense Hangs/Freezes After 40-48 Hours



  • Hello everyone, I have set up a pfSense 2.2.1 CARP cluster in bridge mode.
    In devd I have defined an action that brings bridge0 down, and it is working OK.
    Both servers run smoothly as primary and backup, but there is one problem: my primary server hangs after approximately two days, and the only way to reboot it is to force it with the hardware button.

    Please advise: how do I proceed?
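    For readers unfamiliar with the devd approach described in this post, a rule of that kind might look roughly like the fragment below. This is a minimal sketch, not the poster's actual configuration: it assumes FreeBSD 10-style carp(4) devd events (system "CARP", subsystem "<vhid>@<interface>", type "MASTER" or "BACKUP"), a bridge named bridge0, and a hypothetical MASTER counterpart, since the post only mentions taking bridge0 down. It does not address the CARP+bridge bug discussed later in the thread.

    ```
    # Sketch only (assumption): take bridge0 down when any CARP vhid on this node goes BACKUP.
    notify 0 {
    	match "system"		"CARP";
    	match "subsystem"	"[0-9]+@[0-9a-z.]+";
    	match "type"		"BACKUP";
    	action "/sbin/ifconfig bridge0 down";
    };

    # Hypothetical counterpart: bring bridge0 back up when this node becomes MASTER.
    notify 0 {
    	match "system"		"CARP";
    	match "subsystem"	"[0-9]+@[0-9a-z.]+";
    	match "type"		"MASTER";
    	action "/sbin/ifconfig bridge0 up";
    };
    ```

    With several CARP VIPs (for example one per LAN subnet), these rules would fire once per vhid state change, so narrowing the "subsystem" match to a single vhid and interface may be preferable.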



  • All of a sudden, both servers are now DOWN.

    Update: now the primary and secondary both stay active for only 20-30 minutes, and then they hang one by one.
    I have not enabled a manual outbound NAT rule for the LAN. Also, as I have 5 different subnets on the LAN, I have defined CARP VIPs for those directly on the master.


  • Netgate Administrator

    We will need more information here.
    Is it actually 'hung' or just stops passing traffic? Can you access the console still?
    If it does crash do you see a crash report or any errors in the system logs?

    Could you explain what you mean by 'CARP setup, with Bridge Mode'? You may be hitting this: https://redmine.pfsense.org/issues/4607

    Steve



  • Hello Sir,
    I'm not able to access the console or the GUI; traffic stops, and my monitoring system shows the WAN IP as down.
    Then I have to do a hard reboot. It checks for disk errors, and there are no crash reports.


  • Netgate Administrator

    So do you have any bridges? Which interfaces are bridged?
    You will have problems syncing between those because the interfaces are not the same.

    Steve



  • Hello Sir,

    On the master server: em0 + re0 are bridged (bridge0)
    On the backup server: em0 + igb0 are bridged (bridge0)

    Initially it did work for 2 days continuously, and syncing of rules was smooth and continuous.
    I did try changing the outbound NAT for the LAN subnet 192.168.1.0/24 to the CARP WAN IP.


  • Netgate Administrator

    It's almost certainly the CARP+Bridge bug I linked to above. Luckily, that has just been switched to Feedback, as patches have gone in to resolve it.
    If you're able to test that, then try one of the 2.2.3 snapshots that include those patches from http://snapshots.pfsense.org/.
    Wait a few hours for the snapshots to be built with the patches included. Bear in mind that they are development snapshots, so other things may be an issue!

    Steve



  • Hello Sir,
    Thank you for pointing that out. I have now reinstalled both machines and restored the config from before the CARP setup; only the master server is active, and the backup has been taken off the network. As this is a critical production environment, I'll test the snapshot on the backup server, keep it live for a few days, and then set up CARP again. I will let you know after testing.
    I have configured: Snort, Squid + SquidGuard, OpenVPN, ntopng, Sarg, bandwidthd.
    Thank you very much for your support :)



  • I would be interested to know whether this has fixed the issue for RootMd5, as I am using a similar setup.

    I had the same issue with multiple pfSense 2.2.1 firewalls when using CARP and bridging: any of the pfSense firewalls that used this configuration would lock up/freeze randomly, and the only fix was to power cycle them to get them running again.

    As they would lock up and the only solution was to power cycle, there were no crash files.

    As these are production firewalls located in a data centre, I had to roll back to pfSense 2.1.5 until the issue is resolved. Testing the 2.2.3 snapshot would require driving long distances to the data centre to roll back pfSense if the snapshot does not resolve the issue.


  • Netgate Administrator

    Well, its status is Feedback because, as yet, nobody who previously had that issue has confirmed that this has resolved it. It looks to have done so in our internal testing.
    If you are able to replicate the failure, then try the snapshot and report your findings on that ticket; that would help. :)

    Steve



  • How stable is the 2.2.3 snapshot?

    We are only able to replicate the issue on the production firewalls when there is real traffic going through them, and even then it may be 48 hours before one locks up, while other times it takes 20 minutes.

    Load does not seem to matter either: sometimes it is a primary firewall that locks up with 40,000 states open and 200 Mbps going through it, and other times it is a backup firewall with no traffic going through it at all.


  • Netgate Administrator

    It's quite stable; I've been running it here for some time. However, I can't recommend you run it in production yet. It may be better than crashing every 20 minutes, though!

    Steve



  • Hello,

    I will update the version next week and will share the results. As far as I can see, the bug is now declared resolved, so I can plan 2.2.3 for the production environment.
    I hope it will not mess up my running packages.



  • We have been running the latest 2.2.3 version on 4 sets of pfSense firewalls running CARP in bridging mode, and they have all been running stably for the past week without any issues.

    The only issue we have is that if you are using limiters in firewall rules, these need to be removed; otherwise the secondary firewall will periodically crash and reboot. Once you disable any limiters, pfSense 2.2.3 runs stably.



  • Mine is also running smoothly, with no crashes after the upgrade. However, SquidGuard is not working now and makes the GUI slow, so I have removed it for the time being.
    Steve, sir, please mark this as solved. Thanks

