Netgate Discussion Forum

    Upgrade success (Edit: not quite)

    2.1.1 Snapshot Feedback and Problems - RETIRED
    • P
      phil.davis

      There have been some changes to check_reload_status in pfsense-tools. Maybe they will help the failover processing?
      I don't have a suitable multi-WAN test environment at the moment (I am traveling), but on Sunday I will be able to do real testing. Please let me know if there is any change worth testing, and I can get the latest snapshot on Saturday night or Sunday and test it.

      As the Greek philosopher Isosceles used to say, "There are 3 sides to every triangle."
      If I helped you, then help someone else - buy someone a gift from the INF catalog http://secure.inf.org/gifts/usd/

      • jimpJ
        jimp Rebel Alliance Developer Netgate

        Yes, the snapshots from Thursday afternoon or later should have the check_reload_status fixes, which will hopefully address the behavior some are seeing in this and other similar threads. If someone can upgrade and test again, it would be appreciated.

        Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

        Need help fast? Netgate Global Support!

        Do not Chat/PM for help!

        • A
          athurdent

          My test KVM still shows that error when a gateway goes down, so I didn't try it on my "production" machine:

          Jan 25 07:59:00 pfsense-kvm.local-lan apinger: Error while starting command form alarm(down) on target(192.168.xxx.254-WANGW)
          Jan 25 07:59:00 pfsense-kvm.local-lan apinger: command (/usr/local/sbin/pfSctl -c 'service reload dyndns WANGW' -c 'service reload ipsecdns' -c 'service reload openvpn WANGW' -c 'filter reload' ) exited with status: 255
          
          • A
            athurdent

            I tried the latest snapshot (Sat Jan 25 10:43:17 EST 2014) and the problem is still there.

            • P
              phil.davis

              2.1.1-PRERELEASE (i386)
              built on Sat Jan 25 10:00:56 EST 2014
              FreeBSD 8.3-RELEASE-p14
              

              The test system is an Alix 2D13 with its WAN connected to my home LAN. The test WAN gets DHCP from the home LAN. The WAN gateway monitor IP is set to an external IP on the real internet. Things are running nicely.
              I pull the home WAN as a test. As expected, the test WAN gateway goes to "pending" and eventually "offline" as apinger fails to get a ping response from the monitor IP. So apinger takes some action, but this still appears in the gateway log:

              Jan 26 22:30:47 	apinger: ALARM: WAN_DHCP(216.146.35.35) *** WAN_DHCPdown ***
              Jan 26 22:30:57 	apinger: Error while starting command form alarm(WAN_DHCPdown) on target(216.146.35.35-WAN_DHCP)
              Jan 26 22:30:57 	apinger: command (/usr/local/sbin/pfSctl -c 'service reload dyndns WAN_DHCP' -c 'service reload ipsecdns' -c 'service reload openvpn WAN_DHCP' -c 'filter reload' ) exited with status: 255
              

              Then I connect the home WAN again (so I can post this  :) ). apinger on the test pfSense starts getting ping responses from the monitor IP - good. It attempts to take action when it decides the WAN gateway is up, and this appears in the gateway log:

              Jan 26 22:34:00 	apinger: alarm canceled: WAN_DHCP(216.146.35.35) *** WAN_DHCPdown ***
              Jan 26 22:34:10 	apinger: Error while starting command form alarm(WAN_DHCPdown) on target(216.146.35.35-WAN_DHCP)
              Jan 26 22:34:10 	apinger: command (/usr/local/sbin/pfSctl -c 'service reload dyndns WAN_DHCP' -c 'service reload ipsecdns' -c 'service reload openvpn WAN_DHCP' -c 'filter reload' ) exited with status: 255
              

              So the "exited with status: 255" error happens for all gateway transitions, both down and up. There is no need for a complicated multi-WAN failover scenario to see this error message.
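For anyone who wants to check a saved gateway log for this symptom after upgrading, here is a minimal sketch in Python. It is not part of pfSense; the function name `failed_commands` and the regular expression are my own assumptions, written only against the log line format quoted above.

```python
import re

# Matches the apinger "command (...) exited with status: N" line in the
# format quoted in this thread. The pattern is an assumption based only
# on those excerpts, not on the apinger source.
COMMAND_EXIT = re.compile(
    r"apinger: command \((?P<command>.*)\) exited with status: (?P<status>\d+)"
)


def failed_commands(log_lines):
    """Return (command, status) pairs for commands that exited non-zero."""
    failures = []
    for line in log_lines:
        match = COMMAND_EXIT.search(line)
        if match is None:
            continue
        status = int(match.group("status"))
        if status != 0:
            failures.append((match.group("command").strip(), status))
    return failures


# Two lines copied from the gateway log above.
sample = [
    "Jan 26 22:30:47 \tapinger: ALARM: WAN_DHCP(216.146.35.35) *** WAN_DHCPdown ***",
    "Jan 26 22:30:57 \tapinger: command (/usr/local/sbin/pfSctl -c 'service reload dyndns WAN_DHCP' "
    "-c 'service reload ipsecdns' -c 'service reload openvpn WAN_DHCP' -c 'filter reload' ) "
    "exited with status: 255",
]

for command, status in failed_commands(sample):
    print(f"status {status}: {command}")
```

An empty result on a log taken after a gateway transition would suggest the snapshot no longer shows the error.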

              • jimpJ
                jimp Rebel Alliance Developer Netgate

                Finally locked in a fix for this. New snapshots building now should be OK. It'll be a few hours before they upload.

                • A
                  athurdent

                  @jimp:

                  Finally locked in a fix for this. New snapshots building now should be OK. It'll be a few hours before they upload.

                  Great, thanks, no more error messages.

                  Two questions about things I ran into while testing failover with the new version:

                  Are states still flushed when a gateway goes down and the option "State Killing on Gateway Failure" is not checked? I have default gateway switching on.
                  I started a ping to an external host, pulled my primary gateway and the ping stopped, never recovered. Shouldn't it recover and automatically use my second gateway?

                  Is there a configuration option to switch off those "MONITOR: <gateway> is down, removing from routing group <group>" messages? They keep coming and filling my mailbox as long as one gateway is down, but one message per gateway group would really be enough. Even worse, one never gets a mail when the gateway has recovered, so this whole feature is a little useless (at least for me) and I have to rely on my monitoring system anyway.
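To illustrate the behavior being asked for (one mail when a gateway goes down, one when it recovers, nothing in between), here is a minimal sketch in Python. It is not how pfSense implements its notifications; the class `GatewayNotifier`, its `update` method and the message wording are hypothetical and exist only for this example.

```python
class GatewayNotifier:
    """Hypothetical helper: emit one message per gateway state change."""

    def __init__(self):
        # Maps a key (a gateway name, or a (gateway, group) tuple) to "up" or "down".
        self._last_state = {}

    def update(self, key, is_up):
        """Record the latest check; return a message only on a state change."""
        state = "up" if is_up else "down"
        previous = self._last_state.get(key)
        self._last_state[key] = state
        if previous == state:
            return None  # same state as last time: stay quiet
        if previous is None and is_up:
            return None  # first check of a healthy gateway: nothing to report
        return f"MONITOR: {key} is {state}"


notifier = GatewayNotifier()
# Simulated monitor results: up, down, down, down, up.
for is_up in (True, False, False, False, True):
    message = notifier.update("WANGW", is_up)
    if message is not None:
        print(message)
```

With the simulated sequence above, only the first "down" and the final "up" produce a message, which is the behavior requested: no repeats while the gateway stays down, and a notice on recovery.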

                  • A
                    athurdent

                    Regarding state flushing, it seems that at some point all the states are flushed, not just those of the gateway going down.
                    I would suggest modifying the anti-lockout rule like this, using the "no state" feature of pf:

                    # make sure the user cannot lock himself out of the webConfigurator or SSH
                    pass in  quick on {$lanif} proto tcp from any to ({$lanif}) port { {$alports} } no state label "anti-lockout rule"
                    pass out quick on {$lanif} proto tcp from ({$lanif}) port { {$alports} } to any no state label "anti-lockout rule"
                    

                    That way you can keep using your SSH/GUI session even if all states get flushed. The GUI feature "Reset States" would also benefit from this. You could get rid of:
                    "NOTE: If you reset the firewall state table, the browser session may appear to be hung after clicking "Reset". Simply refresh the page to continue."

                    • D
                      doktornotor Banned

                      @jimp:

                      Finally locked in a fix for this. New snapshots building now should be OK. It'll be a few hours before they upload.

                      Confirmed fixed, finally… Yay!  8)

                      • P
                        phil.davis

                        2.1.1-PRERELEASE (i386)
                        built on Sun Feb 2 12:42:30 EST 2014
                        FreeBSD 8.3-RELEASE-p14
                        

                        All is well. I tried pulling out the phone line on the ADSL (default gateway): the dynamic DNS name, the OpenVPN road warrior server and 2 OpenVPN site-to-site clients all switched to using OPT1. I plugged in the phone line again, ADSL negotiated, apinger detected the WAN online again, and everything failed back.
                        I pulled the cable on OPT1 (which had general internet traffic directed to it as tier 1 of a gateway group). General browsing failed over to WAN. The OpenVPN server and clients remained running untouched (as they should, because they were already on WAN, so there was no need to restart them).
                        While I was messing about, the ADSL WAN went down by itself. I had some minutes with both WAN and OPT1 down, and things recovered fine from that as links became available again. Always good to have the ISP give you a real test  ;)
                        The Gateways log tab has nice clean entries like this:

                        Feb 3 10:24:32 	apinger: Starting Alarm Pinger, apinger(28478)
                        Feb 3 10:30:25 	apinger: SIGHUP received, reloading configuration.
                        Feb 3 10:52:28 	apinger: ALARM: WANGW(8.8.8.8) *** down ***
                        Feb 3 11:00:52 	apinger: alarm canceled: WANGW(8.8.8.8) *** down ***
                        Feb 3 11:08:57 	apinger: ALARM: WANGW(8.8.8.8) *** down ***
                        Feb 3 11:09:10 	apinger: ALARM: OPT1GW(8.8.4.4) *** OPT1GWdown ***
                        Feb 3 11:10:49 	apinger: SIGHUP received, reloading configuration.
                        Feb 3 11:12:59 	apinger: alarm canceled: WANGW(8.8.8.8) *** down ***
                        Feb 3 11:20:12 	apinger: SIGHUP received, reloading configuration.
                        Feb 3 11:20:23 	apinger: alarm canceled: OPT1GW(8.8.4.4) *** OPT1GWdown ***
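As a rough way to read outage durations out of a log like the one above, here is a minimal Python sketch that pairs each "ALARM" line with the next "alarm canceled" line for the same gateway. It is not a pfSense tool; the function name `outages` and the regular expression are my own assumptions based on the entries shown, and the year is passed in because the log timestamps do not include one.

```python
import re
from datetime import datetime

# Matches the "ALARM" and "alarm canceled" lines in the format shown in
# this thread; the pattern is an assumption based only on those entries.
EVENT = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+apinger: "
    r"(?P<kind>ALARM|alarm canceled): (?P<gateway>\w+)\("
)


def outages(log_lines, year=2014):
    """Return (gateway, seconds_down) tuples in the order alarms were canceled."""
    open_alarms = {}  # gateway -> time the alarm was raised
    result = []
    for line in log_lines:
        match = EVENT.match(line.strip())
        if match is None:
            continue  # startup, SIGHUP and other lines are ignored
        ts_text = " ".join(match.group("ts").split())
        when = datetime.strptime(f"{year} {ts_text}", "%Y %b %d %H:%M:%S")
        gateway = match.group("gateway")
        if match.group("kind") == "ALARM":
            open_alarms.setdefault(gateway, when)
        elif gateway in open_alarms:
            start = open_alarms.pop(gateway)
            result.append((gateway, int((when - start).total_seconds())))
    return result


# Entries copied from the Gateways log above.
sample = [
    "Feb 3 10:24:32 \tapinger: Starting Alarm Pinger, apinger(28478)",
    "Feb 3 10:52:28 \tapinger: ALARM: WANGW(8.8.8.8) *** down ***",
    "Feb 3 11:00:52 \tapinger: alarm canceled: WANGW(8.8.8.8) *** down ***",
    "Feb 3 11:08:57 \tapinger: ALARM: WANGW(8.8.8.8) *** down ***",
    "Feb 3 11:09:10 \tapinger: ALARM: OPT1GW(8.8.4.4) *** OPT1GWdown ***",
    "Feb 3 11:12:59 \tapinger: alarm canceled: WANGW(8.8.8.8) *** down ***",
    "Feb 3 11:20:23 \tapinger: alarm canceled: OPT1GW(8.8.4.4) *** OPT1GWdown ***",
]

for gateway, seconds in outages(sample):
    print(f"{gateway}: down for {seconds} s")
```

Note that these are the times between apinger raising and canceling the alarm, not the exact times the links were physically down.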
                        

                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.