Netgate Discussion Forum

    pfBlocker CARP node goes into a kind of backup state after a pfBlocker update; pfb_dnsbl stops.

    pfBlockerNG
    10 Posts 2 Posters 559 Views
      reberhar
      last edited by reberhar

      Thanks all.

      I have 4 pairs of CARP boxes. One pair is using an IP Alias which I am going to change to CARP.

      The other three use the CARP option in pfBlocker, configured appropriately: primary Base 1, Skew 0; secondary Base 1, Skew 100, with the /32 mask, which is always restored after an update. That works, with one very irritating problem, which I describe below (copied from another post I made).
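For anyone following along: assuming pfSense behaves like FreeBSD's carp(4), each node sends an advertisement every Base + Skew/256 seconds, and with preemption on (the "preempting a slower master" log line further down suggests it is) the node that advertises most often should become MASTER. A minimal Python sketch of that arithmetic with the Base/Skew values above; `CarpNode` and `expected_master` are hypothetical helpers, not a pfSense API, and ties and demotion are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CarpNode:
    """One side of a CARP pair (hypothetical helper, not a pfSense API)."""
    name: str
    advbase: int  # "Base" field on the CARP VIP, in seconds
    advskew: int  # "Skew" field on the CARP VIP

    def adv_interval(self) -> float:
        """Seconds between advertisements: Base + Skew/256 (assumed, per carp(4))."""
        return self.advbase + self.advskew / 256


def expected_master(nodes: list[CarpNode]) -> CarpNode:
    """With preemption on, the node advertising most often should win.

    Ties and demotion counters are deliberately not modeled.
    """
    return min(nodes, key=lambda n: n.adv_interval())


if __name__ == "__main__":
    primary = CarpNode("primary", advbase=1, advskew=0)
    secondary = CarpNode("secondary", advbase=1, advskew=100)
    for node in (primary, secondary):
        print(f"{node.name}: advertises every {node.adv_interval():.3f} s")
    print("expected MASTER:", expected_master([primary, secondary]).name)
```

With these values the primary advertises every 1.000 s and the secondary every 1.391 s, so the primary should hold MASTER as long as both boxes are healthy and the advertisements actually get through the switch.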

      Today I found a new wrinkle: if I change the pfBlocker update time and save, the pfBlocker CARP node misbehaves in the same way described below. This is completely predictable.

      One comment: for me Unbound works fine, great even. But when the pfBlocker CARP node fails and the corresponding pfb_dnsbl process stops, DNS from that pfSense box can no longer reach the users, and pfBlocker on the secondary box takes over.

      Yes, if I disable pfBlocker, all works fine.

      As I mentioned, editing and saving the VIPs from the Firewall menu fixes the pfBlocker CARP nodes until the next pfBlocker update. Is there a process I could invoke from CRON to reset CARP in the same way?

      After a pfBlockerNG update, the pfBlocker CARP VIP on the primary shows neither MASTER nor BACKUP; the status is just blank, and pfb_dnsbl stops. Then the secondary pfBlocker CARP VIP takes over.

      See the screenshots below.

      If I open the primary pfBlocker CARP VIP for editing from the Firewall menu and just save it, the primary pfBlocker CARP VIP becomes MASTER and the secondary one becomes BACKUP.

      Then I can start pfb_dnsbl successfully.

      Here is something from the general logs.

      Jul 26 09:16:52 php 40257 [pfBlockerNG] DNSBL parser daemon started
      Jul 26 09:16:52 lighttpd_pfb 39134 [pfBlockerNG] DNSBL Webserver started
      Jul 26 09:16:52 lighttpd_pfb 36785 [pfBlockerNG] DNSBL Webserver stopped
      Jul 26 09:16:41 kernel carp: 4@igb1: BACKUP -> MASTER (preempting a slower master)
      Jul 26 09:16:40 check_reload_status 441 Reloading filter
      Jul 26 09:16:40 kernel carp: 4@igb1: INIT -> BACKUP (initialization complete)
      Jul 26 09:16:39 php-fpm 97364 /rc.filter_synchronize: XMLRPC reload data success with https://10.1.10.2:443/xmlrpc.php (pfsense.restore_config_section).
      Jul 26 09:16:38 php-fpm 97364 /rc.filter_synchronize: Beginning XMLRPC sync data to https://10.1.10.2:443/xmlrpc.php.
      Jul 26 09:16:38 php-fpm 97364 /rc.filter_synchronize: XMLRPC versioncheck: 23.3 -- 23.3
      Jul 26 09:16:38 php-fpm 97364 /rc.filter_synchronize: XMLRPC reload data success with https://10.1.10.2:443/xmlrpc.php (pfsense.host_firmware_version).
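On the CRON question: before automating any fix, it may help to at least detect the state changes automatically. A rough Python sketch that pulls the kernel CARP transitions out of log lines. It assumes lines in exactly the format pasted above (as copied from the GUI log view, newest first); the raw log file on disk may carry a hostname and differ slightly, so the regex is an assumption to adjust. It only reports states; it does not reset anything.

```python
import re

# Matches kernel CARP lines in the format pasted in this thread, e.g.
#   Jul 26 09:16:41 kernel carp: 4@igb1: BACKUP -> MASTER (preempting a slower master)
CARP_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+kernel\s+carp:\s+"
    r"(?P<vhid>\d+)@(?P<iface>[^:\s]+):\s+"
    r"(?P<old>[A-Z]+)\s+->\s+(?P<new>[A-Z]+)"
    r"(?:\s+\((?P<reason>[^)]*)\))?"
)


def carp_events(lines):
    """Yield (vhid, iface, old_state, new_state, reason) in input order."""
    for line in lines:
        m = CARP_RE.match(line.strip())
        if m:
            yield int(m["vhid"]), m["iface"], m["old"], m["new"], m["reason"]


def current_states(lines, newest_first=True):
    """Latest CARP state per (vhid, iface).

    The pfSense GUI log view lists newest entries first, hence the default.
    """
    events = list(carp_events(lines))
    if newest_first:
        events.reverse()  # replay oldest -> newest
    states = {}
    for vhid, iface, _old, new, _reason in events:
        states[(vhid, iface)] = new
    return states


if __name__ == "__main__":
    sample = [
        "Jul 26 09:16:41 kernel carp: 4@igb1: BACKUP -> MASTER (preempting a slower master)",
        "Jul 26 09:16:40 check_reload_status 441 Reloading filter",
        "Jul 26 09:16:40 kernel carp: 4@igb1: INIT -> BACKUP (initialization complete)",
    ]
    for event in carp_events(sample):
        print(event)
    print(current_states(sample))  # {(4, 'igb1'): 'MASTER'}
```

Non-CARP lines such as "Reloading filter" are simply skipped, so the same function can be fed a whole log excerpt.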

      The CARP system seems fine otherwise. Once I have done this manual intervention, things are fine until pfBlocker runs its updates again.

      Observations? Suggestions? What am I missing?

      Thanks

      From another machine's CARP pair:

      Jul 26 08:18:16 php-fpm 31382 /rc.carpmaster: HA cluster member "(10.33.10.1@em1): (GREENLAN)" has resumed CARP state "MASTER" for vhid 5
      Jul 26 08:18:15 check_reload_status 457 Carp master event
      Jul 26 08:18:15 kernel carp: 5@em1: BACKUP -> MASTER (preempting a slower master)
      Jul 26 08:18:15 php-fpm 19625 /rc.carpbackup: HA cluster member "(10.33.10.1@em1): (GREENLAN)" has resumed CARP state "BACKUP" for vhid 5
      Jul 26 08:18:15 php-fpm 19625 /rc.filter_synchronize: XMLRPC reload data success with https://172.16.1.3:443/xmlrpc.php (pfsense.restore_config_section).
      Jul 26 08:18:14 check_reload_status 457 Reloading filter
      Jul 26 08:18:14 kernel carp: 5@em1: INIT -> BACKUP (initialization complete)
      Jul 26 08:18:14 check_reload_status 457 Carp backup event
      Jul 26 08:18:12 php-fpm 19625 /rc.filter_synchronize: Beginning XMLRPC sync data to https://172.16.1.3:443/xmlrpc.php.
      Jul 26 08:18:12 php-fpm 19625 /rc.filter_synchronize: XMLRPC versioncheck: 23.3 -- 23.3
      Jul 26 08:18:12 php-fpm 19625 /rc.filter_synchronize: XMLRPC reload data success with https://172.16.1.3:443/xmlrpc.php (pfsense.host_firmware_version).
      Jul 26 08:18:12 php-fpm 19625 /rc.filter_synchronize: Beginning XMLRPC sync data to https://172.16.1.3:443/xmlrpc.php.
      Jul 26 08:18:11 php-fpm 70712 /firewall_virtual_ip_edit.php: Beginning configuration backup to https://acb.netgate.com/save

      [Screenshot: Dashboard of primary]

      [Screenshot: Primary pfBlocker CARP config]

      [Screenshot: Primary CARP VIP]

      [Screenshot: Secondary pfBlocker CARP config]

      [Screenshot: Secondary CARP VIP]

      [Screenshot: Update settings on primary node]

          reberhar @reberhar
          last edited by reberhar

          @reberhar

          I think I may be close to the solution. I had extended power failures at two of my sites, and when the power returned, both of those sites stabilized.

          Gee, that's just weird, right? And yes, I had already tried rebooting the pfSense systems without success.

          I am going to try resetting the switches directly downstream from the CARP pairs that are still giving issues. If that fixes that problem I will write the whole thing up.

          It might have to do with Filter Host IDs and the way I originally configured my systems.

            reberhar @reberhar
            last edited by reberhar

            @reberhar OK, success! Everything seems happy. Phew!

            So how did I manage to goof up such a great system and what was it that I did?

            So there is a very long story which I will not tell you.

            Suffice it to say that my Filter Host IDs were identical on 3 of the HA pairs. I corrected them and cleared the state tables. Why they were identical is part of the story I am not telling you.

            I also set up pfBlocker with CARP and made sure the CARP VHIDs were unique. I set the Base and Skew values appropriately on the primary and secondary machines. Nuts to the "do not edit" message. This all ran perfectly until pfBlocker did its nightly updates; then I would get the bizarre behavior described above. I noticed it also happened when I fussed with OpenVPN.
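Since VHIDs have to be unique on each broadcast domain, a quick sanity check over your VIP list can catch a clash before CARP does. A minimal sketch, assuming you type the VIPs in by hand as (name, segment, VHID) tuples; the inventory below is hypothetical. It also prints the virtual MAC an IPv4 CARP VIP uses on the wire, 00:00:5e:00:01 followed by the VHID in hex, which is the address your switches end up dealing with.

```python
from collections import defaultdict


def carp_virtual_mac(vhid: int) -> str:
    """Virtual MAC used by an IPv4 CARP VIP: 00:00:5e:00:01:<VHID in hex>."""
    if not 1 <= vhid <= 255:
        raise ValueError(f"VHID must be 1-255, got {vhid}")
    return f"00:00:5e:00:01:{vhid:02x}"


def duplicate_vhids(vips):
    """Return {(segment, vhid): [names]} for VHIDs reused on one L2 segment.

    `vips` is an iterable of (name, segment, vhid) tuples typed in by hand.
    """
    seen = defaultdict(list)
    for name, segment, vhid in vips:
        seen[(segment, vhid)].append(name)
    return {key: names for key, names in seen.items() if len(names) > 1}


if __name__ == "__main__":
    vips = [  # hypothetical inventory
        ("LAN VIP", "LAN", 4),
        ("pfB DNSBL VIP", "LAN", 4),  # clash: same segment, same VHID
        ("GREENLAN VIP", "GREENLAN", 5),
    ]
    for (segment, vhid), names in duplicate_vhids(vips).items():
        print(f"VHID {vhid} reused on {segment}: {', '.join(names)}")
    for name, _segment, vhid in vips:
        print(f"{name}: {carp_virtual_mac(vhid)}")
```

The same VHID on different segments is fine, which is why the check keys on (segment, VHID) rather than on the VHID alone.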

            OK, the HA section of the Netgate docs has a very nice discussion of why getting your switches right matters. It contains some important and challenging information.

            So yes, I reset my switches downstream of the HA/CARP pairs, and 3 of the 4 pairs were happy, but not the fourth. Its switch is a 24-port Ubiquiti smart switch. So I moved a small managed Netgear GS108PE between the pfSense box and the Ubiquiti: INSTANT HAPPINESS! The Netgear was already there; I just moved it up in position.

            (Actually a helpful extended power failure reset two of the four switches.)

            OK. I have made enough mistakes on these things. If anybody thinks I goofed here, gee, please tell me.

            At any rate, DNS now works on failover and all the bizarre behavior that occasioned this post has gone away. I also get to sleep at night.

              reberhar @reberhar
              last edited by reberhar

              @reberhar

              I want to mention that enabling Checksum Offloading in System/Advanced/Networking seems to help with my problem. I have Intel NICs.

              CARP seems to be very sensitive to CPU load. It is, by design, very time-sensitive. Removing load from the CPU, especially NIC-related work, can't hurt and may help; it does seem to make a difference in my case. A very slight lag can upset the CARP system.

              This is also why having the switches configured correctly is so important.
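To put rough numbers on "time-sensitive": assuming pfSense follows FreeBSD's carp implementation, a MASTER advertises every Base + Skew/256 seconds, and a BACKUP declares the master dead after about 3 × Base + Skew/256 seconds of silence. The second formula is an assumption worth checking against carp(4) for your version. A small sketch with the values used in this thread:

```python
def adv_interval(advbase: int, advskew: int) -> float:
    """Seconds between advertisements sent by a MASTER."""
    return advbase + advskew / 256


def master_down_timeout(advbase: int, advskew: int) -> float:
    """Assumed seconds a BACKUP waits without advertisements before taking over."""
    return 3 * advbase + advskew / 256


if __name__ == "__main__":
    # Primary: Base 1, Skew 0. Secondary: Base 1, Skew 100 (values from this thread).
    sent_every = adv_interval(1, 0)
    waits = master_down_timeout(1, 100)
    print(f"primary advertises every {sent_every:.3f} s")
    print(f"secondary takes over after {waits:.3f} s of silence")
    print(f"slack: about {waits / sent_every:.1f} advertisement intervals")
```

Under these assumptions the secondary waits about 3.39 s, so roughly three advertisements in a row have to go missing before it takes over. That makes a switch quietly dropping the CARP multicast at least as likely a culprit as a brief CPU stall.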

              The more I fuss with this the more I realize how complex HA and CARP really is. It is, however, worth the struggle. It is really quite amazing when it works well. 😊

                SteveITS @reberhar
                last edited by

                @reberhar What is your ISP bandwidth? Guessing rather high…

                It is cool to be able to upgrade pfSense while the pair stays in use. (Back up first, for lurkers.)

                  reberhar @SteveITS
                  last edited by reberhar

                  @SteveITS

                  I have 5 sites, but the one I am talking about now has two ISP gateways:

                  One at 350/350, and the other at 150/150.

                  Fiber optic, really awesome. Before that it was DSL, with a very small fraction of the fiber bandwidth.

                  Yes, it is very nice to update one box and then the other.

                  "Lurkers" ... very appropriate term. I don't like surprises either, especially when I am remote.

                        reberhar @reberhar
                        last edited by reberhar

                        @reberhar SUCCESS

                        After the latest pfBlocker upgrade, I started having the same problems all over again, and none of my earlier fixes worked.

                        I finally got onsite and have learned some useful things.

                        First, I have two Netgear GS108PEs; one worked properly in this situation and the other did not. After thinking about it, I realized that the one that worked had 802.1Q VLAN enabled. So I enabled 802.1Q VLAN on the one that was not working, and the problem disappeared. No, I didn't create any VLANs on the second unit (the first unit does have some); I just enabled 802.1Q VLAN.

                        I reasoned that multicast was perhaps involved in this (duh). So I worked through enabling multicast on the 24-port Ubiquiti smart switch that had failed this test earlier. That also involved the Cloud Key.

                        I did this only on the two ports I use for HA, not on the entire switch.

                        That worked too and is still working.

                        😊

                        Yes, I know, multicast is mentioned in the HA diagnostics write-up. I guess I was just not following through; actually, I was a little unsure how to proceed. I have other very smart switches that have been testy in this pfBlockerNG/HA environment, and I am excited to try this approach with them.

                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.