Netgate Discussion Forum

    Unexpected alias behaviour - two ranges / size limits with FQDN

    General pfSense Questions
    • P Online
      Patch @Patch
      last edited by Patch

      Updated the above post to investigate whether using explicit IP addresses vs FQDN records matters.
      Updated the title to more accurately describe the root cause, which is evident in hindsight.

      • tinfoilmattT Offline
        tinfoilmatt @Patch
        last edited by

        @Patch said in Unexpected alias behaviour - two ranges / size limits with FQDN:

        Easy bug trigger
        [ . . . ]
        The easiest way to trigger the bug is to enter the following alias of 1024 consecutive elements -> only 473 elements are shown in the corresponding alias table.

        I can confirm this behavior.


        • tinfoilmattT Offline
          tinfoilmatt
          last edited by

          I am holding here despite the...

          Aliases become Tables when loaded into the active firewall ruleset. The contents displayed on this page reflect the current addresses inside tables used by the firewall.

          ...advisory.

          Please let me know if there's something I can/should do next that would further assist the thread.

          • tinfoilmattT Offline
            tinfoilmatt
            last edited by

            The 473 records are sequential from 123.123.120.0 through 123.123.121.216.
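The count can be sanity-checked with Python's standard `ipaddress` module. This is only a sketch: it assumes the 1024-element test alias is the block 123.123.120.0/22, which matches the first surviving address and the "/22" mentioned elsewhere in the thread but is not spelled out in this post.

```python
import ipaddress

first = ipaddress.IPv4Address("123.123.120.0")
last = ipaddress.IPv4Address("123.123.121.216")

# Inclusive count of addresses between the first and last surviving record.
surviving = int(last) - int(first) + 1

# Assumption: the test alias covers this /22 (1024 consecutive addresses).
expected = ipaddress.ip_network("123.123.120.0/22").num_addresses

print(f"surviving={surviving} expected={expected} missing={expected - surviving}")
```

The surviving range is one full /24 (256 addresses) plus the first 217 addresses of the next /24, so the cut-off does not fall on a prefix boundary.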

            • stephenw10S Offline
              stephenw10 Netgate Administrator
              last edited by

              Hmm, I don't think this is a numerical limit. It seems more like a function of timing to me. But either way I would bet the root issue is in filterdns.

              • P Online
                Patch
                last edited by Patch

                @bbcan177 You have been dealing with far larger sets of IP addresses and FQDNs in pfSense. How have you been working around:

                1. The limit on the cumulative number of entries in aliases containing an FQDN, as described above

                2. The possibly related consequence of using incremental alias updates with duplicate removal, and then incremental IP removal without restoring the removed duplicate, as per https://redmine.pfsense.org/issues/13792 and possibly https://redmine.pfsense.org/issues/9296 and https://redmine.pfsense.org/issues/13793

                The short-term fix:

                1. is relatively simple: just document the limit, add it to redmine and say it's being worked on. That is something I can easily work with.

                2. is more difficult, as that bug essentially requires the user to ensure all aliases containing an FQDN are always non-intersecting (never triggering duplicate removal), which severely limits the value of aliases for me.
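For the static part of an alias, a non-intersection check can at least be scripted outside pfSense. The following is a minimal sketch using Python's `ipaddress` module. The alias names and entries are hypothetical, and FQDN entries would have to be resolved to addresses first, since `ip_network()` raises `ValueError` on a hostname.

```python
import ipaddress
from itertools import combinations

def overlapping_aliases(aliases):
    """Return (alias_a, alias_b, net_a, net_b) for every cross-alias overlap."""
    parsed = {
        name: [ipaddress.ip_network(entry, strict=False) for entry in entries]
        for name, entries in aliases.items()
    }
    hits = []
    for (name_a, nets_a), (name_b, nets_b) in combinations(parsed.items(), 2):
        for net_a in nets_a:
            for net_b in nets_b:
                # Only compare networks of the same IP version.
                if net_a.version == net_b.version and net_a.overlaps(net_b):
                    hits.append((name_a, name_b, str(net_a), str(net_b)))
    return hits

# Hypothetical aliases, for illustration only.
example = {
    "voip_provider": ["123.123.120.0/22"],
    "admin_hosts": ["123.123.121.10", "198.51.100.7"],
}
print(overlapping_aliases(example))
```

The pairwise comparison is quadratic, which is fine for a handful of aliases but would need a sorted or interval-based approach for very large lists.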

                The longer-term fix may be more difficult, as it may be a program architecture limit rather than a coding bug.

                • The limit for "mixed mode" aliases may be resolvable by internally processing the constants (explicitly specified IP addresses) separately from the variable entries (FQDNs), then combining the two "sub aliases". The constant portion only needs to be evaluated at alias editing or program boot-up. Optimisation to group longer runs of consecutive addresses into CIDR prefix notation would be possible.

                • Regression analysis, I suspect, would show that the progressive slowing of alias updates and the eventual lock-up investigated in this thread are a result of earlier work to avoid the limitations of implicit duplicate removal addressed in https://redmine.pfsense.org/issues/9296

                • A solution may be to preprocess so that additions to filterdns are only made when the duplicate count for that address goes from 0->1 and, similarly, deletions from filterdns are only made when the duplicate count for that address goes from 1->0. Other duplicate count transitions (1<->2<->3 etc.) result in no filterdns calls.

                • Alternatively, incremental deletes could be avoided completely if all updates were done by a full alias rebuild (creating a list of actual IP addresses +/- CIDR prefixes) into a temporary variable, followed by full replacement of the old with the new.

                • Or, probably best of all, combine the above two. When updating an alias containing one or more FQDNs, fully rebuild the alias into a temporary variable. Then compare the active and new aliases, calling filterdns to update the active alias only as required.

                None of which sound trivial to me.
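To make the proposals above concrete, here is a rough Python sketch of the three ideas: collapsing constant entries into CIDR prefixes, reference counting so that only 0->1 and 1->0 transitions touch the table, and diffing a full rebuild against the active table. It is an illustration only and does not reflect how filterdns or pfSense are actually implemented; all function and class names are hypothetical.

```python
import ipaddress
from collections import Counter

def collapse_constants(entries):
    """Merge runs of consecutive static addresses into CIDR prefixes."""
    nets = [ipaddress.ip_network(e, strict=False) for e in entries]
    v4 = [n for n in nets if n.version == 4]
    v6 = [n for n in nets if n.version == 6]
    # collapse_addresses() rejects mixed IP versions, so merge each family separately.
    return list(ipaddress.collapse_addresses(v4)) + list(ipaddress.collapse_addresses(v6))

class RefCountedTable:
    """Track how many alias sources currently contribute each address."""

    def __init__(self):
        self.counts = Counter()

    def add(self, address):
        """Return True only on the 0 -> 1 transition (a table insert is needed)."""
        self.counts[address] += 1
        return self.counts[address] == 1

    def remove(self, address):
        """Return True only on the 1 -> 0 transition (a table delete is needed)."""
        if self.counts[address] == 0:
            return False  # address was never added; nothing to do
        self.counts[address] -= 1
        if self.counts[address] == 0:
            del self.counts[address]
            return True
        return False

def diff_tables(active, rebuilt):
    """Compare the live table with a full rebuild; return (to_add, to_remove)."""
    active, rebuilt = set(active), set(rebuilt)
    return rebuilt - active, active - rebuilt

# Usage: 1024 consecutive hosts collapse to a single prefix.
hosts = [str(ipaddress.IPv4Address("123.123.120.0") + i) for i in range(1024)]
print(collapse_constants(hosts))

table = RefCountedTable()
print(table.add("208.123.73.219"), table.add("208.123.73.219"))
print(table.remove("208.123.73.219"), table.remove("208.123.73.219"))
print(diff_tables({"192.0.2.1", "192.0.2.2"}, {"192.0.2.2", "192.0.2.3"}))
```

With reference counts, removing one of two duplicate sources leaves the address in the table, which is the "restoration of the removed duplicate" behaviour the redmine issues ask for.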

                • stephenw10S Offline
                  stephenw10 Netgate Administrator
                  last edited by

                  pfBlocker uses URL table aliases which are not limited to 3000 entries like other alias types. But as I said I don't think that's what we're hitting here.

                  • P Online
                    Patch @stephenw10
                    last edited by Patch

                    @stephenw10 You may well be correct.
                    I agree it has features suggestive of a timing issue or race condition, such as:

                    • Performance varies, as measured by the total number of entries processed correctly in aliases containing at least one FQDN.
                    • This performance is worse under heavier loads (a greater total number of entries in aliases containing an FQDN).

                    Other features are not typical of a race condition:

                    • For a given test vector, the performance (total number of entries processed correctly) is exactly the same on each run.
                    • For a given test vector, the performance is exactly the same on different hardware (473 for one FQDN and a /22, as per my testing and tinfoilmatt's).
                    • For a given test vector, the performance is exactly the same across pfSense v2.8.1 and v2.7.2. The total of 256 is remarkable not because it is a power of two but because it shows that varying the software version does not change the performance (exactly the same total entries processed correctly), even though it does change how many entries are processed correctly in each of the 3 aliases.
                    • With lots of FQDNs the system eventually reaches an identical end point but locks up gradually, getting very slow before actually locking up.
                    • Looking at the Resolver logs, IPv4 addresses that show in the alias tables are processed once. The missing IPv4 addresses are processed twice, not zero times.
                    • I thought there was an old redmine issue (which I can't find again at the moment) to prevent duplicate processing resulting in removal of IP addresses. A list size limit in that fix is the most suspicious candidate, IMO.

                    Looking at filterdns and the prior work on it, I can see why Netgate would be in no rush to delve back into it: lots of semaphores. I could be wrong, but my brief reading suggests it processes one entry (absolute IP address or FQDN) at a time. That is useful for small inserts, but complex for deletes, and efficiency may be challenging for bulk entry with overlaps.

                    I tagged @bbcan177 not intending to insult the fine programmers at Netgate, but because I assumed efficient preprocessing to reliably remove duplicates from larger data sets would be required to fully fix the issues raised in this thread. That is a task I guessed bbcan177 had already invested a lot of time in, so he may have some instructive hints.

                    However, I suspect this is too large a step given Netgate's prior investment in filterdns, hence the repeated deferral of duplicate restoration.

                    • tinfoilmattT Offline
                      tinfoilmatt
                      last edited by

                      I went further and applied my "Test" alias containing only 473 (of an expected 1,025) records to a disabled firewall rule.

                      After reloading the filter, my "Test" alias continued to contain only 473 records.

                      I then 'emptied' the table (i.e., Diagnostics / Tables > select Test alias from dropdown > click "Empty Table"), and reloaded the filter (i.e., Status / Filter Reload).

                      My "Test" alias now contains 165 records: sequential IPs 123.123.120.0 - 123.123.120.147, 123.123.120.149, sequential IPs 123.123.120.151 - 123.123.120.158, sequential IPs 123.123.120.165 - 123.123.120.167, 123.123.120.172, 123.123.120.178, 123.123.120.191, followed at the very end by the A/AAAA records to which redmine.pfsense.org resolves (208.123.73.219 and 2610:160:11:18::219 respectively, and in that order).
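The listed ranges do add up to 165 entries. A small Python sketch that rebuilds the list exactly as described above (the helper name `span` is made up for this example):

```python
import ipaddress

def span(first, last):
    """Return every IPv4 address from first to last, inclusive, as strings."""
    a, b = ipaddress.IPv4Address(first), ipaddress.IPv4Address(last)
    return [str(ipaddress.IPv4Address(i)) for i in range(int(a), int(b) + 1)]

records = (
    span("123.123.120.0", "123.123.120.147")
    + ["123.123.120.149"]
    + span("123.123.120.151", "123.123.120.158")
    + span("123.123.120.165", "123.123.120.167")
    + ["123.123.120.172", "123.123.120.178", "123.123.120.191"]
    # A and AAAA records for redmine.pfsense.org, as reported in the post.
    + ["208.123.73.219", "2610:160:11:18::219"]
)

print(len(records))
```

Of those, 163 are from the static range and 2 are the resolved FQDN records, so after the flush and reload the static portion is both shorter and no longer contiguous.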

                      • P Online
                        Patch @tinfoilmatt
                        last edited by Patch

                        I had difficulty making sense of edits made after the system has locked up, because old entries are retained and tests become harder to reproduce.

                        The other reason I have not focused on it is that it tends to demonstrate the bug can be made latent, which is bad, not good. To explain:

                        For me the bug initially occurred on a production unit. I incrementally updated the configuration over years. One update resulted in me going over the total alias entry limit for aliases with an FQDN. As old entries and other alias tables are retained after editing an alias, the system continued to work well. Months later I had a prolonged power failure resulting in a pfSense restart. The restart forced a full alias rebuild, but now the failed alias entries were not restricted to my edits 2 months earlier; other, more critical entries were omitted. This presented as failure of my main incoming VoIP supplier to register on my PABX.

                        As a result I have focussed on behaviour on device restart as I don't like latent failures.

                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.