Unexpected alias behaviour - two ranges / size limits with FQDN
-
@stephenw10 Hey, at least you're getting paid to placate this behavior. I'm only here to have fun!

-
@SteveITS
After more testing I suspect the root cause of the bug investigated in this thread is:
- An alias containing one or more FQDNs is limited to a little over 512 entries (total across all such aliases), not the 5000 per alias limit suggested in the manual.
- If you go over that limit, further alias table updates are blocked for all aliases.
As for the bug rediscovered in this thread (and referenced in your thread), where duplicates are removed but never restored, I can't see that being resolved any time soon. Learning how best to live with it is more sensible imo.
Easy bug trigger
Tested on a clean pfSense v2.8.1 install with WAN GUI access enabled. Save that configuration as the baseline. This is done to ensure easy repeatability and to take the wind out of the shoot-the-messenger crap.
The easiest way of triggering the bug is entering the following 1024 consecutive element alias -> 473 elements are shown in the corresponding alias table.
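For anyone who wants to reproduce this without typing, here is a minimal Python sketch that generates the entries. It assumes the test vector is the /22 starting at 123.123.120.0 plus a single FQDN (redmine.pfsense.org), which matches the figures reported elsewhere in this thread. The function name is only for illustration.

```python
import ipaddress


def sequential_alias_entries(cidr="123.123.120.0/22", fqdn="redmine.pfsense.org"):
    """Return every address in `cidr` (network and broadcast included) plus one FQDN."""
    network = ipaddress.ip_network(cidr)
    # Iterating a network object yields all of its addresses in order.
    entries = [str(address) for address in network]
    entries.append(fqdn)
    return entries


if __name__ == "__main__":
    # One entry per line, ready to copy into the alias.
    print("\n".join(sequential_alias_entries()))
```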

Random IP addresses & lower bound
To put a bound on the lower limit and confirm that sequential IP addresses are irrelevant:
Reload the baseline configuration and enter a 513 element alias such as the following -> 514 elements are shown in the corresponding alias table.

Reload the baseline configuration and enter a 1025 element alias (using the method illustrated above) -> 473 elements are shown in the corresponding alias table, which is exactly the same as the initial sequential IP test case. Waiting longer, a filter reload and "killall filterdns" all make no difference.
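A rough Python sketch for building such random test vectors. It is only an assumption about how the addresses could be generated, not necessarily how the ones above were produced. The FQDN, the fixed seed and the function name are all placeholders for illustration.

```python
import ipaddress
import random


def random_alias_entries(count, fqdn="redmine.pfsense.org", seed=1):
    """Return `count` distinct random public IPv4 addresses plus one FQDN."""
    rng = random.Random(seed)  # fixed seed so the same test vector can be regenerated
    chosen = []
    seen = set()
    while len(chosen) < count:
        candidate = ipaddress.IPv4Address(rng.getrandbits(32))
        # Skip duplicates and anything private, loopback, reserved or multicast.
        if candidate in seen or not candidate.is_global or candidate.is_multicast:
            continue
        seen.add(candidate)
        chosen.append(str(candidate))
    return chosen + [fqdn]


if __name__ == "__main__":
    # 512 random addresses + 1 FQDN = 513 entries, one per line.
    print("\n".join(random_alias_entries(512)))
```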
Limit per alias or total record count
To illustrate that the limit depends on the load imposed by other aliases:
Restore the baseline configuration
Enter Alias like IP_set1 with 50 IP + FQDN x1 as shown below -> 52 records in Alias table

Enter Alias like IP_set2 with 50 IP + FQDN x1 using the same method (mine started 202.) -> 52 records shown in the alias table
Now try to enter IP_set3 with 512 IP + FQDN x1 as shown below -> zero records. After several filter reloads, 271 records. Waiting longer, a filter reload and "killall filterdns" all make no difference.
This contrasts with the 514 records when no other aliases were entered.

Limit for "mixed mode" alias or also pure FQDN
To investigate whether using explicit IP addresses vs FQDN addresses matters:
Reload the baseline configuration and enter a 1024 FQDN alias (I copied the first 1024 entries from pfBlocker) -> 569 records after waiting about 1 hour and lots of filter reloads. After which:
- "killall filterdns" -> No matching processes were found
- creating an alias Test_loaded containing a single FQDN -> empty alias table
Which strongly suggests "mixed mode" aliases are no different to pure FQDN aliases, just testing takes longer.

-
Updated the above post to investigate whether using explicit IP addresses vs FQDN records matters.
Updated the title to more accurately describe the root cause evident in hindsight.
-
@Patch said in Unexpected alias behaviour - two ranges / size limits with FQDN:
Easy bug trigger
[ . . . ]
The easiest way of triggering the bug is entering the following 1024 consecutive element alias -> 473 elements are shown in the corresponding alias table.
I can confirm this behavior.



-
I am holding here despite the "Aliases become Tables when loaded into the active firewall ruleset. The contents displayed on this page reflect the current addresses inside tables used by the firewall." advisory.
Please let me know if there's something I can/should do next that would further assist the thread.
-
The 473 records are sequential from 123.123.120.0 through 123.123.121.216.
-
Hmm, I don't think this is a numerical limit. It seems more like a function of timing to me. But either way I would bet the root issue is in filterdns.
-
@bbcan177 You have been dealing with far larger sets of IP addresses and FQDNs in pfSense. How have you been working around:
- The limitation on the cumulative number of entries in aliases containing an FQDN, as described above
- The possibly related consequence of using incremental alias updates with duplicate removal, followed by incremental IP removal without restoration of the removed duplicate, as per https://redmine.pfsense.org/issues/13792 and possibly https://redmine.pfsense.org/issues/9296 and https://redmine.pfsense.org/issues/13793
The short term fix
- is relatively simple: just document the limit, add it to redmine and say it's being worked on. Something I can easily work with.
- is more difficult, as that bug essentially requires the user to ensure all aliases containing FQDNs are always non-intersecting (never trigger duplicate removal). Which severely limits the value of aliases for me.
The longer term fix may be more difficult as it may be a program architecture limit rather than a coding bug.
- The limit for "mixed mode" aliases may be resolvable by internally processing the constants (explicitly specified IP addresses) separately from the variable entries (FQDNs), then combining the two "sub aliases". The constant portion only needs to be evaluated at alias editing or program boot up. Optimisation to group longer runs of consecutive addresses into CIDR prefix notation would be possible.
- Regression analysis, I suspect, would show that the progressive slowing of alias updates and the eventual lock up investigated in this thread are a result of earlier work to try to avoid the limitations of implicit duplicate removal addressed in https://redmine.pfsense.org/issues/9296
- A solution may be to pre-process so that additions to filterdns are only made when the duplicate count for that address goes from 0->1, and similarly deletions from filterdns are only made when the duplicate count for that address goes from 1->0. Other duplicate count transitions (1<->2<->3 etc.) result in no filterdns calls.
- Alternatively, incremental deletes could be avoided completely if all updates were done by a full alias rebuild (creating a list of actual IP addresses +/- CIDR prefix) into a temporary variable, then full replacement of the old with the new.
- Or, probably best of all, combine the above two. When updating an alias containing one or more FQDNs, fully rebuild the alias into a temporary variable. Then compare the active and new aliases, calling filterdns to update the active alias only as required.
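To make the duplicate-count and rebuild-then-compare ideas a bit more concrete, here is a minimal Python sketch. It is purely illustrative and says nothing about how filterdns is actually written. The class and function names are made up, and the "table" is just an in-memory counter or set.

```python
from collections import Counter


class RefCountedTable:
    """Track how many alias entries currently contribute each address."""

    def __init__(self):
        self.counts = Counter()

    def add(self, address):
        """Return True only on the 0->1 transition, when a real table insert is needed."""
        self.counts[address] += 1
        return self.counts[address] == 1

    def remove(self, address):
        """Return True only on the 1->0 transition, when a real table delete is needed."""
        if self.counts[address] == 0:
            return False  # address was never added, nothing to delete
        self.counts[address] -= 1
        if self.counts[address] == 0:
            del self.counts[address]
            return True
        return False


def diff_rebuild(active, rebuilt):
    """Compare the active table with a freshly rebuilt one.

    Returns (to_add, to_delete) so that only the differences touch the live table.
    """
    active, rebuilt = set(active), set(rebuilt)
    return rebuilt - active, active - rebuilt
```

With this, an address contributed by two overlapping aliases only leaves the table once both contributions have been removed.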
None of which sound trivial to me.
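The CIDR grouping of the constant portion, at least, is easy to illustrate. A small Python sketch using the standard library `ipaddress` module, assuming IPv4 entries only; the function name is hypothetical and this is not a claim about what pfSense does internally.

```python
import ipaddress


def collapse_constants(addresses):
    """Merge explicit IPv4 addresses into the fewest covering CIDR prefixes."""
    # A bare address such as "10.0.0.1" becomes a /32 network.
    networks = [ipaddress.ip_network(address) for address in addresses]
    return [str(network) for network in ipaddress.collapse_addresses(networks)]
```

Fed the 1024 consecutive addresses of 123.123.120.0/22, this returns the single entry 123.123.120.0/22.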
-
-
pfBlocker uses URL table aliases which are not limited to 3000 entries like other alias types. But as I said I don't think that's what we're hitting here.
-
@stephenw10 You may well be correct.
I agree it has features suggestive of timing or a race condition, such as:
- Performance varies, as measured by the total number of entries processed correctly in aliases containing at least one FQDN
- This performance is worse under heavier loads (greater total number of entries in aliases containing an FQDN).
Other features are not typical of a race condition:
- For a given test vector the performance (total number of entries processed correctly) is exactly the same for each run
- For a given test vector the performance is exactly the same on different hardware (473 for one FQDN and a /22, as per my and tinfoilmatt's testing)
- For a given test vector the performance is exactly the same across pfSense v2.8.1 and v2.7.2 software. The 256 total is remarkable not because it's a binary number but because it shows that varying the software version does not change the performance (exactly the same total entries processed correctly), even though it does change how many entries are processed correctly in each of the 3 aliases.
- With lots of FQDNs the system eventually reaches an identical end point but locks up gradually, getting very slow prior to actually locking up.
- Looking at the Resolver logs, IPv4 addresses showing in the alias tables are processed once. The missing IPv4 addresses are processed twice, not zero times.
- I thought there was an old redmine (which I can't re-find atm) to prevent duplicate processing resulting in removal of IP addresses. A list size limit in that fix is the most suspicious IMO
Looking at filterdns and the prior work on it, I can see why Netgate would be in no rush to delve back into it: lots of semaphores. I could be wrong, but my brief reading suggests it processes one entry (absolute IP address or FQDN) at a time. Useful for small inserts, but complex for deletes, and efficiency may be challenging for bulk entry with overlaps.
I tagged @bbcan177 not intending to insult the fine programmers at Netgate but because I assumed efficient pre-processing to reliably remove duplicates from larger data sets would be required to fully fix the issues raised in this thread. A task I guessed bbcan177 had already invested a lot of time in, so he may have some instructive hints.
However, I suspect this is a step too large given Netgate's prior investment in filterdns, hence the repeated deferral of duplicate restoration.
-
I went further and applied my "Test" alias containing only 473 (of an expected 1,025) records to a disabled firewall rule.
After reloading the filter, my "Test" alias continued to contain only 473 records.
I then 'emptied' the table (i.e., Diagnostics / Tables > select Test alias from dropdown > click "Empty Table"), and reloaded the filter (i.e., Status / Filter Reload).
My "Test" alias now contains 165 records: sequential IPs 123.123.120.0-123.123.120.147, 123.123.120.149, sequential IPs 123.123.120.151-123.123.120.158, sequential IPs 123.123.120.165-123.123.120.167, 123.123.120.172, 123.123.120.178, 123.123.120.191, followed at the very end by the A/AAAA records to which redmine.pfsense.org resolves (208.123.73.219 and 2610:160:11:18::219 respectively, and in that order).