Unexpected alias behaviour - two ranges / size limits with FQDN
-
@stephenw10 said in Unexpected alias behaviour - two ranges:
Mixed mode aliases
I like that. Has a nicer ring than 'kludge.'
-
@tinfoilmatt said in Unexpected alias behaviour - two ranges:
Can you re-run this without any user error
I guess that's fair though it would imply deleting the invalid entries still causes a problem? Or at least that doing so doesn't fix the problem. Onwards (read it all)...
If I delete all four aliases, apply, and re-import them, I do not see the error case today even after a filter reload or reboot. All four aliases are correct (618 total IPs).
I added "invalid" to alias_512 and applied, same.
I emptied all four tables, ran a filter reload, and all four remained empty.
I removed "invalid" and ran a filter reload, all tables remained empty.
I had to "killall filterdns" and filter reload, and after that the tables populated correctly.
next:
empty all tables
add "invalid" to alias_512, and apply
all tables remain empty
killall filterdns, and reload filter
all tables are populated correctly
...so, killing filterdns is suddenly required to get the tables to recreate at all. @stephenw10, does a filter reload actively empty the tables when it runs, or does it leave them and attempt to update them?
next:
I started over, imported the aliases with the extra two error lines, just like last night, and was unable to replicate my original observed case (incomplete aliases). Unclear why it is different today. I shut down the VM overnight, which seems irrelevant but did happen.
It seems there is definitely "something wrong" because the alias tables are either sometimes incomplete or empty, but now I'm confused also.
-
Yes I expect it to re-populate the tables based on the loaded ruleset.
It looks like there are at least two bugs still outstanding related to this. But as far as I know neither is a regression for 2.8.1/25.07.1.
@Patch you first saw this in 2.8.1? Is it possible it was happening in 2.7.2 and you just didn't notice?
-
@stephenw10 yes, I first saw this in v2.8.1 and had not tripped it in v2.7.2.
I then installed v2.7.2 in a VM using the current installer, and explicit testing as per https://forum.netgate.com/post/1229337 showed essentially the same behaviour.
The only real testing I have done after the error is triggered is to demonstrate that creating a new trivial alias results in an alias table, but it isn’t populated.
I avoided further testing as I had previously found that repairing the system by further changing the alias definition was difficult. The system behaves as if something has crashed or locked up. My current experience is that data entry errors are handled correctly by pfSense, but the alias table filling error, once triggered, persists. That initially misled me into blaming data entry error handling, hence my very frequent restarts / configuration restores in testing.
-
@stephenw10 said in Unexpected alias behaviour - two ranges:
Yes I expect it to re-populate the tables based on the loaded ruleset.
Yes, but the hair I'm splitting is whether the alias Apply is either 1) not updating the table as expected, or 2) not emptying the tables at the beginning of its run and thus presumably aborting very early in the process. Just thinking about the programming out loud, is all. Because if I manually empty them and they stay empty that implies the prior filter reload maybe didn't get to the point of emptying them.
I guess I didn't explain it well but it seems like:
I added "invalid" to alias_512 and applied, same.
...is possibly not a great test if I didn't "killall filterdns" and filter reload.
Seems like one possibility is filterdns gets stuck and thus the tables aren't updated. Which may be what @Patch is talking about when mentioning lockups.
-
@SteveITS said in Unexpected alias behaviour - two ranges:
Onwards (read it all)...
Clearly I've been 'reading it all', Steve. Otherwise I wouldn't still be here. Does it concern you that I somehow keep picking the most relevant bits out of the noise to maintain my position here?
Your focus on the matter at-hand is showing with that comment (which I've of course taken the bait on and obliged you).
@SteveITS said in Unexpected alias behaviour - two ranges:
I had to "killall filterdns" and filter reload, and after that the tables populated correctly.
I had a feeling...
-
@stephenw10 said in Unexpected alias behaviour - two ranges:
It looks like there are at least two bugs still outstanding related to this.
Redmine links?
-
@SteveITS said in Unexpected alias behaviour - two ranges:
so, killing filterdns is suddenly required to get the tables to recreate at all.
Or you could just, like—not introduce user error and it probably wouldn't be necessary.
-
@tinfoilmatt said in Unexpected alias behaviour - two ranges:
Clearly I've been 'reading it all',
That wasn't directed at you, I just meant to read my whole post, there, since the behaviors changed.
@tinfoilmatt said in Unexpected alias behaviour - two ranges:
I had a feeling...
That wasn't the case last night, they did update on Apply.
@tinfoilmatt said in Unexpected alias behaviour - two ranges:
Or you could just, like—not introduce user error and it probably wouldn't be necessary.
What was the error you allege in today's post? AFAIK if I empty a table and filter reload, pfSense is supposed to populate the table.
-
@SteveITS said in Unexpected alias behaviour - two ranges:
I just meant to read my whole post
I would hope anybody participating here and on the entire forum, nay, the entire Internet, thoroughly reads and considers in earnest any communication directed at them by a fellow human being.
But back to topic at-hand, anything you did today is preempted by the fact that you didn't start with...
[in Unexpected alias behaviour - two ranges:]
I created a VM with 2.8.1.
I used easyrule to allow access on WAN.
I bypassed the GUI setup wizard.
...like you did yesterday. (Some people refer to this methodology colloquially as 'blowing everything out and starting over.') In other words you didn't even consistently recreate your own test.
I could break any system with some formulation of rm or system-specific equivalent. What does that tell anyone?
-
@tinfoilmatt said in Unexpected alias behaviour - two ranges:
I would hope anybody participating here and on the entire forum, nay, the entire Internet, thoroughly reads and considers in earnest any communication directed at them by a fellow human being.

-
@stephenw10 Hey, at least you're getting paid to placate this behavior. I'm only here to have fun!

-
@SteveITS
After more testing I suspect the root cause of the bug investigated in this thread is:
- Aliases containing one or more FQDNs are limited to a little over 512 entries (total across all such aliases), not the 5000 per alias limit suggested in the manual.
- If you go over that limit, further alias table updates are blocked for all aliases.
As for the bug where duplicates are removed but not restored, discovered again in this thread and referenced in your thread, I can't see that being resolved any time soon. Learning how best to live with it is more sensible imo.
Easy bug trigger
Tested in a clean pfSense v2.8.1 install with WAN GUI access enabled. That configuration was saved as the baseline, to ensure easy repeatability and reduce the validity of the shooting the messenger crap.
The easiest way of triggering the bug is entering the following 1024 consecutive element alias -> 473 elements are shown in the corresponding alias table.
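For anyone wanting to repeat this, here is a minimal Python sketch that generates 1024 consecutive addresses, one per line, for pasting into a bulk import. The start address 123.123.120.0 is an assumption taken from the range reported later in the thread, and the function name is made up for illustration.

```python
import ipaddress


def consecutive_ips(start: str, count: int) -> list[str]:
    """Return `count` consecutive IPv4 addresses beginning at `start`."""
    first = ipaddress.IPv4Address(start)
    return [str(first + offset) for offset in range(count)]


# Assumed start address (taken from the range reported later in the thread).
alias_entries = consecutive_ips("123.123.120.0", 1024)

if __name__ == "__main__":
    # One address per line, ready to paste into an alias bulk import.
    print("\n".join(alias_entries))
```

With that start address the 473rd entry in the list is 123.123.121.216, which lines up with the last record reported in the table.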

Random IP addresses & lower bound
To put a bound on the lower limit and confirm that sequential IP addresses are irrelevant:
Reload the baseline configuration and enter a 513 element alias such as the following -> 514 elements are shown in the corresponding alias table

Reload the baseline configuration and enter a 1025 element alias (using the method illustrated above) -> 473 elements are shown in the corresponding alias table, which is exactly the same as the initial sequential IP test case. Waiting longer, filter reload and "killall filterdns" all make no difference.
Limit per alias or total record count
To illustrate that the limit depends on the load implied by other aliases:
Restore the baseline configuration
Enter an alias like IP_set1 with 50 IP + FQDN x1 as shown below -> 52 records in the alias table

Enter an alias like IP_set2 with 50 IP + FQDN x1 using the same method (mine started 202.) -> 52 records shown in the alias table
Now again try to enter IP_set3 with 512 IP + FQDN x1 as shown below -> zero records. After several filter reloads, 271 records. Waiting longer, filter reload and "killall filterdns" all make no difference.
This contrasts with the 514 records when no other aliases were entered.

Limit for "mixed mode" alias or also pure FQDN
To investigate whether using explicit IP addresses vs FQDN addresses matters:
Reload the baseline configuration and enter a 1024 FQDN alias (I copied the first 1024 entries from pfblocker) -> 569 records after waiting about 1 hour and lots of filter reloads. After which:
- "killall filterdns" -> No matching processes were found
- create an alias Test_loaded containing a single FQDN -> empty alias table
Which strongly suggests "mixed mode" aliases are no different to pure FQDN aliases, just testing takes longer.

-
Updated above post to investigate whether using explicit IP addresses vs FQDN records matters
Updated title to more accurately describe the root cause evident in hindsight
-
@Patch said in Unexpected alias behaviour - two ranges / size limits with FQDN:
Easy bug trigger
[ . . . ]
The easiest way of triggering the bug is entering the following 1024 consecutive element alias -> 473 elements are shown in the corresponding alias table.
I can confirm this behavior.



-
I am holding here despite the...
Aliases become Tables when loaded into the active firewall ruleset. The contents displayed on this page reflect the current addresses inside tables used by the firewall.
...advisory.
Please let me know if there's something I can/should do next that would further assist the thread.
-
The 473 records are sequential from 123.123.120.0 through 123.123.121.216.
-
Hmm, I don't think this is a numerical limit. It seems more like a function of timing to me. But either way I would bet the root issue is in filterdns.
-
@bbcan177 You have been dealing with far larger sets of IP addresses and FQDNs in pfSense. How have you been working around:
- The limitation in the cumulative number of entries in aliases containing a FQDN, as described above
- The possibly related consequence of using incremental alias updates with duplicate removal, but then using incremental IP removal without restoration of the removed duplicate, as per https://redmine.pfsense.org/issues/13792 and possibly https://redmine.pfsense.org/issues/9296 and https://redmine.pfsense.org/issues/13793
The short term fix
- for the first, is relatively simple: just document the limit, add it to redmine and say it's being worked on. Something I can easily work with.
- for the second, is more difficult, as that bug essentially requires the user to ensure all aliases containing FQDNs are always non-intersecting (never trigger duplicate removal). Which severely limits the value of aliases for me.
The longer term fix may be more difficult, as it may be a program architecture limit rather than a coding bug.
- The limit for "mixed mode" aliases may be resolvable by internally processing the constants (explicitly specified IP addresses) separately from the variable entries (FQDNs), then combining the two "sub aliases". The constant portion only needs to be evaluated at alias editing or program boot up. Optimisation to group longer runs of consecutive addresses into networks would be possible.
- Regression analysis, I suspect, would show that the progressive slowing of alias updates and eventual lock up investigated in this thread is a result of earlier work to try to avoid the limitations of implicit duplicate removal addressed in https://redmine.pfsense.org/issues/9296
- A solution may be to pre-process so that additions to filterdns are only made when the duplicate count for that address goes from 0->1, and similarly deletions from filterdns are only made when the duplicate count for that address goes from 1->0. Other duplicate count transitions (1<->2<->3 etc.) result in no filterdns calls.
- Alternatively, incremental deletes could be avoided completely by doing all updates as a full alias rebuild to a temporary variable, then a full replacement of the old with the new.
None of which sound trivial to me.
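To make the duplicate-count idea a bit more concrete, here is a rough Python sketch. It is only an illustration of the bookkeeping, not how filterdns or pfSense actually works: `RefCountedTable`, `set_source` and the `added` / `removed` lists are hypothetical names, and a "source" is assumed to be either a hostname or a static entry feeding one table. An address is only pushed to the table on a 0->1 transition and only pulled on 1->0; `full_rebuild` shows the alternative of recomputing the whole set and swapping it in.

```python
from collections import Counter


class RefCountedTable:
    """Hypothetical model of one table fed by several sources."""

    def __init__(self):
        self.counts = Counter()   # address -> number of sources holding it
        self.sources = {}         # source name -> its current set of addresses
        self.added = []           # stand-in for real table additions
        self.removed = []         # stand-in for real table deletions

    def set_source(self, name, addresses):
        """Replace one source's addresses; touch the table only on 0->1 and 1->0."""
        old = self.sources.get(name, set())
        for addr in addresses - old:
            self.counts[addr] += 1
            if self.counts[addr] == 1:   # 0 -> 1: first holder, add to table
                self.added.append(addr)
        for addr in old - addresses:
            self.counts[addr] -= 1
            if self.counts[addr] == 0:   # 1 -> 0: last holder gone, remove
                del self.counts[addr]
                self.removed.append(addr)
        self.sources[name] = set(addresses)

    def table(self):
        """Addresses that should currently be in the table."""
        return set(self.counts)


def full_rebuild(sources):
    """Alternative: build the complete table from scratch, then swap it in."""
    fresh = set()
    for addresses in sources.values():
        fresh |= addresses
    return fresh
```

If two sources both hold 192.0.2.2 and one of them later drops it, the count goes 2->1 and nothing is removed, so the address stays in the table. A full rebuild from the same sources gives the same set, just without any incremental deletes.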
-