Unexpected alias behaviour - two ranges
-
@stephenw10 said in Unexpected alias behaviour - two ranges:
That's what I saw when I hit it temporarily.
Were/are these the relevant steps?
- Create an Alias (host type).
- Add an FQDN and two /24 networks, one of which includes [one of] the FQDN's IPv4 addresses.
- Save and apply.
- Look at the filter reload screen.
- When complete, look at the created table for the Alias.
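To check beforehand that an FQDN's addresses really do fall inside one of the /24 networks (the overlap in the second step), a short Python sketch using the standard `ipaddress` module could look like this. The resolved addresses are passed in by hand rather than looked up, and the sample addresses and function name are hypothetical placeholders from documentation ranges.

```python
import ipaddress


def fqdn_overlaps(resolved_ips, networks):
    """Return (address, network) pairs where a resolved FQDN address
    falls inside one of the alias's network entries."""
    nets = [ipaddress.ip_network(n, strict=True) for n in networks]
    hits = []
    for raw in resolved_ips:
        addr = ipaddress.ip_address(raw)
        for net in nets:
            # An IPv6 address tested against an IPv4 network is simply False.
            if addr in net:
                hits.append((str(addr), str(net)))
    return hits


if __name__ == "__main__":
    # Hypothetical resolver results and alias networks.
    resolved = ["192.0.2.10", "2001:db8::10"]
    nets = ["192.0.2.0/24", "198.51.100.0/24"]
    print(fqdn_overlaps(resolved, nets))
```

An empty result means the test alias does not contain the overlap the steps are trying to create.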
-
@tinfoilmatt There were steps above: https://forum.netgate.com/topic/199152/unexpected-alias-behaviour-two-ranges/26
As I understood it (if I followed), some additional key details were:
- all child_aliases put in the parent_alias needed one FQDN to trigger this
- then restore the new configuration (or reboot?)
The restore of course reboots. As I understand the report, it is a latent problem until the reboot, after which the alias is no longer fully populated.
As a general usage example (not tested here, but used in my other thread): we have an alias made up of aliases containing the IPs of our clients as well as various dynamic DNS IPs. Obviously we don't want to set up the same rules for each, so a nested alias makes sense.
-
@SteveITS This thread is lacking in coherent, reproducible steps which demonstrate anything. Not picking on you as you're not OP. But Stephen most recently reaffirms that, at one point, he was able to do—something. In this post he referred to it as "case 2".
That's what I'm wanting to try to recreate for myself.
-
@SteveITS said in Unexpected alias behaviour - two ranges:
I am curious: does it matter where the FQDN is in your alias? Does it stop updating the alias after the FQDN, depending on whether it is listed first or last?
Starting from https://forum.netgate.com/post/1229337
In practice I entered an FQDN followed by many literal IPv4 addresses. I mostly used a fixed prefix such as 201, 202 or 203 (a different prefix for each IP_set alias) followed by random numbers (0-255). It would have been far easier to add sequential IPv4 addresses, but I was unsure what optimisation pfSense does, so I avoided that. These IP_sets are then combined in the Combined_IP alias (nested / hierarchical).
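For anyone wanting to build a comparable test set, here is a minimal Python sketch of an IP_set in that style: an FQDN first, then N unique IPv4 addresses sharing a fixed prefix, with the remaining numbers random. Reading "prefix" as the first octet, the seeded generator, the placeholder FQDN and the name `make_ip_set` are all assumptions for illustration. The output is one entry per line, which should paste into the alias Import page.

```python
import random


def make_ip_set(prefix, count, fqdn, seed=0):
    """Hypothetical helper: the FQDN first, then `count` unique IPv4
    addresses. Assumption: the fixed prefix is the first octet and the
    other three octets are random (0-255)."""
    if not 0 <= prefix <= 255:
        raise ValueError("prefix must be a single octet")
    rng = random.Random(seed)  # seeded so the same set can be regenerated
    seen = set()
    ips = []
    while len(ips) < count:
        ip = "{}.{}.{}.{}".format(
            prefix, rng.randint(0, 255), rng.randint(0, 255), rng.randint(0, 255)
        )
        if ip not in seen:  # skip accidental duplicates
            seen.add(ip)
            ips.append(ip)
    return [fqdn] + ips


if __name__ == "__main__":
    # Sizes from the report: 50, 512 and 50 addresses.
    for n, (prefix, size) in enumerate([(201, 50), (202, 512), (203, 50)], start=1):
        entries = make_ip_set(prefix, size, "pfsense.org", seed=n)
        print("IP_set{}: {} entries".format(n, len(entries)))
        # print("\n".join(entries))  # uncomment to get text for the Import page
```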
The bug is revealed on a full alias rebuild. In testing I used a configuration restore to ensure repeatability and a clean starting point. Restarting pfSense has triggered it on my active systems.
Using three IP_sets containing 50, 512 and 50 IP addresses, it happens for me every time within 30 seconds of pfSense starting up. Smaller IP_set sizes can fail less cleanly. Tested with clean installs of pfSense 2.8.1 and 2.7.2.
@SteveITS said in Unexpected alias behaviour - two ranges:
...should have 612 615, has 256. Which seems like a suspiciously specific number, tbh
I agree that's a suspicious number, but if I use two IP_sets the number is larger, and with other IP_set sizes the Combined_IP count varies slightly.
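When comparing against Diag > Tables, it helps to work out what the nested alias should contain: the union of the child aliases' literal addresses plus whatever their FQDNs resolve to. A union is used rather than a sum because a pf table holds each address only once. This is a minimal sketch assuming the FQDN results are typed in by hand; the child names, FQDNs and addresses in the example are hypothetical.

```python
import ipaddress


def expected_members(children, fqdn_results):
    """Expected contents of a nested alias.

    children: child alias name -> list of entries (literal IPs or FQDNs).
    fqdn_results: FQDN -> list of addresses it resolved to (supplied by hand).
    """
    members = set()
    for entries in children.values():
        for entry in entries:
            try:
                # Literal address: normalise it so duplicates compare equal.
                members.add(str(ipaddress.ip_address(entry)))
            except ValueError:
                # Anything else is treated as an FQDN and expanded.
                members.update(fqdn_results.get(entry, []))
    return members


if __name__ == "__main__":
    def seq(start, n):
        first = ipaddress.ip_address(start)
        return [str(first + i) for i in range(n)]

    # Hypothetical stand-ins for IP_set1..3 and their FQDNs.
    children = {
        "IP_set1": ["a.example"] + seq("10.1.0.1", 50),
        "IP_set2": ["b.example"] + seq("10.2.0.1", 512),
        "IP_set3": ["c.example"] + seq("10.3.0.1", 50),
    }
    resolved = {"a.example": ["192.0.2.1"], "b.example": ["192.0.2.2"], "c.example": ["192.0.2.3"]}
    print(len(expected_members(children, resolved)))
```

With one address per FQDN, these example sets come to 612 + 3 = 615 members. If an FQDN resolves to an address that is already listed, the total drops accordingly.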
@stephenw10 said in Unexpected alias behaviour - two ranges:
A partially populated table is probably a better description here.
That's a reasonable term.
Looking at the Resolver logs, the missing alias table entries appear to correspond to:
said in Unexpected alias behaviour - two ranges:
The IP_set3 table is empty; however, the log shows the actual 50 IP addresses being added, with some duplicate "Adding Action: pf table: IP_set3 host:" entries, but I think all 50 appear.
Similarly, "Adding Action: pf table: IP_set2 host:" shows some duplicates. Not all actual IP addresses appear in the 2000 log entries. I was not able to readily tell whether all 512 appear at least once in "Adding Action: pf table: IP_set2 host:".
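Instead of scanning 2000 log lines by eye, the "does every address appear at least once" question can be answered mechanically. Below is a minimal Python sketch. It assumes the Resolver log has been copied out as text and that the relevant lines look like the `filterdns ... Adding Action: pf table: <table> host: <address>` entries quoted in this thread. The regular expression is inferred from those quoted lines, not from any filterdns documentation.

```python
import re

# Pattern inferred from the log lines quoted in this thread.
ADD_RE = re.compile(r"Adding Action: pf table: (\S+) host: (\S+)")


def hosts_logged(log_text, table):
    """Set of hosts the log shows being added to `table`."""
    return {host for tbl, host in ADD_RE.findall(log_text) if tbl == table}


def never_logged(log_text, table, expected):
    """Expected addresses with no 'Adding Action' line for `table`,
    sorted as text."""
    return sorted(set(expected) - hosts_logged(log_text, table))
```

For example, `never_logged(log_text, "IP_set2", ip_set2_entries)` would list any of the 512 addresses that were never logged, provided the copied log window is long enough to cover the whole reload.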
-
@Patch said in Unexpected alias behaviour - two ranges:
These IP_sets are then combined in Combined_IP alias (nested / hierarchical)
-
If anybody can distill the two posts above this one, I'm happy to test.
-
Seems like I've replicated it.
I created a VM with 2.8.1.
I used easyrule to allow access on WAN.
I bypassed the GUI setup wizard.
I created 4 aliases. The IPs were generated in Excel (enter the first, drag down 255 cells), then copy/pasted into the Import page in pfSense to create each alias.
alias_50_1: Host(s) 10.1.1.1, 10.1.1.2, 10.1.1.3, 10.1.1.4, 10.1.1.5, 10.1.1.6, 10.1.1.7, 10.1.1.8, 10.1.1.9, 10.1.1.10… (through .50)
alias_50_2: Host(s) 10.2.2.1, 10.2.2.2, 10.2.2.3, 10.2.2.4, 10.2.2.5, 10.2.2.6, 10.2.2.7, 10.2.2.8, 10.2.2.9, 10.2.2.10… (through .50)
alias_512: Host(s) 10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.4, 10.0.0.5, 10.0.0.6, 10.0.0.7, 10.0.0.8, 10.0.0.9, 10.0.0.10… (through .255 and 10.10.0.1-.255)
Each of the first 3 ended with a hostname, e.g. pfsense.org.
alias_all: Host(s) alias_512, alias_50_1, alias_50_2
If I click Apply and look at Diag > Tables, alias_512 shows:
Date of last update of table is unknown.
172 records
...and it ends at 10.0.0.170, but it does include the hostname's IP for alias_512:
forum.netgate.com. 3 IN A 208.123.73.77
Matching that, alias_all has 278 records.
The two "50" aliases do include their FQDN IPs (the third hostname was netgate.com), so they seem correct. alias_512 is missing most of its list, which does show correctly if I edit the alias.
I did not need to reboot.
I'll make a follow-up post momentarily. I can send or upload the config if desired, but it seems easy enough to reproduce.
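Both the Excel step and the "which entries are missing" comparison can be scripted. Here is a minimal Python sketch, assuming the table contents were copied from Diag > Tables as one address per line. `sequential`, `table_diff` and the sample data are hypothetical. The sketch also reports whether the literal addresses that reached the table are exactly the first N of the alias definition. That pattern would match the "ends at 10.0.0.170" observation of a truncated load rather than random losses.

```python
import ipaddress


def sequential(start, count):
    """`count` consecutive IPv4 addresses beginning at `start`."""
    first = ipaddress.ip_address(start)
    return [str(first + i) for i in range(count)]


def table_diff(defined, table_lines):
    """Compare an alias definition (ordered literal IPs) with table contents.

    Returns (missing, is_prefix): the defined IPs absent from the table, and
    whether the ones present are exactly the first len(present) definitions.
    Extra table entries (e.g. FQDN results) are ignored.
    """
    table = {line.strip() for line in table_lines if line.strip()}
    present = [ip for ip in defined if ip in table]
    missing = [ip for ip in defined if ip not in table]
    is_prefix = present == defined[:len(present)]
    return missing, is_prefix


if __name__ == "__main__":
    # alias_512 as described: 10.0.0.1-.255 plus 10.10.0.1-.255.
    alias_512 = sequential("10.0.0.1", 255) + sequential("10.10.0.1", 255)
    # Hypothetical table copy that stops at 10.0.0.170, plus the FQDN's IP.
    observed = sequential("10.0.0.1", 170) + ["208.123.73.77"]
    missing, is_prefix = table_diff(alias_512, observed)
    print(len(alias_512), len(missing), is_prefix)
```

Note that the described alias_512 actually holds 510 literal addresses (255 + 255) plus its FQDN.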
-
If you run filter_reload does it fully populate?
Mixed mode aliases have been a problem in the past. I've long recommended not mixing IPs and FQDNs but I had thought those issues were resolved. Looks like we have a regression.
-
The first time I tried this I had an error in my list for alias_512. I accidentally scrolled two extra rows, leaving this in the import copy/paste:
10.10.0.256
10.10.0.257
Obviously an error. filterdns (DNS Resolver log) threw an error trying to resolve the "hostnames", since they are not valid IPs. The result was still the problem above; however, the numbers were slightly different, and not simply off by two.
Deleting those and clicking Apply again reproduced the issue, hence my above post.
I'm not sure what that means, but it seems odd that removing the two invalid IPs resulted in 1) several more (more than two) additional IPs making it into the alias_512 table, and 2) the FQDN forum.netgate.com at the bottom of that list being resolved, with its IPs also in that table, even though 10.0.0.171-.255 and 10.10.0.1-.255 are not. Possibly an out-of-memory error, with the "hostnames" taking a bit more RAM than the last few IPs? I did try adding them back in again, and the tables did not shrink as I expected.
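Typos like 10.10.0.256 slip through because anything that does not parse as an address ends up being handed to the resolver as a hostname. A minimal pre-import check in Python, using the standard `ipaddress` module, could flag such lines before they are pasted in. The dotted-quad heuristic and the name `classify` are assumptions for illustration, not pfSense's own validation logic.

```python
import ipaddress
import re

# Numeric dotted-quad shape, optionally with a prefix length.
DOTTED_QUAD = re.compile(r"^\d{1,3}(\.\d{1,3}){3}(/\d{1,2})?$")


def classify(entry):
    """Label one alias import line as 'address', 'network', 'suspect' or 'hostname'."""
    entry = entry.strip()
    try:
        ipaddress.ip_address(entry)
        return "address"
    except ValueError:
        pass
    try:
        ipaddress.ip_network(entry, strict=False)
        return "network"
    except ValueError:
        pass
    # Looks numeric but failed to parse: almost certainly a typo, not a hostname.
    if DOTTED_QUAD.match(entry):
        return "suspect"
    return "hostname"
```

Running every import line through `classify` and reviewing anything marked "suspect" would have caught the two extra rows here.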
Note also that as @Patch reported, the log shows the missing IPs being added:
Nov 7 01:34:37 filterdns 55828 Adding Action: pf table: alias_all host: 10.10.0.253
Nov 7 01:34:37 filterdns 55828 Adding Action: pf table: alias_all host: 10.10.0.254
Nov 7 01:34:37 filterdns 55828 Adding Action: pf table: alias_all host: 10.10.0.255
-
@stephenw10 said in Unexpected alias behaviour - two ranges:
If you run filter_reload does it fully populate?
No.
Initializing
Creating aliases
Creating gateway group item...
Generating Limiter rules
Generating NAT rules
Creating 1:1 rules...
Creating outbound NAT rules
Creating automatic outbound rules
Setting up TFTP helper
Generating filter rules
Creating default rules
Pre-caching Default allow LAN to any rule...
Creating filter rule Default allow LAN to any rule ...
Creating filter rules Default allow LAN to any rule ...
Setting up pass/block rules
Setting up pass/block rules Default allow LAN to any rule
Creating rule Default allow LAN to any rule
Pre-caching Default allow LAN IPv6 to any rule...
Creating filter rule Default allow LAN IPv6 to any rule ...
Creating filter rules Default allow LAN IPv6 to any rule ...
Setting up pass/block rules
Setting up pass/block rules Default allow LAN IPv6 to any rule
Creating rule Default allow LAN IPv6 to any rule
Pre-caching Passed via EasyRule...
Creating filter rule Passed via EasyRule ...
Creating filter rules Passed via EasyRule ...
Setting up pass/block rules
Setting up pass/block rules Passed via EasyRule
Creating rule Passed via EasyRule
Creating IPsec rules...
Generating ALTQ queues
Loading filter rules
Setting up logging information
Setting up SCRUB information
Processing down interface states
Running plugins
Done
-
@stephenw10 said in Unexpected alias behaviour - two ranges:
not mixing IPs and FQDNs
If I remove forum.netgate.com from alias_512, then I get 618 records, so I think it fully populates.
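Record counts can also be taken from the shell instead of Diag > Tables. pf's `pfctl -t <table> -T show` lists a table's entries one per line, so `pfctl -t alias_512 -T show | wc -l` gives the count directly. Below is a minimal Python wrapper for repeated checks. It assumes a Python 3 interpreter is available on the firewall and that it runs as root. The parsing is split out so it can be tested off the box, and the function names are hypothetical.

```python
import subprocess


def parse_table(output):
    """Entries from `pfctl -T show` output: one address per (indented) line."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def table_entries(table):
    """Run pfctl on the firewall and return the named table's entries."""
    result = subprocess.run(
        ["pfctl", "-t", table, "-T", "show"],
        capture_output=True, text=True, check=True,
    )
    return parse_table(result.stdout)
```

For example, `len(table_entries("alias_512"))` and `len(table_entries("alias_all"))` run before and after an Apply or filter reload would show whether the counts changed at all.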
-
OK, weirder: I set the tunable from https://docs.netgate.com/pfsense/en/latest/troubleshooting/filterdns-thread-errors.html to 4096, just to see. I applied it and did a Filter Reload.
alias_512 now contains only 12 records: a few scattered through the list, plus the last 7 and the FQDN's addresses:
10.0.0.138
10.10.0.1
10.10.0.58
10.10.0.249
10.10.0.250
10.10.0.251
10.10.0.252
10.10.0.253
10.10.0.254
10.10.0.255
208.123.73.77
2610:160:11:11::6
However, alias_all still contains 618 entries, which makes me think it was either created successfully or not updated at all.
-
- In my testing, when editing an alias the old table entries are not deleted, but I agree the table is sometimes not fully updated. This is probably why the bug results in a latent failure, and the fault is revealed on pfSense restart when all aliases are fully built.
- I agree it can be reproduced without restarting; it's just that I found it harder to make sense of the data.
- I suspect deleting the table data and then running a filter reload would be equivalent.
- Configuration save and reload is a belt-and-braces approach I used because I didn't know which part of pfSense was locking up initially.
- I get the same behaviour in pfSense 2.7.2 as well. I checked because I wondered whether this was a regression or a new bug, but could not confirm either.
@SteveITS said in Unexpected alias behaviour - two ranges:
the log shows the missing IPs being added
I think the included IPs are added once, but the missing IPs are added twice.
To see this, I:
- set the Resolver log to show 2000 entries
- copy the log
- paste text into a spreadsheet (such as LibreOffice calc) using space as a delimiter
- sort by the description column (D)
- scroll to Alias of interest & look for duplicates
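The spreadsheet steps above amount to counting how often each (table, host) pair is logged. A minimal Python equivalent is below. It assumes the copied log text uses the `Adding Action: pf table: <table> host: <address>` form quoted earlier in the thread, and that regex is inferred from those lines. Comparing the "repeated" list against the table contents would test the "missing IPs are added twice" idea directly.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this thread.
ADD_RE = re.compile(r"Adding Action: pf table: (\S+) host: (\S+)")


def add_counts(log_text, table):
    """How many times each host is logged as added to `table`."""
    return Counter(host for tbl, host in ADD_RE.findall(log_text) if tbl == table)


def split_by_count(log_text, table):
    """Hosts logged exactly once versus hosts logged more than once."""
    counts = add_counts(log_text, table)
    once = sorted(h for h, n in counts.items() if n == 1)
    repeated = sorted(h for h, n in counts.items() if n > 1)
    return once, repeated
```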
-
https://redmine.pfsense.org/issues/9296 sounds similar, though the posters there say killing filterdns and reloading the filter fixes it. I have not tested that but could later today, and/or open a new Redmine issue if desired. Note the last post there says it was a problem in 2.7.2.
My issue I linked above actually sounds more like https://redmine.pfsense.org/issues/14734 (when the FQDN changes IPs the separately listed/duplicate IP is incorrectly removed from the table).
-
@SteveITS said in Unexpected alias behaviour - two ranges:
The first time I tried this I had an error in my list for alias_512. I accidentally scrolled two extra rows, leaving this in the import copy/paste:
10.10.0.256
10.10.0.257
This 'bunkifies' your entire test: all three aliases ("alias_50_1", "alias_50_2", and "alias_512") and the one nested alias ("alias_all").
Can you re-run this without any user error and see if you observe any 'consistency' of results? My guess is there will be enough difference (meaning, lack of consistency) in 'results' so as to mean nothing.
-
@stephenw10 said in Unexpected alias behaviour - two ranges:
Mixed mode aliases
I like that. Has a nicer ring than 'kludge.'
-
@tinfoilmatt said in Unexpected alias behaviour - two ranges:
Can you re-run this without any user error
I guess that's fair, though it would imply deleting the invalid entries still causes a problem? Or at least that doing so doesn't fix the problem. Onwards (read it all)...
If I delete all four aliases, apply, and re-import them, I do not see the error case today even after a filter reload or reboot. All four aliases are correct (618 total IPs).
I added "invalid" to alias_512 and applied, same.
I emptied all four tables, ran a filter reload, and all four remained empty.
I removed "invalid" and ran a filter reload, all tables remained empty.
I had to "killall filterdns" and filter reload, and after that the tables populated correctly.
next:
empty all tables
add "invalid" to alias_512, and apply
all tables remain empty
killall filterdns, and reload filter
all tables are populated correctly
So killing filterdns is suddenly required to get the tables to recreate at all. @stephenw10, does a filter reload actively empty the tables when it runs, or does it leave them and attempt to update them?
next:
I started over, imported the aliases with the two extra error lines, just like last night, and was unable to replicate my original observed case (incomplete aliases). It's unclear why it is different today. I shut down the VM overnight, which seems irrelevant but did happen.
It seems there is definitely "something wrong", because the alias tables are sometimes incomplete or empty, but now I'm confused too.
-
Yes I expect it to re-populate the tables based on the loaded ruleset.
It looks like there are at least two bugs still outstanding related to this. But as far as I know neither is a regression for 2.8.1/25.07.1.
@Patch you first saw this in 2.8.1? Is it possible it was happening in 2.7.2 and you just didn't notice?
-
@stephenw10 Yes, I first saw this in 2.8.1 and had not tripped it in 2.7.2.
I then installed 2.7.2 in a VM using the current installer, and explicit testing as per https://forum.netgate.com/post/1229337 showed essentially the same behaviour.
The only real testing I have done after the error is triggered is to demonstrate that creating a new trivial alias results in an alias table, but it isn't populated.
I avoided further testing because I had previously found that repairing the system by further changing the alias definition was difficult. The system behaves as if something has crashed or locked up. My current experience is that data entry errors are handled correctly by pfSense, but once the alias table filling error is triggered it persists. This initially misled me into blaming data entry error handling, hence my very frequent restarts / configuration restores in testing.
-
@stephenw10 said in Unexpected alias behaviour - two ranges:
Yes I expect it to re-populate the tables based on the loaded ruleset.
Yes, but the hair I'm splitting is whether the alias Apply is either 1) not updating the table as expected, or 2) not emptying the tables at the beginning of its run and thus presumably aborting very early in the process. Just thinking about the programming out loud, is all. Because if I manually empty them and they stay empty that implies the prior filter reload maybe didn't get to the point of emptying them.
I guess I didn't explain it well, but it seems like:
I added "invalid" to alias_512 and applied, same.
...is possibly not a great test if I didn't "killall filterdns" and filter reload.
One possibility seems to be that filterdns gets stuck and thus the tables aren't updated, which may be what @Patch is talking about when mentioning lockups.