Unexpected alias behaviour - two ranges
-
@Patch said in Unexpected alias behaviour - two ranges:
Using a single space prevents creation of the blank host. But doing so does not fix the problem with creating the desired Alias.
Yes it does. The rest is a super longwinded way of saying, 'my bad.'
-
@Patch said in Unexpected alias behaviour - two ranges:
5 minute alias reload appears not to have the resources / time to ever complete some aliases
Are you talking about FQDNs or the /24 subnets here? I thought the 5 minute timer was for resolving FQDNs via DNS. We have a scenario where some either stop resolving or are maybe never added to the table...hard to tell since 99% of the time they're not used. However per your description, some may overlap (laptop goes to an allowed public IP) so I'm wondering if one fails it is removing "both" IPs/entries?
-
Hmm OK I can replicate case 2 here. Digging....
Do you see the full alias set shown in the Resolver logs when you add it?
For me I see that and it does load all 512 entries after a delay.
-
@stephenw10
I was testing last night and I think I have found the case which actually started my searching.
- create an alias of type network containing two /24 networks and a couple of /32s. I have been using an FQDN here, which I think triggers alias table creation.
- create an alias of type host containing two expanded /24 networks and a couple of single hosts
- create an alias containing the above two aliases.
- I also created a firewall rule using the last alias but don't think that's essential
- for test repeatability I have been clearing the alias tables, saving the configuration, then restoring the configuration
For me alias table generation locks up completely at about 300 entries. I think it also blocks other alias calculations. Tested in v2.8.1 and v2.7.2
Yet to test whether using 512 random rather than sequential IPv4 addresses prevents the lock-up.
PS
I have not looked for or at the Resolver logs. I will look when I'm able to test further.
-
@Patch said in Unexpected alias behaviour - two ranges:
For me alias table generation locks up completely at about 300 entries.
That's until you run filter-reload?
And it definitely still does it without any FQDNs present? Because otherwise it looks like it hangs filterdns but that can't be the case with FQDNs.
-
@stephenw10 said in Unexpected alias behaviour - two ranges:
That's until you run filter-reload?
No.
Makes no difference to me. Similarly, leaving it running for 24 hours makes no difference.
After some experimentation I can lock up the system immediately if I create the following aliases (I have been using random IPv4 addresses, but only minor differences occur if sequential IP addresses are used. Creating the aliases via the bulk import option also makes no difference.)
- IP_set1 : host type, 50 IPv4 hosts (/32) and at least 1 FQDN
- IP_set2 : host type, 512 IPv4 hosts (/32) and at least 1 FQDN
- IP_set3 : host type, 50 IPv4 hosts (/32) and at least 1 FQDN
- Combined_IP : Host type consisting of the above 3 aliases (IP_set1 IP_set2 IP_set3)
Then
- Create a firewall rule which uses the alias Combined_IP
- Diagnostics -> Tables -> select each of the above aliases and "Empty table"
- Save the configuration
To test, restore the above configuration. My results:
- Diagnostics -> Tables -> records: Combined_IP = 256, IP_set1 = 50, IP_set2 = 206, IP_set3 = 0
- Waiting longer makes no difference. Filter reload makes no difference.
- Create a new alias with hosts forum.netgate.com & redmine.pfsense.org -> only an empty table is generated
Testing with pfsense v2.7.2 gives similar results:
- Combined_IP = 256, IP_set1 = 50, IP_set2 = 156, IP_set3 = 50
It appears pfsense alias capacity is way less than 5000 entries if an alias contains other aliases.
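As a quick arithmetic check on the numbers above, here is a minimal Python sketch. It only re-adds the table sizes reported in this post; the names `EXPECTED_IPS`, `OBSERVED` and `summarise_run` are made up for illustration, the "first test" label is used because the post does not state which version produced the first set of results, and addresses resolved from the FQDNs are not counted.

```python
# Number of literal IPv4 hosts configured in each alias, as described in the post.
EXPECTED_IPS = {"IP_set1": 50, "IP_set2": 512, "IP_set3": 50}

# Table sizes reported under Diagnostics -> Tables after restoring the configuration.
OBSERVED = {
    "first test": {"IP_set1": 50, "IP_set2": 206, "IP_set3": 0, "Combined_IP": 256},
    "v2.7.2": {"IP_set1": 50, "IP_set2": 156, "IP_set3": 50, "Combined_IP": 256},
}


def summarise_run(run):
    """Compare the loaded per-set entries with what was configured."""
    loaded = sum(run[name] for name in EXPECTED_IPS)
    expected = sum(EXPECTED_IPS.values())
    return {
        "loaded": loaded,
        "expected": expected,
        "missing": expected - loaded,
        "combined": run["Combined_IP"],
    }


for label, run in OBSERVED.items():
    print(label, summarise_run(run))
```

In both runs the three sub-tables add up to 256 entries, the same as Combined_IP, against 612 configured addresses. That may only be a coincidence of these two tests, but it could be a useful number to watch in further testing.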
Not sure if this helps localise the issue. The similar Combined_IP size across software versions is interesting, but it is higher if only two aliases are combined. Processor load remains trivial.
@SteveITS said in Unexpected alias behaviour - two ranges:
some may overlap (laptop goes to an allowed public IP) so I'm wondering if one fails it is removing "both" IPs/entries?
I'm yet to test this as I have first been focussing on why widespread alias problems have been occurring. It is on my list of things to do, as I have a whitelist alias (containing my home IP and my laptop's current IP). Losing home access when the laptop leaves home is not desirable but happened recently. I was thinking of using a host override to try and simulate this. But the fault could have been caused by something unrelated.
Edit
Added the requirement for each IP_set to include at least 1 FQDN.
-
@Patch If you run "killall filterdns" and Status>Filter Reload do the tables populate?
-
@SteveITS is killall filterdns run from the command line or GUI menu option?
You do realise the IP_set alias consists almost exclusively of actual IPv4 addresses. I used a spreadsheet random number generator to construct addresses in the format
222.<random 1-255>.<random 1-255>.<random 1-255>
The leading number changed for different aliases
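For anyone wanting to build similar test sets without a spreadsheet, a rough Python equivalent could look like the sketch below. The function name `random_hosts` and the `seed` parameter are made up for illustration; only the address format comes from the description above.

```python
import random


def random_hosts(first_octet, count, seed=None):
    """Return `count` unique addresses of the form <first_octet>.<1-255>.<1-255>.<1-255>."""
    rng = random.Random(seed)  # a fixed seed makes the test set repeatable
    hosts = set()
    while len(hosts) < count:
        octets = [rng.randint(1, 255) for _ in range(3)]
        hosts.add(f"{first_octet}.{octets[0]}.{octets[1]}.{octets[2]}")
    return sorted(hosts)


# 512 hosts for one alias, joined with single spaces for pasting into a host entry.
print(" ".join(random_hosts(222, 512, seed=1)))
```

The single-space join is deliberate, since a double space between items was reported earlier in the thread to create a blank host entry.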
However, I don't think the actual IP addresses make any difference. The issue is that the failure varies with the number of hosts and the number of aliases. The aliases now contain fewer than 2 FQDNs in total.
But will try later today
-
@Patch The command should run in both places. It just ends the processes. I have run it in the GUI.
Yeah I'm aware of the difference, I'm just trying to connect dots. Yours may be a totally different issue than mine, but it started to sound similar.
-
Yes you should be able to run that in either place. Though I would run it on the real command line if possible in case it does something unexpected.
-
@SteveITS said in Unexpected alias behaviour - two ranges:
If you run "killall filterdns" and Status>Filter Reload do the tables populate?
No but I guess this is not the expected response

btw @stephenw10 what happens when you try to replicate the behaviour?
While I assume it makes no difference, I'm using a Proxmox VM with 2 GB RAM (GUI shows 18% memory usage), a Host type processor (i5-1235U with 2 cores), Hard disk: 8 GB SSD, BIOS: OVMF (UEFI), Machine: q35.
-
@SteveITS @stephenw10
Oops.
My test description was wrong.
Each IP_set alias needs at least one FQDN for the fault to be shown.
- Adding the FQDN results in the table for each IP_set alias being created / viewable
- Removing all FQDN results in the Combined_IP being rapidly calculated.
Above post edited to include this requirement https://forum.netgate.com/post/1229337
-
@Patch filterdns processes are left running to monitor for updates in hostnames for Aliases/IPsec/etc, one thread per hostname. So, maybe unrelated to my observed problem.
But I’d expect some if you had FQDNs to resolve…?
-
@stephenw10 said in Unexpected alias behaviour - two ranges:
Do you see the full alias set shown in the Resolver logs when you add it?
More than showing in the alias tables
-
I can't be sure all entries are shown as display is limited to 2000 entries
-
The IP_set3 table is empty. However, the log shows the 50 actual IP addresses being added, with some duplicated "Adding Action: pf table: IP_set3 host:" lines, and I think all 50 appear.
-
Similarly "Adding Action: pf table: IP_set2 host: " shows some duplicates. Not all actual IP addresses appear in the 2000 log entries. I was not able to readily tell if all 512 appear at least once in "Adding Action: pf table: IP_set2 host:".
As I have not looked at these logs in the past, I'm not sure what is normal
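Counting these by eye is error-prone, so something like the following Python sketch could tally unique and duplicated additions per table from a saved copy of the log. It assumes each relevant line contains the quoted prefix followed directly by the table name and then the address, which is a guess based only on the text quoted above; the names `count_additions` and `summarise_additions` are made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape: "... Adding Action: pf table: <table> host: <address>"
PATTERN = re.compile(r"Adding Action: pf table: (\S+) host: (\S+)")


def count_additions(log_lines):
    """Map each table name to a Counter of how often each host was added."""
    per_table = {}
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            table, host = match.groups()
            per_table.setdefault(table, Counter())[host] += 1
    return per_table


def summarise_additions(per_table):
    """Report unique hosts and repeated additions for each table."""
    return {
        table: {
            "unique": len(counts),
            "duplicates": sum(n - 1 for n in counts.values()),
        }
        for table, counts in per_table.items()
    }


# Made-up sample lines using documentation addresses.
sample = [
    "filterdns: Adding Action: pf table: IP_set3 host: 192.0.2.10",
    "filterdns: Adding Action: pf table: IP_set3 host: 192.0.2.10",
    "filterdns: Adding Action: pf table: IP_set2 host: 198.51.100.7",
    "some unrelated log line",
]
print(summarise_additions(count_additions(sample)))
```

Comparing the "unique" figure with the number of addresses configured in the alias would show whether every host appears at least once, provided the full log is available rather than only the 2000 entries the GUI displays.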
-
-
I think I agree at this point in some of the most incoherent SQA-masquerading-as-troubleshooting I've ever witnessed that—
It's true. The ability to add IP addresses and/or IP ranges to "Host" type aliases should be removed completely (and vice versa) via validation. That this makes no sense whatsoever on its face notwithstanding, it clearly has more than mere potential to lead to all of the above confusion.
-
Mmm, you could be right. But that is going to hurt some users. And save some others. Potentially.
-
For those who have missed what is actually trying to be addressed in this thread
@Patch said in Unexpected alias behaviour - two ranges:
I discovered this behaviour when setting up a port forward for a PBX. Unfortunately the behaviour was not immediately obvious.
@Patch said in Unexpected alias behaviour - two ranges:
fault detection I suspect was using failure to include specified entries in an alias -> hybrid NAT rule failed -> after firewall restart failure to register of 1 of 4 VoIP suppliers
Important features of the bug
- The fault results in failure of pfsense packet filtering, not just a display error in debugging tools.
- The error is only revealed when pfsense restarts, not after editing and applying an alias change. So not a nice bug to have in a live system.
- How it presents in my live system is too complex for anyone else to reproduce or for Netgate to fix. As a result, substitute test end points and a simplified bug reproduction have been searched for (a process which risks masking the bug's root cause or misattributing blame).
About the testing
- Lock-up of alias table generation has been used as a substitute marker for packet filter failure of rules which use these aliases.
- Increasing the entries in each set or increasing the number of sets combined changes the fault behaviour. At least 1 FQDN is required in each IP_set to trigger the error.
- I have not observed an obvious effect on the bug from having many FQDNs in a set, but have not directly tested this. No clear difference between ISC DHCP and Kea DHCP. Doubling the VM RAM does not make any difference. Entering the alias via import, manually 1 entry at a time, many hosts in 1 entry, or network expansion all make minimal difference. A double space between items entered in a host type alias is expanded to a blank entry (which can be manually deleted) but otherwise makes no difference I could detect.
- Diagnostics -> Tables is useful when the system is working well. It's less clear how useful it is during fault conditions or as a marker for the bug being investigated in this thread. Double entries in the DNS resolver logs may correspond to entries missing from these tables. After the primary alias tables stop updating, other alias table entries are also blocked.
- If the alias tables are just a diagnostic aid that is not used in actual filter creation, and so at times not representative, then it would be useful to support a more direct display of alias content, perhaps through keactrl or by directly displaying the database content used by Kea.
To state the obvious
- I don't like having a production system which stops working for reasons I don't understand and so cannot reliably avoid. I can configure my systems to keep hierarchical aliases small (combine fewer than 4 sets with <50 entries) and revert to a higher RAM VM allocation, so I can avoid triggering this bug in my live systems. Netgate and other users may be less happy to discover it themselves in the future, but I can't speak for them, and my debugging time to support them is finite.
- The bug can be triggered by sequential or random sets of IP addresses. So blocking easy creation of sequential IP addresses is irrelevant to this bug.
- Summarising many hours of testing results in information-dense posts. While these posts are not easy to read, doing the underlying testing is more painful. Useful testing results in new understanding of system behaviour, reflected in the thread history.
@stephenw10 said in Unexpected alias behaviour - two ranges:
that is going to hurt some users. And save some others. Potentially.
We are off topic but blocking entering IP ranges in an alias is a bad idea.
- It is sensible to preserve range definition where that optimises resultant filter performance and configuration clarity. As such when a host line is entered which contains a range best left as a range, pfsense could:
  - Change the alias type to Network or
  - Leave the alias type as Host but also retain that line's subnet prefix length (it appears that when a host type alias is displayed, pfsense initially displays all hosts with a subnet prefix length and then hides it).
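To illustrate the size difference between keeping ranges and expanding them, here is a small sketch using Python's standard `ipaddress` module. The two /24 networks are documentation ranges chosen for the example, not addresses from this thread, and this says nothing about how pfsense itself stores aliases.

```python
import ipaddress

# Two example /24 networks (documentation ranges, chosen only for illustration).
networks = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Expanding each network into individual /32 host entries.
expanded = [ipaddress.ip_network(f"{addr}/32") for net in networks for addr in net]
print(len(expanded))  # 512

# Collapsing the host entries recovers the two original ranges.
collapsed = list(ipaddress.collapse_addresses(expanded))
print([str(net) for net in collapsed])  # ['192.0.2.0/24', '198.51.100.0/24']
```

Two lines in a network alias carry the same information as 512 lines in a host alias, which is the argument for preserving the prefix length rather than discarding it.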
-