Upgrade 2.4.0: firewall rule with alias and FQDN not working anymore
-
Can you test a FQDN you never used before?
Only to see if it's a caching problem.
-
You mean just to ping?
I just tried Diagnostics -> Ping with 'hello.fqdn.private' and with just 'hello', and both failed as you'd expect.
UPDATED: Also tried this from the console itself with the same error (again as you'd expect). I rebooted pfSense earlier today and also about 15 minutes ago (in case the aliases 'spring' to life after a reboot - I can but hope).
-
What does Status/System Logs/System/DNS Resolver say?
Before the update it was working:
Sep 22 22:47:42 filterdns adding entry 10.19.4.250 to table smtp_server on host smtp.domain.local
Sep 22 22:42:48 filterdns failed to resolve host smtp.domain.local will retry later again.
Sep 22 22:18:56 dnsmasq 43335 using nameserver 8.8.4.4#53
Sep 22 22:18:56 dnsmasq 43335 using nameserver 8.8.8.8#53
Sep 22 22:18:56 dnsmasq 43335 ignoring nameserver 127.0.0.1 - local interface
After the update it was working too:
Oct 12 20:41:53 filterdns adding entry 10.19.4.250 to pf table smtp_server for host smtp.domain.local
Oct 12 20:41:53 filterdns clearing entry 10.19.4.250 from pf table smtp_server on host smtp.domain.local
Oct 12 20:41:46 filterdns adding entry 10.19.4.250 to pf table smtp_server for host smtp.domain.local
Oct 12 20:41:46 filterdns clearing entry 10.19.4.250 from pf table smtp_server on host smtp.domain.local
Oct 12 20:41:45 filterdns adding entry 10.19.4.250 to pf table smtp_server for host smtp.domain.local
Oct 12 20:41:45 filterdns failed to resolve host smtp.domain.local will retry later again.
Oct 12 20:26:06 dnsmasq 860 using nameserver 8.8.4.4#53
Oct 12 20:26:06 dnsmasq 860 using nameserver 8.8.8.8#53
Oct 12 20:26:06 dnsmasq 860 ignoring nameserver 127.0.0.1 - local interface
Suddenly on Saturday it didn't update this entry any more:
Oct 17 13:30:37 filterdns adding entry 216.58.210.3 to pf table Host for host www.google.de
Oct 17 13:30:37 filterdns adding entry 10.19.4.250 to pf table Host for host smtp.domain.local
Oct 14 06:45:01 filterdns clearing entry 10.19.4.250 from pf table smtp_server on host smtp.domain.local
Oct 14 06:30:01 filterdns adding entry 10.19.4.250 to pf table smtp_server for host smtp.domain.local
Oct 14 06:30:01 filterdns clearing entry 10.19.4.250 from pf table smtp_server on host smtp.domain.local
Oct 14 06:15:01 filterdns adding entry 10.19.4.250 to pf table smtp_server for host smtp.domain.local
Oct 14 06:15:01 filterdns clearing entry 10.19.4.250 from pf table smtp_server on host smtp.domain.local
Oct 14 06:00:01 filterdns adding entry 10.19.4.250 to pf table smtp_server for host smtp.domain.local
It only worked today after I added Google too.
Yesterday, on Oct 16, I successfully pinged smtp.domain.local. So why didn't it update? Did the job crash?
I think filterdns has a problem. I have two running, and since the second one started I have a fresh alias table:
ps aux | grep filterdns
root 19719 0.0 0.3 21492 3184 - Is 13:30 0:00.03 /usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 300 -c /var/etc/filterdns.conf -d 1
root 58949 0.0 0.3 12784 2616 - Is Thu20 0:00.35 /usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 300 -c /var/etc/filterdns.conf -d 1
root 44060 0.0 0.2 14728 2444 0 S+ 15:03 0:00.00 grep filterdns
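A quick way to spot the duplicate-process situation shown in that ps output is to count the filterdns instances. This is only a sketch: the helper name count_filterdns is made up for illustration, the binary path is the one visible in the ps output, and the helper reads ps output on stdin so it can be tried against pasted text as well as a live system.

```shell
#!/bin/sh
# Hypothetical helper: count filterdns daemon instances in `ps` output read
# from stdin. The [/] bracket keeps the grep process itself from matching
# when this is fed from a live `ps aux`.
count_filterdns() {
    grep -c '[/]usr/local/sbin/filterdns' || true
}

# On the firewall:
#   ps aux | count_filterdns
# Anything greater than 1 means more than one filterdns is running.
```

With the three ps lines pasted above as input, this counts the two daemon lines and skips the `grep filterdns` line.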
-
Sep 22 22:47:42 filterdns adding entry 10.19.4.250 to table smtp_server on host smtp.domain.local
Sep 22 22:42:48 filterdns failed to resolve host smtp.domain.local will retry later again.
Sep 22 22:18:56 dnsmasq 43335 using nameserver 8.8.4.4#53
Sep 22 22:18:56 dnsmasq 43335 using nameserver 8.8.8.8#53
Sep 22 22:18:56 dnsmasq 43335 ignoring nameserver 127.0.0.1 - local interface
Wait …
You're asking 8.8.8.8 and 8.8.4.4 (also known as Google) for info about "smtp.domain.local"?
Well, yes, that will fail ;D
If "smtp.domain.local" has a static IP, add it to Services => DNS Forwarder => Host Overrides and you'll be fine.
-
yeah…
Why is it ignoring 127.0.0.1?
"Sep 22 22:18:56 dnsmasq 43335 ignoring nameserver 127.0.0.1 - local interface"
edit: This is forwarder, going to have to forward somewhere ;) I have not used the forwarder since they enabled unbound.. Well really before that when unbound was just a package. A resolver is just so much better than a forwarder. Not sure why anyone still uses it to be honest ;)
In a nutshell, if you have an alias that is not working, you need to check the table. If the entries are not in the table, then you need to figure out why resolution of that FQDN is not working. pfSense needs to be able to resolve the FQDN you put in there before it can put it in the table.
So normally such problems just come down to name resolution troubleshooting.. Which doesn't look like any was done before bug report filed ;)
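The table check described above can be sketched from the shell. `pfctl -t <table> -T show` lists the addresses currently in a pf table; the table name smtp_server is the alias from the logs earlier in the thread, and count_table_entries is a made-up helper name. Treat it as a rough sketch, not a definitive procedure.

```shell
#!/bin/sh
# Hypothetical helper: count the non-blank lines (one address per line) in
# a pf table listing read from stdin.
count_table_entries() {
    grep -c '[^[:space:]]' || true
}

# On the firewall:
#   pfctl -t smtp_server -T show | count_table_entries
# A result of 0 means the alias table is empty. The next step is then to
# check whether the firewall itself can resolve the name, for example via
# Diagnostics -> DNS Lookup or Diagnostics -> Ping.
```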
-
Sep 22 22:18:56 dnsmasq 43335 using nameserver 8.8.4.4#53
Sep 22 22:18:56 dnsmasq 43335 using nameserver 8.8.8.8#53
Sep 22 22:18:56 dnsmasq 43335 ignoring nameserver 127.0.0.1 - local interface
What you see here is dnsmasq and not filterdns.
dnsmasq itself listens on localhost, so it cannot add itself as an upstream server. That would create a loop.
When filterdns is running, it works fine:
Oct 13 10:29:32 filterdns adding entry 10.19.4.250 to pf table smtp_server for host smtp.domain.local
Oct 13 10:29:32 filterdns clearing entry 10.19.4.250 from pf table smtp_server on host smtp.domain.local
Oct 13 10:15:01 filterdns adding entry 10.19.4.250 to pf table smtp_server for host smtp.domain.local
Oct 13 10:15:01 filterdns clearing entry 10.19.4.250 from pf table smtp_server on host smtp.domain.local
Oct 13 10:00:01 filterdns adding entry 10.19.4.250 to pf table smtp_server for host smtp.domain.local
Oct 13 10:00:01 filterdns clearing entry 10.19.4.250 from pf table smtp_server on host smtp.domain.local
But since the update it logs far more often than before:
Oct 10 01:47:52 filterdns failed to resolve host smtp.domain.local will retry later again.
Sep 22 22:47:42 filterdns adding entry 10.19.4.250 to table smtp_server on host smtp.domain.local
Sep 22 22:42:48 filterdns failed to resolve host smtp.domain.local will retry later again.
Sep 22 22:18:56 dnsmasq 43335 using nameserver 8.8.4.4#53
Sep 22 22:18:56 dnsmasq 43335 using nameserver 8.8.8.8#53
Sep 22 22:18:56 dnsmasq 43335 ignoring nameserver 127.0.0.1 - local interface
-
What does Status/System Logs/System/DNS Resolver say?
DNS Resolver only has the 'unbound' process. There is nothing from filterdns or dnsmasq in there. There is also nothing in System|General for either filterdns or dnsmasq.
Are you not using DNS Forwarder service rather than DNS Resolver? I'm assuming there are different 'process' entries.
I'm happy to check anything else out to try and resolve this.
-
edit: This is forwarder, going to have to forward somewhere ;) I have not used the forwarder since they enabled unbound.. Well really before that when unbound was just a package. A resolver is just so much better than a forwarder. Not sure why anyone still uses it to be honest ;)
I tried to migrate to unbound last year but I had some problems: https://redmine.pfsense.org/issues/6065
Because I have more than 40 Overrides, I don't want to try it again on this pfSense.
And there are still some unwanted effects with unbound: https://redmine.pfsense.org/issues/7884
-
In a nutshell, if you have an alias that is not working, you need to check the table. If the entries are not in the table, then you need to figure out why resolution of that FQDN is not working. pfSense needs to be able to resolve the FQDN you put in there before it can put it in the table.
So normally such problems just come down to name resolution troubleshooting.. Which doesn't look like any was done before bug report filed ;)
So what do you suggest beyond what has been done (by me)?
-
I have messages from filterdns in there. It's not dnsmasq and not unbound.
Even with unbound on another pfSense I get this:
Oct 13 01:59:09 filterdns adding entry 79.1.2.3 to ipfw table for host dummy.dyndns.org
Oct 13 00:59:06 filterdns failed to resolve host dummy.dyndns.org will retry later again.
Oct 12 22:24:47 filterdns adding entry 87.1.2.3 to ipfw table for dummy.dyndns.org
Oct 12 22:24:45 unbound 74165:0 info: start of service (unbound 1.6.6).
Oct 12 22:24:45 unbound 74165:0 notice: init module 0: iterator
-
I still had these two filterdns running.
I killed both. For the older one it was enough to send kill. The second one needed kill -9 to stop.
I removed the test entry with a FQDN inside and pfSense started a new filterdns.
Now it's not spamming any more. I only see filterdns entries in the log on changes (adding/deleting entries or changing IPs).
-
I still had these two filterdns running.
I killed both. For the older one it was enough to send kill. The second one needed kill -9 to stop.
I removed the test entry with a FQDN inside and pfSense started a new filterdns.
Now it's not spamming any more. I only see filterdns entries in the log on changes (adding/deleting entries or changing IPs).
I added a new alias with a new FQDN (www.barrymanilow.com), and on querying its table (Diagnostics -> Tables) it has no entries. I get no filterdns entries in the System|General or System|DNS Resolver logs.
filterdns does exist and is running (Diagnostics -> Command -> ps -A | grep filterdns).
I do get these errors in the System|General log file (which could have been there prior to the upgrade, so they may be a red herring):
Oct 17 19:29:39 dhcpleases kqueue error: unkown
Oct 17 19:29:38 dhcpleases Could not deliver signal HUP to process because its pidfile (/var/run/unbound.pid) does not exist, No such process.
Oct 17 19:29:38 dhcpleases /etc/hosts changed size from original!
The /var/run/unbound.pid does exist.
I also did a 'cat /etc/hosts' and the nas.fqdn.private entry is in there. I think we can discount the 'if pfSense cannot resolve it, it won't be in a table' issue, as pfSense can not only resolve it, it has put it into its Hosts file.
So the issue is that Aliases have no table entries.
And I'll say again what I said earlier: the upgrade to 2.4 broke this.
-
So I'll update the post with the fix for this.
The first DNS server listed in System -> General Setup was dead. Resolution was otherwise working fine, as there were another 3 DNS servers in there. As soon as I replaced this one server with a working one, I started seeing 'filterdns' entries in the System|DNS Resolver log. I checked the rules and they have started working now as well.
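Since the culprit here was a dead first DNS server, it may be worth testing each configured server individually rather than relying on a lookup that quietly falls through to the next one. A rough sketch, assuming the servers from System -> General Setup end up as nameserver lines in /etc/resolv.conf and that drill is available on the firewall; list_nameservers is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the address from every `nameserver` line in
# resolv.conf-style text read from stdin, in file order.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }'
}

# On the firewall (assumes drill is installed; the host name is the one
# from the logs earlier in the thread):
#   for ns in $(list_nameservers < /etc/resolv.conf); do
#       echo "== $ns"
#       drill smtp.domain.local "@$ns"
#   done
# A server that times out or errors here is a candidate for replacement.
```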
What I don't understand is:
1. The majority of the aliases are for internal IPs and therefore don't need external DNS resolution;
2. The majority of the aliases are for DHCP leases and are therefore registered by the DHCP service and appear in the pfSense Hosts file, so again don't need external resolution;
3. If you have 4 listed DNS servers and one breaks, then why should this stop aliases working;
4. What has changed, given that this issue did not appear before the upgrade to v2.4.
If I have a working system and I upgrade it and parts of it stop working, then that's a problem. It's a bug. A bug in the upgrade. A bug in the way something works. But it's a bug. Something that should work doesn't. That's clear from this.
-
If I understand it right?
You have local DNS entries which appear in /etc/hosts?
And now the first of the external DNS servers is not responding and local IPs (from /etc/hosts) are not resolved in the alias table?
Normally nsswitch.conf looks like:
hosts: files dns
What does yours show?
-
If I understand it right?
You have local DNS entries which appear in /etc/hosts?
Yes. All of them were static DHCP leases. All of them resolved using Diagnostics -> Ping and Diagnostics -> DNS Lookup in pfSense.
And now the first of the external DNS servers is not responding and local IPs (from /etc/hosts) are not resolved in the alias table?
They don't appear in the aliases tables. And filterdns entries were missing from the System|DNS Resolver logs.
Normally nsswitch.conf looks like:
hosts: files dns
What does yours show?
It is:
hosts: files dns
Exactly the same.
-
I think you should open a bug in redmine.
-
I think you should open a bug in redmine.
I would do normally. But there is already a bug open for this. The way it was completely dismissed without waiting for further information and pushed back to the forum means I'm not going to waste my time going through the hoops to do it. I appreciate that there was not a lot of information given on the issue raised, but the way it was handled was poor. Pre-empting an issue as not a bug 'because we don't see it here' is a naive viewpoint and does not encourage people to give feedback on the project.
But I do appreciate your help in this, ggzengel. Between the pair of us it led me to find what I did. It's been much appreciated.
-
I restarted my pfSense and got only one filterdns, and it's working.
Now I will see how long it stays stable.
-
Hi All,
I know this is an old topic, but I too, have noticed this issue occurring since an upgrade to 2.4.2. This definitely wasn't an issue previously, and very few config changes have been made since the upgrade.
I don't fully understand the process used to build these FQDN aliases, but I'll provide as much info as possible, in the hope it helps narrow down the root cause.
I've created a test Alias, called Host_Test, containing the FQDN 'www.test.com'.
-
Viewing the table entry for this alias shows an empty table.
-
DNS servers for the firewall are set to 8.8.8.8 and 8.8.4.4. DNS forwarders or resolver are not in use.
-
DNS resolution for this hostname is working fine for both DNS servers under Diagnostics -> DNS Lookup.
-
Running 'ps -A | grep filterdns' shows there is a process running called filterdns.
-
If I view the log under System -> DNS Resolver, I can see that on the date of the upgrade (I assume on first boot after) there are entries such as the one below for almost all FQDN aliases configured on the firewall. No events have been logged there since.
filterdns failed to resolve host s186.fmp12-hosting.co.uk will retry later again.
This firewall has an HA partner, which doesn't seem to be experiencing the problem. Based on the total lack of logs since the primary firewall's initial boot, I'm wondering if the root cause is the process hanging (I assume 'filterdns' is the relevant process). Is it possible to safely kill and restart this process, or are there other considerations when doing this?
-
-
Quick followup. It looks like the process was hung. It's currently working after running "killall -9 filterdns" then saving and applying an Alias to restart the process.
What's potentially concerning is how soon after bootup this process seems to have stopped responding. Not sure if this is a one off for me, or something peculiar that's happening since the upgrade. I'll update this post if I notice the issue reoccur, especially after the next reboot.