Filterdns stops working

dudi

Somebody with a solution?

Birke

not really a solution, only a workaround:
run "killall -9 filterdns" in the shell and then "/usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 300 -c /var/etc/filterdns.conf -d 1" (or save and apply an existing alias). you could also put them in a cron as already mentioned.
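
As a rough sketch, the two commands above could be wrapped in a small shell function. The paths and flags are copied from this post. The function name `restart_filterdns` and the `FILTERDNS_*` variables are made up for illustration, and removing the pid file plus the one second pause are assumptions, not something filterdns is known to require.

```sh
#!/bin/sh
# Sketch only: kill a hung filterdns and start it again with the
# arguments quoted in this thread. Names are hypothetical; adjust the
# paths to your own install.
FILTERDNS_BIN="${FILTERDNS_BIN:-/usr/local/sbin/filterdns}"
FILTERDNS_PID="${FILTERDNS_PID:-/var/run/filterdns.pid}"
FILTERDNS_CONF="${FILTERDNS_CONF:-/var/etc/filterdns.conf}"

restart_filterdns() {
    # killall exits non-zero when no process matched; that is fine here.
    killall -9 filterdns 2>/dev/null || true
    # Assumption: clear a pid file left behind by the killed process.
    rm -f "$FILTERDNS_PID"
    # Assumption: a short pause before starting the new instance.
    sleep 1
    "$FILTERDNS_BIN" -p "$FILTERDNS_PID" -i 300 -c "$FILTERDNS_CONF" -d 1
}
```

Nothing runs until `restart_filterdns` is called, for example on the last line of the script or from a cron job.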

since the error happened to me again, i checked the resolver.log again to see if there is any information what the reason could be. even with debug level 3 there is no clue.
the last working entries are some normal adding and clearing entries and some information about some static entries.
the next time filterdns should run, it starts with a "Received signal Hangup(1)" entry and only one entry gets deleted and the static entries are listed.
after that every time filterdns should run (automatically or after a manual save&apply of an alias), only the hangup-message is in log.
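
The pattern described above (only hangup messages, no adding or clearing entries in between) could be checked with something like the sketch below. It is a heuristic, not a known diagnostic: it assumes every filterdns log line contains the word "filterdns" and that the hangup line contains "Received signal Hangup". The function names and the threshold of two are hypothetical.

```sh
#!/bin/sh
# Sketch only: read log text on stdin and count how many filterdns
# lines at the very end are hangup messages with no other filterdns
# activity after them.
trailing_hangups() {
    awk '
        /filterdns/ {
            if ($0 ~ /Received signal Hangup/) n++
            else n = 0
        }
        END { print n + 0 }
    '
}

# Assumption: two or more hangups in a row with nothing in between
# means filterdns is no longer doing any work.
filterdns_looks_stuck() {
    [ "$(trailing_hangups)" -ge 2 ]
}
```

On a system where the log is read with clog, it could be used as `clog /var/log/resolver.log | filterdns_looks_stuck && echo "filterdns looks stuck"`.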

ps: seems this thread is about the same problem. maybe both threads should be merged.

Valeriy

I can confirm, the same issue is happening to me. It seems to have started after a pfSense upgrade in Sept-Nov 2017. I am using the development snapshot from the 10th of January, and the issue still persists.

I am using Policy Based Routing (PBR) and rely heavily on a lot of aliases: it took time to realize that the tables of IP addresses (for hostname-based aliases) were not being updated.

So the temporary workaround so far is the same as what you suggested:

killall -9 filterdns
rm /var/pid/filterdns.pid (not sure if it is the correct path, just writing from memory)

and then start filterdns process again (or refresh aliases).

In fact, starting filterdns (with the proper arguments) sometimes did not help; I had to kill the process again and refresh (edit-save-apply) one of the alias lists.

jimp

Do we have any reliable and predictable way to trigger this issue? Any specific alias contents that cause it? Is there a set interval at which the problem occurs? Is there some other event that causes it to fail?

Gertjan

I'm not having any issue with filterdns, but I'm using it; it resolves a couple of (very static) URLs to IPv4 and IPv6.

When you guys kill filterdns, restart it like this :

/usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 240 -c /var/etc/filterdns.conf -d 7

or even

/usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 240 -c /var/etc/filterdns.conf -d 7 -f

"-d 7" will produce massive logging in the DNS log. Something might show up.
"-f" will keep it in the foreground, so keep your console access open for the time being. Ctrl-C will end it.

Try also changing the interval "-i 240" (4 minutes) to "-i 600" (every 10 minutes) to give it more time.

Btw : "filterdns" is a pretty simple FreeBSD package (program), you'll find it here : https://github.com/pfsense/FreeBSD-ports/blob/devel/net/filterdns/files/filterdns.c
It doesn't do much, and it depends on one important thing : DNS should be working.
Also, it injects modifications into pf tables.
Knowing that all spawned threads (as many as there are tables) are relaunched every "-i xxx" seconds at the same time, is it possible that "pf " gets "overrun" ?

Birke

@jimp:

Do we have any reliable and predictable way to trigger this issue? Any specific alias contents that cause it? Is there a set interval at which the problem occurs? Is there some other event that causes it to fail?

Nope, no trigger (at least i haven't found one).
Nope, i changed my aliases earlier in this thread.
Nope, no specific interval on my system. Sometimes it's some days, sometimes weeks.
Nope, i checked the other logfiles. I found no other action that was between last working and first not working filterdns.

I will increase the interval and debuglevel as Gertjan suggests. Maybe then we find some clues.

Birke

seems the entries in the alias are not the reason.
i put the exact FQDNs, IPs and networks into aliases with the same names on my pfsense at home (more than one month ago).
i had no problems at home but some here at work in that time.

nrasmus

Just chiming in that we're experiencing this issue too. Tried the kill and restart in verbose mode, but the logs just contained the "clearing entry x from pf table z on host y" messages, and then nothing further. I've put a cron job in place to do this nightly, as it's affecting our web apps' ability to send mail via Amazon SES.

For what it's worth, this happens to us about every 2 weeks, it seems, but I've been unable to dig up any correlation to anything.

pfeil

Will the bug finally be resolved with 2.4.4?

The bug report has been open for 8 months now. Although we tried to mitigate the problems when it occurred for the first time for us in November, we still had problems every time filterdns stopped.

If you're affected by the bug, the problems will in many cases be critical: either access will be allowed when it should be blocked or, if you're using a whitelist approach, systems or services will break because of the connection problems.

luckman212

Can anyone affected by this please try increasing their debuglevel for filterdns using this commit in System Patches?
https://github.com/luckman212/pfsense/commit/72834bf677bdbd1cf78f6772b79abe4b3eaa8235

After that you can follow the logs via console/ssh

clog -f /var/log/resolver.log

Related redmine: https://redmine.pfsense.org/issues/8758

hcww

Dear All

I am affected by the same problem.
It happens approximately every day.
I have to kill the filterdns service and restart it to make it work again.
I increased the debug level for filterdns
and attached the resolver log file.
best regards,
0_1540362451825_resolver.txt

Muhammad Waqas

The problem persists even in pfSense 2.4.4

Gertjan

What problem ?
Look carefully at the log that @hcww provided. Thousands of lines, all at the same moment (Oct 23 23:15:21), because he was asking to resolve a URL that returns many, many (no, even more: more than 5000 entries at once) IP addresses.
That will bring down this task for sure. Probably the entire system.

One might try to get hold of every IP address that the big ones** use, but look at the log and you'll find out that this is not a good idea. Known subject btw.

** Google, Facebook, Twitter, etc
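
To spot alias entries like that before they end up in filterdns, one could count how many addresses a hostname returns. The sketch below assumes `dig` is available on the box. The function name `count_addresses` is hypothetical, and the address matching is deliberately crude (digits and dots for IPv4, anything with a colon for IPv6) so that CNAME targets printed by `dig +short` are not counted.

```sh
#!/bin/sh
# Sketch only: print how many A plus AAAA addresses a hostname returns.
# $1 = hostname to look up.
count_addresses() {
    { dig +short A "$1"; dig +short AAAA "$1"; } |
        grep -c -E '^[0-9.]+$|:' || true   # grep -c exits 1 on a count of 0
}
```

For example, `count_addresses www.example.com` prints a single number; a hostname that returns thousands of addresses is probably a poor fit for an alias.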

Grimson

@muhammad-waqas said in Filterdns stops working:

The problem persists even in pfSense 2.4.4

Luckily it got resolved in 2.4.4p1, so update before you complain.

rgijsen

We're running 2.4.4-RELEASE-p2 (amd64), but the issue is still there for us. Over the last two weeks I've had two occurrences of strange issues, people being unable to connect and such, and it turned out SOME of the aliases weren't resolved. So far the only thing that helps is killing filterdns, as people suggested.

Gertjan

@rgijsen said in Filterdns stops working:

and it turned out SOME of the aliases weren't resolved

What hosts ?
What happened with the DNS at that moment ? (logs ..)
Default Resolver DNS - or a "have it handled by someone else" ?

rgijsen

@gertjan said in Filterdns stops working:

@rgijsen said in Filterdns stops working:

and it turned out SOME of the aliases weren't resolved

What hosts ?
What happened with the DNS at that moment ? (logs ..)
Default Resolver DNS - or a "have it handled by someone else" ?

As far as I can tell now, only internal hosts in both cases. However, the default resolvers are two internal AD controllers, which act as DNS servers as well. They resolve the internal names and, using root hints or (if that fails) forwarders, they resolve external names. Those two are our main DNS servers. If BOTH of them were unreachable / not responding, I'd be completely down.

Unfortunately, I don't have enough backlog in resolver.log to see the issue I seemingly had yesterday. So I'll have to increase my log file space. Would Status --> System Logs --> Settings --> Log file size (Bytes) include logs like resolver.log?

Gertjan

In your case : check log of your AD controller.
The request for a local host was received ? The answer was sent back ?

rgijsen

@gertjan We don't have DNS debug-logging enabled by default, as usually we don't have any need for it. However, I'm pretty confident it was up and running, at least one node (although I don't have any reason to believe even one of them had issues). Our monitoring system (Zabbix) relies on DNS heavily. If DNS didn't respond, pretty much all hosts would be unavailable to our monitoring system. And of course without proper DNS nothing really works.

pfSense has an interface in the network where both AD/DNS servers are, so no routing is involved. The thing is, this probably started yesterday (Sunday) in the afternoon; at the very least, the issue was there this morning. Both DNS servers were up and running, no issues, but some names were not resolved by pfSense. Only when I killed filterdns and started a new instance by saving an existing alias did it start resolving again, without any other changes on the DNS machines or anywhere else. This makes me believe there was an issue with pfSense / filterdns.

I'll increase log size, so I have some more backlog in the future.

[edit]
My alias resolve interval is 300s (default). I just found that when I add a new host based on DNS (a host residing on the internet this time), even after 15 minutes the table for that alias isn't updated. Also, resolver.log doesn't show any activity.

rgijsen

@rgijsen
I'm onto something now. I can currently reproduce some of the issue I think. When I add 'specific' hosts to an alias, they DO get resolved by our DNS:

2/18/2019 12:39:54 PM 1B40 PACKET 000001A857BE1DC0 UDP Rcv <pfsense IP> a463 Q [0001 D NOERROR] AAAA (8)host(7)i'm(2)resolving(0)

2/18/2019 12:39:54 PM 1B3C PACKET 000001A858859CC0 UDP Rcv <pfsense IP> 519a Q [0001 D NOERROR] AAAA (8)host(7)i'm(2)resolving(0)

2/18/2019 12:39:54 PM 1B40 PACKET 000001A857BE1DC0 UDP Snd <pfsense IP> a463 R Q [8081 DR NOERROR] AAAA (8)host(7)i'm(2)resolving(0)

2/18/2019 12:39:54 PM 1B3C PACKET 000001A858859CC0 UDP Snd <pfsense IP> 519a R Q [8085 A DR NOERROR] AAAA (8)host(7)i'm(2)resolving(0)

This is an external host, i.e. a DNS name that needs to be resolved externally by our DNS servers. That seems to work fine; however, the host does NOT end up in the table for that alias. When I add another DNS name in the same domain, so hosted at the same DNS on the internet, that works fine. I tried others like www.tweakers.net, www.nos.nl or bbc.co.uk: I get the same success entries in my DNS debug log, and they DO end up in the alias table as well. At first I thought the issue was with hosts that are already in a table somewhere, but that doesn't seem to be the case. Most internal names I tried now don't end up in that table either.

pfSense Resolver log:
Feb 18 12:47:14 filterdns Adding host <Host that gets added to the alias> (I just added that one in the alias)
Feb 18 12:47:14 filterdns Adding Action: pf table: B_it_webserver host: <Host that gets added to the alias>
Feb 18 12:47:14 filterdns Adding Action: pf table: B_it_webserver host: <host that does NOT end up in table> (I just added that one in the alias as well)
Feb 18 12:47:14 filterdns Adding Action: pf table: B_it_webserver host: www.ict-net.nl

The host that does NOT end up in the table here is, by the way, successfully added to some other aliases, where it works just as expected. But for this alias I am missing the 'Adding host' entry in the pfSense log.

I expect something fishy is going on. I obviously don't want my unmasked logs online, but if there's a better way to show you what's actually happening I'd be glad to do so.

[edit]
One more addition:
I tried creating a new alias, with the same three hosts as in the alias I used above. Here NONE of them end up in the table, after waiting for about 20 minutes, while in the alias used above two out of three (and the same two every time, no matter what order I put them in) work.
I again killed filterdns, restarted it and poof - the tables immediately got filled. So it seems filterdns is partially functional - some hosts get added, some don't.
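
For anyone who wants to check this kind of partial failure without eyeballing the tables, here is a minimal sketch that lists which expected addresses are missing from a pf table. It relies on `pfctl -t <table> -T show`; the function names `table_addresses` and `missing_from_table` are hypothetical, and the comparison is a plain string match, so addresses have to be written exactly the way pfctl prints them.

```sh
#!/bin/sh
# Sketch only: compare a pf table against a list of expected addresses.

# Print the addresses in table $1, one per line, without the leading
# whitespace that pfctl puts in front of each entry.
table_addresses() {
    pfctl -t "$1" -T show 2>/dev/null | sed 's/^[[:space:]]*//'
}

# $1 = table name, remaining arguments = addresses expected in it.
# Prints every expected address that is not in the table.
missing_from_table() {
    table="$1"; shift
    current=$(table_addresses "$table")
    for addr in "$@"; do
        printf '%s\n' "$current" | grep -q -x -F -- "$addr" ||
            printf '%s\n' "$addr"
    done
}
```

A call such as `missing_from_table B_it_webserver 192.0.2.10 192.0.2.11` (example addresses) prints nothing when both entries are in the table, and prints the missing ones otherwise.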