Filterdns stops working

Birke

ok, problem is back on my system :'(

When the problem happens, does /var/run/filterdns.pid contain a valid PID for filterdns? (Check the "ps uxaww" output to find the filterdns pid)

yes, it's the PID of the running filterdns process.
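
For anyone who wants to repeat that check from the shell, here is a minimal sketch. The helper name `pidfile_alive` is made up for illustration and is not part of pfSense. It only confirms that the PID in the file belongs to a live process, so it is still worth comparing against the "ps uxaww" output to confirm that the process really is filterdns.

```shell
#!/bin/sh
# Sketch: report whether a PID file points at a live process.
# pidfile_alive is a hypothetical helper name, not a pfSense tool.
pidfile_alive() {
    pidfile="$1"
    # The file must exist and be non-empty.
    [ -s "$pidfile" ] || return 1
    pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')
    # Reject anything that is not a plain number.
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Signal 0 sends nothing; it only tests that the process exists.
    kill -0 "$pid" 2>/dev/null
}

# Example (path taken from the thread):
# pidfile_alive /var/run/filterdns.pid && echo "pid file is valid"
```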

@jimp:

If you do "killall -HUP filterdns" do the entries resolve again, or are they still missing?

it doesn't change anything on my machine.

one thing changed:
the old entries were not deleted this time, but they are not updated either.
new entries are not resolved, although they are added to filterdns.conf.

also, the cron job mentioned by m0nji doesn't seem to be a good idea, since "pkill filterdns" doesn't end the filterdns process (at least not on my pfSense). that means more and more instances of filterdns will be started.

is there any way to really kill that process so i can start a fresh instance?
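
To see whether instances really are piling up as described above, a quick count of the running processes can help. This is only a sketch: `count_procs` is a hypothetical helper name, and it assumes `pgrep` is available on the box.

```shell
#!/bin/sh
# count_procs: number of running processes whose name matches exactly.
# Hypothetical helper; relies on pgrep printing one PID per line.
count_procs() {
    pgrep -x "$1" 2>/dev/null | wc -l | tr -d ' '
}

# Example:
# count_procs filterdns    # more than 1 suggests instances are piling up
```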

dudi

I have the same problem!
No filterdns entries in the System -> DNS Resolver logs after a few days or a week.

dudi

Does somebody have a solution?

Birke

not really a solution, only a workaround:
run "killall -9 filterdns" in the shell and then "/usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 300 -c /var/etc/filterdns.conf -d 1" (or save and apply an existing alias). you could also put both commands in a cron job as already mentioned.
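
The two commands above can be wrapped in a small script. This is a sketch, not a tested fix: the paths and flags are copied from this post, while `DRY_RUN`, `run` and `restart_filterdns` are hypothetical names added so the commands can be reviewed before anything is killed.

```shell
#!/bin/sh
# Sketch of the kill-and-restart workaround from this post.
# DRY_RUN and run() are hypothetical additions: with DRY_RUN=1 (the
# default here) the commands are only printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

restart_filterdns() {
    # SIGKILL, because a plain pkill reportedly did not end the process.
    run killall -9 filterdns
    # Start a fresh instance with the arguments quoted in the thread.
    run /usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 300 \
        -c /var/etc/filterdns.conf -d 1
}

# Example: print the commands first, then run them for real.
# restart_filterdns
# DRY_RUN=0; restart_filterdns
```

Defaulting to dry-run is a deliberate choice here, since killing filterdns on a production firewall leaves hostname aliases unresolved until the new instance is up.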

since the error happened to me again, i checked the resolver.log again to see if there is any information about what the reason could be. even with debug level 3 there is no clue.
the last working entries are some normal adding and clearing entries and some information about some static entries.
the next time filterdns should run, it starts with a "Received signal Hangup(1)" entry, only one entry gets deleted, and the static entries are listed.
after that, every time filterdns should run (automatically or after a manual save & apply of an alias), only the hangup message is in the log.
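
The pattern described above (nothing but hangup messages once filterdns is stuck) could be spotted with a small filter. This is a rough sketch under two assumptions: that each filterdns log line contains the string "filterdns", and that the hangup line contains "Received signal Hangup" as quoted above. `trailing_hangups` is a hypothetical helper name.

```shell
#!/bin/sh
# trailing_hangups: read log text on stdin and print how many of the most
# recent filterdns lines in a row are "Received signal Hangup" messages.
# Lines from other daemons are ignored. Hypothetical helper name.
trailing_hangups() {
    awk '/filterdns/ {
             if ($0 ~ /Received signal Hangup/) n++; else n = 0
         }
         END { print n + 0 }'
}

# Example:
# clog /var/log/resolver.log | trailing_hangups
```

A count that keeps growing across several update intervals would match the stuck state described here. A single hangup right after saving an alias is presumably normal.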

ps: seems this thread is about the same problem. maybe both threads should be merged.

Valeriy

I can confirm the same issue is happening to me. It seems to have started after a pfSense upgrade in Sept-Nov 2017. I am using the development snapshot from the 10th of January and the issue still persists.

I am using Policy Based Routing (PBR) and rely heavily on a lot of aliases; it took time to realize that the tables of IP addresses (behind hostname-based aliases) are not updated.

So the temporary workaround so far is the same as what you have suggested:

killall -9 filterdns
rm /var/pid/filterdns.pid (not sure if it is the correct path, just writing from memory)

and then start filterdns process again (or refresh aliases).

In fact, starting filterdns (with proper arguments) sometimes did not help; I had to kill the process again and refresh (edit-save-apply) one of the alias lists.

jimp

Do we have any reliable and predictable way to trigger this issue? Any specific alias contents that cause it? Is there a set interval at which the problem occurs? Is there some other event that causes it to fail?

Gertjan

I'm not having any issues with filterdns, but I am using it: it resolves a couple of (very static) URLs to IPv4 and IPv6.

When you guys kill filterdns, restart it like this :

/usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 240 -c /var/etc/filterdns.conf -d 7

or even

/usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 240 -c /var/etc/filterdns.conf -d 7 -f

"-d 7" will produce massive logging in the DNS log. Something might show up.
"-f" will keep it in the foreground, so keep your console access open for the time being. Ctrl-C will end it.

Also try changing the interval "-i 240" (4 minutes) to "-i 600" (every 10 minutes) to give it more time.

Btw : "filterdns" is a pretty simple FreeBSD package (program), you'll find it here : https://github.com/pfsense/FreeBSD-ports/blob/devel/net/filterdns/files/filterdns.c
It doesn't do much, and it depends on one important thing : DNS should be working.
Also, it injects modifications into pf tables.
Knowing that all spawned threads (as many as there are tables) are relaunched every "-i xxx" seconds at the same time, is it possible that "pf" gets "overrun" ?

Birke

@jimp:

Do we have any reliable and predictable way to trigger this issue? Any specific alias contents that cause it? Is there a set interval at which the problem occurs? Is there some other event that causes it to fail?

Nope, no trigger (at least i haven't found one).
Nope, i changed my aliases earlier in this thread.
Nope, no specific interval on my system. Sometimes it's a few days, sometimes weeks.
Nope, i checked the other logfiles. I found no other action that was between last working and first not working filterdns.

I will increase the interval and debuglevel as Gertjan suggests. Maybe then we find some clues.

Birke

seems the entries in the alias are not the reason.
i put the exact FQDNs, IPs and networks into aliases with the same names on my pfSense at home (more than one month ago).
i had no problems at home, but some here at work, in that time.

nrasmus

Just chiming in that we're experiencing this issue too. Tried the kill and restart in verbose mode, but the logs just contained the "clearing entry x from pf table z on host y" messages, and then nothing further. I've put a cron job in place to do this nightly, as it's affecting the ability of our web apps to send mail via Amazon SES.

For what it's worth, this happens to us about every 2 weeks, it seems, but I've been unable to dig up any correlation to anything.

pfeil

Will the bug finally be resolved with 2.4.4?

The bug report has been open for 8 months now. Although we tried to mitigate the problems when it first occurred for us in November, we still had problems every time filterdns stopped.

If you're affected by the bug, the problems will in many cases be critical: either access will be allowed when it should be blocked or, if you're using a whitelist approach, systems or services will break because of the connection problems.

luckman212

Can anyone affected by this please try increasing their debuglevel for filterdns using this commit in System Patches?
https://github.com/luckman212/pfsense/commit/72834bf677bdbd1cf78f6772b79abe4b3eaa8235

After that you can follow the logs via console/ssh

clog -f /var/log/resolver.log

Related redmine: https://redmine.pfsense.org/issues/8758

hcww

Dear All

i am affected by the same problem
it happens approximately every day
i must kill the filterdns service and restart it to make it work again
i increased the debug level for filterdns
and i attached the resolver log file
best regards,
0_1540362451825_resolver.txt

Muhammad Waqas

Problem persists even in pfSense 2.4.4

Gertjan

What problem ?
Look carefully at the log that @hcww provided. Thousands of lines, all at the same moment (Oct 23 23:15:21), because he was asking to resolve a URL that returns many, many IP addresses (no, even more: more than 5000 entries at once).
That will bring down this task for sure. Probably the entire system.

One might try to get hold of every IP address that the big ones** use, but look at the log and you'll find out that this is not a good idea. Known subject, btw.

** Google, Facebook, Twitter, etc
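
To check how many addresses a single alias hostname resolves to before handing it to filterdns, something like the following sketch could be used. It assumes the `host` utility is installed and prints its usual "has address" / "has IPv6 address" lines. `count_address_lines` is a hypothetical helper name.

```shell
#!/bin/sh
# count_address_lines: count the "has address" and "has IPv6 address"
# lines in `host` output read from stdin. Hypothetical helper name.
count_address_lines() {
    grep -cE ' has (IPv6 )?address '
}

# Example: how many addresses does one alias hostname resolve to?
# host www.example.com | count_address_lines
```

A hostname that returns hundreds or thousands of addresses is probably a poor fit for a hostname alias.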

Grimson

@muhammad-waqas said in Filterdns stops working:

Problem persists even in pfSense 2.4.4

Luckily it got resolved in 2.4.4p1, so update before you complain.

rgijsen

We're running 2.4.4-RELEASE-p2 (amd64), but the issue is still there for us. Over the last two weeks I've had two occurrences of strange issues, people being unable to connect and such, and it turned out SOME of the aliases weren't resolved. So far the only thing that helps is killing filterdns as people suggested.

Gertjan

@rgijsen said in Filterdns stops working:

and it turned out SOME of the aliases weren't resolved

What hosts ?
What happened with the DNS at that moment ? (logs ..)
Default Resolver DNS - or a "have it handled by someone else" ?

rgijsen

@gertjan said in Filterdns stops working:

@rgijsen said in Filterdns stops working:

and it turned out SOME of the aliases weren't resolved

What hosts ?
What happened with the DNS at that moment ? (logs ..)
Default Resolver DNS - or a "have it handled by someone else" ?

As far as I can tell now, only internal hosts in both cases. However, the default resolvers are two internal AD controllers, which act as DNS servers as well. They resolve the internal names, and using root hints (or, if that fails, forwarders) they resolve external names. Those two are our main DNS servers. If BOTH of them were unreachable or not responding, I'd be completely down.

Unfortunately, I don't have enough backlog in resolver.log to see the issue I apparently had yesterday. So I'll have to increase my log file space. Would Status --> System Logs --> Settings --> Log file size (Bytes) include logs like resolver.log?

Gertjan

In your case : check the log of your AD controller.
Was the request for a local host received ? Was the answer sent back ?