Periodic since 2.2 pages load blank, certs invalid

swix

Thanks for your update Trel, we're still searching here, but enabling DNSSEC does not stop the issue. And last time, it stopped by itself after about 5 minutes.

When "broken", all web requests are redirected to http://xsso.www.example.org (with the original domain name instead of example.org).

GET /domain/www.example.org HTTP/1.1
Host: sso.mlwr.io
User-Agent: Mozilla/5.0 (X11; Linux x86_64) (...)
Accept-Encoding: gzip, deflate, sdch
Accept-Language: de,en-US;q=0.8,en;q=0.6
Cookie: anbsso=a5f4221ae2729d945150c83748e2ea12 (...)

Response: HTTP/1.1 302 Moved Temporarily
Server: nginx-perl/1.2.9.7
Date: Mon, 09 Feb 2015 14:29:31 GMT
Content-Type: text/html
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: btst=b1b2035cffe818d92d7f6604a1318beb|myip|...
Location: http://xsso.www.example.org/a5f4221ae2729d945150c83748e2ea12

I just increased the log file size to capture more the next time it happens, and added monitoring so we get an alert right away.
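
For anyone who wants to watch for the same symptom, here is a minimal sketch of a check on the redirect target. It only encodes the two patterns visible in the capture above (a redirect to sso.mlwr.io, or to "xsso." plus the original host name). The function name and structure are my own illustration, not something pfSense or any monitoring tool provides.

```python
from urllib.parse import urlsplit

# Host seen in the capture above; everything else here is a hypothetical helper.
SSO_HOST = "sso.mlwr.io"


def looks_hijacked(original_host, location):
    """Return True if a Location header matches the redirect pattern seen above."""
    target = (urlsplit(location).hostname or "").lower()
    original_host = original_host.lower()
    # Pattern 1: redirect to the SSO host itself.
    # Pattern 2: redirect to "xsso." prepended to the requested host name.
    return target == SSO_HOST or target == "xsso." + original_host
```

Feeding it the requested host and the Location value from a 302 response is enough; a normal redirect to the site's own https URL does not match.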

And another view with wget, with and without https, with the same "lolcat" I already saw in another thread:


om@ompc:~> wget http://www.example.org
--2015-02-09 13:55:37--  http://www.example.org
Resolving www.example.org (www.example.org)... 195.22.26.248
Connecting to www.example.org (www.example.org)|195.22.26.248|:80... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: http://sso.mlwr.io/domain/www.example.org [following]
--2015-02-09 13:55:39--  http://sso.mlwr.io/domain/www.example.org
Resolving sso.mlwr.io (sso.mlwr.io)... 195.22.26.248
Reusing existing connection to www.example.org:80.
HTTP request sent, awaiting response... 200 OK
Cookie coming from sso.mlwr.io attempted to set domain to example.org
Length: unspecified [text/html]
Saving to: ‘index.html.3’

om@ompc:~> wget https://www.example.org
--2015-02-09 13:55:43--  https://www.example.org/
Resolving www.example.org (www.example.org)... 195.22.26.248
Connecting to www.example.org (www.example.org)|195.22.26.248|:443... connected.
ERROR: cannot verify www.example.org's certificate, issued by ‘/CN=lolcat’:
  Self-signed certificate encountered.
    ERROR: certificate common name ‘lolcat’ doesn't match requested host name ‘www.example.org’.
To connect to www.example.org insecurely, use `--no-check-certificate'.

Trel

You said you enabled DNSSEC, but question, what do you have in

System -> General -> DNS servers?

swix

PS: installed packages on this router: arpwatch, bandwidthd, cron, darkstat, mailreport, nrpe, rrd summary, openvpn client export.

swix

@Trel:

You said you enabled DNSSEC, but question, what do you have in
System -> General -> DNS servers?

Completely empty, so unbound should start with the root servers directly. I was wondering where the hints file for unbound was, but it seems to be compiled directly into the binary (strings unbound):

A.ROOT-SERVERS.NET.
198.41.0.4
B.ROOT-SERVERS.NET.
192.228.79.201
C.ROOT-SERVERS.NET.
192.33.4.12
D.ROOT-SERVERS.NET.
199.7.91.13
E.ROOT-SERVERS.NET.
192.203.230.10
F.ROOT-SERVERS.NET.
192.5.5.241
G.ROOT-SERVERS.NET.
192.112.36.4
H.ROOT-SERVERS.NET.
128.63.2.53
I.ROOT-SERVERS.NET.
192.36.148.17
J.ROOT-SERVERS.NET.
192.58.128.30
K.ROOT-SERVERS.NET.
193.0.14.129
L.ROOT-SERVERS.NET.
199.7.83.42
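
To compare that list against a known-good hints file, the name/address pairs can be pulled into a dictionary first. This is only a sketch: it assumes the strings output alternates a *.ROOT-SERVERS.NET. name with its IPv4 address on the following line, as in the listing above, and the function name is made up for illustration.

```python
import ipaddress


def parse_root_hints(lines):
    """Pair each *.ROOT-SERVERS.NET. name with the IPv4 address on the next line."""
    hints = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.upper().endswith(".ROOT-SERVERS.NET."):
            name = line
            continue
        if name is not None:
            try:
                hints[name] = str(ipaddress.IPv4Address(line))
            except ipaddress.AddressValueError:
                pass  # not an IPv4 address, so this name gets no entry
            name = None
    return hints
```

Any entry that differs from the published root hints would be worth a closer look; identical entries at least rule out a tampered binary as the source of the bad answers.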

Derelict

Ok. So what are the DNS servers configured on the client?

swix

@Derelict:

Ok. So what are the DNS servers configured on the client?

Set via DHCP, simply the router's ip address:


   option domain-name-servers 192.168.1.100;

Derelict

And you've actually verified that's the case on the client?
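
On a Linux client, one rough way to check is to read the resolver configuration rather than trust the DHCP server config. The sketch below assumes the client uses a plain /etc/resolv.conf (not a local stub resolver that hides the real upstream), and the helper names are hypothetical.

```python
def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # Lines starting with '#' or ';' are comments in resolv.conf.
        if not line or line[0] in "#;":
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def uses_only(resolv_conf_text, expected="192.168.1.100"):
    """True if at least one nameserver is set and all of them equal `expected`."""
    found = nameservers(resolv_conf_text)
    return bool(found) and all(server == expected for server in found)
```

Passing it the contents of /etc/resolv.conf should return True only if the router is the sole configured resolver.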

doktornotor

@swix:

Thanks for your update Trel, we're still searching here, but enabling DNSSEC does not stop the issue.

It does NOT stop the issue on domains that are not signed, no. Also, it will NOT prevent the DNS hijack if your clients are NOT using pfSense or another DNSSEC-enabled resolver, even if the zones are signed. It will prevent resolving domains to malicious crap for the rest.

Block/redirect all DNS queries on LAN to pfSense
Find and reimage infected crap.

Trel

@doktornotor:

Find and reimage infected crap.

I agree, but I would like to point out the way it affects pfsense/unbound is not a good thing at all.

A lookup on a completely isolated network segment made unbound start giving bad resolutions to ALL network segments when other DNS servers were permitted, and when they were blocked, unbound simply stopped replying.

That's not really the best outcome for a single computer looking up a bad domain.

swix

@Derelict:

And you've actually verified that's the case on the client?

Yes, and I also ran tests with dig @192.168.1.100 and directly from the pfSense shell. Same "poisoned" IP in every case (within a few seconds of a new occurrence of the issue starting).

@doktornotor:

It does NOT stop the issue on domains that are not signed, no. Also, it will NOT prevent the DNS hijack if your clients are NOT using pfSense or another DNSSEC-enabled resolver, even if the zones are signed. It will prevent resolving domains to malicious crap for the rest.

Yep, I supposed that too, but sometimes there are collateral effects to such settings.

@doktornotor:

Block/redirect all DNS queries on LAN to pfSense

Find and reimage infected crap.

It will continue tomorrow, now it is calm again, as everybody left the office :) But even if it is related to one malicious host on the LAN, it shouldn't be able to break the unbound resolver so easily…

Thanks again for all your feedback and until tomorrow!

Derelict

If pfSense/unbound asks the configured upstream DNS servers to resolve a query and gets something unexpected back it's not the fault of pfSense/unbound.

You need to be looking at these queries from the root back and see where things go wrong.

Very intriguing.

doktornotor

@Derelict:

If pfSense/unbound asks the configured upstream DNS servers to resolve a query and gets something unexpected back it's not the fault of pfSense/unbound.

Yes, exactly. Strongly suspect most of the people here are either using some hacked ISP device that hijacks the DNS traffic or the clients do not query the pfSense DNS resolver at all.

Derelict

Also, by obfuscating everything to example.com, you are eliminating the ability of everyone reading this thread from seeing what responses they get to the same queries.

Maybe someone else would get the BS responses and be in a better position to troubleshoot it than you are.

I would put this on LAN:

pass IPv4 TCP/UDP source LAN net dest ! 192.168.1.100 port 53 log

Put that above your normal pass rule. If everything is as you say, it should log nothing.

On pfSense 2.2 you should be able to set the dest to ! This Firewall (self).
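
To make it explicit what that rule would and would not catch, here is the same match logic as a small sketch. The 192.168.1.0/24 LAN subnet is an assumption on my part (only the router address 192.168.1.100 was given), and the function is purely illustrative, not how pf evaluates rules internally.

```python
import ipaddress

LAN_NET = ipaddress.ip_network("192.168.1.0/24")  # assumption: the LAN subnet
RESOLVER = ipaddress.ip_address("192.168.1.100")  # router address from this thread


def rule_would_log(src, dst, dst_port):
    """True if a TCP/UDP packet matches: source LAN net, dest ! resolver, port 53."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    return src_ip in LAN_NET and dst_ip != RESOLVER and dst_port == 53
```

So a client asking 8.8.8.8 on port 53 shows up in the log, while a client asking 192.168.1.100 does not. Any log entry at all means some host is bypassing the pfSense resolver.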

doktornotor

@Derelict:

Also, by obfuscating everything to example.com, you are eliminating the ability of everyone reading this thread from seeing what responses they get to the same queries.

Pretty sure I could get these guys involved in investigating the issue here (they've also written the Knot DNS server so I'm rather convinced they are familiar with DNS :P) – however that'd require either remote access or at least uncensored traffic captures. Not example.com -- totally useless.

Trel

@doktornotor:

@Derelict:

If pfSense/unbound asks the configured upstream DNS servers to resolve a query and gets something unexpected back it's not the fault of pfSense/unbound.

Yes, exactly. Strongly suspect most of the people here are either using some hacked ISP device that hijacks the DNS traffic or the clients do not query the pfSense DNS resolver at all.

Using Comcast with a modem only (not a gateway in bridged mode). Here's the block rule.
With these settings, if I try to look up the domain I get this scenario

When only unbound can be used and DNS Sec is set to ON, and port 53 is blocked except to pfsense
-A DNS lookup from any computer to one of the domains causes unbound to stop resolving anything, all lookups fail
(persists until unbound service is restarted)

I understand that an infected machine should not be on the network, but if a mere typical DNS lookup can cause this much havoc, then something is really wrong.

(attached image: restricted_dns.gif)

swix

@Derelict:

Also, by obfuscating everything to example.com, you are eliminating the ability of everyone reading this thread from seeing what responses they get to the same queries.
Maybe someone else would get the BS responses and be in a better position to troubleshoot it than you are.

It wasn't obfuscated, it really looked like that… (also with other domains, just replace example.com with anything)

@Derelict:

I would put this on LAN:
pass IPv4 TCP/UDP source LAN net dest ! 192.168.1.100 port 53 log
Put that above your normal pass rule. If everything is as you say, it should log nothing.

Ok, thanks, will set this up.

@doktornotor:

Yes, exactly. Strongly suspect most of the people here are either using some hacked ISP device that hijacks the DNS traffic or the clients do not query the pfSense DNS resolver at all.

I would be really happy to know the cause. It is really strange that Trel is having a similar problem with the very same target IP "195.22.26.248", especially from different countries/ISPs. The only recent change to our infrastructure was upgrading to pfSense 2.2 at the beginning of January, otherwise nothing special. But I'll set up some network monitoring tools later this week.

Best regards

Derelict

@Trel:

When only unbound can be used and DNS Sec is set to ON, and port 53 is blocked except to pfsense
-A DNS lookup from any computer to one of the domains causes unbound to stop resolving anything, all lookups fail
(persists until unbound service is restarted)

A search of redmine does not show that as an open issue. Have you reported it?

2chemlud

@Trel:

When only unbound can be used and DNS Sec is set to ON, and port 53 is blocked except to pfsense
-A DNS lookup from any computer to one of the domains causes unbound to stop resolving anything, all lookups fail
(persists until unbound service is restarted)

I cannot confirm this; it worked fine for me in this setup (with some service interruptions, 5-7 times a day)

Derelict

I'm not in a position to test this at the moment. Tonight.

Trel

@2chemlud:

@Trel:

When only unbound can be used and DNS Sec is set to ON, and port 53 is blocked except to pfsense
-A DNS lookup from any computer to one of the domains causes unbound to stop resolving anything, all lookups fail
(persists until unbound service is restarted)

I cannot confirm this; it worked fine for me in this setup (with some service interruptions, 5-7 times a day)

When you say interruptions, could those have been unbound not responding?
Someone did mention that one of the times I was unable to restart the service manually (as I was not available) it began working again after 45-50 minutes.

Either way though, as soon as I overrode the DNS for those sites, it's never happened again.

@Derelict:

I'm not in a position to test this at the moment. Tonight.

If you're going to test, try accessing and resolving

api-nyc01.exip.org

and

ns3.csof.net
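
If it helps, a test along these lines could be scripted. This is a rough sketch that uses whatever resolver the machine is configured with and flags the 195.22.26.248 address reported earlier in the thread; whether either name actually comes back with that address is exactly what the test is meant to find out. The function names are made up for illustration.

```python
import socket

SINKHOLE_IP = "195.22.26.248"  # address reported earlier in this thread
TEST_NAMES = ["api-nyc01.exip.org", "ns3.csof.net"]


def resolve_ipv4(name):
    """Return the sorted IPv4 addresses the system resolver gives for `name`."""
    try:
        infos = socket.getaddrinfo(name, None, family=socket.AF_INET,
                                   type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []  # lookup failed, e.g. the resolver stopped answering
    return sorted({info[4][0] for info in infos})


def is_poisoned(addresses, bad_ip=SINKHOLE_IP):
    """True if the known bad address appears among the answers."""
    return bad_ip in addresses


def check(names=TEST_NAMES, resolver=resolve_ipv4):
    """Map each name to whether its answers include the bad address."""
    return {name: is_poisoned(resolver(name)) for name in names}
```

Calling check() once against the test names, and again for an unrelated domain afterwards, would show whether the bad address starts appearing for everything. An empty answer list for all names would instead match the "unbound stops resolving" case.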