Very strange DNS / Routing Issues
-
To be honest, I have no idea what's going on or even where to put this. I have a pfSense install giving me all sorts of fits. I believe it's DNS but how this is occurring is beyond me at the moment. This is an APU2 where:
WAN=Spectrum Cable. Static IP.
LAN=Various VLANS
OPT1=AT&T DSL. Dynamic local IP (no alternative since they say they can't do static IPs on DSL no matter who I talk to)
Gateway group is set to Spectrum as the primary and AT&T as the backup.
It's been working fine for months when suddenly the site started having problems with the AT&T DSL circuit. That shouldn't be an issue since it isn't the primary connection, except it keeps taking down the site. We've found that it links back to a DNS issue and an issue with Unbound. Maybe. This is where it starts confusing me.
If I run a dig command I get:
/root: dig @127.0.0.1 google.com a +trace
; <<>> DiG 9.12.2-P1 <<>> @127.0.0.1 google.com a +trace
; (1 server found)
;; global options: +cmd
. 85324 IN NS k.root-servers.net.
. 85324 IN NS c.root-servers.net.
. 85324 IN NS a.root-servers.net.
. 85324 IN NS l.root-servers.net.
. 85324 IN NS e.root-servers.net.
. 85324 IN NS j.root-servers.net.
. 85324 IN NS g.root-servers.net.
. 85324 IN NS h.root-servers.net.
. 85324 IN NS d.root-servers.net.
. 85324 IN NS f.root-servers.net.
. 85324 IN NS m.root-servers.net.
. 85324 IN NS b.root-servers.net.
. 85324 IN NS i.root-servers.net.
. 85324 IN RRSIG NS 8 0 518400 20200609050000 20200527040000 48903 . x+IFXbC176ngC/oUvUwi2gtwz3zqxwsXjJMqTcR69ob+u2xNYYnJgWh6 MSdy11N+T2rpW+6E0LnZdJDIybKOEdU3E7dSYfXOI1tOhwJDVypRF20k Vwir5IM8fLCQDnc/1tPVw2wWFNcdudPY9CrviQuIcs5tcB3EfQG//F/p hSxzfsYA5+ZHQPnhMFf6ONl9oLkQJuURpyNQOmbjI64Q0nUbAp9JMLix Pbw82kVujOBpbYIFPXUUMCtQAKdBKhdmSRhbua7Ttt02OkOn9jFVYA5j B5X88RUW4y1hnXJyU9VMdlu44+U8BhpBI62jIJ2XstQqZ4Li2plZFXbp 3tgHmw==
;; Received 525 bytes from 127.0.0.1#53(127.0.0.1) in 0 ms
com. 172800 IN NS a.gtld-servers.net.
com. 172800 IN NS b.gtld-servers.net.
com. 172800 IN NS c.gtld-servers.net.
com. 172800 IN NS d.gtld-servers.net.
com. 172800 IN NS e.gtld-servers.net.
com. 172800 IN NS f.gtld-servers.net.
com. 172800 IN NS g.gtld-servers.net.
com. 172800 IN NS h.gtld-servers.net.
com. 172800 IN NS i.gtld-servers.net.
com. 172800 IN NS j.gtld-servers.net.
com. 172800 IN NS k.gtld-servers.net.
com. 172800 IN NS l.gtld-servers.net.
com. 172800 IN NS m.gtld-servers.net.
com. 86400 IN DS 30909 8 2 E2D3C916F6DEEAC73294E8268FB5885044A833FC5459588F4A9184CF C41A5766
com. 86400 IN RRSIG DS 8 1 86400 20200609050000 20200527040000 48903 . VxYSXULvWokvozpN6YxoRBsq+qWMDXKyc5I/bTXVX7cITbZFyeUQ/y22 yEFtZWD14KGM50ap4dH9s2DvvEzz1A/YUoOE2gbDwplBlRKcBIx0gsc8 QOR7/siG6btp8PFFG59MUdD9w14cXbeZl9Kncl+BY+Xck9tM0ij7Ig5X SXed9UELwNd+VwEAXtOCI/i4aXW+FIB5/gmq/0LOVlVYL514Eu4Nl1y5 tGNISDlrM5du4sJ1jkL8MfyyfcCnFZiVatikqJDTcrOrIxbyTk47/hvh RgBBU2ZGjvUy3HJK6nUq5cq0ghzibOVI196RzkxvjHGDaSZysThAHw99 l6oU7Q==
couldn't get address for 'a.gtld-servers.net': not found
;; Received 1170 bytes from 198.97.190.53#53(h.root-servers.net) in 41 ms
google.com. 0 IN A 192.168.1.254
;; Received 44 bytes from 192.168.1.254#53(d.gtld-servers.net) in 1 ms
Why on earth is it resolving to 192.168.1.254? That's the upstream of the DSL connection. Locally it's giving me a 192.168.1.x IP with a 192.168.1.254 gateway.
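One way to read that trace: the last line says the answer came from 192.168.1.254 while claiming to be d.gtld-servers.net. A minimal Python sketch, assuming dig's usual ";; Received ... from IP#port(name)" line format, that flags any responder with a private (non-loopback) address. The function name private_responders is made up for illustration:

```python
import ipaddress
import re

# Matches dig +trace lines such as:
# ;; Received 44 bytes from 192.168.1.254#53(d.gtld-servers.net) in 1 ms
RECEIVED = re.compile(r";; Received \d+ bytes from ([0-9A-Fa-f.:]+)#\d+\(([^)]+)\)")

def private_responders(trace_text):
    """Return (ip, claimed_name) pairs for answers that came from private space."""
    flagged = []
    for ip_text, name in RECEIVED.findall(trace_text):
        ip = ipaddress.ip_address(ip_text)
        # is_private also covers loopback, so skip the local resolver itself.
        if ip.is_private and not ip.is_loopback:
            flagged.append((ip_text, name))
    return flagged

# The three "Received" lines from the trace above.
trace = """\
;; Received 525 bytes from 127.0.0.1#53(127.0.0.1) in 0 ms
;; Received 1170 bytes from 198.97.190.53#53(h.root-servers.net) in 41 ms
;; Received 44 bytes from 192.168.1.254#53(d.gtld-servers.net) in 1 ms
"""
for ip, name in private_responders(trace):
    print(f"{name} answered from private address {ip}")
```

A real gTLD server would never answer from a private address, so a hit here suggests something on the DSL side is answering port 53 on behalf of everyone.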
So, I thought maybe an update from 2.4.4_3 to 2.4.5 might be a good idea. Sometimes it tells me I'm on the latest version. If I refresh enough it tells me there is an update to 2.4.5 available. If I keep refreshing it bounces between the two. It fails here:
[118/136] Fetching check_reload_status-0.0.8.txz: .... done
[119/136] Fetching ccache-3.7.1.txz: .......... done
[120/136] Fetching ca_root_nss-3.51.txz: .......... done
[121/136] Fetching bsnmp-ucd-0.4.4.txz: ... done
[122/136] Fetching bind-tools-9.14.12.txz: .......... done
[123/136] Fetching python37-3.7.7.txz: .......... done
Certificate verification failed for /C=US/ST=MA/L=Lowell/O=Arris Group, Inc./OU=Telco CPE/CN=dsldevice.domain_not_set.invalid
12972724:error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed:/usr/local/poudriere/jails/pfSense_v2_4_5_amd64/usr/src/crypto/openssl/ssl/s3_clnt.c:1269:
Certificate verification failed for /C=US/ST=MA/L=Lowell/O=Arris Group, Inc./OU=Telco CPE/CN=dsldevice.domain_not_set.invalid
12972724:error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed:/usr/local/poudriere/jails/pfSense_v2_4_5_amd64/usr/src/crypto/openssl/ssl/s3_clnt.c:1269:
pkg-static: https://pkg.pfsense.org/pfSense_v2_4_5_amd64-pfSense_v2_4_5/All/isc-dhcp44-server-4.4.1_4.txz: Authentication error
Failed
Why is it looking at the certificate from the DSL Arris modem? That makes no sense at all!
Next, I disabled the AT&T gateway from the gateway groups. The dig result was the same. Finally, I disabled the OPT1 interface entirely and the problem appears to have gone away.
This is very confusing to me. Does anyone have any insight as to what's going on? It seems almost like a MITM on the Arris DSL modem.
-
@Stewart said in Very strange DNS / Routing Issues:
Why is it looking at the certificate from the DSL Arris modem? That makes no sense at all!
You've already answered that yourself:
@Stewart said in Very strange DNS / Routing Issues:
Why on earth is it resolving to 192.168.1.254?
If DNS requests keep coming back with 192.168.1.254 then, yes, the update software will contact 192.168.1.254 for the updates.
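That would also explain the certificate: if the name resolves to the modem, the TLS handshake lands on the modem, which presents its own certificate. A rough Python sketch that pulls the CN out of the subject string in the error above and compares it with the host pkg was trying to reach. parse_subject is a hypothetical helper, and a plain CN comparison is only an illustration, not how pkg or OpenSSL actually validates a certificate:

```python
from urllib.parse import urlparse

def parse_subject(subject):
    """Split an OpenSSL one-line subject like /C=US/O=Example/CN=host into a dict."""
    fields = {}
    for part in subject.strip("/").split("/"):
        key, _, value = part.partition("=")
        fields[key] = value
    return fields

# Subject and URL copied from the failed upgrade output.
subject = ("/C=US/ST=MA/L=Lowell/O=Arris Group, Inc./OU=Telco CPE"
           "/CN=dsldevice.domain_not_set.invalid")
url = ("https://pkg.pfsense.org/pfSense_v2_4_5_amd64-pfSense_v2_4_5"
       "/All/isc-dhcp44-server-4.4.1_4.txz")

expected_host = urlparse(url).hostname
presented_cn = parse_subject(subject)["CN"]
if presented_cn != expected_host:
    print(f"wanted {expected_host}, got a certificate for {presented_cn}")
```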
That's messy, and the certificate problem is just the tip of the iceberg. Why 192.168.1.254? I don't know.
What about shutting down that OPT1 interface?
Or start by detailing your DNS setup.
@Stewart said in Very strange DNS / Routing Issues:
Sometimes it tells me I'm on the latest version
2.4.5 came out last month, or the month before. https://www.netgate.com/blog/pfsense-2-4-5-release-now-available.html
-
I'm sorry. I'm not understanding what you wrote. Unbound has forwarding disabled so it should be doing its own resolving. I'm not sure what else you would want me to detail.
Network Interfaces: LAN+All VLANS
Outgoing Network Interfaces: LAN + OPT1
DNSSEC: ON
Forwarding: OFF
DHCP Registration: ON
Static DHCP: ON
OpenVPN Clients: ON
As for the version issue: just refreshing changes whether it says I'm on the latest version or that a new version is available. If I keep refreshing, it switches back and forth. I'm assuming that is because sometimes the response comes in on the WAN and sometimes on OPT1; however, I would expect it to either correctly show 2.4.5 or just fail. Since I disabled OPT1 it's been correct every time I refresh.
I know there is some kind of DNS issue on the AT&T side. I'm facing 2 issues that I see:
- Why DNS queries are sent out of OPT1 when the routing is still going out of WAN.
- Why DNS over the DSL link is failing and returning the wrong info, showing google.com at 192.168.1.254, with the Arris modem presenting its own certificate. I assume that is simply what the modem is sending.
My best guess at this moment is that the DSL modem has been reset and is intercepting all of the traffic because it's waiting for a user to log in and activate. That's the only thing I can think of that would cause it to behave in this way. (I HATE DSL). I suppose what I need to know, then, is how to limit DNS queries to go out the interface that is the current route. I don't want queries going out OPT1 when routing has the data going over the WAN.
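On the first question, a possible explanation, offered as an assumption rather than a diagnosis: the Unbound documentation says that when several outgoing interfaces are configured, outgoing queries are sent via a random one of them. With LAN + OPT1 selected as Outgoing Network Interfaces, some share of queries would then leave via OPT1 regardless of the gateway group. A toy Python simulation of that random choice (the interface labels and simulate_queries are illustrative only, and this models nothing about pfSense's actual routing):

```python
import random
from collections import Counter

def simulate_queries(outgoing_interfaces, n_queries, seed=0):
    """Count how many queries leave via each interface when one is picked at random."""
    rng = random.Random(seed)  # seeded so repeated runs give the same counts
    return Counter(rng.choice(outgoing_interfaces) for _ in range(n_queries))

# Settings as posted: both interfaces are candidates, so both see queries.
print(simulate_queries(["LAN", "OPT1"], 1000))

# With a single outgoing interface there is nothing to choose between.
print(simulate_queries(["WAN"], 1000))
```

If this is what is happening, it would also fit the update check flipping between two answers on refresh, and selecting only the interface that should carry DNS traffic as the outgoing interface would be the thing to test.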