NAT sometimes blocking connections.
-
Hi,
I've recently deployed pfSense as our main office gateway and it's mostly been a success; however, I'm getting some odd connection issues with certain websites. Most things work fine, but specific sites either time out or load with some or all of their JS/CSS timing out. The problem seems to come and go, and once a host has failed it stays broken for everyone for a while.
I think it has to do with the browser opening a lot of connections at once, as it only happens on sites that pull in piles of cruft from lots (~10+) of different domains, and running adblock fixes it. Installing squid has fixed this for non-SSL sites, which leads me to think NAT is causing the problem.
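To test the "many connections at once" theory without a browser in the way, a burst of simultaneous TCP connections can be fired at one host and the failures counted. This is a minimal sketch, not a pfSense tool: `burst_probe`, the default port 443, the count of 20 and the 5-second timeout are all hypothetical choices meant to roughly mimic a browser loading a busy page.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def try_connect(host: str, port: int, timeout: float) -> bool:
    """Open one TCP connection and close it again; True if the handshake completed."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused, reset and timed-out connects (socket.timeout is an OSError).
        return False


def burst_probe(host: str, port: int = 443, count: int = 20,
                timeout: float = 5.0) -> tuple[int, int]:
    """Start `count` connections at roughly the same moment.

    Returns (successes, failures).
    """
    with ThreadPoolExecutor(max_workers=count) as pool:
        results = list(pool.map(lambda _: try_connect(host, port, timeout), range(count)))
    ok = sum(results)
    return ok, count - ok
```

For example, `print(burst_probe("cdns2.freepik.com"))` run from a LAN client while the site is misbehaving, compared with the same kind of test made from the gateway itself, would show whether failures only appear behind the NAT and only under a burst of connections.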
For example, one of our designers tried to access freepik.com, and everything loaded except content from cdns2.freepik.com. If you come back a bit later it might work; come back again and it might not. It's always the same host that fails (cdns1, 3 and 4 load fine). Connecting to it from the gateway works fine.
That particular host has been fixed by squid, however the same issue still happens sometimes with some SSL sites, so I'm hoping someone here has seen this issue before or can point me in the right direction for tracking things down.
I have pfSense running in a VM on a Linux host with KVM, which is all working fine (once I disabled hardware checksum offload). It has 4 VLANs configured, two for PPPoE WAN links and two as LAN networks. Again, this all seems to be working fine. I have gateway groups set, and it seems to be balancing as it should. NAT is all set to automatic.
I've disabled the multi-WAN setup for a day and it had no impact on the connectivity issues, and neither upstream link is overloaded. What else should I be doing to debug?
-
Are you sure it's loading stuff from cdns1, 3, 4, etc.? I don't even see those resolving.
I see cdns2.freepik.com pointing to some CNAMEs:
;; ANSWER SECTION:
cdns2.freepik.com. 3061 IN CNAME wac.9AA5.edgecastcdn.net.
wac.9AA5.edgecastcdn.net. 3265 IN CNAME gpla1.wac.v2cdn.net.
gpla1.wac.v2cdn.net. 3265 IN A 72.21.91.8
The TTLs are only 3600 seconds (1 hour), so if you're having an issue resolving, that could cause you problems.
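The answer above is a chain: two CNAMEs ending in one A record, each with its own TTL. The small sketch below parses exactly those three dig lines, follows the chain, and reports the smallest TTL, which is roughly how soon at least one link has to be looked up again. `parse_answer` and `follow_chain` are hypothetical helpers, and the parser assumes one record per name, which is true here but not in general.

```python
ANSWER = [
    "cdns2.freepik.com. 3061 IN CNAME wac.9AA5.edgecastcdn.net.",
    "wac.9AA5.edgecastcdn.net. 3265 IN CNAME gpla1.wac.v2cdn.net.",
    "gpla1.wac.v2cdn.net. 3265 IN A 72.21.91.8",
]


def parse_answer(lines):
    """Map each owner name to (ttl, type, data), using dig ANSWER SECTION lines."""
    records = {}
    for line in lines:
        name, ttl, _klass, rtype, data = line.split()
        # DNS names are case-insensitive; drop the trailing dot as well.
        records[name.rstrip(".").lower()] = (int(ttl), rtype, data.rstrip(".").lower())
    return records


def follow_chain(records, name, max_hops=10):
    """Follow CNAMEs from `name` to an A record.

    Returns (names visited, final address, smallest TTL seen on the way).
    """
    current = name.rstrip(".").lower()
    chain, ttls = [], []
    for _ in range(max_hops):
        ttl, rtype, data = records[current]
        chain.append(current)
        ttls.append(ttl)
        if rtype == "A":
            return chain, data, min(ttls)
        if rtype != "CNAME":
            raise ValueError(f"unexpected record type {rtype} for {current}")
        current = data  # continue at the CNAME target
    raise ValueError("CNAME chain too long or looping")


chain, address, ttl = follow_chain(parse_answer(ANSWER), "cdns2.freepik.com")
print(" -> ".join(chain), address, f"(part of the chain expires within {ttl}s)")
```

If any one of those three lookups fails when it comes up for renewal, the whole name fails to resolve for the client, even though the other links are still cached.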
Why do you think NAT has anything to do with it?
-
I could've sworn it was those before, but oh well.
The main issue is that some of the time cdns2 (and a bunch of other things) doesn't work through the NAT, and sometimes it does.
-
Again, what do you think NAT has to do with it? It's not like pfSense isn't going to NAT your connection. Are you behind a double NAT?
If you have problems with things loading, I would look at your connection or DNS as the problem. Are you using the resolver or the forwarder? 2.2 now defaults to the resolver (Unbound); are you using DNSSEC? Did you make the suggested changes, posted all over the board, for locking it down from the default settings?
A day or so ago someone was having an issue with an osha.gov site; their DNSSEC was broken. DNSSEC was enabled on pfSense, so it was doing exactly what it was told to do: if DNSSEC is not valid, don't return results.
-
Sorry I missed the last line of your post.
pfSense is doing NAT, and there's no double NAT or anything. When one of the hosts is failing it won't work from any computer on the LAN, but if I SSH to pfSense it's able to connect and make HTTP requests fine, and squid can also connect fine. Of the differences between connections from the pfSense box and connections from computers on the LAN, NAT seems the most likely culprit.
I'm using the resolver with DNSSEC enabled and mostly default settings. But the domains resolve fine, and as mentioned, connecting directly from pfSense itself works, so I don't think it's an upstream issue.
-
By the time you check it might have resolved, but the client did not get an answer and negatively cached it, so it doesn't even ask for it again. Clients all have their own DNS cache, browsers have their own cache as well, etc.
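A toy model of that negative caching: a lookup that failed once keeps returning "no answer" from the local cache, without asking upstream, until the negative entry expires, even if the upstream resolver has long since recovered. `StubCache` is hypothetical, and the 3600 s / 300 s lifetimes are arbitrary assumptions; real operating systems and browsers each apply their own rules.

```python
import time


class StubCache:
    """Hypothetical client-side cache that remembers failures as well as answers."""

    def __init__(self, resolve, pos_ttl=3600.0, neg_ttl=300.0, clock=time.monotonic):
        self.resolve = resolve      # callable: name -> address; raises LookupError on failure
        self.pos_ttl = pos_ttl
        self.neg_ttl = neg_ttl
        self.clock = clock
        self.entries = {}           # name -> (expires_at, address or None)
        self.upstream_queries = 0

    def lookup(self, name):
        now = self.clock()
        hit = self.entries.get(name)
        if hit is not None and hit[0] > now:
            return hit[1]           # None here means a cached *failure*
        self.upstream_queries += 1
        try:
            address = self.resolve(name)
            self.entries[name] = (now + self.pos_ttl, address)
        except LookupError:
            address = None
            self.entries[name] = (now + self.neg_ttl, None)
        return address
```

With a resolver that fails once and then recovers, every lookup inside the negative window still returns `None` and `upstream_queries` stays at 1, which is why checking from another machine (or from the gateway) a minute later can look perfectly healthy.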
If you're having an issue with sites (FQDNs) from a client, do a query from that client for the FQDN. Does it resolve? Look in the client's local cache; on Windows you can do that with:
ipconfig /displaydns    (displays the contents of the DNS Resolver Cache)
Restart your browser.
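For the "does it resolve from the client?" check, a small cross-platform sketch is below. It asks the operating system's resolver (and therefore its cache) via `socket.getaddrinfo`, not the browser's own cache, which is why restarting the browser remains a separate step. `resolve_from_client` and `report` are hypothetical helper names.

```python
import socket


def resolve_from_client(fqdn, port=443):
    """Resolve fqdn through this machine's OS resolver.

    Returns the sorted, de-duplicated addresses; raises socket.gaierror if
    the name does not resolve.
    """
    infos = socket.getaddrinfo(fqdn, port, proto=socket.IPPROTO_TCP)
    return sorted({sockaddr[0] for _family, _type, _proto, _canon, sockaddr in infos})


def report(fqdn):
    """One-line summary suitable for comparing several clients."""
    try:
        return f"{fqdn}: {', '.join(resolve_from_client(fqdn))}"
    except socket.gaierror as exc:
        return f"{fqdn}: FAILED ({exc})"
```

Running `print(report("cdns2.freepik.com"))` on an affected client while the site is broken shows whether the failure is at the name-resolution step or later, at the connection step.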
For the settings that should be enabled until 2.2.1 makes them the default, check out https://redmine.pfsense.org/issues/4402
If you're having issues with the resolver, speed, etc., try switching back to the old forwarder (dnsmasq) instead of the resolver (Unbound), or enable forwarding mode in the resolver. It's possible your ISP is doing something underhanded with DNS queries, and that could cause problems for your resolver.