First post.... Lan/some Vlans cant get to website, some vlans can
-
I've got an issue that is probably something stupid, but I can't get my head wrapped around it.
pfSense has been running for about a week (switched over from a SonicWall TZ350). Base LAN and 4 VLANs, 2 WANs configured for failover. I have 2 gateway groups running the failovers. Around 250 devices, half on wireless.
OK, We have 2 domains hosted on hostgator, with some tools we use there. They also host our emails, etc. A couple times in the past 2 days the web tools become unreachable. I thought Hostgator blocked us, but I switch gateways to the second ISP and the issue persists. The main LAN and vlan7 (secondary internal lan) are affected, but vlan 2 is not, and our guest vlan is not.
What would cause some vlans to lose connectivity at the same time, but not others on the same external IP/gateway? Also, email and ping are fine, just http. The issue resolves itself in a couple hours each time.
Obviously I've been all over the firewall rules, but they wouldn't be intermittent. I was wondering if something sensed strange traffic and started blocking? I've searched all the block lists and tables and just do not see it.
Sorry, I'm sure I'm missing a ton of info you guys need to help, but my brain is mush after 4 days of chaos with this changeover. It's probably something simple I just can't see.
Thanks for any ideas.
-
@VillageIT said in First post.... Lan/some Vlans cant get to website, some vlans can:
I was wondering if something sensed strange traffic and started blocking? I've searched all the block lists and tables and just do not see it.
This implies you have installed one of the available, but optional, third-party packages that block stuff. Examples would be pfBlockerNG, Snort, Suricata, or SquidGuard. Have you installed any of those? If you have, the very first troubleshooting step is to turn those packages off and see if the problem disappears. If it does, then add each package back one at a time and test in between. That will help you isolate the package that is causing the problem. Then you can post a query in the appropriate sub-forum here for that package to get specific help.
If you have no third-party packages installed, then it almost would have to be a physical connectivity issue or something funky with your network switches. pfSense will not just randomly block stuff with no third-party blocking packages installed.
One other remote possibility is asymmetric routing, but really can't make a judgement call there without a full network diagram to examine.
-
@bmeeks
Thank you for the reply.
I did in fact install Suricata, but haven't enabled it, so I didn't see that as a threat.
I agree it could be in the switchgear somewhere, but we run HPE switches for wired and Ubiquiti for our wireless (driving Ubiquiti access points). The issue happens on both wired and wireless connections. I will research more on this.
I hadn't thought of asymmetric routing. I'll see if I can prove/disprove that as well.
Thank you.
-
@VillageIT said in First post.... Lan/some Vlans cant get to website, some vlans can:
I hadn't thought of asymmetric routing. I'll see if I can prove/disprove that as well.
The only really weird thing is you said only HTTP was impacted. Does that also include HTTPS, or just HTTP? Asymmetric routing would be expected to affect everything from a given host, and not just a single protocol.
Are you sure your issue is a true loss of connectivity, or could it be a DNS resolution failure? One thing folks sometimes do is enable the option in the DHCP Server to "Register Leases in DNS". That can cause problems with unbound, the DNS resolver used in pfSense. Each time a DHCP lease renews, the unbound service will be restarted so that it will read in the hosts from the dhcpleases file. While restarting, unbound is unable to perform DNS lookups. To web browsers, that can look like the connection is down. Check and be sure that option under the DHCP Server is unchecked on all the defined interfaces (including the VLANs).
-
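One way to tell a DNS resolution failure apart from a true loss of connectivity is to test the two steps separately. Below is a minimal Python sketch of that idea, not anything built into pfSense. The function names classify and probe and the host tools.example.com are made up for illustration, and port 443 assumes the web tools are served over HTTPS.

```python
import socket


def classify(resolved, connected):
    """Turn the two check results into a short label."""
    if not resolved:
        return "dns-failure"
    if not connected:
        return "connect-failure"
    return "ok"


def probe(host, port=443, timeout=3.0,
          resolver=socket.getaddrinfo,
          connector=socket.create_connection):
    """Resolve the name first, then try a TCP connection to the first address.

    resolver and connector are parameters only so the logic can be
    exercised without touching the network.
    """
    try:
        infos = resolver(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # The name could not be resolved at all: a DNS problem.
        return classify(False, False)

    address = infos[0][4][0]  # first resolved IP address
    try:
        conn = connector((address, port), timeout=timeout)
        conn.close()
    except OSError:
        # The name resolved, but the TCP connection failed or timed out.
        return classify(True, False)
    return classify(True, True)


if __name__ == "__main__":
    # Hypothetical host name; substitute one of the affected sites.
    print(probe("tools.example.com"))
```

Running it from a client on an affected VLAN and from one on a working VLAN while the problem is happening should show which step differs between them.
-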
@bmeeks
Good points. It’s only to two specific sites. The devices seem to have no problem finding any other sites.
I’m wondering if there is something on the DNS side causing it not to find those specific sites. Wonder if I should map it?
-
@VillageIT said in First post.... Lan/some Vlans cant get to website, some vlans can:
on the DNS side causing it not to find those specific sites. Wonder if I should map it?
What does "map it" mean - this is not a technical term I have ever heard in any sort of relation to dns or networking in general. map a network file share ok..
What is different about vlan 2 and 7? Do they use a different gateway from your groups? Are you pointing clients to different dns?
-
@VillageIT said in First post.... Lan/some Vlans cant get to website, some vlans can:
It’s only to two specific sites.
This would have been some very useful information to have shared in your very first post. That's a totally different scenario than I gleaned from reading your first post.
If it's just two sites, then DNS is where you need to laser focus your efforts.
-
@johnpoz
I was thinking of host overrides or domain overrides. I believe they can be added manually. Not sure if that would eliminate the issues, but it might eliminate possible causes, right?
-
@bmeeks
"OK, We have 2 domains hosted on hostgator, with some tools we use there. They also host our emails, etc. A couple times in the past 2 days the web tools become unreachable. I thought Hostgator blocked us, but I switch gateways to the second ISP and the issue persists. "
I thought I did in this sentence. Sorry.
I'm completely immersed in this network, so it's hard for me to see the information I'm not relaying.
-
@VillageIT said in First post.... Lan/some Vlans cant get to website, some vlans can:
@bmeeks
"OK, We have 2 domains hosted on hostgator, with some tools we use there. They also host our emails, etc. A couple times in the past 2 days the web tools become unreachable. I thought Hostgator blocked us, but I switch gateways to the second ISP and the issue persists. "
I thought I did in this sentence. Sorry.
I'm completely immersed in this network, so it's hard for me to see the information I'm not relaying.
Okay, so you are saying the two domains hosted on the external (to your pfSense firewall) host site fail to be reachable, but email traffic to the same host works? Or is the email traffic getting directed to a different host site? The MX record for your domain can potentially send email servers to anywhere for exchanging mail destined for your domain.
So are your web tools that periodically fail to be reachable and your email server both hosted at the same place?
You appear to have 3 VLANs configured: guest, vlan2, and vlan7. And these three VLANs are using the physical LAN interface as their parent. One red flag in your problem description is that vlan7 is a "secondary internal LAN", and LAN is your "main LAN". That is definitely a little strange sounding. Can you show this in a diagram? This setup, if I understand it correctly, seems ripe for misconfiguration.
-
@bmeeks
Yes, the mail server resolves to the same IP the domains point to. That said, I could try using the MX listing and see where I get. Yes, they appear to be hosted in the same place.
I took over a facility (large senior living complex) over 3 years ago and it's been a battle. They had the LAN subnet on a /24 and I had 11 free IPs when I got there. They had been telling employees to turn off their phones so they wouldn't use the network, and telling day shift to power off their tablets when night shift took over. As a stopgap I added the .60/22 and pushed all corporate wifi clients to it.
I actually have the base LAN, we can call it .68/24. From the same interface we have VLAN 2 .69/24 (phones), VLAN 5 .40/24 (new phones), .60/22 which is the VLAN 7 I referred to, and a guest VLAN with a 192.168 subnet.
I was working to get the subnets/VLANs separated and of suitable size so that at some point I could merge and put all internal devices on our base LAN network. I just moved the phones to a new .40 VLAN so I could make the .68 a /23, but I ran into this little web snag.
I struggled with the SonicWall for over 2 years and much of my battle was a pair of failing interfaces. Can you imagine spending 3 hours a week over a 2 year period thinking you were crazy as the stupid router wouldn't load balance and the wifi LAN would just drop randomly, and everyone including SonicWall kept telling me it was config errors? I'm sure full time IT guys get that from time to time, but it was a first for me.
Instead of replacing it in kind I opted for PFsense. I had used ClearOS for various routing systems since about 2004 when it was Clarkconnect.
We are a 24/7/365 facility and live network (and wifi) is extremely critical. I had it offline for ~20 minutes at 2am when I switched over to the PFsense box and had a dozen complaints from employees and residents.
So far I'm having good luck and very much enjoying the pfSense box I'm running, though my hardware is extreme overkill. I have a few gremlins in the system I need to sort out, but thus far it's so superior to the SonicWall in every meaningful way that I feel silly for not getting this issue sorted.
Thank you very much for the direction thus far. I'm feeling it may be a DNS issue as mentioned. In the SonicWall the DHCP server listed our domain server as first DNS, and I copied that over to the pfSense, as well as 8.8.8.8 and 8.8.4.4. The pfSense already lists the domain server as such in the tables, so it's quite possible that the pfSense is not handling it the same as the SonicWall. That'll be my first change in the morning.
-
@VillageIT said in First post.... Lan/some Vlans cant get to website, some vlans can:
. In the sonicwall the DHCP server listed our domain server as first DNS, and I copied that over to the PFsense, as well as 8.8.8.8 and 8.8.4.4.
So you're handing out DNS other than your AD to your clients? That is going to be problematic for sure. You really have little control over which DNS a client will use when you list more than one.
Clients should only ever point to DNS servers that resolve the same stuff.. 8.8.8.8 for sure isn't going to have a clue about your AD records.
Out of the box pfSense resolves - did you change this to forwarding? In an AD shop I would recommend DNS and DHCP be handled by your AD.. clients should point to and get DHCP from your AD.
You can then have your AD forward to pfSense, which can then either just resolve or you could set it up to forward to whatever external DNS service you want to use.
You could also just point your clients to pfSense, and have a domain override set up for your AD domain.. But if you're an MS shop, and using AD, it's just easier to let the AD handle DNS and DHCP.
But clients should only ever talk to DNS servers that can resolve the same thing.. Let's say you set up X and Y for DNS.. And X filters, and Y does not.. If your client asks X you're filtered, but if it asks Y you're not.. So are you actually filtered? You have no idea which DNS server a client might decide to talk to for any given query.
-
@VillageIT:
The number one rule for DNS is that all the DNS servers you hand out to your clients for them to use must all resolve everything the same. By that I mean each DNS server that a client might choose to use must be able to return the same IP for any domains/hosts a client might ask for (or at least a valid IP for the host - CDNs may return different IPs for different queries, but all the IPs would work).
Let's first start with a common misconception among admins. DNS servers are not necessarily referenced in the order they are listed/provided to a client. Admins tend to believe clients will always use the first one listed, and if that one is not working, then skip down to the next one, etc. Many clients will instead randomly choose one from the list they are given, and then continue to use only that one until either it becomes unavailable or the client is rebooted and starts the random choice selection again. So, when you put three DNS servers in your client DHCP configuration, your clients (phones, PCs, and tablets) will likely randomly choose one of those three to use. And the real kicker is each individual client may choose its own unique server. That means you could have three clients on your network and one of them is using your domain server, but the others are using one each of the two Google servers. That's a recipe for confusion - most especially so if your domain server contains records that the Google servers do not.
Here is the potential problem point. You said the first one is your "domain server". Do you mean you have Windows Active Directory and an AD domain, or what exactly do you mean by "domain server"? If you mean Windows AD, then @johnpoz's point becomes extraordinarily critical. Your clients must always consult only your AD domain server for information about AD hosts because that is the only server that knows about those. Google's servers will have no clue about your local AD hosts, and network clients asking Google's DNS servers about local AD resources will be told "no such domain".
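To make that concrete, here is a toy Python simulation of the scenario: three DNS servers handed out by DHCP, only one of which knows the local AD records. Everything in it is hypothetical (the server labels, the host names, the documentation-range IPs), and the uniform random pick is a simplification of what real clients do.

```python
import random

# Hypothetical records. Only the AD DNS server knows the internal host.
INTERNAL = {"files.corp.example": "192.0.2.10"}
PUBLIC = {"www.example.com": "198.51.100.20"}


def make_server(knows_internal):
    """Build a fake DNS server; public names resolve everywhere."""
    def answer(name):
        if name in PUBLIC:
            return PUBLIC[name]
        if knows_internal and name in INTERNAL:
            return INTERNAL[name]
        return "NXDOMAIN"
    return answer


# The three servers a client might receive via DHCP in this scenario.
SERVERS = {
    "ad-dns": make_server(True),
    "8.8.8.8": make_server(False),
    "8.8.4.4": make_server(False),
}


def simulate(clients, name, seed=0):
    """Each client picks one server at random and asks it for `name`."""
    rng = random.Random(seed)
    labels = sorted(SERVERS)
    results = {}
    for client in clients:
        server = rng.choice(labels)
        results[client] = (server, SERVERS[server](name))
    return results


if __name__ == "__main__":
    lookups = simulate(["pc1", "phone1", "tablet1"], "files.corp.example")
    for client, (server, reply) in lookups.items():
        print(f"{client}: asked {server} -> {reply}")
```

Any client that happens to land on one of the public servers gets NXDOMAIN for the internal name, while public names resolve the same everywhere, which is why the problem can look intermittent and client-specific.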
If you have a local AD server, then you either specify that server to clients as the only DNS they should use, or you create a domain override in pfSense that points the unbound resolver there to your "domain server" for any lookups related to your local AD.
-
@bmeeks said in First post.... Lan/some Vlans cant get to website, some vlans can:
Let's first start with a common misconception among admins.
Common is an understatement to be honest - I have seen this so many times and in so many customer networks.. It drives me crazy.. And I have seen who I thought were very knowledgeable techs do this sort of nonsense - and wonder why they have weird issues..
As always spot on post @bmeeks ...
-
Allow me to follow up my previous post with one other common DNS misconception.
Some admins believe that when multiple DNS servers are provided to a client, and that client asks Server #1 for domain something.com and Server #1 says "can't find it", that the client will then automatically pass the same request along to Server #2 and so on to each DNS server the client was provided.
That is totally incorrect! When Server #1 says "domain not found" (or actually, it will return NXDOMAIN), the client gives up and never asks any other DNS servers for that record. That's because the first server said "that domain does not exist", so the client figures there is no use to pursue the quest any further.
-
@johnpoz said in First post.... Lan/some Vlans cant get to website, some vlans can:
@VillageIT said in First post.... Lan/some Vlans cant get to website, some vlans can:
. In the sonicwall the DHCP server listed our domain server as first DNS, and I copied that over to the PFsense, as well as 8.8.8.8 and 8.8.4.4.
So you're handing out DNS other than your AD to your clients? That is going to be problematic for sure. You really have little control over which DNS a client will use when you list more than one.
Clients should only ever point to DNS servers that resolve the same stuff.. 8.8.8.8 for sure isn't going to have a clue about your AD records.
Out of the box pfSense resolves - did you change this to forwarding? In an AD shop I would recommend DNS and DHCP be handled by your AD.. clients should point to and get DHCP from your AD.
You can then have your AD forward to pfSense, which can then either just resolve or you could set it up to forward to whatever external DNS service you want to use.
You could also just point your clients to pfSense, and have a domain override set up for your AD domain.. But if you're an MS shop, and using AD, it's just easier to let the AD handle DNS and DHCP.
But clients should only ever talk to DNS servers that can resolve the same thing.. Let's say you set up X and Y for DNS.. And X filters, and Y does not.. If your client asks X you're filtered, but if it asks Y you're not.. So are you actually filtered? You have no idea which DNS server a client might decide to talk to for any given query.
Very true... I guess I set it up that way because "it's how it's always been". Yes, active directory. There are 22 devices that utilize the AD server (hosted on a synology diskstation in the main IT room). I had left the 8.8.8.8 in the secondary mostly because it had always been, and I felt that if the AD failed the hosts could at least resolve outside. Very good information you both share about hosts not always using the order specified. I'll definitely look to adjust how we handle this. I'm liking the domain override route as it allows me to avoid running everything through the synology. Maybe the synology would be capable, but I hate to have the whole network dependent on it.
Thank you so very much for bearing with me. I'm not a full time IT admin (outside our properties that is). I'm fairly capable, but I'm struggling to make changes wholesale that risk losing connectivity.
-
@bmeeks said in First post.... Lan/some Vlans cant get to website, some vlans can:
Allow me to follow up my previous post with one other common DNS misconception.
Some admins believe that when multiple DNS servers are provided to a client, and that client asks Server #1 for domain something.com and Server #1 says "can't find it", that the client will then automatically pass the same request along to Server #2 and so on to each DNS server the client was provided.
That is totally incorrect! When Server #1 says "domain not found" (or actually, it will return NXDOMAIN), the client gives up and never asks any other DNS servers for that record. That's because the first server said "that domain does not exist", so the client figures there is no use to pursue the quest any further.
I was not previously aware of that either. Seems like an extremely basic concept, but I hadn't thought of it. I'll be adjusting the DNS layout definitely.
Thank you for the pointers. The answers here are exactly what I was hoping for when I posted.
-
@VillageIT said in First post.... Lan/some Vlans cant get to website, some vlans can:
I'm liking the domain override route as it allows me to avoid running everything through the synology.
Don't forget to add a reverse PTR domain override as well for your AD domain. You should place two domain overrides in your pfSense DNS Resolver configuration: one for the actual domain name and one reverse PTR, so that IP address to hostname lookups will work in AD. You can see what both of these overrides should be named by looking at the current setup in your Windows AD DNS.
Secondly, I would ditch the Google servers altogether. Not needed. Set the DNS Resolver on pfSense to "resolve". That is the default. If you checked the "Forward" checkbox earlier, uncheck it. Also remove any DNS servers you may have entered on the SYSTEM > GENERAL setup tab. Leave those blank. pfSense comes out of the box with a ready to run DNS resolver - unbound. You don't have to change a single thing from the defaults for that to work. All you need to add are the two domain overrides I mentioned (one for your AD domain name and one for a reverse PTR scope).
Make sure DHCP on pfSense (if that's where your DHCP is configured) is set up to give clients the pfSense box as their only DNS server. Do not give your clients multiple DNS servers. If you do domain overrides, then give the clients only the pfSense box IP for their DNS server. If you are using DHCP within Windows AD, then configure that server to hand out the pfSense box IP as the "DNS" server clients should use.
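For the reverse PTR override, the zone name conventionally follows the in-addr.arpa pattern derived from the subnet. The Python sketch below shows how such names are formed for IPv4. It is only an illustration: the function name reverse_zones and the 10.0.x.x addresses are made up (the real subnets aren't given in full in this thread), and the names already present in your Windows AD DNS remain the authority on what to enter.

```python
import ipaddress


def reverse_zones(cidr):
    """Return the /24-sized in-addr.arpa zone names covering an IPv4 subnet."""
    net = ipaddress.ip_network(cidr, strict=False)
    if net.version != 4:
        raise ValueError("this sketch only handles IPv4")
    if net.prefixlen < 16:
        raise ValueError("subnet too large for this sketch")
    if net.prefixlen > 24:
        # Smaller than a /24: fall back to the enclosing /24 zone.
        net = net.supernet(new_prefix=24)

    if net.prefixlen == 24:
        blocks = [net]
    else:
        # e.g. a /22 spans four /24 blocks, each with its own zone name.
        blocks = list(net.subnets(new_prefix=24))

    zones = []
    for block in blocks:
        a, b, c, _ = str(block.network_address).split(".")
        # Reverse zones list the octets in reverse order.
        zones.append(f"{c}.{b}.{a}.in-addr.arpa")
    return zones


if __name__ == "__main__":
    # Hypothetical subnets standing in for a /24 LAN and a /22 VLAN.
    for subnet in ("10.0.68.0/24", "10.0.60.0/22"):
        print(subnet, "->", ", ".join(reverse_zones(subnet)))
```

A /22 like the one mentioned earlier in the thread does not fall on an octet boundary, so it would typically need either several /24-sized reverse overrides or one broader override, depending on how the AD reverse zones are laid out.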
-
@VillageIT said in First post.... Lan/some Vlans cant get to website, some vlans can:
"it's how it's always been".
That is another one of those misconceptions if you ask me.. Network guy takes over another network - and just goes with the flow, since it seems to be working..
I see that all the time as well - when I start working with a new customer.. And you ask the network guy(s) hey why do you have it set up this way.. Because that was the way it was when we got here is a very common answer..
One customer network drove me crazy with this sort of nonsense.. Why do you have a /16 for your printer vlan? You have like maybe 100 printers tops in the whole company.. Why do you not set gateways on the printers?? That's the way it was when we got here.. They were using proxy arp so the other networks could talk to the printers.. Vs just setting the gateway up on them via dhcp or static.. Was like WTF ;)
-
@VillageIT said in First post.... Lan/some Vlans cant get to website, some vlans can:
@bmeeks said in First post.... Lan/some Vlans cant get to website, some vlans can:
Allow me to follow up my previous post with one other common DNS misconception.
Some admins believe that when multiple DNS servers are provided to a client, and that client asks Server #1 for domain something.com and Server #1 says "can't find it", that the client will then automatically pass the same request along to Server #2 and so on to each DNS server the client was provided.
That is totally incorrect! When Server #1 says "domain not found" (or actually, it will return NXDOMAIN), the client gives up and never asks any other DNS servers for that record. That's because the first server said "that domain does not exist", so the client figures there is no use to pursue the quest any further.
I was not previously aware of that either. Seems like an extremely basic concept, but I hadn't thought of it. I'll be adjusting the DNS layout definitely.
Thank you for the pointers. The answers here are exactly what I was hoping for when I posted.
Yeah, the only way multiple DNS servers on a client would really come into play is if one of those servers failed to respond to a query at all.
To simplify things a bit for this example, you can pretend that the first thing a client does during a DNS request is to "ping" the target DNS server, and if that server fails to answer at all, then it will mark that one as "dead" and try the next one in the list. Once it finds the first DNS server that answers, it will mark that one as "the one we use" and that's it. The client will not choose another one unless this one also eventually fails to respond.
Now the client does not literally do an ICMP ping; it instead executes a query over DNS port 53 (or 853 if using TLS) to the server. If it fails to get any reply back, it waits for the configured DNS timeout period and then tries the next DNS server in the list. Once any DNS server answers the initial query, the client will then use only that server for future DNS requests (until it fails to respond).
But key in the above is that "failure to respond" means the equivalent of "did not answer a ping". The server may instead answer the "ping" but say "I can't find the host or domain you asked for". The client says okay, forget it, and will not ask any other server. Having multiple DNS servers listed is for actual DNS server failures -- it is not to have a list of servers you ask in succession until one of them gives you an answer for some host or domain.
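The behaviour described above can be modelled in a few lines of Python. This is a simplified sketch of the logic, not how any particular operating system's resolver is implemented: the names stub_lookup and NO_REPLY, the server labels, and the records are all invented for the example, and real clients add caching, retries, and server rotation on top.

```python
# Sentinel meaning "the server never replied" (a timeout), which is
# different from a negative answer such as NXDOMAIN.
NO_REPLY = object()


def stub_lookup(name, servers):
    """Ask each server in turn, moving on only when there is no reply at all."""
    for label, respond in servers:
        reply = respond(name)
        if reply is NO_REPLY:
            continue  # dead server: try the next one in the list
        # Any reply, including NXDOMAIN, ends the search.
        return label, reply
    return None, NO_REPLY


def public_dns(name):
    """Hypothetical public resolver: alive, but unaware of internal names."""
    return "NXDOMAIN"


def ad_dns(name):
    """Hypothetical AD DNS server that knows one internal record."""
    records = {"files.corp.example": "192.0.2.10"}
    return records.get(name, "NXDOMAIN")


def dead_dns(name):
    """Hypothetical server that never answers."""
    return NO_REPLY


if __name__ == "__main__":
    # The public server answers first with NXDOMAIN, so AD DNS is never asked.
    print(stub_lookup("files.corp.example",
                      [("public", public_dns), ("ad", ad_dns)]))
    # Only a server that does not answer at all triggers the fallback.
    print(stub_lookup("files.corp.example",
                      [("dead", dead_dns), ("ad", ad_dns)]))
```

The first call stops at the public server's NXDOMAIN even though the second server holds the record; the second call reaches the AD server only because the first one stayed silent.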
-