Resolving DNS names on CARP Backup
I have a curious problem here, not sure what to make of it. Wasn't sure whether to put it here or into the DNS forum either.
I have a standard CARP setup with master and one backup. We have been allocated a /28 from our provider, served by a DSL modem which acts as gateway. Both master and backup have the same settings, including the same DNS servers configured. The config is replicated to the backup, everything standard.
Everything works well, except resolving DNS entries on the backup. This leads to the problem that the NTP server can't be reached, which leads to non-functioning DHCP replication because of time differences between master and backup.
I can even ping the DNS servers from the backup, however host and nslookup time out. There is no filter rule blocking DNS requests either. Like I already wrote, everything works on the master which has the same setup.
Maybe someone here has an idea, I'm slowly running out of them.
EDIT: I should add, the DNS servers are indeed correctly written to /etc/resolv.conf.
EDIT2: This is pfSense 2.0RC1 by the way.
EDIT3: Another data point: when I shut down the master and the backup takes over as master, DNS resolution suddenly works. After booting the original master again, DNS resolution on the backup goes back to not working.
Sounds like maybe the routing isn't quite right or your IP settings for the slave aren't quite right.
Are you actually using three separate fully routable IPs on the WAN for those systems? One IP for master, one IP for slave, and another IP for CARP?
Thanks for your answer. As far as I can see, the WAN addressing is correct. All three addresses are in the /28 our provider has assigned to us. I have also checked the gateway settings again; they're identical on master and slave. Like I wrote, I can even ping the DNS servers from the backup.
Next step would be to do a packet capture of the DNS request as it leaves WAN on the backup, see what is really leaving.
Using tcpdump -i vr1 host 126.96.36.199 (WAN interface and my provider's NS) and doing an nslookup www.pfsense.org (109.164.244.X being the WAN interface IP):
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vr1, link-type EN10MB (Ethernet), capture size 96 bytes
10:06:19.324135 IP 109.164.244.X.53386 > 188.8.131.52.domain: 61531+ A? www.pfsense.org. (33)
10:06:19.832359 IP 109.164.244.X.45296 > 184.108.40.206.domain: 13967+ PTR? 220.127.116.11.in-addr.arpa. (44)
10:06:25.320350 IP 109.164.244.X.53386 > 18.104.22.168.domain: 61531+ A? www.pfsense.org. (33)
10:06:26.850236 IP 109.164.244.X.45296 > 22.214.171.124.domain: 13967+ PTR? 126.96.36.199.in-addr.arpa. (44)
10:06:31.340410 IP 109.164.244.X.53386 > 188.8.131.52.domain: 61531+ A? www.pfsense.org. (33)
10:06:41.871859 IP 109.164.244.X.36855 > 184.108.40.206.domain: 13968+ PTR? X.244.164.109.in-addr.arpa. (46)
10:06:48.890407 IP 109.164.244.X.36855 > 220.127.116.11.domain: 13968+ PTR? X.244.164.109.in-addr.arpa. (46)
This looks OK to me, except I don't get a reply from the NS.
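For anyone wanting to check a longer capture for the same pattern, here is a rough sketch that counts DNS queries against replies in plain tcpdump text output. The function name dns_summary and the file name dns.txt are made up for illustration, and it assumes the default one-line-per-packet output shown above (no -e, no -v).

```sh
#!/bin/sh
# Count DNS queries and replies in plain `tcpdump` text output read from stdin.
# Each packet line looks like "<time> IP <src> > <dst>: ...", so field 3 is the
# source and field 5 the destination (with a trailing colon). Port 53 appears
# as ".domain" by default, or as ".53" when tcpdump is run with -n.
# dns_summary is a hypothetical helper name, not part of tcpdump or pfSense.
dns_summary() {
    awk '
        $2 == "IP" && $4 == ">" {
            if ($5 ~ /\.(domain|53):$/) queries++        # packet sent to port 53
            else if ($3 ~ /\.(domain|53)$/) replies++    # packet sent from port 53
        }
        END { printf "queries=%d replies=%d\n", queries, replies }
    '
}

# Example (vr1 is the WAN interface from this thread; dns.txt is an assumed name):
#   tcpdump -n -c 20 -i vr1 udp port 53 > dns.txt
#   dns_summary < dns.txt
```

A result with queries above zero and replies at zero matches what the capture above shows: requests leave the WAN interface but nothing comes back.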
So it's leaving but never coming back - that suggests that nothing is wrong with the box itself, but rather the router ahead of it or the switch (or something else on layer 2) holding it back.
You might repeat that same test but pass -e to tcpdump as well to check the MAC addresses, and compare it to the same test on the master.
Are you spoofing a MAC there? Perhaps the same MAC on both master and slave? (That wouldn't work.)
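A rough sketch of that comparison, in case it helps: the helper below lists the distinct source MAC addresses seen in tcpdump -e output, so the result from the master can be set side by side with the one from the backup. The name src_macs is made up for illustration, and the name server address is left as a placeholder.

```sh
#!/bin/sh
# Print the distinct source MAC addresses found in `tcpdump -e` output on stdin.
# With -e, each packet line starts "<time> <src-mac> > <dst-mac>, ethertype ...",
# so the source MAC is the second whitespace-separated field.
# src_macs is a hypothetical helper name, not part of tcpdump or pfSense.
src_macs() {
    awk '$3 == ">" { print $2 }' | sort -u
}

# Example, run on each firewall while a lookup is in progress
# (vr1 is the WAN interface from this thread; <nameserver-ip> is a placeholder):
#   tcpdump -e -n -c 10 -i vr1 udp port 53 and host <nameserver-ip> | src_macs
```

If both firewalls print the same address, that points at a duplicated or spoofed MAC; different addresses rule that out.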
That's what I suspected as well, especially since I've had "interesting" effects with the VDSL modem before (there is no switch in between; master and backup are connected directly to the VDSL modem). However, I haven't seen any option in the modem that could be involved here.
I repeated the test with the -e switch, WAN on master and backup have different MAC addresses.
Also, no ARP spoofing involved. I'll try playing with the VDSL modem some more.
Try putting a small switch between the firewalls and the modem. I have seen others encounter problems with CARP on the switches built into CPE/modems of various kinds.
Jim, first class support! After putting a switch in between and rebooting the modem, everything works.
Thank you very much!
Good to hear that worked. I suppose I should add that to the CARP troubleshooting doc on the wiki.
Guess I celebrated too soon. It worked yesterday, but after coming back to work today it doesn't work anymore. Will do some more testing later on.
EDIT: I updated both machines to 2.0RC3. After the subsequent reboot it's working again, even after coming back to work the next morning. Let's see how it goes.