NAT DNS
-
I have an authoritative BIND9 server (which does not allow recursion) in a server VLAN as defined by pfSense.
- 2x floating rules for IPv6 TCP/UDP on ports 53 & 853 allow external zone queries to the BIND9 server
All VLANs query the pfSense resolver, which has DNSSEC & Forwarding (2606:4700:4700::1 & 1.1.1.1) with SSL/TLS enabled.
Via IPv6 this works perfectly.
On IPv4 it only works when the DNS is manually set to 1.1.1.1 or 8.8.8.8 (I understand both services query the BIND9 server via IPv6, yet will respond to IPv4-based client queries).
In an endeavour to avoid having to set the DNS manually every time I connect via 4G, a "NAT/Port Forward" is set up on the WAN IPv4 interface to the BIND9 server's private IPv4 address (preference would have been Layer 4 load balancing with HAProxy, had it supported UDP).
- an IPv4 UDP rule with the BIND9 server's private IPv4 is automatically created under the WAN rules
However, only every 3rd or even 5th query (via IPv4 over 4G) to the BIND9 zone is answered. All others end with "connection timed out; no servers could be reached".
- The firewall logs indicate that the traffic is not being blocked
- The BIND9 log shows 3x successful queries reaching the server per single client query, before the client reports "connection timed out"
- A packet capture on the WAN interface shows every query made to the server, along with the server's response
Querying any other public DNS service (8.8.8.8 or 1.1.1.1) with the exact same query (via IPv4) is answered immediately each and every time.
What sorcery is required to fix this?
-
I can only guess for now, I don't see anything wrong / there is not enough information, but can you try to set
edns-udp-size 512; max-udp-size 512;
inside named.conf
and test again? If it works, you can find more information here:
https://www.dns-oarc.net/oarc/services/replysizetest/
-
@kiokoman, unfortunately no change ...
Either the DNS query is answered or it times out, with only the 3rd or 4th subsequent query (dig) attempt being answered.
It's only happening over the PORT FORWARDING with IPv4. Any queries from IPv4 machines within other VLANs also work flawlessly.
Please do advise what additional info you'd require.
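To put a number on the "only every 3rd or 4th query is answered" behaviour, a small probe like the one below could be run from the 4G client. This is only a sketch: the function names `build_query` and `probe`, the query count and the timeout are my own choices, and it sends plain A queries over IPv4 UDP without EDNS, so it will not behave exactly like dig.

```python
import socket
import struct
import time


def build_query(name: str, qid: int) -> bytes:
    """Build a minimal DNS query for an A record (no EDNS, RD bit cleared)."""
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0000, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def probe(server: str, name: str, count: int = 20, timeout: float = 2.0) -> int:
    """Send `count` single queries over IPv4 UDP; return how many were answered."""
    answered = 0
    for i in range(count):
        qid = 0x1000 + i
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(build_query(name, qid), (server, 53))
            try:
                data, _ = sock.recvfrom(4096)
                # Count the reply only if it carries the ID we just sent
                ok = len(data) >= 2 and struct.unpack("!H", data[:2])[0] == qid
            except socket.timeout:
                ok = False
        answered += ok
        print(f"query {i + 1:2d}: {'answered' if ok else 'timed out'}")
        time.sleep(0.2)
    return answered
```

Usage would be something like `probe("203.0.113.10", "example.org")`, where both the address and the zone name are placeholders for the WAN IPv4 address and a name in the BIND9 zone. Unlike dig, each query is sent once with no retries, so lost replies show up directly in the count.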
-
Uhm, it can't be a port forward problem, all the requests are hitting the BIND9 server.
Maybe checksum offloading? Is it disabled?
A more detailed packet capture could be of help.
-
Moved the BIND9 server from the server VLAN to the LAN.
Adjusted the PORT FORWARDING to reference the new IPv4 LAN address.
Now, each and every query generated from the IPv4 4G network is answered immediately without any timeouts.
Yet queries from the SERVER VLAN to the BIND9 server now have the issue as initially described. The only exception is that the BIND9 log only registers the successful queries.
-
After checking (disabling) "Hardware Checksum Offloading" under SYSTEM > ADVANCED > NETWORKING > Network Interfaces every query is answered, immediately.
What a confusing and frustrating battle. Thank you @kiokoman aka "bringer of light to the dark packet woods" !
Would you advise also having "Hardware TCP Segmentation Offloading" and "Hardware Large Receive Offloading" disabled?
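For anyone landing here later, my understanding of why this setting matters: with checksum offloading enabled the OS leaves the UDP checksum for the NIC to fill in, and if the NIC or its driver gets that wrong, the packet leaves with a bad checksum and the receiving side silently drops it. That would fit the symptoms above, but it is an interpretation, not something verified in this thread. The sketch below shows how the UDP checksum over IPv4 is computed and checked (the standard Internet checksum over a pseudo-header plus the UDP segment); the function names are my own.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum, as used by the Internet checksum."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """IPv4 pseudo-header: source, destination, zero byte, protocol 17, UDP length."""
    return (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, 17, udp_length)
    )


def udp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Compute the checksum for a UDP segment whose checksum field is zeroed."""
    data = pseudo_header(src_ip, dst_ip, len(segment)) + segment
    csum = ~ones_complement_sum(data) & 0xFFFF
    return csum or 0xFFFF  # 0 is reserved for "no checksum" in UDP over IPv4


def udp_checksum_ok(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """Check a received UDP segment, checksum field included."""
    if segment[6:8] == b"\x00\x00":
        return True  # sender did not use a checksum, nothing to verify
    data = pseudo_header(src_ip, dst_ip, len(segment)) + segment
    return ones_complement_sum(data) == 0xFFFF
```

One caveat when reading packet captures: a capture taken on the sending host itself with offloading enabled will often show "bad" checksums on outgoing packets, because the capture happens before the NIC has filled them in. A capture on the receiving side is the more reliable check.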
-
https://docs.netgate.com/pfsense/en/latest/hardware/tuning-and-troubleshooting-network-cards.html#tso-lro