DNS Resolution/Routing Issue on VLAN


  • Hey everyone. Longtime pfSense user, first time poster here. Please have mercy on me. 🙂

    I have created "VLAN200" (10.200.0.1/24) on the LAN (10.100.0.1/16) parent interface, and the DNS resolver is listening on both interfaces. I have an "any to any" rule on VLAN200 and can ping and pass TCP/UDP in and out just fine, with the exception of DNS.

    I can query the resolver at its LAN address from the VLAN200 network, and I can see the query in the resolver logs. However, lookups time out on the VLAN200 client. A packet capture shows this during the query:

    07:40:11.534314 IP 10.200.0.202.46358 > 10.100.0.1.53: UDP, length 38
    07:40:11.534343 IP 10.200.0.202.46358 > 10.100.0.1.53: UDP, length 47
    07:40:11.534404 IP 10.200.0.1.53 > 10.200.0.202.46358: UDP, length 166
    07:40:12.835749 IP 10.200.0.202.39892 > 10.100.0.1.53: UDP, length 37
    07:40:12.835794 IP 10.200.0.1.53 > 10.200.0.202.39892: UDP, length 53
    07:40:16.534558 IP 10.200.0.202.46358 > 10.100.0.1.53: UDP, length 38
    07:40:16.534572 IP 10.200.0.202.46358 > 10.100.0.1.53: UDP, length 47
    07:40:16.534610 IP 10.200.0.1.53 > 10.200.0.202.46358: UDP, length 128
    07:40:16.534611 IP 10.200.0.1.53 > 10.200.0.202.46358: UDP, length 166
    07:40:17.835539 IP 10.200.0.202.39892 > 10.100.0.1.53: UDP, length 37
    07:40:17.835586 IP 10.200.0.1.53 > 10.200.0.202.39892: UDP, length 53
    07:40:22.835658 IP 10.200.0.202.39892 > 10.100.0.1.53: UDP, length 37
    07:40:22.835691 IP 10.200.0.1.53 > 10.200.0.202.39892: UDP, length 53
    07:40:26.535551 IP 10.200.0.202.46358 > 10.100.0.1.53: UDP, length 38
    07:40:26.535561 IP 10.200.0.202.46358 > 10.100.0.1.53: UDP, length 47
    07:40:26.535594 IP 10.200.0.1.53 > 10.200.0.202.46358: UDP, length 128
    07:40:26.535595 IP 10.200.0.1.53 > 10.200.0.202.46358: UDP, length 166
    

    and the state table shows similar to this:

    VLAN200	udp	10.200.0.1:53 -> 10.200.0.202:55194	SINGLE:NO_TRAFFIC	7 / 0	1 KiB / 0 B	
    VLAN200	udp	10.200.0.1:53 -> 10.200.0.203:59254	SINGLE:NO_TRAFFIC	3 / 0	306 B / 0 B	
    VLAN200	udp	10.200.0.1:53 -> 10.200.0.203:60315	SINGLE:NO_TRAFFIC	3 / 0	375 B / 0 B	
    VLAN200	udp	10.200.0.201:56156 -> 10.100.0.1:53	NO_TRAFFIC:SINGLE	8 / 0	564 B / 0 B	
    VLAN200	udp	10.200.0.202:55194 -> 10.100.0.1:53	NO_TRAFFIC:SINGLE	8 / 0	564 B / 0 B
    

    Is the problem here something beyond my understanding of the nature of UDP and the fact that the resolver is responding to the caller on a different interface from the one the request came in on? Is this something that's solvable with a route or a firewall rule? Queries to the resolver at the 10.200.0.1 address from that network return successfully, and I suppose I could reconfigure clients on that segment to use that address, but I'd like to know the underlying cause of this issue and be sure I won't have issues with other UDP services.

    Thanks in advance for any advice.


  • @w3dg3 said in DNS Resolution/Routing Issue on VLAN:

    DNS resolver is listening on both interfaces.

    So why is your VLAN200 device sending DNS requests to the LAN address?
    Did you set this on the client or does he get the IPs from DHCP?


  • @viragomann It's listening on both because it defaulted to both when VLAN200 was created, really for no other reason. I'm sending requests to 10.100.0.1 because that would be my preference: it's routable and would cut down on the number of client IP/DNS configurations across the entire topology, most of which are static and not set by DHCP.

    *edited for typo.
    **edited to answer the DHCP question.

  • LAYER 8 Global Moderator

    Ok there is a piece of this puzzle missing.. Are you doing dns redirection? Are both the 10.200 and 10.100 networks directly attached to pfsense? Or are you doing some sort of downstream connection, or are these devices multihomed (IP in both networks)? Are you natting between these networks.. Etc.. clearly some piece of the puzzle is missing.

    So I have a vlan, 192.168.2/24 and rules allow for it to query the lan IP 192.168.9.253 for dns..

    Here is what the state table and a sniff of the query on the 192.168.2 interface of pfsense look like for a simple dns query for www.google.com

    dns.png

    Where exactly did you do that sniff? It's odd that you're showing what look like duplicate packets.. But they are different lengths?

    07:40:16.534558 IP 10.200.0.202.46358 > 10.100.0.1.53: UDP, length 38
    07:40:16.534572 IP 10.200.0.202.46358 > 10.100.0.1.53: UDP, length 47
    

    You would show multiple states because those are different source ports in your query..


  • @johnpoz 10.100.0.1 is the LAN interface (em1) on pfsense. 10.200.0.1/24 is a VLAN on em1. Plugged into the LAN interface is a 24-port managed 1G switch to which several 10.100.x.x clients are connected as well as a managed 10G switch that contains 10.200.x.x clients (VLAN200), exclusively.

    The capture was done from pfsense itself, filtering on the VLAN200 interface, the caller's IP of 10.200.0.202, and port 53. Here it is again for good measure in case anyone (me) missed anything. In fact, it looks more sane without the duplicate packets but with ostensible retries.

    13:49:13.112963 IP 10.200.0.202.41322 > 10.100.0.1.53: UDP, length 28
    13:49:13.113027 IP 10.200.0.1.53 > 10.200.0.202.41322: UDP, length 44
    13:49:18.112725 IP 10.200.0.202.41322 > 10.100.0.1.53: UDP, length 28
    13:49:18.112768 IP 10.200.0.1.53 > 10.200.0.202.41322: UDP, length 44
    13:49:23.112901 IP 10.200.0.202.41322 > 10.100.0.1.53: UDP, length 28
    13:49:23.112978 IP 10.200.0.1.53 > 10.200.0.202.41322: UDP, length 44
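
    For anyone checking a capture like this without Wireshark, here is a minimal Python sketch that pairs each query with its reply and flags replies whose source address differs from the address that was queried. It assumes the short tcpdump line format shown above; the names LINE and mismatched_replies are made up for illustration and are not part of any pfSense tool.

```python
import re

# Matches the short tcpdump form seen in the capture above, e.g.
# "13:49:13.112963 IP 10.200.0.202.41322 > 10.100.0.1.53: UDP, length 28"
LINE = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+): UDP"
)


def mismatched_replies(capture, dns_port=53):
    """Return (queried_server, reply_source) pairs where the two differ."""
    pending = {}  # (client_ip, client_port) -> server IP the client queried
    mismatches = []
    for line in capture.splitlines():
        m = LINE.search(line)
        if not m:
            continue  # skip anything that is not a UDP packet line
        src_ip, dst_ip = m.group(1), m.group(3)
        src_port, dst_port = int(m.group(2)), int(m.group(4))
        if dst_port == dns_port:
            # Query: remember which server address this client socket asked.
            pending[(src_ip, src_port)] = dst_ip
        elif src_port == dns_port:
            # Reply: compare its source with the address that was queried.
            queried = pending.get((dst_ip, dst_port))
            if queried is not None and queried != src_ip:
                mismatches.append((queried, src_ip))
    return mismatches


CAPTURE = """\
13:49:13.112963 IP 10.200.0.202.41322 > 10.100.0.1.53: UDP, length 28
13:49:13.113027 IP 10.200.0.1.53 > 10.200.0.202.41322: UDP, length 44
"""

print(mismatched_replies(CAPTURE))
```

    Run against the first two lines of the capture, it reports that 10.100.0.1 was queried but 10.200.0.1 answered.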
    

    *edit, again with the typos.

  • LAYER 8 Global Moderator

    So that looks normal.. Other than not understanding why queries are all from the same source port.. That looks to be duplicated traffic..

    Open it up in wireshark - what is it sending back.. Maybe it's sending back refused? Or NX, what is being queried for?


  • @johnpoz Just google. Here in full detail from pfsense, same filters.

    14:36:27.503191 50:6b:8d:65:d3:c5 > 20:25:64:08:76:25, ethertype IPv4 (0x0800), length 70: (tos 0x0, ttl 64, id 10149, offset 0, flags [none], proto UDP (17), length 56)
        10.200.0.202.46519 > 10.100.0.1.53: [udp sum ok] 62538+ A? google.com. (28)
    14:36:27.624783 20:25:64:08:76:25 > 50:6b:8d:65:d3:c5, ethertype IPv4 (0x0800), length 86: (tos 0x0, ttl 64, id 61101, offset 0, flags [none], proto UDP (17), length 72)
        10.200.0.1.53 > 10.200.0.202.46519: [bad udp cksum 0x16a0 -> 0x39a3!] 62538 q: A? google.com. 1/0/0 google.com. A 172.217.3.238 (44)
    14:36:32.503027 50:6b:8d:65:d3:c5 > 20:25:64:08:76:25, ethertype IPv4 (0x0800), length 70: (tos 0x0, ttl 64, id 10260, offset 0, flags [none], proto UDP (17), length 56)
        10.200.0.202.46519 > 10.100.0.1.53: [udp sum ok] 62538+ A? google.com. (28)
    14:36:32.503099 20:25:64:08:76:25 > 50:6b:8d:65:d3:c5, ethertype IPv4 (0x0800), length 86: (tos 0x0, ttl 64, id 1137, offset 0, flags [none], proto UDP (17), length 72)
        10.200.0.1.53 > 10.200.0.202.46519: [bad udp cksum 0x16a0 -> 0x39a8!] 62538 q: A? google.com. 1/0/0 google.com. A 172.217.3.238 (44)
    14:36:37.503339 50:6b:8d:65:d3:c5 > 20:25:64:08:76:25, ethertype IPv4 (0x0800), length 70: (tos 0x0, ttl 64, id 14048, offset 0, flags [none], proto UDP (17), length 56)
        10.200.0.202.46519 > 10.100.0.1.53: [udp sum ok] 62538+ A? google.com. (28)
    14:36:37.503412 20:25:64:08:76:25 > 50:6b:8d:65:d3:c5, ethertype IPv4 (0x0800), length 86: (tos 0x0, ttl 64, id 41271, offset 0, flags [none], proto UDP (17), length 72)
        10.200.0.1.53 > 10.200.0.202.46519: [bad udp cksum 0x16a0 -> 0x39ad!] 62538 q: A? google.com. 1/0/0 google.com. A 172.217.3.238 (44)
    

    I don't have better sniffing tools anywhere on this VLAN nor do I have any convenient means to acquire them. I will need to deploy a VM out there and do some more testing. With other obligations, it might be a few hours before I return with anything more substantive.

    Thanks for the help so far.

  • LAYER 8 Global Moderator

    You know you can just open up the sniff from pfsense in wireshark, right.. Just download it.

    But that looks like it was asked and answered.. Are you saying the client never got an answer?

    14:44:35.264004 02:11:32:22:cc:7d > 00:08:a2:0c:e6:20, ethertype IPv4 (0x0800), length 97: (tos 0x0, ttl 64, id 23966, offset 0, flags [none], proto UDP (17), length 83)
        192.168.2.12.58650 > 192.168.9.253.53: [udp sum ok] 32519+ [1au] A? www.google.com. ar: . OPT UDPsize=4096 (55)
    14:44:35.264267 00:08:a2:0c:e6:20 > 02:11:32:22:cc:7d, ethertype IPv4 (0x0800), length 101: (tos 0x0, ttl 64, id 24659, offset 0, flags [none], proto UDP (17), length 87)
        192.168.9.253.53 > 192.168.2.12.58650: [bad udp cksum 0x8dae -> 0x56ec!] 32519 q: A? www.google.com. 1/0/1 www.google.com. A 172.217.4.68 ar: . OPT UDPsize=4096 (59)
    

    That is the full detail mode on pfsense, and here is my client query and answer..

     dig @192.168.9.253 www.google.com
    
    ; <<>> DiG 9.11.3-1ubuntu1.13-Ubuntu <<>> @192.168.9.253 www.google.com
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 32519
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
    
    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 4096
    ;; QUESTION SECTION:
    ;www.google.com.                        IN      A
    
    ;; ANSWER SECTION:
    www.google.com.         3424    IN      A       172.217.4.68
    
    ;; Query time: 0 msec
    ;; SERVER: 192.168.9.253#53(192.168.9.253)
    ;; WHEN: Tue Nov 17 14:44:35 CST 2020
    ;; MSG SIZE  rcvd: 59
    

  • So the client is requesting the LAN IP, but pfSense is sending responses from the VLAN IP. Hence the client won't accept the response and the DNS request is failing.

    That's not the default behavior, even in a setup like yours. But I have no idea what the reason for this could be.
    Possibly you have something misconfigured on the VLAN, or are you doing some kind of outbound NAT?

    However, as I suggested above, simply use the VLAN IP as DNS on the clients and your headache will be gone.
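
    To make the failure mode concrete, here is a small Python sketch of the check a typical stub resolver applies to a UDP reply: the transaction ID and the source address and port both have to match what was queried. This is a simplified model for illustration, not any particular resolver's code; Endpoint and accepts_reply are hypothetical names, and the values are taken from the capture above.

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    ip: str
    port: int


def accepts_reply(query_id, queried, reply_id, reply_source):
    """Simplified model: accept only if the ID and the source endpoint match."""
    return reply_id == query_id and reply_source == queried


queried = Endpoint("10.100.0.1", 53)  # the LAN address the client asked

# Reply as seen in the capture: right transaction ID, wrong source address.
print(accepts_reply(62538, queried, 62538, Endpoint("10.200.0.1", 53)))

# What the client expected: same ID, sourced from the address it queried.
print(accepts_reply(62538, queried, 62538, Endpoint("10.100.0.1", 53)))
```

    The first call prints False and the second prints True, so under this model the client keeps waiting and retries until it times out, even though an answer arrived.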

  • LAYER 8 Global Moderator

    Oh good catch ;)

    Yeah client will say that is BAD.. Are you doing some sort of source nat?


  • @johnpoz said in DNS Resolution/Routing Issue on VLAN:

    Oh good catch ;)

    Yeah client will say that is BAD.. Are you doing some sort of source nat?

    I suspected the reply from the resolver's other IP was problematic and thought I had indicated that in my original post. My apologies for wasting cycles having been unclear on that. Additionally, I am not doing any source or outbound NAT anywhere.

    @viragomann said in DNS Resolution/Routing Issue on VLAN:

    So the client is requesting the LAN IP, but pfSense is sending responses from the VLAN IP. Hence the client won't accept the response and the DNS request is failing.

    That's not the default behavior, even in a setup like yours. But I have no idea what the reason for this could be.
    Possibly you have something misconfigured on the VLAN, or are you doing some kind of outbound NAT?

    However, as I suggested above, simply use the VLAN IP as DNS on the clients and your headache will be gone.

    The VLAN is pretty simply configured on pfSense and both downstream switches. I've pored over each config for hours now. I can't find anything in them that leads me back to this issue. Why is pfSense/unbound coming back to that network through the other interface? Ugh.

    I suppose I'll acquiesce to changing the DNS configs on that VLAN to query unbound on the VLAN200 interface rather than LAN. That won't bode well for my curiosity, but sometimes you have to admit defeat.