2100 DHCP VLAN configuration
-
@thalin VLAN200 will leave the 2100 tagged on all ports, so on the other end you gotta have either a switch that can handle VLANs or a client that is configured to handle VLAN200 tagged traffic.
How is the network looking after the 2100? Can you draw a simple diagram, like "internet - 2100 - VLAN-aware-switch - clients"?
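To make "tagged" concrete: an 802.1Q tag is four extra bytes (TPID 0x8100 plus a 16-bit TCI carrying a 3-bit priority and the 12-bit VLAN ID) inserted between the source MAC and the EtherType. The Python sketch below builds both an untagged and a tagged header so the difference is visible byte for byte. The function names are hypothetical helpers, not pfSense or Unifi code, and any MAC addresses you feed it are placeholders.

```python
import struct
from typing import Optional

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def ethernet_header(dst: str, src: str, ethertype: int,
                    vlan_id: Optional[int] = None, pcp: int = 0) -> bytes:
    """Build an Ethernet header, optionally carrying an 802.1Q tag.

    Untagged: dst(6) + src(6) + ethertype(2)                      = 14 bytes
    Tagged:   dst(6) + src(6) + 0x8100(2) + TCI(2) + ethertype(2) = 18 bytes
    """
    header = mac_bytes(dst) + mac_bytes(src)
    if vlan_id is not None:
        if not 0 <= vlan_id <= 4095 or not 0 <= pcp <= 7:
            raise ValueError("VLAN ID must be 0-4095 and PCP 0-7")
        tci = (pcp << 13) | vlan_id  # DEI bit (bit 12) left at 0
        header += struct.pack("!HH", TPID_8021Q, tci)
    return header + struct.pack("!H", ethertype)
```

For VLAN 200 with priority 0 the tag bytes after the source MAC come out as `81 00 00 c8`; a device that only understands untagged frames sees 0x8100 where it expects 0x0800 (IPv4) and will not treat the frame as IP.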
And can you post a picture of the switch port configuration?
-
Yup what do you have attached to the 2100 ports? It needs to be something that can handle the tagged traffic.
If you have left the PVID as 1 on all ports, though, it must be arriving tagged as 200, since the dhcp discover traffic has made it through.
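The PVID reasoning can be modelled with a toy port: untagged frames arriving on a port are assigned to the port's PVID, tagged frames keep their VLAN if the port is a member of it, and on egress the port either adds the tag or strips it. This is a simplified sketch assuming ingress filtering is enabled; the class and method names are hypothetical, and real switch chips (including the 2100's on-board switch) have more modes than this.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class SwitchPort:
    """Toy model of one 802.1Q switch port (hypothetical, not pfSense code)."""
    pvid: int = 1                                            # VLAN for untagged ingress frames
    tagged: Set[int] = field(default_factory=set)            # VLANs sent tagged on egress
    untagged: Set[int] = field(default_factory=lambda: {1})  # VLANs sent untagged on egress

    def ingress_vlan(self, tag: Optional[int]) -> Optional[int]:
        """VLAN a received frame is classified into, or None if it is dropped."""
        if tag is None:
            return self.pvid          # untagged frame joins the PVID VLAN
        if tag in self.tagged or tag in self.untagged:
            return tag                # member VLAN: frame keeps its VLAN
        return None                   # ingress filtering: not a member, drop

    def egress(self, vlan: int) -> Optional[str]:
        """How a frame for `vlan` leaves this port: 'tagged', 'untagged', or None (not sent)."""
        if vlan in self.tagged:
            return "tagged"
        if vlan in self.untagged:
            return "untagged"
        return None
```

With `SwitchPort(pvid=1, tagged={200})`, `ingress_vlan(None)` returns 1 and `ingress_vlan(200)` returns 200, which is the argument above: a DISCOVER that shows up on the VLAN 200 interface cannot have arrived untagged on a PVID 1 port.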
-
Hey folks, thanks for the response! I appreciate your willingness to respond and maybe help!
I apologize for not being a bit more detailed here. I thought it was clear that since the 2100 is getting the DHCP request on the management VLAN interface (mvneta1.200), the client device was indeed requesting an address using that VLAN tag.
For context, this is a Unifi switch (US-8-60W) with port 1 on the switch connected to LAN port 1 on the 2100, and the "Network Override" option on the US-8-60W set to the correct VLAN. This is the setting which tells the Unifi device which VLAN tag to request its management address on. Thus, the DHCP request to the 2100 is tagged on the correct network (not untagged, which would be the default behavior for these devices).
This is how I do most of my Unifi gear in other places (this works great with a Netgate 6100, but that has native ports rather than a switch, which I suspect is the part I'm not understanding or configuring correctly here), even though it's probably unnecessary to have a separate management network. In this case it's also standing in for a more pressing requirement to split off some traffic for actual business reasons. Once I get this working I'll be able to replicate it for the other VLANs that will be needed.
So to explicitly state the problem I'm having:
- the traffic is tagged on the way in from the client
- the 2100 gets the request on the tagged vlan
- but the replies never make it back to the client for some reason I don't understand.
Hope that helps clarify the situation!
-
Also here are the requested diagrams/screenshots:
Switch port config:
Network diagram (it really is this simple right now, I'm configuring it through a Wireguard link so there is literally nothing else plugged into the device, and nothing plugged into the US-8-60W - the switch itself is the only client):
-
Ah so the DHCP client here is the switch itself? Hmm.
As you say it appears the dhcp replies never reach the client. Hard to see how that could happen though.
Try running a pcap on mvneta1, including tagged packets, for dhcp traffic on udp ports 67 and 68. Make sure it is actually sending the replies.
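"Including tagged packets" matters because a plain `udp port 67` filter looks for the UDP header at the untagged offset; with libpcap-style filters you usually need something like `udp port 67 or udp port 68 or (vlan and (udp port 67 or udp port 68))` to catch both. The same check done by hand on raw frame bytes looks roughly like the sketch below. It is a hypothetical helper, assumes IPv4 and at most one 802.1Q tag, and is only meant to show where the tag shifts the offsets.

```python
import struct
from typing import Optional, Tuple

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_8021Q = 0x8100
DHCP_PORTS = {67, 68}  # 67 = server, 68 = client


def dhcp_frame_info(frame: bytes) -> Optional[Tuple[Optional[int], int, int]]:
    """Return (vlan_id, src_port, dst_port) for an IPv4/UDP frame on port 67 or 68.

    vlan_id is None when the frame is untagged. Returns None for anything else.
    """
    if len(frame) < 14:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    vlan_id = None
    ip_start = 14
    if ethertype == ETHERTYPE_8021Q:
        if len(frame) < 18:
            return None
        tci, ethertype = struct.unpack_from("!HH", frame, 14)
        vlan_id = tci & 0x0FFF  # low 12 bits of the TCI are the VLAN ID
        ip_start = 18           # the tag pushes everything 4 bytes further in
    if ethertype != ETHERTYPE_IPV4 or len(frame) < ip_start + 20:
        return None
    if frame[ip_start] >> 4 != 4 or frame[ip_start + 9] != 17:  # IPv4, protocol 17 = UDP
        return None
    udp_start = ip_start + (frame[ip_start] & 0x0F) * 4  # IHL is in 32-bit words
    if len(frame) < udp_start + 8:
        return None
    src_port, dst_port = struct.unpack_from("!HH", frame, udp_start)
    if src_port in DHCP_PORTS or dst_port in DHCP_PORTS:
        return vlan_id, src_port, dst_port
    return None
```

An OFFER from the 2100 on the parent interface should come back as `(200, 67, 68)`; the same OFFER captured on the VLAN interface itself would already have the tag stripped and report `None` for the VLAN.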
-
Yep this is one of the first things I did, so I have a pcap from pretty early on, before I posted this thread. I can definitely see the offer going out from the 2100. I will see if I can get a pcap from the client perspective today. Anyway, here's a screenshot from the pcap I did in pfSense:
-
Are those redacted addresses public IPs?
Are the MAC addresses correct in the replies?
Is the VLAN tagging correct?
The client is not the switch itself then?
-
@stephenw10 said in 2100 DHCP VLAN configuration:
Are those redacted addresses public IPs?
Nah, just don't want to spew private ip blocks around on the internet (I know, paranoid and probably unnecessary).
@stephenw10 said in 2100 DHCP VLAN configuration:
Are the MAC addresses correct in the replies?
Yes the MAC addresses seem correct. On one of the DHCP Offer packets, the destination MAC is the MAC I see in the Unifi UI for the client device, and the source is the MAC for mvneta1 on the 2100.
@stephenw10 said in 2100 DHCP VLAN configuration:
Is the VLAN tagging correct?
This capture was done on the VLAN interface. I will have to do another capture on the LAN interface of the 2100 to see if the VLAN tagging is correct, though I assume it is, given that the traffic is showing up there at all. I'll do another capture anyway to make sure that the replies are actually tagged.
EDIT: Yep, capture on the parent interface confirms that the offer packet is tagged on the correct VLAN:
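With the tagging confirmed, the remaining per-packet checks (offered address, client hardware address, message type) live in the DHCP payload itself. The sketch below decodes the fixed BOOTP header and option 53 from a UDP payload, following the RFC 2131 layout (236-byte fixed header, then the magic cookie, then options). Function names are hypothetical. It also reports the BOOTP broadcast flag, which the client sets to ask for a broadcast rather than unicast reply; that is just one more field to compare between captures, not a diagnosis.

```python
import ipaddress
import struct

MAGIC_COOKIE = b"\x63\x82\x53\x63"
MSG_TYPES = {1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
             5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM"}


def parse_options(data: bytes) -> dict:
    """Decode DHCP TLV options into {code: value_bytes}."""
    opts, i = {}, 0
    while i < len(data):
        code = data[i]
        if code == 0:            # Pad
            i += 1
            continue
        if code == 255 or i + 1 >= len(data):  # End, or truncated
            break
        length = data[i + 1]
        opts[code] = data[i + 2:i + 2 + length]
        i += 2 + length
    return opts


def parse_dhcp(payload: bytes) -> dict:
    """Summarise the fields worth checking in a DHCP message (UDP payload)."""
    if len(payload) < 240 or payload[236:240] != MAGIC_COOKIE:
        raise ValueError("not a DHCP message")
    op, _htype, hlen, _hops, xid, _secs, flags = struct.unpack_from("!BBBBIHH", payload, 0)
    options = parse_options(payload[240:])
    msg_type = options.get(53, b"")
    return {
        "op": op,                                   # 1 = request, 2 = reply
        "xid": xid,                                 # must match the DISCOVER
        "broadcast_flag": bool(flags & 0x8000),
        "yiaddr": str(ipaddress.IPv4Address(payload[16:20])),  # offered address
        "chaddr": payload[28:28 + hlen].hex(":"),   # client MAC as the server saw it
        "type": MSG_TYPES.get(msg_type[0], "UNKNOWN") if msg_type else "UNKNOWN",
    }
```

Running it on the DISCOVER and the matching OFFER from the same capture lets you confirm the transaction ID and `chaddr` line up with the switch's MAC.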
@stephenw10 said in 2100 DHCP VLAN configuration:
The client is not the switch itself then?
Incorrect, the client is currently the US-8-60W Unifi switch. I was trying to say that I could swap out the switch as client and get another computer to do captures on the client side since afaik I can't do a client-side pcap from the switch.
-
Hmm, well I guess I would test a client that isn't the switch just in case it has some quirk that prevents it using a VLAN correctly for management. But if you used that same setup on a 6100 it should work here too.
Otherwise a pcap at the client will at least show if it reaches it.
-
@stephenw10 hello again, sorry for the pause. I went off to work on other things for a while, but now I'm back with a bit better test setup which is leaving me ultimately more confused than when I left off (spoiler alert lol).
A bit of a summary refresher on the setup here since it's been so long:
- Netgate 2100 running 24.11, configured to have VLAN 200 tagged on all 4 switch ports as well as port 5, the uplink port.
- The 2100 has a VLAN 200 interface defined (named MANAGEMENT, in case I refer to it as the management network or something later on accidentally) with an IP address of 10.XX.2.1/24
- The 2100 has DHCP configured on the VLAN 200 interface set to give out IPs in the range of 10.XX.2.100/24 - 10.XX.2.254/24
- Unifi US-8-60W PoE switch, with switch port 1 plugged into LAN port 1 of the 2100.
- This switch is configured to use VLAN 200 as its management network interface, and configured to DHCP to get its address.
I could try a static IP for the switch management interface, but it would be much more painful to reconfigure, since I am going back and forth between networks to make configuration changes to the switch right now, and my local network, even though it has the same VLANs, uses different IP ranges. The Unifi controller is on my local network (and will be available via VPN tunnel once the switch is actually on the 2100's network).
New tests
So I have a new mini-pc that has yet to be installed with an actual OS for real usage, so I decided to boot up a Kali live-cd to do some easier pcaps than I was able to do before. I also configured the switch to set up port mirroring for port 1 so I could see what the switch sees when using wireshark on Kali.
A few tests.
Client PC instead of US-8-60W
Kali when plugged into the same port with the same configuration as the switch is able to get a DHCP address just fine on VLAN 200. So there's definitely some incompatibility/configuration problem between the US-8-60W & the 2100.
Mirrored port on US-8-60W
So next I set up port mirroring on the switch so I could get the DHCP conversation from the client switch side. I plugged in the Kali machine to port 3 and set up Wireshark to listen to all traffic on eth0 on that machine so I could just see whatever the switch sees, both tagged and untagged traffic.
Unfortunately for my sanity, the switch does see the DHCP Offer packet from the 2100. I have no idea why the switch isn't doing anything with it. This is especially confusing since it works perfectly well on my home network, using the same VLAN, just with a different IP subnet. I literally just plug the exact same port on the US-8-60W into one of my Unifi switches configured to tag VLAN 200, and my home Netgate 4100 gives it an IP address on VLAN 200 in the correct subnet for that network, which it accepts right away.
FWIW I also tried this with another port on the client switch (the management interface is virtual and can attach to any physical port, so it shouldn't matter which port is the uplink to the 2100) and got, as far as I can tell, the same behavior.
What's Next
I'm a bit at a loss here what to do next. I have a second US-8-60W device here for another site that I am going to try to provision the same way and see if it has the same behavior. I may also try a Netgate 1100 device I have for this other site that I can see if it exhibits the same behavior as the 2100. Maybe it's just a weird compatibility thing between these two devices, or something. I have no idea at this point.
-
Ok that's some good testing.
So you see the switch send the dhcp request and it is tagged 200?
And you see pfSense reply and that is also tagged?
Try using another client connected to the switch that pulls a lease whilst pcapping the mirror port. There must be some difference between the switch and another client.
It could be the switch requires something additional. For example, we have seen some ISPs that only respond to dhcp requests when given the right priority tag, or that send a priority tag that causes the replies to be dropped.
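The priority tag lives in the same 16-bit TCI as the VLAN ID: 3 bits of PCP (the 802.1p priority, 0 to 7), 1 DEI bit, and 12 bits of VID. A frame whose VID is 0 is "priority-tagged": it carries a priority but no VLAN membership. A small sketch for decoding a TCI read out of a capture; the names are hypothetical.

```python
from typing import NamedTuple


class VlanTag(NamedTuple):
    pcp: int  # Priority Code Point (802.1p), 0-7
    dei: int  # Drop Eligible Indicator, 0 or 1
    vid: int  # VLAN ID, 0-4095


def decode_tci(tci: int) -> VlanTag:
    """Split a 16-bit 802.1Q TCI into its PCP, DEI and VID fields."""
    return VlanTag(pcp=(tci >> 13) & 0x7, dei=(tci >> 12) & 0x1, vid=tci & 0x0FFF)


def is_priority_tagged(tag: VlanTag) -> bool:
    """VID 0 means the tag only carries a priority, not a VLAN."""
    return tag.vid == 0
```

For example, a TCI of `0xE0C8` decodes to PCP 7 on VLAN 200, while `0x00C8` is the same VLAN at PCP 0, so comparing these two fields between a working and a failing OFFER is a quick way to spot a priority difference.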
-
Ok, to do some more testing, I let the client switch I'm testing fall out of the DHCP cache for both my 2100 (where I'm having the problem) and my 6100 (which works fine) and then did a pcap from the client switch port mirroring so I could see if there are any obvious differences in the DISCOVER/OFFER packets which might give me clues as to why the client switch is happy on the 6100 but not on the 2100. I don't see any appreciably different options that the DHCP servers are passing along - the differences are:
- the MAC addresses of the DHCP servers
- the transaction IDs
- the IP addresses of the server and the offered IP address
- the subnet masks (which makes sense, the 2100 has a /24 while the 6100 has a /23 for management subnet)
- the option 55 content which should differ between the two routers.
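Comparing two offers option by option is easy to automate once each offer's options are decoded into a dict keyed by option code (for example, transcribed from Wireshark's DHCP dissector). A minimal sketch, with a hypothetical function name and the option decoding left out:

```python
from typing import Dict, Optional, Tuple


def diff_options(a: Dict[int, bytes],
                 b: Dict[int, bytes]) -> Dict[int, Tuple[Optional[bytes], Optional[bytes]]]:
    """Return {option_code: (value_in_a, value_in_b)} for every option that differs.

    An option present in only one message shows up with None on the other side.
    """
    diffs = {}
    for code in sorted(set(a) | set(b)):
        va, vb = a.get(code), b.get(code)
        if va != vb:
            diffs[code] = (va, vb)
    return diffs
```

With option 1 (subnet mask) set to `ff ff ff 00` in one offer and `ff ff fe 00` in the other, the result contains only `{1: (b'\xff\xff\xff\x00', b'\xff\xff\xfe\x00')}` plus any other options that really differ, which keeps the expected /24 vs /23 difference from hiding anything else.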
The only thing that's not obviously supposed to be different is that the 2100 has the priority (PCP) field in the VLAN tag set to 0 instead of 7 as on the 6100. I will see if I can change that and see if it makes a difference. EDIT: nope, no difference.
So, I have no idea what's going wrong here. Both client switches I have tried have the same behavior. Maybe I will try downgrading the client switch firmware and seeing if that makes a difference?
I may try to sanitize my pcaps and upload them if that seems like it might be useful.
-
Did you test some other client device behind the switch pulling a lease? How does that differ from the switch as a client?
Bizarre. Hard to see what might be different there.
One possible test you could do would be to assign mvneta0 as the LAN on the 2100 to remove the on-board switch. That's quite involved though.