LAGG (LACP) - UniFi Switch (16XG)
-
@kklouzal said in LAGG (LACP) - UniFi Switch (16XG):
I have read on other posts there may be an issue with one side trying to use static LAG and the other using dynamic LACP...
It is. It doesn't work.
... but I'm not sure how to confirm this is the issue on my end...
You don't know how the switch is configured?
...or how to remedy it if so.
The remedy would be to configure LACP on the relevant ports on both sides.
-
The switch only gives me 3 options for the port: Switching, Mirroring, and Aggregate. I have Aggregate selected and from all the research I can find on Google, Aggregate is in fact dynamic LACP.
How can I confirm on the PfSense side we're using the correct settings?
As outlined in various other posts, people have had success by changing strict mode to 0 and cycling the LAGG interfaces.
I've gone ahead and done this but it doesn't help.
I've since put it back to the default value of 1.
As a side note, if I leave the LAGG interface set up in its current state, after about two hours every machine on the network will lose its IP address and DHCP will not supply a new one. Manually setting an IP on a machine restores connectivity; however, DHCP will still not supply an address. This occurs for machines with static mappings as well.
-
We can be confident pfSense is using the right config because based on what you have posted that is what you have told it to do.
Probably start by posting the output of:
ifconfig -v lagg0
ifconfig -v cxgbe0
ifconfig -v cxgbe1
Why the bridge? What are the bridge members?
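When reading the lagg0 output, the lines that matter most are the per-port laggport lines. As a rough sketch (assuming the usual FreeBSD output format, where each member shows up as "laggport: NAME flags=...<...>"), a port that has negotiated LACP with the switch should list ACTIVE, COLLECTING and DISTRIBUTING. The helper name check_laggports is made up for illustration:

```shell
# Summarize LACP negotiation per member port from ifconfig output on stdin.
# Assumes lines of the form: laggport: cxgbe0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
check_laggports() {
  awk '$1 == "laggport:" {
    # A port is treated as negotiated only if all three flags are present.
    ok = ($3 ~ /ACTIVE/ && $3 ~ /COLLECTING/ && $3 ~ /DISTRIBUTING/)
    print $2, (ok ? "negotiated" : "NOT negotiated"), $3
  }'
}

# Usage on the firewall:
#   ifconfig -v lagg0 | check_laggports
```

If either cxgbe port is missing those flags, the switch side is probably not speaking LACP on that port.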
-
Thank you for the reply. I'm positive as well that it's a configuration issue on the UniFi switch holding us up here. The GUI gives few if any options for configuration, so I'll have to dig into the CLI some more for better insight.
As for the bridge, the ports are connected to the IPMI of other servers, one port is connected to an access point but that will be moved over to one of the four RJ-45 ports on the 16XG after this current issue is sorted out. We have plans to get a managed POE switch in the near future but I think the bridge will stay regardless so those ports can continue being used for IPMI.
-
If that lacp strict sysctl did not help, I would change it back to the default of 0 (remove the sysctl)
Why is there no address on lagg0?
No idea why you would continue to bridge if you had a good link to a good switch.
-
I had changed it from the default of 1 to 0 and since then removed the loader.conf entry and rebooted the machine.
lagg0 is the Network Port for UniFi_LAGG, which is set up like the other bridged interfaces: enabled with no address, then added as one of the bridge members. The bridge interface itself has the assigned address.
I was able to get some details from our UniFi switch about the port 11/12 LAGG.
It appears to be transmitting and receiving packets without any errors. However, removing the ethernet cable that allows me to ssh into it results in no longer being able to ping the switch.
-
OK when I asked what interfaces were part of the bridge, that's what I was asking, what are the bridge member interfaces...
I have no idea if adding a lagg as a bridge member even works. You're certainly poking your head into dark corners. I would take the lagg out of the bridge, number it, and see if it behaves in a more consistent manner. As soon as you know that works, add it to the bridge and see if that breaks it. Then you know.
What does
ifconfig -v bridge0
show?
And what port is that ssh client plugged into?
You're going to have to describe things in much more detail and with more specificity. Saying "the ethernet cable that allows me to ssh into it" tells us nothing we can act on.
-
I apologize, there are only 3 connected cables: two SFP+ direct attach cables connected to ports 11 and 12 on the switch and to cxgbe0 and cxgbe1 on PfSense. In this configuration no traffic will pass. Adding a third cable (RJ-45) from port 16 on the switch into one of the free ports on PfSense allows us to communicate with the switch and log in via SSH.
I'll try removing the LAGG from the bridge and assigning an IP so we can further narrow down the issue.
-
Not sure why STP is enabled on the lagg0 bridge member. Did you specifically enable that?
-
Yes, however I should leave it disabled since this is the only switch currently in the network topology and having it enabled only makes sense for a larger environment with multiple switches also running the protocol.
I went ahead and removed UniFi_LAGG from the bridge members, assigned it an IP address of 192.168.2.1, enabled DHCP for UniFi_LAGG, removed our 'third ethernet cable', and rebooted the switch. After the switch came online there were no DHCP leases handed out on the 192.168.2 subnet (I was looking for one given to the switch), so I plugged my laptop directly into port 16 on the switch and was unable to receive an IP address. I then manually assigned my laptop an address of 192.168.2.10 and was unable to ping 192.168.2.1.
-
OK so look at the VLAN configuration on the switch.
Did you enable a DHCP server on the lagg interface?
Did you add firewall rules to the lagg interface?
-
Yes, the DHCP server was enabled and the LAGG interface has a firewall rule to permit all traffic. The switch doesn't have any VLANs set up; it should just switch traffic for the subnet it's directly attached to.
The fact that our switch is unable to obtain an IP address from DHCP after removing the LAGG interface as a bridge member, directly assigning it an IP address, then enabling DHCP on that subnet tells me there is still a misconfiguration in the LAGG interface somewhere, either on the PfSense or the UniFi side.
-
Packet capture for UDP 67 on the lagg and see what you see regarding DHCP traffic. Zero idea what is required for the switch itself to obtain DHCP there. It would have to be the management VLAN at least I would assume.
Personally I would be less concerned with that than with clients connected to the switch on the lagg VLAN getting addresses.
-
I unplugged both SFP+ cables making up the LAGG and unplugged the RJ-45 connected to my laptop, then started the packet capture. I then plugged the two SFP+ cables back in and finally the RJ-45, and waited until my laptop stopped trying to identify the network and set itself a 169 IP address. Here are the results:
On the UniFi_LAGG interface DHCP server I have 1 static lease set up for the UniFi switch to use 192.168.2.80 as its address. The DHCP server lease range is from 192.168.2.100 to 192.168.2.200, and from this I can conclude that the PfSense DHCP server is attempting to assign the two devices their proper addresses. It would seem that communication from the switch back to PfSense is allowed to pass, however traffic from PfSense over towards the switch is being blocked?
I tried this with net.link.lagg.lacp.default_strict_mode set to 1 and 0 but it gave the exact same results. I have again set the value back to the default of 1.
-
Yes, it looks like the DHCP server on the firewall is receiving the requests and responding.
Is there some sort of DHCP snooping or protection in the switch that might be in play here?
Odd that capture never sees the requests coming in. What did you filter on there?
-
I'm not aware of any security measures in place, nor can I find any options in the GUI about security that don't pertain to the UniFi Security Gateway. Even then, those options are all disabled since we don't have that specific device present on our network.
I am still unable to ping PfSense at 192.168.2.1 from my laptop which is connected through the switch even after we set a static IP of 192.168.2.50, Default Gateway of 192.168.2.1 and Primary DNS of 192.168.2.1. Each ping attempt results in destination host unreachable. I suppose this could be due to the switch not having a proper address at this point?
I'm also curious as to why net.link.lagg.lacp.default_strict_mode at 1 or 0 both yield the same results. Shouldn't one of these values make the LAGG completely unusable?
When I did the Packet Capture these were my settings:
-
@kklouzal said in LAGG (LACP) - UniFi Switch (16XG):
I am still unable to ping PfSense at 192.168.2.1 from my laptop which is connected through the switch even after we set a static IP of 192.168.1.50,
Typo? 192.168.2.50?
-
Don't filter on the IP address. Capture everything on port 67. You are missing the DHCP broadcasts.
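From a shell on the firewall, something along these lines should do it. This is only a sketch: the function name capture_dhcp and the 50 packet limit are arbitrary choices, and lagg0 is assumed to be the interface in question. The point is that there is no host filter, so the client broadcasts from 0.0.0.0 to 255.255.255.255 are not excluded:

```shell
# Capture DHCP in both directions on one interface, with no host filter.
capture_dhcp() {
  # $1 = interface name, e.g. lagg0
  # -n: no name resolution, -e: show MAC addresses, -c 50: stop after 50 packets
  tcpdump -ni "$1" -e -c 50 'udp and (port 67 or port 68)'
}

# Usage:
#   capture_dhcp lagg0
```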
-
Yes it was a typo, whoops :)
I went ahead and removed the Host Address from the filter, still only capturing outgoing requests for some reason.
-
No, anything sourced on port 68 is from the client. Anything sourced from port 67 is from the server.
I would download that into wireshark and look at what DHCP is actually doing. Looks like it's working there to me, but can't see much other than two-way traffic.
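If you would rather sort it out in text first, the port rule above can be applied mechanically. A minimal sketch, assuming tcpdump's normal one-line text output (run with -n) and no DHCP relay in the path; classify_dhcp is just an illustrative name:

```shell
# Label tcpdump text lines by DHCP source port.
# Source port 68 = client (DISCOVER/REQUEST), source port 67 = server (OFFER/ACK).
classify_dhcp() {
  awk '{
    for (i = 1; i < NF; i++) if ($(i + 1) == ">") {
      # The source looks like ADDRESS.PORT, so the port is the last dotted part.
      n = split($i, a, "[.]")
      if (a[n] == "68") { print "client -> server: " $0; next }
      if (a[n] == "67") { print "server -> client: " $0; next }
    }
  }'
}

# Usage on the firewall (lagg0 assumed):
#   tcpdump -lni lagg0 'udp and (port 67 or port 68)' | classify_dhcp
```

Lines in both directions mean the firewall is hearing the clients and answering them. If the clients still end up without a lease, the replies are being lost somewhere after they leave the lagg.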