Ruckus Access Points Heartbeat lost in LAN



  • Hi all
    I have several Ruckus R320 APs on my LAN. A pfSense box running the newest version (2.4) sits in front of them.
    Every two hours the logs of the Ruckus APs show the error: heartbeat lost. In the Ruckus forum I found the following:
    https://support.ruckuswireless.com/articles/000003945

    I believe it could be some timeouts in pfSense. Does anyone have an idea how I could fix that? They are all in the LAN zone.

    Thx
    admins


  • LAYER 8 Global Moderator

    And why would you think pfsense would be involved in devices on lan talking to other devices on lan?

    Is the ZD on a different network than the AP, where pfsense is routing this traffic?



  • Hi @johnpoz
    I've monitored the latency on L2 - no problems. Perhaps something is wrong on L3, in pfSense.
    I've seen such problems in the past with voip and udp timers.

    admins



  • Note on that page that nothing implicates your edge firewall/router.

    Possible fixes include:

    1. Verify everything from AP all the way to the ZD

    Unless you have some more elaborate setup your pfsense is not in that path

    1. Make sure UDP ports 12222 and 12223 are opened if there is a firewall in between

    Do you have your LAN segregated somehow?

    1. Check with Ruckus Support and find the appropriate firmware to use

  • LAYER 8 Global Moderator

    As chpalmer explained nicely - unless your network is different than what you have stated pfsense would not be involved in the conversations between the AP and ZD..

    Devices on a network do not talk to the gateway (pfsense) to talk to other devices on the same network.. They only need to send traffic to the gateway, to get OFF the network they are on..

    Since from your statement these devices are all on your LAN, pfsense is not involved in the conversation.. The only possible involvement pfsense could have is if the devices are using pfsense to resolve some fqdn to know where to send the heartbeat. But I have to assume they just talk to an IP, and not some fqdn that needs to be resolved.

    Only other involvement that pfsense might have, is if these devices are dhcp, and your dhcpd is running on pfsense.



    Sorry to dig up an old thread, but I'm experiencing this too since moving to pfsense and I'm not sure why.

    Prior to pfsense (SG-1100 on 2.4.5), I've run DrayTek (2860 & 2862), USG and Araknis (110) and all ran without an issue on the same network. It's a flat network, no VLANs or anything special and the only thing that has changed is the router. The switches (UniFi) and Ruckus APs haven't changed.

    I know it doesn't make sense, but would anyone have any ideas what might cause it or what I can try?

    Thanks.


  • LAYER 8 Global Moderator

    Again - the router at the edge has ZERO to do with conversation of devices on the same network.. ZERO... Not how it works... So unless you setup pfsense with the same IP as one of these devices or something..



  • That's what I thought. I just can't get my head around why it's been fine on all other routers and that's the only thing that has changed on the network, which prompted me to search the web and find this post.

    Hmmm, back to the drawing board, or worst case, back to one of the old routers...? Argh, WiFi dropping every two hours is very frustrating, especially under the current climate, but more frustrating not knowing what's causing it.


  • LAYER 8 Global Moderator

    So the heartbeat is over wire? that goes through what? What are the IPs involved?

    You understand how 1 IP talks to another on the same network right?

    So 192.168.1.X/24 wants to talk to 192.168.1.Y/24 -- X needs the mac address of Y.. If he does not have that cached, he arps for it.. Which is a broadcast.. Says hey who has 192.168.1.Y

    Y seeing this broadcast says hey that's my IP, hey X - that is my IP, and my mac address is aa:bb:cc:dd:ee:ff

    X then sends the traffic he wants to send to Y on the wire to that mac address.

    Router has ZERO to do with that.. It could be OFF.. The only time traffic is sent to the routers IP/mac address is when the devices wants to talk to something not on its local network..

    192.168.1.x/24 wants to talk to 192.168.2.Y/24

    Well that is not my network -- let me send that to my router/gateway since it's not a local IP - he will know how to get there.. If the device doesn't know the mac address of his router, let's say 192.168.1.254 - he arps for it.. Then sends the traffic to the router's mac address, but with destination IP of 192.168.2.Y.. Router says oh you want to get to 192.168.2 -- I know where to send that, or maybe he doesn't - and just sends it on to his default gateway..
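
    The local-versus-gateway decision described above can be sketched in a few lines of Python. This is only an illustration of the rule, not how any particular OS implements it; the host numbers .10 and .20 are made up to stand in for X and Y, and the function name `next_hop` is hypothetical.

```python
import ipaddress


def next_hop(src_ip, prefix_len, dst_ip, gateway_ip):
    """Return the IP address whose MAC the sender has to ARP for.

    If the destination is inside the sender's own subnet, the frame goes
    straight to the destination. Otherwise it goes to the gateway's MAC,
    while the destination IP in the packet stays unchanged.
    """
    local_net = ipaddress.ip_interface(f"{src_ip}/{prefix_len}").network
    if ipaddress.ip_address(dst_ip) in local_net:
        return dst_ip
    return gateway_ip


# Hypothetical hosts: X = 192.168.1.10, Y = .20, router = 192.168.1.254
print(next_hop("192.168.1.10", 24, "192.168.1.20", "192.168.1.254"))  # same network
print(next_hop("192.168.1.10", 24, "192.168.2.20", "192.168.1.254"))  # other network
```

    Only the second call ever involves the router, which is the whole point: AP and ZD on the same /24 never hand their heartbeat to pfsense.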

    So for example, here I cleared my arp cache - then pinged an IP on my network 192.168.9.0/24 where my PC is .100, and the dest nas I'm trying to ping has IP of .10

    So you see the arp, and the response - and then when I send the ping you can see that it is being sent to the mac address of my nas (192.168.9.10)

    localtraffic.jpg

    Now you can see the mac address of my router at 192.168.9.253 (pfsense lan IP)

    C:\WINDOWS\system32>arp -a | find "192.168.9.253"
      192.168.9.253         00-08-a2-0c-e6-24     dynamic
    

    So when I ping say 8.8.8.8, look at the mac that the traffic is sent to..

    outsidetraffic.jpg

    So when 192.168.9.100 is talking to 192.168.9.10 - how would pfsense be involved in that conversation?? It's not!! So if your .X can not talk to your .Y you're going to have to figure out why, but it has zero to do with pfsense, unless you have bridged interfaces and your X and Y are on opposite sides of the bridge?? Or pfsense IP is same .x or .y, etc..

    So if pfsense does not have the same IP, nor are you bridging - the only other way pfsense could be part of your problem is if your devices on this network are getting their IPs from dhcp (running on pfsense)... And that for whatever reason your devices can not renew their lease, and it runs out - and now the reason they can not talk to each other is they have no IP... Is your lease time 2 hours for your dhcp? If so, and your devices are not able to renew their dhcp and the lease expires - then no they wouldn't be able to talk to each other.. But that would be because they don't have an IP... Not that pfsense had anything to do with them talking to each other..

    Look in your dhcp log, do you see the devices asking for renewal of the lease - what does pfsense tell them? A dhcp lease should normally renew around the 50% mark of the lease, so if your lease is for 2 hours, after an hour the client would ask for renewal.. If nothing, then like 30 minutes later he asks again, then 15 - pretty soon he will be screaming for renewal very fast... Only after it has expired will he lose his IP..
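
    To put numbers on that: RFC 2131 sets the default renewal timer (T1) at 50% of the lease and the rebinding timer (T2) at 87.5%. A small Python sketch of the arithmetic follows. It assumes the server does not override the timers with its own values, and the function name `dhcp_timers` is just for illustration.

```python
def dhcp_timers(lease_seconds):
    """Default DHCP client timers per RFC 2131, in seconds after the ACK.

    T1 (renew):  client unicasts a REQUEST to the server that gave the lease.
    T2 (rebind): client broadcasts a REQUEST to any DHCP server.
    expire:      client must stop using the address.
    """
    return {
        "renew_t1": lease_seconds // 2,       # 50% of the lease
        "rebind_t2": lease_seconds * 7 // 8,  # 87.5% of the lease
        "expire": lease_seconds,
    }


# A 2 hour lease, as discussed in this thread
for name, seconds in dhcp_timers(7200).items():
    print(f"{name}: {seconds // 60} minutes after the ack")
```

    So with a 2 hour lease a healthy client first shows up in the dhcp log about an hour after its last ack. A client that only reappears at the 2 hour mark, with a discover, never managed to renew.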



  • @johnpoz said in Ruckus Access Points Heartbeat lost in LAN:

    So when 192.168.9.100 is talking to 192.168.9.10 - how would pfsense be involved in that conversation?? It's not!!

    And easy to test.
    Step 1 : power up your network.
    Step 2 : check that all LAN devices have acquired an IP - or have a static IP.
    Step 3 : power down pfSense and/or rip out the LAN cable.
    Step 4 : Use every device on LAN and check that they can communicate with each other by using IP address, name resolution (DNS) isn't available now.
    Step 5 : with the knowledge obtained in step 4, start reshaping the way you think about networking.



  • First up, thanks for the detailed response, appreciated!

    @johnpoz said in Ruckus Access Points Heartbeat lost in LAN:

    Look in your dhcp log, do you see the devices asking for renewal of the lease - what does pfsense tell them? A dhcp lease should normally renew around the 50% mark of the lease, so if your lease is for 2 hours, after an hour the client would ask for renewal.. If nothing, then like 30 minutes later he asks again, then 15 - pretty soon he will be screaming for renewal very fast... Only after it has expired will he lose his IP..

    I've done some more digging and it appears the APs are getting a new IP every two hours, when the lease expires? Why would that happen?

    Here's a snapshot for the MAC address;

    Screenshot 2020-04-11 at 14.37.35.png


  • LAYER 8 Global Moderator

    Well yeah such an issue would cause a blip... Are you running multiple pools, do you have HA pair setup in pfsense? I would sniff that traffic to see why your lease might be considered unknown..

    Did you delete old leases, do you see a current lease for the device on pfsense? Are only these APs having such an issue where you see that unknown lease entry in the log?

    Do you have something else running dhcp services on your network, where the client might have gotten that IP from a different dhcp server?



  • I only had one pool, that I've just removed as I was using it for testing.

    Nope, no HA pair.

    Didn't delete old leases, but, I've also just run into an issue where I've run out of DHCP leases? I've allowed 89, but only have 25 in use? How can I prevent that happening? Is there a way to automatically free up unused leases?

    Yes, just the WAPs having the 'unknown lease' entry in the logs.

    No, no other DHCP services.

    Many thanks for your continued patience and help, greatly appreciated! 👍


  • LAYER 8 Global Moderator

    I would do a packet capture - and let's take a look at this request..

    Can you just set these devices to be static? Try setting up a reservation for them, so they always get the same IP...



  • Is packet capture quite straightforward, as it's not something I've done before...

    In the meantime, I've assigned a reservation to see if that resolves the issue.


  • LAYER 8 Global Moderator

    Did the clients grab the reservation? Can you not set them static on the devices?



    I rebooted both the APs and they picked up the reservation. That was 1.5 hours ago, so in the next 30 mins or so, we'll see if they drop again.

    Strange they were needing a new IP each time?

    Also, am I right in thinking IPs become available again in 24 hours if not used, based on default settings?


  • LAYER 8 Global Moderator

    Depends on your settings... But once a lease has expired, then yes it should be made available again... You can always just clear out all your old leases that might be stuck in the leases file.

    As to why they were getting new IPs - because for whatever reason their request for renewal was not working, ie from your log they were asking for a lease, but dhcpd was saying it has no idea what that lease is ("unknown") so can not renew... So the client would have to do a new discover to get an IP..

    If you set your reservation, and your lease time is still 2 hours.. Then they should have already renewed, right around the 1 hour mark.



  • @johnpoz said in Ruckus Access Points Heartbeat lost in LAN:

    Depends on your settings... But once a lease has expired, then yes it should be made available again... You can always just clear out all your old leases that might be stuck in the leases file.

    Is that done directly through the 'Edit file' options? FYI - I've not changed any of the lease time settings.

    As to why they were getting new IPs - because for whatever reason their request for renewal was not working, ie from your log they were asking for a lease, but dhcpd was saying it has no idea what that lease is ("unknown") so can not renew... So the client would have to do a new discover to get an IP..

    Hmm, strange. So is that the WAP or the router not playing nice?

    If you set your reservation, and your lease time is still 2 hours.. Then they should have already renewed, right around the 1 hour mark.

    Here's what I'm seeing in the logs so far for the same WAP. Does it look right?

    Screenshot 2020-04-11 at 15.58.58.png



  • @WannabeMKII said in Ruckus Access Points Heartbeat lost in LAN:

    Is packet capture quite straightforward, as it's not something I've done before...

    Go into Diagnostics > Packet Capture.
    Select LAN
    Enter port number 67 or 68
    Start the capture.

    After capturing the DHCP traffic, you can download the capture file, to examine with Wireshark.
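
    What you are looking for in the capture is DHCP option 53, the message type (discover, offer, request, ack, ...). Wireshark decodes it for you, but if you want to see what is going on under the hood, here is a rough Python sketch that pulls the type out of a raw DHCP payload (the UDP data on port 67/68). It is a simplified illustration: the function name is made up, and it ignores the rarely used option overload feature.

```python
BOOTP_FIXED_LEN = 236                  # fixed BOOTP header before the options
DHCP_MAGIC_COOKIE = b"\x63\x82\x53\x63"
MESSAGE_TYPES = {
    1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
    5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM",
}


def dhcp_message_type(payload):
    """Return the DHCP message type name, or None if it cannot be found."""
    options_start = BOOTP_FIXED_LEN + len(DHCP_MAGIC_COOKIE)
    if len(payload) < options_start:
        return None
    if payload[BOOTP_FIXED_LEN:options_start] != DHCP_MAGIC_COOKIE:
        return None  # plain BOOTP or not DHCP at all

    i = options_start
    while i < len(payload):
        code = payload[i]
        if code == 0:        # pad option: single byte, no length field
            i += 1
            continue
        if code == 255:      # end option
            break
        if i + 1 >= len(payload):
            break            # truncated option
        length = payload[i + 1]
        value = payload[i + 2:i + 2 + length]
        if code == 53 and len(value) == 1:
            return MESSAGE_TYPES.get(value[0])
        i += 2 + length      # skip code byte, length byte and value
    return None


# Hand-built sample payload: empty header, cookie, option 53 = 1, end
sample = bytes(BOOTP_FIXED_LEN) + DHCP_MAGIC_COOKIE + bytes([53, 1, 1, 255])
print(dhcp_message_type(sample))
```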


  • LAYER 8 Global Moderator

    No that is not right... Once you see a discover and send the offer, the client should send back a request, and then you ack it..



  • @johnpoz said in Ruckus Access Points Heartbeat lost in LAN:

    No that is not right... Once you see a discover and send the offer, the client should send back a request, and then you ack it..

    Here's the other WAP. Is the DHCPACK we see here what we're looking for?

    Screenshot 2020-04-11 at 16.09.56.png



  • @WannabeMKII

    It looks like that .4091 isn't recognizing the offers and then, when it does accept and goes through the request and ack, it's doing the discover again. First off, when it gets the ack, it shouldn't be doing anything for about 1/2 - 2/3 of the lease time, but it's doing a discover again just seconds later. That is not normal! What happens if you try with a computer? If it gets an address and holds onto it, then the problem is with the switches.



  • I've just checked a selection of other wired devices and you're right, every hour they're going through the process and that's it.

    So you think it's the switches between the pfsense box and the WAPs causing the issue?

    FYI - 2 hours (16:27) have passed since the IP reservations and here are the logs and no drop-off. But then as you say, it starts talking again...

    Screenshot 2020-04-11 at 16.31.24.png



  • @WannabeMKII said in Ruckus Access Points Heartbeat lost in LAN:

    FYI - 2 hours (16:27) have passed since the IP reservations and here are the logs and no drop-off. But then as you say, it starts talking again...

    The default lease time is 2 hours (7200 seconds). So, you should see requests and acks about 1 - 1.5 hours after that.

    The normal process, when the device doesn't have an address, is discover, offer, request and ack. Then, at intervals, the lease is renewed with requests and acks.
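
    If you want to check a long log for that pattern instead of eyeballing screenshots, something like the following Python sketch would do. It assumes you have already reduced the dhcpd log for one MAC address to (seconds, message type) pairs in time order; that input format and the function name are my own invention, not something pfSense produces.

```python
def find_early_rediscovers(events, lease_seconds=7200):
    """Flag DISCOVERs sent while an acked lease is still valid.

    events: list of (timestamp_seconds, message_type) for ONE client,
            sorted by time, e.g. (3, "ACK").
    Returns a list of (ack_time, discover_time) pairs.
    """
    suspicious = []
    last_ack = None
    for ts, msg in events:
        if msg == "ACK":
            last_ack = ts
        elif msg == "NAK":
            last_ack = None          # server refused, a new discover is expected
        elif msg == "DISCOVER" and last_ack is not None:
            if ts - last_ack < lease_seconds:
                suspicious.append((last_ack, ts))
            last_ack = None          # client abandoned the lease either way
    return suspicious


# Made-up timelines: a healthy client renews at the 1 hour mark,
# a misbehaving one discovers again a few seconds after its ack.
healthy = [(0, "DISCOVER"), (1, "OFFER"), (2, "REQUEST"), (3, "ACK"),
           (3603, "REQUEST"), (3604, "ACK")]
broken = [(0, "DISCOVER"), (1, "OFFER"), (2, "REQUEST"), (3, "ACK"),
          (10, "DISCOVER")]

print(find_early_rediscovers(healthy))
print(find_early_rediscovers(broken))
```

    An empty result means the client is following the normal cycle. Any pair in the result is a discover that came in while the lease from the preceding ack was still good.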


  • LAYER 8 Global Moderator

    There is something off with those clients, and the way dhcpd and those clients interact... If you send back ack, they sure as F should not discover again..

    Those clients seem hosed if you ask me... I would get on their forums about this behavior... Are they on the latest firmware, etc..

    Once a client has sent its request and gotten the ack back, the lease is his - he accepted it, thanks! So why is he sending discover again?



  • Many thanks for your continued help on this, much appreciated and glad we got to the bottom of it!

    It's been running fine for the last few days, which is great news!

    They're both running the latest firmware, but, I'm going to provide the feedback to the manufacturer to get their feedback as to why it's happening in the first place.

    Many thanks once again!

