need some help troubleshooting apparent DHCP issue with VoIP phone
-
We recently implemented a VoIP system at my work and a couple of users have noted that occasionally their phone will indicate no service/network unavailable for 5-10 seconds before reconnecting to the network and working fine. I was at one user's desk when it happened today and was able to note that the phone had no IP address at that moment, but came back online in a few seconds. Before calling tech support I wanted to check and see what might be happening on our end. Given the lack of IP address I decided to check the DHCP logs (more on my config below) and found suspicious entries when filtering for the IP address of the phone in question, but I am not sure exactly how to interpret them or possible next steps. See the screenshots below.
Log filtered for phone with problem
Log filtered for phone without issue
The "reuse_lease" entries correspond to the time when the phone lost service. There were several other similar entries at various times today, and the user reports it happened multiple times today. I filtered for the IPs of several other phones and only saw entries like those in the second screenshot. It seems to me that the phone is having some sort of issue with its lease, but I'm not sure what that might be or what the next troubleshooting steps are.
The phones are on their own VLAN with DHCP range of 192.168.10.100-200. Only 30 phones, so plenty of IP addresses available. Lease times are set to default of 2 hours. Phones are Yealink T-53Ws. From what I can see there are no issues with the phone settings, at least what I have access to.
Firewall is a Netgate SG-2100, running Snort (no blocking at this time) and pfblockerng-devel. Downstream from firewall are 2 Netgear JGS524Ev2 switches connecting to network. Phones have PCs daisy chained.
Any help is appreciated. Thanks.
-
@pzanga said in need some help troubleshooting apparent DHCP issue with VoIP phone:
The "reuse_lease" entries correspond to the time when the phone lost service.
So this means the device requested a DHCP lease renewal well before the expected renewal time (1/2 the lease interval). In the examples you provide, that was 30 and 41 seconds after receiving the lease in the first place.
According to the DHCP docs, the dhcp-cache-threshold parameter, which defaults to 25%, controls what the server does when a lease is renewed earlier than expected: if the client asks again within the first 25% of the lease time, the server hands back the existing lease unchanged instead of extending it.
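To put numbers on that, here is a rough sketch of the arithmetic for the 2-hour lease mentioned in the original post. It is a simplified model of the behavior described in the ISC man page, not the dhcpd source; the function names are made up for illustration, and the exact boundary comparison (inclusive or not) is an assumption.

```python
# Simplified model of dhcp-cache-threshold; all names here are illustrative only.

DEFAULT_CACHE_THRESHOLD = 25  # percent of the lease time (documented dhcpd default)


def expected_renewal_seconds(lease_seconds: int) -> int:
    """Time at which a client normally starts renewing (half the lease)."""
    return lease_seconds // 2


def threshold_seconds(lease_seconds: int,
                      threshold_percent: int = DEFAULT_CACHE_THRESHOLD) -> int:
    """Length of the window, measured from lease start, in which the lease is reused."""
    return lease_seconds * threshold_percent // 100


def is_lease_reused(lease_age_seconds: int, lease_seconds: int,
                    threshold_percent: int = DEFAULT_CACHE_THRESHOLD) -> bool:
    """True if a request this early gets the existing lease back unchanged.

    Assumption: the boundary is treated as inclusive; take it as approximate.
    """
    return lease_age_seconds <= threshold_seconds(lease_seconds, threshold_percent)


if __name__ == "__main__":
    lease = 2 * 60 * 60  # 2-hour lease from the original post, in seconds
    print("expected renewal at", expected_renewal_seconds(lease), "s")
    print("reuse window ends at", threshold_seconds(lease), "s")
    for age in (30, 41, 3600):
        outcome = "reuse_lease" if is_lease_reused(age, lease) else "normal renewal"
        print(age, "s ->", outcome)
```

With a 7200-second lease, the client would normally try to renew at about 3600 seconds, and the server reuses the existing lease for any request in the first 1800 seconds, so requests at 30 and 41 seconds fall well inside that window.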
https://kb.isc.org/docs/isc-dhcp-44-manual-pages-dhcpdconf - Search for dhcp-cache-threshold
Some stuff to check:
You mention the phones are on their own VLAN, does this VLAN terminate directly on the pfSense (ie: it is the default gateway for that VLAN), or a Layer 3 switch using a DHCP Helper?
If the latter, multiple configured DHCP helpers might be sending unnecessary requests to the server. In and of itself that isn't a problem, but perhaps the phone firmware is not up to date and doesn't react well to this scenario?
Is the phone connected to a managed switch? If so, check the switch's logs for any signs the interface is flapping
Check the PBX to see if the device is re-registering each time this occurs
Enable logging on the phone (Settings -> Configuration -> Local Log) and export it after an event to review what the phone is actually doing at that time
I'd start with a firmware update on the phone to the latest version, well almost...it appears that they dropped a new version (V86) today, Aug 25th... I'd not use that one, but the one before (V85).
https://support.yealink.com/en/portal/docList?archiveType=software&productCode=93703a7132b62e5a
-
Thanks for the input. The definition of the dhcp-cache-threshold parameter was helpful, just to give me a better idea of what is happening. I haven't had a chance to call tech support yet, but plan on that for later today.
After spending some more time with the DHCP logs, I did note that the dhcp-cache-threshold parameter is being invoked for a few other IPs. Those IPs are all on my default VLAN and belong to some users' smartphones and 1 laptop, i.e. only 1 IP phone is displaying this behavior. The IP phone in question has by far the most entries, with events occurring around 6-7 times/day for the last couple of days, at varying intervals. I haven't confirmed that all events are associated with the phone showing no service, since many occur outside business hours and the user isn't always in front of their phone.
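For anyone wanting to do the same tally without clicking through the log filter, something like the sketch below might work on an exported copy of the DHCP log. The message format in the regex is my assumption of what dhcpd writes for these events, the sample lines and IPs are made up, and the function name is just for illustration, so check it against your own log before trusting the counts.

```python
import re
from collections import Counter

# Assumed shape of the dhcpd message; adjust if your log lines differ.
REUSE_RE = re.compile(
    r"reuse_lease: lease age (\d+) \(secs\).*?for (\d{1,3}(?:\.\d{1,3}){3})"
)


def count_reuse_events(lines):
    """Count reuse_lease entries per client IP address."""
    counts = Counter()
    for line in lines:
        match = REUSE_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# Hypothetical sample lines, not taken from the real log.
SAMPLE_LOG = [
    "Aug 25 10:15:32 dhcpd[1234]: reuse_lease: lease age 30 (secs) under 25% "
    "threshold, reply with unaltered, existing lease for 192.168.10.105",
    "Aug 25 10:15:32 dhcpd[1234]: DHCPACK on 192.168.10.120 to "
    "00:11:22:33:44:55 via vlan10",
    "Aug 25 13:02:11 dhcpd[1234]: reuse_lease: lease age 41 (secs) under 25% "
    "threshold, reply with unaltered, existing lease for 192.168.10.105",
]

if __name__ == "__main__":
    for ip, count in count_reuse_events(SAMPLE_LOG).most_common():
        print(ip, count)
```

Sorting by count makes the one noisy device stand out from the occasional smartphone or laptop entry.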
I should also note that I increased the lease time for my phones to 8 days. That seems a bit more reasonable than 2 hours. This is my first VoIP install and the provider was pretty lacking in terms of supplying any sort of best practice recommendations (realizing that the internal network is ultimately my responsibility).
edit - Just checked, and it may be a bad patch cable. The Ethernet jack is behind a filing cabinet which was pressed against the cable terminal, bending the cable 90 degrees as it exited the jack. Wiggling the cable caused loss of service and a corresponding reuse_lease entry in the log. I will replace the cable and confirm when the user doesn't need to be on the phone (I accidentally lost the call she had on hold). Always check your equipment!
You mention the phones are on their own VLAN, does this VLAN terminate directly on the pfSense (ie: it is the default gateway for that VLAN), or a Layer 3 switch using a DHCP Helper?
The VLAN terminates directly on the pfsense box, so no DHCP helpers involved.
Is the phone connected to a managed switch? If so, check the switch's logs for any signs the interface is flapping
Unfortunately, our managed switches don't support logging apparently. I do have some unused switch ports, so I could try switching the port for the phone in question as a way to try to rule the switch in or out, depending on what else I can find out.
Check the PBX to see if the device is re-registering each time this occurs
Enable logging on the phone (Settings -> Configuration -> Local Log) and export it after an event to review what the phone is actually doing at that time
It's a cloud-based system, so there is no onsite PBX, but I will ask tech support to check what is happening on their end, if anything. I don't see a way for me to configure logging on the phone, but again I believe tech support should be able to pull that info using the time stamps I have.
I'd start with a firmware update on the phone to the latest version, well almost...it appears that they dropped a new version (V86) today, Aug 25th... I'd not use that one, but the one before (V85).
https://support.yealink.com/en/portal/docList?archiveType=software&productCode=93703a7132b62e5a
Looks like we are on V84, so I will explore that option as well with tech support.
I will post anything else I find out. Thanks again.
-
@pzanga
Just a quick update, if anyone cares. Turns out the issue was a flaky keystone jack. Not sure if it was a bad terminal connection or some dust in the jack, but I took off the wall plate, made sure the connections were secure, blew out some dust, and now it's working fine. I also made sure the Ethernet cable is not being squeezed and is secure. No problems since.
So, as I said above, always check your equipment.