Bad experience in Unifi - pfSense network (namely ESP32 and Apple products), wifi connectivity drops, low wifi experience - SOLVED!
-
Fellow community members,
I have gained a lot of knowledge from this forum and I want to give a little back. The reason for this post is to help others who experience similar issues or are losing hope while troubleshooting their home network.
Being a long-term user and evangelist of pfSense (a dozen instances) and Unifi appliances (dozens of APs and switches across 10 sites), I also run the setup described below in my home network. It caused me a real headache for the past several months, and I have finally nailed it down after spending days reading through every discussion I could find and applying pretty much every possible fix. Nothing helped, until recently. Although I was troubleshooting on the Unifi side, the cause was found on pfSense.
TLDR: it was not DNS, but it was close!
This is what I observed: For the past several months, after making no "obvious" change in the setup, I experienced the following issues:
- IoT devices based on ESP32 (ESPHome) were randomly dropping signal and (as discovered later from logs) restarting at pretty stable intervals (circa 2.5 hrs), although on some days they ran just fine
- General disconnects of my mostly Apple devices (MacBooks / iPhones) even though they were 2-3 m away from APs
- Devices connecting to my Wifi and LAN took a long time to get an IP address, sometimes even 2-3 minutes (yes, it is normal for Windows to spend time investigating a connected network, but this was even longer and also happened on Linux devices)
- the Unifi wifi experience on several devices was jumping between 100% and 50% (during the drops it was low, for the remaining time it was 100%)
- the overall WiFi connectivity stats were 100% for both Association and Authentication but between 50-70% for DHCP and DNS - spoiler: I should have paid more attention
The thing is that this had already started in summer, but I did not pay much attention. The sensors and relays I use for controlling gates, doors, switches, and lights had those circa 5-minute blackouts every 2.5 hrs, so I just waited for them to come back. But now, in the winter heating season, a stove not starting at 3 am because of a disconnected ESP relay was a real pain, as it was cold at home...
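A regular restart interval like the one above tends to point at something timer-driven rather than at radio interference. A minimal Python sketch for checking that regularity from log timestamps follows; the timestamps are made up for illustration, and `intervals_hours`, `looks_periodic`, and the 10% tolerance are hypothetical names and values, not anything taken from ESPHome or UniFi.

```python
from datetime import datetime
from statistics import mean, pstdev

# Made-up restart times for illustration; real values would come from device logs.
restarts = [
    "2024-01-10 00:05",
    "2024-01-10 02:36",
    "2024-01-10 05:04",
    "2024-01-10 07:35",
]

def intervals_hours(stamps):
    """Return the gaps between consecutive timestamps, in hours."""
    times = sorted(datetime.strptime(s, "%Y-%m-%d %H:%M") for s in stamps)
    return [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]

def looks_periodic(gaps, tolerance=0.1):
    """True when the gaps vary by less than `tolerance` (relative std deviation)."""
    if len(gaps) < 2:
        return False
    return pstdev(gaps) / mean(gaps) < tolerance

gaps = intervals_hours(restarts)
print([round(g, 2) for g in gaps], looks_periodic(gaps))
```

If the gaps come out nearly identical, it may be worth looking for anything on the network that runs on a similar timer before moving APs around.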
So I went through all the best practices, and here follows a list of things I have tried (in no particular order) - WITHOUT ANY SUCCESS - in the past 2 months:
- Restarting all devices (pfSense, APs, Unifi controller, switches...)
- optimizing the APs' wireless parameters, either by letting the network optimize channels automatically or by selecting 1 - 6 - 11 manually to keep neighboring APs on 2.4 GHz channels as far apart as possible. 5 GHz was not a problem.
- narrowed down the channel width to 20 MHz for the 2.4 GHz network and to 40 MHz for the 5 GHz network
- tried to set wifi transmit power to auto, max, min
- placed APs in a better position to "beam" the correct direction towards the ESP devices, removing obstacles
- disabled any possible "manually changed" wifi settings and advanced features (I used hardly any non-default values anyway, but I made sure I was not using any "enhancements")
- as I use two out of 5 APs for meshing, I carefully enabled meshing only on the "parent" devices that are actually used, disabled it on the others that are not expected to mesh down, and also disabled it on the child APs so that they do not mesh further down to other APs
- Reflashed ESPHome devices with newer firmware
- searched for bogus DHCP servers (none existed except the expected pfSense....)
- reset Unifi APs to factory default, readopted, reset the Unifi controller and started from scratch with a configuration - all in default
- whenever there was new firmware for Unifi, HA, or ESPHome, I installed it right away in the hope of a bug fix
- purchased a brand new AP U6 Pro in the hope of a better signal
- disconnected some APs to reduce wifi pollution (living in the countryside, there is no interference from foreign wifi signals)
- replaced cables between switches, pfSense, and Unifi cloud key and also for APs (where possible)
Nothing above seemed to help, although there were periods when I saw no interruptions, but unfortunately within 24-48 hrs they came back again.
So recently I was crawling through my home network again, hoping for a stroke of luck, namely after installing several new HP notebooks for friends and waiting 5+ minutes for an IP address and internet connectivity on all of them. I also saw UniFi complaining about multiple devices using the same IP. I went to pfSense and started looking at the DHCP settings, logs, and leases. The pfSense DHCP leases page took a minute to load but showed nothing weird and no duplicates. However, I saw strange WARN messages in the log (unfortunately I can't find them any more and didn't make a copy). So I checked other pfSense instances on 2.7.2: the same DHCP status page loaded fast there, and the logs were filled with INFO messages only. That gave me the idea to switch back to the deprecated ISC DHCP server on my pfSense. Since then the network has been rock solid again: not a single drop in 72 hrs, 100% wifi experience, DNS and DHCP included, and newly connected devices get their IP address in no time.
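For anyone chasing the same "multiple devices using same IP" complaint, here is a minimal Python sketch of cross-checking a lease list for addresses claimed by more than one MAC. The `(ip, mac)` record format, the sample values, and the `find_conflicts` helper are assumptions for illustration; real lease data would have to be exported from the DHCP server in whatever format it provides.

```python
from collections import defaultdict

# Hypothetical lease records as (ip, mac) pairs; the real export format
# is not shown in this post and will differ.
leases = [
    ("192.168.1.20", "aa:bb:cc:00:00:01"),
    ("192.168.1.21", "aa:bb:cc:00:00:02"),
    ("192.168.1.20", "aa:bb:cc:00:00:03"),
]

def find_conflicts(records):
    """Return {ip: sorted MACs} for every IP claimed by more than one MAC."""
    macs_by_ip = defaultdict(set)
    for ip, mac in records:
        # Normalize case so the same MAC written two ways is not a conflict.
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}

conflicts = find_conflicts(leases)
for ip, macs in conflicts.items():
    print(f"{ip} is claimed by {', '.join(macs)}")
```

An empty result only means the list itself has no duplicates, which matches what the leases page showed here; it does not rule out conflicts on the wire.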
**IT JUST WORKS AGAIN!
So for once it was not DNS, but DHCP. The "no obvious" change I had made was upgrading my pfSense from 2.7.0 to 2.7.2 and following the advice to switch from the deprecated ISC DHCP to KEA DHCP. That is where my trouble started.**
Afterwards I read about KEA DHCP, and many posts suggest switching back to ISC in case of issues. I did not expect this to be related in any way, as my (bad) common sense told me that DHCP cannot affect the WiFi experience or cause disconnects when it should not be involved "after" a device is on the air... Well, perhaps someone will now tell me that it was obvious, but I honestly didn't find a single post suggesting that relationship.
I tried going back to KEA, but the problems were the same as before, so my next plan is to back up the pfSense configuration, reinstall, and restore to see if KEA gets any better. If not, I will stay on ISC until the next version of pfSense.
I am glad I solved it, as I felt embarrassed and pissed off at the same time, losing faith in myself and in Unifi. I apologize to everyone for being down :)
My setup:
- pfSense 2.7.2 on Alix, running 5+ years
- Unifi Gateway Cloud Key G2+ on latest firmware
- Unifi APs: AP AC LR, AP AC, AP AC lite, AP U6 Pro on latest firmware
- Unifi POE switches USW Lite 16 PoE and US 8 60W and USW Flex Mini on latest firmware
- 6 Unifi cameras (LAN and WiFi)
- roughly 50 devices connected (LAN and WiFi), namely Apple notebooks, phones, watches, and tablets, plus 8 ESP32-based boards flashed with ESPHome for Home Assistant (Sonoff, home-made D1)
Attached are a few images from the Unifi controller....
-
@keson Very interesting story, thanks for sharing!
I have a very similar network: UniFi APs behind a pfSense router, with most wireless clients being Apple gear. I was new to pfSense seven or eight months ago, and I set it up with Kea because I read the same "ISC is deprecated" notice you did. I had some frustrating problems during the switchover from my previous DHCP server, mainly that Kea didn't honor most of the pre-existing leases leading to devices being given conflicting addresses. Probably that would have worked itself out over time; but I was sufficiently annoyed that I went to the ISC server, and it's been rock-solid since. I'm not in a hurry to try Kea again, and even less so after reading your story.
-
@tgl Thanks for the reply. I just hope the pfSense community, and namely those responsible for the KEA integration, won't take my post as a negative complaint. I can imagine that covering every possible situation in setups as diverse as networks is nearly impossible, and I will do my best to provide as many logs as possible to help troubleshoot the implementation. I plan to switch to KEA, perhaps for a weekend, and record the logs just to provide them to the developers, if that would be welcome.
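When recording logs over such a test window, a quick tally by severity can show whether one DHCP backend produces noticeably more warnings than the other. A minimal Python sketch follows; the sample lines are invented and are not real KEA or pfSense output, and the level names in the regex are an assumption about what the log contains.

```python
import re
from collections import Counter

# Made-up log lines for illustration only; not real KEA or pfSense output.
sample_log = """\
Jan 10 03:00:01 dhcp INFO lease granted
Jan 10 03:00:02 dhcp WARN something unexpected
Jan 10 03:00:03 dhcp INFO lease renewed
"""

# Assumed severity keywords; adjust to whatever the real log uses.
LEVEL = re.compile(r"\b(DEBUG|INFO|WARN|ERROR|FATAL)\b")

def count_levels(text):
    """Count log lines per severity keyword, skipping lines without one."""
    counts = Counter()
    for line in text.splitlines():
        match = LEVEL.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

counts = count_levels(sample_log)
print(dict(counts))
```

Keeping the raw log alongside the counts matters more than the tally itself, since the original WARN messages in this case were lost.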
P.S. I have extended my typical answer now: It is always DNS. Or DHCP.
-
Thanks for the post. I have a similar network to @tgl and was preparing to switch to KEA. Guess I'll keep waiting.