Starlink is the Quintessential Flappy WAN Problem
-
I have been battling a couple of problems recently and could not find a good article that tied everything together. The symptoms were high CPU temperature, high CPU load, and poor-ish performance. I assumed, possibly incorrectly, that this was partly due to upgrades from 22.01 to 23RC etc. That assumption appears to be a red herring.
The breakthrough came when I noticed that the vast majority of the load was due to Unbound being restarted repeatedly, sometimes more than once a minute. I was also having the occasional DNS resolution issue. Researching this brought me to posts on the flappy WAN problem. Basically, my network has Starlink on the WAN port and another, more stable provider on the WAN2 port, with a balanced routing group spreading the load. Starlink IMHO presents the classic flappy WAN connection problem. My connection has tons of 0.1, 0.2, and 0.3 second outages every hour (see below).
I believe I have solved the flappiness -> load and temp problem with two changes: 1) I ran a traceroute on each of my WAN and VPN connections to find the very first hop outside my network, and set the gateway monitor IP to that instead of 8.8.8.8; and, more importantly, 2) I raised the packet loss thresholds to 50% and 100%, and doubled both the loss interval and the probe interval from their defaults. This has improved the stability of my connection and almost totally eradicated the high load issue.
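To make the two changes a bit more concrete, here is a rough Python sketch. It is only a simplified model, not how pfSense or its gateway monitor actually computes anything: the hop addresses, the probe counts, the `LOCAL_NETS` list, the function names, and the use of `>=` for the threshold comparison are all my own assumptions for illustration. It also assumes the default packet loss thresholds are 10% and 20%.

```python
from ipaddress import ip_address, ip_network

# Assumption: these RFC 1918 ranges count as "inside my network".
LOCAL_NETS = [ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def first_outside_hop(hops):
    """Return the first traceroute hop that is not in LOCAL_NETS, or None."""
    for hop in hops:
        addr = ip_address(hop)
        if not any(addr in net for net in LOCAL_NETS):
            return hop
    return None


def loss_percent(probes):
    """Percentage of lost probes. `probes` is a list of booleans, True = reply seen."""
    if not probes:
        return 0.0
    lost = sum(1 for ok in probes if not ok)
    return 100.0 * lost / len(probes)


def gateway_state(loss_pct, low, high):
    """Simplified model: at or above `high` is down, at or above `low` is a warning."""
    if loss_pct >= high:
        return "down"
    if loss_pct >= low:
        return "warning"
    return "online"


# Hypothetical traceroute output (made-up addresses, not from my connection).
hops = ["192.168.1.1", "100.64.0.1", "203.0.113.7"]
monitor_ip = first_outside_hop(hops)

# Hypothetical averaging window: 120 probes, 30 of them lost in a burst.
probes = [True] * 90 + [False] * 30
loss = loss_percent(probes)

print(monitor_ip)
print(loss)
print(gateway_state(loss, 10, 20))    # assumed default thresholds
print(gateway_state(loss, 50, 100))   # the thresholds I switched to
```

The point of the sketch is just that the same burst of loss that marks the gateway down under tight thresholds leaves it online under looser ones, so anything that reacts to a gateway state change is triggered less often.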
The point of my post is to help people connect the dots from high CPU load and DNS issues -> flappy Starlink WAN -> the fixes I have come up with.
Also, even though I have been a (home) user for quite a long while, I would not consider myself a pfSense expert. If anyone can suggest additional improvements to help stabilize a flappy Starlink WAN in pfSense, I am all ears.
-
I found one more cause of Unbound restarts. If you have DHCP Registration (Register DHCP leases in the DNS Resolver) checked in your Unbound General settings, and you have devices repeatedly asking for a DHCP lease, this too seems to cause frequent restarts. All I actually need checked is the static DHCP setting.
-
@pmagid Second issue first: the DHCP registration problem is a known issue: https://redmine.pfsense.org/issues/5413#note-50
Re: the gateway, there are options under System>Routing>(edit the gateway) to force a gateway up, but that doesn't work well with multi-WAN and failover. I've also dealt with unstable connections, and they are annoying to tune. In one case it wasn't even the connection; it was "some massive upload coming off a Mac that flooded out the WAN" for some unknown reason, possibly a backup?
I wouldn't have expected high CPU usage though.