502 Bad Gateway in Web GUI at 1500+ Captive Portal Users – Need Tuning Suggestions
-
We are encountering a “502 Bad Gateway (nginx)” error in the Web GUI whenever the captive portal user count exceeds approximately 1,500. Under normal load conditions (below 1,000 users), the system operates without issues.
We can temporarily regain access by using the “Restart PHP-FPM” option, but the same issue recurs after some time.
We seek your guidance on fine-tuning the configuration to support higher loads (2,000+ users).
Server Details:
Version: pfSense CE 2.7.2-RELEASE (amd64)
CPU: Intel® Xeon® Gold 5318Y @ 2.10GHz, 96 CPUs (2 packages × 24 cores × 2 threads), AES-NI enabled, QAT disabled
RAM: 128 GB
Storage: 1 TB HDD
-
What do you see logged when that happens?
Check the Monitoring Graphs for memory usage vs CP users.
I would try bumping the PHP memory limit in System > Advanced > Miscellaneous and see if that changes the time it takes to fail. Start by doubling it.
That hardware is massively overpowered for almost all deployments. What throughput does/can it pass?
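If it helps to narrow down what is logged around the failure, here is a rough Python sketch that counts log lines mentioning a few terms. The keyword list is only a guess at what might show up, and the log file to scan is whatever you pass on the command line (on 2.7.x I would assume something like /var/log/nginx.log or /var/log/system.log, but check your own system). `count_keyword_lines` is a made-up helper name, not part of pfSense.

```python
import sys
from collections import Counter

# Assumed search terms (all lowercase); adjust to what your logs really contain.
KEYWORDS = ("502", "upstream", "php-fpm", "memory")


def count_keyword_lines(lines, keywords=KEYWORDS):
    """Count how many lines mention each keyword, ignoring case."""
    counts = Counter()
    for line in lines:
        lowered = line.lower()
        for kw in keywords:
            if kw in lowered:
                counts[kw] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 scan_log.py /path/to/logfile
    with open(sys.argv[1], errors="replace") as fh:
        for kw, n in count_keyword_lines(fh).most_common():
            print(f"{kw}: {n}")
```

Running it once before and once after a failure would at least show which kind of message is piling up. It can be run on a copy of the log on another machine if Python is not available on the firewall.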
-
@stephenw10 Thanks for the suggestion. Memory limit increased to 3072.
This system has 10 Gbps.
-
@iamsumesh This issue is present in version 2.7 of pfSense. It seems the transition from IPFW to PF in the 2.7.x branch might be causing problems, or it may be related to the underlying operating system (FreeBSD). Even if you double the CPU and RAM, it will not work. Enabling the captive portal in 2.7.2 directed most traffic to CPU0, causing it to overload and crash the entire system.
You could try upgrading to version 2.8.1 to see if it resolves the issue (I have not personally tested this yet). However, version 2.7.2 will not work; I have already reported this problem.
https://forum.netgate.com/post/1151842
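To check whether load really is concentrating on CPU0 on a given box, one option is to take two samples of `sysctl -n kern.cp_times` a few seconds apart and compare them. The sketch below assumes the usual FreeBSD layout of five tick counters per CPU (user, nice, system, interrupt, idle); `per_cpu_busy` is a hypothetical helper name, and the script can be run on another machine with the two outputs pasted in if Python is not available on the firewall.

```python
CPUSTATES = 5  # assumed FreeBSD order: user, nice, system, interrupt, idle
IDLE = 4       # index of the idle counter within each CPU's group


def per_cpu_busy(before: str, after: str) -> list[float]:
    """Return the busy fraction (0.0 to 1.0) of each CPU between two samples.

    Each argument is the raw text of one `sysctl -n kern.cp_times` sample.
    """
    a = [int(x) for x in before.split()]
    b = [int(x) for x in after.split()]
    if len(a) != len(b) or len(a) % CPUSTATES != 0:
        raise ValueError("samples must have the same length, 5 values per CPU")

    busy = []
    for i in range(0, len(a), CPUSTATES):
        # Ticks spent in each state between the two samples for this CPU.
        delta = [b[i + j] - a[i + j] for j in range(CPUSTATES)]
        total = sum(delta)
        busy.append(0.0 if total <= 0 else 1.0 - delta[IDLE] / total)
    return busy


def report(before: str, after: str) -> None:
    """Print one line per CPU, busiest first."""
    ranked = sorted(enumerate(per_cpu_busy(before, after)),
                    key=lambda item: item[1], reverse=True)
    for cpu, frac in ranked:
        print(f"CPU{cpu}: {frac:.0%} busy")
```

If CPU0 sits near 100% while the other CPUs are mostly idle when the portal is under load, that would be consistent with the behaviour described above; an even spread would point elsewhere.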
-