2.2.5 - Many VLANs and php-fpm at 100% CPU Hang the Web GUI
-
I have a Dell R410 that has been running pfSense 2.0.3 Release for several years without an issue. It hosts 500 VLANs and has been very stable. Due to space constraints, I have tried to install a much smaller form-factor system running 2.2.5. After adding several hundred VLANs, the web GUI becomes unresponsive, and 'top' shows php-fpm holding the CPU at 100% load. If given long enough, the web GUI eventually shows up, but it is far too slow to use. If I run killall -9 php-fpm, the router performs as expected, but I can't start the web GUI again without php-fpm consuming the CPU. Does anyone have an idea what is causing this behaviour? I have noticed BandwidthD causes a similar issue, but this is a stock system load. I have recreated the same behaviour on a few other platforms, including a Netgate C2758, so it doesn't appear to be hardware related. Any suggestions would be most appreciated.
Thanks
-
https://forum.pfsense.org/index.php?topic=101448
No known fix, and I can't find an official bug report about it either.
So for now: reduce your VLAN count and file a bug.
-
We did a lot of work in later 2.0.x releases to speed up handling of large numbers of interfaces, but I don't think I've tried that on 2.2.x. Judging by your experience and one other recent report here, there's a performance regression somewhere in 2.2.x versions at a minimum.
I'll be taking a look at that on 2.3 at some point. You're welcome to give it a shot and see (https://snapshots.pfsense.org). I doubt it's any different from 2.2.5 in that regard, but a variety of underlying components have been upgraded, which could have an impact (either way, really).
-
I have a very similar issue today. I had pfSense running as a VM with 14 NICs on 2.2.4 and had no issues. Today I removed ESXi and installed pfSense on the same bare metal, this time with 2.2.5. After adding all the NICs, all the routes, and then the VIPs, I noticed the web interface kept hanging. Restarting php-fpm resolves the issue immediately, but a few minutes later the same thing happens: the GUI hangs and php-fpm needs restarting. Tomorrow I will try rebuilding with 2.2.4.
-
If it's doing it with only 14 interfaces I would consider it a critical issue.
I have 13 on an Atom D525 running 2.2.5, and it works fine.
-
There certainly aren't any such issues with 14 interfaces, and nothing changed there from 2.2.4 to 2.2.5, so don't bother going back. This isn't the same issue, or even related to the number of interfaces; start a new thread describing what you're seeing.
-
@cmb:
Thanks. I'll give 2.3 a go and see what the results are. I'm also going to dig into the php-fpm slowlog to see if I can spot what is causing it.
-
I have more information on this now. I tried 2.3 with 300 VLANs and see the same behavior as on 2.2.5. After some digging, the problem appears to be on the index.php page. If I wait just long enough for the login to complete and then click away to anything other than the index page, the GUI performs as expected, as long as I don't go back to the dashboard. The PHP-FPM slowlog report for the index page is as follows:
[22-Nov-2015 01:59:42] [pool lighty] pid 84714
script_filename = /usr/local/www/index.php
[0x000000080284a970] pfSense_interface_listget() /etc/inc/interfaces.inc:66
[0x000000080284a7e0] get_interface_arr() /etc/inc/interfaces.inc:81
[0x000000080284a620] does_interface_exist() /etc/inc/interfaces.inc:4752
[0x000000080284a210] find_interface_ipv6_ll() /etc/inc/interfaces.inc:4900
[0x0000000802849c40] get_interface_linklocal() /etc/inc/pfsense-utils.inc:1264
[0x0000000802846518] get_interface_info() /usr/local/www/widgets/widgets/interfaces.widget.php:49
[0x0000000802845b20] +++ dump failed
It appears get_interface_info() might be the culprit. If I let index.php run for more than about 10 seconds, the CPU stays locked at 100% until I kill the process. The pid in slow.log matches the hung pid. I hope this sheds some light on where the problem originates.
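A slowlog entry like the one above is easier to read as a call chain once it is reversed, since php-fpm prints the innermost (currently running) frame first: here pfSense_interface_listget() is at the top and the widget file is at the bottom. The sketch below is a hypothetical helper, not part of pfSense or php-fpm; `parse_entry`, `call_chain` and the dataclasses are illustrative names, and it assumes the entry format exactly as posted in this thread.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Header line, e.g. "[22-Nov-2015 01:59:42] [pool lighty] pid 84714"
HEADER_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[pool (?P<pool>[^\]]+)\] pid (?P<pid>\d+)$")
# Frame line, e.g. "[0x000000080284a970] pfSense_interface_listget() /etc/inc/interfaces.inc:66"
FRAME_RE = re.compile(r"^\[0x[0-9a-fA-F]+\] (?P<func>\S+\(\)) (?P<file>\S+):(?P<line>\d+)$")


@dataclass
class Frame:
    func: str
    file: str  # judging by the trace, the file:line this function was called from
    line: int


@dataclass
class SlowlogEntry:
    timestamp: str
    pool: str
    pid: int
    script: str = ""
    frames: List[Frame] = field(default_factory=list)  # innermost frame first
    truncated: bool = False  # True if php-fpm printed "+++ dump failed"


def parse_entry(text: str) -> SlowlogEntry:
    """Parse one slowlog entry: header, script_filename, then stack frames."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    head = HEADER_RE.match(lines[0])
    if head is None:
        raise ValueError(f"not a slowlog header: {lines[0]!r}")
    entry = SlowlogEntry(head["ts"], head["pool"], int(head["pid"]))
    for ln in lines[1:]:
        if ln.startswith("script_filename = "):
            entry.script = ln[len("script_filename = "):]
        elif ln.endswith("+++ dump failed"):
            entry.truncated = True
        else:
            frame = FRAME_RE.match(ln)
            if frame:
                entry.frames.append(Frame(frame["func"], frame["file"], int(frame["line"])))
    return entry


def call_chain(entry: SlowlogEntry) -> str:
    """Outermost caller first, innermost (currently running) function last."""
    return " -> ".join(f.func for f in reversed(entry.frames))


SAMPLE = """\
[22-Nov-2015 01:59:42] [pool lighty] pid 84714
script_filename = /usr/local/www/index.php
[0x000000080284a970] pfSense_interface_listget() /etc/inc/interfaces.inc:66
[0x000000080284a7e0] get_interface_arr() /etc/inc/interfaces.inc:81
[0x000000080284a620] does_interface_exist() /etc/inc/interfaces.inc:4752
[0x000000080284a210] find_interface_ipv6_ll() /etc/inc/interfaces.inc:4900
[0x0000000802849c40] get_interface_linklocal() /etc/inc/pfsense-utils.inc:1264
[0x0000000802846518] get_interface_info() /usr/local/www/widgets/widgets/interfaces.widget.php:49
[0x0000000802845b20] +++ dump failed
"""

if __name__ == "__main__":
    e = parse_entry(SAMPLE)
    print(e.pid, e.script)
    print(call_chain(e))
```

For the entry in this post the chain comes out as get_interface_info() -> get_interface_linklocal() -> find_interface_ipv6_ll() -> does_interface_exist() -> get_interface_arr() -> pfSense_interface_listget(), which suggests (without proving) that the time is spent enumerating interfaces underneath get_interface_info().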
-
Have you tried removing the interfaces widget?
If the interfaces widget is confirmed to be the culprit, I'm pretty sure it could easily be disabled when the number of interfaces exceeds 50 (or whatever).
Looking at the code of the widget itself, I don't really see a way to do without get_interface_info(). That function, however, returns more data than the widget needs. I guess a more specific function could be created that only gets the data required for the widget.
But first, let's confirm or deny that your issues are solved by disabling the interfaces widget.
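The two ideas in this post, skipping the widget above some interface count and a narrower function that fetches only what the widget displays, can be sketched in Python. This is not pfSense code: `widget_rows`, `list_system_interfaces`, the `status` field and the cut-off of 50 are all hypothetical, and it assumes (without checking the pfSense source) that the expensive step is enumerating every interface on the system, as the slowlog trace earlier in the thread suggests.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical cut-off; the post suggests "50 (or whatever)".
MAX_WIDGET_INTERFACES = 50

# Hypothetical stand-in for whatever enumerates every interface on the
# system in one pass, returning {interface name: details}.
InterfaceLister = Callable[[], Dict[str, dict]]


def widget_rows(assigned: List[str],
                list_system_interfaces: InterfaceLister,
                limit: int = MAX_WIDGET_INTERFACES) -> Optional[List[dict]]:
    """Return only the fields the widget displays, or None if it is disabled.

    The system interface list is fetched once per page load and reused for
    every row, instead of being re-fetched for each assigned interface.
    """
    if len(assigned) > limit:
        return None  # too many interfaces: skip the widget entirely
    system = list_system_interfaces()  # single enumeration
    rows = []
    for name in assigned:
        info = system.get(name)
        rows.append({
            "name": name,
            "exists": info is not None,
            "status": info.get("status", "unknown") if info is not None else "missing",
        })
    return rows


if __name__ == "__main__":
    fake = {"em0": {"status": "up"}, "em0.10": {"status": "down"}}
    print(widget_rows(["em0", "em0.10", "em0.20"], lambda: fake))
    print(widget_rows([f"em0.{i}" for i in range(300)], lambda: fake))  # prints None
```

Taking one snapshot of the interface list per page load trades a little freshness for a single enumeration instead of one per row; for a dashboard refresh that is probably acceptable, but it is an assumption about how the widget is used.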
-
I tried removing the interfaces widget as suggested, as well as the other default widget that shows system information. It does not appear to make a difference. I haven't had a chance to check the log with the widgets removed, but I'll post my findings when I have a minute to check it.
-
It might be better to use a 10 Gbit/s link to handle the 500 VLANs rather than a 1 Gbit/s interface. The pfSense store has a Chelsio T520-SO-CR dual-port 10 Gigabit Ethernet SFP+ adapter available for $245 that can do full TCP offload and also handles VLANs. Might be an option.
-
I believe the pfSense GUI is missing a theme better suited to handling many network interfaces, physical or virtual, as in Ubiquiti's EdgeOS, which is a descendant of Vyatta. For any other use, the current themes seem to work well for me.
-
If this can't be resolved through a theme setting, then the network interfaces should be presented in a way better suited to this situation, as in the EdgeOS GUI.
-
I've tested pfSense in various ways, and so far there is just one application where I gave up on it: as the edge router in an ISP network.
Each client will connect via a dedicated VLAN, so we will have hundreds or even thousands of VLANs.