Slow CP and webGUI



  • I have spent some time trying to figure out what's happening with my box. I have searched for posts that could help, but so far I haven't found anything, and the different things I have tried have not worked. Here are my settings:

    IBM x336 running vSphere 4.0

    VM 1 - pfSense with 4GB ram, 4 core processor, and high resource allocation from the vSphere
    VM 2 - CentOS 6.3 with MySQL and Apache.

    The pfSense box is running:

    • 35 VLANs with a DHCP pool in each one
    • Squid
    • Captive Portal - RADIUS validation against the MySQL database on the second VM.

    Now, I have around 200 laptops (BYOD) connecting wirelessly. Everybody gets an IP address without a problem, and they can browse the internet when the CP is turned off. However, when the CP is turned on, the login page takes a few seconds to load, and after a few minutes it takes ages to load. When I look at the system status, the CPU is almost entirely consumed by system time, as you can see in the screenshot. I would be glad if someone could point out where I'm going wrong.

    If there's any extra information needed, let me know and I'll post here.

    Regards



  • A snapshot of `top -S -H` would help.

    Do you have most of those 200 WiFi users attempting to log in to the Captive Portal at around the same time? Is that a "local" login to pfSense or a login involving an external RADIUS server?

    Speculation: A significant proportion of your 82% system time might be consumed in lock contention. If your box has 4 (or fewer) CPUs this might lead to CPU starvation in the RADIUS server, delaying its responses and further increasing lock contention. You MIGHT get better results by reducing the number of CPUs in pfSense to 2 or even 1. "More is sometimes less" - you don't always get better response from more CPUs than you need.


  • Assuming you're running 2.0.1, I did some testing a year ago and it was quite easy to overwhelm the CP and effectively cause a DoS (there are even a couple of open issues at redmine about this)



  • Hi !

    Yes, I'm running a 2.0.1 version.

    The system ran quite smoothly last time, when the majority of the wireless devices were not connected to it. Unfortunately, I have no control at all over the wireless devices connecting to the network, because they are all personal devices. So I ran a test: I connected 20 desktops and loaded the CP login page on all of them, and everything worked quite well. One thing I noticed, though, is that if I press F5 about 30 times, CPU usage increases significantly.

    What I need to do is to give access to the internet and control the bandwidth which each device has.

    Normally, the laptops attempt to log in at around the same time. (The system has been deployed in a college.)

    Those devices surely have software installed that starts up with the operating system, such as Skype, torrent and other P2P programs, and of course some of them might even be infected with viruses. Therefore, the overload could be caused by a huge number of requests made by them.

    I'm definitely not an expert in pfSense, so still learning how to use it efficiently.

    Is there anything I could do to restrict the number of requests from each client? I have tried to set a maximum number of connections per host in the advanced options of the firewall rules, but haven't succeeded.

    I have made some changes to the installation to make sure the system wasn't suffering from starvation. I reduced pfSense from 4 CPUs to 2, and split the resources 50/50 with the RADIUS database, which runs on another VM on the same hardware.

    As soon as I have any other information, I'll post it. In the meantime I would appreciate it if anyone could suggest other settings that could help, at least to identify what's going on.

    Regards



  • @dhatz:

    Assuming you're running 2.0.1, I did some testing a year ago and it was quite easy to overwhelm the CP and effectively cause a DoS (there are even a couple of open issues at redmine about this)

    Not true unless you're intentionally DoSing it.



  • @wallabybob:

    A snapshot of `top -S -H` would help.

    Definitely need this.



  • @cmb:

    @dhatz:

    Assuming you're running 2.0.1, I did some testing a year ago and it was quite easy to overwhelm the CP and effectively cause a DoS (there are even a couple of open issues at redmine about this)

    Not true unless you're intentionally DoSing it.

    In my tests I did in fact try to simulate "misbehaving" clients, and experimented with various methods to mitigate possible issues (e.g. ipfw dynamic rules, doing the CP login page http->https redirect as a 302 in lighttpd rather than via PHP, and using mod_evasive).

    Anyway, would you think that the guy who filed this Redmine report was being intentionally DoS'ed?
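
For illustration only, here is a minimal Python sketch of the general idea behind per-client throttling of the kind discussed above: count recent requests per client IP and refuse a client once it exceeds a threshold within a time window. This is not how pfSense, ipfw, or mod_evasive implement it; the class name `PerClientLimiter`, the thresholds, and the IP addresses are all hypothetical.

```python
import time
from collections import defaultdict, deque


class PerClientLimiter:
    """Sliding-window request counter keyed by client IP (hypothetical sketch)."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests      # assumed threshold, not a pfSense default
        self.window_seconds = window_seconds  # assumed window length in seconds
        self.clock = clock                    # injectable so the logic can be tested
        self.hits = defaultdict(deque)        # client IP -> timestamps of allowed requests

    def allow(self, client_ip):
        """Return True if this request is within the client's budget."""
        now = self.clock()
        recent = self.hits[client_ip]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True


if __name__ == "__main__":
    limiter = PerClientLimiter(max_requests=3, window_seconds=10)
    # One client hammering F5: only the first three requests pass.
    print([limiter.allow("10.0.0.5") for _ in range(5)])
    # A different client is unaffected.
    print(limiter.allow("10.0.0.6"))
```

A client that keeps refreshing the login page is refused once it passes the limit, while other clients are unaffected, so the expensive login page handling is only reached by requests within budget.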



  • Hi there,

    Thank you for your help so far.

    Today, I checked the performance of the pfSense box using "top -S -H". It is still not clear to me what's taking so much of the CPU.

    @dhatz: the report you mentioned describes basically the problem that I'm facing. However, there's not much information about the settings that were used on that box.

    I'm going to leave the wireless VLAN out of the captive portal for the moment, and I'll set up a new box, in a test environment, to try to find out what is going on. I have a copy of the server running on my laptop, and it seems to run better than my box.

    Regards
    Iwashima


