Poor response time with thousands of vlan interfaces

    I'm trying to make pfSense work in a large VLAN deployment. With 1000+ VLANs created on the system, every run of the "ifconfig" command consumes 50%-100% CPU, which makes the WebGUI very unresponsive.

    Is there a way to avoid that overhead or reduce it to a reasonable level? Any opinions would be appreciated.

    1000 VLANs!  :o

    One of the developers once posted about a setup he had with over 2000 virtual interfaces though so it should be possible. (I can't find that post now though!)
    What hardware are you running this on?


    Currently I'm running this test in a VMware ESX 5.0 VM.
    pfSense version 2.0-RELEASE x86_64, with a virtualized Intel em NIC.

    Creating the VLANs is not the problem: 1000 can certainly be created, and even 4094 (the maximum pfSense allows). The problem is that the ifconfig command runs slowly with 1000 VLAN interfaces, and most interface-configuration functions in the WebGUI use mwexec() to run it when getting or setting addressing info, which makes the WebGUI slow and unresponsive.

    It seems to be a bug or defect in FreeBSD's ifconfig… is there a known workaround or any suggestion?

    I migrated all other VMs off the hypervisor host so that only the pfSense VM runs on it, but there was still no improvement.

  • Has anyone run into this problem or have experience with it? Please chime in.

  • You might want to check the FreeBSD-net mailing list: http://lists.freebsd.org/mailman/listinfo/freebsd-net

    Functions inside pfSense were offloaded to the pfSense PHP module which does things directly on the system and bypasses the need for running ifconfig and netstat directly in many cases. That alone should be keeping the system a bit faster in that kind of situation.

    There are probably more places that could be optimized, but the number of times those commands get run has been reduced quite a bit since 1.2.3.

  • We did a lot of performance improvements with huge numbers of interfaces in 2.0, though things are still going to slow down to some extent with huge numbers of interfaces. I was running a system with 4000 interfaces at one point, and it was slow browsing for the spec of hardware it was running on, but tolerable. How slow is it?

    Don't bother any FreeBSD lists with that, it has little to do with the underlying OS and everything to do with how we interact with it.

  • I added 3000 VLANs on a host running pfSense 2.0.1-RELEASE amd64.
    I recorded the execution time of interfaces_assign.php, which was fast (less than 1 second), but the browser took more than 30s to get the page. So I used tcpdump to examine the traffic between the server and the browser, and the packet timestamps clearly show that pfSense DOES pause for a long time before sending the output.

    I'm confused about why and where the PHP output gets stuck before it is flushed to the client.

    tcpdump -i em2 -s0 
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on em2, link-type EN10MB (Ethernet), capture size 65535 bytes
    12:58:52.669459 IP > Flags [S], seq 182440526, win 65535, options [mss 1460,nop,nop,sackOK], length 0
    12:58:52.669568 IP > Flags [S.], seq 1571048289, ack 182440527, win 65228, options [mss 1460,sackOK,eol], length 0
    12:58:52.669889 IP > Flags [.], ack 1, win 65535, length 0
    12:58:52.670479 IP > Flags [P.], ack 1, win 65535, length 176
    12:58:52.670539 IP > Flags [.], ack 177, win 65524, length 0
    12:58:52.672004 IP > Flags [.], ack 177, win 65535, length 1460
    12:58:52.802508 IP > Flags [.], ack 1461, win 65535, length 0
    12:58:52.802571 IP > Flags [P.], ack 177, win 65535, length 219
    12:58:52.817805 IP > Flags [P.], ack 1680, win 65316, length 198
    12:58:52.817874 IP > Flags [.], ack 375, win 65502, length 0
    12:58:52.818550 IP > Flags [P.], ack 375, win 65535, length 234
    12:58:52.819485 IP > Flags [P.], ack 1914, win 65082, length 485
    12:58:52.819523 IP > Flags [.], ack 860, win 65215, length 0
    12:59:24.259507 IP > Flags [P.], ack 860, win 65535, length 442
    12:59:24.259540 IP > Flags [P.], ack 860, win 65535, length 74
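
    The length of the stall can be read straight off the capture: nothing is sent between 12:58:52.819523 and 12:59:24.259507. A minimal sketch of that arithmetic in PHP, assuming a 64-bit build; the helper name ts_to_us is made up for illustration:

```php
<?php
// Convert a tcpdump "HH:MM:SS.uuuuuu" timestamp to integer microseconds
// since midnight. Hypothetical helper, not part of pfSense or tcpdump.
function ts_to_us(string $ts): int {
    [$hms, $us] = explode('.', $ts);
    [$h, $m, $s] = array_map('intval', explode(':', $hms));
    return (($h * 60 + $m) * 60 + $s) * 1000000 + (int)$us;
}

// Last packet before the pause and first packet after it, from the capture.
$before = ts_to_us('12:58:52.819523');
$after  = ts_to_us('12:59:24.259507');

$gap_us = $after - $before;
printf("idle gap: %.6f s\n", $gap_us / 1000000);
```

    That is a pause of roughly 31.4 seconds, which matches the 30s+ seen in the browser.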

    I haven't checked the code on that page but I imagine it's because of the giant nested foreach/foreach loop there.

    foreach (assigned interface) …
      foreach (existing interface) ...
        print a drop-down and select the current interface

    If you made 3000 vlans and didn't assign them, the inner loop would be slow enough. If you made 3000 and did assign them, then it would be even slower as it would have to do that 3000*3000 times.
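
    To put a number on that guess, here is a rough sketch (not the actual interfaces_assign.php code; the interface names and the render_select helper are made up) of how many option elements the nested loop would emit, and one way to build the shared list only once:

```php
<?php
// Hypothetical data: 3000 VLAN interfaces, all of them assigned.
$existing = [];
for ($tag = 1000; $tag < 4000; $tag++) {
    $existing[] = "em1_vlan{$tag}";
}
$assigned = $existing;

// Naive rendering: one <option> per existing interface, per assigned row.
$options_emitted = count($assigned) * count($existing);
echo "option elements to emit: {$options_emitted}\n";   // 9000000

// One way to cut the PHP-side work: build the shared option list once ...
$shared = '';
foreach ($existing as $port) {
    $p = htmlspecialchars($port);
    $shared .= "<option value=\"{$p}\">{$p}</option>\n";
}

// ... then mark the selected entry per row with a single string replace.
function render_select(string $shared, string $current): string {
    $c = htmlspecialchars($current);
    $needle = "<option value=\"{$c}\">";
    $marked = "<option value=\"{$c}\" selected=\"selected\">";
    return "<select>\n" . str_replace($needle, $marked, $shared) . "</select>\n";
}

$html = render_select($shared, 'em1_vlan1001');
```

    The page itself would still carry the same number of option elements, so this only trims the work done in PHP, not the amount of HTML sent to the browser.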

    That may not be the code it's using (I'd have to look... I don't feel like coding on Christmas Eve :-) but just a guess. Adding some output flushes in maybe after building each drop-down would probably help at least keep the client loading the page.

    I added "$bt = time();" at the beginning and "$at = time();" at the end of the PHP script file; $at is about 1s larger than $bt. So I think the script itself executes quickly. But the script output isn't flushed to the client immediately; the transfer only starts after about 30s (according to the tcpdump output).

    As far as I know, when a PHP script ends its output is flushed to the client without any delay, but in this case it apparently is not.

    I also commented out some code to prevent VLAN interfaces from being added to $portlist, but the result is the same. It seems the delay has nothing to do with the interface foreach loop.

    Thanks & Merry Christmas!

    Try sprinkling some calls to flush(); and ob_flush(); in the loops, see if it helps.
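
    Something along these lines, as a minimal sketch; the $ports list and the flush interval are placeholders, not the real page code:

```php
<?php
// Hypothetical list standing in for the interfaces the page iterates over.
$ports = ['em0', 'em1', 'em1_vlan999', 'em1_vlan1000'];

$chunks_flushed = 0;
foreach ($ports as $i => $port) {
    echo '<option>' . htmlspecialchars($port) . "</option>\n";

    // Every second row, push what has been generated so far to the client.
    if (($i + 1) % 2 === 0) {
        if (ob_get_level() > 0) {
            ob_flush();   // empty PHP's own output buffer, if one is active
        }
        flush();          // ask the SAPI / web server to send it on
        $chunks_flushed++;
    }
}
```

    The web server or a compression layer may still buffer the output, so this is worth trying but not guaranteed to change what tcpdump shows.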

  • Sorry, the script time I recorded previously was not accurate: I did not time the files that the script requires.
    After adding timing code recursively to every required file, I found that globals.inc calls get_nics_with_capabilities() to get the VLAN-capable interfaces, and that is what causes the delay!
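
    For anyone repeating the measurement, a sketch of timing an include at file scope. The temp file stands in for something like globals.inc and $include_timings is a made-up variable; microtime(true) is used because time() only resolves whole seconds:

```php
<?php
// Stand-in for a slow include: a temp file that takes 50 ms to load.
$slow_inc = tempnam(sys_get_temp_dir(), 'inc');
file_put_contents($slow_inc, "<?php usleep(50000); \$loaded = true;\n");

$include_timings = [];

// Time the include at file scope, so variables the included file defines
// still land in the global scope exactly as before. The figure includes
// anything that file requires in turn.
$t0 = microtime(true);
require_once $slow_inc;
$include_timings[basename($slow_inc)] = microtime(true) - $t0;

// Report the slowest includes first.
arsort($include_timings);
foreach ($include_timings as $file => $secs) {
    printf("%-20s %.3f s\n", $file, $secs);
}

unlink($slow_inc);
```

    Wrapping the require in a helper function would be tidier, but it would change the scope the included file runs in, which matters for files that set globals.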

    code from globals.inc:

         43 function get_nics_with_capabilities($CAPABILITIES) {
         44         $ifs = `/sbin/ifconfig -l`;
         45         $if_list = split(" ", $ifs);
         46         $vlan_native_supp = array();
         47         foreach($if_list as $if => $iface) {
         48                 $iface = trim($iface);
         49                 $capable = pfSense_get_interface_addresses($iface);
         50                 if(isset($capable['caps'][$CAPABILITIES])) {
         51                         $interfacenonum = remove_numbers($iface);
         52                         if(!in_array($interfacenonum, $vlan_native_supp))
         53                                 $vlan_native_supp[] = $interfacenonum;
         54                 }
         55         }
         56         return $vlan_native_supp;
         57 }
    ...... some lines omitted ....
     114 $vlan_native_supp = get_nics_with_capabilities("vlanmtu");

    After commenting out line 114 in globals.inc, the page shows immediately. But the side effect I have found so far is that interfaces_bridges_edit.php can't see VLAN interfaces any more.
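
    One general way to keep such a list without paying for it on every page load is to compute it only when a page asks for it and remember the result for the rest of the request. This is only a sketch of that idea, not the actual pfSense fix: vlan_capable_nics and $probe are made-up names, $probe stands in for pfSense_get_interface_addresses(), and the digit stripping is an assumption about what remove_numbers() does.

```php
<?php
// Sketch only: defer the expensive scan until first use and cache it.
// $probe is injected so the example runs outside pfSense.
function vlan_capable_nics(array $if_list, callable $probe): array {
    static $cache = null;          // survives for the rest of the request
    if ($cache !== null) {
        return $cache;             // later calls skip the scan entirely
    }
    $seen = [];
    foreach ($if_list as $iface) {
        $info = $probe(trim($iface));
        if (isset($info['caps']['vlanmtu'])) {
            // Assumed equivalent of remove_numbers(): drop all digits.
            $seen[preg_replace('/[0-9]/', '', trim($iface))] = true;
        }
    }
    $cache = array_keys($seen);
    return $cache;
}

// Usage with a fake probe that counts how often it is called.
$calls = 0;
$fake_probe = function (string $iface) use (&$calls): array {
    $calls++;
    return $iface === 'lo0' ? [] : ['caps' => ['vlanmtu' => true]];
};

$first  = vlan_capable_nics(['em0', 'em1', 'lo0'], $fake_probe);
$second = vlan_capable_nics(['em0', 'em1', 'lo0'], $fake_probe);
```

    The cache ignores its arguments after the first call, which is fine for a single fixed interface list but would need a key if several capabilities were queried.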

  • From what I recall, that function is a leftover.
    You can safely disable the call to it to speed up your test.

    IIRC it was nonexistent when we tested the 4000 VLANs cmb refers to.

  • Actually try this out https://github.com/bsdperimeter/pfsense/commit/121dc11eac00c2244547bf942f3e4416c8b6cf3b

    That should fix your issues in other places.

  • It certainly performed much better after applying commit 121dc11.
    But all VLAN interfaces are missing from the list boxes in interfaces_assign.php & interfaces_bridges_edit.php, so we cannot use or configure any of the VLAN interfaces we created. For example, we can't assign a VLAN interface to LAN or bridge several VLANs together.

  • I am not following you.
    That commit has nothing to do with those pages!

    Can you show me real examples or even screenshots of this?

  • The version of pfsense I'm testing is the official 2.0.1-RELEASE amd64.
    I have applied commit 121dc11.

    As you can see in vlan.png, I have configured many VLANs (actually from VLAN 999 to 3999).
    But in assign.png, these VLAN interfaces do not show up in the assign list.
    Also, in bridge.png, these VLAN interfaces do not show up in the select list.

  • Does the ifconfig command output show anything?
    Can you show me the config.xml?

    They should be there if you have configured them, since the VLANs displayed are taken from the config file.

    BTW, bridge will not show any interface that is not assigned.
    This is a limitation of that page in 2.0.x.

    Actually with that commit when I go to create a vlan, the parent interface drop-down is empty.

  • ifconfig -l does list the configured interfaces.

    em0 em1 em2 plip0 pflog0 pfsync0 enc0 lo0 em1_vlan999 em1_vlan1000 em1_vlan1001 em1_vlan1002 em1_vlan1003 em1_vlan1004 em1_vlan1005 em1_vlan1006 em1_vlan1007 em1_vlan1008 em1_vlan1009 em1_vlan1010 em1_vlan1011 em1_vlan1012 em1_vlan1013 em1_vlan1014 ......(omitted....)

    config.xml is attached; please strip the .txt suffix before unzipping it, since it's over 250KB.

    I see the same thing jimp sees: no parent interface in the drop-down list. But the VLAN interfaces were created before applying that commit.


    Works for me after that commit. :-)

  • @jimp:

    Works for me after that commit. :-)

    Me too.
    I still cannot see VLAN interfaces on the assignment or bridging pages.

    They show up for me in the assignment drop-down list. And if I assign and enable one, then it shows up to bridge (the list in bridge only shows assigned and enabled interfaces)

  • I double-checked: the drop-down list on the interface assignment page does have the VLAN interfaces now.
    But my question is: since interfaces_bridge_edit.php only treats configured interfaces as possible bridge members,

    48 $ifacelist = get_configured_interface_with_descr(); 

    is there really any need to assign them before we can add them to a bridge?

  • In 2.0 that is enforced to make sure an interface that is part of a bridge is always properly configured.

    Generally, it's not a requirement, but there needs to be some bookkeeping for bridge members, and that is why it was not done for 2.0.

  • I understand that and thanks for your help!

    Could this be merged into the official release, please? :)

    Ermal already committed that to RELENG_2_0, but it's not like we'd make a 2.0.2 just for that… so it will be an official release eventually, just not yet. If you want it in the meantime, feel free to sync your code to the current RELENG_2_0 code using gitsync.

    Any updates on this matter??

    Is it in the snapshots of 2.1??

    Yes it should be in 2.1. Not sure what else there would be to "update" this - it was fixed in our repo (as I mentioned before) so it would be fixed in the next release.

