Netgate Discussion Forum

    [RESOLVED] pfctl using 100% CPU, preventing clean boot-up

    General pfSense Questions
    • ehayon

      I'm having a problem that only shows up when I have a lot of interfaces (~150) attached to a captive portal zone.

      If I have 20 vlan interfaces on a captive portal, the box boots fine (plays the shutdown chime, 3 minutes later it plays the boot-up chime). However, as I add more vlan interfaces to the captive portal zone, boot up becomes slower and slower. I've waited hours for pfctl to finish loading the rules from /tmp/rules.debug into the firewall.

      When I look at top -a, this is what I see:

      last pid:  3330;  load averages:  1.06,  1.08,  1.07                                                up 0+12:01:35  13:57:43
      41 processes:  2 running, 39 sleeping
      CPU: 25.0% user,  0.0% nice,  0.0% system,  0.0% interrupt, 75.0% idle
      Mem: 296M Active, 100M Inact, 188M Wired, 91M Buf, 2629M Free
      Swap: 8192M Total, 8192M Free
      
        PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
      34697 root        1 103    0   302M   292M CPU2    2   1:20  99.85% /sbin/pfctl -o basic -f /tmp/rules.debug
      35164 root        1  52   20 10592K  2648K wait    0   0:18   0.00% /bin/sh /var/db/rrd/updaterrd.sh
      27763 root        1  20    0 17180K 17212K select  0   0:07   0.00% /usr/local/sbin/ntpd -g -c /var/etc/ntpd.conf -p /var/r
      10423 root        1  52    0 35540K 22412K piperd  3   0:04   0.00% php-fpm: pool lighty (php-fpm)
      50267 dhcpd       1  20    0 28636K 22968K select  1   0:04   0.00% [dhcpd]
      12083 root        1  52    0 35540K 22412K lockf   1   0:02   0.00% php-fpm: pool lighty (php-fpm)
      97675 root        1  20    0 10132K  1788K select  0   0:02   0.00% /usr/local/sbin/apinger -c /var/etc/apinger.conf
      

      So the next step was to try to manually execute the '/sbin/pfctl -o basic -f /tmp/rules.debug' command adding -vvv for verbosity. This is what happens:

      ...
      @227(0) scrub on em1_vlan227 all fragment reassemble
      @228(0) scrub on em1_vlan228 all fragment reassemble
      @229(0) scrub on em1_vlan229 all fragment reassemble
      @230(0) scrub on em1_vlan230 all fragment reassemble
      @231(0) scrub on em1_vlan231 all fragment reassemble
      @232(0) scrub on em1_vlan232 all fragment reassemble
      @233(0) scrub on em1_vlan233 all fragment reassemble
      @234(0) scrub on em1_vlan234 all fragment reassemble
      @235(0) scrub on em1_vlan235 all fragment reassemble
      @236(0) scrub on em1_vlan236 all fragment reassemble
      @237(0) scrub on em1_vlan237 all fragment reassemble
      @238(0) scrub on em1_vlan238 all fragment reassemble
      @239(0) scrub on em1_vlan239 all fragment reassemble
      @240(0) scrub on em1_vlan240 all fragment reassemble
      @241(0) scrub on em1_vlan241 all fragment reassemble
      @242(0) scrub on em1_vlan242 all fragment reassemble
      @243(0) scrub on em1_vlan243 all fragment reassemble
      @244(0) scrub on em1_vlan244 all fragment reassemble
      @245(0) scrub on em1_vlan245 all fragment reassemble
      @246(0) scrub on em1_vlan246 all fragment reassemble
      @247(0) scrub on em1_vlan247 all fragment reassemble
      @248(0) scrub on em1_vlan248 all fragment reassemble
      @249(0) scrub on em1_vlan249 all fragment reassemble
      @250(0) scrub on em1_vlan250 all fragment reassemble
      @251(0) scrub on em1_vlan251 all fragment reassemble
      @252(0) scrub on em1_vlan252 all fragment reassemble
      @253(0) scrub on em1_vlan253 all fragment reassemble
      @254(0) scrub on em1_vlan254 all fragment reassemble
      @255(0) scrub on em1_vlan255 all fragment reassemble
      @256(0) no nat proto carp all
      @258(0) nat-anchor "/*" all
      @259(0) nat-anchor "/*" all
      @260(0) nat on em0 inet from <tonatsubnets:0> to any port = isakmp -> 98.109.201.85 static-port
      @261(0) nat on em0 inet from <tonatsubnets:0> to any -> 98.109.201.85 port 1024:65535
      @257(0) no rdr proto carp all
      @262(0) rdr-anchor "/*" all
      @263(0) rdr-anchor "/*" all
      @264(0) rdr-anchor "miniupnpd" all
      

      For some reason it gets stuck at that point and never completes. This looked like it could be a ruleset optimization issue, so I changed '-o basic' to '-o none'.

      That prevents it from hanging and consuming 100% CPU. However, with optimization turned off, it takes 20 or so minutes to load the rules, which is far too long. So my two options right now are:

      1. Leave optimization set to basic, and pfctl hangs (gets stuck for hours).
      2. Turn optimization off, and it takes 20 minutes to load the ruleset every time a reload takes place.

      Does anybody have ideas for how I can debug this? Has anyone experienced this before? I would upload the rules.debug file, but it's huge.
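
      Since the file is too large to post, one low-cost check is to summarize it instead. The following is a minimal sketch, assuming a Python interpreter is available on whatever machine holds a copy of /tmp/rules.debug (it is not part of a stock install as far as I know). The function name longest_rules is made up for illustration. It only reports which lines are unusually long; it does not parse pf syntax.

```python
import sys


def longest_rules(lines, top=5):
    """Return (line_number, length) pairs for the `top` longest lines.

    Lengths ignore the trailing newline. Ties keep file order because
    Python's sort is stable.
    """
    sizes = [(number, len(line.rstrip("\n")))
             for number, line in enumerate(lines, start=1)]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    # The default path is the one from this thread; pass another as argv[1].
    path = sys.argv[1] if len(sys.argv) > 1 else "/tmp/rules.debug"
    with open(path) as handle:
        for number, length in longest_rules(handle):
            print(f"line {number}: {length} characters")
```

      A single line that is far longer than the rest would be a reasonable place to start looking.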

      Let me know if there's anything else I can provide to help debug.

      Thanks!

      • stephenw10 (Netgate Administrator)

        What pfSense version? What hardware are you running? CPU/RAM/NICs/Drives etc.
        Has this just started happening, or has it been slow since it was first installed?

        Steve

        • ehayon

          This is a fresh install of pfsense-2.2. I also tried 2.1.5, with the same results.

          Hardware should be more than adequate for this:

          hardware info:

          [2.2-RC][admin@t31.localdomain]/root: sysctl -a | egrep -i 'hw.machine|hw.model|hw.ncpu'
          hw.machine: i386
          hw.model: Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz
          hw.ncpu: 4
          hw.machine_arch: i386
          
          [2.2-RC][admin@t31.localdomain]/root: vmstat 
           procs      memory      page                    disks     faults         cpu
           r b w     avm    fre   flt  re  pi  po    fr  sr md0 ad0   in   sy   cs us sy id
           1 0 0    649M  2686M  2889   0   0   7  2829 109   0   0   21 3221 1668 27  0 73
          
          
          • ehayon

            Ok, I figured out why pfctl was hanging up. One of the captive portal rules was too long. I'm working on a patch to break up CP rules into smaller chunks in /etc/inc/filter.inc.
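
            To illustrate the idea only: the real change would be PHP in /etc/inc/filter.inc, and the sketch below is not that patch. It is a rough Python sketch of splitting one rule that names many interfaces into several shorter rules. The rule text, the function names chunk and split_rule, and the group size are all assumptions made up for the example, not the actual captive portal rule.

```python
def chunk(items, size):
    """Split `items` into consecutive lists of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[start:start + size] for start in range(0, len(items), size)]


def split_rule(interfaces, size):
    """Build one rule per group of interfaces instead of one huge rule.

    The rule template here is a hypothetical placeholder, chosen only to
    show the shape of the output.
    """
    rules = []
    for group in chunk(interfaces, size):
        joined = " ".join(group)
        rules.append(f"pass in quick on {{ {joined} }} all")
    return rules


if __name__ == "__main__":
    # 150 VLAN interfaces, as in this thread; a group size of 20 is arbitrary.
    vlans = [f"em1_vlan{n}" for n in range(1, 151)]
    for rule in split_rule(vlans, 20):
        print(rule)
```

            With 150 interfaces and groups of 20, this produces eight rules (seven of 20 interfaces and one of 10) rather than a single very long one.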

            Just wanted to post this in case someone else runs into this thread with a similar problem.

            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.