503 - Service Not Available and web interface really slow after upgrade



  • Hi. I just ran an upgrade from 1.2.3 to 2.0 RC3 and everything went smoothly.
    The only thing I notice is that the web interface is now far too slow! It usually takes 5 to 15 seconds to change tabs, or even worse, I get a 503 - Service Not Available.
    pfSense is installed on a Dell R310 with an Intel i5 and 4 GB RAM.
    Any suggestions?
    Many thanks in advance,
    Max
    Italy

    [update]
    It seems to be related to the LAN interface and not the web service. SSH access gets stuck most of the time.



  • CPU busy?

    Errors on LAN interface?



  • @wallabybob:

    CPU busy?

    ooops CPU usage constantly near 100% !!!
    How can I identify what's eating all the cpu?
    thanks and sorry but I'm a newbie
    Max



  • pfSense shell command # top -S -H should give some clues.

    Perhaps you have inadvertently enabled polling - see System -> Advanced, click on the Networking tab.
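
    A rough sketch of how the top output suggested above could be narrowed down to the threads that are actually busy, assuming FreeBSD's top accepts -b (batch mode) and -d 1 (a single display) on your build. The helper name busy_threads and the 5% default threshold are made up for illustration.

    ```shell
    # Print only non-idle top rows whose WCPU is at or above a threshold (default 5%).
    busy_threads() {
        min=${1:-5}
        awk -v min="$min" '
            index($0, "{idle:") { next }      # skip the per-CPU idle threads
            $1 !~ /^[0-9]+$/    { next }      # skip summary and header lines (no PID)
            {
                # the WCPU column is the field that looks like "12.50%"
                for (i = 1; i <= NF; i++)
                    if ($i ~ /^[0-9.]+%$/ && $i + 0 >= min) { print; break }
            }
        '
    }

    # Hypothetical usage on the firewall:
    #   top -S -H -b -d 1 | busy_threads 5
    ```

    If nothing is printed, no thread other than the idle ones is above the threshold, which points away from a CPU-bound problem.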



  • @wallabybob:

    pfSense shell command # top -S -H should give some clues.

    Perhaps you have inadvertently enabled polling - see System -> Advanced, click on the Networking tab.

    Device polling disabled. This is a screenshot of top -S -H
    PID USERNAME PRI NICE  SIZE    RES STATE  C  TIME  WCPU COMMAND
      11 root    171 ki31    0K    32K CPU3    3 188:44 100.00% {idle: cpu3}
      11 root    171 ki31    0K    32K CPU1    1 188:40 100.00% {idle: cpu1}
      11 root    171 ki31    0K    32K CPU2    2 188:37 100.00% {idle: cpu2}
      11 root    171 ki31    0K    32K RUN    0 188:14 100.00% {idle: cpu0}
        0 root      76    0    0K    48K sched  0  1:31  0.00% {swapper}
      12 root    -68    -    0K  160K WAIT    3  0:43  0.00% {irq256: bce0}
      12 root    -68    -    0K  160K WAIT    1  0:29  0.00% {irq258: bce2}
      12 root    -32    -    0K  160K WAIT    2  0:28  0.00% {swi4: clock}
    15465 root      44    0  3448K  1448K select  1  0:12  0.00% syslogd
    37140 root      44    0 53888K 24024K accept  0  0:08  0.00% php
    37404 root      67    0 53888K 22256K accept  1  0:04  0.00% php
      14 root    -16    -    0K    8K -      2  0:04  0.00% yarrow
    16118 root      44    0  3316K  924K piperd  0  0:03  0.00% logger
      12 root    -68    -    0K  160K WAIT    2  0:02  0.00% {irq259: bce3}
    16028 root      44    0  5912K  2852K bpf    2  0:01  0.00% tcpdump
    37009 root      58    0 53888K 20684K accept  0  0:01  0.00% php
    7835 root      76  20  3656K  1512K wait    3  0:01  0.00% sh
    36705 root      44    0 54912K 19280K accept  0  0:01  0.00% php
    35755 root      44    0  6588K  3824K kqread  2  0:01  0.00% lighttpd
        8 root      44    -    0K    8K pftm    2  0:00  0.00% pfpurge
        3 root      -8    -    0K    8K -      2  0:00  0.00% g_up
    58156 root      44    0  3316K  1344K select  0  0:00  0.00% apinger
        2 root      -8    -    0K    8K -      2  0:00  0.00% g_event
      22 root      44    -    0K    8K syncer  2  0:00  0.00% syncer
      12 root    -64    -    0K  160K WAIT    1  0:00  0.00% {irq20: atapci0}
      12 root    -64    -    0K  160K WAIT    2  0:00  0.00% {irq22: ehci0 ehc}
      15 root    -64    -    0K    64K -      0  0:00  0.00% {usbus1}
        4 root      -8    -    0K    8K -      3  0:00  0.00% g_down
      12 root    -32    -    0K  160K WAIT    0  0:00  0.00% {swi4: clock}
      15 root    -64    -    0K    64K -      2  0:00  0.00% {usbus1}
      15 root    -64    -    0K    64K -      3  0:00  0.00% {usbus0}
    39094 root      44    0  5116K  3404K select  2  0:00  0.00% openvpn
      15 root    -64    -    0K    64K -      2  0:00  0.00% {usbus0}
    60860 nobody    44    0  5556K  2624K select  0  0:00  0.00% dnsmasq
      12 root    -32    -    0K  160K WAIT    1  0:00  0.00% {swi4: clock}
    15311 root      44    0  7992K  3532K select  2  0:00  0.00% sshd
      40 root      -8    -    0K    8K mdwait  0  0:00  0.00% md0
    36190 root      76    0 52864K 10600K wait    2  0:00  0.00% php



  • Here is the rrd_graph

    [update]
    I also tried a fresh install twice, but the machine hangs while configuring the interfaces. I have a spare firewall PC (same hardware) and tomorrow I'll run some tests on it.




  • I think the RRD graph is showing a sudden jump in the number of processes (not CPU utilisation) from about 40 to about 100 at around 12:30.

    The top output doesn't suggest your system is CPU bound - with four CPUs apparently idle there should be plenty of spare CPU.

    What about errors on the interfaces? What does the pfSense shell command netstat -i report?
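
    A minimal sketch of how the netstat -i check could be filtered down to interfaces that report errors. It assumes the header row names the Ierrs and Oerrs columns, as FreeBSD's netstat does, and finds them by counting from the right so that rows with an empty Address column still line up. The function name iface_errors is hypothetical.

    ```shell
    # List rows from "netstat -i" style output that have non-zero Ierrs or Oerrs.
    iface_errors() {
        awk '
            NR == 1 {
                # remember how far each error column sits from the end of the row
                for (i = 1; i <= NF; i++) {
                    if ($i == "Ierrs") in_off = NF - i
                    if ($i == "Oerrs") out_off = NF - i
                }
                next
            }
            {
                ierrs = $(NF - in_off) + 0    # "+ 0" turns "-" placeholders into 0
                oerrs = $(NF - out_off) + 0
                if (ierrs > 0 || oerrs > 0)
                    print $1, "Ierrs=" ierrs, "Oerrs=" oerrs
            }
        '
    }

    # Hypothetical usage on the firewall:
    #   netstat -i -n | iface_errors
    ```

    An interface may appear more than once because netstat prints one row per address.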



  • Is this issue resolved? I'm having the exact same problem with my Dell R310. Tested with i386 and amd64 builds.



  • @thinktank:

    Is this issue resolved?

    Apparently not: the information flow dried up, leaving a couple of questions unanswered.

    @thinktank:

    I'm having the exact same problem with my Dell R310.

    Then please provide the information requested in earlier replies. And when providing the top -S -H output please don't cut off the top few lines which give the number of processes and memory and swap use summary.


  • Netgate Administrator

    Anybody using Dell hardware with a multiport Broadcom-based NIC should at least try the Broadcom settings here:
    http://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards
    These cards seem to cause a range of symptoms.

    Steve



  • It seems I have narrowed my problem down to the bcm network cards. Whenever I plug a network cable into one of these, the system becomes unresponsive.
    I can't even get a terminal session, so I can't run the top command.

    I'll take stephenw10's advice and give the tweaking a shot. I'll post back with the results.



  • I disabled the switch ports connected to the Broadcom network cards and my system became responsive again. I then logged in to the box with SSH, started top -S -H and enabled the ports again. Voilà, the box became totally unresponsive again. The top command also seems to be locked up. Here is the output from the top command:

    
    last pid: 17723;  load averages:  0.00,  0.00,  0.00                                                                                         up 0+19:29:52  10:45:26
    143 processes: 9 running, 94 sleeping, 40 waiting
    CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
    Mem: 40M Active, 13M Inact, 115M Wired, 84K Cache, 24M Buf, 2820M Free
    Swap: 8192M Total, 8192M Free
    
      PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
       11 root     171 ki31     0K    64K CPU7    7  19.4H 100.00% {idle: cpu7}
       11 root     171 ki31     0K    64K CPU6    6  19.4H 100.00% {idle: cpu6}
       11 root     171 ki31     0K    64K CPU5    5  19.4H 100.00% {idle: cpu5}
       11 root     171 ki31     0K    64K CPU4    4  19.4H 100.00% {idle: cpu4}
       11 root     171 ki31     0K    64K CPU3    3  19.4H 100.00% {idle: cpu3}
       11 root     171 ki31     0K    64K CPU1    1  19.4H 100.00% {idle: cpu1}
       11 root     171 ki31     0K    64K RUN     2  19.4H 100.00% {idle: cpu2}
       11 root     171 ki31     0K    64K CPU0    0  19.4H 99.27% {idle: cpu0}
       12 root     -32    -     0K   320K WAIT    1   2:40  0.00% {swi4: clock}
        0 root      76    0     0K   176K sched   0   1:18  0.00% {swapper}
    18965 root      44    0  3316K  1340K select  2   0:18  0.00% apinger
       12 root     -44    -     0K   320K WAIT    3   0:05  0.00% {swi1: netisr 0}
        3 root      -8    -     0K     8K -       0   0:02  0.00% g_up
       12 root     -64    -     0K   320K WAIT    7   0:02  0.00% {irq20: atapci0}
       12 root     -68    -     0K   320K WAIT    0   0:02  0.00% {irq256: igb0:que}
       12 root     -64    -     0K   320K WAIT    7   0:02  0.00% {irq22: ehci0 ehc}
       14 root     -16    -     0K     8K -       2   0:02  0.00% yarrow
    28786 root      44    0 53596K 20084K lockf   5   0:01  0.00% php
       12 root     -32    -     0K   320K WAIT    1   0:01  0.00% {swi4: clock}
        2 root      -8    -     0K     8K -       2   0:01  0.00% g_event
       15 root     -64    -     0K    64K -       5   0:01  0.00% {usbus0}
       12 root     -68    -     0K   320K WAIT    7   0:01  0.00% {irq274: bce0}
        9 root      -8    -     0K     8K m:w1    5   0:01  0.00% g_mirror pfSenseMir
        4 root      -8    -     0K     8K -       0   0:01  0.00% g_down
       15 root     -64    -     0K    64K -       2   0:01  0.00% {usbus1}
       12 root     -32    -     0K   320K WAIT    4   0:01  0.00% {swi4: clock}
       15 root     -64    -     0K    64K -       3   0:01  0.00% {usbus0}
       23 root      44    -     0K     8K syncer  2   0:01  0.00% syncer
       15 root     -64    -     0K    64K -       6   0:01  0.00% {usbus1}
       12 root     -32    -     0K   320K WAIT    2   0:00  0.00% {swi4: clock}
       12 root     -68    -     0K   320K WAIT    5   0:00  0.00% {irq261: igb0:que}
    12474 root      44    0  5912K  2352K bpf     0   0:00  0.00% tcpdump
    24959 root      57    0 54620K 20024K keglim  2   0:00  0.00% php
       12 root     -24    -     0K   320K WAIT    0   0:00  0.00% {swi6: task queue}
       12 root     -32    -     0K   320K WAIT    2   0:00  0.00% {swi4: clock}
       12 root     -32    -     0K   320K WAIT    1   0:00  0.00% {swi4: clock}
       12 root     -68    -     0K   320K WAIT    1   0:00  0.00% {irq257: igb0:que}
       12 root     -68    -     0K   320K WAIT    3   0:00  0.00% {irq259: igb0:que}
    25118 root      44    0 53596K 16780K accept  0   0:00  0.00% php
        8 root      44    -     0K     8K pftm    2   0:00  0.00% pfpurge
       24 root      59    -     0K     8K sdflus  2   0:00  0.00% softdepflush
       22 root      59    -     0K     8K vlruwt  2   0:00  0.00% vnlru
       12 root     -68    -     0K   320K WAIT    7   0:00  0.00% {irq263: igb0:que}
       21 root      59    -     0K     8K psleep  2   0:00  0.00% bufdaemon
    13518 root      45    0 53596K 19708K keglim  5   0:00  0.00% php
       12 root     -32    -     0K   320K WAIT    3   0:00  0.00% {swi4: clock}
    12091 root      44    0  4944K  2492K select  7   0:00  0.00% syslogd
       12 root     -32    -     0K   320K WAIT    7   0:00  0.00% {swi4: clock}
    42301 root      44    0  3712K  2028K CPU2    2   0:00  0.00% top
    25939 root      64   20  6588K  4336K keglim  6   0:00  0.00% lighttpd
    42511 _ntp      44    0  3316K  1344K select  2   0:00  0.00% ntpd
       12 root     -68    -     0K   320K WAIT    6   0:00  0.00% {irq262: igb0:que}
       20 root      76 ki-6     0K     8K pollid  2   0:00  0.00% idlepoll
    58160 root      44    0  3404K  1372K nanslp  2   0:00  0.00% cron
       40 root      -8    -     0K     8K mdwait  2   0:00  0.00% md0
     1389 root      64   20  7992K  3544K select  7   0:00  0.00% sshd
     6888 root      68    0  3316K  1040K nanslp  0   0:00  0.00% minicron
       17 root      59    -     0K     8K psleep  0   0:00  0.00% pagedaemon
    
    


  • After taking this textdump I had to reboot since the system became unresponsive. Currently the system is not booting properly. It hangs after Starting Cron … Done.
    No debug messages. I haven't added the tuning parameters to loader.conf.local yet, so I know that's not the problem.

    I'm one inch from using the two R310's as paperweights and getting new hardware  ;D


  • Netgate Administrator

    This looks a lot like the Broadcom problem mentioned in the guide. Something in the driver/hardware combination causes it to use far more mbufs than other cards. If you get it to reboot, what is your mbuf usage on the dashboard?
    Alternatively:

    No problems on my boxes with that driver but if you are on the 64bit build do a netstat -m and check the following line

    0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)

    If they are NOT zero then increase the value of kern.ipc.nmbclusters.
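
    A small sketch of how that check could be scripted, assuming netstat -m prints the denied-requests line in the a/b/c format quoted above. The function name mbufs_denied is made up for illustration.

    ```shell
    # Sum the three counters on the "requests for mbufs denied" line of netstat -m.
    mbufs_denied() {
        awk '/requests for mbufs denied/ {
            split($1, n, "/")          # $1 looks like "0/0/0"
            print n[1] + n[2] + n[3]
        }'
    }

    # Hypothetical usage on the firewall:
    #   netstat -m | mbufs_denied
    ```

    Any result other than 0 means requests were denied, which is the case where raising kern.ipc.nmbclusters is suggested.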

    Are you running 64bit?

    Steve



  • At first I went for 64-bit, but after a while of head-bashing I swapped it for 32-bit.

    I didn't manage to get the 32-bit build booting again, so now I'm back on 64-bit.

    I'll do some checking today and I'll post all the debug information I can get hold of.



  • Here is a little update.

    I have just reinstalled pfSense with 64-bit. I'm currently only using the two Intel cards.

    The system seems responsive so far, but MBUF usage is 17670/25600.
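
    For a sense of scale, the MBUF figure above can be turned into a percentage of the limit. This is only an arithmetic sketch, and the helper name mbuf_pct is hypothetical.

    ```shell
    # Turn a "used/limit" pair such as the dashboard's MBUF figure into a percentage.
    mbuf_pct() {
        echo "$1" | awk -F/ '{ printf "%.0f%%\n", $1 / $2 * 100 }'
    }

    mbuf_pct 17670/25600    # prints 69%
    ```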



  • Added the following to loader.conf.local:

    kern.ipc.nmbclusters="131072"
    hw.bce.tso_enable=0
    hw.pci.enable_msix=0
    
    

    This seems to have fixed the issues. I'll let it run for a week and do some stress testing, but so far it seems stable.

    A big thanks to wallabybob and stephenw10 for helping me out!



  • Hello,
    Are these settings available on pfSense 2.0.1 amd64?
    Thanks



  • Hi,

    Yes, you have to create the file loader.conf.local in /boot/.

    Add these three lines:

    kern.ipc.nmbclusters="131072"
    hw.bce.tso_enable=0
    hw.pci.enable_msix=0
    

    And reboot.
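
    The steps above could be scripted roughly as follows. This is a sketch only: the function name apply_tunables is made up, and it appends a tunable only if the file does not already set it, so running it twice does not duplicate lines.

    ```shell
    # Append the three tunables from this thread to the given file, skipping any already set.
    apply_tunables() {
        file=$1
        touch "$file" || return 1      # create the file if it does not exist yet
        for line in 'kern.ipc.nmbclusters="131072"' \
                    'hw.bce.tso_enable=0' \
                    'hw.pci.enable_msix=0'; do
            name=${line%%=*}           # part before the first "="
            grep -q "^${name}=" "$file" || printf '%s\n' "$line" >> "$file"
        done
    }

    # Hypothetical usage on the firewall, followed by a reboot:
    #   apply_tunables /boot/loader.conf.local
    ```

    After the reboot, running sysctl kern.ipc.nmbclusters should show whether the new value was picked up.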



    Sorry for the long delay. As the one who opened this thread, I owe you at least an update ;-)
    I can confirm that modifying the above-mentioned file did the trick! Everything is running flawlessly! (Using a Dell R210 with a Broadcom card and an Intel i3 CPU.)
    Thanks so much for your help!
    Regards
    max
    Italy

