Freezing firewall :(



  • Hey guys,

    I am on release 1.2 and my firewall randomly freezes.  It is really odd and I am not sure what is going on, so I need some help figuring this out.  Out of the blue I can't browse the web: pages won't load, and not even the pfSense GUI will load, it just gets stuck trying.  I can, however, ping google.com just fine and I can ssh to pfSense without problems.  I ran vmstat, but I am not sure how to read the output to tell if anything's wrong, so I was hoping someone could help me out.  Here it is:

    vmstat 3

    procs      memory      page                  disk  faults      cpu
    r b w    avm    fre  flt  re  pi  po  fr  sr ad0  in  sy  cs us sy id
    1 4 0  75672 944424  653  0  0  0 560  0  0 1320  811 446  0  2 98
    0 4 0  75672 944424    1  0  0  0  0  0  0 1366  60 495  0  0 100
    0 4 0  75672 944424  29  0  0  0  26  0  0 1359  80 479  0  0 100
    0 4 0  75672 944424    0  0  0  5  0  0  10 1342  45 468  0  0 100
    0 4 0  75672 944424  29  0  0  0  26  0  0 1328  82 436  0  0 100
    0 4 0  75672 944424  29  0  0  0  26  0  0 1338  82 454  0  0 100
    0 4 0  75672 944424    1  0  0  0  2  0  8 1323  48 437  0  0 100
    0 4 0  75672 944424  29  0  0  0  26  0  0 1321  84 423  0  1 99
    0 4 0  75672 944424  341  0  0  0 291  0  0 1334  480 454  0  1 99
    0 4 0  75672 944424  29  0  0  0  26  0  0 1374  81 503  0  0 100
    2 6 0  78092 943496 20429  0  0  0 17517  0  0 1398 25664 1398 16 43 42
    0 5 0  77484 943572 1205  0  0  0 1029  0  2 1361 1983 542  0 14 86
    0 5 0  77484 943572  623  0  0  0 528  0  1 1339  570 496  1  2 98
    0 5 0  77484 943572  594  0  0  1 501  0  6 1343  541 507  0  2 98
    0 5 0  77484 943572  623  0  0  0 529  0  1 1336  578 493  1  1 98
    0 4 0  75672 944424  281  0  0  0 299  0  1 1332  309 464  0  1 99
    0 5 0  76512 944116 1910  0  0  0 1604  0  1 1331 2121 517  1  5 94
    0 4 0  75672 944424  204  0  0  0 201  0  8 1321  227 446  0  1 99
    0 4 0  75672 944424    0  0  0  0  0  0  0 1321  43 421  0  0 100
    0 4 0  75672 944424  29  0  0  0  26  0  0 1321  93 420  0  1 99
    0 4 0  75672 944424    0  0  0  0  0  0  0 1309  41 399  0  0 100
    0 4 0  75672 944424  29  0  0  0  26  0  0 1329  82 441  0  0 100
    procs      memory      page                  disk  faults      cpu
    r b w    avm    fre  flt  re  pi  po  fr  sr ad0  in  sy  cs us sy id
    1 4 0  75672 944424  29  0  0  0  26  0  0 1335  81 449  0  0 100
    0 4 0  75672 944424    1  0  0  0  0  0  0 1324  49 426  0  0 100
    0 4 0  75672 944424  29  0  0  11  26  0  13 1341  81 475  0  0 100
    0 4 0  75672 944424    0  0  0  0  0  0  0 1335  44 448  0  0 100
    0 4 0  75672 944424  29  0  0  0  26  0  0 1329  81 442  0  0 100
    0 4 0  75672 944424  29  0  0  0  27  0  4 1330  82 440  0  0 100
    0 4 0  75672 944424  341  0  0  0 292  0  0 1329  480 446  1  1 98
    0 4 0  75692 944420  30  0  0  0  26  0  0 1377  434 512  1  0 99
    0 4 0  75692 944416 1371  0  0  0 1241  0  0 1401 2146 617  2  4 95
    0 4 0  75692 944416  29  0  0  0  26  0  0 1330  86 432  0  0 100
    0 4 0  75672 944420  29  0  0  0  26  0  0 1318  96 419  0  1 99
    0 4 0  75672 944420    1  0  0  0  0  0  0 1321  49 424  0  0 100
    0 4 0  75672 944420  29  0  0  1  26  0  2 1317  82 420  0  0 100
    0 4 0  75672 944420    0  0  0  0  0  0  0 1327  41 431  0  0 100
    0 7 0  78252 943488 1833  0  0  0 1498  0  1 1322 2024 476  2  4 94
    0 5 0  76556 944064  104  0  0  0 135  0  7 1386  164 540  0  1 99
    0 4 0  75672 944420  175  0  0  0 177  0  0 1326  180 439  0  1 99
    0 4 0  75672 944420  29  0  0  0  26  0  0 1316  81 415  0  0 99
    0 4 0  75672 944420    0  0  0  0  0  0  0 1315  42 410  0  0 100

    The system logs don't show any problems; they are just full of DHCP messages like this:

    Mar 23 22:13:11 last message repeated 3 times
    Mar 23 22:03:37 last message repeated 2 times
    Mar 23 21:55:27 last message repeated 5 times
    Mar 23 21:39:39 dnsmasq[634]: reading /var/dhcpd/var/db/dhcpd.leases

    Any other help and tips are much appreciated.  Everything gets restored after a few minutes, and I don't think it is related to my ISP either.  This is a relatively small LAN: 3 wired PCs, 3 wireless laptops, a WAP, and 2 switches.  Status > Interfaces looks normal as well (Intel Pro 1000s).  Thanks a ton, friends.

    EDIT

    Top shows:

    last pid:  3561;  load averages:  0.08,  0.07,  0.02                                                                      up 0+20:30:02  22:26:19
    36 processes:  1 running, 34 sleeping, 1 zombie
    CPU states:  0.0% user,  0.4% nice,  0.0% system,  0.0% interrupt, 99.6% idle
    Mem: 43M Active, 9536K Inact, 24M Wired, 13M Buf, 920M Free
    Swap: 1024M Total, 1024M Free

    PID USERNAME  THR PRI NICE  SIZE    RES STATE    TIME  WCPU COMMAND
    1065 root        1  8  20  1844K  1284K wait    0:11  0.00% sh
    98040 root        1  8  20  1272K  716K nanslp  0:03  0.00% check_reload_status
      904 root        1  8    0  1720K  1156K wait    0:02  0.00% sh
      577 root        1  4    0  4360K  3628K kqread  0:02  0.00% lighttpd
      600 root        1  4    0 23688K 20888K accept  0:02  0.00% php
      344 root        1 -58    0  3712K  1828K bpf      0:01  0.00% tcpdump
      195 root        1  96    0  1440K  1016K select  0:01  0.00% syslogd
      836 dhcpd      1  96    0  2276K  1908K select  0:01  0.00% dhcpd
      345 root        1  -8    0  1276K  728K piperd  0:01  0.00% logger
      294 _dhcp      1  96    0  1468K  1092K select  0:01  0.00% dhclient
      906 root        1  96    0  1344K  944K select  0:01  0.00% miniupnpd
      634 nobody      1  96    0  1468K  1128K select  0:00  0.00% dnsmasq
      882 _ntp        1  96    0  1344K  1056K select  0:00  0.00% ntpd
      581 root        1  4    0 23204K 20032K accept  0:00  0.00% php
    1017 root        1 116  20  3364K  2808K select  0:00  0.00% racoon
      546 proxy      1  4    0  704K  452K kqread  0:00  0.00% pftpx
      910 root        1  8    0  1384K  1016K nanslp  0:00  0.00% cron
    93305 root        1  96    0  5748K  2768K select  0:00  0.00% sshd
      412 root        1  96    0  2428K  1656K RUN      0:00  0.00% top
      883 root        1  96    0  1376K  1052K select  0:00  0.00% ntpd
      932 root        1  8    0  1268K  732K nanslp  0:00  0.00% minicron
      595 root        1  8    0 14924K  5096K wait    0:00  0.00% php
    93402 root        1  20    0  4528K  3508K pause    0:00  0.00% tcsh
      578 root        1  8    0 14924K  5096K wait    0:00  0.00% php
      949 root        1  8    0  1712K  1356K wait    0:00  0.00% login
    93377 root        1  8    0  1728K  1148K wait    0:00  0.00% sh
      239 root        1  -8    0  1268K  664K piperd  0:00  0.00% sshlockout_pf
      958 root        1  5    0  1724K  1144K ttyin    0:00  0.00% sh
      950 root        1  8    0  1720K  1140K wait    0:00  0.00% sh
      238 root        1  96    0  3064K  2416K select  0:00  0.00% sshd
      242 root        1  96    0  1468K  1044K select  0:00  0.00% dhclient
    2053 root        1  8  20  1256K  468K nanslp  0:00  0.00% sleep
    3556 root        1  8    0  1256K  468K nanslp  0:00  0.00% sleep
      110 root        1 113    0  504K  360K select  0:00  0.00% devd
    97981 root        1  -8    0  1384K  1068K piperd  0:00  0.00% cron



  • I've seen this behaviour with one of my installs as well. Fortunately it was only my home box.
    And pfSense still serves DHCP requests, right?

    I was able to ssh and HTTPS in from WAN (rushed to the office for that).
    I was seeing link up and down events on LAN in the log. However, it is NOT the cable or the switch port.
    I speak way too little BSD to see anything useful in the logs, though.

    Didn't you mention using a bunch of Intel Pro100 NICs? Same over here. Dual flavour NIC.
    Since I switched to a 3Com 905B NIC, this has never happened again, with the Pro100 still installed but not assigned.

    Do you have a chance to use another of those Intel NICs and keep observing? Might just be a hardware issue…



  • I've seen this behaviour when I have heavy torrent traffic on the WAN. It looks like the symptoms of running out of states, but I clearly have room for 50000 before I hit the limit. I can't surf or open new connections (or only very slowly), but existing connections are fine. After I stop the torrent traffic, it recovers by itself and everything is back to normal.

    I don't know how to explain it either.
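
    One way to rule the state table in or out is to compare the current number of states with the configured limit while the problem is happening. Below is a minimal sketch that does this from the text of `pfctl -si` and `pfctl -sm`. It assumes those commands print a "current entries" line and a "states hard limit" line in the layout shown in the sample strings. The function names are hypothetical, and the 12000 figure is invented for illustration (only the 50000 limit comes from the post).

```python
import re
from typing import Optional


def current_states(pfctl_si: str) -> Optional[int]:
    """Extract the 'current entries' count from `pfctl -si` text (assumed layout)."""
    m = re.search(r"current entries\s+(\d+)", pfctl_si)
    return int(m.group(1)) if m else None


def state_limit(pfctl_sm: str) -> Optional[int]:
    """Extract the states hard limit from `pfctl -sm` text (assumed layout)."""
    m = re.search(r"^states\s+hard limit\s+(\d+)", pfctl_sm, re.MULTILINE)
    return int(m.group(1)) if m else None


def state_usage(pfctl_si: str, pfctl_sm: str) -> Optional[float]:
    """Fraction of the state table in use, or None if either value is missing."""
    cur = current_states(pfctl_si)
    lim = state_limit(pfctl_sm)
    if cur is None or not lim:
        return None
    return cur / lim


# Illustrative input only: not captured from any box in this thread.
SAMPLE_SI = """\
State Table                          Total             Rate
  current entries                    12000
  searches                         1000000          100.0/s
"""
SAMPLE_SM = """\
states        hard limit    50000
src-nodes     hard limit    50000
frags         hard limit     5000
"""

usage = state_usage(SAMPLE_SI, SAMPLE_SM)
print(f"state table usage: {usage:.0%}")
```

    If the usage stays well below 100% during a freeze, state exhaustion is unlikely to be the cause and the search has to continue elsewhere.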



  • Actually, I didn't have any significant traffic at the time it stopped.

    The kids were outdoors and their PCs off and my wife couldn't access anything outside after booting up her PC. That's when she called me…



  • What kernels are you running? Can you test with the SMP kernel if you are using the uniprocessor version currently?



  • How can I install the smp kernel on my box now? Can I copy something over?



  • Reinstall.



  • Eeks, if I re-install, it will overwrite everything I have, right? I've installed a few unofficial packages and would prefer not to re-install everything again. To isolate this bug, I want to keep everything the same, then just upgrade the kernel components and see if it works. I can mount the microdrive from another installation if that is what is required.

    Thanks.



  • Back up the configuration with Diagnostics -> Backup / Restore.  You will not lose anything.



  • @GoldServe:


    I've installed a few unofficial packages and would prefer not to re-install everything again.
    ...

    And you are sure none of your "unofficial packages" is causing issues?



  • Oh I totally missed that part.  Sorry, you are on your own on this one.



  • @hoba:

    @GoldServe:


    I've installed a few unofficial packages and would prefer not to re-install everything again.
    ...

    And you are sure none of your "unofficial packages" is causing issues?

    Oh, I'm quite SURE the unofficial package isn't the cause. It's just a little client, nothing like squid or a proxy or anything else that interferes with traffic.



  • @sullrich:

    Sorry, you are on your own on this one.

    I think MC doesn't do any harm, does it?

    BTW, before I made a fresh install of 1.2final I did a backup of my config. Restoring that now gives a blank webGUI on HTTP, where it was HTTPS previously. The only thing that helps is a 'factory defaults' from the local console. I can reproduce this.
    Is anybody willing to look into that config.xml for me?

