Netgate Discussion Forum

    Need help finding why memory and swap are full

    General pfSense Questions
    11 Posts 2 Posters 4.2k Views
    • brianb

      I've inherited a pfSense 2.0.1 box. Before this weekend it had a flawless uptime of 500+ days. Out of the blue, internet traffic was interrupted for my users. I checked the pfSense dashboard: memory and swap usage showed as maxed out, and the web GUI is slower than molasses. After hours of chasing ghosts I soft- and then hard-rebooted the box. After the hard reboot the internet traffic came back, but memory and swap usage have climbed back up to 100%. I know zero about Unix, Linux, FreeBSD, etc. I've pulled the following information from the firewall:

      Diagnostics > System Activity:
      last pid: 32814;  load averages: 58.23, 59.80, 61.22  up 2+00:50:53    18:55:53
      487 processes: 62 running, 386 sleeping, 22 zombie, 17 waiting
      
      Mem: 724M Active, 65M Inact, 166M Wired, 26M Cache, 110M Buf, 4408K Free
      Swap: 2048M Total, 2042M Used, 6508K Free, 99% Inuse
      
        PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
         12 root     -68    -     0K   160K CPU1    1  25.9H 54.39% {irq17: xl0 rl2}
         12 root     -68    -     0K   160K RUN     1  24.9H 53.76% {irq18: rl0}
         12 root     -68    -     0K   160K WAIT    0  26.3H 51.95% {irq19: rl1}
         12 root     -68    -     0K   160K WAIT    0 414:12 16.55% {irq16: dc0}
         12 root     -64    -     0K   160K WAIT    0 149:19  4.49% {irq5: uhci0 uhci}
      32464 root      96    0 54620K 10272K RUN     0   0:00  0.98% php
         17 root      44    -     0K     8K psleep  0  73:37  0.88% pagedaemon
      23184 root      45    0  3316K   476K nanslp  0  69:01  0.88% logger
         12 root     -32    -     0K   160K RUN     1  48:36  0.29% {swi4: clock}
      60330 root      96    0  6052K   772K RUN     1   7:09  0.29% perl5.14.2
      34837 root      96    0  6052K   772K RUN     1   5:46  0.29% perl5.14.2
      19241 root      96    0  6052K   776K RUN     1   2:03  0.29% perl5.14.2
      15674 root      96    0  6052K   776K RUN     1   3:01  0.20% perl5.14.2
      58316 root      96    0  6052K   772K RUN     1   7:57  0.10% perl5.14.2
      27743 root      96    0  6052K   780K RUN     1   5:58  0.10% perl5.14.2
      27903 root      96    0  6052K   772K RUN     1   5:38  0.10% perl5.14.2
       4990 root      96    0  6052K   776K RUN     1   3:54  0.10% perl5.14.2
      47321 root      96    0  6052K   784K RUN     1   1:14  0.10% perl5.14.2
      
      Run command:  $ top
      
      last pid:  3871;  load averages: 72.68, 67.09, 63.85  up 2+01:01:01    19:06:01
      437 processes: 73 running, 342 sleeping, 22 zombie
      
      Mem: 724M Active, 65M Inact, 166M Wired, 26M Cache, 110M Buf, 3572K Free
      Swap: 2048M Total, 2042M Used, 5832K Free, 99% Inuse
      
        PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
      23184 root        1  45    0  3316K   464K nanslp  0  69:13  0.88% logger
      52346 root        1  49    0 53596K  8104K piperd  1   0:00  0.59% php
      23871 root        1  96    0  4948K  1096K RUN     0  28:42  0.00% syslogd
      23090 root        1  44    0  5912K   840K pipewr  1   9:38  0.00% tcpdump
      58316 root        1  96    0  6052K   772K RUN     1   7:58  0.00% perl5.14.2
      60330 root        1  96    0  6052K   772K RUN     1   7:11  0.00% perl5.14.2
      62913 root        1  96    0  6052K   772K RUN     1   7:03  0.00% perl5.14.2
       2641 root        1  96    0  6052K   772K RUN     1   6:39  0.00% perl5.14.2
      37461 root        1  96    0  6052K   772K RUN     1   6:25  0.00% perl5.14.2
      27743 root        1  96    0  6052K   772K RUN     1   6:00  0.00% perl5.14.2
      34837 root        1  96    0  6052K   772K RUN     1   5:47  0.00% perl5.14.2
      27903 root        1  96    0  6052K   772K RUN     0   5:39  0.00% perl5.14.2
      54383 root        1  96    0  6052K   776K RUN     1   5:15  0.00% perl5.14.2
      41645 root        1  96    0  6052K   776K RUN     0   4:57  0.00% perl5.14.2
      31429 root        1  96    0  6052K   776K RUN     1   4:33  0.00% perl5.14.2
      27978 root        1  96    0  6052K   776K RUN     0   4:14  0.00% perl5.14.2
      30543 root        1  96    0  6052K   776K RUN     0   4:09  0.00% perl5.14.2
       4990 root        1  96    0  6052K   776K RUN     0   3:55  0.00% perl5.14.2
      
      

      Command prompt: ps uxawww (the results are attached as a text file because the forum didn't like the code formatting I put around them)

      The console is being spammed with the following entries:

      Nov 24 19:53:00 kernel: interrupt storm detected on "irq16:"; throttling interrupt source
      Nov 24 19:53:00 kernel: swap_pager_getswapspace(3): failed
      Nov 24 19:53:00 kernel: interrupt storm detected on "irq16:"; throttling interrupt source
      Nov 24 19:52:59 kernel: swap_pager_getswapspace(2): failed
      Nov 24 19:52:59 kernel: swap_pager_getswapspace(6): failed
      Nov 24 19:52:59 kernel: swap_pager_getswapspace(16): failed
      Nov 24 19:52:59 kernel: swap_pager_getswapspace(2): failed
      Nov 24 19:52:59 kernel: swap_pager_getswapspace(3): failed
      Nov 24 19:52:59 kernel: swap_pager_getswapspace(2): failed
      Nov 24 19:52:59 kernel: interrupt storm detected on "irq16:"; throttling interrupt source
      Nov 24 19:52:59 kernel: interrupt storm detected on "irq16:"; throttling interrupt source
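The swap_pager_getswapspace(...) failed messages mean the kernel could not allocate swap space, which matches the Swap: line from top. A short Python sketch (an illustration only, not a pfSense tool) that parses top's Mem: and Swap: header lines, assuming each field is a number with a K/M/G suffix followed by a label, as in the output above:

```python
import re

# Header lines copied from the top output above.
MEM_LINE = "Mem: 724M Active, 65M Inact, 166M Wired, 26M Cache, 110M Buf, 3572K Free"
SWAP_LINE = "Swap: 2048M Total, 2042M Used, 5832K Free, 99% Inuse"

# Assumption: every sized field looks like "<number><K|M|G> <Label>".
# The "99% Inuse" field has no K/M/G suffix, so it is skipped.
FIELD = re.compile(r"(\d+)([KMG]) (\w+)")
UNIT_TO_MB = {"K": 1 / 1024, "M": 1, "G": 1024}

def parse_top_line(line: str) -> dict:
    """Return {label: size_in_MB} for a top Mem:/Swap: header line."""
    return {label: int(num) * UNIT_TO_MB[unit]
            for num, unit, label in FIELD.findall(line)}

mem = parse_top_line(MEM_LINE)
swap = parse_top_line(SWAP_LINE)

swap_used_pct = 100 * swap["Used"] / swap["Total"]
print(f"RAM free: {mem['Free']:.1f} MB")  # RAM free: 3.5 MB
print(f"Swap used: {swap_used_pct:.1f}% ({swap['Free']:.1f} MB free)")  # Swap used: 99.7% (5.7 MB free)
```

With under 4 MB of RAM and under 6 MB of swap left, any process that needs more memory will trigger exactly the pager failures shown on the console.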

      The box has the Squid, SquidGuard and OpenVPN packages installed. The Squid and SquidGuard services aren't running: they did not restart after I hard-rebooted the machine, and I never started them manually.

      I'm not sure what other information to provide. I'm just regurgitating what I read I should retrieve while searching for solutions on the internet.
      [ps uxawww.txt](/public/imported_attachments/1/ps uxawww.txt)

      • stephenw10 (Netgate Administrator)

        400+ processes is way too many. Looks like lightsquid is having a problem.
        What packages are you running? What hardware are you running on? What sort of network is it in front of?
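One way to see what is behind the 400+ processes is to tally the COMMAND column of the process listing. A rough Python sketch, assuming rows in top's layout where the command name is the last whitespace-separated field; the sample rows are a handful copied from the top output in the first post, and on the real box you would feed it the full listing. That the perl5.14.2 processes belong to LightSquid is Steve's suspicion, not something this count proves.

```python
from collections import Counter

# A few process rows copied from the top output in the first post.
TOP_ROWS = """\
23184 root        1  45    0  3316K   464K nanslp  0  69:13  0.88% logger
52346 root        1  49    0 53596K  8104K piperd  1   0:00  0.59% php
23871 root        1  96    0  4948K  1096K RUN     0  28:42  0.00% syslogd
23090 root        1  44    0  5912K   840K pipewr  1   9:38  0.00% tcpdump
58316 root        1  96    0  6052K   772K RUN     1   7:58  0.00% perl5.14.2
60330 root        1  96    0  6052K   772K RUN     1   7:11  0.00% perl5.14.2
62913 root        1  96    0  6052K   772K RUN     1   7:03  0.00% perl5.14.2
 2641 root        1  96    0  6052K   772K RUN     1   6:39  0.00% perl5.14.2
"""

def count_commands(rows: str) -> Counter:
    """Count processes per command name (last column of each top row)."""
    counts = Counter()
    for line in rows.splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[-1]] += 1
    return counts

counts = count_commands(TOP_ROWS)
# Most frequent command first.
for command, n in counts.most_common():
    print(f"{n:4d}  {command}")
```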

        Steve

        Edit: I see you've added some of that.

        • brianb

          Yeah, the forum chopped off part of my post and I didn't notice until just now. I apologize for that.

          It's running on a Pentium III machine with 1GB RAM, an 80GB HDD and 4 network cards. I'm not really sure how to answer the network question. It's the firewall and DNS forwarder for a network of several managed switches of mixed manufacture, two Windows domain controllers, an Exchange server, a file server, a backup server, a NAS and about 50 workstations. Only one person uses the OpenVPN package for a VPN connection. The network is split into three VLANs. I apologize if this isn't what you were asking for.

          • stephenw10 (Netgate Administrator)

            That's pretty much exactly what I was asking.
            You have one NIC for each VLAN and one for WAN? In that case, are your switches handling the VLANs? There are no VLAN interfaces on the pfSense box? It's probably not relevant, but it's best to get an idea of what you're dealing with.
            Are you using LightSquid? Have you ever used it? If the Squid and SquidGuard logs aren't present it's going to have a hard time; you should at least disable it.

            Steve

            • brianb

              Yes, one NIC for each VLAN and WAN. The VLANs are configured as interfaces on the firewall. Most traffic uses one VLAN. One is set up to segregate a few public computers. The other currently isn't used. Oddly enough, we never experienced LAN problems when the box went haywire; we only had an issue with port 80 traffic until we rebooted the machine pfSense is loaded on. Yes, we were using LightSquid for reporting. I have just started a package removal for LightSquid; it's moving very slowly, but it is moving.

              • stephenw10 (Netgate Administrator)

                In your System Activity output you have 5 interfaces listed: rl0-2, dc0 and xl0. Is one of those unassigned? irq16 seems to be causing an interrupt storm, and dc0 is using that IRQ. Something else may be using it as well, though. What do you see from 'vmstat -i'?
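Tying the storming IRQ to a device means cross-referencing the console messages with the interrupt thread names in the System Activity output. A small Python sketch of that lookup, using lines copied from the first post; it only knows the devices named in those {irqN: ...} threads, so a device sharing the IRQ but not listed there would be missed.

```python
import re

# Console lines from the first post (a subset).
CONSOLE = """\
Nov 24 19:53:00 kernel: interrupt storm detected on "irq16:"; throttling interrupt source
Nov 24 19:53:00 kernel: swap_pager_getswapspace(3): failed
Nov 24 19:52:59 kernel: swap_pager_getswapspace(2): failed
Nov 24 19:52:59 kernel: interrupt storm detected on "irq16:"; throttling interrupt source
"""

# Interrupt thread names from the System Activity output.
IRQ_THREADS = ["{irq17: xl0 rl2}", "{irq18: rl0}", "{irq19: rl1}",
               "{irq16: dc0}", "{irq5: uhci0 uhci}"]

STORM = re.compile(r'interrupt storm detected on "(irq\d+):"')
THREAD = re.compile(r"\{(irq\d+): ([^}]*)\}")

# Map each IRQ to the devices listed on it.
irq_devices = {m.group(1): m.group(2).split()
               for m in map(THREAD.match, IRQ_THREADS) if m}

# Collect every IRQ the kernel reported as storming.
storming = {m.group(1) for m in map(STORM.search, CONSOLE.splitlines()) if m}
for irq in sorted(storming):
    print(f"{irq}: {', '.join(irq_devices.get(irq, ['unknown']))}")  # irq16: dc0
```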

                Steve

                • brianb

                  Under the Interfaces tab I get (assign), LAN, VLAN3, VLAN4 and WAN. The three VLANs and the WAN have the Enable checkbox checked. I currently cannot get the Interfaces (assign) page to load. Below are the results from the Status > Interfaces page. I apologize if this isn't what you were referring to.

                  WAN interface (xl0)  
                  Status up  
                  MAC address 00:60:97:a1:8d:a4  
                  IP address xx.xx.109.126    
                  Subnet mask 255.255.255.240  
                  Gateway Windstream xx.xx.109.113  
                  ISP DNS servers 127.0.0.1
                  xx.xx.222.222
                  xx.xx.220.220
                  
                  Media 100baseTX <full-duplex>
                  In/out packets 1942932/1926379 (1.27 GB/747.06 MB)
                  In/out packets (pass) 1926379/1844701 (1.27 GB/747.06 MB)  
                  In/out packets (block) 16553/0 (1.14 MB/0 bytes)  
                  In/out errors 4/0  
                  Collisions 0  
                  
                  LAN interface (rl0)  
                  Status up  
                  MAC address 00:50:ba:5d:3f:9f  
                  IP address 192.168.241.254    
                  Subnet mask 255.255.255.0  
                  Media 100baseTX <full-duplex>
                  In/out packets 7062023361/1488130 (211.12 GB/779.59 MB)
                  In/out packets (pass) 1488130/1418930 (711.83 MB/779.59 MB)  
                  In/out packets (block) 7060535231/0 (210.42 GB/0 bytes)  
                  In/out errors 75180/0  
                  Collisions 0  
                  
                  VLAN3 interface (rl1_vlan3)  
                  Status up  
                  MAC address 00:50:ba:ba:a1:85  
                  IP address 192.168.12.254    
                  Subnet mask 255.255.255.0  
                  Media 100baseTX <full-duplex>
                  In/out packets 369100/352325 (37.78 MB/436.87 MB)
                  In/out packets (pass) 351807/404665 (36.49 MB/436.84 MB)  
                  In/out packets (block) 17293/518 (1.29 MB/28 KB)  
                  In/out errors 0/108  
                  Collisions 52  
                  
                  VLAN4 interface (rl2_vlan4)  
                  Status up  
                  MAC address 00:e0:29:6f:8e:3c  
                  IP address 192.168.11.254    
                  Subnet mask 255.255.255.0  
                  Media 100baseTX <full-duplex>
                  In/out packets 23020/23012 (5.14 MB/29.92 MB)
                  In/out packets (pass) 23012/29450 (5.14 MB/29.92 MB)  
                  In/out packets (block) 8/0 (506 bytes/0 bytes)  
                  In/out errors 0/0  
                  Collisions 0
                  

                  Here is the output of the command you asked me to run.

                  $ vmstat -i
                  interrupt                          total       rate
                  irq1: atkbd0                           8          0
                  irq5: uhci0 uhci1             2980212854      14796
                  irq6: fdc0                             2          0
                  irq14: ata0                      4424609         21
                  irq15: ata1                           68          0
                  irq16: dc0                     729307791       3620
                  irq17: xl0 rl2                1512276573       7508
                  irq18: rl0                    1257810302       6244
                  irq19: rl1                    2026324816      10060
                  cpu0: timer                    402790186       1999
                  cpu1: timer                    402791167       1999
                  Total                         9315938376      46252
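Sorting the vmstat -i rows by the rate column makes it easier to compare the NICs with each other. A Python sketch using the output above; the parser assumes, as in this output, that the last two columns are always total and rate. On FreeBSD the rate column is the average number of interrupts per second since boot, so total / rate roughly recovers the uptime, which is a handy sanity check on the numbers.

```python
# vmstat -i output copied from the post above (header and Total row included).
VMSTAT_I = """\
interrupt                          total       rate
irq1: atkbd0                           8          0
irq5: uhci0 uhci1             2980212854      14796
irq6: fdc0                             2          0
irq14: ata0                      4424609         21
irq15: ata1                           68          0
irq16: dc0                     729307791       3620
irq17: xl0 rl2                1512276573       7508
irq18: rl0                    1257810302       6244
irq19: rl1                    2026324816      10060
cpu0: timer                    402790186       1999
cpu1: timer                    402791167       1999
Total                         9315938376      46252
"""

def parse_vmstat_i(text: str) -> list[tuple[str, str, int, int]]:
    """Return (source, devices, total, rate) for each interrupt line."""
    rows = []
    for line in text.splitlines()[1:]:          # skip the column header
        fields = line.split()
        if not fields or fields[0] == "Total":  # skip the summary row
            continue
        source = fields[0].rstrip(":")
        devices = " ".join(fields[1:-2])
        rows.append((source, devices, int(fields[-2]), int(fields[-1])))
    return rows

rows = parse_vmstat_i(VMSTAT_I)
by_rate = sorted(rows, key=lambda r: r[3], reverse=True)  # busiest first
for source, devices, total, rate in by_rate:
    print(f"{source:6} {devices:12} {rate:>7}/s")

# rate = total / seconds since boot, so any row gives an uptime estimate.
uptime_s = rows[-1][2] / rows[-1][3]  # cpu1 timer row
print(f"approx. uptime: {uptime_s / 86400:.1f} days")  # approx. uptime: 2.3 days
```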
                  
                  
                  • stephenw10 (Netgate Administrator)

                    Interesting. So two things:
                    You're not using dc0, but that looks to be what's causing the interrupt flood. Is that an onboard NIC, perhaps? You may be able to disable it in the BIOS.
                    Although you have separate NICs for your local VLANs, the actual tagged VLAN traffic is being trunked through to the pfSense NICs. This isn't necessary, since you could handle the VLAN tagging/untagging in the switch(es), which is potentially less problematic. However, it's obviously been working fine for you, so I wouldn't change it now.

                    The interrupt rate on rl1 (VLAN3) is significantly higher than on the other NICs; is that where most of your clients are?

                    The fact that Squid and SquidGuard didn't start correctly is not a good sign. Possibly your HD is failing; if that were the case there would be evidence of it in the system log. Is there anything in the Squid log to indicate why it didn't start?

                    Steve

                    • brianb

                      I will have to reboot the firewall after hours and inspect the BIOS.

                      Most of the clients are on LAN. VLAN3 is the VLAN for the couple of public computers I mentioned earlier.

                      I'm not finding any useful information in the Squid logs in /var/squid/log/. The only logs there are access.log and cache.log (plus their rotated copies). Below is a list of all the Squid-related log files and directories I knew how to find:

                      [2.0.1-RELEASE][admin@kit-pfsense.thekitcheninc.org]/root(1): cd /var/squid/
                      [2.0.1-RELEASE][admin@kit-pfsense.thekitcheninc.org]/var/squid(2): ls
                      acl   cache log   logs
                      [2.0.1-RELEASE][admin@kit-pfsense.thekitcheninc.org]/var/squid(3): cd /var/squid/log
                      [2.0.1-RELEASE][admin@kit-pfsense.thekitcheninc.org]/var/squid/log(4): ls
                      access.log   access.log.2 access.log.5 cache.log.0  cache.log.3  cache.log.6
                      access.log.0 access.log.3 access.log.6 cache.log.1  cache.log.4
                      access.log.1 access.log.4 cache.log    cache.log.2  cache.log.5
                      [2.0.1-RELEASE][admin@kit-pfsense.thekitcheninc.org]/var/squid/log(5): cd /var/squid/logs
                      [2.0.1-RELEASE][admin@kit-pfsense.thekitcheninc.org]/var/squid/logs(6): ls
                      cache.log
                      [2.0.1-RELEASE][admin@kit-pfsense.thekitcheninc.org]/var/squid/logs(7): cd /var/squid/cache
                      [2.0.1-RELEASE][admin@kit-pfsense.thekitcheninc.org]/var/squid/cache(8): ls
                      00               05               0A               0F
                      01               06               0B               swap.state
                      02               07               0C               swap.state.clean
                      03               08               0D
                      04               09               0E
                      [2.0.1-RELEASE][admin@kit-pfsense.thekitcheninc.org]/var/squid/cache(9): cd /var/log
                      [2.0.1-RELEASE][admin@kit-pfsense.thekitcheninc.org]/var/log(10): ls
                      apinger.log        lastlog            portalauth.log     system.log
                      dhcpd.log          lighttpd.error.log ppp.log            userlog
                      dmesg.boot         lighttpd.log       pptps.log          vpn.log
                      filter.log         ntpd.log           relayd.log         wireless.log
                      ipsec.log          openvpn.log        slbd.log
                      l2tps.log          poes.log           squidGuard.log
                      

                      I think I remember seeing part of a message mentioning sectors on the console after the reboot, but the interrupt storm was spamming the console so much that I couldn't make out more than the word "sectors". Unfortunately, I cleared the log after disabling system console logging, in case the log being spammed by the interrupt storm was dragging the system down. In hindsight I realize that was really dumb. However, checking the drive's health and SMART information through pfSense shows no failures and a passing SMART assessment.

                      I did find this in the Portal Auth logs:

                      Nov 22 21:30:40 squid[12747]: Squid Parent: child process 13229 exited due to signal 9 
                      Nov 22 18:13:54 squid[57530]: Squid Parent: child process 59181 started 
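The "signal 9" in the first line is SIGKILL, which a process cannot catch or ignore, so the squid child was killed rather than exiting on its own; the log line itself doesn't say what sent it. On a POSIX system, Python's standard signal module maps the number to its name:

```python
import signal

# Number taken from "child process 13229 exited due to signal 9".
sig = signal.Signals(9)
print(sig.name)  # SIGKILL (POSIX only; Windows has no SIGKILL)
```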
                      
                      
                      • brianb

                        Just wanted to follow up on this. I ended up updating the firewall to the latest full release and then reinstalled the Squid and SquidGuard packages. Thankfully pfSense backs up the package configs before reinstalling them, so I didn't need to change anything. I also disabled the serial and parallel ports in the BIOS to get rid of the interrupt storm. Everything is running perfectly now.

                        • stephenw10 (Netgate Administrator)

                          Thanks for following up, many don't.  :)
                          Good to hear you sorted it.

                          Steve

                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.