Netgate Discussion Forum
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Search
    • Register
    • Login

    SG1100 memory starvation - Unbound not restarting

    Scheduled Pinned Locked Moved General pfSense Questions
    13 Posts 3 Posters 484 Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • M
      michmoor LAYER 8 Rebel Alliance
      last edited by

      I have set up an SG1100 remotely and its been good for 77 days so far.
      This evening i received a call from the site that the Internet is down. Checking my email alerts i do see down servers but i do not see any gateway events emails which would indicate a outage or internet issue.
      I was able to log into the firewall via IPsec that's a good sign. I noticed that the unbound service is not started.

      Here are the logs when i try to restart the service - multiple times
      I don't know how to get Unbound working. Memory usage is at 65% and i stopped pfblocker DNSBL thinking there was some memory contention there with the block list but its fine.

      Any ideas how to get unbound working?

      ct 11 20:53:37 nyc-fw1-inet kernel: pid 12767 (unbound), jid 0, uid 59, was killed: failed to reclaim memory
      Oct 11 20:53:37 nyc-fw1-inet bfdd[5688]: [HKQ2F-8D0MY][EC 100663315] Thread Starvation: {(thread *)0x23092efb1b00 arg=0x23092ef9a800 timer  r=-32.085    bfd_xmt_cb() &bs->xmttimer_ev from bfdd/event.c:75} was scheduled to pop greater than 4s ago
      Oct 11 20:53:52 nyc-fw1-inet sshd[3945]: Accepted keyboard-interactive/pam for admin from 192.168.70.26 port 55964 ssh2
      Oct 11 20:55:13 nyc-fw1-inet mgmtd[2919]: [R9P89-HWKBA][EC 100663314] STARVATION: command took 7375ms (cpu time 3ms): echo PING
      Oct 11 20:55:50 nyc-fw1-inet kernel: pid 75937 (unbound), jid 0, uid 59, was killed: a thread waited too long to allocate a page
      Oct 11 20:55:50 nyc-fw1-inet bfdd[5688]: [HKQ2F-8D0MY][EC 100663315] Thread Starvation: {(thread *)0x23092efb1b00 arg=0x23092ef9a800 timer  r=-49.496    bfd_xmt_cb() &bs->xmttimer_ev from bfdd/event.c:75} was scheduled to pop greater than 4s ago
      Oct 11 20:55:51 nyc-fw1-inet vnstatd[57642]: Warning: Writing cached data to database took 42.7 seconds.
      
      
      
      Oct 11 20:56:36 nyc-fw1-inet php-fpm[47822]: /pfblockerng/pfblockerng_dnsbl.php: Beginning configuration backup to https://acb.netgate.com/save
      Oct 11 20:57:12 nyc-fw1-inet php[27257]: /usr/local/sbin/acbupload.php: An error occurred while uploading the encrypted Netgate pfSense Plus configuration to https://acb.netgate.com/save ( Unable to resolve acb.netgate.com ) -  Unable to resolve acb.netgate.com
      Oct 11 20:57:12 nyc-fw1-inet php[27257]: /usr/local/sbin/acbupload.php: New alert found: An error occurred while uploading the encrypted Netgate pfSense Plus configuration to https://acb.netgate.com/save ( Unable to resolve acb.netgate.com )
      Oct 11 20:58:33 nyc-fw1-inet kernel: pid 22633 (unbound), jid 0, uid 59, was killed: failed to reclaim memory
      Oct 11 20:58:33 nyc-fw1-inet bfdd[5688]: [HKQ2F-8D0MY][EC 100663315] Thread Starvation: {(thread *)0x23092ee42780 arg=0x23092ef9a800 timer  r=-37.695    bfd_xmt_cb() &bs->xmttimer_ev from bfdd/event.c:75} was scheduled to pop greater than 4s ago
      Oct 11 20:58:33 nyc-fw1-inet mgmtd[2919]: [R9P89-HWKBA][EC 100663314] STARVATION: command took 75204ms (cpu time 0ms): echo PING
      Oct 11 20:58:33 nyc-fw1-inet staticd[3994]: [R9P89-HWKBA][EC 100663314] STARVATION: command took 75202ms (cpu time 4ms): echo PING
      Oct 11 20:58:34 nyc-fw1-inet mgmtd[2919]: [J2RAS-MZ95C] Terminating on signal
      Oct 11 20:58:34 nyc-fw1-inet staticd[3994]: [MRN6F-AYZC4] Terminating on signal
      Oct 11 20:58:37 nyc-fw1-inet php-cgi[55116]: notify_monitor.php: Message sent to admin@networkingtitan.com OK
      Oct 11 20:58:38 nyc-fw1-inet mgmtd[83642]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
      Oct 11 20:58:38 nyc-fw1-inet staticd[84057]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
      Oct 11 20:58:38 nyc-fw1-inet bfdd[5688]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
      Oct 11 20:58:38 nyc-fw1-inet mgmtd[83642]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:01
      Oct 11 20:58:39 nyc-fw1-inet staticd[84057]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
      Oct 11 20:58:39 nyc-fw1-inet bfdd[5688]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
      

      Firewall: NetGate,Palo Alto-VM,Juniper SRX
      Routing: Juniper, Arista, Cisco
      Switching: Juniper, Arista, Cisco
      Wireless: Unifi, Aruba IAP
      JNCIP,CCNP Enterprise

      M 1 Reply Last reply Reply Quote 0
      • M
        michmoor LAYER 8 Rebel Alliance @michmoor
        last edited by michmoor

        @michmoor

        I may have found the starting event to this issue. I see that the ports on the firewall went down and thereby initiated a bunch of process restarts...I'm thinking that's what killed Unbound. OOM killer got to it.

         cat system.log | grep check_reload_status
        Oct  6 17:55:38 790CCV-FW check_reload_status[666]: Syncing firewall
        Oct  6 17:56:02 nyc-fw1-inet check_reload_status[666]: Reloading filter
        Oct 10 22:33:14 nyc-fw1-inet check_reload_status[666]: Syncing firewall
        Oct 11 20:35:30 nyc-fw1-inet check_reload_status[666]: Linkup starting $e6000sw0port3
        Oct 11 20:35:34 nyc-fw1-inet check_reload_status[666]: updating dyndns WAN_DHCP
        Oct 11 20:35:34 nyc-fw1-inet check_reload_status[666]: Restarting IPsec tunnels
        Oct 11 20:35:34 nyc-fw1-inet check_reload_status[666]: Restarting OpenVPN tunnels/interfaces
        Oct 11 20:35:34 nyc-fw1-inet check_reload_status[666]: Reloading filter
        Oct 11 20:35:36 nyc-fw1-inet check_reload_status[666]: Linkup starting $e6000sw0port3
        Oct 11 20:35:37 nyc-fw1-inet check_reload_status[666]: Linkup starting $e6000sw0port2
        Oct 11 20:35:39 nyc-fw1-inet check_reload_status[666]: Reloading filter
        Oct 11 20:35:40 nyc-fw1-inet check_reload_status[666]: rc.newwanip starting mvneta0.4090
        Oct 11 20:35:40 nyc-fw1-inet check_reload_status[666]: Restarting IPsec tunnels
        Oct 11 20:35:40 nyc-fw1-inet check_reload_status[666]: Linkup starting $e6000sw0port2
        Oct 11 20:35:50 nyc-fw1-inet check_reload_status[666]: Reloading filter
        Oct 11 20:35:54 nyc-fw1-inet check_reload_status[666]: Reloading filter
        Oct 11 20:35:57 nyc-fw1-inet check_reload_status[666]: updating dyndns wan
        Oct 11 20:35:58 nyc-fw1-inet check_reload_status[666]: Reloading filter
        Oct 11 20:36:06 nyc-fw1-inet check_reload_status[666]: Restarting IPsec tunnels
        Oct 11 20:36:13 nyc-fw1-inet check_reload_status[666]: updating dyndns lan
        Oct 11 20:36:13 nyc-fw1-inet check_reload_status[666]: Reloading filter
        Oct 11 20:36:13 nyc-fw1-inet check_reload_status[666]: Reloading filter
        Oct 11 20:36:16 nyc-fw1-inet check_reload_status[666]: Reloading filter
        Oct 11 20:36:22 nyc-fw1-inet check_reload_status[666]: Reloading filter
        Oct 11 20:36:30 nyc-fw1-inet check_reload_status[666]: Starting packages
        Oct 11 20:36:31 nyc-fw1-inet check_reload_status[666]: Reloading filter
        Oct 11 20:36:32 nyc-fw1-inet check_reload_status[666]: Syncing firewall
        Oct 11 20:36:33 nyc-fw1-inet check_reload_status[666]: Syncing firewall
        Oct 11 20:56:36 nyc-fw1-inet check_reload_status[666]: Syncing firewall
        
        

        EDIT: Stopping pfBlocker completely brought Unbound back from the dead.

        Firewall: NetGate,Palo Alto-VM,Juniper SRX
        Routing: Juniper, Arista, Cisco
        Switching: Juniper, Arista, Cisco
        Wireless: Unifi, Aruba IAP
        JNCIP,CCNP Enterprise

        M 1 Reply Last reply Reply Quote 0
        • M
          michmoor LAYER 8 Rebel Alliance @michmoor
          last edited by

          Which ports are these on the SG1100 ??

          ./system.log:Oct 11 20:35:30 nyc-fw1-inet check_reload_status[666]: Linkup starting $e6000sw0port3
          ./system.log:Oct 11 20:35:30 nyc-fw1-inet kernel: e6000sw0port3: link state changed to DOWN
          ./system.log:Oct 11 20:35:36 nyc-fw1-inet check_reload_status[666]: Linkup starting $e6000sw0port3
          ./system.log:Oct 11 20:35:37 nyc-fw1-inet kernel: e6000sw0port3: link state changed to UP
          
          
          
          ./system.log:Oct 11 20:35:37 nyc-fw1-inet check_reload_status[666]: Linkup starting $e6000sw0port2
          ./system.log:Oct 11 20:35:37 nyc-fw1-inet kernel: e6000sw0port2: link state changed to DOWN
          ./system.log:Oct 11 20:35:40 nyc-fw1-inet kernel: e6000sw0port2: link state changed to UP
          ./system.log:Oct 11 20:35:40 nyc-fw1-inet check_reload_status[666]: Linkup starting $e6000sw0port2
          
          

          Firewall: NetGate,Palo Alto-VM,Juniper SRX
          Routing: Juniper, Arista, Cisco
          Switching: Juniper, Arista, Cisco
          Wireless: Unifi, Aruba IAP
          JNCIP,CCNP Enterprise

          S 1 Reply Last reply Reply Quote 0
          • S
            SteveITS Galactic Empire @michmoor
            last edited by

            @michmoor I don’t see that in the docs. Is there an ISP modem that booted? pfSense would see that as a disconnect.

            Pre-2.7.2/23.09: Only install packages for your version, or risk breaking it. Select your branch in System/Update/Update Settings.
            When upgrading, allow 10-15 minutes to restart, or more depending on packages and device speed.
            Upvote 👍 helpful posts!

            M 1 Reply Last reply Reply Quote 0
            • M
              michmoor LAYER 8 Rebel Alliance @SteveITS
              last edited by

              @SteveITS no ISP modem that booted. Two link down events from what appears to be two ports but nothing on the WAN or LAN side was restarted. Maybe these are internal ports as the 1100 has a switch? I dunno

              Firewall: NetGate,Palo Alto-VM,Juniper SRX
              Routing: Juniper, Arista, Cisco
              Switching: Juniper, Arista, Cisco
              Wireless: Unifi, Aruba IAP
              JNCIP,CCNP Enterprise

              1 Reply Last reply Reply Quote 0
              • stephenw10S
                stephenw10 Netgate Administrator
                last edited by

                Port3 is WAN port2 is LAN. By default at least.

                Screenshot from 2024-10-12 14-03-03.png

                M 1 Reply Last reply Reply Quote 0
                • M
                  michmoor LAYER 8 Rebel Alliance @stephenw10
                  last edited by

                  @stephenw10
                  d3c22937-00ee-4ce8-a21d-454e7a8bebbe-image.png

                  So based on my syslog, both WAN and LAN went down?

                  Firewall: NetGate,Palo Alto-VM,Juniper SRX
                  Routing: Juniper, Arista, Cisco
                  Switching: Juniper, Arista, Cisco
                  Wireless: Unifi, Aruba IAP
                  JNCIP,CCNP Enterprise

                  1 Reply Last reply Reply Quote 0
                  • stephenw10S
                    stephenw10 Netgate Administrator
                    last edited by

                    Yes, that's what's logged there. Since it's actually the switch driver reporting that it pretty much has to have happened.

                    M 1 Reply Last reply Reply Quote 0
                    • M
                      michmoor LAYER 8 Rebel Alliance @stephenw10
                      last edited by

                      @stephenw10 hmm than i suspect something is wrong with the unit.

                      One port directly connects to a cable modem.
                      The other port directly connects to a switch.
                      They both go down at the same time?

                      Is there anything else I can check to monitor hardware health?

                      Firewall: NetGate,Palo Alto-VM,Juniper SRX
                      Routing: Juniper, Arista, Cisco
                      Switching: Juniper, Arista, Cisco
                      Wireless: Unifi, Aruba IAP
                      JNCIP,CCNP Enterprise

                      1 Reply Last reply Reply Quote 0
                      • stephenw10S
                        stephenw10 Netgate Administrator
                        last edited by

                        I assume both those things didn't reboot?

                        It could have been pfSense driving the switch ports to re-link of course. A change to the internal switch config perhaps.

                        M 1 Reply Last reply Reply Quote 0
                        • M
                          michmoor LAYER 8 Rebel Alliance @stephenw10
                          last edited by

                          @stephenw10
                          The modem , pfsense and switch are on the same PDU. So it wasn't a loss of power. By some sort of freak act there could be a fault on two outlets but highly unlikely.
                          No configuration changes have taken place on this unit since 10/6 according to Configuration History.
                          Im back to thinking its perhaps a faulty unit. Replacement here would be a real PITA. Hoping there are some other diagnostics i can perform.

                          Firewall: NetGate,Palo Alto-VM,Juniper SRX
                          Routing: Juniper, Arista, Cisco
                          Switching: Juniper, Arista, Cisco
                          Wireless: Unifi, Aruba IAP
                          JNCIP,CCNP Enterprise

                          1 Reply Last reply Reply Quote 0
                          • stephenw10S
                            stephenw10 Netgate Administrator
                            last edited by

                            Is there anything logged running up to that? Some interface change etc?

                            M 1 Reply Last reply Reply Quote 0
                            • M
                              michmoor LAYER 8 Rebel Alliance @stephenw10
                              last edited by

                              @stephenw10

                              This is the only thing i see prior

                              Oct 11 20:16:00 nyc-fw1-inet sshguard[69504]: Exiting on signal.
                              Oct 11 20:16:00 nyc-fw1-inet sshguard[77474]: Now monitoring attacks.
                              Oct 11 20:35:30 nyc-fw1-inet check_reload_status[666]: Linkup starting $e6000sw0port3
                              Oct 11 20:35:30 nyc-fw1-inet kernel: e6000sw0port3: link state changed to DOWN
                              Oct 11 20:35:31 nyc-fw1-inet php-fpm[20159]: /rc.linkup: Hotplug event detected for WAN(wan) dynamic IP address (4: dhcp)
                              Oct 11 20:35:31 nyc-fw1-inet php-fpm[20159]: /rc.linkup: DEVD Ethernet detached event for wan
                              Oct 11 20:35:33 nyc-fw1-inet syslogd: sendto: Network is down
                              Oct 11 20:35:33 nyc-fw1-inet syslogd: sendto: Network is down
                              Oct 11 20:35:33 nyc-fw1-inet syslogd: sendto: Network is down
                              Oct 11 20:35:33 nyc-fw1-inet syslogd: sendto: Network is down
                              Oct 11 20:35:33 nyc-fw1-inet syslogd: sendto: Network is down
                              Oct 11 20:35:33 nyc-fw1-inet syslogd: sendto: Network is down
                              Oct 11 20:35:33 nyc-fw1-inet syslogd: sendto: Network is down
                              Oct 11 20:35:33 nyc-fw1-inet syslogd: sendto: Network is down
                              Oct 11 20:35:33 nyc-fw1-inet syslogd: sendto: Network is down
                              Oct 11 20:35:33 nyc-fw1-inet syslogd: sendto: Network is down
                              
                              

                              I do see that the LAN side had a Hotplug event as well. Looking at the timestamps the LAN side event happened more or less at the same time as the WAN side.
                              To me this indicates either

                              1. As part of any link-status event, pfSense restarts the internal switch ports
                              2. There was some weird failure on both LAN and WAN side which i honestly don't see happening.
                              3. Other cause not yet known.
                              ][admin@nyc-fw1-inet.moore.lan]/var/log: cat system.log | grep "Hotplug"
                              Oct 11 20:35:31 nyc-fw1-inet php-fpm[20159]: /rc.linkup: Hotplug event detected for WAN(wan) dynamic IP address (4: dhcp)
                              Oct 11 20:35:38 nyc-fw1-inet php-fpm[55571]: /rc.linkup: Hotplug event detected for LAN(lan) dynamic IP address (4: 192.168.70.254, 6: track6)
                              Oct 11 20:35:39 nyc-fw1-inet php-fpm[17116]: /rc.linkup: Hotplug event detected for WAN(wan) dynamic IP address (4: dhcp)
                              Oct 11 20:35:54 nyc-fw1-inet php-fpm[20159]: /rc.linkup: Hotplug event detected for LAN(lan) dynamic IP address (4: 192.168.70.254, 6: track6)
                              

                              Firewall: NetGate,Palo Alto-VM,Juniper SRX
                              Routing: Juniper, Arista, Cisco
                              Switching: Juniper, Arista, Cisco
                              Wireless: Unifi, Aruba IAP
                              JNCIP,CCNP Enterprise

                              1 Reply Last reply Reply Quote 0
                              • First post
                                Last post
                              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.