SG1100 memory starvation - Unbound not restarting
-
I have set up an SG1100 remotely and its been good for 77 days so far.
This evening i received a call from the site that the Internet is down. Checking my email alerts i do see down servers but i do not see any gateway events emails which would indicate a outage or internet issue.
I was able to log into the firewall via IPsec that's a good sign. I noticed that the unbound service is not started.Here are the logs when i try to restart the service - multiple times
I don't know how to get Unbound working. Memory usage is at 65% and i stopped pfblocker DNSBL thinking there was some memory contention there with the block list but its fine.Any ideas how to get unbound working?
ct 11 20:53:37 nyc-fw1-inet kernel: pid 12767 (unbound), jid 0, uid 59, was killed: failed to reclaim memory Oct 11 20:53:37 nyc-fw1-inet bfdd[5688]: [HKQ2F-8D0MY][EC 100663315] Thread Starvation: {(thread *)0x23092efb1b00 arg=0x23092ef9a800 timer r=-32.085 bfd_xmt_cb() &bs->xmttimer_ev from bfdd/event.c:75} was scheduled to pop greater than 4s ago Oct 11 20:53:52 nyc-fw1-inet sshd[3945]: Accepted keyboard-interactive/pam for admin from 192.168.70.26 port 55964 ssh2 Oct 11 20:55:13 nyc-fw1-inet mgmtd[2919]: [R9P89-HWKBA][EC 100663314] STARVATION: command took 7375ms (cpu time 3ms): echo PING Oct 11 20:55:50 nyc-fw1-inet kernel: pid 75937 (unbound), jid 0, uid 59, was killed: a thread waited too long to allocate a page Oct 11 20:55:50 nyc-fw1-inet bfdd[5688]: [HKQ2F-8D0MY][EC 100663315] Thread Starvation: {(thread *)0x23092efb1b00 arg=0x23092ef9a800 timer r=-49.496 bfd_xmt_cb() &bs->xmttimer_ev from bfdd/event.c:75} was scheduled to pop greater than 4s ago Oct 11 20:55:51 nyc-fw1-inet vnstatd[57642]: Warning: Writing cached data to database took 42.7 seconds. Oct 11 20:56:36 nyc-fw1-inet php-fpm[47822]: /pfblockerng/pfblockerng_dnsbl.php: Beginning configuration backup to https://acb.netgate.com/save Oct 11 20:57:12 nyc-fw1-inet php[27257]: /usr/local/sbin/acbupload.php: An error occurred while uploading the encrypted Netgate pfSense Plus configuration to https://acb.netgate.com/save ( Unable to resolve acb.netgate.com ) - Unable to resolve acb.netgate.com Oct 11 20:57:12 nyc-fw1-inet php[27257]: /usr/local/sbin/acbupload.php: New alert found: An error occurred while uploading the encrypted Netgate pfSense Plus configuration to https://acb.netgate.com/save ( Unable to resolve acb.netgate.com ) Oct 11 20:58:33 nyc-fw1-inet kernel: pid 22633 (unbound), jid 0, uid 59, was killed: failed to reclaim memory Oct 11 20:58:33 nyc-fw1-inet bfdd[5688]: [HKQ2F-8D0MY][EC 100663315] Thread Starvation: {(thread *)0x23092ee42780 arg=0x23092ef9a800 timer r=-37.695 bfd_xmt_cb() &bs->xmttimer_ev from bfdd/event.c:75} was scheduled to pop greater than 4s ago Oct 11 20:58:33 nyc-fw1-inet mgmtd[2919]: [R9P89-HWKBA][EC 100663314] STARVATION: command took 75204ms (cpu time 0ms): echo PING Oct 11 20:58:33 nyc-fw1-inet staticd[3994]: [R9P89-HWKBA][EC 100663314] STARVATION: command took 75202ms (cpu time 4ms): echo PING Oct 11 20:58:34 nyc-fw1-inet mgmtd[2919]: [J2RAS-MZ95C] Terminating on signal Oct 11 20:58:34 nyc-fw1-inet staticd[3994]: [MRN6F-AYZC4] Terminating on signal Oct 11 20:58:37 nyc-fw1-inet php-cgi[55116]: notify_monitor.php: Message sent to admin@networkingtitan.com OK Oct 11 20:58:38 nyc-fw1-inet mgmtd[83642]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00 Oct 11 20:58:38 nyc-fw1-inet staticd[84057]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00 Oct 11 20:58:38 nyc-fw1-inet bfdd[5688]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00 Oct 11 20:58:38 nyc-fw1-inet mgmtd[83642]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:01 Oct 11 20:58:39 nyc-fw1-inet staticd[84057]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00 Oct 11 20:58:39 nyc-fw1-inet bfdd[5688]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
-
I may have found the starting event to this issue. I see that the ports on the firewall went down and thereby initiated a bunch of process restarts...I'm thinking that's what killed Unbound. OOM killer got to it.
cat system.log | grep check_reload_status Oct 6 17:55:38 790CCV-FW check_reload_status[666]: Syncing firewall Oct 6 17:56:02 nyc-fw1-inet check_reload_status[666]: Reloading filter Oct 10 22:33:14 nyc-fw1-inet check_reload_status[666]: Syncing firewall Oct 11 20:35:30 nyc-fw1-inet check_reload_status[666]: Linkup starting $e6000sw0port3 Oct 11 20:35:34 nyc-fw1-inet check_reload_status[666]: updating dyndns WAN_DHCP Oct 11 20:35:34 nyc-fw1-inet check_reload_status[666]: Restarting IPsec tunnels Oct 11 20:35:34 nyc-fw1-inet check_reload_status[666]: Restarting OpenVPN tunnels/interfaces Oct 11 20:35:34 nyc-fw1-inet check_reload_status[666]: Reloading filter Oct 11 20:35:36 nyc-fw1-inet check_reload_status[666]: Linkup starting $e6000sw0port3 Oct 11 20:35:37 nyc-fw1-inet check_reload_status[666]: Linkup starting $e6000sw0port2 Oct 11 20:35:39 nyc-fw1-inet check_reload_status[666]: Reloading filter Oct 11 20:35:40 nyc-fw1-inet check_reload_status[666]: rc.newwanip starting mvneta0.4090 Oct 11 20:35:40 nyc-fw1-inet check_reload_status[666]: Restarting IPsec tunnels Oct 11 20:35:40 nyc-fw1-inet check_reload_status[666]: Linkup starting $e6000sw0port2 Oct 11 20:35:50 nyc-fw1-inet check_reload_status[666]: Reloading filter Oct 11 20:35:54 nyc-fw1-inet check_reload_status[666]: Reloading filter Oct 11 20:35:57 nyc-fw1-inet check_reload_status[666]: updating dyndns wan Oct 11 20:35:58 nyc-fw1-inet check_reload_status[666]: Reloading filter Oct 11 20:36:06 nyc-fw1-inet check_reload_status[666]: Restarting IPsec tunnels Oct 11 20:36:13 nyc-fw1-inet check_reload_status[666]: updating dyndns lan Oct 11 20:36:13 nyc-fw1-inet check_reload_status[666]: Reloading filter Oct 11 20:36:13 nyc-fw1-inet check_reload_status[666]: Reloading filter Oct 11 20:36:16 nyc-fw1-inet check_reload_status[666]: Reloading filter Oct 11 20:36:22 nyc-fw1-inet check_reload_status[666]: Reloading filter Oct 11 20:36:30 nyc-fw1-inet check_reload_status[666]: Starting packages Oct 11 20:36:31 nyc-fw1-inet check_reload_status[666]: Reloading filter Oct 11 20:36:32 nyc-fw1-inet check_reload_status[666]: Syncing firewall Oct 11 20:36:33 nyc-fw1-inet check_reload_status[666]: Syncing firewall Oct 11 20:56:36 nyc-fw1-inet check_reload_status[666]: Syncing firewall
EDIT: Stopping pfBlocker completely brought Unbound back from the dead.
-
Which ports are these on the SG1100 ??
./system.log:Oct 11 20:35:30 nyc-fw1-inet check_reload_status[666]: Linkup starting $e6000sw0port3 ./system.log:Oct 11 20:35:30 nyc-fw1-inet kernel: e6000sw0port3: link state changed to DOWN ./system.log:Oct 11 20:35:36 nyc-fw1-inet check_reload_status[666]: Linkup starting $e6000sw0port3 ./system.log:Oct 11 20:35:37 nyc-fw1-inet kernel: e6000sw0port3: link state changed to UP ./system.log:Oct 11 20:35:37 nyc-fw1-inet check_reload_status[666]: Linkup starting $e6000sw0port2 ./system.log:Oct 11 20:35:37 nyc-fw1-inet kernel: e6000sw0port2: link state changed to DOWN ./system.log:Oct 11 20:35:40 nyc-fw1-inet kernel: e6000sw0port2: link state changed to UP ./system.log:Oct 11 20:35:40 nyc-fw1-inet check_reload_status[666]: Linkup starting $e6000sw0port2
-
@michmoor I don’t see that in the docs. Is there an ISP modem that booted? pfSense would see that as a disconnect.
-
@SteveITS no ISP modem that booted. Two link down events from what appears to be two ports but nothing on the WAN or LAN side was restarted. Maybe these are internal ports as the 1100 has a switch? I dunno
-
Port3 is WAN port2 is LAN. By default at least.
-
So based on my syslog, both WAN and LAN went down?
-
Yes, that's what's logged there. Since it's actually the switch driver reporting that it pretty much has to have happened.
-
@stephenw10 hmm than i suspect something is wrong with the unit.
One port directly connects to a cable modem.
The other port directly connects to a switch.
They both go down at the same time?Is there anything else I can check to monitor hardware health?
-
I assume both those things didn't reboot?
It could have been pfSense driving the switch ports to re-link of course. A change to the internal switch config perhaps.
-
@stephenw10
The modem , pfsense and switch are on the same PDU. So it wasn't a loss of power. By some sort of freak act there could be a fault on two outlets but highly unlikely.
No configuration changes have taken place on this unit since 10/6 according to Configuration History.
Im back to thinking its perhaps a faulty unit. Replacement here would be a real PITA. Hoping there are some other diagnostics i can perform. -
Is there anything logged running up to that? Some interface change etc?
-
This is the only thing i see prior
Oct 11 20:16:00 nyc-fw1-inet sshguard[69504]: Exiting on signal. Oct 11 20:16:00 nyc-fw1-inet sshguard[77474]: Now monitoring attacks. Oct 11 20:35:30 nyc-fw1-inet check_reload_status[666]: Linkup starting $e6000sw0port3 Oct 11 20:35:30 nyc-fw1-inet kernel: e6000sw0port3: link state changed to DOWN Oct 11 20:35:31 nyc-fw1-inet php-fpm[20159]: /rc.linkup: Hotplug event detected for WAN(wan) dynamic IP address (4: dhcp) Oct 11 20:35:31 nyc-fw1-inet php-fpm[20159]: /rc.linkup: DEVD Ethernet detached event for wan Oct 11 20:35:33 nyc-fw1-inet syslogd: sendto: Network is down Oct 11 20:35:33 nyc-fw1-inet syslogd: sendto: Network is down Oct 11 20:35:33 nyc-fw1-inet syslogd: sendto: Network is down Oct 11 20:35:33 nyc-fw1-inet syslogd: sendto: Network is down Oct 11 20:35:33 nyc-fw1-inet syslogd: sendto: Network is down Oct 11 20:35:33 nyc-fw1-inet syslogd: sendto: Network is down Oct 11 20:35:33 nyc-fw1-inet syslogd: sendto: Network is down Oct 11 20:35:33 nyc-fw1-inet syslogd: sendto: Network is down Oct 11 20:35:33 nyc-fw1-inet syslogd: sendto: Network is down Oct 11 20:35:33 nyc-fw1-inet syslogd: sendto: Network is down
I do see that the LAN side had a Hotplug event as well. Looking at the timestamps the LAN side event happened more or less at the same time as the WAN side.
To me this indicates either- As part of any link-status event, pfSense restarts the internal switch ports
- There was some weird failure on both LAN and WAN side which i honestly don't see happening.
- Other cause not yet known.
][admin@nyc-fw1-inet.moore.lan]/var/log: cat system.log | grep "Hotplug" Oct 11 20:35:31 nyc-fw1-inet php-fpm[20159]: /rc.linkup: Hotplug event detected for WAN(wan) dynamic IP address (4: dhcp) Oct 11 20:35:38 nyc-fw1-inet php-fpm[55571]: /rc.linkup: Hotplug event detected for LAN(lan) dynamic IP address (4: 192.168.70.254, 6: track6) Oct 11 20:35:39 nyc-fw1-inet php-fpm[17116]: /rc.linkup: Hotplug event detected for WAN(wan) dynamic IP address (4: dhcp) Oct 11 20:35:54 nyc-fw1-inet php-fpm[20159]: /rc.linkup: Hotplug event detected for LAN(lan) dynamic IP address (4: 192.168.70.254, 6: track6)