System freezes after 20-30 days on the regular
-
Hmm, the console works when you reboot though I assume?
Is it still routing traffic in that state? Responds to ping?
Do the logs show anything at the point this first starts?
Check the Monitoring graphs in Status > Monitoring. Make sure it's not exhausting the states or RAM.
-
@stephenw10 - The routing essentially grinds to a halt. Accessing the console works for a very short time then becomes unresponsive - even pings time out.
In terms of the logs, the first sign of an issue is:
Nov 3 16:45:23 ppp 46676 [wan_link0] LCP: no reply to 1 echo request(s)
Upon reboot everything goes back to normal. There is 16GB ram in this machine and the GUI shows barely any usage of any resources prior to the stall.
I have also considered the current issue of PHP exhausting its memory (15471), which is fixed in the next release, because when I have had the LCD display running in the past the stall happens much quicker. The inability to reacquire an IP address seems concerning, and after a lot of requests it falls over. Is it possibly related?
-
Hmm, well if it was that I'd expect to see some sort of error logged like:
PHP Fatal error: Allowed memory size of 536870912 bytes exhausted (tried to allocate 8192 bytes) in /usr/local/pkg/lcdproc_client.php on line 856
The actual line varies from run to run.
Also even if php can no longer run there would be some response at the console.
Do you know if it responds to ctl+t? The system will often respond to that at the console even when nothing else does, and it shows what process the system is waiting for.
-
@stephenw10 The only reason I mention the PHP issue is that I have received notifications at the top of the GUI screen in the past - which seemed new - and they stopped when I disabled the LCD.
For the ctl+t I will need to wait until it happens again to see what I can glean.
-
Did you check the monitoring graphs? That can often show some resource leak.
-
@stephenw10
States hovers around 400 with filter states up to around 1100
MBUF has a max of 1M and hovers around 6000-7000
Memory hovers around 91% free
CPU sits around 300 processes
-
Nothing ramping up like a resource leak?
-
@stephenw10 Just checked these again. No change in their numbers. Still all minimal.
-
Hmm, sure seems like a leak of some sort if it's happening that regularly.
What NICs is it using? You might check the sysctl stats for something exhausting there.
-
@stephenw10 Thought I would just post the whole output of the system devices. This is a Sophos SG450 Rev1 box. Lots of Intel devices.
igb0 is WAN - ix0 is LAN
hostb0@pci0:0:0:0: class=0x060000 rev=0x06 hdr=0x00 vendor=0x8086 device=0x0c08 subvendor=0x8086 subdevice=0x0c08
vendor = 'Intel Corporation'
device = 'Xeon E3-1200 v3 Processor DRAM Controller'
class = bridge
subclass = HOST-PCI
pcib1@pci0:0:1:0: class=0x060400 rev=0x06 hdr=0x01 vendor=0x8086 device=0x0c01 subvendor=0x8086 subdevice=0x0c01
vendor = 'Intel Corporation'
device = 'Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller'
class = bridge
subclass = PCI-PCI
pcib2@pci0:0:1:1: class=0x060400 rev=0x06 hdr=0x01 vendor=0x8086 device=0x0c05 subvendor=0x8086 subdevice=0x0c05
vendor = 'Intel Corporation'
device = 'Xeon E3-1200 v3/4th Gen Core Processor PCI Express x8 Controller'
class = bridge
subclass = PCI-PCI
pcib3@pci0:0:1:2: class=0x060400 rev=0x06 hdr=0x01 vendor=0x8086 device=0x0c09 subvendor=0x8086 subdevice=0x0c09
vendor = 'Intel Corporation'
device = 'Xeon E3-1200 v3/4th Gen Core Processor PCI Express x4 Controller'
class = bridge
subclass = PCI-PCI
vgapci0@pci0:0:2:0: class=0x030000 rev=0x06 hdr=0x00 vendor=0x8086 device=0x041a subvendor=0x8086 subdevice=0x041a
vendor = 'Intel Corporation'
device = 'Xeon E3-1200 v3 Processor Integrated Graphics Controller'
class = display
subclass = VGA
xhci0@pci0:0:20:0: class=0x0c0330 rev=0x05 hdr=0x00 vendor=0x8086 device=0x8c31 subvendor=0x8086 subdevice=0x8c31
vendor = 'Intel Corporation'
device = '8 Series/C220 Series Chipset Family USB xHCI'
class = serial bus
subclass = USB
ehci0@pci0:0:26:0: class=0x0c0320 rev=0x05 hdr=0x00 vendor=0x8086 device=0x8c2d subvendor=0x8086 subdevice=0x8c2d
vendor = 'Intel Corporation'
device = '8 Series/C220 Series Chipset Family USB EHCI'
class = serial bus
subclass = USB
pcib4@pci0:0:28:0: class=0x060400 rev=0xd5 hdr=0x01 vendor=0x8086 device=0x8c10 subvendor=0x8086 subdevice=0x8c10
vendor = 'Intel Corporation'
device = '8 Series/C220 Series Chipset Family PCI Express Root Port'
class = bridge
subclass = PCI-PCI
pcib9@pci0:0:28:4: class=0x060400 rev=0xd5 hdr=0x01 vendor=0x8086 device=0x8c18 subvendor=0x8086 subdevice=0x8c18
vendor = 'Intel Corporation'
device = '8 Series/C220 Series Chipset Family PCI Express Root Port'
class = bridge
subclass = PCI-PCI
pcib10@pci0:0:28:6: class=0x060400 rev=0xd5 hdr=0x01 vendor=0x8086 device=0x8c1c subvendor=0x8086 subdevice=0x8c1c
vendor = 'Intel Corporation'
device = '8 Series/C220 Series Chipset Family PCI Express Root Port'
class = bridge
subclass = PCI-PCI
ehci1@pci0:0:29:0: class=0x0c0320 rev=0x05 hdr=0x00 vendor=0x8086 device=0x8c26 subvendor=0x8086 subdevice=0x8c26
vendor = 'Intel Corporation'
device = '8 Series/C220 Series Chipset Family USB EHCI'
class = serial bus
subclass = USB
isab0@pci0:0:31:0: class=0x060100 rev=0x05 hdr=0x00 vendor=0x8086 device=0x8c56 subvendor=0x8086 subdevice=0x8c56
vendor = 'Intel Corporation'
device = 'C226 Series Chipset Family Server Advanced SKU LPC Controller'
class = bridge
subclass = PCI-ISA
ahci0@pci0:0:31:2: class=0x010601 rev=0x05 hdr=0x00 vendor=0x8086 device=0x8c02 subvendor=0x8086 subdevice=0x8c02
vendor = 'Intel Corporation'
device = '8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode]'
class = mass storage
subclass = SATA
ichsmb0@pci0:0:31:3: class=0x0c0500 rev=0x05 hdr=0x00 vendor=0x8086 device=0x8c22 subvendor=0x8086 subdevice=0x8c22
vendor = 'Intel Corporation'
device = '8 Series/C220 Series Chipset Family SMBus Controller'
class = serial bus
subclass = SMBus
ix0@pci0:1:0:0: class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x10fb subvendor=0xffff subdevice=0xffff
vendor = 'Intel Corporation'
device = '82599ES 10-Gigabit SFI/SFP+ Network Connection'
class = network
subclass = ethernet
ix1@pci0:1:0:1: class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x10fb subvendor=0xffff subdevice=0xffff
vendor = 'Intel Corporation'
device = '82599ES 10-Gigabit SFI/SFP+ Network Connection'
class = network
subclass = ethernet
pcib5@pci0:4:0:0: class=0x060400 rev=0xbb hdr=0x01 vendor=0x10b5 device=0x8616 subvendor=0x10b5 subdevice=0x8616
vendor = 'PLX Technology, Inc.'
device = 'PEX 8616 16-lane, 4-Port PCI Express Gen 2 (5.0 GT/s) Switch'
class = bridge
subclass = PCI-PCI
pcib6@pci0:5:4:0: class=0x060400 rev=0xbb hdr=0x01 vendor=0x10b5 device=0x8616 subvendor=0x10b5 subdevice=0x8616
vendor = 'PLX Technology, Inc.'
device = 'PEX 8616 16-lane, 4-Port PCI Express Gen 2 (5.0 GT/s) Switch'
class = bridge
subclass = PCI-PCI
pcib7@pci0:5:5:0: class=0x060400 rev=0xbb hdr=0x01 vendor=0x10b5 device=0x8616 subvendor=0x10b5 subdevice=0x8616
vendor = 'PLX Technology, Inc.'
device = 'PEX 8616 16-lane, 4-Port PCI Express Gen 2 (5.0 GT/s) Switch'
class = bridge
subclass = PCI-PCI
pcib8@pci0:5:6:0: class=0x060400 rev=0xbb hdr=0x01 vendor=0x10b5 device=0x8616 subvendor=0x10b5 subdevice=0x8616
vendor = 'PLX Technology, Inc.'
device = 'PEX 8616 16-lane, 4-Port PCI Express Gen 2 (5.0 GT/s) Switch'
class = bridge
subclass = PCI-PCI
igb0@pci0:7:0:0: class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x15bb subdevice=0x0008
vendor = 'Intel Corporation'
device = 'I350 Gigabit Network Connection'
class = network
subclass = ethernet
igb1@pci0:7:0:1: class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x15bb subdevice=0x0008
vendor = 'Intel Corporation'
device = 'I350 Gigabit Network Connection'
class = network
subclass = ethernet
igb2@pci0:7:0:2: class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x15bb subdevice=0x0008
vendor = 'Intel Corporation'
device = 'I350 Gigabit Network Connection'
class = network
subclass = ethernet
igb3@pci0:7:0:3: class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x15bb subdevice=0x0008
vendor = 'Intel Corporation'
device = 'I350 Gigabit Network Connection'
class = network
subclass = ethernet
igb4@pci0:8:0:0: class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x15bb subdevice=0x0008
vendor = 'Intel Corporation'
device = 'I350 Gigabit Network Connection'
class = network
subclass = ethernet
igb5@pci0:8:0:1: class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x15bb subdevice=0x0008
vendor = 'Intel Corporation'
device = 'I350 Gigabit Network Connection'
class = network
subclass = ethernet
igb6@pci0:8:0:2: class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x15bb subdevice=0x0008
vendor = 'Intel Corporation'
device = 'I350 Gigabit Network Connection'
class = network
subclass = ethernet
igb7@pci0:8:0:3: class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x15bb subdevice=0x0008
vendor = 'Intel Corporation'
device = 'I350 Gigabit Network Connection'
class = network
subclass = ethernet
igb8@pci0:9:0:0: class=0x020000 rev=0x03 hdr=0x00 vendor=0x8086 device=0x1539 subvendor=0x15bb subdevice=0x0000
vendor = 'Intel Corporation'
device = 'I211 Gigabit Network Connection'
class = network
subclass = ethernet
-
Hmm, well those should be fine.
Those do provide a lot of stats via the sysctls so there might be something there.
-
@stephenw10 you want the whole "-a" lot? Or is there a qualifier you would use that might be a slightly smaller list? Seems there is potentially sensitive info in that file. Can I send it to you direct?
-
Nope I'd start with the mac stats so like:
sysctl dev.ix.0.mac_stats
There may be something in the iflib values. Though to be honest nothing there should really stop it responding entirely. You might be able to see some value growing where it shouldn't. The reported errors perhaps.
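To pick out just the error-type counters, something like the rough sketch below might help. The `nonzero_errors` name and the list of name patterns are assumptions for illustration, not anything the driver defines, so adjust them as needed. It reads sysctl output on stdin and prints only the matching counters that are not zero.

```shell
#!/bin/sh
# Hypothetical helper: print only the error-type counters that are non-zero.
# Reads "name: value" lines, as printed by sysctl, on stdin.
# The pattern list below is a guess at which counter names indicate trouble.
nonzero_errors() {
    awk -F': ' '
        $1 ~ /(err|fault|missed|discard|drpd|no_buff|coll|jabber|fail)/ && $2 + 0 != 0 {
            print $1 ": " $2
        }
    '
}

# Usage on the firewall (interface names as used in this thread):
#   sysctl dev.igb.0.mac_stats | nonzero_errors
#   sysctl dev.ix.0.mac_stats  | nonzero_errors
```

That should give a much shorter list to compare from one run to the next.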
-
@stephenw10 Herewith the stats. Anything stand out?
sysctl dev.igb.0.mac_stats - WAN
dev.igb.0.mac_stats.tso_ctx_fail: 0
dev.igb.0.mac_stats.tso_txd: 0
dev.igb.0.mac_stats.tx_frames_1024_1522: 1213197
dev.igb.0.mac_stats.tx_frames_512_1023: 534722
dev.igb.0.mac_stats.tx_frames_256_511: 1123590
dev.igb.0.mac_stats.tx_frames_128_255: 4832979
dev.igb.0.mac_stats.tx_frames_65_127: 32163642
dev.igb.0.mac_stats.tx_frames_64: 55650
dev.igb.0.mac_stats.mcast_pkts_txd: 3
dev.igb.0.mac_stats.bcast_pkts_txd: 1
dev.igb.0.mac_stats.good_pkts_txd: 39923780
dev.igb.0.mac_stats.total_pkts_txd: 39923780
dev.igb.0.mac_stats.good_octets_txd: 6044072988
dev.igb.0.mac_stats.good_octets_recvd: 125791724095
dev.igb.0.mac_stats.rx_frames_1024_1522: 84038222
dev.igb.0.mac_stats.rx_frames_512_1023: 569131
dev.igb.0.mac_stats.rx_frames_256_511: 711777
dev.igb.0.mac_stats.rx_frames_128_255: 3067212
dev.igb.0.mac_stats.rx_frames_65_127: 6108020
dev.igb.0.mac_stats.rx_frames_64: 35985
dev.igb.0.mac_stats.mcast_pkts_recvd: 0
dev.igb.0.mac_stats.bcast_pkts_recvd: 0
dev.igb.0.mac_stats.good_pkts_recvd: 94530347
dev.igb.0.mac_stats.total_pkts_recvd: 94530347
dev.igb.0.mac_stats.xoff_txd: 0
dev.igb.0.mac_stats.xoff_recvd: 0
dev.igb.0.mac_stats.xon_txd: 0
dev.igb.0.mac_stats.xon_recvd: 0
dev.igb.0.mac_stats.coll_ext_errs: 0
dev.igb.0.mac_stats.alignment_errs: 0
dev.igb.0.mac_stats.crc_errs: 0
dev.igb.0.mac_stats.recv_errs: 0
dev.igb.0.mac_stats.recv_jabber: 0
dev.igb.0.mac_stats.recv_oversize: 0
dev.igb.0.mac_stats.recv_fragmented: 0
dev.igb.0.mac_stats.recv_undersize: 0
dev.igb.0.mac_stats.recv_no_buff: 0
dev.igb.0.mac_stats.missed_packets: 0
dev.igb.0.mac_stats.defer_count: 0
dev.igb.0.mac_stats.sequence_errors: 0
dev.igb.0.mac_stats.symbol_errors: 0
dev.igb.0.mac_stats.collision_count: 0
dev.igb.0.mac_stats.late_coll: 0
dev.igb.0.mac_stats.multiple_coll: 0
dev.igb.0.mac_stats.single_coll: 0
dev.igb.0.mac_stats.excess_coll: 0
sysctl dev.ix.0.mac_stats - LAN
dev.ix.0.mac_stats.tx_frames_1024_1522: 84980080
dev.ix.0.mac_stats.tx_frames_512_1023: 1874500
dev.ix.0.mac_stats.tx_frames_256_511: 3155469
dev.ix.0.mac_stats.tx_frames_128_255: 4252724
dev.ix.0.mac_stats.tx_frames_65_127: 7137625
dev.ix.0.mac_stats.tx_frames_64: 3687813
dev.ix.0.mac_stats.management_pkts_txd: 0
dev.ix.0.mac_stats.mcast_pkts_txd: 4503729
dev.ix.0.mac_stats.bcast_pkts_txd: 3311
dev.ix.0.mac_stats.good_pkts_txd: 105088211
dev.ix.0.mac_stats.total_pkts_txd: 105088211
dev.ix.0.mac_stats.good_octets_txd: 128346978728
dev.ix.0.mac_stats.checksum_errs: 3059
dev.ix.0.mac_stats.management_pkts_drpd: 0
dev.ix.0.mac_stats.management_pkts_rcvd: 0
dev.ix.0.mac_stats.recv_jabberd: 0
dev.ix.0.mac_stats.recv_oversized: 0
dev.ix.0.mac_stats.recv_fragmented: 0
dev.ix.0.mac_stats.recv_undersized: 0
dev.ix.0.mac_stats.rx_frames_1024_1522: 1609965
dev.ix.0.mac_stats.rx_frames_512_1023: 1469556
dev.ix.0.mac_stats.rx_frames_256_511: 2149914
dev.ix.0.mac_stats.rx_frames_128_255: 5942215
dev.ix.0.mac_stats.rx_frames_65_127: 33162197
dev.ix.0.mac_stats.rx_frames_64: 3125265
dev.ix.0.mac_stats.bcast_pkts_rcvd: 276058
dev.ix.0.mac_stats.mcast_pkts_rcvd: 2059299
dev.ix.0.mac_stats.good_pkts_rcvd: 47459112
dev.ix.0.mac_stats.total_pkts_rcvd: 47891475
dev.ix.0.mac_stats.good_octets_rcvd: 7710095457
dev.ix.0.mac_stats.total_octets_rcvd: 7750170180
dev.ix.0.mac_stats.xoff_recvd: 0
dev.ix.0.mac_stats.xoff_txd: 0
dev.ix.0.mac_stats.xon_recvd: 0
dev.ix.0.mac_stats.xon_txd: 0
dev.ix.0.mac_stats.rx_missed_packets: 0
dev.ix.0.mac_stats.rec_len_errs: 0
dev.ix.0.mac_stats.remote_faults: 0
dev.ix.0.mac_stats.local_faults: 19
dev.ix.0.mac_stats.short_discards: 0
dev.ix.0.mac_stats.byte_errs: 0
dev.ix.0.mac_stats.ill_errs: 0
dev.ix.0.mac_stats.crc_errs: 0
dev.ix.0.mac_stats.rx_errs: 0
-
Nope everything there looks good.
Though if you were to see something there, it would probably be just before it stops responding. Even then, NIC/connection issues shouldn't really stop a system responding at the console.
Might just have to wait until it happens again and try ctl+t. If it really doesn't respond even to that, it starts to look like a hardware issue.
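While waiting, it may be worth saving the mac stats to a file every so often, so the last values before a stall survive the reboot. Below is a minimal sketch for comparing two such snapshots. The `grown_counters` name and the file paths in the usage comment are hypothetical. It prints only the counters whose value increased between the first file and the second.

```shell
#!/bin/sh
# Hypothetical helper: show counters that grew between two saved snapshots.
# Each file holds "name: value" lines as printed by sysctl.
grown_counters() {
    awk -F': ' '
        NR == FNR { old[$1] = $2; next }        # first file: remember old values
        ($1 in old) && $2 + 0 > old[$1] + 0 {   # second file: report any growth
            printf "%s: %s -> %s (+%.0f)\n", $1, old[$1], $2, $2 - old[$1]
        }
    ' "$1" "$2"
}

# Usage (paths are examples only):
#   sysctl dev.ix.0.mac_stats > /root/ix0.before
#   ... some hours or days later ...
#   sysctl dev.ix.0.mac_stats > /root/ix0.after
#   grown_counters /root/ix0.before /root/ix0.after
```

The packet and octet counters will always grow on a busy link, so the lines of interest are the error-type ones.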
-
@stephenw10 cheers. We shall wait!