System freezes after 20-30 days on the regular
-
Hoping someone can read the logs below and spot what is causing my system to become unresponsive and stop. I cannot access the GUI, and the only way out is a hard reset.
It happens somewhere between 20 and 30 days of uptime, without fail.
Maybe it's related to the ISP issuing a new IP address? It seems odd that it would try to reconnect that many times and that only a hard reset fixes it.
Install is 24.03-RELEASE on bare metal.
The log below (newest entries first) shows a normal state right up until I had to hard reset it.
Nov 3 16:50:15 kernel ---<<BOOT>>---
Nov 3 16:50:15 syslogd kernel boot file is /boot/kernel/kernel
Nov 3 16:48:59 syslogd exiting on signal 15
Nov 3 16:48:58 ppp 46676 [wan_link0] PPPoE: Connecting to 'TPG'
Nov 3 16:48:58 ppp 46676 [wan_link0] Link: reconnection attempt 14
Nov 3 16:48:58 login 88427 login on ttyv0 as root
Nov 3 16:48:57 ppp 46676 [wan_link0] Link: reconnection attempt 14 in 1 seconds
Nov 3 16:48:57 ppp 46676 [wan_link0] LCP: Down event
Nov 3 16:48:57 ppp 46676 [wan_link0] Link: DOWN event
Nov 3 16:48:57 ppp 46676 [wan_link0] PPPoE connection timeout after 9 seconds
Nov 3 16:48:56 login 80060 login on ttyv0 as root
Nov 3 16:48:51 login 59827 login on ttyv0 as root
Nov 3 16:48:49 login 26235 login on ttyv0 as root
Nov 3 16:48:48 ppp 46676 [wan_link0] PPPoE: Connecting to 'TPG'
Nov 3 16:48:48 ppp 46676 [wan_link0] Link: reconnection attempt 13
Nov 3 16:48:47 login 17066 login on ttyv0 as root
Nov 3 16:48:45 ppp 46676 [wan_link0] Link: reconnection attempt 13 in 3 seconds
Nov 3 16:48:45 ppp 46676 [wan_link0] LCP: Down event
Nov 3 16:48:45 ppp 46676 [wan_link0] Link: DOWN event
Nov 3 16:48:45 ppp 46676 [wan_link0] PPPoE connection timeout after 9 seconds
Nov 3 16:48:36 ppp 46676 [wan_link0] PPPoE: Connecting to 'TPG'
Nov 3 16:48:36 ppp 46676 [wan_link0] Link: reconnection attempt 12
Nov 3 16:48:34 ppp 46676 [wan_link0] Link: reconnection attempt 12 in 2 seconds
Nov 3 16:48:34 ppp 46676 [wan_link0] LCP: Down event
Nov 3 16:48:34 ppp 46676 [wan_link0] Link: DOWN event
Nov 3 16:48:34 ppp 46676 [wan_link0] PPPoE connection timeout after 9 seconds
Nov 3 16:48:25 ppp 46676 [wan_link0] PPPoE: Connecting to 'TPG'
Nov 3 16:48:25 ppp 46676 [wan_link0] Link: reconnection attempt 11
Nov 3 16:48:23 ppp 46676 [wan_link0] Link: reconnection attempt 11 in 2 seconds
Nov 3 16:48:23 ppp 46676 [wan_link0] LCP: Down event
Nov 3 16:48:23 ppp 46676 [wan_link0] Link: DOWN event
Nov 3 16:48:23 ppp 46676 [wan_link0] PPPoE connection timeout after 9 seconds
Nov 3 16:48:14 ppp 46676 [wan_link0] PPPoE: Connecting to 'TPG'
Nov 3 16:48:14 ppp 46676 [wan_link0] Link: reconnection attempt 10
Nov 3 16:48:10 ppp 46676 [wan_link0] Link: reconnection attempt 10 in 4 seconds
Nov 3 16:48:10 ppp 46676 [wan_link0] LCP: Down event
Nov 3 16:48:10 ppp 46676 [wan_link0] Link: DOWN event
Nov 3 16:48:10 ppp 46676 [wan_link0] PPPoE connection timeout after 9 seconds
Nov 3 16:48:03 nginx 2024/11/03 16:48:03 [error] 10950#101340: *2655425 upstream timed out (60: Operation timed out) while reading response header from upstream, client: 10.10.10.42, server: , request: "POST /xmlrpc.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "10.10.10.1:10443"
Nov 3 16:48:01 ppp 46676 [wan_link0] PPPoE: Connecting to 'TPG'
Nov 3 16:48:01 ppp 46676 [wan_link0] Link: reconnection attempt 9
Nov 3 16:47:59 ppp 46676 [wan_link0] Link: reconnection attempt 9 in 2 seconds
Nov 3 16:47:59 ppp 46676 [wan_link0] LCP: Down event
Nov 3 16:47:59 ppp 46676 [wan_link0] Link: DOWN event
Nov 3 16:47:59 ppp 46676 [wan_link0] PPPoE connection timeout after 9 seconds
Nov 3 16:47:50 ppp 46676 [wan_link0] PPPoE: Connecting to 'TPG'
Nov 3 16:47:50 ppp 46676 [wan_link0] Link: reconnection attempt 8
Nov 3 16:47:46 ppp 46676 [wan_link0] Link: reconnection attempt 8 in 4 seconds
Nov 3 16:47:46 ppp 46676 [wan_link0] LCP: Down event
Nov 3 16:47:46 ppp 46676 [wan_link0] Link: DOWN event
Nov 3 16:47:46 ppp 46676 [wan_link0] PPPoE connection timeout after 9 seconds
Nov 3 16:47:37 ppp 46676 [wan_link0] PPPoE: Connecting to 'TPG'
Nov 3 16:47:37 ppp 46676 [wan_link0] Link: reconnection attempt 7
Nov 3 16:47:33 ppp 46676 [wan_link0] Link: reconnection attempt 7 in 4 seconds
Nov 3 16:47:33 ppp 46676 [wan_link0] LCP: Down event
Nov 3 16:47:33 ppp 46676 [wan_link0] Link: DOWN event
Nov 3 16:47:33 ppp 46676 [wan_link0] PPPoE connection timeout after 9 seconds
Nov 3 16:47:24 ppp 46676 [wan_link0] PPPoE: Connecting to 'TPG'
Nov 3 16:47:24 ppp 46676 [wan_link0] Link: reconnection attempt 6
Nov 3 16:47:21 ppp 46676 [wan_link0] Link: reconnection attempt 6 in 3 seconds
Nov 3 16:47:21 ppp 46676 [wan_link0] LCP: Down event
Nov 3 16:47:21 ppp 46676 [wan_link0] Link: DOWN event
Nov 3 16:47:21 ppp 46676 [wan_link0] PPPoE connection timeout after 9 seconds
Nov 3 16:47:12 ppp 46676 [wan_link0] PPPoE: Connecting to 'TPG'
Nov 3 16:47:12 ppp 46676 [wan_link0] Link: reconnection attempt 5
Nov 3 16:47:10 ppp 46676 [wan_link0] Link: reconnection attempt 5 in 2 seconds
Nov 3 16:47:10 ppp 46676 [wan_link0] LCP: Down event
Nov 3 16:47:10 ppp 46676 [wan_link0] Link: DOWN event
Nov 3 16:47:10 ppp 46676 [wan_link0] PPPoE connection timeout after 9 seconds
Nov 3 16:47:01 ppp 46676 [wan_link0] PPPoE: Connecting to 'TPG'
Nov 3 16:47:01 ppp 46676 [wan_link0] Link: reconnection attempt 4
Nov 3 16:46:57 ppp 46676 [wan_link0] Link: reconnection attempt 4 in 4 seconds
Nov 3 16:46:57 ppp 46676 [wan_link0] LCP: Down event
Nov 3 16:46:57 ppp 46676 [wan_link0] Link: DOWN event
Nov 3 16:46:57 ppp 46676 [wan_link0] PPPoE connection timeout after 9 seconds
Nov 3 16:46:48 ppp 46676 [wan_link0] PPPoE: Connecting to 'TPG'
Nov 3 16:46:48 ppp 46676 [wan_link0] Link: reconnection attempt 3
Nov 3 16:46:44 ppp 46676 [wan_link0] Link: reconnection attempt 3 in 4 seconds
Nov 3 16:46:44 ppp 46676 [wan_link0] LCP: Down event
Nov 3 16:46:44 ppp 46676 [wan_link0] Link: DOWN event
Nov 3 16:46:44 ppp 46676 [wan_link0] PPPoE connection timeout after 9 seconds
Nov 3 16:46:35 ppp 46676 [wan_link0] PPPoE: Connecting to 'TPG'
Nov 3 16:46:35 ppp 46676 [wan_link0] Link: reconnection attempt 2
Nov 3 16:46:34 ppp 46676 [wan_link0] Link: reconnection attempt 2 in 1 seconds
Nov 3 16:46:34 ppp 46676 [wan_link0] LCP: Down event
Nov 3 16:46:34 ppp 46676 [wan_link0] Link: DOWN event
Nov 3 16:46:34 ppp 46676 [wan_link0] PPPoE connection timeout after 9 seconds
Nov 3 16:46:25 ppp 46676 [wan_link0] PPPoE: Connecting to 'TPG'
Nov 3 16:46:25 ppp 46676 [wan_link0] Link: reconnection attempt 1
Nov 3 16:46:24 ppp 46676 [wan_link0] Link: reconnection attempt 1 in 1 seconds
Nov 3 16:46:24 ppp 46676 [wan_link0] LCP: LayerStart
Nov 3 16:46:24 ppp 46676 [wan_link0] LCP: state change Stopped --> Starting
Nov 3 16:46:24 ppp 46676 [wan_link0] LCP: Down event
Nov 3 16:46:24 ppp 46676 [wan_link0] Link: DOWN event
Nov 3 16:46:24 ppp 46676 [wan_link0] PPPoE: connection closed
Nov 3 16:46:24 ppp 46676 [wan_link0] LCP: LayerFinish
Nov 3 16:46:24 ppp 46676 [wan_link0] LCP: state change Stopping --> Stopped
Nov 3 16:46:22 ppp 46676 [wan_link0] LCP: SendTerminateReq #3
Nov 3 16:46:12 ppp 46676 [wan_link0] LCP: LayerDown
Nov 3 16:46:12 ppp 46676 [wan_link0] LCP: SendTerminateReq #2
Nov 3 16:46:12 ppp 46676 [wan] Bundle: Last link has gone, no links for bw-manage defined
Nov 3 16:46:12 ppp 46676 [wan] IPV6CP: state change Closing --> Initial
Nov 3 16:46:12 ppp 46676 [wan] Bundle: No NCPs left. Closing links...
Nov 3 16:46:12 ppp 46676 [wan] IPV6CP: LayerFinish
Nov 3 16:46:12 ppp 46676 [wan] IPV6CP: Down event
Nov 3 16:46:12 ppp 46676 [wan] IPCP: state change Closing --> Initial
Nov 3 16:46:12 ppp 46676 [wan] IPCP: LayerFinish
Nov 3 16:46:12 ppp 46676 [wan] IPCP: Down event
Nov 3 16:46:12 ppp 46676 [wan] IFACE: Set description "WAN"
Nov 3 16:46:12 ppp 46676 [wan] IFACE: Rename interface pppoe0 to pppoe0
Nov 3 16:46:12 ppp 46676 [wan] IFACE: Down event
Nov 3 16:46:12 check_reload_status 634 Rewriting resolv.conf
Nov 3 16:46:11 php-cgi 9230 rc.kill_states: rc.kill_states: Removing states for interface pppoe0
Nov 3 16:46:11 php-cgi 9230 rc.kill_states: rc.kill_states: Removing states for IP fe80::####:####:####:8adc%pppoe0/32
Nov 3 16:46:11 ppp 46676 [wan] IPV6CP: LayerDown
Nov 3 16:46:11 ppp 46676 [wan] IPV6CP: SendTerminateReq #2
Nov 3 16:46:11 ppp 46676 [wan] IPV6CP: state change Opened --> Closing
Nov 3 16:46:11 ppp 46676 [wan] IPV6CP: Close event
Nov 3 16:46:11 ppp 46676 [wan] IFACE: Removing IPv4 address from pppoe0 failed(IGNORING for now. This should be only for PPPoE friendly!): Can't assign requested address
Nov 3 16:46:11 check_reload_status 634 Rewriting resolv.conf
Nov 3 16:46:10 php-cgi 6155 rc.kill_states: rc.kill_states: Removing states for interface pppoe0
Nov 3 16:46:04 php-cgi 6155 rc.kill_states: rc.kill_states: Removing states for IP 124.###.###.234/32
Nov 3 16:46:03 ppp 46676 [wan] IPCP: LayerDown
Nov 3 16:46:03 ppp 46676 [wan] IPCP: SendTerminateReq #4
Nov 3 16:46:03 ppp 46676 [wan] IPCP: state change Opened --> Closing
Nov 3 16:46:03 ppp 46676 [wan] IPCP: Close event
Nov 3 16:46:03 ppp 46676 [wan] Bundle: Status update: up 0 links, total bandwidth 9600 bps
Nov 3 16:46:03 ppp 46676 [wan_link0] Link: Leave bundle "wan"
Nov 3 16:46:03 ppp 46676 [wan_link0] LCP: state change Opened --> Stopping
Nov 3 16:46:03 ppp 46676 [wan_link0] LCP: peer not responding to echo requests
Nov 3 16:46:03 ppp 46676 [wan_link0] LCP: no reply to 5 echo request(s)
Nov 3 16:45:53 ppp 46676 [wan_link0] LCP: no reply to 4 echo request(s)
Nov 3 16:45:49 php-fpm 85169 /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed IP addresses. Reloading endpoints that may use WAN_PPPOE.
Nov 3 16:45:48 check_reload_status 634 Reloading filter
Nov 3 16:45:48 check_reload_status 634 Restarting OpenVPN tunnels/interfaces
Nov 3 16:45:48 check_reload_status 634 Restarting IPsec tunnels
Nov 3 16:45:48 check_reload_status 634 updating dyndns WAN_PPPOE
Nov 3 16:45:48 rc.gateway_alarm 20260 >>> Gateway alarm: WAN_PPPOE (Addr:203.##.#.71 Alarm:1 RTT:4.784ms RTTsd:0ms Loss:80%)
Nov 3 16:45:43 ppp 46676 [wan_link0] LCP: no reply to 3 echo request(s)
Nov 3 16:45:33 ppp 46676 [wan_link0] LCP: no reply to 2 echo request(s)
Nov 3 16:45:23 ppp 46676 [wan_link0] LCP: no reply to 1 echo request(s)
Nov 3 16:40:00 sshguard 53455 Now monitoring attacks.
Nov 3 16:40:00 sshguard 72671 Exiting on signal.
Nov 3 16:20:00 sshguard 72671 Now monitoring attacks.
Nov 3 16:20:00 sshguard 85625 Exiting on signal.
-
Mmm, it's clearly still logging at that point, so it must be doing something.
Is it unresponsive at the local console?
-
@stephenw10 Correct. A keyboard plugged directly into the unit does nothing at this point. Are there any other logs that might help with this?
-
Hmm, the console works when you reboot though I assume?
Is it still routing traffic in that state? Responds to ping?
Do the logs show anything at the point this first starts?
Check the Monitoring graphs in Status > Monitoring. Make sure it's not exhausting the states or RAM.
-
@stephenw10 - The routing essentially grinds to a halt. The console works for a very short time and then becomes unresponsive; even pings time out.
In terms of the logs, the first sign of an issue is:
Nov 3 16:45:23 ppp 46676 [wan_link0] LCP: no reply to 1 echo request(s)
Upon reboot everything goes back to normal. There is 16 GB of RAM in this machine and the GUI shows barely any resource usage prior to the stall.
I have also considered the current issue of PHP exhausting itself (15471), which is fixed in the next release, because when I have had the LCD display running in the past the freeze happened much sooner. The inability to reacquire an IP address also seems concerning: after a lot of reconnection attempts the system falls over. Is it possibly related?
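A rough way to pull those two details (the first missed LCP echo and how far the reconnection counter got) out of an exported system log is sketched below in Python. It assumes the newest-first line layout shown in the log above. The name `summarize` and the sample text are purely illustrative and not part of the firewall software.

```python
import re

# Syslog-style line: month, day, time, process, optional PID, message.
LINE_RE = re.compile(r"^(\w{3}) +(\d+) (\d\d:\d\d:\d\d) (\S+)(?: (\d+))? (.*)$")
ECHO_RE = re.compile(r"LCP: no reply to (\d+) echo request")
# Match only the "attempt N" lines, not the "attempt N in M seconds" notices.
RETRY_RE = re.compile(r"Link: reconnection attempt (\d+)$")


def summarize(log_text, newest_first=True):
    """Return (timestamp of first missed LCP echo, highest reconnection attempt)."""
    lines = [ln for ln in log_text.splitlines() if ln.strip()]
    if newest_first:
        lines.reverse()  # walk the log in chronological order
    first_miss = None
    attempts = 0
    for ln in lines:
        m = LINE_RE.match(ln)
        if not m:
            continue
        stamp = " ".join(m.group(1, 2, 3))
        msg = m.group(6)
        if first_miss is None and ECHO_RE.search(msg):
            first_miss = stamp
        retry = RETRY_RE.search(msg)
        if retry:
            attempts = max(attempts, int(retry.group(1)))
    return first_miss, attempts


# A few lines copied from the log in this thread (newest first).
SAMPLE = """\
Nov 3 16:48:58 ppp 46676 [wan_link0] Link: reconnection attempt 14
Nov 3 16:46:25 ppp 46676 [wan_link0] Link: reconnection attempt 1
Nov 3 16:45:33 ppp 46676 [wan_link0] LCP: no reply to 2 echo request(s)
Nov 3 16:45:23 ppp 46676 [wan_link0] LCP: no reply to 1 echo request(s)
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Running the same function over the logs from several freezes would show whether every stall starts with the same missed-echo sequence or whether that was a one-off.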
-
Hmm, well if it was that I'd expect to see some sort of error logged like:
PHP Fatal error: Allowed memory size of 536870912 bytes exhausted (tried to allocate 8192 bytes) in /usr/local/pkg/lcdproc_client.php on line 856
The actual line varies from run to run.
Also, even if PHP can no longer run, there would be some response at the console.
Do you know if it responds to ctl+t? The system will often respond to that at the console even when nothing else does, and it shows what process the system is waiting for.
-
@stephenw10 The only reason I mention the PHP issue is that I have received notifications at the top of the GUI screen in the past, which seemed new, and they stopped when I disabled the LCD.
For the ctl+t I will need to wait until it happens again to see what I can glean.
-
Did you check the monitoring graphs? That can often show some resource leak.
-
@stephenw10
States hover around 400 with filter states up to around 1100
MBUF has a max of 1M and hovers around 6000-7000
Memory hovers around 91% free
CPU sits around 300 processes
-
Nothing ramping up like a resource leak?
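If eyeballing the graphs is inconclusive, one hedged way to test for "ramping up" is to note a metric every few days and fit a straight line through the samples. The sketch below is plain Python with a hand-rolled least-squares slope. The function name `slope_per_day` and the sample numbers are made up for illustration and are not readings from this system.

```python
def slope_per_day(samples):
    """Least-squares slope for a list of (days_of_uptime, value) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must be taken at different times")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return sxy / sxx


# Hypothetical readings: a flat metric and one that climbs steadily.
FLAT = [(0, 6500), (5, 6500), (10, 6500)]
CLIMBING = [(0, 1000), (10, 2000), (20, 3000)]

if __name__ == "__main__":
    print(slope_per_day(FLAT))      # no growth per day
    print(slope_per_day(CLIMBING))  # steady growth per day
```

A slope near zero over a couple of weeks argues against a slow leak in that metric. A clearly positive slope would be worth comparing against the 20-30 day interval between freezes.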
-
@stephenw10 Just checked these again. No change in their numbers. Still all minimal.
-
Hmm, sure seems like a leak of some sort if it's happening that regularly.
What NICs is it using? You might check the sysctl stats for something exhausting there.
-
@stephenw10 Thought I would just give the whole output of the system devices. This is a Sophos SG450 Rev1 box. Lots of Intel devices.
igb0 is WAN - ix0 is LAN
hostb0@pci0:0:0:0: class=0x060000 rev=0x06 hdr=0x00 vendor=0x8086 device=0x0c08 subvendor=0x8086 subdevice=0x0c08
vendor = 'Intel Corporation'
device = 'Xeon E3-1200 v3 Processor DRAM Controller'
class = bridge
subclass = HOST-PCI
pcib1@pci0:0:1:0: class=0x060400 rev=0x06 hdr=0x01 vendor=0x8086 device=0x0c01 subvendor=0x8086 subdevice=0x0c01
vendor = 'Intel Corporation'
device = 'Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller'
class = bridge
subclass = PCI-PCI
pcib2@pci0:0:1:1: class=0x060400 rev=0x06 hdr=0x01 vendor=0x8086 device=0x0c05 subvendor=0x8086 subdevice=0x0c05
vendor = 'Intel Corporation'
device = 'Xeon E3-1200 v3/4th Gen Core Processor PCI Express x8 Controller'
class = bridge
subclass = PCI-PCI
pcib3@pci0:0:1:2: class=0x060400 rev=0x06 hdr=0x01 vendor=0x8086 device=0x0c09 subvendor=0x8086 subdevice=0x0c09
vendor = 'Intel Corporation'
device = 'Xeon E3-1200 v3/4th Gen Core Processor PCI Express x4 Controller'
class = bridge
subclass = PCI-PCI
vgapci0@pci0:0:2:0: class=0x030000 rev=0x06 hdr=0x00 vendor=0x8086 device=0x041a subvendor=0x8086 subdevice=0x041a
vendor = 'Intel Corporation'
device = 'Xeon E3-1200 v3 Processor Integrated Graphics Controller'
class = display
subclass = VGA
xhci0@pci0:0:20:0: class=0x0c0330 rev=0x05 hdr=0x00 vendor=0x8086 device=0x8c31 subvendor=0x8086 subdevice=0x8c31
vendor = 'Intel Corporation'
device = '8 Series/C220 Series Chipset Family USB xHCI'
class = serial bus
subclass = USB
ehci0@pci0:0:26:0: class=0x0c0320 rev=0x05 hdr=0x00 vendor=0x8086 device=0x8c2d subvendor=0x8086 subdevice=0x8c2d
vendor = 'Intel Corporation'
device = '8 Series/C220 Series Chipset Family USB EHCI'
class = serial bus
subclass = USB
pcib4@pci0:0:28:0: class=0x060400 rev=0xd5 hdr=0x01 vendor=0x8086 device=0x8c10 subvendor=0x8086 subdevice=0x8c10
vendor = 'Intel Corporation'
device = '8 Series/C220 Series Chipset Family PCI Express Root Port'
class = bridge
subclass = PCI-PCI
pcib9@pci0:0:28:4: class=0x060400 rev=0xd5 hdr=0x01 vendor=0x8086 device=0x8c18 subvendor=0x8086 subdevice=0x8c18
vendor = 'Intel Corporation'
device = '8 Series/C220 Series Chipset Family PCI Express Root Port'
class = bridge
subclass = PCI-PCI
pcib10@pci0:0:28:6: class=0x060400 rev=0xd5 hdr=0x01 vendor=0x8086 device=0x8c1c subvendor=0x8086 subdevice=0x8c1c
vendor = 'Intel Corporation'
device = '8 Series/C220 Series Chipset Family PCI Express Root Port'
class = bridge
subclass = PCI-PCI
ehci1@pci0:0:29:0: class=0x0c0320 rev=0x05 hdr=0x00 vendor=0x8086 device=0x8c26 subvendor=0x8086 subdevice=0x8c26
vendor = 'Intel Corporation'
device = '8 Series/C220 Series Chipset Family USB EHCI'
class = serial bus
subclass = USB
isab0@pci0:0:31:0: class=0x060100 rev=0x05 hdr=0x00 vendor=0x8086 device=0x8c56 subvendor=0x8086 subdevice=0x8c56
vendor = 'Intel Corporation'
device = 'C226 Series Chipset Family Server Advanced SKU LPC Controller'
class = bridge
subclass = PCI-ISA
ahci0@pci0:0:31:2: class=0x010601 rev=0x05 hdr=0x00 vendor=0x8086 device=0x8c02 subvendor=0x8086 subdevice=0x8c02
vendor = 'Intel Corporation'
device = '8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode]'
class = mass storage
subclass = SATA
ichsmb0@pci0:0:31:3: class=0x0c0500 rev=0x05 hdr=0x00 vendor=0x8086 device=0x8c22 subvendor=0x8086 subdevice=0x8c22
vendor = 'Intel Corporation'
device = '8 Series/C220 Series Chipset Family SMBus Controller'
class = serial bus
subclass = SMBus
ix0@pci0:1:0:0: class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x10fb subvendor=0xffff subdevice=0xffff
vendor = 'Intel Corporation'
device = '82599ES 10-Gigabit SFI/SFP+ Network Connection'
class = network
subclass = ethernet
ix1@pci0:1:0:1: class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x10fb subvendor=0xffff subdevice=0xffff
vendor = 'Intel Corporation'
device = '82599ES 10-Gigabit SFI/SFP+ Network Connection'
class = network
subclass = ethernet
pcib5@pci0:4:0:0: class=0x060400 rev=0xbb hdr=0x01 vendor=0x10b5 device=0x8616 subvendor=0x10b5 subdevice=0x8616
vendor = 'PLX Technology, Inc.'
device = 'PEX 8616 16-lane, 4-Port PCI Express Gen 2 (5.0 GT/s) Switch'
class = bridge
subclass = PCI-PCI
pcib6@pci0:5:4:0: class=0x060400 rev=0xbb hdr=0x01 vendor=0x10b5 device=0x8616 subvendor=0x10b5 subdevice=0x8616
vendor = 'PLX Technology, Inc.'
device = 'PEX 8616 16-lane, 4-Port PCI Express Gen 2 (5.0 GT/s) Switch'
class = bridge
subclass = PCI-PCI
pcib7@pci0:5:5:0: class=0x060400 rev=0xbb hdr=0x01 vendor=0x10b5 device=0x8616 subvendor=0x10b5 subdevice=0x8616
vendor = 'PLX Technology, Inc.'
device = 'PEX 8616 16-lane, 4-Port PCI Express Gen 2 (5.0 GT/s) Switch'
class = bridge
subclass = PCI-PCI
pcib8@pci0:5:6:0: class=0x060400 rev=0xbb hdr=0x01 vendor=0x10b5 device=0x8616 subvendor=0x10b5 subdevice=0x8616
vendor = 'PLX Technology, Inc.'
device = 'PEX 8616 16-lane, 4-Port PCI Express Gen 2 (5.0 GT/s) Switch'
class = bridge
subclass = PCI-PCI
igb0@pci0:7:0:0: class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x15bb subdevice=0x0008
vendor = 'Intel Corporation'
device = 'I350 Gigabit Network Connection'
class = network
subclass = ethernet
igb1@pci0:7:0:1: class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x15bb subdevice=0x0008
vendor = 'Intel Corporation'
device = 'I350 Gigabit Network Connection'
class = network
subclass = ethernet
igb2@pci0:7:0:2: class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x15bb subdevice=0x0008
vendor = 'Intel Corporation'
device = 'I350 Gigabit Network Connection'
class = network
subclass = ethernet
igb3@pci0:7:0:3: class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x15bb subdevice=0x0008
vendor = 'Intel Corporation'
device = 'I350 Gigabit Network Connection'
class = network
subclass = ethernet
igb4@pci0:8:0:0: class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x15bb subdevice=0x0008
vendor = 'Intel Corporation'
device = 'I350 Gigabit Network Connection'
class = network
subclass = ethernet
igb5@pci0:8:0:1: class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x15bb subdevice=0x0008
vendor = 'Intel Corporation'
device = 'I350 Gigabit Network Connection'
class = network
subclass = ethernet
igb6@pci0:8:0:2: class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x15bb subdevice=0x0008
vendor = 'Intel Corporation'
device = 'I350 Gigabit Network Connection'
class = network
subclass = ethernet
igb7@pci0:8:0:3: class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x15bb subdevice=0x0008
vendor = 'Intel Corporation'
device = 'I350 Gigabit Network Connection'
class = network
subclass = ethernet
igb8@pci0:9:0:0: class=0x020000 rev=0x03 hdr=0x00 vendor=0x8086 device=0x1539 subvendor=0x15bb subdevice=0x0000
vendor = 'Intel Corporation'
device = 'I211 Gigabit Network Connection'
class = network
subclass = ethernet
-
Hmm, well those should be fine.
Those do provide a lot of stats via the sysctls so there might be something there.
-
@stephenw10 you want the whole "-a" lot? Or is there a qualifier you would use that might be a slightly smaller list? Seems there is potentially sensitive info in that file. Can I send it to you direct?
-
Nope I'd start with the mac stats so like:
sysctl dev.ix.0.mac_stats
There may be something in the iflib values. Though to be honest nothing there should really stop it responding entirely. You might be able to see some value growing where it shouldn't. The reported errors perhaps.
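To spot a value growing where it shouldn't, one option is to save the `sysctl dev.ix.0.mac_stats` output at two different times and compare the snapshots. The Python sketch below does that for text already captured to strings. The helper names, the list of error-like name fragments, and the "after" values are assumptions for illustration only.

```python
# Fragments that suggest an error-type counter (an assumed, non-exhaustive list).
ERROR_HINTS = ("err", "fault", "missed", "no_buff", "discard", "coll")


def parse_sysctl(text):
    """Turn 'name: value' lines into a dict, keeping integer counters only."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters


def growing_error_counters(before_text, after_text):
    """Return error-like counters that increased between two snapshots."""
    before = parse_sysctl(before_text)
    after = parse_sysctl(after_text)
    return {
        name: after[name] - before[name]
        for name in after
        if name in before
        and after[name] > before[name]
        and any(hint in name.rsplit(".", 1)[-1] for hint in ERROR_HINTS)
    }


# Hypothetical snapshots: counter names are real sysctl leaves, values are invented.
BEFORE = """\
dev.ix.0.mac_stats.crc_errs: 0
dev.ix.0.mac_stats.local_faults: 19
dev.ix.0.mac_stats.good_pkts_rcvd: 47459112
"""
AFTER = """\
dev.ix.0.mac_stats.crc_errs: 0
dev.ix.0.mac_stats.local_faults: 23
dev.ix.0.mac_stats.good_pkts_rcvd: 47459200
"""

if __name__ == "__main__":
    print(growing_error_counters(BEFORE, AFTER))
```

Ordinary traffic counters always climb, so the name filter is there only to keep them out of the result. The fragment list would likely need adjusting per driver.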
-
@stephenw10 Herewith the stats. Anything stand out?
sysctl dev.igb.0.mac_stats - WAN
dev.igb.0.mac_stats.tso_ctx_fail: 0
dev.igb.0.mac_stats.tso_txd: 0
dev.igb.0.mac_stats.tx_frames_1024_1522: 1213197
dev.igb.0.mac_stats.tx_frames_512_1023: 534722
dev.igb.0.mac_stats.tx_frames_256_511: 1123590
dev.igb.0.mac_stats.tx_frames_128_255: 4832979
dev.igb.0.mac_stats.tx_frames_65_127: 32163642
dev.igb.0.mac_stats.tx_frames_64: 55650
dev.igb.0.mac_stats.mcast_pkts_txd: 3
dev.igb.0.mac_stats.bcast_pkts_txd: 1
dev.igb.0.mac_stats.good_pkts_txd: 39923780
dev.igb.0.mac_stats.total_pkts_txd: 39923780
dev.igb.0.mac_stats.good_octets_txd: 6044072988
dev.igb.0.mac_stats.good_octets_recvd: 125791724095
dev.igb.0.mac_stats.rx_frames_1024_1522: 84038222
dev.igb.0.mac_stats.rx_frames_512_1023: 569131
dev.igb.0.mac_stats.rx_frames_256_511: 711777
dev.igb.0.mac_stats.rx_frames_128_255: 3067212
dev.igb.0.mac_stats.rx_frames_65_127: 6108020
dev.igb.0.mac_stats.rx_frames_64: 35985
dev.igb.0.mac_stats.mcast_pkts_recvd: 0
dev.igb.0.mac_stats.bcast_pkts_recvd: 0
dev.igb.0.mac_stats.good_pkts_recvd: 94530347
dev.igb.0.mac_stats.total_pkts_recvd: 94530347
dev.igb.0.mac_stats.xoff_txd: 0
dev.igb.0.mac_stats.xoff_recvd: 0
dev.igb.0.mac_stats.xon_txd: 0
dev.igb.0.mac_stats.xon_recvd: 0
dev.igb.0.mac_stats.coll_ext_errs: 0
dev.igb.0.mac_stats.alignment_errs: 0
dev.igb.0.mac_stats.crc_errs: 0
dev.igb.0.mac_stats.recv_errs: 0
dev.igb.0.mac_stats.recv_jabber: 0
dev.igb.0.mac_stats.recv_oversize: 0
dev.igb.0.mac_stats.recv_fragmented: 0
dev.igb.0.mac_stats.recv_undersize: 0
dev.igb.0.mac_stats.recv_no_buff: 0
dev.igb.0.mac_stats.missed_packets: 0
dev.igb.0.mac_stats.defer_count: 0
dev.igb.0.mac_stats.sequence_errors: 0
dev.igb.0.mac_stats.symbol_errors: 0
dev.igb.0.mac_stats.collision_count: 0
dev.igb.0.mac_stats.late_coll: 0
dev.igb.0.mac_stats.multiple_coll: 0
dev.igb.0.mac_stats.single_coll: 0
dev.igb.0.mac_stats.excess_coll: 0
sysctl dev.ix.0.mac_stats - LAN
dev.ix.0.mac_stats.tx_frames_1024_1522: 84980080
dev.ix.0.mac_stats.tx_frames_512_1023: 1874500
dev.ix.0.mac_stats.tx_frames_256_511: 3155469
dev.ix.0.mac_stats.tx_frames_128_255: 4252724
dev.ix.0.mac_stats.tx_frames_65_127: 7137625
dev.ix.0.mac_stats.tx_frames_64: 3687813
dev.ix.0.mac_stats.management_pkts_txd: 0
dev.ix.0.mac_stats.mcast_pkts_txd: 4503729
dev.ix.0.mac_stats.bcast_pkts_txd: 3311
dev.ix.0.mac_stats.good_pkts_txd: 105088211
dev.ix.0.mac_stats.total_pkts_txd: 105088211
dev.ix.0.mac_stats.good_octets_txd: 128346978728
dev.ix.0.mac_stats.checksum_errs: 3059
dev.ix.0.mac_stats.management_pkts_drpd: 0
dev.ix.0.mac_stats.management_pkts_rcvd: 0
dev.ix.0.mac_stats.recv_jabberd: 0
dev.ix.0.mac_stats.recv_oversized: 0
dev.ix.0.mac_stats.recv_fragmented: 0
dev.ix.0.mac_stats.recv_undersized: 0
dev.ix.0.mac_stats.rx_frames_1024_1522: 1609965
dev.ix.0.mac_stats.rx_frames_512_1023: 1469556
dev.ix.0.mac_stats.rx_frames_256_511: 2149914
dev.ix.0.mac_stats.rx_frames_128_255: 5942215
dev.ix.0.mac_stats.rx_frames_65_127: 33162197
dev.ix.0.mac_stats.rx_frames_64: 3125265
dev.ix.0.mac_stats.bcast_pkts_rcvd: 276058
dev.ix.0.mac_stats.mcast_pkts_rcvd: 2059299
dev.ix.0.mac_stats.good_pkts_rcvd: 47459112
dev.ix.0.mac_stats.total_pkts_rcvd: 47891475
dev.ix.0.mac_stats.good_octets_rcvd: 7710095457
dev.ix.0.mac_stats.total_octets_rcvd: 7750170180
dev.ix.0.mac_stats.xoff_recvd: 0
dev.ix.0.mac_stats.xoff_txd: 0
dev.ix.0.mac_stats.xon_recvd: 0
dev.ix.0.mac_stats.xon_txd: 0
dev.ix.0.mac_stats.rx_missed_packets: 0
dev.ix.0.mac_stats.rec_len_errs: 0
dev.ix.0.mac_stats.remote_faults: 0
dev.ix.0.mac_stats.local_faults: 19
dev.ix.0.mac_stats.short_discards: 0
dev.ix.0.mac_stats.byte_errs: 0
dev.ix.0.mac_stats.ill_errs: 0
dev.ix.0.mac_stats.crc_errs: 0
dev.ix.0.mac_stats.rx_errs: 0
-
Nope everything there looks good.
Though if you were to see something there, it would probably be just before it stops responding. Even then, NIC/connection issues shouldn't really stop a system responding at the console.
Might just have to wait until it happens again and try ctl+t. If it really doesn't respond even to that, it starts to look like a hardware issue.
-
@stephenw10 cheers. We shall wait!