Core Dumped - less than 12h after upgrading to 2.4.5-RELEASE-p1


  • Upgraded my pfSense install yesterday. It was running 2.4.1 and hadn't been rebooted in a LONG time. Less than 12h after the upgrade, the box cratered in ROYAL style. Services stopped working one by one, until I couldn't even SSH into it any more. I was able to reboot it from the web GUI after trying repeatedly.

    The admin home directory is full of core files that I'm sure would be really useful; I need to figure out how to get them into the right hands.

    -rw-------  1 root  wheel   557056 Aug 17 18:45 bsnmpd.core
    -rw-------  1 root  wheel   512000 Aug 17 20:22 dc.core
    -rw-------  1 root  wheel   905216 Aug 17 20:22 dhcpd.core
    -rw-------  1 root  wheel   520192 Aug 17 20:06 gnid.core
    -rw-------  1 root  wheel   978944 Aug 17 18:45 lldpd.core
    -rw-------  1 root  wheel   643072 Aug 17 20:22 ntpq.core
    -rw-------  1 root  wheel   614400 Aug 17 18:45 openssl.core
    -rw-------  1 root  wheel   622592 Aug 17 18:45 openvpn.core
    -rw-------  1 root  wheel  8028160 Aug 17 20:22 php-cgi.core
    -rw-------  1 root  wheel  8044544 Aug 17 20:22 php.core
    -rw-------  1 root  wheel   790528 Aug 17 20:21 sshd.core
    -rw-------  1 root  wheel   819200 Aug 17 18:45 zabbix_agentd.core
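
    A minimal sketch, assuming Python 3 on a workstation (not on the firewall itself), of how the listing above could be grouped by timestamp to see whether the crashes cluster. `group_by_mtime` is a hypothetical helper, and it assumes file names contain no spaces. If the cores use the usual `<program>.core` naming, a later crash of the same program would overwrite the earlier file, so each timestamp may only reflect the most recent crash.

```python
from collections import defaultdict

# The `ls -l` lines quoted in the post above.
LISTING = """\
-rw-------  1 root  wheel   557056 Aug 17 18:45 bsnmpd.core
-rw-------  1 root  wheel   512000 Aug 17 20:22 dc.core
-rw-------  1 root  wheel   905216 Aug 17 20:22 dhcpd.core
-rw-------  1 root  wheel   520192 Aug 17 20:06 gnid.core
-rw-------  1 root  wheel   978944 Aug 17 18:45 lldpd.core
-rw-------  1 root  wheel   643072 Aug 17 20:22 ntpq.core
-rw-------  1 root  wheel   614400 Aug 17 18:45 openssl.core
-rw-------  1 root  wheel   622592 Aug 17 18:45 openvpn.core
-rw-------  1 root  wheel  8028160 Aug 17 20:22 php-cgi.core
-rw-------  1 root  wheel  8044544 Aug 17 20:22 php.core
-rw-------  1 root  wheel   790528 Aug 17 20:21 sshd.core
-rw-------  1 root  wheel   819200 Aug 17 18:45 zabbix_agentd.core
"""


def group_by_mtime(listing):
    """Map 'Mon DD HH:MM' to the core file names carrying that timestamp."""
    groups = defaultdict(list)
    for line in listing.splitlines():
        fields = line.split()
        # Expect: mode, links, owner, group, size, month, day, time, name
        if len(fields) != 9 or not fields[8].endswith(".core"):
            continue
        stamp = " ".join(fields[5:8])
        groups[stamp].append(fields[8])
    return dict(groups)


for stamp, names in sorted(group_by_mtime(LISTING).items()):
    print(f"{stamp}: {len(names)} file(s): {', '.join(names)}")
```

    Several unrelated programs sharing the same minute would suggest a system-wide event rather than a bug in any one of them.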
    

    Looking back in the logs, the oldest entries I find are:

    Aug 17 17:18:04	kernel		pid 90504 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
    Aug 17 17:18:04	kernel		pid 86262 (ntpq), jid 0, uid 0: exited on signal 11 (core dumped)
    Aug 17 17:17:01	kernel		pid 12933 (dc), jid 0, uid 0: exited on signal 11 (core dumped)
    Aug 17 17:17:01	kernel		pid 11592 (dc), jid 0, uid 0: exited on signal 11 (core dumped)
    Aug 17 17:17:01	kernel		pid 8050 (dc), jid 0, uid 0: exited on signal 11 (core dumped)
    Aug 17 17:17:01	kernel		pid 7236 (dc), jid 0, uid 0: exited on signal 11 (core dumped)
    Aug 17 17:17:01	kernel		pid 3145 (dc), jid 0, uid 0: exited on signal 11 (core dumped)
    Aug 17 17:17:01	kernel		pid 1704 (dc), jid 0, uid 0: exited on signal 11 (core dumped)
    Aug 17 17:17:01	kernel		pid 98109 (dc), jid 0, uid 0: exited on signal 11 (core dumped)
    Aug 17 17:17:01	kernel		pid 97146 (dc), jid 0, uid 0: exited on signal 11 (core dumped)
    

    I could probably attach the core files to the post. Is anyone able to take a look at them and figure out what crashed? This doesn't make me feel good about 2.4.5-p1 at the moment.
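
    A minimal sketch, assuming Python 3 on a workstation and a copy of the system log saved as text, of how kernel lines like the ones above could be tallied per program and signal. The regular expression is written against the line format quoted in this post only; `tally_crashes` and `SAMPLE` are hypothetical names, and the sample is a four-line excerpt of the log shown above.

```python
import re
from collections import Counter

# Matches e.g. "pid 90504 (php-cgi), jid 0, uid 0: exited on signal 11"
CRASH_RE = re.compile(
    r"pid (\d+) \(([^)]+)\), jid \d+, uid \d+: exited on signal (\d+)"
)


def tally_crashes(log_text):
    """Count crash lines per (program name, signal number)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CRASH_RE.search(line)
        if match:
            counts[(match.group(2), int(match.group(3)))] += 1
    return counts


# Excerpt of the log lines quoted in the post.
SAMPLE = """\
Aug 17 17:18:04 kernel pid 90504 (php-cgi), jid 0, uid 0: exited on signal 11 (core dumped)
Aug 17 17:18:04 kernel pid 86262 (ntpq), jid 0, uid 0: exited on signal 11 (core dumped)
Aug 17 17:17:01 kernel pid 12933 (dc), jid 0, uid 0: exited on signal 11 (core dumped)
Aug 17 17:17:01 kernel pid 11592 (dc), jid 0, uid 0: exited on signal 11 (core dumped)
"""

for (name, sig), n in tally_crashes(SAMPLE).most_common():
    print(f"{name}: {n} crash(es) on signal {sig}")
```

    Run over the full log, a tally like this would show at a glance how many different programs are dying and whether they all die on the same signal.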


  • Yes, please attach textdump.tar and info.0 from the Dashboard page

    What is your hardware?
    Please show dmesg


  • @viktor_g

    [2.4.5-RELEASE][admin@pfsense001]/root: dmesg
    Copyright (c) 1992-2020 The FreeBSD Project.
    Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
            The Regents of the University of California. All rights reserved.
    FreeBSD is a registered trademark of The FreeBSD Foundation.
    FreeBSD 11.3-STABLE #243 abf8cba50ce(RELENG_2_4_5): Tue Jun  2 17:53:37 EDT 2020
        root@buildbot1-nyi.netgate.com:/build/ce-crossbuild-245/obj/amd64/YNx4Qq3j/build/ce-crossbuild-245/sources/FreeBSD-src/sys/pfSense amd64
    FreeBSD clang version 8.0.1 (tags/RELEASE_801/final 366581) (based on LLVM 8.0.1)
    VT(vga): resolution 640x480
    CPU: Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz (2400.02-MHz K8-class CPU)
      Origin="GenuineIntel"  Id=0x6fb  Family=0x6  Model=0xf  Stepping=11
      Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
      Features2=0xe3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM>
      AMD Features=0x20100800<SYSCALL,NX,LM>
      AMD Features2=0x1<LAHF>
      VT-x: HLT,PAUSE
      TSC: P-state invariant, performance statistics
    real memory  = 4294967296 (4096 MB)
    avail memory = 4034084864 (3847 MB)
    Event timer "LAPIC" quality 100
    ACPI APIC Table: <GBT    GBTUACPI>
    FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
    FreeBSD/SMP: 1 package(s) x 4 core(s)
    ioapic0: Changing APIC ID to 2
    ioapic0 <Version 2.0> irqs 0-23 on motherboard
    SMP: AP CPU #2 Launched!
    SMP: AP CPU #1 Launched!
    SMP: AP CPU #3 Launched!
    Timecounter "TSC-low" frequency 1200009771 Hz quality 1000
    module_register_init: MOD_LOAD (ipw_bss_fw, 0xffffffff806a2f20, 0) error 1
    wlan: mac acl policy registered
    kbd1 at kbdmux0
    000.000022 [4213] netmap_init               netmap: loaded module
    module_register_init: MOD_LOAD (vesa, 0xffffffff812d9960, 0) error 19
    mlx5en: Mellanox Ethernet driver 3.5.2 (September 2019)
    nexus0
    vtvga0: <VT VGA driver> on motherboard
    cryptosoft0: <software crypto> on motherboard
    padlock0: No ACE support.
    acpi0: <GBT GBTUACPI> on motherboard
    acpi0: Power Button (fixed)
    cpu0: <ACPI CPU> on acpi0
    cpu1: <ACPI CPU> on acpi0
    cpu2: <ACPI CPU> on acpi0
    cpu3: <ACPI CPU> on acpi0
    attimer0: <AT timer> port 0x40-0x43 on acpi0
    Timecounter "i8254" frequency 1193182 Hz quality 0
    Event timer "i8254" frequency 1193182 Hz quality 100
    hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff irq 0,8 on acpi0
    Timecounter "HPET" frequency 14318180 Hz quality 950
    Event timer "HPET" frequency 14318180 Hz quality 450
    Event timer "HPET1" frequency 14318180 Hz quality 440
    Event timer "HPET2" frequency 14318180 Hz quality 440
    atrtc0: <AT realtime clock> port 0x70-0x73 on acpi0
    atrtc0: registered as a time-of-day clock, resolution 1.000000s
    Event timer "RTC" frequency 32768 Hz quality 0
    Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
    acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
    acpi_button0: <Power Button> on acpi0
    pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
    pci0: <ACPI PCI bus> on pcib0
    pcib1: <PCI-PCI bridge> irq 16 at device 1.0 on pci0
    pci1: <PCI bus> on pcib1
    pcib2: <PCI-PCI bridge> at device 0.0 on pci1
    pci2: <PCI bus> on pcib2
    pcib3: <PCI-PCI bridge> at device 2.0 on pci2
    pci3: <PCI bus> on pcib3
    igb0: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0xdf00-0xdf1f mem 0xfd9a0000-0xfd9bffff,0xfd400000-0xfd5fffff,0xfd9fc000-0xfd9fffff irq 18 at device 0.0 on pci3
    igb0: Using MSIX interrupts with 5 vectors
    igb0: Bound queue 0 to cpu 0
    igb0: Bound queue 1 to cpu 1
    igb0: Bound queue 2 to cpu 2
    igb0: Bound queue 3 to cpu 3
    igb0: netmap queues/slots: TX 4/1024, RX 4/1024
    igb1: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0xde00-0xde1f mem 0xfd9c0000-0xfd9dffff,0xfd600000-0xfd7fffff,0xfd9f8000-0xfd9fbfff irq 19 at device 0.1 on pci3
    igb1: Using MSIX interrupts with 5 vectors
    igb1: Bound queue 0 to cpu 0
    igb1: Bound queue 1 to cpu 1
    igb1: Bound queue 2 to cpu 2
    igb1: Bound queue 3 to cpu 3
    igb1: netmap queues/slots: TX 4/1024, RX 4/1024
    pcib4: <PCI-PCI bridge> at device 4.0 on pci2
    pci4: <PCI bus> on pcib4
    igb2: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0xcf00-0xcf1f mem 0xfd3a0000-0xfd3bffff,0xfce00000-0xfcffffff,0xfd3fc000-0xfd3fffff irq 16 at device 0.0 on pci4
    igb2: Using MSIX interrupts with 5 vectors
    igb2: Bound queue 0 to cpu 0
    igb2: Bound queue 1 to cpu 1
    igb2: Bound queue 2 to cpu 2
    igb2: Bound queue 3 to cpu 3
    igb2: netmap queues/slots: TX 4/1024, RX 4/1024
    igb3: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0xce00-0xce1f mem 0xfd3c0000-0xfd3dffff,0xfd000000-0xfd1fffff,0xfd3f8000-0xfd3fbfff irq 17 at device 0.1 on pci4
    igb3: Using MSIX interrupts with 5 vectors
    igb3: Bound queue 0 to cpu 0
    igb3: Bound queue 1 to cpu 1
    igb3: Bound queue 2 to cpu 2
    igb3: Bound queue 3 to cpu 3
    igb3: netmap queues/slots: TX 4/1024, RX 4/1024
    vgapci0: <VGA-compatible display> port 0xff00-0xff07 mem 0xfc800000-0xfcbfffff,0xd0000000-0xdfffffff irq 16 at device 2.0 on pci0
    agp0: <Intel G41 SVGA controller> on vgapci0
    agp0: aperture size is 256M, detected 32764k stolen memory
    vgapci0: Boot video device
    pcib5: <ACPI PCI-PCI bridge> irq 16 at device 28.0 on pci0
    pcib5: [GIANT-LOCKED]
    pci5: <ACPI PCI bus> on pcib5
    pcib6: <PCI-PCI bridge> irq 16 at device 0.0 on pci5
    pci6: <PCI bus> on pcib6
    pcib7: <PCI-PCI bridge> irq 19 at device 3.0 on pci6
    pci7: <PCI bus> on pcib7
    re0: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xbe00-0xbeff mem 0xfdeff000-0xfdefffff,0xfdcfc000-0xfdcfffff irq 19 at device 0.0 on pci7
    re0: Using 1 MSI-X message
    re0: Chip rev. 0x2c800000
    re0: MAC rev. 0x00100000
    miibus0: <MII bus> on re0
    rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on miibus0
    rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
    re0: Using defaults for TSO: 65518/35/2048
    re0: netmap queues/slots: TX 1/256, RX 1/256
    pcib8: <PCI-PCI bridge> irq 19 at device 7.0 on pci6
    pci8: <PCI bus> on pcib8
    re1: <RealTek 8168/8111 B/C/CP/D/DP/E/F/G PCIe Gigabit Ethernet> port 0xae00-0xaeff mem 0xfddff000-0xfddfffff,0xfdbfc000-0xfdbfffff irq 19 at device 0.0 on pci8
    re1: Using 1 MSI-X message
    re1: Chip rev. 0x2c800000
    re1: MAC rev. 0x00100000
    miibus1: <MII bus> on re1
    rgephy1: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on miibus1
    rgephy1:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master, auto, auto-flow
    re1: Using defaults for TSO: 65518/35/2048
    re1: netmap queues/slots: TX 1/256, RX 1/256
    pcib9: <ACPI PCI-PCI bridge> irq 17 at device 28.1 on pci0
    pcib9: [GIANT-LOCKED]
    pci9: <ACPI PCI bus> on pcib9
    alc0: <Atheros AR8151 v1.0 PCIe Gigabit Ethernet> port 0x9f00-0x9f7f mem 0xfdac0000-0xfdafffff irq 17 at device 0.0 on pci9
    alc0: 11776 Tx FIFO, 12032 Rx FIFO
    alc0: Using 1 MSI message(s).
    miibus2: <MII bus> on alc0
    atphy0: <Atheros F1 10/100/1000 PHY> PHY 0 on miibus2
    atphy0:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
    alc0: Using defaults for TSO: 65518/35/2048
    uhci0: <Intel 82801G (ICH7) USB controller USB-A> port 0xfe00-0xfe1f irq 23 at device 29.0 on pci0
    usbus0 on uhci0
    usbus0: 12Mbps Full Speed USB v1.0
    uhci1: <Intel 82801G (ICH7) USB controller USB-B> port 0xfd00-0xfd1f irq 19 at device 29.1 on pci0
    usbus1 on uhci1
    usbus1: 12Mbps Full Speed USB v1.0
    uhci2: <Intel 82801G (ICH7) USB controller USB-C> port 0xfc00-0xfc1f irq 18 at device 29.2 on pci0
    usbus2 on uhci2
    usbus2: 12Mbps Full Speed USB v1.0
    uhci3: <Intel 82801G (ICH7) USB controller USB-D> port 0xfb00-0xfb1f irq 16 at device 29.3 on pci0
    usbus3 on uhci3
    usbus3: 12Mbps Full Speed USB v1.0
    ehci0: <Intel 82801GB/R (ICH7) USB 2.0 controller> mem 0xfdfff000-0xfdfff3ff irq 23 at device 29.7 on pci0
    usbus4: EHCI version 1.0
    usbus4 on ehci0
    usbus4: 480Mbps High Speed USB v2.0
    pcib10: <ACPI PCI-PCI bridge> at device 30.0 on pci0
    pci10: <ACPI PCI bus> on pcib10
    isab0: <PCI-ISA bridge> at device 31.0 on pci0
    isa0: <ISA bus> on isab0
    atapci0: <Intel ICH7 SATA300 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf800-0xf80f at device 31.2 on pci0
    ata0: <ATA channel> at channel 0 on atapci0
    ata1: <ATA channel> at channel 1 on atapci0
    uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
    atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
    atkbd0: <AT Keyboard> irq 1 on atkbdc0
    kbd0 at atkbd0
    atkbd0: [GIANT-LOCKED]
    psm0: <PS/2 Mouse> irq 12 on atkbdc0
    psm0: [GIANT-LOCKED]
    psm0: model IntelliMouse Explorer, device ID 4
    orm0: <ISA Option ROM> at iomem 0xc0000-0xcc7ff on isa0
    ppc0: cannot reserve I/O port range
    acpi_perf0: <ACPI CPU Frequency Control> on cpu0
    est1: <Enhanced SpeedStep Frequency Control> on cpu1
    est: CPU supports Enhanced Speedstep, but is not recognized.
    est: cpu_vendor GenuineIntel, msr 921092106000921
    device_attach: est1 attach returned 6
    est2: <Enhanced SpeedStep Frequency Control> on cpu2
    est: CPU supports Enhanced Speedstep, but is not recognized.
    est: cpu_vendor GenuineIntel, msr 921092106000921
    device_attach: est2 attach returned 6
    est3: <Enhanced SpeedStep Frequency Control> on cpu3
    est: CPU supports Enhanced Speedstep, but is not recognized.
    est: cpu_vendor GenuineIntel, msr 921092106000921
    device_attach: est3 attach returned 6
    Timecounters tick every 1.000 msec
    ugen4.1: <Intel EHCI root HUB> at usbus4
    ugen2.1: <Intel UHCI root HUB> at usbus2
    ugen3.1: <Intel UHCI root HUB> at usbus3
    ugen1.1: <Intel UHCI root HUB> at usbus1
    uhub0: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus4
    uhub1: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus3
    uhub2: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
    ugen0.1: <Intel UHCI root HUB> at usbus0
    uhub3: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1
    uhub4: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
    uhub1: 2 ports with 2 removable, self powered
    uhub2: 2 ports with 2 removable, self powered
    uhub4: 2 ports with 2 removable, self powered
    uhub3: 2 ports with 2 removable, self powered
    uhub0: 8 ports with 8 removable, self powered
    ada0 at ata0 bus 0 scbus0 target 0 lun 0
    ada0: <INTEL SSDSA2CW120G3 4PC10362> ATA8-ACS SATA 2.x device
    ada0: Serial Number BTPR141500PH120LGN
    ada0: 150.000MB/s transfers (SATA, UDMA5, PIO 8192bytes)
    ada0: 114473MB (234441648 512 byte sectors)
    ada0: quirks=0x1<4K>
    Trying to mount root from ufs:/dev/ufsid/5dfcea10cfecd0c1 [rw]...
    random: unblocking device.
    CPU: Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz (2400.02-MHz K8-class CPU)
      Origin="GenuineIntel"  Id=0x6fb  Family=0x6  Model=0xf  Stepping=11
      Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
      Features2=0xe3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM>
      AMD Features=0x20100800<SYSCALL,NX,LM>
      AMD Features2=0x1<LAHF>
      VT-x: HLT,PAUSE
      TSC: P-state invariant, performance statistics
    coretemp0: <CPU On-Die Thermal Sensors> on cpu0
    coretemp1: <CPU On-Die Thermal Sensors> on cpu1
    est1: <Enhanced SpeedStep Frequency Control> on cpu1
    est: CPU supports Enhanced Speedstep, but is not recognized.
    est: cpu_vendor GenuineIntel, msr 921092106000921
    device_attach: est1 attach returned 6
    coretemp2: <CPU On-Die Thermal Sensors> on cpu2
    est2: <Enhanced SpeedStep Frequency Control> on cpu2
    est: CPU supports Enhanced Speedstep, but is not recognized.
    est: cpu_vendor GenuineIntel, msr 921092106000921
    device_attach: est2 attach returned 6
    coretemp3: <CPU On-Die Thermal Sensors> on cpu3
    est3: <Enhanced SpeedStep Frequency Control> on cpu3
    est: CPU supports Enhanced Speedstep, but is not recognized.
    est: cpu_vendor GenuineIntel, msr 921092106000921
    device_attach: est3 attach returned 6
    

  • @viktor_g said in Core Dumped - less than 12h after upgrading to 2.4.5-RELEASE-p1:

    Yes, please attach textdump.tar and info.0 from the Dashboard page

    What is your hardware?
    Please show dmesg

    Sorry, I don't see where to find the textdump.tar or info.0?

    cores.zip <= zip file of the *.core listed above.


  • Hi,

    First of all, don't worry.
    You'll need direct console access when these things happen, since even SSH goes down.
    A process or program can contain a bug that pops up in situations that exist on your system. But a bug in a process that died a couple of minutes ago can't impact another process. They only share the processor, the kernel and the hardware (memory, that is).
    Most pfSense users (95% or more?) use 2.4.5-p1 these days, as we do not want to deal with possible security bugs (I'd rather have my system down than hacked). Your 2.4.1 was dangerously old.
    Hardly anyone on this forum is complaining about all (?) processes dying.

    Your dying processes have one thing in common: signal 11 => https://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/troubleshoot.html
    Also: check memory usage.
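
    For reference, signal 11 on FreeBSD is SIGSEGV (segmentation violation). A minimal Python sketch, assuming a POSIX system whose numbering matches FreeBSD's for this signal (Linux does), that looks up the symbolic name for a signal number seen in the log. `signal_name` is a hypothetical helper, not part of pfSense.

```python
import signal


def signal_name(number):
    """Return the symbolic name for a signal number on this platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number is not a signal known to this platform.
        return f"unknown signal {number}"


print(signal_name(11))
```

    The name only says what kind of fault occurred, not why. Many unrelated programs hitting the same fault is what points away from any single program.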

    Btw: I've been using p1 since day 1 and have not seen any core dumps, nor did I with previous versions.

    The fastest way to convince yourself pfSense is fine: swap hardware, or fire up a VM.


  • Yeah, it's pointing to an I/O error, failing disk or memory, or hardware being run out of spec (XMP RAM, overclock, etc.).


  • @chrcoluk said in Core Dumped - less than 12h after upgrading to 2.4.5-RELEASE-p1:

    Yeah, it's pointing to an I/O error, failing disk or memory, or hardware being run out of spec (XMP RAM, overclock, etc.).

    Thanks. Nothing has changed in the hardware profile in years; it's actually bone stock. You might be on to something with the disk, however. I'll force a filesystem check tonight when I can take it down without impacting people.

    https://docs.netgate.com/pfsense/en/latest/hardware/forcing-a-filesystem-check.html


  • Note: even excellent hardware can die on you.