Intel Interface Issues



  • Hardware

    I have pfSense 2.4.4 installed and running on an HP EliteDesk 705 G1 SFF desktop (A4 PRO-7300B @ 3.8GHz, 6GB RAM, 500GB HDD). I installed two Intel PRO/1000 PT Quad Port 1Gb PCIe Ethernet network adapters. I am using my old MikroTik RB951G as a wireless AP.

    Network

    bge0 is empty
    em0 is to my broadband ISP cable modem.
    em1 goes to the new 5 port switch, which has my PC, my printer and my son's PC.
    em2 is empty
    em3 goes to the MikroTik

    It's been 5 days now and I can't seem to keep the router up for more than 8-10 hours at a time. The system stops responding on the network but stays up. Load doesn't seem to matter: it has dropped interfaces while copying gigabyte-sized files just as often as while I'm away or asleep with no activity at all. It just crashed again as I was writing this, this time with a crash report, but it's too large to paste here. Please help, as I am losing what hair I have left!

    Steps so far

    First, I set up the two quad NICs as a bridge for my local LAN, and after the first couple of reboots I dropped down to one quad NIC. I saw that some people have had trouble with pfSense bridging, so I bought a 5 port switch and went back to routed interfaces. I disabled ACPI in the BIOS and made sure kern.ipc.nmbclusters="1000000" was in /boot/loader.conf, as I saw that suggested for quad NICs. I think maybe the next step is to mess with MSI settings? I'll include my dmesg and system log entries from the last two crashes.

    Dmesg (trimmed to save space)

    FreeBSD 11.2-RELEASE-p3 #17 e6b497fa0a3(RELENG_2_4_4): Thu Sep 20 09:04:45 EDT 2018
    root@buildbot3:/crossbuild/ce-244/obj/amd64/WvDslnYb/crossbuild/ce-244/pfSense/tmp/FreeBSD-src/sys/pfSense amd64
    FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565) (based on LLVM 6.0.0)
    CPU: AMD A4 PRO-7300B APU with Radeon HD Graphics (3792.99-MHz K8-class CPU)
    Origin="AuthenticAMD" Id=0x610f31 Family=0x15 Model=0x13 Stepping=1
    Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
    Features2=0x3e98320b<SSE3,PCLMULQDQ,MON,SSSE3,FMA,CX16,SSE4.1,SSE4.2,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C>
    AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
    AMD Features2=0x1ebbffb<LAHF,CMP,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,XOP,SKINIT,WDT,LWP,FMA4,TCE,NodeId,TBM,Topology,PCXC,PNXC>
    Structured Extended Features=0x8<BMI1>
    TSC: P-state invariant, performance statistics
    real memory = 6442450944 (6144 MB)
    avail memory = 5962985472 (5686 MB)
    Event timer "LAPIC" quality 600
    ACPI APIC Table: <HPQOEM SLIC-BPC>
    FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
    FreeBSD/SMP: 1 package(s) x 2 core(s)
    Firmware Warning (ACPI): Optional FADT field Pm2ControlBlock has valid Length but zero Address: 0x0000000000000000/0x1 (20171214/tbfadt-796)
    ioapic0 <Version 2.1> irqs 0-23 on motherboard
    SMP: AP CPU #1 Launched!
    Timecounter "TSC-low" frequency 1896496292 Hz quality 1000
    ipw_bss: You need to read the LICENSE file in /usr/share/doc/legal/intel_ipw.LICENSE.
    ipw_bss: If you agree with the license, set legal.intel_ipw.license_ack=1 in /boot/loader.conf.
    module_register_init: MOD_LOAD (ipw_bss_fw, 0xffffffff80680430, 0) error 1
    random: entropy device external interface
    ipw_ibss: You need to read the LICENSE file in /usr/share/doc/legal/intel_ipw.LICENSE.
    ipw_ibss: If you agree with the license, set legal.intel_ipw.license_ack=1 in /boot/loader.conf.
    module_register_init: MOD_LOAD (ipw_ibss_fw, 0xffffffff806804e0, 0) error 1
    ipw_monitor: You need to read the LICENSE file in /usr/share/doc/legal/intel_ipw.LICENSE.
    ipw_monitor: If you agree with the license, set legal.intel_ipw.license_ack=1 in /boot/loader.conf.
    module_register_init: MOD_LOAD (ipw_monitor_fw, 0xffffffff80680590, 0) error 1
    iwi_bss: You need to read the LICENSE file in /usr/share/doc/legal/intel_iwi.LICENSE.
    iwi_bss: If you agree with the license, set legal.intel_iwi.license_ack=1 in /boot/loader.conf.
    module_register_init: MOD_LOAD (iwi_bss_fw, 0xffffffff806a7460, 0) error 1
    iwi_ibss: You need to read the LICENSE file in /usr/share/doc/legal/intel_iwi.LICENSE.
    iwi_ibss: If you agree with the license, set legal.intel_iwi.license_ack=1 in /boot/loader.conf.
    module_register_init: MOD_LOAD (iwi_ibss_fw, 0xffffffff806a7510, 0) error 1
    wlan: mac acl policy registered
    hn: tranparent VF mode, if_transmit will be used, instead of if_start
    kbd1 at kbdmux0
    netmap: loaded module
    module_register_init: MOD_LOAD (vesa, 0xffffffff81209800, 0) error 19
    nexus0
    cryptosoft0: <software crypto> on motherboard
    padlock0: No ACE support.
    acpi0: <HPQOEM SLIC-BPC> on motherboard
    acpi0: Power Button (fixed)
    cpu0: <ACPI CPU> on acpi0
    cpu1: <ACPI CPU> on acpi0
    attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
    Timecounter "i8254" frequency 1193182 Hz quality 0
    Event timer "i8254" frequency 1193182 Hz quality 100
    atrtc0: <AT realtime clock> port 0x70-0x71 on acpi0
    atrtc0: registered as a time-of-day clock, resolution 1.000000s
    Event timer "RTC" frequency 32768 Hz quality 0
    hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff irq 0,8 on acpi0
    Timecounter "HPET" frequency 14318180 Hz quality 950
    Event timer "HPET" frequency 14318180 Hz quality 550
    Event timer "HPET1" frequency 14318180 Hz quality 450
    Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
    acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
    pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
    pci0: <ACPI PCI bus> on pcib0
    pci0: <base peripheral, IOMMU> at device 0.2 (no driver attached)
    vgapci0: <VGA-compatible display> port 0xf000-0xf0ff mem 0xe0000000-0xe7ffffff,0xff700000-0xff73ffff irq 17 at device 1.0 on pci0
    vgapci0: Boot video device
    hdac0: <ATI (0x9902) HDA Controller> mem 0xff744000-0xff747fff irq 18 at device 1.1 on pci0
    hdac0: hdac_get_capabilities: Invalid corb size (0)
    device_attach: hdac0 attach returned 6
    pcib1: <ACPI PCI-PCI bridge> irq 18 at device 2.0 on pci0
    pci1: <ACPI PCI bus> on pcib1
    pcib2: <ACPI PCI-PCI bridge> at device 0.0 on pci1
    pci2: <ACPI PCI bus> on pcib2
    pcib3: <PCI-PCI bridge> at device 0.0 on pci2
    pci3: <PCI bus> on pcib3
    em0: <Intel(R) PRO/1000 Network Connection 7.6.1-k> at device 0.0 on pci3
    em0: Using an MSI interrupt
    em0: Ethernet address: 00:15:17:a0:18:e4
    em0: netmap queues/slots: TX 1/1024, RX 1/1024
    em1: <Intel(R) PRO/1000 Network Connection 7.6.1-k> at device 0.1 on pci3
    em1: Using an MSI interrupt
    em1: Ethernet address: 00:15:17:a0:18:e5
    em1: netmap queues/slots: TX 1/1024, RX 1/1024
    pcib4: <PCI-PCI bridge> at device 1.0 on pci2
    pci4: <PCI bus> on pcib4
    em2: <Intel(R) PRO/1000 Network Connection 7.6.1-k> irq 19 at device 0.0 on pci4
    em2: Using an MSI interrupt
    em2: Ethernet address: 00:15:17:a0:18:e6
    em2: netmap queues/slots: TX 1/1024, RX 1/1024
    em3: <Intel(R) PRO/1000 Network Connection 7.6.1-k> irq 16 at device 0.1 on pci4
    em3: Using an MSI interrupt
    em3: Ethernet address: 00:15:17:a0:18:e7
    em3: netmap queues/slots: TX 1/1024, RX 1/1024
    xhci0: <AMD FCH USB 3.0 controller> mem 0xff74a000-0xff74bfff irq 18 at device 16.0 on pci0
    xhci0: 32 bytes context size, 64-bit DMA
    xhci0: Unable to map MSI-X table
    usbus0 on xhci0
    usbus0: 5.0Gbps Super Speed USB v3.0
    xhci1: <AMD FCH USB 3.0 controller> mem 0xff748000-0xff749fff irq 17 at device 16.1 on pci0
    xhci1: 32 bytes context size, 64-bit DMA
    xhci1: Unable to map MSI-X table
    usbus1 on xhci1
    usbus1: 5.0Gbps Super Speed USB v3.0
    ahci0: <AMD Hudson-2 AHCI SATA controller> port 0xf140-0xf147,0xf130-0xf133,0xf120-0xf127,0xf110-0xf113,0xf100-0xf10f mem 0xff751000-0xff7517ff irq 19 at device 17.0 on pci0

    System Log

    Oct 13 01:29:02 kernel Event timer "HPET1" frequency 14318180 Hz quality 450
    Oct 13 01:29:02 kernel Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
    Oct 13 01:29:02 kernel acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
    Oct 13 01:29:02 kernel pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
    Oct 13 01:29:02 kernel pci0: <ACPI PCI bus> on pcib0
    Oct 13 01:29:02 kernel pci0: <base peripheral, IOMMU> at device 0.2 (no driver attached)
    Oct 13 01:29:02 kernel vgapci0: <VGA-compatible display> port 0xf000-0xf0ff mem 0xe0000000-0xe7ffffff,0xff700000-0xff73ffff irq 17 at device 1.0 on pci0
    Oct 13 01:29:02 kernel vgapci0: Boot video device
    Oct 13 01:29:02 kernel hdac0: <ATI (0x9902) HDA Controller> mem 0xff744000-0xff747fff irq 18 at device 1.1 on pci0
    Oct 13 01:29:02 kernel hdac0: hdac_get_capabilities: Invalid corb size (0)
    Oct 13 01:29:02 kernel device_attach: hdac0 attach returned 6
    Oct 13 01:29:02 kernel pcib1: <ACPI PCI-PCI bridge> irq 18 at device 2.0 on pci0
    Oct 13 01:29:02 kernel pci1: <ACPI PCI bus> on pcib1
    Oct 13 01:29:02 kernel pcib2: <ACPI PCI-PCI bridge> at device 0.0 on pci1
    Oct 13 01:29:02 kernel pci2: <ACPI PCI bus> on pcib2
    Oct 13 01:29:02 kernel pcib3: <PCI-PCI bridge> at device 0.0 on pci2
    Oct 13 01:29:02 kernel pci3: <PCI bus> on pcib3
    Oct 13 01:29:02 kernel em0: <Intel(R) PRO/1000 Network Connection 7.6.1-k> at device 0.0 on pci3
    Oct 13 01:29:02 kernel em0: Using an MSI interrupt
    Oct 13 01:29:02 kernel em0: Ethernet address: 00:15:17:a0:18:e4
    Oct 13 01:29:02 kernel em0: netmap queues/slots: TX 1/1024, RX 1/1024
    Oct 13 01:29:02 kernel em1: <Intel(R) PRO/1000 Network Connection 7.6.1-k> at device 0.1 on pci3
    Oct 13 01:29:02 kernel em1: Using an MSI interrupt
    Oct 13 01:29:02 kernel em1: Ethernet address: 00:15:17:a0:18:e5
    Oct 13 01:29:02 kernel em1: netmap queues/slots: TX 1/1024, RX 1/1024
    Oct 13 01:29:02 kernel pcib4: <PCI-PCI bridge> at device 1.0 on pci2
    Oct 13 01:29:02 kernel pci4: <PCI bus> on pcib4
    Oct 13 01:29:02 kernel em2: <Intel(R) PRO/1000 Network Connection 7.6.1-k> irq 19 at device 0.0 on pci4
    Oct 13 01:29:02 kernel em2: Using an MSI interrupt
    Oct 13 01:29:02 kernel em2: Ethernet address: 00:15:17:a0:18:e6
    Oct 13 01:29:02 kernel em2: netmap queues/slots: TX 1/1024, RX 1/1024
    Oct 13 01:29:02 kernel em3: <Intel(R) PRO/1000 Network Connection 7.6.1-k> irq 16 at device 0.1 on pci4
    Oct 13 01:29:02 kernel em3: Using an MSI interrupt
    Oct 13 01:29:02 kernel em3: Ethernet address: 00:15:17:a0:18:e7
    Oct 13 01:29:02 kernel em3: netmap queues/slots: TX 1/1024, RX 1/1024
    Oct 13 01:29:02 kernel xhci0: <AMD FCH USB 3.0 controller> mem 0xff74a000-0xff74bfff irq 18 at device 16.0 on pci0
    Oct 13 01:29:02 kernel xhci0: 32 bytes context size, 64-bit DMA
    Oct 13 01:29:02 kernel xhci0: Unable to map MSI-X table
    Oct 13 01:29:02 kernel usbus0 on xhci0
    Oct 13 01:29:02 kernel usbus0: 5.0Gbps Super Speed USB v3.0
    Oct 13 01:29:02 kernel xhci1: <AMD FCH USB 3.0 controller> mem 0xff748000-0xff749fff irq 17 at device 16.1 on pci0
    Oct 13 01:29:02 kernel xhci1: 32 bytes context size, 64-bit DMA
    Oct 13 01:29:02 kernel xhci1: Unable to map MSI-X table
    Oct 13 01:29:02 kernel usbus1 on xhci1
    Oct 13 01:29:02 kernel usbus1: 5.0Gbps Super Speed USB v3.0
    Oct 13 01:29:02 kernel ahci0: <AMD Hudson-2 AHCI SATA controller> port 0xf140-0xf147,0xf130-0xf133,0xf120-0xf127,0xf110-0xf113,0xf100-0xf10f mem 0xff751000-0xff7517ff irq 19 at device 17.0 on pci0
    Oct 13 01:29:02 kernel ahci0: AHCI v1.30 with 8 6Gbps ports, Port Multiplier supported
    Oct 13 01:29:02 kernel ahcich0: <AHCI channel> at channel 0 on ahci0
    Oct 13 01:29:02 kernel ahcich1: <AHCI channel> at channel 1 on ahci0
    Oct 13 01:29:02 kernel ahcich2: <AHCI channel> at channel 2 on ahci0
    Oct 13 01:29:02 kernel ahcich3: <AHCI channel> at channel 3 on ahci0
    Oct 13 01:29:02 kernel ahcich4: <AHCI channel> at channel 4 on ahci0
    Oct 13 01:29:02 kernel ahcich5: <AHCI channel> at channel 5 on ahci0
    Oct 13 01:29:02 kernel ahcich6: <AHCI channel> at channel 6 on ahci0
    Oct 13 01:29:02 kernel ahcich7: <AHCI channel> at channel 7 on ahci0
    Oct 13 01:29:02 kernel ohci0: <AMD FCH USB Controller> mem 0xff750000-0xff750fff irq 18 at device 18.0 on pci0
    Oct 13 01:29:02 kernel usbus2 on ohci0
    Oct 13 01:29:02 kernel usbus2: 12Mbps Full Speed USB v1.0
    Oct 13 01:29:02 kernel ehci0: <AMD FCH USB 2.0 controller> mem 0xff74f000-0xff74f0ff irq 17 at device 18.2 on pci0
    Oct 13 01:29:02 kernel usbus3: EHCI version 1.0
    Oct 13 01:29:02 kernel usbus3 on ehci0
    Oct 13 01:29:02 kernel usbus3: 480Mbps High Speed USB v2.0
    Oct 13 01:29:02 kernel ohci1: <AMD FCH USB Controller> mem 0xff74e000-0xff74efff irq 18 at device 19.0 on pci0
    Oct 13 01:29:02 kernel usbus4 on ohci1
    Oct 13 01:29:02 kernel usbus4: 12Mbps Full Speed USB v1.0
    Oct 13 01:29:02 kernel ehci1: <AMD FCH USB 2.0 controller> mem 0xff74d000-0xff74d0ff irq 17 at device 19.2 on pci0
    Oct 13 01:29:02 kernel usbus5: EHCI version 1.0
    Oct 13 01:29:02 kernel usbus5 on ehci1
    Oct 13 01:29:02 kernel usbus5: 480Mbps High Speed USB v2.0
    Oct 13 01:29:02 kernel hdac0: <AMD Hudson-2 HDA Controller> mem 0xff740000-0xff743fff irq 16 at device 20.2 on pci0
    Oct 13 01:29:02 kernel isab0: <PCI-ISA bridge> at device 20.3 on pci0
    Oct 13 01:29:02 kernel isa0: <ISA bus> on isab0
    Oct 13 01:29:02 kernel pcib5: <ACPI PCI-PCI bridge> at device 20.4 on pci0
    Oct 13 01:29:02 kernel pci5: <ACPI PCI bus> on pcib5
    Oct 13 01:29:02 kernel ohci2: <OHCI (generic) USB controller> mem 0xff74c000-0xff74cfff irq 18 at device 20.5 on pci0
    Oct 13 01:29:02 kernel usbus6 on ohci2
    Oct 13 01:29:02 kernel usbus6: 12Mbps Full Speed USB v1.0
    Oct 13 01:29:02 kernel pcib6: <ACPI PCI-PCI bridge> at device 21.0 on pci0
    Oct 13 01:29:02 kernel pci6: <ACPI PCI bus> on pcib6
    Oct 13 01:29:02 kernel pcib7: <ACPI PCI-PCI bridge> at device 21.2 on pci0
    Oct 13 01:29:02 kernel pci7: <ACPI PCI bus> on pcib7
    Oct 13 01:29:02 kernel bge0: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x5762100> mem 0xe8020000-0xe802ffff,0xe8010000-0xe801ffff,0xe8000000-0xe800ffff irq 18 at device 0.0 on pci7
    Oct 13 01:29:02 kernel bge0: CHIP ID 0x05762100; ASIC REV 0x5762; CHIP REV 0x57621; PCI-E
    Oct 13 01:29:02 kernel miibus0: <MII bus> on bge0
    Oct 13 01:29:02 kernel brgphy0: <BCM5725C 1000BASE-T media interface> PHY 1 on miibus0
    Oct 13 01:29:02 kernel brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
    Oct 13 01:29:02 kernel bge0: Using defaults for TSO: 65518/35/2048
    Oct 13 01:29:02 kernel bge0: Ethernet address: 64:51:06:5f:05:c1
    Oct 13 01:29:02 kernel acpi_button0: <Power Button> on acpi0
    Oct 13 01:29:02 kernel atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
    Oct 13 01:29:02 kernel atkbd0: <AT Keyboard> irq 1 on atkbdc0
    Oct 13 01:29:02 kernel kbd0 at atkbd0
    Oct 13 01:29:02 kernel atkbd0: [GIANT-LOCKED]
    Oct 13 01:29:02 kernel driver bug: Unable to set devclass (class: atkbdc devname: (unknown))
    Oct 13 01:29:02 kernel uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
    Oct 13 01:29:02 kernel ppc0: cannot reserve I/O port range
    Oct 13 01:29:02 kernel hwpstate0: <Cool`n'Quiet 2.0> on cpu0
    Oct 13 01:29:02 kernel ZFS filesystem version: 5
    Oct 13 01:29:02 kernel ZFS storage pool version: features support (5000)
    Oct 13 01:29:02 kernel Timecounters tick every 1.000 msec
    Oct 13 01:29:02 kernel hdacc0: <Realtek ALC221 HDA CODEC> at cad 0 on hdac0
    Oct 13 01:29:02 kernel hdaa0: <Realtek ALC221 Audio Function Group> at nid 1 on hdacc0
    Oct 13 01:29:02 kernel pcm0: <Realtek ALC221 (Analog)> at nid 23 and 26,27 on hdaa0
    Oct 13 01:29:02 kernel pcm1: <Realtek ALC221 (Analog 2.0+HP)> at nid 20,33 on hdaa0
    Oct 13 01:29:02 kernel ugen5.1: <AMD EHCI root HUB> at usbus5
    Oct 13 01:29:02 kernel ugen0.1: <0x1022 XHCI root HUB> at usbus0
    Oct 13 01:29:02 kernel uhub0: <AMD EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus5
    Oct 13 01:29:02 kernel ugen3.1: <AMD EHCI root HUB> at usbus3
    Oct 13 01:29:02 kernel uhub1: <0x1022 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
    Oct 13 01:29:02 kernel uhub2: <AMD EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
    Oct 13 01:29:02 kernel ugen1.1: <0x1022 XHCI root HUB> at usbus1
    Oct 13 01:29:02 kernel ugen6.1: <AMD OHCI root HUB> at usbus6
    Oct 13 01:29:02 kernel uhub3: <0x1022 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus1
    Oct 13 01:29:02 kernel uhub4: <AMD OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus6
    Oct 13 01:29:02 kernel ugen4.1: <AMD OHCI root HUB> at usbus4
    Oct 13 01:29:02 kernel ugen2.1: <AMD OHCI root HUB> at usbus2
    Oct 13 01:29:02 kernel uhub5: <AMD OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus4
    Oct 13 01:29:02 kernel uhub6: <AMD OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
    Oct 13 01:29:02 kernel uhub4: 2 ports with 2 removable, self powered
    Oct 13 01:29:02 kernel uhub1: 4 ports with 4 removable, self powered
    Oct 13 01:29:02 kernel uhub3: 4 ports with 4 removable, self powered
    Oct 13 01:29:02 kernel uhub6: 5 ports with 5 removable, self powered
    Oct 13 01:29:02 kernel uhub5: 5 ports with 5 removable, self powered
    Oct 13 01:29:02 kernel uhub2: 5 ports with 5 removable, self powered
    Oct 13 01:29:02 kernel uhub0: 5 ports with 5 removable, self powered
    Oct 13 01:29:02 kernel ugen4.2: <Primax HP USB Keyboard> at usbus4
    Oct 13 01:29:02 kernel ukbd0 on uhub5
    Oct 13 01:29:02 kernel ukbd0: <Primax HP USB Keyboard, class 0/0, rev 1.10/1.11, addr 2> on usbus4
    Oct 13 01:29:02 kernel kbd2 at ukbd0
    Oct 13 01:29:02 kernel uhid0 on uhub5
    Oct 13 01:29:02 kernel uhid0: <Primax HP USB Keyboard, class 0/0, rev 1.10/1.11, addr 2> on usbus4
    Oct 13 01:29:02 kernel ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
    Oct 13 01:29:02 kernel ada0: <WDC WD5003ABYX-01WERA2 01.06SX2> ATA8-ACS SATA 3.x device
    Oct 13 01:29:02 kernel ada0: Serial Number WD-WMAYP6593607
    Oct 13 01:29:02 kernel ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
    Oct 13 01:29:02 kernel ada0: Command Queueing enabled
    Oct 13 01:29:02 kernel ada0: 476940MB (976773168 512 byte sectors)
    Oct 13 01:29:02 kernel cd0 at ahcich2 bus 0 scbus2 target 0 lun 0
    Oct 13 01:29:02 kernel cd0: <hp CDDVDW SN-208FB HJ10> Removable CD-ROM SCSI device
    Oct 13 01:29:02 kernel cd0: Serial Number S11P6YBFB010DA
    Oct 13 01:29:02 kernel cd0: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes)
    Oct 13 01:29:02 kernel cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed
    Oct 13 01:29:02 kernel Trying to mount root from zfs:zroot/ROOT/default []...
    Oct 13 01:29:02 kernel random: unblocking device.
    Oct 13 01:29:02 kernel CPU: AMD A4 PRO-7300B APU with Radeon HD Graphics (3792.99-MHz K8-class CPU)
    Oct 13 01:29:02 kernel Origin="AuthenticAMD" Id=0x610f31 Family=0x15 Model=0x13 Stepping=1
    Oct 13 01:29:02 kernel Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
    Oct 13 01:29:02 kernel Features2=0x3e98320b<SSE3,PCLMULQDQ,MON,SSSE3,FMA,CX16,SSE4.1,SSE4.2,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C>
    Oct 13 01:29:02 kernel AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
    Oct 13 01:29:02 kernel AMD Features2=0x1ebbffb<LAHF,CMP,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,XOP,SKINIT,WDT,LWP,FMA4,TCE,NodeId,TBM,Topology,PCXC,PNXC>
    Oct 13 01:29:02 kernel Structured Extended Features=0x8<BMI1>
    Oct 13 01:29:02 kernel TSC: P-state invariant, performance statistics
    Oct 13 01:29:03 sshd 5357 Server listening on :: port 22.
    Oct 13 01:29:03 sshd 5357 Server listening on 0.0.0.0 port 22.
    Oct 13 01:29:03 syslogd Logging subprocess 5365 (exec /usr/local/sbin/sshguard) exited due to signal 15.
    Oct 13 01:29:05 check_reload_status Linkup starting em0
    Oct 13 01:29:05 kernel em0: link state changed to UP
    Oct 13 01:29:06 check_reload_status rc.newwanip starting em0
    Oct 13 01:29:07 kernel done.
    Oct 13 01:29:07 kernel done.
    Oct 13 01:29:07 php-fpm 344 /rc.newwanip: rc.newwanip: Info: starting on em0.
    Oct 13 01:29:07 php-fpm 344 /rc.newwanip: rc.newwanip: on (IP address: 173.21.171.***) (interface: WAN[wan]) (real interface: em0).
    Oct 13 01:29:07 check_reload_status Linkup starting bge0
    Oct 13 01:29:07 kernel bge0: link state changed to DOWN
    Oct 13 01:29:08 php-cgi rc.bootup: Resyncing OpenVPN instances.
    Oct 13 01:29:08 kernel pflog0: promiscuous mode enabled
    Oct 13 01:29:09 kernel done.
    Oct 13 01:29:09 php-cgi rc.bootup: sync unbound done.
    Oct 13 01:29:09 kernel done.
    Oct 13 01:29:10 check_reload_status Linkup starting em1
    Oct 13 01:29:10 kernel done.
    Oct 13 01:29:10 kernel em1: link state changed to UP
    Oct 13 01:29:11 php-cgi rc.bootup: NTPD is starting up.
    Oct 13 01:29:11 kernel done.
    Oct 13 01:29:11 check_reload_status Updating all dyndns
    Oct 13 01:29:11 kernel ....0 addresses deleted.
    Oct 13 01:29:15 check_reload_status Linkup starting em3
    Oct 13 01:29:15 kernel em3: link state changed to UP
    Oct 13 01:29:15 php-cgi rc.bootup: Creating rrd update script
    Oct 13 01:29:15 kernel done.
    Oct 13 01:29:16 syslogd exiting on signal 15
    Oct 13 01:29:16 syslogd kernel boot file is /boot/kernel/kernel
    Oct 13 01:29:16 kernel done.
    Oct 13 01:29:16 php-fpm 343 /rc.start_packages: Restarting/Starting all packages.
    Oct 13 01:29:16 kernel em0: promiscuous mode enabled
    Oct 13 01:29:21 kernel em1: promiscuous mode enabled
    Oct 13 01:29:21 kernel em3: promiscuous mode enabled
    Oct 13 01:29:39 php-fpm 343 [pfBlockerNG] Starting cron process.
    Oct 13 01:29:39 check_reload_status Syncing firewall
    Oct 13 01:29:39 check_reload_status Reloading filter
    Oct 13 01:29:41 login login on ttyv0 as root
    Oct 13 01:29:41 ntopng [mongoose.c:4534] ERROR: set_ports_option: cannot bind to 3000: Address already in use
    Oct 13 01:29:41 ntopng [HTTPserver.cpp:923] ERROR: Unable to start HTTP server (IPv4) on ports 3000
    Oct 13 01:29:43 sshd 96304 user root login class [preauth]
    Oct 13 01:29:43 sshd 96304 user root login class [preauth]
    Oct 13 01:29:46 sshd 96304 Accepted keyboard-interactive/pam for root from 192.168.33.13 port 58796 ssh2
    Oct 13 01:29:57 check_reload_status Syncing firewall
    Oct 13 01:29:58 kernel pid 52255 (ntopng), uid 0: exited on signal 11 (core dumped)
    Oct 13 01:29:58 kernel em1: promiscuous mode disabled
    Oct 13 01:29:58 kernel em3: promiscuous mode disabled
    Oct 13 01:30:02 kernel em1: promiscuous mode enabled
    Oct 13 01:30:02 kernel em3: promiscuous mode enabled
    Oct 13 01:30:02 check_reload_status Syncing firewall
    Oct 13 01:30:06 kernel pid 73450 (ntopng), uid 0: exited on signal 11 (core dumped)
    Oct 13 01:30:06 kernel em1: promiscuous mode disabled
    Oct 13 01:30:06 kernel em3: promiscuous mode disabled

    Oct 13 09:07:45 kernel em3: Watchdog timeout Queue[0]-- resetting
    Oct 13 09:07:45 kernel Interface is RUNNING and ACTIVE
    Oct 13 09:07:45 kernel em3: TX Queue 0 ------
    Oct 13 09:07:45 kernel em3: hw tdh = -1, hw tdt = -1
    Oct 13 09:07:45 kernel em3: Tx Queue Status = -2147483648
    Oct 13 09:07:45 kernel em3: TX descriptors avail = 44
    Oct 13 09:07:45 kernel em3: Tx Descriptors avail failure = 0
    Oct 13 09:07:45 kernel em3: RX Queue 0 ------
    Oct 13 09:07:45 kernel em3: hw rdh = -1, hw rdt = -1
    Oct 13 09:07:45 kernel em3: RX discarded packets = 0
    Oct 13 09:07:45 kernel em3: RX Next to Check = 957
    Oct 13 09:07:45 kernel em3: RX Next to Refresh = 956
    Oct 13 09:08:03 kernel em3: Watchdog timeout Queue[0]-- resetting
    Oct 13 09:08:03 kernel Interface is RUNNING and ACTIVE
    Oct 13 09:08:03 kernel em3: TX Queue 0 ------
    Oct 13 09:08:03 kernel em3: hw tdh = -1, hw tdt = -1
    Oct 13 09:08:03 kernel em3: Tx Queue Status = -2147483648
    Oct 13 09:08:03 kernel em3: TX descriptors avail = 40
    Oct 13 09:08:03 kernel em3: Tx Descriptors avail failure = 11
    Oct 13 09:08:03 kernel em3: RX Queue 0 ------
    Oct 13 09:08:03 kernel em3: hw rdh = -1, hw rdt = -1
    Oct 13 09:08:03 kernel em3: RX discarded packets = 0
    Oct 13 09:08:03 kernel em3: RX Next to Check = 0
    Oct 13 09:08:03 kernel em3: RX Next to Refresh = 0
    Oct 13 09:08:21 kernel em3: Watchdog timeout Queue[0]-- resetting
    Oct 13 09:08:21 kernel Interface is RUNNING and ACTIVE
    Oct 13 09:08:21 kernel em3: TX Queue 0 ------
    Oct 13 09:08:21 kernel em3: hw tdh = -1, hw tdt = -1
    Oct 13 09:08:21 kernel em3: Tx Queue Status = -2147483648
    Oct 13 09:08:21 kernel em3: TX descriptors avail = 40
    Oct 13 09:08:21 kernel em3: Tx Descriptors avail failure = 14
    Oct 13 09:08:21 kernel em3: RX Queue 0 ------
    Oct 13 09:08:21 kernel em3: hw rdh = -1, hw rdt = -1
    Oct 13 09:08:21 kernel em3: RX discarded packets = 0
    Oct 13 09:08:21 kernel em3: RX Next to Check = 0
    Oct 13 09:08:21 kernel em3: RX Next to Refresh = 0
    Oct 13 09:08:41 kernel em3: Watchdog timeout Queue[0]-- resetting
    Oct 13 09:08:41 kernel Interface is RUNNING and ACTIVE
    Oct 13 09:08:41 kernel em3: TX Queue 0 ------
    Oct 13 09:08:41 kernel em3: hw tdh = -1, hw tdt = -1
    Oct 13 09:08:41 kernel em3: Tx Queue Status = -2147483648
    Oct 13 09:08:41 kernel em3: TX descriptors avail = 40
    Oct 13 09:08:41 kernel em3: Tx Descriptors avail failure = 22
    Oct 13 09:08:41 kernel em3: RX Queue 0 ------
    Oct 13 09:08:41 kernel em3: hw rdh = -1, hw rdt = -1
    Oct 13 09:08:41 kernel em3: RX discarded packets = 0
    Oct 13 09:08:41 kernel em3: RX Next to Check = 0
    Oct 13 09:08:41 kernel em3: RX Next to Refresh = 0
    Oct 13 09:09:00 kernel em3: Watchdog timeout Queue[0]-- resetting
    Oct 13 09:09:00 kernel Interface is RUNNING and ACTIVE
    Oct 13 09:09:00 kernel em3: TX Queue 0 ------
    Oct 13 09:09:00 kernel em3: hw tdh = -1, hw tdt = -1
    Oct 13 09:09:00 kernel em3: Tx Queue Status = -2147483648
    Oct 13 09:09:00 kernel em3: TX descriptors avail = 40
    Oct 13 09:09:00 kernel em3: Tx Descriptors avail failure = 41
    Oct 13 09:09:00 kernel em3: RX Queue 0 ------
    Oct 13 09:09:00 kernel em3: hw rdh = -1, hw rdt = -1
    Oct 13 09:09:00 kernel em3: RX discarded packets = 0
    Oct 13 09:09:00 kernel em3: RX Next to Check = 0
    Oct 13 09:09:00 kernel em3: RX Next to Refresh = 0
    Oct 13 09:09:20 kernel em3: Watchdog timeout Queue[0]-- resetting
    Oct 13 09:09:20 kernel Interface is RUNNING and ACTIVE
    Oct 13 09:09:20 kernel em3: TX Queue 0 ------
    Oct 13 09:09:20 kernel em3: hw tdh = -1, hw tdt = -1
    Oct 13 09:09:20 kernel em3: Tx Queue Status = -2147483648
    Oct 13 09:09:20 kernel em3: TX descriptors avail = 40
    Oct 13 09:09:20 kernel em3: Tx Descriptors avail failure = 65
    Oct 13 09:09:20 kernel em3: RX Queue 0 ------
    Oct 13 09:09:20 kernel em3: hw rdh = -1, hw rdt = -1
    Oct 13 09:09:20 kernel em3: RX discarded packets = 0
    Oct 13 09:09:20 kernel em3: RX Next to Check = 0
    Oct 13 09:09:20 kernel em3: RX Next to Refresh = 0
    Oct 13 09:09:38 kernel em3: Watchdog timeout Queue[0]-- resetting
    Oct 13 09:09:38 kernel Interface is RUNNING and ACTIVE
    Oct 13 09:09:38 kernel em3: TX Queue 0 ------
    Oct 13 09:09:38 kernel em3: hw tdh = -1, hw tdt = -1
    Oct 13 09:09:38 kernel em3: Tx Queue Status = -2147483648
    Oct 13 09:09:38 kernel em3: TX descriptors avail = 62
    Oct 13 09:09:38 kernel em3: Tx Descriptors avail failure = 65
    Oct 13 09:09:38 kernel em3: RX Queue 0 ------
    Oct 13 09:09:38 kernel em3: hw rdh = -1, hw rdt = -1
    Oct 13 09:09:38 kernel em3: RX discarded packets = 0
    Oct 13 09:09:38 kernel em3: RX Next to Check = 0
    Oct 13 09:09:38 kernel em3: RX Next to Refresh = 0
    Oct 13 09:09:57 kernel em3: Watchdog timeout Queue[0]-- resetting
    Oct 13 09:09:57 kernel Interface is RUNNING and ACTIVE
    Oct 13 09:09:57 kernel em3: TX Queue 0 ------
    Oct 13 09:09:57 kernel em3: hw tdh = -1, hw tdt = -1
    Oct 13 09:09:57 kernel em3: Tx Queue Status = -2147483648
    Oct 13 09:09:57 kernel em3: TX descriptors avail = 40
    Oct 13 09:09:57 kernel em3: Tx Descriptors avail failure = 83
    Oct 13 09:09:57 kernel em3: RX Queue 0 ------
    Oct 13 09:09:57 kernel em3: hw rdh = -1, hw rdt = -1
    Oct 13 09:09:57 kernel em3: RX discarded packets = 0
    Oct 13 09:09:57 kernel em3: RX Next to Check = 0
    Oct 13 09:09:57 kernel em3: RX Next to Refresh = 0
    Oct 13 09:10:16 kernel em3: Watchdog timeout Queue[0]-- resetting
    Oct 13 09:10:16 kernel Interface is RUNNING and ACTIVE
    Oct 13 09:10:16 kernel em3: TX Queue 0 ------
    Oct 13 09:10:16 kernel em3: hw tdh = -1, hw tdt = -1
    Oct 13 09:10:16 kernel em3: Tx Queue Status = -2147483648
    Oct 13 09:10:16 kernel em3: TX descriptors avail = 53
    Oct 13 09:10:16 kernel em3: Tx Descriptors avail failure = 83
    Oct 13 09:10:16 kernel em3: RX Queue 0 ------
    Oct 13 09:10:16 kernel em3: hw rdh = -1, hw rdt = -1
    Oct 13 09:10:16 kernel em3: RX discarded packets = 0
    Oct 13 09:10:16 kernel em3: RX Next to Check = 0
    Oct 13 09:10:16 kernel em3: RX Next to Refresh = 0
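
The watchdog entries above are easier to compare once they are counted and the negative numbers are decoded. Below is a minimal Python sketch that does both, assuming the values are 32-bit registers printed as signed integers (the log does not confirm this). The sample lines are copied from the log above; the helper names `as_u32` and `watchdog_counts` are made up for illustration.

```python
import re
from collections import Counter

# A few lines copied from the system log above.
LOG = """\
Oct 13 09:07:45 kernel em3: Watchdog timeout Queue[0]-- resetting
Oct 13 09:07:45 kernel em3: hw tdh = -1, hw tdt = -1
Oct 13 09:07:45 kernel em3: Tx Queue Status = -2147483648
Oct 13 09:08:03 kernel em3: Watchdog timeout Queue[0]-- resetting
"""

# Captures the interface name (em0, em3, ...) on each watchdog line.
WATCHDOG = re.compile(r"kernel (\w+): Watchdog timeout")


def as_u32(value):
    """Reinterpret a signed 32-bit integer as unsigned and format it as hex."""
    return f"0x{value & 0xFFFFFFFF:08X}"


def watchdog_counts(text):
    """Count watchdog timeouts per interface."""
    return Counter(m.group(1) for m in WATCHDOG.finditer(text))


counts = watchdog_counts(LOG)
print(dict(counts))
print(as_u32(-1), as_u32(-2147483648))
```

Read this way, `hw tdh = -1` is 0xFFFFFFFF and `Tx Queue Status = -2147483648` is 0x80000000. A read of all ones is commonly what an unresponsive PCI device returns, so this may point at the card or slot rather than at traffic load, but that is only a hypothesis.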



  • Do I need legal.intel_wpi.license_ack=1 in /boot/loader.conf.local?

    Is this accepting that Intel will own all the data the interface passes? :)



  • I tried that setting in /boot/loader.conf.local, rebooted, left to run some errands and when I came back two hours later the interface was down again:

    Oct 13 12:45:49 kernel em0: Watchdog timeout Queue[0]-- resetting
    Oct 13 12:45:49 kernel Interface is RUNNING and ACTIVE
    Oct 13 12:45:49 kernel em0: TX Queue 0 ------
    Oct 13 12:45:49 kernel em0: hw tdh = -1, hw tdt = -1
    Oct 13 12:45:49 kernel em0: Tx Queue Status = -2147483648
    Oct 13 12:45:49 kernel em0: TX descriptors avail = 117
    Oct 13 12:45:49 kernel em0: Tx Descriptors avail failure = 0
    Oct 13 12:45:49 kernel em0: RX Queue 0 ------
    Oct 13 12:45:49 kernel em0: hw rdh = -1, hw rdt = -1
    Oct 13 12:45:49 kernel em0: RX discarded packets = 0
    Oct 13 12:45:49 kernel em0: RX Next to Check = 440
    Oct 13 12:45:49 kernel em0: RX Next to Refresh = 439

    I still get this in my dmesg:

    ipw_bss: You need to read the LICENSE file in /usr/share/doc/legal/intel_ipw.LICENSE.
    ipw_bss: If you agree with the license, set legal.intel_ipw.license_ack=1 in /boot/loader.conf

    I'm guessing my /boot/loader.conf.local wasn't processed? So I added the line to /boot/loader.conf; not even bothering to cross my fingers at this point.
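
Before assuming the file was skipped, it may be worth checking that the tunable name in the file matches the one the message asks for. The question earlier in the thread names `legal.intel_wpi.license_ack`, while the dmesg messages name `legal.intel_ipw.license_ack` and `legal.intel_iwi.license_ack`. Below is a minimal Python sketch of that comparison. The `LOADER_CONF` text is a hypothetical example of the file contents, not the real file, and the helper names are made up for illustration.

```python
import re

# Messages copied from the dmesg output in this thread.
DMESG = """\
ipw_bss: If you agree with the license, set legal.intel_ipw.license_ack=1 in /boot/loader.conf.
iwi_bss: If you agree with the license, set legal.intel_iwi.license_ack=1 in /boot/loader.conf.
"""

# Hypothetical file contents (an assumption for illustration only).
LOADER_CONF = """\
kern.ipc.nmbclusters="1000000"
legal.intel_wpi.license_ack=1
"""


def requested_tunables(dmesg):
    """Names of the license tunables the kernel messages ask for."""
    return set(re.findall(r"set (legal\.\w+\.license_ack)=1", dmesg))


def configured_tunables(conf):
    """Names of all tunables set in loader.conf-style text."""
    names = set()
    for line in conf.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" in line:
            names.add(line.split("=", 1)[0].strip())
    return names


missing = requested_tunables(DMESG) - configured_tunables(LOADER_CONF)
print(sorted(missing))
```

With these sample inputs both requested names are reported as missing, because `intel_wpi` matches neither. The `ipw` and `iwi` messages also appear to come from Intel wireless firmware modules rather than the em driver, so they may be unrelated to the watchdog timeouts.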



  • I just crashed again and on reboot, I still get:

    iwi_bss: If you agree with the license, set legal.intel_iwi.license_ack=1 in /boot/loader.conf.

    As noted in a previous post, I added that to /boot/loader.conf.local and I also added it to /boot/loader.conf this morning, too.

    Here's the syslog from the last crash and reboot. I had continuous pings running in two windows, one from my PC to the router and the other from an SSH shell on the router to my PC. They both dropped at the same time, of course, but the only error in the syslog was a gateway alarm showing 22% packet loss... ???

    Oct 13 18:07:05 rc.gateway_alarm 31127 >>> Gateway alarm: WAN_DHCP (Addr:173.21.160.1 Alarm:1 RTT:13.081ms RTTsd:4.581ms Loss:22%)
    Oct 13 18:07:05 check_reload_status updating dyndns WAN_DHCP
    Oct 13 18:07:05 check_reload_status Restarting ipsec tunnels
    Oct 13 18:07:05 check_reload_status Restarting OpenVPN tunnels/interfaces
    Oct 13 18:07:05 check_reload_status Reloading filter
    Oct 13 18:07:22 login login on ttyv0 as root
    Oct 13 18:07:27 php-cgi rc.initial.reboot: Stopping all packages.
    Oct 13 18:07:31 reboot rebooted by root

    On a side note, when I looked just before one of the system crashes last night, I saw I had filled up all 6 GB of RAM and dipped a few percent into swap. I think it may have been due to ntopng, so I disabled it, and memory use has stayed at roughly 10-20% since. I suspect I have 2-3, or maybe even more, different issues that I can't seem to find. I'm going to uninstall Snort, even though I never started to configure it. I'm also going to disable darkstat. It's probably unrelated, but at this point I'm starting to try anything...

    What in the world is so unstable about using an HP PC, Intel card and a fresh 2.4.4 pfSense install???



  • Additional symptoms

    When I enabled em3 and assigned it a 192.168 address for my MikroTik router, I could not see it in the list of interfaces in the DHCP Server menu on the GUI. I had to set a DHCP pool manually from the console.

    I was going to simply disable Snort instead of uninstalling it, but the Snort option was not available in the Services menu on the web GUI.

    I might have to wind up blaming Russian hackers! This is getting crazy. Do I need to reformat the drive and reinstall from scratch?



  • Just happened again and the only error in the system log was:

    Oct 13 21:23:04 rc.gateway_alarm 6087 >>> Gateway alarm: WAN_DHCP (Addr:173.21.160.1 Alarm:1 RTT:13.090ms RTTsd:5.287ms Loss:21%)

    em0 just stopped passing traffic, but this time I could still access the LAN port on the router. I unplugged the ethernet cable on em0, plugged it back in, nothing. I rebooted my cable modem and no luck there, either. I decided to re-enable ACPI in the BIOS since it doesn't seem to matter. I also enabled powerd in the misc section of the pfSense advanced config.
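
    To line these alarms up against other events, a small sketch that pulls the timestamp and loss figure out of each gateway alarm line. It assumes the log format shown above; `gateway_alarms` is a hypothetical helper name.

```shell
# Print "Mon DD HH:MM:SS loss=N%" for every gateway alarm line in the input.
gateway_alarms() {
    awk '
        /Gateway alarm:/ {
            loss = $0
            sub(/^.*Loss:/, "", loss)   # drop everything up to the loss value
            sub(/%.*$/, "", loss)       # drop the percent sign and the rest
            print $1, $2, $3, "loss=" loss "%"
        }
    ' "$@"
}
```

    It reads log text from a file argument or from standard input.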


  • Netgate Administrator

    The Intel license ACKs only stop those messages from appearing. They relate only to the wireless drivers iwi(4) and ipw(4), so they will have no effect here.

    Those watchdog errors should never normally appear. The NIC is failing, trying to recover, and failing to do so.

    I would start trying to solve this by making the most basic install possible and checking that it runs OK before adding any packages etc.
    You said you installed 2 quad port NICs but it looks like you're using only 4 ports. I would remove the second NIC if you're not using it. Or even swap it out with the first one to be sure it's not a hardware issue.

    Steve



  • @stephenw10 Thanks much for the reply! I did drop to one NIC a couple days ago. These last few crashes are particularly puzzling because the only symptom I see is a log entry about the WAN interface losing some packets (15-25% loss). A few seconds after that, I notice the WAN link is totally unresponsive. I don't think I've had an instance yet where dpinger notes dropped packets and the device recovers. Sometimes the LAN side drops, too, but other times I can still ping, ssh and use the web interface. I "think" the LAN side stays responsive as long as there aren't watchdog timeouts and additional interface related log entries.

    It does seem slightly more stable with ACPI/powerd enabled, as it's been staying up longer, but that could be random too.

    I think you're right, my next two steps are:

    Change NIC's, even use the other PCI slot
    Reinstall from scratch, don't use the old config and don't install any packages

    Currently I have darkstat, iftop, nmap, ntopng and pfBlockerNG. I had Snort, but something was wrong with it as it didn't show up in the menus, so I dropped it. This might also be a symptom of something wrong in the install.



  • I forgot to note, I started out bridging 7 of the 8 ports in the 2 quad NIC's. It was working about the same as now, so my first step in troubleshooting was to disable the bridge and also change out the WAN cable and the cable to my PC.

    I kind of wanted to use all 8 ports in the pfSense box as it saves cables, extra hardware, interfaces and allows more port by port management overall, not that I need it in a home network. I'm kind of testing this to see if it's something I want to install at work, where there may be some use in connecting 6 home locations via VPN.


  • Netgate Administrator

    It should work even if bridging is usually a bad idea.
    Yes try to rule out hardware initially if you can. As I said start out with a super basic two NIC config and make sure that works.
    Disable anything you don't need in the BIOS, soundcards etc.

    Steve



  • Just found something, pciconf -l -c em0 gives some PCI info, including the line:

    ecap 0001[100] = AER 1 1 fatal 3 non-fatal 5 corrected

    AER is Advanced Error Reporting and this notes some PCI bus errors. Next time I crash, I'll run this command at the console and see what it reveals.



  • EDIT: Changed script for all interfaces

    And just 'cause it's Sunday, I wrote a little perl script:

    #!/usr/local/bin/perl
    use strict;
    use warnings;

    # Print the AER error counters of every NIC once per second, for up to a week.
    my @nics = qw(em0 em1 em2 em3 bge0);
    for (my $i = 1; $i <= 604800; $i++) {
        print "\n";
        # system() returns the exit status, not the output; the commands
        # write straight to stdout, so there is nothing to capture.
        system('date');
        for my $nic (@nics) {
            system("/usr/sbin/pciconf -l -c $nic | grep AER");
        }
        sleep(1);
    }

    which outputs:

    Sun Oct 14 13:13:41 CDT 2018
    ecap 0001[100] = AER 1 1 fatal 3 non-fatal 5 corrected
    ecap 0001[100] = AER 1 1 fatal 3 non-fatal 5 corrected
    ecap 0001[100] = AER 1 0 fatal 2 non-fatal 5 corrected
    ecap 0001[100] = AER 1 0 fatal 3 non-fatal 5 corrected
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 0 corrected

    I redirected the output to a text file so I can have a second by second account of the state of the em0-em3 and bge0 interfaces, to see if PCI errors (and what kind and how many) occur second(s) before dpinger makes its syslog entry about the gateway dropping.
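
    A minimal sketch for scanning such a log for the moments when a counter changed, rather than reading it line by line. It assumes the layout shown above: a `date` line followed by one AER line per interface, in the order em0, em1, em2, em3, bge0. The helper name `aer_changes` and the log path in the usage note are hypothetical.

```shell
# Report every sample in which an interface's AER counters differ from the
# previous sample. Input: the redirected output of the polling script.
aer_changes() {
    awk '
        BEGIN { n = split("em0 em1 em2 em3 bge0", nic, " ") }
        # Compare the sample just collected with the previous one.
        function compare(   i) {
            if (ts == "") return
            for (i = 1; i <= n; i++) {
                if (seen && cur[i] != prev[i])
                    printf "%s %s: %s -> %s\n", ts, nic[i], prev[i], cur[i]
                prev[i] = cur[i]
                cur[i] = ""
            }
            seen = 1
        }
        /^[ \t]*$/ { next }
        /AER/ {
            idx++
            line = $0
            sub(/^.*AER [0-9]+ /, "", line)   # keep only the three counters
            cur[idx] = line
            next
        }
        # Any other non-blank line is treated as the timestamp of a new sample.
        { compare(); ts = $0; idx = 0 }
        END { compare() }
    ' "$@"
}
```

    Usage would be something like `aer_changes /root/aer.log`. A sample with a missing line (for example, if pciconf produced no output for one interface) also shows up as a change, with an empty right-hand side.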





  • So I waited a while until a crash. dpinger says the interface crashed at 16:57:23. My script stopped logging a full minute earlier at 16:56:10; maybe it was hanging on the system call to pciconf? The log I made found 2 additional fatal errors though, on em2 (nothing plugged in) and em3 (MikroTik router). So we went from:

    em0 - 1 fatal 3 non-fatal 5 corrected
    em1 - 1 fatal 3 non-fatal 5 corrected
    em2 - 0 fatal 2 non-fatal 5 corrected
    em3 - 0 fatal 3 non-fatal 5 corrected
    bge0 - 0 fatal 0 non-fatal 0 corrected

    to

    em0 - 1 fatal 3 non-fatal 5 corrected
    em1 - 1 fatal 3 non-fatal 5 corrected
    em2 - 1 fatal 2 non-fatal 5 corrected
    em3 - 1 fatal 3 non-fatal 5 corrected
    bge0 - 0 fatal 0 non-fatal 0 corrected

    But this error happened at 14:18:43, about 2 hours 40 minutes before the eventual crash. After I rebooted again without making any changes (my son was trying to play time-sensitive games), the machine crashed 2 more times inside 10 minutes.

    Oh well, after the third crash/reboot, I swapped the NIC out and put the replacement in a different PCI slot. dpinger logged packet loss on the WAN interface after that, but it hasn't dropped the interface altogether yet after 30 minutes, knock on wood.

    @BFEITELL I thought about MSI maybe causing problems. The dmesg I have above shows the USB device having trouble:

    xhci0: Unable to map MSI-X table

    but I don't know if that would matter? I could disable all the USB for that matter, I only need it for booting to install.



  • I neglected to say, my little perl script logged PCI status once every second (57,000+ lines) until mysteriously hanging/stopping one minute short of the crash. I doubt that's a coincidence.



  • @rediske said in Intel Interface Issues:

    I neglected to say, my little perl script logged PCI status once every second (57,000+ lines) until mysteriously hanging/stopping one minute short of the crash. I doubt that's a coincidence.

    Well, let's say the device driver, and most probably the related NIC, goes down at that moment, or at least becomes very busy.
    The NIC takes the system with it a couple of moments later.

    Just to rule out outside causes (e.g. a DDoS): can you change your "real" WAN IP?
    Or leave the WAN disconnected for a while.



  • @gertjan said in Intel Interface Issues:

    Well, let's say the device driver, and most probably the related NIC, goes down at that moment, or at least becomes very busy.
    The NIC takes the system with it a couple of moments later.

    Just to rule out outside causes (e.g. a DDoS): can you change your "real" WAN IP?
    Or leave the WAN disconnected for a while.

    I'm sorry, I got a little fast and loose with the term crash. The pfSense router never actually crashes; the Ethernet interfaces just become unresponsive to network traffic (ping, web configurator, etc).

    Since I swapped the NIC out and changed PCI slots, em3 on the second NIC has died twice now. On the first NIC it was em0 that kept dropping. Same config as before: em0 WAN, em1 LAN, em2 empty, em3 MikroTik router for wireless. I see I got a different WAN IP after the reboot last night, but this morning em3 is already down again, and that's on an internal network with very little traffic (wireless for 2 phones and 2 tablets) while my son and I were sleeping.

    Right now it shows em2 and em3 have single fatal PCI errors and the ethernet connection and activity lights on em3 both went dark. I'm writing this on a PC plugged into a switch that's connected to em1 and the WAN is on em0, and those seem to work fine.

    When this happened last night, I unplugged the em3 cable and plugged it back in and got link lights back, but it still wouldn't talk. This morning when I unplugged it and plugged it back in, the lights stayed dark.

    At this point, I think I'm going to reinstall pfSense and maybe try messing with MSI settings. But I'm betting nothing I do will get either of these Intel cards to be stable with this HP PC/mobo. I don't think it's traffic related, as I imaged 2 VMs on my PC at the same time, 60 GB of traffic in 40 min (about 200 Mbit/s on average), and that went fine.
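
    As a sanity check on that figure, a minimal sketch of the arithmetic, assuming decimal units (1 GB = 8000 Mbit). `avg_mbit` is a hypothetical helper name.

```shell
# Average throughput in Mbit/s for a transfer of GB gigabytes in MIN minutes.
avg_mbit() {
    awk -v gb="$1" -v min="$2" 'BEGIN { printf "%.0f\n", gb * 8000 / (min * 60) }'
}

avg_mbit 60 40   # 60 GB in 40 minutes, prints 200
```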

    It just seems after some period of time, anything from an hour to 12 hours, it shuts off one or more ethernet interfaces, sometimes putting messages in the system log and sometimes not.

    I saw these from the latest crash:

    Oct 15 07:24:43 kernel em3: Watchdog timeout Queue[0]-- resetting
    Oct 15 07:24:43 kernel Interface is RUNNING and ACTIVE
    Oct 15 07:24:43 kernel em3: TX Queue 0 ------
    Oct 15 07:24:43 kernel em3: hw tdh = -1, hw tdt = -1
    Oct 15 07:24:43 kernel em3: Tx Queue Status = -2147483648
    Oct 15 07:24:43 kernel em3: TX descriptors avail = 40
    Oct 15 07:24:43 kernel em3: Tx Descriptors avail failure = 5
    Oct 15 07:24:43 kernel em3: RX Queue 0 ------
    Oct 15 07:24:43 kernel em3: hw rdh = -1, hw rdt = -1
    Oct 15 07:24:43 kernel em3: RX discarded packets = 0
    Oct 15 07:24:43 kernel em3: RX Next to Check = 525
    Oct 15 07:24:43 kernel em3: RX Next to Refresh = 524

    That repeated a few times, the last time being:

    Oct 15 07:27:12 kernel em3: Watchdog timeout Queue[0]-- resetting
    Oct 15 07:27:12 kernel Interface is RUNNING and ACTIVE
    Oct 15 07:27:12 kernel em3: TX Queue 0 ------
    Oct 15 07:27:12 kernel em3: hw tdh = -1, hw tdt = -1
    Oct 15 07:27:12 kernel em3: Tx Queue Status = -2147483648
    Oct 15 07:27:12 kernel em3: TX descriptors avail = 58
    Oct 15 07:27:12 kernel em3: Tx Descriptors avail failure = 119
    Oct 15 07:27:12 kernel em3: RX Queue 0 ------
    Oct 15 07:27:12 kernel em3: hw rdh = -1, hw rdt = -1
    Oct 15 07:27:12 kernel em3: RX discarded packets = 0
    Oct 15 07:27:12 kernel em3: RX Next to Check = 0
    Oct 15 07:27:12 kernel em3: RX Next to Refresh = 0

    And now it's 10 AM and there have been no kernel errors since.
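
    A small sketch for summarizing how often each interface hit the watchdog and when it last did so, assuming the syslog line format shown above. `watchdog_summary` is a hypothetical helper name.

```shell
# Count "Watchdog timeout" resets per interface and record the last timestamp.
watchdog_summary() {
    awk '
        /Watchdog timeout/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^[a-z]+[0-9]+:$/) {          # e.g. "em3:"
                    nic = substr($i, 1, length($i) - 1)
                    count[nic]++
                    last[nic] = $1 " " $2 " " $3
                    break
                }
        }
        END {
            for (nic in count)
                printf "%s resets=%d last=%s\n", nic, count[nic], last[nic]
        }
    ' "$@" | sort
}
```

    It reads log text from a file argument or from standard input. On pfSense 2.4.x the system log is, as far as I know, a circular log, so it would typically be piped in with something like `clog /var/log/system.log | watchdog_summary`.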



  • I left the machine with em3 down, since I don't need wifi anyway, and it's been functioning fine as far as I can tell. Only 4 entries on the system log:

    Oct 15 10:01:07 check_reload_status Syncing firewall
    Oct 15 10:01:07 syslogd exiting on signal 15
    Oct 15 10:01:07 syslogd kernel boot file is /boot/kernel/kernel
    Oct 15 10:01:07 pfsense.localdomain nginx: 2018/10/15 10:01:07 [error] 58467#100412: send() failed (54: Connection reset by peer)

    It's been at 1-3% cpu usage and 7% memory, totally normal for a home network with just 1 PC using the web.

    As a refresher, I'm using an AMD A4 PRO-7300B processor (3.8 GHz) in an HP EliteDesk 705 G1 SFF, 6GB RAM 500GB HDD. I did not disable the on board bge0 ethernet and it has nothing plugged into it. I have a single Intel PRO 1000 PT Quad Port 1Gb PCIe Ethernet card and I've tried two different cards in two different slots.

    It'll be a bummer if I can't use the Intel cards. When I researched it, I heard they're usually wonderful for pfSense and I got the pair for $70. There's something sexy about having 8 MAC addresses numbered in a row ;)



  • One idea.

    What happens if you plug the wireless AP into the mainboard NIC?
    My thinking is that if it's an issue between the MikroTik and the Intel card, it might help to run the MikroTik against another NIC type.



    I did not try putting the MikroTik on another port; however, I did try having only two of the Intel interfaces up, as WAN and LAN, and I still wound up having problems.

    For fun, I tried installing ESXi on the machine to put pfSense inside that. ESXi wouldn’t recognize the Intel card at all.