PFsense hanging since version 2.4.4



  • The computer hung again, after 1 day and 8 hours. I will update to 2.4.4-p2, and if it hangs again I will connect it to another PSU, testing one thing at a time. I'll report back here in case it helps others :)


  • Rebel Alliance Netgate Administrator

    It sounds like it's hardware.

    I'm not in your shoes, but chasing hardware faults is difficult at best sometimes.

    When I had this happen 10 years ago, in my home, my wife insisted I fix it... I ended up buying new hardware as that was the fastest path to resolution. Good luck!



  • It hung again, so now I'm running on another PSU. I'm also running a memory test with Memtest86+ and will keep you updated :) When it hung, the console screen was just frozen, with no special messages.

    If it freezes again, the next step is to run it from a USB stick, I guess.
    If it freezes then too, I will test older releases of pfSense.


  • Rebel Alliance Netgate Administrator

    This is a hardware fault.

    If you have another device to swap out I would do so, or if you can run a VM to test I would.



  • @chrismacmahon said in PFsense hanging since version 2.4.4:

    This is a hardware fault.

    If you have another device to swap out I would do so, or if you can run a VM to test I would.

    A VM to test what exactly? A VM of my current installation?

    The router hung again an hour ago. I'm now making a USB stick to try to run my config from it live, if possible.


  • Rebel Alliance Netgate Administrator

    If you have the hardware to spin up a virtual machine, you can import your working config into the VM and run off of that.



  • @chrismacmahon said in PFsense hanging since version 2.4.4:

    If you have the hardware to spin up a virtual machine, you can import your working config into the VM and run off of that.

    OK, I don't have an ESXi machine at home, only at work. I will try to reinstall 2.4.4 on the computer first, or run 2.3 if that fails.



  • I created a USB stick with 2.4.4-p1, but after installing I ran into that crappy serial console bug. I pressed ESC to type "set kern.vty=sc", but when booting into multi-user mode I seem to get a crash; the text scrolls by too fast for me to read. So after pressing ESC I want to boot into single-user mode to be able to see the boot errors. How do I boot into single-user mode from the CLI (ESC)? I have googled for it but cannot find anything, and I tried things like "boot single" without success. Please help!

    I really wanted to try 2.4.4 before I go back and try 2.3 or something else that might work better.


  • Rebel Alliance Netgate Administrator

    Not sure. I would re-burn the image and try again. If it happens again, the hardware is the problem.

    We are fans of Etcher.io



  • Found it... "boot -s" it is :) .. Will try it and review the logs.
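
    For anyone landing here with the same question, the sequence at the loader prompt would look roughly like this. It is a sketch only: it assumes ESC at the boot menu drops you to the loader's `OK` prompt, and the `OK` shown below is that prompt, not something you type.

```
OK set kern.vty=sc
OK boot -s
```

    `boot -s` starts the kernel in single-user mode, so the boot stops at a prompt for a shell instead of continuing into multi-user mode.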



  • I had to run fsck -y a few times because the file system was unclean: I had powered the box off because of the display issue in 2.4.4, not realizing the system would start up anyway.

    So now the system is up on 2.4.4-p1 with basic settings. Nothing is changed except the password. I will let it run like that to see what happens. If it hangs again, I will go back to an old version that I know was working, to rule out software too.

    Does anyone have an older image available (2.4, 2.4.1 or 2.4.2) that I could have for testing?



  • Another thing... i found this...

    Intel Atom systems containing HD Graphics chipsets similar to the Z3700 may experience console problems after the update. Affected systems will boot successfully, but fail to display console output after the boot menu. To fix the problem, add the following lines to /boot/loader.conf.local:
    
    i915kms_load="YES"
    drm.i915.enable_unsupported=1
    Systems with similar console problems not containing a graphics chip supported by the i915 driver may need to reinstall 2.4.4 to use a UEFI console.
    
    Alternately, try using the syscons console instead of VT in /boot/loader.conf.local:
    
    kern.vty=sc
    

    I have been using kern.vty=sc, but I see I can use the i915 lines instead. Could this be the source of the problems I have had? I'm running an Atom with a Z36xx or Z37xx graphics chip.

    Here is pciconf -lv output of my system:

    [2.4.4-RELEASE][root@pfSense.localdomain]/root: pciconf -lv
    hostb0@pci0:0:0:0:      class=0x060000 card=0x0f318086 chip=0x0f008086 rev=0x0e hdr=0x00
        vendor     = 'Intel Corporation'
        device     = 'Atom Processor Z36xxx/Z37xxx Series SoC Transaction Register'
        class      = bridge
        subclass   = HOST-PCI
    vgapci0@pci0:0:2:0:     class=0x030000 card=0x0f318086 chip=0x0f318086 rev=0x0e hdr=0x00
        vendor     = 'Intel Corporation'
        device     = 'Atom Processor Z36xxx/Z37xxx Series Graphics & Display'
        class      = display
        subclass   = VGA
    ahci0@pci0:0:19:0:      class=0x010601 card=0x0f238086 chip=0x0f238086 rev=0x0e hdr=0x00
        vendor     = 'Intel Corporation'
        device     = 'Atom Processor E3800 Series SATA AHCI Controller'
        class      = mass storage
        subclass   = SATA
    xhci0@pci0:0:20:0:      class=0x0c0330 card=0x0f358086 chip=0x0f358086 rev=0x0e hdr=0x00
        vendor     = 'Intel Corporation'
        device     = 'Atom Processor Z36xxx/Z37xxx, Celeron N2000 Series USB xHCI'
        class      = serial bus
        subclass   = USB
    none0@pci0:0:26:0:      class=0x108000 card=0x0f188086 chip=0x0f188086 rev=0x0e hdr=0x00
        vendor     = 'Intel Corporation'
        device     = 'Atom Processor Z36xxx/Z37xxx Series Trusted Execution Engine'
        class      = encrypt/decrypt
    hdac0@pci0:0:27:0:      class=0x040300 card=0x0f048086 chip=0x0f048086 rev=0x0e hdr=0x00
        vendor     = 'Intel Corporation'
        device     = 'Atom Processor Z36xxx/Z37xxx Series High Definition Audio Controller'
        class      = multimedia
        subclass   = HDA
    pcib1@pci0:0:28:0:      class=0x060400 card=0x0f488086 chip=0x0f488086 rev=0x0e hdr=0x01
        vendor     = 'Intel Corporation'
        device     = 'Atom Processor E3800 Series PCI Express Root Port 1'
        class      = bridge
        subclass   = PCI-PCI
    pcib2@pci0:0:28:1:      class=0x060400 card=0x0f4a8086 chip=0x0f4a8086 rev=0x0e hdr=0x01
        vendor     = 'Intel Corporation'
        device     = 'Atom Processor E3800 Series PCI Express Root Port 2'
        class      = bridge
        subclass   = PCI-PCI
    pcib3@pci0:0:28:2:      class=0x060400 card=0x0f4c8086 chip=0x0f4c8086 rev=0x0e hdr=0x01
        vendor     = 'Intel Corporation'
        device     = 'Atom Processor E3800 Series PCI Express Root Port 3'
        class      = bridge
        subclass   = PCI-PCI
    pcib4@pci0:0:28:3:      class=0x060400 card=0x0f4e8086 chip=0x0f4e8086 rev=0x0e hdr=0x01
        vendor     = 'Intel Corporation'
        device     = 'Atom Processor E3800 Series PCI Express Root Port 4'
        class      = bridge
        subclass   = PCI-PCI
    isab0@pci0:0:31:0:      class=0x060100 card=0x0f1c8086 chip=0x0f1c8086 rev=0x0e hdr=0x00
        vendor     = 'Intel Corporation'
        device     = 'Atom Processor Z36xxx/Z37xxx Series Power Control Unit'
        class      = bridge
        subclass   = PCI-ISA
    none1@pci0:0:31:3:      class=0x0c0500 card=0x0f128086 chip=0x0f128086 rev=0x0e hdr=0x00
        vendor     = 'Intel Corporation'
        device     = 'Atom Processor E3800 Series SMBus Controller'
        class      = serial bus
        subclass   = SMBus
    igb0@pci0:1:0:0:        class=0x020000 card=0x00008086 chip=0x15398086 rev=0x03 hdr=0x00
        vendor     = 'Intel Corporation'
        device     = 'I211 Gigabit Network Connection'
        class      = network
        subclass   = ethernet
    igb1@pci0:2:0:0:        class=0x020000 card=0x00008086 chip=0x15398086 rev=0x03 hdr=0x00
        vendor     = 'Intel Corporation'
        device     = 'I211 Gigabit Network Connection'
        class      = network
        subclass   = ethernet
    igb2@pci0:3:0:0:        class=0x020000 card=0x00008086 chip=0x15398086 rev=0x03 hdr=0x00
        vendor     = 'Intel Corporation'
        device     = 'I211 Gigabit Network Connection'
        class      = network
        subclass   = ethernet
    igb3@pci0:4:0:0:        class=0x020000 card=0x00008086 chip=0x15398086 rev=0x03 hdr=0x00
        vendor     = 'Intel Corporation'
        device     = 'I211 Gigabit Network Connection'
        class      = network
        subclass   = ethernet
    
    


  • OK, I added:
    i915kms_load="YES"
    drm.i915.enable_unsupported=1

    to /boot/loader.conf.local

    The text on the monitor connected to the system (VGA) is now much smaller at boot than it was before. Maybe this was the reason for the system freezes? I will run with the default config for 2 days. If there is no freeze, I will load my config into the system and see if everything works as it should.
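
    The two loader lines can also be added from a shell. A minimal sketch, assuming a POSIX `sh`; `add_loader_line` is a hypothetical helper (not a pfSense tool) that appends a line only if it is not already in the file, and the demo writes to a scratch file rather than the real /boot/loader.conf.local.

```shell
#!/bin/sh
# add_loader_line FILE LINE
# Append LINE to FILE unless an identical line is already present.
# (Hypothetical helper for illustration; not part of pfSense.)
add_loader_line() {
    file=$1
    line=$2
    touch "$file" || return 1
    # -q quiet, -x match the whole line, -F treat LINE as a fixed string
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Demonstration against a scratch file, not the real loader config.
demo=$(mktemp)
add_loader_line "$demo" 'i915kms_load="YES"'
add_loader_line "$demo" 'drm.i915.enable_unsupported=1'
add_loader_line "$demo" 'i915kms_load="YES"'   # already present, so nothing is added
cat "$demo"
```

    On the router itself the target would be /boot/loader.conf.local, followed by a reboot so the loader picks the lines up.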



  • This looks promising. I really hope this was the issue, and it seems like it. Uptime is 2 days and 6 hours now. I will let it run until 3 days, then apply my own config.




  • While you're applying, switch to p3.



  • @gertjan said in PFsense hanging since version 2.4.4:

    While you're applying, switch to p3.

    That one seems not to be available, at least not in the stable branch. Are there any improvements in it for the problems I'm having?



  • OK, I think I found the problem. It was in fact not a hardware fault but a software "fault" related to the 2.4.4 releases.

    After installing 2.4.4, boot-up switched to the console display instead of the VGA display. To fix that, I found articles saying I should add:
    kern.vty=sc
    to my /boot/loader.conf.local
    But that seems to have caused the system freezes I have had.

    I have now found another article saying I should add the following to /boot/loader.conf.local
    i915kms_load="YES"
    drm.i915.enable_unsupported=1

    THAT seems to have solved my problems!

    I still have to apply my original config to verify, though. I will do that now. The system has been running fine for 4.5 days.




  • @taz79 said in PFsense hanging since version 2.4.4:

    That one seems not to be available.

    You're right : it should be p2 - sorry.



  • @unb2b said in PFsense hanging since version 2.4.4:

    may i know how old your pc does it supported?

    If you read the thread you will find out.. it has already been mentioned. Computer is from 1st of April 2017 so not very old.



  • My backup with all configuration has now been applied.

    Let's give it a few more days to see if this is a complete solution or not :)



  • OK. After applying my config, the system hung again after 2 days. So I suspect something is wrong with my config since 2.4.4, or my 2 OpenVPN tunnels or something else is causing the system to crash. Any advice on how to go forward? I think I have established that this is not a hardware issue anymore?



  • And it hung again this morning. I reverted back to factory default settings now to get it stable again..



  • @taz79 Maybe you'd want to open the box and see if there are any swelled or popped capacitors. Just a thought.



  • I have replaced the pfSense router with a Dovado router in the meantime. I will see if I can find some other hardware to use instead :( As a last resort I will try an older version of pfSense on the router, in a last hope that it isn't a hardware issue :) But yesterday the router hung again, this time with the default config :(



  • OK. Setups with both 2.3.5 and 2.4.3 locked up and hung with a flickering screen. So, as a few of you said in the beginning: hardware trouble. I guess this PC will end up in the recycle pile :( Thanks for the help and input, everyone! :)



  • Probably gonna try to buy an SG-1100 instead :)



  • So... I now have my first Netgate hardware! :D :D :D



  • Rebel Alliance Netgate Administrator

    Thank you for your purchase!

    It's a nice unit!



  • @chrismacmahon

    It seems to be a nice product! :)

    5 to 6 MB/s of traffic from 4 IPs and CPU load at 23%. Looks good so far :) This is a small network with around 20 devices and a max speed of 117 Mbit/s, so I think it's a walk in the park for this hardware :)




  • I also have this problem, on a box bought in Feb 2019 from Artway, so maybe we are in the same boat. :D
    Only the admin password was changed at install. I rebooted from the command line 3 times after install, and then crashes happen at random: after 40 minutes, 1 hour, 2 hours, etc.
    I have already contacted the manufacturer, but they are not very responsive.

    Bought here:
    https://www.lazada.com.ph/products/-i261594927-s361115350.html

    Manufacturer link:
    http://artwaytech.com/index.php?s=/Android/detail/id/87

    Dump files link:
    https://drive.google.com/open?id=1Z1_nkFzruCgFYMQxunjQ2PkcrpyfiI3W


  • Netgate Administrator

    db:0:kdb.enter.default>  show pcpu
    cpuid        = 3
    dynamic pcpu = 0xfffffe026599f380
    curthread    = 0xfffff8000c007000: pid 14696 "filterlog"
    curpcb       = 0xfffffe022f325cc0
    fpcurthread  = 0xfffff8000c007000: pid 14696 "filterlog"
    idlethread   = 0xfffff80005217620: tid 100006 "idle: cpu3"
    curpmap      = 0xfffff80005f92138
    tssp         = 0xffffffff82bb6948
    commontssp   = 0xffffffff82bb6948
    rsp0         = 0xfffffe022f325cc0
    gs32p        = 0xffffffff82bbd1a0
    ldt          = 0xffffffff82bbd1e0
    tss          = 0xffffffff82bbd1d0
    db:0:kdb.enter.default>  bt
    Tracing pid 14696 tid 100153 td 0xfffff8000c007000
    __mtx_lock_sleep() at __mtx_lock_sleep+0xcd/frame 0xfffffe022f325610
    binsfree() at binsfree+0x2bf/frame 0xfffffe022f325660
    bqrelse() at bqrelse+0xde/frame 0xfffffe022f325690
    ffs_read() at ffs_read+0x281/frame 0xfffffe022f325720
    VOP_READ_APV() at VOP_READ_APV+0x7c/frame 0xfffffe022f325750
    vn_read() at vn_read+0x186/frame 0xfffffe022f3257d0
    vn_io_fault_doio() at vn_io_fault_doio+0x43/frame 0xfffffe022f325830
    vn_io_fault1() at vn_io_fault1+0x161/frame 0xfffffe022f325970
    vn_io_fault() at vn_io_fault+0x198/frame 0xfffffe022f3259e0
    dofileread() at dofileread+0xba/frame 0xfffffe022f325a20
    kern_readv() at kern_readv+0x68/frame 0xfffffe022f325a70
    sys_read() at sys_read+0x86/frame 0xfffffe022f325ac0
    amd64_syscall() at amd64_syscall+0xa38/frame 0xfffffe022f325bf0
    fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe022f325bf0
    --- syscall (3, FreeBSD ELF64, sys_read), rip = 0x800dd2d4a, rsp = 0x7fffffffe678, rbp = 0x7fffffffe690 ---
    db:0:kdb.enter.default>  ps
    

    Hmm, possibly a failing drive.

    Is the crash report the same every time?

    Steve



  • Yes, the crash report is the same every time, and I don't know how to interpret crash logs.
    What makes you say it's possibly a failing drive?
    The mSATA SSD does show garbled characters in the BIOS, though: "SATA PM: ?????? SSD 128GB". I have no spare mSATA drive to swap in yet.
    I added more photos on gdrive for reference.


  • Netgate Administrator

    There are a number of read functions in the backtrace leading up to the crash, which makes me think it's a read error and hence possibly a drive or drive controller issue.
    However, with failing hardware I would expect some variation between crashes. If the backtrace is identical at each crash, something in software is more likely triggering it.

    Steve
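
    One way to check whether the backtrace really is identical between crashes is to reduce each one to its list of function names and compare those. A minimal sketch, assuming the ddb output of each crash has been saved to a text file; `bt_frames` and `same_backtrace` are hypothetical helper names, and the sample lines are the first three frames of the backtrace above.

```shell
#!/bin/sh
# bt_frames: read ddb output on stdin and print one function name per
# stack frame, taken from lines of the form "func() at func+0x.../frame 0x...".
bt_frames() {
    sed -n 's/^[[:space:]]*\([A-Za-z_][A-Za-z0-9_]*\)() at .*$/\1/p'
}

# same_backtrace FILE1 FILE2: succeed only if both files contain at least
# one frame and the sequence of function names is identical.
same_backtrace() {
    a=$(bt_frames < "$1")
    b=$(bt_frames < "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Demo input: three frames copied from the backtrace posted in this thread.
sample=$(mktemp)
cat > "$sample" <<'EOF'
__mtx_lock_sleep() at __mtx_lock_sleep+0xcd/frame 0xfffffe022f325610
binsfree() at binsfree+0x2bf/frame 0xfffffe022f325660
bqrelse() at bqrelse+0xde/frame 0xfffffe022f325690
EOF
bt_frames < "$sample"
```

    With two saved crash reports (hypothetical names crash1.txt and crash2.txt), `same_backtrace crash1.txt crash2.txt && echo identical` would print "identical" only when every frame matches. Addresses and offsets are ignored on purpose, so only the call path is compared.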


  • LAYER 8

    If you can run a system from a USB stick, I'd suggest a memtest, plus smartmontools (https://www.smartmontools.org/wiki/Download) to test the disk.
    https://www.smartmontools.org/wiki/LiveCDs
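
    A rough sketch of what the disk test could look like once smartmontools is available. The device name is an assumption: /dev/ada0 is typical for the first SATA disk on FreeBSD, while a Linux live CD would more likely show /dev/sda. `smart_check` is a hypothetical wrapper name; the `smartctl` options are the standard ones for health, attributes and self-tests.

```shell
#!/bin/sh
# smart_check [DISK]: print SMART health and attributes for DISK and start
# a short self-test. DISK defaults to /dev/ada0 (an assumption; adjust it).
smart_check() {
    disk=${1:-/dev/ada0}
    if [ ! -e "$disk" ]; then
        echo "no such device: $disk"
        return 1
    fi
    if ! command -v smartctl >/dev/null 2>&1; then
        echo "smartctl not installed"
        return 1
    fi
    smartctl -H "$disk"         # overall health self-assessment
    smartctl -A "$disk"         # vendor attributes (reallocated sectors, wear, ...)
    smartctl -t short "$disk"   # start a short self-test in the background
    echo "when the test has finished: smartctl -l selftest $disk"
}

# Example (run as root on the affected box):
#   smart_check /dev/ada0
```

    A drive that reports garbled identity strings may not answer SMART queries sensibly at all, which would itself be a useful data point.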



  • Ran memtest as suggested; see the 'ref' folder on gdrive.
    I've installed ClearOS 7.6.0 on 1 of the 3 devices we bought, and it has been up and running for 3 days now without a crash dump.

    [root@xyz-svr-utm ~]# cat /etc/centos-release
    ClearOS release 7.6.0 (Final)
    [root@xyz-svr-utm ~]# uptime
     10:05:38 up 3 days, 16:46,  2 users,  load average: 0.01, 0.07, 0.08
    

  • LAYER 8

    Can you disable UEFI in the BIOS?
    I also found people with a similar problem who were able to boot without a crash using hw.ibrs_disable=1 and
    vm.pmap.pti=0
    but this will disable the Spectre and Meltdown mitigations.
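
    As a sketch, and assuming these are set the same way as the other loader tunables in this thread (in /boot/loader.conf.local, followed by a reboot), the two lines would be:

```
# /boot/loader.conf.local
# Both lines turn OFF security mitigations (Spectre v2 / Meltdown); for testing only.
hw.ibrs_disable=1
vm.pmap.pti=0
```

    After the reboot, `sysctl hw.ibrs_disable vm.pmap.pti` should show the values that took effect. Remove the lines again once testing is done.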



  • Maybe this hardware from ARTWAY is not stable.
    I disabled secure boot and set the devices to LEGACY only, but UEFI entries still appear in the boot device list.

    1. Clean install choosing the non-UEFI USB installer at boot selection (F11), with only the admin password changed at install. This proceeds as usual (large text in the installation GUI), but after installation, with the USB installer removed and the SSD selected as the BIOS boot device, it says there is no bootable device. Checking the partition shows it as unformatted.

    2. Clean install choosing the UEFI USB installer at boot selection (F11), with only the admin password changed at install. This proceeds as usual (small text in the installation GUI, with a mouse cursor). After installation, with the USB installer removed and the box restarted, it boots and works normally.

    But ClearOS is still up and running after 6 days in production without a crash dump.



  • Tried installing XCP-ng, and it's working fine. Set up a pfSense VM on XCP-ng and it still produces crash dumps.



  • Replaced the RAM and it's now working without crash dumps. Sigh!


  • Netgate Administrator

    Ouch! Nice catch though.

