Frequent crashes with APU2



  • Hi,

    Due to my "LAN stops working" problem, I decided to step up to 2.3.1. It worked like a charm (though it didn't fix the LAN problem). However, yesterday I stepped up to 2.3.1.a.20160506.0040, and within 24 hours the system halted twice. The serial console shows nothing, syslog stops, and there is no traffic at all. Only a power cycle solves the problem. After each reboot I find a crash dump, which I uploaded. They look like this:

    
    Fatal trap 12: page fault while in kernel mode
    cpuid = 0; apic id = 00
    fault virtual address	= 0x10
    fault code		= supervisor read data, page not present
    instruction pointer	= 0x20:0xffffffff80bf7604
    stack pointer	        = 0x28:0xfffffe00458f8310
    frame pointer	        = 0x28:0xfffffe00458f8330
    code segment		= base 0x0, limit 0xfffff, type 0x1b
    			= DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags	= interrupt enabled, resume, IOPL = 0
    current process		= 68724 (pfctl)
    FreeBSD 10.3-RELEASE-p2 #50 3938f6f(RELENG_2_3): Fri May  6 01:18:07 CDT 2016
        root@ce23-amd64-builder:/builder/pfsense/tmp/obj/builder/pfsense/tmp/FreeBSD-src/sys/pfSense
    
    
    Fatal trap 12: page fault while in kernel mode
    cpuid = 0; apic id = 00
    fault virtual address	= 0x0
    fault code		= supervisor read data, page not present
    instruction pointer	= 0x20:0xffffffff80d22566
    stack pointer	        = 0x28:0xfffffe001a38c590
    frame pointer	        = 0x28:0xfffffe001a38c770
    code segment		= base 0x0, limit 0xfffff, type 0x1b
    			= DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags	= interrupt enabled, resume, IOPL = 0
    current process		= 12 (irq260: igb2:que 0)
    FreeBSD 10.3-RELEASE #31 01118b4(RELENG_2_3): Thu Apr 28 03:57:55 CDT 2016
        root@ce23-amd64-builder:/builder/pfsense/tmp/obj/builder/pfsense/tmp/FreeBSD-src/sys/pfSense
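    When comparing dumps like the two above, the faulting address and the current process are the quickest way to tell whether two panics look alike (here pfctl at 0x10 versus the igb2 queue interrupt thread at 0x0). The following is a minimal sketch that pulls those two fields out of pasted panic text. The function name crash_summary is hypothetical, and it assumes the "name = value" layout shown above.

```shell
#!/bin/sh
# Sketch: summarize a FreeBSD "Fatal trap" panic message read on stdin.
# Assumes lines like "fault virtual address<TAB>= 0x10" and
# "current process<TAB><TAB>= 68724 (pfctl)", as in the dumps above.
crash_summary() {
    awk -F'=' '
        /fault virtual address/ {
            gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim spaces around the value
            addr = $2
        }
        /current process/ {
            gsub(/^[ \t]+|[ \t]+$/, "", $2)
            # One summary line per panic: faulting address plus process
            printf "addr=%s proc=%s\n", addr, $2
        }
    '
}
```

    Usage, after saving the panic text to a file of your choice: crash_summary < saved-panic.txt. Running it over several dumps makes it easy to see whether they share a faulting process.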
    
    

    Any idea how to fix this or how to go back to the 2.3 stream?

    I am close to downgrading to 2.2.6 anyway due to the LAN problem. But now, with the crashes, the entire system is nearly useless… :-(

    Regards,
      JP





  • The kernel hasn't changed in any significant area between 2.3 and 2.3.1, so going back doesn't seem likely to help. If I had to guess, it's the same root cause in both. What's the IP (IPv4 or IPv6) address the crash report came from?



  • My WAN IP changes on every login, and after the last crash I upgraded and received a new one… I would say the crash was uploaded within the past 60 minutes.



  • Found them. What do you have set for kern.ipc.nmbclusters? Run "sysctl kern.ipc.nmbclusters" if you're not sure.



  • 1000000



  • That's fine. There's mbuf in the backtrace, which can sometimes indicate mbuf exhaustion, but that wouldn't be the case here.
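    To check for mbuf exhaustion directly, you can compare cluster usage with the kern.ipc.nmbclusters limit and look at the denied-request counters. Here is a minimal sketch that summarizes the output of FreeBSD's "netstat -m". It assumes the "current/cache/total/max mbuf clusters in use" and "requests for mbufs denied" line formats, and the function name mbuf_cluster_usage is hypothetical.

```shell
#!/bin/sh
# Sketch: summarize mbuf cluster usage from FreeBSD "netstat -m" output on stdin.
mbuf_cluster_usage() {
    awk '
        / mbuf clusters in use / {
            # First field is current/cache/total/max, e.g. 1024/1016/2040/1000000;
            # max corresponds to kern.ipc.nmbclusters
            split($1, f, "/")
            pct = (f[4] > 0) ? 100 * f[3] / f[4] : 0
            printf "clusters: current %d, total %d, max %d (%.1f%% of max)\n", f[1], f[3], f[4], pct
        }
        / requests for mbufs denied / {
            # mbufs/clusters/mbuf+clusters; non-zero means allocations failed
            printf "denied: %s\n", $1
        }
    '
}
```

    Usage: netstat -m | mbuf_cluster_usage. Non-zero denied counts, or a total close to the max, would point toward exhaustion; with a limit of 1000000 that is unlikely, as noted above.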

    I noticed one potentially relevant change. Let's try going back to 2.3 and see what happens. First back up your config and be ready to reinstall just in case, as this isn't widely tested, though it does seem to work fine.

    Change your /usr/local/etc/pkg/repos/pfSense.conf to contain the following:

    FreeBSD: { enabled: no }
    
    pfSense-core: {
      url: "pkg+http://pkg.pfsense.org/pfSense_v2_3_0_amd64-core",
      mirror_type: "srv",
      signature_type: "fingerprints",
      fingerprints: "/usr/local/share/pfSense/keys/pkg",
      enabled: yes
    }
    
    pfSense: {
      url: "pkg+http://pkg.pfsense.org/pfSense_v2_3_0_amd64-pfSense_v2_3_0",
      mirror_type: "srv",
      signature_type: "fingerprints",
      fingerprints: "/usr/local/share/pfSense/keys/pkg",
      enabled: yes
    }
    
    

    Then run:

    pkg update -f
    pkg upgrade -fy
    

    and reboot when it's done.



  • Ok.

    I will download the 2.2.6 and 2.3 images later, back up, and try to step back. Will keep you posted. It might take a few hours.

    Thanks for the great support!!!!



  • The downgrade worked; however, the kernel is still

    pfSense-kernel-pfSense-2.3.1.a.20160506.1958 pfSense kernel (pfSense)

    I got an "is locked" message for this package during the upgrade (or rather downgrade). So the one thing that needed downgrading wasn't downgraded… :-(



  • OK. Running pkg unlock and a few more tries resulted in a 2.3 system with kernel

    pfSense-kernel-pfSense-2.3    pfSense kernel (pfSense)

    Let's wait and see.
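    Since pkg upgrade skips a locked package, it helps to confirm both the kernel package's version and its lock status after a downgrade. A minimal sketch, assuming the input comes from pkg query with the %v (version) and %k (lock status, 1 = locked) placeholders. kernel_check is a hypothetical helper name, and treating a "_N" port-revision suffix as a match is an assumption.

```shell
#!/bin/sh
# Sketch: check "<version> <locked>" (from pkg query '%v %k' <pkg>) on stdin
# against the expected version given as $1.
kernel_check() {
    target=$1
    read -r ver locked || return 2
    if [ "$locked" = "1" ]; then
        echo "locked: pkg upgrade will not replace it"
    fi
    case $ver in
        # Exact match, or the same version with a port revision like 2.3_1
        "$target" | "$target"_*) echo "ok: kernel $ver"; return 0 ;;
        *) echo "mismatch: kernel $ver, expected $target"; return 1 ;;
    esac
}
```

    Usage: pkg query '%v %k' pfSense-kernel-pfSense | kernel_check 2.3. If it reports the package as locked, "pkg unlock -y pfSense-kernel-pfSense" followed by "pkg upgrade -fy" should let the forced reinstall replace it.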



  • Next crash, this time under 2.3. However, after the reboot there were no crash logs or dumps… I'll keep looking but will probably step down to 2.2.6 tomorrow. I need a stable system next week.



  • Just out of curiosity, what are the specs of the hardware on which you are running pfSense?

    This issue may boil down to an incompatibility with some hardware in the system, so details would be appreciated.

    Regards,
    Jorge M. Oliveira



  • Sure,

    Standard APU 1C with an mSATA HDD.



  • I have the same problem with versions 2.3 and 2.3.1 on an APU 1D4 board (AMD G-T40E processor). Interrupts are at 50%, and there is no traffic on LAN or WAN.

    The only way out is to power off and then power on again.
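    High interrupt load like this is usually tracked down with "vmstat -i", which lists each interrupt source with its total count and per-second rate. A minimal sketch that prints only the sources above a rate threshold. It assumes the rate is the last column and skips the header and Total lines; both the helper name busy_interrupts and the 1000/s default are illustrative choices, not pfSense settings.

```shell
#!/bin/sh
# Sketch: list interrupt sources from "vmstat -i" output (stdin) whose
# rate column exceeds $1 per second (default 1000, an arbitrary example).
busy_interrupts() {
    limit=${1:-1000}
    awk -v limit="$limit" '
        NR == 1 || $1 == "Total" { next }   # skip header and summary lines
        $NF + 0 > limit {
            # Source names can contain spaces ("igb2:que 0"), so rebuild the
            # name from every field except the last two (total, rate)
            name = $1
            for (i = 2; i <= NF - 2; i++) name = name " " $i
            printf "%s rate=%s/s\n", name, $NF
        }
    '
}
```

    Usage: vmstat -i | busy_interrupts 5000. Note that vmstat -i reports the average rate since boot, so a short burst may not stand out.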



  • The third crash also had a completely different backtrace from the first two. I'm guessing it's some kind of issue caused by disabling the additional cores, maybe in combination with https://redmine.pfsense.org/issues/6296

    It's definitely not something that started with the upgrade to 2.3.1. I suspect that once the root IPsec issue is fixed and you're no longer disabling the additional cores, the crashes will be gone.

    @DESIGN-COMPANY:

    Have the same problem with version 2.3 and 2.3.1 with APU 1d4 Board AMD G-T40E Processor. Interrupts 50%. No traffic on LAN and WAN.

    https://redmine.pfsense.org/issues/6296



  • Hi cmb,

    I just experienced a completely new crash with 2.3, with all four cores enabled. It looks different, as it also gave a backtrace. I uploaded the crash and will give you the details via PM.

    Regards,
      JP



  • From further review and private conversation, this can be summarized as "frequent crashes with APU2"; it has no relation to 2.3 or 2.3.1. Some of the crashes match what people were seeing on APU2 with 2.2.6 as well.



  • Agreed.

    However, how should we proceed, if not in this thread? By PM? In a new thread in another forum (and if so, which one would be the right one)?



  • You can continue here. I'll move the thread since it's not 2.3.1-related.

    The default for AES-NI is off (except on hardware we sell), which is why that setting stands out to me. In the other thread, the problem didn't look config-related in terms of which features were or weren't in use, so hardware-related settings were what came to mind. Given the rest of your config, AES-NI seems the most likely suspect.



  • Brilliant support (can't really say that enough)!

    I turned it off and will wait for the next reboot, then see how it behaves.

    I just checked a config backup from the end of March, and to the best of my knowledge AES-NI was turned on there as well:

    <crypto_hardware>aesni</crypto_hardware>

    As mentioned in the PM, the crashes seem to have started in May (or at the end of April) after several changes to the config, but this setting was left untouched. Still, I will see what happens and let you know!
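    To compare this setting across backups and the running config, here is a minimal sketch that prints the crypto_hardware value from a pfSense config.xml, or "none" if the element is absent. It uses a plain sed match rather than an XML parser, so it assumes the element sits on one line as in the backup above; crypto_hw is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the <crypto_hardware> value from the config file given as $1.
# Prints "none" if the element is missing (or the file can't be read).
crypto_hw() {
    val=$(sed -n 's:.*<crypto_hardware>\(.*\)</crypto_hardware>.*:\1:p' "$1" | head -n 1)
    echo "${val:-none}"
}
```

    Usage: crypto_hw /conf/config.xml (the usual pfSense config location, assumed here), then the same on the March backup. Since the aesni kernel module may stay loaded until the next reboot, "kldstat | grep aesni" shows whether it is actually still active.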

    Regards,
      JP



  • No change. Still crashing.



  • @j.koopmann:

    No change. Still crashing.

    I wonder whether temps are getting high? I had one freeze up because I had mounted the heatsink incorrectly in the APU2C4 case :)

    Do you have one of these boards? http://www.pcengines.ch/apu2c4.htm



  • I do, and I had thought of that. I was sure I had mounted it correctly, but hey, it might have come loose. How can I monitor the temperature? I looked today but could not find anything.



  • OK. I found a module and am logging the temperature every minute. So far the CPU temperature is around 59 °C, which should be well within limits…
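    For logging the temperature every minute, here is a minimal sketch that turns a sysctl temperature reading such as "59.0C" into a log line with an ok/HOT flag. It assumes the sensor is exposed by the amdtemp(4) module as dev.amdtemp.0.core0.sensor0 (the name may differ on other boards), and the 70 °C default threshold is an arbitrary example, not a PC Engines limit; temp_status is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: format a temperature reading like "59.0C" (stdin) as a log line.
# $1 is the warning threshold in degrees C (70 is an arbitrary default).
temp_status() {
    limit=${1:-70}
    awk -v limit="$limit" '{
        t = $1
        sub(/C$/, "", t)                      # drop the trailing unit
        state = (t + 0 >= limit) ? "HOT" : "ok"
        printf "cpu_temp=%sC status=%s\n", t, state
    }'
}
```

    Usage, e.g. from a small script run once a minute by cron: sysctl -n dev.amdtemp.0.core0.sensor0 | temp_status 70 | logger -t cputemp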



  • Hi,

    I run 2.3 Update 1 on my APU2 without any problems.
    Uptime: 13 Days 02 Hours 32 Minutes 51 Seconds
    The following packages are installed:
    Backup sysutils 0.4_1
    Cron sysutils 0.3.6_2
    RRD_Summary sysutils 1.3.1_2
    Shellcmd sysutils 1.0.2_2

    Temp is always 56.6 °C and very stable.

