Frequent crashes with APU2
-
Hi,
due to my "LAN stops working" problem I decided to step up to 2.3.1. That worked like a charm (though it did not fix the LAN problem). However, yesterday I stepped up to 2.3.1.a.20160506.0040. In 24 hours the system halted twice. The serial console shows nothing, syslog stops, and no traffic passes. Only a power down/up solves the problem. After reboot I find a crash dump, which I upload. It looks like this:
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x10
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80bf7604
stack pointer = 0x28:0xfffffe00458f8310
frame pointer = 0x28:0xfffffe00458f8330
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 68724 (pfctl)
FreeBSD 10.3-RELEASE-p2 #50 3938f6f(RELENG_2_3): Fri May 6 01:18:07 CDT 2016 root@ce23-amd64-builder:/builder/pfsense/tmp/obj/builder/pfsense/tmp/FreeBSD-src/sys/pfSense
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x0
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80d22566
stack pointer = 0x28:0xfffffe001a38c590
frame pointer = 0x28:0xfffffe001a38c770
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (irq260: igb2:que 0)
FreeBSD 10.3-RELEASE #31 01118b4(RELENG_2_3): Thu Apr 28 03:57:55 CDT 2016 root@ce23-amd64-builder:/builder/pfsense/tmp/obj/builder/pfsense/tmp/FreeBSD-src/sys/pfSense
Any idea how to fix this or how to go back to the 2.3 stream?
I am close to downgrading to 2.2.6 anyways due to the LAN problem. But now with the crashes the entire system is nearly useless…. :-(
Regards,
JP
-
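For comparing several panics like the two above, a small helper can pull out just the faulting address and the process that was running. This is only a sketch: the function name `panic_summary` is made up for illustration, and it assumes the usual layout of a kernel panic message with one `key = value` field per line.

```shell
# Sketch: print one line per panic with the fault address and current process.
# Assumes one "key = value" field per line (spaces or tabs around "=").
panic_summary() {
    awk -F'[ \t]*=[ \t]*' '
        /^fault virtual address/ { addr = $2 }
        /^current process/       { print "addr=" addr " process=" $2 }
    '
}
```

It reads the panic text on standard input, for example `panic_summary < saved-panic.txt`, where the file name is a placeholder for wherever the crash text was saved.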
https://redmine.pfsense.org/issues/6330
-
The kernel hasn't changed in any significant area between 2.3 and 2.3.1, so going back doesn't seem like it'll help. Likely the same root cause in both, if I had to guess. What's the IP (or IPv6) address the crash report came from?
-
My WAN IP changes on every login, and after the last crash I upgraded and received a new one… It was a crash uploaded in the past 60 minutes, I would say.
-
Found them. What do you have set for kern.ipc.nmbclusters? Run "sysctl kern.ipc.nmbclusters" if you're not sure.
-
1000000
-
That's fine. There's mbuf in the backtrace, which at times can be indicative of mbuf exhaustion. Wouldn't be the case there though.
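To rule out mbuf exhaustion yourself, the cluster usage can be compared against the kern.ipc.nmbclusters limit. A rough sketch follows; the function name `mbuf_cluster_pct` is hypothetical, and it assumes the `netstat -m` line format `<current>/<cache>/<total>/<max> mbuf clusters in use (current/cache/total/max)` as printed by FreeBSD.

```shell
# Sketch: percentage of the mbuf cluster limit currently allocated.
# Reads `netstat -m` output on stdin; assumes the FreeBSD line format
#   <current>/<cache>/<total>/<max> mbuf clusters in use (current/cache/total/max)
mbuf_cluster_pct() {
    awk '/mbuf clusters in use/ {
        split($1, f, "/")                      # f[3] = total, f[4] = max
        if (f[4] > 0) printf "%d\n", (f[3] * 100) / f[4]
        exit
    }'
}
```

On the firewall this would be run as `netstat -m | mbuf_cluster_pct`. A value that stays far below 100 suggests the limit is not what is being hit.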
I noticed one potentially relevant change. Let's try going back to 2.3 and see what happens. First back up your config and be ready to reinstall just in case, as this isn't widely tested, but it does seem to work fine.
Change your /usr/local/etc/pkg/repos/pfSense.conf to contain the following:
FreeBSD: { enabled: no }

pfSense-core: {
  url: "pkg+http://pkg.pfsense.org/pfSense_v2_3_0_amd64-core",
  mirror_type: "srv",
  signature_type: "fingerprints",
  fingerprints: "/usr/local/share/pfSense/keys/pkg"
  enabled: yes
}

pfSense: {
  url: "pkg+http://pkg.pfsense.org/pfSense_v2_3_0_amd64-pfSense_v2_3_0",
  mirror_type: "srv",
  signature_type: "fingerprints",
  fingerprints: "/usr/local/share/pfSense/keys/pkg"
  enabled: yes
}
Then run:
pkg update -f
pkg upgrade -fy
and reboot when it's done.
-
Ok.
Will download 2.2.6 and 2.3 images later, backup and try to step back. Will keep you posted. Might take a few hours.
Thanks for the great support!!!!
-
Downgrade worked however the kernel is still
pfSense-kernel-pfSense-2.3.1.a.20160506.1958 pfSense kernel (pfSense)
I got a "is locked" message for this package during upgrade (or rather downgrade). So the one thing that needed downgrading is not… :-(
-
ok. pkg unlock and a few more tries resulted in a 2.3 system with kernel
pfSense-kernel-pfSense-2.3 pfSense kernel (pfSense)
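To confirm which kernel package is actually installed after a downgrade like this, the version can be read out of the package listing. A minimal sketch, assuming the `name-version comment` line format shown above; the function name `kernel_pkg_version` is made up for illustration.

```shell
# Sketch: extract the version from a pfSense kernel package line, e.g.
#   "pfSense-kernel-pfSense-2.3 pfSense kernel (pfSense)"  ->  "2.3"
# Reads a package listing on stdin; other lines are ignored.
kernel_pkg_version() {
    sed -n 's/^pfSense-kernel-pfSense-\([^ ]*\).*/\1/p'
}
```

On the firewall this could be fed from `pkg info`, as in `pkg info | kernel_pkg_version`. If the old version is still shown, `pkg lock -l` lists locked packages, which is a quick way to see whether the kernel package is the one being held back.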
Let's wait and see.
-
Next crash this time under 2.3. However after reboot no crashlogs/dumps… Will keep looking but probably step down to 2.2.6 tomorrow... Need a stable system next week.
-
Just out of curiosity, what are the specs of the hardware you are running pfSense on?
This issue may boil down to an incompatibility with some hardware in the system, so details would be appreciated.
Regards,
Jorge M. Oliveira
-
Sure,
Standard apu 1c with an msata hdd.
-
Have the same problem with version 2.3 and 2.3.1 with APU 1d4 Board AMD G-T40E Processor. Interrupts 50%. No traffic on LAN and WAN.
Only power off is possible and then power on.
-
The third crash was also a completely different backtrace from the first two. I'm guessing that's some kind of issue after disabling additional cores maybe in combination with https://redmine.pfsense.org/issues/6296
It's definitely not caused by upgrading to 2.3.1. I suspect once the root IPsec issue is fixed and you're not disabling the additional cores, the crashes will be gone.
"Have the same problem with version 2.3 and 2.3.1 with APU 1d4 Board AMD G-T40E Processor. Interrupts 50%. No traffic on LAN and WAN."
https://redmine.pfsense.org/issues/6296
-
Hi cmb,
just experienced a completely new crash with 2.3, four cores enabled. It looks different, as it also gave a backtrace. I uploaded the crash and will give you the details via PM.
Regards,
JP
-
From further review and private conversation, this can be summarized as "frequent crashes with APU2"; it has no relation to 2.3 or 2.3.1. Some of the crashes match things people were getting on APU2 with 2.2.6 as well.
-
Agreed.
However: how to proceed (if not in this thread)? PM? A new thread in another forum (and if so, which one would be the correct one)?
-
You can continue here. I'll move the thread since it's not 2.3.1-related.
The default for AES-NI is off (with the exception of hardware we sell), which is why that one sticks out at me. In the other thread, it wouldn't have been anything config-related as far as certain features in use or not, so anything hardware-related is what came to mind. AES-NI seems most potentially suspect given the remainder of your config.
-
Brilliant support (can't really say that enough)!
I turned it off, will wait for the next reboot and then see how this behaves.
I just checked a config backup from the end of March, and to the best of my knowledge AES-NI was turned on there as well:
<crypto_hardware>aesni</crypto_hardware>
And as mentioned in the PM the crashes seem to have started in May (or end of April) after several changes to the config but this part was left untouched. Still I will see what happens and let you know!
Regards,
JP
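For anyone wanting to check old config backups the same way, the setting can be read straight out of the XML. This is a sketch only: the function name `crypto_hardware_setting` is hypothetical, and it assumes the `<crypto_hardware>` element sits on a single line as in the excerpt above.

```shell
# Sketch: print the value of <crypto_hardware> from a pfSense config file.
# Assumes the element is on one line; prints nothing if it is absent.
crypto_hardware_setting() {
    sed -n 's:.*<crypto_hardware>\([^<]*\)</crypto_hardware>.*:\1:p' "$1"
}
```

Called as `crypto_hardware_setting /cf/conf/config.xml` for the running config (that path is an assumption about a default install) or with the path of a backup file. An empty result would mean no crypto hardware is configured in that file.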