Dpinger causing crash on APU2?
-
Hi,
I hope someone can help. My APU2 has had a couple of crashes recently which seem to point to dpinger. The box is new, and the supplier has said I will have to roll it back to the latest stable release before they will consider a hardware change, as I am on 2.4.
I can supply full crash dumps if required, and I submitted both crash reports to pfSense, but here is a brief sample.
Hope someone can offer some help.
thanks
1st crash
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address = 0x8
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80cf0b40
stack pointer = 0x28:0xfffffe01206b3680
frame pointer = 0x28:0xfffffe01206b36c0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 13593 (dpinger)
FreeBSD 11.0-RELEASE-p6 #53 2ede8a24166(RELENG_2_4): Wed Jan 11 05:33:28 CST 2017
root@buildbot2.netgate.com:/builder/ce/tmp/obj/builder/ce/tmp/FreeBSD-src/sys/pfSense
Today's crash
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address = 0x8
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80cf0b40
stack pointer = 0x28:0xfffffe01206a9680
frame pointer = 0x28:0xfffffe01206a96c0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 96407 (dpinger)
FreeBSD 11.0-RELEASE-p5 #31 f1e039d(RELENG_2_4): Mon Jan 2 07:57:06 CST 2017
root@buildbot2.netgate.com:/builder/ce/tmp/obj/builder/ce/tmp/FreeBSD-src/sys/pfSense
-
I have an APU2B2 fw(160311) running 2.4B (20161231) fine.
And/but I use dpinger settings relaxed, Probe I=1500; Loss I=7500; Alert I=3000.
-
That only indicates it was the active process at the time, it doesn't mean dpinger caused the crash.
We can't tell anything from only the panic message. We need the backtrace and other parts of the crash dump.
-
No probs, here you go. I did submit these via the automatic uploader as I thought it might help.
It would be good to get some feedback/advice, as I don't really want to roll back at the moment unless this is related to the hardware.
Many thanks
-
db:0:kdb.enter.default> bt
Tracing pid 13593 tid 100237 td 0xfffff80075c5e500
sbcut_internal() at sbcut_internal+0x70/frame 0xfffffe01206b36c0
sbdestroy() at sbdestroy+0x18/frame 0xfffffe01206b36e0
sofree() at sofree+0x22a/frame 0xfffffe01206b3710
soclose() at soclose+0x502/frame 0xfffffe01206b3750
_fdrop() at _fdrop+0x1a/frame 0xfffffe01206b3770
closef() at closef+0x2d4/frame 0xfffffe01206b3800
fdescfree_fds() at fdescfree_fds+0x7d/frame 0xfffffe01206b3840
fdescfree() at fdescfree+0x6a2/frame 0xfffffe01206b3900
exit1() at exit1+0x73a/frame 0xfffffe01206b3980
sys_sys_exit() at sys_sys_exit+0xd/frame 0xfffffe01206b3990
amd64_syscall() at amd64_syscall+0x4ce/frame 0xfffffe01206b3ab0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe01206b3ab0
Seems familiar but I'm not finding anything on it right away. It's close to https://redmine.pfsense.org/issues/4689 but not quite the same.
And there aren't any notable errors in the message buffer from the crash either, just the panic message.
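For anyone trying to read the trace: the innermost frame is listed first, so reversing it gives the order in which the calls were made. A rough Python sketch that does this is below. The regex and the `parse_frames` helper are made up for illustration and are not part of any pfSense or FreeBSD tooling. Read this way, the trace shows the process exiting and closing its file descriptors, with the fault hit while a socket buffer was being torn down.

```python
import re

# Backtrace frames copied from the DDB "bt" output in this thread.
BACKTRACE = """\
sbcut_internal() at sbcut_internal+0x70/frame 0xfffffe01206b36c0
sbdestroy() at sbdestroy+0x18/frame 0xfffffe01206b36e0
sofree() at sofree+0x22a/frame 0xfffffe01206b3710
soclose() at soclose+0x502/frame 0xfffffe01206b3750
_fdrop() at _fdrop+0x1a/frame 0xfffffe01206b3770
closef() at closef+0x2d4/frame 0xfffffe01206b3800
fdescfree_fds() at fdescfree_fds+0x7d/frame 0xfffffe01206b3840
fdescfree() at fdescfree+0x6a2/frame 0xfffffe01206b3900
exit1() at exit1+0x73a/frame 0xfffffe01206b3980
sys_sys_exit() at sys_sys_exit+0xd/frame 0xfffffe01206b3990
amd64_syscall() at amd64_syscall+0x4ce/frame 0xfffffe01206b3ab0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe01206b3ab0
"""

# Matches "name() at name+0xOFF/frame 0xADDR" (hypothetical pattern for this output only).
FRAME_RE = re.compile(r"(\w+)\(\) at \w+\+(0x[0-9a-f]+)/frame (0x[0-9a-f]+)")


def parse_frames(text):
    """Return (function, offset, frame_pointer) tuples, innermost frame first."""
    return [
        (m.group(1), int(m.group(2), 16), int(m.group(3), 16))
        for m in FRAME_RE.finditer(text)
    ]


frames = parse_frames(BACKTRACE)

# Reverse so the chain reads from the syscall entry down to the faulting function.
call_chain = " -> ".join(name for name, _, _ in reversed(frames))
print(call_chain)
```

The same pattern also works on the trace when it is pasted as one long line, since it scans for matches rather than splitting on newlines.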
-
OK thanks, I am going to run a RAM test in the next day or two just to check that out. It has already had a new SSD just in case, but it's still happening.
-
@hda:
I have an APU2B2 fw(160311) running 2.4B (20161231) fine.
And/but I use dpinger settings relaxed, Probe I=1500; Loss I=7500; Alert I=3000.
Thanks, I have input your settings just in case it helps.
-
Did you test your RAM?
-
No, not yet, but that's the first job this weekend. Then, if that's OK, I might roll back to 2.3.2 just to see if it crashes again, as the supplier won't change it unless it's on 2.3.2. :(
-
For reference, I have had no more panics. I do now have my igb set to only use one queue, though.
If you want to see if reducing igb queues stabilises your box, then add this line to /boot/loader.conf.local and reboot:
hw.igb.num_queues=1
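If you would rather script the change than edit the file by hand, here is a minimal sketch in Python. The `ensure_tunable` helper is a made-up name for illustration, and the demo writes to a temporary file rather than the real /boot/loader.conf.local. It replaces any existing line for the tunable instead of appending a duplicate.

```python
import tempfile
from pathlib import Path


def ensure_tunable(path, name, value):
    """Set name=value in a loader.conf-style file, replacing any existing line for name."""
    p = Path(path)
    lines = p.read_text().splitlines() if p.exists() else []
    # Drop earlier settings of the same tunable; comments and other lines are kept.
    kept = [ln for ln in lines if ln.split("=", 1)[0].strip() != name]
    kept.append(f"{name}={value}")
    p.write_text("\n".join(kept) + "\n")
    return kept


# Demo against a temporary file so nothing under /boot is touched.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "loader.conf.local"
    ensure_tunable(demo, "hw.igb.num_queues", 1)
    ensure_tunable(demo, "hw.igb.num_queues", 1)  # running it twice does not add a second line
    print(demo.read_text())
```

On the real box you would point it at /boot/loader.conf.local and still need to reboot for the setting to take effect.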
Odd that the supplier trusts 2.3 but not 2.4, as neither is a stable version.
-
Thanks. Might try this at some point.
8 hour ram test - 3 full passes with no errors so time to rebuild as 2.3.2 as requested and take it from there.
-
odd supplier trusts 2.3 but not 2.4 as neither is a stable version.
2.3 has been quite stable here on all my boxes. My customers would have fled long ago if it were not the case. But I have not lost one account.
2.4 is not to release state yet so any "supplier" would play it safe by avoiding it.
-
odd supplier trusts 2.3 but not 2.4 as neither is a stable version.
2.3 has been quite stable here on all my boxes. My customers would have fled long ago if it were not the case. But I have not lost one account.
2.4 is not to release state yet so any "supplier" would play it safe by avoiding it.
You need to specify which 2.3: 2.3.2-p1 is the stable official release. 2.3.3-DEVELOPMENT is (as it says) a development build, but it actually has all the fixes and many of the "little" new front-end things that are in 2.4.0-BETA. It is still called DEVELOPMENT because there is no decision yet about if, how or when it may actually become a release for the 2.3.* series. 2.3.3-DEVELOPMENT should not have "underlying" regressions, since it is built on FreeBSD 10.3, which is proven in 2.3.2-p1 (although various ports of underlying stuff have been updated in the 2.3.3-DEVELOPMENT builds).
-
The supplier sent me this message
"According to the pfsense download page V2.3.2 is the latest stable version,
please use that one.
https://nyifiles.pfsense.org/mirror/downloads/pfSense-CE-2.3.2-RELEASE-amd64.iso.gz
or
https://nyifiles.pfsense.org/mirror/downloads/pfSense-CE-2.3.2-RELEASE-2g-amd64-nanobsd.im
Any comments on this appreciated
Thanks
-
That is correct. The latest full installer is for 2.3.2.
After it installs and comes online it will tell you there is an upgrade available to 2.3.2-p1 - you can then upgrade to the "p1".
Full installer images for 2.3.2-p1 were not made.
-
Thanks - as my APU2 unit is serial only, will the first link work in serial mode?
-
for reference I have had no more panic's I do now have my igb set to only use one queue tho.
if you want to see if reducing igb queues stabilises your box then add this line to /boot/loader.conf.local and reboot
hw.igb.num_queues=1
odd supplier trusts 2.3 but not 2.4 as neither is a stable version.
2.3.2 is classed as such as it does not have the dev label. 2.3.3 does. It is weird, as my 'live' APU on 2.4 is pretty much a vanilla install with pfblocker being the only addition. O.K., there are one or two patches in there, but that's around the area of the launch of dhcp6c for testing.
I would have said that in SkyECI's case it's the amount of gaming being done by his offspring, but as the last crash was when they should have been asleep, that can be ruled out.
I can honestly say I've only ever seen a crash once and that was when I was moving back and forth between 2.3 and 2.4 - it worked a couple of times then fell over. Now I use my test unit for dev work and leave the live unit pretty much alone.
-
In some ways I hope it crashes at 2.3.2 as well because I'm convinced skyeci has a problem unit
As you say we are both solid on 2.4 and I had zero issues at 2.3.3 before I upgraded
-
12 hour mem test = check, all OK
Heatsink in place properly
Put the original, as-supplied SSD back in (it crashed on both I have tried)
All done. Installed 2.3.2, opted for stable in updates, and on first internet connection it went to 2.3.2_1.
No IPv6 support with Sky in this config, but it's fine for now to see if it goes down or not.
-
I'll do a patch for sky for you in the morning.