• Hi,

    I hope someone can help. My APU2 has crashed a couple of times recently, and both crashes seem to point to dpinger. The box is new, and the supplier says I will have to roll it back to the latest stable release before they will consider a hardware change, as I am on 2.4.

    I can supply full crash dumps if required. I submitted both crash reports to pfSense, but here is a brief sample.

    Hope someone can offer some help.

    thanks

    1st crash

    Fatal trap 12: page fault while in kernel mode
    cpuid = 2; apic id = 02
    fault virtual address = 0x8
    fault code = supervisor read data, page not present
    instruction pointer = 0x20:0xffffffff80cf0b40
    stack pointer         = 0x28:0xfffffe01206b3680
    frame pointer         = 0x28:0xfffffe01206b36c0
    code segment = base 0x0, limit 0xfffff, type 0x1b
    = DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags = interrupt enabled, resume, IOPL = 0
    current process = 13593 (dpinger)
    FreeBSD 11.0-RELEASE-p6 #53 2ede8a24166(RELENG_2_4): Wed Jan 11 05:33:28 CST 2017
        root@buildbot2.netgate.com:/builder/ce/tmp/obj/builder/ce/tmp/FreeBSD-src/sys/pfSense

    Today's crash

    Fatal trap 12: page fault while in kernel mode
    cpuid = 2; apic id = 02
    fault virtual address = 0x8
    fault code = supervisor read data, page not present
    instruction pointer = 0x20:0xffffffff80cf0b40
    stack pointer         = 0x28:0xfffffe01206a9680
    frame pointer         = 0x28:0xfffffe01206a96c0
    code segment = base 0x0, limit 0xfffff, type 0x1b
    = DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags = interrupt enabled, resume, IOPL = 0
    current process = 96407 (dpinger)
    FreeBSD 11.0-RELEASE-p5 #31 f1e039d(RELENG_2_4): Mon Jan  2 07:57:06 CST 2017
        root@buildbot2.netgate.com:/builder/ce/tmp/obj/builder/ce/tmp/FreeBSD-src/sys/pfSense


  • I have an APU2B2 (firmware 160311) running the 2.4 beta (20161231) fine.
    However, I use relaxed dpinger settings: probe interval 1500, loss interval 7500, alert interval 3000.

  • Rebel Alliance Developer Netgate

    That only indicates it was the active process at the time; it doesn't mean dpinger caused the crash.

    We can't tell anything from only the panic message. We need the backtrace and other parts of the crash dump.


  • No probs, here you go. I did submit these via the automatic uploader, as I thought it might help.

    It would be good to get some feedback or advice, as I don't really want to roll back at the moment unless this is related to the hardware.

    Many thanks

    crash1.txt
    crash2.txt

  • Rebel Alliance Developer Netgate

    db:0:kdb.enter.default>  bt
    Tracing pid 13593 tid 100237 td 0xfffff80075c5e500
    sbcut_internal() at sbcut_internal+0x70/frame 0xfffffe01206b36c0
    sbdestroy() at sbdestroy+0x18/frame 0xfffffe01206b36e0
    sofree() at sofree+0x22a/frame 0xfffffe01206b3710
    soclose() at soclose+0x502/frame 0xfffffe01206b3750
    _fdrop() at _fdrop+0x1a/frame 0xfffffe01206b3770
    closef() at closef+0x2d4/frame 0xfffffe01206b3800
    fdescfree_fds() at fdescfree_fds+0x7d/frame 0xfffffe01206b3840
    fdescfree() at fdescfree+0x6a2/frame 0xfffffe01206b3900
    exit1() at exit1+0x73a/frame 0xfffffe01206b3980
    sys_sys_exit() at sys_sys_exit+0xd/frame 0xfffffe01206b3990
    amd64_syscall() at amd64_syscall+0x4ce/frame 0xfffffe01206b3ab0
    Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe01206b3ab0
    
    

    Seems familiar but I'm not finding anything on it right away. It's close to https://redmine.pfsense.org/issues/4689 but not quite the same.

    And there aren't any notable errors in the message buffer from the crash either, just the panic message.


  • OK, thanks. I am going to run a RAM test in the next day or two just to check that out. It has already had a new SSD just in case, but it's still happening.


  • @hda:

    I have an APU2B2 (firmware 160311) running the 2.4 beta (20161231) fine.
    However, I use relaxed dpinger settings: probe interval 1500, loss interval 7500, alert interval 3000.

    Thanks, I have entered your settings just in case it helps.


  • Did you test your RAM?


  • No, not yet, but that's the first job this weekend. Then, if that's OK, I might roll back to 2.3.2 just to see if it crashes again, as the supplier won't make a hardware change unless I'm on 2.3.2.  :(


  • For reference, I have had no more panics, though I do now have my igb set to use only one queue.

    If you want to see whether reducing igb queues stabilises your box, add this line to /boot/loader.conf.local and reboot:

    hw.igb.num_queues=1
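
A minimal sketch of adding that tunable from a shell, assuming a POSIX `sh` and root access on the firewall. The `add_tunable` helper is hypothetical (not part of pfSense or FreeBSD); it only avoids writing the same line twice if the step is repeated.

```shell
# Hypothetical helper: append a loader tunable only if that exact line
# is not already present in the given file.
add_tunable() {
    file="$1"
    line="$2"
    touch "$file"
    # -x matches the whole line, -F treats the pattern as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# On the firewall this would be run as root, followed by a reboot:
#   add_tunable /boot/loader.conf.local 'hw.igb.num_queues=1'
```

After the reboot, `kenv hw.igb.num_queues` should print the value the loader passed to the kernel, which is a quick way to confirm the setting was picked up.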
    

    Odd that the supplier trusts 2.3 but not 2.4, as neither is a stable version.


  • Thanks. Might try this at some point.

    8-hour RAM test: 3 full passes with no errors, so it's time to rebuild on 2.3.2 as requested and take it from there.


  • @chrcoluk:

    Odd that the supplier trusts 2.3 but not 2.4, as neither is a stable version.

    2.3 has been quite stable here on all my boxes. My customers would have fled long ago if it were not the case. But I have not lost one account.

    2.4 is not at release state yet, so any "supplier" would play it safe by avoiding it.


  • @chpalmer:

    @chrcoluk:

    Odd that the supplier trusts 2.3 but not 2.4, as neither is a stable version.

    2.3 has been quite stable here on all my boxes. My customers would have fled long ago if it were not the case. But I have not lost one account.

    2.4 is not at release state yet, so any "supplier" would play it safe by avoiding it.

    You need to specify which 2.3: 2.3.2-p1 is the official stable release. 2.3.3-DEVELOPMENT is (as it says) a development build, but it actually has all the fixes and many of the "little" new front-end things that are in 2.4.0-BETA. It is still called DEVELOPMENT because there is no decision yet about if, how or when it may actually become a release for the 2.3.* series. 2.3.3-DEVELOPMENT should not have "underlying" regressions, since it is built on FreeBSD 10.3, which is proven in 2.3.2-p1 (although various ports of underlying stuff have been updated in the 2.3.3-DEVELOPMENT builds).


  • The supplier sent me this message:

    "According to the pfsense download page V2.3.2 is the latest stable version,
    please use that one.

    https://nyifiles.pfsense.org/mirror/downloads/pfSense-CE-2.3.2-RELEASE-amd64.iso.gz

    or

    https://nyifiles.pfsense.org/mirror/downloads/pfSense-CE-2.3.2-RELEASE-2g-amd64-nanobsd.im

    Any comments on this would be appreciated.

    Thanks


  • That is correct. The latest full installer is for 2.3.2.
    After it installs and comes online, it will tell you there is an upgrade available to 2.3.2-p1. You can then upgrade to the "p1".
    Full installer images for 2.3.2-p1 were not made.


  • Thanks. As my APU2 unit is serial only, will the first link work in serial mode?


  • @chrcoluk:

    For reference, I have had no more panics, though I do now have my igb set to use only one queue.

    If you want to see whether reducing igb queues stabilises your box, add this line to /boot/loader.conf.local and reboot:

    hw.igb.num_queues=1
    

    Odd that the supplier trusts 2.3 but not 2.4, as neither is a stable version.

    2.3.2 is classed as stable as it does not have the dev label; 2.3.3 does. It is weird, as my 'live' APU on 2.4 is pretty much a vanilla install, with pfBlocker being the only addition. OK, there are one or two patches in there, but those are around the launch of dhcp6c, for testing.

    I would have said that in SkyECI's case it's the amount of gaming being done by his offspring, but as the last crash was when they should have been asleep, that can be ruled out.

    I can honestly say I've only ever seen a crash once and that was when I was moving back and forth between 2.3 and 2.4 - it worked a couple of times then fell over. Now I use my test unit for dev work and leave the live unit pretty much alone.


  • In some ways I hope it crashes on 2.3.2 as well, because I'm convinced skyeci has a problem unit.

    As you say, we are both solid on 2.4, and I had zero issues on 2.3.3 before I upgraded.


  • 12-hour memory test: checked, all OK
    Heatsink in place properly
    Put the original, as-supplied SSD back in (it crashed on both SSDs I have tried)

    All done. Installed 2.3.2, opted for stable in updates, and on first internet connection it went to
    2.3.2_1.

    No IPv6 support with Sky in this config, but it's fine for now to see if it goes down or not.


  • I'll do a patch for Sky for you in the morning. 😀


  • Thanks Mate  :) :)


  • OK, I stand corrected on 2.3.2 :)

    Let us know how you get on with the downgrade :)


  • Wouldn't this be better in the 2.3 forum?  This is the 2.4 beta forum…


  • The opening post is about 2.4.


  • Thought I would update this anyway. I ran the APU2C4 for one week on 2.3.2-p1 with no crash. I then reinstalled 2.4 with the latest patches. It locked up 24 hours later, with no visible errors and no response via serial, GUI, etc.

    I have now set up a syslog server to try to catch something should it happen again, as I have left it on 2.4.


  • Please set the igb driver to 1 queue, bud. I am confident that will solve your issues. I left you a PM on kitz with how to do it.


  • It's weird though; both nivek and I have had no issues with the APU2s we are running.


  • The igb queuing bug probably only surfaces with certain usage patterns.


  • Probably. He has game players to contend with.  ::)


  • Thanks re igb. As I mentioned, it has locked up once since going back to 2.4 (it did not crash for 1 week on 2.3.2). I have just re-enabled some stuff, so I want to see if it goes down again. If it does and there is no log or other info, I will try your suggestion; for now I am just waiting to see if it crashes with the other settings re-enabled…

    Cheers