pfsense crashed



  • Hi,

    I've just uploaded a crash report for my pfSense cluster.
    It crashed at 3pm CET; the report should come from IP 217.11x.3x.34.
    Could one of the devs please check what happened?

    regards,
    Christian


  • Rebel Alliance Developer Netgate

    Looks like the crash happened in the Broadcom NIC, bce0:

    bce0: /builder/ce-243/tmp/FreeBSD-src/sys/dev/bce/if_bce.c(7886): Watchdog timeout occurred, resetting!
    <5>bce0: link state changed to DOWN
    bce0: Gigabit link up!
    <5>bce0: link state changed to UP
    
    
    Fatal trap 12: page fault while in kernel mode
    cpuid = 8; apic id = 32
    fault virtual address	= 0x18
    fault code		= supervisor read data, page not present
    instruction pointer	= 0x20:0xffffffff804f1a4b
    stack pointer	        = 0x28:0xfffffe0466c33a90
    frame pointer	        = 0x28:0xfffffe0466c33b20
    code segment		= base 0x0, limit 0xfffff, type 0x1b
    			= DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags	= interrupt enabled, resume, IOPL = 0
    current process		= 12 (irq256: bce0)
    
    db:0:kdb.enter.default>  show pcpu
    cpuid        = 8
    dynamic pcpu = 0xfffffe04d3f1b200
    curthread    = 0xfffff800098655c0: pid 12 "irq256: bce0"
    curpcb       = 0xfffffe0466c33cc0
    fpcurthread  = none
    idlethread   = 0xfffff80009421000: tid 100011 "idle: cpu8"
    curpmap      = 0xffffffff82a31918
    tssp         = 0xffffffff82a62ad0
    commontssp   = 0xffffffff82a62ad0
    rsp0         = 0xfffffe0466c33cc0
    gs32p        = 0xffffffff82a69328
    ldt          = 0xffffffff82a69368
    tss          = 0xffffffff82a69358
    db:0:kdb.enter.default>  bt
    Tracing pid 12 tid 100189 td 0xfffff800098655c0
    bce_intr() at bce_intr+0x4fb/frame 0xfffffe0466c33b20
    intr_event_execute_handlers() at intr_event_execute_handlers+0xec/frame 0xfffffe0466c33b60
    ithread_loop() at ithread_loop+0xd6/frame 0xfffffe0466c33bb0
    fork_exit() at fork_exit+0x85/frame 0xfffffe0466c33bf0
    fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0466c33bf0
    --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
    

    There is a small chance it's a driver bug, but a better chance it's hardware. Tough to say which though based on that alone.
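
    When several boxes upload reports, it can help to reduce each trap header to a few comparable fields before deciding between "driver" and "hardware". The following is only a minimal sketch in Python, assuming the report text looks like the dump above. The names `parse_trap` and `crash_signature` are hypothetical helpers, not part of pfSense or FreeBSD tooling.

```python
import re


def parse_trap(report):
    """Collect the 'key = value' lines that follow a 'Fatal trap' header."""
    fields = {}
    last_key = None
    in_trap = False
    for line in report.splitlines():
        text = line.strip()
        if text.startswith("Fatal trap"):
            in_trap = True
            fields["trap"] = text
            continue
        if not in_trap:
            continue
        if "=" not in text:
            break  # a blank line or debugger prompt ends the header
        # A line such as 'cpuid = 8; apic id = 32' carries two pairs.
        for part in text.split(";"):
            if "=" not in part:
                continue
            key, _, value = part.partition("=")
            key, value = key.strip(), value.strip()
            if key:
                fields[key] = value
                last_key = key
            elif last_key:
                # A line starting with '=' continues the previous field.
                fields[last_key] += ", " + value
    return fields


def crash_signature(fields):
    """Return (thread name, fault code) as a rough key for comparing reports."""
    match = re.search(r"\((.*)\)", fields.get("current process", ""))
    thread = match.group(1) if match else None
    return thread, fields.get("fault code")


if __name__ == "__main__":
    # Shortened copy of the header above, for illustration only.
    demo = (
        "Fatal trap 12: page fault while in kernel mode\n"
        "fault virtual address\t= 0x18\n"
        "fault code\t\t= supervisor read data, page not present\n"
        "current process\t\t= 12 (irq256: bce0)\n"
    )
    print(crash_signature(parse_trap(demo)))
```

    Two reports with the same signature from different machines would point towards software rather than one failing NIC, but a matching signature alone does not prove it.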



  • Hi,

    Thanks for the fast response.

    Christian



  • Hi,

    Today at 8:15am the firewall crashed again, so we enabled persistent CARP maintenance mode to let the "slave" firewall take over.
    At around 9:26pm CET, the slave firewall hit the same issue (crashed and rebooted). I've posted a crash report for this (at 9:50pm, from IP 217.11x.3x.34).
    The "slave" firewall is the same model, a Dell R610. So if this crash report shows the same driver/kernel problem, it is most likely a driver bug, as I don't think both machines would be broken in the same way.
    It did not happen with 2.4.3; we updated to 2.4.3p1 last Wednesday.

    regards,
    Christian



  • Same here, I have NEVER had issues with pfSense until I updated to p1. It happened 3 times in the last hour and now it's completely dead and I can't connect at all. Now I have people on my back and I have to commute to the site to reset it.

    Same swap error, system has never had an issue before

    Thanks Netgate



  • Fatal trap 12: page fault while in kernel mode
    cpuid = 0; apic id = 00
    fault virtual address	= 0x60
    fault code		= supervisor read data, page not present
    instruction pointer	= 0x20:0xffffffff80cc10ef
    stack pointer	        = 0x28:0xfffffe0079672e00
    frame pointer	        = 0x28:0xfffffe0079672e00
    code segment		= base 0x0, limit 0xfffff, type 0x1b
    			= DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags	= interrupt enabled, resume, IOPL = 0
    current process		= 0 (em2 taskq)
    


  • @ykazari said in pfsense crashed:

    Same swap error, system has never had an issue before

    Swap error? I vote for a hardware issue right away.
    Next best guess: a pretty gory OOM error.

    Since when do processor instructions (== pfSense) wear out?
    Devices break down. It happens all the time.

    If you were running Windows on your device, would you thank Microsoft?



  • @gertjan said in pfsense crashed:

    Swap error? I vote for a hardware issue right away.

    We have one R610 in cold reserve. Yesterday we swapped it in for the first failed firewall and switched back to this master in the evening.
    This morning the firewall crashed again.
    So we have reproduced this issue on 3 absolutely identical Dell R610s.
    All have 16GB RAM, one 4x1GbitE Broadcom onboard, one 2x1GbitE Intel card, the same BIOS, and the same Lifecycle Controller version.

    I'm pretty sure that this is an issue with 2.4.3p1.

    Christian



  • In case it helps the developers reproduce the error: we have had the issue on three similar Dell R610 servers with the exact same setup.
    These servers have a LAG bond with one port on the Broadcom and one port on the Intel network card. On both cards it is the first port. The switches used are Brocade ICX6610.

    In the log files we see that the bce0 interface crashes, and that the firewall starts its CARP failover and then immediately fails back, because the network interface is only unavailable for 4 seconds. We also saw some crashes happen two or three times in a row, separated by a reboot of the firewall.

    For now we have deactivated the Broadcom network interface as a quick and dirty workaround.

    Does anybody need the log files, or how should we proceed?

    Thanks

    Alex


 

© Copyright 2002 - 2018 Rubicon Communications, LLC | Privacy Policy