pfsense crashed
-
Hi,
I've just uploaded a crash report for my pfSense Cluster.
It crashed at 3pm CET; the report should come from IP 217.11x.3x.34.
Could one of the devs please check what happened?
Regards,
Christian
-
Looks like the crash happened in the Broadcom NIC, bce0:
bce0: /builder/ce-243/tmp/FreeBSD-src/sys/dev/bce/if_bce.c(7886): Watchdog timeout occurred, resetting!
<5>bce0: link state changed to DOWN
bce0: Gigabit link up!
<5>bce0: link state changed to UP
Fatal trap 12: page fault while in kernel mode
cpuid = 8; apic id = 32
fault virtual address = 0x18
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff804f1a4b
stack pointer = 0x28:0xfffffe0466c33a90
frame pointer = 0x28:0xfffffe0466c33b20
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (irq256: bce0)
db:0:kdb.enter.default> show pcpu
cpuid = 8
dynamic pcpu = 0xfffffe04d3f1b200
curthread = 0xfffff800098655c0: pid 12 "irq256: bce0"
curpcb = 0xfffffe0466c33cc0
fpcurthread = none
idlethread = 0xfffff80009421000: tid 100011 "idle: cpu8"
curpmap = 0xffffffff82a31918
tssp = 0xffffffff82a62ad0
commontssp = 0xffffffff82a62ad0
rsp0 = 0xfffffe0466c33cc0
gs32p = 0xffffffff82a69328
ldt = 0xffffffff82a69368
tss = 0xffffffff82a69358
db:0:kdb.enter.default> bt
Tracing pid 12 tid 100189 td 0xfffff800098655c0
bce_intr() at bce_intr+0x4fb/frame 0xfffffe0466c33b20
intr_event_execute_handlers() at intr_event_execute_handlers+0xec/frame 0xfffffe0466c33b60
ithread_loop() at ithread_loop+0xd6/frame 0xfffffe0466c33bb0
fork_exit() at fork_exit+0x85/frame 0xfffffe0466c33bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0466c33bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
There is a small chance it's a driver bug, but a better chance it's hardware. Tough to say which though based on that alone.
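For anyone comparing their own crash report against the one above, here is a minimal Python sketch that pulls out the fields pointing at the NIC: the trap type, the fault address, the "current process" line, and the innermost backtrace frame. It is not part of pfSense; the function name `summarize_panic` and the regular expressions are my own assumptions, based only on the text layout shown in this thread.

```python
import re

def summarize_panic(text):
    """Extract a few identifying fields from a FreeBSD panic message.

    Hypothetical helper: the patterns are modelled on the crash text
    quoted in this thread and may not cover other panic formats.
    """
    fields = {}
    m = re.search(r"Fatal trap (\d+): (.+?) while in kernel mode", text)
    if m:
        fields["trap"] = int(m.group(1))
        fields["trap_desc"] = m.group(2)
    m = re.search(r"fault virtual address\s*=\s*(0x[0-9a-fA-F]+)", text)
    if m:
        fields["fault_address"] = m.group(1)
    m = re.search(r"current process\s*=\s*(\d+) \(([^)]*)\)", text)
    if m:
        fields["pid"] = int(m.group(1))
        fields["process"] = m.group(2)
    # The first "name() at name+0x.../frame" entry is the innermost frame.
    m = re.search(r"(\w+)\(\) at \1\+0x[0-9a-fA-F]+/frame", text)
    if m:
        fields["top_frame"] = m.group(1)
    return fields
```

If two reports from different machines give the same `process` and `top_frame`, that at least suggests they are hitting the same code path, though it does not by itself separate a driver bug from a hardware fault.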
-
Hi,
Thanks for the fast response.
Christian
-
Hi,
Today at 8:15am the firewall crashed again, so we activated persistent CARP maintenance mode to let the "slave" firewall take over.
At around 9:26pm CET the slave firewall hit the same issue (crashed and rebooted). I've submitted a crash report for this (at 9:50pm, from IP 217.11x.3x.34).
The "slave" firewall is the same model, a Dell R610. So if this crash report shows the same driver/kernel problem, it is most likely a driver bug, as I don't think both machines are broken in the same way.
It did not happen with 2.4.3; we updated to 2.4.3p1 last Wednesday.
Regards,
Christian
-
Same here, I NEVER had issues with pfSense until I updated to p1. It happened 3 times in the last hour, and now it's completely dead and I can't connect at all. Now I have people on my back and I have to commute to the site to reset it.
Same swap error, system has never had an issue before
Thanks Netgate
-
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x60
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80cc10ef
stack pointer = 0x28:0xfffffe0079672e00
frame pointer = 0x28:0xfffffe0079672e00
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (em2 taskq)
-
@ykazari said in pfsense crashed:
Same swap error, system has never had an issue before
Swap error? I vote for a hardware issue right away.
Next best: a pretty gory OOM error.
Since when do processor instructions (== pfSense) wear out?
Devices break down. It happens all the time.
If you were running Windows on your device, would you thank Microsoft?
-
@gertjan said in pfsense crashed:
Swap error? I vote for a hardware issue right away.
We keep one R610 as a cold spare. Yesterday we swapped it in for the hardware of the first failed firewall and switched back to this master in the evening.
This morning the firewall crashed again.
So we have reproduced this issue on 3 absolutely identical Dell R610s.
All have 16GB RAM, one 4x1GbitE Broadcom onboard, one 2x1GbitE Intel card, the same BIOS, and the same Lifecycle Controller version.
I'm pretty sure that this is an issue with 2.4.3p1.
Christian
-
In case it helps the developers reproduce the error: we had the issue on three identical Dell R610 servers with the exact same setup.
These servers have a LAG with one port on the Broadcom and one port on the Intel network card. On both cards it is the first port. The switches used are Brocade ICX6610.
In the log files we see that the bce0 interface crashes, the firewall starts its CARP failover, and then immediately switches back, because the network interface is only unavailable for 4 seconds. We also saw some crashes happen two or three times in a row, separated by a reboot of the firewall.
For now we have deactivated the Broadcom network interface as a quick-and-dirty workaround.
Does anybody need the log files, or how should we proceed?
Thanks
Alex
-
Hello everyone,
You seem to have the same problem as me. My company's pfSense is also on version 2.4.3.
However, it is an R310, and we use the motherboard's onboard ports for the network.
Here is the error I receive:
bce0: /builder/ce-243/tmp/FreeBSD-src/sys/bce/if_bce.c(7886): Watchdog timeout occurred, resetting!
bce0: Gigabit link up!
The bce0 interface serves as a BGP transit link.
The problem appeared suddenly.
Does the update to pfSense 2.4.4 fix your problem?
Thank you!
-
Do you actually see a crash report like the others in this thread or just the Watchdog error in the system log?
Steve
-
Hello,
Our pfSense router displays the message as a warning, but the network drops each time the warning is issued. After 5-6 warnings, the router eventually crashes completely and I need to restart it.
We have also tried our second interface, bce1, and even tried installing another network card, but the problem persists. I am unsure whether to update pfSense to 2.4.4 to see if the problem comes from there ...
Thank you!
Gabriel
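To put a number on "5-6 warnings", the system log can be scanned for the watchdog message and the hits counted per interface. Below is a minimal Python sketch; the helper name `count_watchdog_timeouts` is hypothetical, and the pattern assumes the message has the `bceN: <path>(<line>): Watchdog timeout ...` shape quoted in this thread, optionally preceded by a syslog timestamp.

```python
import re
from collections import Counter

# Assumed shape: "<iface>: <source path>(<line>): Watchdog timeout ..."
WATCHDOG = re.compile(r"(\w+): \S+\(\d+\): Watchdog timeout")

def count_watchdog_timeouts(lines):
    """Count watchdog timeout messages per interface name."""
    counts = Counter()
    for line in lines:
        m = WATCHDOG.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

Feeding it the lines of a saved copy of the system log gives one count per interface, which makes it easier to see whether only bce0 is affected or other ports as well.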
-
Did it have a crash report though? If so we need to see the panic message and backtrace from it as shown above for those systems.
Steve
-
Hello Stephen,
Could you tell me where to find the message? I'm in a shell on the router, and the crash actually freezes it completely while displaying the same thing many times (/builder/ce-243/tmp/FreeBSD-src/sys/bce/if_bce.c(7886) : Watchdog timeout occurred, resetting!)
Thank you,
Gabriel
-
You would see it reported as an alert on the dashboard in the GUI.
Any crash reports would be in /var/crash.
Steve
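The /var/crash check can also be scripted. This is a minimal Python sketch, assuming a Python interpreter is available on the box or that the directory has been copied to another machine; the function name `list_crash_files` is hypothetical, and it makes no assumption about what the files inside are called.

```python
from pathlib import Path

def list_crash_files(crash_dir="/var/crash"):
    """Return the regular files in crash_dir, newest first.

    Returns an empty list if the directory does not exist.
    """
    d = Path(crash_dir)
    if not d.is_dir():
        return []
    files = [p for p in d.iterdir() if p.is_file()]
    # Sort by modification time so the most recent report comes first.
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```

An empty result means no crash dump was saved in that directory, which is one way to answer the "crash report or just the watchdog message?" question.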
-
Thank you Steven,
In the end we virtualized pfSense with Proxmox.
Gab