Very serious fault with pfSense
-
I look after about 30-40 pfSense machines. Almost all instances are virtualized using VMware ESXi (versions ranging from 5.1 to 5.5 U2).
Recently, our ISP conducted some planned "maintenance". At the end of the maintenance, everything came back up properly at our end.
Ever since then, we have had some issues with our pfSense instances. The issues range from full-on crashes (can't ping the pfSense, need to do a hard reset) to pfSense rebooting. So far at least 3 of the 30 pfSense instances are doing this.
I've opened a support ticket with the ISP, and they've stated they're "investigating" it, but I thought I'd ask here for some assistance in troubleshooting this issue. I am not sure if this is an ISP fault or a pfSense fault; the maintenance window and the time when we started to have trouble may be a coincidence.
The pfSense instances that are having trouble are located at different sites and use different types of internet connection (some use PPPoE, some use static IP). They are all version 2.1.5. One of the instances was v2.1.3; I have upgraded it to v2.1.5, but it is still doing it.
The log on the pfSense shows this before the restart:
Router #1
Dec 3 20:57:15 router kernel: Origin = "GenuineIntel" Id = 0x6f2 Family = 6 Model = f Stepping = 2
Dec 3 20:57:15 router kernel: CPU: Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz (1861.77-MHz K8-class CPU)
Dec 3 20:57:15 router kernel: Timecounter "i8254" frequency 1193182 Hz quality 0
Dec 3 20:57:15 router kernel: root@pf2_1_1_amd64.pfsense.org:/usr/obj.amd64/usr/pfSensesrc/src/sys/pfSense_SMP.8 amd64
Dec 3 20:57:15 router kernel: FreeBSD 8.3-RELEASE-p16 #0: Mon Aug 25 08:27:11 EDT 2014
Dec 3 20:57:15 router kernel: FreeBSD is a registered trademark of The FreeBSD Foundation.
Dec 3 20:57:15 router kernel: The Regents of the University of California. All rights reserved.
Dec 3 20:57:15 router kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Dec 3 20:57:15 router kernel: Copyright (c) 1992-2012 The FreeBSD Project.
Dec 3 20:57:15 router kernel: Textdump complete.
Dec 3 20:57:15 router kernel: current process = 12 (swi6: task queue)
Dec 3 20:57:15 router kernel: processor eflags = interrupt enabled, resume, IOPL = 0
Dec 3 20:57:15 router kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Dec 3 20:57:15 router kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
Dec 3 20:57:15 router kernel: frame pointer = 0x28:0xffffff800008ba50
Dec 3 20:57:15 router kernel: stack pointer = 0x28:0xffffff800008ba30
Dec 3 20:57:15 router kernel: instruction pointer = 0x20:0xffffffff8076ba4e
Dec 3 20:57:15 router kernel: fault code = supervisor read data, page not present
Dec 3 20:57:15 router kernel: fault virtual address = 0x308
Dec 3 20:57:15 router kernel: cpuid = 0; apic id = 00
Dec 3 20:57:15 router kernel: Fatal trap 12: page fault while in kernel mode
Dec 3 20:57:15 router kernel:
Dec 3 20:57:15 router kernel:
Dec 3 20:57:15 router kernel:
Dec 3 20:57:15 router kernel: acd0: WARNING - PREVENT_ALLOW freeing taskqueue zombie requestWARNING - PREVENT_ALLOW freeing taskqueue zombie request
Dec 3 20:57:15 router kernel: acd0:
Dec 3 20:57:15 router kernel:
acd0: acd0: WARNING - PREVENT_ALLOW taskqueue timeout - completing request directlyWARNING - PREVENT_ALLOW taskqueue timeout - completing request directly
Dec 3 20:57:15 router syslogd: kernel boot file is /boot/kernel/kernel
Dec 3 16:36:24 router lighttpd[25642]: (connections.c.137) (warning) close: 13 Connection reset by peer
Router #2
Dec 3 13:14:43 router kernel: CPU: Intel(R) Xeon(R) CPU 5120 @ 1.86GHz (1880.20-MHz K8-class CPU)
Dec 3 13:14:43 router kernel: Timecounter "i8254" frequency 1193182 Hz quality 0
Dec 3 13:14:43 router kernel: root@pf2_1_1_amd64.pfsense.org:/usr/obj.amd64/usr/pfSensesrc/src/sys/pfSense_SMP.8 amd64
Dec 3 13:14:43 router kernel: FreeBSD 8.3-RELEASE-p16 #0: Mon Aug 25 08:27:11 EDT 2014
Dec 3 13:14:43 router kernel: FreeBSD is a registered trademark of The FreeBSD Foundation.
Dec 3 13:14:43 router kernel: The Regents of the University of California. All rights reserved.
Dec 3 13:14:43 router kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Dec 3 13:14:43 router kernel: Copyright (c) 1992-2012 The FreeBSD Project.
Dec 3 13:14:43 router kernel: Textdump complete.
Dec 3 13:14:43 router kernel: current process = 12 (swi6: task queue)
Dec 3 13:14:43 router kernel: processor eflags = interrupt enabled, resume, IOPL = 0
Dec 3 13:14:43 router kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Dec 3 13:14:43 router kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
Dec 3 13:14:43 router kernel: frame pointer = 0x28:0xffffff80000b69f0
Dec 3 13:14:43 router kernel: stack pointer = 0x28:0xffffff80000b69d0
Dec 3 13:14:43 router kernel: instruction pointer = 0x20:0xffffffff8076ba4e
Dec 3 13:14:43 router kernel: fault code = supervisor read data, page not present
Dec 3 13:14:43 router kernel: fault virtual address = 0x308
Dec 3 13:14:43 router kernel: cpuid = 0; apic id = 00
Dec 3 13:14:43 router kernel: Fatal trap 12: page fault while in kernel mode
Dec 3 13:14:43 router kernel:
Dec 3 13:14:43 router kernel:
Dec 3 13:14:43 router kernel: acd0: WARNING - PREVENT_ALLOW taskqueue timeout - completing request directly
Dec 3 13:14:43 router syslogd: kernel boot file is /boot/kernel/kernel
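With 30-40 boxes to compare, it may help to reduce each panic to a short "signature" and check whether the routers are hitting the same fault. The following is only a rough Python sketch: the function name `panic_signature` and the choice of fields are my own assumptions, and the sample strings are excerpts copied from the two logs above.

```python
import re

# Fields that identify a panic. The field choice is an assumption made for
# this sketch, not anything pfSense or FreeBSD defines.
PATTERNS = {
    "trap": re.compile(r"Fatal trap (\d+)"),
    "instruction_pointer": re.compile(
        r"instruction pointer\s*=\s*(0x[0-9a-f]+:0x[0-9a-f]+)"
    ),
    "fault_address": re.compile(r"fault virtual address\s*=\s*(0x[0-9a-f]+)"),
}


def panic_signature(log_text):
    """Return the first match of each field in a syslog excerpt (None if absent)."""
    signature = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(log_text)
        signature[name] = match.group(1) if match else None
    return signature


# Excerpts taken from the Router #1 and Router #2 logs in this post.
ROUTER1 = (
    "Dec 3 20:57:15 router kernel: instruction pointer = 0x20:0xffffffff8076ba4e\n"
    "Dec 3 20:57:15 router kernel: fault virtual address = 0x308\n"
    "Dec 3 20:57:15 router kernel: Fatal trap 12: page fault while in kernel mode\n"
)
ROUTER2 = (
    "Dec 3 13:14:43 router kernel: instruction pointer = 0x20:0xffffffff8076ba4e\n"
    "Dec 3 13:14:43 router kernel: fault virtual address = 0x308\n"
    "Dec 3 13:14:43 router kernel: Fatal trap 12: page fault while in kernel mode\n"
)

print(panic_signature(ROUTER1))
print(panic_signature(ROUTER1) == panic_signature(ROUTER2))
```

Both excerpts give the same trap number, instruction pointer and fault address, which suggests the two routers are crashing in the same place in the kernel rather than failing in unrelated ways.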
-
The error messages relate to a (presumably virtual) CD-ROM drive. Do the ESXi hosts have some virtual CD loaded? What happened during the planned maintenance? Were the VM hosts rebooted?
Steve
-
I run pfSense 2.1.5 on ESXi 5.5U2 in production without a problem. However, I have seen this in my home lab, which uses VirtualBox. It happens only about 1 time in 20 during bootup, and a reboot always gets past it the second time. A quick Google for "Fatal trap 12: page fault while in kernel mode" shows lots of people having this issue with FreeBSD over the years, and the causes vary, of course.
-
Remove the virtual CD drives … it's a known issue with FreeBSD 8.x and ESXi.
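If you have a lot of VMs to check, a rough way to find which ones still have a virtual CD drive is to scan each VM's .vmx file for device entries whose deviceType mentions "cdrom". This is only a Python sketch: the function name `cdrom_devices` is made up, the sample text (including the datastore path) is a hypothetical example rather than a real config, and it assumes the usual `key = "value"` layout of .vmx files.

```python
import re

# Matches per-device lines such as: ide1:0.deviceType = "cdrom-image"
ENTRY_RE = re.compile(r'^\s*([A-Za-z]+\d+:\d+)\.(\w+)\s*=\s*"(.*)"\s*$')


def cdrom_devices(vmx_text):
    """Return the sorted device slots that are present and look like CD-ROM drives."""
    devices = {}
    for line in vmx_text.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            slot, key, value = match.groups()
            devices.setdefault(slot, {})[key.lower()] = value
    return sorted(
        slot
        for slot, props in devices.items()
        if "cdrom" in props.get("devicetype", "").lower()
        and props.get("present", "").upper() == "TRUE"
    )


# Hypothetical .vmx excerpt, for illustration only.
SAMPLE_VMX = """\
ide0:0.present = "TRUE"
ide0:0.fileName = "pfsense.vmdk"
ide1:0.present = "TRUE"
ide1:0.deviceType = "cdrom-image"
ide1:0.fileName = "/vmfs/volumes/datastore1/iso/pfSense.iso"
scsi0:0.present = "TRUE"
scsi0:0.deviceType = "scsi-hardDisk"
"""

print(cdrom_devices(SAMPLE_VMX))
```

This only reports the drives. Removing them is still best done through the vSphere client with the VM powered off.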
-
Thanks for the insight. I will remove the virtual CD drive and see what happens.
I was just caught off guard because these pfSense instances have been running fine for over a year now, only to suddenly start being unreliable like this.