Persistent Crashing of CARP master server
-
Hi guys,
I have a setup that lets around 50 people access the internet through 2 gateways. I recently configured CARP and everything seemed fine for around 24 hours, then the CARP master server started crashing and restarting frequently.
It's not a HUGE deal, as my users can continue to access the internet through the backup firewall, but it does interrupt streaming and VoIP, causing video to stop playing and calls to drop.
I'm guessing it's a hardware issue, but before I start randomly replacing the HD or memory I thought I'd post the error log to see if anyone has a better clue as to what's wrong.
Apologies if this is in the wrong place.
Crash report begins. Anonymous machine information:
i386
8.3-RELEASE-p16
FreeBSD 8.3-RELEASE-p16 #0: Mon Aug 25 08:25:41 EDT 2014 root@pf2_1_1_i386.pfsense.org:/usr/obj.i386/usr/pfSensesrc/src/sys/pfSense_SMP.8
Crash report details:
Filename: /var/crash/bounds
1
Filename: /var/crash/info.0
Dump header from device /dev/ad0s1b
Architecture: i386
Architecture Version: 1
Dump Length: 75776B (0 MB)
Blocksize: 512
Dumptime: Sun Jan 4 15:13:52 2015
Hostname: central.localdomain
Magic: FreeBSD Text Dump
Version String: FreeBSD 8.3-RELEASE-p16 #0: Mon Aug 25 08:25:41 EDT 2014
root@pf2_1_1_i386.pfsense.org:/usr/obj.i386/usr/pfSensesrc/src/sys/pfSense_SMP.8
Panic String:
Dump Parity: 3461449576
Bounds: 0
Dump Status: good
Filename: /var/crash/textdump.tar.0
ddb.txt
db:0:kdb.enter.default> run lockinfo
db:1:lockinfo> show locks
No such command
db:1:locks> show alllocks
No such command
db:1:alllocks> show lockedvnods
Locked vnodes
db:0:kdb.enter.default> show pcpu
cpuid = 0
dynamic pcpu = 0x306700
curthread = 0xc49948a0: pid 11 "idle: cpu0"
curpcb = 0xc47d1d80
fpcurthread = none
idlethread = 0xc49948a0: tid 100003 "idle: cpu0"
APIC ID = 0
currentldt = 0x50
db:0:kdb.enter.default> bt
Tracing pid 11 tid 100003 td 0xc49948a0
end(c47d1c68) at 0xc47d1c64
Xtimerint() at Xtimerint+0x20
--- interrupt, eip = 0xc0e7fb02, esp = 0xc47d1ca8, ebp = 0xc47d1ca8 ---
cpu_idle_acpi(1,8ea8e888,88ae4c88,eeca88cc,c49948a0,...) at cpu_idle_acpi+0x22
sched_idletd(0,c47d1d28,5c50484c,0,0,...) at sched_idletd+0x116
fork_exit(c0ac8fb0,0,c47d1d28) at fork_exit+0x87
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xc47d1d60, ebp = 0 ---
db:0:kdb.enter.default> ps
pid ppid pgrp uid state wmesg wchan cmd
59054 11190 265 0 S nanslp 0xc15571c4 sleep
39548 1 39548 0 Ss (threaded) ntpd
100067 S select 0xc5193a24 ntpd
27276 25894 27276 0 S+ ttyin 0xc4a4fa70 sh
25894 22890 25894 0 S+ wait 0xc55deac0 sh
25567 93362 25567 0 Ss (threaded) sshlockout_pf
100119 S nanslp 0xc15571c4 sshlockout_pf
100112 S piperd 0xc4e3c620 initial thread
22890 1 22890 0 Ss+ wait 0xc57c7ac0 login
11190 1 265 0 S wait 0xc52f3ac0 sh
2917 2490 2490 0 S nanslp 0xc15571c4 minicron
2490 1 2490 0 Ss wait 0xc57c8000 minicron
2460 2399 2399 0 S nanslp 0xc15571c4 minicron
2399 1 2399 0 Ss wait 0xc55dd2b0 minicron
2093 1486 1486 0 S nanslp 0xc15571c4 minicron
1486 1 1486 0 Ss wait 0xc5195ac0 minicron
95191 1 95191 0 Ss nanslp 0xc15571c4 cron
93362 1 93362 0 Ss select 0xc53297e4 syslogd
59693 1 59693 1002 Ss select 0xc51938a4 dhcpd
54760 49630 49630 0 S (threaded) php
100102 S accept 0xc5316376 php
54588 48698 48698 0 S accept 0xc536e512 php
53394 1 53238 65534 S select 0xc53295e4 dnsmasq
49630 48586 49630 0 Ss wait 0xc52f22b0 initial thread
48698 48586 48698 0 Ss wait 0xc52f2ac0 initial thread
48586 1 48243 0 S kqread 0xc5310700 lighttpd
37383 1 37383 0 Ss select 0xc532a4a4 inetd
34643 1 26 0 S+ piperd 0xc4e3bc40 logger
34639 1 26 0 S+ bpf 0xc5324700 tcpdump
27212 1 27212 65 Ss select 0xc5193d64 dhclient
26695 24470 24470 0 S piperd 0xc4e3bab8 rrdtool
24470 1 24470 0 Ss select 0xc5193b24 apinger
17512 1 17512 0 Ss select 0xc51945e4 dhclient
15574 1 15574 65 Ss select 0xc4c3cd64 dhclient
8412 1 8412 0 Ss select 0xc5193ea4 dhclient
276 1 276 0 Ss select 0xc5072a24 devd
267 265 265 0 S kqread 0xc4e11a00 check_reload_status
265 1 265 0 Ss kqread 0xc4e11600 check_reload_status
72 0 0 0 SL mdwait 0xc4df1800 [md0]
38 0 0 0 SL (threaded) zfskern
100071 D l2arc_fe 0xc4fb7b04 [l2arc_feed_thread]
100070 D arc_recl 0xc4fa897c [arc_reclaim_thread]
25 0 0 0 SL sdflush 0xc1585b60 [softdepflush]
24 0 0 0 SL vlruwt 0xc4df9560 [vnlru]
23 0 0 0 SL syncer 0xc156af38 [syncer]
22 0 0 0 SL psleep 0xc156ac68 [bufdaemon]
21 0 0 0 SL pollid 0xc15566fc [idlepoll]
20 0 0 0 SL pgzero 0xc1586814 [pagezero]
19 0 0 0 SL psleep 0xc158643c [vmdaemon]
18 0 0 0 SL psleep 0xc1586404 [pagedaemon]
17 0 0 0 SL ccb_scan 0xc151f5d4 [xpt_thrd]
16 0 0 0 SL pftm 0xc050c700 [pfpurge]
9 0 0 0 SL waiting_ 0xc1572338 [sctp_iterator]
8 0 0 0 SL - 0xc4bd843c [fdc0]
15 0 0 0 SL (threaded) usb
100048 D - 0xc4be8d34 [usbus3]
100047 D - 0xc4be8d04 [usbus3]
100046 D - 0xc4be8cd4 [usbus3]
100045 D - 0xc4be8ca4 [usbus3]
100043 D - 0xc4bdcb5c [usbus2]
100042 D - 0xc4bdcb2c [usbus2]
100041 D - 0xc4bdcafc [usbus2]
100040 D - 0xc4bdcacc [usbus2]
100038 D - 0xc4bd9b5c [usbus1]
100037 D - 0xc4bd9b2c [usbus1]
100036 D - 0xc4bd9afc [usbus1]
100035 D - 0xc4bd9acc [usbus1]
100033 D - 0xc4bcfb5c [usbus0]
100032 D - 0xc4bcfb2c [usbus0]
100031 D - 0xc4bcfafc [usbus0]
100030 D - 0xc4bcfacc [usbus0]
7 0 0 0 SL - 0xc4ad9000 [fw0_probe]
14 0 0 0 SL - 0xc1557024 [yarrow]
6 0 0 0 SL crypto_r 0xc158510c [crypto returns]
5 0 0 0 SL crypto_w 0xc15850e8 [crypto]
4 0 0 0 SL - 0xc15547e4 [g_down]
3 0 0 0 SL - 0xc15547e0 [g_up]
2 0 0 0 SL - 0xc15547d8 [g_event]
13 0 0 0 SL sleep 0xc14f2fa0 [ng_queue0]
12 0 0 0 WL (threaded) intr
100052 I [irq7: ppc0]
100050 I [swi0: uart uart]
100049 I [irq1: atkbd0]
100044 I [irq11: ehci0]
100039 I [irq10: sis0 rl2+]
100034 I [irq5: rl1 ohci1]
100029 I [irq15: ata1]
100028 I [irq14: ata0]
100025 I [irq9: fwohci0 rl0+]
100024 I [swi6: Giant taskq]
100022 I [swi5: +]
100019 I [swi2: cambio]
100015 I [swi6: task queue]
100006 I [swi3: vm]
100005 I [swi4: clock]
100004 I [swi1: netisr 0]
11 0 0 0 RL CPU 0 [idle: cpu0]
1 0 1 0 SLs wait 0xc4992ac0 [init]
10 0 0 0 SL audit_wo 0xc1585500 [audit]
0 0 0 0 SLs (threaded) kernel
100092 D - 0xc532a7c0 [dummynet]
100069 D - 0xc4d5f880 [system_taskq]
100026 D - 0xc4acccc0 [fw0_taskq]
100023 D - 0xc4aa3040 -
OK, I guess the log file is far too long to be copy-pasted into the forum… so I hope the end bit is the important part.
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0xbfe7e01f
fault code = supervisor read, page not present
instruction pointer = 0x20:0xbfe7e01f
stack pointer = 0x28:0xc47d1c4c
frame pointer = 0x28:0xc47d1c5c
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = resume, IOPL = 0
current process = 11 (idle: cpu0)
version.txt
FreeBSD 8.3-RELEASE-p16 #0: Mon Aug 25 08:25:41 EDT 2014
root@pf2_1_1_i386.pfsense.org:/usr/obj.i386/usr/pfSensesrc/src/sys/pfSense_SMP.8
-
It's unlikely that enabling CARP started this, given the panic is in nothing CARP-related. It also doesn't look like any of "the usual" suspects (mostly mbuf exhaustion) that tuning would fix. There's a high probability it's a hardware problem of some sort.
-
Given the very short backtrace and that it panicked in ACPI, I'd go with BIOS/hardware. Check for a BIOS update. You could try disabling ACPI, but I'd be surprised if that actually helped these days.
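One more detail in the trap output fits the hardware/firmware reading: the fault virtual address and the instruction pointer are the same value (0xbfe7e01f). That suggests the CPU faulted while trying to fetch an instruction from an unmapped address, not while reading data. Below is a minimal Python sketch that pulls those fields out of the pasted trap text and checks this. The names `parse_trap`, `summarize` and `ASSUMED_KERNBASE` are made up for illustration, and 0xc0000000 is the stock FreeBSD i386 kernel base, assumed here rather than confirmed for this pfSense build.

```python
import re

# Trap lines copied from the crash report above.
TRAP_TEXT = """\
fault virtual address = 0xbfe7e01f
fault code = supervisor read, page not present
instruction pointer = 0x20:0xbfe7e01f
stack pointer = 0x28:0xc47d1c4c
frame pointer = 0x28:0xc47d1c5c
"""

# Assumption: default FreeBSD i386 kernel base; not verified for this build.
ASSUMED_KERNBASE = 0xC0000000


def parse_trap(text):
    """Map each 'name = value' line to the last hex number in its value."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue
        # For 'selector:offset' values such as 0x20:0xbfe7e01f, keep the offset.
        match = re.search(r"0x[0-9a-fA-F]+\s*$", value)
        if match:
            fields[key.strip()] = int(match.group(0), 16)
    return fields


def summarize(fields):
    """Report whether the fault looks like a jump to an unmapped address."""
    fault = fields["fault virtual address"]
    ip = fields["instruction pointer"]
    return {
        # Same address in both fields: the fault happened on instruction fetch.
        "fetch_fault": fault == ip,
        # An instruction pointer below the kernel base is not kernel text.
        "ip_below_kernbase": ip < ASSUMED_KERNBASE,
    }


if __name__ == "__main__":
    print(summarize(parse_trap(TRAP_TEXT)))
```

If both checks come back true, the kernel was sent to an address that holds no kernel code at all. That is more typical of a corrupted pointer or return address (bad RAM, flaky firmware) than of a bug in a specific driver, though it does not prove it either way.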