System lock-up
-
Hi,
I have a pair of Dell 2950s running pfSense 2.1 as a CARP pair. After several months of testing I rolled the setup into production, and within a few hours the primary and eventually the secondary machine locked up without warning or logs. I have repeated this several times and it doesn't appear to be related to load. The setup is fairly complex, as I have two WAN connections and a DMZ. I have been running a different machine with pfSense 1. in this role for three or four years without a hiccup ;D
I have ruled out hardware, as I have tried this on 2 pairs of machines, all of which failed in (apparently) the same way. The current machines have 6 NICs: 2 on-board plus 2 dual-interface NC7170s.
I am a long-term user and fan of pfSense, but I am nowhere near an expert, so I want to know how I should go about hunting down the problem. I cannot reproduce the problem in testing, but it dies every time it goes into my production network. Unfortunately this makes me very unpopular with the customers, so I need insight rather than wading in and experimenting!
The average traffic is about 50Mbps down/40Mbps up, but like I say, I don't think it's load related.
Thanks all!
Alan
-
Those boxes appear to have two on board Broadcom NICs plus whatever others you have (possibly more Broadcom). You should try the recommended NIC tweak for Broadcom on Dell hardware:
http://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards#Broadcom_bce.284.29_Cards
I don't know how applicable that is to 2.1 but it's easy to try.
Steve
Edit: The NC7170s seem to be Intel, so possibly try the Intel tweaks also. Like it says, this is especially a problem if you are running 64-bit.
-
Hi, I have tried the changes you suggested and the firewall died after about 3 hours this morning with the same symptom, i.e. no trace or output of any kind. CARP kicks in correctly, but the first machine never recovers, so when the second eventually dies CARP runs out of options :P
One thing I did just notice was that the network cards all seem to be Intel. The built-in NICs are apparently Broadcom, but I am not on site any more, so I can't swear to that!
Thanks for your help :)
My /boot/loader.conf looks like :-
autoboot_delay="3"
vm.kmem_size="435544320"
vm.kmem_size_max="535544320"
kern.ipc.nmbclusters="131072"
console="comconsole"
hw.bce.tso_enable="0"
hw.pci.enable_msix="0"
hw.igb.num_queries="1"
if_igb_load="YES"
legal.intel_ipw.license_ack="1"
legal.intel_wpi.license_ack="1"
Dmesg:-
Copyright 1992-2010 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.1-RELEASE-p6 #0: Mon Dec 12 18:15:35 EST 2011
root@FreeBSD_8.0_pfSense_2.0-AMD64.snaps.pfsense.org:/usr/obj./usr/pfSensesrc/src/sys/pfSense_SMP.8 amd64
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 3.60GHz (3591.24-MHz K8-class CPU)
Origin = "GenuineIntel" Id = 0xf41 Family = f Model = 4 Stepping = 1
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Features2=0x659d<SSE3,DTES64,MON,DS_CPL,EST,TM2,CNXT-ID,CX16,xTPR>
AMD Features=0x20100800<SYSCALL,NX,LM>
TSC: P-state invariant
real memory = 6442450944 (6144 MB)
avail memory = 6186790912 (5900 MB)
ACPI APIC Table: <DELL PE BKC>
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 2 package(s) x 1 core(s) x 2 HTT threads
cpu0 (BSP): APIC ID: 0
cpu1 (AP/HT): APIC ID: 1
cpu2 (AP): APIC ID: 6
cpu3 (AP/HT): APIC ID: 7
ioapic0: Changing APIC ID to 8
ioapic1: Changing APIC ID to 9
ioapic2: Changing APIC ID to 10
ioapic3: Changing APIC ID to 11
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 32-55 on motherboard
ioapic2 <Version 2.0> irqs 64-87 on motherboard
ioapic3 <Version 2.0> irqs 96-119 on motherboard
netisr_init: forcing maxthreads to 1 and bindthreads to 0 for device polling
wlan: mac acl policy registered
kbd1 at kbdmux0
cryptosoft0: <software crypto> on motherboard
padlock0: No ACE support.
acpi0: <DELL PE BKC> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
acpi_hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 900
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 2.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> at device 0.0 on pci1
pci2: <ACPI PCI bus> on pcib2
amr0: <LSILogic MegaRAID 1.53> mem 0xf80f0000-0xf80fffff,0xfe9c0000-0xfe9fffff irq 46 at device 14.0 on pci2
amr0: Using 64-bit DMA
amr0: [ITHREAD]
amr0: delete logical drives supported by controller
amr0: <LSILogic PERC 4e/Di> Firmware 5B2D, BIOS H435, 256MB RAM
pcib3: <ACPI PCI-PCI bridge> at device 0.2 on pci1
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> at device 4.0 on pci0
pci4: <ACPI PCI bus> on pcib4
pcib5: <ACPI PCI-PCI bridge> at device 5.0 on pci0
pci5: <ACPI PCI bus> on pcib5
pcib6: <ACPI PCI-PCI bridge> at device 0.0 on pci5
pci6: <ACPI PCI bus> on pcib6
em0: <Intel(R) PRO/1000 Legacy Network Connection 1.0.3> port 0xecc0-0xecff mem 0xfe6e0000-0xfe6fffff irq 64 at device 7.0 on pci6
em0: [FILTER]
pcib7: <ACPI PCI-PCI bridge> at device 0.2 on pci5
pci7: <ACPI PCI bus> on pcib7
em1: <Intel(R) PRO/1000 Legacy Network Connection 1.0.3> port 0xdcc0-0xdcff mem 0xfe4e0000-0xfe4fffff irq 65 at device 8.0 on pci7
em1: [FILTER]
pcib8: <ACPI PCI-PCI bridge> at device 6.0 on pci0
pci8: <ACPI PCI bus> on pcib8
pcib9: <ACPI PCI-PCI bridge> at device 0.0 on pci8
pci9: <ACPI PCI bus> on pcib9
pcib10: <ACPI PCI-PCI bridge> at device 0.2 on pci8
pci10: <ACPI PCI bus> on pcib10
em2: <Intel(R) PRO/1000 Legacy Network Connection 1.0.3> port 0xccc0-0xccff mem 0xfe1e0000-0xfe1fffff,0xfe180000-0xfe1bffff irq 96 at device 2.0 on pci10
em2: [FILTER]
em3: <Intel(R) PRO/1000 Legacy Network Connection 1.0.3> port 0xcc80-0xccbf mem 0xfe1c0000-0xfe1dffff irq 97 at device 2.1 on pci10
em3: [FILTER]
em4: <Intel(R) PRO/1000 Legacy Network Connection 1.0.3> port 0xcc40-0xcc7f mem 0xfe160000-0xfe17ffff,0xfe100000-0xfe13ffff irq 101 at device 3.0 on pci10
em4: [FILTER]
em5: <Intel(R) PRO/1000 Legacy Network Connection 1.0.3> port 0xcc00-0xcc3f mem 0xfe140000-0xfe15ffff irq 102 at device 3.1 on pci10
em5: [FILTER]
uhci0: <Intel 82801EB (ICH5) USB controller USB-A> port 0xace0-0xacff irq 16 at device 29.0 on pci0
uhci0: [ITHREAD]
usbus0: <Intel 82801EB (ICH5) USB controller USB-A> on uhci0
uhci1: <Intel 82801EB (ICH5) USB controller USB-B> port 0xacc0-0xacdf irq 19 at device 29.1 on pci0
uhci1: [ITHREAD]
usbus1: <Intel 82801EB (ICH5) USB controller USB-B> on uhci1
uhci2: <Intel 82801EB (ICH5) USB controller USB-C> port 0xaca0-0xacbf irq 18 at device 29.2 on pci0
uhci2: [ITHREAD]
usbus2: <Intel 82801EB (ICH5) USB controller USB-C> on uhci2
ehci0: <Intel 82801EB/R (ICH5) USB 2.0 controller> mem 0xfeb00000-0xfeb003ff irq 23 at device 29.7 on pci0
ehci0: [ITHREAD]
usbus3: EHCI version 1.0
usbus3: <Intel 82801EB/R (ICH5) USB 2.0 controller> on ehci0
pcib11: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci11: <ACPI PCI bus> on pcib11
vgapci0: <VGA-compatible display> port 0xbc00-0xbcff mem 0xf0000000-0xf7ffffff,0xfdef0000-0xfdefffff irq 18 at device 13.0 on pci11
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel ICH5 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 31.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
atrtc0: <AT realtime clock> port 0x70-0x7f irq 8 on acpi0
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FILTER]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart0: [FILTER]
uart0: console (9600,n,8,1)
orm0: <ISA Option ROMs> at iomem 0xc0000-0xcafff,0xcd800-0xcefff,0xcf000-0xd07ff,0xec000-0xeffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
ppc0: cannot reserve I/O port range
est0: <Enhanced SpeedStep Frequency Control> on cpu0
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 122d0000122d
device_attach: est0 attach returned 6
p4tcc0: <CPU Frequency Thermal Control> on cpu0
est1: <Enhanced SpeedStep Frequency Control> on cpu1
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 122d0000122d
device_attach: est1 attach returned 6
p4tcc1: <CPU Frequency Thermal Control> on cpu1
est2: <Enhanced SpeedStep Frequency Control> on cpu2
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 122d0000122d
device_attach: est2 attach returned 6
p4tcc2: <CPU Frequency Thermal Control> on cpu2
est3: <Enhanced SpeedStep Frequency Control> on cpu3
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 122d0000122d
device_attach: est3 attach returned 6
p4tcc3: <CPU Frequency Thermal Control> on cpu3
Timecounters tick every 1.000 msec
IPsec: Initialized Security Association Processing.
usbus0: 12Mbps Full Speed USB v1.0
usbus1: 12Mbps Full Speed USB v1.0
usbus2: 12Mbps Full Speed USB v1.0
usbus3: 480Mbps High Speed USB v2.0
acd0: CDROM <TEAC CD-ROM CD-224E/K.9A> at ata0-master UDMA33
amr0: delete logical drives supported by controller
amrd0: <LSILogic MegaRAID logical drive> on amr0
amrd0: 69880MB (143114240 sectors) RAID 1 (optimal)
SMP: AP CPU #1 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #2 Launched!
ugen1.1: <Intel> at usbus1
ugen0.1: <Intel> at usbus0
ugen2.1: <Intel> at usbus2
ugen3.1: <Intel> at usbus3
uhub0: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1
uhub1: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
uhub2: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
uhub3: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
uhub0: 2 ports with 2 removable, self powered
uhub3: 2 ports with 2 removable, self powered
uhub2: 2 ports with 2 removable, self powered
Root mount waiting for: usbus3
Root mount waiting for: usbus3
uhub1: 6 ports with 6 removable, self powered
Root mount waiting for: usbus3
ugen3.2: <vendor 0x413c> at usbus3
uhub4: <vendor 0x413c product 0xa001, class 9/0, rev 2.00/0.00, addr 2> on usbus3
uhub4: 2 ports with 2 removable, self powered
ugen3.3: <Dell> at usbus3
ukbd0: <Dell Dell USB Keyboard, class 0/0, rev 1.10/1.05, addr 3> on usbus3
kbd2 at ukbd0
Trying to mount root from ufs:/dev/amrd0s1a
pflog0: promiscuous mode enabled
vip253: link state changed to UP
vip210: link state changed to UP
vip247: link state changed to UP
vip244: link state changed to UP
vip243: link state changed to UP
vip242: link state changed to UP
vip240: link state changed to UP
vip236: link state changed to UP
vip235: link state changed to UP
vip234: link state changed to UP
vip233: link state changed to UP
vip230: link state changed to UP
vip227: link state changed to UP
vip226: link state changed to UP
vip223: link state changed to UP
vip250: link state changed to UP
vip254: link state changed to UP
vip246: link state changed to UP
vip215: link state changed to UP
vip218: link state changed to UP
vip211: link state changed to UP
vip209: link state changed to UP
vip167: link state changed to UP
vip204: link state changed to UP
vip159: link state changed to UP
vip195: link state changed to UP
vip17: link state changed to UP
vip185: link state changed to UP
vip184: link state changed to UP
vip178: link state changed to UP
vip175: link state changed to UP
vip172: link state changed to UP
vip170: link state changed to UP
vip166: link state changed to UP
vip21: link state changed to UP
vip194: link state changed to UP
vip165: link state changed to UP
vip164: link state changed to UP
vip252: link state changed to UP
vip251: link state changed to UP
vip40: link state changed to UP
vip163: link state changed to UP
vip162: link state changed to UP
vip161: link state changed to UP
vip30: link state changed to UP
vip123: link state changed to UP
vip124: link state changed to UP
vip35: link state changed to UP
vip125: link state changed to UP
vip5: link state changed to UP
vip27: link state changed to UP
vip51: link state changed to UP
vip54: link state changed to UP
vip57: link state changed to UP
vip129: link state changed to UP
vip126: link state changed to UP
vip149: link state changed to UP
vip177: link state changed to UP
vip24: link state changed to UP
vip88: link state changed to UP
vip158: link state changed to UP
vip4: link state changed to UP
vip188: link state changed to UP
vip130: link state changed to UP
vip1: link state changed to UP
vip221: link state changed to UP
vip220: link state changed to UP
vip212: link state changed to UP
vip213: link state changed to UP
vip217: link state changed to UP
vip219: link state changed to UP
ugen3.3: <Dell> at usbus3 (disconnected)
ukbd0: at uhub4, port 2, addr 3 (disconnected)
-
Have you tried putting a monitor on one of the servers and looking at any messages after a lock up? Might help debugging.
-
Ok so, as you say, it looks like you have 6 Intel NICs and they're all the legacy type, em(4) driver.
Thus, apart from the nmbclusters tweak, the others are not doing anything.
You should use /boot/loader.conf.local for additional loader options, as /boot/loader.conf can be overwritten at a firmware upgrade.
I don't see why you need to load the igb driver.
Put in hw.em.num_queries="1" instead of igb.
Remove the bce stuff.
Cross your fingers! ;)
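A minimal sketch of moving the surviving tweak into /boot/loader.conf.local, assuming a POSIX shell on the firewall. The helper name add_tunable is made up for illustration, and only the nmbclusters value from the posted loader.conf is carried over, since that is the one setting identified above as having any effect with em(4) NICs.

```sh
# Hypothetical helper: append name="value" to a loader config file,
# but only if no line for that tunable is present yet.
add_tunable() {
    file=$1
    name=$2
    value=$3
    if ! grep -q "^${name}=" "$file" 2>/dev/null; then
        printf '%s="%s"\n' "$name" "$value" >> "$file"
    fi
}

# Example call (as root on the firewall):
# add_tunable /boot/loader.conf.local kern.ipc.nmbclusters 131072
```

Loader tunables are only read at boot, so a reboot is needed before the new file has any effect.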
Interestingly I don't have any OIDs at hw.em, but most were introduced after FreeBSD 8.1.
Edit: Doesn't appear to be a valid OID under 8.3 either. :-\
Steve
Edit: As suggested above, if it is running out of nmbclusters, that should show up on the console after a crash.
Also putting a dmesg list in a code box makes your post much easier to read. ;)
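On the nmbclusters point, one rough way to see how close the box is to the limit while it is still up is to watch the mbuf cluster line from netstat -m. The sketch below assumes the usual FreeBSD line format of current/cache/total/max for that line; the function name mbuf_cluster_pct is hypothetical.

```sh
# Hypothetical helper: read `netstat -m` style text on stdin and print
# mbuf cluster usage (total vs. max) as a whole-number percentage.
# Assumes a line such as:
#   N/N/N/N mbuf clusters in use (current/cache/total/max)
mbuf_cluster_pct() {
    awk '/mbuf clusters in use/ {
        split($1, f, "/")            # f[3] = total, f[4] = max
        if (f[4] > 0) printf "%d\n", f[3] * 100 / f[4]
    }'
}

# Example call on the firewall:
# netstat -m | mbuf_cluster_pct
```

A value creeping towards 100 in the run-up to a lock-up would point at mbuf exhaustion; a low, steady value would suggest looking elsewhere.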