Cannot keep 2.0.2 pfSense firewall from crashing daily in production environment
-
Running on KVM amd64 virtual hosts. I have 2 firewalls and am using CARP for failover.
I have 4 Intel 1Gb NICs and have made the changes the pfSense documentation recommends for Intel NICs on the amd64 platform (sketched below, after the interface list). I am running OpenVPN, IPsec, tinydns, and DHCP, and have rules in place for access to the 3 internal networks.
NIC e1 is WAN
NIC e2 is LAN
NIC e3 is the control network
NIC e4 is the QA network
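The loader tweaks were along these lines in /boot/loader.conf.local (quoted from memory, so treat the exact values as approximate):

# Increase mbuf clusters, as commonly suggested for Intel NICs on amd64
kern.ipc.nmbclusters="131072"
# Disable MSI-X, which some em(4) configurations reportedly have trouble with
hw.pci.enable_msix="0"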
I am desperate trying to figure out why I am having so many crashes. I have attached the latest crash dump and also uploaded it to pfSense.
Public IP of the firewall is 199.255.156.227.

Crash report begins. Anonymous machine information:
amd64
8.1-RELEASE-p13
FreeBSD 8.1-RELEASE-p13 #1: Fri Dec 7 23:07:32 EST 2012 root@snapshots-8_1-amd64.builders.pfsense.org:/usr/obj./usr/pfSensesrc/src/sys/pfSense_SMP.8

Crash report details:
Filename: /var/crash/bounds
1
Filename: /var/crash/info.0
Dump header from device /dev/ad0s1b
Architecture: amd64
Architecture Version: 1
Dump Length: 66560B (0 MB)
Blocksize: 512
Dumptime: Sun Mar 17 10:00:23 2013
Hostname: fw01data.prod.tracsoftware.com
Magic: FreeBSD Text Dump
Version String: FreeBSD 8.1-RELEASE-p13 #1: Fri Dec 7 23:07:32 EST 2012
root@snapshots-8_1-amd64.builders.pfsense.org:/usr/obj./usr/pfSensesrc/src/sys/pfSense_SMP.8
Panic String:
Dump Parity: 1173100622
Bounds: 0
Dump Status: good

Filename: /var/crash/textdump.tar.0
ddb.txt:
db:0:kdb.enter.default> run lockinfo
db:1:lockinfo> show locks
No such command
db:1:locks> show alllocks
No such command
db:1:alllocks> show lockedvnods
Locked vnodes
db:0:kdb.enter.default> show pcpu
cpuid = 0
dynamic pcpu = 0x2fc080
curthread = 0xffffff00024477c0: pid 0 "em3 taskq"
curpcb = 0xffffff800018cd40
fpcurthread = none
idlethread = 0xffffff00022a77c0: pid 11 "idle: cpu0"
curpmap = 0
tssp = 0xffffffff811ddd80
commontssp = 0xffffffff811ddd80
rsp0 = 0xffffff800018cd40
gs32p = 0xffffffff811dcbb8
ldt = 0xffffffff811dcbf8
tss = 0xffffffff811dcbe8
db:0:kdb.enter.default> bt
Tracing pid 0 tid 64036 td 0xffffff00024477c0
rn_match() at rn_match+0x1b
pfr_match_addr() at pfr_match_addr+0xcb
pf_test_udp() at pf_test_udp+0x89b
pf_test() at pf_test+0x207f
pf_check_in() at pf_check_in+0x39
pfil_run_hooks() at pfil_run_hooks+0xa2
ip_input() at ip_input+0x34e
netisr_dispatch_src() at netisr_dispatch_src+0x7b
ether_demux() at ether_demux+0x169
ether_input() at ether_input+0x174
lem_rxeof() at lem_rxeof+0x24d
lem_handle_rxtx() at lem_handle_rxtx+0x51
taskqueue_run() at taskqueue_run+0x93
taskqueue_thread_loop() at taskqueue_thread_loop+0x46
fork_exit() at fork_exit+0x118
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff800018cd30, rbp = 0 ---
db:0:kdb.enter.default> ps
pid ppid pgrp uid state wmesg wchan cmd
61811 51715 24 0 S nanslp 0xffffffff8115cc28 sleep
30064 1 30064 0 Ss (threaded) filterdns
64134 S nanslp 0xffffffff8115cc28 filterdns
64133 S nanslp 0xffffffff8115cc28 filterdns
64132 S nanslp 0xffffffff8115cc28 filterdns
64131 S nanslp 0xffffffff8115cc28 filterdns
64130 S nanslp 0xffffffff8115cc28 filterdns
64129 S nanslp 0xffffffff8115cc28 filterdns
64128 S nanslp 0xffffffff8115cc28 filterdns
64127 S nanslp 0xffffffff8115cc28 filterdns
64126 S nanslp 0xffffffff8115cc28 filterdns
64125 S nanslp 0xffffffff8115cc28 filterdns
64124 S nanslp 0xffffffff8115cc28 filterdns
64123 S nanslp 0xffffffff8115cc28 filterdns
64122 S nanslp 0xffffffff8115cc28 filterdns
64090 S nanslp 0xffffffff8115cc28 filterdns
64089 S nanslp 0xffffffff8115cc28 filterdns
64088 S nanslp 0xffffffff8115cc28 filterdns
64087 S nanslp 0xffffffff8115cc28 filterdns
64086 S nanslp 0xffffffff8115cc28 filterdns
64085 S nanslp 0xffffffff8115cc28 filterdns
64084 S nanslp 0xffffffff8115cc28 filterdns
64083 S nanslp 0xffffffff8115cc28 filterdns
64082 S nanslp 0xffffffff8115cc28 filterdns
64081 S nanslp 0xffffffff8115cc28 filterdns
64080 S nanslp 0xffffffff8115cc28 filterdns
64118 S uwait 0xffffff000f936100 filterdns
28048 21191 20919 0 S accept 0xffffff000f979b06 initial thread
36846 51391 51391 0 S piperd 0xffffff000265a000 rrdtool
50910 50856 42225 2009 S pipewr 0xffffff000265b5b0 tinydns
50856 42351 42225 0 S select 0xffffff000fca34c0 supervise
46070 42077 46070 0 S+ ttyin 0xffffff0002556ca8 sh
45819 42967 45819 0 S+ ttyin 0xffffff00025568a8 sh
42967 41688 42967 0 S+ wait 0xffffff000fa2e460 sh
42077 41599 42077 0 S+ wait 0xffffff000fa89000 sh
41883 43346 41883 0 Ss (threaded) sshlockout_pf
64099 S nanslp 0xffffffff8115cc28 sshlockout_pf
64092 S piperd 0xffffff0002719b60 initial thread
41688 1 41688 0 Ss+ wait 0xffffff000f98b8c0 login
41599 1 41599 0 Ss+ wait 0xffffff000fe31460 login
41370 1 41370 0 Ss select 0xffffff000fbad6c0 ntpd
33855 33840 33840 0 S nanslp 0xffffffff8115cc28 svscan
33840 1 33840 0 Ss wait 0xffffff000fe30460 sh
31857 31512 31512 0 S nanslp 0xffffffff8115cc28 minicron
31512 1 31512 0 Ss wait 0xffffff00025ed000 minicron
31455 30665 30665 0 S nanslp 0xffffffff8115cc28 minicron
30665 1 30665 0 Ss wait 0xffffff000f9898c0 minicron
30551 30311 30311 0 S nanslp 0xffffffff8115cc28 minicron
30311 1 30311 0 Ss wait 0xffffff00025ed8c0 minicron
52885 1 52885 0 Ss nanslp 0xffffffff8115cc28 cron
51715 1 24 0 S+ wait 0xffffff000fa888c0 sh
42834 42351 42225 0 S select 0xffffff000f9262c0 supervise
42518 42225 42225 0 S piperd 0xffffff000265bb60 multilog
42351 42225 42225 0 S nanslp 0xffffffff8115cc28 svscan
42225 1 42225 0 Ss wait 0xffffff000faa2000 sh
39412 1 39412 0 Ss select 0xffffff000fbae2c0 racoon
35106 1 35106 0 Ss (threaded) filterdns
64160 S uwrlck 0xffffff000f926180 filterdns
64097 S uwait 0xffffff000f936780 filterdns
31660 1 31490 65534 S select 0xffffff000fbadc40 dnsmasq
31138 1 31138 1002 Ss select 0xffffff000f936bc0 dhcpd
21656 21191 20919 0 S accept 0xffffff000f71d30e initial thread
21191 1 20919 0 S kqread 0xffffff000fb6aa00 lighttpd
51391 1 51391 0 Ss select 0xffffff000f936240 apinger
46583 1 46583 0 Ss (threaded) filterdns
64091 S uwrlck 0xffffff000f6bdc00 filterdns
64063 S uwait 0xffffff0002615380 filterdns
45137 1 45137 0 Ss select 0xffffff0002614cc0 inetd
44686 1 44686 0 Ss select 0xffffff000f6896c0 openvpn
43835 1 24 0 S+ piperd 0xffffff000265ab60 logger
43706 1 24 0 S+ bpf 0xffffff000f663c00 tcpdump
43346 1 43346 0 Ss select 0xffffff000f6bd340 syslogd
8284 1 8284 0 Ss select 0xffffff000f6bd540 sshd
259 1 259 0 Ss select 0xffffff0002614740 devd
245 243 243 0 S kqread 0xffffff0002609900 check_reload_status
243 1 243 0 Ss kqread 0xffffff0002626000 check_reload_status
39 0 0 0 SL mdwait 0xffffff0002605800 [md0]
23 0 0 0 SL sdflush 0xffffffff811a3738 [softdepflush]
22 0 0 0 SL vlruwt 0xffffff00025d6000 [vnlru]
21 0 0 0 SL syncer 0xffffffff81180e40 [syncer]
20 0 0 0 SL psleep 0xffffffff81180968 [bufdaemon]
19 0 0 0 SL pollid 0xffffffff8115ba48 [idlepoll]
18 0 0 0 SL pgzero 0xffffffff811a51cc [pagezero]
17 0 0 0 SL psleep 0xffffffff811a4568 [vmdaemon]
16 0 0 0 SL psleep 0xffffffff811a452c [pagedaemon]
9 0 0 0 SL ccb_scan 0xffffffff811229e0 [xpt_thrd]
8 0 0 0 SL pftm 0xffffffff80204dd0 [pfpurge]
7 0 0 0 SL waiting_ 0xffffffff8118ce00 [sctp_iterator]
15 0 0 0 SL (threaded) usb
64032 D - 0xffffff8000234ef0 [usbus0]
64031 D - 0xffffff8000234e98 [usbus0]
64030 D - 0xffffff8000234e40 [usbus0]
64029 D - 0xffffff8000234de8 [usbus0]
14 0 0 0 SL - 0xffffffff8115c904 [yarrow]
6 0 0 0 SL crypto_r 0xffffffff811a2660 [crypto returns]
5 0 0 0 SL crypto_w 0xffffffff811a2620 [crypto]
4 0 0 0 SL - 0xffffffff811586e8 [g_down]
3 0 0 0 SL - 0xffffffff811586e0 [g_up]
2 0 0 0 SL - 0xffffffff811586d0 [g_event]
13 0 0 0 SL sleep 0xffffffff810d19d0 [ng_queue0]
12 0 0 0 RL (threaded) intr
64040 I [swi0: uart]
64039 I [irq7: ppc0]
64038 I [irq12: psm0]
64037 I [irq1: atkbd0]
64028 I [irq11: em0 em1+]
64027 I [irq15: ata1]
64026 I [irq14: ata0]
64025 I [irq9: acpi0]
64023 I [swi5: +]
64021 I [swi2: cambio]
64017 I [swi6: task queue]
64016 I [swi6: Giant taskq]
64007 I [swi3: vm]
64006 RunQ [swi4: clock]
64005 I [swi1: netisr 0]
11 0 0 0 RL [idle: cpu0]
1 0 1 0 SLs wait 0xffffff00022a48c0 [init]
10 0 0 0 SL audit_wo 0xffffffff811a2b90 [audit]
0 0 0 0 RLs (threaded) kernel
64036 Run CPU 0 [em3 taskq]
64035 D - 0xffffff0002509580 [em2 taskq]
64034 D - 0xffffff00024de900 [em1 taskq]
64033 D - 0xffffff00024d9100 [em0 taskq]
64024 D - 0xffffff0002444180 -
Is there a reason you started a new thread rather than continuing your existing thread (http://forum.pfsense.org/index.php/topic,59979.0/topicseen.html)?
Have you tried 2.0.3 (http://forum.pfsense.org/index.php/topic,58203.0.html)?
-
My previous message did not seem to get much response, and I am struggling to figure out why I am having so many crashes.
My apologies for jumping the gun and posting a new message, and thank you for your reply.
I have not tried 2.0.3 yet, since this is a production firewall and I am leery of applying beta code.
Regards,
Richard -
2.0.3 is more stable than 2.0.2. Don't consider it a "beta"; it's practically RELEASE. We are just waiting on FreeBSD to issue an OpenSSL security advisory before we can wrap it up.
And multiple threads are never the correct answer. One issue, one thread.
-
Noted and understood about multiple threads; I won't do that again.
Could you tell me where I can get the 2.0.3 code? I am unable to locate it.
Thanks -
Follow the link in the earlier message I posted.
-
rclaus,
I have the same problem. Please give us feedback after the 2.0.3 install. Thanks.
-
Installed 2.0.3 this morning, and it crashed about 2 hours ago >:(
-
Looking at the dmesg.boot log, I have the following error on all 4 of the Intel NICs:
"Memory Access and/or Bus Master bits were not set!"
Could this error be related to my daily crashes on pfSense 2.0.3 (amd64 build)? -
This issue is resolved now. I installed the 2.1 beta version, which has a newer release of FreeBSD and updated Intel NIC drivers, and I am no longer having daily crashes.
I assume the older version of FreeBSD that pfSense 2.0.x uses had bad em(4) drivers :) The 2.1 beta version is performing well for me.
Thanks,
Richard -
I assume the older version of FreeBSD that pfSense 2.0.x uses had bad em(4) drivers :)
I confirm it; there is something wrong with the em(4) drivers, even in the PRE-RELEASE 2.0.3 version (amd64). Tested with multiple machines, all with Intel NICs.
Is there a chance to correct the driver before launching the 2.0.3 final release? -
We've seen a few complaints about the em drivers, but nothing specific enough to act on. We can certainly update the drivers, but it would be most helpful to know what is wrong so we can confirm fixes.
I have em(4) NICs on practically everything (physical and virtual) and have had zero issues on 2.0.x or 2.1. It's not a universal/general issue; it must be specific to a certain set of chips.
-
Jimp,
I opened this thread with lots of info:
http://forum.pfsense.org/index.php/topic,60237.0.html
I don't know if it's enough.
Unfortunately, I can't reproduce the crash in a lab yet.
-
I am also unable to reproduce the error, but it happened consistently on a daily basis. My environment was a VM running on the KVM hypervisor.
The physical NICs use e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI.
The hypervisor OS is Ubuntu 11.10 (GNU/Linux 3.0.0-16-server x86_64).
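In case anyone wants to compare environments, these are the sort of commands I used to confirm the drivers on each side (the interface name is an example, not necessarily yours):

# On the KVM host (Linux): report the driver and version behind a physical NIC
ethtool -i eth0

# In the pfSense guest (FreeBSD): list the emulated NICs the VM actually sees
pciconf -lv | grep -B4 -i ethernet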