Firewall hangs and reboots since upgrade to 2.2.3

wricaurte

Since upgrading to 2.2.3, my home firewall hangs and sometimes also reboots. After these events the Web Configurator shows a message saying that a core dump was found; I have submitted it several times. This morning I got the dump file, which is attached. I noticed that the firewall had rebooted 5 hours earlier.

I run:

7 IPsec VPNs (1 with an Alix pfSense and 6 with a TP-LINK TL-ER604W)
2 OpenVPN servers
2 OpenVPN clients
PPTP Server
DNS Resolver
DHCP Server
Many NAT and firewall rules (Between local and VPN nets)
Squid + SquidGuard

I cannot read the dump properly. I hope somebody can help me make sure I'm not having a hardware-related problem. Last weekend I reinstalled the system on a new SSD; the same configuration worked fine on 2.2.2.

I only see IPsec-related messages, and the IPsec configuration has not changed from 2.2.2 to 2.2.3.

I cannot access my home network remotely because it seems the firewall did not come back up after rebooting again. I'll check tonight when I'm home.

Regards.
dump_ndxfw-Jul-9-2015.txt

jimp

The actual crash dump/panic appears to be complaining about the filesystem:

curthread    = 0xfffff8005eb82490: pid 65120 "squid"
curpcb       = 0xfffffe0036609cc0
fpcurthread  = none
idlethread   = 0xfffff80003210920: tid 100004 "idle: cpu1"
curpmap      = 0xfffff8005e6bf678
tssp         = 0xffffffff8219cff8
commontssp   = 0xffffffff8219cff8
rsp0         = 0xfffffe0036609cc0
gs32p        = 0xffffffff8219ea50
ldt          = 0xffffffff8219ea90
tss          = 0xffffffff8219ea80
db:0:kdb.enter.default>  bt
Tracing pid 65120 tid 100256 td 0xfffff8005eb82490
softdep_disk_io_initiation() at softdep_disk_io_initiation+0xdb0/frame 0xfffffe00366094c0
ffs_geom_strategy() at ffs_geom_strategy+0x15e/frame 0xfffffe00366094f0
bufwrite() at bufwrite+0x142/frame 0xfffffe0036609530
ffs_update() at ffs_update+0x25e/frame 0xfffffe00366095b0
ffs_write() at ffs_write+0x542/frame 0xfffffe0036609650
VOP_WRITE_APV() at VOP_WRITE_APV+0x145/frame 0xfffffe0036609760
vn_write() at vn_write+0x248/frame 0xfffffe00366097e0
vn_io_fault_doio() at vn_io_fault_doio+0x22/frame 0xfffffe0036609820
vn_io_fault1() at vn_io_fault1+0x7c/frame 0xfffffe0036609970
vn_io_fault() at vn_io_fault+0x18b/frame 0xfffffe00366099f0
dofilewrite() at dofilewrite+0x87/frame 0xfffffe0036609a40
kern_writev() at kern_writev+0x68/frame 0xfffffe0036609a90
sys_write() at sys_write+0x63/frame 0xfffffe0036609ae0
amd64_syscall() at amd64_syscall+0x351/frame 0xfffffe0036609bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0036609bf0

As you can see, squid was the active process at the time, and the function calls in the backtrace are mostly filesystem related (ffs, vn, filewrite, etc).

There are some later IPsec errors, but they don't appear to be related to the actual panic/crash:
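The reading of the backtrace above can be roughly automated. The following is a minimal Python sketch, not an official tool: it pulls function names out of DDB `bt` frames and counts them per subsystem. The prefix table `SUBSYSTEMS` and the helper names are assumptions made up for illustration; they only cover the function names that appear in this thread.

```python
import re
from collections import Counter

# Hypothetical prefix-to-subsystem table, for illustration only.
# Order matters: "ip_ipsec_" must be tested before the generic "ip_".
SUBSYSTEMS = {
    "filesystem": ("softdep_", "ffs_", "buf", "vn_", "VOP_", "dofilewrite"),
    "ipsec": ("key_", "ipsec", "ip_ipsec_"),
    "network": ("ip_", "ether_", "netisr_", "re_"),
}

# A DDB frame looks like: "ffs_write() at ffs_write+0x542/frame 0x..."
FRAME_RE = re.compile(r"^(\w+)\(\) at ")


def classify(function_name):
    """Return the first subsystem whose prefixes match the function name."""
    for subsystem, prefixes in SUBSYSTEMS.items():
        if function_name.startswith(prefixes):
            return subsystem
    return "other"


def summarize_backtrace(text):
    """Count backtrace frames per subsystem; non-frame lines are ignored."""
    counts = Counter()
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            counts[classify(match.group(1))] += 1
    return counts


sample = """\
ffs_write() at ffs_write+0x542/frame 0xfffffe0036609650
vn_write() at vn_write+0x248/frame 0xfffffe00366097e0
sys_write() at sys_write+0x63/frame 0xfffffe0036609ae0
"""
summary = summarize_backtrace(sample)
print(summary.most_common())
```

A count like this only hints at where the kernel was executing when it panicked; it does not identify the root cause.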

ipsec4_checkpolicy: invalid policy 3
ipsec4_checkpolicy: invalid policy 3
ipsec4_checkpolicy: invalid policy 3

It may be worth giving a 2.2.4 snapshot a try to see if our changes to the filesystem help (we fixed some issues in pw and in config.xml writing that could cause problems, and we turned sync back off).

wricaurte

Thanks a lot. I'm going to install the snapshot, and I'll let you know if it fixes the problem.

Regards.

wricaurte

Sadly, I got another crash dump and the firewall rebooted again. I attach the dump file for further review.

The snapshot was "2.2.4-DEVELOPMENT (amd64) built on Fri Jul 10 00:17:53 CDT 2015", and it did not let me make any configuration changes. For now I'm going back to 2.2.2. I still have the old HDD working, so if you want me to run any tests, that is no problem at all :).

I'm sure you will fix it, as always. Thanks a lot for your help.

Regards.

dump_pfsense_ndxfw-Jul-11-2015.txt

wricaurte

This problem persists in 2.2.4-STABLE. Maybe I have found a pattern:

Open the serial console
Put some traffic on the firewall, something like watching Netflix

The firewall restarted every time, 1 or 2 minutes after I started watching a movie. Sometimes it also restarted when I logged in to the web configurator. I attach the dump file.

After I closed the serial console, it did not restart while I watched Netflix again. Reboots are less frequent on 2.2.4 than on 2.2.3: I get up to 2 days of uptime on 2.2.4, whereas on 2.2.3 I was getting several reboots a day.

I run squid in transparent mode, but the client machine generating the traffic has an exception rule, so its traffic passes through the firewall without going through squid.

no rdr on re1 inet proto tcp from 192.168.30.140 to any port = http
no rdr on ovpns3 inet proto tcp from 192.168.30.140 to any port = http
no rdr on ovpns4 inet proto tcp from 192.168.30.140 to any port = http
no rdr on pptp inet proto tcp from 192.168.30.140 to any port = http

Please let me know if you want me to run any tests.

Regards.

ndxfw_dump-17Aug2015.txt

heper

Remove squid/squidguard (and any related proxy packages), and try again.

jimp

This crash dump was in IPsec processing / NIC drivers:

db:0:kdb.enter.default>  show pcpu
cpuid        = 1
dynamic pcpu = 0xfffffe00984bc800
curthread    = 0xfffff800034ae000: pid 12 "irq256: re0"
curpcb       = 0xfffffe00344cecc0
fpcurthread  = none
idlethread   = 0xfffff80003210920: tid 100004 "idle: cpu1"
curpmap      = 0xffffffff82181fd8
tssp         = 0xffffffff8219cff8
commontssp   = 0xffffffff8219cff8
rsp0         = 0xfffffe00344cecc0
gs32p        = 0xffffffff8219ea50
ldt          = 0xffffffff8219ea90
tss          = 0xffffffff8219ea80
db:0:kdb.enter.default>  bt
Tracing pid 12 tid 100051 td 0xfffff800034ae000
key_allocsp() at key_allocsp+0x256/frame 0xfffffe00344ce620
ipsec_getpolicybyaddr() at ipsec_getpolicybyaddr+0x8d/frame 0xfffffe00344ce690
ipsec4_checkpolicy() at ipsec4_checkpolicy+0x29/frame 0xfffffe00344ce6b0
ip_ipsec_output() at ip_ipsec_output+0x8a/frame 0xfffffe00344ce6f0
ip_output() at ip_output+0x966/frame 0xfffffe00344ce7f0
ip_forward() at ip_forward+0x347/frame 0xfffffe00344ce8a0
ip_input() at ip_input+0x6ec/frame 0xfffffe00344ce8f0
netisr_dispatch_src() at netisr_dispatch_src+0x62/frame 0xfffffe00344ce960
ether_demux() at ether_demux+0x149/frame 0xfffffe00344ce990
ether_nh_input() at ether_nh_input+0x347/frame 0xfffffe00344ce9f0
netisr_dispatch_src() at netisr_dispatch_src+0x62/frame 0xfffffe00344cea60
re_rxeof() at re_rxeof+0x4ce/frame 0xfffffe00344ceae0
re_intr_msi() at re_intr_msi+0x10b/frame 0xfffffe00344ceb20
intr_event_execute_handlers() at intr_event_execute_handlers+0xab/frame 0xfffffe00344ceb60
ithread_loop() at ithread_loop+0x96/frame 0xfffffe00344cebb0
fork_exit() at fork_exit+0x9a/frame 0xfffffe00344cebf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00344cebf0

ipsec4_checkpolicy: invalid policy 3

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address	= 0xa40c050150
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80cf0d26
stack pointer	        = 0x28:0xfffffe00344ce590
frame pointer	        = 0x28:0xfffffe00344ce620
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 12 (irq256: re0)

It's completely different from the last panic, which was in filesystem code. I'd suspect the hardware at this stage more than anything.
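One detail of the trap output can be checked mechanically: the faulting address lies outside the range where the amd64 kernel normally keeps its code and data. The sketch below is only an illustration and assumes 48-bit canonical addressing; the function name `classify_amd64_address` is hypothetical.

```python
def classify_amd64_address(addr):
    """Classify a 64-bit virtual address, assuming 48-bit canonical addressing."""
    top_bits = addr >> 47  # bits 47..63; all must be equal in a canonical address
    if top_bits == 0:
        return "lower half (user range)"
    if top_bits == 0x1FFFF:
        return "upper half (kernel range)"
    return "non-canonical"


# Values copied from the "Fatal trap 12" output above.
for name, value in [
    ("fault virtual address", 0xA40C050150),
    ("instruction pointer", 0xFFFFFFFF80CF0D26),
    ("stack pointer", 0xFFFFFE00344CE590),
]:
    print(f"{name}: {classify_amd64_address(value)}")
```

The instruction and stack pointers fall in the kernel range, while the fault address does not. That is consistent with the kernel following a bad pointer in `key_allocsp()`, but it cannot by itself tell a software bug from memory corruption caused by faulty hardware.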

wricaurte

@heper:

remove squid/squidguard (and any related proxy packages),and try again.

Hi, thanks for the suggestion. I've removed several packages: everything related to caching, and ntop as well. Things improved, and I got 2 days of uptime, but the firewall keeps rebooting. I see the IPsec-related errors in the dump file.

Thanks again!!!

wricaurte

@jimp:

This crash dump was in IPsec processing / NIC drivers:

It's completely different from the last panic which was in filesystem code. I'd suspect the hardware at this stage more than anything.

Thanks for your help.

Is there any way to troubleshoot IPsec? What I do not understand is why the problem does not exist in 2.2.2. I rolled back to 2.2.2 twice, and both times the problem disappeared with the same configuration. I think some change introduced from 2.2.3 onward is conflicting with my configuration :D.

Can you please give me any advice on troubleshooting IPsec? If there is no way, I will roll back again. I have two hard disks, so I can test new versions of pfSense with no problem.

Regards.

divsys

I have no idea if this will help your particular issue, but it may be worth a try to roll forward to the current 2.2.4.

There were some IPsec issues resolved in that release.

It's only a guess, but reasonably easy to try.

wricaurte

@divsys:

I have no idea if this will help your particular issue, but it may be worth a try to roll forward to the current 2.2.4.

There were some IPSec issues resolved in that release.

It's only a guess, but reasonably easy to try.

Hi, thanks for the suggestion. The problem happens in both 2.2.3 and 2.2.4. I downgraded to 2.2.2 again and the problem disappeared. I'll try again with 2.3. There is something wrong with those versions; I've seen some IPsec-related problems reported in the forums. I hope the pfSense team solves this.

Thanks.