Firewall hangs and reboots since upgrade to 2.2.3
-
Since upgrading to 2.2.3, my home firewall hangs and sometimes also reboots. After these events the Web Configurator shows a message saying that a core dump was found; I have submitted it several times. This morning I retrieved the dump file, which is attached. I noticed the firewall had rebooted 5 hours earlier.
I run:
- 7 IPsec VPNs (1 with an ALIX pfSense box and 6 with TP-LINK TL-ER604W routers)
- 2 OpenVPN servers
- 2 OpenVPN clients
- PPTP Server
- DNS Resolver
- DHCP Server
- Many NAT and firewall rules (Between local and VPN nets)
- Squid + SquidGuard
I cannot read the dump properly, so I hope somebody can help me rule out a hardware problem. Last weekend I reinstalled the system on a new SSD; the same configuration worked fine on 2.2.2.
I only see IPsec-related messages, and the IPsec configuration has not changed from 2.2.2 to 2.2.3.
I cannot reach my home network right now because the firewall apparently did not come back after rebooting again. I'll check tonight when I'm home.
Regards.
dump_ndxfw-Jul-9-2015.txt
-
The actual crash dump/panic appears to be complaining about the filesystem:
curthread = 0xfffff8005eb82490: pid 65120 "squid"
curpcb = 0xfffffe0036609cc0
fpcurthread = none
idlethread = 0xfffff80003210920: tid 100004 "idle: cpu1"
curpmap = 0xfffff8005e6bf678
tssp = 0xffffffff8219cff8
commontssp = 0xffffffff8219cff8
rsp0 = 0xfffffe0036609cc0
gs32p = 0xffffffff8219ea50
ldt = 0xffffffff8219ea90
tss = 0xffffffff8219ea80
db:0:kdb.enter.default> bt
Tracing pid 65120 tid 100256 td 0xfffff8005eb82490
softdep_disk_io_initiation() at softdep_disk_io_initiation+0xdb0/frame 0xfffffe00366094c0
ffs_geom_strategy() at ffs_geom_strategy+0x15e/frame 0xfffffe00366094f0
bufwrite() at bufwrite+0x142/frame 0xfffffe0036609530
ffs_update() at ffs_update+0x25e/frame 0xfffffe00366095b0
ffs_write() at ffs_write+0x542/frame 0xfffffe0036609650
VOP_WRITE_APV() at VOP_WRITE_APV+0x145/frame 0xfffffe0036609760
vn_write() at vn_write+0x248/frame 0xfffffe00366097e0
vn_io_fault_doio() at vn_io_fault_doio+0x22/frame 0xfffffe0036609820
vn_io_fault1() at vn_io_fault1+0x7c/frame 0xfffffe0036609970
vn_io_fault() at vn_io_fault+0x18b/frame 0xfffffe00366099f0
dofilewrite() at dofilewrite+0x87/frame 0xfffffe0036609a40
kern_writev() at kern_writev+0x68/frame 0xfffffe0036609a90
sys_write() at sys_write+0x63/frame 0xfffffe0036609ae0
amd64_syscall() at amd64_syscall+0x351/frame 0xfffffe0036609bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0036609bf0
As you can see, squid was the active process at the time, and the function calls in the backtrace are mostly filesystem related (ffs, vn, filewrite, etc.).
Although there are some later IPsec errors, they don't appear to be related to the actual panic/crash:
ipsec4_checkpolicy: invalid policy 3
ipsec4_checkpolicy: invalid policy 3
ipsec4_checkpolicy: invalid policy 3
It may be worth giving a 2.2.4 snapshot a try to see if our changes to the filesystem help (we fixed some issues in pw and in config.xml writing that could cause problems, and we turned sync back off).
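For anyone who wants to repeat this kind of reading on their own dump, here is a rough Python sketch that pulls the function names out of a DDB backtrace and tallies them by subsystem. The prefix table (SUBSYSTEMS) and the helper names are my own assumptions for illustration, not an official mapping, so treat the result as a hint rather than a diagnosis.

```python
import re
from collections import Counter

# Hypothetical prefix -> subsystem table, chosen only for illustration.
# Order matters: "ipsec" must be tested before the shorter "ip_".
SUBSYSTEMS = {
    "softdep_": "filesystem",
    "ffs_": "filesystem",
    "buf": "filesystem",
    "VOP_": "filesystem",
    "vn_": "filesystem",
    "key_": "ipsec",
    "ipsec": "ipsec",
    "ip_": "ip stack",
    "ether_": "ethernet",
    "netisr_": "netisr",
    "re_": "re NIC driver",
}

# Matches frames such as "ffs_write() at ffs_write+0x542/frame 0x..."
FRAME_RE = re.compile(r"(\w+)\(\) at \w+\+0x[0-9a-f]+/frame")


def frames(backtrace):
    """Return the function names in the order they appear in the backtrace."""
    return FRAME_RE.findall(backtrace)


def classify(name):
    """Map one function name to a subsystem label, or "other" if unknown."""
    for prefix, subsystem in SUBSYSTEMS.items():
        if name.startswith(prefix):
            return subsystem
    return "other"


def summarize(backtrace):
    """Count how many frames fall into each subsystem."""
    return Counter(classify(name) for name in frames(backtrace))


# A few frames copied from the backtrace above.
SAMPLE = (
    "softdep_disk_io_initiation() at softdep_disk_io_initiation+0xdb0/frame 0xfffffe00366094c0 "
    "ffs_geom_strategy() at ffs_geom_strategy+0x15e/frame 0xfffffe00366094f0 "
    "bufwrite() at bufwrite+0x142/frame 0xfffffe0036609530 "
    "sys_write() at sys_write+0x63/frame 0xfffffe0036609ae0"
)

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

The regular expression works on both the flattened one-line form and the one-frame-per-line form of the backtrace, since it only relies on the "name() at name+0xNN/frame" pattern.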
-
Thanks a lot. I'm going to install the snapshot and I'll let you know if it fixes the problem.
Regards.
-
Sadly, I got another crash dump and the firewall rebooted again. I attach the dump file for further review.
The snapshot was "2.2.4-DEVELOPMENT (amd64) built on Fri Jul 10 00:17:53 CDT 2015"; it did not let me make any configuration changes. For now I'm going back to 2.2.2. I still have the old HDD working if you want me to run any tests, no problem at all :).
I'm sure you will fix it as always. Thanks a lot for your help.
Regards.
-
This problem persists in 2.2.4-STABLE. I may have found a pattern:
- Open the serial console
- Put some traffic on the firewall, something like watching Netflix
The firewall restarted every time, 1 or 2 minutes after I started watching a movie. Sometimes it restarted when I logged in to the web configurator. I attach the dump file.
After I closed the serial console, it did not restart while I watched Netflix again. Reboots are less frequent in 2.2.4 than in 2.2.3: I get up to 2 days of uptime with 2.2.4, whereas with 2.2.3 I had several reboots a day.
I run squid in transparent mode, but the client machine generating the traffic has an exception rule, so this traffic passes through the firewall and not through squid:
no rdr on re1 inet proto tcp from 192.168.30.140 to any port = http
no rdr on ovpns3 inet proto tcp from 192.168.30.140 to any port = http
no rdr on ovpns4 inet proto tcp from 192.168.30.140 to any port = http
no rdr on pptp inet proto tcp from 192.168.30.140 to any port = http
Please let me know if you want me to run any tests.
Regards.
-
Remove squid/squidguard (and any related proxy packages), and try again.
-
This crash dump was in IPsec processing / NIC drivers:
db:0:kdb.enter.default> show pcpu
cpuid = 1
dynamic pcpu = 0xfffffe00984bc800
curthread = 0xfffff800034ae000: pid 12 "irq256: re0"
curpcb = 0xfffffe00344cecc0
fpcurthread = none
idlethread = 0xfffff80003210920: tid 100004 "idle: cpu1"
curpmap = 0xffffffff82181fd8
tssp = 0xffffffff8219cff8
commontssp = 0xffffffff8219cff8
rsp0 = 0xfffffe00344cecc0
gs32p = 0xffffffff8219ea50
ldt = 0xffffffff8219ea90
tss = 0xffffffff8219ea80
db:0:kdb.enter.default> bt
Tracing pid 12 tid 100051 td 0xfffff800034ae000
key_allocsp() at key_allocsp+0x256/frame 0xfffffe00344ce620
ipsec_getpolicybyaddr() at ipsec_getpolicybyaddr+0x8d/frame 0xfffffe00344ce690
ipsec4_checkpolicy() at ipsec4_checkpolicy+0x29/frame 0xfffffe00344ce6b0
ip_ipsec_output() at ip_ipsec_output+0x8a/frame 0xfffffe00344ce6f0
ip_output() at ip_output+0x966/frame 0xfffffe00344ce7f0
ip_forward() at ip_forward+0x347/frame 0xfffffe00344ce8a0
ip_input() at ip_input+0x6ec/frame 0xfffffe00344ce8f0
netisr_dispatch_src() at netisr_dispatch_src+0x62/frame 0xfffffe00344ce960
ether_demux() at ether_demux+0x149/frame 0xfffffe00344ce990
ether_nh_input() at ether_nh_input+0x347/frame 0xfffffe00344ce9f0
netisr_dispatch_src() at netisr_dispatch_src+0x62/frame 0xfffffe00344cea60
re_rxeof() at re_rxeof+0x4ce/frame 0xfffffe00344ceae0
re_intr_msi() at re_intr_msi+0x10b/frame 0xfffffe00344ceb20
intr_event_execute_handlers() at intr_event_execute_handlers+0xab/frame 0xfffffe00344ceb60
ithread_loop() at ithread_loop+0x96/frame 0xfffffe00344cebb0
fork_exit() at fork_exit+0x9a/frame 0xfffffe00344cebf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00344cebf0

ipsec4_checkpolicy: invalid policy 3
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0xa40c050150
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80cf0d26
stack pointer = 0x28:0xfffffe00344ce590
frame pointer = 0x28:0xfffffe00344ce620
code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (irq256: re0)
It's completely different from the last panic which was in filesystem code. I'd suspect the hardware at this stage more than anything.
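If it helps when comparing several dumps side by side, here is a small Python sketch that splits a "Fatal trap" message like the one above into named fields and checks whether the fault address falls in the upper (kernel) half of the amd64 address space. The key list, the function names and the boundary constant are assumptions I picked for this illustration; the check only flags an address as unusual and says nothing about whether the cause is software or hardware.

```python
import re

# Field names as they appear in the trap message above (assumed list).
KEYS = [
    "cpuid",
    "apic id",
    "fault virtual address",
    "fault code",
    "instruction pointer",
    "stack pointer",
    "frame pointer",
    "code segment",
    "processor eflags",
    "current process",
]
KEY_RE = re.compile("(" + "|".join(re.escape(k) for k in KEYS) + r")\s*=\s*")

# Start of the canonical upper half on amd64; kernel addresses live above it.
UPPER_HALF_START = 0xFFFF800000000000


def parse_trap(text):
    """Split a trap message into a {field name: value} dictionary."""
    # re.split with one capture group yields:
    # [preamble, key1, value1, key2, value2, ...]
    parts = KEY_RE.split(text)
    return {k: v.strip(" ;\n") for k, v in zip(parts[1::2], parts[2::2])}


def looks_like_kernel_address(hex_address):
    """True if the address lies in the canonical upper half."""
    return int(hex_address, 16) >= UPPER_HALF_START


# The trap message from the dump above, flattened to one line.
TRAP = (
    "Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 01 "
    "fault virtual address = 0xa40c050150 "
    "fault code = supervisor read data, page not present "
    "instruction pointer = 0x20:0xffffffff80cf0d26 "
    "stack pointer = 0x28:0xfffffe00344ce590 "
    "frame pointer = 0x28:0xfffffe00344ce620 "
    "code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 "
    "processor eflags = interrupt enabled, resume, IOPL = 0 "
    "current process = 12 (irq256: re0)"
)

if __name__ == "__main__":
    fields = parse_trap(TRAP)
    print(fields["current process"])
    print(looks_like_kernel_address(fields["fault virtual address"]))
```

In this dump the faulting address is outside the kernel half, which fits a read through a bad pointer while running in the re0 interrupt thread.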
-
Hi, thanks for the suggestion to remove squid/squidguard. I've deleted some packages: everything related to caching, and ntop as well. That was an improvement. I got 2 days of uptime, but the firewall keeps rebooting. I see the IPsec-related errors in the dump file.
Thanks again!!!
-
Thanks for your help,
Is there any way to troubleshoot IPsec? What I do not understand is why the problem does not exist in 2.2.2. I rolled back to 2.2.2 twice, and both times the problem disappeared with the same configuration. I think some change introduced from 2.2.3 onward is interfering with my configuration :D.
Can you please give me any advice on troubleshooting IPsec? If there is no way, I will roll back again. I have two hard disks, so I can test new versions of pfSense with no problem.
Regards.
-
I have no idea if this will help your particular issue, but it may be worth a try to roll forward to the current 2.2.4.
There were some IPsec issues resolved in that release.
It's only a guess, but reasonably easy to try.
-
Hi, thanks for the suggestion. The problem happens in both 2.2.3 and 2.2.4. I downgraded to 2.2.2 again and the problem disappeared. I'll try again with 2.3. There is something wrong with those versions; I've seen some IPsec-related problems reported in the forums. I hope the pfSense team solves this.
Thanks.