Firewall hangs and reboots since upgrade to 2.2.3



  • Since upgrading to 2.2.3, my home firewall hangs and sometimes also reboots. After these events the Web Configurator shows a message saying that a core dump was found; I have submitted it several times. This morning I retrieved the dump file, which is attached. I noticed that the firewall had rebooted 5 hours earlier.

    I run:

    • 7 IPsec VPNs (1 with an Alix pfSense and 6 with a TP-LINK TL-ER604W)
    • 2 OpenVPN servers
    • 2 OpenVPN clients
    • PPTP Server
    • DNS Resolver
    • DHCP Server
    • Many NAT and firewall rules (Between local and VPN nets)
    • Squid + SquidGuard

    I cannot read the dump properly, so I hope somebody can help me make sure this is not a hardware-related problem. Last weekend I reinstalled the system on a new SSD. The same configuration worked fine on 2.2.2.

    I only see IPsec-related messages, and the IPsec configuration has not changed from 2.2.2 to 2.2.3.

    I cannot reach my home network right now because it seems the firewall did not come back after rebooting again. I'll check tonight when I'm home.

    Regards.
    dump_ndxfw-Jul-9-2015.txt


  • Rebel Alliance Developer Netgate

    The actual crash dump/panic appears to be complaining about the filesystem:

    curthread    = 0xfffff8005eb82490: pid 65120 "squid"
    curpcb       = 0xfffffe0036609cc0
    fpcurthread  = none
    idlethread   = 0xfffff80003210920: tid 100004 "idle: cpu1"
    curpmap      = 0xfffff8005e6bf678
    tssp         = 0xffffffff8219cff8
    commontssp   = 0xffffffff8219cff8
    rsp0         = 0xfffffe0036609cc0
    gs32p        = 0xffffffff8219ea50
    ldt          = 0xffffffff8219ea90
    tss          = 0xffffffff8219ea80
    db:0:kdb.enter.default>  bt
    Tracing pid 65120 tid 100256 td 0xfffff8005eb82490
    softdep_disk_io_initiation() at softdep_disk_io_initiation+0xdb0/frame 0xfffffe00366094c0
    ffs_geom_strategy() at ffs_geom_strategy+0x15e/frame 0xfffffe00366094f0
    bufwrite() at bufwrite+0x142/frame 0xfffffe0036609530
    ffs_update() at ffs_update+0x25e/frame 0xfffffe00366095b0
    ffs_write() at ffs_write+0x542/frame 0xfffffe0036609650
    VOP_WRITE_APV() at VOP_WRITE_APV+0x145/frame 0xfffffe0036609760
    vn_write() at vn_write+0x248/frame 0xfffffe00366097e0
    vn_io_fault_doio() at vn_io_fault_doio+0x22/frame 0xfffffe0036609820
    vn_io_fault1() at vn_io_fault1+0x7c/frame 0xfffffe0036609970
    vn_io_fault() at vn_io_fault+0x18b/frame 0xfffffe00366099f0
    dofilewrite() at dofilewrite+0x87/frame 0xfffffe0036609a40
    kern_writev() at kern_writev+0x68/frame 0xfffffe0036609a90
    sys_write() at sys_write+0x63/frame 0xfffffe0036609ae0
    amd64_syscall() at amd64_syscall+0x351/frame 0xfffffe0036609bf0
    Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0036609bf0
    
    

    As you can see, it was squid that was active at the time, and the function calls in the backtrace are mostly filesystem related (ffs, vn, filewrite, etc.).
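
    As a rough illustration of the reading above, here is a minimal Python sketch that pulls the function names out of a DDB `bt` listing and counts them by subsystem. The names `frame_functions`, `classify`, `summarize` and the prefix table are hypothetical and chosen only for this example; the prefix-to-subsystem mapping is an assumption, not an official classification.

```python
import re
from collections import Counter

# One DDB backtrace line looks like:
#   ffs_write() at ffs_write+0x542/frame 0xfffffe0036609650
FRAME_RE = re.compile(r"^\s*(\w+)\(\) at \w+\+0x[0-9a-f]+/frame 0x[0-9a-f]+\s*$")

# Hypothetical prefix table, for illustration only (an assumption).
SUBSYSTEM_PREFIXES = [
    (("ffs_", "softdep_", "vn_", "VOP_", "buf"), "filesystem"),
    (("ipsec", "key_"), "ipsec"),
    (("ip_", "ether_", "netisr_"), "network stack"),
    (("re_",), "re NIC driver"),
]


def frame_functions(text):
    """Return function names from backtrace lines, innermost frame first."""
    names = []
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def classify(name):
    """Map a function name to a subsystem label using the prefix table."""
    for prefixes, label in SUBSYSTEM_PREFIXES:
        if name.startswith(prefixes):
            return label
    return "other"


def summarize(text):
    """Count backtrace frames per subsystem label."""
    return Counter(classify(name) for name in frame_functions(text))
```

    Calling `summarize()` on the text of a dump gives a per-subsystem frame count, which is only a quick hint about where the panic happened, not a diagnosis.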

    Though there are some later IPsec errors, they don't appear to be related to the actual panic/crash:

    ipsec4_checkpolicy: invalid policy 3
    ipsec4_checkpolicy: invalid policy 3
    ipsec4_checkpolicy: invalid policy 3
    
    

    It may be worth giving a 2.2.4 snapshot a try to see if our changes to the filesystem help (we fixed some issues in pw and in config.xml writing that could cause problems, and we turned sync back off).



  • Thanks a lot. I'm going to install the snapshot, and I'll let you know if it fixes the problem.

    Regards.



  • Sadly, I got a crash dump again and the firewall rebooted again. I attach the dump file for further review.

    The snapshot was "2.2.4-DEVELOPMENT (amd64) built on Fri Jul 10 00:17:53 CDT 2015", and it did not let me make any configuration changes. For now I'm going back to 2.2.2. I still have the old HDD working, so if you want me to run any tests, that is no problem at all :).

    I'm sure you will fix it as always. Thanks a lot for your help.

    Regards.

    dump_pfsense_ndxfw-Jul-11-2015.txt



  • This problem persists in 2.2.4-STABLE. I may have found a pattern:

    • Open the serial console
    • Put some traffic on the firewall, something like watching Netflix

    The firewall restarted every time, 1 or 2 minutes after I started watching a movie. Sometimes it also restarted when I logged in to the web configurator. I attach the dump file.

    After I closed the serial console, it did not restart while I watched Netflix again. Reboots are less frequent in 2.2.4 than in 2.2.3: I get up to 2 days of uptime with 2.2.4, whereas with 2.2.3 I was getting a few reboots a day.

    I run squid in transparent mode, but the client machine generating the traffic has an exception rule, so its traffic passes through the firewall directly rather than through squid:

    no rdr on re1 inet proto tcp from 192.168.30.140 to any port = http
    no rdr on ovpns3 inet proto tcp from 192.168.30.140 to any port = http
    no rdr on ovpns4 inet proto tcp from 192.168.30.140 to any port = http
    no rdr on pptp inet proto tcp from 192.168.30.140 to any port = http
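
    To double-check which interfaces carry the exemption, a small Python sketch like the following could scan a saved copy of the NAT rules. It only understands the exact `no rdr` rule shape quoted above and is not a general pf rule parser; the function name `exempt_interfaces` and its parameters are hypothetical.

```python
import re

# Matches only the rule shape quoted in this post, for example:
#   no rdr on re1 inet proto tcp from 192.168.30.140 to any port = http
NO_RDR_RE = re.compile(
    r"^\s*no rdr on (\S+) inet proto (\S+) from (\S+) to (\S+) port = (\S+)\s*$"
)


def exempt_interfaces(rules_text, source_ip, port="http"):
    """Return interfaces that have a 'no rdr' exemption for source_ip and port."""
    interfaces = []
    for line in rules_text.splitlines():
        match = NO_RDR_RE.match(line)
        if match and match.group(3) == source_ip and match.group(5) == port:
            interfaces.append(match.group(1))
    return interfaces
```

    An interface missing from the returned list would mean that traffic from the client arriving there could still be redirected to squid.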

    Please let me know if you want me to do any test.

    Regards.

    ndxfw_dump-17Aug2015.txt



  • Remove squid/squidguard (and any related proxy packages), and try again.


  • Rebel Alliance Developer Netgate

    This crash dump was in IPsec processing / NIC drivers:

    db:0:kdb.enter.default>  show pcpu
    cpuid        = 1
    dynamic pcpu = 0xfffffe00984bc800
    curthread    = 0xfffff800034ae000: pid 12 "irq256: re0"
    curpcb       = 0xfffffe00344cecc0
    fpcurthread  = none
    idlethread   = 0xfffff80003210920: tid 100004 "idle: cpu1"
    curpmap      = 0xffffffff82181fd8
    tssp         = 0xffffffff8219cff8
    commontssp   = 0xffffffff8219cff8
    rsp0         = 0xfffffe00344cecc0
    gs32p        = 0xffffffff8219ea50
    ldt          = 0xffffffff8219ea90
    tss          = 0xffffffff8219ea80
    db:0:kdb.enter.default>  bt
    Tracing pid 12 tid 100051 td 0xfffff800034ae000
    key_allocsp() at key_allocsp+0x256/frame 0xfffffe00344ce620
    ipsec_getpolicybyaddr() at ipsec_getpolicybyaddr+0x8d/frame 0xfffffe00344ce690
    ipsec4_checkpolicy() at ipsec4_checkpolicy+0x29/frame 0xfffffe00344ce6b0
    ip_ipsec_output() at ip_ipsec_output+0x8a/frame 0xfffffe00344ce6f0
    ip_output() at ip_output+0x966/frame 0xfffffe00344ce7f0
    ip_forward() at ip_forward+0x347/frame 0xfffffe00344ce8a0
    ip_input() at ip_input+0x6ec/frame 0xfffffe00344ce8f0
    netisr_dispatch_src() at netisr_dispatch_src+0x62/frame 0xfffffe00344ce960
    ether_demux() at ether_demux+0x149/frame 0xfffffe00344ce990
    ether_nh_input() at ether_nh_input+0x347/frame 0xfffffe00344ce9f0
    netisr_dispatch_src() at netisr_dispatch_src+0x62/frame 0xfffffe00344cea60
    re_rxeof() at re_rxeof+0x4ce/frame 0xfffffe00344ceae0
    re_intr_msi() at re_intr_msi+0x10b/frame 0xfffffe00344ceb20
    intr_event_execute_handlers() at intr_event_execute_handlers+0xab/frame 0xfffffe00344ceb60
    ithread_loop() at ithread_loop+0x96/frame 0xfffffe00344cebb0
    fork_exit() at fork_exit+0x9a/frame 0xfffffe00344cebf0
    fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00344cebf0
    
    
    ipsec4_checkpolicy: invalid policy 3
    
    Fatal trap 12: page fault while in kernel mode
    cpuid = 1; apic id = 01
    fault virtual address	= 0xa40c050150
    fault code		= supervisor read data, page not present
    instruction pointer	= 0x20:0xffffffff80cf0d26
    stack pointer	        = 0x28:0xfffffe00344ce590
    frame pointer	        = 0x28:0xfffffe00344ce620
    code segment		= base 0x0, limit 0xfffff, type 0x1b
    			= DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags	= interrupt enabled, resume, IOPL = 0
    current process		= 12 (irq256: re0)
    
    

    It's completely different from the last panic which was in filesystem code. I'd suspect the hardware at this stage more than anything.
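
    When comparing several dumps like this, it can help to pull out just the trap header fields. Below is a minimal Python sketch, assuming the dump text contains a "Fatal trap" block laid out like the one above; `trap_summary` is a hypothetical helper name, and the parsing is deliberately simplistic.

```python
import re

# Header line such as: "Fatal trap 12: page fault while in kernel mode"
TRAP_RE = re.compile(r"^\s*Fatal trap (\d+): (.+?)\s*$")
# Field line such as: "current process		= 12 (irq256: re0)"
FIELD_RE = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?)\s*=\s*(.+?)\s*$")


def trap_summary(text):
    """Collect the fields of the first 'Fatal trap' block into a dict."""
    summary = {}
    in_trap = False
    for line in text.splitlines():
        header = TRAP_RE.match(line)
        if header and not in_trap:
            in_trap = True
            summary["trap"] = int(header.group(1))
            summary["description"] = header.group(2)
            continue
        if not in_trap:
            continue
        if not line.strip():
            break  # a blank line ends the trap block
        field = FIELD_RE.match(line)
        if field:  # continuation lines that start with "=" are skipped
            summary[field.group(1)] = field.group(2)
    return summary
```

    Comparing the "current process" and "fault virtual address" entries across dumps is one quick way to see whether successive panics look alike or, as here, land in unrelated code.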



  • @heper:

    remove squid/squidguard (and any related proxy packages),and try again.

    Hi, thanks for the suggestion. I've removed some packages: everything related to caching, plus ntop. That brought an improvement, with up to 2 days of uptime, but the firewall keeps rebooting. I still see the IPsec-related errors in the dump file.

    Thanks again!!!



  • @jimp:

    This crash dump was in IPsec processing / NIC drivers:

    It's completely different from the last panic which was in filesystem code. I'd suspect the hardware at this stage more than anything.

    Thanks for your help,

    Is there any way to troubleshoot IPsec? What I do not understand is why the problem does not exist in 2.2.2. I rolled back to 2.2.2 twice, and the problem disappeared with the same configuration. I think some change introduced from 2.2.3 onward is interfering with my configuration :D.

    Can you please give me any advice on troubleshooting IPsec? If there is no way, I will roll back again. I have two hard disks, so I can test new versions of pfSense with no problem.

    Regards.



  • I have no idea if this will help your particular issue, but it may be worth a try to roll forward to the current 2.2.4.

    There were some IPSec issues resolved in that release.

    It's only a guess, but reasonably easy to try...



  • @divsys:

    I have no idea if this will help your particular issue, but it may be worth a try to roll forward to the current 2.2.4.

    There were some IPSec issues resolved in that release.

    It's only a guess, but reasonably easy to try...

    Hi, thanks for the suggestion. The problem happens in both 2.2.3 and 2.2.4. I downgraded to 2.2.2 again and the problem disappeared. I'll try again with 2.3. There is something wrong with those versions. I've seen some IPsec-related problems reported in the forums. I hope the pfSense team solves this.

    Thanks..