Secondary pfSense Crash after CARP Configuration



  • I have been stuck on an issue for the last two weeks: my pfSense crashes and goes into an endless reboot loop whenever I configure CARP on the secondary pfSense. I have tested on different hardware and on different pfSense versions, both old and recent. Can someone please guide me?


  • Rebel Alliance Developer Netgate

    You must provide more detail about the crashes. Can you see any part of the error/backtrace/etc that happens? If you disconnect the sync cable does the reboot loop stop?

    Are you using any limiters? (Limiters + pfsync is a known panic trigger for HA)
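    If you're not sure whether limiters are configured, one quick check from the shell is to look for the limiter section in the config file. This is only a sketch, assuming the standard /cf/conf/config.xml location and that limiters are stored under the `<dnshaper>` tag:

```shell
# has_limiters: report whether a pfSense config.xml defines any
# limiters (assumed to be stored under the <dnshaper> section).
# Usage: has_limiters /cf/conf/config.xml
has_limiters() {
    grep -q '<dnshaper>' "$1" && echo yes || echo no
}
```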



  • Hi

    I have the same issue. I am using 2.4.2 and followed this video: https://www.youtube.com/watch?v=gxIPHR-eX_U. It works up until enabling CARP, which then sends my pfSense into a massive boot loop.

    I do have limiters, so maybe that is the issue I'm facing.



  • @jimp:

    You must provide more detail about the crashes. Can you see any part of the error/backtrace/etc that happens? If you disconnect the sync cable does the reboot loop stop?

    Are you using any limiters? (Limiters + pfsync is a known panic trigger for HA)

    Did you fix this?


  • Netgate

    No it is still an open bug.

    Delete the limiters or disable pfsync.



  • @Derelict:

    No it is still an open bug.

    Delete the limiters or disable pfsync.

    Can you not just uncheck the toggles for the Traffic Shaper Limiters configuration and Traffic Shaper configuration?

    Mat


  • Netgate

    If whatever you try works, yes.

    It is a problem specific to limiters. The shaper itself (altq) works fine.



  • Thanks. I did try with it unchecked, but it still crashed. I'll delete the limiters and try that instead.

    Mat



  • Hi,

    I solved mine with this … https://redmine.pfsense.org/issues/4310#note-44

    Edwin



  • Hi again

    I have deleted all my limiters and it still crashes. Here is the crash report. Help, please!

    Crash report begins.  Anonymous machine information:

    amd64
    11.1-RELEASE-p4
    FreeBSD 11.1-RELEASE-p4 #5 r313908+79c92265a31(RELENG_2_4): Mon Nov 20 08:18:22 CST 2017    root@buildbot2.netgate.com:/builder/ce-242/tmp/obj/builder/ce-242/tmp/FreeBSD-src/sys/pfSense

    Crash report details:

    No PHP errors found.

    Filename: /var/crash/bounds
    1

    Filename: /var/crash/info.0
    Dump header from device: /dev/gptid/78cabbc4-dad6-11e7-b9e4-005056b3ced1
      Architecture: amd64
      Architecture Version: 1
      Dump Length: 106496
      Blocksize: 512
      Dumptime: Sat Dec  9 17:23:14 2017
      Hostname: srvtcfw01
      Magic: FreeBSD Text Dump
      Version String: FreeBSD 11.1-RELEASE-p4 #5 r313908+79c92265a31(RELENG_2_4): Mon Nov 20 08:18:22 CST 2017
        root@buildbot2.netgate.com:/builder/ce-242/tmp/obj/builder/ce-242/tmp/FreeBSD-src/sys/pfSense
      Panic String: pfsync_undefer_state: unable to find deferred state
      Dump Parity: 583539232
      Bounds: 0
      Dump Status: good

    Filename: /var/crash/info.last
    Dump header from device: /dev/gptid/78cabbc4-dad6-11e7-b9e4-005056b3ced1
      Architecture: amd64
      Architecture Version: 1
      Dump Length: 106496
      Blocksize: 512
      Dumptime: Sat Dec  9 17:23:14 2017
      Hostname: srvtcfw01
      Magic: FreeBSD Text Dump
      Version String: FreeBSD 11.1-RELEASE-p4 #5 r313908+79c92265a31(RELENG_2_4): Mon Nov 20 08:18:22 CST 2017
        root@buildbot2.netgate.com:/builder/ce-242/tmp/obj/builder/ce-242/tmp/FreeBSD-src/sys/pfSense
      Panic String: pfsync_undefer_state: unable to find deferred state
      Dump Parity: 583539232
      Bounds: 0
      Dump Status: good

    Filename: /var/crash/minfree
    2048

    Filename: /var/crash/textdump.tar.0
    db:0:kdb.enter.default>  run lockinfo
    db:1:lockinfo> show locks
    No such command
    db:1:locks>  show alllocks
    No such command
    db:1:alllocks>  show lockedvnods
    Locked vnodes
    db:0:kdb.enter.default>  show pcpu
    cpuid        = 0
    dynamic pcpu = 0x7ebf00
    curthread    = 0xfffff80007d6f000: pid 15262 "openvpn"
    curpcb      = 0xfffffe0096381cc0
    fpcurthread  = 0xfffff80007d6f000: pid 15262 "openvpn"
    idlethread  = 0xfffff8000351d000: tid 100003 "idle: cpu0"
    curpmap      = 0xfffff80007d47138
    tssp        = 0xffffffff82a73b90
    commontssp  = 0xffffffff82a73b90
    rsp0        = 0xfffffe0096381cc0
    gs32p        = 0xffffffff82a7a3e8
    ldt          = 0xffffffff82a7a428
    tss          = 0xffffffff82a7a418
    db:0:kdb.enter.default>  bt
    Tracing pid 15262 tid 100109 td 0xfffff80007d6f000
    kdb_enter() at kdb_enter+0x3b/frame 0xfffffe0096381250
    vpanic() at vpanic+0x1a3/frame 0xfffffe00963812d0
    panic() at panic+0x43/frame 0xfffffe0096381330
    pfsync_update_state() at pfsync_update_state+0x3c5/frame 0xfffffe0096381380
    pf_test() at pf_test+0x21cf/frame 0xfffffe00963815c0
    pf_check_out() at pf_check_out+0x1d/frame 0xfffffe00963815e0
    pfil_run_hooks() at pfil_run_hooks+0x7b/frame 0xfffffe0096381670
    ip_output() at ip_output+0x22b/frame 0xfffffe00963817c0
    ip_forward() at ip_forward+0x323/frame 0xfffffe0096381860
    ip_input() at ip_input+0x75a/frame 0xfffffe00963818c0
    netisr_dispatch_src() at netisr_dispatch_src+0xa0/frame 0xfffffe0096381910
    tunwrite() at tunwrite+0x226/frame 0xfffffe0096381950
    devfs_write_f() at devfs_write_f+0xe2/frame 0xfffffe00963819b0
    dofilewrite() at dofilewrite+0xc8/frame 0xfffffe0096381a00
    sys_writev() at sys_writev+0x8c/frame 0xfffffe0096381a60
    amd64_syscall() at amd64_syscall+0x6c4/frame 0xfffffe0096381bf0
    Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0096381bf0
    --- syscall (121, FreeBSD ELF64, sys_writev), rip = 0x8015a581a, rsp = 0x7fffffffde18, rbp = 0x7fffffffde50 ---
    db:0:kdb.enter.default>  ps
      pid  ppid  pgrp  uid  state  wmesg        wchan        cmd
    51715  308  308    0  S      nanslp  0xffffffff828b51e0 php-fpm
    49963 70428  308    0  S      nanslp  0xffffffff828b51e0 sleep
    71262 71220 69187    0  S      nanslp  0xffffffff828b51e0 sleep
    71220    1 69187    0  S      wait    0xfffff8001c105000 sh
    9738  9585  9738 65534  Ss      sbwait  0xfffff800079fa4a4 darkstat
    9585    1  9585 65534  Ss      select  0xfffff8001c237b40 darkstat
    7039 71757  7039    0  Ss      (threaded)                  sshlockout_pf
    100091                  S      piperd  0xfffff8001c39f8e8 sshlockout_pf
    100169                  S      nanslp  0xffffffff828b51e0 sshlockout_pf
    6749    1  6749    0  Ss      select  0xfffff800074f9b40 bsnmpd
    70428    1  308    0  S      wait    0xfffff80007ea4000 sh
    69866    1 69866  136  Ss      select  0xfffff80007f20cc0 dhcpd
    63516    1 63516    59  Ss      kqread  0xfffff80007744900 unbound
    52767    1 52767    0  Ss      (threaded)                  dpinger
    100156                  S      uwait    0xfffff8001c278080 dpinger
    100165                  S      sbwait  0xfffff800079f8144 dpinger
    100166                  S      nanslp  0xffffffff828b51e0 dpinger
    100167                  S      nanslp  0xffffffff828b51e0 dpinger
    100168                  S      accept  0xfffff800079f906c dpinger
    52697    1 52697    0  Ss      (threaded)                  dpinger
    100155                  S      uwait    0xfffff8001c2fbf00 dpinger
    100161                  S      sbwait  0xfffff80007be1144 dpinger
    100162                  S      nanslp  0xffffffff828b51e0 dpinger
    100163                  S      nanslp  0xffffffff828b51e0 dpinger
    100164                  S      accept  0xfffff800076bc3cc dpinger
      415    1  415    0  Ss+    ttyin    0xfffff80007504cb0 getty
      407    1  407    0  Ss+    ttyin    0xfffff800075050b0 getty
      123    1  123    0  Ss+    ttyin    0xfffff800075054b0 getty
    99998    1 99998    0  Ss+    ttyin    0xfffff800075058b0 getty
    99937    1 99937    0  Ss+    ttyin    0xfffff800074d78b0 getty
    99644    1 99644    0  Ss+    ttyin    0xfffff800074b00b0 getty
    99639    1 99639    0  Ss+    ttyin    0xfffff800074ae8b0 getty
    99388    1 99388    0  Ss+    ttyin    0xfffff800074ae4b0 getty
    82736    1 82213    0  S      select  0xfffff8001c0ec340 vmtoolsd
    79421 78825 78825    0  S      nanslp  0xffffffff828b51e0 minicron
    78825    1 78825    0  Ss      wait    0xfffff800079c3000 minicron
    77900 77877 77877    0  S      nanslp  0xffffffff828b51e0 minicron
    77877    1 77877    0  Ss      wait    0xfffff800079c4000 minicron
    77783 77255 77255    0  S      nanslp  0xffffffff828b51e0 minicron
    77255    1 77255    0  Ss      wait    0xfffff800079c5000 minicron
    71757    1 71757    0  Ss      select  0xfffff80007617cc0 syslogd
    32516    1 32516    0  Ss      (threaded)                  ntpd
    100117                  S      select  0xfffff80007daae40 ntpd
    31992    1 31992    0  Ss      nanslp  0xffffffff828b51e0 cron
    31697 31441 31441    0  S      kqread  0xfffff80007d5f100 nginx
    31679 31441 31441    0  S      kqread  0xfffff80007d5f200 nginx
    31441    1 31441    0  Ss      pause    0xfffff8000754d630 nginx
    15591    1 15591    0  Ss      bpf      0xfffff80007bdca00 filterlog
    15262    1 15262    0  Rs      CPU 0                      openvpn
    14091    1 14091    0  Ss      select  0xfffff80007cde3c0 openvpn
    7334    1  7334    0  Ss      select  0xfffff800074f71c0 sshd
      336    1  336    0  Ss      select  0xfffff800074f60c0 devd
      324  322  322    0  S      kqread  0xfffff80007745600 check_reload_status
      322    1  322    0  Ss      kqread  0xfffff80007745500 check_reload_status
      308    1  308    0  Ss      kqread  0xfffff80007746100 php-fpm
      58    0    0    0  DL      mdwait  0xfffff80007524800 [md0]
      25    0    0    0  DL      syncer  0xffffffff829aee00 [syncer]
      24    0    0    0  DL      vlruwt  0xfffff8000754a588 [vnlru]
      23    0    0    0  DL      (threaded)                  [bufdaemon]
    100083                  D      psleep  0xffffffff829ad604 [bufdaemon]
    100092                  D      sdflush  0xfffff8000752b8e8 [/ worker]
      22    0    0    0  DL      -        0xffffffff829ae2bc [bufspacedaemon]
      21    0    0    0  DL      pgzero  0xffffffff829c2a64 [pagezero]
      20    0    0    0  DL      psleep  0xffffffff829bef14 [vmdaemon]
      19    0    0    0  DL      (threaded)                  [pagedaemon]
    100079                  D      psleep  0xffffffff82a72f85 [pagedaemon]
    100086                  D      launds  0xffffffff829beec4 [laundry: dom0]
    100087                  D      umarcl  0xffffffff829be838 [uma]
      18    0    0    0  DL      -        0xffffffff829ace14 [soaiod4]
      17    0    0    0  DL      -        0xffffffff829ace14 [soaiod3]
      16    0    0    0  DL      -        0xffffffff829ace14 [soaiod2]
      15    0    0    0  DL      -        0xffffffff829ace14 [soaiod1]
        9    0    0    0  DL      -        0xffffffff82789700 [rand_harvestq]
        8    0    0    0  DL      pftm    0xffffffff80e930b0 [pf purge]
        7    0    0    0  DL      waiting_ 0xffffffff82a61d70 [sctp_iterator]
        6    0    0    0  DL      -        0xfffff800039c4448 [fdc0]
        5    0    0    0  DL      idle    0xfffffe0000ee1000 [mpt_recovery0]
        4    0    0    0  DL      (threaded)                  [cam]
    100020                  D      -        0xffffffff8265c480 [doneq0]
    100074                  D      -        0xffffffff8265c2c8 [scanner]
        3    0    0    0  DL      crypto_r 0xffffffff829bd3f0 [crypto returns]
        2    0    0    0  DL      crypto_w 0xffffffff829bd298 [crypto]
      14    0    0    0  DL      (threaded)                  [geom]
    100014                  D      -        0xffffffff82a39e20 [g_event]
    100015                  D      -        0xffffffff82a39e28 [g_up]
    100016                  D      -        0xffffffff82a39e30 [g_down]
      13    0    0    0  DL      sleep    0xffffffff82615c70 [ng_queue0]
      12    0    0    0  WL      (threaded)                  [intr]
    100004                  I                                  [swi1: netisr 0]
    100005                  I                                  [swi3: vm]
    100006                  I                                  [swi4: clock (0)]
    100008                  I                                  [swi6: task queue]
    100009                  I                                  [swi6: Giant taskq]
    100012                  I                                  [swi5: fast taskq]
    100021                  I                                  [irq14: ata0]
    100022                  I                                  [irq15: ata1]
    100023                  I                                  [irq17: mpt0]
    100025                  I                                  [irq256: ahci0]
    100026                  I                                  [irq257: pcib3]
    100027                  I                                  [irq258: vmx0]
    100028                  I                                  [irq259: pcib4]
    100029                  I                                  [irq260: pcib5]
    100030                  I                                  [irq261: pcib6]
    100031                  I                                  [irq262: pcib7]
    100032                  I                                  [irq263: pcib8]
    100033                  I                                  [irq264: pcib9]
    100034                  I                                  [irq265: pcib10]
    100035                  I                                  [irq266: pcib11]
    100036                  I                                  [irq267: vmx1]
    100037                  I                                  [irq268: pcib12]
    100038                  I                                  [irq269: pcib13]
    100039                  I                                  [irq270: pcib14]
    100040                  I                                  [irq271: pcib15]
    100041                  I                                  [irq272: pcib16]
    100042                  I                                  [irq273: pcib17]
    100043                  I                                  [irq274: pcib18]
    100044                  I                                  [irq275: pcib19]
    100045                  I                                  [irq276: vmx2]
    100046                  I                                  [irq277: pcib20]
    100047                  I                                  [irq278: pcib21]
    100048                  I                                  [irq279: pcib22]
    100049                  I                                  [irq280: pcib23]
    100050                  I                                  [irq281: pcib24]
    100051                  I                                  [irq282: pcib25]
    100052                  I                                  [irq283: pcib26]
    100053                  I                                  [irq284: pcib27]
    100054                  I                                  [irq285: pcib28]
    100055                  I                                  [irq286: pcib29]
    100056                  I                                  [irq287: pcib30]
    100057                  I                                  [irq288: pcib31]
    100058                  I                                  [irq289: pcib32]
    100059                  I                                  [irq290: pcib33]
    100060                  I                                  [irq291: pcib34]
    100061                  I                                  [irq1: atkbd0]
    100062                  I                                  [irq12: psm0]
    100067                  I                                  [swi1: pf send]
    100068                  I                                  [swi1: pfsync]
      11    0    0    0  RL                                  [idle: cpu0]
        1    0    1    0  SLs    wait    0xfffff80003518588 [init]
      10    0    0    0  DL      audit_wo 0xffffffff82a68f40 [audit]
        0    0    0    0  DLs    (threaded)                  [kernel]
    100000                  D      swapin  0xffffffff82a39e68 [swapper]
    100007                  D      -        0xfffff80003507900 [kqueue_ctx taskq]
    100010                  D      -        0xfffff80003507100 [aiod_kick taskq]
    100011                  D      -        0xfffff80003506e00
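    For anyone reading a report like this, the panic string is the key detail (here it is the known pfsync panic mentioned earlier in the thread). A small sketch for pulling it out of the saved dump headers, assuming the standard FreeBSD /var/crash layout shown in the report:

```shell
# panic_strings: print the unique kernel panic strings recorded in the
# crash dump headers (the info.* files) under a given crash directory.
# Usage: panic_strings /var/crash
panic_strings() {
    grep -h 'Panic String:' "$1"/info.* \
        | sed 's/^[[:space:]]*Panic String: //' \
        | sort -u
}
```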



  • I've also been reading that the WAN has to be on the same NIC interface on the backup node?

    I'm using VMware on both boxes, so does that mean the same vSwitch?


  • Netgate

    Also be sure you remove all the calls to the limiters in the rules.

    Disable state syncing on both nodes and try again. Does it still crash? If so you might be looking at a different problem.

    I've also been reading that the WAN has to be on the same NIC interface on the backup node?

    ALL NICs have to be the same on both nodes in the same order. If WAN is igb0 on the primary, WAN has to be igb0 on the secondary, and so on. Generally not the source of a panic however, just "unexpected" behavior.

    You might want to start again - small, and get WAN+LAN working in a very basic HA pair before moving on to more advanced configurations. They're VMs. It don't cost nothin'.

    Both nodes have to be able to pass multicast between each other.

    Inability to do so will not result in a crash, however, just a MASTER/MASTER split-brain issue.



  • Thanks

    I think the problem I have is that the interfaces aren't the same, so I'll have to try to move things around to get the same interface names.

    So it has to be the same physical NIC name, not the virtual NIC?

    Mat


  • Netgate

    An interface has a physical name (em0, re0, igb0, xn0, igb0.1000, lagg2.1001) and an internal name (wan, lan, opt1, opt2, opt3, optX).

    They all have to match exactly on both nodes.

    Use Status > Interfaces to verify.

    This is all covered in detail here: https://portal.pfsense.org/docs/book/highavailability/index.html
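    One way to verify from the shell is to capture the interface list on each node with `ifconfig -l` and diff the two captures. A sketch (primary.txt and secondary.txt are hypothetical files holding each node's output; the lists are deliberately not sorted, since the order has to match too):

```shell
# compare_ifaces: diff two interface lists captured with `ifconfig -l`
# on each node. Order is preserved because HA expects the same NICs
# in the same order. Prints nothing and returns 0 when they match.
compare_ifaces() {
    a=$(mktemp); b=$(mktemp)
    tr ' ' '\n' < "$1" > "$a"
    tr ' ' '\n' < "$2" > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return $rc
}
```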



  • @Derelict:

    An interface has a physical name (em0, re0, igb0, xn0, igb0.1000, lagg2.1001) and an internal name (wan, lan, opt1, opt2, opt3, optX).

    They all have to match exactly on both nodes.

    Use Status > Interfaces to verify.

    This is all covered in detail here: https://portal.pfsense.org/docs/book/highavailability/index.html

    Painful, lol.

    Internally they are all named the same, but physically they're not, so I'll have to change some bits around.

    A few more days of playing, then.



  • OK, I set up quick test boxes on the same host for now. All of HA works; however, I can't ping the LAN virtual IP until I set the MAC as static on the hosts.

    Now I can ping, but it's up and down like a yo-yo.

    Any ideas?


  • Netgate

    https://doc.pfsense.org/index.php/CARP_Configuration_Troubleshooting

  • @Derelict:

    https://doc.pfsense.org/index.php/CARP_Configuration_Troubleshooting

    I have done the following:

    Enable promiscuous mode on the vSwitch
    Enable "MAC Address changes"
    Enable "Forged transmits"

    I have VM_Prod for the VMs.

    I now have another port group, VM_Prod-PF, and changed the pfSense LAN to this port group.

    Same problem, though.

    Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
    Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
    Request timed out.
    Request timed out.
    Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
    Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
    Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
    Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
    Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
    Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
    Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
    Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
    Request timed out.
    Request timed out.


  • Netgate

    Sorry. Runs great under XenServer. Someone else will have to help with VMware. It's certainly something in your virtual environment.

    Moving to Virtualization.



  • Thanks for your help up to now, anyway.

    Has anyone else had this issue?

    I can't ping the virtual IP until the following are enabled:

    Enable promiscuous mode on the vSwitch
    Enable "MAC Address changes"
    Enable "Forged transmits"

    Once enabled, I start to get ping replies, but they intermittently time out.

    Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
    Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
    Request timed out.
    Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
    Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
    Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
    Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
    Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
    Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
    Reply from 192.168.50.254: bytes=32 time=40ms TTL=64
    Reply from 192.168.50.254: bytes=32 time=56ms TTL=64
    Reply from 192.168.50.254: bytes=32 time=72ms TTL=64
    Reply from 192.168.50.254: bytes=32 time=90ms TTL=64
    Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
    Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
    Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
    Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
    Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
    Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
    Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
    Reply from 192.168.50.254: bytes=32 time=2ms TTL=64
    Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
    Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
    Request timed out.
    Request timed out.
    Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
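    For reference, those three vSwitch settings can also be applied from the ESXi shell with esxcli. This is a sketch for a standard vSwitch only (vSwitch0 is a placeholder name, so adjust it to your environment; distributed switches are configured differently, through vCenter):

```shell
# Allow CARP VIPs to work on a standard vSwitch by permitting
# promiscuous mode, MAC address changes, and forged transmits.
# vSwitch0 is a placeholder; run this on the ESXi host itself.
esxcli network vswitch standard policy security set \
    --vswitch-name=vSwitch0 \
    --allow-promiscuous=true \
    --allow-mac-change=true \
    --allow-forged-transmits=true
```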



  • Is there anyone who has got this working?