Secondary pfSense Crash after CARP Configuration
-
I have been stuck on this issue for the last two weeks: my pfSense crashes and goes into an endless reboot loop whenever I configure CARP on the secondary pfSense. I have tested on different hardware and on different pfSense versions, both old and the latest, with the same result. Can someone please guide me?
-
You must provide more detail about the crashes. Can you see any part of the error/backtrace/etc. that happens? If you disconnect the sync cable, does the reboot loop stop?
Are you using any limiters? (Limiters + pfsync is a known panic trigger for HA.)
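If you're not sure whether any rules still reference limiters, one rough check (a sketch, assuming a stock install where the generated pf ruleset is written to /tmp/rules.debug) is to search the active ruleset for the limiter keywords from a shell or Diagnostics > Command Prompt on each node:
grep -E 'dnpipe|dnqueue' /tmp/rules.debug
# no output means no rule in the loaded ruleset is calling a limiter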
-
Hi,
I have the same issue. I am using 2.4.2 and followed this video: https://www.youtube.com/watch?v=gxIPHR-eX_U. It works up until enabling CARP, which then sends my pfSense into a massive boot loop.
I do have limiters, so maybe that could be the issue I'm facing.
-
You must provide more detail about the crashes. Can you see any part of the error/backtrace/etc. that happens? If you disconnect the sync cable, does the reboot loop stop?
Are you using any limiters? (Limiters + pfsync is a known panic trigger for HA.)
Did you fix this?
-
No, it is still an open bug.
Delete the limiters or disable pfsync.
-
No, it is still an open bug.
Delete the limiters or disable pfsync.
Can you not just uncheck the sync toggles for Traffic Shaper Limiters configuration and Traffic Shaper configuration?
Mat
-
If whatever you try works, yes.
It is a problem specific to limiters. The shaper itself (altq) works fine.
-
Thanks. I did try with it unchecked, but it still crashed. I'll delete the limiters and try that instead.
Mat
-
Hi,
I solved mine with this … https://redmine.pfsense.org/issues/4310#note-44
Edwin
-
Hi again,
I have deleted all my limiters and it still crashes. Here is the crash report. Help please!
Crash report begins. Anonymous machine information:
amd64
11.1-RELEASE-p4
FreeBSD 11.1-RELEASE-p4 #5 r313908+79c92265a31(RELENG_2_4): Mon Nov 20 08:18:22 CST 2017 root@buildbot2.netgate.com:/builder/ce-242/tmp/obj/builder/ce-242/tmp/FreeBSD-src/sys/pfSense
Crash report details:
No PHP errors found.
Filename: /var/crash/bounds
1
Filename: /var/crash/info.0
Dump header from device: /dev/gptid/78cabbc4-dad6-11e7-b9e4-005056b3ced1
Architecture: amd64
Architecture Version: 1
Dump Length: 106496
Blocksize: 512
Dumptime: Sat Dec 9 17:23:14 2017
Hostname: srvtcfw01
Magic: FreeBSD Text Dump
Version String: FreeBSD 11.1-RELEASE-p4 #5 r313908+79c92265a31(RELENG_2_4): Mon Nov 20 08:18:22 CST 2017
root@buildbot2.netgate.com:/builder/ce-242/tmp/obj/builder/ce-242/tmp/FreeBSD-src/sys/pfSense
Panic String: pfsync_undefer_state: unable to find deferred state
Dump Parity: 583539232
Bounds: 0
Dump Status: good
Filename: /var/crash/info.last
Dump header from device: /dev/gptid/78cabbc4-dad6-11e7-b9e4-005056b3ced1
Architecture: amd64
Architecture Version: 1
Dump Length: 106496
Blocksize: 512
Dumptime: Sat Dec 9 17:23:14 2017
Hostname: srvtcfw01
Magic: FreeBSD Text Dump
Version String: FreeBSD 11.1-RELEASE-p4 #5 r313908+79c92265a31(RELENG_2_4): Mon Nov 20 08:18:22 CST 2017
root@buildbot2.netgate.com:/builder/ce-242/tmp/obj/builder/ce-242/tmp/FreeBSD-src/sys/pfSense
Panic String: pfsync_undefer_state: unable to find deferred state
Dump Parity: 583539232
Bounds: 0
Dump Status: good
Filename: /var/crash/minfree
2048
Filename: /var/crash/textdump.tar.0
ddb.txt
db:0:kdb.enter.default> run lockinfo
db:1:lockinfo> show locks
No such command
db:1:locks> show alllocks
No such command
db:1:alllocks> show lockedvnods
Locked vnodes
db:0:kdb.enter.default> show pcpu
cpuid = 0
dynamic pcpu = 0x7ebf00
curthread = 0xfffff80007d6f000: pid 15262 "openvpn"
curpcb = 0xfffffe0096381cc0
fpcurthread = 0xfffff80007d6f000: pid 15262 "openvpn"
idlethread = 0xfffff8000351d000: tid 100003 "idle: cpu0"
curpmap = 0xfffff80007d47138
tssp = 0xffffffff82a73b90
commontssp = 0xffffffff82a73b90
rsp0 = 0xfffffe0096381cc0
gs32p = 0xffffffff82a7a3e8
ldt = 0xffffffff82a7a428
tss = 0xffffffff82a7a418
db:0:kdb.enter.default> bt
Tracing pid 15262 tid 100109 td 0xfffff80007d6f000
kdb_enter() at kdb_enter+0x3b/frame 0xfffffe0096381250
vpanic() at vpanic+0x1a3/frame 0xfffffe00963812d0
panic() at panic+0x43/frame 0xfffffe0096381330
pfsync_update_state() at pfsync_update_state+0x3c5/frame 0xfffffe0096381380
pf_test() at pf_test+0x21cf/frame 0xfffffe00963815c0
pf_check_out() at pf_check_out+0x1d/frame 0xfffffe00963815e0
pfil_run_hooks() at pfil_run_hooks+0x7b/frame 0xfffffe0096381670
ip_output() at ip_output+0x22b/frame 0xfffffe00963817c0
ip_forward() at ip_forward+0x323/frame 0xfffffe0096381860
ip_input() at ip_input+0x75a/frame 0xfffffe00963818c0
netisr_dispatch_src() at netisr_dispatch_src+0xa0/frame 0xfffffe0096381910
tunwrite() at tunwrite+0x226/frame 0xfffffe0096381950
devfs_write_f() at devfs_write_f+0xe2/frame 0xfffffe00963819b0
dofilewrite() at dofilewrite+0xc8/frame 0xfffffe0096381a00
sys_writev() at sys_writev+0x8c/frame 0xfffffe0096381a60
amd64_syscall() at amd64_syscall+0x6c4/frame 0xfffffe0096381bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0096381bf0
--- syscall (121, FreeBSD ELF64, sys_writev), rip = 0x8015a581a, rsp = 0x7fffffffde18, rbp = 0x7fffffffde50 ---
db:0:kdb.enter.default> ps
pid ppid pgrp uid state wmesg wchan cmd
51715 308 308 0 S nanslp 0xffffffff828b51e0 php-fpm
49963 70428 308 0 S nanslp 0xffffffff828b51e0 sleep
71262 71220 69187 0 S nanslp 0xffffffff828b51e0 sleep
71220 1 69187 0 S wait 0xfffff8001c105000 sh
9738 9585 9738 65534 Ss sbwait 0xfffff800079fa4a4 darkstat
9585 1 9585 65534 Ss select 0xfffff8001c237b40 darkstat
7039 71757 7039 0 Ss (threaded) sshlockout_pf
100091 S piperd 0xfffff8001c39f8e8 sshlockout_pf
100169 S nanslp 0xffffffff828b51e0 sshlockout_pf
6749 1 6749 0 Ss select 0xfffff800074f9b40 bsnmpd
70428 1 308 0 S wait 0xfffff80007ea4000 sh
69866 1 69866 136 Ss select 0xfffff80007f20cc0 dhcpd
63516 1 63516 59 Ss kqread 0xfffff80007744900 unbound
52767 1 52767 0 Ss (threaded) dpinger
100156 S uwait 0xfffff8001c278080 dpinger
100165 S sbwait 0xfffff800079f8144 dpinger
100166 S nanslp 0xffffffff828b51e0 dpinger
100167 S nanslp 0xffffffff828b51e0 dpinger
100168 S accept 0xfffff800079f906c dpinger
52697 1 52697 0 Ss (threaded) dpinger
100155 S uwait 0xfffff8001c2fbf00 dpinger
100161 S sbwait 0xfffff80007be1144 dpinger
100162 S nanslp 0xffffffff828b51e0 dpinger
100163 S nanslp 0xffffffff828b51e0 dpinger
100164 S accept 0xfffff800076bc3cc dpinger
415 1 415 0 Ss+ ttyin 0xfffff80007504cb0 getty
407 1 407 0 Ss+ ttyin 0xfffff800075050b0 getty
123 1 123 0 Ss+ ttyin 0xfffff800075054b0 getty
99998 1 99998 0 Ss+ ttyin 0xfffff800075058b0 getty
99937 1 99937 0 Ss+ ttyin 0xfffff800074d78b0 getty
99644 1 99644 0 Ss+ ttyin 0xfffff800074b00b0 getty
99639 1 99639 0 Ss+ ttyin 0xfffff800074ae8b0 getty
99388 1 99388 0 Ss+ ttyin 0xfffff800074ae4b0 getty
82736 1 82213 0 S select 0xfffff8001c0ec340 vmtoolsd
79421 78825 78825 0 S nanslp 0xffffffff828b51e0 minicron
78825 1 78825 0 Ss wait 0xfffff800079c3000 minicron
77900 77877 77877 0 S nanslp 0xffffffff828b51e0 minicron
77877 1 77877 0 Ss wait 0xfffff800079c4000 minicron
77783 77255 77255 0 S nanslp 0xffffffff828b51e0 minicron
77255 1 77255 0 Ss wait 0xfffff800079c5000 minicron
71757 1 71757 0 Ss select 0xfffff80007617cc0 syslogd
32516 1 32516 0 Ss (threaded) ntpd
100117 S select 0xfffff80007daae40 ntpd
31992 1 31992 0 Ss nanslp 0xffffffff828b51e0 cron
31697 31441 31441 0 S kqread 0xfffff80007d5f100 nginx
31679 31441 31441 0 S kqread 0xfffff80007d5f200 nginx
31441 1 31441 0 Ss pause 0xfffff8000754d630 nginx
15591 1 15591 0 Ss bpf 0xfffff80007bdca00 filterlog
15262 1 15262 0 Rs CPU 0 openvpn
14091 1 14091 0 Ss select 0xfffff80007cde3c0 openvpn
7334 1 7334 0 Ss select 0xfffff800074f71c0 sshd
336 1 336 0 Ss select 0xfffff800074f60c0 devd
324 322 322 0 S kqread 0xfffff80007745600 check_reload_status
322 1 322 0 Ss kqread 0xfffff80007745500 check_reload_status
308 1 308 0 Ss kqread 0xfffff80007746100 php-fpm
58 0 0 0 DL mdwait 0xfffff80007524800 [md0]
25 0 0 0 DL syncer 0xffffffff829aee00 [syncer]
24 0 0 0 DL vlruwt 0xfffff8000754a588 [vnlru]
23 0 0 0 DL (threaded) [bufdaemon]
100083 D psleep 0xffffffff829ad604 [bufdaemon]
100092 D sdflush 0xfffff8000752b8e8 [/ worker]
22 0 0 0 DL - 0xffffffff829ae2bc [bufspacedaemon]
21 0 0 0 DL pgzero 0xffffffff829c2a64 [pagezero]
20 0 0 0 DL psleep 0xffffffff829bef14 [vmdaemon]
19 0 0 0 DL (threaded) [pagedaemon]
100079 D psleep 0xffffffff82a72f85 [pagedaemon]
100086 D launds 0xffffffff829beec4 [laundry: dom0]
100087 D umarcl 0xffffffff829be838 [uma]
18 0 0 0 DL - 0xffffffff829ace14 [soaiod4]
17 0 0 0 DL - 0xffffffff829ace14 [soaiod3]
16 0 0 0 DL - 0xffffffff829ace14 [soaiod2]
15 0 0 0 DL - 0xffffffff829ace14 [soaiod1]
9 0 0 0 DL - 0xffffffff82789700 [rand_harvestq]
8 0 0 0 DL pftm 0xffffffff80e930b0 [pf purge]
7 0 0 0 DL waiting_ 0xffffffff82a61d70 [sctp_iterator]
6 0 0 0 DL - 0xfffff800039c4448 [fdc0]
5 0 0 0 DL idle 0xfffffe0000ee1000 [mpt_recovery0]
4 0 0 0 DL (threaded) [cam]
100020 D - 0xffffffff8265c480 [doneq0]
100074 D - 0xffffffff8265c2c8 [scanner]
3 0 0 0 DL crypto_r 0xffffffff829bd3f0 [crypto returns]
2 0 0 0 DL crypto_w 0xffffffff829bd298 [crypto]
14 0 0 0 DL (threaded) [geom]
100014 D - 0xffffffff82a39e20 [g_event]
100015 D - 0xffffffff82a39e28 [g_up]
100016 D - 0xffffffff82a39e30 [g_down]
13 0 0 0 DL sleep 0xffffffff82615c70 [ng_queue0]
12 0 0 0 WL (threaded) [intr]
100004 I [swi1: netisr 0]
100005 I [swi3: vm]
100006 I [swi4: clock (0)]
100008 I [swi6: task queue]
100009 I [swi6: Giant taskq]
100012 I [swi5: fast taskq]
100021 I [irq14: ata0]
100022 I [irq15: ata1]
100023 I [irq17: mpt0]
100025 I [irq256: ahci0]
100026 I [irq257: pcib3]
100027 I [irq258: vmx0]
100028 I [irq259: pcib4]
100029 I [irq260: pcib5]
100030 I [irq261: pcib6]
100031 I [irq262: pcib7]
100032 I [irq263: pcib8]
100033 I [irq264: pcib9]
100034 I [irq265: pcib10]
100035 I [irq266: pcib11]
100036 I [irq267: vmx1]
100037 I [irq268: pcib12]
100038 I [irq269: pcib13]
100039 I [irq270: pcib14]
100040 I [irq271: pcib15]
100041 I [irq272: pcib16]
100042 I [irq273: pcib17]
100043 I [irq274: pcib18]
100044 I [irq275: pcib19]
100045 I [irq276: vmx2]
100046 I [irq277: pcib20]
100047 I [irq278: pcib21]
100048 I [irq279: pcib22]
100049 I [irq280: pcib23]
100050 I [irq281: pcib24]
100051 I [irq282: pcib25]
100052 I [irq283: pcib26]
100053 I [irq284: pcib27]
100054 I [irq285: pcib28]
100055 I [irq286: pcib29]
100056 I [irq287: pcib30]
100057 I [irq288: pcib31]
100058 I [irq289: pcib32]
100059 I [irq290: pcib33]
100060 I [irq291: pcib34]
100061 I [irq1: atkbd0]
100062 I [irq12: psm0]
100067 I [swi1: pf send]
100068 I [swi1: pfsync]
11 0 0 0 RL [idle: cpu0]
1 0 1 0 SLs wait 0xfffff80003518588 [init]
10 0 0 0 DL audit_wo 0xffffffff82a68f40 [audit]
0 0 0 0 DLs (threaded) [kernel]
100000 D swapin 0xffffffff82a39e68 [swapper]
100007 D - 0xfffff80003507900 [kqueue_ctx taskq]
100010 D - 0xfffff80003507100 [aiod_kick taskq]
100011 D - 0xfffff80003506e00 -
Also, I've been reading that the WAN has to be on the same NIC interface on the backup node?
I'm using VMware on both boxes, so does that mean the same vSwitch?
-
Also be sure you remove all the calls to the limiters in the rules.
Disable state syncing on both nodes and try again. Does it still crash? If so, you might be looking at a different problem.
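One rough way to confirm state syncing really is off (a sketch; the exact behaviour may vary by version) is to look at the pfsync0 interface from a shell on each node:
ifconfig pfsync0
# with state syncing enabled this shows a "syncdev" line and the interface is UP;
# with it disabled the syncdev line should be gone (or the interface may not exist at all)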
Also, I've been reading that the WAN has to be on the same NIC interface on the backup node?
ALL NICs have to be the same on both nodes in the same order. If WAN is igb0 on the primary, WAN has to be igb0 on the secondary, and so on. Generally not the source of a panic however, just "unexpected" behavior.
You might want to start again - small, and get WAN+LAN working in a very basic HA pair before moving on to more advanced configurations. They're VMs. It don't cost nothin'.
Both nodes have to be able to pass multicast between each other.
Inability to do so will not result in a crash, however, but a MASTER/MASTER split brain issue.
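To verify the CARP advertisements are actually passing between the nodes, a packet capture on the interface carrying the VIP will show them (a sketch; vmx1 is a placeholder for your actual interface name):
tcpdump -ni vmx1 'ip proto 112'
# protocol 112 is CARP; advertisements from the MASTER should appear roughly once a second
# (tcpdump may decode them as VRRP)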
-
Thanks.
I think the problem I have is that the interfaces aren't the same, so I'll have to try and move things around to get the same interface names.
So it has to be the same physical NIC, not based on the virtual NIC?
Mat
-
An interface has a physical name (em0, re0, igb0, xn0, igb0.1000, lagg2.1001) and an internal name (wan, lan, opt1, opt2, opt3, optX).
They all have to match exactly on both nodes.
Use Status > Interfaces to verify.
This is all covered in detail here: https://portal.pfsense.org/docs/book/highavailability/index.html
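If you want to compare the assignments from a shell instead, a quick-and-dirty check (a sketch; other config sections such as VLAN definitions also use the <if> tag, so compare the lists with that in mind) is:
grep '<if>' /conf/config.xml
# run this on both nodes; the physical device names should come out in the same order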
-
An interface has a physical name (em0, re0, igb0, xn0, igb0.1000, lagg2.1001) and an internal name (wan, lan, opt1, opt2, opt3, optX).
They all have to match exactly on both nodes.
Use Status > Interfaces to verify.
This is all covered in detail here: https://portal.pfsense.org/docs/book/highavailability/index.html
Painful, lol.
Internally they are all named the same, but physically they're not, so I'll have to change some bits around.
A few more days of playing, then.
-
OK, I've set up quick test boxes on the same host for now. All of the HA works, however I can't ping the LAN virtual IP until I set the MAC as static on the hosts.
Now I can ping, but it's up and down like a yo-yo.
Any ideas?
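An intermittent VIP like that is often a sign that both nodes are answering for it (the MASTER/MASTER split mentioned earlier) or that the switch keeps re-learning the wrong MAC. A rough check (vmx1 is a placeholder interface name; the VIP 192.168.50.254 is taken from the ping output below):
ifconfig vmx1 | grep carp
# exactly one node should report "carp: MASTER vhid ..."; the other should report BACKUP
On the Windows client, arp -a 192.168.50.254 should show the CARP virtual MAC (00-00-5e-00-01-xx, where xx is the VHID in hex), not either firewall's real NIC MAC.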
-
https://doc.pfsense.org/index.php/CARP_Configuration_Troubleshooting
-
https://doc.pfsense.org/index.php/CARP_Configuration_Troubleshooting
I have done the following:
Enable promiscuous mode on the vSwitch
Enable "MAC Address changes"
Enable "Forged transmits"
I have VM_Prod for the VMs. I now have another port group, VM_Prod-PF, and changed the pfSense LAN to this port group.
Same problem though.
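For what it's worth, those three settings can also be checked (and set) from the ESXi shell rather than the vSphere client. A sketch only; vSwitch0 is an assumption for your vSwitch name, the exact syntax may vary by ESXi version, and per-port-group policy can override the vSwitch-level policy:
esxcli network vswitch standard policy security get --vswitch-name=vSwitch0
esxcli network vswitch standard policy security set --vswitch-name=vSwitch0 --allow-promiscuous=true --allow-mac-change=true --allow-forged-transmits=true
esxcli network vswitch standard portgroup policy security get --portgroup-name=VM_Prod-PF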
Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
Request timed out.
Request timed out.
Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
Request timed out.
Request timed out.
-
Sorry. Runs great under XenServer. Someone else will have to help with VMware. It's certainly something in your virtual environment.
Moving to Virtualization.
-
Thanks for your help up to now anyway.
Has anyone had this issue?
I can't ping the virtual IP until the following is enabled:
Enable promiscuous mode on the vSwitch
Enable "MAC Address changes"
Enable "Forged transmits"
Once enabled, I start to get ping replies, but it keeps timing out.
Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
Request timed out.
Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
Reply from 192.168.50.254: bytes=32 time=40ms TTL=64
Reply from 192.168.50.254: bytes=32 time=56ms TTL=64
Reply from 192.168.50.254: bytes=32 time=72ms TTL=64
Reply from 192.168.50.254: bytes=32 time=90ms TTL=64
Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
Reply from 192.168.50.254: bytes=32 time=2ms TTL=64
Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
Reply from 192.168.50.254: bytes=32 time=1ms TTL=64
Request timed out.
Request timed out.
Reply from 192.168.50.254: bytes=32 time=1ms TTL=64