Netgate Discussion Forum

    CARP VIP assignment causes kernel panic

    HA/CARP/VIPs
      mdpugh

      Hello,
      I have read everything I can find and tried every permutation I can think of to fix this problem, to no avail. I apologize if this has been dealt with elsewhere; if so, please just point me in the right direction.

      I am trying to set up CARP on the LAN interface only. My ISP will only assign IP addresses by DHCP and, while I can get more than one, they are generally not adjacent (I'm not even sure they're in the same subnet; I'll have to check), so I cannot readily set up CARP on the WAN. I realize that the failure of the master firewall/gateway will interrupt any open sessions/connections, but at least the client can recover quickly. I have set this configuration up on FreeBSD and OpenBSD several times with no problems, and I am truly stumped as to what I am doing wrong.

      I have followed both Chapter 20 of the book and the online tutorial to the letter, but it fails at the same point each time. Everything proceeds normally until I configure the first CARP VIP on the master (the slave CARP VIP is not yet configured). I configure it exactly as specified, but when I click Apply Changes, the firewall goes into a kernel panic and, after flashing a multitude of messages, reboots. When it gets to the CARP configuration section of the boot process, it panics again, and the cycle repeats ad infinitum. I have saved the dumps generated each time, so I can provide them if no one has a Eureka moment from my description alone. I would really appreciate any insight.
      Thanks,
      Mike Pugh

        mdpugh

        Here's the panic string:

        Panic String: Lock carp_if not exclusively locked @ /usr/pfSensesrc/src/sys/netinet/ip_carp.c:892
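For anyone comparing reports, the panic string splits into a lock name, the lock mode the kernel expected, and the source location of the failed assertion. A minimal Python sketch, assuming only this one message shape; the pattern and the `parse_panic` helper are hypothetical and not part of any pfSense or FreeBSD tool:

```python
import re

# Panic string copied from the report above.
PANIC = ("Lock carp_if not exclusively locked @ "
         "/usr/pfSensesrc/src/sys/netinet/ip_carp.c:892")

# Hypothetical pattern for this one message shape; other lock
# assertions are worded differently and would need their own patterns.
PATTERN = re.compile(
    r"Lock (?P<lock>\S+) not (?P<expected>\w+) locked @ "
    r"(?P<file>\S+):(?P<line>\d+)"
)


def parse_panic(text):
    """Return lock name, expected mode, source file and line, or None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    return fields


info = parse_panic(PANIC)
print(info)
```

Here the interesting fields are the lock (`carp_if`) and the assertion site (`ip_carp.c`, line 892).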

          mdpugh

          Complete crash dump:

          Crash report begins.  Anonymous machine information:

          i386
          8.1-RELEASE-p4
          FreeBSD 8.1-RELEASE-p4 #0: Tue Sep 13 16:51:59 EDT 2011    root@FreeBSD_8.0_pfSense_2.0-snaps.pfsense.org:/usr/obj./usr/pfSensesrc/src/sys/pfSense_Dev.8

          Crash report details:

          Filename: /var/crash/bounds
          1

          Filename: /var/crash/info.0
          Dump header from device /dev/ad0s1b
            Architecture: i386
            Architecture Version: 1
            Dump Length: 77312B (0 MB)
            Blocksize: 512
            Dumptime: Thu Jan  5 16:26:31 2012
            Hostname: heads.compughterworx.com
            Magic: FreeBSD Text Dump
            Version String: FreeBSD 8.1-RELEASE-p4 #0: Tue Sep 13 16:51:59 EDT 2011
              root@FreeBSD_8.0_pfSense_2.0-snaps.pfsense.org:/usr/obj./usr/pfSensesrc/src/sys/pfSense_Dev.8
            Panic String: Lock carp_if not exclusively locked @ /usr/pfSensesrc/src/sys/netinet/ip_carp.c:892

          Dump Parity: 737169472
            Bounds: 0
            Dump Status: good

          Filename: /var/crash/minfree
          2048

          Filename: /var/crash/textdump.tar.0
          ddb.txt
          db:0:kdb.enter.panic>  run lockinfo
          db:1:lockinfo> show locks
          shared rw carp_if (carp_if) r = 0 (0xc513e650) locked @ /usr/pfSensesrc/src/sys/netinet/ip_carp.c:877
          db:1:locks>  show alllocks
          Process 12 (intr) thread 0xc41ab280 (64007)
          shared rw carp_if (carp_if) r = 0 (0xc513e650) locked @ /usr/pfSensesrc/src/sys/netinet/ip_carp.c:877
          db:1:alllocks>  show lockedvnods
          Locked vnodes
          db:0:kdb.enter.panic>  show pcpu
          cpuid        = 0
          dynamic pcpu    = 0x536300
          curthread    = 0xc41ab280: pid 12 "swi4: clock"
          curpcb      = 0xc3df9d90
          fpcurthread  = none
          idlethread  = 0xc41aba00: pid 11 "idle: cpu0"
          APIC ID      = 0
          currentldt  = 0x50
          spin locks held:
          db:0:kdb.enter.panic>  bt
          Tracing pid 12 tid 64007 td 0xc41ab280
          kdb_enter(c0ebbaf1,c0ebbaf1,c0eb84c6,c3df9bac,0,...) at kdb_enter+0x3a
          panic(c0eb84c6,c0ed7976,c0ed7885,37c,c3df9c30,...) at panic+0x136
          _rw_assert(c513e650,4,c0ed7885,37c,c0ed7885,...) at rw_assert+0xb1
          carp_send_ad_locked(c513e650,c0ed7885,36d,c4e8952c,c3df9ca0,...) at carp_send_ad_locked+0x45
          carp_send_ad(c4e89400,0,c0ebcf08,189,c1325bd8,...) at carp_send_ad+0x37
          softclock(c1325ba0,c3df9cc8,c09f56a4,c1329940,c41e8138,...) at softclock+0x24a
          intr_event_execute_handlers(c41a97f8,c41e8100,c0eb6c63,533,c41e8170,...) at intr_event_execute_handlers+0x125
          ithread_loop(c41a8130,c3df9d38,c0eb69c8,344,c41a97f8,...) at ithread_loop+0x9f
          fork_exit(c09de490,c41a8130,c3df9d38) at fork_exit+0xb8
          fork_trampoline() at fork_trampoline+0x8
          --- trap 0, eip = 0, esp = 0xc3df9d70, ebp = 0 ---
          db:0:kdb.enter.panic>  ps
            pid  ppid  pgrp  uid  state  wmesg    wchan    cmd
          37155 23066 39852    0  S      nanslp  0xc1325b64 sleep
          49573 47993 49573    0  S+      ttyin    0xc4260e70 sh
          47993 47663 47993    0  S+      wait    0xc4eb1d48 sh
          47663    1 47663    0  Ss+    wait    0xc4eb02a8 login
          4384 39852 39852    0  S      accept  0xc4d3c1da php
          4672 39852 39852    0  S      accept  0xc4d3c1da php
          18573 21333 21333    0  S      piperd  0xc49267a8 rrdtool
          23066    1 39852    0  S      wait    0xc4d40550 sh
          21333    1 21333    0  Ss      select  0xc48f0924 apinger
          15124    1 15018 65534  S      select  0xc4f55524 dnsmasq
          10563    1 10563    65  Ss      select  0xc4d49d64 dhclient
          6066    1  6066    0  Ss      select  0xc4d49624 dhclient
          57505 16589 57505    0  Ss      (threaded)          sshlockout_pf
          64114                  S      nanslp  0xc1325b64 sshlockout_pf
          64100                  S      piperd  0xc4ee6310 initial thread
          29043    1 29043    0  Ss      nanslp  0xc1325b64 minicron
          28903    1 28903    0  Ss      nanslp  0xc1325b64 minicron
          28519    1 28519    0  Ss      nanslp  0xc1325b64 minicron
          25653    1 25653    0  Ss      nanslp  0xc1325b64 cron
          61941    1 61941    0  Ss      select  0xc48f0ae4 ntpd
          55579    1    26  123  S+      select  0xc4d48864 ntpd
          49912 39356 39356    0  S      accept  0xc49c8d1e php
          49702 39356 39356    0  S      accept  0xc49c8d1e php
          39852 38901 39852    0  Ss      wait    0xc4d3fd48 initial thread
          39356 38901 39356    0  Ss      wait    0xc49f6550 initial thread
          38901    1 38817    0  S      kqread  0xc4b1cb00 lighttpd
          16589    1 16589    0  Ss      select  0xc48ef7a4 syslogd
          11964    1 11964    65  Ss      select  0xc4d486a4 dhclient
          54321    1 54321    0  Ss      select  0xc48efda4 inetd
          41557    1  255    0  S      piperd  0xc4925188 logger
          41538    1  255    0  S      bpf      0xc4d42000 tcpdump
          34414    1 34414    0  Ss      select  0xc48ef424 dhclient
          22390    1 22390    65  Ss      select  0xc48efbe4 dhclient
          17018    1 17018    0  Ss      select  0xc48a9824 dhclient
          14474    1 14474    65  Ss      select  0xc48ef964 dhclient
          8809    1  8809    0  Ss      select  0xc48efba4 dhclient
          8657    1  8657    0  Ss      select  0xc48a8864 sshd
            268    1  268    0  Ss      select  0xc48ef464 devd
            257  255  255    0  S      kqread  0xc48f7d80 check_reload_status
            255    1  255    0  Ss      kqread  0xc48f7480 check_reload_status
            41    0    0    0  SL      mdwait  0xc4934800 [md0]
            25    0    0    0  SL      flowclea 0xc1492568 [flowcleaner]
            24    0    0    0  SL      sdflush  0xc14ad2e0 [softdepflush]
            23    0    0    0  SL      syncer  0xc1492354 [syncer]
            22    0    0    0  SL      vlruwt  0xc41ead48 [vnlru]
            21    0    0    0  SL      psleep  0xc1492088 [bufdaemon]
            20    0    0    0  SL      pollid  0xc13250bc [idlepoll]
            19    0    0    0  SL      pgzero  0xc14adfb4 [pagezero]
            18    0    0    0  SL      psleep  0xc14adbdc [vmdaemon]
            17    0    0    0  SL      psleep  0xc14adba4 [pagedaemon]
            16    0    0    0  SL      ccb_scan 0xc12ef9d4 [xpt_thrd]
              9    0    0    0  SL      pftm    0xc04f85b0 [pfpurge]
              8    0    0    0  SL      waiting  0xc14997d8 [sctp_iterator]
              7    0    0    0  SL      -        0xc434a23c [fdc0]
            15    0    0    0  SL      (threaded)          usb
          64036                  D      -        0xc4314dac [usbus1]
          64035                  D      -        0xc4314d7c [usbus1]
          64034                  D      -        0xc4314d4c [usbus1]
          64033                  D      -        0xc4314d1c [usbus1]
          64032                  D      -        0xc42ffdac [usbus0]
          64031                  D      -        0xc42ffd7c [usbus0]
          64030                  D      -        0xc42ffd4c [usbus0]
          64029                  D      -        0xc42ffd1c [usbus0]
            14    0    0    0  SL      -        0xc13259c4 [yarrow]
              6    0    0    0  SL      crypto_r 0xc14ac88c [crypto returns]
              5    0    0    0  SL      crypto_w 0xc14ac868 [crypto]
              4    0    0    0  SL      -        0xc1323264 [g_down]
              3    0    0    0  SL      -        0xc1323260 [g_up]
              2    0    0    0  SL      -        0xc1323258 [g_event]
            13    0    0    0  SL      sleep    0xc12c5e40 [ng_queue0]
            12    0    0    0  RL      (threaded)          intr
          64046                  I                          [irq12: psm0]
          64045                  I                          [irq1: atkbd0]
          64044                  I                          [irq7: ppc0]
          64043                  I                          [swi0: uart uart]
          64028                  RunQ                        [irq10: uhci0 uhci1]
          64027                  I                          [irq15: ata1]
          64026                  I                          [irq14: ata0]
          64025                  I                          [irq11: acpi0]
          64024                  I                          [swi6: task queue]
          64023                  I                          [swi6: Giant taskq]
          64021                  I                          [swi5: +]
          64016                  I                          [swi2: cambio]
          64007                  Run    CPU 0              [swi4: clock]
          64006                  I                          [swi3: vm]
          64005                  I                          [swi1: netisr 0]
            11    0    0    0  RL                          [idle: cpu0]
              1    0    1    0  SLs    wait    0xc41a9d48 [init]
            10    0    0    0  SL      audit_wo 0xc14acc00 [audit]
              0    0    0    0  RLs    (threaded)          kernel
          64041                  RunQ                        [em4 taskq]
          64040                  D      -        0xc4348c80 [em3 taskq]
          64039                  RunQ                        [em2 taskq]
          64038                  D      -        0xc42f76c0 [em1 taskq]
          64037                  RunQ                        [em0 taskq]
          64022                  D      -        0xc42ab340
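The key detail in the dump is that `show locks` lists `carp_if` as held shared (taken at line 877 of `ip_carp.c`), while the panic reports that an exclusive hold was asserted at line 892. The sketch below only cross-checks those two lines of text; the regular expressions and helper names are hypothetical and written for this dump alone:

```python
import re

# Two lines copied from the textdump above.
HELD = ("shared rw carp_if (carp_if) r = 0 (0xc513e650) locked @ "
        "/usr/pfSensesrc/src/sys/netinet/ip_carp.c:877")
PANIC = ("Lock carp_if not exclusively locked @ "
         "/usr/pfSensesrc/src/sys/netinet/ip_carp.c:892")


def held_lock(line):
    """Parse one 'show locks' line into (mode, name, source line)."""
    m = re.match(r"(\w+) \w+ (\S+) \(.*?\).* locked @ \S+:(\d+)", line)
    return (m.group(1), m.group(2), int(m.group(3))) if m else None


def asserted_lock(panic):
    """Parse the panic string into (name, source line of the assertion)."""
    m = re.search(r"Lock (\S+) not exclusively locked @ \S+:(\d+)", panic)
    return (m.group(1), int(m.group(2))) if m else None


mode, name, taken_at = held_lock(HELD)
want_name, asserted_at = asserted_lock(PANIC)

if name == want_name and mode != "exclusive":
    summary = (f"{name} taken {mode} at line {taken_at}, "
               f"asserted exclusive at line {asserted_at}")
else:
    summary = "no mismatch found"
print(summary)
```

This confirms only that the two lines refer to the same lock in different modes; it says nothing about why the code path takes the lock shared.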

            marcelloc

            Did you try it this way?

            • configure a sync interface

            • enable sync between firewalls

            • check replication between boxes

            • create a CARP interface with an ID that does not conflict with any VRRP on your network
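The reason the last point matters: for IPv4, CARP and VRRP both derive the virtual MAC address from the group ID (`00:00:5e:00:01:XX`, where `XX` is the VHID or VRID), so equal IDs on one segment collide. A small Python sketch of that check; the function names and example IDs are hypothetical and not taken from the thread:

```python
def virtual_mac(group_id):
    """IPv4 virtual MAC used by CARP (VHID) and VRRP (VRID)."""
    if not 1 <= group_id <= 255:
        raise ValueError("group id must be in 1..255")
    return "00:00:5e:00:01:%02x" % group_id


def conflicts(carp_vhid, vrrp_vrids):
    """True if the CARP VHID would reuse the MAC of a listed VRRP group."""
    return any(virtual_mac(carp_vhid) == virtual_mac(v) for v in vrrp_vrids)


# Hypothetical IDs, for illustration only.
print(virtual_mac(1))
print(conflicts(1, [1, 20]))
print(conflicts(2, [1, 20]))
```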

            Treinamentos de Elite: http://sys-squad.com

            Help a community developer! ;D

              cmb

              What type of NICs do you have on the box? Any unusual interface setup? Haven't heard of a CARP panic in a long time much less one so easily triggered.

                mdpugh

                {
                • configure a sync interface
                • enable sync between firewalls
                • check replication between boxes
                • create a CARP interface with an ID that does not conflict with any VRRP on your network
                }

                Yes, I can vouch for all of this and that it was done in this order as the online tutorial suggests in contrast to the book.  I assume that by "check replication between boxes" you are referring to pfsync on the dedicated interfaces.  I really thought you had nailed it with the last point because I had an older OpenBSD box configured for CARP acting as a temporary gateway while I set up pfSense, but alas, I shut it down and still got the same result when I clicked Apply Changes on the Firewall: Virtual IP Address: Edit page.  There are no other redundancy protocols running anywhere on the network.

                The NICs are all genuine Intel Pro/1000 MTs.  The only thing unusual about the (eventual) setup is that I have two disjoint LANs.  I want firewall A to be the master for LAN1 and slave for LAN2 and firewall B to be master for LAN2 and slave for LAN1.  I don't think this is the issue for two reasons: (a) I set this up in OpenBSD/FreeBSD before and it worked, and (b) I haven't gotten that far in the configuration process with pfSense.  I have configured the sync interfaces on both hosts and checked for state synchronization (which is working).  I have not set up configuration synchronization since both firewalls are on equal footing as far as the master/slave relationship is concerned (and if this is the equivalent of the (no-sync) option in pf, then this is how I set it up before anyway).  I have done no CARP configuration on firewall B yet, nor have I configured LAN2 on firewall A.  All I have done past the sync stage is attempt to configure firewall A as the master on LAN1, and that's when things go haywire.

                  mdpugh

                  I did a clean install of 2.0.1, hoping in vain that this would fix the issue.  I left the default installation alone except that I changed the LAN IP to 10.0.0.101/24.  I then attempted to set a CARP VIP of 10.0.0.100/24 with all other settings left at their defaults, and off we go to the races.  This is about as basic an installation as possible, and I'm still getting the same panic.  I've also tried this with the pfsync interface enabled, with and without the slave being active.  It does not seem to matter what I do; the kernel panics.  I have a hard time believing it's the NICs; I have the same model in the FreeBSD machines that have worked in this configuration.  I stress again that there are no other machines transmitting redundancy protocols on this network.  This happens so readily that I am flabbergasted that I'm evidently the only person experiencing this problem.
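To rule out a plain addressing mistake, the two addresses above can be checked against each other. This is a minimal sketch using Python's standard `ipaddress` module, assuming the usual requirement that the VIP sits in the same subnet as the interface address without duplicating it; `vip_fits_interface` is a hypothetical helper, not a pfSense function:

```python
import ipaddress


def vip_fits_interface(interface_cidr, vip_cidr):
    """True if the VIP shares the interface's subnet and is a different host."""
    iface = ipaddress.ip_interface(interface_cidr)
    vip = ipaddress.ip_interface(vip_cidr)
    return vip.network == iface.network and vip.ip != iface.ip


# Addresses from the post: LAN 10.0.0.101/24, CARP VIP 10.0.0.100/24.
print(vip_fits_interface("10.0.0.101/24", "10.0.0.100/24"))
```

The check passes for these values, which is consistent with the addressing not being the cause.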

                    marcelloc

                    Does your hardware support 64-bit pfSense?

                    Can you try installing it to see whether you get the same issue?

                      mdpugh

                      No, these are 32-bit boxes retasked for this purpose.

                        mdpugh

                        I tried something different, hoping to gain some new insight.  I booted the firewall with all interfaces unplugged save the private LAN I use for configuration/maintenance.  I then configured CARP, which did not immediately throw any errors.  Then I plugged each interface in individually and, if no panic resulted, rebooted the machine, checking for a panic on boot before proceeding to the next interface.  This worked until I got to the interface on which CARP was configured (no surprise there); as soon as I plugged it in: panic.  The switch connected to this interface is also connected to its sister interface on the other firewall and to the rest of the LAN.  I tried (a) unplugging the other host, (b) unplugging the LAN, and (c) unplugging both.  The machine panics even when it is the only device connected to the switch.  I had hoped this would be an eye-opening experiment, but it seems all I have managed to determine is that activating the troublesome interface spells trouble.  Lest anyone suspect it's the NIC, I have tried this with three different cards with the same result.  I can handle troubleshooting, but I have run out of ideas.  Maybe this experiment will light a bulb for someone else.

                          cmb

                          Can you email me a backup of your config?  cmb at pfsense dot org

                            cmb

                            Your config works perfectly for me. So it is somehow hardware-specific, though I have no idea how. The couple of boxes I tested with both have Intel Pro/1000 NICs, which should behave exactly the same as yours.

                              mdpugh

                              This happens on the other host also, which has different onboard hardware (MB, chipset, etc.) and slightly different NICs (still Pro/1000, but GT, XT, and T).  The only similarity, really, is that they're both Pentium IIIs.  What is the likelihood I'd have the same issue on two heterogeneous machines?

                                mdpugh

                                So, I'm eating lunch at Subway pondering this latest news.  I'm pretty sure it's not a hardware issue because both machines are affected.  Still, the fact that you got it to work with no modification is quite the puzzle.  So, I ask myself, how could my installation be different from Chris's?  I mean, there are not that many options to choose from during installation.  Then it dawns on me.  I have been choosing the Developer's Kernel from the outset because these Pentium III systems aren't SMP, the uniprocessor option is gone in 2.0, I don't want to go headless, and I wasn't sure whether the DK was necessary for 2.1 (which I plan to install next).  I just reinstalled using the SMP option and now CARP is working perfectly.  I'm not sure this is what is supposed to happen, but the problem (on my end, at least) is solved.  If you hadn't asked for the config file and tested it, I have no idea how long it might have taken to figure this out.  Thank you!

                                  jimp Rebel Alliance Developer Netgate

                                  All kernels (even Dev) are SMP on 2.0.

                                  There is no longer any benefit to loading a uniprocessor kernel (Mentioned a little here but also in more detail by me around the forum).

                                  I've had some issues with the dev kernel in certain setups as well, but it does much stricter lock checking and reporting, which is what you appear to have hit here.
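As a rough illustration of why the same code path can panic on one kernel and run on another, here is a toy Python model in which a lock-mode assertion is active only on a "debug" build. This is not FreeBSD code: `RWLock`, `assert_exclusive`, and `send_ad` are hypothetical names, and the sketch assumes the stock kernel simply skips the check.

```python
class RWLock:
    """Toy reader/writer lock that only records how it is currently held."""

    def __init__(self, name):
        self.name = name
        self.mode = None  # None, "shared" or "exclusive"

    def rlock(self):
        self.mode = "shared"

    def unlock(self):
        self.mode = None


def assert_exclusive(lock, debug_kernel):
    """Mimic a lock assertion that is only active on a debug kernel."""
    if debug_kernel and lock.mode != "exclusive":
        raise RuntimeError(f"Lock {lock.name} not exclusively locked")


def send_ad(lock, debug_kernel):
    """Caller takes the lock shared; the callee asserts an exclusive hold."""
    lock.rlock()
    try:
        assert_exclusive(lock, debug_kernel)
        return "advertisement sent"
    finally:
        lock.unlock()


print(send_ad(RWLock("carp_if"), debug_kernel=False))
try:
    send_ad(RWLock("carp_if"), debug_kernel=True)
except RuntimeError as err:
    print("panic:", err)
```

In the toy model the mismatch exists on both builds; only the debug build reports it.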

                                  We have enough debug info in the stock kernel these days that the full dev kernel isn't quite as necessary on its own, but still useful in rare cases.

                                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.