Netgate Discussion Forum

    SG-4860 upgrade failed

    Problems Installing or Upgrading pfSense Software
    7 Posts 2 Posters 1.6k Views
    • bennyc

      Looks like the upgrade to 2.3 on my backup (CARP) member failed  :(
      It was on version 2.2.6. I removed all packages, put it in persistent maintenance mode, and took a backup. Then I hit the upgrade button; that was about an hour ago. I still can't reach it and I'm not sure what state it is in, but this doesn't look promising. Looks like it will need console access.
      I'm going to head over to the server room later today.

      @cmb, I've downloaded the ADI memstick 2.3 installer… but just in case I want to revert, could you please point me to the 2.2.6 installer as well? (I can't find it on the portal.)

      4x XG-7100 (2xHA), 1x SG-4860, 1x SG-2100
      1x PC Engines APU2C4, 1x PC Engines APU1C4

      • bennyc

        An update: I had to reboot the thing to get some console output.
        The upgrade did happen and it booted 2.3. After checking all VLANs it got to CARP, then to the point where it was syncing OpenVPN, and ended with a nice kernel error:

        Configuring CARP settings…done.
        Configuring CARP settings...done.
        Syncing OpenVPN settings...

        Fatal trap 9: general protection fault while in kernel mode
        cpuid = 0; apic id = 00
        instruction pointer    = 0x20:0xffffffff80b82a2b
        stack pointer          = 0x28:0xfffffe001a3f59d0
        frame pointer          = 0x28:0xfffffe001a3f59f0
        code segment            = base 0x0, limit 0xfffff, type 0x1b
                                = DPL 0, pres 1, long 1, def32 0, gran 1
        processor eflags        = interrupt enabled, resume, IOPL = 0
        current process        = 12 (irq256: igb0:que 0)
        [ thread pid 12 tid 100034 ]
        Stopped at      m_tag_delete_chain+0x5b:        movq    (%rdi),%rax
        db:0:kdb.enter.default> textdump set
        textdump set
        db:0:kdb.enter.default>  capture on
        db:0:kdb.enter.default>  run lockinfo
        db:1:lockinfo> show locks
        No such command
        db:1:locks>  show alllocks
        No such command
        db:1:alllocks>  show lockedvnods
        Locked vnodes
        db:0:kdb.enter.default>  show pcpu
        cpuid        = 0
        dynamic pcpu = 0x531e00
        curthread    = 0xfffff80003567960: pid 12 "irq256: igb0:que 0"
        curpcb      = 0xfffffe001a3f5cc0
        fpcurthread  = none
        idlethread  = 0xfffff800033a0000: tid 100003 "idle: cpu0"
        curpmap      = 0xffffffff820f7ca0
        tssp        = 0xffffffff82112b90
        commontssp  = 0xffffffff82112b90
        rsp0        = 0xfffffe001a3f5cc0
        gs32p        = 0xffffffff821145e8
        ldt          = 0xffffffff82114628
        tss          = 0xffffffff82114618
        db:0:kdb.enter.default>  bt
        Tracing pid 12 tid 100034 td 0xfffff80003567960
        m_tag_delete_chain() at m_tag_delete_chain+0x5b/frame 0xfffffe001a3f59f0
        uma_zfree_arg() at uma_zfree_arg+0x3e/frame 0xfffffe001a3f5a60

        I lost the rest of the output after that, and it resets on its own. The cycle repeats.
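
When the box panics on every cycle, it can help to reduce each console capture to the few fields worth comparing: the trap, the interrupted thread, and the symbol where the debugger stopped. The following is a minimal sketch, not a pfSense or FreeBSD tool. The function name `summarize_panic` and the regular expressions are assumptions written against the output format shown above, and the sample text is copied from that capture.

```python
import re

# Sample lines copied from the console capture above.
PANIC_TEXT = """\
Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
current process        = 12 (irq256: igb0:que 0)
Stopped at      m_tag_delete_chain+0x5b:        movq    (%rdi),%rax
"""

def summarize_panic(text):
    """Return the trap, the interrupted process, and the faulting symbol.

    Hypothetical helper: each value is a tuple of regex groups, or None
    when the corresponding line is missing from the capture.
    """
    patterns = {
        "trap": r"^Fatal trap (\d+): (.+?) while in kernel mode",
        "process": r"^current process\s*=\s*(\d+) \((.+)\)",
        "stopped_at": r"^Stopped at\s+(\S+?)\+(0x[0-9a-f]+):",
    }
    summary = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text, re.MULTILINE)
        summary[key] = match.groups() if match else None
    return summary

SUMMARY = summarize_panic(PANIC_TEXT)
print(SUMMARY)
```

For the capture above, the interesting part is that the interrupted thread is the `igb0` interrupt handler, which is the same interface that comes up again later in the thread.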

        However, in an attempt to get a better log (PuTTY output to a log file), I power-cycled it again (by pulling the plug) and, strangely enough, it got past the OpenVPN sync. Now it says "Generating RRD graphs"…. It's not up yet, but we are making progress. Pfff.... I must admit this doesn't feel good  ::)

        • bennyc

          It got past the RRD graphs, but that's it. Crashed again… man...

          So, finally I got through. How? I unplugged all interfaces except WAN, which I left connected.
          Not that it was much of a success: after some time the console became unresponsive, and connecting a LAN interface did not make the box reachable on the network.
          Yet another kernel panic somewhere?

          • bennyc

            Another update… I decided to take another deep breath and go have a look at it.

            Some findings:

            After a power reset with only WAN connected, it boots fine. The console is accessible and remains responsive. (No CARP on that interface, no VLANs, dedicated igb1.)
            Connecting the sync interface (igb5) caused no issue; the console remained accessible and responsive.
            Then I connected one of my OPT interfaces (with VLANs and CARP), and that continued to work as well.
            I was also able to log into the GUI and submit a crash report (hope it helps or tells something).

            Now what is interesting: as soon as I connect LAN (igb0, which has VLANs and CARP), it hangs almost instantly and needs a hard reset... ???

            Puzzled...

            • bennyc

              After I moved all my VLANs to another interface (on both CARP members ::) ), I was able to plug in igb0 (one subnet was untagged on that interface). The console became unresponsive but came back to life after 10 seconds or so. Phew. So now I have a more or less working unit.
              OpenVPN still refuses to start for the moment, on both instances, with the following entries in the log:

              Apr 13 16:34:42	openvpn	45820	Exiting due to fatal error 
              Apr 13 16:34:42	openvpn	45820	TCP/UDP: Socket bind failed on local address [AF_INET] x.x.x.x:1195: Address already in use 
              ...
              Apr 13 16:34:42	openvpn	42503	Exiting due to fatal error
              Apr 13 16:34:42	openvpn	42503	TCP/UDP: Socket bind failed on local address [AF_INET] x.x.x.x:1194: Address already in use
              So I'll need to look into that first before even thinking about upgrading the primary node…

              So much for my monologue...
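
For readers unfamiliar with the "Address already in use" message in the log above: the operating system returns it (`EADDRINUSE`) when a process tries to bind a socket to an address and port that another socket already holds. The sketch below only reproduces that condition on the loopback interface with two UDP sockets. It does not diagnose what is holding ports 1194 and 1195 on this box (an earlier OpenVPN process still running is one possibility, but the thread does not confirm it). The function name `second_bind_error` is made up for this example.

```python
import errno
import socket

def second_bind_error(host="127.0.0.1"):
    """Bind two UDP sockets to one address and return the second bind's errno.

    Hypothetical helper: returns None if the second bind unexpectedly succeeds.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        first.bind((host, 0))            # port 0: let the OS pick a free port
        port = first.getsockname()[1]
        try:
            second.bind((host, port))    # same address and port as `first`
        except OSError as exc:
            return exc.errno
        return None
    finally:
        first.close()
        second.close()

RESULT = second_bind_error()
print(RESULT == errno.EADDRINUSE)
```

Neither socket sets `SO_REUSEADDR` or `SO_REUSEPORT` here, so the second bind is expected to be refused while the first socket is still open.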

              • jimp (Rebel Alliance, Developer, Netgate)

                Looks like you're missing the /boot/loader.conf.local adjustment to increase nmbclusters. On that box just set it to kern.ipc.nmbclusters="1000000"

                https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards#Adding_to_loader.conf.local

                • bennyc

                  Hi Jimp,

                  Thanks for looking at my issue(s). I checked that box, and /boot/loader.conf.local has three lines:

                  ahci_load="YES"
                  kern.cam.boot_delay=10000
                  kern.ipc.nmbclusters="1000000"
                  

                  I checked it against the "master" (still on 2.2.6) and see no difference; even the file timestamp is the same.
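
The comparison described above can also be done mechanically, which is handy when the files are longer than three lines. This is a minimal sketch: `parse_loader_conf` and `diff_settings` are made-up helper names, the parser handles only simple `name=value` lines with optional double quotes (a `#` inside a quoted value would be cut off), and the master's content is assumed to be identical because that is what the post reports.

```python
def parse_loader_conf(text):
    """Parse loader.conf-style 'name=value' lines into a dict (simplified)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip('"')
    return settings

def diff_settings(a, b):
    """Return {name: (value_in_a, value_in_b)} for every name that differs."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}

# Content of /boot/loader.conf.local as quoted in the post above.
BACKUP = """\
ahci_load="YES"
kern.cam.boot_delay=10000
kern.ipc.nmbclusters="1000000"
"""
MASTER = BACKUP  # assumption: the post says the master's file is identical

BACKUP_SETTINGS = parse_loader_conf(BACKUP)
DIFF = diff_settings(BACKUP_SETTINGS, parse_loader_conf(MASTER))
print(BACKUP_SETTINGS["kern.ipc.nmbclusters"], DIFF)
```

Note that this only compares what is configured in the file. Whether the running kernel picked up the value is a separate question, which `sysctl kern.ipc.nmbclusters` on the box itself would answer.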

                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.