Netgate Discussion Forum

24.03 crashing (again)

General pfSense Questions
20 Posts 4 Posters 962 Views
  • R
    rocketboy001 @stephenw10
    last edited by Jun 24, 2024, 4:28 PM

    @stephenw10 Correct, it did not.

    • R
      rocketboy001 @stephenw10
      last edited by Jun 24, 2024, 4:31 PM

      @stephenw10 -
      Yep, stats page shows: HAProxy version 2.9.7-5742051, released 2024/04/05

      • S
        stephenw10 Netgate Administrator
        last edited by Jun 24, 2024, 4:46 PM

        Hmm, so that latest crash was with HAProxy 2.9.7 installed?

        • R
          rocketboy001 @stephenw10
          last edited by Jun 24, 2024, 5:09 PM

          @stephenw10

          Yep, it occurred last night.

          • L
            Luca De Andreis @stephenw10
            last edited by Jul 7, 2024, 1:26 PM

            @stephenw10

            Another crash running HAProxy 2.9.7

            textdump.tar.0 info.0

            • S
              stephenw10 Netgate Administrator
              last edited by stephenw10 Jul 8, 2024, 2:36 PM (posted Jul 7, 2024, 11:34 PM)

              Hmm, same crash again. So it appears the panic is unrelated to the known issue with HAProxy using 100% CPU, which should now be fixed.

              • L
                Luca De Andreis @stephenw10
                last edited by Jul 8, 2024, 12:56 PM

                @stephenw10
                Yes, I can confirm. HAProxy 2.9.7 never uses 100% CPU.

                • L
                  Luca De Andreis @stephenw10
                  last edited by Jul 8, 2024, 1:01 PM

                  @stephenw10

                  But.
                  I have another pfSense Plus with HAProxy 2.9.7 that handles very little traffic, almost nothing. That pfSense has never presented any problems.
                  So the problem seems related not only to the presence of HAProxy 2.9.7 but also to the traffic or how it is used.

                  5c587e20-dd49-4357-81e5-d253ae2ec768-immagine.png

                  • L
                    Luca De Andreis @stephenw10
                    last edited by Jul 8, 2024, 1:08 PM

                    @stephenw10

                    or....

                    Is there a correlation between HAProxy 2.9.7 and the VM's virtual CPU? In my case both VMs are running on Proxmox 8.2.2 (same version of QEMU, identical).
                    On the version that has NEVER given problems (and is very low traffic):

                    2c1e471c-7db3-4260-b321-aaae4b83d7c8-immagine.png

                    And on the version that has crashes:

                    295cbea9-efe1-4e1b-a7eb-211ab15e2ba4-immagine.png

                    • S
                      stephenw10 Netgate Administrator
                      last edited by Jul 8, 2024, 3:05 PM

                      Hmm, possibly some new instruction that HAProxy is using (or trying to use)?

                      If it were that, I'd expect to see it in some crypto operation, but the backtrace doesn't look like that; it's in the network stack.

                      Is there any difference in the network config of those VMs?

                      • L
                        Luca De Andreis @stephenw10
                        last edited by Jul 9, 2024, 1:45 PM

                        @stephenw10

                        mmmmm no.
                        I just checked, and the configuration of the two network interfaces going to pfSense is completely identical.
                        Two virtio NICs with the same configuration, running on the same version of QEMU.

                        • S
                          stephenw10 Netgate Administrator
                          last edited by Jul 9, 2024, 6:15 PM

                          Hmm, OK the next step here is probably to enable a full kernel core dump and wait for it to happen again.

                          Do you have SWAP enabled on that VM? How much?

                          • L
                            Luca De Andreis @stephenw10
                            last edited by Jul 10, 2024, 12:39 PM

                            @stephenw10

                            OK, if I can help by installing a new kernel configured for debugging, so that a core dump is captured on the production HAProxy server, I'm available.

                            a420593b-c4eb-491c-a753-d86c87c8e830-immagine.png

                            • S
                              stephenw10 Netgate Administrator
                              last edited by Jul 10, 2024, 11:50 PM

                              Ok the first step is to enable a full core dump. Edit the file /etc/pfSense-ddb.conf and add a new kdb.enter.default script line like:

                              # $FreeBSD$
                              #
                              #  This file is read when going to multi-user and its contents piped thru
                              #  ``ddb'' to define debugging scripts.
                              #
                              # see ``man 4 ddb'' and ``man 8 ddb'' for details.
                              #
                              
                              script lockinfo=show locks; show alllocks; show lockedvnods
                              script pfs=bt ; show registers ; show pcpu ; run lockinfo ; acttrace ; ps ; alltrace
                              
                              # kdb.enter.panic       panic(9) was called.
                              # script kdb.enter.default=textdump set; capture on; run pfs ; capture off; textdump dump; reset
                              script kdb.enter.default=bt ; show registers ; dump ; reset
                              
                              # kdb.enter.witness	witness(4) detected a locking error.
                              script kdb.enter.witness=run lockinfo
                              

                              So there I commented out the old line and added: script kdb.enter.default=bt ; show registers ; dump ; reset
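The edit described above can be sketched as a small shell helper. This is only an illustration: `enable_full_dump` is a hypothetical function name, not part of pfSense, and it assumes the file contains an active `script kdb.enter.default=` line like the one shown. It prints the modified file to standard output so the result can be reviewed before anything is replaced.

```shell
# Hypothetical helper (not part of pfSense): print a copy of a ddb config
# with the active kdb.enter.default script commented out and replaced by
# the full-dump variant quoted in this post.
enable_full_dump() {
    awk -v new='script kdb.enter.default=bt ; show registers ; dump ; reset' '
        $0 == new { print; next }            # already switched: keep as-is
        /^script kdb\.enter\.default=/ {     # old active line
            print "# " $0                    # comment it out
            print new                        # add the full-dump line
            next
        }
        { print }                            # everything else unchanged
    ' "$1"
}
```

For example, `enable_full_dump /etc/pfSense-ddb.conf > /tmp/pfSense-ddb.conf.new` writes a candidate file to inspect; the output path is a placeholder.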

                              Now reboot as that is only read in at boot.

                              Then check it's present at the CLI with:

                              [24.08-DEVELOPMENT][root@7100.stevew.lan]/root: sysctl debug.ddb.scripting.scripts
                              debug.ddb.scripting.scripts: lockinfo=show locks; show alllocks; show lockedvnods
                              pfs=bt ; show registers ; show pcpu ; run lockinfo ; acttrace ; ps ; alltrace
                              kdb.enter.default=bt ; show registers ; dump ; reset
                              kdb.enter.witness=run lockinfo
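That check can also be scripted. The sketch below assumes the `sysctl` output format shown above; `has_full_dump` is a hypothetical helper name. It reads the output on standard input and succeeds only if the `kdb.enter.default` script contains a standalone `dump` step.

```shell
# Hypothetical helper: exit 0 if the kdb.enter.default script read from
# stdin includes a standalone "dump" command ("textdump dump" does not count).
has_full_dump() {
    awk '
        /kdb\.enter\.default=/ {
            sub(/^.*kdb\.enter\.default=/, "")        # keep only the command list
            n = split($0, cmds, ";")
            for (i = 1; i <= n; i++) {
                gsub(/^[ \t]+|[ \t]+$/, "", cmds[i])  # trim each command
                if (cmds[i] == "dump") found = 1
            }
        }
        END { exit(found ? 0 : 1) }
    '
}
```

Usage would look something like `sysctl debug.ddb.scripting.scripts | has_full_dump && echo "full dump enabled"`.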
                              

                              It will now dump the full vmcore after a panic.
                              You can check it by manually triggering a panic with: sysctl debug.kdb.panic=1

                              At the console you will see something like:

                              db:0:kdb.enter.default>  dump
                              Dumping 586 out of 8118 MB:..3%..11%..22%..33%..41%..52%..63%..71%..82%..93%
                              Dump complete
                              db:0:kdb.enter.default>  reset
                              Uptime: 17m8s
                              

                              The available SWAP space must be larger than the used RAM though. That 7100 is only using 586MB because it's a test box.
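As a rough pre-check of that requirement, the comparison can be written as a tiny shell function. This is a sketch, not a pfSense tool: `dump_fits` is a hypothetical name, and both figures are supplied by hand, for example the dump size from a console line like `Dumping 586 out of 8118 MB` and the swap size as reported on the system.

```shell
# Hypothetical helper: compare an expected dump size against available swap.
dump_fits() {
    # $1 = expected dump size in MB, $2 = available swap in MB
    if [ "$2" -gt "$1" ]; then
        echo "ok: ${2} MB swap can hold a ${1} MB dump"
    else
        echo "too small: ${2} MB swap cannot hold a ${1} MB dump"
        return 1
    fi
}
```

For instance, `dump_fits 586 1024` checks the test box figure above against an illustrative 1024 MB of swap. On FreeBSD, `swapinfo -m` reports configured swap in megabytes.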

                              • M
                                marcosm Netgate
                                last edited by Jul 24, 2024, 4:35 PM

                                For reference:
                                https://redmine.pfsense.org/issues/15618

                                Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.