Netgate Discussion Forum

    Pfsense 2.4.5 - Bug? "bge0: firmware handshake timed out, found 0x4b657654" dropping WAN interface needing reboot.

    Hardware
    15 Posts 3 Posters 1.8k Views
    • Aterfax

      @stephenw10 - It's not cron related AFAIK. It happens roughly every 24 hours, but not at an exact time, and sometimes it skips a day. The bug I referenced is possibly related: it discusses someone having an issue with the same Broadcom chipset, and it sounds like exactly the same behaviour.

      Here's cron anyway:

      [Image: screenshot of the cron configuration]

      @noplan Renewing the WAN IP from the ISP is not really possible since I have a static address. If you mean reconnect:

      I can see the PPP daemon trying to reconnect. I have manually tried to make it reconnect, and I have brought the interface down and up and then tried to reconnect manually.

      Nada. Only rebooting seems to bring it back online, which would seem to agree with the error message that the card has dropped from the kernel for some reason or another.

      • stephenw10 Netgate Administrator

        You have the dyndns update running at 01:01. It's hard to imagine that could kill the NIC somehow, but it's easy to test by disabling it.

        Steve

        • Aterfax

          It's not that, since it did it again just now at 00:37. Log output below:

          https://pastebin.com/raw/6KddXiNT

          • stephenw10 Netgate Administrator

            Ok so you are also seeing timeout errors on em1 but that is able to recover:

            May 22 00:37:30 oakenshield kernel: em1: Watchdog timeout -- resetting
            May 22 00:37:30 oakenshield kernel: em1: 2 link states coalesced
            May 22 00:37:30 oakenshield kernel: em1: link state changed to UP
            May 22 00:37:30 oakenshield kernel: bge0: watchdog timeout -- resetting
            

            'Link states coalesced' implies it was flapping too fast to show each state.
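For a longer log, a quick tally of watchdog timeouts per interface can help show whether em1 and bge0 fail together or independently. The following is only a minimal sketch: the line format is assumed from the four-line excerpt above, and the function name `watchdog_timeouts` is hypothetical, not part of pfSense or FreeBSD.

```python
import re
from collections import Counter

# Assumed line shape, modelled on the excerpt above:
#   "May 22 00:37:30 oakenshield kernel: em1: Watchdog timeout -- resetting"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"(?P<nic>[a-z]+\d+):\s+(?P<msg>.*)$"
)


def watchdog_timeouts(lines):
    """Count 'watchdog timeout' kernel messages per interface name."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        # The em driver capitalises "Watchdog" and bge does not, so compare lowercased.
        if match and "watchdog timeout" in match.group("msg").lower():
            counts[match.group("nic")] += 1
    return counts


# The four kernel lines quoted in the post above.
SAMPLE = [
    "May 22 00:37:30 oakenshield kernel: em1: Watchdog timeout -- resetting",
    "May 22 00:37:30 oakenshield kernel: em1: 2 link states coalesced",
    "May 22 00:37:30 oakenshield kernel: em1: link state changed to UP",
    "May 22 00:37:30 oakenshield kernel: bge0: watchdog timeout -- resetting",
]

if __name__ == "__main__":
    for nic, count in sorted(watchdog_timeouts(SAMPLE).items()):
        print(f"{nic}: {count} watchdog timeout(s)")
```

Feeding it the full system log instead of `SAMPLE` would show whether the em1 timeouts always accompany the bge0 ones, which bears on the symptom-or-cause question.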

            Having em1 also implies you have em0. The first thing I would try there is to swap the em0 and bge0 interface assignments.

            Steve

            • Aterfax @stephenw10

              Can you explain what you mean?

              I do have an em0 but swapping them is, in a sense, impossible.

              em0 and all emX interfaces are virtual adaptors from KVM which are connected to bridged VLANs on the host.

              bge0 is a physical device - a PCI passthrough from the host of one of the ports on the physical card.

              Swapping the passthrough to the other port would only change which port on the card is used (it would still show up as bge0), while the virtual adaptors would still show up as emX. (Not sure that would really change anything at all?)

              • stephenw10 Netgate Administrator

                Ah, OK. Yeah no way to do that then.

                Hmm, hard to say if em1 timing out is a symptom or cause there. Can you switch those out to virtio NICs?

                Steve

                • Aterfax @stephenw10

                  @stephenw10 Will swap those out to virtio now, though I think they did not work correctly in some way when I previously had them set as virtio.

                  Edit: Seems to be functioning with virtio adaptors well enough in the short term.

                  • stephenw10 Netgate Administrator

                    Good to hear. I'm not aware of any issues with virtio. I use them here in Proxmox for a number of VMs and have not seen any problems.

                    Steve

                    • Aterfax

                      It dropped again this morning around 00:15. I'm not sure what to make of the logs, but this time the drop coincided with pfctl driving the CPU to 100%. Log output below, with some more about the PPP connection, though I'm not sure of its relevance:

                      https://pastebin.com/hAWS3Gzi

                      • stephenw10 Netgate Administrator

                        Ah, if you were seeing pfctl at 100% you're probably hitting this: https://redmine.pfsense.org/issues/10414

                        You can test that by pinging the firewall and running Status > Filter reload. If you see ping times spike to ridiculous levels you are hitting it. Try disabling smp as shown in comment 15 on that report.
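To make "ridiculous levels" a little more concrete, the ping output captured during a filter reload can be scanned for outliers. This is a rough sketch only: it assumes Windows-style `Reply from ...` lines, the sample values are illustrative, and the 1000 ms threshold and the helper names `latencies_ms` and `spikes` are arbitrary choices for the example, not anything defined by pfSense.

```python
import re

# Windows-style ping reply, e.g. "Reply from 10.0.10.1: bytes=32 time=12ms TTL=64".
# "time<1ms" is recorded as 1 ms, which is only an upper bound.
REPLY_RE = re.compile(r"time[=<](\d+)ms")


def latencies_ms(lines):
    """Extract round-trip times in milliseconds from ping reply lines."""
    values = []
    for line in lines:
        match = REPLY_RE.search(line)
        if match:
            values.append(int(match.group(1)))
    return values


def spikes(values, threshold_ms=1000):
    """Return (index, latency) pairs at or above the assumed threshold."""
    return [(i, v) for i, v in enumerate(values) if v >= threshold_ms]


# Illustrative sample in the assumed format.
SAMPLE = [
    "Reply from 10.0.10.1: bytes=32 time<1ms TTL=64",
    "Reply from 10.0.10.1: bytes=32 time=12ms TTL=64",
    "Reply from 10.0.10.1: bytes=32 time=3175ms TTL=64",
    "Reply from 10.0.10.1: bytes=32 time<1ms TTL=64",
    "Reply from 10.0.10.1: bytes=32 time=1316ms TTL=64",
]

if __name__ == "__main__":
    rtts = latencies_ms(SAMPLE)
    print("max latency (ms):", max(rtts))
    for index, value in spikes(rtts):
        print(f"reply {index}: {value} ms")
```

On a LAN where replies normally come back in under 1 ms, any reply over a second during a reload would count as a spike under this assumed threshold.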

                        That is fixed in 2.4.5p1 which should be available soon.

                        Steve

                        • Aterfax @stephenw10

                          Doing a filter reload gave me:

                          Reply from 10.0.10.1: bytes=32 time<1ms TTL=64
                          Reply from 10.0.10.1: bytes=32 time<1ms TTL=64
                          Reply from 10.0.10.1: bytes=32 time<1ms TTL=64
                          Reply from 10.0.10.1: bytes=32 time=12ms TTL=64
                          Reply from 10.0.10.1: bytes=32 time<1ms TTL=64
                          Reply from 10.0.10.1: bytes=32 time<1ms TTL=64
                          Reply from 10.0.10.1: bytes=32 time=3175ms TTL=64
                          Reply from 10.0.10.1: bytes=32 time<1ms TTL=64
                          Reply from 10.0.10.1: bytes=32 time=1316ms TTL=64
                          Reply from 10.0.10.1: bytes=32 time=2ms TTL=64
                          Reply from 10.0.10.1: bytes=32 time<1ms TTL=64
                          

                          So it might not be that. That said, do I really want to disable SMP? Won't this result in a significant performance hit?

                          • stephenw10 Netgate Administrator

                            Yeah, that's a huge latency. When it reloads normally it's barely noticeable.

                            I would at least test disabling SMP to see if it solves the issue. If it does, that is fixed in 2.4.5p1, so upgrading will be a permanent solution.

                            Steve

                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.