Netgate Discussion Forum

    pfSense 2.4.5 - Bug? "bge0: firmware handshake timed out, found 0x4b657654" dropping WAN interface needing reboot.

    Hardware · 15 Posts · 3 Posters · 1.8k Views
    • stephenw10 Netgate Administrator

      You have the dyndns update running at 01:01. It's hard to imagine that could kill the NIC somehow, but it's easy to test by disabling it.

      Steve

      • Aterfax

        It's not that, since it did it again, this time at 00:37. Log output below:

        https://pastebin.com/raw/6KddXiNT

        • stephenw10 Netgate Administrator

          OK, so you are also seeing timeout errors on em1, but that interface is able to recover:

          May 22 00:37:30 oakenshield kernel: em1: Watchdog timeout -- resetting
          May 22 00:37:30 oakenshield kernel: em1: 2 link states coalesced
          May 22 00:37:30 oakenshield kernel: em1: link state changed to UP
          May 22 00:37:30 oakenshield kernel: bge0: watchdog timeout -- resetting
          

          'Link states coalesced' implies it was flapping too fast to show each state.
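To see at a glance which interfaces are hitting the watchdog, log lines like the ones above can be tallied per interface. This is only a minimal sketch: the function name `count_watchdog_timeouts` and the regular expression are assumptions made for illustration, and the sample lines are the four quoted in this post, not a full system log.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "May 22 00:37:30 oakenshield kernel: em1: Watchdog timeout -- resetting"
# The match is case-insensitive because em and bge capitalise the message differently.
WATCHDOG_RE = re.compile(r"kernel: (\w+): watchdog timeout", re.IGNORECASE)


def count_watchdog_timeouts(lines):
    """Return a Counter mapping interface name to number of watchdog timeouts."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The four log lines quoted in the post above.
sample = [
    "May 22 00:37:30 oakenshield kernel: em1: Watchdog timeout -- resetting",
    "May 22 00:37:30 oakenshield kernel: em1: 2 link states coalesced",
    "May 22 00:37:30 oakenshield kernel: em1: link state changed to UP",
    "May 22 00:37:30 oakenshield kernel: bge0: watchdog timeout -- resetting",
]

print(dict(count_watchdog_timeouts(sample)))  # {'em1': 1, 'bge0': 1}
```

Run against a longer log, a count that is much higher on one interface than the others would hint at where to look first.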

          Having em1 also implies you have em0. The first thing I would try there is to swap the em0 and bge0 interface assignments.

          Steve

          • Aterfax @stephenw10

            Can you explain what you mean?

            I do have an em0 but swapping them is, in a sense, impossible.

            em0 and all emX interfaces are virtual adaptors from KVM which are connected to bridged VLANs on the host.

            bge0 is a physical device - a PCI passthrough from the host of one of the ports on the physical card.

            Swapping the passthrough to the other port would only change the port on the card (it would still show up as bge0), while the virtual adaptors would still show up as emX. (Not sure that would really change anything at all?)


            • stephenw10 Netgate Administrator

              Ah, OK. Yeah no way to do that then.

              Hmm, hard to say if em1 timing out is a symptom or cause there. Can you switch those out to virtio NICs?

              Steve

              • Aterfax @stephenw10

                @stephenw10 I will swap those out to virtio now; however, I think that when I had them as virtio before, they did not work correctly in some way.

                Edit: Seems to be functioning with virtio adaptors well enough in the short term.

                • stephenw10 Netgate Administrator

                  Good to hear. I'm not aware of any issues with virtio. I use them here in Proxmox for a number of VMs and have not seen any problems.

                  Steve

                  • Aterfax

                    It dropped again this morning around 00:15. I'm not sure what to make of the logs, but this time the drop coincided with pfctl driving the CPU to 100%. Log output is below, with some more about the PPP connection, though I'm not sure of its relevance:

                    https://pastebin.com/hAWS3Gzi

                    • stephenw10 Netgate Administrator

                      Ah, if you were seeing pfctl at 100% you're probably hitting this: https://redmine.pfsense.org/issues/10414

                      You can test that by pinging the firewall and running Status > Filter reload. If you see ping times spike to ridiculous levels you are hitting it. Try disabling smp as shown in comment 15 on that report.

                      That is fixed in 2.4.5p1 which should be available soon.

                      Steve

                      • Aterfax @stephenw10

                        Doing a filter reload gave me:

                        Reply from 10.0.10.1: bytes=32 time<1ms TTL=64
                        Reply from 10.0.10.1: bytes=32 time<1ms TTL=64
                        Reply from 10.0.10.1: bytes=32 time<1ms TTL=64
                        Reply from 10.0.10.1: bytes=32 time=12ms TTL=64
                        Reply from 10.0.10.1: bytes=32 time<1ms TTL=64
                        Reply from 10.0.10.1: bytes=32 time<1ms TTL=64
                        Reply from 10.0.10.1: bytes=32 time=3175ms TTL=64
                        Reply from 10.0.10.1: bytes=32 time<1ms TTL=64
                        Reply from 10.0.10.1: bytes=32 time=1316ms TTL=64
                        Reply from 10.0.10.1: bytes=32 time=2ms TTL=64
                        Reply from 10.0.10.1: bytes=32 time<1ms TTL=64
                        

                        So it might not be that. That said, do I really want to disable SMP? Won't that result in a significant performance hit?
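The ping output above can also be checked for spikes programmatically rather than by eye. The following is a minimal sketch that assumes the Windows `ping` reply format shown in this post; the function name `find_latency_spikes` and the 500 ms threshold are arbitrary choices for illustration, not values from the bug report.

```python
import re

# Matches the round-trip time in Windows-style ping replies,
# e.g. "time<1ms" or "time=3175ms".
TIME_RE = re.compile(r"time([=<])(\d+)ms")


def find_latency_spikes(lines, threshold_ms=500):
    """Return the round-trip times (in ms) that are at or above threshold_ms."""
    spikes = []
    for line in lines:
        match = TIME_RE.search(line)
        if not match:
            continue  # not a reply line (e.g. a timeout or header)
        op, value = match.group(1), int(match.group(2))
        # "time<1ms" means sub-millisecond; treat it as 0 ms.
        rtt = 0 if op == "<" else value
        if rtt >= threshold_ms:
            spikes.append(rtt)
    return spikes


# A subset of the replies posted above.
sample = [
    "Reply from 10.0.10.1: bytes=32 time<1ms TTL=64",
    "Reply from 10.0.10.1: bytes=32 time=12ms TTL=64",
    "Reply from 10.0.10.1: bytes=32 time<1ms TTL=64",
    "Reply from 10.0.10.1: bytes=32 time=3175ms TTL=64",
    "Reply from 10.0.10.1: bytes=32 time<1ms TTL=64",
    "Reply from 10.0.10.1: bytes=32 time=1316ms TTL=64",
    "Reply from 10.0.10.1: bytes=32 time=2ms TTL=64",
]

print(find_latency_spikes(sample))  # [3175, 1316]
```

On a LAN where replies are normally under 1 ms, two replies above one second during a filter reload stand out clearly.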


                        • stephenw10 Netgate Administrator

                          Yeah, that's a huge latency. When it reloads normally it's barely noticeable.

                          I would at least test disabling SMP to see if it solves the issue. If it does, you are hitting that bug, which is fixed in 2.4.5p1, so upgrading will be the permanent solution.

                          Steve

                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.