Netgate Discussion Forum

    Pfsense 2.4.5 - Bug? "bge0: firmware handshake timed out, found 0x4b657654" dropping WAN interface needing reboot.

    • Aterfax

      This error message appears in the logs when my WAN interface stops functioning; it cannot reconnect until I reboot:

      bge0: firmware handshake timed out, found 0x4b657654
      

      The hardware is an HP Gen 8 MicroServer; pfSense is running in a VM inside Unraid.

      The pfSense version is 2.4.5 and the network card is an HPE Ethernet 1Gb 2-port NC332i Adapter (Broadcom BCM5720).

      The interface runs PPPoE via a modem; the card itself is handed to the pfSense VM via hardware pass-through.

      There is nothing obvious in the VM logs. I didn't have the issue with the previous release, and it seems to happen like clockwork every 24 hours, in the early morning (1am).

      Anyone got any ideas? Having to reboot every 24 hours is sub-optimal.

      I am probably going to have to bridge the interface with the host in the meantime.

      Possibly a regression of the behaviour reported in https://redmine.pfsense.org/issues/6423?

      • stephenw10 (Netgate Administrator)

        That seems unrelated to that bug; this is not a PPPoE issue.

        What happens at 1am? Check the cron entries; installing the Cron package makes that easier.
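
As a rough illustration of that check, here is a minimal Python sketch that lists which entries in a system-style crontab can fire during a given hour. It assumes the seven-column layout (minute, hour, day of month, month, day of week, user, command) and handles numeric fields only; the sample entries and their `/hypothetical/...` paths are invented for the example and are not taken from this firewall.

```python
def field_matches(field, value, minimum=0):
    """Return True if a numeric cron field (e.g. "*/15", "1,3-5") includes value."""
    for part in field.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":
            low, high = minimum, value  # "*" covers every legal value
        elif "-" in rng:
            low_text, high_text = rng.split("-", 1)
            low, high = int(low_text), int(high_text)
        else:
            low = high = int(rng)
        if low <= value <= high and (value - low) % step == 0:
            return True
    return False


def entries_at_hour(lines, hour):
    """Return (minute_field, hour_field, command) for entries that can run in `hour`."""
    hits = []
    for line in lines:
        text = line.strip()
        if not text or text.startswith(("#", "@")):
            continue  # skip blanks, comments and @reboot-style shortcuts
        fields = text.split(None, 6)
        if len(fields) < 7:
            continue  # environment assignment or malformed line
        try:
            if field_matches(fields[1], hour):
                hits.append((fields[0], fields[1], fields[6]))
        except ValueError:
            continue  # non-numeric field (e.g. month names); not handled here
    return hits


# Hypothetical crontab content, for illustration only.
SAMPLE = """\
SHELL=/bin/sh
# minute hour mday month wday who command
1   1   *  *  *  root  /hypothetical/dyndns-update
*/5 *   *  *  *  root  /hypothetical/every-five-minutes
30  3   *  *  *  root  /hypothetical/nightly-job
0   0-2 *  *  *  root  /hypothetical/early-morning-job
""".splitlines()

if __name__ == "__main__":
    for minute, hour_field, command in entries_at_hour(SAMPLE, 1):
        print(f"minute={minute} hour={hour_field} -> {command}")
```

Anything this reports for hour 1 (or hour 0) would be a candidate to disable temporarily while watching whether the interface still drops.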

        Steve

        • noplan

          Maybe a renewal of your WAN IP from your ISP?

          • Aterfax

            @stephenw10 - It's not cron related as far as I know. It happens roughly every 24 hours, but not at an exact time, and sometimes it skips a day. The bug I referenced may be related: the reporter describes an issue with the same Broadcom chipset, and it sounds like exactly the same behaviour.

            Here's cron anyway:

            [Screenshot of cron entries: d38cbd0f-0ae2-4bdc-86bf-e9297e32b0f7-image.png]

            @noplan Renewing the WAN IP from the ISP doesn't really apply, since I have a static address. If you mean a reconnect:

            I can see the PPP daemon trying to reconnect. I have tried to make it reconnect manually, and I have brought the interface down and up and then tried to reconnect manually.

            Nada. Only rebooting seems to bring it back online, which would seem to agree with the error message: the card has dropped from the kernel for some reason or another.

            • stephenw10 (Netgate Administrator)

              You have the dyndns update running at 01:01. It's hard to imagine that could kill the NIC somehow, but it's easy to test by disabling it.

              Steve

              • Aterfax

                It's not that, since it just did it again at 00:37. Log output below:

                https://pastebin.com/raw/6KddXiNT

                • stephenw10 (Netgate Administrator)

                  OK, so you are also seeing timeout errors on em1, but that interface is able to recover:

                  May 22 00:37:30 oakenshield kernel: em1: Watchdog timeout -- resetting
                  May 22 00:37:30 oakenshield kernel: em1: 2 link states coalesced
                  May 22 00:37:30 oakenshield kernel: em1: link state changed to UP
                  May 22 00:37:30 oakenshield kernel: bge0: watchdog timeout -- resetting
                  

                  'Link states coalesced' implies it was flapping too fast to show each state.
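
To see at a glance which interfaces hit a watchdog timeout and when, the log could be scanned with something like the following Python sketch. It assumes the syslog line layout shown in the excerpt above (timestamp, hostname, `kernel:`, interface name); the function and variable names are illustrative only.

```python
import re
from collections import defaultdict

# Matches kernel lines such as:
#   "May 22 00:37:30 oakenshield kernel: em1: Watchdog timeout -- resetting"
# The match is case-insensitive because em and bge capitalise the message differently.
WATCHDOG_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"(?P<iface>[a-z]+\d+):\s+watchdog timeout",
    re.IGNORECASE,
)


def watchdog_events(lines):
    """Return {interface: [timestamps]} for every watchdog timeout line."""
    events = defaultdict(list)
    for line in lines:
        match = WATCHDOG_RE.match(line.strip())
        if match:
            events[match.group("iface")].append(match.group("ts"))
    return dict(events)


# The four lines quoted in this post.
SAMPLE = """\
May 22 00:37:30 oakenshield kernel: em1: Watchdog timeout -- resetting
May 22 00:37:30 oakenshield kernel: em1: 2 link states coalesced
May 22 00:37:30 oakenshield kernel: em1: link state changed to UP
May 22 00:37:30 oakenshield kernel: bge0: watchdog timeout -- resetting
""".splitlines()

if __name__ == "__main__":
    for iface, stamps in sorted(watchdog_events(SAMPLE).items()):
        print(f"{iface}: {len(stamps)} watchdog timeout(s), first at {stamps[0]}")
```

Run over a full log, a listing like this would show whether em1 and bge0 always time out together or whether one consistently precedes the other.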

                  Having em1 also implies you have em0. The first thing I would try there is to swap the em0 and bge0 interface assignments.

                  Steve

                  • Aterfax (replying to @stephenw10)

                    Can you explain what you mean?

                    I do have an em0 but swapping them is, in a sense, impossible.

                    em0 and all emX interfaces are virtual adaptors from KVM which are connected to bridged VLANs on the host.

                    bge0 is a physical device - a PCI passthrough from the host of one of the ports on the physical card.

                    Swapping the passthrough to the other port would only change which port on the card is used (it would still show up as bge0), while the virtual adaptors would still show up as emX. I'm not sure that would really change anything at all.

                    • stephenw10 (Netgate Administrator)

                      Ah, OK. Yeah no way to do that then.

                      Hmm, hard to say if em1 timing out is a symptom or cause there. Can you switch those out to virtio NICs?

                      Steve

                      • Aterfax (replying to @stephenw10)

                        @stephenw10 I'll swap those out to virtio now. However, I think that when I previously had them as virtio they did not work correctly in some way.

                        Edit: Seems to be functioning with virtio adaptors well enough in the short term.

                        • stephenw10 (Netgate Administrator)

                          Good to hear. I'm not aware of any issues with virtio. I use them here in Proxmox for a number of VMs and have not seen any problems.

                          Steve

                          • Aterfax

                            It dropped again this morning around 00:15. I'm not sure what to make of the logs, but this time the drop coincided with pfctl driving the CPU to 100%. Log output below, with some more detail about the PPP connection, though I'm not sure of its relevance:

                            https://pastebin.com/hAWS3Gzi

                            • stephenw10 (Netgate Administrator)

                              Ah, if you were seeing pfctl at 100% you're probably hitting this: https://redmine.pfsense.org/issues/10414

                              You can test that by pinging the firewall and running Status > Filter reload. If you see ping times spike to ridiculous levels you are hitting it. Try disabling smp as shown in comment 15 on that report.

                              That is fixed in 2.4.5p1 which should be available soon.

                              Steve

                              • Aterfax (replying to @stephenw10)

                                Doing a filter reload gave me:

                                Reply from 10.0.10.1: bytes=32 time<1ms TTL=64
                                Reply from 10.0.10.1: bytes=32 time<1ms TTL=64
                                Reply from 10.0.10.1: bytes=32 time<1ms TTL=64
                                Reply from 10.0.10.1: bytes=32 time=12ms TTL=64
                                Reply from 10.0.10.1: bytes=32 time<1ms TTL=64
                                Reply from 10.0.10.1: bytes=32 time<1ms TTL=64
                                Reply from 10.0.10.1: bytes=32 time=3175ms TTL=64
                                Reply from 10.0.10.1: bytes=32 time<1ms TTL=64
                                Reply from 10.0.10.1: bytes=32 time=1316ms TTL=64
                                Reply from 10.0.10.1: bytes=32 time=2ms TTL=64
                                Reply from 10.0.10.1: bytes=32 time<1ms TTL=64
                                

                                So it might not be that. That said, do I really want to disable SMP? Won't this result in a significant performance hit?

                                • stephenw10 (Netgate Administrator)

                                  Yeah, that's a huge latency. When it reloads normally it's barely noticeable.
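
To put a number on it, replies in the Windows ping format posted above can be scanned for outliers with a short Python sketch like this one. The 500 ms threshold is an arbitrary assumption chosen for illustration, and `time<1ms` is counted as 1 ms (an upper bound).

```python
import re

# Matches the latency part of a Windows ping reply,
# e.g. "Reply from 10.0.10.1: bytes=32 time=3175ms TTL=64" or "... time<1ms ...".
REPLY_RE = re.compile(r"time[=<](?P<ms>\d+)ms")


def latencies_ms(lines):
    """Extract the round-trip time in milliseconds from each reply line."""
    values = []
    for line in lines:
        match = REPLY_RE.search(line)
        if match:
            values.append(int(match.group("ms")))
    return values


def spikes(values, threshold_ms=500):
    """Return the latencies at or above threshold_ms (threshold is an assumption)."""
    return [value for value in values if value >= threshold_ms]


# The replies posted earlier in the thread during a filter reload.
SAMPLE = """\
Reply from 10.0.10.1: bytes=32 time<1ms TTL=64
Reply from 10.0.10.1: bytes=32 time<1ms TTL=64
Reply from 10.0.10.1: bytes=32 time<1ms TTL=64
Reply from 10.0.10.1: bytes=32 time=12ms TTL=64
Reply from 10.0.10.1: bytes=32 time<1ms TTL=64
Reply from 10.0.10.1: bytes=32 time<1ms TTL=64
Reply from 10.0.10.1: bytes=32 time=3175ms TTL=64
Reply from 10.0.10.1: bytes=32 time<1ms TTL=64
Reply from 10.0.10.1: bytes=32 time=1316ms TTL=64
Reply from 10.0.10.1: bytes=32 time=2ms TTL=64
Reply from 10.0.10.1: bytes=32 time<1ms TTL=64
""".splitlines()

if __name__ == "__main__":
    values = latencies_ms(SAMPLE)
    print("max latency (ms):", max(values))
    print("spikes (ms):", spikes(values))
```

On the posted sample, two of the eleven replies exceed one second, against a baseline of a millisecond or two.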

                                  I would at least test disabling SMP to see if it solves the issue. If it does, the underlying bug is fixed in 2.4.5p1, so upgrading will be the permanent solution.

                                  Steve

                                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.