Netgate Discussion Forum

    BGP Flaps on pfsense

    Routing and Multi WAN
    • rahul.yedavi

      We are looking for the root cause of frequent BGP flaps on our pair of pfSense firewalls.

      The logs show the sessions going down with Hold Timer Expired, but the other end shows no interface flaps or drops.

      Any pointers beyond "Hold Timer Expired" are appreciated. The flaps happen quite frequently at random intervals, and we couldn't find any issue on the switch at the other end.

      Jun 13 04:09:14 <firewall_hostname> bgpd[80064]: %NOTIFICATION: sent to neighbor ixl1.1001 4/0 (Hold Timer Expired) 0 bytes
      Jun 13 04:09:14 <firewall_hostname> bgpd[80064]: %ADJCHANGE: neighbor ixl1.1001(<neigh_switch_2>) in vrf default Down BGP Notification send
      Jun 13 04:09:14 <firewall_hostname> zebra[79339]: [EC 100663303] kernel_rtm: 10.0.0.0/8: rtm_write() unexpectedly returned -4 for command RTM_DELETE
      Jun 13 04:09:15 <firewall_hostname> bgpd[80064]: %ADJCHANGE: neighbor ixl1.1001(<neigh_switch_2>) in vrf default Up
      Jun 13 04:09:18 <firewall_hostname> bgpd[80064]: %NOTIFICATION: sent to neighbor ixl0.1000 4/0 (Hold Timer Expired) 0 bytes
      Jun 13 04:09:18 <firewall_hostname> bgpd[80064]: %ADJCHANGE: neighbor ixl0.1000(<neigh_switch_1>) in vrf default Down BGP Notification send
      Jun 13 04:09:18 <firewall_hostname> zebra[79339]: [EC 100663303] kernel_rtm: 0.0.0.0/0: rtm_write() unexpectedly returned -4 for command RTM_DELETE
      Jun 13 04:09:24 <firewall_hostname> bgpd[80064]: %NOTIFICATION: sent to neighbor ixl1.1001 4/0 (Hold Timer Expired) 0 bytes
      Jun 13 04:09:24 <firewall_hostname> bgpd[80064]: %ADJCHANGE: neighbor ixl1.1001(<neigh_switch_2>) in vrf default Down BGP Notification send
      Jun 13 04:09:24 <firewall_hostname> bgpd[80064]: %NOTIFICATION: sent to neighbor ixl0.1001 4/0 (Hold Timer Expired) 0 bytes
      Jun 13 04:09:24 <firewall_hostname> bgpd[80064]: %ADJCHANGE: neighbor ixl0.1001(<neigh_switch_1>) in vrf default Down BGP Notification send
      Jun 13 04:09:26 <firewall_hostname> bgpd[80064]: %NOTIFICATION: sent to neighbor ixl1.1000 4/0 (Hold Timer Expired) 0 bytes
      Jun 13 04:09:26 <firewall_hostname> bgpd[80064]: %ADJCHANGE: neighbor ixl1.1000(<neigh_switch_2>) in vrf default Down BGP Notification send
      Jun 13 04:09:26 <firewall_hostname> bgpd[80064]: %ADJCHANGE: neighbor ixl0.1000(<neigh_switch_1>) in vrf default Up
      Jun 13 04:09:26 <firewall_hostname> bgpd[80064]: %ADJCHANGE: neighbor ixl1.1001(<neigh_switch_2>) in vrf default Up
      Jun 13 04:09:26 <firewall_hostname> bgpd[80064]: %NOTIFICATION: sent to neighbor ixl0.1001 6/7 (Cease/Connection collision resolution) 0 bytes
      Jun 13 04:09:26 <firewall_hostname> bgpd[80064]: %ADJCHANGE: neighbor ixl0.1001(<neigh_switch_1>) in vrf default Up
      Jun 13 04:09:27 <firewall_hostname> bgpd[80064]: %ADJCHANGE: neighbor ixl1.1000(<neigh_switch_2>) in vrf default Up
      Jun 13 04:09:29 <firewall_hostname> bgpd[80064]: %NOTIFICATION: rcvd End-of-RIB for IPv4 Unicast from ixl0.1000 in vrf default
      Jun 13 04:09:29 <firewall_hostname> bgpd[80064]: %NOTIFICATION: rcvd End-of-RIB for IPv4 Unicast from ixl1.1001 in vrf default
      Jun 13 04:09:29 <firewall_hostname> bgpd[80064]: %NOTIFICATION: rcvd End-of-RIB for IPv4 Unicast from ixl0.1001 in vrf default
      Jun 13 04:09:29 <firewall_hostname> bgpd[80064]: %NOTIFICATION: rcvd End-of-RIB for IPv4 Unicast from ixl1.1000 in vrf default
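
To see which neighbors flap most often, and for what reason, the NOTIFICATION lines in a log like the one above can be tallied. The following is a minimal sketch: the regular expression is written only against the line format shown in this excerpt, and the function name is made up for illustration. Other FRR versions may format these lines differently.

```python
import re
from collections import Counter

# Matches bgpd lines of the form seen in this thread, for example:
#   ... bgpd[80064]: %NOTIFICATION: sent to neighbor ixl1.1001 4/0 (Hold Timer Expired) 0 bytes
# The pattern is an assumption based on that excerpt only.
NOTIFICATION_RE = re.compile(
    r"%NOTIFICATION: sent to neighbor (?P<neighbor>\S+) "
    r"(?P<code>\d+)/(?P<subcode>\d+) \((?P<reason>[^)]*)\)"
)


def count_sent_notifications(lines):
    """Count NOTIFICATION messages sent by bgpd, keyed by (neighbor, reason)."""
    counts = Counter()
    for line in lines:
        match = NOTIFICATION_RE.search(line)
        if match:
            counts[(match["neighbor"], match["reason"])] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python3 count_flaps.py < exported_routing_log.txt
    for (neighbor, reason), n in sorted(count_sent_notifications(sys.stdin).items()):
        print(f"{neighbor}\t{reason}\t{n}")
```

Running this over a longer log window may show whether the flaps cluster on one interface, one switch, or hit all four sessions together.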

      • michmoor @rahul.yedavi

        @rahul-yedavi Well, the hold timer is pretty important, so we should focus on that. Why isn't the firewall receiving BGP keepalives from the peer in time? CPU usage?
        A couple of things I would do:

        1. Take a pcap on both sides. See if BGP keepalives are being sent, and correlate that with the loss of the adjacency.
        2. Check CPU on both sides, along with the health of the link. Is this over a VPN or over a direct connect? If it is over a VPN, and assuming you have gateway monitoring enabled, how are things looking?
        3. Finally, on a personal note, I always try to enable BFD. BFD does not rely on routing protocol timers, so it can detect a fault quicker and notify the routing process so routes can converge.
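
As a rough illustration of step 1: once a pcap exists, the arrival times of BGP messages from the peer (KEEPALIVE or UPDATE) can be checked for silences longer than the negotiated hold time. This is only a sketch. The function name is hypothetical, the timestamps are assumed to have been exported from the capture as seconds, and the hold time has to be read from the actual session (it is not stated in this thread).

```python
def find_silent_gaps(arrival_times, hold_time):
    """Return (last_seen, next_seen) pairs where the peer was silent for
    longer than hold_time seconds.

    arrival_times: timestamps (seconds) of BGP messages received from one peer.
    hold_time: negotiated hold time in seconds, taken from the real session.
    """
    times = sorted(arrival_times)
    return [
        (earlier, later)
        for earlier, later in zip(times, times[1:])
        if later - earlier > hold_time
    ]
```

If the capture taken on the switch shows keepalives leaving on schedule while the firewall-side capture has a gap, the loss is somewhere between the two capture points. If both sides show the gap, the sender is the place to look.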

        Firewall: NetGate,Palo Alto-VM,Juniper SRX
        Routing: Juniper, Arista, Cisco
        Switching: Juniper, Arista, Cisco
        Wireless: Unifi, Aruba IAP
        JNCIP,CCNP Enterprise

        • rahul.yedavi @michmoor

          @michmoor Thank you so much for the suggestions. This is very helpful.

          1. We do intend to take simultaneous pcaps from both ends, but the event occurs at random and we can't keep a pcap running for long periods. Is there any cron job or scheduler on pfSense that can be triggered by a BGP flap event, so that a pcap is collected while the issue is actually occurring?

          2. Noted. We will monitor the CPU trend on both ends to check whether any thresholds were crossed that could point to a resource utilization problem. This is over direct connect.

          3. We will have a discussion about the addition of BFD to the network.
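
One possible way to approach the capture question in item 1, without relying on any built-in pfSense trigger, is to keep a small ring-buffer capture of BGP traffic running and stop it shortly after a flap appears in the log, so the files on disk cover the moments before the flap. The sketch below assumes Python 3 and tcpdump are available on the firewall. The function names, the log path, and the pcap path are placeholders for illustration, and log rotation is not handled.

```python
import subprocess
import time

# Substrings that mark a flap in the bgpd log lines shown in this thread.
FLAP_MARKERS = ("Hold Timer Expired",)


def is_flap_line(line):
    """True if a log line looks like a bgpd hold-timer expiry."""
    return "bgpd" in line and any(marker in line for marker in FLAP_MARKERS)


def ring_capture_command(interface, pcap_path, file_mb=10, file_count=5):
    """Build a tcpdump command that rotates through file_count files of
    roughly file_mb megabytes each, keeping only TCP port 179 (BGP)."""
    return [
        "tcpdump", "-i", interface, "-n",
        "-C", str(file_mb), "-W", str(file_count),
        "-w", pcap_path, "tcp", "port", "179",
    ]


def follow(path, poll_seconds=0.5):
    """Yield lines appended to a file, starting from its current end."""
    with open(path, "r", errors="replace") as log:
        log.seek(0, 2)  # jump to end of file
        while True:
            line = log.readline()
            if line:
                yield line
            else:
                time.sleep(poll_seconds)


def capture_until_flap(interface, log_path, pcap_path, linger_seconds=5):
    """Run the ring capture until a flap is logged, then stop it.

    Returns the log line that triggered the stop.
    """
    capture = subprocess.Popen(ring_capture_command(interface, pcap_path))
    try:
        for line in follow(log_path):
            if is_flap_line(line):
                time.sleep(linger_seconds)  # also keep the re-establishment
                return line
    finally:
        capture.terminate()
        capture.wait()
```

A call might look like `capture_until_flap("ixl0.1000", "/var/log/routing.log", "/tmp/bgp_flap.pcap")`, where the interface name comes from the logs above and both paths are assumptions to adjust. The same idea would need to run on the switch side, or on a mirror port, to get the matching capture from the other end.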

          • michmoor @rahul.yedavi

            @rahul-yedavi I'm not aware of any cron job.
            Are these BGP sessions made over a VPN or over a direct connection (physically connected to another router)?

            If over a VPN, this may be down to the quality of the internet links between your two devices. dpinger may be able to reveal link quality issues, but ultimately you can't do anything about them.


            • rahul.yedavi @michmoor

              @michmoor, thanks for the response. We don't have any VPN between the firewall and the downstream device with which the BGP sessions are flapping. The firewall is directly connected to the downstream switch.
