Upgrading from 2.4.3 to 2.4.4 changes bird behavior during filter reload



  • Hi all,

    After upgrading my box from 2.4.3 to 2.4.4, every change to filter or NAT rules
    (once applied) breaks my iBGP sessions.
    The BGP sessions use BFD.

    Normal state:

    ATVIED6BNG01 BGP master up 16:42:19 Established
    ATVIED6BNG02 BGP master up 17:16:00 Established
    ATVIED6INFRAH1_NET_A BGP master up 16:43:16 Established
    ATVIED6INFRAH1_NET_B BGP master up 16:43:08 Established
    ATVIED6INFRAH2_NET_A BGP master up 16:43:10 Established
    ATVIED6INFRAH2_NET_B BGP master up 16:43:04 Established
    ATVIED6INFRAH3_NET_A BGP master up 16:43:13 Established
    ATVIED6INFRAH3_NET_B BGP master up 16:43:08 Established
    ATVIED6INFRAH4_NET_A BGP master up 16:43:05 Established
    ATVIED6INFRAH4_NET_B BGP master up 16:43:05 Established

    After changing rules:

    ATVIED6INFRAH1_NET_A BGP master up 17:17:47 Established
    ATVIED6INFRAH1_NET_B BGP master start 17:18:08 Idle Error: BFD session down
    ATVIED6INFRAH2_NET_A BGP master start 17:18:08 Passive Received: Cease
    ATVIED6INFRAH2_NET_B BGP master start 17:18:08 Passive Received: Cease
    ATVIED6INFRAH3_NET_A BGP master start 17:18:08 Passive Received: Cease
    ATVIED6INFRAH3_NET_B BGP master start 17:18:08 Idle Received: Cease
    ATVIED6INFRAH4_NET_A BGP master start 17:18:08 Idle Received: Cease
    ATVIED6INFRAH4_NET_B BGP master start 17:18:08 Passive Received: Cease
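    To see at a glance which sessions dropped after a reload, the `birdc show protocols` listing can be filtered with a small script. This is a minimal Python sketch, assuming the BIRD 1.6 column layout shown above (name, proto, table, state, since, info); the function and class names are hypothetical helpers, not part of BIRD.

```python
"""Sketch: list BGP sessions that are not Established in `birdc show protocols` output.

Assumes the BIRD 1.6 column layout seen in this thread:
name, proto, table, state, since, info
"""
from __future__ import annotations

from typing import NamedTuple


class Protocol(NamedTuple):
    name: str
    proto: str
    table: str
    state: str
    since: str
    info: str


def parse_protocols(text: str) -> list[Protocol]:
    """Parse protocol rows; skips banners, the header row and rows without an info column."""
    rows = []
    for line in text.splitlines():
        fields = line.split(maxsplit=5)
        # Blank lines, the "BIRD x.y.z ready." banner and e.g. kernel/device rows
        # have fewer than six columns; the header row starts with "name".
        if len(fields) < 6 or fields[0] == "name":
            continue
        # birdc pads the info column with spaces; collapse them to single spaces.
        info = " ".join(fields[5].split())
        rows.append(Protocol(*fields[:5], info))
    return rows


def broken_bgp_sessions(text: str) -> list[tuple[str, str]]:
    """Return (name, info) for every BGP protocol whose info is not 'Established'."""
    return [
        (p.name, p.info)
        for p in parse_protocols(text)
        if p.proto == "BGP" and p.info != "Established"
    ]
```

    On the firewall, the input text could come from running `birdc show protocols`, for example via `subprocess.run(["birdc", "show", "protocols"], capture_output=True, text=True).stdout`, once before and once right after a filter reload.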

    Only BGP sessions on the "LAN" side are affected.

    Peers on the WAN side (ATVIED6BNG01, ATVIED6BNG02) are Cisco routers.
    Peers on the LAN side (ATVIED6INFRAH1-4) are other BIRD instances.

    I have reverted the bird package to 1.6.3 to match the version under 2.4.3, but nothing changed.
    It seems something has changed in 2.4.4 regarding the filter reload.

    Any ideas?

    BR
    Stefan



  • BIRD is not a supported package on pfSense; IIRC the team recommends the "frr" package for BGP.



  • Yes, I know, but it's the only option to have BGP with BFD support.
    It was working all along; only this update broke it.

    I'll have a look at the FRR package, but I don't think BFD support is included.

    Thanks


  • Netgate Administrator

    OK, so to be clear, you installed the FreeBSD bird package? Did it pull in any other dependencies?
    One of them might have been replaced with our version during the upgrade.

    Steve



  • I installed it via the shell:

    PACKAGE-INFO

    [2.4.4-RELEASE][root@ATVIED6INFRAFW2.as29081.net]/root: pkg info bird
    bird-1.6.4
    Name : bird
    Version : 1.6.4
    Installed on : Mon Nov 12 11:33:34 2018 CET
    Origin : net/bird
    Architecture : FreeBSD:11:amd64
    Prefix : /usr/local
    Categories : net
    Licenses : GPLv2
    Maintainer : olivier@FreeBSD.org
    WWW : http://bird.network.cz/
    Comment : Dynamic IP routing daemon (IPv4 version)
    Shared Libs required:
    libreadline.so.7
    Annotations :
    FreeBSD_version: 1102000
    flavor : ipv4
    repo_type : binary
    repository : pfSense
    Flat size : 554KiB
    Description :
    The BIRD project aims to develop a fully functional dynamic IP routing daemon.

    • Both IPv4 and IPv6
    • Multiple routing tables
    • BGP
    • RIP
    • OSPF
    • Static routes
    • Inter-table protocol
    • Command-line interface
    • Soft reconfiguration
    • Powerful language for route filtering

    WWW: http://bird.network.cz/

    bird 1.6.3 was only for testing: I downloaded the package provided by FreeBSD, extracted it and used
    only the binary.
    Should I install bird 1.6.3 from the pfSense repo package instead? (How can I do this?)


  • Netgate Administrator

    Hmm, well, as far as I know it's completely untested. I didn't even realise it was in our repo until I checked.

    Do you see blocked packets from the peers? Do you see any packets from the peers? Or being sent to the peers?

    Steve



  • I will try some deeper debugging (tcpdump).
    Maybe something has changed (parameters) in the way pfctl is called for the filter reload.

    At the moment it seems all TCP sessions connected to the pfSense box get stalled when
    the filter is reloaded (for example, a remote SSH session timed out).



  • Some more information:

    Every "save" action in the GUI causes a disruption; this could also be the reason for the delayed
    responses from the GUI.

    Syslog shows this after changing the log settings:

    Nov 15 10:38:50 sshd 17503 Fssh_packet_write_poll: Connection from user root 62.212.164.58 port 24480: Permission denied
    Nov 15 10:38:47 syslogd kernel boot file is /boot/kernel/kernel
    Nov 15 10:38:47 syslogd exiting on signal 15
    Nov 15 10:38:45 root /etc/rc.d/hostid: WARNING: hostid: unable to figure out a UUID from DMI data, generating a new one
    Nov 15 10:38:45 check_reload_status Syncing firewall

    In the meantime I have also modified the BGP BFD timers, because tcpdump and the BIRD logs indicated
    the BGP sessions went down due to a failed BFD session.
    After this change, birdc gives me the following after a filter reload:

    birdc
    ATVIED6INFRAH1_NET_A BGP master start 10:31:15 Passive Socket: Permission denied
    ATVIED6INFRAH1_NET_B BGP master start 10:31:15 Passive Socket: Permission denied
    ATVIED6INFRAH2_NET_A BGP master start 10:31:15 Passive Socket: Permission denied
    ATVIED6INFRAH2_NET_B BGP master start 10:31:14 Passive Socket: Permission denied
    ATVIED6INFRAH3_NET_A BGP master start 10:31:15 Passive Socket: Permission denied
    ATVIED6INFRAH3_NET_B BGP master start 10:31:15 Passive Error: BFD session down
    ATVIED6INFRAH4_NET_A BGP master start 10:31:15 Passive Socket: Permission denied
    ATVIED6INFRAH4_NET_B BGP master start 10:31:15 Passive Socket: Permission denied

    It seems sockets get cleared during the reload; that would also match the SSH error we can see in the syslog.
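    On the BFD timer change: in asynchronous mode, RFC 5880 (section 6.8.4) defines the local detection time as the remote system's Detect Mult multiplied by the agreed remote transmit interval, i.e. the greater of the local Required Min RX Interval and the remote Desired Min TX Interval. If a filter reload stalled traffic for longer than that, BFD would declare the session down and BIRD would drop the BGP session tied to it, which would be consistent with the "BFD session down" errors. A minimal Python sketch of that arithmetic follows; the timer values and stall durations are illustrative assumptions, not the settings used here, and `stall_drops_session` is a hypothetical helper.

```python
def bfd_detection_time_ms(remote_detect_mult: int,
                          local_required_min_rx_ms: float,
                          remote_desired_min_tx_ms: float) -> float:
    """Asynchronous-mode detection time as described in RFC 5880, section 6.8.4."""
    # The peer never sends faster than we are willing to receive, so the
    # agreed remote transmit interval is the larger of the two values.
    agreed_remote_tx_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_remote_tx_ms


def stall_drops_session(stall_ms: float,
                        remote_detect_mult: int,
                        local_required_min_rx_ms: float,
                        remote_desired_min_tx_ms: float) -> bool:
    """Hypothetical helper: True if a traffic stall outlasts the BFD detection time."""
    detection_ms = bfd_detection_time_ms(
        remote_detect_mult, local_required_min_rx_ms, remote_desired_min_tx_ms
    )
    return stall_ms > detection_ms


if __name__ == "__main__":
    # Illustrative values only: multiplier 3, 300 ms RX, 100 ms TX -> 900 ms.
    print(bfd_detection_time_ms(3, 300, 100))
    # A hypothetical 2 s stall outlasts 900 ms but not 10 x 1000 ms.
    print(stall_drops_session(2000, 3, 300, 100))
    print(stall_drops_session(2000, 10, 1000, 1000))
```

    Raising the timers only widens the window a stall must exceed; it does not explain why the reload stalls traffic in the first place.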


  • Netgate Administrator

    Hmm, that's curious. Just to confirm: if you are ssh'd into the firewall and you reload the filter from Status > Filter Reload in the GUI, your ssh session is interrupted? Disconnected?

    That should not happen. I've never seen it on any of my test boxes.

    Do you lose the firewall state when that happens? Do you have any of the advanced state killing options selected? Like 'Flush all states when a gateway goes down' for example.

    Steve



  • Yes, I ssh'd into the box, and after "Status > Filter Reload" or any other action (for example, changing a
    rule) the terminal became unresponsive, and after 1 minute I got the message "Session closed".

    BUT
    I have done some additional tests right now, and it's working at the moment 🤔 🤔 🤔
    I don't know what's going on.
    I will watch the situation for a few days...

    Many thanks for your help, Steve

    • And to answer the rest of your question: no advanced state-killing options are selected.