Unable to ping/telnet partner failover interface



  • Greetings all,

    I have a pair of pfSense boxes running 2.0.3, each with a dedicated interface configured for failover (a cross-over cable connects em1 on each side). I have assigned a private IP to each side (active: 10.1.1.1/24, standby: 10.1.1.2/24) and have configured the firewall rules to allow all traffic on that interface on each firewall. However, I am unable to ping, telnet, or HTTP to the failover interface from the partner. I tried manually shutting down the interface on each node (ifconfig em1 down), but the partner side still shows em1 as up. If I reboot the standby firewall, I get "status: no carrier" for em1 on the primary side, so I am pretty sure the cabling is correct.

    Strangely enough, all the CARP VIPs are running on the primary and operate properly.  If I disable CARP on the primary, the VIPs automatically get activated on the standby (and vice versa).  Until I get this problem resolved, I cannot sync the configs between the firewalls.

    Any ideas?  What am I missing?

    Thanks!


  • Rebel Alliance Developer Netgate

    Post the output of the following commands from both nodes:

    ifconfig em1
    netstat -rn | grep 10.1.1
    pfctl -sr | egrep '(10.1.1|em1)'
    


  • Thanks Jim.  Here is the requested output:

    Firewall-1:

    ifconfig em1

    em1: flags=8c43<UP,BROADCAST,RUNNING,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=42098<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWTSO>
    ether 00:e0:81:79:dd:35
    inet6 fe80::2e0:81ff:fe79:dd35%em1 prefixlen 64 scopeid 0x2
    inet 10.1.1.1 netmask 0xfffffffc broadcast 10.1.1.3
    nd6 options=43<PERFORMNUD,ACCEPT_RTADV>
    media: Ethernet 100baseTX <full-duplex>
    status: active

    netstat -rn | grep 10.1.1

    10.1.1.0/30        link#2            U          0 12735499    em1
    10.1.1.1          link#2            UHS        0        0    lo0

    pfctl -sr | egrep '(10.1.1|em1)'

    scrub on em1 all fragment reassemble
    block drop in on ! em1 inet from 10.1.1.0/30 to any
    block drop in inet from 10.1.1.1 to any
    block drop in on em1 inet6 from fe80::2e0:81ff:fe79:dd35 to any
    pass in quick on em1 all flags S/SA keep state label "USER_RULE"

    Firewall-2:

    ifconfig em1

    em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=4209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWTSO>
    ether 00:e0:81:79:dd:3b
    inet6 fe80::2e0:81ff:fe79:dd3b%em1 prefixlen 64 scopeid 0x2
    inet 10.1.1.2 netmask 0xfffffffc broadcast 10.1.1.3
    nd6 options=43<PERFORMNUD,ACCEPT_RTADV>
    media: Ethernet 100baseTX <full-duplex>
    status: active

    netstat -rn | grep 10.1.1

    10.1.1.0/30        link#2            U          0        0    em1
    10.1.1.2          link#2            UHS        0        0    lo0

    pfctl -sr | egrep '(10.1.1|em1)'

    scrub on em1 all fragment reassemble
    block drop in on ! em1 inet from 10.1.1.0/30 to any
    block drop in inet from 10.1.1.2 to any
    block drop in on em1 inet6 from fe80::2e0:81ff:fe79:dd3b to any
    pass in quick on em1 all flags S/SA keep state label "USER_RULE"

    Looking at the above output, I am surprised to see the "block drop in" output from the pfctl command.  From the Web GUI, I double-checked to verify the "pass any any" rules on the FAILOVER interface on both firewalls.

    Another bit of interesting info is some em1 messages in /var/log/system on the primary firewall:

    
    Jun  5 18:42:24 ral-prod-fw1 kernel: em1: Watchdog timeout -- resetting
    Jun  5 18:42:24 ral-prod-fw1 kernel: em1: Queue(0) tdh = 0, hw tdt = 993
    Jun  5 18:42:24 ral-prod-fw1 kernel: em1: TX(0) desc avail = 31,Next TX to Clean = 0
    Jun  5 18:42:24 ral-prod-fw1 kernel: em1: link state changed to DOWN
    Jun  5 18:42:24 ral-prod-fw1 check_reload_status: Linkup starting em1
    Jun  5 18:42:24 ral-prod-fw1 check_reload_status: Linkup starting em1
    Jun  5 18:42:24 ral-prod-fw1 kernel: em1: link state changed to UP
    Jun  5 18:42:27 ral-prod-fw1 php: : Hotplug event detected for opt1 but ignoring since interface is configured with static IP (10.1.1.1)
    Jun  5 18:42:27 ral-prod-fw1 php: : Hotplug event detected for opt1 but ignoring since interface is configured with static IP (10.1.1.1)
    Jun  5 18:42:27 ral-prod-fw1 check_reload_status: rc.newwanip starting em1
    Jun  5 18:42:29 ral-prod-fw1 php: : rc.newwanip: Informational is starting em1.
    Jun  5 18:42:29 ral-prod-fw1 php: : rc.newwanip: on (IP address: 10.1.1.1) (interface: opt1) (real interface: em1).
    Jun  5 18:42:29 ral-prod-fw1 apinger: Exiting on signal 15.
    Jun  5 18:42:30 ral-prod-fw1 check_reload_status: Reloading filter
    
    

    None of the other interfaces seem to have this issue.
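
    The two ifconfig outputs differ in their flags word: 8c43 on Firewall-1 versus 8843 on Firewall-2. As a rough aid for reading such values, the sketch below decodes a flags word into names. The helper name decode_flags is made up for this example, and the table covers only the six bits that appear in this thread (values taken from FreeBSD's net/if.h), so it is not a complete decoder.

```shell
# Decode an ifconfig flags word (hex, without the 0x prefix) into flag names.
# Only the bits seen in this thread are listed; other bits are ignored.
decode_flags() {
    flags=$((0x$1))
    out=""
    for pair in 1:UP 2:BROADCAST 40:RUNNING 400:OACTIVE 800:SIMPLEX 8000:MULTICAST; do
        bit=$((0x${pair%%:*}))     # hex bit value before the colon
        name=${pair#*:}            # flag name after the colon
        if [ $((flags & bit)) -ne 0 ]; then
            out="${out:+$out,}$name"
        fi
    done
    echo "$out"
}

decode_flags 8c43   # Firewall-1: UP,BROADCAST,RUNNING,OACTIVE,SIMPLEX,MULTICAST
decode_flags 8843   # Firewall-2: UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST
```

    The only difference is the 0x400 bit (OACTIVE), which the driver sets while its transmit queue is full. That would be consistent with the watchdog timeouts logged for em1 on the primary, though it does not by itself say whether the cable, the port, or the NIC is at fault.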


  • Rebel Alliance Developer Netgate

    Do you have a little switch you can plug in between them to see if that helps? Or change out the cable?

    The block rules are normal. Note that they don't say "quick", so a match on them is not final; the "pass in quick" rule on em1 still catches the traffic and allows it.
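
    To illustrate the "quick" behaviour, here is a toy model of how pf picks a verdict. It is only a sketch: pf_verdict is a hypothetical helper, each argument stands for a rule that is assumed to already match the packet, and real pf also checks interface, direction, addresses, and state.

```shell
# Toy model of pf rule evaluation (an illustration, not pf itself).
# Each argument is "action" or "action:quick", listed in ruleset order.
pf_verdict() {
    verdict="pass"                # pf passes a packet that matches no rule
    for rule in "$@"; do
        verdict=${rule%%:*}       # without quick, the last matching rule wins
        case $rule in
            *:quick) break ;;     # quick makes this match final
        esac
    done
    echo "$verdict"
}

pf_verdict block block pass:quick   # prints "pass"
pf_verdict pass block               # prints "block"
```

    In this model, non-quick block rules followed by a matching "pass in quick" rule still yield a pass, which is why the block lines in the pfctl output are not a concern here.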



  • Thanks Jim.  I will head out to the datacenter tomorrow and try a different cable.

    In the meantime, I chose the LAN interface for the config sync until I can get the failover interface working.

    Appreciate your assistance…