Route stops working but is OK after reboot



  • Posted in routing section but may be a 2.1 bug?

    I'm having a problem with one of the sites losing its connection. I'm not sure whether the build is the issue, but all 3 locations are on the latest 2.1 build now. Previously Site A was on a build a few weeks old.

    Site A 192.168.1.0/24
    Site B 192.168.5.0/24
    Site C 192.168.0.0/24

    The 3 sites are connected through layer 2 MPLS via the telco. Confirmed it's not the telco dropping, since pfSense can ping pfSense even when the route is not working.

    We switched to pfSense last night and everything was working fine, then all of a sudden Site B lost its connection to Site A and A couldn't reach B, while Site C could still reach Site B and vice versa. Rebooting Site B has no effect; after rebooting pfSense at A, everything routes correctly again. Also, Site A never loses its connection to Site C, just to B. It's a simple setup with static routes set for the 3 sites. Does anybody have suggestions for troubleshooting this, or know whether it's a bug in 2.1?
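
As a quick sanity check on the addressing described above, the following is a minimal sketch (not part of the original post) that uses Python's standard `ipaddress` module to confirm the three site subnets do not overlap and to map an address to its site. The helper names `overlapping_sites` and `site_for` are hypothetical, introduced only for illustration; the subnets are the ones listed in the post.

```python
import ipaddress
from itertools import combinations

# Subnets exactly as listed in the post.
SITES = {
    "Site A": ipaddress.ip_network("192.168.1.0/24"),
    "Site B": ipaddress.ip_network("192.168.5.0/24"),
    "Site C": ipaddress.ip_network("192.168.0.0/24"),
}


def overlapping_sites(sites):
    """Return pairs of site names whose subnets overlap (hypothetical helper)."""
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(sites.items(), 2)
        if net_a.overlaps(net_b)
    ]


def site_for(address, sites=SITES):
    """Return the name of the site whose subnet contains address, or None."""
    ip = ipaddress.ip_address(address)
    for name, net in sites.items():
        if ip in net:
            return name
    return None


if __name__ == "__main__":
    print("overlapping pairs:", overlapping_sites(SITES))
    print("192.168.5.9 is in:", site_for("192.168.5.9"))
```

An empty list of overlapping pairs only rules out one class of static-route mistake; it says nothing about why the A to B route stops working.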



  • @zhaolander:

    Confirmed it's not the telco dropping, since pfSense can ping pfSense even when the route is not working.

    I don't understand what you mean by that. Do you mean something like "pfSense at site A gets a response when it pings pfSense site B but systems on site A can't access systems on site B"?

    What is in the system logs of pfSense A and pfSense B at around the time the site connectivity was lost? (See Status -> System Logs)

    Please give more details of what you mean by "lose connection", for example, "On pfSense at site A when I attempt to ping a non-pfSense system at site B ping reports …"



  • Thanks for the reply. Yes, pfSense at site A can ping pfSense at site B (the interface and gateway; I should have tried pinging the subnet from pfSense but forgot during the outage), but the LAN subnet at site A is unable to ping the LAN subnet at site B, and vice versa.

    Here are the logs

    Site B - Disconnected right around the 10:45 mark. I updated the firmware once logged in, but the route was still broken.

    Jul 31 10:57:55 shutdown: reboot by root: 
    Jul 31 10:56:13 php: /system_firmware_auto.php: The command '/usr/local/sbin/gzsig verify /etc/pubkey.pem < '/root/latest.tgz'' returned exit code '2', the output was 'No gzip signature found Couldn't verify input' 
    Jul 31 10:55:24 kernel: arp: 192.168.5.9 moved from a0:21:b7:c1:d9:c8 to a0:21:b7:c1:d9:c9 on em1 
    Jul 31 10:55:22 kernel: arp: 192.168.5.9 moved from a0:21:b7:c1:d9:c9 to a0:21:b7:c1:d9:c8 on em1 
    Jul 31 10:54:31 check_reload_status: Syncing firewall 
    Jul 31 10:53:47 php: /system_firmware_auto.php: The command '/usr/local/sbin/gzsig verify /etc/pubkey.pem < '/root/latest.tgz'' returned exit code '2', the output was 'No gzip signature found Couldn't verify input' 
    Jul 31 10:51:59 php: /index.php: Successful login for user 'admin' from: 192.168.5.35 
    Jul 31 10:51:59 php: /index.php: Successful login for user 'admin' from: 192.168.5.35 
    Jul 31 10:47:31 php: /system_firmware_auto.php: The command '/usr/local/sbin/gzsig verify /etc/pubkey.pem < '/root/latest.tgz'' returned exit code '2', the output was 'No gzip signature found Couldn't verify input' 
    Jul 31 10:45:04 php: /index.php: Successful login for user 'admin' from: 192.168.0.35 
    Jul 31 10:45:04 php: /index.php: Successful login for user 'admin' from: 192.168.0.35 
    Jul 31 10:38:47 kernel: arp: 192.168.5.9 moved from a0:21:b7:c1:d9:c8 to a0:21:b7:c1:d9:c9 on em1 
    Jul 31 10:38:45 kernel: arp: 192.168.5.9 moved from a0:21:b7:c1:d9:c9 to a0:21:b7:c1:d9:c8 on em1 
    Jul 31 10:36:58 kernel: arp: 192.168.5.9 moved from a0:21:b7:c1:d9:c8 to a0:21:b7:c1:d9:c9 on em1 
    Jul 31 10:36:56 kernel: arp: 192.168.5.9 moved from a0:21:b7:c1:d9:c9 to a0:21:b7:c1:d9:c8 on em1 
    Jul 31 10:29:52 kernel: arp: 192.168.5.9 moved from a0:21:b7:c1:d9:c8 to a0:21:b7:c1:d9:c9 on em1 
    Jul 31 10:29:50 kernel: arp: 192.168.5.9 moved from a0:21:b7:c1:d9:c9 to a0:21:b7:c1:d9:c8 on em1 
    Jul 31 10:28:16 kernel: arp: 192.168.5.9 moved from a0:21:b7:c1:d9:c8 to a0:21:b7:c1:d9:c9 on em1 
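
The Site B log above repeatedly shows the same IP address alternating between two MAC addresses on `em1`. The thread does not establish whether this is related to the routing loss, but it is easy to quantify. The following is a minimal sketch, assuming the log lines keep the `arp: <ip> moved from <mac> to <mac> on <interface>` format shown above; the function name `count_arp_moves` is hypothetical.

```python
import re
from collections import Counter

# Matches kernel messages of the form seen in the Site B log:
# "arp: <ip> moved from <old mac> to <new mac> on <interface>"
ARP_MOVE = re.compile(
    r"arp: (?P<ip>\d+\.\d+\.\d+\.\d+) moved from "
    r"(?P<old>[0-9a-f:]{17}) to (?P<new>[0-9a-f:]{17}) on (?P<iface>\S+)"
)


def count_arp_moves(lines):
    """Count 'moved' events per (ip, interface) and collect the MACs involved."""
    moves = Counter()
    macs = {}
    for line in lines:
        match = ARP_MOVE.search(line)
        if match is None:
            continue  # not an ARP move message
        key = (match["ip"], match["iface"])
        moves[key] += 1
        macs.setdefault(key, set()).update({match["old"], match["new"]})
    return moves, macs


# Three lines copied from the Site B log above; the login line is ignored.
SAMPLE = [
    "Jul 31 10:55:24 kernel: arp: 192.168.5.9 moved from a0:21:b7:c1:d9:c8 to a0:21:b7:c1:d9:c9 on em1",
    "Jul 31 10:55:22 kernel: arp: 192.168.5.9 moved from a0:21:b7:c1:d9:c9 to a0:21:b7:c1:d9:c8 on em1",
    "Jul 31 10:51:59 php: /index.php: Successful login for user 'admin' from: 192.168.5.35",
]

if __name__ == "__main__":
    moves, macs = count_arp_moves(SAMPLE)
    for key, count in moves.items():
        print(key, count, sorted(macs[key]))
```

Feeding it the full log would show how often each address flaps and between which MACs, which may help when describing the problem to others.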
    
    

    Site A log - It looks like a chunk is missing from around 9:00 to 10:40, not sure why, but I updated and rebooted and everything works again.

    Jul 31 11:10:32 php: /system_firmware_auto.php: The command '/usr/local/sbin/gzsig verify /etc/pubkey.pem < '/root/latest.tgz'' returned exit code '2', the output was 'No gzip signature found Couldn't verify input' 
    Jul 31 11:08:08 php: /index.php: Successful login for user 'admin' from: 192.168.1.148 
    Jul 31 11:08:08 php: /index.php: Successful login for user 'admin' from: 192.168.1.148 
    Jul 31 10:59:27 php: rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use KelVPLS. 
    Jul 31 10:59:24 check_reload_status: Reloading filter 
    Jul 31 10:59:24 check_reload_status: Restarting OpenVPN tunnels/interfaces 
    Jul 31 10:59:24 check_reload_status: Restarting ipsec tunnels 
    Jul 31 10:59:24 check_reload_status: updating dyndns KelVPLS 
    Jul 31 10:58:35 php: rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use KelVPLS. 
    Jul 31 10:58:32 check_reload_status: Reloading filter 
    Jul 31 10:58:32 check_reload_status: Restarting OpenVPN tunnels/interfaces 
    Jul 31 10:58:32 check_reload_status: Restarting ipsec tunnels 
    Jul 31 10:58:32 check_reload_status: updating dyndns KelVPLS 
    Jul 31 10:45:23 php: rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use VPNFailover,WirelessGW. 
    Jul 31 10:45:21 check_reload_status: Reloading filter 
    Jul 31 10:45:21 check_reload_status: Restarting OpenVPN tunnels/interfaces 
    Jul 31 10:45:21 check_reload_status: Restarting ipsec tunnels 
    Jul 31 10:45:21 check_reload_status: updating dyndns VPNFailover,WirelessGW 
    Jul 31 10:45:09 check_reload_status: Reloading filter 
    Jul 31 10:45:09 php: rc.newwanip: pfSense package system has detected an ip change 10.0.8.1 -> 10.0.8.1 ... Restarting packages. 
    Jul 31 10:45:07 php: rc.newwanip: Creating rrd update script 
    Jul 31 10:45:02 php: rc.start_packages: Restarting/Starting all packages. 
    Jul 31 10:45:00 check_reload_status: Starting packages 
    Jul 31 10:45:00 php: rc.newwanip: pfSense package system has detected an ip change -> 10.0.9.1 ... Restarting packages. 
    Jul 31 10:45:00 php: rc.newwanip: rc.newwanip: on (IP address: 10.0.9.1) (interface: ) (real interface: ovpns2). 
    Jul 31 10:45:00 php: rc.newwanip: rc.newwanip: Informational is starting ovpns2.
    Jul 31 10:44:59 php: rc.newwanip: rc.newwanip: on (IP address: 10.0.8.1) (interface: opt4) (real interface: ovpns1). 
    Jul 31 10:44:59 php: rc.newwanip: rc.newwanip: Informational is starting ovpns1.
    Jul 31 10:44:58 check_reload_status: rc.newwanip starting ovpns2 
    Jul 31 10:44:58 kernel: ovpns2: link state changed to UP 
    Jul 31 10:44:57 kernel: ovpns2: link state changed to DOWN 
    Jul 31 10:44:57 check_reload_status: rc.newwanip starting ovpns1 
    Jul 31 10:44:57 kernel: ovpns1: link state changed to UP 
    Jul 31 10:44:57 php: rc.openvpn: OpenVPN: Resync server2 Roadwarrior 
    Jul 31 10:44:57 kernel: ovpns1: link state changed to DOWN 
    Jul 31 10:44:57 php: rc.openvpn: OpenVPN: Resync server1 VPN Failover 
    Jul 31 10:44:57 php: rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use ShawGW. 
    Jul 31 10:44:56 php: rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use VPNFailover,WirelessGW. 
    Jul 31 10:44:54 check_reload_status: Restarting OpenVPN tunnels/interfaces 
    Jul 31 10:44:54 check_reload_status: updating dyndns ShawGW 
    Jul 31 10:44:54 check_reload_status: Reloading filter 
    Jul 31 10:44:54 check_reload_status: Restarting OpenVPN tunnels/interfaces 
    Jul 31 10:44:54 check_reload_status: Restarting ipsec tunnels 
    Jul 31 10:44:54 check_reload_status: updating dyndns VPNFailover,WirelessGW 
    Jul 31 10:44:42 check_reload_status: Reloading filter 
    Jul 31 10:44:42 php: rc.newwanip: pfSense package system has detected an ip change 10.0.8.1 -> 10.0.8.1 ... Restarting packages. 
    Jul 31 10:44:40 php: rc.newwanip: Creating rrd update script 
    Jul 31 10:44:35 php: rc.start_packages: Restarting/Starting all packages. 
    Jul 31 10:44:33 check_reload_status: Starting packages 
    Jul 31 10:44:33 php: rc.newwanip: pfSense package system has detected an ip change -> 10.0.9.1 ... Restarting packages. 
    Jul 31 10:44:33 php: rc.newwanip: rc.newwanip: on (IP address: 10.0.9.1) (interface: ) (real interface: ovpns2). 
    Jul 31 10:44:33 php: rc.newwanip: rc.newwanip: Informational is starting ovpns2.
    Jul 31 10:44:33 php: rc.newwanip: rc.newwanip: on (IP address: 10.0.8.1) (interface: opt4) (real interface: ovpns1). 
    Jul 31 10:44:33 php: rc.newwanip: rc.newwanip: Informational is starting ovpns1.
    Jul 31 10:44:31 check_reload_status: rc.newwanip starting ovpns2 
    Jul 31 10:44:31 kernel: ovpns2: link state changed to UP 
    Jul 31 10:44:30 kernel: ovpns2: link state changed to DOWN 
    Jul 31 10:44:30 check_reload_status: rc.newwanip starting ovpns1 
    Jul 31 10:44:30 kernel: ovpns1: link state changed to UP 
    Jul 31 10:44:30 php: rc.openvpn: OpenVPN: Resync server2 Roadwarrior 
    Jul 31 10:44:30 kernel: ovpns1: link state changed to DOWN 
    Jul 31 10:44:30 php: rc.openvpn: OpenVPN: Resync server1 VPN Failover 
    Jul 31 10:44:30 php: rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use ShawGW. 
    Jul 31 10:44:27 check_reload_status: Reloading filter 
    Jul 31 10:44:27 check_reload_status: Restarting OpenVPN tunnels/interfaces 
    Jul 31 10:44:27 check_reload_status: Restarting ipsec tunnels 
    Jul 31 10:44:27 check_reload_status: updating dyndns ShawGW 
    Jul 31 10:42:57 php: rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use VPNFailover,WirelessGW. 
    Jul 31 10:42:55 check_reload_status: Reloading filter 
    Jul 31 10:42:55 check_reload_status: Restarting OpenVPN tunnels/interfaces 
    Jul 31 10:42:55 check_reload_status: Restarting ipsec tunnels 
    Jul 31 10:42:55 check_reload_status: updating dyndns VPNFailover,WirelessGW 
    Jul 31 10:42:43 check_reload_status: Reloading filter 
    Jul 31 10:42:43 php: rc.newwanip: pfSense package system has detected an ip change 10.0.8.1 -> 10.0.8.1 ... Restarting packages. 
    Jul 31 10:42:41 php: rc.newwanip: Creating rrd update script 
    Jul 31 10:42:36 php: rc.start_packages: Restarting/Starting all packages. 
    Jul 31 10:42:34 check_reload_status: Starting packages 
    Jul 31 10:42:34 php: rc.newwanip: pfSense package system has detected an ip change -> 10.0.9.1 ... Restarting packages. 
    Jul 31 10:42:34 php: rc.newwanip: rc.newwanip: on (IP address: 10.0.9.1) (interface: ) (real interface: ovpns2). 
    Jul 31 10:42:34 php: rc.newwanip: rc.newwanip: Informational is starting ovpns2.
    Jul 31 10:42:34 php: rc.newwanip: rc.newwanip: on (IP address: 10.0.8.1) (interface: opt4) (real interface: ovpns1). 
    Jul 31 10:42:34 php: rc.newwanip: rc.newwanip: Informational is starting ovpns1.
    Jul 31 10:42:32 check_reload_status: rc.newwanip starting ovpns2 
    Jul 31 10:42:32 kernel: ovpns2: link state changed to UP 
    Jul 31 10:42:32 kernel: ovpns2: link state changed to DOWN 
    Jul 31 10:42:32 check_reload_status: rc.newwanip starting ovpns1 
    Jul 31 10:42:32 kernel: ovpns1: link state changed to UP 
    Jul 31 10:42:32 php: rc.openvpn: OpenVPN: Resync server2 Roadwarrior 
    Jul 31 10:42:31 kernel: ovpns1: link state changed to DOWN 
    Jul 31 10:42:31 php: rc.openvpn: OpenVPN: Resync server1 VPN Failover 
    Jul 31 10:42:31 php: rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use ShawGW. 
    Jul 31 10:42:29 check_reload_status: Reloading filter 
    Jul 31 10:42:29 check_reload_status: Restarting OpenVPN tunnels/interfaces 
    Jul 31 10:42:29 check_reload_status: Restarting ipsec tunnels 
    Jul 31 10:42:29 check_reload_status: updating dyndns ShawGW 
    Jul 31 08:57:13 kernel: arp: 192.168.1.17 moved from 78:a3:e4:79:68:af to 20:c9:d0:66:14:49 on em1 
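
The poster mentions that a chunk of the Site A log appears to be missing. The following is a minimal sketch, not part of the original post, for locating the widest gap between consecutive entries. It assumes every line starts with a `Mon DD HH:MM:SS` stamp as in the logs above and that month abbreviations are in English. Syslog stamps carry no year, so the `year` parameter is an arbitrary assumption, and the function names `parse_stamp` and `largest_gap` are hypothetical.

```python
from datetime import datetime


def parse_stamp(line, year=2000):
    """Parse the leading 'Mon DD HH:MM:SS' stamp; the year is an arbitrary assumption."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def largest_gap(lines, year=2000):
    """Return (gap, earlier, later) for the widest gap between consecutive entries."""
    stamps = sorted(parse_stamp(line, year) for line in lines if line.strip())
    if len(stamps) < 2:
        return None  # nothing to compare
    earlier, later = max(zip(stamps, stamps[1:]), key=lambda pair: pair[1] - pair[0])
    return later - earlier, earlier, later


# Four lines copied from the end of the Site A log above (newest first).
SAMPLE = [
    "Jul 31 10:42:31 php: rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use ShawGW.",
    "Jul 31 10:42:29 check_reload_status: Restarting ipsec tunnels",
    "Jul 31 10:42:29 check_reload_status: updating dyndns ShawGW",
    "Jul 31 08:57:13 kernel: arp: 192.168.1.17 moved from 78:a3:e4:79:68:af to 20:c9:d0:66:14:49 on em1",
]

if __name__ == "__main__":
    gap, earlier, later = largest_gap(SAMPLE)
    print(f"largest gap: {gap} (from {earlier:%H:%M:%S} to {later:%H:%M:%S})")
```

A gap by itself does not show that entries were lost; a quiet system also produces one, so this only narrows down where to look.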
    
    

    There were no issues from C to B or to A, even when the route between B and A stopped working. The settings on C and B are almost identical.



  • In case the failover OpenVPN was causing this issue, I've disabled that interface and disabled gateway monitoring at sites A, B and C.

    Well, that made no change. The routes broke again and rebooting A restored the connection.