Default route goes missing, needs to be added manually to resolve



  • This is the second time it has happened since Dec 13, 2017. I'm now running version 2.4.2-RELEASE-p1; on Dec 13th I believe the version was 2.4.1 or thereabouts, but I don't recall exactly.

    Here's our basic config:
    WAN is on a static IP; we have 3 IPs from our ISP, and this gateway is marked as default.
    OPT1 is on DHCP; I force user workstations through it via a floating rule, a LAN rule, and an outbound NAT rule.

    When our ISP does funky stuff at night and OPT1 decides it's getting a new DHCP lease (same IP though, just a reset it seems), the default gateway perhaps goes down; I don't know. What I do know is that I get back into the office to "The sky is falling!" I check our interfaces and they show Online with 0% packet loss. I SSH into the firewall, ping 8.8.8.8, and everything is happy. I ping google.ca and get "No route found". I run netstat -rn and find that there is no default gateway. I then have to run "route add default 184.71.xxx.xxx" (our static default gateway), and everything works again.

    I was on the system over VPN until roughly 12:30 AM; everything was working fine then, including outbound HTTP/S requests and remote backups. At 6:30 AM I received a report from staff that the office internet was down.

    I don't feel like it should matter what my ISP is doing, though. When the WAN is Online and using a static gateway marked as default, that route should not go missing; I should never have to add a default route manually under any circumstance. Apparently I do, though, which I find extremely frustrating. So it seems my options are to: wait for a bug fix; create a job that checks for the default route every x minutes and re-adds it if it's missing; or sell the SG-4860-1U and change platforms, since this unit has other issues too (if the power goes out, I give it a 50-75% chance of starting without an incident that requires direct console access to resolve).
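    The second option, a periodic check, could be sketched roughly like this. This is only a minimal sketch, not a vetted fix: it assumes the BSD-style netstat/route syntax pfSense provides, and the gateway passed to ensure_default_route is a placeholder for the real static gateway (redacted above as 184.71.xxx.xxx).

```shell
#!/bin/sh
# Sketch of a cron-able watchdog that re-adds the default route if it
# has vanished. Assumes BSD-style netstat(1) and route(8) as on pfSense.

# Exit 0 if the routing-table text on stdin contains a "default" entry,
# i.e. the same check done manually with "netstat -rn".
have_default_route() {
    awk '$1 == "default" { found = 1 } END { exit !found }'
}

# Check the live routing table and restore the route when the check fails.
# $1 is the static default gateway (placeholder -- substitute your own).
ensure_default_route() {
    gw="$1"
    if netstat -rn | have_default_route; then
        :  # default route present, nothing to do
    else
        logger "route-watchdog: default route missing, re-adding via ${gw}"
        route add default "${gw}"
    fi
}

# Hypothetical usage from cron, every 5 minutes:
#   */5 * * * * /root/route-watchdog.sh
# with the script ending in:
#   ensure_default_route 184.71.xxx.xxx
```

    This only papers over the symptom, of course; the route disappearing at all is still the underlying bug.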

    In /var/log/system.log I see the following (somewhere in all of that, something deletes the default route and does not re-add it):
    Jan  4 00:05:34 pf check_reload_status: Syncing firewall
    Jan  4 01:00:00 pf php: [pfBlockerNG] Starting cron process.
    Jan  4 01:00:01 pf php: [pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
    Jan  4 02:00:00 pf php: [pfBlockerNG] Starting cron process.
    Jan  4 02:00:01 pf php: [pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
    Jan  4 03:00:00 pf php: [pfBlockerNG] Starting cron process.
    Jan  4 03:00:01 pf php: [pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
    Jan  4 04:00:00 pf php: [pfBlockerNG] Starting cron process.
    Jan  4 04:00:01 pf php: [pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
    Jan  4 04:54:39 pf rc.gateway_alarm[46502]: >>> Gateway alarm: WANGW (Addr:184.xxx.xxx.xxx Alarm:1 RTT:12160ms RTTsd:2512ms Loss:21%)
    Jan  4 04:54:39 pf check_reload_status: updating dyndns WANGW
    Jan  4 04:54:39 pf check_reload_status: Restarting ipsec tunnels
    Jan  4 04:54:39 pf check_reload_status: Restarting OpenVPN tunnels/interfaces
    Jan  4 04:54:39 pf check_reload_status: Reloading filter
    Jan  4 04:54:39 pf rc.gateway_alarm[47393]: >>> Gateway alarm: OPT1_DHCP (Addr:yyy.yyy.yyy.yyy Alarm:1 RTT:11775ms RTTsd:3302ms Loss:21%)
    Jan  4 04:54:39 pf check_reload_status: updating dyndns OPT1_DHCP
    Jan  4 04:54:39 pf check_reload_status: Restarting ipsec tunnels
    Jan  4 04:54:39 pf check_reload_status: Restarting OpenVPN tunnels/interfaces
    Jan  4 04:54:39 pf check_reload_status: Reloading filter
    Jan  4 04:54:41 pf php-fpm[47853]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WANGW.
    Jan  4 04:54:42 pf php-fpm[49049]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use OPT1_DHCP.
    Jan  4 04:54:55 pf php-fpm: /rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing.

    /var/log/gateways.log:
    Jan  4 04:54:39 pf dpinger: WANGW [wangw ip]: Alarm latency 12160us stddev 2512us loss 21%
    Jan  4 04:54:39 pf dpinger: OPT1_DHCP [dhcpgw ip]: Alarm latency 11775us stddev 3302us loss 21%
    Jan  4 05:13:45 pf dpinger: WANGW [wangw ip]: Clear latency 12714us stddev 5506us loss 5%
    Jan  4 05:13:46 pf dpinger: OPT1_DHCP [dhcpgw ip]: Clear latency 12041us stddev 3179us loss 5%
    Jan  4 05:15:00 pf dpinger: WANGW [wangw ip]: Alarm latency 12805us stddev 4257us loss 21%
    Jan  4 05:16:14 pf dpinger: WANGW [wangw ip]: Clear latency 12946us stddev 12136us loss 15%
    Jan  4 05:18:42 pf dpinger: WANGW [wangw ip]: Alarm latency 12893us stddev 4771us loss 21%
    Jan  4 05:19:26 pf dpinger: WANGW [wangw ip]: Clear latency 12310us stddev 4321us loss 17%
    Jan  4 05:21:51 pf dpinger: WANGW [wangw ip]: Alarm latency 11694us stddev 3297us loss 22%
    Jan  4 05:22:40 pf dpinger: WANGW [wangw ip]: Clear latency 12053us stddev 3486us loss 16%
    Jan  4 05:31:30 pf dpinger: WANGW [wangw ip]: Alarm latency 12005us stddev 3687us loss 21%
    Jan  4 05:32:00 pf dpinger: OPT1_DHCP [dhcpgw ip]: Alarm latency 12725us stddev 7081us loss 22%
    Jan  4 05:32:36 pf dpinger: WANGW [wangw ip]: Clear latency 15318us stddev 13884us loss 17%
    Jan  4 05:33:20 pf dpinger: OPT1_DHCP [dhcpgw ip]: Clear latency 12447us stddev 5666us loss 10%
    Jan  4 05:41:02 pf dpinger: WANGW [wangw ip]: Alarm latency 11368us stddev 2965us loss 21%
    Jan  4 05:41:47 pf dpinger: WANGW [wangw ip]: Clear latency 13330us stddev 18709us loss 17%
    Jan  4 05:50:18 pf dpinger: OPT1_DHCP [dhcpgw ip]: Alarm latency 11529us stddev 3735us loss 22%

    The DHCP log has nothing in it.

    Any advice on how I can resolve or prevent this is certainly appreciated. I haven't found anyone else recently complaining about the default gateway being deleted, and it appears a similar bug was fixed in the past. So perhaps my config is bungled, maybe I'm missing something obvious, or perhaps my expectations are too high. I don't feel like any of those is the case, but there's got to be potential there somewhere.
