No default route after reboot using Gateway Groups.



  • After a reboot I seem to have no default route. I am using gateway groups for both my IPv4 and IPv6 default gateways. For services like Squid, which depend on the default route, this is troublesome. I wind up having to manually change my default gateway to an intermediary value and then change it back to the gateway group. I've seen several others with the same problem on Redmine: Bug #9004. Does anybody have any advice?



  • So I am not the only one losing the default gateway after a reboot when it is set using the recently added "default gateway" feature.


  • LAYER 8 Rebel Alliance

    I just checked with my test VM (2.4.4-p2) and can report the same problem here.

    -Rico



  • I honestly thought I had broken something. Good to know others are experiencing it as well. Any word on if there's a fix planned?



  • @0daymaster said in No default route after reboot using Gateway Groups.:

    #9004

    On the latest 2.4.4-p2, I have this issue only on a Netgate XG-7100 appliance.
    No issues on a Netgate SG-4860 or on Hyper-V VMs.

    If the default IPv4 gateway is set to a gateway group, "(default)" is shown next to the Tier 1 gateway. After a reboot, "(default)" is removed from the Tier 1 gateway
    and there is no default route in the routing table.

    Double-checked: this happens only on the Netgate XG-7100, not on the Netgate SG-4860 or the Hyper-V VMs.



  • Hello there! I have my own hardware and virtualization solutions and the issue still happens.


  • The fix is to replace

    if ($gwgroupitem['gwip'] == $currentdefaultgwip) {
    

    with

    if (!empty($currentdefaultgwip) AND $gwgroupitem['gwip'] == $currentdefaultgwip) {
    

    in /etc/inc/gwlb.inc line 1112
    Don't ask me if this is dirty or not...but it works. 😳

    -Rico



  • @rico said in No default route after reboot using Gateway Groups.:

    if ($gwgroupitem['gwip'] == $currentdefaultgwip) {

    with
    if (!empty($currentdefaultgwip) AND $gwgroupitem['gwip'] == $currentdefaultgwip) {

    in /etc/inc/gwlb.inc line 1112

    This workaround didn't work for me. Same errors unfortunately. I don't use GW groups though.



  • @amorphous
    Hey,
    after a reboot, what does this shell command return?
    /sbin/route -n get -inet default | /usr/bin/awk '/gateway:/ {print $2}'

    And what errors are written to the log?

    Also, after the reboot, what does Status -> Interfaces show?
    Are all interfaces in the "up" state, and do they all have IP addresses?
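
    A minimal sketch of the route check asked about here, written so that a missing default route is reported explicitly instead of producing empty output. The function name parse_default_gw is made up for this example; only the route and awk invocations come from the command in this post.

    ```shell
    # Hypothetical helper: read `route -n get` output on stdin and print the
    # gateway address, or "none" when the output contains no "gateway:" line.
    parse_default_gw() {
        awk '/gateway:/ {print $2; found=1} END {if (!found) print "none"}'
    }

    # On the firewall itself (FreeBSD route syntax, as in the command above):
    #   /sbin/route -n get -inet default 2>/dev/null | parse_default_gw
    ```

    Running this right after a reboot and again after disabling and re-enabling WAN should make the difference easy to spot.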



  • @konstanti

    After reboot:
    [2.4.4-RELEASE][admin@pfsense.localdomain]/root: /sbin/route -n get -inet default | /usr/bin/awk '/gateway:/ {print $2}'
    route: route has not been found

    After disabling and re-enabling WAN:
    [2.4.4-RELEASE][admin@pfsense.localdomain]/root: /sbin/route -n get -inet default | /usr/bin/awk '/gateway:/ {print $2}'
    70.111.222.0

    In Status -> Interfaces, both WAN and LAN have the correct IP address and both are up. No change after reboot.

    Log sample and netstat output here:
    https://forum.netgate.com/topic/139570/no-internet-after-reboot-wrong-gateway
    https://redmine.pfsense.org/issues/9269



  • @amorphous
    Do I understand correctly that the IP addresses are assigned dynamically (DHCP)?
    Is there no pause during boot while the WAN interface is being configured?

    Open Diagnostics -> Edit File and load
    /var/log/system.log
    During boot, it should show roughly the following text:

    14:05:17 ru kernel: coretemp1: <CPU On-Die Thermal Sensors> on cpu1
    Dec 31 14:05:18 ru sshd[8921]: Server listening on :: port 22.
    Dec 31 14:05:18 ru sshd[8921]: Server listening on 0.0.0.0 port 22.
    Dec 31 14:05:18 ru syslogd: Logging subprocess 9187 (exec /usr/local/sbin/sshguard) exited due to signal 15.
    Dec 31 14:05:19 ru check_reload_status: Linkup starting igb0
    Dec 31 14:05:19 ru kernel:
    Dec 31 14:05:19 ru kernel: igb0: link state changed to UP
    Dec 31 14:05:21 ru check_reload_status: rc.newwanip starting igb0
    Dec 31 14:05:22 ru kernel: done.
    Dec 31 14:05:22 ru php-cgi: rc.bootup: Resyncing OpenVPN instances.
    Dec 31 14:05:22 ru kernel: done.
    Dec 31 14:05:22 ru kernel: gre0: link state changed to UP
    Dec 31 14:05:22 ru kernel: gre1: link state changed to UP
    Dec 31 14:05:22 ru kernel:
    Dec 31 14:05:22 ru kernel: tun1: changing name to 'ovpns1'
    Dec 31 14:05:22 ru php-fpm[242]: /rc.newwanip: rc.newwanip: Info: starting on igb0.
    Dec 31 14:05:22 ru php-fpm[242]: /rc.newwanip: rc.newwanip: on (IP address: XX.XXX) (interface: WAN[wan]) (real interface: igb0).

    Dec 31 14:05:22 ru kernel: gre0: link state changed to DOWN
    Dec 31 14:05:22 ru kernel: gre0: link state changed to UP
    Dec 31 14:05:22 ru kernel: gre1: link state changed to DOWN
    Dec 31 14:05:22 ru kernel: gre1: link state changed to UP
    Dec 31 14:05:22 ru kernel: ovpns1: link state changed to UP
    Dec 31 14:05:22 ru kernel: pflog0: promiscuous mode enabled
    Dec 31 14:05:22 ru php-fpm[242]: /rc.newwanip: Default gateway setting Interface WAN_DHCP Gateway as default.

    What does yours look like?
    I suspect that em0 does not receive an IP address during boot.
    In your log, the line /rc.newwanip: rc.newwanip: on (IP address: ) (interface: WAN[wan]) (real interface: em0) contains no IP address.
    Also, is anything written to these files?
    /tmp/em0_error_output
    /tmp/em0_output

    You can try the following (this is theory, I have never done it):
    https://www.freebsd.org/doc/handbook/network-dhcp.html
    Open /etc/defaults/rc.conf
    and add this line at the beginning of the file:
    ifconfig_em0="SYNCDHCP"
    Save the file and reboot.

    If this does not help, you can try adding a pause (5 seconds) to the boot script after dhclient starts (again, this is theory, I have never done it).



  • @konstanti

    Do I understand correctly that IP addresses are assigned dynamically (DHCP) ?
    *Yes

    #=== After reboot when Internet is not working ===#
    [2.4.4-RELEASE][admin@pfsense.localdomain]/root: cat /tmp/em0_defaultgw

    [2.4.4-RELEASE][admin@pfsense.localdomain]/root: cat /tmp/em0_error_output

    [2.4.4-RELEASE][admin@pfsense.localdomain]/root: cat /tmp/em0_output
    em0: no link ...... got link
    dhclient: PREINIT
    DHCPDISCOVER on em0 to 255.255.255.255 port 67 interval 1
    DHCPOFFER from 10.100.111.2
    DHCPREQUEST on em0 to 255.255.255.255 port 67
    DHCPACK from 10.100.111.2
    bound to 70.111.222.333 -- renewal in 25107 seconds.
    [2.4.4-RELEASE][admin@pfsense.localdomain]/root: cat /tmp/em0_router
    70.111.222.1

    #=== REBOOT LOGS ===#

    Jan 18 10:06:47 pfsense kernel: coretemp0: <CPU On-Die Thermal Sensors> on cpu0
    Jan 18 10:06:47 pfsense kernel: coretemp1: <CPU On-Die Thermal Sensors> on cpu1
    Jan 18 10:06:48 pfsense sshd[6471]: Server listening on :: port 22.
    Jan 18 10:06:48 pfsense sshd[6471]: Server listening on 0.0.0.0 port 22.
    Jan 18 10:06:48 pfsense syslogd: Logging subprocess 6540 (exec /usr/local/sbin/sshguard) exited due to signal 15.
    Jan 18 10:06:51 pfsense kernel:
    Jan 18 10:06:51 pfsense kernel: em0: link state changed to UP
    Jan 18 10:06:51 pfsense check_reload_status: Linkup starting em0
    Jan 18 10:06:53 pfsense check_reload_status: rc.newwanip starting em0
    Jan 18 10:06:54 pfsense php-cgi: rc.bootup: Resyncing OpenVPN instances.
    Jan 18 10:06:54 pfsense kernel: done.
    Jan 18 10:06:54 pfsense kernel: pflog0: promiscuous mode enabled
    Jan 18 10:06:54 pfsense php-cgi: rc.bootup: [squid] Installed but disabled. Not installing 'nat' rules.
    Jan 18 10:06:54 pfsense php-cgi: rc.bootup: [squid] Installed but disabled. Not installing 'pfearly' rules.
    Jan 18 10:06:54 pfsense kernel: .
    Jan 18 10:06:54 pfsense php-cgi: rc.bootup: [squid] Installed but disabled. Not installing 'filter' rules.
    Jan 18 10:06:54 pfsense kernel: ..
    Jan 18 10:06:55 pfsense php-fpm[366]: /rc.newwanip: rc.newwanip: Info: starting on em0.
    Jan 18 10:06:55 pfsense kernel: .done.
    Jan 18 10:06:55 pfsense php-fpm[366]: /rc.newwanip: rc.newwanip: on (IP address: 70.111.222.333) (interface: WAN[wan]) (real interface: em0).
    Jan 18 10:06:55 pfsense php-cgi: rc.bootup: Default gateway setting as default.
    Jan 18 10:06:55 pfsense php-cgi: rc.bootup: Gateway, none 'available' for inet6, use the first one configured. ''
    Jan 18 10:06:55 pfsense kernel: route: writing to routing socket: Network is unreachable
    Jan 18 10:06:58 pfsense check_reload_status: Linkup starting em1
    Jan 18 10:06:58 pfsense kernel:
    Jan 18 10:06:58 pfsense kernel: em1: link state changed to UP
    Jan 18 10:07:04 pfsense php-cgi: rc.bootup: sync unbound done.
    Jan 18 10:07:14 pfsense kernel: done.
    Jan 18 10:07:15 pfsense kernel: done.
    Jan 18 10:07:15 pfsense php-cgi: rc.bootup: NTPD is starting up.
    Jan 18 10:07:15 pfsense kernel: done.
    Jan 18 10:07:18 pfsense check_reload_status: Updating all dyndns
    Jan 18 10:07:18 pfsense kernel: done.



  • @amorphous said in
    route: writing to routing socket: Network is unreachable

    This error indicates that the system is trying to add a route through a non-existent gateway.
    I need to think about this.



  • @amorphous said in No default route after reboot using Gateway Groups.:

    Default gateway setting as default

    I suspect the problem here is that, at boot time, not all interfaces have received IP addresses yet.
    Can you show the group settings?



  • @konstanti From where exactly?



  • @amorphous

    I'm not sure.
    It is very difficult to understand someone else's code.
    Can you show the gateway group settings?



  • @konstanti Which Group Settings? Where in the WebUI?




  • @amorphous
    The problem is solved.
    When the script processes a gateway group, the system does not yet have default gateway information. To confirm this, check the value of the variable
    "currentdefaultgwip" in the script (file /etc/inc/gwlb.inc).
    I added one line for debugging:
    log_error("Point1 ip is: $currentdefaultgwip");
    and replaced one line to check the variable (this is the same change as in Rico's post above):
    if (($gwgroupitem['gwip'] == $currentdefaultgwip) && (!empty($currentdefaultgwip))) {

    This is what that part of the code looks like:

    0_1547850168428_f43c6e8e-98b9-491c-9840-93219c86fbc0-image.png

    Jan 19 01:01:55 check_reload_status Linkup starting em0
    Jan 19 01:01:55 kernel em0: link state changed to UP
    Jan 19 01:01:55 check_reload_status rc.newwanip starting em0
    Jan 19 01:01:56 php-fpm 342 /rc.newwanip: rc.newwanip: Info: starting on em0.
    Jan 19 01:01:56 php-fpm 342 /rc.newwanip: rc.newwanip: on (IP address: 192.168.1.139) (interface: WAN[wan]) (real interface: em0).
    Jan 19 01:01:57 check_reload_status Linkup starting em1
    Jan 19 01:01:57 kernel em1: link state changed to UP
    Jan 19 01:01:57 check_reload_status rc.newwanip starting em1
    Jan 19 01:01:57 php-cgi rc.bootup: Resyncing OpenVPN instances.
    Jan 19 01:01:57 php-cgi rc.bootup: Point1 ip is:
    Jan 19 01:01:57 kernel done.
    Jan 19 01:01:57 kernel pflog0: promiscuous mode enabled
    Jan 19 01:01:57 php-cgi rc.bootup: Gateway, switch to: WAN_DHCP
    Jan 19 01:01:57 php-cgi rc.bootup: Default gateway setting Interface WAN_DHCP Gateway as default.
    Jan 19 01:01:57 php-cgi rc.bootup: Point1 ip is: 192.168.1.1
    Jan 19 01:01:57 php-cgi rc.bootup: Point1 ip is: 192.168.1.1

    And everything works fine.
    0_1547850827801_8158fc00-4008-4ee5-9ec8-af5388f1b98b-image.png

    There is another way to solve this problem, but it is not quite correct.
    Temporary files are deleted on restart; if you disable that deletion, the problem also does not occur on reboot.
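
    To illustrate why the extra !empty() test in the replaced line matters, here is a small shell analogy. It is not the pfSense code (that is PHP in /etc/inc/gwlb.inc); the function names are made up, and it assumes that at boot both the group member's gateway IP and the current default gateway IP are still empty.

    ```shell
    # Analogy only: the real check is PHP in /etc/inc/gwlb.inc.
    # matches_unguarded mimics the original comparison; matches_guarded adds
    # the "not empty" test. Both function names are hypothetical.
    matches_unguarded() {
        [ "$1" = "$2" ]
    }

    matches_guarded() {
        [ -n "$2" ] && [ "$1" = "$2" ]
    }

    # Assumed boot-time state: neither value is known yet, so both are empty.
    gwip=""
    currentdefaultgwip=""

    if matches_unguarded "$gwip" "$currentdefaultgwip"; then
        echo "unguarded: two empty values compare as equal"
    fi
    if ! matches_guarded "$gwip" "$currentdefaultgwip"; then
        echo "guarded: empty current default is never a match"
    fi
    ```

    With the guard, a group member only counts as the current default when a default gateway IP is actually known.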



  • @konstanti I'll give it a try and report back.



  • @amorphous said in No default route after reboot using Gateway Groups.:

    I'll give it a try and report back.

    If there are problems, write to me in a PM
    and send me a log by mail.
    Be sure to add the control points (the debug log lines);
    they make it easier to localize the problem.



  • @amorphous
    And most importantly, before you make changes to the file, make a copy of it. Otherwise, if you introduce an error, the system will not boot and fixing it will take a long time.
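
    A minimal sketch of such a backup, assuming a plain copy next to the original is enough. The helper name backup_once and the .orig suffix are made up for this example.

    ```shell
    # Hypothetical helper: copy a file to <file>.orig, but only if no backup
    # exists yet, so a later run cannot overwrite the pristine copy.
    backup_once() {
        src="$1"
        if [ ! -e "${src}.orig" ]; then
            cp -p "$src" "${src}.orig"
        fi
    }

    # On pfSense this would be called as:
    #   backup_once /etc/inc/gwlb.inc
    # and the original restored with:
    #   cp -p /etc/inc/gwlb.inc.orig /etc/inc/gwlb.inc
    ```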



  • @amorphous
    If you set wan_dhcp2 to Tier 1 and wan_dhcp to Tier 2,
    the log will look like this:

    Jan 19 02:13:49 check_reload_status Linkup starting em1
    Jan 19 02:13:49 kernel em1: link state changed to UP
    Jan 19 02:13:49 check_reload_status rc.newwanip starting em1
    Jan 19 02:13:49 php-cgi rc.bootup: Resyncing OpenVPN instances.
    Jan 19 02:13:49 kernel done.
    Jan 19 02:13:49 kernel pflog0: promiscuous mode enabled
    Jan 19 02:13:49 php-cgi rc.bootup: Point1 ip is:
    Jan 19 02:13:49 php-cgi rc.bootup: Control point 2 ip is: 192.168.1.1
    Jan 19 02:13:49 php-cgi rc.bootup: Gateway, switch to: WAN_DHCP
    Jan 19 02:13:49 php-cgi rc.bootup: Default gateway setting Interface WAN_DHCP Gateway as default.

    Jan 19 02:13:49 php-cgi rc.bootup: Point1 ip is: 192.168.1.1
    Jan 19 02:13:49 php-cgi rc.bootup: Point1 ip is: 192.168.1.1

    Jan 19 02:13:50 kernel ..
    Jan 19 02:13:50 kernel .done.
    Jan 19 02:13:50 kernel done.
    Jan 19 02:13:50 php-cgi rc.bootup: Point1 ip is: 192.168.1.1
    Jan 19 02:13:50 php-cgi rc.bootup: Control point 2 ip is: 10.0.3.2
    Jan 19 02:13:50 php-cgi rc.bootup: Gateway, switch to: WAN2_DHCP
    Jan 19 02:13:50 php-cgi rc.bootup: Default gateway setting Interface WAN2_DHCP Gateway as default.

    Jan 19 02:13:50 kernel done.
    Jan 19 02:13:50 php-fpm 341 /rc.newwanip: rc.newwanip: Info: starting on em1.
    Jan 19 02:13:50 php-fpm 341 /rc.newwanip: rc.newwanip: on (IP address: 10.0.3.15) (interface: WAN2[lan]) (real interface: em1).



  • @konstanti No luck unfortunately. Still doesn't work. I need to Disable -> Enable the WAN.

    Jan 18 19:07:47 pfsense kernel:
    Jan 18 19:07:47 pfsense kernel: em0: link state changed to UP
    Jan 18 19:07:47 pfsense check_reload_status: Linkup starting em0
    Jan 18 19:07:51 pfsense check_reload_status: rc.newwanip starting em0
    Jan 18 19:07:51 pfsense php-cgi: rc.bootup: Resyncing OpenVPN instances.
    Jan 18 19:07:51 pfsense kernel: done.
    Jan 18 19:07:51 pfsense kernel: done.
    Jan 18 19:07:51 pfsense kernel: pflog0: promiscuous mode enabled
    Jan 18 19:07:51 pfsense php-cgi: rc.bootup: [squid] Installed but disabled. Not installing 'nat' rules.
    Jan 18 19:07:51 pfsense php-cgi: rc.bootup: [squid] Installed but disabled. Not installing 'pfearly' rules.
    Jan 18 19:07:51 pfsense kernel: .
    Jan 18 19:07:51 pfsense php-cgi: rc.bootup: [squid] Installed but disabled. Not installing 'filter' rules.
    Jan 18 19:07:51 pfsense kernel: ..
    Jan 18 19:07:52 pfsense php-fpm[365]: /rc.newwanip: rc.newwanip: Info: starting on em0.
    Jan 18 19:07:52 pfsense php-fpm[365]: /rc.newwanip: rc.newwanip: on (IP address: 70.111.222.333) (interface: WAN[wan]) (real interface: em0).
    Jan 18 19:07:52 pfsense php-cgi: rc.bootup: Default gateway setting as default.
    Jan 18 19:07:52 pfsense kernel: .done.
    Jan 18 19:07:52 pfsense php-cgi: rc.bootup: Gateway, none 'available' for inet6, use the first one configured. ''
    Jan 18 19:07:52 pfsense kernel: route: writing to routing socket: Network is unreachable
    Jan 18 19:07:55 pfsense check_reload_status: Linkup starting em1
    Jan 18 19:07:55 pfsense kernel:
    Jan 18 19:07:55 pfsense kernel: em1: link state changed to UP
    Jan 18 19:08:03 pfsense php-cgi: rc.bootup: sync unbound done.
    Jan 18 19:08:10 pfsense kernel: done.


    The Modified file before and after reboot: https://pastebin.com/VEiSDz5R



  • This is my configuration:

    2_1547858500219_Screen Shot 2019-01-18 at 7.36.37 PM.png 1_1547858500219_Screen Shot 2019-01-18 at 7.36.16 PM.png 0_1547858500219_Screen Shot 2019-01-18 at 7.36.02 PM.png