Netgate Discussion Forum

    Wan interface not coming back up after failover

    Routing and Multi WAN
    • V
      Valhalla1
      last edited by

      I have a dual wan loadbalance setup, WAN is cable, gets its ip from DHCP, cable modem bridged.

      opt1 is dsl, modem/gateway is in router mode and pfsense uses static ip

      Load balancing works fine. Failover works too: the cable on WAN seems to be flaky and dies once a day, WAN shows Offline in the webGUI, and traffic continues to flow through DSL.

      Unfortunately it doesn't bring the WAN interface back up when the cable comes back online. Hours later, I'll notice WAN is offline in the webGUI and examine the cable modem, which shows all lights lit 'online' and the link up. I have to power cycle the modem before pfSense detects the link.

      The monitor IP the load balancer is using is the primary DNS server for the cable ISP. Should I change this, or is it unlikely to affect my problem? Why doesn't it detect correctly when WAN is back online? I'm not even sure the cable ever dies in the first place, because I haven't yet caught it at the moment it goes offline to examine the modem lights. I only notice later, and by then the modem has recovered the link.

      • P
        Perry
        last edited by

        Before I started using pfSense I had some connection problems too, and I also had to power cycle the modem. So I don't think it's related to pfSense.

        If it happens that often, you could confirm it by testing with another firewall/router.

        /Perry
        doc.pfsense.org

        • V
          Valhalla1
          last edited by

          True, it could be unrelated to pfSense, but I don't remember having to reboot the modem daily like this before.
          What also kind of sucks is that when I do reboot the cable modem, as the interface starts to link up with pfSense, all internet traffic dies for about 20 or 30 seconds, even though it was flowing smoothly through the DSL/OPT1 load balancer before. Once the cable modem comes up fully, the load balancer marks it as up and internet traffic begins to flow again, now going over both WAN links.
          I wish it wouldn't disrupt the traffic already failed over to opt1 while the modem is rebooting

          • D
            drees
            last edited by

            @Valhalla1:

            I have a dual wan loadbalance setup, WAN is cable, gets its ip from DHCP, cable modem bridged.

            When the connection comes back up, do you have a new IP or an old IP?

            I have seen cases where sometimes it seems like a DHCP request is necessary to get the connection working again even if you don't get a new IP.

            @Valhalla1:

            I wish it wouldn't disrupt the traffic already failed over to opt1 while the modem is rebooting

            You just need to set the Load Balancing -> Use sticky connections option under System -> Advanced

            • V
              Valhalla1
              last edited by

              I considered using sticky connections, but heard there were issues (the current thread/poll regarding it).

              When the cable comes back up I generally get the same IP address. It did change once, to a completely different IP.

              Does pfSense not continuously issue DHCP requests when it detects the WAN link is down? I used to use m0n0wall, and even if it went down while I was sleeping, everything worked in the morning once it came back up, no rebooting of hardware needed. Should I maybe put a m0n0wall in between the cable modem and pfSense, and change the WAN on pfSense to a static IP?

              • D
                drees
                last edited by

                That's just the thing - it seems that perhaps pfSense thought that the link was up so it didn't issue new DHCP requests…

                • P
                  Perry
                  last edited by

                  @Valhalla1:

                  Does pfSense not continuously issue DHCP requests when it detects the WAN link is down? I used to use m0n0wall, and even if it went down while I was sleeping, everything worked in the morning once it came back up, no rebooting of hardware needed. Should I maybe put a m0n0wall in between the cable modem and pfSense, and change the WAN on pfSense to a static IP?

                  In another topic the problem was that pfSense got a local IP from DHCP when the cable modem was down, so I can't see it as solely a DHCP request problem. But it has been recommended before to have something between a cable modem and pfSense.

                  The way my cable connection works:
                  I've got a Motorola SB5101 cable modem, and before the ISP provides a dynamic IP address they want to know the MAC address of the first NIC after the modem.
                  First I get an IP in 10.52.x.x; then when I browse to a web page I get redirected to a page where a user ID and password are entered. After a 120 sec wait I get an IP. When I traceroute, the first hop I see is 10.52.x.x, but if I want to connect to the cable modem's web GUI the address is 192.168.100.1.
                  In that web GUI a DHCP server is enabled and set to hand out up to 32 IP addresses. I only receive a 192.168.100.x address if the connection between modem and ISP is down and I force a release/renew from pfSense.

                  As my ISP does not force a change of IP, I've disabled the DHCP server on the modem so I'm sure I don't get a local IP that will break the failover on my pfSense.
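
The failure mode described above can be spotted from a shell: if the WAN address falls in 192.168.100.x, the lease came from the modem's built-in DHCP server rather than from the ISP. A minimal sketch, assuming that range (taken from the post above); the function name `is_modem_lease` is made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) when an address looks like a
# lease from the cable modem's own DHCP server rather than from the ISP.
# Assumes the modem hands out 192.168.100.x, as described in the post above.
is_modem_lease() {
    case "$1" in
        192.168.100.*) return 0 ;;
        *)             return 1 ;;
    esac
}
```

It could be used as `is_modem_lease "$(ifconfig re0 | awk '/inet /{print $2; exit}')" && echo "WAN has a modem-local address"`, where `re0` is only a placeholder for the WAN interface name.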

                  /Perry
                  doc.pfsense.org

                  • V
                    Valhalla1
                    last edited by

                    Thanks. I will see if I can access the modem and disable DHCP; that might be what's happening (getting some 192.168.100.x IP from the modem instead of the normal ISP DHCP).
                    If that's not the case I'll stick something between them, like a Soekris/m0n0wall with no packet filtering, just a simple NAT. Then pfSense will handle firewall/load balance/failover/DNS.

                    • V
                      Valhalla1
                      last edited by

                      I was getting sick of walking down to the basement office to reset the cable modem, so this time I tried something else. I just went to the pfSense webGUI, clicked the WAN interface, clicked "Save", and 10 seconds later the WAN came back online for the load balancer.

                      So no modem reboot is required, just a refresh of the pfSense interface.

                      How can I automate this so that when pfSense detects WAN as down, it will "ifup re0" (I'm guessing)?

                      • P
                        Perry
                        last edited by

                        Does it bring back your cable connection if you enter " dhclient re0 " in console?

                        /Perry
                        doc.pfsense.org

                        • V
                          Valhalla1
                          last edited by

                          @Perry:

                          Does it bring back your cable connection if you enter " dhclient re0 " in console?

                          Yep, this instantly brought the WAN online to the load balancer.

                          This time, while WAN was marked down on the load balancer and before I did anything, I looked at the states and the active sessions on the WAN with 'iftop'. Even though the WAN was marked DOWN on the load balancer, there were still active connections using the WAN link. However, no "new" traffic was being sent over it. I also had the correct IP, and the interface was "active" when I ran "ifconfig".
                          After I issued 'dhclient re0' from ssh, new traffic instantly started flowing over WAN as it was marked up by the load balancer. The IP didn't change or anything.

                          So it looks like it's marking WAN down when it's really not down, or perhaps I need to automate issuing "dhclient re0" when it detects WAN DOWN.

                          any advice how to proceed?  thanks for the help so far
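
One way to run "dhclient re0" only when WAN looks down is a small check-then-renew function. This is a sketch under assumptions, not a tested pfSense fix: the variable and function names are made up, `192.0.2.1` is a placeholder to be replaced with the real monitor IP, and it assumes pings to the monitor IP actually leave through the WAN link.

```sh
#!/bin/sh
# Sketch: renew the DHCP lease only when the monitor IP stops answering.
# Assumptions (not from the thread): the names below, the placeholder
# monitor IP, and that the monitor IP is reached through the WAN link.
WAN_IF="${WAN_IF:-re0}"                  # WAN interface named in the thread
MONITOR_IP="${MONITOR_IP:-192.0.2.1}"    # placeholder, set to your monitor IP
CHECK_CMD="${CHECK_CMD:-ping -c 3 $MONITOR_IP}"
RENEW_CMD="${RENEW_CMD:-dhclient $WAN_IF}"

renew_if_down() {
    # CHECK_CMD and RENEW_CMD are left unquoted on purpose so that each
    # splits into a command and its arguments.
    if $CHECK_CMD >/dev/null 2>&1; then
        echo "WAN reachable, nothing to do"
    else
        echo "monitor IP unreachable, renewing lease on $WAN_IF"
        $RENEW_CMD
    fi
}
```

Calling `renew_if_down` at the end of the script and running it from cron every few minutes would be one way to use it. Whether the load balancer's own DOWN mark clears in every case after such a renewal is not something the thread confirms.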

                          • P
                            Perry
                            last edited by

                            Good. I've made a bug report http://cvstrac.pfsense.com/tktview?tn=1729
                            For now you could try using afterfilterchangeshellcmd for it.
                            http://forum.pfsense.org/index.php/topic,7808.msg46725.html#msg46725
                            change code to something like this.
                            #!/bin/sh
                            sleep 60
                            dhclient re0

                            /Perry
                            doc.pfsense.org

                            • C
                              cmb
                              last edited by

                              @Perry:

                              But it has been recommended before to have something between a cable modem and pfSense.

                              The only time that's recommended is if you have two Internet connections using the same gateway IP, then it's the only way you can use multi-WAN. If that's not the case, as it isn't here, I wouldn't recommend doing that.

                              Valhalla1: when the cable modem goes down, what do you see on that interface in Status-> Interfaces?  And what does your system log show?

                              Power cycling your cable modem just creates a link up event on pfSense, which runs dhclient on the interface, which appears to be the resolution. It shouldn't ever need to be manually run though, which is why we need to know what the interface and logs are showing. Without that info there's no telling what is actually happening, and no way we can fix it.
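
To capture what the interface shows at the moment WAN is marked down, the link state can be pulled out of `ifconfig` output and logged with a timestamp. A small sketch; the helper name `link_status` is made up, and it only assumes the `status:` line that `ifconfig` prints for Ethernet interfaces.

```sh
#!/bin/sh
# Hypothetical helper: read `ifconfig <if>` output on stdin and print the
# value of its "status:" line (for example "active" or "no carrier").
link_status() {
    awk '$1 == "status:" { sub(/^[ \t]*status:[ \t]*/, ""); print; exit }'
}
```

For example, `echo "$(date) re0: $(ifconfig re0 | link_status)" >> /tmp/wan-status.log` run periodically would leave a record to compare against the system log. The log path here is only an example.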

                              • V
                                Valhalla1
                                last edited by

                                I implemented the afterfilterchangeshellcmd tag to run a script:
                                #!/bin/sh
                                sleep 60
                                dhclient re0

                                This now seems to be keeping the WAN connection online to the load balancer, however it seems to be running literally every 2 minutes… the system logs are now filling up with DHCP requests over the cable connection.

                                I'm out of town at the moment, unfortunately, so I can't troubleshoot this as easily. My users are not reporting internet problems, even with the constant DHCP requests. However, my VPN connection to pfSense does die briefly every couple of minutes.
                                Maybe this will work until I get back onsite in a couple of weeks. Then I'll remove that afterfilterchangeshellcmd script and allow the connection to show "DOWN" so I can let you know what the system logs and interface page look like.

                                I kind of don't want to let the WAN connection stay down right now, as then I won't be able to VPN in unless I call up a user to go reboot the modem for me.

                                • V
                                  Valhalla1
                                  last edited by

                                  May 22 15:11:45 	slbd[5169]: ICMP poll succeeded for 65.41.120.51, marking service UP
                                  May 22 15:11:45 	slbd[5169]: ICMP poll succeeded for 68.105.28.11, marking service UP
                                  May 22 15:11:45 	slbd[5169]: ICMP poll succeeded for 68.105.28.11, marking service UP
                                  May 22 15:11:45 	slbd[5169]: ICMP poll succeeded for 65.41.120.51, marking service UP
                                  May 22 15:11:45 	check_reload_status: reloading filter
                                  May 22 15:11:45 	slbd[5169]: ICMP poll succeeded for 65.41.120.51, marking service UP
                                  May 22 15:11:45 	slbd[5169]: ICMP poll succeeded for 68.105.28.11, marking service UP
                                  May 22 15:11:45 	php: : Configuring slbd
                                  May 22 15:11:42 	dnsmasq[1582]: using nameserver 68.105.28.11#53
                                  May 22 15:11:42 	dnsmasq[1582]: using nameserver 68.105.29.11#53
                                  May 22 15:11:42 	dnsmasq[1582]: using nameserver 68.105.28.12#53
                                  May 22 15:11:42 	dnsmasq[1582]: reading /etc/resolv.conf
                                  May 22 15:11:42 	php: : Creating rrd update script
                                  May 22 15:11:41 	php: : Informational: DHClient spawned /etc/rc.newwanip and the new ip is wan - 68.224.153.16.
                                  May 22 15:11:36 	php: : rc.newwanip working with (IP address: 68.224.153.16) (interface: wan) (interface real: re0).
                                  May 22 15:11:36 	php: : Informational: rc.newwanip is starting re0.
                                  May 22 15:11:35 	check_reload_status: rc.newwanip starting
                                  May 22 15:11:30 	php: : phpDynDNS: No Change In My IP Address and/or 25 Days Has Not Past. Not Updating Dynamic DNS Entry.
                                  May 22 15:11:30 	php: : DynDns: Cached IP: 68.224.153.16
                                  May 22 15:11:30 	php: : DynDns: Current WAN IP: 68.224.153.16
                                  May 22 15:11:30 	php: : DynDns: _detectChange() starting.
                                  May 22 15:11:30 	php: : DynDns: updatedns() starting
                                  May 22 15:11:30 	php: : DynDns: Running updatedns()
                                  May 22 15:11:28 	check_reload_status: updating dyndns
                                  May 22 15:11:27 	last message repeated 2 times
                                  May 22 15:11:28 	dhclient[4779]: bound to 68.224.153.16 -- renewal in 43200 seconds.
                                  May 22 15:11:27 	dhclient[4779]: DHCPACK from 68.224.153.1
                                  May 22 15:11:27 	kernel: arpresolve: can't allocate route for 68.224.153.1
                                  May 22 15:11:27 	slbd[3737]: Service LoadBalance changed status, reloading filter policy
                                  May 22 15:11:27 	slbd[3737]: Service WAN2FailsToWAN1 changed status, reloading filter policy
                                  May 22 15:11:27 	slbd[3737]: ICMP poll failed for 68.105.28.11, marking service DOWN
                                  May 22 15:11:27 	slbd[3737]: ICMP poll failed for 68.105.28.11, marking service DOWN
                                  May 22 15:11:27 	slbd[3737]: Service WAN1FailsToWAN2 changed status, reloading filter policy
                                  May 22 15:11:27 	slbd[3737]: ICMP poll failed for 68.105.28.11, marking service DOWN
                                  May 22 15:11:27 	kernel: arpresolve: can't allocate route for 68.224.153.1
                                  May 22 15:11:26 	dhclient[3377]: exiting.
                                  May 22 15:11:26 	dhclient[3377]: exiting.
                                  May 22 15:11:26 	dhclient[3377]: connection closed
                                  May 22 15:11:26 	dhclient[3377]: connection closed
                                  May 22 15:11:26 	dhclient[4779]: DHCPREQUEST on re0 to 255.255.255.255 port 67
                                  May 22 15:11:26 	kernel: arpresolve: can't allocate route for 68.224.153.1
                                  May 22 15:10:21 	slbd[3737]: ICMP poll succeeded for 65.41.120.51, marking service UP
                                  May 22 15:10:21 	slbd[3737]: ICMP poll succeeded for 68.105.28.11, marking service UP
                                  May 22 15:10:21 	slbd[3737]: ICMP poll succeeded for 68.105.28.11, marking service UP
                                  May 22 15:10:21 	slbd[3737]: ICMP poll succeeded for 65.41.120.51, marking service UP
                                  May 22 15:10:21 	slbd[3737]: ICMP poll succeeded for 65.41.120.51, marking service UP
                                  May 22 15:10:21 	check_reload_status: reloading filter
                                  May 22 15:10:21 	slbd[3737]: ICMP poll succeeded for 68.105.28.11, marking service UP
                                  May 22 15:10:20 	php: : Configuring slbd
                                  May 22 15:10:18 	dnsmasq[1582]: using nameserver 68.105.28.11#53
                                  

                                  There's 1 minute's worth of the logs, but this basically repeats itself every minute or two. At least the connection is staying online to the load balancer and my users aren't reporting any problems.

                                  • V
                                    Valhalla1
                                    last edited by

                                    So it seems that the script I put in afterfilterchangeshellcmd is running constantly. It's set to sleep 60 and then issue dhclient re0, and seemingly every minute it does exactly that. It runs 24/7: it waits 60 seconds (sleep 60), then DHCP refreshes on WAN, which among other things nukes my VPN connection going over WAN.

                                    I was hoping it would only run this when it detects WAN is down.

                                    I should take this out and let things happen as they happen, so we can troubleshoot the actual problem instead of this workaround hack. But first I need to get OpenVPN working on the OPT1 WAN connection so I can still access the box when WAN goes down.
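
The log above suggests why it loops: dhclient spawns rc.newwanip, the filter is reloaded, and the reload presumably fires the hook again. One way to break such a loop is a simple rate limit with a timestamp file. A sketch only; `STAMP_FILE`, `MIN_INTERVAL` and `should_run` are made-up names and the 600 second default is arbitrary.

```sh
#!/bin/sh
# Sketch of a rate limit for a hook script: allow the action at most once
# per MIN_INTERVAL seconds, tracked with a timestamp file.
# STAMP_FILE and MIN_INTERVAL are made-up names and values.
STAMP_FILE="${STAMP_FILE:-/tmp/wan_renew.stamp}"
MIN_INTERVAL="${MIN_INTERVAL:-600}"

should_run() {
    now=$(date +%s)
    last=0
    if [ -f "$STAMP_FILE" ]; then
        last=$(cat "$STAMP_FILE")
    fi
    if [ $((now - last)) -ge "$MIN_INTERVAL" ]; then
        # Record this run so later calls inside the interval are refused.
        echo "$now" > "$STAMP_FILE"
        return 0
    fi
    return 1
}
```

The hook would then call `should_run && dhclient re0`. Note that this only slows the loop down; it does not check whether WAN is actually down.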

                                    • P
                                      Perry
                                      last edited by

                                      Yeah, dhclient must trick afterfilter…. My initial idea was a cron job, but it felt wrong to build that loop  :-[

                                      BTW Did you ever try replacing the nic?

                                      /Perry
                                      doc.pfsense.org

                                      • I
                                        ilko
                                        last edited by

                                        If it helps: on our machine, when we had 3 identical NICs, it was going crazy in a similar way when 2 WANs were up. Every minute or so the WANs were going up and down. With only one WAN up, things were fine. Plug the second WAN cable back in and it goes crazy again.
                                        Swapping either of the NICs for another brand resolved the problem. Putting the removed NIC back in place of another identical one caused no issues, so it wasn't a faulty NIC.

                                        • V
                                          Valhalla1
                                          last edited by

                                          @Perry:

                                          Yeah, dhclient must trick afterfilter…. My initial idea was a cron job, but it felt wrong to build that loop  :-[

                                          BTW Did you ever try replacing the nic?

                                          I can't replace the NIC on this particular box, as it's custom hardware: a WatchGuard Firebox X500 with 6 onboard Realtek NICs.

                                          It could very well be hardware related, but I haven't seen similar reports from people who have the same hardware I do, although I don't know if they are running dual WAN.

                                          When I get back in town I will try a few different things. Worst case, I'll load up a generic PC with different NICs and see if it does the same thing as the WatchGuard hardware.

                                          • O
                                            olaf
                                            last edited by

                                            Hi:

                                            I'm having the same issues Valhalla1 has, but I have completely different hardware:

                                            vr0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
                                                    inet 192.168.4.2 netmask 0xffffff00 broadcast 192.168.4.255
                                                    inet6 fe80::206:25ff:fe07:a43f%vr0 prefixlen 64 scopeid 0x1
                                                    ether 00:06:25:07:a4:3f
                                                    media: Ethernet autoselect (10baseT/UTP)
                                                    status: active
                                            vr1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
                                                    inet 192.168.5.1 netmask 0xffffff00 broadcast 192.168.5.255
                                                    inet6 fe80::20c:41ff:fee7:903a%vr1 prefixlen 64 scopeid 0x2
                                                    ether 00:0c:41:e7:90:3a
                                                    media: Ethernet autoselect (100baseTX <full-duplex>)
                                                    status: active
                                            dc0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
                                                    options=8<VLAN_MTU>
                                                    inet 10.34.89.2 netmask 0xffffff00 broadcast 10.34.89.255
                                                    inet6 fe80::20c:41ff:fe22:d97%dc0 prefixlen 64 scopeid 0x3
                                                    ether 00:0c:41:22:0d:97
                                                    media: Ethernet autoselect (100baseTX <full-duplex>)
                                                    status: active
                                            vr2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
                                                    inet6 fe80::20f:eaff:fe14:8d61%vr2 prefixlen 64 scopeid 0x4
                                                    inet 192.168.6.2 netmask 0xffffff00 broadcast 192.168.6.255
                                                    ether 00:0f:ea:14:8d:61
                                                    media: Ethernet autoselect (100baseTX <full-duplex>)
                                                    status: active
                                            

                                            That is the hardware.

                                            vr0 is the Opt1 interface.
                                            vr1 is the Lan interface.
                                            dc0 is connected to a VPN.
                                            vr2 is the WAN interface.

                                            WAN and Opt1 are in a Load Balance (actually in FailOver due to problems with sticky connections).

                                            And this is the log of last WAN down in loadbalance:

                                            May 29 17:22:29 	slbd[81452]: Service Balancer changed status, reloading filter policy
                                            May 29 17:22:29 	slbd[81452]: ICMP poll succeeded for 87.217.47.1, marking service UP
                                            May 29 17:12:32 	slbd[81452]: ICMP poll succeeded for 87.235.0.10, marking service UP
                                            May 29 17:12:32 	slbd[81452]: ICMP poll failed for 87.217.47.1, marking service DOWN
                                            May 29 17:12:32 	slbd[81452]: VIP 127.0.0.1:666 added real service 87.235.0.10:666
                                            May 29 17:12:32 	slbd[81452]: VIP 127.0.0.1:666 added real service 87.217.47.1:666
                                            May 29 17:12:32 	slbd[81452]: VIP 127.0.0.1:666 sitedown at 127.0.0.1:666
                                            May 29 17:12:32 	slbd[81452]: VIP 127.0.0.1:666 configured as "127.0.0.1"
                                            May 29 17:12:32 	slbd[81452]: Using configuration file /var/etc/slbd.conf
                                            May 29 17:12:32 	slbd[81452]: Using r_refresh of 5000 milliseconds
                                            May 29 14:31:10 	slbd[358]: Service Balancer changed status, reloading filter policy
                                            May 29 14:31:10 	slbd[358]: ICMP poll failed for 87.217.47.1, marking service DOWN
                                            

                                            As you can see, at 14:31 the WAN interface is marked as down.
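
Picking the UP/DOWN transitions out of a long system log is easier with a small filter. A sketch, assuming the slbd message format shown in the log above; the function name `slbd_transitions` is made up.

```sh
#!/bin/sh
# Hypothetical helper: read system log lines on stdin and print one line
# per slbd ICMP poll result: "<month> <day> <time> <monitor IP> <UP|DOWN>".
# Assumes the "ICMP poll ... for <IP>, marking service <STATE>" format
# shown in the logs in this thread.
slbd_transitions() {
    awk '/slbd\[[0-9]+\]: ICMP poll/ {
        ip = $9
        sub(/,$/, "", ip)    # drop the trailing comma after the IP
        print $1, $2, $3, ip, $NF
    }'
}
```

Fed the log above, it prints lines such as `May 29 14:31:10 87.217.47.1 DOWN`, in the same newest-first order as the input.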

                                            At around 17:00 I saw the interface down in the load balancer status, so I rebooted the Cisco 857 that is connected to the WAN interface. The router's ADSL link was up after the reboot, but the load balancer still did not mark the WAN interface up.

                                            Other times, I re-save the load balancer pool with a trivial text change to force a reload of the pool status. But this time even that did not bring the WAN interface up; see 17:12:32.

                                            Then I decided to take interface vr2 down (ifconfig vr2 down) and bring it up again.

                                            May 29 17:22:29 	check_reload_status: reloading filter
                                            May 29 17:22:29 	slbd[81452]: Service Balancer changed status, reloading filter policy
                                            May 29 17:22:29 	slbd[81452]: ICMP poll succeeded for 87.217.47.1, marking service UP
                                            May 29 17:22:25 	kernel: vr2: Using force reset command.
                                            May 29 17:13:31 	sshd[81903]: Accepted keyboard-interactive/pam for root from 192.168.0.8 port 44027 ssh2
                                            May 29 17:12:32 	slbd[81452]: ICMP poll succeeded for 87.235.0.10, marking service UP
                                            May 29 17:12:32 	slbd[81452]: ICMP poll failed for 87.217.47.1, marking service DOWN
                                            May 29 17:12:31 	check_reload_status: reloading filter
                                            

                                            Do you think these problems may be caused by the interfaces using the same driver?

                                            I could try changing the hardware if you are reasonably sure that is the root cause.

                                            Any advice is welcome…

                                            Best regards,

                                            Olaf

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.