Multi-WAN: both lines down after power test, lines do not reconnect



  • Hi,

    I am using 2.0-RC3 (amd64) built on Sun Jul 31 04:45:33 EDT 2011.
    I have two DSL lines, each with its own router doing NAT, and pfSense doing Multi-WAN:

    DSL1 -- router1 (NAT) ---\
                              +--- pfSense
    DSL2 -- router2 (NAT) ---/

    In the morning we had a power test, and all switches as well as router1 and router2 rebooted. Only the servers are connected to a UPS and didn't reboot. That's OK.

    But after this, the DSL2 interface on my pfSense did not come up again and had no IP address. The system log showed only:

    Aug 2 08:29:27 	kernel: arprequest: cannot find matching address
    Aug 2 08:29:25 	kernel: arprequest: cannot find matching address
    Aug 2 08:29:24 	kernel: arprequest: cannot find matching address
    Aug 2 08:29:23 	kernel: arprequest: cannot find matching address
    

    I can't tell you what happened before that, because I wasn't using a syslog server and pfSense only keeps 2000 log lines.
    I don't know why it didn't fail over to DSL1, because that line was up (according to pfSense), but no traffic passed.

    After I disabled the DSL2 interface and re-enabled it, both interfaces are working OK again.

    I know this isn't very exact or clear information, but please tell me what you need to know to fix this problem.



  • Post how you have configured the system (screenshots, system logs, etc.)
    and where you think you have configured the failover.



  • Hi,
    I have been using load balancing since BETA4, and in general it worked without problems.
    That was also the case over the last weeks and days, but I ran into problems when the gateway on the WAN1 or WAN2 side went down or a cable was unplugged.

    Here are my screenshots. The system log doesn't show anything useful from when the WAN went down, because it is flooded with the output I posted above. I will set up an external syslog server so I can post more information if this happens again.


  • Hi,

    I found some more information. pfSense sent two e-mails to my private mailbox.

    Both contain the same message but were sent two hours apart:

    There were error(s) loading the rules: /tmp/rules.debug:124: syntax error
    pfctl: Syntax error in config file: pf rules not loaded
    The line in question reads [124]: pass  in  quick  on $LAN  $GWWAN2  from any to /8 keep state  label "USER_RULE: Zugriff auf WAN2-Subnet"
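    The failing line 124 has no address in front of `/8`, so pfctl rejects it and the rules are not loaded. One plausible reading (an assumption, not confirmed in this thread) is that the destination was a WAN2 subnet value that came out empty while WAN2 had no IP address. A minimal Python sketch of a check for such empty `from`/`to` networks in a rules.debug text; `find_empty_networks` is a hypothetical helper, not part of pfSense:

```python
import re

# Hypothetical check, not part of pfSense: a "from" or "to" clause whose
# address is missing so that only the prefix length remains, e.g. "to /8".
EMPTY_NETWORK = re.compile(r"\b(?:from|to)\s+/\d{1,3}\b")


def find_empty_networks(rules_text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule) pairs for rules with an empty network address."""
    hits = []
    for lineno, line in enumerate(rules_text.splitlines(), start=1):
        if EMPTY_NETWORK.search(line):
            hits.append((lineno, line.strip()))
    return hits


# The broken rule from the e-mail, followed by the working rule 130.
SAMPLE = "\n".join([
    'pass  in  quick  on $LAN  $GWWAN2  from any to /8 keep state  label "USER_RULE: Zugriff auf WAN2-Subnet"',
    'pass  in  quick  on $LAN  $GWWAN2  from any to 192.168.2.0/24 keep state  label "USER_RULE: Zugriff auf WAN2-Subnet"',
])

for lineno, rule in find_empty_networks(SAMPLE):
    print(f"line {lineno}: {rule}")
```

    Only the first sample line is reported; a rule with a real network such as `192.168.2.0/24` passes the check.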
    
    

    When I reinstalled pfSense a few weeks ago, I misconfigured my gateways, and pfSense then created "GWWAN2" and "GWWAN1". Sometimes, when one WAN went down and I checked the gateways, "GWWAN2" appeared in the gateway list, but it was the same as "WAN2" in the screenshots above (same gateway, same IP), just with a different name.

    Here are some lines from rules.debug:

    120 # make sure the user cannot lock himself out of the webConfigurator or SSH
    121 pass in quick on igb3 proto tcp from any to (igb3) port { 80 22 } keep state label "anti-lockout rule"
    122
    123 # User-defined rules follow
    124
    125 anchor "userrules/*"
    126 pass  in  quick  on $WAN2 reply-to ( igb0 192.168.2.1 )  proto udp  from any to   172.16.0.1 port 1194  keep state  label "USER_RULE: NAT OVPN-Server-01-RBS ueber WAN2"
    127 pass   in  quick  on $WAN2 reply-to ( igb0 192.168.2.1 )  proto udp  from any to   172.16.0.1 port 1195   label "USER_RULE: NAT OVPN-Server-02-KOST ueber WAN2"
    128 block  in log  quick  on $WAN2 reply-to ( igb0 192.168.2.1 )  from  ! 192.168.2.0/24 to any  label "USER_RULE: Nur zum Loggen"
    129 pass  in  quick  on $LAN  $GWWAN1  from any to 192.168.1.105/24 keep state  label "USER_RULE: Zugriff auf WAN1-Subnet"
    130 pass  in  quick  on $LAN  $GWWAN2  from any to 192.168.2.0/24 keep state  label "USER_RULE: Zugriff auf WAN2-Subnet"
    

    Perhaps this was causing the problem?
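    A side note on rule 129: its destination `192.168.1.105/24` has host bits set, whereas rule 130 uses the clean network `192.168.2.0/24`, so the WAN1 subnet was presumably meant as `192.168.1.0/24`. This is probably unrelated to the failover problem. A minimal Python sketch using the standard `ipaddress` module to normalize such a destination; `normalize_network` is a hypothetical helper:

```python
import ipaddress


def normalize_network(cidr: str) -> tuple[str, bool]:
    """Return (network/prefix, host_bits_were_set) for a CIDR string."""
    network = ipaddress.ip_network(cidr, strict=False)  # masks off host bits
    try:
        ipaddress.ip_network(cidr)  # strict=True by default: raises ValueError if host bits are set
        host_bits_set = False
    except ValueError:
        host_bits_set = True
    return str(network), host_bits_set


# Destinations of rules 129 and 130 from rules.debug
for destination in ("192.168.1.105/24", "192.168.2.0/24"):
    print(destination, "->", normalize_network(destination))
```

    For rule 129 this reports `('192.168.1.0/24', True)`; for rule 130 it reports `('192.168.2.0/24', False)`.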

    This is from rules.debug.old:

    120 # make sure the user cannot lock himself out of the webConfigurator or SSH
    121 pass in quick on igb3 proto tcp from any to (igb3) port { 80 22 } keep state label "anti-lockout rule"
    122
    123 # User-defined rules follow
    124
    125 anchor "userrules/*"
    126 pass  in  quick  on $WAN2 reply-to ( igb0 192.168.2.1 )  proto udp  from any to   172.16.0.1 port 1194  keep state  label "USER_RULE: NAT OVPN-Server-01-RBS ueber WAN2"
    127 pass   in  quick  on $WAN2 reply-to ( igb0 192.168.2.1 )  proto udp  from any to   172.16.0.1 port 1195   label "USER_RULE: NAT OVPN-Server-02-KOST ueber WAN2"
    128 block  in log  quick  on $WAN2 reply-to ( igb0 192.168.2.1 )  from  ! 192.168.2.0/24 to any  label "USER_RULE: Nur zum Loggen"
    129 pass  in  quick  on $LAN  $GWWAN1  from any to 192.168.1.105/24 keep state  label "USER_RULE: Zugriff auf WAN1-Subnet"
    130 pass  in  quick  on $LAN  $GWWAN2  from any to 192.168.2.0/24 keep state  label "USER_RULE: Zugriff auf WAN2-Subnet"
    131 pass  in  quick  on $LAN  proto { tcp udp }  from any  to <vpns> keep state  label "NEGATE_ROUTE: Negate policy route for vpn(s)"
    132 pass  in  quick  on $LAN  $GWNoLoadBalance  proto { tcp udp }  from any to any port $SingleWANPorts  keep state  label "USER_RULE: Alle Ports die KEIN LoadBalancing können"
    133 pass  in  quick  on $LAN  from any  to <vpns> keep state  label "NEGATE_ROUTE: Negate policy route for vpn(s)"
    

    And this is the "diff" of my configuration:

    Configuration diff from 7/31/11 22:59:18 to 8/2/11 17:30:44
    --- /conf/backup/config-1312145958.xml 2011-08-02 08:15:52.000000000 +0200
    +++ /conf/config.xml 2011-08-02 17:30:44.000000000 +0200
    @@ -271,12 +271,12 @@
     <dhcphostname><wan>- <enable><if>igb0</if>
    
     <alias-address><alias-subnet>32</alias-subnet>
     <spoofmac>+ <enable><ipaddr>dhcp</ipaddr>
     <dhcphostname></dhcphostname>
    @@ -347,6 +347,13 @@
     <reverse><nentries>2000</nentries>
     <nologdefaultblock>+ <remoteserver>172.17.1.1</remoteserver>
    + <remoteserver2>+ <remoteserver3>+ <portalauth>+ <vpn>+ <system>+ <enable><nat><ipsecpassthru>
    @@ -797,9 +804,9 @@
    <servicestatusfilter>dhcpd,ntpd,dnsmasq</servicestatusfilter>
    
     <revision>- <time>1312145958</time>
    - 
    - <username>(system)</username>
    + <time>1312299044</time>
    + 
    + <username>admin@172.17.1.1</username></revision> 
     <openvpn><openvpn-server>
    @@ -896,6 +903,7 @@
    <gateway>dynamic</gateway>
    <name>WAN1</name>
    <weight>1</weight>
    + <interval><monitor>8.8.8.8</monitor>
     <defaultgw>
    @@ -907,6 +915,7 @@
    <gateway>192.168.2.1</gateway>
    <name>WAN2</name>
    <weight>1</weight>
    + <interval><monitor>8.8.4.4</monitor>
    

  • Netgate Administrator

    You have your DNS servers set to 8.8.8.8 and 8.8.4.4 and one on each gateway?

    Steve



  • @stephenw10:

    You have your DNS servers set to 8.8.8.8 and 8.8.4.4 and one on each gateway?

    Steve

    Yes, but I also have another DNS server for each WAN; take a look at my screenshot.

    BUT I have set the monitor IPs to 8.8.8.8 and 8.8.4.4.
    Both are Google DNS servers... could both of them have gone down?
    They were working again later, but pfSense still wasn't working as it did before.

    ---- edit ----
    Another curious thing is in the RRD graphs: why is "GW_WAN" displayed?
    I do not have such a gateway, as you can see in my first post.
    I'm not sure whether any of this is related to the problem described in my first post.
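    Steve's question can be made concrete. As far as I understand pfSense 2.0, both a gateway's monitor IP and a DNS server assigned to a gateway get a host route through that gateway, so the same address should not be the monitor for one gateway and a DNS server pinned to the other. A minimal Python sketch of that cross-check; the XML is a simplified, hypothetical excerpt modelled on the gateway entries in the diff (the `gateway_item` element and the DNS-to-gateway mapping are assumptions), not a real config.xml:

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical excerpt modelled on the gateway entries in the diff;
# a real config.xml has more fields per gateway.
CONFIG_XML = """
<gateways>
  <gateway_item>
    <name>WAN1</name><gateway>dynamic</gateway><weight>1</weight><monitor>8.8.8.8</monitor>
  </gateway_item>
  <gateway_item>
    <name>WAN2</name><gateway>192.168.2.1</gateway><weight>1</weight><monitor>8.8.4.4</monitor>
  </gateway_item>
</gateways>
"""


def monitors_by_gateway(config_xml: str) -> dict[str, str]:
    """Map gateway name -> monitor IP for every gateway that has a monitor set."""
    root = ET.fromstring(config_xml.strip())
    monitors = {}
    for item in root.iter("gateway_item"):
        name, monitor = item.findtext("name"), item.findtext("monitor")
        if name and monitor:
            monitors[name] = monitor
    return monitors


def route_conflicts(monitors: dict[str, str], dns_gateway: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (ip, monitored_via, dns_via) for IPs that would need host routes via two gateways."""
    conflicts = []
    for gateway, ip in monitors.items():
        dns_via = dns_gateway.get(ip)
        if dns_via is not None and dns_via != gateway:
            conflicts.append((ip, gateway, dns_via))
    return conflicts


# Hypothetical DNS server -> gateway assignment, for illustration only
DNS_GATEWAY = {"8.8.8.8": "WAN2", "8.8.4.4": "WAN1"}

monitors = monitors_by_gateway(CONFIG_XML)
print(monitors)
print(route_conflicts(monitors, DNS_GATEWAY))
```

    With the crossed assignment above both addresses are reported; with `{"8.8.8.8": "WAN1", "8.8.4.4": "WAN2"}` the list is empty.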




  • I tested a bit with my pfSense.

    In general, if both WAN1 and WAN2 are up, the default gateway is WAN1 (192.168.1.1), and pfSense is able to check for updates. For testing purposes I restarted the router for WAN1, and the routing table in pfSense changed: the default gateway is now my LAN address (172.16.0.254). Of course this is not correct, and because of this pfSense cannot check for updates.

    ![WAN1 down.jpg](/public/imported_attachments/1/WAN1 down.jpg)
    ![Default_GW_WAN down.jpg](/public/imported_attachments/1/Default_GW_WAN down.jpg)
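    The expected behaviour can be stated as a small rule: while WAN1 is down, the default route should move to WAN2 and never to a LAN-side gateway. A minimal Python sketch of that selection, assuming a simple "preferred gateway, else first online WAN gateway" policy; the `Gateway` fields, `pick_default_gateway` and the name `LAN_GW` are hypothetical, not pfSense's actual code:

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Gateway:
    name: str
    address: str
    is_wan: bool  # hypothetical flag: gateway sits on a WAN-type interface
    online: bool  # status as reported by gateway monitoring


def pick_default_gateway(gateways: Sequence[Gateway], preferred: str) -> Optional[Gateway]:
    """Keep the preferred gateway while it is online; otherwise fall back to the
    first online WAN gateway. Gateways on non-WAN interfaces are never chosen."""
    for gw in gateways:
        if gw.name == preferred and gw.is_wan and gw.online:
            return gw
    for gw in gateways:
        if gw.is_wan and gw.online:
            return gw
    return None


# The situation from the test: WAN1 down, WAN2 up, plus the LAN-side gateway
# that ended up as the default route (name is hypothetical).
gateways = [
    Gateway("WAN1", "192.168.1.1", is_wan=True, online=False),
    Gateway("WAN2", "192.168.2.1", is_wan=True, online=True),
    Gateway("LAN_GW", "172.16.0.254", is_wan=False, online=True),
]
chosen = pick_default_gateway(gateways, preferred="WAN1")
print(chosen.name if chosen else "no default gateway")
```

    Here WAN2 is chosen; once WAN1 is back online it becomes the default again, and a list containing only the LAN-side gateway yields no default gateway rather than a wrong one.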


  • Netgate Administrator

    Hmm,
    Why do you have LAN set as a gateway? That must cause problems.

    I have that same issue with my RRD graphs. It still maintains graphs for any gateways that have ever existed. I renamed one at one time so now it has an empty graph.

    Steve

    Edit: See this post.



  • @stephenw10:

    Hmm,
    Why do you have LAN set as a gateway? That must cause problems.

    Because I am running a second pfSense in routing mode behind this one, so I have to create a static route with a gateway.

    I have that same issue with my RRD graphs. It still maintains graphs for any gateways that have ever existed. I renamed one at one time so now it has an empty graph.

    Steve

    Thanks for the info. So that shouldn't be the cause of my problem. :-(



  • Uncheck the advanced option of switching gateways.



  • @ermal:

    Uncheck the advanced option of switching gateways.

    I did this yesterday; it kicked me off my OpenVPN, and this morning no Internet connection was possible at all.


  • Netgate Administrator



  • @stephenw10:

    Looks like a fix?
    https://github.com/bsdperimeter/pfsense/commit/e56a730636d36714b29fdec9947f4b8d0f2ff443

    Steve

    I read this. I will test a new snapshot tomorrow when I am at work and can get close to my server ;)

    PS: Why can't pfSense check for updates in Multi-WAN mode with WAN1 (default gateway) and WAN2 when WAN1 is down? The GUI becomes slower and it ends with "unable to check for updates".
    To me, Multi-WAN feels a little "buggy" during failover, but perhaps that's just my impression.

    Nevertheless thank you very much for taking time and giving advice.



  • Is it possible that pfSense itself can't use DNS failover?
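    To illustrate what DNS failover for pfSense's own lookups (such as the update check) would mean: try the configured servers in order and move on when one does not answer, for example because its route points at the failed WAN. A minimal Python sketch; `resolve_with_failover` and `fake_query` are hypothetical, and the query function only stands in for a real resolver call:

```python
from typing import Callable, Sequence, Tuple


def resolve_with_failover(
    name: str,
    servers: Sequence[str],
    query: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Ask each DNS server in order; return (server, answer) from the first that responds."""
    for server in servers:
        try:
            return server, query(server, name)
        except OSError:  # includes TimeoutError: server unreachable, try the next one
            continue
    raise LookupError(f"no DNS server answered for {name!r}")


def fake_query(server: str, name: str) -> str:
    """Hypothetical stand-in for a DNS query: 8.8.8.8 is only reachable via the failed WAN1."""
    if server == "8.8.8.8":
        raise TimeoutError(f"{server} did not answer")
    return "192.0.2.10"  # documentation address, not a real answer


print(resolve_with_failover("updates.example", ["8.8.8.8", "8.8.4.4"], fake_query))
```

    The first server times out and the answer comes from 8.8.4.4; if every server fails, a `LookupError` is raised instead of hanging on the first one.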

