Netgate Discussion Forum

    Outgoing Loadbalancer not failing over gracefully? DNS related?

    2.0-RC Snapshot Feedback and Problems - RETIRED
    18 Posts 4 Posters 7.1k Views
    • G
      GoldServe

      My filter list is just "for all traffic from lan going to anywhere, use gateway LOADBALANCE"

      I'm pretty sure nothing in my setup is causing it, because I have been, and still am, running an ALIX board with 2.0-ALPHA-ALPHA from August of last year.

      The problem I'm seeing is that when an interface link goes down, cleanup on the system does not seem to be performed, for example flushing the state table entries for that interface or something similar. I'll try to spend more time debugging the issue, but from what I see the load balancing code is still not refined, and I am providing feedback to help improve it.

      • G
        GoldServe

        Now that I have upgraded to the latest build, with WAN connected but not WAN2, the monitor IP for WAN2 is not registered in the static routes, so apinger thinks that WAN2 is up as well even though it is not even connected. What gives?

        From the state table, you can see that both 4.2.2.2 (WAN monitor) and 4.2.2.3 (WAN2 monitor) are making connections through WAN:

        icmp  192.168.2.197:32855 -> 4.2.2.2  0:0
        icmp  192.168.2.197:32855 -> 4.2.2.3  0:0
        
        • E
          eri--

          Get a snapshot built later than this post and test.

          • G
            GoldServe

            Thanks, Ermal, for looking into it. I think the problem is that the monitor IPs are not explicitly added to the routing table. I guess this happens because the link was down from the start, so the route was never added, rather than being removed by accident.

            • E
              eri--

              Test the later snapshot.
              With that change, your routing table will preserve the monitor IP routes, which would otherwise get lost during a reload.

              • G
                GoldServe

                Ermal,

                I installed the latest update and rebooted the machine. Again, WAN2 was not connected, but after the reboot the Gateway status still shows WAN2 as alive and online, with no routing entry for its monitor IP in the route table.

                Cheers!

                • E
                  eri--

                  Do you mean WAN2 was physically disconnected, as in the cable was unplugged from the WAN2 interface, or that the WAN2 interface was down?

                  • G
                    GoldServe

                    I mean physically disconnected. For now I have been testing the link-down case by disconnecting the cable. I haven't tried to simulate an internet outage yet.

                    Cheers!

                    • G
                      GoldServe

                      I give up for now. With both WAN and WAN2 connected, all is working fine. I now block all traffic going to WAN from my other router, so pfSense thinks the link is down, which is correct. But now my browser will only load pages halfway, very slowly, etc.

                      I disabled DNS forwarder at the moment because I find it does not work very well when the links go down. It is not very multi-wan aware from what I see. Now I just have 4 dns server addresses in windows and I let the pfsense box do the load balancing and fail over routing.

                      I'll try to edit this post with config files, etc.

                      *Edit: I see that the WAN gateway is marked down as expected, but I still see new HTTP states being requested in Diagnostics: States.

                      • C
                        cmb

                        @GoldServe:

                        I disabled DNS forwarder at the moment because I find it does not work very well when the links go down. It is not very multi-wan aware from what I see.

                        As long as those routes are still there and correct, so at least one is reachable, it will continue to work fine. It queries every configured DNS server simultaneously and takes the fastest response, so as long as one is reachable there isn't even a delay. That all works on all the 2.0 multi-WAN setups I have in production, regardless of WAN/WAN2/WAN3 status. Get some pcaps of your DNS traffic when that happens and see what's happening on the wire.
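The "query every configured server and take the fastest response" behavior described above can be illustrated with a small Python sketch. This is not the DNS forwarder's code, only a model of the idea: `first_answer` and the three stub resolvers are hypothetical names, the sleeps stand in for network latency, and the returned address is a documentation placeholder.

```python
import concurrent.futures
import time


def first_answer(resolvers, name, timeout=2.0):
    """Ask every resolver at once and return (label, answer) for the first success.

    `resolvers` maps a label to a callable taking a hostname. Each callable
    is a hypothetical stand-in for a query to one upstream DNS server.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(resolvers))
    futures = {pool.submit(fn, name): label for label, fn in resolvers.items()}
    try:
        for done in concurrent.futures.as_completed(futures, timeout=timeout):
            if done.exception() is None:  # skip servers that failed
                return futures[done], done.result()
        raise RuntimeError("no resolver answered")
    finally:
        # Do not block on the slower servers once an answer is in hand.
        pool.shutdown(wait=False)


def slow_server(name):
    time.sleep(0.3)  # reachable, but slow
    return "192.0.2.10"


def fast_server(name):
    time.sleep(0.01)  # reachable and quick
    return "192.0.2.10"


def dead_server(name):
    # Models a server that sits behind a WAN that is down.
    raise TimeoutError("unreachable")


RESOLVERS = {"slow": slow_server, "fast": fast_server, "dead": dead_server}
```

Calling `first_answer(RESOLVERS, "example.com")` returns the answer from the `fast` stub even though one server is unreachable, which is why a single reachable server is enough to avoid a delay in this model.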

                        Also note the output of the command 'grep route-to /tmp/rules.debug' both when things are working and not working. It should update the first lines accordingly with your pool configuration and the gateways' statuses.
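One way to compare the two captures is to save the `grep` output in each state and diff the results. The sketch below assumes both copies of rules.debug have already been read into strings; the function names and the "working"/"failing" labels are made up for illustration.

```python
import difflib


def route_to_lines(rules_text):
    """Return the lines of a rules.debug dump that mention route-to."""
    return [line.strip() for line in rules_text.splitlines() if "route-to" in line]


def diff_route_to(working_text, failing_text):
    """Unified diff of the route-to lines between two saved copies."""
    return list(
        difflib.unified_diff(
            route_to_lines(working_text),
            route_to_lines(failing_text),
            fromfile="working",
            tofile="failing",
            lineterm="",
        )
    )
```

An empty list means the route-to lines were identical in both captures, that is, the ruleset did not change when the gateway status changed.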

                        • G
                          GoldServe

                          Let me try this again. This is my setup, which does load balancing and works until I simulate a link down by blocking ALL traffic to 192.168.2.197 on the other side.

                          Instead of cluttering this post with all these images, I have added a link here: http://img51.imageshack.us/gal.php?g=generalnl.png

                          Output of route-to in rules.debug:

                          
                          # grep route-to /tmp/rules.debug
                          GWwan = " route-to ( em3 192.168.2.1 ) "
                          GWopt1 = " route-to ( em4 98.210.16.1 ) "
                          GWLOADBALANCE = "  route-to { ( em3 192.168.2.1 ) ( em4 98.210.16.1 )  }  "
                          GWWAN_WAN2 = "  route-to { ( em3 192.168.2.1 )  }  "
                          GWWAN2_WAN = "  route-to { ( em4 98.210.16.1 )  }  "
                          pass out route-to ( em3 192.168.2.1 ) from 192.168.2.197 to !192.168.2.0/24 keep state allow-opts label "let out anything from firewall host itself"
                          pass out route-to ( em4 98.210.16.1 ) from 98.210.19.93 to !98.210.16.0/21 keep state allow-opts label "let out anything from firewall host itself"
                          
                          

                          I've attached my config file as well.

                          Looks like when I simulate link down, the rules.debug route-to section does not change even though the status says WAN is link down.
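To make "the route-to section does not change" easier to check, the gateway macros from the output above can be parsed into pools and queried for a given gateway. This is only a sketch: the regular expressions are written against the five `GW...` lines quoted in this post, and `parse_gateway_macros` and `pools_using` are hypothetical helper names.

```python
import re

# Matches lines such as: GWwan = " route-to ( em3 192.168.2.1 ) "
MACRO = re.compile(r'^(GW\w+)\s*=\s*"(.*)"\s*$')
# Matches one "( interface gateway )" pair inside a macro.
HOP = re.compile(r"\(\s*(\S+)\s+(\S+)\s*\)")

SAMPLE = '''
GWwan = " route-to ( em3 192.168.2.1 ) "
GWopt1 = " route-to ( em4 98.210.16.1 ) "
GWLOADBALANCE = "  route-to { ( em3 192.168.2.1 ) ( em4 98.210.16.1 )  }  "
GWWAN_WAN2 = "  route-to { ( em3 192.168.2.1 )  }  "
GWWAN2_WAN = "  route-to { ( em4 98.210.16.1 )  }  "
'''


def parse_gateway_macros(text):
    """Map each GW macro name to its list of (interface, gateway) pairs."""
    pools = {}
    for line in text.splitlines():
        m = MACRO.match(line.strip())
        if m and "route-to" in m.group(2):
            pools[m.group(1)] = HOP.findall(m.group(2))
    return pools


def pools_using(pools, gateway_ip):
    """Names of the macros that still route through the given gateway."""
    return sorted(
        name for name, hops in pools.items()
        if any(ip == gateway_ip for _iface, ip in hops)
    )
```

With the output above, `pools_using(parse_gateway_macros(SAMPLE), "192.168.2.1")` lists `GWLOADBALANCE`, `GWWAN_WAN2` and `GWwan`. Running the same check on a capture taken while WAN is marked down shows whether the WAN gateway was actually dropped from the load-balancing pool.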

                          config-lanner_pfsense.home-20100613215907.txt

                          • E
                            eri--

                            Try the latest snapshot; it should behave correctly now.

                            • C
                              cmb

                              If it's really marked as down, and stays that way, but the route-to doesn't update, then what Ermal committed today won't change that. Check the system log when that happens and post what it shows.

                              • G
                                GoldServe

                                Well, I am waiting for Ermal's changes to make it into the snapshot.

                                How does the build timestamp correspond with the changes?

                                Ex: Built On: Mon Jun 14 14:14:25 EDT 2010

                                And commit:

                                Date: Mon Jun 14 15:26:32 EDT 2010

                                Committer: Ermal (eriAT@NOSPAM@pfsenseDOTorg)
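The two timestamps above can be compared mechanically. The sketch below rests on an assumption that the thread does not confirm: that a snapshot can only contain commits made before its "Built On" time. The function names are hypothetical, and the zone token (EDT) is only checked for equality, not converted.

```python
from datetime import datetime

FMT = "%a %b %d %H:%M:%S %Y"


def parse_stamp(stamp):
    """Parse 'Mon Jun 14 14:14:25 EDT 2010' into (datetime, zone token)."""
    parts = stamp.split()
    zone = parts.pop(4)  # the zone name sits between the time and the year
    return datetime.strptime(" ".join(parts), FMT), zone


def snapshot_may_include(built_on, commit_date):
    """True if the commit predates the build (assumed inclusion rule)."""
    built, built_zone = parse_stamp(built_on)
    commit, commit_zone = parse_stamp(commit_date)
    if built_zone != commit_zone:
        raise ValueError("timestamps use different time zones")
    return commit <= built
```

For the values quoted here, `snapshot_may_include("Mon Jun 14 14:14:25 EDT 2010", "Mon Jun 14 15:26:32 EDT 2010")` is `False`: the commit is about 72 minutes newer than the build, so under that assumption this snapshot would not contain it.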

                                Anyways, when a link goes down, this is what I see:

                                
                                Jun 14 19:08:20 	php: : All gateways are unavailable, proceeding with configured XML settings!
                                Jun 14 19:08:20 	check_reload_status: reloading filter
                                Jun 14 19:08:07 	apinger: ALARM: WAN(4.2.2.2) *** down ***
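The log above is listed newest first. A short sketch like the following puts the entries in chronological order and measures how long the filter reload followed the apinger alarm. The regular expressions are written against these three lines only, the year is supplied by hand because syslog timestamps omit it, and the function names are made up for illustration.

```python
import re
from datetime import datetime

LINE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+([\w.-]+):\s*(.*)$")
ALARM = re.compile(r"ALARM: (\S+)\((\S+)\) \*\*\* (\w+) \*\*\*")

LOG = """\
Jun 14 19:08:20 \tphp: : All gateways are unavailable, proceeding with configured XML settings!
Jun 14 19:08:20 \tcheck_reload_status: reloading filter
Jun 14 19:08:07 \tapinger: ALARM: WAN(4.2.2.2) *** down ***
"""


def parse_log(text, year=2010):
    """Return (time, process, message) tuples sorted oldest first."""
    events = []
    for raw in text.splitlines():
        m = LINE.match(raw.strip())
        if m:
            when = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            events.append((when, m.group(2), m.group(3)))
    return sorted(events, key=lambda event: event[0])


def alarm_to_reload_seconds(events):
    """Seconds between the first apinger ALARM and the next filter reload."""
    alarm_time = None
    for when, process, message in events:
        if process == "apinger" and ALARM.search(message):
            alarm_time = when
        elif alarm_time is not None and message == "reloading filter":
            return (when - alarm_time).total_seconds()
    return None
```

For these lines the reload follows the alarm by 13 seconds, and the only gateway reported down is WAN (monitor 4.2.2.2).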
                                
                                Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.