Outgoing Loadbalancer not failing over gracefully? DNS related?
-
Get a snapshot newer than this post and test.
-
Thanks Ermal for trying to look at it. I think the problem is that the monitor IPs are not explicitly added to the routing table. I guess this happens because the link was already down at startup, so the route was never added, rather than being removed by accident.
-
Test the later snapshot.
With that change, your routing table will preserve the monitor IP routes, which would otherwise get lost during a reload.
-
Ermal,
I did install the latest update and rebooted the machine. Again, WAN2 was not connected, but after the reboot the Gateway status still shows WAN2 as alive and online, with no routing entry for its monitor IP in the route table.
Cheers!
-
Do you mean WAN2 was physically disconnected (cable unplugged from the WAN2 interface), or that the WAN2 interface itself was down?
-
I mean physically disconnected. For now I have been testing the link-down case by disconnecting the cable. I haven't tried to simulate an internet outage yet.
Cheers!
-
I give up for now. With both WAN and WAN2 connected, everything works fine. I then block all traffic going to WAN from my other router, so pfSense thinks the link is down, which is correct. From that point my browser only loads pages halfway, very slowly, etc.
I disabled DNS forwarder at the moment because I find it does not work very well when the links go down. It is not very multi-wan aware from what I see. Now I just have four DNS server addresses configured in Windows, and I let the pfSense box do the load balancing and failover routing.
I'll try to edit this post with config files, etc.
*Edit: I see that the WAN gateway is marked down as expected, but I still see new HTTP states appearing in Diagnostics: States.
-
I disabled DNS forwarder at the moment because I find it does not work very well when the links go down. It is not very multi-wan aware from what I see.
As long as those routes are still there and correct, so that at least one DNS server is reachable, the forwarder will continue to work fine. It queries every configured DNS server simultaneously and takes the fastest response, so as long as one is reachable there isn't even a delay. That all works on all the 2.0 multi-WAN setups I have in production, regardless of WAN/WAN2/WAN3 status. Get some pcaps of your DNS traffic when that happens and see what's happening on the wire.
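To make the "ask every server at once, take the first answer" behaviour concrete, here is a minimal Python sketch. It is not the DNS forwarder's actual code; the resolver functions, their labels and the 192.0.2.10 answer are made-up stand-ins so the example runs without touching the network.

```python
import concurrent.futures
import time


def first_answer(resolvers, name, timeout=2.0):
    """Query every resolver in parallel and return (label, answer)
    from the first one that completes without an error."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(resolvers))
    futures = {pool.submit(fn, name): label for label, fn in resolvers.items()}
    try:
        # as_completed yields futures in the order they finish
        for fut in concurrent.futures.as_completed(futures, timeout=timeout):
            if fut.exception() is None:
                return futures[fut], fut.result()
        raise RuntimeError("no resolver answered")
    finally:
        # do not block on resolvers that are still waiting
        pool.shutdown(wait=False)


# Hypothetical stand-ins for real DNS servers behind each WAN.
def dead_resolver(name):
    raise OSError("network unreachable")  # server behind a down link


def fast_resolver(name):
    return "192.0.2.10"


def slow_resolver(name):
    time.sleep(0.3)
    return "192.0.2.10"


if __name__ == "__main__":
    servers = {
        "wan_dns": dead_resolver,
        "wan2_dns": fast_resolver,
        "wan2_dns_slow": slow_resolver,
    }
    print(first_answer(servers, "example.com"))
```

The point of the sketch is that a dead server only costs something if every server is dead; one reachable server is enough to answer without delay.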
Also note the output of the command 'grep route-to /tmp/rules.debug' both when things are working and not working. It should update the first lines accordingly with your pool configuration and the gateways' statuses.
-
Let me try this again. Here is my setup: it does load balancing and works, until I simulate a link down by blocking ALL traffic to 192.168.2.197 on the other side.
Instead of cluttering this post with all these images, I have added a link here: http://img51.imageshack.us/gal.php?g=generalnl.png
Output of route-to in rules.debug:
# grep route-to /tmp/rules.debug
GWwan = " route-to ( em3 192.168.2.1 ) "
GWopt1 = " route-to ( em4 98.210.16.1 ) "
GWLOADBALANCE = " route-to { ( em3 192.168.2.1 ) ( em4 98.210.16.1 ) } "
GWWAN_WAN2 = " route-to { ( em3 192.168.2.1 ) } "
GWWAN2_WAN = " route-to { ( em4 98.210.16.1 ) } "
pass out route-to ( em3 192.168.2.1 ) from 192.168.2.197 to !192.168.2.0/24 keep state allow-opts label "let out anything from firewall host itself"
pass out route-to ( em4 98.210.16.1 ) from 98.210.19.93 to !98.210.16.0/21 keep state allow-opts label "let out anything from firewall host itself"
I've attached my config file as well.
Looks like when I simulate link down, the rules.debug route-to section does not change even though the status says WAN is link down.
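For anyone wanting to compare the route-to macros between the working and non-working state without eyeballing them, a rough Python sketch follows. It assumes you have saved the grep output from both states as text. The SAMPLE text is the output posted here; HYPOTHETICAL_AFTER is invented purely to show what a detected change would look like and is not real pfSense output.

```python
import re

# Matches lines such as: GWwan = " route-to ( em3 192.168.2.1 ) "
MACRO_RE = re.compile(r'^(GW\w+)\s*=\s*"\s*route-to\s*(.*?)\s*"\s*$')
# Matches each "( interface gateway )" pair inside a macro
HOP_RE = re.compile(r'\(\s*(\S+)\s+(\S+)\s*\)')


def parse_route_to(text):
    """Return {macro_name: [(interface, gateway), ...]} for the GW* macros."""
    macros = {}
    for line in text.splitlines():
        m = MACRO_RE.match(line.strip())
        if m:
            macros[m.group(1)] = HOP_RE.findall(m.group(2))
    return macros


def changed_macros(before, after):
    """Return {macro_name: (hops_before, hops_after)} for macros that differ."""
    a, b = parse_route_to(before), parse_route_to(after)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


SAMPLE = '''
GWwan = " route-to ( em3 192.168.2.1 ) "
GWopt1 = " route-to ( em4 98.210.16.1 ) "
GWLOADBALANCE = " route-to { ( em3 192.168.2.1 ) ( em4 98.210.16.1 ) } "
GWWAN_WAN2 = " route-to { ( em3 192.168.2.1 ) } "
GWWAN2_WAN = " route-to { ( em4 98.210.16.1 ) } "
'''

# Hypothetical: the balance pool with the em3 hop dropped (not observed here).
HYPOTHETICAL_AFTER = SAMPLE.replace(
    "{ ( em3 192.168.2.1 ) ( em4 98.210.16.1 ) }", "{ ( em4 98.210.16.1 ) }"
)

if __name__ == "__main__":
    print(changed_macros(SAMPLE, SAMPLE))
    print(changed_macros(SAMPLE, HYPOTHETICAL_AFTER))
```

An empty result when comparing the "up" and "down" captures is exactly the symptom described: the gateway is marked down but the macros were not regenerated.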
-
Try the latest snapshot; it should behave correctly now.
-
If it's really marked as down, and stays that way, but the route-to doesn't update, then what Ermal committed today won't change that. Check the system log when that happens and post what it shows.
-
Well, I am waiting for Ermal's changes to make it into the snapshot.
How does the snapshot's build timestamp correspond with the commits?
Ex: Built On: Mon Jun 14 14:14:25 EDT 2010
And commit:
Date: Mon Jun 14 15:26:32 EDT 2010
Committer: Ermal (eriAT@NOSPAM@pfsenseDOTorg)
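The two timestamps quoted above can be compared directly. A small Python sketch, assuming a snapshot can only contain commits made before its "Built On" time and that both stamps use the same time zone (the function names are just for illustration):

```python
from datetime import datetime

# Format of the stamp once the zone token (e.g. "EDT") has been removed
FMT = "%a %b %d %H:%M:%S %Y"


def parse_stamp(stamp):
    """Parse e.g. 'Mon Jun 14 14:14:25 EDT 2010' into (datetime, zone)."""
    parts = stamp.split()
    zone = parts.pop(4)  # fifth token is the zone abbreviation
    return datetime.strptime(" ".join(parts), FMT), zone


def snapshot_includes(built_on, commit_date):
    """True if the build time is at or after the commit time."""
    built, built_zone = parse_stamp(built_on)
    commit, commit_zone = parse_stamp(commit_date)
    if built_zone != commit_zone:
        raise ValueError("stamps use different zones; convert them first")
    return built >= commit


if __name__ == "__main__":
    print(snapshot_includes("Mon Jun 14 14:14:25 EDT 2010",
                            "Mon Jun 14 15:26:32 EDT 2010"))
```

With these two stamps the build is about 72 minutes older than the commit, so under that assumption this snapshot cannot contain the change yet.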
Anyways, when a link goes down, this is what I see:
Jun 14 19:08:20 php: : All gateways are unavailable, proceeding with configured XML settings!
Jun 14 19:08:20 check_reload_status: reloading filter
Jun 14 19:08:07 apinger: ALARM: WAN(4.2.2.2) *** down ***
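If the log gets long, the apinger alarm lines can be pulled out with a short Python sketch like the one below. The regular expression is written only against the single ALARM line format shown in this post; other apinger message formats are not covered and would need their own pattern.

```python
import re

# Matches e.g.: Jun 14 19:08:07 apinger: ALARM: WAN(4.2.2.2) *** down ***
ALARM_RE = re.compile(
    r'^(\w{3}\s+\d+\s+[\d:]+)\s+apinger:\s+ALARM:\s+'
    r'(\S+?)\(([^)]+)\)\s+\*\*\*\s+(\w+)\s+\*\*\*'
)


def gateway_alarms(log_text):
    """Return one dict per apinger ALARM line: time, gateway, monitor IP, state."""
    events = []
    for line in log_text.splitlines():
        m = ALARM_RE.match(line.strip())
        if m:
            events.append({
                "time": m.group(1),
                "gateway": m.group(2),
                "monitor": m.group(3),
                "state": m.group(4),
            })
    return events


LOG = """\
Jun 14 19:08:20 php: : All gateways are unavailable, proceeding with configured XML settings!
Jun 14 19:08:20 check_reload_status: reloading filter
Jun 14 19:08:07 apinger: ALARM: WAN(4.2.2.2) *** down ***
"""

if __name__ == "__main__":
    for event in gateway_alarms(LOG):
        print(event)
```

Note that the log is listed newest first: the apinger alarm at 19:08:07 comes before the filter reload and the "All gateways are unavailable" message at 19:08:20.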