Failover stuck on backup link

dondos

Hello.
I am running pfSense 1.2.3 with two internet connections. The first is an ADSL connection from a local provider; the second is a 3G mobile connection.
My failover setup looks like this:

The 3G router (Orange) is 192.168.11.6 and 89.122.156.*** is the ADSL gateway (Romtelecom).

The failover configuration works fine, routing traffic through the 3G backup link when the main ADSL connection is down, but it won't switch back when the main ADSL connection comes back online. The only solution is to turn off the 3G router to simulate a failure on the backup link.
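As a rough mental model, a failover pool is an ordered list of gateways, and each *new* connection uses the first member that the link monitor reports as up. The sketch below is a simplified Python illustration, not pfSense code: the `Gateway` class and `pick_gateway` function are hypothetical, and the ordering (ADSL first, 3G second) is assumed from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gateway:
    name: str
    address: str
    up: bool = True  # in this model, flipped by the link monitor


def pick_gateway(pool: List[Gateway]) -> Optional[Gateway]:
    """Return the first gateway, in priority order, that the monitor reports as up."""
    for gw in pool:
        if gw.up:
            return gw
    return None  # every member of the pool is down


# Hypothetical two-member pool: ADSL preferred, 3G as backup.
adsl = Gateway("ADSL", "89.122.156.***")
umts = Gateway("3G", "192.168.11.6")
pool = [adsl, umts]

adsl.up = False
print(pick_gateway(pool).name)  # 3G: the primary is down
adsl.up = True
print(pick_gateway(pool).name)  # ADSL: the primary is back, for new connections
```

This only covers the choice made for new connections; it says nothing about connections that were already open on the backup link.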

Here is the log:

Jul 3 09:59:20 	apinger: alarm canceled: 192.168.11.6(192.168.11.6) *** down ***
Jul 3 09:58:04 	apinger: ALARM: 192.168.11.6(192.168.11.6) *** down ***
Jul 3 00:13:59 	apinger: Starting Alarm Pinger, apinger(60095)
Jul 3 00:13:59 	apinger: Exiting on signal 15.
Jul 3 00:13:59 	apinger: command (/usr/bin/touch /tmp/filter_dirty) exited with status: 1
Jul 3 00:13:59 	apinger: Error while starting command.
Jul 3 00:13:52 	apinger: alarm canceled: 89.122.156.***(89.122.156.***) *** down ***
Jul 3 00:04:00 	apinger: ALARM: 89.122.156.***(89.122.156.***) *** down ***
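The log above is listed newest first. The minimal Python sketch below uses the same lines with whitespace normalized. It pairs each `ALARM` with the next `alarm canceled` for the same target. `down_intervals` is a hypothetical helper, not part of pfSense. For these lines it reports the ADSL gateway down for just under ten minutes after midnight and the 3G monitor down for about a minute around 09:58.

```python
import re
from datetime import datetime
from typing import Dict, List, Tuple

# The apinger lines from the log above, newest first (whitespace normalized).
LOG = """\
Jul 3 09:59:20 apinger: alarm canceled: 192.168.11.6(192.168.11.6) *** down ***
Jul 3 09:58:04 apinger: ALARM: 192.168.11.6(192.168.11.6) *** down ***
Jul 3 00:13:59 apinger: Starting Alarm Pinger, apinger(60095)
Jul 3 00:13:59 apinger: Exiting on signal 15.
Jul 3 00:13:59 apinger: command (/usr/bin/touch /tmp/filter_dirty) exited with status: 1
Jul 3 00:13:59 apinger: Error while starting command.
Jul 3 00:13:52 apinger: alarm canceled: 89.122.156.***(89.122.156.***) *** down ***
Jul 3 00:04:00 apinger: ALARM: 89.122.156.***(89.122.156.***) *** down ***
"""

# Timestamp, event type, and the monitored target (text before the "(").
LINE = re.compile(r"^(\w{3}\s+\d+\s+[\d:]{8})\s+apinger: (ALARM|alarm canceled): ([^(\s]+)\(")


def parse_time(stamp: str) -> datetime:
    # syslog omits the year; a placeholder year is added only so strptime gets a full date.
    return datetime.strptime("2000 " + " ".join(stamp.split()), "%Y %b %d %H:%M:%S")


def down_intervals(log: str) -> List[Tuple[str, datetime, datetime]]:
    """Pair each ALARM with the following 'alarm canceled' for the same target."""
    open_alarms: Dict[str, datetime] = {}
    intervals = []
    for line in reversed(log.splitlines()):  # walk the log oldest first
        m = LINE.match(line)
        if not m:
            continue  # not an alarm line
        stamp, event, target = m.groups()
        when = parse_time(stamp)
        if event == "ALARM":
            open_alarms[target] = when
        elif target in open_alarms:
            intervals.append((target, open_alarms.pop(target), when))
    return intervals


for target, start, end in down_intervals(LOG):
    print(f"{target}: down for {end - start}")
```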

Any suggestions would be appreciated.

Perry

Could it be the monitor IP? Try using Google Public DNS (8.8.8.8 or 8.8.4.4).

dondos

The monitor IP is 89.122.156.*** and it replies to pings. Changing the monitor IP to 8.8.8.8 didn't help. And yes, a Google Public DNS server is among those in use:

Jul 4 02:01:13 	dnsmasq[1013]: server 62.217.193.1#53: queries sent 748425, retried or failed 18588
Jul 4 02:01:13 	dnsmasq[1013]: server 193.231.100.134#53: queries sent 737650, retried or failed 0
Jul 4 02:01:13 	dnsmasq[1013]: server 193.231.100.130#53: queries sent 737652, retried or failed 1
Jul 4 02:01:13 	dnsmasq[1013]: server 8.8.4.4#53: queries sent 737652, retried or failed 0
Jul 4 02:01:13 	dnsmasq[1013]: queries forwarded 748428, queries answered locally 147664
Jul 4 02:01:13 	dnsmasq[1013]: cache size 10000, 0/1348664 cache insertions re-used unexpired cache entries.
Jul 4 02:01:13 	dnsmasq[1013]: time 1278198073

jasonlitka

Existing sessions won't bounce back to the DSL. If you're moving a lot of traffic to/from the same IPs, you may be stuck on the 3G for a while. Clearing out existing states should fix it.
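The stickiness described here can be sketched as a toy state table. A flow keeps the gateway it was created on, new flows follow the pool's current choice, and clearing the states lets every flow start over. This is a simplified Python model under stated assumptions. The `StateTable` class is hypothetical. Real pf states are keyed on more fields (protocol and source/destination addresses and ports), and the addresses used are documentation examples.

```python
from typing import Dict, Tuple

# Simplified flow key: (destination address, destination port).
# Real pf states also include protocol and source address/port.
Flow = Tuple[str, int]


class StateTable:
    """Toy model of sticky states: a flow keeps the gateway it started on."""

    def __init__(self) -> None:
        self.states: Dict[Flow, str] = {}

    def route(self, flow: Flow, preferred_gateway: str) -> str:
        # Existing flow: reuse the gateway recorded when its state was created.
        if flow in self.states:
            return self.states[flow]
        # New flow: bind it to the gateway the pool currently prefers.
        self.states[flow] = preferred_gateway
        return preferred_gateway

    def flush(self) -> None:
        # Clearing the states forces every flow to start over.
        self.states.clear()


table = StateTable()
video = ("203.0.113.10", 443)  # hypothetical long-lived flow
print(table.route(video, "3G"))                   # 3G: opened during the ADSL outage
print(table.route(video, "ADSL"))                 # 3G: ADSL is back, but the state is sticky
print(table.route(("198.51.100.7", 80), "ADSL"))  # ADSL: a new flow uses the primary
table.flush()
print(table.route(video, "ADSL"))                 # ADSL: after clearing the states
```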

mav2929

Try configuring a new failover pool and rules (make sure the default gateway is tied to this new pool); it should work.

dondos

@jasonlitka: a while = a few hours ??
@mav2929: I don't understand your suggestion. Are you saying that I should duplicate the existing pool?

GruensFroeschli

How do you test that it doesn't fall back?
And yes, if you have a continuous stream (connection) on your backup WAN, it won't fall back in a few hours. (It will never fall back.)

jimp

@dondos:

@jasonlitka: a while = a few hours ??

However long a connection/session is active.

It will not cut off an existing session just because the WAN came back online. (I think there might be an option to do just this in 2.0 in the works)

jasonlitka

@dondos:

@jasonlitka: a while = a few hours ??

Indefinitely if there is a stream of data regular enough to keep the session alive. 1 ping per second, for example, is more than enough to do it.
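Why a steady trickle of traffic keeps a session on the backup forever can be shown with an idle-timeout model. A state dies only after a gap longer than its timeout, and any packet arriving after such a gap would create a fresh state that could go out the restored primary. This is a minimal Python sketch. `state_survives` is a hypothetical helper, and the 10-second timeout is an assumed value for illustration, not necessarily pf's setting for any protocol.

```python
from typing import Sequence


def state_survives(packet_times: Sequence[float], idle_timeout: float, until: float) -> bool:
    """Return True if the state created by the first packet is still alive at `until`.

    The state expires as soon as no packet has been seen for longer than
    `idle_timeout` seconds; a later packet would then create a new state.
    """
    last = packet_times[0]
    for t in packet_times[1:]:
        if t - last > idle_timeout:
            return False  # the state timed out before this packet arrived
        last = t
    return until - last <= idle_timeout


IDLE_TIMEOUT = 10.0  # assumed value for illustration only
THREE_HOURS = 3 * 3600

one_ping_per_second = [float(s) for s in range(THREE_HOURS)]
one_ping_per_minute = [float(s) for s in range(0, THREE_HOURS, 60)]

print(state_survives(one_ping_per_second, IDLE_TIMEOUT, THREE_HOURS))  # True
print(state_survives(one_ping_per_minute, IDLE_TIMEOUT, THREE_HOURS))  # False
```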

dondos

@GruensFroeschli:

How do you test that it doesn't fall back?

tracert www.google.ro or any other site.

@jimp:

It will not cut off an existing session just because the WAN came back online. (I think there might be an option to do just this in 2.0 in the works)

This "feature" is nasty, since I have to pay for the 3G traffic…

jimp

Most people would prefer to keep their sessions alive rather than abruptly cut off users, but with metered links that is a valid concern. A preference for it would be ideal.

http://redmine.pfsense.org/issues/8
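A preference like the one requested in that issue might behave roughly as follows. This is a Python sketch of the trade-off only. `on_primary_recovered` and the `kill_backup_states` flag are hypothetical names, and the code does not describe how pfSense implements such an option. With the flag off, sessions survive, as described in this thread. With it on, flows on the metered link are cut and must reconnect through the primary.

```python
from typing import Dict, Tuple

Flow = Tuple[str, int]  # simplified (destination address, destination port)


def on_primary_recovered(states: Dict[Flow, str], backup: str,
                         kill_backup_states: bool) -> Dict[Flow, str]:
    """Return the state table after the primary gateway comes back online.

    kill_backup_states is a hypothetical preference. When False (the behaviour
    described in this thread), every state is kept, so flows opened on the
    backup keep using it. When True, states bound to the backup are dropped;
    those flows must reconnect, and their new states go out the primary.
    """
    if not kill_backup_states:
        return dict(states)
    return {flow: gw for flow, gw in states.items() if gw != backup}


states = {("203.0.113.10", 443): "3G", ("198.51.100.7", 80): "ADSL"}
print(on_primary_recovered(states, "3G", kill_backup_states=False))  # both states kept
print(on_primary_recovered(states, "3G", kill_backup_states=True))   # only the ADSL flow
```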

jasonlitka

@dondos:

@GruensFroeschli:

How do you test that it doesn't fall back?

tracert www.google.ro or any other site.

@jimp:

It will not cut off an existing session just because the WAN came back online. (I think there might be an option to do just this in 2.0 in the works)

This "feature" is nasty, since I have to pay for the 3G traffic…

If you pick a random site that you know your users haven't been visiting and you still see traffic going out the 3G card while the main link is back up, then something is wrong. It is only sticky for existing connections.

dondos

Nope, same issue. But I found a workaround: using opt1 and opt2 interfaces in the failover pool.