WAN Incoming Failover Fails

jsmwalker

Hi,

Hopefully someone can shed light on this. We have two WAN connections. The outgoing load-balancing policy works perfectly, and we've tested incoming failover, which also seemed to work fine. However, our main connection just failed and the failover did not work. It seems that incoming failover (two NAT policies per published service, using DNS failover records) only works if the main connection is up, which means it doesn't work the way we expected. To confirm: all the devices can still reach the Internet; it's just that replies to incoming requests don't seem to be processed correctly.

Anyone have any ideas on this? We are using 1.2.2, and have a couple of reasons for not upgrading to 1.2.3 Final (NIC driver issues).

Cheers in advance.

J

tasis

I can confirm that the same behaviour is exhibited by version 1.2.3.

We have a setup consisting of three Internet ADSL links (WAN, WAN2|OPT1, WAN3|OPT2) used for load-balancing / failover.

Although this setup works fine as far as outgoing connections are concerned, once the WAN interface goes down, we completely lose incoming connectivity to WAN2 and WAN3. You cannot ping WAN2 or WAN3 from the outside, nor connect to any services that may be offered via NAT Port Mapping behind them.

As you wrote, this renders failover for incoming connections rather useless…

What I assume happens is the following: when the WAN interface goes down, the pfSense box loses its default gateway, so it cannot respond to any incoming requests on WAN2 or WAN3. If, however, you set a static route to the relevant destination with e.g. WAN2 as the gateway, you have connectivity again...

You can test this:

WAN/WAN2/WAN3 are up: you can ping all three IP addresses from 1.2.3.4 (substitute your own IP address)
WAN goes down -> now neither WAN2 nor WAN3 can be pinged from 1.2.3.4
create a static route to 1.2.3.4 via WAN2
now you can ping WAN2 from 1.2.3.4
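The steps above can be sketched as a toy routing-table lookup in Python. It models only the hypothesis in this post: replies follow the routing table (longest-prefix match), the default route disappears when WAN goes down, and a /32 static route via WAN2 brings the replies back. It deliberately ignores pf's reply-to and state handling, and the gateway addresses and helper names (`lookup`, `reply_gateway_for_each_step`) are hypothetical.

```python
import ipaddress

# Hypothetical addresses: 1.2.3.4 is the outside tester from the steps above,
# the two gateways are made-up documentation-range IPs.
TESTER = "1.2.3.4"
WAN_GW = "192.0.2.1"
WAN2_GW = "198.51.100.1"
DEFAULT = ipaddress.ip_network("0.0.0.0/0")


def lookup(routes, dst):
    """Return the gateway of the most specific route containing dst, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, gw) for net, gw in routes.items() if addr in net]
    if not matches:
        return None  # no route to host: the reply cannot be sent
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def reply_gateway_for_each_step():
    routes = {DEFAULT: WAN_GW}
    results = [lookup(routes, TESTER)]   # 1. all WANs up: default route via WAN
    del routes[DEFAULT]                  # 2. WAN down: default gateway is lost
    results.append(lookup(routes, TESTER))
    routes[ipaddress.ip_network(TESTER + "/32")] = WAN2_GW  # 3. static host route via WAN2
    results.append(lookup(routes, TESTER))
    return results


print(reply_gateway_for_each_step())  # ['192.0.2.1', None, '198.51.100.1']
```

In this model the static route "fixes" things only because a more specific route now exists for that one tester address; every other outside host would still have no route back.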

If the problem is the default gateway of pfSense being lost, my only guess is that a routing protocol (RIPv2?) on pfSense could be used to dynamically change the default gateway to WAN2 or WAN3...

However, I have not found any references to this, only some people suggesting a search for discussions about the "failover of services on the pfSense itself".

I will post more if I find something, please do the same... :)

Needless to say, this is a big problem for us as well... we absolutely need failover for incoming traffic too (not in the sense of one WAN -> many inside servers, but multiple WANs -> one inside server).

jwbrown77

I'm going to be deploying a similar setup in about a week when I get my new hardware.

I bought the book, and it says a couple of things on this. Hope the authors don't mind me quoting:

"In the case of traffic initiated on the Internet destined for an OPT WAN interface, pfSense automatically uses pf's reply-to directive in all WAN and OPT WAN rules, which ensures the reply traffic is routed back out the correct WAN interface."

"Each port forward applies to a single WAN interface. A given port can be opened on multiple WAN interfaces by using multiple port forward entries, one per WAN interface. The easiest way to accomplish this is to add the port forward on the first WAN connection, then click the + to the right of that entry to add another port forward based on that one. Change the interface to the desired WAN, then click save."

Seems like this configuration should work?
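If pfSense behaves as the book describes, reply-to should make a missing default route irrelevant for replies to traffic that arrived on an OPT WAN. A minimal Python sketch of that idea, assuming a simplified model where a state either carries a pinned reply-to gateway or falls back to a routing-table lookup; the `State` class, the helper functions and the gateway addresses are hypothetical and are not pf's actual implementation.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

# Hypothetical gateway addresses (documentation ranges), for illustration only.
GATEWAYS = {"WAN": "192.0.2.1", "WAN2": "198.51.100.1"}


@dataclass
class State:
    """Minimal stand-in for a state created by an inbound WAN/OPT WAN rule."""
    in_interface: str
    remote: str
    reply_to: Optional[str]  # gateway pinned by reply-to, or None if absent


def accept_inbound(in_interface, remote, use_reply_to=True):
    # Per the book quote, reply-to pins replies to the gateway of the
    # interface the packet came in on.
    gw = GATEWAYS[in_interface] if use_reply_to else None
    return State(in_interface, remote, gw)


def route_reply(state, routes):
    if state.reply_to is not None:
        return state.reply_to  # reply-to bypasses the routing table
    addr = ipaddress.ip_address(state.remote)
    matches = [(net, gw) for net, gw in routes.items() if addr in net]
    if not matches:
        return None  # no route back: reply is dropped
    return max(matches, key=lambda m: m[0].prefixlen)[1]


routes_wan_down = {}  # WAN is down, so no default route is left
print(route_reply(accept_inbound("WAN2", "1.2.3.4"), routes_wan_down))  # 198.51.100.1
print(route_reply(accept_inbound("WAN2", "1.2.3.4", use_reply_to=False), routes_wan_down))  # None
```

In this model, losing the default route only breaks replies when reply-to is absent, which is why the behaviour reported in this thread was unexpected.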

tasis

I fully agree, this is what we also expected!

However the scenario that I mentioned seems to be completely reproducible: if WAN goes down, OPT1 stops replying to incoming connections. However, if you then add a static route via OPT1 to your IP, then OPT1 is reachable again! (WAN is still down)

Throughout this test, outgoing connectivity (from the pfSense via OPT1 to the Internet) is not affected at all.

For completeness, I should mention that our setup is the following:

WAN: PPPoE connected to a fully bridged ADSL modem
OPT1: DHCP connected to a half-bridged ADSL modem

In both cases, WAN and OPT1 each receive their public IP address from their corresponding ISP (in our case these are static IP addresses).

I have almost given up searching; I cannot find any similar behaviour reported in the forums, only what jsmwalker posted when he started this thread…

jwbrown77

My ISPs are fully static IP. I will try this setup and post my results, though it might be a week or two.

If it's critical to your business, you might consider paying for commercial support. Maybe they can figure it out.

The only idea I have… Did you do anything special with your NAT rules, or are they just the defaults? I was thinking that with special rules, maybe the "reply-to" directive gets dropped?

tasis

Thanks for helping out, we will consider paid support if nothing else comes up. For the time being, we are using a third ADSL link on a secondary pfSense to provide for incoming traffic fail-over (we primarily use it for incoming email, so setting two MX entries pointing to the two pfSense boxes does the trick). Using two pfSense boxes was not what we had in mind though…

Nothing special with the NAT rules, just mapping TCP/25 to an internal server (and the associated firewall rules are also created).

Please note that what I mentioned before happens even with just ICMP packets to the OPT1 interface (i.e. a simple PING to the WAN2 public IP address):

Firewall rules allow incoming ICMP on the WAN2 interface to the WAN2 address
incoming PING to the WAN2 public IP address works fine
if WAN goes down, WAN2 can also no longer be pinged from the outside (e.g. from IP 1.2.3.4)
but if a static route is added for your outside IP address (1.2.3.4) via WAN2, WAN2 can be pinged again!

Again I assume that it has to do with the default gateway not being available after WAN goes down.

But I cannot explain it. Replies to incoming pings on WAN2 should go out via WAN2 in any case... unless I do not understand the way pfSense works.

tasis

Interesting follow up:

We performed a similar test on another dual-WAN pfSense firewall at a different location (pfSense 1.2.3 embedded running on an Alix 2d1 board).

In this case, if WAN got disconnected, WAN2 continued to work (i.e. it could still be pinged from the outside), as it should have in the first place…

It really makes me wonder now whether a misconfiguration of our first pfSense (rel 1.2.3 on a PC platform) caused the strange WAN/WAN2 coupling we originally experienced, or whether the different platform (PC vs embedded) plays some role...

__Fox__

Hi, I've got the same problem… http://forum.pfsense.org/index.php/topic,23391.0.html

Can you confirm that with the embedded version, NAT from the OPT interface still works even when WAN1 is not up?

Thanks

__Fox__

So, no one uses inbound NAT from multiple WANs? :-[

tasis

@__Fox__:

Hi, I've got the same problem… http://forum.pfsense.org/index.php/topic,23391.0.html

Can you confirm that with the embedded version, NAT from the OPT interface still works even when WAN1 is not up?

Thanks

No, it seems I cannot confirm it after all. It appears that 1.2.3 embedded behaves the same way (which makes sense, I suppose).

We are in the process of replacing our pfSense PC boxes with Alix boards. Yesterday we performed one such migration (using the backup/restore feature), and we found that the Alix pfSense behaved in the same way.

The only difference we saw was that this time it took some seconds (maybe minutes) before WAN2 stopped being responsive: we unplugged WAN, and we could still ping WAN2 for some time until it stopped.

This time lag may be the reason that I initially reported that the embedded version was behaving differently. Perhaps I didn't wait long enough at the time. We will try to repeat the test next week and report back.

PS. I asked the same question on Tom Schaefer's blog (http://www.tomschaefer.org/web/wordpress/?p=538#comment-576) and the reply Tom gave was "Make sure your resetting the states or rebooting. The reason you have to reset the states or reboot is to enforce the settings you have made. Pfsense will hold on to connections until they timeout and thus your rules will not apply. That is why the pfsense team recommends you reboot or reset the state table. This applies to firewall settings."

He also provided the link http://forum.pfsense.org/index.php/board,21.0.html for more information.
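Tom's explanation boils down to existing states taking precedence over rules. A toy Python model of that, assuming a single fixed timeout that every matching packet refreshes (real pf timeouts vary per protocol and connection state); `StateTable`, the flow tuple and the decision strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StateTable:
    """Toy model: an established flow keeps the decision made when it started."""
    timeout: float
    states: dict = field(default_factory=dict)  # flow -> (decision, expires_at)

    def handle(self, flow, now, current_rule_decision):
        entry = self.states.get(flow)
        if entry is not None and entry[1] > now:
            decision = entry[0]  # live state wins; current rules are not consulted
        else:
            decision = current_rule_decision  # new or expired flow: evaluate rules
        # Any matching traffic refreshes the state's expiry.
        self.states[flow] = (decision, now + self.timeout)
        return decision

    def reset(self):
        """Roughly what Diagnostics -> States -> Reset States does: drop all states."""
        self.states.clear()


table = StateTable(timeout=60)
flow = ("1.2.3.4", "WAN2", "tcp/25")
table.handle(flow, now=0, current_rule_decision="via WAN")  # state created under the old setup
print(table.handle(flow, now=30, current_rule_decision="via WAN2"))  # via WAN (stale state)
table.reset()
print(table.handle(flow, now=31, current_rule_decision="via WAN2"))  # via WAN2
```

In this model a flow with steady traffic never expires on its own, so resetting the states (or rebooting) is the only way to make new settings apply to it immediately.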

I have not yet tried simply resetting the states, or waiting long enough to see if things get fixed after a timeout… maybe next week, as I said.

__Fox__

I tried rebooting the pfSense box with the WAN offline and the result is the same… no inbound NAT.
:-\

tasis

@__Fox__:

I tried rebooting the pfSense box with the WAN offline and the result is the same… no inbound NAT.
:-\

How about resetting the states? (Diagnostics -> States -> Reset States tab)

Would you perhaps be able to try this out as well?

__Fox__

Yes… I just tried it.
Not only does inbound NAT not work, but outgoing connections don't come back up after the reset either :(
[I'm connected to an inside LAN PC via a TeamViewer connection that works from outside even when WAN fails]

tasis

@__Fox__:

Yes… I just tried it.
Not only does inbound NAT not work, but outgoing connections don't come back up after the reset either :(
[I'm connected to an inside LAN PC via a TeamViewer connection that works from outside even when WAN fails]

I am at a loss myself… Not only can I not explain it, I am surprised that more people don't need incoming NAT failover.

__Fox__

I thought I was the only one..

cmb

You have to keep link on your WAN; if you lose link, it'll do this. It doesn't matter whether it's actually up, or what it's plugged into, as long as you have a link light.

__Fox__

I'm not sure that this is the problem…
I'm using a PPPoE connection on WAN, and the problem comes when I disconnect the PPPoE connection from the WAN status page (or when the Internet connection breaks)... the WAN link is always up.

Thanks

cmb

Oh, that's likely the same thing with a different symptom: with PPPoE, your WAN is actually ng0, not the physical interface, and when you disconnect, you lose "link" on that.