Multi-WAN routing problem with a failed WAN link
I'm new to pfSense, and so far I really enjoy the strong capabilities of the product.
I have one issue that appeared the other day when my WAN link went down (the ISP is still trying to fix it). Luckily, we had recently installed a second ISP on OPT1, and the failover pool worked great. It took a few minutes and we were back in business (at least for our outbound Internet traffic).
Now, our problem:
If someone on the Internet tries to reach a server that is behind pfSense, the server appears unresponsive. I have triple- and quadruple-checked our configuration, to no avail.
Here is our setup:
WAN -> PPPoE, static IP (link down)
OPT1 (WAN2) -> Cable, static IP (link up)
We have one pool as a failover gateway for everything except voice (OPT1 -> WAN). The other is primarily for voice (WAN -> OPT1).
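To make the pool ordering above concrete, here is a minimal sketch of how I understand failover selection: each pool lists its gateways in order of preference, and traffic uses the first one whose link is up. The pool names ("general", "voice") and the `pick_gateway` helper are made up for illustration; this is not how pfSense implements it internally.

```python
from typing import Dict, List, Optional

# Hypothetical model of the two failover pools described above.
# Pool names are illustrative; only WAN and OPT1 come from the setup.
POOLS: Dict[str, List[str]] = {
    "general": ["OPT1", "WAN"],  # everything except voice: OPT1 first, then WAN
    "voice": ["WAN", "OPT1"],    # voice: WAN first, then OPT1
}


def pick_gateway(pool: str, link_up: Dict[str, bool]) -> Optional[str]:
    """Return the first gateway in the pool whose link is up, or None."""
    for gateway in POOLS[pool]:
        if link_up.get(gateway, False):
            return gateway
    return None


# Current situation: the PPPoE WAN is down, the cable link on OPT1 is up.
status = {"WAN": False, "OPT1": True}
print(pick_gateway("general", status))  # OPT1
print(pick_gateway("voice", status))    # OPT1 (voice fails over from WAN)
```

With WAN down, both pools end up on OPT1, which matches what we saw for outbound traffic.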
I followed the setup guides on the site etc, and I've been reading for a long time. I believe I have isolated the problem, but I'm not sure how to implement a permanent fix.
All servers and PCs behind pfSense can access the Internet with no issues whatsoever.
The problem comes from Internet requests arriving on the OPT1 link. Here is an example with DNS, but the same happens with HTTP etc.:
An outside DNS request on port 53 to our DNS server behind pfSense, using port forwarding.
The outside request comes in, and I see it being caught by the NAT rule in the packet trace. It also gets to the server; the server receives the request and sends an answer. The answer comes back to the LAN interface (I see it in the packet diagnostics), but it fails to reach the IP that made the initial request (ICMP host unreachable).
I tried to ping the IP in question from pfSense, and pfSense can't reach the host that made the request.
I then added a static route to that host just to see if this would work, and it did.
My guess at the problem:
It looks like the route was not defined (or redefined) when the link failed.
I'm not a network expert, and it took me a few days to get this far.
Can someone point me to a possible solution? Setup issue?
I believe I found my problem, but it looks like a bug to me (or maybe it should be explicitly stated somewhere).
Basically, you cannot use a PPPoE-type connection for WAN. It has to be a static IP address, or failover just won't work if the WAN (the one with PPPoE) goes down.
This appears to be caused by the fact that the default route does not get created if no IP is assigned to the WAN interface, which is exactly what happens when a PPPoE connection drops.
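To illustrate what I think is going on, here is a small sketch of a longest-prefix route lookup. It is only a model of the symptom, not pfSense code: the `lookup` helper is hypothetical, and the addresses are documentation ranges standing in for our real ones. Without a default route, a reply to an outside host matches nothing (hence "host unreachable"); a static host route, like the one I added as a test, makes the same lookup succeed.

```python
import ipaddress
from typing import List, Optional, Tuple

Route = Tuple[ipaddress.IPv4Network, str]  # (destination network, interface)


def lookup(routes: List[Route], dest: str) -> Optional[str]:
    """Return the interface of the most specific matching route, or None."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, iface) for net, iface in routes if addr in net]
    if not matches:
        return None  # no route at all: the packet cannot be forwarded
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Assumed example addresses (documentation ranges, not our real network).
OUTSIDE_CLIENT = "203.0.113.10"

# Routing table after the PPPoE WAN dropped: only connected networks remain,
# because the default route that pointed at WAN is gone.
routes = [
    (ipaddress.ip_network("192.168.1.0/24"), "LAN"),
    (ipaddress.ip_network("198.51.100.0/24"), "OPT1"),
]
print(lookup(routes, OUTSIDE_CLIENT))  # None

# The test I did: a static host route for that one client via OPT1.
with_static = routes + [(ipaddress.ip_network("203.0.113.10/32"), "OPT1")]
print(lookup(with_static, OUTSIDE_CLIENT))  # OPT1

# What a surviving default route via OPT1 would do for every outside host.
with_default = routes + [(ipaddress.ip_network("0.0.0.0/0"), "OPT1")]
print(lookup(with_default, OUTSIDE_CLIENT))  # OPT1
```

This would also explain why a static host route only helps for the one host it names.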
I still can't test it fully because my ADSL line is still down. But the good news is that the ISP on OPT1 is now fully functional, both ways.
How I fixed it:
I took an old router I had, put it in front of pfSense, gave the pfSense WAN interface a static IP pointing to that router, and simply let that router manage the PPPoE connection.
Recommendations for the pfSense team:
Either update the documentation on the multi-WAN setup with a big "WARNING" about using pfSense to manage the PPPoE connection in multi-WAN failover pools.
Or prevent the selection of PPPoE-type interfaces in the menu when creating pools (with a small disclaimer like you do for the other features, something like "PPPoE connections can't be used in pools, only static addresses allowed…").
Or, if it's possible, change the default route behavior so that it does not disappear when the PPPoE WAN connection dies.
A week later and we're now fully operational, the joys of working with Open Source :)
This is semi-related to this issue; I added a link here so we'll look into this as well when we're looking into that one.