Outbound Gateway selection in Rule ignored in some Multi-WAN cases

EEConsult

When using two WANs with one WAN down, in at least the following cases the WAN that is up is used even though the filter rule's gateway says the other WAN (the one that is down) should be used exclusively. (That is, data leaks out when it should be dropped completely.)

(This is verified with logs, traceroute, etc)

The configuration:
WAN1 - up
(If it matters: DSL, with gateway mode on modem allowing a DHCP'd internet IP on the modem and also on the pfSense WAN ethernet interface card, bogon/rfc1918 filtered out in filter rules, modem address 192.168.0.1)

WAN2 - down (various types of down (forced down for testing) listed below)
(If it matters: Clear WiMax modem serving a 192.168.15.x address to the pfSense ethernet adapter via DHCP, modem DMZing all data to the pfSense ethernet interface, bogon/rfc1918 not filtered out in filter rules)

LAN - 192.168.1.x addresses served by pfSense to network via DHCP

pfSense: Current version: 2.0-RC1 Built On: Sat Feb 26 15:30:26 EST 201 (Arch: i386, debug kernel)

LAN Filter Rules (in order): - Note that this is a testing configuration, not the production rules yet, just trying to force various corner cases.

Anti-lockout (default setup)
Modem (DSL) access rule:
Source: LAN net
Destination: 192.168.0.1 (The address of the DSL modem)
Gateway: DSLWAN
Queue: none
Force most traffic to WIMAX rule: (This Modem is various types of down for testing purposes, as shown below)
Source: LAN net
Destination: *
Gateway: WIMAXWAN
Queue: none
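
For readers who want to check which rule a given packet should hit, here is a minimal Python sketch of top-down, first-match evaluation over the two LAN rules above. It is only an illustration, not pfSense code: the /24 mask for "LAN net" is assumed from the 192.168.1.x addresses, the anti-lockout rule is left out, and `match_rule` is a hypothetical helper name.

```python
import ipaddress

# Assumption: "LAN net" is a /24, inferred from the 192.168.1.x addresses above.
LAN_NET = ipaddress.ip_network("192.168.1.0/24")

# The two LAN rules from the post, in order (anti-lockout rule omitted).
RULES = [
    {"name": "Modem (DSL) access",
     "dst": ipaddress.ip_network("192.168.0.1/32"),
     "gateway": "DSLWAN"},
    {"name": "Force most traffic to WIMAX",
     "dst": ipaddress.ip_network("0.0.0.0/0"),  # Destination: *
     "gateway": "WIMAXWAN"},
]

def match_rule(src, dst):
    """Return the first rule matching a packet from src to dst, or None."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    if src_ip not in LAN_NET:
        return None  # both rules require Source: LAN net
    for rule in RULES:
        if dst_ip in rule["dst"]:
            return rule  # first match wins
    return None

if __name__ == "__main__":
    for dst in ("192.168.0.1", "8.8.8.8"):
        rule = match_rule("192.168.1.10", dst)
        print(dst, "->", rule["name"], "via", rule["gateway"])
```

Under these assumptions, only traffic to the modem address should ever be directed to DSLWAN; everything else from the LAN matches the WIMAX rule.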

Below are various situations where the traffic passes through the operating WAN (DSL) even though the filter rule directs it to WIMAX, which is down.

Note: The purpose of this configuration is to test black-holing the traffic if it cannot go out the proper interface.
Note: The log shows that the "Force most traffic to WIMAX rule" matched in all the situations below, so the problem is not whether the filter rule matches, but apparently the multi-WAN routing?

Situation 1:
Server freshly rebooted. (This is important: once the routing has worked once, it does not fail again until you reboot.)
DSL modem ethernet cable attached, modem running, internet accessible through it.
WIMAX ethernet cable disconnected from pfSense ethernet adapter.
Both WAN interfaces show no ping response in the pfSense web configurator.
No Gateway groups defined.
Internet accessible from LAN (but shouldn't be).

Situation 2:
Same as above, but:
Gateway groups defined, and used in Filter Rules. Each gateway "group" has only a single interface selected for use (other one set to "Never".)
Internet still accessible (but shouldn't be).

Situation 3: (Works now.)
Same as situation 2, but with WIMAX ethernet connected, and modem operating normally.
Pings return success on both WANs according to pfSense.
Packets reach the internet (which is correct operation; the correct path was verified with traceroute).

Situation 4: (Also working.)
Carried out immediately after situation 3, but either with pings NOT working on WIMAX while its internet is still accessible, or with WIMAX down completely.
Packets are kept from reaching the internet (which is correct operation).

So, the key here: once both WANs have been up at the same time, it works correctly even if one then goes down.

Sure, this is a corner case. But for new installations it is huge: Multi-WAN is more common than ever, and typically when installing something, I install slowly (one WAN at a time) and test along the way. This seems a small issue now that it is clear the failure does not recur once both WANs have come up for the first time after a reboot. However, before that was realized... I nearly wiped the machine and installed Astaro!

Since incremental setup (one WAN at a time) is common for new users, they will find this much more often than experienced pfSense veterans.

Those new users may just skip to a different firewall after finding this.

I'm glad I didn't! (but it was awful close for a while there)

Thanks for everyone's effort. pfSense is looking better all the time.

eri--

You have to tag the traffic as it enters on the LAN and enforce the policy routing with an outgoing rule that matches the tag.
Otherwise pfSense does its best to send this traffic out a usable outgoing port.

If you want policy routing to be valid no matter what, it has to be enforced through floating rules.
As I said, just tag the traffic in the LAN rule and create an outgoing firewall rule to direct this traffic with policy routing.
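
To make the tag-and-enforce idea concrete, here is a toy Python model of it. It is a sketch under stated assumptions, not how pfSense is implemented: the fallback in `choose_egress` simply mirrors the "does its best" behavior described in this thread, and the tag name `WIMAX_ONLY` and all function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Packet:
    src: str
    dst: str
    tags: set = field(default_factory=set)

def lan_inbound(packet):
    """LAN rule: tag the traffic and request the WIMAX gateway."""
    packet.tags.add("WIMAX_ONLY")  # hypothetical tag name
    return "WIMAXWAN"

def choose_egress(requested, gateways_up):
    """Assumed best-effort behavior: fall back to any gateway that is up."""
    if requested in gateways_up:
        return requested
    return next(iter(sorted(gateways_up)), None)

def floating_outbound(packet, egress):
    """Outbound (floating) rule: drop tagged traffic leaving the wrong WAN."""
    if egress is None:
        return False
    if "WIMAX_ONLY" in packet.tags and egress != "WIMAXWAN":
        return False
    return True

def forward(packet, gateways_up, enforce=True):
    """Return the WAN the packet leaves on, or None if it is dropped."""
    requested = lan_inbound(packet)
    egress = choose_egress(requested, gateways_up)
    if enforce and not floating_outbound(packet, egress):
        return None
    return egress

if __name__ == "__main__":
    up = {"DSLWAN"}  # WIMAX is down
    print(forward(Packet("192.168.1.10", "8.8.8.8"), up, enforce=False))
    print(forward(Packet("192.168.1.10", "8.8.8.8"), up, enforce=True))
```

In this model, with WIMAX down and no outbound check, the packet leaks out DSLWAN; with the outbound check on the tag, it is dropped instead.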

EEConsult

While I don't dispute what you are saying at all with regard to stopping the leak, (what you are suggesting should certainly stop the leak)…

It all comes down to this:

A firewall must strictly adhere to every firewall rule.... Period. No exceptions.

If the router is allowed to send packets anywhere it pleases in complete contradiction to firewall rules, and do so under circumstances that are difficult to predict (and then stop doing so under equally inexplicable circumstances), then the firewall is just a huge hole waiting to happen.

What I am saying is that this is a huge bug. Anything that inserts hidden exceptions to firewall rules is a bug.

It is important to fix this security vulnerability.

(As a side note, at first it seems like a great idea to provide "best effort" routing, etc., but it is risky to let the router think it knows what is and is not an "outgoing port" on a firewall. Every setup is different. A firewall knows only rules. (And it must not be allowed to think otherwise.) It cannot make assumptions such as treating multiple WANs as equivalent. They are not equivalent in many cases: failing over to a link that is expensive to use (like 3G), encrypted vs. non-encrypted links, connections to clients that check the originating IP (like HTTPS browsers do), etc.)

Thanks in advance!

eri--

It depends on the product.
I did not say it is a bug; I said it is a feature.

You have to learn a product before judging it, and I do not think the way a product you are used to does things should apply to all other products!
pfSense applies rules twice: once when a packet comes in on an interface and once when it leaves an interface, hence the behavior you are seeing.
Furthermore, you are the one configuring pfSense, so it's trying to do what you told it to.