Multiple pfSense Routers, Multi-homed and Asymmetric Routing Challenge

weehooey

We are trying to resolve an issue with asymmetric routing with pfSense in a multi-homed environment.

The setup is described below; a drawing is included after the questions.

Two ISP connections with BGP peering established.

ISP #1: ASN 65001, 1 Gbps connection
ISP #2: ASN 65002, 500 Mbps connection

The network in question is ASN 65500.

Each ISP connection has a dedicated pfSense edge device (Netgate 7100)
The edge devices are running FRR for BGP and OSPF
The network has a single public IPv4 prefix - 203.0.113.0/24
There is a single transit network connecting tenant networks - 172.16.16.0/24
The tenants are all using pfSense devices and are participating in OSPF
Tenants are assigned a prefix from the public IPv4 prefix (203.0.113.0/24). Most are single IPv4 addresses (/32)

Edge Router #1

Advertising the public prefix via BGP (203.0.113.0/24)
Advertising a default route via OSPF with a metric of 10. This makes it the default for the tenants.

Edge Router #2

Advertising the public prefix via BGP (203.0.113.0/24) with an AS prepend. This is the less preferred route externally (mostly).
Advertising a default route via OSPF with a metric of 110. This makes it the backup default route for the tenants.
Also, it advertises ISP #2’s prefixes via OSPF. This makes it the preferred route for just ISP #2’s prefixes.
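To make the tenants' path choice concrete, here is a simplified sketch of how a tenant picks a next hop from the OSPF-learned routes: the most specific prefix wins first, and the lowest metric breaks ties. The prefix 198.51.100.0/24 stands in for one of ISP #2's prefixes, 192.0.2.80 stands in for the remote resource, and the metric on the ISP #2 prefix is a guess; all three are hypothetical. Real routers also weigh administrative distance and route type, which this ignores.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Route:
    prefix: ipaddress.IPv4Network
    next_hop: str  # which edge router advertised the route
    metric: int    # OSPF metric as advertised


def select_route(routes, dst):
    """Simplified lookup: longest matching prefix first, then the lowest metric."""
    addr = ipaddress.ip_address(dst)
    candidates = [r for r in routes if addr in r.prefix]
    if not candidates:
        return None
    longest = max(r.prefix.prefixlen for r in candidates)
    most_specific = [r for r in candidates if r.prefix.prefixlen == longest]
    return min(most_specific, key=lambda r: r.metric)


# Hypothetical tenant routing table; 198.51.100.0/24 stands in for an ISP #2 prefix.
rib = [
    Route(ipaddress.ip_network("0.0.0.0/0"), "Edge #1", 10),
    Route(ipaddress.ip_network("0.0.0.0/0"), "Edge #2", 110),
    Route(ipaddress.ip_network("198.51.100.0/24"), "Edge #2", 110),
]

print(select_route(rib, "192.0.2.80").next_hop)      # Edge #1: default with metric 10 wins
print(select_route(rib, "198.51.100.5").next_hop)    # Edge #2: more specific prefix wins
print(select_route(rib[1:], "192.0.2.80").next_hop)  # Edge #2: backup default once Edge #1's route is gone
```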

Tenants' pfSense Devices

The WAN interface is statically configured with an IP address from the transit subnet (172.16.16.0/24)
No Upstream Gateway defined on the WAN interfaces
Their assigned public IPv4 addresses (/32) are configured as IP Aliases on the localhost interface
The WAN and localhost interfaces participate in OSPF.
NAT is manually configured.
NAT rules translate the source to the tenant's public IPv4 address for traffic leaving the WAN that is NOT destined for the transit subnet (i.e. public traffic).
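A rough sketch of the tenant NAT decision described above, assuming the only condition is whether the destination falls inside the transit subnet. The LAN host 10.0.0.5, the tenant's public /32 203.0.113.10, and the remote address 192.0.2.80 are hypothetical; the function name is illustrative only and not a pfSense API.

```python
import ipaddress

# Transit subnet from the setup above.
TRANSIT = ipaddress.ip_network("172.16.16.0/24")


def outbound_source(src, dst, public_ip):
    """Source address a packet carries as it leaves the tenant's WAN interface.

    Destinations on the transit subnet are left untranslated; anything else
    (i.e. public destinations) is translated to the tenant's public /32.
    """
    if ipaddress.ip_address(dst) in TRANSIT:
        return src
    return public_ip


# Hypothetical addresses: 10.0.0.5 is a LAN host, 203.0.113.10 is the tenant's /32.
print(outbound_source("10.0.0.5", "192.0.2.80", "203.0.113.10"))   # 203.0.113.10
print(outbound_source("10.0.0.5", "172.16.16.1", "203.0.113.10"))  # 10.0.0.5
```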

Everything works fine except in some instances where traffic is routed asymmetrically. Presumably, for some remote networks the AS prepend was not enough to make them send the return traffic back via ISP #1.
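Why a prepend is not always enough can be sketched with a heavily simplified BGP best-path comparison: LOCAL_PREF is compared before AS path length, so a remote network that prefers routes learned via ISP #2 will ignore the longer path. The single prepend, the LOCAL_PREF values, and treating the remote network as directly choosing between the two ISPs are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class BgpPath:
    via: str
    local_pref: int
    as_path: list


def best_path(paths):
    """Simplified BGP best-path choice: highest LOCAL_PREF, then shortest AS path.

    Real BGP has more steps (origin, MED, eBGP vs iBGP, ...), but LOCAL_PREF is
    compared before AS path length, so a prepend alone cannot force the choice.
    """
    return min(paths, key=lambda p: (-p.local_pref, len(p.as_path)))


via_isp1 = BgpPath("ISP #1", 100, [65001, 65500])
via_isp2 = BgpPath("ISP #2", 100, [65002, 65500, 65500])  # one prepend (hypothetical count)
print(best_path([via_isp1, via_isp2]).via)  # ISP #1: equal LOCAL_PREF, shorter path wins

# Same paths, but the remote network gives routes learned via ISP #2 a higher LOCAL_PREF.
via_isp2_pref = BgpPath("ISP #2", 200, [65002, 65500, 65500])
print(best_path([via_isp1, via_isp2_pref]).via)  # ISP #2: prepend is ignored
```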

For example:

Tenant A attempts to access an HTTP resource on ASN 65999.
Traffic leaves via ISP #1 (default route) as expected.
The response traffic returns via ISP #2.
Edge #2 forwards the return traffic to Tenant A as expected.
Tenant A does not accept the return traffic, and the connection is never established.

Questions

Why do the tenants not accept the return traffic when it comes back through a different edge router? It works perfectly fine when the traffic leaves and returns through the same edge router.
The traffic has the correct IP addresses and ports, so it should match the state, right?
Does pfSense keep MAC addresses as part of the state?

                      ┌──────────────────────────────────────────┐
                      │              Remote resource             │
                      │              ASN 65999                   │
                      └────────────────────┬─────────────────────┘
                                           │
                                           │
  ┌────────────────────────────────────────┴──────────────────────────────────────┐
  │                                                                               │
  │                                    INTERNET                                   │
  │                                                                               │
  └──────────────────────────┬────────────────────────────┬───────────────────────┘
                             │                            │
                     ┌───────┴───────┐          ┌─────────┴───────┐
                     │ ISP #1        │          │ ISP #2          │
                     │               │          │                 │
                     │ ASN 65001     │          │ ASN 65002       │
                     │ BGP           │          │ BGP             │
                     └──────┬────────┘          └────────┬────────┘
                            │                            │
                            │                            │
┌───────────────────────────┼────────────────────────────┼────────────────────────┐
│                           │           ASN 65500        │                        │
│                     ┌─────┴───────────┐        ┌───────┴─────────┐              │
│                     │ pfSense Edge #1 │        │ pfSense Edge #2 │              │
│                     │                 │        │                 │              │
│                     │ BGP & OSPF      │        │ BGP & OSPF      │              │
│                     └───────┬─────────┘        └────────┬────────┘              │
│                             │                           │                       │
│                             │                           │                       │
│                       ┌─────┴───────────────────────────┴──────┐                │
│                       │                                        │                │
│                       │  L2 Switch - 172.16.16.0/24            │                │
│                       │  OSPF Area 0                           │                │
│                       │                                        │                │
│                       └──┬──────────────┬──────────────────┬───┘                │
│                          │              │                  │                    │
│                          │              │                  │                    │
│                          │              │                  │                    │
│        ┌─────────────────┴─┐    ┌───────┴───────────┐     ┌┴───────────────┐    │
│        │ pfSense OSPF      │    │ pfSense OSPF      │     │ pfSense OSPF   │    │
│        │ Tenant A          │    │ Tenant B          │     │ Tenant C       │    │
│        └─────────┬─────────┘    └─────────┬─────────┘     └────────┬───────┘    │
│                  │                        │                        │            │
│        ┌─────────┴─────────┐     ┌────────┴─────────┐      ┌───────┴───────┐    │
│        │                   │     │                  │      │               │    │
│        │ Tenant A LAN      │     │  Tenant B LAN    │      │ Tenant C LAN  │    │
│        │                   │     │                  │      │               │    │
│        └───────────────────┘     └──────────────────┘      └───────────────┘    │
│                                                                                 │
└─────────────────────────────────────────────────────────────────────────────────┘

weehooey

We have resolved this issue.

From the example provided above, when the return traffic arrives at Edge #2, Edge #2 does not have a matching state for the connection.

The edge routers were dropping inbound TCP packets even though there was a firewall rule allowing them.

Digging into the pfSense documentation, we discovered that, by default, a firewall rule allowing TCP only matches packets with the SYN flag set and the ACK flag clear (flags S/SA). Any other TCP packet that does not match an existing state is dropped. Additionally, we found that traffic leaving the firewall (the out direction) is subject to a default deny for TCP packets without a SYN flag set.
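The flag test behind this can be sketched in a few lines, modelled on pf's "flags check/mask" idea: among the bits in the mask, exactly the bits in check must be set. The defaults model S/SA; check=0 and mask=0 model "any flags". The function name is hypothetical, and only the SYN and ACK bits are modelled.

```python
# TCP header flag bits.
SYN = 0x02
ACK = 0x10


def flags_match(pkt_flags, check=SYN, mask=SYN | ACK):
    """Among the bits in mask, exactly the bits in check must be set.

    The defaults model 'flags S/SA'; check=0, mask=0 models 'any flags'.
    """
    return (pkt_flags & mask) == check


print(flags_match(SYN))                         # True: initial SYN
print(flags_match(SYN | ACK))                   # False: SYN-ACK reply
print(flags_match(ACK))                         # False: mid-connection ACK
print(flags_match(SYN | ACK, check=0, mask=0))  # True: 'any flags'
```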

Normally, once the initial SYN packet has created a state, the rest of the TCP connection is matched against that state rather than against the rules.

Because the return traffic came back through a different edge router, that router had no state for the connection AND its firewall rule only allowed TCP packets with the SYN flag set, so the return traffic (a SYN-ACK, followed by ACKs) did not pass.
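A toy model of the two edge routers helps show why the symmetric case works and the asymmetric one does not: each router has its own state table, and a packet without a state only passes if it matches S/SA. This is a deliberate simplification; real pf keys states by direction and also tracks TCP sequence windows. The addresses and port 50000 are hypothetical.

```python
SYN, ACK = 0x02, 0x10


def conn_key(pkt):
    # Direction-independent key: the same connection matches in either direction.
    return (pkt["proto"], frozenset([(pkt["src"], pkt["sport"]), (pkt["dst"], pkt["dport"])]))


class EdgeRouter:
    def __init__(self, name):
        self.name = name
        self.states = set()  # each router has its own state table; nothing is shared

    def forward(self, pkt):
        """Return True if the packet passes this router."""
        key = conn_key(pkt)
        if key in self.states:
            return True  # matches an existing state
        # No state: only a pass rule with the default flags S/SA can match; it creates a state.
        if pkt["proto"] == "tcp" and (pkt["flags"] & (SYN | ACK)) == SYN:
            self.states.add(key)
            return True
        return False  # blocked


# Hypothetical addresses: Tenant A's public /32 and the remote web server in ASN 65999.
syn = {"proto": "tcp", "src": "203.0.113.10", "sport": 50000,
       "dst": "192.0.2.80", "dport": 80, "flags": SYN}
syn_ack = {"proto": "tcp", "src": "192.0.2.80", "sport": 80,
           "dst": "203.0.113.10", "dport": 50000, "flags": SYN | ACK}

edge1, edge2 = EdgeRouter("Edge #1"), EdgeRouter("Edge #2")
print(edge1.forward(syn))      # True: SYN leaves via Edge #1 and creates a state there
print(edge1.forward(syn_ack))  # True: a symmetric return matches that state
print(edge2.forward(syn_ack))  # False: Edge #2 has no state and the SYN-ACK fails S/SA
```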

Adding two floating rules fixed the issue.

A floating rule allowing TCP with any TCP flags inbound (of course, only for the traffic you actually want to allow).
A floating rule allowing TCP with any TCP flags out of the internal interface (again, matching only the traffic you want to pass).
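Why both rules are needed can be sketched by evaluating a stateless SYN-ACK on Edge #2 against the default behaviour, then with one and both floating rules added. The rule representation and function names are hypothetical, and interfaces, addresses, and ports are omitted; real rules should still match only the traffic you intend to pass.

```python
SYN, ACK = 0x02, 0x10


def rule_matches(rule, direction, flags):
    """A pass rule matches if the direction agrees and the flags check/mask test passes."""
    return rule["direction"] == direction and (flags & rule["mask"]) == rule["check"]


def passes(rules, direction, flags):
    return any(rule_matches(r, direction, flags) for r in rules)


def edge2_forwards(rules, flags):
    # Without a state, the packet must pass a rule inbound (from ISP #2) AND outbound
    # (toward the transit network); failing either one drops it.
    return passes(rules, "in", flags) and passes(rules, "out", flags)


# Default behaviour as described above: both directions only pass flags S/SA.
DEFAULT_RULES = [
    {"direction": "in", "check": SYN, "mask": SYN | ACK},
    {"direction": "out", "check": SYN, "mask": SYN | ACK},
]
# The two floating rules; check=0/mask=0 means "any flags".
FLOAT_IN = {"direction": "in", "check": 0, "mask": 0}
FLOAT_OUT = {"direction": "out", "check": 0, "mask": 0}

print(edge2_forwards(DEFAULT_RULES, SYN | ACK))                          # False
print(edge2_forwards(DEFAULT_RULES + [FLOAT_IN], SYN | ACK))              # False: still blocked outbound
print(edge2_forwards(DEFAULT_RULES + [FLOAT_IN, FLOAT_OUT], SYN | ACK))  # True
```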