Packets not hitting the rule they should

SteveFW

Hello,

I'm new to pfSense. I used to run a virtual CheckPoint firewall solution on VMware for many years. I did not renew the maintenance contract as my needs no longer justify the costs of a CheckPoint solution. I replaced my CheckPoint firewall "in-place" with pfSense 2.4.3. I built a pfSense firewall pair consisting of two VMware VMs using CARP, which works fine.

While the pfSense pair was isolated (to avoid IP conflicts etc.), I copied over all the firewall and NAT rules, then turned off the CheckPoint appliances, took the pfSense pair into production, and everything works.
I'm very happy with the results. I had a little trouble adjusting to the way pfSense works in some cases, but I have overcome all of that so far and everything works as expected now.

Disclaimer: as I'm used to CheckPoint, I work with floating rules exclusively (for me, it's the best way of doing rulebases). All rules have the "quick" parameter set. There are some auto-generated rules on some of the interfaces; I've marked them in their description fields to see if any of them get hit, but they never do. Everything is "caught" by floating rules as per my design. It all works as I expect. The last floating rule is the "Block all and log" rule and works as designed.

Having said that, I do have one issue and it's a nasty one (for me, hopefully not for you PF experts):

Disclaimer 2: I've gone through a lot of documentation, blogs etc., incl. https://www.openbsd.org/faq/pf/filter.html, but I can't find a solution.

My workstation is an iMac (192.168.10.51) running macOS High Sierra with MS Office Outlook. In the DMZ (VLAN30) there is a load balancer which acts as a reverse proxy in front of an Exchange server (192.168.10.19). The VIP in VLAN30 has IP 192.168.30.119.
When I start Outlook, it connects via HTTPS to the load-balanced VIP in the DMZ, which is 192.168.30.119. It starts out fine and the correct firewall rules are hit. Outlook syncs email and all is normal.
After a couple of seconds, the Outlook connection dies. It stops syncing. I quit Outlook, start it up again, and it syncs just fine. Then the connection dies again.

I am doing the exact same thing as folks in another VLAN (192.168.200.x), and their Outlook on macOS (same version everywhere) does not have problems at all. The only difference is that my iMac is in the same server VLAN (VLAN1, 192.168.10.x) as the Exchange server. But as said, my Outlook connects to the same VIP in VLAN30 as everybody else. (I have my reasons, as the IT admin, to have my machine in the server VLAN.)

Looking at the firewall log, I see my iMac (192.168.10.51) hitting the correct PASS rule (which explains why everything syncs and works for the first few seconds), and after a couple of seconds only the "Block all and log" rule gets hit.
The packets that hit the "pass" rule are session SYNs. The ones that get dropped are always FINs, RSTs and ACKs.
I briefly thought about asymmetric routing (because my iMac sits in the same L2 domain as the Exchange server and everybody else is in a different VLAN from it), but as my iMac really only connects via HTTPS (443) to the VLAN30 VIP, I ruled that out.

If I power down the pfSense pair, flush all ARP caches everywhere, and power on the CheckPoint VM, my problem is gone. My Outlook syncs happily and continues to do so.
If I power down the CheckPoint VM, flush all ARP caches everywhere, and power on the pfSense VMs, the problem is back. That leads me to say that "pfSense does things differently than CheckPoint". Not saying better or worse, just different, and I must adjust to the way pfSense works.

How do I go about troubleshooting this?

I attached a screenshot of the firewall log and the rule it should be hitting (and does in the beginning). I log "newest entries on top". At the bottom, you see the "pass" hits (as they should), then it starts to drop because the same traffic does not hit the correct rule anymore (which is what I don't understand).
![MacOS Outlook Problem.png](/public/imported_attachments/1/MacOS Outlook Problem.png)

Rule.png_thumb

johnpoz

You do understand that all the blocked entries in your firewall log are out-of-state traffic.. So yeah, they would be blocked. All I see in those blocks is A and FA and R..

https://doc.pfsense.org/index.php/Why_do_my_logs_show_%22blocked%22_for_traffic_from_a_legitimate_connection

To be honest this quite often points to asymmetrical routing as a problem..

If you want help on your rules you would have to post them.

"I work with floating-rules exclusively (for me, it's the best way of doing rulebases)"

Let's agree to disagree on that :) You're saying you're doing all your rules in floating vs the interfaces directly? How would that be better.. What it makes for is a LARGE rule list that is hard to parse looking at it..

SteveFW

"To be honest this quite often points to asymmetrical routing as a problem.."

Yeah that keeps creeping in my mind as well. But knowing the layout of the network as well as I do, I don't see it. CheckPoint hates asymmetric routing too and it will tell you loud and clear in the logs that it's happening (and drop the packets). But it's not saying anything and everything works fine (as described in my OP).

I'll do some more packet-sniffing, tcp-sharking and wire-dumping. Maybe I'm missing something…

"You're saying you're doing all your rules in floating vs the interfaces directly? How would that be better.. What it makes for is a LARGE rule list that is hard to parse looking at it.."
To each his own ;)
pfSense and other firewalls like Netscreen (now Juniper SSG) give/gave the admin the option to do either. pfSense calls it floating rules. Dunno what Netscreen used to call it back in the day. CheckPoint has always done it "floaty style" and I like it.
I have a LOT of VLANs, and therefore interfaces, on the network here. Clicking on each and every one of them to find a rule... too much of a hassle. The reason why I like the "floating" concept is that you have one massive list. It's on one page, and when you are consistent in naming and descriptions, a simple "find" in the browser or log viewer brings you where you want to be instantly. I have zero problems with working this way. I've been doing it for the past 16 years ;D

Whatever "floats" your boat eh 8)

johnpoz

Yeah, been working with firewalls since the first packet filters ;)

Juniper, Checkpoint, Cisco ASA and Pix, etc. Name it and I've probably had hands on it at some time or other.

How many VLANs do you have… 1000? Then you might have a point. 10 or 20 or so, then no, it's easier to work with the interfaces..

Like said going to agree to disagree here ;) But your log is showing out of state hits.. That traffic would never be passed be it you created a special log rule or not.. default log would log those as well if you left that on, etc.

From multiple years experience here on the boards. It's either something like a cell phone moving from cell to wifi trying to use the same session vs creating a new one. Or OLD sessions, etc.. But normally is an asymmetrical problem. You mention lots of vlans - any of those "downstream" at a different router? And not via a transit.

Without a diagram not going to be able to help you find your problem.

SteveFW

Hello John,

I solved my problem by adding a VLAN interface on my Mac in VLAN200 (where the other folks are) and routing VLAN30 through the VLAN200 interface on the pfSense pair.
My Mac now has a normal interface in the server VLAN1 and a VLAN interface in VLAN200. The default gateway is the pfSense interface in the server VLAN1, and I told my Mac to route 192.168.30.0/24 to the VLAN200 interface of the pfSense CARP.
I added the needed rules and voilà, no more funky chicken. Outlook sessions are kept open and it works like a charm.

Still, my curiosity about "WTF is going on" (or "was going on") remains. So as a diagram, picture this: the firewall in the middle with three legs. One is in the "server" VLAN1 (where my Mac and the Exchange server are). The second is in the client VLAN200. The third leg is VLAN30 (DMZ), where the load balancer has its public-facing interfaces and the VIP 192.168.30.119.
Very simple. The other 13 VLANs + WAN are not relevant.

I ran Wireshark on my Mac and on the Exchange server. They are (were) not exchanging packets in any way. My Mac (before adding it to VLAN200 as a second interface) only ever sent 443 traffic to the VIP in VLAN30.

What is interesting is that in the old situation, before my Mac got a leg in VLAN200, when I closed Outlook and verified that it really was gone and that netstat showed everything gone too, I still got dropped "FIN" and "ACK" packets up to a minute later (like in the screenshot above). So packets are swirling around for a while, belonging to nothing and thus not to any valid state. And they get dropped.

What is really weird is that my old CheckPoint gives me no trouble and shows no out-of-state stuff. If I power off the pfSense pair, clear ARP tables, and power on the CheckPoint box, everything works fine (I temporarily removed the VLAN200 interface before putting the CheckPoint back into service). When I start using the pfSense pair again, the problem is back and Outlook connections start dying when VLAN1 is used to communicate.

Again, I don't have an issue at the moment thanks to the "add VLAN200 to the Mac and route VLAN30 over it" trick, but I'm just curious why pfSense sees packets out of state.

johnpoz

"I'm just curious why pfSense sees packets out of state."

Because it sees a packet it has no state for ;) That is not a SYN..
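To make that concrete, here is a toy model of stateful filtering. It is only a sketch of the general idea (a SYN that matches a pass rule creates a state, later packets must match an existing state), not pf's actual implementation; `ToyStateTable`, `Packet` and the port numbers are made up for illustration.

```python
from typing import Callable, NamedTuple


class Packet(NamedTuple):
    src: str
    sport: int
    dst: str
    dport: int
    flags: str  # tcpdump-style TCP flags: "S", "SA", "A", "FA", "R"


class ToyStateTable:
    """Very simplified stateful filter: only a bare SYN can create a state."""

    def __init__(self) -> None:
        self.states = set()

    @staticmethod
    def _key(p: Packet) -> frozenset:
        # Direction-independent key so replies match the same state.
        return frozenset({(p.src, p.sport), (p.dst, p.dport)})

    def filter(self, p: Packet, rule_allows: Callable[[Packet], bool]) -> str:
        key = self._key(p)
        if key in self.states:
            return "pass (state)"
        if p.flags == "S" and rule_allows(p):
            self.states.add(key)
            return "pass (rule, state created)"
        # No state and not a SYN: falls through to the final block rule.
        return "block"


def allow_https_to_vip(p: Packet) -> bool:
    return p.dst == "192.168.30.119" and p.dport == 443


fw = ToyStateTable()
syn = Packet("192.168.10.51", 50000, "192.168.30.119", 443, "S")
reply = Packet("192.168.30.119", 443, "192.168.10.51", 50000, "SA")
# A FIN/ACK for a connection the firewall holds no state for
# (for example an expired state, or a SYN that took another path).
stray = Packet("192.168.10.51", 50001, "192.168.30.119", 443, "FA")

for pkt in (syn, reply, stray):
    print(pkt.flags, "->", fw.filter(pkt, allow_https_to_vip))
```

In this model the pass rule itself would allow the stray packet's addresses and port, yet it is still blocked, because a non-SYN packet is never evaluated as the start of a new connection.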

Placing a box on more than 1 VLAN normally leads to the kind of shit you're seeing. Why would your Mac need an interface in more than 1 VLAN??

That is not a picture that is words! – Actual picture worth 1000+ of those...

SteveFW

All right all right here is ya pictja.

Why does everything work hunky dory when I route over VLAN200 (dotted green line, which is my workaround) and not over VLAN1 (blue), which has my default gateway?
The default gateways for all VLANs are the .254 IPs in the respective networks, where something.254 is always the central firewall.

I'll need to run tcpdump on the pfSense to find out the root cause. Wireshark on the clients and servers shows nothing strange.

Diagram.png_thumb

johnpoz

Well that sort of setup is RIPE for asymmetrical traffic..

If your Mac sends to its gateway to get to .200.x, and .200.x answers back directly on their shared network, you have asymmetrical traffic.

Again, what is the POINT of the Mac having a leg in the .200.x network? Is it just for storage?

And it defeats the whole point of a firewall..

Where is the switch that connects all of these different networks to pfSense? Quite possibly you have VLANs that are not actually isolated, which can cause problems as well when you multi-home a box like your Mac.

SteveFW

No no no, having an interface in VLAN200 is a new thing, as it solves the problem of weird packet drops. I described it above. I solely route VLAN30 over the .200.254 interface now to make my Outlook work. It's a stupid workaround, but if it's stupid and it works, it's not stupid…

Forget about the second interface for now. Ignoring the green dotted line completely brings us to how it was when I started my thread. As you can see, it's a clean setup, and wiresharking shows no packets being sent (or attempted) via funny ways. My Mac REALLY only talks to the .119 VIP in VLAN30. Both ends show clean traffic flows and nothing flying in some weird direction. And again, the CheckPoint, which is picky as hell when it comes to asymmetric routing, causes no trouble.

So forget about that second interface on my Mac. Go back to the beginning. My question would be: how, besides having packet-dump proof of clean traffic flow, can I start understanding why pfSense thinks there are packets out of state, whereas the CheckPoint, configured in fundamentally exactly the same way, sees no issues and doesn't drop packets?
What is pfSense thinking? I think it has a different opinion as to "what a state is" compared to CheckPoint. I just want to understand why it has this opinion about these packets.
Outlook sessions always start out OK after the program is started. Then, within 10 to 15 seconds, the connection to the VIP starts to die and the logs start showing these dropped packets. When I press the "Fetch mail now" button in Outlook, it sets up a new connection, fetches mail, then 10 to 15 seconds later this new connection dies again.

This does not happen when I create a second interface in VLAN200 and forcefully route Outlook's connection to the VIP 192.168.30.119 over the 192.168.200.254 interface instead of the default gateway on 192.168.10.254.
The load-balancer pair that the VIP runs on has a default gateway of 192.168.30.254.

Netstat -rn output:

imac:~ steve$ netstat -rn -f inet
Routing tables

Internet:
Destination Gateway Flags Refs Use Netif Expire
default 192.168.10.254 UGSc 53 0 en0
127 127.0.0.1 UCS 0 0 lo0
127.0.0.1 127.0.0.1 UH 32 299344 lo0
169.254 link#5 UCS 0 0 en0
169.254 link#10 UCSI 0 0 vlan0
192.168.10 link#5 UCS 10 0 en0
192.168.10.19 0:50:56:a8:40:1e UHLWI 0 13238 en0 1160
192.168.10.24 0:11:32:33:c6:c3 UHLWIi 5 2971 en0 1044
192.168.10.51/32 link#5 UCS 0 0 en0
192.168.10.52 0:25:90:d8:2:8e UHLWIi 9 812 en0 1081
192.168.10.101 0:50:56:a8:54:9e UHLWIi 1 9361 en0 563
192.168.10.148 0:50:56:a8:54:a2 UHLWI 0 0 en0 898
192.168.10.204 0:50:56:ae:46:a6 UHLWIi 1 10699 en0 1192
192.168.10.237 0:50:56:ff:ea:3c UHLWIi 2 0 en0 758
192.168.10.245 link#5 UHLWIi 4 0 en0
192.168.10.249 f4:ea:67:89:32:7a UHLWI 0 0 en0 1106
192.168.10.254/32 link#5 UCS 1 0 en0
192.168.10.254 0:0:5e:0:1:78 UHLWIir 7 0 en0 1166
192.168.200 link#10 UCS 3 0 vlan0
192.168.200.234 2.11.32.20.c3.9b UHLWIi 3 14065744 vlan0 157
192.168.200.235/32 link#10 UCS 0 0 vlan0
192.168.200.254 0.0.5e.0.1.7c UHLWIi 2 0 vlan0 1163
224.0.0/4 link#5 UmCS 2 0 en0
224.0.0/4 link#10 UmCSI 1 0 vlan0
224.0.0.251 1:0:5e:0:0:fb UHmLWI 0 0 en0
239.255.255.250 1:0:5e:7f:ff:fa UHmLWI 0 357 en0
239.255.255.250 1:0:5e:7f:ff:fa UHmLWI 0 357 vlan0
255.255.255.255/32 link#5 UCS 0 0 en0
255.255.255.255/32 link#10 UCSI 0 0 vlan0
imac:~ steve$
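
For readers who want to check which way a given destination leaves the Mac, here is a minimal sketch of longest-prefix matching using Python's standard `ipaddress` module. The `ROUTES` list is a hand-copied subset of the network routes in the output above (host and multicast entries are left out), and `next_hop` is a made-up helper, so treat it as an illustration of the lookup rather than what macOS does internally.

```python
import ipaddress

# Subset of the table above: (destination, gateway, interface).
# "on-link" stands for the link#N entries (directly connected networks).
ROUTES = [
    ("0.0.0.0/0", "192.168.10.254", "en0"),
    ("192.168.10.0/24", "on-link", "en0"),
    ("192.168.200.0/24", "on-link", "vlan0"),
]


def next_hop(dst: str) -> tuple:
    """Return (gateway, interface) of the most specific matching route."""
    addr = ipaddress.ip_address(dst)
    matches = []
    for net, gateway, iface in ROUTES:
        network = ipaddress.ip_network(net)
        if addr in network:
            matches.append((network.prefixlen, gateway, iface))
    # The route with the longest prefix wins; the default route always matches.
    _, gateway, iface = max(matches)
    return gateway, iface


for dst in ("192.168.30.119", "192.168.10.19", "192.168.200.234"):
    print(dst, "->", next_hop(dst))
```

With only the routes listed here, the VIP 192.168.30.119 is reached through the default gateway on en0, while the Exchange server 192.168.10.19 is on-link and would be reached without passing the firewall.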

johnpoz

And how do you have these networks isolated at layer 2? That is the most important part which you have left out of your drawing.

Why does it show vlan0? Vs vlan200?

SteveFW

"And how do you have these networks isolated at layer 2? That is the most important part which you have left out of your drawing."
Switches and VMware doing VLANs.

"Why does it show vlan0? Vs vlan200?"
I assume that is what macOS calls the first VLAN interface that one creates. The second would be called "vlan1". The naming is independent of the VLAN ID used or the name given to it in the GUI (in the GUI it's called "Vlan200").

johnpoz

"Switches and VMware doing VLANs."

This is where you could have a problem..

Keep in mind your traffic might not be asymmetrical.. It could just be some badly behaving application… Or if you're running multiple layer 3 networks over the same layer 2 (i.e. no proper layer 2 isolation) you could get weird stuff happening.

Your idea of sniffing at pfsense makes sense to try and determine why pfsense is seeing out of state traffic.
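
A small sketch of how such a capture could be set up: the helper below only builds and prints a `tcpdump` command line for the flow in question, to be run on the pfSense shell. The function name `tcpdump_command` and the interface name `vmx2.30` are assumptions for illustration; the real interface names have to be looked up on the firewall. The `-e` option prints the link-level header, so the MAC addresses show which device actually sent each frame.

```python
import shlex


def tcpdump_command(interface: str, client: str, vip: str, port: int = 443) -> list:
    """Build a tcpdump argument list for one client/VIP flow (nothing is executed)."""
    # Standard pcap filter: only the traffic between the two hosts on that port.
    bpf = f"host {client} and host {vip} and tcp port {port}"
    # -n: no name resolution, -e: show MAC addresses, -i: capture interface
    return ["tcpdump", "-n", "-e", "-i", interface, bpf]


# "vmx2.30" is a hypothetical interface name, not taken from the thread.
cmd = tcpdump_command("vmx2.30", "192.168.10.51", "192.168.30.119")
print(shlex.join(cmd))
```

Running the same capture once on the VLAN1 side and once on the VLAN30 side and comparing them should show whether the SYN, the replies and the later FIN/RST packets all cross the firewall on the interfaces you expect.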

SteveFW

"Or if you're running multiple layer 3 networks over the same layer 2"

Both pfSense VMs have three vmxnet3 interfaces:
1 x WAN (VLAN1000) connected to a classical dvSwitch Portgroup that only has the relevant VLAN.
1 x SYNC (VLAN61) connected to a classical dvSwitch Portgroup that only has the relevant VLAN.
1 x TRUNK which is a VLAN trunk on the VMware level with only the 17 VLANs configured that I need to present to this particular firewall pair. VLAN200, VLAN30 and VLAN1 are amongst those that go "through" it.

Since all VLANs go through this Trunk Portgroup, and VLAN30 and VLAN200 don't bite each other, I cannot think of a reason why the combination of VLAN1 and VLAN200 should.

Attached a screenshot of one of the two pfSense nodes. Both are identical.

![pfSense vmNIC Config.png](/public/imported_attachments/1/pfSense vmNIC Config.png)