Possible firewall bug - Confirmed with testbed

jonnytabpni

Hi Everyone,

I'm using "2.0-BETA2 built on Tue Jun 8 15:18:33 EDT 2010" and I may have found a possible bug. I started off the reporting in this post:

http://forum.pfsense.org/index.php/topic,26421.0.html

However I've decided to continue here as this is probably a 2.0 specific bug (I haven't tested 1.2.3 for this issue though).

I'm trying to use pfsense for 2 purposes. To bridge 2 interfaces together (WAN and PUBLIC), as well as keep a 3rd interface (LAN) for private (normal NATted) use.

The bug is related to traffic "leaking" from the PUBLIC interface over to the LAN interface. Here are the steps to reproduce:

1 ) Setup pfsense with 3 interfaces. WAN, LAN and PUBLIC
2 ) Bridge WAN and PUBLIC together (They will share the same subnet).
3 ) Make sure that the WAN interface is assigned an IP, but the PUBLIC interface is not
4 ) Make sure the LAN interface is setup as normal with a private IP range. The default "allow all" rule is ok here
5 ) Connect a host to the LAN interface (We will call this X)
6 ) In the WAN tab on pfsense, allow ICMP (ping) access from anywhere to the host setup in step 5. Also create a rule which allows ICMP (Ping) from anywhere to the pfsense WAN Interface address (you wouldn't do this in production, it's just to prove a point).
7 ) Connect a host to the PUBLIC interface (We will call this Z). Make sure that this host is using the pfsense WAN interface as its default gateway. I appreciate that this is not a standard setup (since we are using bridging and normally you would use an upstream gateway), however this is to prove a potential security risk, as we are assuming that hosts on PUBLIC are not trusted.
8 ) In the PUBLIC tab, make sure the only rule listed is a "block all" (this is just an extreme case to prove results). This is important, as the assumption is that hosts on the PUBLIC interface will not be able to access anything.
9 ) Reset state table (just to be sure)
10 ) Try and ping X from Z. It will probably fail, as you would expect
11 ) Now, try and ping Z from X. It will probably work (as you would expect). Actually, what I did was log into ssh on Z from X. But ping will probably work.
12 ) Now, try and ping X from Z again, and this time, it works. You will also find that Z can ping the pfsense WAN IP address.

You may actually find that 12) is not true, and may start to think that I'm being silly. However, if you try hard enough, you will eventually get traffic to leak. I know that is a terrible way of describing how to reproduce a bug, however please understand that this is a temperamental issue, and I'm not 100% sure how to make it rear its ugly head.

Has anyone else experienced this issue?

What I think it is, is that once you access a host on the PUBLIC subnet from the LAN, when subsequent requests from PUBLIC to LAN are made, pfsense is having trouble determining that the traffic is from PUBLIC, and instead thinks it's coming from WAN. This is possibly a MAC-Interface mapping mismatch.

Update: I found that issuing "arp -d -a" from the pfsense shell, as well as resetting the state table, did indeed cause this to appear. You might want to do this instead of step 11.

Thanks

jonnytabpni

It would be appreciated if a mod could let me know whether or not I should put this in a bug report in redmine.

Thanks

jimp

Before you do anything else, upgrade to a recent snapshot. Testing a bug on a month-old snapshot doesn't tell us anything conclusive.

cmb

pf keeps ICMP state by ICMP ID and source and destination IP. If you try enough, and your OS doesn't generate its ICMP IDs very randomly, you will get a collision that will allow traffic through in the opposite direction you might expect as long as the state is up matching the ID and IPs.
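
To illustrate, here is a toy Python model of that kind of lookup. It is only a sketch of the behaviour described above, not pf's actual code: the class, the method names and the addresses are all made up for the example, and it assumes a state matches in either direction as long as the ID and the two IPs agree.

```python
# Toy model only: NOT pf's implementation. All names and addresses are hypothetical.
from typing import NamedTuple


class IcmpState(NamedTuple):
    src: str
    dst: str
    icmp_id: int


class StateTable:
    def __init__(self):
        self._states = set()

    def add(self, src, dst, icmp_id):
        # Record a state for an allowed outbound echo request.
        self._states.add(IcmpState(src, dst, icmp_id))

    def matches(self, src, dst, icmp_id):
        # Assumption: a packet matches if a state exists for the same ICMP ID
        # and the same pair of IPs, in either direction.
        return (
            IcmpState(src, dst, icmp_id) in self._states
            or IcmpState(dst, src, icmp_id) in self._states
        )


table = StateTable()
# X (LAN side) pings Z with ICMP ID 4242; the LAN rule allows it and a state is created.
table.add("10.0.0.5", "203.0.113.20", 4242)

# Z pinging X with the same ID rides on the existing state.
print(table.matches("203.0.113.20", "10.0.0.5", 4242))
# Z pinging X with a different ID has no state and falls through to the rules.
print(table.matches("203.0.113.20", "10.0.0.5", 4243))
```

In this model the reverse-direction ping only gets through while the original state is alive and only when the IDs happen to coincide, which is why it would look intermittent.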

jonnytabpni

No problem, I'll upgrade to the latest snapshot.

As for ICMP state, I don't think it's related to ping at all. I did originally have the same theory as cmb, however I feel it's an issue with the arp table, as once I flush the Arp Cache, all services (I've tested with SSH) work for a while (Maybe 5 - 10 minutes or so). I think for those few minutes, pfsense thinks that traffic coming from Server Z originates on the WAN interface, and uses the WAN rules, instead of correctly using the PUBLIC rules. Remember that WAN and PUBLIC are bridged together which makes this theory possible.

Also, the O/S being used is CentOS, and I don't think I've ever experienced this problem before with Ping IDs?

What I will do tonight (after upgrading to the latest snapshot) is take ICMP out of the equation and test using just SSH. I will try to SSH into Server X from Server Z. Just to confirm, Server Z will be behind the PUBLIC interface, whose only rule will be "block all" (however, the WAN tab will allow SSH access to X).

Surely pfsense can't be classed as safe if it suddenly opened all ports, just because the arp cache was flushed?

scoop

Maybe I'm stating the obvious, but can you rule out the possibility of a network loop between PUBLIC and WAN? I.e. did you look with tcpdump to double check if traffic indeed is coming in from the interface PUBLIC?

jonnytabpni

I can indeed rule this out. This setup was done on a Xen box, however the WAN interface was a physical PCI NIC passed through to pfsense, while the PUBLIC NIC was a virtual NIC, so it would be impossible to have a loop.

Also, I did indeed check tcpdump on the pfsense box to confirm that the packets were indeed entering/leaving the "PUBLIC" interface (This was when I realised that there was a real problem and decided to report on the forums).

I also made sure that there weren't any MAC address conflicts.
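
For anyone wanting to repeat that check, here is a rough Python sketch that looks for a MAC address showing up on more than one interface in the ARP table. It assumes the usual FreeBSD "arp -an" line layout ("? (ip) at mac on iface ..."); the function name, the interface names and the sample lines are made up for illustration and are not output from my box.

```python
import re
from collections import defaultdict

# Assumed FreeBSD "arp -an" layout: ? (192.0.2.1) at 00:11:22:33:44:55 on em0 ...
ARP_LINE = re.compile(
    r"\((?P<ip>[^)]+)\) at (?P<mac>[0-9a-fA-F:]{17}) on (?P<iface>\S+)"
)


def macs_on_multiple_interfaces(arp_output):
    """Return {mac: [interfaces]} for MACs learned on more than one interface."""
    seen = defaultdict(set)
    for line in arp_output.splitlines():
        match = ARP_LINE.search(line)
        if match:  # incomplete entries have no MAC and are skipped
            seen[match.group("mac").lower()].add(match.group("iface"))
    return {mac: sorted(ifaces) for mac, ifaces in seen.items() if len(ifaces) > 1}


# Hypothetical sample text, not real output.
SAMPLE = """\
? (192.0.2.1) at 00:11:22:33:44:55 on em0 expires in 1180 seconds [ethernet]
? (192.0.2.50) at 00:aa:bb:cc:dd:ee on em0 expires in 900 seconds [ethernet]
? (192.0.2.50) at 00:aa:bb:cc:dd:ee on em2 expires in 900 seconds [ethernet]
? (10.0.0.5) at 00:16:3e:01:02:03 on em1 expires in 1100 seconds [ethernet]
"""

print(macs_on_multiple_interfaces(SAMPLE))
```

An empty result would only say that no MAC is currently listed on two interfaces; it would not prove anything about how the bridge itself learns addresses.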

cmb

@jonnytabpni:

Surely pfsense can't be classed as safe if it suddenly opened all ports, just because the arp cache was flushed?

It won't, ever, under any circumstances, do that. ARP has no impact at all on filtering; what you describe indicates you have a loop or some other path where systems can communicate without going through the firewall.

@jonnytabpni:

I can indeed rule this out. This setup was done on a Xen box, however the WAN interface was a physical PCI NIC passed through to pfsense, while the PUBLIC NIC was a virtual NIC, so it would be impossible to have a loop.

Not impossible at all, lots of ways to get such a scenario where that traffic isn't actually going through the firewall, bridging the NICs among others.

jonnytabpni

I'm not suggesting that it's the filtering that's going wrong. I'm suggesting that pfSense thinks that the traffic is coming from WAN instead of PUBLIC, so it uses WAN's rules instead of PUBLIC's.

I can assure you that there is no loop. Tcpdump has confirmed this for me.

Anyway, I will install the latest snapshot on a bare metal box, use a couple of laptops connected directly to the pfsense machine to test, and get back to you.

jonnytabpni

It's also very reproducible. All I have to do is reset the pfsense ARP cache and reset states, then for about 5 minutes, hosts connected to PUBLIC will use WAN's rules.

jonnytabpni

I can confirm that this is indeed a bug, as it happens with my clean test bed.

Here is the test bed:

Latest pfsense 2.0 snapshot. A brand new server with 2 NICs. One is configured as WAN, the other as LAN. I gave the WAN interface an IP, and gave LAN no IP. I then bridged the 2 interfaces together.

The only rule on the WAN tab was "allow all".
The only rule on the LAN tab was "block all".

I connected a PC to the WAN interface, to be used to access the WebGUI. I connected a host to the LAN interface. Once I reset the ARP cache and the state table on pfsense, the host connected to the LAN interface can ping the WAN IP of pfsense for about 5 minutes. This concurs with my results on my Xen system.

Should I file a bug report now? Is there anything else you would like me to do?

Thanks

jonnytabpni

Should I post this to Redmine?

jimp

The output of the following commands from the shell would also help:

# ifconfig -a

# cat /tmp/rules.debug