Issues after upgrade

PARC

Hello all,

Thanks for taking the time to look over this thread. I'm not sure which area of the forum this thread belongs in, so admins, please move it if you think it necessary.

We have been using pfSense happily for years, but our existing setup was starting to show its age and a refresh was needed. We purchased new hardware, installed a newer version of pfSense and put the new machine into service this past weekend. Previously we were running 1.2.3-RELEASE; we are now running 2.0.3-RELEASE. We did our best to migrate all of the settings from the previous version to the new one.

While the pfSense machine acts as our primary gateway/firewall, we also have another security appliance attached to the network that provides a site-to-site tunnel between two datacenters. This operates independently of the pfSense machine, the idea being that if the pfSense machine went down, there would still be connectivity between the two datacenters.

Traffic to the other side of the tunnel is handled by a static route in pfSense. Traffic bound for the other network is directed to the local IP of the security appliance, using a gateway defined on the pfSense interface that connects to the same switch as the appliance. This worked flawlessly in 1.2.3: all network services on both sides of the tunnel functioned properly with this setup.

However, this doesn't seem to be the case with 2.0.3. IPs on either side of the tunnel respond to ping, but servers on the other side of the tunnel are having problems. AD replication between the domain controllers on either side of the tunnel is failing, and other connectivity problems persist. For example, a telnet session from a server on the other side of the tunnel to port 25 of a mail server on the pfSense side usually fails, but sometimes connects after the mail server has been pinged (???). Even then, the connection is soon dropped. As a result, email notifications are down, which is a fairly significant issue.

RDP sessions to servers on the other side of the tunnel were being dropped at random, but this seems to be resolved after enabling the "Bypass firewall rules for traffic on the same interface" setting, which was missed when originally configuring 2.0.3 (it was enabled in 1.2.3). I had hoped this would resolve the remaining problems as well, but it has not.

Since the connectivity issues on the other side of the tunnel are intermittent (telnet sessions sometimes connect and sometimes don't, and established connections time out at random), this is a really tough issue to track down.

We'd welcome suggestions from anyone with some stripes in the pfSense world as to where to start troubleshooting this issue. If more details are required, we'd be happy to provide them.

Many thanks,
Greg

SeventhSon

Maybe a weird question: did you try loading a (virtual) box with 1.2.3 and your config, then upgrading it to 2.0.3? It seems you're reconfiguring manually, which makes it easy to miss something…

PARC

Hi SeventhSon, thanks for the reply.

We actually installed 2.0.3 on a physical box that we plunked a few network cards into, just to familiarize ourselves with the new layout and to try some of the features. But we elected to migrate the settings manually since it gave us a chance to review each setting.

We still have the previous hardware running 1.2.3. We could back up its configuration, upgrade it to 2.0.3, and switch back to that hardware to see if the problem persists.

But before we go that route, I'll give this thread some more time in case this is caused by a new feature or setting that someone can point us to…

Thanks,
Greg

SeventhSon

OK, fair enough, 'twas just a question :)

You'll probably have to give us the firewall rules, subnets and routes in question, because otherwise it would just be guessing. A good way to debug rules is to turn on logging for specific rules and see whether they get hit.

Going from your description (here we go, this is the guessing part), you're not running a default allow rule? Could you test with just a default allow rule and see whether the routing/IP part of it works perfectly?

PARC

SeventhSon (and others who have had a look at this post), thanks so far for the effort to help resolve this.

As requested, some additional details. Let's call the pfSense side of the tunnel the HOME side, and the other side the PROD side. pfSense is the primary gateway/firewall for the HOME side. There are 4 externally facing interfaces, which together provide 2 connections to each of 2 separate ISPs. Internally, the primary network on the HOME side is a class B-sized (/16) subnet, 10.10.0.0/16. There are 2 internal interfaces on the pfSense machine, each connecting to a switch. Let's call these LAN1 and LAN2.

The PROD side also has a /16 subnet, 11.11.0.0/16 (long after setting this up I realized it is actually a public block of IPs; my bad, rookie mistake. Could this be part of the problem?…), and a Cisco security appliance that plays the role of pfSense for that environment. It has an internal IP of 11.11.1.50. The site-to-site VPN tunnel between HOME and PROD is provided by a second Cisco security appliance on the HOME network, which is connected to the LAN2 switch and has an internal IP of 10.10.1.50.
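
Greg's suspicion about the address block is easy to confirm. This short Python sketch uses the standard ipaddress module to show that 10.10.0.0/16 falls inside RFC 1918 private space while 11.11.0.0/16 does not. One practical consequence: with the static route described below in place, traffic from HOME to any real Internet host in 11.11.0.0/16 would be sent into the tunnel rather than out a WAN link.

```python
import ipaddress

# The two networks described in this thread.
home = ipaddress.ip_network("10.10.0.0/16")
prod = ipaddress.ip_network("11.11.0.0/16")

# is_private is True for RFC 1918 space (and other IANA special-purpose blocks).
print(home.is_private)  # True
print(prod.is_private)  # False: 11.11.0.0/16 is public address space
```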

The pfSense configuration has a static route so that outbound traffic for the 11.11.0.0/16 network is routed through the LAN2 interface via a gateway at 10.10.1.50. This provides connectivity over the site-to-site tunnel to servers on the other side. On the PROD side, traffic from the 11.11.0.0/16 network is routed back to the 10.10.0.0/16 network via 11.11.1.50. The objective of this setup was that if pfSense failed, the site-to-site tunnel would be preserved. This configuration worked perfectly with pfSense 1.2.3.

There are two Windows domain controllers on the HOME side, and one on the PROD side. Since the pfSense 2.0.3 gateway was put in place, replication to the PROD DC has failed, and servers on the PROD side are unable to (reliably) send mail, as the mail server resides on the HOME side of the network.

My understanding of this is that traffic from the PROD side destined for the HOME side will be routed via the 11.11.1.50 gateway, traverse the secure tunnel, and arrive at the LAN2 switch to which the Cisco security appliance is connected.

The firewall rules for the LAN2 interface are as follows:

Proto  Source  Port  Destination  Port  Gateway          Queue  Schedule  Description
TCP    *       *     *            *     FailoverGateway  none             Allow LAN to Failover gateway
*      *       *     *            *     *                none             Allow LAN to Any
GRE    *       *     *            *     FailoverGateway  none             PPTP GRE Outgoing

Since this switch is connected to the LAN2 interface on pfSense, any blocked traffic should show up in the logs, but there isn't a single log entry for anything being blocked on the LAN2 interface based on this configuration.

My understanding of this setup is that pfSense shouldn't even be part of the equation: incoming traffic from PROD should be considered as part of the internal network on the LAN2 switch and should be able to reach any IP in the HOME network without pfSense's handling or scrutiny... but this assumption is obviously wrong.
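
Why pfSense stays in the picture can be sketched in a few lines of Python. The sketch assumes the HOME servers use pfSense as their default gateway (the address 10.10.0.1 is hypothetical), have no route of their own for 11.11.0.0/16, and that the HOME Cisco appliance treats all of 10.10.0.0/16 as on-link. The mail server and PROD server addresses are also made up for illustration. Under those assumptions, a request from PROD reaches the mail server directly from the Cisco appliance on the shared switch, but the reply goes to the mail server's default gateway, pfSense, which forwards it to 10.10.1.50 via the static route. pfSense therefore sees only one direction of the conversation, which is the asymmetric routing cmb describes at the end of the thread.

```python
import ipaddress

# Only 10.10.1.50, 11.11.1.50 and the two /16 networks come from the thread;
# the other addresses are hypothetical placeholders.
HOME_NET = ipaddress.ip_network("10.10.0.0/16")
PFSENSE_LAN = "10.10.0.1"   # assumed: default gateway configured on HOME servers
CISCO_HOME = "10.10.1.50"   # HOME tunnel appliance on the LAN2 switch
CISCO_PROD = "11.11.1.50"   # PROD security appliance
MAIL_SERVER = "10.10.2.25"  # assumed mail server address
PROD_SERVER = "11.11.5.10"  # assumed PROD client address


def first_hop_from_home_server(dst: str) -> str:
    """A HOME server with no extra routes: deliver on-link inside 10.10.0.0/16,
    otherwise hand the packet to the default gateway."""
    return dst if ipaddress.ip_address(dst) in HOME_NET else PFSENSE_LAN


# PROD -> HOME: the HOME Cisco delivers straight onto the shared switch.
request = [PROD_SERVER, CISCO_PROD, CISCO_HOME, MAIL_SERVER]

# HOME -> PROD reply: the mail server uses its default gateway, and pfSense's
# static route then forwards the packet to the Cisco appliance.
reply = [MAIL_SERVER, first_hop_from_home_server(PROD_SERVER),
         CISCO_HOME, CISCO_PROD, PROD_SERVER]

print("request via pfSense:", PFSENSE_LAN in request)  # False
print("reply via pfSense:", PFSENSE_LAN in reply)      # True
```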

As mentioned in my original post, the worst part of this issue is that it is intermittent. RDP connections from HOME to PROD are occasionally dropped. A telnet session from a server on the PROD side to port 25 on the HOME mail server is unlikely to connect until the mail server has been pinged, and once connected, the connection is dropped intermittently.

Again, thanks for reading and for any suggestions you can offer. If there are any more details I can provide, by all means let me know.

Regards,
Greg

PARC

Just as an experiment, I reverted to our previous hardware running pfSense 1.2.3. As soon as I plugged in the cables, the DCs resynchronized, RDP worked without issue, and I was able to telnet to the mail server at HOME from a server at PROD without issue.

So this is most certainly a pfSense issue. Tomorrow we will upgrade the old hardware to pfSense 2.0.3, as SeventhSon suggested, and see whether we get the same behavior.

PARC

After looking over our network diagram again, and doing some additional head scratching, I chatted with a colleague who confirmed my assumption that pfSense should have nothing to do with the incoming traffic from PROD that arrives on the LAN2 switch.

Since we've shown that pfSense is the issue, the only relevant configuration setting is the static route. The route is configured on the interface that is connected to the same switch as the Cisco appliance. Here are the details of the route configuration:

Network        Gateway              Interface  Description
11.11.0.0/16   Cisco - 10.10.1.50   LAN2       Tunnel to PROD

This was the same as it was on pfSense 1.2.3.

PARC

Hello all…

Just wanted to let you know that I've given up on pfSense for managing a static route from HOME to PROD. I removed the static route and gateway for it from pfSense, and set up routes manually on the servers at HOME that needed to route traffic to PROD. This worked immediately and resolved all my issues.
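
Greg's workaround amounts to giving each HOME server a more specific route for 11.11.0.0/16 via 10.10.1.50, so replies to PROD go straight to the Cisco appliance and pfSense drops out of both directions. The thread doesn't say how the routes were added; on Windows, a persistent route of this kind is typically created with "route -p add 11.11.0.0 mask 255.255.0.0 10.10.1.50". The Python sketch below models one server's longest-prefix lookup before and after such a route, assuming (hypothetically) that pfSense's LAN address and the servers' default gateway is 10.10.0.1.

```python
import ipaddress

# Hypothetical routing table of a HOME server. 10.10.1.50 and the two /16
# networks come from the thread; 10.10.0.1 as pfSense's LAN address is assumed.
SERVER_ROUTES = [
    ("0.0.0.0/0", "10.10.0.1"),      # default gateway: pfSense (assumed address)
    ("10.10.0.0/16", "on-link"),     # local HOME network, delivered directly
    ("11.11.0.0/16", "10.10.1.50"),  # manually added route: PROD via the Cisco appliance
]


def next_hop(dst: str, routes: list) -> str:
    """Pick the gateway for dst by longest-prefix match, as an IP routing table does."""
    addr = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(net), gw)
        for net, gw in routes
        if addr in ipaddress.ip_network(net)
    ]
    return max(matches, key=lambda m: m[0].prefixlen)[1]


prod_host = "11.11.5.10"  # hypothetical PROD server
print(next_hop(prod_host, SERVER_ROUTES))      # 10.10.1.50 (with the added route)
print(next_hop(prod_host, SERVER_ROUTES[:2]))  # 10.10.0.1 (without it: via pfSense)
```

The trade-off is that every HOME server that talks to PROD needs the route, whereas the pfSense static route covered all of them from one place.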

As far as I'm concerned the static route feature is broken in pfSense 2.0.3, at least per the configuration I have described above.

Thanks for reading.

Greg

SeventhSon

I'm using static routing on 2.0.3 like this, and it works just fine:

internet - pfsense 192.168.3.1 (static route to 4.1/24 via 3.201) - switch - computer (192.168.3.7 DG: 192.168.3.1, no static routes) 
                                                                           - pfsense (WAN 192.168.3.201) - LAN 192.168.4.1 - other clients

To me that sounds pretty similar (same?), and this works perfectly.

cmb

Static routes work fine. You probably have asymmetric routing in that case, so you need the bypass filtering option for static routes enabled under System > Advanced.
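
The effect cmb describes can be illustrated with a toy stateful filter in Python. It assumes, loosely in the spirit of pf's default "flags S/SA keep state" behaviour, that only an initial SYN may create TCP state, and it keys state on the address pair only, ignoring ports and TCP window tracking. It is a simplified illustration, not pf's actual logic. The addresses 10.10.2.25 (HOME mail server) and 11.11.5.10 (PROD server) are hypothetical, and the reply is assumed to enter and leave pfSense on LAN2. For a PROD-initiated connection the SYN goes directly from the Cisco appliance to the mail server, so the first packet pfSense sees is the SYN-ACK, which has no matching state. The bypass_same_interface flag stands in for the static route filtering option.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStatefulFirewall:
    """Toy model of stateful filtering; NOT how pf is implemented internally."""
    bypass_same_interface: bool = False  # stands in for the System > Advanced option
    states: set = field(default_factory=set)

    def handle(self, src: str, dst: str, flags: str, in_if: str, out_if: str) -> str:
        # Traffic entering and leaving the same interface skips filtering entirely.
        if self.bypass_same_interface and in_if == out_if:
            return "pass (bypass rule)"
        key = frozenset((src, dst))  # simplified: address pair only, no ports
        if key in self.states:
            return "pass (existing state)"
        if flags == "S":  # only a bare SYN may open new TCP state
            self.states.add(key)
            return "pass (new state)"
        return "block (no matching state)"


# PROD -> HOME telnet: pfSense never saw the SYN, so the mail server's SYN-ACK
# is the first packet of this connection it has to evaluate.
strict = ToyStatefulFirewall()
print(strict.handle("10.10.2.25", "11.11.5.10", "SA", "LAN2", "LAN2"))
# block (no matching state)

bypass = ToyStatefulFirewall(bypass_same_interface=True)
print(bypass.handle("10.10.2.25", "11.11.5.10", "SA", "LAN2", "LAN2"))
# pass (bypass rule)
```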