Our Sites become unavailable randomly

cmb

What could this be? And how can this happen on two different boxes?

I think there is something wrong with 2.2.2

Sounds a lot like the symptoms of an IP or MAC conflict, though it could be any number of other problems. It's most definitely not a general problem with 2.2.2.

Where rebooting fixes something with symptoms along these lines, it's most often because of what rebooting does to the switch(es) and/or router(s) the system is connected to (updating their CAM and ARP tables), and has nothing to do with restarting the firewall itself.

If it happens again, run a packet capture on WAN, filtered on one of the affected public IPs, and try to reach one of the sites in question. Stop the capture and see if anything is actually getting there. Are the WAN IPs dropping, or only the CARP IPs?

If you're using the common VHIDs 1, 2, 3 etc. on your CARP IPs, I would change those to something significantly higher in the range. The VHID determines the virtual MAC, and VRRP uses the same virtual MAC space. It's possible your provider brought up VRRP using conflicting VHIDs, or you have something else on your network running CARP or VRRP with the same VHID/VRID, causing a MAC conflict. Rebooting would temporarily make that system "win back" the MAC in question with the WAN-side switch, but it would lose it again at some point.
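To make the VHID/VRID overlap concrete, here is a minimal Python sketch of how the virtual MAC is derived. It assumes the usual 00:00:5e:00:01:xx prefix shared by CARP and IPv4 VRRP; the function name is just for illustration.

```python
def carp_virtual_mac(vhid: int) -> str:
    """Return the virtual MAC for a CARP VHID (same scheme as an IPv4 VRRP VRID)."""
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be between 1 and 255")
    # The last octet of the virtual MAC is simply the VHID in hex.
    return f"00:00:5e:00:01:{vhid:02x}"


# A CARP VHID of 1 and a VRRP VRID of 1 on the same segment map to the same MAC.
print(carp_virtual_mac(1))    # 00:00:5e:00:01:01
print(carp_virtual_mac(200))  # 00:00:5e:00:01:c8
```

Two devices on the same broadcast domain that produce the same value here will fight over that MAC in the switch's CAM table, which is why picking a less common VHID helps.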

firewalluser

@wheemer:

Well I am not going to replace them since I have no budget for that.

There's no way it was a hardware failure on both boxes at the same time.

I always update PFsense… I mean why would there be a built in updater if it's not good to use it?

I think the issue may be related to DNS, we have our DNS on windows boxes and PFSense just passes it through.

I was hoping there might be something in the UI I could look for, but after checking the logs I can't seem to find anything.

Actually you would be surprised at how common it is, especially when considering how batches of electronics are made. Having two identical machines, i.e. a small batch, exposes you to the same batch of RAM chips, batch of CPUs, batch of PSUs, and batch of HDDs.

I have reinstalled from scratch 2.2.2 on my primary box. So far so good, it's been an hour and it's working ok. We will see if it was some problem with an update, time will tell.

Thanks

One of my first thoughts would be that your machine may have been compromised. Let's face it, who virus checks their firewalls/routers?
http://krebsonsecurity.com/2015/01/lizard-stresser-runs-on-hacked-home-routers/

I'd also suggest rebooting the pfSense boxes after making any config changes, just to be sure everything sticks properly and conflicts don't arise, as there's a bug fixed in 2.2.3 which might have implications for your setup.

wheemer

I set up a different server with a clean install of 2.2.2, and imported my config. Everything was working fine over the weekend, however on Monday it went down again. The strange thing is that I am always still able to remote desktop in through the box.

So some parts of PFSense must not be affected. Also, sometimes our webserver sites are offline, yet our email server's webmail works.

Again, please keep in mind this configuration was working for a couple years without issue.

wheemer

Our network team from our Fiber is saying that we are not under a denial of service attack… He says everything looks fine and that there is not that much traffic at all.

wheemer

Our website just went down again.

Our network team says there are 55 connections from Russia to port 53 on our DNS server.

I have PFBlockerNG enabled where I am blocking all of russia and all of china.

Could this be related to our issue?

firewalluser

What do your logs show? Have you packet captured yet? If so, have you tried some of the packets against a test webserver or firewall?

wheemer

I could not see anything in the logs, which makes sense since they should be denied.

Our provider has blocked the IP address and everything is back to normal.

I do not understand why our PFsense was able to be broken like that though. Seems a little bit unreliable that something as simple as DNS traffic can take down our whole website.

tim.mcmanus

@wheemer:

I could not see anything in the logs, which makes sense since they should be denied.

Our provider has blocked the IP address and everything is back to normal.

I do not understand why our PFsense was able to be broken like that though. Seems a little bit unreliable that something as simple as DNS traffic can take down our whole website.

Poorly configured DNS servers are a main source for DDOS attacks. I won't go into specifics, Google will give you some good reading, but someone can send a DNS query to your DNS server which generates a large response to the "target".

I would advise against exposing a DNS server to the internet unless you absolutely need to and deeply understand how to configure it. IMHO, block port 53 from the WAN and everything should be good.

mer

Tim, you mean "block inbound to port 53 on WAN if it was not generated by LAN", yes?

tim.mcmanus

@mer:

Tim, you mean "block inbound to port 53 on WAN if it was not generated by LAN", yes?

Yeah, that makes sense.

I run DNS internally but that bind server also does root queries externally. No external port 53 access to it (block inbound to port 53 on WAN if it was not generated by LAN). Script kiddies are always on the lookout for a misconfigured service.

wheemer

It's pretty vague to say poorly configured without saying what you mean exactly.

We need port 53 open because we host our external dns.

We have recursion disabled and places like intodns.com say our dns is fine.

wheemer

Also our DNS is running from windows 2012 r2 with all updates.

So all PFSense has to do is pass the packets through, yet it still tanks.

johnpoz

"We need port 53 open because we host our external dns."

I have to say this is normally a BAD idea - where is your secondary nameserver, on the same network? It's almost always better to host your DNS with a DNS service, your registrar, or your webhost. NS should be geographically separated on different netblocks for redundancy, etc.

If you really want to host your own DNS, a couple of cheap VPSes can do this nicely. I have a couple of $6-a-year low-end VPSes that I run name services on for a domain I was playing with DNSSEC on, since it's a shame how many DNS services, registrars or even webhosts don't support DNSSEC, even though I'm pretty sure a few years back it became a requirement for registrars to support it to be accredited.

So there were 55 connections from a country you were trying to block. Was the IP listed in the tables to be blocked? Did you do a sniff to see what they were doing? You say your websites went down - were they down because you could not resolve them from the outside, or were they down because of a bandwidth issue with the 55 connections to your DNS?

Next time it happens we need some actual details. Post up your domain and we can do a DNS query to see if stuff resolves from outside. Do a sniff: are you seeing DNS traffic to your NS, and are they answering? Are these sites hosted locally as well?
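As a rough sketch of the "do a DNS query from outside" check, something like the following Python could be run from a machine outside the network. It uses only the standard library; the function names, the server IP and the domain in the usage comment are placeholders, not anything from this thread.

```python
import socket
import struct


def build_dns_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet (QTYPE 1 = A record, QCLASS 1 = IN)."""
    # Header: ID, flags (0 = standard query, no recursion desired),
    # one question, no answer/authority/additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def nameserver_answers(server_ip: str, name: str, timeout: float = 3.0) -> bool:
    """Return True if the server sends back any DNS response to our query over UDP."""
    query = build_dns_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, 53))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # covers timeouts and unreachable errors
            return False
    # A valid reply echoes our query ID and has the QR (response) bit set.
    return len(reply) >= 12 and reply[:2] == query[:2] and bool(reply[2] & 0x80)


# Hypothetical usage, run from outside the firewall:
#   nameserver_answers("192.0.2.53", "example.com")
```

If this returns False from outside while the DNS server answers fine from the LAN, the queries are being dropped somewhere in between, and a capture on WAN and LAN would show on which side they disappear.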

tim.mcmanus

@wheemer:

It's pretty vague to say poorly configured without saying what you mean exactly.

We need port 53 open because we host our external dns.

We have recursion disabled and places like intodns.com say our dns is fine.

This is why: http://www.circleid.com/posts/20150415_dns_based_ddos_diverse_options_for_attackers/

It doesn't take a lot to use your DNS servers as a contributor to a larger attack.

Your systems could be patched to the teeth, but if you've misconfigured your DNS servers, it doesn't matter. Traffic will look legit because it seems to be, but the responses from your DNS servers will be inappropriate.

wheemer

I am pretty sure my DNS servers are configured just fine, like I said. The attack had zero effect on my actual Windows 2012 R2 VM.

Although I appreciate the DNS info and I will ultimately probably move DNS offsite, this is taking away from the obvious problems I experienced with PFSense.

First of all with PFBlockerNG configured to block all of Russia, I believe the traffic should have been entirely blocked to begin with.

Also since the DNS attack was not ending within PFSense but being forwarded to our VM, I am unsure how this was able to get PFSense to block DNS queries entirely.

tim.mcmanus

Do you have a subscription for iblocklist or another provider to update pfBlocker? It generally works, but there isn't a bullet-proof way to say for sure whether an IP originates from a specific geographic location. Buying a subscription to a blocklist will help, but nothing is for sure.

Have you installed a packet sniffer to collect data from a mirrored port in front of and behind pfSense?

I'm not questioning your technical abilities, but check out the link I posted. Major ISPs with what they thought were properly configured DNS servers were being used as DDOS traffic sources. DNS is an ancient protocol in the scheme of things, and it can very easily be used as a source for DDOS. In a similar fashion, the same applies to SMTP servers. Remember, it was only a few years ago that you'd get a bounce message from a mail server when you sent to an address that didn't exist. Since then any mail server providing that kind of response is blacklisted, because those responses were a great form of DDOS. The same applies to DNS. I can query your servers in such a way that they send large responses to a destination IP address of my choosing. There's no way to validate that the incoming request is authentic, so your server will readily respond to the request.
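A back-of-the-envelope way to see why large responses matter, as a small Python sketch. The packet sizes and the attacker bandwidth below are made-up illustrative numbers, not measurements from this incident.

```python
def amplification_factor(query_bytes: int, response_bytes: int) -> float:
    """Ratio of response size to query size for a spoofed DNS request."""
    return response_bytes / query_bytes


def reflected_mbps(attacker_mbps: float, query_bytes: int, response_bytes: int) -> float:
    """Traffic arriving at the victim if every spoofed query is answered."""
    return attacker_mbps * amplification_factor(query_bytes, response_bytes)


# Hypothetical numbers: a 64-byte query that triggers a 3200-byte response.
print(amplification_factor(64, 3200))   # 50.0
print(reflected_mbps(10.0, 64, 3200))   # 500.0
```

The point is only that the server does the heavy lifting: a small stream of spoofed queries turns into a much larger stream of responses aimed at someone else.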

Scan your site to see if you are configured properly: http://openresolverproject.org

wheemer

Like I said I have recursion disabled so I am not an open resolver.

However I will be moving DNS offsite, so that will be someone else's problem soon.

So at this point I am considering the DNS issue dead.

However I would like to have a clue about how passing 55 DNS connections through PFSense brings it down.

When configuring PFBlockerNG it did not say a subscription to a blocklist is required at all. I simply chose Russia from the top 20 list. If this doesn't work then why even have the package available? PFBlockerNG was one of the main reasons I have been recommending PFSense to people a lot over the last couple years.

Also Carp seems to not even see the problem and it never auto switched.

Having two PFSense boxes in Carp had previously given me a lot of confidence in the uptime and security of our network. However seeing that it can be brought down by a small attack from a single IP is disheartening to say the least.

johnpoz

Who said that the pfbng table for Russia included the IP(s) that were connecting to you? Did you validate that the IP was in the table/list? If it was, then it would have been blocked, provided the rule was active.
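If you can export the table contents as a list of CIDR blocks, checking whether a given IP is covered takes only a few lines of Python. This is just a sketch: the networks and addresses below are documentation ranges used as stand-ins, not the real Russia list or the attacker's IP.

```python
import ipaddress


def ip_in_blocklist(ip: str, cidrs: list[str]) -> bool:
    """Return True if ip falls inside any of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    # strict=False tolerates entries with host bits set, e.g. "192.0.2.5/24".
    return any(addr in ipaddress.ip_network(cidr, strict=False) for cidr in cidrs)


# Placeholder table contents.
table = ["192.0.2.0/24", "198.51.100.0/24"]

print(ip_in_blocklist("198.51.100.25", table))  # True
print(ip_in_blocklist("203.0.113.7", table))    # False
```

If the offending IP comes back False against the exported list, the country table never covered it, and the block rule could not have matched.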

Even a non-recursive NS can have issues if specific queries are sent to it. Maybe those connections were designed to take out your DNS. Without a sniff it's hard to say what they were doing or attempting to do, or why you had issues.

Again why was your site down - was your ns not responding to public queries? Or you just had reports that your site was down from someone that was maybe also blocked by pfbng?

wheemer

The site was down because PFSense stops allowing the queries through.

A reboot of the PFsense box immediately switches to the backup box and the website is online again. Once the main box actually comes back up the website remains available, until the next time DNS queries are blocked.

Our ISP told us a couple details about the attack. They said it was from Russia…

Unfortunately I am a networking noob, I will admit it. I have never done a Wireshark capture.

For the meantime the problem is solved. As soon as the IP was blocked on my ISP's side everything began working as expected.

I just want to learn so that our internally hosted sites stay live as much as possible.

tim.mcmanus

Nothing in network security is "set and forget". For production sites, subscriptions to security services are strongly recommended. Additionally, you may want to put a security appliance in front of pfSense. Hosting your own sites comes with its challenges, and that's why hosting providers and colo's still exist. They may be able to make security investments that you cannot, and that's just a sign of how complex and aggressive internet-based attacks have become.