Our Sites become unavailable randomly



  • We have two older IBM servers running PFSense 2.2.2 in carp mode.

    These things have been rock solid for two years now.

    Recently we had a crash on our main box, so I submitted the crash dump via the webui.

    Since then this box will stop our websites from loading after a while, very frequently. It works after a reboot for maybe half an hour then all sites we host go down and are unreachable.

    I have since been forced to use "Enter Persistent Carp Maintenance Mode" so that our backup could take over.

    This worked fine for a few days. On Tuesday I began setting up another PFSense box which has been taking me a while…

    Starting yesterday morning at 11:30 am our secondary box began blocking our websites in the same manner as before. A reboot fixed it until this morning, when yet again our websites were all offline.

    I'm a major noob with firewalls, however I was able to have PFSense working absolutely perfectly for two years. And now all of a sudden both my boxes die all the time.

    What could this be? And how can this happen on two different boxes?

    I think there is something wrong with 2.2.2



  • What could it be?  Hard to say without anyone seeing the configuration, rules, packages running, logs, etc.
    You say "older IBM servers", so a possible "what could it be" is "hardware failure".

    But again, it's really hard to diagnose anything without any information.  Think of what happens when you say "My car won't start.  Why?"



  • Speaking from my own experience, I seldom tend to do in-place upgrades on old hardware. Rather, I'll choose newer hardware to install an updated version and transfer my configs over. I ran an update once on some old Dell blades and they wouldn't even boot up after that! You say you've had these firewalls in place for two years, yet 2.2.2 has only been out a short while. I assume from this you've performed an in-place update then? Might be worth sourcing some newer tin and see if that solves your problem.

    Otherwise, as mer says, it really could be almost anything given the information provided.



  • Well I am not going to replace them since I have no budget for that.

    There's no way it was a hardware failure on both boxes at the same time.

    I always update PFsense… I mean why would there be a built in updater if it's not good to use it?

    I think the issue may be related to DNS, we have our DNS on windows boxes and PFSense just passes it through.

    I was hoping there might be something in the UI I could look for but after checking the logs I can't seem to find anything.



  • I have reinstalled from scratch 2.2.2 on my primary box. So far so good, it's been an hour and it's working ok. We will see if it was some problem with an update, time will tell.

    Thanks



  • @muswellhillbilly:

    Speaking from my own experience, I seldom tend to do in-place upgrades on old hardware. Rather, I'll choose newer hardware to install an updated version and transfer my configs over. I ran an update once on some old Dell blades and they wouldn't even boot up after that! You say you've had these firewalls in place for two years, yet 2.2.2 has only been out a short while. I assume from this you've performed an in-place update then? Might be worth sourcing some newer tin and see if that solves your problem.

    Otherwise, as mer says, it really could be almost anything given the information provided.

    Usually an issue when your hardware does not support modern standards, AHCI, MSI-X, UEFI, ACPI, etc. These standards have been around for a long time, but many places still sell hardware that does not. Make sure your hardware supports stuff like these and you'll be good for a long time.

    I always research my hardware before purchases and I haven't had any issues in 15+ years. Everything just works. Not to say these standards were around back then, but I always make sure I know what I'm buying to make sure it's as good as it can possibly be.



  • My backup box has been reinstalled as well…

    I have had no issue on the primary box yet.

    So far so good.



  • If it happens again you might want to look into the possibility of the system being compromised or being the target of a DoS attack.



  • @wheemer:

    What could this be? And how can this happen on two different boxes?

    I think there is something wrong with 2.2.2

    Sounds a lot like the symptoms of an IP or MAC conflict, though could be any number of other problems. It's most definitely not a general problem with 2.2.2.

    Where rebooting fixes something with symptoms along these lines it's most often because of what rebooting does to the switch(es) and/or router(s) the system is connected to (updating CAM and ARP tables), and nothing to do with actually rebooting.

    If it happens again, run a packet capture on WAN, filtered on one of the affected public IPs, and try to reach one of the sites in question. Stop the capture and see if anything is actually getting there. Are the WAN IPs dropping, or only the CARP IPs?

    If you're using the common VHIDs 1, 2, 3 etc. on your CARP IPs, I would change those to something significantly higher in the range. The VHID determines the virtual MAC, and VRRP uses the same virtual MAC space. It's possible your provider brought up VRRP using conflicting VHIDs, or you have something else on your network running CARP or VRRP with the same VHID/VRID causing a MAC conflict. Rebooting would temporarily make that system "win back" the MAC in question with the WAN-side switch, but it would lose it again at some point.
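
    To make the VHID/VRID collision concrete, here is a rough sketch of the mapping. It assumes the usual IPv4 scheme, where both CARP and VRRP build the virtual MAC as 00:00:5e:00:01:XX with XX being the ID in hex. The function name is made up for illustration.

```python
def virtual_mac(vhid: int) -> str:
    """Return the virtual MAC that CARP (and IPv4 VRRP) derive from an ID.

    Assumes the standard 00:00:5e:00:01:XX scheme, where XX is the
    VHID/VRID written as two hex digits. Illustrative helper only.
    """
    if not 1 <= vhid <= 255:
        raise ValueError("VHID/VRID must be between 1 and 255")
    return f"00:00:5e:00:01:{vhid:02x}"


# A CARP VHID of 1 and a VRRP VRID of 1 land on the same MAC address.
print(virtual_mac(1))    # 00:00:5e:00:01:01
print(virtual_mac(200))  # 00:00:5e:00:01:c8
```

    Two devices on the same segment that end up with the same ID will fight over that one MAC, which is why picking an uncommon VHID lowers the odds of a clash.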



  • @wheemer:

    Well I am not going to replace them since I have no budget for that.

    There's no way it was a hardware failure on both boxes at the same time.

    I always update PFsense… I mean why would there be a built in updater if it's not good to use it?

    I think the issue may be related to DNS, we have our DNS on windows boxes and PFSense just passes it through.

    I was hoping there might be something in the UI I could look for but after checking the logs I can't seem to find anything.

    Actually, you would be surprised at how common it is, especially considering how electronics are made in batches. Having two identical machines from the same small batch exposes you to the same batch of RAM chips, CPUs, PSUs, and HDDs.

    I have reinstalled from scratch 2.2.2 on my primary box. So far so good, it's been an hour and it's working ok. We will see if it was some problem with an update, time will tell.

    Thanks

    One of my first thoughts would be that your machine may have been compromised. Let's face it, who virus-checks their firewalls/routers?
    http://krebsonsecurity.com/2015/01/lizard-stresser-runs-on-hacked-home-routers/

    I'd also suggest rebooting the pfSense boxes after making any config changes, just to be sure everything sticks properly and conflicts don't arise, as there's a bug fixed in 2.2.3 which might have implications for your setup.



  • I set up a different server with a clean install of 2.2.2, and imported my config. Everything was working fine over the weekend, however on Monday it went down again. The strange thing is that I am always still able to remote desktop in through the box.

    So some parts of PFSense must not be affected. Also sometimes our webserver sites are offline, yet our email servers webmail works.

    Again, please keep in mind this configuration was working for a couple years without issue.



  • The network team at our fiber provider is saying that we are not under a denial of service attack… He says everything looks fine and that there is not that much traffic at all.



  • Our website just went down again.

    Our network team says there are 55 connections from Russia to port 53 on our DNS server.

    I have PFBlockerNG enabled where I am blocking all of Russia and all of China.

    Could this be related to our issue?



  • What do your logs show? Have you packet captured yet? If so, have you tried some of the packets against a test webserver or firewall?



  • I could not see anything in the logs, which makes sense since they should be denied.

    Our provider has blocked the IP address and everything is back to normal.

    I do not understand why our PFsense was able to be broken like that though. Seems a little bit unreliable that something as simple as DNS traffic can take down our whole website.



  • @wheemer:

    I could not see anything in the logs, which makes sense since they should be denied.

    Our provider has blocked the IP address and everything is back to normal.

    I do not understand why our PFsense was able to be broken like that though. Seems a little bit unreliable that something as simple as DNS traffic can take down our whole website.

    Poorly configured DNS servers are a main source for DDOS attacks.  I won't go into specifics, Google will give you some good reading, but someone can send a DNS query to your DNS server which generates a large response to the "target".

    I would advise against exposing a DNS server to the internet unless you absolutely need to and deeply understand how to configure it.  IMHO, block port 53 from the WAN and everything should be good.



  • Tim, you mean "block inbound to port 53 on WAN if it was not generated by LAN", yes?



  • @mer:

    Tim, you mean "block inbound to port 53 on WAN if it was not generated by LAN", yes?

    Yeah, that makes sense.

    I run DNS internally but that bind server also does root queries externally.  No external port 53 access to it (block inbound to port 53 on WAN if it was not generated by LAN).  Script kiddies are always on the lookout for a misconfigured service.



  • It's pretty vague to say poorly configured without saying what you mean exactly.

    We need port 53 open because we host our external dns.

    We have recursion disabled and places like intodns.com say our dns is fine.



  • Also our DNS is running from windows 2012 r2 with all updates.

    So all PFSense has to do is pass the packets through, yet it still tanks.


  • LAYER 8 Global Moderator

    "We need port 53 open because we host our external dns."

    I have to say this is normally a BAD idea - where is your secondary nameserver, on the same network?  It's almost always better to host your DNS with a DNS service, your registrar, or your webhost.  Nameservers should be geographically separated on different netblocks for redundancy, etc.

    If you really want to host your own DNS, a couple of cheap VPSes can do this nicely.  I have a couple of low-end $6-a-year VPSes that I run name services off of for a domain I was playing with DNSSEC on, since it's a shame how many DNS services, registrars, and even webhosts don't support DNSSEC, even though I'm pretty sure a few years back it became a requirement for registrars to support it to be accredited.

    So there were 55 connections from the site you were trying to block. Was the IP listed in the tables to be blocked?  Did you do a sniff to see what they were doing?  You say your websites went down - were they down because you could not resolve them from the outside, or were they down because of a bandwidth issue with the 55 connections to your DNS?

    Next time it happens, we need some actual details. Post your domain and we can do a DNS query to see if it resolves from outside.  Do a sniff: are you seeing DNS traffic to your nameservers, and are they answering?  Are these sites hosted locally as well?
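
    If it helps, the "does it resolve from outside" check can be sketched in a few lines of standard-library Python. This is only a minimal probe under some assumptions: it sends one non-recursive A-record query over UDP and treats any reply with a matching ID as "answering". Both function names are made up here, and the probe does not inspect the answer itself.

```python
import socket
import struct


def build_dns_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet for an A record (no recursion)."""
    # Header: ID, flags (0 = standard query, recursion not requested),
    # 1 question, 0 answer/authority/additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = IN (internet).
    return header + qname + struct.pack("!HH", 1, 1)


def nameserver_answers(server_ip: str, name: str, timeout: float = 3.0) -> bool:
    """Return True if the server sends back any reply with a matching ID."""
    query = build_dns_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Timeout or network error: treat as "no answer".
            return False
    return len(reply) >= 2 and reply[:2] == query[:2]
```

    Run from a machine outside the network, something like nameserver_answers("192.0.2.53", "www.example.com") (placeholder values, not from this thread) would tell you whether queries are making it through the firewall and back while the sites appear down.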



  • @wheemer:

    It's pretty vague to say poorly configured without saying what you mean exactly.

    We need port 53 open because we host our external dns.

    We have recursion disabled and places like intodns.com say our dns is fine.

    This is why:  http://www.circleid.com/posts/20150415_dns_based_ddos_diverse_options_for_attackers/

    It doesn't take a lot to use your DNS servers as a contributor to a larger attack.

    Your systems could be patched to the teeth, but if you've misconfigured your DNS servers, it doesn't matter.  Traffic will look legit because it seems to be, but the responses from your DNS servers will be inappropriate.



  • I am pretty sure my DNS servers are configured just fine, like I said. The attack had zero effect on my actual Windows 2012 R2 VM.

    Although I appreciate the DNS info and I will ultimately probably move DNS offsite, this is taking away from the obvious problems I experienced with PFSense.

    First of all with PFBlockerNG configured to block all of Russia, I believe the traffic should have been entirely blocked to begin with.

    Also since the DNS attack was not ending within PFSense but being forwarded to our VM, I am unsure how this was able to get PFSense to block DNS queries entirely.



  • Do you have a subscription for iblocklist or another provider to update pfBlocker?  It generally works, but there isn't a bullet-proof way to say for sure that an IP is originating from a specific geographic location or not.  Buying a subscription to a blocklist will help, but nothing is for sure.

    Have you installed a packet sniffer to collect data from a mirrored port in front of and behind pfSense?

    I'm not questioning your technical abilities, but check out the link I posted.  Major ISPs with what they thought were properly configured DNS servers were being used as DDOS traffic sources.  DNS is an ancient protocol in the scheme of things, and it can very easily be used as a source for DDOS.  In a similar fashion, the same applies to SMTP servers.  Remember that until only a few years ago you'd get a bounce message from a mail server when an address wasn't correct.  Since then any mail server providing that kind of response is blacklisted because those responses were a great form of DDOS.  The same applies to DNS.  I can query your servers in such a way that they provide large responses to a destination IP address.  There's no way to validate that the incoming request is authentic, so your server will readily respond to the request.

    Scan your site to see if you are configured properly:  http://openresolverproject.org
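
    The "large responses" point is just arithmetic. A quick sketch, where the byte counts are hypothetical numbers picked for illustration rather than measurements from any real server:

```python
def amplification_factor(query_bytes: int, response_bytes: int) -> float:
    """Ratio of bytes sent to the victim per byte the attacker sends."""
    if query_bytes <= 0:
        raise ValueError("query_bytes must be positive")
    return response_bytes / query_bytes


# Hypothetical sizes for illustration only: a 60-byte query that
# triggers a 3000-byte response.
factor = amplification_factor(60, 3000)
print(f"{factor:.0f}x")  # 50x
```

    The bigger the response a server is willing to send for a small spoofed query, the more useful it is to an attacker.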



  • Like I said I have recursion disabled so I am not an open resolver.

    However I will be moving DNS offsite, so that will be someone else's problem soon.

    So at this point I am considering the DNS issue dead.

    However I would like to have a clue about how passing 55 DNS connections through PFSense brings it down.

    When configuring PFBlockerNG it did not say a subscription to a blocklist is required at all. I simply chose Russia from the top 20 list. If this doesn't work then why even have the package available. PFBlockerNG was one of the main reasons I have been recommending PFSense to people a lot over the last couple years.

    Also Carp seems to not even see the problem and it never auto switched.

    Having two PFSense boxes in Carp had previously given me a lot of confidence in the uptime and security of our network. However seeing that it can be brought down by a small attack from a single IP is disheartening to say the least.


  • LAYER 8 Global Moderator

    Who said that the pfbng table for Russia includes the IP(s) that were connecting to you?  Did you validate that the IP was in the table/list?  If it was, it would have been blocked, provided the rule was active.

    Even non-recursive nameservers can have issues if specific queries are sent to them. Maybe those connections were designed to take out your DNS.  Without a sniff it's hard to say what they were doing or attempting to do, or why you had issues.

    Again why was your site down - was your ns not responding to public queries?  Or you just had reports that your site was down from someone that was maybe also blocked by pfbng?



  • The site was down because PFSense stops allowing the queries through.

    A reboot of the PFsense box immediately switches to the backup box and the website is online again. Once the main box actually comes back up the website remains available, until the next time DNS queries are blocked.

    Our ISP told us a couple of details about the attack. They said it was from Russia…

    Unfortunately I am a networking noob, I will admit it. I have never done a Wireshark capture.

    For the meantime the problem is solved. As soon as the IP was blocked on my ISP's side everything began working as expected.

    I just want to learn so that our internally hosted sites stay live as much as possible.



  • Nothing in network security is "set and forget".  For production sites, subscriptions to security services are strongly recommended.  Additionally, you may want to put a security appliance in front of pfSense.  Hosting your own sites comes with its challenges, and that's why hosting providers and colo's still exist.  They may be able to make security investments that you cannot, and that's just a sign of how complex and aggressive internet-based attacks have become.



  • Lol, a security appliance in front of what we already consider to be our security appliance? Isn't that what PFSense is already, a firewall?

    All this discussion is good, don't get me wrong, but it is all distracting from the real issue I was having.



  • @wheemer:

    Lol, a security appliance in front of what we already consider to be our security appliance? Isn't that what PFSense is already, a firewall?

    All this discussion is good, don't get me wrong, but it is all distracting from the real issue I was having.

    Unless you packet sniff, you won't be able to determine the root cause of the issue.

    Additionally, yes, security is done in layers, not with one device.  pfSense is a firewall and router, not a security appliance.  That is a completely different beast.

    For comparison, I have a Palo Alto appliance and an F5 in front of pfSense in our production data center.  Whatever gets past the Palo Alto hits the F5, and if it gets past that, then it gets to pfSense.  There are a few other things doing real-time packet capture and analysis in parallel too so I can acutely analyze attacks as they are modified by the attackers.

    Even with all of that in place, it's still a challenge to fend off a determined attacker, and at times I need to escalate to the ISP to stop the attacks further up the pipe.

    As I stated before, pfSense is not "set and forget" security.  Nothing about network security is that way, and it's always a PITA to try to stay ahead of the meanies.



  • @wheemer:

    Lol, a security appliance in front of what we already consider to be our security appliance? Isn't that what PFSense is already, a firewall?

    All this discussion is good, don't get me wrong, but it is all distracting from the real issue I was having.

    Seems to be a common tack.  People offer other solutions rather than explain why pfSense doesn't measure up to the task.

    There is another thread here where it's been demonstrated that pfSense can be knocked out by a SYN "flood" of only a few Mbps.  One of the typical responses was: DDoS should be mitigated upstream.  LOL.

    Apparently some people just can't face the fact that pfSense seems to have some serious issues, just because their usage model doesn't seem to be affected (at the moment).



  • @NOYB:

    @wheemer:

    Lol, a security appliance in front of what we already consider to be our security appliance? Isn't that what PFSense is already, a firewall?

    All this discussion is good, don't get me wrong, but it is all distracting from the real issue I was having.

    Seems to be a common tack.  People offer other solutions rather than explain why pfSense doesn't measure up to the task.

    There is another thread here where it's been demonstrated that pfSense can be knocked out by a SYN "flood" of only a few Mbps.  One of the typical responses was: DDoS should be mitigated upstream.  LOL.

    Apparently some people just can't face the fact that pfSense seems to have some serious issues, just because their usage model doesn't seem to be affected (at the moment).

    If you think pfSense is a security appliance, you're very misguided.

    Using a stateful packet inspection device to mitigate a DDOS SYN flood (or any DDOS, for that matter) is, well, a gross misunderstanding of what a security appliance is as opposed to what pfSense is.  DDOS, by design, is a capacity overflow attack.  That capacity could be bandwidth, states, or anything else.



  • @tim.mcmanus:

    If you think pfSense is a security appliance, you're very misguided.

    Don't tell me.  I never claimed pfSense is a security appliance. Tell these people.  https://www.pfsense.org/

    "Open Source Security"

    "We make network security easy."

    "Providing comprehensive network security solutions for the enterprise, large business and SOHO, pfSense solutions bring together the most advanced technology available to make protecting your network easier than ever before. Our products are built on the most reliable platforms and are engineered to provide the highest levels of performance, stability and confidence."

    @tim.mcmanus:

    Using a stateful packet inspection device to mitigate a DDOS SYN flood (or any DDOS, for that matter) is, well, a gross misunderstanding of what a security appliance is as opposed to what pfSense is.  DDOS, by design, is a capacity overflow attack.  That capacity could be bandwidth, states, or anything else.

    Regardless of one's philosophy on where and how DDOS attacks should be mitigated, as little as a few Mbps of any traffic should not take down a modern-day firewall.  If it does then it is flawed.  And to simply say it's not the proper place, device, etc. to mitigate is just making excuses.



  • @NOYB:

    Regardless of one's philosophy on where and how DDOS attacks should be mitigated, as little as a few Mbps of any traffic should not take down a modern-day firewall.  If it does then it is flawed.  And to simply say it's not the proper place, device, etc. to mitigate is just making excuses.

    You cannot mitigate DDOS with a stateful packet inspection device, period.  That's what PF is.  It's the wrong device to do the job.

    A stateless packet device acting as a firewall can.  That's why there's a market for $300K Palo Alto devices.  You get what you pay for.



  • Actually, I'll admit I'm wrong.  pfSense is a security appliance.  That was my mistake.

    The other mistake I made was classifying a DDOS as a security issue, which it is not.



  • @tim.mcmanus:

    You cannot mitigate DDOS with a stateful packet inspection device, period.  That's what PF is.  It's the wrong device to do the job.

    Which misses the point.  A few Mbps of any traffic, regardless of whether or not it is a DDOS, should not take down a modern firewall.  If it does then it is flawed.



  • You can't bring down a system with 55 DNS requests. Maybe if you're running Snort with auto-blocking and haven't whitelisted your own IPs, and the traffic triggers a signature in such a way that it ends up adding your own IPs to the block list. That's another explanation where a reboot would fix things, since it clears Snort's blocking table. Do you have Snort installed?

    Only 55 connections almost certainly isn't an open resolver. Those usually rack up hundreds of thousands or more on fast connections, completely filling your upload to the extent your connection will be nearly unusable (the fun of UDP floods). Your datacenter would have said something if you were using far more bandwidth than usual (at least if it's a worthwhile DC). I also have doubts those 55 connections even have a relation to the problem, though it's possible. Did you get packet captures of what they were doing?

    You also can't bring down a system with a few Mbps DDoS IF it's sized and configured accordingly to handle that kind of resource exhaustion attack.
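
    For a feel of what "sized accordingly" means, here is a back-of-the-envelope sketch of how many state entries a small SYN flood could keep alive at once. It assumes every SYN comes from a different spoofed source, so each one creates its own state. All three inputs are made-up illustration values, not pfSense defaults, and the function name is hypothetical.

```python
def syn_flood_states(mbps: float, packet_bytes: int, state_timeout_s: float) -> int:
    """Estimate how many half-open states a SYN flood keeps alive.

    Each SYN creates one state entry that lingers for state_timeout_s
    seconds, so the steady-state count is packets/second * timeout.
    """
    packets_per_second = (mbps * 1_000_000) / (packet_bytes * 8)
    return int(packets_per_second * state_timeout_s)


# Illustration values only: a 3 Mbps flood of 60-byte SYNs with
# each state held for 30 seconds.
states = syn_flood_states(mbps=3, packet_bytes=60, state_timeout_s=30)
print(states)  # 187500
```

    If that number is larger than the firewall's configured state table, legitimate new connections start getting dropped even though the bandwidth used is tiny.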



  • We are a small company with not that much traffic at all.

    I assure you that it was indeed the 55 UDP connections that were routed to our webserver, which also runs our DNS. As soon as my ISP blocked the IP on their end the problem went away and has not returned.

    I am not running many packages, just: mailreport, OpenVPN Client Export Utility, pfBlockerNG, RRD Summary and Service Watchdog.

    I did not even get details about the actual attacker. My ISP said they were from Russia, and I just assumed that if he saw immediately that it was Russian, then the IP must show up on a Russian country blocklist. I assure you that pfBlockerNG is enabled and that Russia and China from the top 20 list are the only two selected.


  • LAYER 8 Global Moderator

    And what was the IP or IPs?  Go to Diagnostics > Tables and pick the table you created under country. I assume it was IPv4 and not IPv6, etc.

    So attached is a very small sample. And where was the firewall rule using this?

    This data comes from "Geolite Data by Maxmind Inc. - ISO 3166". That does not mean it is 100% inclusive of every single netblock in Russia, or maybe it wasn't Russia and that is just what your ISP said; maybe it was Ukraine or something like that.  Without the IP in question, and validation that it was in your table to block and that the table was used correctly in a rule, there is no way to know.

    As to 55 connections taking him down: that would depend on many factors. What was the query exactly, and what version of DNS is he actually running?  A misconfiguration in, say, rate limiting could shoot you in the foot. We've got no details to work with here at all as to why his site was down.

    Let's see this firewall rule that is blocking Russia and China, and let's see the IP of this so-called 55-connection attacker, etc.
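
    Once you have the attacker's IP, checking it against the table contents is easy to script. A minimal sketch, assuming you have copied the table's CIDR entries into a list; the function name and the sample ranges (documentation-only netblocks) are placeholders, not real table data:

```python
import ipaddress


def matching_networks(ip: str, cidr_list: list[str]) -> list[str]:
    """Return every CIDR block from cidr_list that contains ip."""
    address = ipaddress.ip_address(ip)
    return [
        cidr
        for cidr in cidr_list
        # strict=False tolerates entries with host bits set.
        if address in ipaddress.ip_network(cidr, strict=False)
    ]


# Documentation-only ranges stand in for the real table contents.
table = ["192.0.2.0/24", "198.51.100.0/24"]
print(matching_networks("198.51.100.7", table))  # ['198.51.100.0/24']
print(matching_networks("203.0.113.9", table))   # []
```

    An empty result means the IP was never in the blocklist, so the rule could not have dropped it no matter how it was configured.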




  • I have no special settings at all (not using rate limiting), just a basic setup.

    I have not created a table… Pretty sure pfBlockerNG does this on its own. Also the rules are automatically created by pfBlockerNG.

    I already stated DNS is running from Windows 2012 R2.

