Our Sites become unavailable randomly
-
Strangely this morning we had both websites go down again, with "the connection has been reset" error.
During this time I did check DNS and it was responding no problem. It seems as if port 80 was not making it through PFSense.
I tried to enter Persistent Carp Maintenance Mode, however our sites still would not load.
After turning off PCMM and rebooting the first firewall it came back up immediately.
-
So let's see your firewall rules, and let's see sniffing on WAN and LAN to confirm that traffic is forwarded.
You're seeing nothing in the pfSense logs? Any errors? Anything? Nothing in the firewall logs showing that traffic was blocked?
So your mail was working while your sites were down?
So from the PM you sent I see 3 public IPs, .156, .157 and .158, and .158 and .157 are your MX boxes. Do you have 2 of them, or is this all being served up off your one 2k12 box? And you're just forwarding multiple IPs to the same private IP behind pfSense?
-
I am not sure I am looking in the right place to find relevant info. Maybe I can give you access to PFSense directly.
I did not check our mail ports specifically. I just checked if dns was resolving and it was.
Yeah we have three VMs we are running DNS on. They are not pointed to the same windows server.
-
So that changed? Were you not running DNS off your 2k12 box before?
If you PM me login info I will take a look.
-
No I didn't change anything recently. DNS is running off three different win server 2k12 R2 VMs. Those servers also do other stuff (light load)
-
Our main website just went down again. The only thing in the log is:
lighttpd[25396]: (connections.c.305) SSL: 1 error:1408A0C1:SSL routines:SSL3_GET_CLIENT_HELLO:no shared cipher
The strange thing is that our WebMail site is loading fine on one IP, yet our main site is not loading on another.
It's like PFSense is selectively blocking only one port 80 rule.
-
I show the site up.. The domain you sent me loads fine.
-
Yeah that's because I rebooted both our PFSense boxes already.
We have customers sending us files all day long, so when it's down it has to be back up right away.
Nothing I try in the GUI has any effect; entering Persistent CARP Maintenance Mode does not bring it up.
I even tried resetting states.
-
so did you do a sniff on wan and lan when it was down?
Was the firewall showing any blocks?
-
I am unfamiliar with how to sniff network traffic.
At the time of the problem there was nothing showing as blocked, however I unchecked "Log packets matched from the default block rules put in the ruleset".
Now when I look there are a few lines:
block/1000106024
Jun 4 10:50:29 WAN Block private networks from WAN block 192.168/16 (1000106024) 192.168.1.112:5351 224.0.0.1:5350 UDP
192.168.1.112 is the internal IP of our main PFSense box.
-
I believe so. You volunteer to test?
@cmb:
You also can't bring down a system with a few Mbps DDoS IF it's sized and configured accordingly to handle that kind of resource exhaustion attack.
-
Our main site is down again… How do I sniff the connection?
-
When entering persistent carp maintenance mode I would expect the site to come back up right away. However that doesn't work.
If I physically power off box one then the site comes back immediately when box two takes over.
-
Is there a chance the physical NIC is starting to fail? Is it always the same server and the same interface that fails? Is it possible to clone the questionable server onto different hardware?
I can give you some suggestions for setting up a packet sniffer. It's not too tough if the hardware you are using now supports port mirroring. If it does not, I can recommend a piece of hardware that will do it for you that is relatively inexpensive ($50). It would be a temporary thing, but you could always reuse the hardware in a similar fashion for other things.
-
There's no chance it's hardware related in my opinion.
The fact is it's only port 80 that is being blocked, sometimes for one IP and at times two public IPs have port 80 blocked.
Everything else works just fine, I can RDP through, FTP connections unaffected, etc. Our connection out is also not affected.
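The "only port 80, only some IPs" observation can be recorded with a small probe run from a machine outside the network. This is a rough sketch, not something from the thread: the addresses are documentation placeholders (the real public IPs were only shared by PM), the port list is an assumption based on the services mentioned (FTP, HTTP, RDP), and `probe_ports` is a made-up helper name.

```sh
#!/bin/sh
# Sketch: probe a few TCP ports on each public IP and print which ones answer.
# The addresses are placeholders, NOT the real IPs; replace them before use.
IPS="${IPS:-203.0.113.156 203.0.113.157 203.0.113.158}"
PORTS="${PORTS:-21 80 3389}"   # FTP, HTTP, RDP

probe_ports() {
    for ip in $IPS; do
        for port in $PORTS; do
            # nc -z only checks that the TCP connection opens;
            # -w 3 gives up after 3 seconds.
            if nc -z -w 3 "$ip" "$port" >/dev/null 2>&1; then
                echo "$(date '+%H:%M:%S') $ip:$port open"
            else
                echo "$(date '+%H:%M:%S') $ip:$port FAILED"
            fi
        done
    done
}
```

Calling `probe_ports` during an outage gives a timestamped list of which IP and port pairs fail, which can then be lined up against the firewall log and the packet captures.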
-
There's no chance it's hardware related in my opinion.
Given the symptoms, I agree; there's no way a hardware failure would discriminate between different types of traffic. Based on the description so far, I'm thinking it's actually the server returning the RST, but a packet capture will prove that one way or another.
How do I sniff the connection?
Diagnostics>Packet Capture. Choose interface WAN, count 0, click Start. Let it run for a minute or so while you attempt to load something from the Internet that isn't working. Click Stop, then download the capture. Repeat the same process but for interface LAN. If those are sub-20 MB, you can email them to me (cmb at pfsense dot org) as attachments with a link to this thread and I'll take a look. If they're bigger than that, upload them somewhere and send a URL via PM here is fine.
-
or you can just ssh to pfsense and do a tcpdump on both your wan and lan at the same time.. write those to a file for later viewing
This way you get dump going at same time both on the wan and the lan. If you need the actual command tell me what your interfaces are for your wan and lan, are they em0, re0 and em1, etc. etc.
something like
tcpdump -i em0 -w wancap.pcap
Hit Ctrl-C to stop the capture.
-
em1 is our WAN and em0 is our LAN
-
So SSH into pfSense twice.
In one of the shells cd to /tmp
tcpdump -i em1 -w wancap.pcap
On the second shell cd /tmp
tcpdump -i em0 -w lancap.pcap
Then, a few minutes after you have tested trying to get to your site, Ctrl-C both of those. Download the files to your favorite sniffer, Wireshark for example, and take a look.
Or I'm happy to take a look at them as well.
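For a first look without Wireshark, tcpdump can read the saved files back and show only the TCP resets on port 80. A minimal sketch, assuming the wancap.pcap and lancap.pcap names used above; `show_resets` is a made-up helper name, and the filter uses the standard pcap filter syntax for TCP flags.

```sh
#!/bin/sh
# Sketch: list TCP resets on port 80 in a saved capture.
RST_FILTER='tcp port 80 and (tcp[tcpflags] & tcp-rst != 0)'

show_resets() {
    # -nn: skip name and port lookups; -r: read packets from a file.
    tcpdump -nn -r "$1" "$RST_FILTER"
}
```

Run `show_resets /tmp/wancap.pcap` and then `show_resets /tmp/lancap.pcap`. If the LAN capture shows resets coming from the web server's private IP, that points at the server; if resets show up only on the WAN side, the firewall or something upstream is the more likely source.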
-
If you do it via SSH (which is handy because you can run both simultaneously) I'd make that:
tcpdump -i em0 -s 0 -w lancap.pcap
Adding the '-s 0' makes it grab the entire frame and not just the first 96 bytes. It probably won't matter either way in this case, but you're not capturing for so long that the frames need trimming, and it could prove helpful to have it all.
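If opening two SSH sessions is a nuisance, both captures can be started from one sh session as background jobs. This is only a sketch: the interface names come from this thread (em1 = WAN, em0 = LAN), while `start_captures`, `stop_captures` and the `WAN_IF`, `LAN_IF` and `OUT_DIR` variables are names invented here. It stops tcpdump with a plain `kill` (SIGTERM) instead of Ctrl-C, because background jobs in a non-interactive shell may ignore the interrupt signal.

```sh
#!/bin/sh
# Sketch: capture WAN and LAN at the same time from a single shell.
WAN_IF="${WAN_IF:-em1}"
LAN_IF="${LAN_IF:-em0}"
OUT_DIR="${OUT_DIR:-/tmp}"

start_captures() {
    # -s 0 keeps whole frames; -w writes raw packets to a file.
    tcpdump -i "$WAN_IF" -s 0 -w "$OUT_DIR/wancap.pcap" &
    WAN_PID=$!
    tcpdump -i "$LAN_IF" -s 0 -w "$OUT_DIR/lancap.pcap" &
    LAN_PID=$!
}

stop_captures() {
    # SIGTERM lets tcpdump close its output file before exiting.
    kill "$WAN_PID" "$LAN_PID" 2>/dev/null
    wait "$WAN_PID" "$LAN_PID" 2>/dev/null || true
}
```

Call `start_captures`, try to load the broken site for a minute or so, then call `stop_captures` and download both files from /tmp.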