Our sites become unavailable randomly
-
I believe so. Are you volunteering to test?
@cmb:
You also can't bring down a system with a DDoS of a few Mbps IF it's sized and configured appropriately to handle that kind of resource exhaustion attack.
-
Our main site is down again… How do I sniff the connection?
-
When entering persistent CARP maintenance mode, I would expect the site to come back up right away. However, that doesn't work.
If I physically power off box one, the site comes back immediately when box two takes over.
-
When entering persistent CARP maintenance mode, I would expect the site to come back up right away. However, that doesn't work.
If I physically power off box one, the site comes back immediately when box two takes over.
Is there a chance the physical NIC is starting to fail? Is it always the same server and the same interface that fails? Is it possible to clone the questionable server onto different hardware?
I can give you some suggestions for setting up a packet sniffer. It's not too tough if the switch hardware you are using now supports port mirroring. If it does not, I can recommend a relatively inexpensive (around $50) piece of hardware that will do it for you. It would be a temporary thing, but you could always reuse the hardware in a similar fashion for other things.
-
There's no chance it's hardware related, in my opinion.
The fact is, only port 80 is being blocked: sometimes on one public IP, and at times on two public IPs.
Everything else works just fine: I can RDP through, FTP connections are unaffected, etc. Our outbound connection is also unaffected.
-
There's no chance it's hardware related, in my opinion.
Given the symptoms, I agree: there's no way a hardware failure would discriminate between different types of traffic. Based on the description thus far, I'm thinking it's actually the server returning the RST, but a packet capture will prove that one way or another.
How do I sniff the connection?
Diagnostics > Packet Capture. Choose interface WAN, count 0, and click Start. Let it run for a minute or so while you attempt to load something from the Internet that isn't working. Click Stop, then download the capture. Repeat the same process for interface LAN. If the files are under 20 MB, you can email them to me (cmb at pfsense dot org) as attachments with a link to this thread and I'll take a look. If they're bigger than that, upload them somewhere; sending the URL via PM here is fine.
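Once you have a capture, one quick way to test the "server is returning the RST" theory is to filter it for TCP resets on port 80. A minimal sketch, assuming the WAN capture was saved as wancap.pcap and that tcpdump is available wherever you read it (pfSense itself or a workstation); the script name rstcheck.sh and the rst_filter helper are hypothetical, only the tcpdump flags and filter syntax are real.

```shell
#!/bin/sh
# Hypothetical helper (rstcheck.sh): list TCP resets on a port in a saved capture.
# Usage: sh rstcheck.sh wancap.pcap [port]    (port defaults to 80)

# Build a BPF filter matching TCP segments on the given port with RST set.
rst_filter() {
    printf 'tcp port %s and tcp[tcpflags] & tcp-rst != 0\n' "${1:-80}"
}

if [ $# -ge 1 ]; then
    # -nn: no DNS/port-name resolution; -r: read from a file, not an interface
    tcpdump -nn -r "$1" "$(rst_filter "${2:-80}")"
fi
```

If the resets carry the web server's address as source, that points at the server rather than the firewall. In Wireshark, the display filter `tcp.flags.reset == 1` gives the same view.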
-
Or you can just SSH to pfSense and run tcpdump on your WAN and LAN at the same time, writing each to a file for later viewing.
If you need the actual commands, tell me which interfaces are your WAN and LAN (em0, re0, em1, etc.).
Something like:
tcpdump -i em0 -w wancap.pcap
Hit Ctrl-C to stop the capture.
-
em1 is our WAN and em0 is our LAN
-
So SSH into pfSense twice.
In one of the shells, cd to /tmp and run:
tcpdump -i em1 -w wancap.pcap
In the second shell, cd to /tmp and run:
tcpdump -i em0 -w lancap.pcap
Then, a few minutes after you have tested trying to get to your site, hit Ctrl-C in both shells. Download the files into your favorite sniffer (Wireshark, for example) and take a look.
Or I'm happy to take a look at them as well.
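Two SSH sessions work fine; the same thing can also be done from one session by backgrounding both captures. A minimal sketch, assuming em1 = WAN and em0 = LAN as posted above; the script name dualcap.sh and the WAN_IF, LAN_IF and OUT_DIR variables are illustrative, not pfSense settings.

```shell
#!/bin/sh
# Hypothetical helper (dualcap.sh): capture WAN and LAN together from ONE session.
# Defaults follow this thread (em1 = WAN, em0 = LAN); override via environment.
WAN_IF="${WAN_IF:-em1}"
LAN_IF="${LAN_IF:-em0}"
OUT_DIR="${OUT_DIR:-/tmp}"

start_captures() {
    # -s 0 keeps whole frames; each capture runs in the background.
    tcpdump -i "$WAN_IF" -s 0 -w "$OUT_DIR/wancap.pcap" &
    wan_pid=$!
    tcpdump -i "$LAN_IF" -s 0 -w "$OUT_DIR/lancap.pcap" &
    lan_pid=$!
    # A non-interactive shell may start background jobs with SIGINT ignored,
    # so on Ctrl-C explicitly send both captures SIGTERM to stop them.
    trap 'kill "$wan_pid" "$lan_pid" 2>/dev/null' INT TERM
    wait
}

if [ "${1:-}" = "start" ]; then
    start_captures
fi
```

Run `sh dualcap.sh start`, reproduce the problem, then press Ctrl-C once; both files end up in /tmp.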
-
If you do it via SSH (which is handy because you can run both simultaneously), I'd make that:
tcpdump -i em0 -s 0 -w lancap.pcap
adding '-s 0' so it grabs the entire frame and not just the first 96 bytes. It probably won't matter either way in this case, but you're not capturing for so long that trimming the frames is necessary, and it could prove helpful to have it all.
-
What sort of disk is being used (HDD, SSD, USB flash, flash card, etc.), and how old is it? Maybe something got corrupted.
I use a USB flash drive (on my second one now). When the first one started going bad, weird things would happen with pfSense until it was rebooted. A bunch of disk errors would be "fixed" and things would be fine for a few weeks. Wash, rinse, repeat.
Have you tried reinstalling?
Make a config backup to restore after the reinstallation, unless the config is simple enough to redo by hand.
If you have a spare disk, you could swap it into the machine for the reinstall so as to keep the current install in case things go badly.
-
Valid point, cmb. I always hate it when details are missing from a capture. But if you're just looking to see whether traffic is being forwarded, the snap length doesn't really matter. Still, I agree: it's always better to grab it all.
-
The install is running off six 18 GB 10,000 RPM SCSI drives in RAID 5.
I did reinstall on another server with different hard drives: same issue.
After the last drop yesterday I disabled pfBlockerNG, and we have had no issues since then. This morning I actually uninstalled pfBlockerNG.
Will update if things change.
-
I keep seeing this in the log:
Jun 4 19:09:59  WAN  Block private networks from WAN  block 192.168/16 (1000001584)  192.168.1.112:5351 -> 224.0.0.1:5350  UDP
192.168.1.112 is the IP of our main pfSense box. Any idea how to get rid of it?
-
Well, that is multicast traffic; I believe it's NAT-PMP status announcements, which go to 224.0.0.1 on port 5350 (5351 is the NAT-PMP service port).
Do you have UPnP enabled on pfSense?
Six 18 GB drives in RAID 5?? WTF?? For a firewall?
How old are those drives? Why would you not just run two in a mirror? It's not like you need the space.
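To see where that NAT-PMP traffic actually shows up, you could watch for it on each interface in turn. A minimal sketch, assuming em1 = WAN and em0 = LAN as posted earlier in the thread; the script name natpmp.sh and the natpmp_filter helper are hypothetical. NAT-PMP uses UDP 5351 for requests and multicasts its announcements to 224.0.0.1 on UDP 5350.

```shell
#!/bin/sh
# Hypothetical helper (natpmp.sh): show NAT-PMP traffic on one interface.
# Usage: sh natpmp.sh em1      (then em0 for comparison; Ctrl-C to stop)

# BPF filter for NAT-PMP: service port 5351 and announcements on 5350.
natpmp_filter() {
    printf 'udp and (port 5350 or port 5351)\n'
}

if [ $# -ge 1 ]; then
    # -e prints the MAC addresses, so you can tell which device sent each frame.
    tcpdump -nn -e -i "$1" "$(natpmp_filter)"
fi
```

If announcements sourced from the firewall's LAN address (192.168.1.112 here) also appear when capturing on the WAN NIC, the LAN multicast is reaching the WAN segment.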
-
Yes, we have UPnP enabled. Not sure why it would be blocking its own traffic?
We have a bunch of these older servers with six 18 GB hard drives in them. Since they're old and we have a lot of them, I figured why not, since that's the most reliable way to run them.
I thought progress had been made, since it stayed up all weekend long. However, yesterday while I was away from the office, the problem returned. There seems to be nothing at all in the logs from the time it happened.
-
Yes, we have UPnP enabled. Not sure why it would be blocking its own traffic?
Because you have your WAN and LAN interconnected somewhere, which is bad. Block private networks on WAN is blocking it because it's a private source IP on WAN. Fix your network so WAN and LAN aren't on the same broadcast domain.
That could be contributing to the problem you're seeing, or could even be the cause of it, depending on what other network brokenness you have. But since once again no useful data was gathered during the last occurrence of the problem, there's no telling.
-
I'm not sure how my LAN and WAN could be interconnected. I have separate network ports for each.
If it's a configuration issue, where could I check and what would I be looking for?
-
Interconnected at the switch level, unrelated to the firewall. Maybe your drop from your provider is plugged into the same switch as your LAN hosts, with no VLAN or other isolation.
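One way to confirm that kind of interconnection is to capture on the WAN NIC for anything sourced from the LAN's private range and note the MAC addresses. A minimal sketch, assuming em1 = WAN (per this thread) and a LAN inside 192.168.0.0/16; the script name wanleak.sh, the leak_filter helper, and the WAN_IF/LAN_NET variables are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (wanleak.sh): look for LAN-sourced frames on the WAN NIC.
# Usage: sh wanleak.sh start      (Ctrl-C to stop)
WAN_IF="${WAN_IF:-em1}"
LAN_NET="${LAN_NET:-192.168.0.0/16}"

# BPF filter: anything whose source address falls in the given network.
leak_filter() {
    printf 'src net %s\n' "$1"
}

if [ "${1:-}" = "start" ]; then
    # -e prints source/destination MACs, identifying which device sent each frame.
    tcpdump -nn -e -i "$WAN_IF" "$(leak_filter "$LAN_NET")"
fi
```

If your provider's modem itself uses a 192.168.x.x address, expect some hits from its MAC; frames carrying your LAN hosts' MACs, or the firewall's own LAN NIC MAC, are the ones that point to a shared broadcast domain.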
-
@cmb:
… there's no way a hardware failure would discriminate between diff types of traffic. ...
Not entirely true.
My Intel NIC supports scheduling interrupts differently based on TCP port.