Our Sites become unavailable randomly
-
I keep seeing this in the log:
Jun 4 19:09:59 WAN Block private networks from WAN block 192.168/16 (1000001584) 192.168.1.112:5351 224.0.0.1:5350 UDP
192.168.1.112 is the IP of our main PFSense box. Any idea how to get rid of it?
-
Well, that is multicast traffic - I believe Bonjour on port 5350, or NAT-PMP status announcements.
You have UPnP enabled on pfsense?
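If you want to confirm what that traffic actually is before worrying about it, here's a minimal sketch (Python with scapy, run from any host on the LAN; the interface name "em0" is just a placeholder, not something pfSense-specific) that watches the NAT-PMP ports from the log and prints a summary of each packet:

# Rough sketch, not a pfSense-specific recipe: watch for traffic on the
# NAT-PMP ports (5350/5351) referenced in that log entry and print a
# one-line summary of each packet so you can see who is sending what.
from scapy.all import sniff

def show(pkt):
    # Print a short summary line for each matching packet.
    print(pkt.summary())

# "em0" is a placeholder interface name; adjust for your LAN NIC.
sniff(iface="em0", filter="udp and (port 5350 or port 5351)",
      prn=show, store=False, timeout=60)

If the only thing showing up is 192.168.1.112 announcing to 224.0.0.1, that lines up with the firewall's own NAT-PMP announcements.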
Six 18 gig drives in a RAID 5?? WTF?? For a firewall?
How old are those drives? Why would you not just run 2 in a mirror? It's not like you need the space.
-
Yes, we have UPnP enabled. Not sure why it would be blocking its own traffic?
We have a bunch of these older servers with six 18 gig HDDs in them. Since they are old and we have a lot of them, I figured why not, since it's the most reliable way to run them.
I had thought progress was made since it seemed to stay up all weekend. However, yesterday while I was away from the office the problem returned. There seems to be nothing in the logs at all during the time it happened.
-
Yes, we have UPnP enabled. Not sure why it would be blocking its own traffic?
Because you have your WAN and LAN interconnected somewhere, which is bad. Block private networks on WAN is blocking it because it's a private source IP on WAN. Fix your network so WAN and LAN aren't on the same broadcast domain.
That could be contributing to the problem you're seeing, or potentially the cause of it depending on what other network brokenness you have. But given no useful data gathered yet again at the last instance of the problem, there's no telling.
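As a rough way to check for that kind of interconnect (this is just a sketch; the interface name "em1" is an assumption for whichever NIC faces the WAN-side switch), you can sniff that segment and flag anything arriving with an RFC1918 source address, which is exactly what "Block private networks" is matching on:

# Minimal sketch, assuming "em1" is the WAN-facing interface (placeholder
# name): flag packets arriving on the WAN segment with private (RFC1918)
# source addresses, which should never appear there on a clean network.
from ipaddress import ip_address, ip_network
from scapy.all import sniff, IP

PRIVATE = [ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def flag_private_source(pkt):
    if IP in pkt:
        src = ip_address(pkt[IP].src)
        if any(src in net for net in PRIVATE):
            print(f"private source on WAN: {pkt[IP].src} -> {pkt[IP].dst}")

sniff(iface="em1", prn=flag_private_source, store=False, timeout=120)

If that prints anything other than silence, LAN traffic is leaking onto your WAN segment somewhere.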
-
I'm not sure how I could have my LAN and WAN interconnected. I have separate network ports for each.
Where could I check and what would I be looking for if it's a configuration issue?
-
Interconnected at the switch level, unrelated to the firewall. Maybe your drop from your provider is plugged into the same switch as your LAN hosts, with no VLAN or other isolation.
-
@cmb:
… there's no way a hardware failure would discriminate between diff types of traffic. ...
Not entirely true
My Intel NIC supports scheduling interrupts differently based on TCP ports.
-
@cmb:
… there's no way a hardware failure would discriminate between diff types of traffic. ...
Not entirely true
My Intel NIC supports scheduling interrupts differently based on TCP ports.
Sure, if you're configuring something along those lines. But we don't touch anything like that in NICs at this point. For anything we configure today, a hardware failure would not discriminate between diff types of traffic in the way OP describes.
-
I just checked the cabling coming from our ISP switch; there are definitely only two cables coming from it, one going into each of our PFSense boxes.
-
Correct, I just wanted to point out that there are some strange corner cases that almost never apply, but could exist.
-
I am using 1:1 mapping for three of my external IPs. I also have a rule allowing port 80 on one of those IPs.
This is the port that is going down.
I deleted my rule, then added a port forward and let it create the rule.
When I did this the website opened up again. I hope this has some lasting effect.
-
I am using 1:1 mapping for three of my external IPs. I also have a rule allowing port 80 on one of those IPs.
This is the port that is going down.
I deleted my rule, then added a port forward and let it create the rule.
When I did this the website opened up again. I hope this has some lasting effect.
That's nothing more than a coincidence.
This happened again, and yet still not a single packet capture while it's happening? There's a reason that's what I and others in this thread asked for multiple times, weeks ago. Again, you need to capture the traffic while it's occurring so you can see what's actually happening, and share it with some of us so we can troubleshoot it for you. Anything other than that isn't troubleshooting the problem; it's poking at stuff and hoping some unknown issue will go away by pushing random buttons that likely have no relation to the issue at all, which is absolutely not going to be successful.
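To illustrate the kind of capture being asked for (a generic sketch, not the only way to do it on pfSense; the interface name and the web server IP below are placeholders for your own values), it's roughly this:

# A rough illustration of the capture we're asking for: grab a couple of
# minutes of port-80 traffic for the affected site while it is down, and
# save it to a file you can share. Interface name and server IP are
# placeholders; substitute your own.
from scapy.all import sniff, wrpcap

WEB_SERVER = "192.168.1.50"   # hypothetical internal web server address

pkts = sniff(iface="em0",
             filter=f"tcp port 80 and host {WEB_SERVER}",
             timeout=180)      # capture for ~3 minutes during the outage

wrpcap("site-down.pcap", pkts)  # open this in Wireshark or post it here

That file, taken while the site is actually down, is what lets anyone here tell you where the traffic is dying.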
-
So my sites have been up all weekend since making that change.
Will update again after some time.
Also, to all the people complaining that I am not providing information: I'm not going to do a bunch of extra work using methods I am unfamiliar with unless it is absolutely necessary. So far it's been much easier to just reboot the firewall and get the site back online.
The fact that problems like this are not present in the log kinda says something about the lack of logging. I shouldn't have to Wireshark my way through figuring out what's up with our firewall; that just seems ridiculous.
-
But would you learn from doing so??
-
Of course I would learn if I delved into it. However, I do not want to be a network engineer; in my small company I have many hats to wear already.
I am just trying to get my management to stop complaining about the website going down. Keeping it down while I figure out how to do things is not really an option I'd like to take.
Of course I realize that much more info would be available. But I have basic needs from PFSense, so I figure basic needs should require basic operation.
-
Also, to all the people complaining that I am not providing information: I'm not going to do a bunch of extra work using methods I am unfamiliar with unless it is absolutely necessary. So far it's been much easier to just reboot the firewall and get the site back online.
We're not asking you to learn packet analysis before getting the system back up and going. You could figure out how to get the necessary packet capture in less time than you've spent pushing buttons that have no relation to the problem. Spend no more than 2-3 minutes gathering the necessary files before rebooting the system to temporarily work around whatever the root cause is, then get us those files so we can tell you what the actual problem is.
Patching problems without attempting to find a root cause is never a good idea with anything in IT. They'll usually come back again and again and again.
The fact that problems like this are not present in the log kinda says something about the lack of logging. I shouldn't have to Wireshark my way through figuring out what's up with our firewall; that just seems ridiculous.
Because it's highly likely you're not having a firewall problem at all. Firewalls aren't as straightforward as, say, a Windows server. Their role and placement in the network is such that a reboot will fix a wide range of problems that have nothing to do with the firewall itself.
Of course I would learn if I delved into it. However, I do not want to be a network engineer; in my small company I have many hats to wear already.
I am just trying to get my management to stop complaining about the website going down. Keeping it down while I figure out how to do things is not really an option I'd like to take.
You can figure out the data gathering now, which is quick and easy. Then have it down for a couple minutes longer next time to gather the appropriate data to troubleshoot, so you're on a path to the root cause and a resolution. Blindly pushing buttons with hopes and dreams of fixing a completely unknown problem is just going to ensure the problem recurs. Leaving things down ~2-5 minutes longer next time, to put yourself on a path to finding and fixing the root cause, is better than having it go down every few days over and over with zero progress towards finding out what's actually happening, much less fixing it.
I understand wearing too many hats to become an expert at these things; we have many support customers in your shoes who rely on us for that expertise. If you worked and learned 24/7/365, it wouldn't be enough to become an expert on everything you're responsible for in a small company where you're probably the guy for anything that runs on electricity. You've done the right thing by seeking out the experts. But when said experts tell you how to troubleshoot something, you should listen. It's likely only a matter of time until it happens again.
-
While I do appreciate the help from everyone, it seems I was able to fix the problem with the last thing I tried.
I do not want to jinx it, but we have not had an uptime this long since we first noticed the issue.
My PFSense was working fine for years without having a port forward set up, since the external and internal addresses were defined in the 1:1 mapping.
As long as I had a rule set, the port worked perfectly.
Now that I have specifically defined the port forward and am using the auto rule it created, everything is functioning as expected.
Since this began, we had never had a Monday complete without downtime until now.
-
It is funny how nobody even responds now that I have confirmed there is some problem with PFSense with regard to 1:1 NAT.
All you guys just complicated the issue by asking me to do unrelated things, as if it was somehow my responsibility to troubleshoot and diagnose the problem. I wonder if I would have been questioned like that if I were paying for support?
I have even had Snort running for a few days now and everything is still rock solid.
I love PFSense and greatly appreciate the efforts all the devs have put into making it an excellent firewall.
But I don't think it should be that hard to report a bug.
-
Because there isn't anything wrong with 1:1 NAT, and nothing here suggests otherwise. It's completely impossible for it to work for some protocols and not others. You were also getting RSTs back from somewhere, which was more than likely the web server itself. Something else changed/is no longer breaking. I have no doubt if you remove that port forward, it'll continue working.
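If you ever do grab a capture while it's down, something along these lines (just a sketch; the pcap filename matches the hypothetical one above) would show exactly which host is sending those RSTs:

# Hedged sketch: given a capture taken while the site was down (filename is
# hypothetical), count which source IPs are sending TCP RSTs, to see whether
# the resets come from the web server, the firewall, or something upstream.
from collections import Counter
from scapy.all import rdpcap, IP, TCP

rst_sources = Counter()
for pkt in rdpcap("site-down.pcap"):
    if IP in pkt and TCP in pkt and int(pkt[TCP].flags) & 0x04:  # 0x04 = RST bit
        rst_sources[pkt[IP].src] += 1

for src, count in rst_sources.most_common():
    print(f"{src} sent {count} RST packets")

That would settle where the resets are actually coming from instead of guessing.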
-
LOL!!!
Still in denial I see…
Keep pointing fingers anywhere but PFSense.
Absolutely nothing has changed on our servers or our network.
The only change was what I said and that completely resolved the issue.