Dpinger multiple targets - aka gwmond $2,500

webdawg

https://redmine.pfsense.org/issues/7671

edit: https://redmine.pfsense.org/issues/4354

Also got a message back from pfSense:

Ultimately it's not seeing any traction because the suggested solution isn't right. Essentially dpinger is only a daemon that pings and reports responses. It doesn't make decisions about what is good or bad for a pfSense gateway as a whole, only for its specific single target. It isn't up to dpinger to handle multiple targets or different protocols.
What is needed is more like some middleware-ish daemon to sit between pfSense and other gateway monitoring daemons like dpinger (See #7671 for some other suggestions) that would be capable of coordinating multiple monitoring techniques for each gateway and making more informed decisions about their status.

pfSense
|
+--- [gateway monitoring daemon]
     |
     +--- [dpinger <1...n>, <something that checks http>, <something that checks tcp>, etc]

There isn't currently a feature request for that, however, but feel free to open one and start a bounty on the forum to see if you get any takers. Given the responses on the dpinger github it appears its author agrees that it's out of scope for dpinger itself.
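To make the layering in that message concrete, here is a minimal sketch of what such a middle layer might do: collect results from several independent checks per gateway and reduce them to one status. Every name here (`Check`, `Gateway`, `gateway_status`) is hypothetical, the probes are stubs, and nothing below is an existing pfSense or dpinger interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Check:
    """One monitoring technique for a gateway (hypothetical type)."""
    name: str                  # label only, e.g. "icmp" or "http"
    probe: Callable[[], bool]  # returns True when the target responds


@dataclass
class Gateway:
    name: str
    checks: List[Check]


def gateway_status(gw: Gateway) -> Dict[str, object]:
    """Run every check for one gateway and summarise the results."""
    results = {c.name: c.probe() for c in gw.checks}
    healthy = sum(results.values())
    if not results:
        state = "unknown"      # nothing configured, so no opinion
    elif healthy == len(results):
        state = "up"
    elif healthy == 0:
        state = "down"
    else:
        state = "degraded"     # some checks pass, some fail
    return {"gateway": gw.name, "state": state, "results": results}


# Stub probes stand in for dpinger, an HTTP check, and so on.
wan1 = Gateway("WAN1", [Check("icmp", lambda: True),
                        Check("http", lambda: False)])
print(gateway_status(wan1)["state"])  # degraded
```

What the firewall does with "degraded" is the actual policy question; the sketch only shows that the decision lives above the individual probes.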

Can someone create a new bounty?

luckman212

Yes, in my original post/redmine ticket I did suggest that what's really needed is a new gw monitoring daemon to make intelligent decisions and take actions based on the current state of dpinger (or other monitoring processes).

I should have stayed in school and learned enough C so I could create this myself, but, too late for that. We need someone more capable to step up to the plate here.

Maybe $2500 wasn't enough- not sure if it's even possible to raise that much (or more) without some marketing/campaigning from Netgate themselves. Maybe a monthly "Top Bounties" section of the Newsletter could help fund this type of 3rd party development?

In the meantime, I really feel like the saying "Perfect is the enemy of good" (aka Pareto principle) applies here. With the existing tools (dpinger/base pfsense) we already have what's needed with some small code changes to define additional gateway monitoring targets (albeit, ICMP only) and not trigger a gw failover event unless ALL targets are down.

Until & unless we can figure out how to fund or create enough demand for a better solution, maybe we should just implement 3 targets for now which will probably help 80% of the users who need multitarget monitoring.
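As a rough illustration of the "only fail over when ALL targets are down" rule, the sketch below takes a packet-loss percentage per target and trips only if every target exceeds a threshold. The function name, the threshold value, and the third address (from the documentation range) are assumptions for the example, not pfSense settings.

```python
from typing import Mapping

LOSS_THRESHOLD = 20.0  # percent; illustrative value only


def should_fail_over(loss_by_target: Mapping[str, float],
                     threshold: float = LOSS_THRESHOLD) -> bool:
    """True only when every configured target is at or above the loss threshold."""
    if not loss_by_target:
        return False  # no targets configured: never trigger a failover
    return all(loss >= threshold for loss in loss_by_target.values())


# One unreachable target out of three does not trigger a failover.
print(should_fail_over({"8.8.8.8": 100.0, "1.1.1.1": 0.0, "192.0.2.1": 0.0}))  # False
# All three unreachable does.
print(should_fail_over({"8.8.8.8": 100.0, "1.1.1.1": 100.0, "192.0.2.1": 100.0}))  # True
```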

webdawg

@luckman212 Can you pile in on the new issue? Let's see if we can make it happen.

luckman212

Well I don't know about piling in, but I would go in for $500 or so

My financial position has changed since the original bounty- I can't afford to put $2500 in anymore. Hopefully we can get some others to contribute.

webdawg

I think I might try and do this. I sent Jim an email, and I will post anything I do here soon.

rtw915

What is the status of this?

I proposed https://redmine.pfsense.org/issues/14177 to try and solve this issue, but having multiple ICMP monitors for a single GW would be a great improvement.

More broadly, how do features like this gain the necessary attention to get implemented sooner than 6+ years?

jrey

@luckman212

I didn't really care about latency, but I had a need to know if a handful of internal systems are up or down by glancing at the dashboard.

I've tried the same with "monitoring other external systems" (by DNSName or IP) and it works for that as well, but it is a moot point if the WAN is down.

Screen Shot 2023-11-22 at 2.48.46 PM.png

Take one system down to show
Screen Shot 2023-11-22 at 2.53.44 PM.png

dennypage

@jrey said in Dpinger multiple targets - aka gwmond $2,500:

I didn't really care about latency, but I had a need to know if a handful of internal systems are up or down by glancing at the dashboard.

Librenms?

jrey

@dennypage said in Dpinger multiple targets - aka gwmond $2,500:

Librenms?

No - why?
Everyone always wants to suggest more packages, or external systems. All of which are great if that is the requirement.

The requirement for me was simply -
show the up/down status of a handful of internal systems on the pfSense Dashboard.

In my case there was no need for any third party system.

No more systems to run/manage/maintain
No need to record when they are down, just need the NOC/Ops to see they are.
No need to monitor external systems (although it can), because currently outside the wall is not my problem ;-) someone with a remote location might certainly want this, and perhaps latency shown at that point. But honestly the tunnel would likely go away.
No need to display or record latency history (although the data is right there and easy enough to display. These are internal systems so response is typically <1ms - dashboard real estate is more valuable than seeing latency)

Why do I do this? Because in my case, for my requirement, these internal systems are all virtual and have automatic strategies for shutdown / backup / restart and other scheduled maintenance functions. On rare occasions they go down for said automated maintenance and then don't restart cleanly. So instead of having to investigate each of them manually, or waiting for the masses to start complaining, there is an immediate visual clue in plain sight that something is wrong.
The less work that has to be done to maintain, discover, and solve problems, the better. Now, instead of potentially hours passing before a "down" gets noticed by some user, or having to check everything, the issue is identified and systems are back up in minutes, to the point that those users likely never even notice. And therein lies the illusion of 99.99% uptime.
Auto maintenance by night, smooth operation by day. Sit back and watch the blinky lights.

Tools need to save time, not create more work.

PS: one of those systems I monitor, actually has a network monitoring tool running on it, in a docker, with a bunch of other dockers as well. However, it has been my experience that Network monitoring is not very functional if the system goes off the network. Could just be a weird observation, that needs more coffee. ;-)

dennypage

@jrey said in Dpinger multiple targets - aka gwmond $2,500:

Everyone always wants to suggest more packages, or external systems.

The reason that people keep suggesting external systems is because what you are trying to do is general purpose monitoring. That isn't what a firewall does.

The reason that the firewall uses dpinger to monitor targets is to determine the health of gateways (routes), to determine if it needs to fail over to a different gateway, etc. These are functions necessary to the operation of a firewall. That is the topic of this thread: monitoring multiple targets to determine the health of a single gateway.

If LibreNMS is too much, then Uptime Kuma is another that you might consider for your needs.

rtw915

I posted on this thread that has been dead for 4 years because luckman212's original post from 6 years ago would solve my dual WAN failover issue. My need is firewall specific so that it can correctly identify when the "internet" is down. Not just if the GW 20 feet away in the same datacenter is up but the router in the basement is down, or when google DNS decides to throttle ICMP traffic (which is their right to do so).

I think the original intention of this thread has been lost in the last few comments. I'm hoping this comment brings this topic back on track.

jrey

@dennypage said in Dpinger multiple targets - aka gwmond $2,500:

That isn't what a firewall does.

But then by that - a firewall doesn't do DHCP, DNS, proxy or hundreds of other things pfsense can do. And there are "firewalls" on the market that can actually display other things in customizable "widgets".

a "Firewall" blocks traffic.

The change I've made locally to do this has no impact on anyone but me. It is a fast, convenient dashboard view that does not require the NOC to go hunting on other systems, rely on notifications, or whatever.
Nothing is being captured/recorded, so it is really just a convenience.

You'd likely freak at the other dashboard change I've made: the Firewall Log with reverse DNS built right in. The next most-asked question was always "so then who is this?", so right behind the reverse lookup or IP on the dashboard is a link to whois. Why? Because instead of multiple links, it's one easy click.
Screen Shot 2023-11-23 at 12.24.14 PM.png

this at the end of the S: (reverse lookup) line is a whois link
Screen Shot 2023-11-23 at 12.26.07 PM.png

Doing this in real time, also has absolutely zero impact on the system. But it sure saves a lot of time jumping around on different screens.
Yup I could certainly click on the IP - go to a reverse lookup screen, get the IP, open a whois paste the IP, find out who it is.. OR...as I prefer click and done.

Next argument: what if they change the base widget? Well, they haven't changed in several upgrades, and yes they certainly could, but I have built them as local patch files, so if they don't apply I could just tweak them.

Is it the fact that they are internal systems being watched?
There, better: now monitoring multiple targets to determine the health of the gateway
Screen Shot 2023-11-23 at 12.44.12 PM.png

of course the gateway(s) is/are also monitored on both the gateway and interface widgets already. but certainly in the case of a multi wan failover the status of a couple of other remotes might be handy. The remote branch perhaps, you might want to know that is still reachable in a failover.

I'm lazy or "work smarter not harder" for my own particular situation, however you want to look at it.

dennypage

@jrey said in Dpinger multiple targets - aka gwmond $2,500:

But then by that - a firewall doesn't do DHCP, DNS, proxy or hundreds of other things pfsense can do.

Yes, these are common services provided by firewalls.

jrey

@rtw915

but having multiple ICMP monitors for a single GW would be a great improvement.

and

I think the original intention of this thread has been lost in the last few comments

sorry just ignore my comments then. or I'll remove them if you like.
Though the "having multiple ICMP monitors for a single GW would be a great improvement" was part of the ask, I must have read that wrong.

Carry on.

dennypage

@rtw915 said in Dpinger multiple targets - aka gwmond $2,500:

My need is firewall specific so that it can correctly identify when the "internet" is down. Not just if the GW 20 feet away in the same datacenter is up but the router in the basement is down, or when google DNS decides to throttle ICMP traffic (which is their right to do so).

Yea, I've noticed that Google has had issues as of late, particularly with 8.8.8.8. Out of curiosity, did Google throttle your ICMP traffic, or just block it completely?

FWIW, I personally use a regional router inside my ISP. I suggest using

traceroute -I 1.1.1.1

to discover a suitable target. If you have multiple WAN interfaces, add the -s option (source address) to traceroute to ensure you are looking at the correct interface.

For people who absolutely need a target beyond their ISP, I generally recommend Cloudflare (1.1.1.1) instead of Google.

jrey

@dennypage

Yup, no real disagreement there (they are in fact blenders, doing DHCP, DNS and all that); however, monitoring is part of the package (in some vendors / cases more extensive than others). One would assume that's what the Gateway, Interfaces, or Traffic widgets included on the Netgate are: monitors. In fact everything that can be dropped on the dashboard is a monitor of something. Dashboard widgets are monitors.

what you are trying to do is general purpose monitoring,

The built in monitors just don't go far enough in some cases, or more specifically in my case.

That isn't what a firewall does.

Why not? it already has a foundation and widgets that monitor all sorts of things built in..

hard pressed to name one widget on the dashboard that isn't monitoring something, okay wait "Picture"

as to

I've noticed that Google has had issues as of late, particularly with 8.8.8.8.

Weird, my gateway widget uses 8.8.8.8 as the monitor and has not had any issues with RTT ~4ms response time - zero packet loss and hasn't dropped the wan in months. Except of course when I reboot.
I haven't logged a dpinger alarm since Nov 7, and that was the last time the system was restarted. So I can honestly say I haven't noticed any throttling from Google.

rtw915

@dennypage said in Dpinger multiple targets - aka gwmond $2,500:

Out of curiosity, did Google throttle your ICMP traffic, or just block it completely?

Honestly, I don't know. It was enough for the firewall to think the primary WAN was down. At first I assumed that the connection was actually down, but our external monitoring solutions said that it was still up. I changed the monitoring IP in the firewall back to the first hop and everything went back to normal.

My concern is, it seems like relying on a single entity whose specific purpose is not an uptime monitor is asking for trouble. If only there was a way to build a monitoring group and then associate that to a WAN interface, so that no single public service could trigger a false failover.
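A sketch of that idea, assuming a hypothetical `MonitorGroup` attached to each WAN with a configurable count of targets that must fail before the WAN is treated as down. The WAN names, the `min_down` field, and the documentation-range addresses are made up for illustration; pfSense has no such setting today.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class MonitorGroup:
    """Hypothetical group of monitor targets tied to one WAN."""
    targets: List[str]
    min_down: int  # how many targets must fail before the WAN counts as down


def wans_down(groups: Dict[str, MonitorGroup], failed: Set[str]) -> List[str]:
    """Return the WAN names whose group has at least min_down failed targets."""
    return [wan for wan, group in groups.items()
            if sum(t in failed for t in group.targets) >= group.min_down]


groups = {
    "WAN1": MonitorGroup(["8.8.8.8", "1.1.1.1", "192.0.2.1"], min_down=2),
    "WAN2": MonitorGroup(["192.0.2.2"], min_down=1),
}

# A single public service dropping ICMP is not enough to flip WAN1.
print(wans_down(groups, {"8.8.8.8"}))             # []
print(wans_down(groups, {"8.8.8.8", "1.1.1.1"}))  # ['WAN1']
```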

dennypage

@rtw915 said in Dpinger multiple targets - aka gwmond $2,500:

My concern is, it seems like relying on a single entity whose specific purpose is not an uptime monitor is asking for trouble.

It's not so bad if the target is an integral part of your route through the ISP. In my ISP case, if the regional target I use is not responding then the ISP routing is pretty much non-functional as a whole.

This is one reason why multiple targets for a single gateway haven't been seen as a critical need.

rtw915

@dennypage said in Dpinger multiple targets - aka gwmond $2,500:

It's not so bad if the target is an integral part of your route through the ISP.

Please forgive me, but I don't really understand what you mean.

At the data centers I'm using their DIA service. Honestly, it is sold/offered as one thing, but seems to work differently in practice. They sell it as a highly available service where they monitor and handle BGP routes with their peers and practically nothing bad can ever happen. In practice, they'll peer with Cogent and then Cogent will peer with twelve99. At some point twelve99 will have a problem in the middle of the country and the tunnels will start dropping packets. So I call them up and they will move me to GTT or some other Carrier. At this point the monitoring IP is still on Cogent (because I forgot to change it) even though I'm not on it anymore. Then sometime in the future, GTT has a problem and for some reason the DIA service doesn't recognize this issue and the gateway doesn't flip because it's still monitoring the old Carrier. The hand off to the Carrier is two or three hops away so I can't monitor the first hop and tell if I'm actually up.

After all this I moved to Google DNS IP and got bitten about a month later like I said in my previous message.

dennypage

@rtw915 said in Dpinger multiple targets - aka gwmond $2,500:

@dennypage said in Dpinger multiple targets - aka gwmond $2,500:

It's not so bad if the target is an integral part of your route through the ISP.
Please forgive me, but I don't really understand what you mean.

I was referring to a traditional multi-wan situation in which you have two completely independent ISP connections, such as AT&T and Comcast. In this case you would want a monitoring target for WAN1 in the local region for ISP1, and a monitoring target for WAN2 in the local region for ISP2.

That said, it sounds like what you are being presented with is a single WAN connection from a pseudo ISP that moves peering around as needed. That is a different situation than above. I honestly do not have a good suggestion for you in that situation. Even the multiple targets approach would probably not really address the issue with peering changes.

Cloudflare is currently the best suggestion I have unless you want to stand something up with a cloud provider.