dpinger broken or Dashboard broken or my brain is broken....

Pfosten

As in a number of other posts, gateway monitoring marks one particular gateway of my multi-WAN setup as offline, and it then stays offline forever with 100% packet loss.

Only changing a setting (any setting) of that gateway resolves the issue... for a while.

Remarkably, the WAN works perfectly the whole time: speed tests give the best results, with no difference compared to when it is marked as online.

I can also ping the monitoring IP with the pfSense built-in ping tool by selecting that WAN interface.

Hence I assume that something is wrong with dpinger, or with some kind of supervision process.

In other words, a bug.

kiokoman

Did you try this setting?
https://docs.netgate.com/pfsense/en/latest/book/routing/gateway-settings.html#data-payload

Pfosten

I tried 0, 1 and 56; no difference.
Since the day I took the screenshot, the WAN interface has been carrying traffic without any problem while the dashboard status shows it as offline.

I selected the WAN interface/gateway that is wrongly marked as faulty as the default gateway to make sure traffic is routed through it.

Pfosten

@Pfosten

This is the gateway-related log.
As you can see, there is not a single new entry from dpinger, yet the interface has been carrying traffic without a hitch the whole time. The dashboard widget still shows the gateway as offline.

kiokoman

I follow you, but I have no idea. Maybe try pinging a different IP instead of 8.8.4.4;
maybe the WAN1 ISP is limiting the pings.

Pfosten

@kiokoman :

I tried several IPs that work fine for the other WAN interface.
I consider dpinger or the widget itself to be broken.
The "problematic" WAN interface is carrying traffic without problems, so dpinger cannot legitimately show 100% packet loss unless my modem or something else in between is filtering out my ping packets.
And in that case it would be a permanent error, but each time I change the gateway settings, it resets and works for a while.

kiokoman

Try pinging the modem or the next hop to see where it stops working.

Pfosten

@kiokoman
That is not the point. Whatever causes the packet loss, it is not permanent, yet dpinger never recovers.

kiokoman

If you restart the service, does it start working again?
Is WAN1 DHCP or static?

Pfosten

The address ranges of the "FritzBox" modems are split so that 100-200 are in the DHCP pool and the rest is static. So "DHCP=ON" is a bit misleading: addresses 1-99 are in fact static.

kiokoman

Yeah, I see. Anyway, on pfSense it's set as a static IP. I don't understand why dpinger does not recover in your case.

Pfosten

@kiokoman: It was good to review this. I found a copy-and-paste mistake in the drawing; the config itself is OK.

Pfosten

OK, I guess it is a bug, not a misconfiguration. How do I submit a bug report?

kiokoman

You can do it here: https://redmine.pfsense.org/
But maybe there is already a ticket for that, so take a look at the list of open bugs before opening a new one.

Derelict

Have you run a packet capture of the ICMP pings on the WAN you think should be up, while it is showing as down, to see what is really going on?

If pfSense is sending the echo requests and there is no response, dpinger is doing everything it is supposed to be doing.
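One way to read such a capture is sketched below in Python. It assumes the capture was saved as tcpdump's default text output, where ICMP echo packets appear as "ICMP echo request, id N, seq N"; the sample lines and the addresses in them are made up for illustration, and the function names are hypothetical. Only the middle verdict (requests sent, no replies) comes from the reasoning above; the other two are assumptions about how to interpret the remaining cases.

```python
import re

# Assumed format: tcpdump's default text output for ICMP echo packets, e.g.
# "IP 192.0.2.10 > 8.8.4.4: ICMP echo request, id 4711, seq 1, length 9"
ECHO_RE = re.compile(r"ICMP echo (request|reply), id (\d+), seq (\d+)")


def summarize_capture(lines):
    """Count echo requests and how many of them have a matching reply."""
    requests, replies = set(), set()
    for line in lines:
        match = ECHO_RE.search(line)
        if match is None:
            continue  # not an ICMP echo line
        # (id, seq) identifies one probe; seq wraps in very long captures.
        key = (int(match.group(2)), int(match.group(3)))
        if match.group(1) == "request":
            requests.add(key)
        else:
            replies.add(key)
    answered = requests & replies
    return {
        "sent": len(requests),
        "answered": len(answered),
        "unanswered": len(requests - replies),
    }


def verdict(summary):
    """Turn the counts into a rough reading of where the problem sits."""
    if summary["sent"] == 0:
        return "no echo requests seen: probes are not leaving this interface"
    if summary["answered"] == 0:
        return "requests sent, no replies: the loss is real and dpinger reports it correctly"
    return "replies are arriving: a stuck 100% loss would point at the monitoring side"


# Made-up sample lines, for illustration only.
SAMPLE = [
    "12:00:00.000000 IP 192.0.2.10 > 8.8.4.4: ICMP echo request, id 4711, seq 1, length 9",
    "12:00:00.020000 IP 8.8.4.4 > 192.0.2.10: ICMP echo reply, id 4711, seq 1, length 9",
    "12:00:00.500000 IP 192.0.2.10 > 8.8.4.4: ICMP echo request, id 4711, seq 2, length 9",
    "12:00:00.520000 IP 8.8.4.4 > 192.0.2.10: ICMP echo reply, id 4711, seq 2, length 9",
    "12:00:01.000000 IP 192.0.2.10 > 8.8.4.4: ICMP echo request, id 4711, seq 3, length 9",
]

summary = summarize_capture(SAMPLE)
print(summary, "->", verdict(summary))
```

The capture has to be taken while the gateway is shown as down, otherwise the counts say nothing about the stuck state.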

Pfosten

@Derelict

As I wrote above:

The destination address always responds, and the interface is up and carrying massive traffic.

I tested again today: during a heavy speed test on that interface, the ping was delayed and the dashboard widget showed "offline" for 1-2 seconds, but it recovered soon after.
My problem seems to be that, unpredictably, the status gets "stuck" showing 100% packet loss forever UNTIL I make any change to any gateway or to the gateway group.
So I doubt that unsent echo requests or filtered ICMP responses are the real cause of this issue.

Pfosten

Here is another log example:

2020/09/25 09:38:37 I fiddled with the gateway settings to make the problematic gateway group recover from the OFFLINE state that was set at 2020/09/25 05:53:50

2020/09/26 13:46:28 gateway group OFFLINE again

2020/09/26 15:38:53 manual changing of gateway settings (usually setting default IPv4 gateway from automatic to the problematic gateway and back)

2020/09/26 20:37:51 gateway group OFFLINE again

2020/09/27 10:44:08 manual changing of gateway settings

2020/09/27 13:13:30 gateway group permanently OFFLINE again
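
To put numbers on this, here is a small Python sketch that computes how long the gateway group stayed up after each manual change. The timestamps are copied from the list above; the "offline"/"reset" labels and the function name are my own shorthand, not anything pfSense logs.

```python
from datetime import datetime

FMT = "%Y/%m/%d %H:%M:%S"

# Timestamps copied from the log excerpt; labels are shorthand:
# "offline" = gateway group marked OFFLINE, "reset" = manual settings change.
EVENTS = [
    ("2020/09/25 05:53:50", "offline"),
    ("2020/09/25 09:38:37", "reset"),
    ("2020/09/26 13:46:28", "offline"),
    ("2020/09/26 15:38:53", "reset"),
    ("2020/09/26 20:37:51", "offline"),
    ("2020/09/27 10:44:08", "reset"),
    ("2020/09/27 13:13:30", "offline"),
]


def uptime_after_resets(events):
    """Return the time between each manual reset and the next OFFLINE event."""
    parsed = [(datetime.strptime(ts, FMT), kind) for ts, kind in events]
    spans = []
    # Walk over consecutive pairs and keep only reset -> offline transitions.
    for (t0, kind0), (t1, kind1) in zip(parsed, parsed[1:]):
        if kind0 == "reset" and kind1 == "offline":
            spans.append(t1 - t0)
    return spans


for span in uptime_after_resets(EVENTS):
    print(span)
```

The three intervals come out at roughly 28 hours, 5 hours and 2.5 hours, so there is no fixed period after which the status gets stuck.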

Pfosten

A question:

Netgate uses the same core code for its professional products, right?
They must experience the same issues. How is it that related bug reports have gone unfixed for a year or longer?

https://redmine.pfsense.org/issues/9450