dpinger broken or Dashboard broken or my brain is broken....

Derelict

Have you packet captured the ICMP pings on the WAN you think should be up when it is showing as down to see what is really going on?

If pfSense is sending the echo requests and there is no response, dpinger is doing everything it is supposed to be doing.

Pfosten

@Derelict

Like I wrote above:

The destination address is always responding, the interface is up and carrying massive traffic.

I tested again today: during a heavy speedtest on that interface the ping was delayed, and for 1-2 seconds the dashboard widget showed "offline", but it recovered soon after.
The problem is that the status unpredictably gets "stuck" showing 100% packet loss forever, UNTIL I make any change to any gateway or to the gateway group.
So I doubt that unsent or filtered ICMP responses are the real cause of this issue.

Pfosten

Here another log example:

2020/09/25 09:38:37 I changed gateway settings to make the problematic gateway group recover from the OFFLINE state it had entered at 2020/09/25 05:53:50

2020/09/26 13:46:28 gateway group OFFLINE again

2020/09/26 15:38:53 manual change of gateway settings (usually switching the default IPv4 gateway from automatic to the problematic gateway and back)

2020/09/26 20:37:51 gateway group OFFLINE again

2020/09/27 10:44:08 manual change of gateway settings

2020/09/27 13:13:30 gateway group permanently OFFLINE again

Pfosten

A question:

Netgate is using the same core code for professional use, right?
They must experience the same issues, so how is it that related bug reports have gone unfixed for a year or longer?

https://redmine.pfsense.org/issues/9450

bobbenheim

@Pfosten said in dpinger broken or Dashboard broken or my brain is broken....:

@Derelict

Like I wrote above:

The destination address is always responding, the interface is up and carrying massive traffic.

I tested again today: during a heavy speedtest on that interface the ping was delayed, and for 1-2 seconds the dashboard widget showed "offline", but it recovered soon after.
The problem is that the status unpredictably gets "stuck" showing 100% packet loss forever, UNTIL I make any change to any gateway or to the gateway group.
So I doubt that unsent or filtered ICMP responses are the real cause of this issue.

That is exactly why he is asking you to do a packet capture, so the problem can be narrowed down to either something within pfSense or something external blocking your ICMP traffic.

Pfosten

@bobbenheim

On 2020/09/28 09:19:52 I set the default IPv4 gateway to WAN_PHY1_IGB0, which reset the status shown in the dashboard widget.

The interface was able to carry traffic the whole time!

Several times I pinged 8.8.4.4, the address defined for gateway monitoring: always fine.

At 2020/09/28 21:59:22 the status shown in the dashboard widget changed to OFFLINE. Even after that, the interface carried traffic in a speedtest up to the subscribed maximum, and the ping test was fine.

Pfosten

Now tested with 2.5.0.a.20201101.1850

For unknown reasons I still sometimes get partial or full loss of the alive ping on one of the two WAN interfaces, but that is not the issue.

Nov 2 10:37:56 dpinger 16236 WAN_PHY1_IGB0GW 8.8.4.4: Alarm latency 0us stddev 0us loss 100%

The problem is that this status remains until any change is made to the gateway group; then it works again immediately.

Either dpinger is not retrying the configured IP, or the process that maintains the operational status is not picking up the changes.
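To spot alarms that never clear, one option is to pull the fields out of alarm lines like the one quoted above. Below is a minimal Python sketch, assuming the exact line format shown in this post; the regex, the `Alarm` tuple and the `parse_alarm` helper are hypothetical names for illustration, not part of dpinger or pfSense, and other dpinger versions may log differently.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern, written against the single alarm line quoted above;
# other dpinger log lines (or versions) may be formatted differently.
ALARM_RE = re.compile(
    r"dpinger\s+(?P<pid>\d+)\s+(?P<gateway>\S+)\s+(?P<target>[\d.]+):\s+Alarm\s+"
    r"latency\s+(?P<latency>\d+)us\s+stddev\s+(?P<stddev>\d+)us\s+loss\s+(?P<loss>\d+)%"
)


class Alarm(NamedTuple):
    pid: int          # process ID of the dpinger instance that logged the alarm
    gateway: str      # gateway name, e.g. WAN_PHY1_IGB0GW
    target: str       # monitor IP
    latency_us: int
    stddev_us: int
    loss_pct: int


def parse_alarm(line: str) -> Optional[Alarm]:
    """Return the alarm fields, or None if the line is not an alarm line."""
    m = ALARM_RE.search(line)
    if m is None:
        return None
    return Alarm(
        pid=int(m["pid"]),
        gateway=m["gateway"],
        target=m["target"],
        latency_us=int(m["latency"]),
        stddev_us=int(m["stddev"]),
        loss_pct=int(m["loss"]),
    )


line = "Nov 2 10:37:56 dpinger 16236 WAN_PHY1_IGB0GW 8.8.4.4: Alarm latency 0us stddev 0us loss 100%"
print(parse_alarm(line))
```

Comparing the `pid` field across alarm lines is one way to tell whether dpinger was restarted between alarms, since a restarted daemon runs as a new process.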

bobbenheim

Repeating the symptoms and posting screenshots still doesn't get anyone closer to an explanation of what is going on in your setup. Make a packet capture on the WAN_PHY1_IGB0GW interface so you can actually determine whether the problem is inside or outside pfSense to begin with.

Pfosten

@bobbenheim

1st: I just wanted to state that I have the same issues with 2.5.0.
2nd: If the system considers IGB0 OFFLINE while at the same time I can successfully ping the configured 8.8.4.4 (and any other address I tried before) using the pfSense ping tool, I am pretty sure the issue is inside pfSense.
3rd: I will set up an external capture if this helps.

bobbenheim

@Pfosten

You can make the capture within pfsense:
Diagnostics > Packet Capture
You can also limit it to ICMP traffic to and from 8.8.4.4 so you don't get an unnecessarily large packet capture.

JeGr

@Pfosten said in dpinger broken or Dashboard broken or my brain is broken....:

2nd: If the system considers IGB0 OFFLINE while at the same time I can successfully ping the configured 8.8.4.4 (and any other address I tried before) using the pfSense ping tool, I am pretty sure the issue is inside pfSense.

The system or interface is not offline; it is just reported offline because dpinger sees no responses to its pings. Simple as that. The ping tool does just that: ping. dpinger may use different settings, such as a different source IP. Also, I haven't seen your routing table: if you have added 8.8.4.4 as a host route for another interface, that gets in the way of dpinger functioning properly.

E.g. in "System / General", (accidentally) adding 8.8.4.4 as a system DNS server on WAN_PHY2 instead of PHY1 would host-route that IP via PHY2, so dpinger trying to check the IP via PHY1 will fail. So a config mistake is still possible. That's why a packet capture on the physical interface was requested: to see whether dpinger actually sends pings out on THAT interface or whether something goes amiss before that.
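The host-route pitfall comes down to longest-prefix matching: a /32 route for the monitor IP beats the default route, regardless of which interface dpinger is meant to use. Below is a minimal Python sketch with a hypothetical two-entry routing table; the table and `egress_interface` are illustrative only, and real pfSense routing also involves pf rules and the source address dpinger binds to.

```python
import ipaddress

# Hypothetical routing table: (destination network, interface).
# The /32 entry mimics a host route accidentally created for 8.8.4.4,
# e.g. by assigning it as a system DNS server on WAN_PHY2.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "WAN_PHY1"),
    (ipaddress.ip_network("8.8.4.4/32"), "WAN_PHY2"),
]


def egress_interface(dest: str, routes=ROUTES) -> str:
    """Pick the matching route with the longest prefix, as IP routing does."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, ifname) for net, ifname in routes if addr in net]
    if not matches:
        raise LookupError(f"no route to {dest}")
    _, ifname = max(matches, key=lambda m: m[0].prefixlen)
    return ifname


print(egress_interface("8.8.4.4"))  # WAN_PHY2: the host route wins
print(egress_interface("1.1.1.1"))  # WAN_PHY1: only the default route matches
```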

Also worth mentioning: we had a few (very few) cases of this "IP/gateway not pingable with a FritzBox / AVM box in front" in the German subforum. Funny story: after removing the FritzBox, most users had NO problems anymore, at all. Others (like me) had problems with a FritzBox in front and could switch to "bridged" mode on LAN2, and then had no more problems either. Newer AVM boxes/firmwares aren't that bulletproof anymore.

PS:

netgate is utilizing the same core code for professional use, right?

There is NO other version of pfSense or "core code" for different products.
pfSense is the same software on every platform. The version on Netgate's own devices is only tweaked/built for those platforms if they aren't x64 (like the SG1100/2100/3100, which are ARM), and otherwise adds hardware-dependent things (like VLAN/switch configuration if the device includes a switching chipset). Beyond that they just add 1-2 small wizards on top, but 98-99% is the same for everyone. There is no "enterprise", "core" or any other version. Just wanting to clarify for those wondering after that comment.

Cheers
\jens

Pfosten

@bobbenheim

[Attachment: packetcapture 3rd id20994 is dpinger id3317 is pingtool pfsense.cap]

[Attachment: packetcapture 2nd.cap]

The internal capture is not conclusive evidence, since it relies on a tool that could itself be broken.

The 2nd trace shows the moment when replies to ICMP pings with a certain ID stop arriving.

The 3rd trace shows ICMP requests with ID 20994 getting no response for a while, while in parallel the pfSense ping tool (ID 3317), pinging the SAME IP address used by dpinger, does get responses.
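One way to read such a capture is to group echo requests and replies by ICMP identifier and find, per ID, the first request that went unanswered. Below is a minimal Python sketch, assuming the capture has already been exported to (timestamp, ICMP type, identifier, sequence) tuples; the `first_unanswered` helper and the sample packets are made up for illustration, with only the IDs 20994 and 3317 taken from this post.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Each packet: (timestamp_seconds, icmp_type, icmp_id, icmp_seq).
# ICMP type 8 = echo request, type 0 = echo reply (RFC 792).
Packet = Tuple[float, int, int, int]


def first_unanswered(packets: Iterable[Packet]) -> Dict[int, Optional[float]]:
    """For each ICMP identifier, return the timestamp of the first echo
    request that got no matching reply, or None if all were answered."""
    requests: Dict[int, List[Tuple[float, int]]] = defaultdict(list)
    replies = set()
    for ts, icmp_type, icmp_id, seq in packets:
        if icmp_type == 8:
            requests[icmp_id].append((ts, seq))
        elif icmp_type == 0:
            replies.add((icmp_id, seq))
    result: Dict[int, Optional[float]] = {}
    for icmp_id, reqs in requests.items():
        missing = [ts for ts, seq in sorted(reqs) if (icmp_id, seq) not in replies]
        result[icmp_id] = missing[0] if missing else None
    return result


# Made-up sample: dpinger-style ID 20994 stops getting replies,
# while a parallel ping with ID 3317 to the same host is answered.
sample = [
    (0.00, 8, 20994, 1), (0.01, 0, 20994, 1),
    (0.50, 8, 20994, 2),
    (1.00, 8, 20994, 3),
    (1.10, 8, 3317, 1), (1.12, 0, 3317, 1),
]
print(first_unanswered(sample))  # {20994: 0.5, 3317: None}
```

Note that the last request in a capture may simply still be in flight, so a single trailing miss means little; a long run of misses for one ID while another ID is answered is the pattern described above.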

I don't know how pfSense is structured internally, or whether packets could be discarded or lost by other processes before the capture tool grabs them.

If not, an external device (the FritzBox modem or the CMTS) would be the next candidate that could stop answering ICMP echoes with the same ID after a while.

To prove that, I would have to capture on the FritzBox or between the FritzBox and pfSense.

It could also be something in the cable operator's network that kills such keep-alive sessions after a while.

The behavior is the same regardless of which IP, which packet size, or which timings are set in pfSense.

On the interface towards the DSL provider I run a FritzBox (of course a different type) as a modem, without any problems.

Pfosten

@JeGr

Hello

I noticed in the (pfSense) captures that every settings change in pfSense gives the ICMP requests a new ID. Once one ID stops receiving answers, requests with a new ID sent in parallel are still answered.

There could be something external that kills such keep-alive traffic based on the ICMP ID and the time elapsed. Maybe on purpose. Since the behavior is the same regardless of which IP I use, it can't be the target host itself. So it is something in the VODAFONE/UNITYMEDIA core network or in the provided FritzBox 6591 cable modem. "Stealth mode" is turned off on all boxes, but that would not block ICMP originating from the FritzBox side towards the internet anyway, only the opposite direction.

Bob.Dig

I think I have seen something similar. For example, my VPN clients to a VPN service provider are all shown as offline in the dashboard, but I can actually use them. Only restarting the clients in pfSense fixes that dashboard problem.
Another problem, maybe unrelated: if my WAN goes down, which it does often, a few clicks in the pfSense web GUI can kill the whole GUI, and I get timeouts or a specific browser error. Routing etc. still works fine, though.
These are examples that make it hard for me to believe pfSense could be run in an enterprise environment. But then, I also love it so much that I carry on.

bobbenheim

@Pfosten I don't know what you mean by using a broken tool.
dpinger is not located on the interface in pfSense.
Intercepting packets physically between pfSense and your FritzBox is exactly the same as capturing them with Packet Capture in pfSense on the given interface.
From what you are saying, it sounds like pfSense does send ICMP requests but doesn't get a response, but it would be a lot easier to see what is going on if you provided a packet capture.

dneuhaeuser

I have the exact same problem with 2 installations behind a FritzBox 6591 Cable from Vodafone Germany (Unitymedia).

Pings to various external IPs stop after a few hours.
Re-saving the gateway (which restarts dpinger) starts them again, until the next time.

Only monitoring the FritzBox's own external IP seems to stay stable.

Someone on another forum observed the same thing with a completely different firewall and Fritzbox 6591:

https://community.ui.com/questions/UDM-Pro-1-7-1-WAN-Failover-false-positives/6d01f376-c1f2-4d73-8291-8772b2e3483f#answer/34ba6dab-d394-4b86-ad43-c7f511b69a18

Pfosten

@dneuhaeuser

VF/UM is finally rolling out 7.22 for the 6591 modem.
If it is the internal firewall or just a bug that kills the ICMP reply, it might work fine once they have fixed it.
I opened a trouble ticket with VF, and they suggested "have you upgraded to the latest 7.22?", knowing exactly that I can't do that on my own. They refused to check on their side whether the ICMP reply is visible at their CMTS.

Pfosten

After 4 days of flawless dpinger operation I must conclude that both my brain and AVM firmware 7.13 on the FritzBox 6591 were broken :-)

Derelict

@pfosten What's funny is that "on the fritz" is a colloquial term used here in America for "malfunctioning" or "broken".

hmh

@kiokoman Thanks, data-payload = 2 resolved the issue with one of my WANs.