dpinger broken or Dashboard broken or my brain is broken....


  • [screenshot]

    As in a number of other posts, my gateway monitoring marks one particular gateway of my multi-WAN setup as offline, and it stays that way forever with 100% packet loss.

    Only changing (any) setting of that gateway resolves the issue... for a while.

    What is remarkable is that the WAN works perfectly the whole time: speed tests give the best results, with no difference compared to when it is marked as online.

    I can also ping the monitoring IP with the pfSense built-in ping tool by selecting that WAN interface.

    Hence I assume that something is wrong with dpinger or some kind of supervision process.

    In other words: a bug.

  • LAYER 8


  • I tried 0, 1 and 56: no difference.
    Since the day I took the screenshot, the WAN interface has been carrying traffic without any problem while the dashboard status shows it as offline.

    [screenshot]

    I selected the WAN interface/gateway that is marked as faulty as the default gateway, to make sure traffic is routed through it.

    [screenshot]


  • @Pfosten

    [screenshot]

    This is the gateway-related log.
    As you can see, there is not a single new entry from dpinger, yet the interface carries traffic like a charm the whole time. The dashboard widget still shows the gateway as offline.

  • LAYER 8

    I'm following you, but I have no idea. Maybe try to ping a different IP instead of 8.8.4.4;
    maybe the WAN1 ISP is limiting the pings.


  • @kiokoman:

    I used several IPs which work fine for the other WAN interface.
    I consider dpinger or the widget itself to be broken.
    The "problematic" WAN interface is carrying traffic without problems, hence dpinger cannot show 100% packet loss unless my modem or something else in between is filtering out my ping packets.
    And then it would be a permanent error, but each time I change the gateway settings, it resets and works for a while.

  • LAYER 8

    Try to ping the modem or the next hop to see where it stops working.


  • @kiokoman
    That is not the point: whatever causes the packet loss, it is not permanent, but dpinger never recovers.

  • LAYER 8

    If you restart the service, does it start to work again?
    Is WAN1 DHCP or static?


  • [screenshot]

    [screenshot]


  • [screenshot]


  • The address ranges of the FritzBox modems are split so that 100-200 are in the DHCP pool and the rest is static. So "DHCP=ON" is a bit misleading: addresses 1-99 are in fact static.

  • LAYER 8

    Yeah, I see. Anyway, on pfSense it's set as a static IP; I don't understand why dpinger does not recover in your case.


  • @kiokoman: Good that you reviewed it; I found a copy & paste mistake in the drawing, but the config is OK.


  • OK, I guess it is a bug, not a misconfiguration. How do I submit a bug report?

  • LAYER 8

    You can do it here: https://redmine.pfsense.org/
    But maybe there is already a ticket for that; take a look at the list of open bugs before opening a new one.

  • LAYER 8 Netgate

    Have you packet-captured the ICMP pings on the WAN you think should be up, while it is showing as down, to see what is really going on?

    If pfSense is sending the echo requests and there is no response, dpinger is doing everything it is supposed to be doing.
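The point above can be made concrete with a small model of how a dpinger-style monitor turns probe results into a loss figure. This is a minimal Python sketch, not dpinger's actual code: it assumes a fixed probe interval, a loss interval after which an unanswered probe counts as lost, and a sliding time period over which loss is averaged. The constant values mirror what are commonly cited as pfSense defaults but should be treated as assumptions, and `Probe`, `loss_percent` and `in_alarm` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

PROBE_INTERVAL_MS = 500     # assumed: one echo request every 500 ms
LOSS_INTERVAL_MS = 2000     # assumed: no reply within 2 s counts as lost
TIME_PERIOD_MS = 60_000     # assumed: loss is averaged over the last 60 s
LOSS_ALARM_PERCENT = 20.0   # assumed: alarm threshold for packet loss


@dataclass
class Probe:
    sent_ms: int
    reply_ms: Optional[int]  # None if no echo reply was ever seen


def loss_percent(probes: Sequence[Probe], now_ms: int) -> float:
    """Loss over the sliding window, skipping probes still inside the loss interval."""
    window = [
        p for p in probes
        if now_ms - TIME_PERIOD_MS <= p.sent_ms <= now_ms - LOSS_INTERVAL_MS
    ]
    if not window:
        return 0.0
    lost = sum(
        1 for p in window
        if p.reply_ms is None or p.reply_ms - p.sent_ms > LOSS_INTERVAL_MS
    )
    return 100.0 * lost / len(window)


def in_alarm(probes: Sequence[Probe], now_ms: int) -> bool:
    """True when the windowed loss reaches the (assumed) alarm threshold."""
    return loss_percent(probes, now_ms) >= LOSS_ALARM_PERCENT
```

In this model the reported loss can only fall again once replies start arriving for the probes being sent; if requests go out and nothing comes back, 100% is the correct output. That is why a capture showing whether echo replies actually reach the WAN interface decides between a dpinger problem and an upstream one.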


  • @Derelict

    Like I wrote above:

    The destination address always responds, and the interface is up and carrying heavy traffic.

    I tested again today: during a heavy speed test on that interface, the ping was delayed and for 1-2 seconds the dashboard widget showed "offline", but it recovered soon after.
    My problem seems to be that the status unpredictably gets "stuck" at 100% packet loss forever, UNTIL I make any change to any gateway or the gateway group.
    So I doubt that unsent or filtered ICMP responses are the real cause of this issue.


  • [screenshot]

    Here is another log example:

    2020/09/25 09:38:37 I fiddled with the gateway settings to make the problematic gateway group recover from the OFFLINE state it had entered at 2020/09/25 05:53:50

    2020/09/26 13:46:28 gateway group OFFLINE again

    2020/09/26 15:38:53 manual change of gateway settings (usually switching the default IPv4 gateway from automatic to the problematic gateway and back)

    2020/09/26 20:37:51 gateway group OFFLINE again

    2020/09/27 10:44:08 manual change of gateway settings

    2020/09/27 13:13:30 gateway group permanently OFFLINE again
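The log timestamps above make it easy to quantify "for a while". A minimal Python sketch that computes how long the gateway group stayed online after each manual settings change; the timestamps are copied from the log excerpt, and `uptime_after_reset` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

FMT = "%Y/%m/%d %H:%M:%S"

# (manual gateway change that cleared the alarm, next time the group went OFFLINE)
# Timestamps copied from the log excerpt above.
episodes = [
    ("2020/09/25 09:38:37", "2020/09/26 13:46:28"),
    ("2020/09/26 15:38:53", "2020/09/26 20:37:51"),
    ("2020/09/27 10:44:08", "2020/09/27 13:13:30"),
]


def uptime_after_reset(start: str, end: str) -> timedelta:
    """Time between a manual reset and the next OFFLINE event."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


for start, end in episodes:
    print(f"{start} -> {end}: online for {uptime_after_reset(start, end)}")
```

This prints online durations of 1 day, 4:07:51, then 4:58:58, then 2:29:22, so in this excerpt the time until the next OFFLINE varied between roughly 2.5 and 28 hours.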


  • A question:

    Netgate uses the same core code for professional use, right?
    They must experience the same issues, so how can it be that related bugs have not been fixed for a year or longer?

    https://redmine.pfsense.org/issues/9450


  • @Pfosten said in dpinger broken or Dashboard broken or my brain is broken....:

    @Derelict

    Like I wrote above:

    The destination address always responds, and the interface is up and carrying heavy traffic.

    I tested again today: during a heavy speed test on that interface, the ping was delayed and for 1-2 seconds the dashboard widget showed "offline", but it recovered soon after.
    My problem seems to be that the status unpredictably gets "stuck" at 100% packet loss forever, UNTIL I make any change to any gateway or the gateway group.
    So I doubt that unsent or filtered ICMP responses are the real cause of this issue.

    That is exactly why he asked you to do a packet capture: so the problem can be narrowed down to either something within pfSense or something external blocking your ICMP traffic.


  • @bobbenheim

    [screenshot]

    [screenshot]

    [screenshot]

    [screenshot]

    On 2020/09/28 09:19:52 I set the default IPv4 gateway to WAN_PHY1_IGB0, which reset the status shown in the dashboard widget.

    The interface was able to carry traffic the whole time!

    Several times I pinged 8.8.4.4, the address defined for gateway monitoring: always fine.

    At 2020/09/28 21:59:22 the status shown in the dashboard widget changed to OFFLINE. Even after that, the interface carries traffic up to the subscribed maximum in speed tests, and the ping test is fine.


  • Now tested with 2.5.0.a.20201101.1850

    For unknown reasons I still sometimes get partial or full loss of the alive ping on one of the two WAN interfaces, but that is not the issue.

    Nov 2 10:37:56 dpinger 16236 WAN_PHY1_IGB0GW 8.8.4.4: Alarm latency 0us stddev 0us loss 100%

    The problem is that this status remains until any change to the gateway group is made; then it works immediately.
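Since the dpinger alarm lines in the gateway log have a regular shape, they can be pulled out programmatically, e.g. to line up alarm times with packet-capture timestamps. A minimal Python sketch; the regular expression is modeled only on the single dpinger line quoted above, `parse_dpinger` is a hypothetical name, and other dpinger message formats are not covered.

```python
import re
from typing import Optional

# Pattern modeled on the one dpinger alarm line quoted in this thread;
# other message formats would need their own patterns.
DPINGER_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) dpinger (?P<pid>\d+) "
    r"(?P<gateway>\S+) (?P<target>\S+): (?P<event>\w+) "
    r"latency (?P<latency_us>\d+)us stddev (?P<stddev_us>\d+)us loss (?P<loss>\d+)%$"
)


def parse_dpinger(line: str) -> Optional[dict]:
    """Return the fields of a dpinger status line, or None if it does not match."""
    m = DPINGER_RE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    for key in ("pid", "latency_us", "stddev_us", "loss"):
        fields[key] = int(fields[key])
    return fields
```

For the quoted line this yields gateway `WAN_PHY1_IGB0GW`, target `8.8.4.4`, event `Alarm` and loss 100; the 0us latency next to 100% loss is consistent with no replies at all being received during the measurement period.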

    [screenshot]

    Either dpinger is not reattempting to reach the defined IP, or the process maintaining the operational status is not picking up the changes.


  • Repeating the symptoms and posting screenshots still doesn't get anyone closer to an explanation of what is going on in your setup. Make a packet capture on the WAN_PHY1_IGB0GW interface so you can determine, to start with, whether the problem is internal or external to pfSense.


  • @bobbenheim

    1st: I just wanted to state that I have the same issue with 2.5.0.
    2nd: If the system considers IGB0 OFFLINE while I can, at the same time, successfully ping the configured 8.8.4.4 (and any other address I tried before) with the pfSense ping tool, I am pretty sure the issue is inside pfSense.
    3rd: I will set up an external capture if that helps.


  • @Pfosten

    You can make the capture within pfSense under
    Diagnostics > Packet Capture.
    You can also limit it to ICMP traffic to and from 8.8.4.4 so you don't get an unnecessarily large capture.

  • LAYER 8 Moderator

    @Pfosten said in dpinger broken or Dashboard broken or my brain is broken....:

    2nd: If the system considers IGB0 OFFLINE while I can, at the same time, successfully ping the configured 8.8.4.4 (and any other address I tried before) with the pfSense ping tool, I am pretty sure the issue is inside pfSense.

    The system or interface is not offline; it is just reported as offline because dpinger sees no responses to its pings. Simple as that. The ping tool does exactly that: it pings. dpinger may use different settings, such as a different source IP. Also, I don't see your routing table: if 8.8.4.4 has been added as a host route via another interface, that gets in the way of dpinger functioning properly.

    E.g. in "System / General", (accidentally) adding 8.8.4.4 as a system DNS server on WAN_PHY2 instead of PHY1 would host-route that IP via PHY2, so dpinger trying to check the IP via PHY1 will fail. So a config mistake is still possible. That's why a packet capture on the physical interface was requested: to see whether dpinger actually sends out pings on THAT interface or whether something goes amiss before that.
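The host-route scenario can be illustrated with a longest-prefix-match lookup, which is how a routing table picks the outgoing interface. A minimal Python sketch; the routing table is hypothetical (interface names borrowed from the thread), and it deliberately ignores policy routing, source-address binding and everything else pfSense layers on top, so it only shows why a /32 host route beats the default route.

```python
import ipaddress
from typing import NamedTuple


class Route(NamedTuple):
    prefix: ipaddress.IPv4Network
    interface: str


# Hypothetical routing table: default route via PHY1, plus a host route for
# 8.8.4.4 that was (accidentally) created via PHY2, e.g. by a DNS setting.
ROUTES = [
    Route(ipaddress.ip_network("0.0.0.0/0"), "WAN_PHY1"),
    Route(ipaddress.ip_network("8.8.4.4/32"), "WAN_PHY2"),
]


def lookup(dest: str, routes=ROUTES) -> Route:
    """Return the most specific (longest-prefix) route that contains dest."""
    addr = ipaddress.ip_address(dest)
    matches = [r for r in routes if addr in r.prefix]
    if not matches:
        raise LookupError(f"no route to {dest}")
    return max(matches, key=lambda r: r.prefix.prefixlen)
```

Here `lookup("8.8.4.4")` resolves to WAN_PHY2 while any other address falls through to WAN_PHY1. On the firewall itself, Diagnostics > Routes (or `netstat -rn` from a shell) shows whether such a host route for the monitor IP actually exists.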

    Also worth mentioning: we had a few (very few) cases of this "IP/gateway not pingable with a FritzBox / AVM box in front" in the German subforum. Funny story: after removing the FritzBox, most users had NO problems anymore, at all. Others (like me) had problems with a FritzBox in front and could switch to "bridged" mode on LAN2, after which there were no problems either. Newer AVM boxes/firmware versions aren't that bulletproof anymore.

    PS:

    Netgate uses the same core code for professional use, right?

    There is NO other version of pfSense or "core code" for different editions.
    pfSense is the same software on every platform. The version on Netgate's own devices is only tweaked/built for those platforms if they aren't x64 (like SG1100/2100/3100, which are ARM) and otherwise adds hardware-dependent things (like VLAN/switch configuration if the device includes a switching chipset). Beyond that they just add one or two small wizards on top, but 98-99% is the same for everyone. There is no "enterprise", "core" or any other special version. Just wanting to clarify for anyone wondering after that comment.

    Cheers
    \jens


  • @bobbenheim

    [attachment: packetcapture 3rd (id20994 is dpinger, id3317 is the pfSense ping tool).cap]

    [attachment: packetcapture 2nd.cap]

    The internal capture does not give full evidence, since I am using a tool that could itself be broken.

    The 2nd trace captures the moment when the responses to ICMP pings with a certain ID stop arriving.

    The 3rd trace shows ICMP requests with id 20994 going unanswered for a while, while in parallel the pfSense ping tool (id 3317) pings the SAME IP address used by dpinger and does get responses.
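The per-ID comparison in the 3rd trace can be checked mechanically once the capture has been exported. A minimal Python sketch, assuming the capture was already reduced to `(time_s, icmp_type, ident, seq)` tuples (for example by exporting the ICMP identifier and sequence fields from Wireshark); the tuple layout and `summarize_by_ident` are hypothetical.

```python
from collections import defaultdict

ECHO_REQUEST, ECHO_REPLY = 8, 0  # ICMP type values from RFC 792


def summarize_by_ident(packets):
    """packets: iterable of (time_s, icmp_type, ident, seq) tuples.

    Returns {ident: (requests, replies, time of last reply or None)}.
    """
    requests = defaultdict(int)
    replies = defaultdict(int)
    last_reply = {}
    for t, icmp_type, ident, _seq in packets:
        if icmp_type == ECHO_REQUEST:
            requests[ident] += 1
        elif icmp_type == ECHO_REPLY:
            replies[ident] += 1
            last_reply[ident] = max(t, last_reply.get(ident, t))
    return {
        ident: (requests[ident], replies[ident], last_reply.get(ident))
        for ident in sorted(set(requests) | set(replies))
    }
```

An identifier whose request count keeps growing while its last-reply time stays frozen, next to another identifier to the same address that keeps getting replies, is the pattern described above.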

    I don't know how pfSense is structured internally, or whether packets could be discarded or lost by some process before being grabbed by the capture tool.

    If not, an external device (the FritzBox modem or the CMTS) would be the next candidate that could stop answering ICMP traffic with the same ID after a while.

    To provide that evidence, I must capture on the FritzBox or between the FritzBox and pfSense.

    It could also be something in the cable operator's network that kills such keep-alive sessions after a while.

    The behavior is the same regardless of

    • which IP
    • which packet size
    • which timings are set in pfSense

    On the interface towards the DSL operator I run a (of course different type of) FritzBox as a modem, without any problem.


  • @JeGr

    Hello

    I noticed in the (pfSense) traces that every settings change in pfSense gives the ICMP requests a new ID. Once one ID no longer receives answers, requests with a new ID sent in parallel are still answered.

    There could be something external that kills such keep-alive traffic based on the ICMP ID and the time elapsed. Maybe on purpose. Since the behavior is the same regardless of which IP I use, it can't be the target host itself. So it is something in the VODAFONE/UNITYMEDIA core network or in the provided FritzBox 6591 cable modem. "Stealth mode" is turned off on all boxes, but that would not block ICMP originating from the FritzBox side towards the internet anyway, only the opposite direction.
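For context on the "ID" in these posts: an ICMP echo request (RFC 792) carries a 16-bit identifier and a 16-bit sequence number, protected by the Internet checksum (RFC 1071). Per the observation above, each settings change starts requests with a new identifier, and NAT devices commonly track ICMP echo sessions by this identifier much like a port number (RFC 5508 covers this). The minimal Python sketch below only builds an echo request to show where the identifier sits; it sends nothing, `echo_request` and `inet_checksum` are hypothetical helper names, and it does not demonstrate that any upstream device actually expires traffic per identifier, which remains the unproven hypothesis of the post.

```python
import struct


def inet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement sum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold carry back in
    return ~total & 0xFFFF


def echo_request(ident: int, seq: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request (type 8, code 0) with the given identifier."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)  # checksum zeroed first
    csum = inet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload
```

Two pingers targeting the same address differ only in the identifier (and sequence) fields, so a device that keys state on the identifier, as suspected here, would treat them as separate flows.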


  • I think I have seen something similar. For example, my VPN clients to a VPN service provider are all shown as offline in the dashboard, but I can actually use them. Only restarting the clients in pfSense solves that dashboard problem.
    Another problem, maybe unrelated: if my WAN goes down, which it does often, after a few clicks in the pfSense web GUI the whole GUI can die and I get timeouts or a specific browser error. Routing etc. still works fine, though.
    These are examples of why I struggle to see pfSense running in an enterprise environment. But then, I love it so much that I keep going. 😉


  • @Pfosten I don't know what you mean by using a broken tool.
    dpinger does not sit on the interface in pfSense.
    Capturing packets physically between pfSense and your FritzBox is exactly the same as capturing them with Packet Capture in pfSense on that interface.
    From what you are saying, it sounds like pfSense does send ICMP requests but doesn't get a response, but it would be a lot easier to see what is going on if you provided a packet capture.


  • I have the exact same problem with 2 installations behind a FritzBox 6591 Cable from Vodafone Germany (Unitymedia).

    Pings to various external IPs stop after a few hours.
    Re-saving the gateway (which restarts dpinger) starts them again, until the next time.

    Only monitoring the FritzBox's own external IP seems to stay stable.

    Someone on another forum observed the same thing with a completely different firewall and Fritzbox 6591:

    https://community.ui.com/questions/UDM-Pro-1-7-1-WAN-Failover-false-positives/6d01f376-c1f2-4d73-8291-8772b2e3483f#answer/34ba6dab-d394-4b86-ad43-c7f511b69a18