Gateway Monitoring Errors



  • Just in case anyone else has a similar issue...

    I was originally using the two main Google DNS IPs (8.8.8.8 and 8.8.4.4) for my GW monitoring, but was frequently seeing timeouts and Up-Down errors on either WAN, even though both WANs were working fine. I changed the GW monitor IPs to OpenDNS servers, and all the issues went away.

    Pete



  • It seems curious to me… I faced a similar error just a few days ago (https://forum.pfsense.org/index.php?topic=117053.0). I'll try using only OpenDNS servers as monitoring IPs.

    Are 208.67.220.222, 208.67.220.220, 208.67.222.222 and 208.67.222.220 some of the IPs you're using to monitor your gateways?



  • Ironically, I am having similar problems.  For some reason, WAN2 won't let me do ICMP monitoring for ANYTHING other than the WAN2 default gateway IP.  It is incredibly strange.  I know WAN2 works, and that ICMP isn't blocked.  If I drop WAN1, and leave monitoring as default GW on WAN2, the service works, I fail over to WAN2, and I can get out.

    Crazier still, I know ICMP isn't blocked, because I am able to ping 8.8.8.8, 8.8.4.4, and 208.67.220.222.

    But for some crazy reason, if I set up a monitor IP for GW2, no matter what I monitor (other than the default gateway IP), the system always fails the ICMP check and marks the GW offline.

    It seems to have started with the last pfSense update to 2.3.2, but I would be lying if I said I KNOW that is when it started. It is when I noticed it.

    Has anyone else seen this issue?



  • And some more notes …

    When I look in the routing table, I can see my default route, plus the two host routes for the monitor IPs on my WANs.

    default 98.228.39.1 UGS 19866508 1500 igb0
    69.139.173.101 98.228.39.1 UGHS 304986 1500 igb0
    8.8.4.4 192.168.1.1 UGHS 504 1500 igb2
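    To check which gateway and interface a monitor IP will actually leave through, a small helper can look it up in rows like these. This is a sketch under assumptions. The function names are hypothetical. It assumes whitespace-separated `netstat -rn`-style rows with destination, gateway and flags first and the interface name last, as in the rows above. In FreeBSD route flags, `U` means up, `G` means via a gateway, `H` means a host route, and `S` means static.

```python
from typing import List, NamedTuple, Optional


class Route(NamedTuple):
    destination: str
    gateway: str
    flags: str
    interface: str


def parse_routes(text: str) -> List[Route]:
    """Parse `netstat -rn`-style rows into Route tuples (hypothetical helper).

    Assumes whitespace-separated columns: destination, gateway, flags first,
    interface name last. Short lines and the column header row are skipped.
    """
    routes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] == "Destination":
            continue
        routes.append(Route(fields[0], fields[1], fields[2], fields[-1]))
    return routes


def route_for(routes: List[Route], dest: str) -> Optional[Route]:
    """Return the host route (flag H) for dest, else the default route, else None."""
    for route in routes:
        if route.destination == dest and "H" in route.flags:
            return route
    for route in routes:
        if route.destination == "default":
            return route
    return None
```

    With the three rows above, `route_for(parse_routes(table), "8.8.4.4")` returns the `igb2` route via `192.168.1.1`. That means the monitor IP is pinned to WAN2, as the post describes.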

    From the command line, if I try to run dpinger, I simply can't ping that IP out the WAN2 gateway.

    /usr/local/bin/dpinger -f -B 192.168.1.20 8.8.4.4
    send_interval 500ms  loss_interval 2000ms  time_period 60000ms  report_interval 1000ms  data_len 0  alert_interval 1000ms  latency_alarm 0ms  loss_alarm 0%  dest_addr 8.8.4.4  bind_addr 192.168.1.20  identifier ""
    0 0 0
    0 0 0
    0 0 100
    0 0 100
    0 0 100
    0 0 100

    (For reference, WAN2 is connected to a MiFi router, hence the 192.168.1.20 IP on the WAN. 192.168.1.1 is the gateway on the MiFi, which I can ping, but I can't ping 8.8.4.4 through it.)

    Bear in mind, if I fail over to WAN2, I am able to ping 8.8.4.4 from my desktop through pfSense, out WAN2. So it isn't a case where ICMP doesn't work. It is something wonky, specific to the WAN2 interface and pfSense.
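    The dpinger output above can also be read programmatically. The sketch below rests on an assumption: each report line is `<average latency> <latency std dev> <loss %>`, with no prefix because the identifier is empty here. That is our reading of dpinger's report format, not something stated in this thread. The helper names are hypothetical, and the loss threshold is illustrative, not pfSense's configured alarm.

```python
from typing import List, NamedTuple


class Report(NamedTuple):
    latency_avg_us: int  # assumed: average latency, microseconds
    latency_dev_us: int  # assumed: latency standard deviation, microseconds
    loss_pct: int        # assumed: packet loss percentage


def parse_reports(text: str) -> List[Report]:
    """Collect report lines made of exactly three integers.

    The startup banner (send_interval ... identifier "") has many fields
    and is skipped, as is anything else that is not a bare report line.
    """
    reports = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and all(f.isdigit() for f in fields):
            reports.append(Report(*(int(f) for f in fields)))
    return reports


def looks_down(reports: List[Report], loss_threshold_pct: int = 20) -> bool:
    """Hypothetical check: is the latest report's loss at or above the threshold?

    The default of 20% is an illustrative value, not pfSense's alarm setting.
    """
    return bool(reports) and reports[-1].loss_pct >= loss_threshold_pct
```

    Given the six report lines above, the first two show 0% loss and the last shows 100%, so `looks_down` returns `True`.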



  • Perhaps you are one of the few who have issues with zero-payload packets.

    You can try adjusting the data payload size on the gateway settings page.
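    For context on what a zero-length payload means: an ICMP echo request (RFC 792) is an 8-byte header (type, code, checksum, identifier, sequence number) followed by optional data. So `data_len 0` in the dpinger banner earlier in the thread is a bare 8-byte header, or 28 bytes once a 20-byte IPv4 header without options is added. The sketch below builds such a packet for a given payload size, with the RFC 1071 checksum. The helper names are hypothetical. The payload is zero-filled purely for illustration; real ping tools fill it with their own data.

```python
import struct


def inet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def icmp_echo_request(ident: int, seq: int, data_len: int) -> bytes:
    """Build an ICMP echo request (type 8, code 0) carrying data_len payload bytes."""
    payload = bytes(data_len)  # zero-filled, illustration only
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)  # checksum field zeroed
    csum = inet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload
```

    `len(icmp_echo_request(1, 1, 0))` is 8 and `len(icmp_echo_request(1, 1, 32))` is 40. In both cases the checksum computed over the finished packet comes out to 0, which is how a receiver validates it.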



  • heper's suggestion is a good one. The WAN2 gateway device may have a problem with zero-length data payloads for ICMP.

    MiFi as in a cellular access point?



  • @heper:

    Perhaps you are one of the few who have issues with zero-payload packets.

    You can try adjusting the data payload size on the gateway settings page.

    OH MY GOD!!!  YOU ARE A GODSEND!!!!!

    So I first tried "1" for the payload size, and it still didn't work. I bumped it up to 56, and it worked. Then I checked my Windows workstation, where the default ICMP payload size is 32, so I tried 32 and that worked. Then I lowered it to 16, and it failed. So yes, something is clearly going on with small payloads.

    For further clarity, my MiFi solution is an AT&T LTE MiFi, docked in an Ethernet dock and connected to pfSense.

    I have been using this solution for some time now, and it has been working for the better part of a year.  Somewhere along the line, it stopped.  Either AT&T changed something, or maybe the updated dpinger changed things?

    Either way, setting this to 32 fixed my problem!  Thank you so much!  I was pulling my friggin hair out!!!!!



  • Dpinger hasn't changed in this regard since its introduction in 2.3.0. If you upgraded from 2.2.X or below, the upgrade switched you from apinger to dpinger, which would have changed the ICMP data payload size. Failing that, it's something that AT&T changed.



  • The core issue appears to be a defect inside AT&T's cellular network. I have a MiFi, which I pulled out to test, and I see the issue as well. I also tested an iPhone hotspot on AT&T, and it shows the same problem on both LTE and 4G. The smallest acceptable data payload is 20 bytes. I would report the defect to AT&T, but I don't know anyone inside.

    I'd also like to know if the issue exists in Verizon's network, but I don't have a Verizon phone to test with. If someone does, and would like to test, I'd appreciate it. No need to hook the device up to pfSense, you can test from your laptop. Just connect to the hotspot and try to ping.

    Example commands (for a Mac):

    ping -s 0 8.8.8.8
    ping -s 16 8.8.8.8
    ping -s 20 8.8.8.8
    ping -s 56 8.8.8.8
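    The same test can be scripted to try several sizes in one run. This is a minimal sketch under a few assumptions. It expects a system `ping` that accepts `-c` (count) and `-s` (data bytes), as the macOS, FreeBSD and Linux versions do. The helper names and the default size list (the sizes tried in this thread) are illustrative. The probe function is a parameter, so it can be swapped out.

```python
import subprocess
from typing import Callable, Iterable, List, Optional, Tuple

# Sizes tried in this thread: 0, 1, 16, 20, 32 and 56 data bytes.
DEFAULT_SIZES = (0, 1, 16, 20, 32, 56)


def ping_once(host: str, size: int, timeout_s: float = 5.0) -> bool:
    """Send one echo request carrying `size` data bytes via the system ping.

    Only -c (count) and -s (data size) are passed; the wait for a reply is
    bounded by subprocess's timeout rather than a platform-specific flag.
    """
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-s", str(size), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def sweep(host: str,
          sizes: Iterable[int] = DEFAULT_SIZES,
          probe: Callable[[str, int], bool] = ping_once) -> List[Tuple[int, bool]]:
    """Probe each payload size once; record whether a reply came back."""
    return [(size, probe(host, size)) for size in sizes]


def smallest_working(results: List[Tuple[int, bool]]) -> Optional[int]:
    """Smallest payload size that got a reply, or None if none did."""
    replied = [size for size, ok in results if ok]
    return min(replied) if replied else None
```

    Run from a laptop connected to the hotspot, `smallest_working(sweep("8.8.8.8"))` returns the smallest size that got a reply, or `None` if none did. On a network with the behavior described above, you would expect the sizes below 20 to get no reply.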

    Thanks.

