Netgate Discussion Forum

    Upstream unreachable but no ISP connection loss?

    General pfSense Questions
      Andargor

      Hello,

      For several weeks, remote users connecting to an internal server through our pfSense 2.4.2 firewall have been losing their connections intermittently, several times per day. Several users are typically connected at once, and they all drop at the same time. LAN users are not disconnected from the server. A Nagios monitor on our internal network reports that the ISP gateway is unreachable at the same moment. For example (newest entries first, with recovery on the next poll):

      Host Up[01-24-2018 05:15:21] HOST ALERT: Amazon-West;UP;SOFT;2;PING OK - Packet loss = 0%, RTA = 90.51 ms
      Host Up[01-24-2018 05:15:11] HOST ALERT: Amazon-East;UP;SOFT;3;PING OK - Packet loss = 0%, RTA = 24.22 ms
      Host Up[01-24-2018 05:15:11] HOST ALERT: videotron-gw;UP;SOFT;2;PING OK - Packet loss = 0%, RTA = 0.83 ms
      Service Ok[01-24-2018 05:15:01] SERVICE ALERT: videotron-gw;PING-ISP;OK;SOFT;2;PING OK - Packet loss = 0%, RTA = 0.83 ms
      Host Unreachable[01-24-2018 05:14:21] HOST ALERT: Amazon-West;UNREACHABLE;SOFT;1;CRITICAL - Host Unreachable
      Host Unreachable[01-24-2018 05:14:21] HOST ALERT: Amazon-East;UNREACHABLE;SOFT;2;CRITICAL - Host Unreachable
      Host Down[01-24-2018 05:14:11] HOST ALERT: videotron-gw;DOWN;SOFT;1;CRITICAL - Host Unreachable
      Service Critical[01-24-2018 05:14:01] SERVICE ALERT: videotron-gw;PING-ISP;CRITICAL;SOFT;1;CRITICAL - Host Unreachable
      Host Down[01-24-2018 05:14:01] HOST ALERT: Amazon-East;DOWN;SOFT;1;CRITICAL - Host Unreachable

      Amazon-East/West are our datacenters, and videotron-gw is the ISP gateway. The firewall address is fixed IPv4, with a static default gateway.
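
To measure how long each monitored host stayed unreachable, alert lines like the ones above could be parsed and paired up. This is only a minimal sketch: the function names `parse_host_alerts` and `outage_windows` are hypothetical, the regular expression assumes the exact line layout pasted in this post, and SERVICE ALERT lines are deliberately ignored.

```python
import re
from datetime import datetime

# Assumed layout, taken from the pasted lines:
# "<label>[MM-DD-YYYY HH:MM:SS] HOST ALERT: <host>;<state>;<type>;<attempt>;<output>"
ALERT_RE = re.compile(
    r"\[(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\] HOST ALERT: ([^;]+);([^;]+);"
)

SAMPLE = [
    "Host Up[01-24-2018 05:15:21] HOST ALERT: Amazon-West;UP;SOFT;2;PING OK - Packet loss = 0%, RTA = 90.51 ms",
    "Host Up[01-24-2018 05:15:11] HOST ALERT: Amazon-East;UP;SOFT;3;PING OK - Packet loss = 0%, RTA = 24.22 ms",
    "Host Up[01-24-2018 05:15:11] HOST ALERT: videotron-gw;UP;SOFT;2;PING OK - Packet loss = 0%, RTA = 0.83 ms",
    "Service Ok[01-24-2018 05:15:01] SERVICE ALERT: videotron-gw;PING-ISP;OK;SOFT;2;PING OK - Packet loss = 0%, RTA = 0.83 ms",
    "Host Unreachable[01-24-2018 05:14:21] HOST ALERT: Amazon-West;UNREACHABLE;SOFT;1;CRITICAL - Host Unreachable",
    "Host Unreachable[01-24-2018 05:14:21] HOST ALERT: Amazon-East;UNREACHABLE;SOFT;2;CRITICAL - Host Unreachable",
    "Host Down[01-24-2018 05:14:11] HOST ALERT: videotron-gw;DOWN;SOFT;1;CRITICAL - Host Unreachable",
    "Service Critical[01-24-2018 05:14:01] SERVICE ALERT: videotron-gw;PING-ISP;CRITICAL;SOFT;1;CRITICAL - Host Unreachable",
    "Host Down[01-24-2018 05:14:01] HOST ALERT: Amazon-East;DOWN;SOFT;1;CRITICAL - Host Unreachable",
]


def parse_host_alerts(lines):
    """Return (timestamp, host, state) tuples for HOST ALERT lines, oldest first."""
    events = []
    for line in lines:
        match = ALERT_RE.search(line)
        if match:
            ts = datetime.strptime(match.group(1), "%m-%d-%Y %H:%M:%S")
            events.append((ts, match.group(2), match.group(3)))
    return sorted(events)


def outage_windows(events):
    """Pair each host's first non-UP alert with that host's next UP alert."""
    started = {}
    windows = []
    for ts, host, state in events:
        if state != "UP":
            started.setdefault(host, ts)  # keep the earliest failure time
        elif host in started:
            windows.append((host, started.pop(host), ts))
    return windows


if __name__ == "__main__":
    for host, start, end in outage_windows(parse_host_alerts(SAMPLE)):
        print(f"{host}: {start:%H:%M:%S} -> {end:%H:%M:%S} "
              f"({int((end - start).total_seconds())} s)")
```

The timestamps here are naive local times exactly as Nagios printed them, so the resulting windows are only comparable to other logs after accounting for time zones.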

      Initially, we thought it might be the ISP, which we investigated. However, looking at the pfSense monitoring (Status > Monitoring), there are no Quality issues reported. For example, for the same timeline as above (WANGW is videotron-gw):

      Nagios does not report any internal loss of connectivity to the firewall, which eliminates LAN issues. There is no other activity on the firewall at the times of the connection losses, and they occur at different times during the day. There are no interface errors, and traffic is light. The system CPU, memory and states are normal. Our only conclusion is that pfSense itself is intermittently blocking traffic for unknown reasons.

      Does anyone have an idea how to resolve this or is this a bug?

      Here are more graphs from Monitoring, for the same timeline as above:

        Harvy66

        Just making sure I'm reading this correctly. You said

        However, looking at the pfSense monitoring (Status > Monitoring), there are no Quality issues reported.

        then immediately after you have a quality graph showing what looks like 100% packet loss around the time of the error log.

        How is 100% loss not a quality issue?

          Andargor

          @Harvy66:

          The Nagios alert is at 5:14, while the Quality graph shows a drop at 5:50, and both system clocks are synchronized. Unless the monitoring app is bugged and showing the wrong time?

            Andargor

            Looking at the system logs more closely, I am seeing a link down event at that time, strange!

            Jan 24 10:14:08 kernel re1: link state changed to DOWN

            (Note: the time in monitoring is local time, EST, while the system log is in UTC, so 5:14 EST = 10:14 UTC.)
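
The time-zone arithmetic in the note above can be checked with a short script. This is a sketch under stated assumptions: EST is modeled as a fixed UTC-5 offset (reasonable for January, no daylight saving), the syslog line carries no year so 2018 is supplied by hand, and the helper names `syslog_to_local` and `within` are hypothetical. Parsing the month abbreviation with `%b` assumes an English/C locale.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset standing in for EST (assumption: no daylight saving in January).
EST = timezone(timedelta(hours=-5), "EST")


def syslog_to_local(line, year, tz=EST):
    """Parse the leading 'Mon DD HH:MM:SS' of a syslog line (assumed UTC) and convert to tz."""
    stamp = " ".join(line.split()[:3])
    utc = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return utc.replace(tzinfo=timezone.utc).astimezone(tz)


def within(a, b, seconds=60):
    """True if two timezone-aware datetimes are at most `seconds` apart."""
    return abs((a - b).total_seconds()) <= seconds


link_down = syslog_to_local(
    "Jan 24 10:14:08 kernel re1: link state changed to DOWN", 2018
)
# First Nagios alert of the incident, as local (EST) time.
first_alert = datetime(2018, 1, 24, 5, 14, 1, tzinfo=EST)

print(link_down.isoformat())
print("same incident:", within(link_down, first_alert))
```

Comparing timezone-aware values rather than bare clock strings avoids the kind of mix-up between local time and UTC described here.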

            The firewall is connected to an ISP switch, to which the ISP's cable modem is also connected.

            I've swapped cables and ports, and will monitor what happens.

              Andargor

              @Harvy66:

              Argh, the monitor shows local time at the bottom, but the times on the graph are UTC! I was confused about the times there. Here's the correct graph, and yes, it seems the local link to the ISP went down. Narrowing the possibilities…

              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.