Netgate Discussion Forum

    dpinger and ISP packet loss

    Routing and Multi WAN
    6 Posts 3 Posters 1.2k Views
    • fireix

      I'm trying to find out why my Internet connection went down for about 3 minutes today, so I can take steps to fix it if the cause is on my end. I'm still hoping it was my ISP and am waiting for their report, but Pingdom, which measures uptime from around the world, didn't record any downtime on ISP_IP even though the outage lasted for minutes.

      Pingdom's checks against the IP of my pfSense WAN, which sits behind ISP_GW, did however detect 100% packet loss at the same time.

      My ISP in the data center has provided a fiber box at ISP_IP, so that's my gateway to the Internet. As you can see below, dpinger on pfSense recorded packet loss against the ISP, and it lasted for some minutes (this happens very rarely; the previous time was 7 months ago). Could anything on my side (the pfSense config) have caused this packet loss/downtime? I was sleeping, so I wasn't actively doing anything at the time, but if this is a sign of something going on with my pfSense, it would be nice to know.

      For example, could some internal routing in pfSense take pfSense down temporarily, or is dpinger reliable enough in this respect that I can assume the error is on the ISP's equipment?

      Oct 31 01:00:36 fw1 dpinger[2268]: GW_WAN_2 ISP_IP: Alarm latency 668us stddev 2892us loss 22%
      Apr 13 10:29:39 fw1 dpinger[34918]: GW_WAN_2 ISP_IP: Alarm latency 2300us stddev 8630us loss 22%
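To compare alarms like these across incidents, it can help to pull the numbers out of the log lines. The following is a minimal sketch, assuming the alarm lines always follow the layout of the two lines quoted above; the regular expression and the `parse_alarm` helper are hypothetical, inferred from those two lines only, and are not part of dpinger or pfSense.

```python
import re

# Pattern inferred from the two log lines quoted above (an assumption);
# other dpinger message formats are not covered.
ALARM_RE = re.compile(
    r"dpinger\[\d+\]: (?P<gateway>\S+) (?P<target>\S+): Alarm "
    r"latency (?P<latency_us>\d+)us stddev (?P<stddev_us>\d+)us "
    r"loss (?P<loss_pct>\d+)%"
)

def parse_alarm(line):
    """Return the alarm fields as a dict, or None if the line does not match."""
    match = ALARM_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields from strings to integers.
    for key in ("latency_us", "stddev_us", "loss_pct"):
        fields[key] = int(fields[key])
    return fields

line = ("Apr 13 10:29:39 fw1 dpinger[34918]: GW_WAN_2 ISP_IP: "
        "Alarm latency 2300us stddev 8630us loss 22%")
alarm = parse_alarm(line)
print(alarm)
```

In both quoted lines the latency is in the low-millisecond range while the loss is 22%, so the loss figure is probably the one worth tracking here.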

      • viragomann @fireix

        @fireix
        Also check the system log for hints around that time. Maybe the network connection went down temporarily for some reason.

        • fireix @viragomann

          @viragomann It happened again just now, at the exact same time as yesterday, and this time there was actually a crash report waiting for me: pfSense had rebooted. I haven't had an error like this for two to three years (back then it happened about once a month), and now it has suddenly started again. The previous time, it was solved by breaking up an LACP LAGG. Not that the log helps me understand anything.

          I have checked the system logs, IPsec logs, gateway logs and every single log file in the /logs directory, and nothing was going on in the seconds before the crash. It just appeared out of the blue. The box has very low traffic. It's weird that it also happened at the exact same time two days ago.

          Fatal trap 12: page fault while in kernel mode
          cpuid = 2; apic id = 04
          fault virtual address = 0x18
          fault code = supervisor read data, page not present
          instruction pointer = 0x20:0xffffffff80e0fcc4
          stack pointer = 0x0:0xfffffe00004d6800
          frame pointer = 0x0:0xfffffe00004d6830
          code segment = base 0x0, limit 0xfffff, type 0x1b
          = DPL 0, pres 1, long 1, def32 0, gran 1
          processor eflags = interrupt enabled, resume, IOPL = 0
          current process = 0 (if_io_tqg_2)
          trap number = 12
          panic: page fault
          cpuid = 2
          time = 1681804483
          KDB: enter: panic
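The `time =` value in the panic output is a Unix timestamp, so it can be converted to a wall-clock time and lined up against the other logs. A minimal sketch follows; the UTC+2 offset is an assumption about this firewall's log timezone, not something stated in the thread.

```python
from datetime import datetime, timedelta, timezone

panic_epoch = 1681804483  # "time =" value from the panic output above

# Interpret the value as seconds since the Unix epoch, in UTC.
panic_utc = datetime.fromtimestamp(panic_epoch, tz=timezone.utc)
print(panic_utc.isoformat())    # 2023-04-18T07:54:43+00:00

# ASSUMPTION: the firewall writes its syslog timestamps in UTC+2.
assumed_log_tz = timezone(timedelta(hours=2))
panic_local = panic_utc.astimezone(assumed_log_tz)
print(panic_local.isoformat())  # 2023-04-18T09:54:43+02:00
```

Under that assumption the panic falls between the last sshguard entry (09:50:00) and the boot messages (10:09:29) in the system log excerpt in this post.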

          The system log only shows this before the crash and reboot:

          Apr 18 08:48:00 fw1 sshguard[24159]: Now monitoring attacks.
          Apr 18 09:04:00 fw1 sshguard[24159]: Exiting on signal.
          Apr 18 09:04:00 fw1 sshguard[39953]: Now monitoring attacks.
          Apr 18 09:19:00 fw1 sshguard[39953]: Exiting on signal.
          Apr 18 09:19:00 fw1 sshguard[27967]: Now monitoring attacks.
          Apr 18 09:34:00 fw1 sshguard[27967]: Exiting on signal.
          Apr 18 09:34:00 fw1 sshguard[13151]: Now monitoring attacks.
          Apr 18 09:36:00 fw1 sshguard[13151]: Exiting on signal.
          Apr 18 09:36:00 fw1 sshguard[38117]: Now monitoring attacks.
          Apr 18 09:50:00 fw1 sshguard[38117]: Exiting on signal.
          Apr 18 09:50:00 fw1 sshguard[50187]: Now monitoring attacks.
          Apr 18 10:09:29 fw1 syslogd: kernel boot file is /boot/kernel/kernel
          Apr 18 10:09:29 fw1 kernel: ---<<BOOT>>---
          Apr 18 10:09:29 fw1 kernel: Copyright (c) 1992-2021 The FreeBSD Project.
          Apr 18 10:09:29 fw1 kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994

          • jaspery

            I'm not sure the problem I had previously is similar to yours, but I'll share my solution anyway.

            I noticed frequent dpinger restarts in the logs at times of heavy network load, especially upload.

            I made sure my setup was correct and that there was no misconfiguration that could cause this. I was also pretty sure my ISP gateway was not actually going down.

            So my hypothesis was that under heavy load dpinger simply couldn't ping the gateway, because the ping traffic was stuck in queues.

            I didn't want to disable gateway pinging entirely. On the other hand, I learned about the latency threshold configuration on the gateway screen in the WebGUI (System / Routing / <My Gateway>).

            Basically, dpinger logs a warning when the ping delay reaches the "Low Threshold" value, and when the delay reaches the "High Threshold" it decides to restart itself, the firewall, and who knows what other services. Apparently the Internet connection failures in my case were caused by these restarts.

            I experimented with different threshold setups for my environment and finally settled on Low=600 and High=900. I haven't seen Internet failures since then (more than a week; previously, as I said, it could happen a few times a day under heavy load).
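The two-threshold behaviour described in this post can be modelled roughly as below. This is only an illustrative sketch of the poster's description, not dpinger's or pfSense's actual implementation; the function name, the status labels, and the assumption that the thresholds are in milliseconds are all hypothetical.

```python
def classify_latency(latency_ms, low_ms=600, high_ms=900):
    """Map a measured latency to a status using a low and a high threshold.

    ASSUMPTION: values are in milliseconds, and the defaults are the
    Low=600 / High=900 values mentioned in this post.
    """
    if latency_ms >= high_ms:
        return "alarm"    # the level at which the poster saw restarts
    if latency_ms >= low_ms:
        return "warning"  # the level at which only a log warning appears
    return "ok"

for sample_ms in (50, 650, 1200):
    print(sample_ms, classify_latency(sample_ms))
```

Raising both thresholds, as the poster did, widens the "ok" band so that latency spikes from queued ping traffic under load are less likely to reach the alarm level.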

            • fireix @fireix

              This IPv6 issue, which was resolved in the bug tracker just a month ago, actually shows the same Fatal trap 12. Does anyone know whether it could be related to what I'm seeing, or is it something totally different? I reported my crash as a possible bug with all the logs, but it was rejected because I wasn't on the latest dev build.

              https://redmine.pfsense.org/issues/14077

              • fireix @jaspery

                @jaspery Based on my second episode, the one with the crash, I suspect it was the crash that caused my dpinger to fail (in this case).

                Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.