• Categories
  • Recent
  • Tags
  • Popular
  • Users
  • Search
  • Register
  • Login
Netgate Discussion Forum
  • Categories
  • Recent
  • Tags
  • Popular
  • Users
  • Search
  • Register
  • Login

Hourly WAN packet loss after updating to pfSense+ 23

General pfSense Questions
23.05 upgrade pfsense+ packet loss hourly ntopng
3
14
1.2k
Loading More Posts
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • D
    drueter
    last edited by Sep 25, 2023, 9:15 PM

    I had been running pfSense+ 22.05 on my Netgate SG-5100 and everything was fine. Now on 23.05.1 I experience WAN packet loss for about 30-60 seconds nearly every hour at 4 to 7 minutes past the hour.

    Monitor Graph

    You see the packet loss in blue begin ("out of the blue" as they say...sorry...). You see utilization in orange increase due to ntopng running...which began with the update occurred. You see utilization drop back down when ntopng was removed. But packet loss continues.

    Curiously packet loss began some hours (maybe a day or so) after the update and ntopng was installed.

    I was tight on disk space, and wondered if log rotation was a factor. I installed an M.2 SATA drive and re-installed pfSense+ (from an image provided by support), and then restored my config backup. This all worked fine...but the packet loss still continues.

    It is possible (but unlikely) that my stable internet connection developed a problem concurrent with the pfSense update. I tend to suspect something going on in my router to cause the problem.

    I adjusted my gateway monitoring thresholds to minimize failing over to the backup WAN since it seems that dpinger and rc.newwanip resets were causing more problems than the brief packet loss itself.

    Here is longer more detailed background:

    I had previously run ntopng, but had not been using that for quite some time.

    A couple weeks ago I wanted to try ntopng. The package was out of date. I encountered errors when I tried to update it. So I decided to update to pfSense+ 23.05.1 I performed this upgrade without a problem, then upgraded the ntopng package, and everything worked fine.

    Except...I noticed periodic WAN disconnections that had not occurred prior to the update. As I gathered further history, I noticed that the outages were brief--approx. 60 seconds, and would occur at about 4 minutes past the hour,..most hours (but not every).

    I then began wondering if ntopng was causing the problem (due to disk activity, cpu load, log rotation, or similar), so I remove the package. The frequent packet loss (hourly, for less than a minute) continued.

    Is there any change that ntopng would have made that could explain the problem? Or any known problem in ver. 23 that could shed some light?

    I suppose as a diagnostic I could obtain a ver. 22 image from support and reinstall that. And/or I could reinstall and apply an old config backup (or even hand-configure if needed)...in case something in the config backup brings about the problem on a fresh install.

    Thanks for your help and suggestions

    1 Reply Last reply Reply Quote 0
    • S
      stephenw10 Netgate Administrator
      last edited by Sep 25, 2023, 9:41 PM

      What do you see in the main system log at that time when the loss starts?

      D 1 Reply Last reply Sep 27, 2023, 2:13 PM Reply Quote 0
      • D
        drueter @stephenw10
        last edited by Sep 27, 2023, 2:13 PM

        @stephenw10 I see the warning alarm, followed by the alarm...and then gateway reconfiguration-related messages.

        Here is a sample containing everything from system.log from one of these events. Note that most log entries seem to be in UTC, but some are in local time (-7) (SHGuard, xxinetd, and rc.gateway_alarm). This is a bit confusing...but what I've pasted below is actually contiguous.

        In this excerpt the first three entries are normal / all is well. The 4th entry at "9/26/2023 11:12" (aka "9/26/2023 18:12") is where the outage begins.

        9/26/2023 17:13	pfMyRouter	php-fpm[62804]: /rc.dyndns.update: phpDynDNS (my.ddnsdomain.com): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
        9/26/2023 10:36	pfMyRouter	sshguard[34957]: Exiting on signal.
        9/26/2023 10:36	pfMyRouter	sshguard[22849]: Now monitoring attacks.
        9/26/2023 11:12	pfMyRouter	rc.gateway_alarm[42912]: >>> Gateway alarm: WAN_Cox (Addr:1.2.3.4 Alarm:1 RTT:9.331ms RTTsd:3.164ms Loss:22%)
        9/26/2023 18:12	pfMyRouter	check_reload_status[348]: updating dyndns WAN_Cox
        9/26/2023 18:12	pfMyRouter	check_reload_status[348]: Restarting IPsec tunnels
        9/26/2023 18:12	pfMyRouter	check_reload_status[348]: Restarting OpenVPN tunnels/interfaces
        9/26/2023 18:12	pfMyRouter	check_reload_status[348]: Reloading filter
        9/26/2023 18:12	pfMyRouter	php-fpm[51310]: /rc.openvpn: MONITOR: WAN_Cox has packet loss, omitting from routing group InternetWANGroup
        9/26/2023 18:12	pfMyRouter	php-fpm[51310]: 1.2.3.4|1.2.3.478|WAN_Cox|9.52ms|3.521ms|23%|down|highloss
        9/26/2023 18:12	pfMyRouter	php-fpm[51310]: /rc.openvpn: Gateway, switch to: WANBackupGW
        9/26/2023 18:12	pfMyRouter	php-fpm[51310]: /rc.openvpn: Default gateway setting  as default.
        9/26/2023 18:12	pfMyRouter	php-fpm[71632]: /rc.filter_configure_sync: Gateway, switch to: WANBackupGW
        9/26/2023 18:12	pfMyRouter	php-fpm[10059]: /rc.dyndns.update: Gateway, switch to: WANBackupGW
        9/26/2023 18:12	pfMyRouter	php-fpm[51310]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed IP addresses. Reloading endpoints that may use WAN_Cox.
        9/26/2023 11:12	pfMyRouter	xinetd[39386]: Starting reconfiguration
        9/26/2023 11:12	pfMyRouter	xinetd[39386]: Swapping defaults
        9/26/2023 11:12	pfMyRouter	xinetd[39386]: readjusting service 21000-tcp
        9/26/2023 11:12	pfMyRouter	xinetd[39386]: readjusting service 21001-tcp
        9/26/2023 11:12	pfMyRouter	xinetd[39386]: Reconfigured: new=0 old=2 dropped=0 (services)
        9/26/2023 18:12	pfMyRouter	php-fpm[10059]: /rc.dyndns.update: phpDynDNS (my2.ddnsdomain.com: No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
        9/26/2023 18:12	pfMyRouter	php-fpm[10059]: /rc.dyndns.update: phpDynDNS (my.ddnsdomain.com): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
        9/26/2023 11:13	pfMyRouter	rc.gateway_alarm[29520]: >>> Gateway alarm: WAN_Cox (Addr:1.2.3.4 Alarm:0 RTT:9.968ms RTTsd:3.869ms Loss:6%)
        9/26/2023 18:13	pfMyRouter	check_reload_status[348]: updating dyndns WAN_Cox
        9/26/2023 18:13	pfMyRouter	check_reload_status[348]: Restarting IPsec tunnels
        9/26/2023 18:13	pfMyRouter	check_reload_status[348]: Restarting OpenVPN tunnels/interfaces
        9/26/2023 18:13	pfMyRouter	check_reload_status[348]: Reloading filter
        9/26/2023 18:13	pfMyRouter	php-fpm[62804]: /rc.openvpn: MONITOR: WAN_Cox is available now, adding to routing group InternetWANGroup
        9/26/2023 18:13	pfMyRouter	php-fpm[62804]: 1.2.3.4|1.2.3.478|WAN_Cox|9.829ms|3.634ms|5%|online|none
        9/26/2023 18:13	pfMyRouter	php-fpm[62804]: /rc.openvpn: Gateway, switch to: WAN_Cox
        9/26/2023 18:13	pfMyRouter	php-fpm[62804]: /rc.openvpn: Default gateway setting  as default.
        9/26/2023 18:13	pfMyRouter	php-fpm[286]: /rc.dyndns.update: Gateway, switch to: WAN_Cox
        9/26/2023 18:13	pfMyRouter	php-fpm[287]: /rc.filter_configure_sync: Gateway, switch to: WAN_Cox
        9/26/2023 18:13	pfMyRouter	php-fpm[62804]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed IP addresses. Reloading endpoints that may use WAN_Cox.
        9/26/2023 11:13	pfMyRouter	xinetd[39386]: Starting reconfiguration
        9/26/2023 11:13	pfMyRouter	xinetd[39386]: Swapping defaults
        9/26/2023 11:13	pfMyRouter	xinetd[39386]: readjusting service 21000-tcp
        9/26/2023 11:13	pfMyRouter	xinetd[39386]: readjusting service 21001-tcp
        9/26/2023 11:13	pfMyRouter	xinetd[39386]: Reconfigured: new=0 old=2 dropped=0 (services)
        9/26/2023 18:13	pfMyRouter	php-fpm[286]: /rc.dyndns.update: phpDynDNS (my2.ddnsdomain.com: No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
        9/26/2023 18:13	pfMyRouter	php-fpm[286]: /rc.dyndns.update: phpDynDNS (my.ddnsdomain.com): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
        
        1 Reply Last reply Reply Quote 0
        • S
          stephenw10 Netgate Administrator
          last edited by Sep 27, 2023, 4:14 PM

          Hmm, well the timestamp difference is odd. But that aside there doesn't look to be anything significant there. The gateway shows loss and triggers the other events.

          Assuming 1.2.3.4 there is your real gateway address you should change the monitoring IP to something upstream like 8.8.8.8 or 1.1.1.1. It could be simply that the gateway loaded and drops pings.

          Steve

          D 1 Reply Last reply Sep 27, 2023, 9:20 PM Reply Quote 0
          • D
            drueter @stephenw10
            last edited by Sep 27, 2023, 9:20 PM

            @stephenw10 Thanks for the suggestion, but:

            • The problem persists even with monitoring disabled
            • The problem results in actual packet loss to workstations: it is not just a monitoring problem
            • The problem persists even with a fixed WAN IP address (I thought it might be related to DHCP renewals.)
            • Looking at additional graphs, there is no evidence of CPU or other load increase at the times the problem ocurrs.

            I am out of ideas at this point and intend to re-install version 22.05. I will report back to confirm the status of the problem after that downgrade.

            1 Reply Last reply Reply Quote 0
            • S
              stephenw10 Netgate Administrator
              last edited by Sep 27, 2023, 10:49 PM

              There is no increase in traffic when this happens?

              D 1 Reply Last reply Sep 28, 2023, 12:57 PM Reply Quote 0
              • D
                drueter @stephenw10
                last edited by Sep 28, 2023, 12:57 PM

                @stephenw10 That is correct: there is no change in network traffic.

                FYI the network nearly always has very light traffic.

                1 Reply Last reply Reply Quote 0
                • S
                  stephenw10 Netgate Administrator
                  last edited by Sep 28, 2023, 1:38 PM

                  When using an external IP do you still see the same level of packet loss? Do you also see latency increases?

                  Otherwise I would try to capture the ping packets when this happens and see if there are actually missing packets on the wire.

                  Are you using an igb or ix NIC for WAN? Try re-assigning it to the other type if you can.

                  1 Reply Last reply Reply Quote 0
                  • D
                    drueter
                    last edited by Sep 28, 2023, 7:37 PM

                    I am starting to think this may be a wild goose chase after all: this may be an ISP problem. It could in fact be that the problem was occurring prior to the firmware upgrade and went unnoticed, and that monitoring was not enabled. (I thought/assumed that monitoring had been enabled the whole time, but I can't say for sure.)

                    I reverted back to 22.05 and have observed the same behavior: dropped packets each hour.

                    Yes, the packet loss is real...going through the pfSense (observable from a workstation). I have not yet been able to confirm packet loss is real without the pfSense in the mix.

                    WAN is on igb0 and LAN is on igb1. I could try moving it, but igb0 is the default. Is there a reason why igb versus ix would make a difference?

                    Sep 28 09:16:44 pMyRouter rc.gateway_alarm[33556]: >>> Gateway alarm: WAN_DHCP (Addr:1.2.3.4 Alarm:1 RTT:10.000ms RTTsd:2.329ms Loss:21%)
                    Sep 28 09:18:34 pMyRouter rc.gateway_alarm[18934]: >>> Gateway alarm: WAN_DHCP (Addr:1.2.3.4 Alarm:0 RTT:10.422ms RTTsd:3.664ms Loss:6%)
                    Sep 28 10:16:52 pMyRouter rc.gateway_alarm[58429]: >>> Gateway alarm: WAN_DHCP (Addr:1.2.3.4 Alarm:1 RTT:8.861ms RTTsd:1.637ms Loss:21%)
                    Sep 28 10:18:25 pMyRouter rc.gateway_alarm[61365]: >>> Gateway alarm: WAN_DHCP (Addr:1.2.3.4 Alarm:0 RTT:9.553ms RTTsd:3.310ms Loss:5%)
                    Sep 28 11:17:05 pMyRouter rc.gateway_alarm[67044]: >>> Gateway alarm: WAN_DHCP (Addr:1.2.3.4 Alarm:1 RTT:8.698ms RTTsd:2.920ms Loss:22%)
                    Sep 28 11:18:11 pMyRouter rc.gateway_alarm[41078]: >>> Gateway alarm: WAN_DHCP (Addr:1.2.3.4 Alarm:0 RTT:9.716ms RTTsd:5.033ms Loss:5%)
                    

                    It's kind of a strange problem (with the hourly recurrence). I'll keep working on getting to the bottom of this and will report back.

                    S 1 Reply Last reply Oct 22, 2023, 5:03 AM Reply Quote 0
                    • S
                      stephenw10 Netgate Administrator
                      last edited by Sep 28, 2023, 9:58 PM

                      Yeah if it's the same in 22.05 it's probably not a NIC/driver issue. However on the 5100 it can be worth trying. The igb NICs are separate PCIe devices. The ix NICs are part of the SoC.

                      1 Reply Last reply Reply Quote 0
                      • S
                        swixo @drueter
                        last edited by Oct 22, 2023, 5:03 AM

                        @drueter Did you ever confirm this? I am troubleshooting a very similar issue - Hourly Gateway Alarms, knocking things offline.
                        I have zero leads on the issue, so looking for hints...

                        1 Reply Last reply Reply Quote 0
                        • S
                          stephenw10 Netgate Administrator
                          last edited by Oct 22, 2023, 1:06 PM

                          Nothing logged, just packet loss? Monitoring the gateway IP(s)? Same hardware?

                          S 1 Reply Last reply Oct 22, 2023, 4:23 PM Reply Quote 0
                          • S
                            swixo @stephenw10
                            last edited by Oct 22, 2023, 4:23 PM

                            @stephenw10 My case is somewhat different - but I am looking for clues. The gateway is an openvpn gateway, the wan does not appear to go down just the openvpn link.

                            What does happen is when the gateway alarm goes off, the DNS lookups that go across the VPN to the remote DNS immediately fail, and so my nagios probes detect the name resolution failure right away and report.

                            The pfSense logs show ">>> Gateway alarm: ... " at about 1 hour intervals, right when the DNS lookups start failing. Last about 2-3 minutes maximum. I suspected that could be when vpn re-keying is taking place?

                            Last night after I left my comment I tried clearing the "Fast UDP" feature in openvpn, and the issue has stopped. (For now). I'll see if it stays working.

                            1 Reply Last reply Reply Quote 0
                            • S
                              stephenw10 Netgate Administrator
                              last edited by Oct 22, 2023, 6:25 PM

                              Hmm, odd. I wouldn't expect that. I have openvpn tunnels that have been up without issue for weeks.

                              1 Reply Last reply Reply Quote 0
                              • First post
                                Last post
                              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.