Netgate Discussion Forum
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Search
    • Register
    • Login

    Hourly WAN packet loss after updating to pfSense+ 23

    Scheduled Pinned Locked Moved General pfSense Questions
    23.05 upgradepfsense+packet losshourlyntopng
    14 Posts 3 Posters 1.2k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • stephenw10S
      stephenw10 Netgate Administrator
      last edited by

      What do you see in the main system log at that time when the loss starts?

      D 1 Reply Last reply Reply Quote 0
      • D
        drueter @stephenw10
        last edited by

        @stephenw10 I see the warning alarm, followed by the alarm...and then gateway reconfiguration-related messages.

        Here is a sample containing everything from system.log from one of these events. Note that most log entries seem to be in UTC, but some are in local time (-7) (SHGuard, xxinetd, and rc.gateway_alarm). This is a bit confusing...but what I've pasted below is actually contiguous.

        In this excerpt the first three entries are normal / all is well. The 4th entry at "9/26/2023 11:12" (aka "9/26/2023 18:12") is where the outage begins.

        9/26/2023 17:13	pfMyRouter	php-fpm[62804]: /rc.dyndns.update: phpDynDNS (my.ddnsdomain.com): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
        9/26/2023 10:36	pfMyRouter	sshguard[34957]: Exiting on signal.
        9/26/2023 10:36	pfMyRouter	sshguard[22849]: Now monitoring attacks.
        9/26/2023 11:12	pfMyRouter	rc.gateway_alarm[42912]: >>> Gateway alarm: WAN_Cox (Addr:1.2.3.4 Alarm:1 RTT:9.331ms RTTsd:3.164ms Loss:22%)
        9/26/2023 18:12	pfMyRouter	check_reload_status[348]: updating dyndns WAN_Cox
        9/26/2023 18:12	pfMyRouter	check_reload_status[348]: Restarting IPsec tunnels
        9/26/2023 18:12	pfMyRouter	check_reload_status[348]: Restarting OpenVPN tunnels/interfaces
        9/26/2023 18:12	pfMyRouter	check_reload_status[348]: Reloading filter
        9/26/2023 18:12	pfMyRouter	php-fpm[51310]: /rc.openvpn: MONITOR: WAN_Cox has packet loss, omitting from routing group InternetWANGroup
        9/26/2023 18:12	pfMyRouter	php-fpm[51310]: 1.2.3.4|1.2.3.478|WAN_Cox|9.52ms|3.521ms|23%|down|highloss
        9/26/2023 18:12	pfMyRouter	php-fpm[51310]: /rc.openvpn: Gateway, switch to: WANBackupGW
        9/26/2023 18:12	pfMyRouter	php-fpm[51310]: /rc.openvpn: Default gateway setting  as default.
        9/26/2023 18:12	pfMyRouter	php-fpm[71632]: /rc.filter_configure_sync: Gateway, switch to: WANBackupGW
        9/26/2023 18:12	pfMyRouter	php-fpm[10059]: /rc.dyndns.update: Gateway, switch to: WANBackupGW
        9/26/2023 18:12	pfMyRouter	php-fpm[51310]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed IP addresses. Reloading endpoints that may use WAN_Cox.
        9/26/2023 11:12	pfMyRouter	xinetd[39386]: Starting reconfiguration
        9/26/2023 11:12	pfMyRouter	xinetd[39386]: Swapping defaults
        9/26/2023 11:12	pfMyRouter	xinetd[39386]: readjusting service 21000-tcp
        9/26/2023 11:12	pfMyRouter	xinetd[39386]: readjusting service 21001-tcp
        9/26/2023 11:12	pfMyRouter	xinetd[39386]: Reconfigured: new=0 old=2 dropped=0 (services)
        9/26/2023 18:12	pfMyRouter	php-fpm[10059]: /rc.dyndns.update: phpDynDNS (my2.ddnsdomain.com: No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
        9/26/2023 18:12	pfMyRouter	php-fpm[10059]: /rc.dyndns.update: phpDynDNS (my.ddnsdomain.com): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
        9/26/2023 11:13	pfMyRouter	rc.gateway_alarm[29520]: >>> Gateway alarm: WAN_Cox (Addr:1.2.3.4 Alarm:0 RTT:9.968ms RTTsd:3.869ms Loss:6%)
        9/26/2023 18:13	pfMyRouter	check_reload_status[348]: updating dyndns WAN_Cox
        9/26/2023 18:13	pfMyRouter	check_reload_status[348]: Restarting IPsec tunnels
        9/26/2023 18:13	pfMyRouter	check_reload_status[348]: Restarting OpenVPN tunnels/interfaces
        9/26/2023 18:13	pfMyRouter	check_reload_status[348]: Reloading filter
        9/26/2023 18:13	pfMyRouter	php-fpm[62804]: /rc.openvpn: MONITOR: WAN_Cox is available now, adding to routing group InternetWANGroup
        9/26/2023 18:13	pfMyRouter	php-fpm[62804]: 1.2.3.4|1.2.3.478|WAN_Cox|9.829ms|3.634ms|5%|online|none
        9/26/2023 18:13	pfMyRouter	php-fpm[62804]: /rc.openvpn: Gateway, switch to: WAN_Cox
        9/26/2023 18:13	pfMyRouter	php-fpm[62804]: /rc.openvpn: Default gateway setting  as default.
        9/26/2023 18:13	pfMyRouter	php-fpm[286]: /rc.dyndns.update: Gateway, switch to: WAN_Cox
        9/26/2023 18:13	pfMyRouter	php-fpm[287]: /rc.filter_configure_sync: Gateway, switch to: WAN_Cox
        9/26/2023 18:13	pfMyRouter	php-fpm[62804]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed IP addresses. Reloading endpoints that may use WAN_Cox.
        9/26/2023 11:13	pfMyRouter	xinetd[39386]: Starting reconfiguration
        9/26/2023 11:13	pfMyRouter	xinetd[39386]: Swapping defaults
        9/26/2023 11:13	pfMyRouter	xinetd[39386]: readjusting service 21000-tcp
        9/26/2023 11:13	pfMyRouter	xinetd[39386]: readjusting service 21001-tcp
        9/26/2023 11:13	pfMyRouter	xinetd[39386]: Reconfigured: new=0 old=2 dropped=0 (services)
        9/26/2023 18:13	pfMyRouter	php-fpm[286]: /rc.dyndns.update: phpDynDNS (my2.ddnsdomain.com: No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
        9/26/2023 18:13	pfMyRouter	php-fpm[286]: /rc.dyndns.update: phpDynDNS (my.ddnsdomain.com): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
        
        1 Reply Last reply Reply Quote 0
        • stephenw10S
          stephenw10 Netgate Administrator
          last edited by

          Hmm, well the timestamp difference is odd. But that aside there doesn't look to be anything significant there. The gateway shows loss and triggers the other events.

          Assuming 1.2.3.4 there is your real gateway address you should change the monitoring IP to something upstream like 8.8.8.8 or 1.1.1.1. It could be simply that the gateway loaded and drops pings.

          Steve

          D 1 Reply Last reply Reply Quote 0
          • D
            drueter @stephenw10
            last edited by

            @stephenw10 Thanks for the suggestion, but:

            • The problem persists even with monitoring disabled
            • The problem results in actual packet loss to workstations: it is not just a monitoring problem
            • The problem persists even with a fixed WAN IP address (I thought it might be related to DHCP renewals.)
            • Looking at additional graphs, there is no evidence of CPU or other load increase at the times the problem ocurrs.

            I am out of ideas at this point and intend to re-install version 22.05. I will report back to confirm the status of the problem after that downgrade.

            1 Reply Last reply Reply Quote 0
            • stephenw10S
              stephenw10 Netgate Administrator
              last edited by

              There is no increase in traffic when this happens?

              D 1 Reply Last reply Reply Quote 0
              • D
                drueter @stephenw10
                last edited by

                @stephenw10 That is correct: there is no change in network traffic.

                FYI the network nearly always has very light traffic.

                1 Reply Last reply Reply Quote 0
                • stephenw10S
                  stephenw10 Netgate Administrator
                  last edited by

                  When using an external IP do you still see the same level of packet loss? Do you also see latency increases?

                  Otherwise I would try to capture the ping packets when this happens and see if there are actually missing packets on the wire.

                  Are you using an igb or ix NIC for WAN? Try re-assigning it to the other type if you can.

                  1 Reply Last reply Reply Quote 0
                  • D
                    drueter
                    last edited by

                    I am starting to think this may be a wild goose chase after all: this may be an ISP problem. It could in fact be that the problem was occurring prior to the firmware upgrade and went unnoticed, and that monitoring was not enabled. (I thought/assumed that monitoring had been enabled the whole time, but I can't say for sure.)

                    I reverted back to 22.05 and have observed the same behavior: dropped packets each hour.

                    Yes, the packet loss is real...going through the pfSense (observable from a workstation). I have not yet been able to confirm packet loss is real without the pfSense in the mix.

                    WAN is on igb0 and LAN is on igb1. I could try moving it, but igb0 is the default. Is there a reason why igb versus ix would make a difference?

                    Sep 28 09:16:44 pMyRouter rc.gateway_alarm[33556]: >>> Gateway alarm: WAN_DHCP (Addr:1.2.3.4 Alarm:1 RTT:10.000ms RTTsd:2.329ms Loss:21%)
                    Sep 28 09:18:34 pMyRouter rc.gateway_alarm[18934]: >>> Gateway alarm: WAN_DHCP (Addr:1.2.3.4 Alarm:0 RTT:10.422ms RTTsd:3.664ms Loss:6%)
                    Sep 28 10:16:52 pMyRouter rc.gateway_alarm[58429]: >>> Gateway alarm: WAN_DHCP (Addr:1.2.3.4 Alarm:1 RTT:8.861ms RTTsd:1.637ms Loss:21%)
                    Sep 28 10:18:25 pMyRouter rc.gateway_alarm[61365]: >>> Gateway alarm: WAN_DHCP (Addr:1.2.3.4 Alarm:0 RTT:9.553ms RTTsd:3.310ms Loss:5%)
                    Sep 28 11:17:05 pMyRouter rc.gateway_alarm[67044]: >>> Gateway alarm: WAN_DHCP (Addr:1.2.3.4 Alarm:1 RTT:8.698ms RTTsd:2.920ms Loss:22%)
                    Sep 28 11:18:11 pMyRouter rc.gateway_alarm[41078]: >>> Gateway alarm: WAN_DHCP (Addr:1.2.3.4 Alarm:0 RTT:9.716ms RTTsd:5.033ms Loss:5%)
                    

                    It's kind of a strange problem (with the hourly recurrence). I'll keep working on getting to the bottom of this and will report back.

                    S 1 Reply Last reply Reply Quote 0
                    • stephenw10S
                      stephenw10 Netgate Administrator
                      last edited by

                      Yeah if it's the same in 22.05 it's probably not a NIC/driver issue. However on the 5100 it can be worth trying. The igb NICs are separate PCIe devices. The ix NICs are part of the SoC.

                      1 Reply Last reply Reply Quote 0
                      • S
                        swixo @drueter
                        last edited by

                        @drueter Did you ever confirm this? I am troubleshooting a very similar issue - Hourly Gateway Alarms, knocking things offline.
                        I have zero leads on the issue, so looking for hints...

                        1 Reply Last reply Reply Quote 0
                        • stephenw10S
                          stephenw10 Netgate Administrator
                          last edited by

                          Nothing logged, just packet loss? Monitoring the gateway IP(s)? Same hardware?

                          S 1 Reply Last reply Reply Quote 0
                          • S
                            swixo @stephenw10
                            last edited by

                            @stephenw10 My case is somewhat different - but I am looking for clues. The gateway is an openvpn gateway, the wan does not appear to go down just the openvpn link.

                            What does happen is when the gateway alarm goes off, the DNS lookups that go across the VPN to the remote DNS immediately fail, and so my nagios probes detect the name resolution failure right away and report.

                            The pfSense logs show ">>> Gateway alarm: ... " at about 1 hour intervals, right when the DNS lookups start failing. Last about 2-3 minutes maximum. I suspected that could be when vpn re-keying is taking place?

                            Last night after I left my comment I tried clearing the "Fast UDP" feature in openvpn, and the issue has stopped. (For now). I'll see if it stays working.

                            1 Reply Last reply Reply Quote 0
                            • stephenw10S
                              stephenw10 Netgate Administrator
                              last edited by

                              Hmm, odd. I wouldn't expect that. I have openvpn tunnels that have been up without issue for weeks.

                              1 Reply Last reply Reply Quote 0
                              • First post
                                Last post
                              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.