Netgate Discussion Forum

    WAN connection dropping intermittently

    General pfSense Questions
    22 Posts 3 Posters 1.7k Views
    • Gertjan @alexnovice

      @alexnovice

      In this case, as your WAN is an Ethernet cable connection, place a switch on the WAN side of pfSense.
      3 cables:
      1 is the original ISP WAN connection.
      1 goes to the WAN of pfSense.
      1 goes to a PC that you use for monitoring.

      As soon as traffic stops flowing through the 6100 to and from the WAN, check whether the monitoring PC shows the same behavior.

      A bad connection can be more than a bad cable to the upstream ISP equipment.
      ISPs tend to have more than one client ^^
      If upstream (ISP) devices are saturated, your data throughput will suffer. And no, none of us has ever seen an ISP admit that its network was unusable once in a while. That just can't happen ;)
      The pfSense monitoring tool, dpinger, sends ICMP packets (ping packets). These are typically low priority, and the first to 'vanish' if some upstream device is under load.
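A rough way to see why a short burst of lost pings is logged as "Alarm ... loss 22%" and only later as "Clear ... loss 5%": dpinger reports loss averaged over a time period of recent probes, so the percentage rises and falls gradually rather than instantly. The Python sketch below is a simplified model, not dpinger's actual code. The 500 ms probe interval, 60 s window, 20% alarm threshold and 10% clear threshold are assumptions chosen for illustration; check System > Routing > Gateways for the values your own box uses.

```python
from collections import deque

# Simplified model of a ping-loss alarm, NOT dpinger's real implementation.
# All parameter values below are assumptions for illustration.
PROBE_INTERVAL_S = 0.5   # one echo request every 500 ms
WINDOW_S = 60            # loss is measured over the last 60 s of probes
ALARM_PCT = 20.0         # raise the alarm when loss rises above this
CLEAR_PCT = 10.0         # clear only once loss falls below this (hysteresis)


def loss_events(replies, interval=PROBE_INTERVAL_S, window=WINDOW_S,
                alarm_pct=ALARM_PCT, clear_pct=CLEAR_PCT):
    """Return (time_s, 'Alarm' | 'Clear', loss_pct) transitions.

    `replies` holds one boolean per probe: True if a reply came back,
    False if the probe was lost.
    """
    size = int(window / interval)      # number of probes that fit in the window
    recent = deque(maxlen=size)        # oldest probes drop off automatically
    in_alarm = False
    events = []
    for i, ok in enumerate(replies):
        recent.append(ok)
        loss = 100.0 * recent.count(False) / len(recent)
        t = i * interval
        if not in_alarm and loss > alarm_pct:
            in_alarm = True
            events.append((t, "Alarm", round(loss, 1)))
        elif in_alarm and loss < clear_pct:
            in_alarm = False
            events.append((t, "Clear", round(loss, 1)))
    return events
```

For example, `loss_events([True] * 240 + [False] * 40 + [True] * 360)` models a 20 s outage starting 120 s into the run. Under these assumptions the alarm fires about 12 s after the pings actually stop and clears just under a minute after they resume, which is why the Alarm/Clear timestamps only approximate the real outage.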

      No "help me" PMs please. Use the forum, the community will thank you.
      Edit: and where are the logs??

      • alexnovice @Gertjan

        @Gertjan

        Thanks Gertjan!

        I will try that (I'm at work now, so it will have to be later), but it strikes me as unlikely: it's not just ICMP packets that get dropped. All traffic stops, including video calls, YouTube streaming, or simply loading a webpage.

        This happens several times a day with the pfSense box, while the other devices went for days without interruption.

        I should probably add that I've tried using different ports for the WAN interface with no change in behaviour, so I don't think it's a hardware issue.

        Best,
        Alex

        • Gertjan @alexnovice

          @alexnovice

          I'm using a 4100 myself, with 24.03, behind an ISP router on a 1 Gbit fiber connection.
          I couldn't find a single lost ICMP packet over the last ... 6 months?

          I used other hardware ("barebone") solutions before I got my 4100, but never had strange outages.
          Broke the connection because I messed up 'something'? Yes, that has happened.

          A LAN interface is like a WAN interface, so why would it fail?
          If in doubt, take another port ^^ a 6100 has 6 NICs, right?

          • stephenw10 Netgate Administrator

            Your latency looks pretty low; are you still monitoring the gateway IP directly?

            The first thing I would do is set the monitoring IP to something external.

            • alexnovice @Gertjan

              Good morning, and thanks for the suggestions!

              When I got home I looked at the logs, which showed a number of drops during the last 24-36 hours, including the two drops I showed in my original post:
              2024-11-08 21:30:11.582551+01:00 dpinger 32387 WANGW 194.xxx.yyy.3: Clear latency 6351us stddev 8741us loss 5%
              2024-11-08 21:28:51.537529+01:00 dpinger 32387 WANGW 194.xxx.yyy.3: Alarm latency 2441us stddev 3325us loss 22%
              2024-11-08 17:30:58.777260+01:00 dpinger 32387 WANGW 194.xxx.yyy.3: Clear latency 7165us stddev 8798us loss 5%
              2024-11-08 17:29:51.771257+01:00 dpinger 32387 WANGW 194.xxx.yyy.3: Alarm latency 5889us stddev 7967us loss 22%
              2024-11-08 16:31:10.301016+01:00 dpinger 32387 WANGW 194.xxx.yyy.3: Clear latency 5627us stddev 7828us loss 5%
              2024-11-08 16:29:52.558738+01:00 dpinger 32387 WANGW 194.xxx.yyy.3: Alarm latency 3908us stddev 4204us loss 22%
              2024-11-08 12:31:58.081936+01:00 dpinger 32387 WANGW 194.xxx.yyy.3: Clear latency 6649us stddev 10061us loss 6%
              2024-11-08 12:30:53.138370+01:00 dpinger 32387 WANGW 194.xxx.yyy.3: Alarm latency 3571us stddev 4800us loss 22%
              2024-11-08 12:12:02.533386+01:00 dpinger 32387 WANGW 194.xxx.yyy.3: Clear latency 7163us stddev 9233us loss 5%
              2024-11-08 12:10:52.796857+01:00 dpinger 32387 WANGW 194.xxx.yyy.3: Alarm latency 4311us stddev 5938us loss 21%
              2024-11-08 05:13:25.405565+01:00 dpinger 32387 WANGW 194.xxx.yyy.3: Clear latency 5235us stddev 6291us loss 5%
              2024-11-08 05:11:53.814595+01:00 dpinger 32387 WANGW 194.xxx.yyy.3: Alarm latency 5106us stddev 7377us loss 21%
              2024-11-08 02:14:00.821345+01:00 dpinger 32387 WANGW 194.xxx.yyy.3: Clear latency 6536us stddev 9023us loss 5%
              2024-11-08 02:12:54.527981+01:00 dpinger 32387 WANGW 194.xxx.yyy.3: Alarm latency 5851us stddev 7015us loss 22%
              2024-11-07 19:15:23.892450+01:00 dpinger 32387 WANGW 194.xxx.yyy.3: Clear latency 4592us stddev 6255us loss 5%
              2024-11-07 19:13:54.996799+01:00 dpinger 32387 WANGW 194.xxx.yyy.3: Alarm latency 4265us stddev 7476us loss 21%
              2024-11-07 18:55:28.386844+01:00 dpinger 32387 WANGW 194.xxx.yyy.3: Clear latency 5856us stddev 10542us loss 5%
              2024-11-07 18:53:55.677166+01:00 dpinger 32387 WANGW 194.xxx.yyy.3: Alarm latency 3037us stddev 4246us loss 22%
              2024-11-07 18:35:32.070210+01:00 dpinger 32387 WANGW 194.xxx.yyy.3: Clear latency 5903us stddev 8184us loss 5%
              2024-11-07 18:33:55.751032+01:00 dpinger 32387 WANGW 194.xxx.yyy.3: Alarm latency 4777us stddev 7169us loss 22%

              I then made 3 changes as suggested:

              • I changed the monitoring IP to 8.8.8.8
              • I swapped the WAN to a different port
              • I put an unmanaged switch on the WAN side, and also connected a laptop to monitor.

              My laptop had no drops throughout the night, while the router had two:

              2024-11-09 09:47:50.790223+01:00 dpinger 21182 WANGW 8.8.8.8: Clear latency 1397us stddev 28us loss 5%
              2024-11-09 09:46:49.794245+01:00 dpinger 21182 WANGW 8.8.8.8: Alarm latency 1395us stddev 35us loss 21%
              2024-11-09 08:48:02.362721+01:00 dpinger 21182 WANGW 8.8.8.8: Clear latency 1390us stddev 35us loss 6%
              2024-11-09 08:46:49.711801+01:00 dpinger 21182 WANGW 8.8.8.8: Alarm latency 1383us stddev 46us loss 21%

              I guess I can open up a firewall rule on the WAN interface to allow my laptop to ping it from the WAN side, and see if there are any issues when dpinger says there are.

              Any other suggestions on what could be helpful to see to figure out what's going on?

              • Gertjan @alexnovice

                @alexnovice said in WAN connection dropping intermittently:

                My laptop had no drops throughout the night, while the router had two:

                I can't suggest a tool, but I'm pretty sure they exist: have your laptop do the same thing, i.e. have it ping 8.8.8.8 every 1/2 second as well.
                At worst, open a command prompt and execute

                ping -t 8.8.8.8
                

                and leave it running there for the day.
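If you would rather have a timestamped log than a cmd window to watch, something like this Python sketch could run on the monitoring PC instead of `ping -t`. It shells out to the operating system's own ping once per host per round and records the failures. The function names are hypothetical, and treating exit code 0 as success is an approximation: Windows ping can exit with 0 even for some "Destination host unreachable" replies.

```python
import platform
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=2.0):
    """Send one echo request with the OS ping command; True if it exited with 0."""
    # Windows ping takes -n for the count; Linux, macOS and FreeBSD take -c.
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(["ping", count_flag, "1", host],
                                capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False          # no answer within timeout_s counts as a loss
    return result.returncode == 0


def monitor(hosts, rounds, interval_s=1.0, probe=ping_once,
            sleep=time.sleep, now=datetime.now):
    """Probe every host once per round; print and return (timestamp, host) failures."""
    failures = []
    for _ in range(rounds):
        for host in hosts:
            if not probe(host):
                entry = (now().isoformat(timespec="seconds"), host)
                failures.append(entry)
                print("FAIL", *entry, flush=True)
        sleep(interval_s)     # pause between rounds (probe time comes on top)
    return failures
```

Usage would be along the lines of `monitor(["8.8.8.8", "a.b.c.d"], rounds=86400, interval_s=0.5)`, with a.b.c.d replaced by the pfSense WAN IP as in the thread. The `probe`, `sleep` and `now` parameters exist only so the loop can be tried without a network.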

                @alexnovice said in WAN connection dropping intermittently:

                I guess I can open up a firewall rule on the WAN interface to allow my laptop to ping it from the WAN side, and see if there are any issues when dpinger says there are.

                Good idea!
                On the laptop, open a second cmd window and ping it as well:

                ping -t a.b.c.d
                

                where a.b.c.d is your pfSense WAN IP.

                • alexnovice @Gertjan

                  @Gertjan

                  I've used a PowerShell script to continuously ping various IPs, both when I was behind the router and now "in front" of it. That, plus the actual impact on usability (like video calls dropping), is how I've identified when my connection has dropped.

                  So I had another WAN drop at noon, according to pfSense. From what I can see, it looked identical to all the others. During that time my laptop was able to ping the three IPs I was watching continuously, without any drops:

                  • My ISP's gateway
                  • The internet (8.8.8.8)
                  • The WAN port on my router

                  My interpretation of that is that the issue resides somewhere in the pfsense box, but I have no clue what it could be. ARP table? Firewall rules?
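The comparison described here (did the PC see any failures while pfSense was in alarm?) can be made mechanical once both sides are logged with timestamps. A minimal Python sketch, assuming the PC's ping failures have been collected as timezone-aware datetimes; the function name and the 60 s lead margin are hypothetical choices. The margin exists because dpinger only raises the alarm some time after loss begins, so failures just before the Alarm line should still count.

```python
from datetime import datetime, timedelta


def failures_per_window(windows, failures, lead=timedelta(seconds=60)):
    """Count monitoring-PC ping failures inside each pfSense alarm window.

    windows:  list of (alarm_time, clear_time) pairs from the dpinger log.
    failures: list of times at which the PC's own pings failed.
    lead:     how far before the Alarm line to start counting (assumption).
    All datetimes must be timezone-aware so they compare correctly.
    """
    return [sum(start - lead <= t <= end for t in failures)
            for start, end in windows]


# The two alarm/clear pairs from the 2024-11-09 dpinger log in this thread.
ts = datetime.fromisoformat
windows = [
    (ts("2024-11-09 08:46:49.711801+01:00"), ts("2024-11-09 08:48:02.362721+01:00")),
    (ts("2024-11-09 09:46:49.794245+01:00"), ts("2024-11-09 09:47:50.790223+01:00")),
]
```

An all-zero result, as Alex observed, points away from the ISP path and toward whatever is specific to the pfSense box.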

                  Best,
                  Alex

                  • Gertjan @alexnovice

                    @alexnovice said in WAN connection dropping intermittently:

                    My interpretation of that is that the issue resides somewhere in the pfsense box, but I have no clue what it could be. ARP table? Firewall rules?

                    Hummm.
                    Not rules.

                    If the 6100 gets very busy, it might start to lose packets.
                    But "I am running no packages, nor VPNs", so it's just plain vanilla pfSense on a 6100; that's ... strange. I have a 4100 and can't stress it enough to make it show packet loss.
                    And I use packages like pfBlockerNG with a couple of feeds, nothing big. No Squid/Suricata or other resource hogs.

                    In my opinion ;) you would have to throw many Gbits at a 6100 before it has trouble doing its job.

                    Go to the console or SSH, choose menu option 8 (Shell) and run 'top' for a while.
                    You can sort on processor activity percentage.

                    @alexnovice said in WAN connection dropping intermittently:

                    I should probably add that I've tried using different ports for the WAN interface with no change in behaviour, so I don't think it's a hardware issue.

                    That excludes individual NIC issues, I agree.

                    • alexnovice @Gertjan

                      Seems mostly idle from what I can tell:
                      [Screenshot: top output]

                      I also pulled out a bit more detail on which processes are running:
                      [Screenshot: process list]

                      • stephenw10 Netgate Administrator

                        I would try to run a packet capture on WAN when it's showing as down and make sure the monitoring pings are actually being sent.
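To confirm that the monitoring pings actually leave the WAN, and to see which ones go unanswered, the capture can be downloaded from Diagnostics > Packet Capture and checked offline. A minimal Python sketch, assuming the download is a classic libpcap file (not pcapng) with Ethernet framing and untagged IPv4; it pairs echo requests with replies by remote address, ICMP identifier and sequence number. The function names are illustrative only.

```python
import struct

ICMP_ECHO_REPLY, ICMP_ECHO_REQUEST = 0, 8


def icmp_echoes(data):
    """Yield (ts_sec, icmp_type, peer_ip, ident, seq) for each ICMP echo
    request/reply in a classic libpcap capture with Ethernet framing."""
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):    # little-endian (usec / nsec)
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):  # big-endian (usec / nsec)
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    if struct.unpack(endian + "I", data[20:24])[0] != 1:      # 1 = LINKTYPE_ETHERNET
        raise ValueError("expected Ethernet link type")
    off = 24                                                   # skip the global header
    while off + 16 <= len(data):
        ts_sec, _frac, incl_len, _orig = struct.unpack(endian + "IIII", data[off:off + 16])
        pkt = data[off + 16:off + 16 + incl_len]
        off += 16 + incl_len
        # Untagged IPv4 only: EtherType 0x0800 at bytes 12-13.
        if len(pkt) < 34 or pkt[12:14] != b"\x08\x00":
            continue
        ihl = (pkt[14] & 0x0F) * 4                             # IPv4 header length in bytes
        if pkt[23] != 1 or len(pkt) < 14 + ihl + 8:            # protocol 1 = ICMP
            continue
        icmp_type = pkt[14 + ihl]
        if icmp_type not in (ICMP_ECHO_REQUEST, ICMP_ECHO_REPLY):
            continue
        ident, seq = struct.unpack("!HH", pkt[14 + ihl + 4:14 + ihl + 8])
        src = ".".join(str(b) for b in pkt[26:30])
        dst = ".".join(str(b) for b in pkt[30:34])
        # The "peer" is the remote end: destination of a request, source of a reply.
        yield ts_sec, icmp_type, (dst if icmp_type == ICMP_ECHO_REQUEST else src), ident, seq


def unanswered(data):
    """Return sorted (ts_sec, peer_ip, ident, seq) for echo requests with no matching reply."""
    requests, replies = {}, set()
    for ts_sec, icmp_type, peer, ident, seq in icmp_echoes(data):
        key = (peer, ident, seq)
        if icmp_type == ICMP_ECHO_REQUEST:
            requests[key] = ts_sec
        else:
            replies.add(key)
    return sorted((ts_sec,) + key for key, ts_sec in requests.items() if key not in replies)
```

Something like `unanswered(open("wan_capture.pcap", "rb").read())` (file name hypothetical) would then list the unanswered pings. A request sent just before the capture stopped will also show up, because its reply was never recorded.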

                        • alexnovice @stephenw10

                          @stephenw10

                          Hi Stephen,

                          As you suggested, I ran a packet capture on the WAN interface (not in promiscuous mode) on the ICMP protocol. It looks like this when the WAN goes down:

                          [Screenshot: WAN packet capture during the outage]

                          It seems the packets are sent, but with no response. I also noticed that after some time, for some reason, it starts pinging a different IP as well: not just 8.8.8.8, which is the monitoring IP for dpinger, but also an IP that whois claims belongs to Apple?

                          I also looked a bit more at the logs for when the gateway is reported as down. There seem to be intervals of exactly 20 minutes (or multiples of 20 minutes), if that could signify something:

                          [Screenshot: gateway alarm log entries]
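Whether the alarms really land on 20-minute multiples can be checked directly from the text log. A small Python sketch that extracts the dpinger "Alarm" timestamps from lines in the format posted earlier in this thread and expresses each gap as a multiple of 1200 s; the 20-minute period is the hypothesis being tested, not a dpinger setting, and the function names are illustrative.

```python
import re
from datetime import datetime

# Matches dpinger lines in the format shown in this thread, e.g.
# "2024-11-08 21:28:51.537529+01:00 dpinger 32387 WANGW 194.xxx.yyy.3: Alarm latency ..."
LINE = re.compile(r"^(\S+ \S+) dpinger \d+ \S+ \S+: (Alarm|Clear) ")


def alarm_times(lines):
    """Return the timestamps of dpinger 'Alarm' lines, oldest first."""
    times = []
    for line in lines:
        m = LINE.match(line.strip())
        if m and m.group(2) == "Alarm":
            times.append(datetime.fromisoformat(m.group(1)))
    return sorted(times)


def gaps_in_periods(times, period_s=1200):
    """For consecutive alarms, return (gap in whole seconds, gap / period_s to 2 decimals)."""
    out = []
    for earlier, later in zip(times, times[1:]):
        gap = (later - earlier).total_seconds()
        out.append((round(gap), round(gap / period_s, 2)))
    return out
```

Fed the 2024-11-07 18:33, 18:53 and 19:13 alarms plus the 2024-11-08 02:12 alarm from the log above, it returns `[(1200, 1.0), (1199, 1.0), (25140, 20.95)]`: the first two gaps are within a second of 20 minutes, and the overnight one is close to 21 periods.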

                          Thanks!

                          Alex

                          • stephenw10 Netgate Administrator

                            20 mins sounds like an ARP issue. Check the actual pcap file or change the view type and make sure the MAC address it's sending those to doesn't change.
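Checking that the destination MAC doesn't change can also be scripted over a downloaded capture. This Python sketch assumes a classic libpcap file (not pcapng), Ethernet framing and untagged IPv4. It only looks at outgoing ICMP echo requests and reports each point where their destination MAC differs from the previous request's; an empty list means the pings kept going to the same MAC for the whole capture. The function names are hypothetical.

```python
import struct


def pcap_frames(data):
    """Yield (ts_sec, frame) from a classic libpcap file (pcapng is not handled)."""
    if data[:4] in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):    # little-endian
        endian = "<"
    elif data[:4] in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):  # big-endian
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    off = 24                                                      # skip the global header
    while off + 16 <= len(data):
        ts_sec, _frac, incl_len, _orig = struct.unpack(endian + "IIII", data[off:off + 16])
        yield ts_sec, data[off + 16:off + 16 + incl_len]
        off += 16 + incl_len


def echo_request_mac_changes(data):
    """Return (ts_sec, old_mac, new_mac) each time the destination MAC of an
    outgoing ICMP echo request differs from the previous request's.
    Assumes every frame is Ethernet (link type 1)."""
    changes, last = [], None
    for ts_sec, frame in pcap_frames(data):
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":       # untagged IPv4 only
            continue
        ihl = (frame[14] & 0x0F) * 4
        if frame[23] != 1 or len(frame) <= 14 + ihl or frame[14 + ihl] != 8:
            continue                                              # keep ICMP (proto 1) echo requests (type 8)
        mac = ":".join(f"{b:02x}" for b in frame[0:6])            # Ethernet destination address
        if last is not None and mac != last:
            changes.append((ts_sec, last, mac))
        last = mac
    return changes
```

Replies are skipped on purpose: their destination is pfSense's own WAN MAC, so including them would report a "change" on every packet.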

                            Those other pings could be from something on the LAN. In a WAN pcap they will have been translated to the WAN address.

                            The curious thing here is that as I understood it you said that during the outage LAN side clients could still ping 8.8.8.8. Anything upstream should see those identically to the pings from dpinger.
                            Is that correct?

                            One possibility is that you have one of those inconvenient ISPs that seem to forget your MAC address! We have seen a few users hit that and work around it by setting a lower ARP timeout. However, that would break all traffic, not just the monitoring pings.

                            • alexnovice @stephenw10

                              @stephenw10 said in WAN connection dropping intermittently:

                              20 mins sounds like an ARP issue. Check the actual pcap file or change the view type and make sure the MAC address it's sending those to doesn't change.

                              The destination MAC address remains unchanged before, during and after the connection drops.

                              @stephenw10 said in WAN connection dropping intermittently:

                              The curious thing here is that as I understood it you said that during the outage LAN side clients could still ping 8.8.8.8. Anything upstream should see those identically to the pings from dpinger.
                              Is that correct?

                              No, when dpinger can't get out, neither can LAN-side clients. However, other devices placed on the WAN side work.

                              @stephenw10 said in WAN connection dropping intermittently:

                              One possibility is that you have one of those inconvenient ISPs that seem to forget your MAC address! We have seen a few users hit that and work around it by setting a lower ARP timeout. However, that would break all traffic, not just the monitoring pings.

                              It's a relatively small ISP and they've been pretty responsive - I could try asking them if only I knew what to ask :) But wouldn't that behaviour from the ISP have the same impact on other devices connected in place of pfSense?

                              • stephenw10 Netgate Administrator

                                Effectively, the ISP gateway device drops your WAN's entry from its ARP table and doesn't ARP for it. Instead, it waits until pfSense renews its own ARP entry for the gateway.

                                Try setting: sysctl net.link.ether.inet.max_age=300

                                That is 1200 s (20 min) by default. If that seems to prevent it, that confirms it's an ARP issue somewhere.
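The reasoning can be illustrated with a toy model. This Python sketch assumes, hypothetically, that pfSense re-ARPs for the gateway every max_age seconds starting at t = 0, that each of those ARP requests also refreshes the ISP gateway's entry for pfSense, and that the ISP gateway silently drops that entry after a timeout of its own (isp_timeout, an unknown value here) without ever ARPing itself. Real FreeBSD ARP handling and the ISP's equipment are certainly more involved; this only shows why a shorter max_age can close the gap.

```python
def unreachable_windows(max_age, isp_timeout, horizon):
    """Toy model: (start, end) intervals, in seconds, during which the ISP
    gateway has forgotten pfSense's MAC and inbound traffic cannot arrive.

    Hypothetical assumptions: pfSense ARPs for the gateway at t = 0, max_age,
    2 * max_age, ...; each ARP request re-teaches the ISP gateway our MAC;
    the ISP gateway forgets it isp_timeout seconds later and never asks itself.
    """
    if isp_timeout >= max_age:
        return []                      # pfSense always refreshes before the ISP forgets
    windows = []
    t = 0
    while t < horizon:
        start = t + isp_timeout        # ISP side forgets our MAC here
        end = min(t + max_age, horizon)  # our next ARP request re-teaches it
        if start < end:
            windows.append((start, end))
        t += max_age
    return windows
```

With a hypothetical isp_timeout of 1100 s, `unreachable_windows(1200, 1100, 3600)` gives a 100 s hole in every 20-minute cycle, while `unreachable_windows(300, 1100, 3600)` gives none. The outages in this thread did not happen every cycle, so whatever triggers them on the ISP side is evidently less regular than this model.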

                                • alexnovice @stephenw10

                                  Thanks Stephen!

                                  I've made that change and will report back either if it keeps dropping or in ~24 hours, by which point it definitely would have dropped without this change.

                                  • Gertjan @alexnovice

                                    @alexnovice

                                    Is this your WAN IP:

                                    [Screenshot: pfSense WAN interface address]

                                    ?
                                    I thought it was an RFC 1918 IP.
                                    If you use a switch on the WAN side and pfSense gets this 194.x.x.192 as its WAN IP, then what IP did the PC that's also hooked up to that switch use? How did this PC obtain a 'LAN' IP?

                                    • alexnovice @Gertjan

                                      @Gertjan

                                      That is indeed the WAN IP.

                                      The gateway is on the same subnet (just ending in 3 instead of 192). For the laptop on the WAN side I just grabbed another IP in the same subnet (it's a static IP setup so no DHCP), hoping they hadn't locked it down (which it turns out they hadn't).

                                      Like I wrote a couple of responses above, it's a small ISP :-)

                                      Cheers!

                                      Alex

                                      • Gertjan @alexnovice

                                        @alexnovice

                                        OK, great, but the IP you assigned yourself could be assigned to someone else.
                                        (and then ARP gets confused, and the other person could experience WAN IP outages ... ^^)

                                        • alexnovice @Gertjan

                                          @Gertjan

                                          True, so I stopped doing that as soon as I had results from the test :-)

                                          That said, there are only a few (<5) other users on this subnet (which seems accurate when I stare at ARP broadcasts), since almost all apartments have their home networks managed directly by the ISP (sitting behind their firewall and gateway), whereas I'm bypassing that.

                                          • alexnovice @stephenw10

                                            @stephenw10

                                            It's been 24 hours and the network has been stable throughout. Incredibly happy and super grateful for your help, Stephen and Gertjan.

                                            Thank you!

                                            Alex

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.