Netgate Discussion Forum
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Search
    • Register
    • Login

    pfSense WAN connection hangs after about a minute

    Scheduled Pinned Locked Moved General pfSense Questions
    35 Posts 4 Posters 3.5k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • F
      fulkren
      last edited by

      I can't test this yet but will as soon as I'm done with work for the day. Thanks for the suggestion.

      1 Reply Last reply Reply Quote 0
      • F
        fulkren
        last edited by

        I tried disabling gateway monitoring and it didn't affect anything. Before disabling it, dpinger does show errors in the log. Example:

        send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 64.37.18.5 bind_addr 100.64.24.89 identifier "WAN_PPPOE "

        Also, I was wrong about the IP staying the same when I disconnect and reconnect the WAN port. It does increment by one each time I cycle.

        And I confirmed I could ping the WAN IP, i.e. 100.64.24.89 in my above example, even when no traffic was flowing any further.

        Another oddity was keeping a ping to 8.8.8.8 going from just after I reconnected the WAN. It kept working even when the interface seemed to die, but pings to any other public IP were failing. That seems very odd to me.

        I keep thinking this has to be something up stream, but if that were the case then the same hardware should fail with other OSs, or my current pfSense box shouldn't work if it's the software.

        1 Reply Last reply Reply Quote 0
        • stephenw10S
          stephenw10 Netgate Administrator
          last edited by

          @fulkren said in pfSense WAN connection hangs after about a minute:

          send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 64.37.18.5 bind_addr 100.64.24.89 identifier "WAN_PPPOE "

          That is not an error, it's what dpinger logs when it starts showing what IPs it's running on and what values it's using.

          If it is exhausting something, like a state table, in the ISP router then I would expect existing states to stay open and continue to pass traffic but new states to fail. That would normally include the gateway monitoring pings though.

          You have any other hardware device you can test pfSense on?

          Steve

          1 Reply Last reply Reply Quote 0
          • johnpozJ
            johnpoz LAYER 8 Global Moderator
            last edited by johnpoz

            @stephenw10 said in pfSense WAN connection hangs after about a minute:

            That would normally include the gateway monitoring pings though.

            Not always the case.. We just had this discussion awhile back about icmp and states.. While we all know icmp is actually a stateless protocol - most firewalls track it, via an entry in the state table.. Pf does this..

            So pinging especially twice a second would most likely keep such tracking active and allowed.

            But its possible whatever is upstream isn't tracking it at all? And if it was having state issues, maybe icmp doesn't hit whatever issue that is be it exhaustion or whatever.

            To me It really comes down to this - if you see pfsense sending traffic upstream, and you get no answer - then the problem is upstream. If pfsense was malforming whatever its sending, why would it not be doing that for the whole time - and it wouldn't even work for a minute?

            Your saying you can only ever get a connection for exactly 1 minute, and then it fails? exactly 1 minute, or it just happens often? But it far more likely that its something upstream if your seeing traffic sent upstream with no reply.

            An intelligent man is sometimes forced to be drunk to spend time with his fools
            If you get confused: Listen to the Music Play
            Please don't Chat/PM me for help, unless mod related
            SG-4860 24.11 | Lab VMs 2.8, 24.11

            stephenw10S 1 Reply Last reply Reply Quote 0
            • F
              fulkren
              last edited by

              It's not exactly one minute. The time it takes to stop responding typically varies from a few seconds to about a minute and appears to depend on how much traffic I load on the interface. If I just run one ping to a single address it'll stay up for a while, but much more than that and it goes down.

              As for other hardware, that's the part that confuses me most. My current pfSense box is a VM on an old EVGA i7 mobo with two onboard gigabit NICs (Realteks no less) and it's working perfectly. Same version of pfSense and exactly the same config as is failing on the Protectli Vault I purchased to replace it. The only config differences I made were testing each adjustment recommended in Netgate's troubleshooting and tuning document for Intel NICs.

              I originally thought it was a hardware compatibility issue between my Telco box and the Intel 210AT NIC chipset on the Vault and swapped it out for a Vault with the older Intel 82583V chipset. Unfortunately I run into the exact same behavior on the new Vault (what I'm currently testing with). And it can't be purely hardware because running Sophos XG or ClearOS on the Vault does work (although ClearOS seems to have a lot of overhead and is unable to saturate the 500Mb/s line).

              So the problem isn't purely software or purely hardware and appears to be the combination of the software and the hardware that causes the issue. Maybe the combination of the Intel based NICS and the driver(s) that pfSense uses?

              1 Reply Last reply Reply Quote 0
              • johnpozJ
                johnpoz LAYER 8 Global Moderator
                last edited by

                if traffic continues to be sent upstream, and response stop to be seen. That screams upstream problem.. Do a sniff while your connecting so you collect all traffic being sent upstream and downstream... See if you can see any changes in the traffic being sent.. Wireshark for example would show errors and any packets that are malformed, etc.

                An intelligent man is sometimes forced to be drunk to spend time with his fools
                If you get confused: Listen to the Music Play
                Please don't Chat/PM me for help, unless mod related
                SG-4860 24.11 | Lab VMs 2.8, 24.11

                1 Reply Last reply Reply Quote 0
                • stephenw10S
                  stephenw10 Netgate Administrator @johnpoz
                  last edited by

                  @johnpoz said in pfSense WAN connection hangs after about a minute:

                  But its possible whatever is upstream isn't tracking it at all? And if it was having state issues, maybe icmp doesn't hit whatever issue that is be it exhaustion or whatever.

                  Yeah, that's a good point. Or maybe it requires a new state for each ping and therefore breaks when it stops being able to open new states. Which seems like what we're seeing here.

                  I could believe the pings themselves are exhausting it except disabling monitoring did not appear to help.

                  Steve

                  1 Reply Last reply Reply Quote 0
                  • johnpozJ
                    johnpoz LAYER 8 Global Moderator
                    last edited by

                    A sniff of the traffic should show us if new states are what are being denied, or if we are loosing responses to existing traffic, etc.

                    I am just having a hard time coming up with any sort of issue where pfsense would continue to send traffic, but malform it or change it in some way that the upstream didn't like.. Have never seen such thing ever..

                    Sure some device have hard time talking to each other - but this is normally seen in just basic negotiation of speed and duplex, etc. Not working fine for X amount of time, and then start having issues with just some traffic.

                    An intelligent man is sometimes forced to be drunk to spend time with his fools
                    If you get confused: Listen to the Music Play
                    Please don't Chat/PM me for help, unless mod related
                    SG-4860 24.11 | Lab VMs 2.8, 24.11

                    1 Reply Last reply Reply Quote 0
                    • F
                      fulkren
                      last edited by

                      Sorry for the delay in responding. I did another packet capture on em0 (the NIC underlying the PPP connection). This time I started the packet capture with the WAN cable unplugged, plugged it in, and recorded until it died (about 35 seconds later). I don't see any malformed packets (that I recognize as such) but I do see some spurious re-transmissions as well as a small number of RST packets.

                      I've attached the PCAP (as much as the forum allows, about the first 8k packets out of 22k total). Anything stand out to you?

                      If not, then I'm down to calling the Telco and seeing if they can offer any advice. Though I expect their response to be essentially "use our provided router or get stuffed". Regarding which, their provided router is just an ancient D-Link DIR-825 with stock software. Nothing exciting there.em0-2020-10-21-1.zip

                      1 Reply Last reply Reply Quote 0
                      • johnpozJ
                        johnpoz LAYER 8 Global Moderator
                        last edited by

                        I don't see how that sniff is from before you connected.. It starts with a flood of DNS..

                        Why would you be sending a syn (tcp) to 53??

                        Also - clearly your missing start of conversations... This one for example

                        start.png

                        There is no syn here, you see sending http get, never getting a response.. So retran, then finally says screw it and sends fin.. Get a answer finally, then client side says screw it and sends RST..

                        But you never see the start of this conversation.

                        So how exactly did you start this sniff before you wan was even up? Make sure you clear all states if your going to do that again.. But default I believe is to flush all states when connection goes down, have you adjusted that setting?

                        Game is on - so that is my comments for now.. Will look later after the game, or maybe halftime

                        An intelligent man is sometimes forced to be drunk to spend time with his fools
                        If you get confused: Listen to the Music Play
                        Please don't Chat/PM me for help, unless mod related
                        SG-4860 24.11 | Lab VMs 2.8, 24.11

                        1 Reply Last reply Reply Quote 0
                        • F
                          fulkren
                          last edited by

                          I haven't adjusted the settings for flushing states. And just to confirm I haven't accidentally borked anything up I have done a reset on the box settings and then re-configured for my local subnet, PPPoE connection, and DNS Resolver.

                          For the Sniff, my first attempt was to just unplug the WAN cable (LAN still plugged in), start the capture on WAN in promiscuous mode and then plug it back in. That just got me that flood of DNS packets as you saw. I also tried clearing states and then starting a pcap and, as quickly as I could, running "/usr/local/sbin/pfSctl -c 'interface reload wan'" from the shell but that seems to be giving me similar results to what I already posted.

                          Is there a better way for me to get good data during PPP startup?

                          1 Reply Last reply Reply Quote 0
                          • johnpozJ
                            johnpoz LAYER 8 Global Moderator
                            last edited by

                            When you pull the cable on wan.. Clear all your states.. then start you sniff, then plug the cable back in.. We should then see everything, your pppoe connection, your dhcp on your wan, etc. And since all states are cleared, nothing will be allowed through the firewall to even go out the wan without sending a new syn to start a conversation.

                            We can then see what conversations are failing, what are working.

                            Also make sure you run it the whole time until your saying nothing is working.. You said it was like 30 some seconds that it worked, but that sniff is only 11 seconds.

                            Why I don't get is why so much tcp dns traffic... These all seem to be root servers, but your talking to them via tcp.. Is UDP blocked??

                            53tcp.png

                            This seems nuts... in less than 1ms you sent 870 something dns queries - not one of them I see from quick look got answered over UDP..

                            dns.png

                            That doesn't seem normal... Most of your dns should be UDP.. and only stuff that is LARGE answers should have to do TCP..

                            An intelligent man is sometimes forced to be drunk to spend time with his fools
                            If you get confused: Listen to the Music Play
                            Please don't Chat/PM me for help, unless mod related
                            SG-4860 24.11 | Lab VMs 2.8, 24.11

                            1 Reply Last reply Reply Quote 0
                            • stephenw10S
                              stephenw10 Netgate Administrator
                              last edited by

                              Some aliases with fqdns being resolved?

                              Though those all look like typical stuff you might have behind pfSense.

                              You have Unbound configured in resolving mode? I could just about imagine something at your ISP is blocking you based on those DNS requests. The ISPs router will just be using the DNS servers supplied by the ISP, that would be a difference when using pfSense.
                              You could try switching Unbound to forwarding mode and checking 'Allow DNS server list to be overridden by DHCP/PPP on WAN' in System > General Setup.

                              Steve

                              1 Reply Last reply Reply Quote 0
                              • johnpozJ
                                johnpoz LAYER 8 Global Moderator
                                last edited by

                                @stephenw10 said in pfSense WAN connection hangs after about a minute:

                                I could just about imagine something at your ISP is blocking you based on those DNS requests.

                                Yeah I agree something is blocking dns, tries udp - not working so tries tcp..

                                I didn't notice anything bad in the queries - just a freaking shit ton of them.. 870 queries in 1 ms?? How many clients do you have behind pfsense?

                                An intelligent man is sometimes forced to be drunk to spend time with his fools
                                If you get confused: Listen to the Music Play
                                Please don't Chat/PM me for help, unless mod related
                                SG-4860 24.11 | Lab VMs 2.8, 24.11

                                1 Reply Last reply Reply Quote 0
                                • F
                                  fulkren
                                  last edited by

                                  I'll follow your suggestions on getting a new PCAP. I'm getting a small switch in place with just my client laptop and pfSense connected to it to minimize extraneous traffic for the PCAP.

                                  As for clients, this is just a home network so 6-8 PCs, a couple of appliances, and a few mobile devices.

                                  I was running DNS to Cloudflare (1.1.1.1 and 1.0.0.1), have the DNS Forwarder disabled, and the DNS Resolver enabled. Apparently I unchecked 'enable forwarding mode' by mistake somewhere in my testing yesterday. I'll fix that and change to using the default IP specified DNS servers for my next test (I've tested it before and it didn't help, but at this point I'm eliminating as many variables as I can). DNS Resolver will have Network Interfaces set to LAN and Localhost, with Outgoing Network Interface set to WAN.

                                  I'll get a new PCAP completed and uploaded as soon as I can.

                                  1 Reply Last reply Reply Quote 0
                                  • F
                                    fulkren
                                    last edited by

                                    Success! My one laptop test worked. pfSense is up and running. After testing for 10-15 minutes with just one laptop I put it back on the main switch and so far so good.

                                    The only thing I couldn't do was get a good pcap of the WAN link coming up. As suggested, I rebooted pfSense, cleared the States, re-logged into the GUI and then started a promiscuous pcap with full details up to 10,000 packets on the WAN interface with the WAN cable unplugged. I then plugged the cable in and worked with it until it stopped (or in this case, didn't stop).

                                    When I stopped the pcap after a few minutes, I only had three packets listed. I've pasted the three packets in below, but why only three IPv6 packets? I should be seeing a lot more than this shouldn't I? Or does plugging the cable in somehow disrupt the capture?

                                    No. Time Source Destination Protocol Length Info
                                    1 0.000000 :: ff02::16 ICMPv6 100 Multicast Listener Report Message v2

                                    Frame 1: 100 bytes on wire (800 bits), 100 bytes captured (800 bits)
                                    Null/Loopback
                                    Internet Protocol Version 6, Src: ::, Dst: ff02::16
                                    Internet Control Message Protocol v6

                                    No. Time Source Destination Protocol Length Info
                                    2 0.152920 :: ff02::1:ff1e:22f0 ICMPv6 76 Neighbor Solicitation for fe80::2e0:67ff:fe1e:22f0

                                    Frame 2: 76 bytes on wire (608 bits), 76 bytes captured (608 bits)
                                    Null/Loopback
                                    Internet Protocol Version 6, Src: ::, Dst: ff02::1:ff1e:22f0
                                    Internet Control Message Protocol v6

                                    No. Time Source Destination Protocol Length Info
                                    3 0.636174 :: ff02::16 ICMPv6 80 Multicast Listener Report Message v2

                                    Frame 3: 80 bytes on wire (640 bits), 80 bytes captured (640 bits)
                                    Null/Loopback
                                    Internet Protocol Version 6, Src: ::, Dst: ff02::16
                                    Internet Control Message Protocol v6

                                    1 Reply Last reply Reply Quote 0
                                    • stephenw10S
                                      stephenw10 Netgate Administrator
                                      last edited by

                                      So after switching back to DNS forwarding mode and using Clouflare for upstream you are no longer being blocked after a few minutes?
                                      Are you using DNS over TLS for that?

                                      Steve

                                      1 Reply Last reply Reply Quote 0
                                      • F
                                        fulkren
                                        last edited by

                                        That's correct except for the Cloudflare part. Right now it's just using the ISP DNS provided by the PPPoE connection. I'll try Cloudflare and DNS over TLS again this evening and see if that affects anything.

                                        1 Reply Last reply Reply Quote 0
                                        • johnpozJ
                                          johnpoz LAYER 8 Global Moderator
                                          last edited by

                                          So it was never that your connection was down, its your resolution of dns was not work.. Which is why somestuff worked and other stuff didn't

                                          An intelligent man is sometimes forced to be drunk to spend time with his fools
                                          If you get confused: Listen to the Music Play
                                          Please don't Chat/PM me for help, unless mod related
                                          SG-4860 24.11 | Lab VMs 2.8, 24.11

                                          1 Reply Last reply Reply Quote 0
                                          • F
                                            fulkren
                                            last edited by

                                            Not entirely. While DNS was definitely part of the issue, the WAN connection did appear to stop returning packets. For example, pings from a box on the LAN to a public IP like 8.8.8.8 would start timing out; as would pings to any other public IPs. I would initially get DNS resolution and traffic, but after a short period (15-60 seconds) both DNS resolution and ICMP responses would stop.

                                            1 Reply Last reply Reply Quote 0
                                            • First post
                                              Last post
                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.