Netgate Discussion Forum

    Intermittent connection issue

    DHCP and DNS
    115 Posts 6 Posters 24.6k Views
    • K
      kevindd992002
      last edited by kevindd992002

      Unbound had been working properly since January (when I switched to this new ISP in our country) until a few months ago, when it started failing intermittently. When forwarding to the ISP DNS servers (servers provided through DHCP by the ONU), everything works fine. But when I disable DNS Resolver query forwarding, there are times when it cannot resolve hostnames, even when I do a manual nslookup in cmd. I have the exact same setup at another house (100 km away) with the same ISP and have zero issues.

      It seems to me that the ISP has issues and is dropping DNS query packets to the root hint servers when I'm not forwarding to their DNS servers. But like I said, this happens intermittently. How do I accurately troubleshoot this so that I can provide data to them when I tell them it's their fault?
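One way to collect data on this, independent of unbound, is to send a plain UDP query straight to a root server and record whether anything comes back. The following is only a minimal sketch: the function names `build_query` and `probe` are made up for illustration, it assumes IPv4 and UDP port 53, and it treats any response with a matching transaction ID (even a referral) as proof that the packet made the round trip.

```python
import socket
import struct
import time
from typing import Optional


def build_query(name: str, txid: int, recursion_desired: bool = False) -> bytes:
    """Build a minimal DNS query for an A record (hypothetical helper)."""
    # Flags: only the RD bit is ever set. A resolver walking down from the
    # root hints sends RD=0; a client that forwards to a resolver sends RD=1.
    flags = 0x0100 if recursion_desired else 0x0000
    header = struct.pack("!HHHHHH", txid, flags, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.split(".")
        if label
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def probe(server: str, name: str, timeout: float = 2.0) -> Optional[float]:
    """Send one query; return the round-trip time in seconds, or None if lost."""
    txid = int(time.time() * 1000) & 0xFFFF
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            start = time.monotonic()
            sock.sendto(build_query(name, txid), (server, 53))
            while True:
                data, _ = sock.recvfrom(4096)
                # Ignore stray datagrams that do not carry our transaction ID.
                if len(data) >= 2 and struct.unpack("!H", data[:2])[0] == txid:
                    return time.monotonic() - start
    except OSError:  # includes socket.timeout
        return None
```

For example, calling `probe("198.41.0.4", "www.google.com")` (198.41.0.4 is a.root-servers.net) once a second from a machine behind pfsense, and again from a PC plugged straight into the ONU, and logging each result with a timestamp, would show whether the losses line up in time on both paths.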

      • KOMK
        KOM
        last edited by

        Take a packet capture on WAN for all tcp/udp 53 traffic and see what's going on.

        • K
          kevindd992002
          last edited by

          It looked like a DNS issue at first, but when I checked deeper I noticed this: https://pastebin.com/46KAAMew . I did that while the issue was happening. Look at how it was all RTOs (request timeouts) for a few tries and then just magically got responses. This tells me that either packets are being dropped along the way or there's something wrong with my ISP's routes. This almost always happens at night (peak hours). How do I prove this to them? Do I need to run a promiscuous mode packet capture on the WAN interface, filtered by ICMP/IPv4, the next time this issue happens?
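To back up the "peak hours" claim with numbers, one option is to keep a timestamped log of probe results and bucket the losses by hour of day. This is just a sketch under assumptions: the function name `loss_by_hour` is hypothetical, and each sample is assumed to be a `(timestamp, answered)` pair from whatever probe you run (ping, DNS query, and so on).

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def loss_by_hour(samples: Iterable[Tuple[datetime, bool]]) -> Dict[int, float]:
    """Return the fraction of lost probes for each hour of day (0-23)."""
    sent = defaultdict(int)
    lost = defaultdict(int)
    for when, answered in samples:
        sent[when.hour] += 1
        if not answered:
            lost[when.hour] += 1
    # Only hours that actually have samples appear in the result.
    return {hour: lost[hour] / sent[hour] for hour in sorted(sent)}
```

A table of loss fraction per hour collected over several days is harder for an ISP to dismiss than a single pasted ping session, especially if the same probe from a second location shows no loss at the same times.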

          • KOMK
            KOM
            last edited by

            That's not going to be easy. How do you know the intermittent problem isn't with your NIC, your patch cable, or your modem?

            • K
              kevindd992002 @KOM
              last edited by

              @KOM said in Non-forwarding Resolver intermittent operation:

              That's not going to be easy. How do you know the intermittent problem isn't with your NIC, your patch cable, or your modem?

              I'm sure it's not my modem/ONU because it already got replaced by my ISP once.

              The patch cable I use to connect the modem to the pfsense box is pre-made (Flex cable from Monoprice) and connectivity tests seem good. I can definitely test it more with my other tester, which detects split pairs, but I highly doubt there's something wrong with the patch cable.

              Which NIC are you pertaining to? pfsense's NIC? When the issue happens, it happens for all my clients so it's definitely not a client NIC. It could be the NIC of the switch connected to the pfsense box but again how do I prove that?

              • KOMK
                KOM
                last edited by

                My point wasn't to specifically question everything around you. It was to point out that your ISP likely will, and you need to have your answers ready.

                Which NIC are you pertaining to? pfsense's NIC?

                Well yes, that's the only NIC that matters in this context. Test or temporarily replace everything that you can to rule out one thing after another. And even then your ISP won't believe you because they've been lied to by a thousand other customers who said they've tested and tried this & that. I, myself, tend to not believe my customers when they tell me they did this or that and I make them show me. So many times, what they told me wasn't true. Just like in these forums when people write down what they're doing and I tell them to post a screen of what they've done instead of a description of what they think they have done.

                • K
                  kevindd992002
                  last edited by

                  Ok. I'm thinking there are really TCP/UDP packet losses when the issue happens. Would these pcap settings probably be a good first step to see if there are DNS packet losses?

                  [Image: a24a7ccd-d82a-4d7c-b514-c6c93e3f6d61-image.png (screenshot of the packet capture settings)]

                  When the issue happens, sometimes the records I test (www.google.com, www.speedtest.net, mail.yahoo.com, etc.) cannot be resolved at all and sometimes they are resolvable but start with RTO's (during ping) until they just magically respond. Here are some examples:

                  https://pastebin.com/revDQ50C
                  https://pastebin.com/XRex1AGm
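Ping transcripts like these are easier to compare if they are reduced to a few numbers. Below is a rough sketch that counts replies, timeouts, and the length of each run of consecutive timeouts. It assumes English-language Windows `ping` output ("Request timed out." and "Reply from ... TTL=..." lines); the function name `timeout_runs` is made up for illustration, and other messages such as "Destination host unreachable" are simply ignored.

```python
from typing import List, Tuple


def timeout_runs(ping_output: str) -> Tuple[int, int, List[int]]:
    """Return (replies, timeouts, lengths of consecutive-timeout runs)."""
    replies = 0
    timeouts = 0
    current = 0          # length of the timeout run in progress
    runs: List[int] = []
    for raw in ping_output.splitlines():
        line = raw.strip()
        if line.startswith("Request timed out"):
            timeouts += 1
            current += 1
        elif line.startswith("Reply from") and "TTL=" in line:
            replies += 1
            if current:
                runs.append(current)
                current = 0
    if current:          # transcript ended in the middle of a timeout run
        runs.append(current)
    return replies, timeouts, runs
```

Long runs followed by normal replies would fit the "dead for a while, then magically fine" pattern better than evenly scattered single drops would.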

                  • KOMK
                    KOM
                    last edited by

                    That will capture DNS traffic but I don't know how useful it will be. If DNS isn't replying, the capture won't give you much other than no response, and the real reason could be a million different network things. Your detail level is probably higher than necessary.

                    Are you doing DNS over TLS or DNSSEC?

                    Btw, you can upload images here directly without having to use an external site like Imgur. In the Edit bar, the second icon from the right is Upload Image.

                    • Raffi_R
                      Raffi_
                      last edited by

                      @kevindd992002 I'm not sure why you still suspect DNS issues? Based on your pings, you're not getting an IP resolution error, you're getting no ping response from google. As you said, it looks like packets are not getting through at times. Those packets could be DNS queries, ICMP pings, TCP/UDP, and anything in between.

                      When trying to prove it's not your equipment, the best way to do that is to remove as much of your equipment from the setup as possible. Make it as simple as possible. Plug a known-working PC directly into the modem with a known-good cable. Are you still having issues with pings being dropped? If so, then it's either the modem or the ISP's line. If you REALLY want to be sure, try a second PC and a second good cable and repeat. Anything else is still going to raise questions about xyz in the setup, wouldn't it?

                      • K
                        kevindd992002 @Raffi_
                        last edited by

                        @KOM said in Non-forwarding Resolver intermittent operation:

                        That will capture DNS traffic but I don't know how useful it will be. If DNS isn't replying, the capture won't give you much other than no response, and the real reason could be a million different network things. Your detail level is probably higher than necessary.

                        Are you doing DNS over TLS or DNSSEC?

                        Btw, you can upload images here directly without having to use an external site like Imgur. In the Edit bar, the second icon from the right is Upload Image.

                        I see. Well, I wasn't really trying to prove it is a DNS issue. I just thought that I would limit my capture to DNS because DNS is also working intermittently, which I suspect is because packets are somehow being lost along the way. So I was thinking that any type of traffic, TCP or UDP, is affected by the issue.

                        I don't have Enable SSL/TLS Service enabled but I do have Enable DNSSEC Support enabled.

                        I uploaded the image in my last post here. Are you probably pertaining to the pastebin texts that I linked?

                        @Raffi_ said in Non-forwarding Resolver intermittent operation:

                        @kevindd992002 I'm not sure why you still suspect DNS issues? Based on your pings, you're not getting an IP resolution error, you're getting no ping response from google. As you said, it looks like packets are not getting through at times. Those packets could be DNS queries, ICMP pings, TCP/UDP, and anything in between.

                        When trying to prove it's not your equipment, the best way to do that is to remove as much of your equipment from the setup as possible. Make it as simple as possible. Plug a known-working PC directly into the modem with a known-good cable. Are you still having issues with pings being dropped? If so, then it's either the modem or the ISP's line. If you REALLY want to be sure, try a second PC and a second good cable and repeat. Anything else is still going to raise questions about xyz in the setup, wouldn't it?

                        See my reply above regarding why I was trying to limit to DNS issues. But I guess it would make sense to just capture all (no value in port field)?

                        Ok, so when the ISP guys came to my house a few weeks ago, I was trying to prove to them that the issue is with their network. But what turned out is this. Let me first explain my setup.

                        1. one of the pfsense interfaces (WAN) connected to modem/ONU
                        2. one of the pfsense interfaces (LAN) connected to ASUS switch/access point (router as others call it, but it is acting as a switch/AP in my use case)

                        So it's a basic setup with the additional setup of having a site-to-site VPN to my other house. So when they were here, I tried to prove that even if one of my PC's is connected directly to the modem, the problem still persists. So I connected one of my PC's directly to the modem, and true enough the problem persisted. I was actually lucky that the problem was experienced during the time they were here. Like I said, this is an intermittent problem that is usually just happening during peak hours (weeknights and weekends).

                        So with two devices connected to their modem (pfsense and one of my PC's), they let me remove pfsense (as they don't know anything about it), and to my amusement the directly-connected PC just magically worked. So they concluded that somehow my pfsense is bringing down the network or something. Now I'm not yet fully convinced regarding this conclusion because even with pfsense in the mix, this issue just auto-resolves itself after a few minutes. Like you said, I have to test using the modem directly without any of my own equipment connected to it and observe for a few days.

                        But if, say, pfsense is really causing a conflict, how would it be doing that? I really don't understand why connecting pfsense to the modem bogs down the whole network. pfsense and the PC are both clients from the point of view of the modem. So if there's an issue with pfsense, it shouldn't affect the other clients connected to the same modem (i.e. my PC), should it? I checked for IP conflicts and there are none.

                        While I was typing this, I encountered the issue again and here's the unfiltered packet capture if you guys can help me check on it real quick? While the packet capture is running, I ping'ed www.google.com and here's the results.

                        • KOMK
                          KOM
                          last edited by KOM

                          Nothing jumps out at me from your capture. Your ping requests are replied to. You tried to open a connection to something on tcp/443 that wasn't answered. You tried to talk to another web server and it told you to go away. There were some minor errors. There are references to OpenVPN. Do you have a VPN tunnel running?

                          • K
                            kevindd992002 @KOM
                            last edited by

                            @KOM said in Non-forwarding Resolver intermittent operation:

                            Nothing jumps out at me from your capture. Your ping requests are replied to. You tried to open a connection to something on tcp/443 that wasn't answered. There are references to OpenVPN. Do you have a VPN tunnel running?

                            That's weird. If they were all replied to, why do I have a lot of RTO's in the ping results?

                            Yes, for some reason I just started having issues with the pfsense webgui when accessing it through firefox but works just fine with chrome and edge. Error is:

                            Certificate key usage is inadequate for attempted operation. Error code: SEC_ERROR_INADEQUATE_KEY_USAGE.

                            This is the first time I've encountered this error with the gui.

                            Yes, I have a site-to-site VPN to my other home's network through openvpn.

                            • johnpozJ
                              johnpoz LAYER 8 Global Moderator
                              last edited by johnpoz

                              @kevindd992002 said in Non-forwarding Resolver intermittent operation:

                              RTO's in the ping results?

                              Ping is not the same as NS not answering a query.

                              What cert are you using on pfsense gui, the self generated self signed, your own, acme?

                              What is the extended key usage set on the cert? That error points to this not being called out, or being wrong for what you're trying to do with the cert.
                              "A certificate has a key usage extension that does not assert a required usage"

                              Here is what is listed on my created certs I use for pfsense web gui, via the Cert Manager, and created CA.
                              Not Critical
                              TLS Web Server Authentication (1.3.6.1.5.5.7.3.1)
                              1.3.6.1.5.5.8.2.2

                              Maybe you're trying to use a user vs a server cert?
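As a rough illustration of the check described above, here is a small sketch that takes the list of extended key usage OIDs read off a certificate (for example from the Cert Manager details) and says whether the EKU extension allows use as a TLS server. The function name `usable_as_tls_server` is hypothetical, and this only looks at EKU. It does not inspect the separate Key Usage extension, which a browser may also check.

```python
from typing import Iterable, Optional

SERVER_AUTH = "1.3.6.1.5.5.7.3.1"  # TLS Web Server Authentication
ANY_EKU = "2.5.29.37.0"            # anyExtendedKeyUsage


def usable_as_tls_server(eku_oids: Optional[Iterable[str]]) -> bool:
    """Check the EKU list only; None means the cert has no EKU extension."""
    if eku_oids is None:
        # Without an EKU extension, the purpose is not restricted by EKU.
        return True
    oids = set(eku_oids)
    return SERVER_AUTH in oids or ANY_EKU in oids
```

A cert listing only client authentication (1.3.6.1.5.5.7.3.2), i.e. a user cert, would fail this check, while the two OIDs listed above would pass it.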

                              An intelligent man is sometimes forced to be drunk to spend time with his fools
                              If you get confused: Listen to the Music Play
                              Please don't Chat/PM me for help, unless mod related
                              SG-4860 24.11 | Lab VMs 2.7.2, 24.11

                              • K
                                kevindd992002 @johnpoz
                                last edited by

                                @johnpoz said in Non-forwarding Resolver intermittent operation:

                                @kevindd992002 said in Non-forwarding Resolver intermittent operation:

                                RTO's in the ping results?

                                Ping is not the same as NS not answering a query.

                                What cert are you using on pfsense gui, the self generated self signed, your own, acme?

                                What is the extended key usage set on the cert? That error points to this not being called out, or being wrong for what you're trying to do with the cert.
                                "A certificate has a key usage extension that does not assert a required usage"

                                Here is what is listed on my created certs I use for pfsense web gui, via the Cert Manager, and created CA.
                                Not Critical
                                TLS Web Server Authentication (1.3.6.1.5.5.7.3.1)
                                1.3.6.1.5.5.8.2.2

                                Maybe you're trying to use a user vs a server cert?

                                Yes, I know DNS queries aren't the same as ping (ICMP) packets. I was trying to check earlier whether DNS query packets are also getting lost along the way, and then I showed the ping results as additional information for troubleshooting.

                                The CA I have in there is the self-signed FreeRADIUS CA which is used to sign the FreeRADIUS cert. The certs I have under the Certificates tab are:

                                1. WebConfigurator default (self-signed)
                                  Signature Digest: RSA-SHA256
                                  KU: Digital Signature, Key Encipherment
                                  EKU: TLS Web Server Authentication, IP Security IKE Intermediate

                                2. FreeRADIUS Server Cert (signed by FreeRADIUS CA)
                                  Serial: 1
                                  Signature Digest: RSA-SHA256
                                  KU: Digital Signature, Key Encipherment
                                  EKU: TLS Web Server Authentication, IP Security IKE Intermediate

                                Weird thing is that I have another pfsense box (the one at the other end of the openvpn tunnel) that I can access just fine with Firefox and it has the same cert structure and EKU's.

                                • KOMK
                                  KOM
                                  last edited by

                                  @kevindd992002 said in Non-forwarding Resolver intermittent operation:

                                  If they were all replied to, why do I have a lot of RTO's in the ping results?

                                  Your cap only shows 3 pings within 1.5 seconds, all replied to. There are no pings that go out and don't echo back.
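The same check (every echo request has a matching echo reply) can be automated over a longer capture. The following is only a sketch under stated assumptions: a classic pcap file (not pcapng), Ethernet link type, untagged IPv4 frames, no fragmentation, and ICMP identifier/sequence pairs that are not reused within the capture. The function name `unanswered_pings` is made up for illustration.

```python
import struct
from typing import Dict, List, Set, Tuple


def unanswered_pings(pcap_bytes: bytes) -> List[Tuple[int, int]]:
    """Return (identifier, sequence) of echo requests that got no echo reply."""
    magic = pcap_bytes[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"   # little-endian file (microsecond or nanosecond variant)
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    if struct.unpack(endian + "I", pcap_bytes[20:24])[0] != 1:
        raise ValueError("only Ethernet captures are handled")

    requests: Dict[Tuple[int, int], None] = {}   # dict keeps capture order
    replies: Set[Tuple[int, int]] = set()
    offset = 24                                   # skip the global header
    while offset + 16 <= len(pcap_bytes):
        # Record header: ts_sec, ts_frac, incl_len, orig_len (4 bytes each).
        incl_len = struct.unpack(endian + "I", pcap_bytes[offset + 8:offset + 12])[0]
        frame = pcap_bytes[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        # 14-byte Ethernet header; EtherType 0x0800 means IPv4.
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue
        if frame[23] != 1:                        # IP protocol 1 is ICMP
            continue
        ihl = (frame[14] & 0x0F) * 4              # IPv4 header length in bytes
        icmp = frame[14 + ihl:]
        if len(icmp) < 8:
            continue
        icmp_type, _code, _csum, ident, seq = struct.unpack("!BBHHH", icmp[:8])
        if icmp_type == 8:                        # echo request
            requests[(ident, seq)] = None
        elif icmp_type == 0:                      # echo reply
            replies.add((ident, seq))
    return [key for key in requests if key not in replies]
```

An empty result on a WAN capture taken while a LAN client is showing timeouts would suggest the lost pings never left the WAN interface, which points the investigation back inside the local network rather than at the ISP.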

                                  • K
                                    kevindd992002 @KOM
                                    last edited by

                                    @KOM said in Non-forwarding Resolver intermittent operation:

                                    @kevindd992002 said in Non-forwarding Resolver intermittent operation:

                                    If they were all replied to, why do I have a lot of RTO's in the ping results?

                                    Your cap only shows 3 pings within 1.5 seconds, all replied to. There are no pings that go out and don't echo back.

                                    Ok, does that tell us that it's a client issue, with pings not reaching the default gateway (pfsense)? I doubt it, because when the issue happens, even the ping diagnostic tool of pfsense has this issue.

                                    • KOMK
                                      KOM
                                      last edited by

                                      It doesn't tell you much at all, unfortunately. I might throw it back to your ISP at this point and see what they say.

                                      • johnpozJ
                                        johnpoz LAYER 8 Global Moderator
                                        last edited by johnpoz

                                        You say it's intermittent. Just because it didn't happen when you removed pfsense doesn't mean it wasn't simply not happening at that time. It doesn't prove that the issue is pfsense.

                                        I could see maybe if pfsense was generating a shit ton of traffic or something that could bring down your network, or its interface was spewing out garbage packets or something?

                                        But there is really nothing in that sniff that shows anything out of the ordinary at all..

                                        Are you saying without pfsense connected it works fine for days? And then the second you connect pfsense it crashes? Keep in mind if your problem is dns related.. Is your client also resolving like pfsense does, or does it just forward?

                                        When the issue happens your other direct connect client also stops working, or just the clients behind pfsense?


                                        • Raffi_R
                                          Raffi_ @kevindd992002
                                          last edited by

                                          @kevindd992002 said in Non-forwarding Resolver intermittent operation:

                                          So when they were here, I tried to prove that even if one of my PC's is connected directly to the modem, the problem still persists. So I connected one of my PC's directly to the modem, and true enough the problem persisted. I was actually lucky that the problem was experienced during the time they were here. Like I said, this is an intermittent problem that is usually just happening during peak hours (weeknights and weekends).

                                          So with two devices connected to their modem (pfsense and one of my PC's), they let me remove pfsense (as they don't know anything about it), and to my amusement the directly-connected PC just magically worked. So they concluded that somehow my pfsense is bringing down the network or something. Now I'm not yet fully convinced regarding this conclusion because even with pfsense in the mix, this issue just auto-resolves itself after a few minutes. Like you said, I have to test using the modem directly without any of my own equipment connected to it and observe for a few days.

                                          I'm reading two conflicting paragraphs. The first paragraph says you were able to reproduce the problem in front of the service tech with only a PC directly connected to the modem. The second paragraph says you were not able to reproduce the problem again with only a PC connected to the modem. It sounds like you let the service tech off the hook too easily. Once you proved in front of them that the direct PC was not working, that is their issue. Why was any additional testing needed? Did the tech ask you to hook pfSense back up? How can they conclude that pfSense is the issue when it wasn't even connected in that first scenario? Am I misunderstanding something?

                                          Raffi

                                          • K
                                            kevindd992002 @johnpoz
                                            last edited by

                                            @johnpoz said in Non-forwarding Resolver intermittent operation:

                                            You say it's intermittent. Just because it didn't happen when you removed pfsense doesn't mean it wasn't simply not happening at that time. It doesn't prove that the issue is pfsense.

                                            I could see maybe if pfsense was generating a shit ton of traffic or something that could bring down your network, or its interface was spewing out garbage packets or something?

                                            But there is really nothing in that sniff that shows anything out of the ordinary at all..

                                            Are you saying without pfsense connected it works fine for days? And then the second you connect pfsense it crashes? Keep in mind if your problem is dns related.. Is your client also resolving like pfsense does, or does it just forward?

                                            When the issue happens your other direct connect client also stops working, or just the clients behind pfsense?

                                            No, I'm not saying that. Running without pfsense for a few days would be my next step. And no, it just looks like that the second I remove pfsense from the mix, the ping goes through and the problem resolves itself. But then again, it could just be a coincidence since I need to be able to reproduce it consistently.

                                            Like I mentioned though, I don't think my problem is DNS-related. It's packet-loss related, for all types of packets. So when unbound tries to resolve through root hints, the packets get lost (sometimes) and so it returns failed queries, which in itself looks like a DNS issue.

                                            What do you mean by my client resolving like pfsense does or does it just forward? It's a Windows client with the pfsense LAN IP configured as its DNS server to ask queries from?

                                            When the issue happens, as long as pfsense is connected to the modem, all clients (behind pfsense or directly connected to the modem) experience the issue.

                                            @Raffi_ said in Non-forwarding Resolver intermittent operation:

                                            @kevindd992002 said in Non-forwarding Resolver intermittent operation:

                                            So when they were here, I tried to prove that even if one of my PC's is connected directly to the modem, the problem still persists. So I connected one of my PC's directly to the modem, and true enough the problem persisted. I was actually lucky that the problem was experienced during the time they were here. Like I said, this is an intermittent problem that is usually just happening during peak hours (weeknights and weekends).

                                            So with two devices connected to their modem (pfsense and one of my PC's), they let me remove pfsense (as they don't know anything about it), and to my amusement the directly-connected PC just magically worked. So they concluded that somehow my pfsense is bringing down the network or something. Now I'm not yet fully convinced regarding this conclusion because even with pfsense in the mix, this issue just auto-resolves itself after a few minutes. Like you said, I have to test using the modem directly without any of my own equipment connected to it and observe for a few days.

                                            I'm reading two conflicting paragraphs. The first paragraph says you were able to reproduce the problem in front of the service tech with only a PC directly connected to the modem. The second paragraph says you were not able to reproduce the problem again with only a PC connected to the modem. It sounds like you let the service tech off the hook too easily. Once you proved in front of them that the direct PC was not working, that is their issue. Why was any additional testing needed? Did the tech ask you to hook pfSense back up? How can they conclude that pfSense is the issue when it wasn't even connected in that first scenario? Am I misunderstanding something?

                                            Raffi

                                            The first paragraph assumes that pfsense is still connected to the modem, which is why the directly-connected PC experiences the same problem. The second paragraph says there were two clients connected to the modem (pfsense and the directly-connected PC), and when I removed pfsense the continuous ping from the directly-connected PC worked. So there is no conflict between those paragraphs, as they're two different scenarios.

                                            I do think that I let the service tech off the hook easily. I must admit that I wasn't prepared to show all possible tests that time because I was in a hurry. But I'll be prepared next time, which is why I'm trying to pick your brains.

                                            They just wanted to try and remove pfsense because they didn't know anything about it. You know how incompetent service tech goes sometimes. As long as they don't know a certain software/hardware, they grow suspicious of it. I gave in because I didn't think that that would do anything to the result, but to my surprise it did. Yes, the tech asked me to hook pfsense back up and the problem repeated all over again.

                                            Again, pfsense was connected in the first paragraph test above. It was only disconnected in the second paragraph.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.