Intermittent connection issue



  • Unbound had been working properly since January (when I switched to this new ISP in our country) until a few months ago, when it started failing intermittently. When forwarding to the ISP DNS servers (the servers handed out via DHCP by the ONU), everything works fine. But when I disable query forwarding in the DNS Resolver, there are times when it cannot resolve hostnames, even when I do a manual nslookup in cmd. I have the exact same setup at another house (100 km away) with the same ISP and have zero issues.

    It seems to me that the ISP has some sort of issue and is dropping DNS query packets to the root hint servers when I'm not forwarding to their DNS servers. But like I said, this happens intermittently. How do I troubleshoot this accurately so that I have data to give them when I tell them it's their fault?
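    One way to collect that data is to log timestamped DNS probes from a LAN client, sending the same question both to the ISP resolver and directly to a root server, so failures can be lined up with the time of day. This is a minimal, stdlib-only Python sketch, not a pfSense feature; the query name, interval and target list are assumptions, and the port parameter exists only so the function can be tested locally. 198.41.0.4 is a.root-servers.net; a root server answers with a referral rather than an address, which still counts as a reply here, and recursion is turned off for it.

```python
import random
import socket
import struct
import time


def build_query(name: str, qid: int, recursion_desired: bool = True) -> bytes:
    """Build a minimal DNS query for an A record (QTYPE=1, QCLASS=IN)."""
    flags = 0x0100 if recursion_desired else 0x0000  # 0x0100 sets the RD bit
    header = struct.pack("!HHHHHH", qid, flags, 1, 0, 0, 0)  # one question
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)


def probe(server: str, name: str, timeout: float = 2.0,
          recursion_desired: bool = True, port: int = 53):
    """Send one UDP query; return (answered, rtt_ms). Any error counts as unanswered."""
    qid = random.randint(0, 0xFFFF)
    packet = build_query(name, qid, recursion_desired)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(packet, (server, port))
            while True:
                data, _ = sock.recvfrom(4096)
                # Accept only a response (QR bit set) carrying our transaction ID
                if len(data) >= 12 and data[2] & 0x80 and \
                        struct.unpack("!H", data[:2])[0] == qid:
                    return True, (time.monotonic() - start) * 1000.0
        except OSError:  # socket timeouts and ICMP-driven errors are OSError subclasses
            return False, None


def run_probes(targets: dict, name: str = "www.google.com",
               interval: float = 10.0, count=None) -> None:
    """Print one timestamped line per probe; targets maps label -> (ip, recursion_desired)."""
    done = 0
    while count is None or done < count:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for label, (ip, rd) in targets.items():
            ok, rtt = probe(ip, name, recursion_desired=rd)
            status = f"{rtt:.0f} ms" if ok else "TIMEOUT"
            print(f"{stamp} {label:<10} {ip:<15} {status}")
        done += 1
        time.sleep(interval)
```

    For example, run_probes({"isp-dns": ("<ISP DNS IP>", True), "a.root": ("198.41.0.4", False)}, count=360) logs roughly an hour at the default 10-second interval. Because the probes go straight to those servers, Unbound is bypassed, so a log where only the root column times out during peak hours is easier to hand to the ISP than a description.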



  • Take a packet capture on WAN for all tcp/udp 53 traffic and see what's going on.
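    If you download that capture from Diagnostics > Packet Capture, a short script can pair each query with its response and list the ones that never got an answer. This is a minimal sketch under stated assumptions: a classic libpcap file (pcapng is not handled), Ethernet frames without VLAN tags, IPv4, and DNS over UDP only (TCP/53 and IP fragments are ignored). Queries are matched to responses on client address, client port and DNS transaction ID; the function names are hypothetical.

```python
import struct


def iter_pcap(data: bytes):
    """Yield (timestamp, frame) pairs from a classic libpcap capture held in memory."""
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    # The 0xa1b23c4d magic means nanosecond timestamps instead of microseconds
    divisor = 1e9 if magic in (b"\x4d\x3c\xb2\xa1", b"\xa1\xb2\x3c\x4d") else 1e6
    if struct.unpack(endian + "I", data[20:24])[0] != 1:
        raise ValueError("expected Ethernet link type (1)")
    off = 24
    while off + 16 <= len(data):
        sec, frac, incl, _orig = struct.unpack(endian + "IIII", data[off:off + 16])
        off += 16
        yield sec + frac / divisor, data[off:off + incl]
        off += incl


def ipv4_payload(frame: bytes):
    """Return (src, dst, protocol, payload) for an untagged IPv4 frame, else None."""
    if len(frame) < 34 or frame[12:14] != b"\x08\x00":
        return None
    ihl = (frame[14] & 0x0F) * 4
    total_len = struct.unpack("!H", frame[16:18])[0]
    src = ".".join(str(b) for b in frame[26:30])
    dst = ".".join(str(b) for b in frame[30:34])
    return src, dst, frame[23], frame[14 + ihl:14 + total_len]


def unanswered_dns(data: bytes):
    """Return (queries_seen, [(time, client, server, txid), ...] that got no response)."""
    pending, total = {}, 0
    for ts, frame in iter_pcap(data):
        parsed = ipv4_payload(frame)
        if parsed is None or parsed[2] != 17:  # 17 = UDP
            continue
        src, dst, _, udp = parsed
        if len(udp) < 8 + 12:  # UDP header + DNS header
            continue
        sport, dport = struct.unpack("!HH", udp[:4])
        txid = struct.unpack("!H", udp[8:10])[0]
        is_response = bool(udp[10] & 0x80)  # QR bit of the DNS flags
        if not is_response and dport == 53:
            pending[(src, sport, dst, txid)] = ts
            total += 1
        elif is_response and sport == 53:
            pending.pop((dst, dport, src, txid), None)
    missing = sorted((ts, c, s, t) for (c, _p, s, t), ts in pending.items())
    return total, missing
```

    Usage would be along the lines of opening the .cap file in binary mode and passing its bytes to unanswered_dns; the timestamps of the unanswered queries are what you would line up against the peak-hour complaint.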



  • It looked like a DNS issue at first, but when I dug deeper I noticed this: https://pastebin.com/46KAAMew . I ran that while the issue was happening. Look at how it was all RTOs (request timed out) for a few tries and then responses just magically came back. This tells me that either packets are being dropped along the way or something is wrong with my ISP's routes. It almost always happens at night (peak hours). How do I prove this to them? Do I need to run a promiscuous-mode packet capture on the WAN interface, filtered by ICMP/IPv4, the next time this issue happens?
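    To turn a saved ping run like that into numbers, a small parser can count timeouts and the longest run of consecutive timeouts. This sketch assumes English-locale Windows ping output ("Reply from ...", "Request timed out."); other reply types such as "Destination host unreachable" are ignored. Windows ping prints no timestamps, so note the time of each run yourself.

```python
import re

# English-locale Windows ping lines, e.g. "Reply from 172.217.160.110: bytes=32 time=45ms TTL=115"
REPLY = re.compile(r"^Reply from [\d.]+: bytes=\d+ time[=<](\d+)ms TTL=\d+")
TIMEOUT = "Request timed out."


def summarize_ping(text: str) -> dict:
    """Count echo replies vs. timeouts and find the longest run of consecutive timeouts."""
    sent = replies = streak = longest = 0
    rtts = []
    for line in text.splitlines():
        line = line.strip()
        match = REPLY.match(line)
        if match:
            sent += 1
            replies += 1
            rtts.append(int(match.group(1)))
            streak = 0
        elif line == TIMEOUT:
            sent += 1
            streak += 1
            longest = max(longest, streak)
    return {
        "sent": sent,
        "replies": replies,
        "loss_pct": 100.0 * (sent - replies) / sent if sent else 0.0,
        "longest_timeout_run": longest,
        "max_rtt_ms": max(rtts) if rtts else None,
    }
```

    A run of three timeouts followed by two replies comes out as 60 % loss with a longest timeout run of 3, which is the "all RTOs, then suddenly replies" pattern expressed as figures.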



  • That's not going to be easy. How do you know the intermittent problem isn't with your NIC, your patch cable, or your modem?



  • @KOM said in Non-forwarding Resolver intermittent operation:

    That's not going to be easy. How do you know the intermittent problem isn't with your NIC, your patch cable, or your modem?

    I'm sure it's not my modem/ONU because it already got replaced by my ISP once.

    The patch cable between the modem and the pfsense box is pre-made (a Monoprice Flex cable) and connectivity tests look good. I can test it further with my other tester, which detects split pairs, but I highly doubt there's anything wrong with the patch cable.

    Which NIC are you referring to? pfsense's NIC? When the issue happens, it happens for all my clients, so it's definitely not a client NIC. It could be the NIC of the switch connected to the pfsense box, but again, how do I prove that?



  • My point wasn't to specifically question everything around you. It was to point out that your ISP likely will, and you need to have your answers ready.

    Which NIC are you referring to? pfsense's NIC?

    Well yes, that's the only NIC that matters in this context. Test or temporarily replace everything you can to rule out one thing after another. And even then your ISP won't believe you, because they've been lied to by a thousand other customers who said they'd tested and tried this & that. I myself tend not to believe my customers when they tell me they did this or that, and I make them show me. So many times, what they told me wasn't true. It's just like these forums, where people describe what they're doing and I tell them to post a screenshot of what they've done instead of a description of what they think they've done.



  • OK. I think there really is TCP/UDP packet loss when the issue happens. Would these pcap settings be a good first step to check for DNS packet loss?

    [Screenshot: Packet Capture settings]

    When the issue happens, sometimes the records I test (www.google.com, www.speedtest.net, mail.yahoo.com, etc.) cannot be resolved at all, and sometimes they resolve but the pings start with RTOs until they just magically respond. Here are some examples:

    https://pastebin.com/revDQ50C
    https://pastebin.com/XRex1AGm



  • That will capture DNS traffic but I don't know how useful it will be. If DNS isn't replying, the capture won't give you much other than no response, and the real reason could be a million different network things. Your detail level is probably higher than necessary.

    Are you doing DNS over TLS or DNSSEC?

    Btw, you can upload images here directly without having to use an external site like Imgur. In the Edit bar, the second icon from the right is Upload Image.



  • @kevindd992002 I'm not sure why you still suspect DNS issues? Based on your pings, you're not getting an IP resolution error, you're getting no ping response from google. As you said, it looks like packets are not getting through at times. Those packets could be DNS queries, ICMP pings, TCP/UDP, and anything in between.

    When trying to prove it's not your equipment, the best way to do that is to remove as much of your equipment from the setup as possible. Make it as simple as possible. Plug a known working PC directly into the modem with a known good cable. Are you still having issues with pings being dropped? If so, then it's either the modem or the ISP's line. If you REALLY want to be sure, try a second PC and a second good cable and repeat. Anything else is still going to raise questions about xyz in the setup, wouldn't it?



  • @KOM said in Non-forwarding Resolver intermittent operation:

    That will capture DNS traffic but I don't know how useful it will be. If DNS isn't replying, the capture won't give you much other than no response, and the real reason could be a million different network things. Your detail level is probably higher than necessary.

    Are you doing DNS over TLS or DNSSEC?

    Btw, you can upload images here directly without having to use an external site like Imgur. In the Edit bar, the second icon from the right is Upload Image.

    I see. Well, I wasn't really trying to prove it's a DNS issue. I limited my capture to DNS because DNS is also failing intermittently, which of course I suspect is due to packets somehow being lost along the way. So my thinking was that any type of traffic, TCP or UDP, is affected by the issue.

    I don't have Enable SSL/TLS Service enabled but I do have Enable DNSSEC Support enabled.

    I did upload the image directly in my last post here. Are you perhaps referring to the pastebin texts that I linked?

    @Raffi_ said in Non-forwarding Resolver intermittent operation:

    @kevindd992002 I'm not sure why you still suspect DNS issues? Based on your pings, you're not getting an IP resolution error, you're getting no ping response from google. As you said, it looks like packets are not getting through at times. Those packets could be DNS queries, ICMP pings, TCP/UDP, and anything in between.

    When trying to prove it's not your equipment, the best way to do that is to remove as much of your equipment from the setup as possible. Make it as simple as possible. Plug a known working PC directly into the modem with a known good cable. Are you still having issues with pings being dropped? If so, then it's either the modem or the ISP's line. If you REALLY want to be sure, try a second PC and a second good cable and repeat. Anything else is still going to raise questions about xyz in the setup, wouldn't it?

    See my reply above about why I was trying to limit the capture to DNS. But I guess it would make sense to just capture everything (no value in the port field)?

    OK, so when the ISP technicians came to my house a few weeks ago, I tried to prove to them that the issue was with their network. Here's how it turned out, but let me first explain my setup.

    1. one of the pfsense interfaces (WAN) connected to modem/ONU
    2. one of the pfsense interfaces (LAN) connected to an ASUS switch/access point (a router, as others call it, but it's acting as a switch/AP in my use case)

    So it's a basic setup, plus a site-to-site VPN to my other house. When they were here, I tried to prove that even with one of my PCs connected directly to the modem, the problem persists. So I connected one of my PCs directly to the modem, and sure enough the problem persisted. I was actually lucky that the problem occurred while they were here. Like I said, this is an intermittent problem that usually only happens during peak hours (weeknights and weekends).

    So with two devices connected to their modem (pfsense and one of my PCs), they had me remove pfsense (as they don't know anything about it), and to my amusement the directly-connected PC just magically worked. So they concluded that somehow my pfsense is bringing down the network or something. I'm not yet fully convinced by this conclusion, because even with pfsense in the mix, the issue resolves itself after a few minutes. Like you said, I have to test using the modem directly, without any of my own equipment connected to it, and observe for a few days.

    But say pfsense really is causing a conflict: how would it be doing that? I really don't understand how connecting pfsense to the modem would bog down the whole network. From the modem's point of view, pfsense and the PC are both just clients. So if there's an issue with pfsense, it shouldn't affect the other clients connected to the same modem (i.e. my PC), should it? I checked for IP conflicts and there really are none.

    While I was typing this, I ran into the issue again. Here's the unfiltered packet capture, if you guys can help me check it real quick. While the capture was running, I pinged www.google.com, and here are the results.



  • Nothing jumps out at me from your capture. Your ping requests are replied to. You tried to open a connection to something on tcp/443 that wasn't answered. You tried to talk to another web server and it told you to go away. There were some minor errors. There are references to OpenVPN. Do you have a VPN tunnel running?



  • @KOM said in Non-forwarding Resolver intermittent operation:

    Nothing jumps out at me from your capture. Your ping requests are replied to. You tried to open a connection to something on tcp/443 that wasn't answered. There are references to OpenVPN. Do you have a VPN tunnel running?

    That's weird. If they were all replied to, why do I have a lot of RTOs in the ping results?

    Yes, and for some reason I've just started having issues with the pfsense webGUI when accessing it through Firefox, although it works just fine with Chrome and Edge. The error is:

    Certificate key usage is inadequate for attempted operation. Error code: SEC_ERROR_INADEQUATE_KEY_USAGE.

    This is the first time I've encountered this error with the gui.

    Yes, I have a site-to-site VPN to my other home's network through OpenVPN.


  • @kevindd992002 said in Non-forwarding Resolver intermittent operation:

    RTOs in the ping results?

    A ping timing out is not the same as a name server not answering a query.

    Which cert are you using for the pfsense GUI: the self-generated self-signed one, your own, or ACME?

    What is the extended key usage set on the cert? That error suggests the required usage is either not listed or wrong for what you're trying to do with the cert.
    "A certificate has a key usage extension that does not assert a required usage"

    Here is what's listed on the certs I use for the pfsense web GUI, created via the Cert Manager with a CA I created:
    Not Critical
    TLS Web Server Authentication (1.3.6.1.5.5.7.3.1)
    1.3.6.1.5.5.8.2.2

    Maybe you're trying to use a user cert instead of a server cert?



  • @johnpoz said in Non-forwarding Resolver intermittent operation:

    @kevindd992002 said in Non-forwarding Resolver intermittent operation:

    RTOs in the ping results?

    A ping timing out is not the same as a name server not answering a query.

    Which cert are you using for the pfsense GUI: the self-generated self-signed one, your own, or ACME?

    What is the extended key usage set on the cert? That error suggests the required usage is either not listed or wrong for what you're trying to do with the cert.
    "A certificate has a key usage extension that does not assert a required usage"

    Here is what's listed on the certs I use for the pfsense web GUI, created via the Cert Manager with a CA I created:
    Not Critical
    TLS Web Server Authentication (1.3.6.1.5.5.7.3.1)
    1.3.6.1.5.5.8.2.2

    Maybe you're trying to use a user cert instead of a server cert?

    Yes, I know DNS queries aren't the same as ping (ICMP) packets. Earlier I was trying to check whether DNS query packets were also getting lost along the way, and then I showed the ping results as additional troubleshooting information.

    The CA I have in there is the self-signed FreeRADIUS CA which is used to sign the FreeRADIUS cert. The certs I have under the Certificates tab are:

    1. WebConfigurator default (self-signed)
      Signature Digest: RSA-SHA256
      KU: Digital Signature, Key Encipherment
      EKU: TLS Web Server Authentication, IP Security IKE Intermediate

    2. FreeRADIUS Server Cert (signed by FreeRADIUS CA)
      Serial: 1
      Signature Digest: RSA-SHA256
      KU: Digital Signature, Key Encipherment
      EKU: TLS Web Server Authentication, IP Security IKE Intermediate

    The weird thing is that I have another pfsense box (the one at the other end of the OpenVPN tunnel) that I can access just fine with Firefox, and it has the same cert structure and EKUs.



  • @kevindd992002 said in Non-forwarding Resolver intermittent operation:

    If they were all replied to, why do I have a lot of RTOs in the ping results?

    Your cap only shows 3 pings within 1.5 seconds, all replied to. There are no pings that go out and don't echo back.



  • @KOM said in Non-forwarding Resolver intermittent operation:

    @kevindd992002 said in Non-forwarding Resolver intermittent operation:

    If they were all replied to, why do I have a lot of RTOs in the ping results?

    Your cap only shows 3 pings within 1.5 seconds, all replied to. There are no pings that go out and don't echo back.

    OK, does that tell us it's a client issue, i.e. pings not reaching the default gateway (pfsense)? I doubt it, because when the issue happens, even the pfsense ping diagnostic tool shows it.



  • It doesn't tell you much at all, unfortunately. I might throw it back to your ISP at this point and see what they say.


  • You say it's intermittent. Just because it didn't happen when you removed pfsense doesn't mean it wasn't simply not happening at that time. That doesn't prove the issue is pfsense.

    I could see it maybe if pfsense was generating a shit ton of traffic or something that could bring down your network, or if its interface was spewing out garbage packets or something?

    But there is really nothing in that sniff that shows anything out of the ordinary at all.

    Are you saying that without pfsense connected it works fine for days? And then the second you connect pfsense it crashes? Keep in mind, if your problem is DNS-related: is your client also resolving like pfsense does, or does it just forward?

    When the issue happens, does your other, directly connected client also stop working, or just the clients behind pfsense?



  • @kevindd992002 said in Non-forwarding Resolver intermittent operation:

    So when they were here, I tried to prove that even with one of my PCs connected directly to the modem, the problem persists. So I connected one of my PCs directly to the modem, and sure enough the problem persisted. I was actually lucky that the problem occurred while they were here. Like I said, this is an intermittent problem that usually only happens during peak hours (weeknights and weekends).

    So with two devices connected to their modem (pfsense and one of my PCs), they had me remove pfsense (as they don't know anything about it), and to my amusement the directly-connected PC just magically worked. So they concluded that somehow my pfsense is bringing down the network or something. I'm not yet fully convinced by this conclusion, because even with pfsense in the mix, the issue resolves itself after a few minutes. Like you said, I have to test using the modem directly, without any of my own equipment connected to it, and observe for a few days.

    I'm reading two conflicting paragraphs. The first paragraph says you were able to reproduce the problem in front of the service tech with only a PC directly connected to the modem. The second paragraph says you were not able to reproduce the problem again with only a PC connected to the modem. It sounds like you let the service tech off the hook too easily. Once you proved in front of them that the direct PC was not working, that is their issue. Why was any additional testing needed? Did the tech ask you to hook pfSense back up? How can they conclude that pfSense is the issue when it wasn't even connected in that first scenario? Am I misunderstanding something?

    Raffi



  • @johnpoz said in Non-forwarding Resolver intermittent operation:

    You say it's intermittent. Just because it didn't happen when you removed pfsense doesn't mean it wasn't simply not happening at that time. That doesn't prove the issue is pfsense.

    I could see it maybe if pfsense was generating a shit ton of traffic or something that could bring down your network, or if its interface was spewing out garbage packets or something?

    But there is really nothing in that sniff that shows anything out of the ordinary at all.

    Are you saying that without pfsense connected it works fine for days? And then the second you connect pfsense it crashes? Keep in mind, if your problem is DNS-related: is your client also resolving like pfsense does, or does it just forward?

    When the issue happens, does your other, directly connected client also stop working, or just the clients behind pfsense?

    No, I'm not saying that. Running without pfsense for a few days would be my next step. And no, it just looks like the second I remove pfsense from the mix, the ping goes through and the problem resolves itself. But then again, it could just be a coincidence; I need to be able to reproduce it consistently.

    Like I mentioned, though, I don't think my problem is DNS-related. It's packet loss affecting all types of packets. So when Unbound tries to resolve via the root hints, the packets (sometimes) get lost and the queries fail, which on its own looks like a DNS issue.

    What do you mean by my client resolving like pfsense does, or just forwarding? It's a Windows client with the pfsense LAN IP configured as its DNS server.

    When the issue happens, as long as pfsense is connected to the modem, all clients (behind pfsense or directly connected to the modem) experience the issue.

    @Raffi_ said in Non-forwarding Resolver intermittent operation:

    @kevindd992002 said in Non-forwarding Resolver intermittent operation:

    So when they were here, I tried to prove that even with one of my PCs connected directly to the modem, the problem persists. So I connected one of my PCs directly to the modem, and sure enough the problem persisted. I was actually lucky that the problem occurred while they were here. Like I said, this is an intermittent problem that usually only happens during peak hours (weeknights and weekends).

    So with two devices connected to their modem (pfsense and one of my PCs), they had me remove pfsense (as they don't know anything about it), and to my amusement the directly-connected PC just magically worked. So they concluded that somehow my pfsense is bringing down the network or something. I'm not yet fully convinced by this conclusion, because even with pfsense in the mix, the issue resolves itself after a few minutes. Like you said, I have to test using the modem directly, without any of my own equipment connected to it, and observe for a few days.

    I'm reading two conflicting paragraphs. The first paragraph says you were able to reproduce the problem in front of the service tech with only a PC directly connected to the modem. The second paragraph says you were not able to reproduce the problem again with only a PC connected to the modem. It sounds like you let the service tech off the hook too easily. Once you proved in front of them that the direct PC was not working, that is their issue. Why was any additional testing needed? Did the tech ask you to hook pfSense back up? How can they conclude that pfSense is the issue when it wasn't even connected in that first scenario? Am I misunderstanding something?

    Raffi

    The first paragraph "assumes" that pfsense is still connected to the modem, which is why the directly-connected PC experiences the same problem. The second paragraph says there were two clients connected to the modem (pfsense and the directly-connected PC), and when I removed pfsense, the continuous ping from the directly-connected PC worked. So there's no conflict between those paragraphs; they're two different scenarios.

    I do think I let the service tech off the hook too easily. I must admit I wasn't prepared to run every possible test that time because I was in a hurry. But I'll be prepared next time, which is why I'm trying to pick your brains.

    They just wanted to try removing pfsense because they didn't know anything about it. You know how incompetent service techs can be sometimes: whenever they don't know a certain piece of software or hardware, they grow suspicious of it. I gave in because I didn't think it would change the result, but to my surprise it did. Yes, the tech asked me to hook pfsense back up, and the problem came back all over again.

    Again, pfsense was connected in the first paragraph test above. It was only disconnected in the second paragraph.



  • @kevindd992002 Thanks for clearing that up. I was definitely misunderstanding it then. In that first statement, to me a PC directly connected to the modem is...
    ISP ==> Modem ==> PC.

    But you're saying the first scenario was actually,
    ISP ==> Modem ==> pfSense ==> PC.

    That first scenario had the problem so you then did this,
    ISP ==> Modem ==> PC

    And the problem went away. I can understand then why the tech assumed pfSense. I would have also :)

    As you said, it will require more testing without pfSense to know for sure. This is easier said than done. I'm sure the very smart people on here know of ways to troubleshoot this further without removing pfSense. Personally though, if it is an option, I would put in any off the shelf router you have sitting around and see if it behaves the same over time. This would not tell you for sure if the problem is pfSense or not, but it could help as a simple sanity check. This test is not as ideal as a PC directly connected to the modem scenario (ISP ==> Modem ==> PC), but that isn't something you can realistically do for long periods of time.

    I agree with others that the captures and pings provided are not adding up. The pings are going to google.com resolving to IP (216.58.200.228). The captures show pings to google's primary DNS server (8.8.8.8). Are you sure this capture was done at the same time? If so, then it looks like those pings to 216.58.200.228 are never getting to pfSense.
    In fact, I even double checked this. Open up wireshark and put in the filter, ip.addr == 216.58.200.228. That IP didn't get a single hit. I would start looking at the devices or cables between the PC and pfSense on the LAN side. It could also be that PC. Try the same test from another one. See if that ping ever reaches pfSense.

    Edit again after I thought about this,

    • Start the ping test to google.com from the PC.
    • Start the pfSense capture on the LAN side.
    • Use Wireshark filter, ip.addr == <IP google resolved to>. Is the IP for google.com coming up in the capture on the LAN side?
    • If not, check what I said above. The problem is not pfSense unless it is the LAN NIC on the pfSense machine.
    • If so, keep the ping test going from the PC.
    • Start the pfSense capture on the WAN side.
    • Use Wireshark filter, ip.addr == <IP google resolved to>. Is the IP for google.com coming up in the capture on the WAN side?

    I think you get where this is going.
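    The steps above boil down to asking, for each ping, at which capture point it was last seen. Assuming you have already pulled out the ICMP sequence numbers seen at each point (for example with the Wireshark filter from those steps), the comparison can be sketched like this. It assumes NAT leaves the sequence numbers unchanged (check that in your own captures) and that each sequence number appears once per run; the function name and labels are hypothetical.

```python
def locate_losses(sent, lan_requests, wan_requests, wan_replies, lan_replies):
    """Classify each echo sequence number by the first capture point where it went missing."""
    verdict = {}
    for seq in sorted(sent):
        if seq not in lan_requests:
            verdict[seq] = "before pfSense LAN (PC, cable, switch)"
        elif seq not in wan_requests:
            verdict[seq] = "inside pfSense, outbound (rules, NAT, routing)"
        elif seq not in wan_replies:
            verdict[seq] = "upstream of WAN capture (WAN NIC, cable, modem, ISP, far end)"
        elif seq not in lan_replies:
            verdict[seq] = "inside pfSense, inbound"
        else:
            verdict[seq] = "ok"
    return verdict
```

    Passing the result's values to collections.Counter gives a one-line summary per location. If nearly every loss lands in the "upstream" bucket while the LAN and WAN requests match, that is the kind of evidence that points away from pfSense.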



  • @Raffi_ said in Non-forwarding Resolver intermittent operation:

    @kevindd992002 Thanks for clearing that up. I was definitely misunderstanding it then. In that first statement, to me a PC directly connected to the modem is...
    ISP ==> Modem ==> PC.

    But you're saying the first scenario was actually,
    ISP ==> Modem ==> pfSense ==> PC.

    That first scenario had the problem so you then did this,
    ISP ==> Modem ==> PC

    And the problem went away. I can understand then why the tech assumed pfSense. I would have also :)

    As you said, it will require more testing without pfSense to know for sure. This is easier said than done. I'm sure the very smart people on here know of ways to troubleshoot this further without removing pfSense. Personally though, if it is an option, I would put in any off the shelf router you have sitting around and see if it behaves the same over time. This would not tell you for sure if the problem is pfSense or not, but it could help as a simple sanity check. This test is not as ideal as a PC directly connected to the modem scenario (ISP ==> Modem ==> PC), but that isn't something you can realistically do for long periods of time.

    I agree with others that the captures and pings provided are not adding up. The pings are going to google.com resolving to IP (216.58.200.228). The captures show pings to google's primary DNS server (8.8.8.8). Are you sure this capture was done at the same time? If so, then it looks like those pings to 216.58.200.228 are never getting to pfSense.
    In fact, I even double checked this. Open up wireshark and put in the filter, ip.addr == 216.58.200.228. That IP didn't get a single hit. I would start looking at the devices or cables between the PC and pfSense on the LAN side. It could also be that PC. Try the same test from another one. See if that ping ever reaches pfSense.

    Edit again after I thought about this,

    • Start the ping test to google.com from the PC.
    • Start the pfSense capture on the LAN side.
    • Use Wireshark filter, ip.addr == <IP google resolved to>. Is the IP for google.com coming up in the capture on the LAN side?
    • If not, check what I said above. The problem is not pfSense unless it is the LAN NIC on the pfSense machine.
    • If so, keep the ping test going from the PC.
    • Start the pfSense capture on the WAN side.
    • Use Wireshark filter, ip.addr == <IP google resolved to>. Is the IP for google.com coming up in the capture on the WAN side?

    I think you get where this is going.

    @Raffi_ You kinda got half of it right, sorry for not explaining better.

    Scenario 1 is:

    ISP -> modem -> PC (client1)
    modem -> pfsense -> switch -> PC (client2)

    Scenario 2 is:

    ISP -> modem -> PC (client1)

    So in scenario 1, I started pinging to www.google.com and the issue happens. I then removed pfsense in scenario 2 so that only client1 is seen by the modem and the problem went away.

    Yes, I'm sure the capture and the ping were done at the same time. And yes, the only explanations I can see for the pings not reaching the pfsense WAN interface are:

    1. Packets from client1 not reaching the pfsense gateway somehow. I highly doubt this because when the issue happens, I also use the ping diagnostic tool from pfsense's webgui and get the same issue. I also test from another PC and get the same result so it's not isolated to just one client.
    2. NAT issue from LAN to WAN
    3. Default route issue from LAN to WAN

    Either way, I have to do some digging myself and I'll post back with more conclusive results. Thanks to all for your help so far.



  • All, here's another WAN packet capture when I was trying to ping to google.com (172.217.160.110):

    https://www.dropbox.com/s/tp0cmy1smv1mmxv/packetcapture.cap?dl=0

    You can clearly see that the first handful of echo requests were not replied to, and then responses suddenly started coming. Isn't this enough proof that the issue is at least on their network (anywhere between the modem and the ISP routers)?



  • @kevindd992002 said in Non-forwarding Resolver intermittent operation:

    All, here's another WAN packet capture when I was trying to ping to google.com (172.217.160.110):

    https://www.dropbox.com/s/tp0cmy1smv1mmxv/packetcapture.cap?dl=0

    You can clearly see that the first handful of echo requests were not replied to, and then responses suddenly started coming. Isn't this enough proof that the issue is at least on their network (anywhere between the modem and the ISP routers)?

    @kevindd992002 that's a good sample of data. 192.168.100.2 is the WAN address provided by the ISP? If you filter for "icmp" without the quotes, you'll see a bunch of pings never get responses. It's hit or miss and that definitely doesn't look good. It looks like you had pings going out to several different IPs, and that's actually good. That tells you it's not just one server that's being moody and doesn't feel like responding to ping. A bunch of them are not responding.
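    That "icmp" filter check can also be scripted so it reports, per destination, how many echo requests were seen and how many never got a reply, which shows at a glance that several servers are affected. This is a minimal sketch assuming a classic libpcap file (not pcapng), untagged Ethernet frames and IPv4; requests and replies are matched on destination, ICMP identifier and sequence number, and the function names are hypothetical.

```python
import struct
from collections import Counter


def iter_pcap(data: bytes):
    """Yield (timestamp, frame) pairs from a classic libpcap capture held in memory."""
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    divisor = 1e9 if magic in (b"\x4d\x3c\xb2\xa1", b"\xa1\xb2\x3c\x4d") else 1e6
    if struct.unpack(endian + "I", data[20:24])[0] != 1:
        raise ValueError("expected Ethernet link type (1)")
    off = 24
    while off + 16 <= len(data):
        sec, frac, incl, _orig = struct.unpack(endian + "IIII", data[off:off + 16])
        off += 16
        yield sec + frac / divisor, data[off:off + incl]
        off += incl


def ipv4_payload(frame: bytes):
    """Return (src, dst, protocol, payload) for an untagged IPv4 frame, else None."""
    if len(frame) < 34 or frame[12:14] != b"\x08\x00":
        return None
    ihl = (frame[14] & 0x0F) * 4
    total_len = struct.unpack("!H", frame[16:18])[0]
    src = ".".join(str(b) for b in frame[26:30])
    dst = ".".join(str(b) for b in frame[30:34])
    return src, dst, frame[23], frame[14 + ihl:14 + total_len]


def ping_loss_by_target(data: bytes) -> dict:
    """Map each echo destination to (requests_seen, requests_without_reply)."""
    pending, sent = set(), Counter()
    for _ts, frame in iter_pcap(data):
        parsed = ipv4_payload(frame)
        if parsed is None or parsed[2] != 1:  # 1 = ICMP
            continue
        src, dst, _, icmp = parsed
        if len(icmp) < 8:
            continue
        ident, seq = struct.unpack("!HH", icmp[4:8])
        if icmp[0] == 8:    # echo request: remember it until a reply shows up
            pending.add((dst, ident, seq))
            sent[dst] += 1
        elif icmp[0] == 0:  # echo reply: the remote host is the source
            pending.discard((src, ident, seq))
    lost = Counter(target for target, _i, _s in pending)
    return {target: (count, lost[target]) for target, count in sent.items()}
```

    Since both the requests and the replies are taken from the same WAN capture, NAT on pfSense does not affect the matching.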

    In my opinion, from most likely to least likely, this points to an ISP issue or the modem itself, the cable between the modem and pfSense, or the WAN NIC on pfSense. The only way to rule out the last two would be to replace them.

    Just to be sure, are all the stats on the Dashboard OK when this is happening? Nothing unusual like memory, CPU, state table or anything else.

    Raffi


  • What is the gateway here, 192.168.100.1? What device is that?



  • @Raffi_ said in Non-forwarding Resolver intermittent operation:

    @kevindd992002 said in Non-forwarding Resolver intermittent operation:

    All, here's another WAN packet capture when I was trying to ping to google.com (172.217.160.110):

    https://www.dropbox.com/s/tp0cmy1smv1mmxv/packetcapture.cap?dl=0

    You can clearly see that the first handful of echo requests were not replied to, and then responses suddenly started coming. Isn't this enough proof that the issue is at least on their network (anywhere between the modem and the ISP routers)?

    @kevindd992002 that's a good sample of data. 192.168.100.2 is the WAN address provided by the ISP? If you filter for "icmp" without the quotes, you'll see a bunch of pings never get responses. It's hit or miss and that definitely doesn't look good. It looks like you had pings going out to several different IPs, and that's actually good. That tells you it's not just one server that's being moody and doesn't feel like responding to ping. A bunch of them are not responding.

    In my opinion, from most likely to least likely, this points to an ISP issue or the modem itself, the cable between the modem and pfSense, or the WAN NIC on pfSense. The only way to rule out the last two would be to replace them.

    Just to be sure, are all the stats on the Dashboard OK when this is happening? Nothing unusual like memory, CPU, state table or anything else.

    Raffi

    Right, I have SmokePing installed on my Linux server, pinging a couple of servers (20 pings every 300 s), so that's probably what the packet capture caught.

    Yes, 192.168.100.2 is the IP assigned by the modem; I've set it as a static assignment on the modem's config page, tied to the pfsense interface MAC. My ISP is doing double NAT, which is why the pfsense WAN interface gets a private IP.

    1. The modem was already replaced, so I doubt that's the problem.
    2. I can definitely replace the LAN cable that connects pfsense to the modem.
    3. Replacing the WAN NIC of pfsense is going to be hard because I'm using a PCEngines APU2C4 board for pfsense but I do have an extra port that I can try and use for WAN. Is there an easy way to migrate all WAN settings from one port to another in pfsense?

    And yes, all the Dashboard stats are OK when the issue happens. Even the OpenVPN connection is OK. If something were wrong, I would get a notification from pfSense through gateway monitoring, but everything is green, which is why I don't think there's a problem with the pfSense WAN NIC or the cable between pfSense and the modem.

    @johnpoz said in Non-forwarding Resolver intermittent operation:

    What is the gateway here.. 192.168.100.1? That is what device?

    192.168.100.1 is the gateway IP set on the pfSense WAN interface. It is the modem's interface IP.



  • Moving the WAN to another interface should just be a matter of assigning a new interface. Any settings specific to that interface might have to be redone, but it shouldn't be that complicated; it's mostly copy and paste. Make sure you back up the config before doing anything, and take plenty of screenshots of any custom settings on the WAN interface.

    You mentioned gateway monitoring. Does it not indicate any issues? Based on your pings, some are getting through, so as the monitor averages over time, the loss may not be bad enough to trigger an alert or a failure. The graph should still show some packet loss, though; the monitoring should catch these failed pings while they're occurring.
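    To illustrate how averaging can dilute a short burst, here is a sketch assuming a plain trailing-window average over probe results. pfSense's dpinger and the RRD graphs may compute this differently, and the numbers below (one probe every 500 ms, averaged over 60 s, so 120 probes per window) are assumed example values; check the advanced settings on your own gateway.

```python
def windowed_loss(results, window):
    """Loss percentage over a trailing window of probe results (True = reply received)."""
    losses = []
    lost = 0
    for i, replied in enumerate(results):
        if not replied:
            lost += 1
        if i >= window and not results[i - window]:
            lost -= 1  # the oldest probe just slid out of the window
        losses.append(100.0 * lost / min(i + 1, window))
    return losses


# Assumed example: one probe every 500 ms, averaged over 60 s -> 120 probes per window.
PROBES_PER_WINDOW = 120
five_second_burst = [True] * 200 + [False] * 10 + [True] * 390
print(f"peak loss for a 5 s burst: {max(windowed_loss(five_second_burst, PROBES_PER_WINDOW)):.1f}%")
```

    Under these assumptions, a 5-second total blackout peaks at only about 8.3% on the averaged value, while an outage longer than the window saturates it at 100%.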

    I also wouldn't rule out the possibility that the ISP gave you another bad modem. In any case, though, it would be their issue, either their network or the device they provided. It's up to them to figure out which, since it doesn't seem to be on your end.

    Raffi



  • @Raffi_ Yeah, that's what I thought. I can do it easily; I was just hoping there was an easier (lazier) way.

    I was trying to check the graphs just now and I accidentally clicked "reset data" and lost all RRD data :( I don't know why I thought it was reset settings or something. Oh well, I guess I have to wait for the issue to happen again. It's very exhausting to troubleshoot a randomly occurring problem.

    Yeah, I just need concrete proof. They don't seem to want packet captures (I sense incompetence). They told me to do the tests without pfSense involved at all, i.e., with all clients connected directly to the modem, either wired or wireless. I'm doing that now and so far I cannot reproduce the issue.



  • Ouch, yeah, intermittent problems are annoying to nail down. If you have another off-the-shelf router, try that for a while. Factory reset it first to make sure it doesn't have any funky settings, and set it up with a new Ethernet cable going to the WAN. If that still gives you trouble, let the ISP know. You might want to bend the truth and tell them the test was done with the PC directly connected to the modem.

    Do you have access to the modem webGUI? That might let you see what the signals at the modem look like when this is happening.



  • @Raffi_ Well, the switch I use is actually an off-the-shelf router (ASUS RT-AC66U) running in AP mode, so that's one test I can do. I thought of bending the truth, saying I did what they asked, and giving them the ping results and packet captures, but the problem is they have remote access to the modem GUI and can definitely see which clients (pfSense or PC) are connected to their modems.

    And yes, I do have access to the modem GUI and all signals are fine there. And they already replaced the whole fiber cable from the modem to the building cabinet.



  • We have the exact same Asus router for Wi-Fi on a separate network. See what you get with that setup in router mode in place of pfSense.

    On a separate topic, that Asus router stopped getting updates long ago. If you're using it as a switch, it probably doesn't matter. But in case you're interested, I've been pretty happy with the Asuswrt-Merlin fork below. It at least provides some kind of patch support, since Asus no longer does.
    https://www.snbforums.com/threads/fork-asuswrt-merlin-374-43-lts-releases-v39e3.18914/

    We mainly use this router for guest Wifi access, smartphones, and laptops. It's definitely not a primary network but the firmware has been solid.



  • @Raffi_ Yeah. Let me monitor the RRD graphs first and see what comes out. I also replaced the cable to see if that's the culprit.

    I beat you to it. I've been using the latest Merlin firmware on this router for a long time now :)

    My network here is a very simple flat network, yet I'm experiencing this issue. My network at my other residence is more complicated, with all Ubiquiti switches, APs, CCTVs, the same pfSense box, and guest Wi-Fi too, and I use the same ISP there (albeit with a higher plan and a static public IP), yet it's working flawlessly. So yeah, I'm scratching my head big time over this intermittent issue.



  • That's definitely a good idea. See what the graph says.

    Why didn't you tell me about it?! :)

    Good luck


  • LAYER 8 Global Moderator

    The quality graph can be very useful for sure..

    Here is an example of a recent issue I was having:
    graph.png

    So you can see when the trouble started, but I didn't really notice it until the first big outage. After that, upload never came back to full speed; download was fine at 500+, but I was seeing packet loss. I was working with them during the second outage: reset the modem, move the box directly onto the modem instead of behind pfSense, you know, the typical level-one shit. They said I'd have to call them back. I gave it a few days (there was a weekend in there for sure), but it seemed to be getting worse. It wasn't really a problem for me since download was fine, but upload was really in the dirt at this point, 5 when it should be 50, and I was seeing constant packet loss. The first bump in the cluster is me calling them and them resetting the modem yet again, etc. Then they scheduled a tech to come out; the next two bumps are techs working on the line behind the house. The final drop is when there were 2 trucks outside my house as I left for work, and 2 guys redoing lines. After that, it was fixed.



  • @Raffi_ @johnpoz OK, so I replaced the cable and it didn't solve the issue. One thing I noticed, though, is this:

    I don't know if it's just a coincidence, but it has happened a couple of times since this issue started. Most of the time, the issue happens right after I wake or boot my laptop or desktop (from sleep or shutdown), immediately after the client connects to either the Wi-Fi or the wired network.

    It happened just now. When I woke up, I immediately used my phone and nothing seemed abnormal. When I started using my laptop, that's when the issue happened. Pings to google.com and 8.8.8.8 from my laptop AND from the pfSense ping tool (with the source IP set to auto, WAN, or LAN) failed. The weird thing is that the gateway monitoring graph did not catch that occurrence (gateway monitoring is set to monitor 8.8.8.8):

    [Screenshot: gateway monitoring quality graph]

    It happened from around 7:38 to 7:40, and you can see that the graph is perfectly smooth. Then it started working again. Any other ideas? What I do know is that when I tried to ping google.com from my laptop during the issue, it couldn't resolve the name (probably because DNS packets were being lost as well).

    P.S. Ignore the small dips on the left-hand side of the graph; those are minor packet losses.



  • I'm confused, why would monitoring not catch packet loss? Is it possible the ping response is getting back to the WAN for monitoring, but not to your LAN devices? I'm not sure how that works. That still wouldn't explain why a ping to 8.8.8.8 doesn't work from pfSense with the WAN as source.

    Some more troubleshooting suggestions:
    https://serverfault.com/questions/12341/how-to-tell-if-its-your-problem-or-your-isps-problem


  • LAYER 8 Global Moderator

    It would catch packet loss ;) This thread is all over the board. What the F does he think will happen if there is packet loss? For example, the blips on the left of his graph could for sure cause issues with DNS resolution.

    He is all over the place, and to be honest I'm not even sure what his issue is, or if there even is one.

    Yes, if your ISP is dropping traffic, you're probably going to see some issues. I wasn't seeing any issues that stood out, since my downloads were fast; only upload had a problem, even though I was seeing significant packet loss in the monitoring. But you have to take that with a grain of salt, since maybe it's just the device you're monitoring not answering, etc.

    If you think your ISP is messing with your DNS, then log both DNS queries and replies in Unbound and look for issues. But a packet dropped here or there shouldn't cause much grief, since it will try multiple times to resolve something, and once it's resolved, it's cached, etc. You would really have to have significant packet loss to notice DNS problems unless you were really watching for it.
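    To put numbers on the "it will try multiple times" point: if each query/response round trip is lost independently with probability p, all n tries fail with probability p^n. The three tries below are a hypothetical figure; Unbound's real retry behavior (timeouts, retrying other nameservers) is more involved. The independence assumption is exactly what a sustained outage breaks.

```python
def all_attempts_fail(loss_rate, attempts):
    """Probability that every one of `attempts` independent round trips is lost."""
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    return loss_rate ** attempts


# Hypothetical: three tries per lookup, at various per-round-trip loss rates.
for p in (0.01, 0.05, 0.20, 1.0):
    print(f"round-trip loss {p:.0%}: all 3 tries fail {all_attempts_fail(p, 3):.4%} of the time")
```

    With 5% independent loss, three tries all fail only 0.0125% of the time (0.05³ = 0.000125), so scattered drops rarely break resolution; at 100% loss during a multi-minute outage, no number of retries helps.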



  • @Raffi_ That's my confusion as well. You would think it would catch the packet loss, but I'm simply stating my observations here. Those losses should appear as 100% loss spikes in the graph, but they didn't. It doesn't make sense to me at all.

    @johnpoz With all the screenshots and info I've given in this thread, what makes you think I'm "all over the place"? Like I said, I'm simply stating my observations. I'm not making these things up. I wouldn't want to waste anyone's time asking for help if this issue weren't confusing as hell to me too.

    I can't believe you still doubt that I'm having issues here. I mean, do you think I made up the packet capture I uploaded that shows the issue? I care little about packet drops that happen from time to time, as I know those are perfectly normal. But that's not what we're talking about here. When the issue happens, not a single device can browse the Internet for 5 to 10 minutes, until it suddenly fixes itself. If you were in my shoes, wouldn't you be pissed and consider that a problem?

    So let's be clear here. I'm not after a perfect 0% packet loss over an infinite time period. I want to solve the issue where packets are dropped continuously for 5 to 10 minutes each time it happens. And for some odd reason, I haven't been able to catch that in the graphs "yet".
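    To separate a sustained outage from scattered drops, here is a sketch that scans a probe log for runs of consecutive failures lasting at least a minimum duration. It assumes a hypothetical log of (timestamp in seconds, replied) pairs sorted by time, for example from a ping loop you run yourself on a LAN client; the function name find_outages and the log format are illustrative, not from any existing tool.

```python
def find_outages(probes, min_duration):
    """Return (start, end) timestamps of runs of consecutive failed probes
    lasting at least min_duration seconds (first failure to last failure).

    probes: iterable of (timestamp_seconds, replied) pairs sorted by time.
    """
    outages = []
    run_start = run_end = None
    for ts, replied in probes:
        if not replied:
            if run_start is None:
                run_start = ts  # first failure of a new run
            run_end = ts
        elif run_start is not None:
            if run_end - run_start >= min_duration:
                outages.append((run_start, run_end))
            run_start = run_end = None
    # A run still open at the end of the log counts too.
    if run_start is not None and run_end - run_start >= min_duration:
        outages.append((run_start, run_end))
    return outages


# Hypothetical log: one probe every 5 s for an hour, failing from t=1800 to t=2300.
probes = [(t, not (1800 <= t <= 2300)) for t in range(0, 3600, 5)]
for start, end in find_outages(probes, min_duration=60):
    print(f"outage from t={start}s to t={end}s ({end - start}s)")
```

    Occasional single drops fall below the threshold and are ignored, while each 5-to-10-minute event comes out with concrete start and end times to hand to the ISP.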


  • LAYER 8 Global Moderator

    Your pings to 8.8.8.8 not being answered? OK, contact your ISP about it. It has zero to do with pfSense, ZERO.

    And again, this has zero to do with Unbound or your thread title:
    "Non-forwarding Resolver intermittent operation"

    Unbound couldn't give 2 shits about 8.8.8.8 not answering when it resolves, i.e., in non-forwarding mode.



  • @johnpoz From what I understand, @kevindd992002 did contact the ISP, but their tech is blaming pfSense and won't take any action. I can understand the frustration there. Even more frustrating, trying to prove it's not pfSense is only making things more confusing.

    I would look at some of the suggestions in that link I sent. Also, putting the Asus in place of pfSense might be another step?

    Oh, and something that might give you more insight than ping is a traceroute.

    Raffi

