Intermittent connection issue



  • @kevindd992002 said in Non-forwarding Resolver intermittent operation:

    All, here's another WAN packet capture when I was trying to ping to google.com (172.217.160.110):

    https://www.dropbox.com/s/tp0cmy1smv1mmxv/packetcapture.cap?dl=0

    You can clearly see that the first handful of echo requests were not replied to and then responses suddenly started coming. Isn't this enough proof yet that the issue is at least on their network (anywhere between the modem and the ISP routers)?

    @kevindd992002 that's a good sample of data. Is 192.168.100.2 the WAN address provided by the ISP? If you filter for "icmp" without the quotes, you'll see a bunch of pings never get responses. It's hit or miss and that definitely doesn't look good. It looks like you had pings going out to several different IPs, and that's actually good. That tells you it's not just one server that's being moody and doesn't feel like responding to ping. A bunch of them are not responding.

    In my opinion, from most likely to least likely, this points to an ISP issue/the modem itself, the cable between the modem and pfSense, or the WAN NIC on pfSense. The only way to rule out the last two would be to replace them.

    Just to be sure, are all the stats on the Dashboard OK when this is happening? Nothing unusual like memory, CPU, state table or anything else.

    Raffi
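For anyone who wants to do the "which pings never got an answer" check outside of Wireshark, here is a minimal Python sketch of the matching logic. It assumes the ICMP packets have already been exported from the capture into simple records; the `IcmpPacket` layout and the helper names are hypothetical, not a Wireshark or tshark format. A request counts as answered only if a reply comes back in the opposite direction with the same ICMP identifier and sequence number.

```python
from collections import Counter
from dataclasses import dataclass

ECHO_REPLY = 0    # ICMP type 0
ECHO_REQUEST = 8  # ICMP type 8


@dataclass(frozen=True)
class IcmpPacket:
    """One ICMP packet from a capture (hypothetical, simplified record layout)."""
    src: str
    dst: str
    icmp_type: int
    ident: int  # ICMP identifier
    seq: int    # ICMP sequence number


def unanswered_requests(packets):
    """Return the echo requests that have no matching echo reply in the capture.

    A reply matches a request when it travels the opposite way
    (reply src == request dst, reply dst == request src) and carries
    the same ICMP identifier and sequence number.
    """
    # Key each reply from the requester's point of view so it lines up with the request.
    answered = {
        (p.dst, p.src, p.ident, p.seq)
        for p in packets
        if p.icmp_type == ECHO_REPLY
    }
    return [
        p for p in packets
        if p.icmp_type == ECHO_REQUEST
        and (p.src, p.dst, p.ident, p.seq) not in answered
    ]


def missing_by_destination(packets):
    """Count unanswered requests per destination: one silent host, or many?"""
    return Counter(p.dst for p in unanswered_requests(packets))


# Made-up example: seq 1 was never answered, seq 2 was.
capture = [
    IcmpPacket("192.168.100.2", "172.217.160.110", ECHO_REQUEST, 1, 1),
    IcmpPacket("192.168.100.2", "172.217.160.110", ECHO_REQUEST, 1, 2),
    IcmpPacket("172.217.160.110", "192.168.100.2", ECHO_REPLY, 1, 2),
]
print([p.seq for p in unanswered_requests(capture)])  # [1]
print(missing_by_destination(capture))
```

Grouping the misses by destination is what backs up the "not just one moody server" point: misses spread across many destinations at the same time point away from any single far-end host.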


  • LAYER 8 Global Moderator

    What is the gateway here.. 192.168.100.1? That is what device?



  • In reply to @Raffi_:

    Right, I have SmokePing installed on my Linux server and it's pinging a couple of servers (20 pings every 300s), so that's probably what the packet capture caught.

    Yes, 192.168.100.2 is the IP assigned by the modem, and I've set it statically in the modem config page using the pfsense interface MAC. My ISP is doing double-NAT, which is why the pfsense WAN interface is given a private IP.

    1. The modem was already replaced so I doubt that that is the problem.
    2. I can definitely replace the LAN cable that connects pfsense to the modem.
    3. Replacing the WAN NIC of pfsense is going to be hard because I'm using a PCEngines APU2C4 board for pfsense but I do have an extra port that I can try and use for WAN. Is there an easy way to migrate all WAN settings from one port to another in pfsense?

    And yes, all stats are OK in the Dashboard when the issue happens. Even the OpenVPN connection is OK. If it weren't, I would get a notification from pfsense because of gateway monitoring, but it is all green, which is why I don't think there's a problem with the pfsense WAN NIC or the cable between pfsense and the modem.

    @johnpoz said in Non-forwarding Resolver intermittent operation:

    What is the gateway here.. 192.168.100.1? That is what device?

    192.168.100.1 is the gateway IP set in the pfsense WAN interface. It is the interface IP of the modem.



  • Moving the WAN to another interface should just be a matter of assigning a new interface. Any settings specific to that interface might have to be redone, but it shouldn't be that complicated. Mostly copy and paste. Make sure you save the config before doing anything and take plenty of screenshots of any custom settings for the WAN interface.

    You mentioned gateway monitoring. Does the gateway monitoring not indicate any issues? Based on your pings, some are getting through so as it averages over time, it may not be bad enough to trigger the alert or failure. If you look at the graph though, it should indicate some packet loss. The monitoring should catch these failed pings when it's occurring.

    I also wouldn't rule out the possibility that the ISP gave you another bad modem. In any case though, it would be their issue, either their network or the device they provided. It's up to them to figure out which, since it doesn't seem like it's on your end.

    Raffi
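To illustrate the averaging point, here is a rough Python sketch of a sliding-window loss check, roughly the kind of thing gateway monitoring (dpinger on pfSense) does. It is not dpinger's code; the 120-probe window and the 10% / 20% thresholds are assumptions for the example, so compare them with the values in your own gateway's advanced settings.

```python
from collections import deque


class LossMonitor:
    """Sliding-window packet-loss tracker, loosely modelled on gateway monitoring.

    window   -- how many of the most recent probes are averaged (assumed value)
    warn_pct -- loss % at which the gateway would show a warning (assumed value)
    down_pct -- loss % at which the gateway would be marked down (assumed value)
    """

    def __init__(self, window=120, warn_pct=10.0, down_pct=20.0):
        self.results = deque(maxlen=window)  # True = probe answered; oldest drop off
        self.warn_pct = warn_pct
        self.down_pct = down_pct

    def record(self, answered):
        self.results.append(answered)

    def loss_pct(self):
        if not self.results:
            return 0.0
        lost = sum(1 for ok in self.results if not ok)
        return 100.0 * lost / len(self.results)

    def status(self):
        loss = self.loss_pct()
        if loss >= self.down_pct:
            return "down"
        if loss >= self.warn_pct:
            return "warning"
        return "ok"


# Scattered loss: every 20th probe lost -> 5% over the window, status stays "ok".
mon = LossMonitor(window=120)
for i in range(120):
    mon.record(i % 20 != 0)
print(mon.loss_pct(), mon.status())

# A sustained run of 30 lost probes pushes the same window past the "down" threshold.
for _ in range(30):
    mon.record(False)
print(round(mon.loss_pct(), 1), mon.status())
```

Scattered single drops average out below the warning level, while a sustained run of lost probes pushes the same window over the threshold quickly.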



  • In reply to @Raffi_:

    Yeah, that's what I thought. I mean, I can do it easily, I just thought there's an easier (lazier) way to do it.

    I was trying to check the graphs just now and I accidentally clicked "reset data" and lost all RRD data :( I don't know why I thought it was reset settings or something. Oh well, I guess I have to wait for the issue to happen again. It's very exhausting to troubleshoot a randomly occurring problem.

    Yeah, I just need to have concrete proof. They don't seem to want packet captures (I sense incompetence). They told me to do the tests without pfsense involved at all, so all clients directly connected to the modem either via cable or wireless. I'm doing that now and so far I cannot reproduce the issue.



  • Ouch, yea intermittent problems are annoying to nail down. If you have another off-the-shelf router, try that for a while. Do a factory reset on that off-the-shelf router to make sure it doesn't have any funky settings. Set that up with a new Ethernet cable going to the WAN. If that's still giving you trouble, then let the ISP know. You might want to bend the truth and tell them the test was done with the PC directly connected to the modem.

    Do you have access to the modem webGUI? That might let you see what the signals at the modem look like when this is happening.



  • In reply to @Raffi_:

    Yeah, well the switch that I use is an off-the-shelf router (ASUS RT-AC66U) that's running in AP mode, so that's one of the tests that I can do. I thought of bending the truth and just saying I did what they asked me to do and giving them the ping results and packet captures, but the problem is they have remote access to the modem GUI and they can definitely see the clients (pfsense or PC) connected to their modems.

    And yes, I do have access to the modem GUI and all signals are fine there. And they already replaced the whole fiber cable from the modem to the building cabinet.



  • We have the exact same Asus router for Wifi on a separate network. See what you get with that setup in router mode in place of pfSense.

    On a separate topic, that Asus router stopped getting updates long ago. If you're using it as a switch though, it probably doesn't matter. But in case you're interested, I've been pretty happy with the Asuswrt-Merlin Fork below. It helps at least get some kind of patching support since Asus no longer wants to.
    https://www.snbforums.com/threads/fork-asuswrt-merlin-374-43-lts-releases-v39e3.18914/

    We mainly use this router for guest Wifi access, smartphones, and laptops. It's definitely not a primary network but the firmware has been solid.



  • In reply to @Raffi_:

    Yeah. Let me monitor the RRD graphs first and see what comes out. I also replaced the cable to see if that's the culprit.

    I beat you to it. I've been using the latest merlin firmware for this router for a long time now :)

    My network here is a very simple flat network, yet I experience this issue. My other network in my other residence is more complicated with all Ubiquiti switches, APs, CCTVs, the same pfsense box, Guest wifi too, and I use the same ISP (though with a higher plan and a static public IP), yet it's working flawlessly over there. So yeah, you can say I'm scratching my head big time with this intermittent issue.



  • That's definitely a good idea. See what the graphs say.

    Why didn't you tell me about it?! :)

    Good luck


  • LAYER 8 Global Moderator

    The quality graph can be very useful for sure..

    Here is an example of a recent issue I was having:
    [Image: gateway quality graph]

    So you can see when the trouble started, but I didn't really notice it until the first big outage.. After that it never came back to full upload speed; down was fine at 500+, but I was seeing packet loss.. I was working with them at the second outage. Reset modem, move box to just the modem and not behind pfsense - you know, the typical level one shit... They said would have to call them.. Gave it a few days, there was a weekend in there for sure.. But it seemed to be getting worse. It wasn't really a problem for me since download was fine, but upload was really in the dirt at this time, 5 when it should be 50.. And I was just seeing constant packet loss... That first bump in the cluster is me calling them, them resetting the modem yet again, etc. Then they scheduled a tech to come out; those next two bumps are techs out on the line behind the house... Then the final drop is when there were 2 trucks outside my house on my way to work, and 2 guys redoing lines.. After that it was fixed..



  • In reply to @Raffi_ and @johnpoz:

    Ok, so I replaced the cable and it didn't solve the issue. One thing I noticed though is this:

    I don't know if it's just coincidence, but it has happened a couple of times already since this issue started. Most of the time the issue happens right after I boot my laptop or desktop (from either sleep or shutdown), immediately after the clients connect to either the wifi or the wired network.

    It happened just now. When I woke up I immediately browsed on my phone and nothing seemed abnormal. When I decided to use my laptop, that's when the issue happened. Pings to google.com and 8.8.8.8 from my laptop AND from the pfsense ping tool (with the source IP set to auto, WAN, or LAN) failed. The weird thing is that the gateway monitoring graph did not catch that occurrence (gateway monitoring is set to monitor 8.8.8.8):

    [Image: gateway monitoring quality graph for 8.8.8.8]

    It happened at around 7:38 to 7:40, and you can see that the graph is smooth as pie. Then it started working again. Any other ideas? What I do know is that when I tried to ping google.com from my laptop while the issue was happening, it could not resolve the name (probably because of the DNS packet loss as well).

    P.S. Don't mind those minor dips on the lefthand side of the graph because those are minor packet losses.



  • I'm confused, why would monitoring not catch packet loss? Is it possible the ping response is getting back to the WAN for monitoring, but not to your LAN devices? I'm not sure how that works. That still wouldn't explain why a ping to 8.8.8.8 doesn't work from pfSense with the WAN as source.

    Some more suggestions on troubleshooting.
    https://serverfault.com/questions/12341/how-to-tell-if-its-your-problem-or-your-isps-problem


  • LAYER 8 Global Moderator

    It would catch packet loss ;) This thread is all over the board - what the F does he think will happen if there is packet loss? For example his blips on the left of his graph - that for sure could cause issues with dns resolving..

    He is all over the place - and to be honest not actually even sure what his issue is or if there is even one..

    Yes, if your ISP is dropping traffic you're prob going to see some issues.. I was not seeing any sort of issues that stood out, since my downloads were fast, just upload having an issue.. Even though I was seeing significant packet loss in the monitoring.. But you have to take that with a grain of salt, since maybe it's just the device you're monitoring not answering, etc.

    If you think your isp is messing with your dns, then log both your dns queries and replies in unbound and look for issues. But a packet dropped here or there should not really cause that much grief, since it will try multiple times to resolve something, and once it's resolved it's cached, etc. etc. You really would have to have significant packet loss to notice problems with dns unless you were really watching for it.
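A back-of-the-envelope Python sketch of why occasional drops rarely break resolution while a sustained outage always does. It assumes every packet is lost independently with the same probability, which real networks do not guarantee (loss is often bursty, and bursts are exactly where retries stop helping). The loss rate and attempt count below are illustrative, not unbound's actual retry settings.

```python
def all_attempts_fail(loss, attempts):
    """Probability that every attempt of a query/response exchange fails.

    Assumes each packet is lost independently with probability `loss`
    (a fraction, 0..1). One attempt fails if either the query or its
    response is lost.
    """
    if not 0.0 <= loss <= 1.0:
        raise ValueError("loss must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    per_attempt_failure = 1.0 - (1.0 - loss) ** 2  # query lost OR reply lost
    return per_attempt_failure ** attempts


# 5% random loss and 3 tries (illustrative numbers): roughly a 0.09% chance all three fail.
print(f"{all_attempts_fail(0.05, 3):.5f}")
# A sustained outage (100% loss) defeats any number of retries.
print(all_attempts_fail(1.0, 10))
```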



  • In reply to @Raffi_:

    That's my confusion as well. You would think that it will catch the packet loss but I'm simply stating my observations here. Those packet losses should appear as -100% blips in the graph but they didn't. It totally doesn't make sense to me at all.

    In reply to @johnpoz:

    With all the screenshots and info that I gave in this thread, what's making you think "I'm all over the place"? Like I said, I'm simply stating the observations. I'm not making these things up. I wouldn't want to waste anyone's time and ask for help if this issue wasn't confusing as hell to me too.

    I can't believe you're still doubting that I'm having issues here. I mean, did you even think that I made up the packet capture I uploaded that was showing the issue? I couldn't care less about packet drops that happen from time to time, as I know those are perfectly normal. But that's not what we're talking about here. When the issue happens, not a single device can browse the Internet for 5 to 10 minutes until it just suddenly fixes itself. If you were in my shoes, wouldn't you be pissed and consider that a problem?

    So let's be clear here. I'm not after perfect 0% packet loss over an infinite time period. I want to solve my issue where packets are dropped continuously for a 5 to 10 minute span each time the issue happens. And for some odd reason I have not been able to catch that in the graphs "yet".
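The distinction being drawn here (scattered drops vs. a continuous multi-minute outage) can be checked mechanically from a ping log. A minimal Python sketch, assuming the log has already been reduced to (timestamp in seconds, answered) pairs sorted by time; the `find_outages` name and the 60-second minimum are hypothetical choices for the example.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    start: float  # timestamp of the first unanswered probe in the run (seconds)
    end: float    # timestamp of the last unanswered probe in the run (seconds)

    @property
    def duration(self):
        return self.end - self.start


def find_outages(probes, min_duration=60.0):
    """Return runs of consecutive unanswered probes lasting at least min_duration seconds.

    probes -- iterable of (timestamp_seconds, answered) pairs, sorted by timestamp.
    Isolated drops form runs far shorter than min_duration and are ignored.
    """
    outages = []
    run_start = None  # timestamp where the current run of failures began
    last_fail = None  # timestamp of the most recent failure in that run
    for ts, answered in probes:
        if not answered:
            if run_start is None:
                run_start = ts
            last_fail = ts
        elif run_start is not None:
            # A reply ended the run; keep it only if it lasted long enough.
            if last_fail - run_start >= min_duration:
                outages.append(Outage(run_start, last_fail))
            run_start = None
    # The log may end in the middle of an outage.
    if run_start is not None and last_fail - run_start >= min_duration:
        outages.append(Outage(run_start, last_fail))
    return outages


# One probe per second: two isolated drops plus a 5-minute run of failures.
log = [(t, not (t in (100, 300) or 500 <= t < 800)) for t in range(1000)]
for o in find_outages(log):
    print(f"outage from t={o.start} to t={o.end} ({o.duration} s)")
```

Duration is measured from the first to the last failed probe, so it can undercount the real outage by up to one probe interval.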


  • LAYER 8 Global Moderator

    Your pings to 8.8.8.8 not being answered - ok, contact your ISP about it.. Has zero to do with pfsense, ZERO..

    And this again has zero to do with unbound and your title.
    "Non-forwarding Resolver intermittent operation"

    Unbound could give 2 shits about 8.8.8.8 not answering when it resolves, ie non forwarding mode..



  • In reply to @johnpoz:

    @johnpoz from what I've understood is that @kevindd992002 did contact the ISP, but their tech is blaming pfSense and won't take any action. I can understand the frustration there. Even more frustrating is that in trying to prove it's not pfSense, it's only making things more confusing.

    I would look at some of the suggestions in that link I sent. Also, putting the Asus in place of pfSense might be another step?

    Oh, and something that might also give you more insight than ping is a traceroute.

    Raffi



  • I agree that 8.8.8.8 not being answered isn't really a huge deal for unbound, ok, but that was just one sample server. As you can see in the packet capture, almost all servers that I was trying to ping had packet loss one way or another. So with unbound (non-forwarding), I'm sure the queries to the root hints servers are affected too, as it seems that all destination servers are affected when the issue is happening.

    As for the title, yes, I'm sorry about that. I should've created a new thread that is not specific to unbound, but it got derailed and is hard to abandon now. If possible, the mods can move it to the correct forum section and let me edit the title.

    @Raffi_ , yes I'll check the suggestions in that link tomorrow :) It's already 1:30AM here and will have to continue tomorrow.

    And yes, I didn't forget about removing pfsense from the mix and try either just the asus router or directly connecting to the modem. Those steps will come eventually.


  • LAYER 8 Global Moderator

    And show the tech that pings were sent and not answered - this has ZERO to do with what sent them... If the tech will not believe that.. Then they are an idiot..

    Still not seeing what the problem is.. Like I said, it's all over the board.. If X is not answering a dns query.. Where is that listed? If the dns query was sent, then again it's not anything to do with pfsense..

    What else is there to talk about... Simple sniff: if it shows traffic being sent and nothing answered, it has nothing to do with pfsense.. There is nothing left to do if you sniff and see traffic being put on the wire, and nothing coming back - it's outside the control of what put it on the wire.

    You would have another thing if nothing is being sent on the wire, or you see an answer but it's not being processed.
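The three cases described in this post can be written down as a small decision function. This is a Python sketch, assuming you have per-flow packet counts from simultaneous captures on the WAN interface and, for traffic originating from a LAN client, the LAN interface; the function and enum names are hypothetical.

```python
from enum import Enum


class Verdict(Enum):
    NOTHING_SENT = "request never put on the wire: look at the sender / firewall"
    NO_REPLY = "requests left the WAN with replies missing: problem is upstream"
    NOT_DELIVERED = "replies arrived on the WAN but were not delivered: look at the firewall"
    OK = "requests and replies all accounted for"


def triage(requests_on_wan, replies_on_wan, replies_delivered_to_lan):
    """Classify one flow from packet counts seen in simultaneous captures.

    requests_on_wan          -- requests seen leaving the WAN interface
    replies_on_wan           -- replies seen arriving on the WAN interface
    replies_delivered_to_lan -- replies seen leaving the LAN interface to the client
    """
    if requests_on_wan == 0:
        return Verdict.NOTHING_SENT
    if replies_on_wan < requests_on_wan:
        # Sent on the wire, nothing (or not everything) came back.
        return Verdict.NO_REPLY
    if replies_delivered_to_lan < replies_on_wan:
        return Verdict.NOT_DELIVERED
    return Verdict.OK


# e.g. 20 pings seen leaving the WAN, no replies at all:
print(triage(20, 0, 0).value)
```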



  • In reply to @johnpoz:

    Again, I'm simply asking for help with what to tell the tech to rule out pfsense. They are idiots, I agree, but I don't have a choice but to deal with them and convince them that the issue is on their side.



  • @kevindd992002 I'm interested to know how the non pfSense experiments go.

    If push comes to shove and they don't want to help you fix your problem, you nicely explain to them that you'll switch to their competitor (assuming there is one). You'll be surprised at how they suddenly become more interested in helping you solve it.


  • LAYER 8 Global Moderator

    You have already proven the issue is on their end when you show a ping going out and not getting an answer.. Not sure what else you can do.. Do the same thing when just a PC is connected to their device..



  • In reply to @Raffi_:

    Yeah, me too. I'll try accomplishing both direct-to-modem and using the Asus router tests this week and will report back. Thanks.

    Yeah, I know how ISPs react when you say that. They panic and get things solved faster. There are competitors, for sure, but in my condo there's only one offering FTTH connections, the one that I'm using now. The others are using crappy copper phone lines which are very substandard. And I just switched from that copper ISP to this fiber ISP in January 2019, so not long ago.


  • LAYER 8 Global Moderator

    Where you're going to have a problem is pinging stuff off their network and getting packet loss - they can always just say it's not their network..

    You need to have pings going to their network and not getting back answers.. Also most (really all) ISPs do not promise zero loss, so unless you have significant packet loss - good luck.. TCP can work just fine with a small amount of packet loss.. Do they have anything in their SLA about the amount of packet loss?

    If you call and say yeah, over 10k pings I saw .01% loss, they will just laugh and say ok, so? But if you show that you have sustained loss, say 5%, then you might have something to complain about.

    So I am going to say it again, a few packets lost here or there is not going to be an issue.. And not the root of your problem with resolving stuff.. It's not like dns only does 1 query, and if no answer just says F it, doesn't work... DNS will send multiple queries before it gives up, and can even switch to tcp vs udp, etc. So for packet loss to be a problem with dns resolution it really needs to be significant.

    Also, when resolving rather than forwarding, the different NS will be tried - for example the roots have 13 different NS; if one does not respond another will be tried.. Unbound keeps track of which NS respond faster, etc., and will use them more than ones that are less responsive.. Look at your infra cache..

    If you're having dns resolving issues, you need to troubleshoot that specific issue.. Not just that you lost some pings to 8.8.8.8
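A simplified Python sketch of the "prefer the name servers that answer fast, back off the ones that don't" behaviour described in this post. It is not unbound's actual selection code: the 400 ms band, the doubling on timeout and the cap are assumptions for illustration. The starting RTO values are the IPv4 entries from the `unbound-control ... lookup google.com` output posted in this thread.

```python
import random

RTT_BAND_MS = 400.0     # assumed: servers this close to the fastest are treated as equal
MAX_RTO_MS = 120_000.0  # assumed cap on a backed-off timeout


def pick_server(rto_ms, rng=random):
    """Choose a name server, preferring the ones that have been answering quickly."""
    best = min(rto_ms.values())
    candidates = sorted(s for s, rto in rto_ms.items() if rto <= best + RTT_BAND_MS)
    return rng.choice(candidates)


def on_timeout(rto_ms, server):
    """Double a non-responding server's timeout so it drops out of the preferred band."""
    rto_ms[server] = min(rto_ms[server] * 2, MAX_RTO_MS)


# RTO values (ms) from the unbound-control lookup output in this thread, IPv4 only.
rto = {"216.239.34.10": 223.0, "216.239.36.10": 311.0, "216.239.38.10": 252.0}
for _ in range(5):  # pretend 216.239.36.10 stops answering
    on_timeout(rto, "216.239.36.10")
rng = random.Random(0)
print(sorted({pick_server(rto, rng) for _ in range(50)}))
```

The point matches the post: one silent server just gets pushed out of rotation while the others keep answering, so a resolver only really fails when all of its candidates go quiet at once.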



  • In reply to @johnpoz:

    I'm going to ask this again too: Would a 10-minute ping to google.com with ALL RTOs not be considered an issue for you? I'm really having a hard time understanding why you wouldn't consider that an issue.

    Like I said, I COULDN'T CARE LESS about a few RTOs because they will be retried anyway; I agree with you completely. But if you start getting 100% RTOs for a span of even just one minute and your clients cannot browse the Internet at all, then where in the world is that not an issue?
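A worked Python example of why both sides of this argument can be right: the same ping data can look harmless as an overall percentage and severe within a short window. The sketch assumes one probe per second for a day with a single 10-minute total outage; those numbers are made up for illustration.

```python
def loss_pct(results):
    """Overall loss percentage; results is a list of booleans (True = answered)."""
    return 100.0 * sum(1 for ok in results if not ok) / len(results)


def worst_window_loss(results, window):
    """Highest loss percentage over any run of `window` consecutive probes."""
    if not 0 < window <= len(results):
        raise ValueError("window must be between 1 and len(results)")
    lost = sum(1 for ok in results[:window] if not ok)
    worst = lost
    for i in range(window, len(results)):
        # Slide the window by one probe: add the newest, drop the oldest.
        lost += int(not results[i]) - int(not results[i - window])
        worst = max(worst, lost)
    return 100.0 * worst / window


# One probe per second for a day, with a single 10-minute total outage.
day = [True] * 86_400
for second in range(36_000, 36_600):
    day[second] = False

print(f"overall loss: {loss_pct(day):.2f}%")                         # 0.69%
print(f"worst 5-minute window: {worst_window_loss(day, 300):.0f}%")  # 100%
```

Under 1% over the day is the kind of number an ISP shrugs at, while the 5-minute view shows the complete outage users actually notice.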



  • I will record a video of the issue when I get the chance and post it here as proof.


  • LAYER 8 Global Moderator

    10 minutes, yeah, that is a problem - but maybe outside the ISP's control, maybe it's somewhere past where the traffic leaves the ISP.. You can get them to troubleshoot it if it is happening all the time and you can not get to google. But 8.8.8.8 is not google.

    And 8.8.8.8 does not have anything to do with overall dns, unless it's the authoritative NS for what you're looking for and you can not talk to it.. Which seems really odd, since it's an anycast address.. Unless you are forwarding to it, which per your title you are not.

    If you can show an outage pinging 8.8.8.8 for 10 minutes, then for sure you can bring that to your ISP's attention and say hey - WTF... But unless you can show them that it happens more than once in a while, you're going to have a hard time getting their attention.

    edit: Do not post any stupid videos.. JFC nobody is going to watch such nonsense... You can either resolve something or you can not, you can either ping something or you can not... Show the sniff of the traffic, show the traceroute to the IP.. Show the sniffs of the resolving action getting no response, etc.

    BTW - here are the NS involved in resolving google.com.. Notice 8.8.8.8 is not there:

    [2.4.4-RELEASE][admin@sg4860.local.lan]/: unbound-control -c /var/unbound/unbound.conf lookup google.com
    The following name servers are used for lookup of google.com.
    ;rrset 78708 4 0 2 0
    google.com.     78708   IN      NS      ns2.google.com.
    google.com.     78708   IN      NS      ns1.google.com.
    google.com.     78708   IN      NS      ns3.google.com.
    google.com.     78708   IN      NS      ns4.google.com.
    ;rrset 78623 1 0 1 0
    ns4.google.com. 78623   IN      A       216.239.38.10
    ;rrset 78623 1 0 1 0
    ns4.google.com. 78623   IN      AAAA    2001:4860:4802:38::a
    ;rrset 78623 1 0 1 0
    ns3.google.com. 78623   IN      A       216.239.36.10
    ;rrset 78623 1 0 1 0
    ns3.google.com. 78623   IN      AAAA    2001:4860:4802:36::a
    ;rrset 78623 1 0 1 0
    ns1.google.com. 78623   IN      A       216.239.32.10
    ;rrset 78623 1 0 1 0
    ns1.google.com. 78623   IN      AAAA    2001:4860:4802:32::a
    ;rrset 78623 1 0 1 0
    ns2.google.com. 78623   IN      A       216.239.34.10
    ;rrset 78623 1 0 1 0
    ns2.google.com. 78623   IN      AAAA    2001:4860:4802:34::a
    Delegation with 4 names, of which 0 can be examined to query further addresses.
    It provides 8 IP addresses.
    2001:4860:4802:34::a    expired, rto 154191216 msec, tA 0 tAAAA 0 tother 0.
    216.239.34.10           rto 223 msec, ttl 776, ping 7 var 54 rtt 223, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
    2001:4860:4802:32::a    rto 376 msec, ttl 776, ping 0 var 94 rtt 376, tA 0, tAAAA 0, tother 0, EDNS 0 assumed.
    216.239.32.10           not in infra cache.
    2001:4860:4802:36::a    rto 376 msec, ttl 776, ping 0 var 94 rtt 376, tA 0, tAAAA 0, tother 0, EDNS 0 assumed.
    216.239.36.10           rto 311 msec, ttl 776, ping 3 var 77 rtt 311, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
    2001:4860:4802:38::a    rto 376 msec, ttl 776, ping 0 var 94 rtt 376, tA 0, tAAAA 0, tother 0, EDNS 0 assumed.
    216.239.38.10           rto 252 msec, ttl 776, ping 4 var 62 rtt 252, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
    [2.4.4-RELEASE][admin@sg4860.local.lan]/: 
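If you want to pull the per-server RTO numbers out of output like the above (for example to compare runs taken during and outside the problem window), a small parser is enough. This Python sketch was written against the sample output in this post; the line format is inferred from it, not from documented unbound-control output, so it may need adjusting for other versions.

```python
import re

# Pattern inferred from the sample output in this thread, not a documented format.
INFRA_LINE = re.compile(r"^(?P<addr>\S+)\s+(?P<expired>expired, )?rto (?P<rto>\d+) msec")


def parse_infra(text):
    """Return (address, rto_ms, expired) for each infra-cache line in the output.

    Lines such as '216.239.32.10  not in infra cache.' and all other output
    lines do not match the pattern and are skipped.
    """
    entries = []
    for line in text.splitlines():
        m = INFRA_LINE.match(line.strip())
        if m:
            entries.append((m["addr"], int(m["rto"]), m["expired"] is not None))
    return entries


def fastest_first(entries):
    """Non-expired servers ordered by RTO, lowest first."""
    return [addr for addr, rto, expired in sorted(entries, key=lambda e: e[1]) if not expired]


sample = """\
2001:4860:4802:34::a    expired, rto 154191216 msec, tA 0 tAAAA 0 tother 0.
216.239.34.10           rto 223 msec, ttl 776, ping 7 var 54 rtt 223, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
216.239.32.10           not in infra cache.
216.239.36.10           rto 311 msec, ttl 776, ping 3 var 77 rtt 311, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
216.239.38.10           rto 252 msec, ttl 776, ping 4 var 62 rtt 252, tA 0, tAAAA 0, tother 0, EDNS 0 probed.
"""
print(fastest_first(parse_infra(sample)))  # ['216.239.34.10', '216.239.38.10', '216.239.36.10']
```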
    


  • @johnpoz said in Non-forwarding Resolver intermittent operation:

    Ten minutes, yeah, that is a problem. But maybe it's outside the ISP's control; maybe it's somewhere past where the traffic leaves the ISP. You can get them to troubleshoot if it is happening all the time and you cannot get to Google. But 8.8.8.8 is not Google.

    And 8.8.8.8 does not have anything to do with DNS overall, unless it's the authoritative NS for what you're looking for and you cannot talk to it. That seems really odd, since it's anycast. Unless you are forwarding to it, which per your title you are not.

    If you can show an outage pinging 8.8.8.8 for 10 minutes, then for sure you can bring that to your ISP's attention and say, hey, WTF... But unless you can show them that it happens more than once in a while, you're going to have a hard time getting their attention.

    Edit: Do not post any stupid videos. JFC, nobody is going to watch such nonsense. You can either resolve something or you cannot; you can either ping something or you cannot. Show the sniff of the traffic, show the traceroute to the IP. Show the sniffs of the resolving action getting no response, etc.

    I know 8.8.8.8 is not google.com. These are two different servers that I showed in my tests above. I'm not sure why you're not following.

    Well, I thought a video would help you believe me that there's a problem. If you don't want it, then fine. Obviously, your network troubleshooting skills are way better than mine, but I make it a point to give any information I deem necessary for everyone to check. This is why I'm asking for guidance.

    I thought we were already past the point of considering this a DNS issue? If it were one, name resolution is where I'd have problems, but when the issue happens, pinging random servers shows RTOs (request timeouts) as well. I'm not saying here that it is a DNS issue. 8.8.8.8 was just the monitor IP I've had for the WAN gateway from the very start, so that's what I showed everyone in this forum.
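johnpoz's suggestion is to document the outage rather than describe it: show the ISP when, and for how long, pings to a target such as 8.8.8.8 went unanswered. Below is a minimal Python sketch that assumes you already log one `(timestamp_seconds, got_reply)` pair per probe from whatever ping tool you use. It groups consecutive lost probes into outage windows and keeps only those lasting at least `min_duration_s`. The `Outage` and `find_outages` names are hypothetical. Here an outage is measured from the first lost probe to the first reply after the gap.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Outage:
    start: float  # timestamp of the first unanswered probe (seconds)
    end: float    # first reply after the gap, or the last lost probe if the log ends while down
    lost: int     # number of consecutive unanswered probes

    @property
    def duration(self) -> float:
        return self.end - self.start


def find_outages(samples: Sequence[Tuple[float, bool]], min_duration_s: float) -> List[Outage]:
    """Group consecutive unanswered probes into outages.

    samples: (timestamp_seconds, got_reply) pairs in time order.
    Only outages lasting at least min_duration_s are returned.
    """
    outages: List[Outage] = []
    current: Optional[Outage] = None
    for ts, got_reply in samples:
        if not got_reply:
            if current is None:
                current = Outage(start=ts, end=ts, lost=0)
            current.lost += 1
            current.end = ts
        elif current is not None:
            current.end = ts  # the first reply after the gap marks recovery
            outages.append(current)
            current = None
    if current is not None:  # log ended while still down
        outages.append(current)
    return [o for o in outages if o.duration >= min_duration_s]


if __name__ == "__main__":
    # Hypothetical log: one probe per second, with a 10-minute gap starting at t=60.
    log = [(float(t), not (60 <= t < 660)) for t in range(720)]
    for o in find_outages(log, min_duration_s=600):
        print(f"outage from t={o.start:.0f}s to t={o.end:.0f}s, {o.lost} probes lost")
```

You could run it over logs from several targets and check whether the windows line up. That is one way to show the loss is not specific to one server, which is Kevin's point about random servers also timing out.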



  • Here's a traceroute that I sent them a few months ago: https://pastebin.com/JqPx326v

    That looks like the RTO starts from the hop that's within the ISP network. Is that enough evidence for them to conclude that the problem is in their network?

    And then 3 minutes after the issue, it got resolved and the traceroute results became like this: https://pastebin.com/XYbNMiWy


  • LAYER 8 Global Moderator

    You got to the endpoint in that trace. Hops along the way not answering does not necessarily mean anything.

    Not sure what problem you think that shows?

    Tracing route to www.pfsense.org [208.123.73.69]
    over a maximum of 30 hops:
    
      1    <1 ms    <1 ms    <1 ms  192.168.9.253
      2    10 ms     9 ms    16 ms  50.4.132.1
      3    11 ms    17 ms    10 ms  76.73.191.106
      4     9 ms     9 ms     8 ms  76.73.164.142
      5    12 ms    10 ms     9 ms  76.73.164.154
      6    13 ms    10 ms    10 ms  76.73.191.242
      7    11 ms    21 ms    10 ms  143.59.95.224
      8    30 ms    15 ms    18 ms  75.76.35.8
      9     *       13 ms    11 ms  4.16.38.157
     10     *        *        *     Request timed out.
     11    36 ms    46 ms    37 ms  4.14.49.2
     12    41 ms    35 ms    35 ms  64.20.229.158
     13    36 ms    35 ms    35 ms  66.219.34.194
     14    34 ms    38 ms    35 ms  208.123.73.4
     15    39 ms    35 ms    39 ms  208.123.73.69
    

    So from that trace I guess I am having issues getting to www.pfsense.org?

    Same goes for CNN, it seems:

    $ tracert -d www.cnn.com
    
    Tracing route to turner-tls.map.fastly.net [151.101.185.67]
    over a maximum of 30 hops:
    
      1    <1 ms     3 ms    <1 ms  192.168.9.253
      2    10 ms    11 ms    16 ms  50.4.132.1
      3    19 ms    20 ms     8 ms  76.73.191.106
      4    11 ms    10 ms     9 ms  76.73.164.142
      5    13 ms    10 ms    11 ms  76.73.164.154
      6    10 ms    11 ms    11 ms  76.73.191.242
      7    10 ms    10 ms    10 ms  143.59.95.224
      8    13 ms     9 ms    10 ms  75.76.35.8
      9     *        *        *     Request timed out.
     10    12 ms    10 ms    10 ms  151.101.185.67
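The point made above is that a hop showing `Request timed out.` is not a fault by itself, as long as the trace reaches the destination. That rule can be checked mechanically. Below is a minimal Python sketch that assumes Windows `tracert -d` output in exactly the layout of these traces (hop number first, IPv4 address last). It reports whether the destination answered and which hops stayed silent. The function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# A hop line starts with the hop number; with -d the responding router's IPv4
# address is the last token. Silent hops end in "Request timed out." instead.
HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")
IPV4_AT_END_RE = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3})\s*$")

Hop = Tuple[int, Optional[str], int]  # (hop number, responding IP or None, probes lost)


def parse_tracert(text: str) -> List[Hop]:
    hops: List[Hop] = []
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if not m:
            continue  # header lines such as "Tracing route to ..."
        number, rest = int(m.group(1)), m.group(2)
        ip = IPV4_AT_END_RE.search(rest)
        hops.append((number, ip.group(1) if ip else None, rest.count("*")))
    return hops


def summarize(hops: List[Hop], destination: str) -> Tuple[bool, List[int]]:
    """Return (destination reached?, hop numbers that never answered)."""
    reached = any(ip == destination for _, ip, _ in hops)
    silent = [n for n, ip, _ in hops if ip is None]
    return reached, silent


if __name__ == "__main__":
    cnn = """\
Tracing route to turner-tls.map.fastly.net [151.101.185.67]
  1    <1 ms     3 ms    <1 ms  192.168.9.253
  8    13 ms     9 ms    10 ms  75.76.35.8
  9     *        *        *     Request timed out.
 10    12 ms    10 ms    10 ms  151.101.185.67
"""
    print(summarize(parse_tracert(cnn), "151.101.185.67"))  # (True, [9])
```

In both traces above, the destination answers even though one intermediate hop (10 in the pfSense trace, 9 in the CNN trace) stays silent.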
    


  • @johnpoz said in Non-forwarding Resolver intermittent operation:

    You got to the endpoint in that trace. Hops along the way not answering does not necessarily mean anything.

    Not sure what problem you think that shows?

    So from that trace I guess I am having issues getting to www.pfsense.org?

    Of course not! Some routers are set up not to respond to ICMP requests; I know that.

    But how do you explain my first and second traceroutes, taken three minutes apart, before and after the issue cleared? Is it because the route to the same server changed within three minutes?
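You can answer the question of whether the route changed between the two pastebin traces (not reproduced here) by lining them up hop by hop. Below is a minimal sketch that assumes you have already extracted each run's responding IPs into a list, with `None` for a silent hop. The addresses in the example are placeholders from the documentation ranges, not the traces from this thread, and `changed_hops` is a hypothetical name. Routes can legitimately differ between runs a few minutes apart (for example, because of load balancing across parallel links), so a difference on its own does not prove a fault.

```python
from itertools import zip_longest
from typing import List, Optional, Sequence

MISSING = "<no hop>"  # placeholder when one trace is shorter than the other


def changed_hops(before: Sequence[Optional[str]], after: Sequence[Optional[str]]) -> List[int]:
    """Compare two traceroutes hop by hop.

    Each argument lists the responding IP per hop (None for a silent hop).
    Returns the 1-based hop numbers where the runs disagree about which
    router answered, including hops present in only one run. A silent hop on
    either side is skipped, since it says nothing about which router was there.
    """
    changed: List[int] = []
    for hop, (a, b) in enumerate(zip_longest(before, after, fillvalue=MISSING), start=1):
        if a is None or b is None:
            continue
        if a != b:
            changed.append(hop)
    return changed


if __name__ == "__main__":
    during = ["192.0.2.1", "198.51.100.1", None, "203.0.113.9"]
    after = ["192.0.2.1", "198.51.100.1", "198.51.100.7", "203.0.113.20", "203.0.113.9"]
    print(changed_hops(during, after))  # [4, 5]
```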



  • @kevindd992002 said in Non-forwarding Resolver intermittent operation:

    Here's a traceroute that I sent them a few months ago: https://pastebin.com/JqPx326v

    That looks like the RTO starts from the hop that's within the ISP network. Is that enough evidence for them to conclude that the problem is in their network?

    And then 3 minutes after the issue, it got resolved and the traceroute results became like this: https://pastebin.com/XYbNMiWy

    @kevindd992002 That's not really proof of an issue on their network. Not all hops along the route will always respond. It's common to have hops that don't respond along the route. As long as at the end you get to the server, that's what matters. Also, the hops are not taking very long so that also looks OK.



  • @Raffi_ said in Non-forwarding Resolver intermittent operation:

    @kevindd992002 That's not really proof of an issue on their network. Not all hops along the route will always respond. It's common to have hops that don't respond along the route. As long as at the end you get to the server, that's what matters. Also, the hops are not taking very long so that also looks OK.

    Right, that's what I thought. I just posted the traceroutes here in case you guys see something out of the ordinary.



  • @kevindd992002 said in Non-forwarding Resolver intermittent operation:

    Yeah, I know how ISPs react when you say that. They panic and get things solved faster. There are competitors, for sure, but in my condo there's only one offering FTTH connections, the one that I'm using now. The others use crappy copper phone lines, which are very substandard. And I only switched from that copper ISP to this fiber ISP in January 2019, so not long ago.

    Off topic, but we're actually in the copper test industry. Believe it or not, there is technology now that can get close to gigabit speeds on those old copper lines if the ISP is willing to invest in it. I'm curious, are you in Australia? Here in the US, the old phone lines have been mostly abandoned in terms of further investment.



  • @Raffi_ said in Non-forwarding Resolver intermittent operation:

    Off topic, but we're actually in the copper test industry. Believe it or not, they have technology now that is able to get close to Gigabit speeds on those old copper lines if the ISP is willing to invest in it. I'm curious are you in Australia? Here in the US, the old phone lines have been mostly abandoned in terms of further investment.

    I can imagine. I was mostly talking about how substandard the copper wiring is here in our condo. Even the copper ISPs themselves tell me that the copper wires the contractors used in this condo are crap. The copper wiring in the building's cabinet looks worse than spaghetti. And no one wants to invest in replacing it. I'm in the Philippines, so a third-world country, but Internet service here has come a long way already. My two service plans are 35 Mbps down/35 Mbps up (around $31) and 300 Mbps down/300 Mbps up (around $87).



  • @Raffi_

    With the ASUS router as the main router and no pfSense, the connection has been issue-free for the last two days. It's still too early to tell, but I'll continue monitoring through the weekend (when the issue usually occurs most) before I come to a conclusion. If it does run flawlessly until Monday, though, I'm not sure how to continue troubleshooting pfSense except to uninstall and reinstall it from scratch. That's an easy task if I just reload the config, but if I go that route I would rather not carry over any settings from the old config (which might be corrupted or something, for all we know).



  • @kevindd992002 Interesting. OK, that sounds like a good plan. Yeah, give it a little while to see how it goes. We'll see what the next step is from there. Have a good weekend.
    Raffi



  • @Raffi_

    After 5 days of continuously using the ASUS router, I haven't had a single occurrence of the issue! That rules out the ASUS router, the cables, and the ISP modem as the root cause.

    Just now, I decided to switch back to pfSense, and as soon as I plugged it in and waited for everything to go green in the Dashboard, I experienced the issue. It's got to be either the pfSense software itself or the physical hardware that hosts pfSense (although I doubt the latter). What do you recommend as a next step here?


  • LAYER 8 Global Moderator

    And your ASUS router was actually resolving for DNS?

    Your title says non-forwarding problems. I find it unlikely that your ASUS router was resolving DNS rather than forwarding.

    Do you understand what the difference is?



  • @johnpoz said in Non-forwarding Resolver intermittent operation:

    And your ASUS router was actually resolving for DNS?

    Your title says non-forwarding problems. I find it unlikely that your ASUS router was resolving DNS rather than forwarding.

    Do you understand what the difference is?

    Yes, I understand the difference between a DNS resolver and a DNS forwarder; I already established this a few posts above. How can I rename this thread and move it to the correct section, so that we can all get past the technicalities? Are my test results still not convincing enough that pfSense is causing my issue? What can I do to convince you?

    The ASUS router is NOT a DNS resolver. It is a DNS forwarder, and I was forwarding to the OpenDNS servers. That's the only major difference I see: pfSense was set up as a DNS resolver (using root hints, not forwarding), while the ASUS router does not have this feature and simply does DNS forwarding.
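The distinction johnpoz is pointing at is about which traffic each box generates. In resolver mode, the firewall itself queries the root, the TLD servers, and the authoritative servers, so a flaky path to any of them shows up locally. In forwarding mode, the router sends every question to a single upstream such as OpenDNS, which does that work from its own network. Below is a toy Python model of that difference, assuming a made-up three-level delegation chain. The server names and the 192.0.2.10 address are placeholders, and real resolution also involves caching, retries, UDP/TCP transport, and much more.

```python
from typing import Callable, Dict, List, Tuple

# Toy model only: each "server" maps a query name to either
# ("referral", next_server) or ("answer", address). Names and the address are made up.
TOY_SERVERS: Dict[str, Dict[str, Tuple[str, str]]] = {
    "root-server": {"google.com.": ("referral", "com-tld-server")},
    "com-tld-server": {"google.com.": ("referral", "google-authoritative")},
    "google-authoritative": {"google.com.": ("answer", "192.0.2.10")},
}


def resolve_iteratively(name: str, contacted: List[str], start: str = "root-server") -> str:
    """Resolver mode: begin at the root and follow each referral ourselves."""
    server = start
    for _ in range(16):  # guard against referral loops
        contacted.append(server)
        kind, value = TOY_SERVERS[server][name]
        if kind == "answer":
            return value
        server = value
    raise RuntimeError("too many referrals")


def resolve_by_forwarding(name: str, contacted: List[str], upstream: Callable[[str], str]) -> str:
    """Forwarder mode: send the whole question to one upstream recursive server."""
    contacted.append("upstream")
    return upstream(name)


if __name__ == "__main__":
    # Resolver (pfSense-style in this thread): our box talks to every server in the chain.
    ours: List[str] = []
    print(resolve_iteratively("google.com.", ours), ours)

    # Forwarder (ASUS-style in this thread): our box talks only to the upstream;
    # the upstream does the iterative work from its own network.
    upstream_work: List[str] = []
    ours = []
    answer = resolve_by_forwarding("google.com.", ours,
                                   lambda n: resolve_iteratively(n, upstream_work))
    print(answer, ours, upstream_work)
```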

