Short disconnects multiple times per day



  • While I was typing my post I got a small message in the bottom right about "your connection to the netgate servers has been disrupted", fixed itself a few seconds later.

    I've added 8.8.8.8 as the monitoring IP for my gateway, thank you for that suggestion.
    I don't really know where I can set the %, but that's not that important right now. I've had no errors like the ones you posted in over a month, so again I'm pretty sure my WAN connection is stable.

    It has to be something between pfSense and my PC.
    Is there a good monitoring tool for windows ?
    If it was a permanent thing I could maybe troubleshoot it, but right now it's just sort of happening for a few seconds and I can't do shit about it. And even if I somehow fixed the problem, I would have no way of actually knowing if it works.
    I need some hard logs I can work with.

    Oh and my DNS resolver is more or less default. The only thing I changed is adding some host overrides for some devices on my local network. I removed the override for my pfSense box just in case that somehow screwed something up.

    Oh one more thing, I basically have 2 connections from my PC to my pfSense box. One over my normal LAN network, and then a separate network on a different LAN cable between my PC and the pfSense box.

    My LAN has
    allow IPv4 UDP LAN * this firewall 53
    and
    allow IPv4 * LAN * !RFC1918 *

    My management network has
    allow IPv4 UDP LAN * this firewall 53
    and
    allow IPv4 TCP Management net * this firewall 443+80+22



  • @XX302 said in Short disconnects multiple times per day:

    I've added 8.8.8.8 as the monitoring IP for my gateway

    In that case, this :

    Apr 27 15:54:10 dpinger WAN_PPPOE 8.8.8.8: Alarm latency 6293us stddev 178us loss 42%
    Apr 27 15:55:35 dpinger WAN_PPPOE 8.8.8.8: Clear latency 6203us stddev 213us loss 0%

    is normal.
    8.8.8.8 is a DNS server with quite a lot of people hanging on to it (half the planet by now ?). The IP is probably served by mirrored servers all over the planet, but it is probably not the best ICMP "Monitoring IP".

    edit :
    @XX302 said in Short disconnects multiple times per day:

    allow IPv4 UDP LAN * this firewall 53

    And TCP ??? Not letting TCP port 53 through is very bad - many DNS requests and answers are too big for UDP, and are sent over TCP.
    (this probably solves your local issue).
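
A quick way to check whether the resolver answers on both UDP and TCP port 53 is sketched below. This is only a minimal sketch using the Python standard library: the helper names (`build_query`, `query_udp`, `query_tcp`, `check_resolver`) are made up for illustration, and the resolver address you pass in (for example the pfSense LAN IP) is an assumption about your setup.

```python
import socket
import struct


def build_query(name: str, query_id: int = 0x1234, qtype: int = 1) -> bytes:
    """Build a minimal DNS query: one question, class IN, recursion desired."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def is_truncated(response: bytes) -> bool:
    """True if the TC bit is set, i.e. the client is expected to retry over TCP."""
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & 0x0200)


def query_udp(server: str, name: str, timeout: float = 3.0) -> bytes:
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        data, _ = sock.recvfrom(4096)
        return data


def _recv_exact(sock: socket.socket, count: int) -> bytes:
    buf = b""
    while len(buf) < count:
        chunk = sock.recv(count - len(buf))
        if not chunk:
            raise ConnectionError("connection closed early")
        buf += chunk
    return buf


def query_tcp(server: str, name: str, timeout: float = 3.0) -> bytes:
    query = build_query(name)
    with socket.create_connection((server, 53), timeout=timeout) as sock:
        # DNS over TCP prefixes every message with a 2-byte length field.
        sock.sendall(struct.pack("!H", len(query)) + query)
        (length,) = struct.unpack("!H", _recv_exact(sock, 2))
        return _recv_exact(sock, length)


def check_resolver(server: str, name: str = "example.com") -> None:
    """Print whether the resolver answers over UDP and over TCP."""
    for label, fn in (("UDP", query_udp), ("TCP", query_tcp)):
        try:
            reply = fn(server, name)
            print(f"{label}: {len(reply)} bytes, truncated={is_truncated(reply)}")
        except OSError as exc:
            print(f"{label}: failed ({exc})")
```

Calling `check_resolver("192.168.1.1")` from a LAN client (the address is a placeholder) should print a reply for both transports. If UDP works and TCP fails, a missing TCP rule for port 53 is a likely suspect.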



  • Alright changing that lol

    So, if that is indeed the source of all of my problems, is there any monitoring tool like dpinger for windows ?



  • It's also a question of selecting a good IP to monitor.
    Most often it's an upstream IP that you know, such as the upstream gateway.
    Using Google's IPs can mean you are sending ICMP to Stockholm, then Berlin, Paris, Amsterdam, New York and Berlin again ....

    edit : wait ....
    This is one of my 'munin' pages : https://www.test-domaine.fr/munin/brit-hotel-fumel.net/pfsense.brit-hotel-fumel.net/ping_google_public_dns_a_google_com.html - I do actually use a Google domain to ping against ... for years now ....


  • Netgate Administrator

    When you are monitoring the gateway IP directly you can only say the connection was good to the gateway, the first hop in the ISPs network. If the ISP has some upstream issue you won't see that in the gateway logs or quality graphs.
    So you cannot assume the data you have until now shows a good connection. Setting an external monitoring IP will show you better data from now on.
    I've personally never seen any issue using 8.8.8.8 as the monitoring IP but, yes, it was not intended for that purpose. It uses anycast to provide a pretty local IP to you wherever you are.

    Steve



  • Google will probably work for now.
    But that's only pfSense -> everything upstream, I need to be able to monitor from my PC upwards.

    Edit:
    Yeah, I understand what you mean Steve. But I called my ISP and they didn't mention anything, so even though it would be nice to just blame my ISP, I couldn't do anything about it in that case. And since I'm merely an enthusiast there is a high possibility that I misconfigured something here in my local network.



  • Yup, changing the DNS firewall rule and deleting the host override for the router did absolutely nothing, still getting disconnects.
    Nothing in the gateway logs so the WAN side is absolutely fine.

    What else can I do to track down this problem ?


  • Netgate Administrator

    So you changed to monitoring 8.8.8.8 for the WAN? What do the WAN quality graphs look like?

    Steve



  • ...where was the quality graph again ?

    Monitoring is on 8.8.8.8, but at this point I'm pretty sure it is not a WAN problem.
    It feels like pfSense is dropping the connection on my LAN every few hours, but I don't know why it would do that.
    Are there logs for the LAN that could reveal anything, or like I said any monitoring tools for windows you guys would recommend ?


  • Netgate Administrator

    The graphs are in Status > Monitoring, then hit the wrench icon and choose Quality and the appropriate gateway.

    I would also check the processor usage for any spikes at that time.

    Also check the system logs. I have seen systems where reloading everything caused system loading sufficient to introduce delays in opening connections.

    Steve



  • @XX302 said in Short disconnects multiple times per day:

    monitoring tools for windows

    What about setting up a 'pfSense' monitor to one of your LAN devices that is always on ?

    Btw : when the LAN NIC goes down and up, this is logged. related LAN services like the DHCP server would also restart.
    Maybe a bad cable / bad contact somewhere / bad switch (check the power of the switch).
    Or a bad NIC.



  • @stephenw10
    https://imgur.com/a/n7wjETx
    and
    https://imgur.com/a/51yPiDJ
    Last disconnect was 15:55.

    Logs are clear, nothing during that time in General, Gateways, Routing, Firewall...

    And it's not just a delay in opening connections, I don't really care about that, it is more the fact that established connections get dropped.

    @Gertjan Alright, so how would I do that ?
    LAN NIC is on the board of that QOTOM Mini PC, but I guess stuff like that would show up in the system logs, right ?
    There is absolutely nothing. DHCP logs say also nothing...
    Switch power is ok, I could switch the cable itself (self-crimped Cat6 iirc), but I need to see these disconnects on actual logs. So far I am only experiencing the effects, but there is not the slightest sign in pfSense that something happened during that time.


  • Netgate Administrator

    Well that big spike looks suspicious but that was at ~2am if the time is correct there. There was an event though just before 16.00. Nothing dramatic at that scale but you should try using 1h at 1m resolution graph to see more.
    You should also disable the processes line on the system graph as that swamps the other data. I can't see any spikes there though.

    Steve



  • https://imgur.com/a/E5GnI2v

    The big spike was 15:08, the smaller one exactly 16:00.
    I do know however that the disconnect was at exactly 15:54, and was over after less than a minute, so both of those spikes are too late/early. And it is just a jump from 2 to 4/11 ms, packet loss was at 0% all the time, so I doubt it had something to do with it.

    Back to what Gertjan mentioned, how can I set up logging like that for LAN devices ?
    I do have some new information hinting towards a possible problem further upstream, but I have to confirm that the connection is stable on my end before I can start calling my ISP and yelling at people.
    I'm getting more and more desperate here every day, I just want my working Internet back.

    Alright, enough with the crying, if you guys had a problem like this on your network, what would you do to find the issue ?


  • Netgate Administrator

    I would set up MTR or SmokePing from a LAN side client to some places out on the internet and let it run until I saw failures, then check where in the route it is failing. I believe there are Windows variants of those but I've never tried them.

    Steve
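
If neither MTR nor SmokePing is at hand, a rough stand-in that produces "hard logs" from a LAN client can be sketched in Python. This is only an illustration, not a replacement for those tools: the target list, the log file name and all function names are assumptions, and it simply shells out to the system `ping` command (Windows flags on Windows, Linux iputils flags elsewhere).

```python
import platform
import subprocess
import time
from datetime import datetime


def ping_command(host, timeout_s=1, system=None):
    """Return the argument list for a single ping with a timeout."""
    system = system or platform.system()
    if system == "Windows":
        # -n count, -w timeout in milliseconds
        return ["ping", "-n", "1", "-w", str(int(timeout_s * 1000)), host]
    # Assumes Linux iputils: -c count, -W timeout in seconds
    return ["ping", "-c", "1", "-W", str(int(timeout_s)), host]


def ping_once(host, timeout_s=1):
    """True if one ping got a reply. Caveat: Windows ping can exit with 0
    on a 'Destination host unreachable' reply, so treat results as a hint."""
    try:
        result = subprocess.run(
            ping_command(host, timeout_s),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 2,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def format_line(when, results):
    """One timestamped log line, e.g. '2020-07-10 09:50:51 a=ok b=LOST'."""
    status = " ".join(
        f"{host}={'ok' if ok else 'LOST'}" for host, ok in results.items()
    )
    return f"{when:%Y-%m-%d %H:%M:%S} {status}"


def first_failing_hop(results):
    """First target (in the order given) that did not answer, or None."""
    for host, ok in results.items():
        if not ok:
            return host
    return None


def monitor(targets, logfile="lan_ping.log", interval_s=1.0):
    """Ping every target in a loop and log only the rounds with a loss."""
    with open(logfile, "a", encoding="utf-8") as log:
        while True:
            results = {host: ping_once(host) for host in targets}
            if not all(results.values()):
                log.write(format_line(datetime.now(), results) + "\n")
                log.flush()
            time.sleep(interval_s)
```

Listing the targets from nearest to farthest, for example `monitor(["192.168.1.1", "8.8.8.8"])` with the first address standing in for the pfSense LAN IP, makes the first `LOST` entry in each line point at the segment where the loss starts.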



  • Well... this gets more and more interesting.

    Just woke up to see that I had no internet at all, pfSense showed my upstream gateway as down, not pingable.
    Did the magical "Haveyoutriedturningitoffandonagain?" trick (the exact thing I wanted to avoid with pfSense), and at least my gateway is back. ISP said their end is green, no problems at all.

    And I downloaded a windows version of MTR.
    If I use a custom host like the one that is bothering me every day with the disconnects it just shows nothing, with 8.8.8.8 it works, but shows me 100% loss and "host not available" on all steps, except 8.8.8.8 at the end, which somehow works with 0%.
    I'm trying a different tool now, ping plotter, perhaps this will show me something new.
    Right off the bat I see a 20% loss at my first upstream gateway... hm...

    Edit:
    Now MTR decided to work. I see huge loss numbers (100% most of the time) for my direct upstream gateway, but everything after that seems fine. I'm guessing that's probably some sort of ping spam protection.
    I'll keep an eye on this.
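
The usual rule of thumb for reading MTR output like this is that loss at an intermediate hop only matters if it carries through to the final hop. Loss that shows up at one router and disappears further down the path is typically that router rate-limiting its own ICMP replies. A small sketch of that rule, with a made-up function name and made-up loss figures:

```python
def first_real_loss_hop(hop_loss, threshold=1.0):
    """Given per-hop loss percentages in path order, return the index of the
    first hop from which loss persists all the way to the destination,
    or None if the destination itself shows no loss."""
    first = None
    # Walk backwards from the destination while hops still show loss.
    for index in range(len(hop_loss) - 1, -1, -1):
        if hop_loss[index] >= threshold:
            first = index
        else:
            break
    return first


# Hop 1 drops every probe but the destination is clean: ignore it.
print(first_real_loss_hop([0, 100, 0, 0, 0]))      # None
# Loss starts at hop 3 and continues to the destination: look there.
print(first_real_loss_hop([0, 100, 0, 35, 35]))    # 3
```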



  • That's what I'm talking about, finally something !
    I traced it back to er1.ams1.nl.above.net, 100% packet loss for 35 seconds.
    https://imgur.com/a/X2OrSB3

    I've never dealt with something like this before, is there anything you can do as an end user ?
    I mean, it's not really the ISPs fault since it's not their server, it's not the fault of the server I'm trying to reach, and it seems like it's not my fault (and I'm surprised by that).

    And even though it seems like that problem is not in my network, I'm still worried about my gateway just going offline this morning. Like I said, ISP said there was nothing, reboot of my pfSense box fixed it... any ideas ?


  • Netgate Administrator

    Not really. If the ISP gateway stopped responding to ping was it still responding to ARP?

    What errors did you see in the system or gateways log?

    Steve



  • @XX302 said in Short disconnects multiple times per day:

    my gateway just going offline this morning

    What are your "dpinger" settings (screenshot image) ?
    Even when there is a massive ICMP loss, the upstream connection can still be up and alive.

    Does the bufferbloat test mention anything special ?



  • Not really. If the ISP gateway stopped responding to ping was it still responding to ARP?

    The gateway stopped responding to ICMP a minute or two after I started MTR and this PingPlotter tool, probably some sort of spam protection. It has been at 100% packet loss since then, but internet is working fine.

    What errors did you see in the system or gateways log?

    System log is too far back (I only see the last 50 in the GUI, don't know where the actual files are), but Gateway log has
    Jul 10 09:50:51 dpinger WAN_DHCP 8.8.8.8: Alarm latency 9080us stddev 2118us loss 21%
    and monitoring shows this
    https://imgur.com/a/GxWsKYe

    What are your "dpinger" settings (screenshot image) ?

    Not sure if I understand you correctly... dpinger on my pfSense has the default settings, monitoring 8.8.8.8
    If you want a screenshot of my settings, no idea where they are, if you are asking about my screenshot, that is a tool called "PingPlotter", seems to do pretty much the same thing as MTR just with a better GUI. MTR also confirms the packet loss at that time.

    Even when there is a massive ICMP loss, the upstream connection can still be up and alive.

    Yeah, 100% PL from my upstream gateway at least for ICMP, but no problems.
    The picture I posted was an actual DC though, fucked my morning up because I lost some things.

    Does the bufferbloat test mention anything special ?

    https://imgur.com/a/0EAPfbQ
    Grade C

    I am more and more confused.
    If it is just er1.ams1.nl.above.net then it would explain why this problem occurred without me changing anything.
    It would also explain why it's not affecting things like VOIP.
    However I find it strange that some international server somewhere in Amsterdam has issues this severe for days now.
    And I also talked with other people in my country using 87.237.34.200 every day (possibly also through that above.net server) without any problems.
    And I had problems like this when trying to reach other servers as well.
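
To line up disconnect times with gateway events, the dpinger lines quoted in this thread can be pulled apart with a small parser. This is a sketch that only assumes the format exactly as it was pasted here (a real log may carry extra fields such as a process ID), and the names `DpingerEvent` and `parse_line` are made up for illustration.

```python
import re
from typing import NamedTuple, Optional


class DpingerEvent(NamedTuple):
    timestamp: str
    gateway: str
    monitor_ip: str
    state: str          # "Alarm" or "Clear"
    latency_ms: float
    stddev_ms: float
    loss_pct: int


# Matches lines shaped like the ones pasted in this thread, e.g.
# "Jul 10 09:50:51 dpinger WAN_DHCP 8.8.8.8: Alarm latency 9080us stddev 2118us loss 21%"
_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+dpinger\s+"
    r"(?P<gw>\S+)\s+(?P<ip>\S+):\s+(?P<state>Alarm|Clear)\s+"
    r"latency\s+(?P<lat>\d+)us\s+stddev\s+(?P<sd>\d+)us\s+loss\s+(?P<loss>\d+)%"
)


def parse_line(line: str) -> Optional[DpingerEvent]:
    """Parse one gateway-log line, or return None if it does not match."""
    match = _LINE.match(line.strip())
    if match is None:
        return None
    return DpingerEvent(
        timestamp=match["ts"],
        gateway=match["gw"],
        monitor_ip=match["ip"],
        state=match["state"],
        latency_ms=int(match["lat"]) / 1000,   # microseconds to milliseconds
        stddev_ms=int(match["sd"]) / 1000,
        loss_pct=int(match["loss"]),
    )
```

Filtering the parsed events for `state == "Alarm"` gives a short list of timestamps to compare against the moments a connection dropped on the PC.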



  • @XX302 said in Short disconnects multiple times per day:

    no idea where they are

    Like this (no need to involve imgur - just paste your image here) :

    111dbd4e-8427-4b30-9291-05293d69207d-image.png

    The "5.196.43.182" is an IPv4 I "own", on a server I "own" so no ICMP tricks on that side ;)

    @XX302 said in Short disconnects multiple times per day:

    Grade C

    Ok, please queue up here and put in those rules in place that bring that score to an A+. Include the special "ICMP handling".


  • Netgate Administrator

    @XX302 said in Short disconnects multiple times per day:

    I only see the last 50 in the GUI,

    You can change how many log lines are displayed for each log or globally in the settings.

    Steve



  • 89476e54-e607-48d5-a21f-2a3dd81fbd25-grafik.png
    neat.

    put in those rules in place that bring that score to an A+. Include the special "ICMP handling".

    ...what rules ?
    You linked me a sub forum, do you want me to create a new thread or something ?

    You can change how many log lines are displayed for each log or globally in the settings.

    changed, thx



  • @XX302 said in Short disconnects multiple times per day:

    do you want me to create a new thread or something ?

    Nope.
    No need to ask for something. Just reading ^^
    The 'how to remove bloat' is already discussed over there. Look it up and implement it.


  • Netgate Administrator

    Something like the Limiters defined here:
    https://forum.netgate.com/post/807490

    There are a number of posts in that thread detailing similar arrangements.

    Steve

