Troubleshooting WAN latency
-
I would likely agree that in modem-only mode the page may not be visible unless it's required, which would typically be for other reasons anyway. However, it does respond to pings very sporadically, so it's not completely offline, so to speak. Also, as noted earlier, if I disconnect the modem and then power it back on, I can both ping and access the modem management page for a few hours thereafter while still retaining internet access.
I went with your suggestion: I removed the coax cable from the back of the modem while running a ping to the modem-only-mode IP. I can confirm that the modem did NOT return. I did lose internet, and Pf showed as offline with packet loss as expected, but the modem page did not return.
Reconnecting the internet as above did grant me access to the modem IP and web page once more. It only lasted about half an hour this time, but I was able to get in and take screenshots; nothing shows as unusual, and all connections look good and strong. While writing this, the ping to my modem IP has also returned and is a solid 1-2 ms or so.
I am, however, less concerned about gaining access to the modem management page than I am about troubleshooting why I have latency and why it's almost every hour.
It's not unbearable, but it's also frustrating.
I will report this to my ISP for completeness, but every hour +/- 10 minutes seems too regular for it to be an external factor, and in the 22 years I've been with them I've only ever had 2, maybe 3, issues, all of which disconnected me completely as they were cabinet faults (outside).
Open to any other suggestions, of course. I will also build a clean Pf box alongside this one to see if I can repeat the issue; if I cannot, this will confirm something is going on within Pf - right?
-
@Rod-It Most probably it's an ISP issue.
You are reporting packet loss, not just latency, which is much more severe.
I highly doubt it has anything to do with pf.
Most probably, as traffic patterns changed massively due to work-from-home loads, there is congestion somewhere. You could hook up a laptop directly to the modem and run some pings for a few hours.
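For the ping run, something along these lines would do. It is only a minimal sketch: it assumes a Linux, macOS or BSD laptop where the system `ping` accepts `-c 1`, and the function names, the CSV file name and the 2-second timeout are my own choices for illustration, not part of any particular tool.

```python
import re
import subprocess
import time
from datetime import datetime

# Matches the "time=1.52 ms" (or "time<1 ms") field of a ping reply line.
RTT_PATTERN = re.compile(r"time[=<]([\d.]+)\s*ms")


def parse_rtt_ms(ping_output):
    """Return the round-trip time in ms from ping output, or None if no reply."""
    match = RTT_PATTERN.search(ping_output)
    return float(match.group(1)) if match else None


def ping_once(host, timeout_s=2):
    """Send one ICMP echo with the system ping; None means lost or timed out."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    return parse_rtt_ms(result.stdout)


def log_pings(host, count, interval_s=1.0, logfile="ping_log.csv"):
    """Append 'timestamp,rtt_ms' rows to logfile; a lost ping is logged as 'lost'."""
    with open(logfile, "a") as log:
        for _ in range(count):
            rtt = ping_once(host)
            stamp = datetime.now().isoformat(timespec="seconds")
            log.write(f"{stamp},{rtt if rtt is not None else 'lost'}\n")
            log.flush()  # keep the file current if the run is interrupted
            time.sleep(interval_s)
```

Calling `log_pings(host, count=7200, interval_s=1.0)` with the address of the ISP gateway or another stable target gives roughly two hours of samples to compare against the same run taken behind pf.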
-
Just an FYI: the ISP is investigating, but this may take some time, as nothing is showing as a fault at present and I've had to log it on their forum because it's not a critical or loss-of-service fault.
Thanks for chiming in.
I'm still fairly new to networking in general and to troubleshooting stuff like this. I deal mostly with servers and virtualization but need to understand networking more generally, so all advice and guidance is well received.
-
@Rod-It Keep in mind that ISPs make money by overselling installed capacity. The Covid situation has invalidated many known, statistically safe patterns, and ISPs struggle to balance traffic with upgrades.
Any decent ISP has monitoring capabilities in place and can see where and when congestion occurs. If a report comes in, they will have a look and in most cases silently adjust things, even though they will never say more than just "fixed".
Now, if you really want to make sure your infrastructure is OK, you need a box (laptop/desktop) that can be moved around and can run iperf3.
iperf3 creates traffic that saturates lines and will reveal lurking issues.
First you run iperf3 on your LAN, with just the Ethernet switch.
Thus a baseline is established. You should expect near-gigabit speeds there.
Then you run the same test through the pf LAN interface, and finally you move the probe to the WAN side of pf.
You can leave it running for some time and observe CPU load, interface errors, and packet loss.
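To make the three runs comparable, the iperf3 client can be asked for JSON with `-J` and the summary pulled out with a few lines of Python. This is a rough sketch, assuming an iperf3 server (`iperf3 -s`) is already running on the far side of each test. The helper names are made up, and the JSON field names are the ones iperf3 emits for a TCP test as far as I know, so check them against your version.

```python
import json
import subprocess


def summarize_iperf3(report):
    """Extract received throughput (Mbit/s) and sender retransmits from a report.

    'retransmits' is only reported for TCP on some platforms, so it may be None.
    """
    end = report["end"]
    received_bps = end["sum_received"]["bits_per_second"]
    retransmits = end["sum_sent"].get("retransmits")
    return {"mbps": round(received_bps / 1e6, 1), "retransmits": retransmits}


def run_iperf3(server, seconds=30):
    """Run 'iperf3 -c <server> -t <seconds> -J' and return the parsed JSON report."""
    result = subprocess.run(
        ["iperf3", "-c", server, "-t", str(seconds), "-J"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)
```

Running `summarize_iperf3(run_iperf3(server))` once per position (switch only, through the pf LAN, WAN side of pf) gives three small summaries that can be compared side by side.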
Much easier than recreating pf from scratch too.
-
Again I agree with you about over saturation and Covid-19 having an impact - that's a given, especially as more and more people work from home or watch Netflix etc to pass the time.
I am happy to also put it down to this, but the timings are too regular. I've been watching the logs on and off all day, and it seems to happen every hour at about 15-20 minutes past, not always disconnecting me, but generating latency.
The fact it's so regular would suggest this is not a saturation issue, but that something else is happening.
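One way to check whether the spikes really cluster at the same minutes past the hour is to bucket them. A minimal sketch, assuming the ping results are available as (ISO timestamp, RTT in ms) pairs with `None` for a lost reply; the function name and the 50 ms threshold are arbitrary choices for illustration.

```python
from collections import Counter
from datetime import datetime


def spike_minutes(samples, threshold_ms=50.0):
    """Count latency spikes per minute-past-the-hour.

    samples: iterable of (ISO 8601 timestamp string, RTT in ms or None if lost).
    A sample counts as a spike when it was lost or its RTT exceeds threshold_ms.
    """
    counts = Counter()
    for stamp, rtt in samples:
        if rtt is None or rtt > threshold_ms:
            counts[datetime.fromisoformat(stamp).minute] += 1
    return counts
```

If most of the counts land in a narrow band such as minutes 15 to 20, that points at something scheduled rather than general congestion, which would more likely be spread across the hour.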
https://www.thinkbroadband.com/broadband/monitoring/quality/share/d4f84708f295bf0e6f70b9e8dc9486e001bd6167
Here is what I started capturing earlier today. Ignore the red block - Dumbo here forgot to allow WAN pings.
I am also aware my ISP uses Intel's Puma 6 chipset in my router, another reason I put it in modem-only mode. Their next revision uses the Puma 7, which is also not great but offers gigabit speeds.
I also noticed my modem is only showing 2 of 4 upstream channels. I've posted that on the other forum too, and this may be the cause, but I'll have to wait and see.
The LAN side of things is not affected. I can happily push gigabit over the network all day long without issues, Wi-Fi is fine and stable, and none of the VLANs show issues. It's only the WAN, and seemingly every hour, roughly 15-20 minutes past each time.
As an FYI, the WAN and LAN are pretty quiet today as there is only me in, and for the most part I only have a couple of pings running to a few devices and YT in the background to keep me from falling asleep.
WAN is showing 2Mbps currently and the biggest other VLAN in use is 5k - a fairly quiet, typical day.
-
@Rod-It Still it could be congestion upstream.
From what you say there is nothing in your system to cause this.
However, traffic in the neighborhood can.
-
Absolutely, but not so frequently and at the same times.
All I can do now is wait for the ISP techs to look at my router logs and determine if I have a problem. From what I read, if the QAM is not 64 (for example if it's 32 or 16) or the channels are not fully populated (in my case 2 upstream instead of 4, and 24 downstream), then this could cause sporadic issues. In a way I hope it's the ISP; at least then I can stop diagnosing a fault I cannot control.
Not having fully connected channels has more to do with SNR and less to do with congestion. Either way, I appreciate you sticking with me.
-
@Rod-It My practical knowledge of DOCSIS is worse than yours, so we have to wait for the ISP.
Apart from that, iperf testing can verify everything else but the WAN link.
-
I guess the other side to this is: had I not been working from home, would I ever have known there was an issue?
I pay them a monthly fee, they can earn it this month.
-
What brand and model modem are you using?
Who is the ISP?
-
SuperHub3 (Arris VMDG505) provided by Virgin Media UK.
It would not be my choice, but we don't get a choice; it's take it or leave it.
I still have my old SH2, but it does not support the speed I am paying for, and the SH4 is only available to gigabit customers, which I cannot be at this time as it's not in my area (even if I wanted to be on it).
-
So, I can confirm this is NOT an ISP issue: if I remove PfSense from the mix and connect directly to the router, I do not have this issue.
This is last night's BQM, directly connected to a laptop:
Ironically, this is this morning's, back on PfSense:
Before the red block is yesterday on PfSense; the red block is when I had the device connected directly to a laptop (screen above); and the content after the red block is back on PfSense. It has yet to have a repeat spike like it was. I believe the initial spike was devices all connecting back online for the first time.
PfSense has not been rebooted; all I did was unplug the WAN cable and put it into another device. I already did this a week ago, before I started with the BQMs, and this did not happen.
Hopefully it's gone away, but posting to keep all informed.
Not sure I understand why this was the case if it has now gone away, so any other ways to troubleshoot the cause in the future would be welcome.
Also note I am now continually able to ping and connect to my modem, whereas before this I was not. I wonder if this has some bearing on the sporadic spikes.
-
Two things to double-check:
MTU
Bufferbloat
-
MTU is fine. Automatically set. 1500.
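For reference, a 1500-byte MTU can be checked end to end by pinging with the don't-fragment bit set and the largest payload that fits. A small sketch of the arithmetic, assuming IPv4 without options; the helper names are hypothetical, and the flags are those of Linux iputils `ping` (`-M do`) and FreeBSD `ping` (`-D`).

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def df_ping_command(host, mtu, platform="linux"):
    """Build a ping command that sets don't-fragment and fills the MTU exactly."""
    size = str(max_ping_payload(mtu))
    if platform == "linux":
        return ["ping", "-M", "do", "-c", "3", "-s", size, host]
    if platform == "freebsd":
        return ["ping", "-D", "-c", "3", "-s", size, host]
    raise ValueError(f"unsupported platform: {platform}")
```

For an MTU of 1500 the payload works out to 1472 bytes. If that size gets replies and 1473 does not, the path MTU is a full 1500.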
Bufferbloat report
-
I do appreciate the replies though and things to look for.
I got a spike, as expected, during the test, but since this morning I am not suffering the same continual latencies I was for the last few weeks/month, and looking at my live graph above, it's been much more settled, so I am lost as to what was causing it.
-
@Rod-It said in Troubleshooting WAN latency:
VMDG505
So this is a Puma 6 powered modem, and the spikes you are seeing are detailed here:
http://badmodems.com
There are a few issues with the Intel chip that will never be fixed. Some have been patched, but the patches will never be applied to many modems.
If your ISP says they are unaware of this issue, then they are lying.
UDP traffic will cause all kinds of havoc: video, VoIP calls, gaming, etc.
Try this tool: http://www.dslreports.com/tools/puma6 Run this test on something other than Firefox. That browser seems to have its own issues lately.
-
My ISP is aware of this, but there is nothing to replace it with. We are not allowed to use our own modems, so it's this or drop to a lower tier.
The modem I have is capable of up to 500Mbits; beyond that they offer you an SH4 (SuperHub 4) capable of gigabit, but that is a) not in my area yet and b) runs on the Puma 7, which also has this latency issue, although a slightly better CPU masks a lot of it. Likewise, I still have no choice in what I use for the modem piece.
Today has been a lot better, and I can't understand why. I'm actually using the internet more today than yesterday, but I've had maybe 5 spikes, none of which have disconnected me, and of course some lag and latency spikes I expect.
I've already run that test, as well as many others, and I am currently reading up on SQM.
-
Well, I know my replies have been lacking here and I hate leaving topics unresolved, so here is some further information that might help others.
As it stands, there are two known issues.
1). Not causing my latencies or packet loss, but still needs fixing - my upload channels. I have 2 bonded channels and should have 4; the ISP is investigating the cause of this. The problem this causes me is two-fold: first, I cannot reach my 36Mbps upload speed because I only have half the channels bonded; second, this in turn causes an unstable connection during high uploads. I have temporarily fixed this by putting a limiter on my WAN uploads.
2). The ISP's router - the SuperHub 3 apparently has a bug in the FW which causes high latencies and random spikes (it seems to have been rolling out since Feb/March), but only when used in modem-only mode, and it is mostly noticeable with PfSense (since this is a common choice for enthusiasts). I'm linking a topic relating to others and this issue. While this is likely to be a router-specific issue, PfSense or BSD doesn't appear to help with it: if you leave the ISP modem in modem-only mode and use another firewall, even a consumer-grade router, the issue doesn't appear to exist, or at least not as often. Changing the router back to router mode and double-NATting also doesn't show the symptoms, but this is not really a workable solution.
So, back to my original question: is there something or somewhere in PF where I can log anything else that would be useful to anyone, to see if the OS or firewall itself is adding to this, perhaps a driver issue? (I am guessing, but trying to offer help at the same time.)
Side note - last night I rebooted my router and I've yet to have the same issues. Based on posts in the link below, this can sometimes hold true for up to about a week. This suggests the modem is getting full and unable to clear itself. In modem-only mode there are limited options; in fact, after about an hour, I lose access to it completely on the modem IP.
While I am more convinced after these additional findings that this is all related to a FW issue in the router, is there anything else I can do or capture to see if anything in PF or BSD is related to this?
I've read a few articles that say this issue does not seem to be in 2.5, but I'll have to try and relocate those links again.
A link to one of the many threads on the ISP's FW bug, for those interested:
https://community.virginmedia.com/t5/Networking-and-WiFi/Did-anything-change-with-Superhub-3-0-firmware-recently/td-p/4192831
Post 4 in the above thread links to multiple other articles.
For anyone reading in the UK, this seems to affect the SH3 with FW 9.1.1811.401, and possibly the SH4 for those on gigabit.
The SH3 and SH4 use Intel Puma 6 and 7 chipsets respectively, which have their own known issues. Unfortunately, in the UK we are restricted to the kit supplied by the ISP and have no option to buy our own modems.
If nothing else, I wanted to keep people informed of the progress on my issues.