DNS 8000+ms, troubleshooting help
-
@srytryagn said in DNS 8000+ms, troubleshooting help:
Load on pfsense, in what way. CPU 3%, Mem 3%, bandwidth 25Mb/s of 1000Mb/s. Seems underutilized.
I mean the relative load, so does it increase when you enable the app? Either the CPU load or the traffic it's seeing?
Check the monitoring graphs in Status > Monitoring.
If all LAN side clients are seeing latency of 8000ms to all external IPs that pretty much has to be some huge traffic load on the router or maybe a switch etc.
-
@stephenw10 said in DNS 8000+ms, troubleshooting help:
maybe a switch etc.
yeah quite possible one of the apps is just flooding the network with garbage, but pfsense itself is not processing this garbage..
Really be curious to see the sniff on lan side interface of pfsense when you turn this thing on and see the latency..
Your network 2 you show there, is there just 1 box connected directly to a pfsense interface, or does that go through the same switch? If network 1 was flooded with garbage that pfsense was not processing, then network 2 should really see no effects if it was your switch on network 1 that was overloaded, say by a loop or something.
Rather than looking at just cpu, mem etc... do your states, say, skyrocket when you turn this device on? mbufs?
-
To answer your questions: 1) Network2 is not on the same unmanaged switch, it is an independent LAN, and yes, it grinds to a halt when AppB runs with AppA on a different machine in Network1. 2) States have a lot more entries since I connect w/ 30 peers.
Thanks for suggesting that tool to hide the public IP. I could also just dump the capture into an Excel sheet, replace the IP w/ 1.2.3.4 and share that. Is there any other critical information I should keep out?
What kind of pcap shall I post that would be most helpful for sorting out what is going on ?
Should I run pcaps with no app, AppA, AppA + AppB ?
Please let me know the specific options and interfaces, IP, Ports etc...
Really looking forward to finding a solution and thanks for helping !
-
@srytryagn A sniff on your lan for everything without your stuff running, and then again with it running, would be good to start with for sure.. The capture defaults to 1000 packets, which should be enough to see if the network is being flooded with stuff.
The tool would be much better for changing your public IP.. And you can also remove the payload if you want and truncate the frames.. We really don't need to see what the data is - but it is helpful to see all the protocols and such.. A spreadsheet would be much more difficult to look at than just opening it in wireshark.
You can look at the anon file it creates to make sure your public IP has been changed, etc.
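One rough way to double-check the anonymized file before posting it, as a minimal sketch only: search the raw bytes of the file for your real public IP, both in the packed 4-byte form used in IPv4 headers and as plain text. The function name is made up for illustration and this is not part of the anonymizing tool. A byte match can be a false positive (four unrelated bytes that happen to line up), so treat a hit as a reason to look closer in wireshark, not as proof.

```python
import socket

def ip_still_present(pcap_path, public_ip):
    """Return True if public_ip appears in the file as raw bytes or ASCII text.

    IPv4 only. Works on the raw file, so the capture format does not matter.
    """
    with open(pcap_path, "rb") as f:
        data = f.read()
    packed = socket.inet_aton(public_ip)   # 4-byte form found in IPv4 headers
    as_text = public_ip.encode("ascii")    # text form, e.g. inside a payload
    return packed in data or as_text in data
```

Usage would look like `ip_still_present("pcap_apps_off_anon.pcap", "203.0.113.7")`, where both the file name and the address are placeholders for your own.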
The tool is pretty good for anonymizing the traffic while still letting us see what is talking to what.. For example 1.2.3.4, while clearly made up, is my wan - so with some insight into what you changed to what, we can get the details we need of what is trying to talk to what, etc.
States and mbuf are one thing - but do they jump up to insane amounts - I mean 30 peers isn't crazy, unless they are creating like 10k states each to each other, etc.
-
@johnpoz Will do that and post it here soon.
-
@johnpoz File enclosed with No apps and AppA + AppB running.
I have a bunch of other captures: no port fwd, port fwd, start AppA, Sync AppA, Running AppA, AppA + AppB.....
Please let me know what the file points to being wrong w/ my setup and how I may repair it, thanks.
-
@srytryagn could you give some insight to what IPs you changed to what..
In your one with stuff off I see 993 packets in a total of 228 ms.. between 10.50.18.154 and 10.50.245.28, that is a lot of packets between 2 devices in a short amount of time.. For what I would assume is a network not doing anything.. Is one of those IPs your public, or one of these node devices.. Thought you said they were off?
Then in this other one with stuff on, it's hard to follow because there are 166 different conversations in it, all with this 192.168.86.26. What is this 192.168.86.26? And all that in a total sniff of 1.4 seconds?
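To put those two captures on the same scale, here is a small sketch of the average packet rate. The 993 packets / 228 ms figures come from the post above. The packet count for the 1.4 second capture was not stated, so the 1000 used below is an assumption (the capture default mentioned earlier in the thread).

```python
def packets_per_second(packet_count, duration_seconds):
    """Average packet rate over a capture."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return packet_count / duration_seconds

# Apps-off capture: 993 packets in 228 ms (figures quoted in the thread).
apps_off = packets_per_second(993, 0.228)

# Apps-on capture: 1.4 s long. The count of 1000 is ASSUMED (capture default);
# the real number was not given.
apps_on = packets_per_second(1000, 1.4)

print(f"apps off: {apps_off:.0f} pkt/s, apps on: {apps_on:.0f} pkt/s")
# prints "apps off: 4355 pkt/s, apps on: 714 pkt/s"
```

Either way both windows are very short, so an average like this only says something about the moment the capture happened to cover.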
-
In Apps off mode:
10.50.18.154 = PC on network 2
10.50.245.28 = KVM over IP connected to PC above
In Apps on mode:
192.168.86.26 = PC on network 1 running the apps
-
@srytryagn said in DNS 8000+ms, troubleshooting help:
10.50.245.28 = KVM over IP connected to PC above
Well that is really the only traffic in that sniff.. So that sniff is pretty useless..
So you're saying that 2nd pcap is from when your network is dead?
-
labelled -> pcap_AppA_AppB_anon.pcap = Apps on internet DEAD
labelled -> pcap_apps_off_Port_Fwd_anon.pcap = Apps off, internet and everything else working normal.
Should I do a longer pcap with a particular configuration to make it more useful for analysis ?
-
@srytryagn well in your sniff where you said stuff is broke, I see nothing but this 192.168.86.26 talking to a bunch of stuff.. I don't see anything else.. So it's hard to say whether dns was delayed or arps failed or there were lots of retrans.. There are a few retrans, but nothing out of the ordinary..
Looking at that sniff I don't see anything wrong at all.. But then again there isn't much other traffic.. And the small amount there is, it's in the middle of something and I don't see any problems.. No retrans for example.. If your network had huge delays on it, you would see loads and loads of retrans when something didn't get an answer fast enough, etc..
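As a rough illustration of what "loads of retrans" means in a capture: a retransmission shows up as the same TCP data segment (same endpoints, same sequence number, non-empty payload) appearing more than once. A minimal sketch, assuming the packets have already been exported to simple tuples; the tuple layout and the function name are made up for illustration. Wireshark's own analysis (the `tcp.analysis.retransmission` display filter) is far more thorough than this.

```python
from collections import Counter

def count_retransmissions(segments):
    """Count repeated TCP data segments.

    segments: iterable of (src, sport, dst, dport, seq, payload_len) tuples.
    """
    seen = Counter()
    retrans = 0
    for src, sport, dst, dport, seq, payload_len in segments:
        if payload_len == 0:
            continue  # pure ACKs carry no data and repeat legitimately
        key = (src, sport, dst, dport, seq)
        if seen[key]:
            retrans += 1  # this data segment was already seen once
        seen[key] += 1
    return retrans

# Made-up sample: the segment with seq=1000 is sent twice.
sample = [
    ("192.168.86.26", 50000, "1.2.3.4", 443, 1000, 512),
    ("1.2.3.4", 443, "192.168.86.26", 50000, 7000, 0),
    ("192.168.86.26", 50000, "1.2.3.4", 443, 1000, 512),
    ("192.168.86.26", 50000, "1.2.3.4", 443, 1512, 512),
]
print(count_retransmissions(sample))
```

On a network with multi-second delays you would expect that count to be a large share of the data segments, not a handful.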
-
@johnpoz Yup, I can't sort out what is going wrong. Many folks have confirmed being able to run the apps without any issue.
What do you reckon ?
-
Seems very likely one or both 'apps' are misconfigured and flooding traffic that should never leave the host IMO.
-
@srytryagn said in DNS 8000+ms, troubleshooting help:
What do you reckon ?
Can't reckon anything from those sniffs.. One is just kvm traffic, and the other is an IP talking to lots of other stuff, but nothing that is an insane amount of traffic.. no errors seen, no retrans seen that are of any significance.. Don't even see any other traffic, no broadcast, no flood of multicast. Then again it is only 1.4 seconds worth of data.. So wouldn't expect to see anything - unless there was some sort of flood of traffic..
Maybe a longer sniff - while you're trying to do the stuff that you say is failing.
-
@johnpoz Sure will run a longer sniff and will try to connect to some websites that will fail.
Any point in sniffing the Wan side or anything else while I am working on this ?
-
@srytryagn not really - you say your devices when turned on are what cause the problem.. So unless, when your devices phone home, they trigger a volumetric ddos attack against your wan? The traffic has to go through your lan to get anywhere.
Guess that is a possibility, and if all the traffic was just dropped it wouldn't show really on pfsense as anything.. But your internet could become crazy slow..
I guess if you're connecting to some sort of swarm or something - and you get bombed from 100s or thousands of devices trying to talk to your wan? And they are bombing you with large UDP packets or something?
Couldn't hurt to see normal traffic flow on your wan, and then sniff after you turn on your stuff and you say stuff fails/slow, etc.
-
If that is the case then DNS and/or ping response from pfSense itself to something external would also be affected. I think we asked about that but I may have missed the reply.
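For putting a number on the DNS side of that, here is a minimal sketch that times a single name lookup from whichever host it runs on. It uses the system resolver, so a local cache can hide the delay, and it measures what that host sees, not pfSense itself unless it is actually run there (whether Python is available on the firewall is an assumption not to rely on). The function name is made up for illustration.

```python
import socket
import time

def time_lookup(hostname, resolve=socket.getaddrinfo):
    """Return (seconds, ok) for one name lookup via the system resolver."""
    start = time.perf_counter()
    try:
        resolve(hostname, None)
        ok = True
    except OSError:  # socket.gaierror is a subclass of OSError
        ok = False
    return time.perf_counter() - start, ok
```

Calling `time_lookup("example.com")` once with the apps off and once with them on, from a LAN client, gives two figures to compare against the 8000+ms in the title.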
-
@johnpoz pcap enclosed w/ a couple of attempts at reaching websites (they all failed). extralongpcap_anon.zip
Please let me know which IP mappings/descriptions would be helpful to know, if any.
-
@stephenw10 Can I ping from the console in web gui to test that ?