diagnose stuttering performance
-
Hi all, looking for a few pointers on how to diagnose stuttering pfSense performance.
I semi-frequently get a slight delay when opening new webpages during browsing sessions.
Less frequently (it happened three times today), I experience a longer delay that disconnects me from Google Hangouts. During this time I am unable to access or refresh the pfSense router webgui, I am unable to ssh to the router, and any ssh sessions to a different VLAN are also disrupted.
I didn't notice these issues previously, when I had a far simpler flat network. I haven't seen anything in the log files to help direct my efforts. I did suspect slow DNS lookups were responsible for the frequent web delays, but that wouldn't explain the firewall becoming unresponsive or the disruption to active sessions.
The machine is well specced and doesn't seem to be overloaded: i3-7100U with 8GB RAM.
spec:
Intel(R) Core(TM) i3-7100U CPU @ 2.40GHz
Current: 2300 MHz, Max: 2401 MHz
4 CPUs: 1 package(s) x 2 core(s) x 2 hardware threads
AES-NI CPU Crypto: Yes (active)
last pid: 43482; load averages: 0.38, 0.30, 0.33 up 0+04:35:21 17:38:31
83 processes: 1 running, 82 sleeping
CPU: 1.0% user, 0.0% nice, 0.1% system, 0.5% interrupt, 98.4% idle
Mem: 434M Active, 1001M Inact, 1182M Wired, 774M Buf, 5191M Free
Swap: 4096M Total, 4096M Free
The pfSense box has 6 NICs, configured as follows:
1: empty
2: WAN (EchoLife HG8245Q2, 400Mbps up/down)
3: L2 Switch (TL-SG116E)
4: empty
5: IOT (bridged with a VLAN to create an IOT bridge)
6: Lab (not currently powered up)
The L2 switch is configured as follows:
1: trunk (AP)
2: trunk (AP)
3: access (vlan 50)
4: access (vlan 50)
5: access (vlan 50)
6: access (vlan 50)
7: access (vlan 50)
8: trunk (vmware ESX)
9: access (vlan 30)
10:
11: access (vlan 10)
12: access (vlan 10)
13: access (vlan 10)
14: access (vlan 10)
15: management (vlan 99)
16: uplink to pfsense
The APs carry a number of VLANs: 20, 40, 42, 44, 70.
pfSense is the DHCP server for each VLAN. It is also the DNS resolver on each interface. I block outbound DNS to prevent my Amazon devices from ignoring DHCP and trying to use Google DNS (8.8.8.8).
My UniFi dashboard reports a few issues that may or may not be related, in case they help as a steer...
Association Timeout: 0
WPA Authentication Timeout/Failure: 17
Blocked by access control: 0
DHCP Timeout/Failure: 15
245 Anomalies over 32 devices
4:00pm Client is having trouble resolving a domain name to an IP address (DNS timeout) for last 5 hours
4:00pm High TCP latency for Client for last 24 hours
4:00pm High TCP latency for Client for last 4 hours
4:00pm High TCP latency for Client for last 4 hours
4:00pm High TCP latency for Client for last 24 hours
4:00pm High DNS latency for Client
4:00pm Client is having trouble resolving a domain name to an IP address (DNS timeout) for last 14 hours
4:00pm Client is having trouble resolving a domain name to an IP address (DNS timeout)
4:00pm High TCP latency for Client for last 4 hours
4:00pm High TCP latency for Client for last 24 hours
4:00pm High TCP latency for Client for last 24 hours
4:00pm High WiFi retries for Client for last 2 hours
4:00pm High DNS latency for Client for last 6 hours
4:00pm High DNS latency for Client for last 2 hours
4:00pm Client is having trouble resolving a domain name to an IP address (DNS timeout) for last 4 hours
4:00pm High TCP latency for Client for last 3 hours
4:00pm High DNS latency for Client
4:00pm High TCP latency for Client for last 24 hours
4:00pm High DNS latency for Client for last 9 hours
4:00pm Client is having trouble resolving a domain name to an IP address (DNS timeout) for last 4 hours
4:00pm Client is having trouble resolving a domain name to an IP address (DNS timeout) for last 2 hours
4:00pm Client is having trouble resolving a domain name to an IP address (DNS timeout)
4:00pm High DNS latency for Client
4:00pm High TCP latency for Client for last 4 hours
4:00pm High TCP latency for Client for last 2 hours
4:00pm High TCP latency for Client
4:00pm High DNS latency for Client for last 2 hours
4:00pm High TCP latency for Client for last 24 hours
4:00pm High TCP latency for Client
4:00pm High DNS latency for Client
4:00pm High DNS latency for Client for last 3 hours
4:00pm High TCP latency for Client for last 24 hours
4:00pm Client is having trouble resolving a domain name to an IP address (DNS timeout)
4:00pm Client is having trouble resolving a domain name to an IP address (DNS timeout)
4:00pm Client is having trouble resolving a domain name to an IP address (DNS timeout)
4:00pm High TCP latency for Client for last 4 hours
4:00pm High DNS latency for Client for last 4 hours
4:00pm High DNS latency for Client
4:00pm High TCP latency for Client for last 24 hours
4:00pm High TCP latency for Client for last 24 hours
3:00pm Low signal strength for Client for last 2 hours
3:00pm High DNS latency for Client for last 7 hours
3:00pm Client is having trouble resolving a domain name to an IP address (DNS timeout)
3:00pm High TCP latency for Client for last 5 hours
3:00pm High WiFi latency for Client
3:00pm High DNS latency for Client
3:00pm High WiFi retries for Client
3:00pm High WiFi retries for Client
2:00pm High DNS latency for Client
2:00pm High DNS latency for Client
2:00pm Client is having trouble resolving a domain name to an IP address (DNS timeout) for last 7 hours
2:00pm Client is having trouble resolving a domain name to an IP address (DNS timeout) for last 2 hours
2:00pm Client is having trouble resolving a domain name to an IP address (DNS timeout) for last 2 hours
2:00pm High WiFi retries for Client
2:00pm Client is having trouble resolving a domain name to an IP address (DNS timeout) for last 2 hours
2:00pm High DNS latency for Client for last 5 hours
2:00pm High TCP latency for Client for last 5 hours
2:00pm High TCP latency for Client for last 3 hours
2:00pm High DNS latency for Client
1:00pm Client is having trouble obtaining an IP via DHCP
1:00pm Client is having trouble obtaining an IP via DHCP
1:00pm High DNS latency for Client
1:00pm High TCP latency for Client
1:00pm High DNS latency for Client
1:00pm Client is having trouble resolving a domain name to an IP address (DNS timeout)
1:00pm High DNS latency for Client
1:00pm Client is having trouble resolving a domain name to an IP address (DNS timeout)
1:00pm High WiFi latency for Client for last 2 hours
1:00pm Client is having trouble obtaining an IP via DHCP
1:00pm Client is having trouble resolving a domain name to an IP address (DNS timeout)
1:00pm Client is having trouble resolving a domain name to an IP address (DNS timeout) for last 7 hours
1:00pm High DNS latency for Client for last 2 hours
1:00pm High DNS latency for Client
1:00pm Client is having trouble obtaining an IP via DHCP
1:00pm Client is having trouble obtaining an IP via DHCP
1:00pm Client is having trouble resolving a domain name to an IP address (DNS timeout)
1:00pm High TCP latency for Client
Which log files should I check, or what tests can I run, to diagnose what's causing these problems?
Thanks for reading/assisting
-
Is this happening from one PC or multiple? Are all the devices having the issue connected to the same L2 switch? I see two empty interfaces on pfSense. Connect directly to one of those when the problems are occurring. It might help rule the VLAN setup in or out.
Check the most obvious stuff first. In Status > Monitoring, does the quality graph show any issues such as excessive packet loss or delay? I remember a case a while back where my access to the pfSense webgui got slower when my WAN went down.
-
Yes, it's happening on multiple devices.
And yes, all devices are connected via the same L2 switch (wired and wireless). The L2 switch is new (bought to enable my move to VLANs). I feel like it was initially fine, but I definitely can't rule out the switch.
What makes it hard to diagnose or test (e.g. using other NICs) is that it clears up within 10-20 seconds. I would need to have something running over a long period of time and see if it was affected... perhaps a continuous ping against something like google.com or bbc.com?
What are some valid tests that I can run in order to pinpoint the issue? I can deploy Raspberry Pis on multiple VLANs if necessary.
-
@meem, try changing Firewall Maximum Table Entries to 2000000. Go to System > Advanced > Firewall & NAT >
-
@meem said in diagnose stuttering performance:
I would need to have something running over a long period of time and see if it was affected... perhaps a continuous ping against something like google.com or bbc.com?
There is a built-in tool which already does that for you: Status > Monitoring. Change the left axis to Quality and set the right axis to None. That shows the status of a continuous ping to your default gateway over time.
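If you also want something running from a client on one of the VLANs (the Raspberry Pi idea), here is a rough Python sketch rather than a tested tool. It times a DNS lookup and a TCP connect in a loop and only records the samples that fail or exceed a threshold. The names `probe`, `is_stall` and `run`, port 443 and the 1-second threshold are all my own assumptions, so adjust to taste.

```python
import socket
import time
from datetime import datetime


def probe(host, port=443, timeout=5.0):
    """Return (dns_seconds, connect_seconds); a value is None if that step failed."""
    start = time.monotonic()
    try:
        # First address returned by the system resolver (pfSense, if DHCP hands it out).
        sockaddr = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)[0][4]
    except OSError:
        return None, None
    dns_s = time.monotonic() - start

    start = time.monotonic()
    try:
        with socket.create_connection(sockaddr[:2], timeout=timeout):
            pass
    except OSError:
        return dns_s, None
    return dns_s, time.monotonic() - start


def is_stall(sample, threshold=1.0):
    """A sample counts as a stall if either step failed or took longer than threshold."""
    dns_s, connect_s = sample
    if dns_s is None or connect_s is None:
        return True
    return dns_s > threshold or connect_s > threshold


def run(host, count, interval=2.0, threshold=1.0, probe_fn=probe, sleep=time.sleep):
    """Probe `count` times; print and return a line for every sample that looks like a stall."""
    stalls = []
    for _ in range(count):
        sample = probe_fn(host)
        if is_stall(sample, threshold):
            stamp = datetime.now().isoformat(timespec="seconds")
            line = f"{stamp} {host} dns={sample[0]} connect={sample[1]}"
            print(line, flush=True)
            stalls.append(line)
        sleep(interval)
    return stalls
```

Something like `run("bbc.com", count=43200)` would cover roughly a day at the 2-second interval. Comparing the timestamps it prints against the Status > Monitoring graph should hint at whether a stall is on the WAN side or inside the LAN.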
-
What type of NICs do you have?
-
@AKEGEC said in diagnose stuttering performance:
@meem, try changing Firewall Maximum Table Entries to 2000000. Go to System > Advanced > Firewall & NAT >
Thanks, I've made that change.
@Raffi_ said in diagnose stuttering performance:
@meem said in diagnose stuttering performance:
I would need to have something running over a long period of time and see if it was affected... perhaps a continuous ping against something like google.com or bbc.com?
There is a built-in tool which already does that for you: Status > Monitoring. Change the left axis to Quality and set the right axis to None. That shows the status of a continuous ping to your default gateway over time.
https://i.ibb.co/gJJS1rk/Capture.jpg
@Raffi_ said in diagnose stuttering performance:
What type of NICs do you have?
Well, I thought they were Realtek...
[2.4.5-RELEASE][root@fw.meemsbox.com]/root: pciconf -lv | grep -A1 -B3 network
igb0@pci0:1:0:0: class=0x020000 card=0x00008086 chip=0x15398086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'I211 Gigabit Network Connection'
    class = network
    subclass = ethernet
igb1@pci0:2:0:0: class=0x020000 card=0x00008086 chip=0x15398086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'I211 Gigabit Network Connection'
    class = network
    subclass = ethernet
igb2@pci0:3:0:0: class=0x020000 card=0x00008086 chip=0x15398086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'I211 Gigabit Network Connection'
    class = network
    subclass = ethernet
igb3@pci0:4:0:0: class=0x020000 card=0x00008086 chip=0x15398086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'I211 Gigabit Network Connection'
    class = network
    subclass = ethernet
igb4@pci0:5:0:0: class=0x020000 card=0x00008086 chip=0x15398086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'I211 Gigabit Network Connection'
    class = network
    subclass = ethernet
igb5@pci0:6:0:0: class=0x020000 card=0x00008086 chip=0x15398086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'I211 Gigabit Network Connection'
    class = network
    subclass = ethernet
But apparently they're Intel!
-
Quality charts:
8 hours: https://i.ibb.co/g6nqMNh/8hours.jpg
1 day: https://i.ibb.co/gJJS1rk/Capture.jpg
one month: https://i.ibb.co/s10R93Z/month.jpg
-
That's the right graph, but we don't need to see what it looks like now; we would be more interested in how it looks when you do have an issue. FYI, you can paste an image right alongside the text as you type in this forum.
@meem said in diagnose stuttering performance:
But apparently they're Intel!
That's a plus.
-
@Raffi_ said in diagnose stuttering performance:
That's the right graph, but we don't need to see what it looks like now; we would be more interested in how it looks when you do have an issue. FYI, you can paste an image right alongside the text as you type in this forum.
I had a ~10 second delay on a page loading whilst I've been commenting on this thread.
-
@meem said in diagnose stuttering performance:
I had a ~10 second delay on a page loading whilst I've been commenting on this thread.
The quality graphs over time look fine. That doesn't seem to be the issue.
Are you using the default DNS resolver (unbound)? If so, go to Services > DNS Resolver > General Settings, and make sure the DHCP Registration option is not checked.
Another way to check if this is your issue is to go to Status > System Logs > System > DNS Resolver. Is DNS Resolver restarting when this issue occurs? The option I mentioned above could make the resolver restart every time any device on the network makes a DHCP request.
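If you export that log to a text file, a quick way to see how often it happens is to count matching lines per hour. This is only a sketch: the timestamp layout (`Jun  3 17:01:10 ...`) and the default search string `Sending HUP signal` are assumptions about what the log lines look like, so adjust both to whatever you actually see. `restarts_per_hour` is just a name I made up.

```python
import re
from collections import Counter

# Assumed BSD-syslog style prefix, e.g. "Jun  3 17:01:10 ". Captures month, day and hour.
STAMP = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}):\d{2}:\d{2}\s")


def restarts_per_hour(lines, needle="Sending HUP signal"):
    """Count log lines containing `needle`, bucketed by 'Mon D HH'."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        match = STAMP.match(line)
        if match:
            # Collapse the double space syslog uses for single-digit days.
            bucket = " ".join(match.group(1).split())
            counts[bucket] += 1
    return counts
```

Feeding it `open("resolver.log")` (a hypothetical file name) and printing `sorted(counts.items())` gives one row per hour. A steady count around the clock would point at DHCP-driven restarts rather than anything you are doing interactively.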
-
@Raffi_ said in diagnose stuttering performance:
@meem said in diagnose stuttering performance:
I had a ~10 second delay on a page loading whilst I've been commenting on this thread.
The quality graphs over time look fine. That doesn't seem to be the issue.
Are you using the default DNS resolver (unbound)? If so, go to Services > DNS Resolver > General Settings, and make sure the DHCP Registration option is not checked.
Another way to check if this is your issue is to go to Status > System Logs > System > DNS Resolver. Is DNS Resolver restarting when this issue occurs? The option I mentioned above could make the resolver restart every time any device on the network makes a DHCP request.
I do have it checked, so that I can connect to my services internally by hostname.
I can see that I get 30-40 DNS HUPs per hour. Looking at the settings, I hadn't changed the default lease time for my new VLANs, so I've made that change now: it was at the default (2 hours) and is now 8 hours. Looking at my Splunk logs, I can see that I've been getting 30-40 HUPs per hour, every hour (including throughout the night).
-
@meem said in diagnose stuttering performance:
I can see that I get 30-40 DNS HUPs per hour. Looking at the settings, I hadn't changed the default lease time for my new VLANs, so I've made that change now: it was at the default (2 hours) and is now 8 hours. Looking at my Splunk logs, I can see that I've been getting 30-40 HUPs per hour, every hour (including throughout the night).
That could do it. Hopefully, changing that to 8 hours is enough. I've seen rogue DHCP clients ask for an address every hour regardless of the default setting in pfSense. If changing that is not enough, see if unchecking DHCP Registration helps, just as a test. You then have to decide whether your need to look up hosts by name outweighs having stable DNS, or you can try to track down any remaining rogue DHCP clients on the network that are not following the 8 hour lease time.
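To hunt for those rogue clients, one option is to count DHCP requests per MAC address in an exported DHCP log. The sketch below is hedged: it assumes ISC dhcpd style lines such as `DHCPREQUEST for 192.168.50.23 from aa:bb:cc:dd:ee:ff ... via igb1.50`, and the function names and the `slack` allowance are my own inventions. It relies on clients normally renewing at about half the lease time, so over a given window a well-behaved client should make roughly `window_hours / (lease_hours / 2)` requests.

```python
import re
from collections import Counter

# Assumed ISC dhcpd log shape; the "(server-ip)" part is optional in some lines.
REQUEST = re.compile(
    r"DHCPREQUEST for (\S+)(?: \(\S+\))? from ([0-9A-Fa-f:]{17})"
)


def requests_per_client(lines):
    """Count DHCPREQUEST log lines per client MAC address (lower-cased)."""
    counts = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if match:
            counts[match.group(2).lower()] += 1
    return counts


def frequent_renewers(counts, window_hours, lease_hours, slack=2):
    """Return MACs that requested more often than a half-lease renewal cycle would explain."""
    expected = window_hours / (lease_hours / 2)
    return sorted(mac for mac, n in counts.items() if n > expected + slack)
```

With an 8 hour lease and 24 hours of log, `frequent_renewers(requests_per_client(lines), 24, 8)` would list any MAC with more than 8 requests. Those are the devices worth checking against the DHCP leases page.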