2.4.5 High latency and packet loss, not in a vm

A Former User

Bare metal server. Supermicro 5018D-FN4T. No pfblocker, no limiters or queues. As generic as it can be other than my VLAN's. This happens on every boot. Can not log into the admin interface for 1-2 minutes after the login screen is presented. Latency and packet loss persist for 1-2 minutes after the loging in.

The same issue but to a lessor degree happens (Latency and packet loss) happens every time the filters are reloaded/alias added or edited and persists for 2-3 minutes before settling down.

After filter reload:
Screen Shot 2020-03-29 at 09.33.29.png

and:

Screen Shot 2020-03-29 at 12.00.20.png

After boot:
Boot Latency.jpg

A Former User

Terrible subject line :P
Sounds exactly like what we're all discussing here.

A Former User

Thanks. I've seen that thread, didn't read every post, it's mostly installs in a vm and with pfblocker. pfblocker exacerbates the underlying problem as best I can tell but isn't the issue.

A Former User

@jwj said in 2.4.5 High latency and packet loss, not in a vm:

Thanks. I've seen that thread, didn't read every post, it's mostly installs in a vm and with pfblocker. pfblocker exacerbates the underlying problem as best I can tell but isn't the issue.

It has sadly been taken over with people that think pfBlockerNG is something to do with it. I have exactly the same problem as you've posted, where changes to the platform result in latency and packet loss. It certainly seems to affect vm platforms worse, but you have the same symptoms.
Anyway, the more threads the merrier I guess so that people realise pfBlockerNG isn't the cause (though the rules it applies does seem to help surface the underlying problem)

cmcdonald

I'm seeing the same thing. bare metal and in VMs.

getcom

Hello all,

I experienced similar issues also on bare metal. My conclusion is that it is traffic related. pfBlockerNG is also producing traffic with the lists, DNSBL & Maxmind updates.
There was a netgate patch of pfctl in FreeBSD 11.3 which may has indifferent side effects.
Here are some more details beginning from here: https://forum.netgate.com/post/901257
I catched all reported problems beginning from broken mirror, missing PHP files, high latency on both gateways, high system load, unresponsible console.
I will restore to 2.4.4-P3 tomorrow.

mikekoke

Same problem in a physical box.
When I edit a rule and apply the changes, the latency rises to 300 ms.

asan

A
asan 17 minutes ago

I'm also affected.
HW: SG-4860

If the process pfctl has a 100% peak, ping latency is also very high.

Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=1125ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=1613ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=1190ms TTL=55
Reply from 9.9.9.9: bytes=32 time=5ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55
Reply from 9.9.9.9: bytes=32 time=2ms TTL=55

stephenw10

Try running a packet capture on the WAN when you see this. Filter by pings.
Check to see where the latency is happening. Ping requests delayed sending, delayed responses or somehow delayed within pf before it gets back to the ping process.

Steve

A Former User

Delayed by pf. Pings between vlans see the latency when tables are reloaded.

From one vlan to another:

Screen Shot 2020-04-04 at 16.19.02.png

Derelict

That is not a packet capture.

A Former User

I am aware of that. Standby for a packet capture.

Screen Shot 2020-04-04 at 16.54.05.png

Derelict

If you are not able to test in a way that allows you to post actual pcaps I don't know how much good it is going to do anyone.

It is past the point of trying to convince people this is a problem (in apparently edge cases). Now it's about trying to compile information so it can be identified and corrected.

A Former User

That is a pcap, in wireshark with my public ip blanked out. I would be happy to send you the file if you would like but I'll decline to post it publicly, some knuckle head will just decide to go fishing around at my public ip.

stephenw10

I find adding the 'time difference' and 'response time' columns useful here.

That will show if the request is delayed. And what the actual response time on the wire is. Like:

Derelict

I just don't think this data is very helpful at diagnosing exactly what is happening.

A Former User

@stephenw10 said in 2.4.5 High latency and packet loss, not in a vm:

I see delta time but not response time as column choices. Maybe it would be more expedient for me to send the pcap. I have used wireshark exactly once, this time. :)

OK, I see now. Custom column and then icmp.resptime. Does that make any sense if it's not sorted by the icmp seq number?

A Former User

I hope this is more useful. If not I'll try again.

A Former User

I'll add this to the mix. I changed the average time in the gateway settings. That's the time dpinger averages over. When changing the setting, saving and then applying it the interface locked up for an extended time (minutes).

So, I ssh'd in, ran top and did it again:

Screen Shot 2020-04-04 at 21.04.40.png

I can see dpinger using some resources, but why pfctl, ntpd and sshd? I'm not sure if that means anything, but it sure appears odd to me.

riften

This looks so much like the problem I had, even before PFS 2.45. The symptoms. Latency spikes, then packet loss over and over. I had just created my first VLAN and gave the VLAN interface a static IPV6 in one of the 64s I should have. But no route and this horrible latency and packet drop. I followed the info HERE and created a 'Configuration Override' on the WAN IPV6 and set my VLAN static IPV6 and that was the only way to get darn ATT to route IPV6 from my VLAN. It made it trouble free after I spent almost a week pulling out my hair. So just wondering, can you guys ping (route) from your LAN or from the VLANS in ipv6? I am seeing ipv4 pings but did I miss the ipv6 pings...
I'm on 2.45 with no issues, and am using the latest PFBLOCKERNG. It just looks so familiar...