High ping and packet loss in local network

bullet92

My point being that the latencies will vary largely based on the end point devices and their load. A short spike may just indicate that the end device is under load at the time.
In this case, the devices the OP is pinging may simply be under load at the time (since they are technically routers doing their job).

That might seem correct, but if it were, I would be saturating the 100 Mbit link, or the endpoint devices would be at or near 100% CPU usage. It also can't be the explanation because, when I see this latency from pfSense and ping the same router from another box at the SAME TIME, I get a fast and stable ping time.

@stephenw10:

In your diagram above you seem to have two boxes labelled 192.168.1.254. Typo? Just indicating the subnet? My not understanding your diagram?

Sorry, my error: em1 is 192.168.1.130/24, not 192.168.1.254.

dreamslacker

@bullet92:

That might seem correct, but if it were, I would be saturating the 100 Mbit link, or the endpoint devices would be at or near 100% CPU usage. It also can't be the explanation because, when I see this latency from pfSense and ping the same router from another box at the SAME TIME, I get a fast and stable ping time.

Is this machine on the same network as pfSense?

Presumably, pfSense is pinging the said router on its 'LAN' or 'DMZ' interface. Is the machine you're using also attached to the same interface or a different interface?

BTW, do you have traffic shaping or QOS enabled on pfSense or the target router(s)?

johnpoz

"It's really normal to see <0.5ms pings on a local network run"

Agreed. 0.4 to 0.5 ms is very common; I see it all the time on the LAN and expect it. It's just 0.1 to 0.2 ms that I don't see. I wouldn't call our switches overworked or anything, but the only time I recall seeing such low numbers is when pinging locally.

When I'm at work tomorrow I'm going to ping around the datacenter and see what kind of low times I can find ;)

bullet92

@dreamslacker:

Is this machine on the same network as pfSense?
Presumably, pfSense is pinging the said router on its 'LAN' or 'DMZ' interface. Is the machine you're using also attached to the same interface or a different interface?
BTW, do you have traffic shaping or QOS enabled on pfSense or the target router(s)?

Yes, I've put this machine on the same switch (so the same router interface) as pfSense. Yes, traffic shaping is currently enabled on pfSense and disabled on the target routers.

dreamslacker

@bullet92:

@dreamslacker:

Is this machine on the same network as pfSense?
Presumably, pfSense is pinging the said router on its 'LAN' or 'DMZ' interface. Is the machine you're using also attached to the same interface or a different interface?
BTW, do you have traffic shaping or QOS enabled on pfSense or the target router(s)?

Yes, I've put this machine on the same switch (so the same router interface) as pfSense. Yes, traffic shaping is currently enabled on pfSense and disabled on the target routers.

Try prioritizing icmp using the floating rules and see what you get. In practical terms, it does little but if it works then you know what to do to get the best effect on your setup.

dreamslacker

@johnpoz: Don't bother doing so on my account. Lol. I do see 0.2 ms round trips on wired connections now and then during line testing, but this is with fully idle systems, i.e. an idling system pinging a smart switch on the other end without any other connected devices.
Nevertheless, a lightly loaded system should still give 0.4-0.5 ms pings.

johnpoz

I'm talking about pinging, say, the console or mgmt switches from another switch. Neither would be very active; idling and sucking juice is about all they would be doing. This is in a DC, where the run might be 100 ft, or quite possibly they are next to each other in the rack, but again most likely connected through patch panels.

I wouldn't be doing it for anyone other than myself. I don't recall ever seeing those kinds of speeds in any network I have worked on in 20+ years. But then again, maybe I just never pinged anything directly connected ;) Quite possible.

dreamslacker

I only do that to quickly test structured runs, really. It's not indicative of real-world practical use, but any major issues or interference generally show up.

bullet92

@dreamslacker:

Try prioritizing icmp using the floating rules and see what you get. In practical terms, it does little but if it works then you know what to do to get the best effect on your setup.

@192.168.2.1-QOS_ENABLED:

TEST1
ping -c50 192.168.2.1
--- 192.168.2.1 ping statistics ---
50 packets transmitted, 50 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.117/1.591/14.669/3.278 ms
TEST2
--- 192.168.2.1 ping statistics ---
50 packets transmitted, 50 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.132/1.868/19.644/3.629 ms

@192.168.1.254-QOS_ENABLED:

TEST1
ping -c50 192.168.1.254
--- 192.168.1.254 ping statistics ---
50 packets transmitted, 50 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.640/2.117/10.636/2.417 ms
TEST2
--- 192.168.1.254 ping statistics ---
50 packets transmitted, 50 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.609/1.666/12.005/1.936 ms

@192.168.2.1-QOS_DISABLED:

TEST1
--- 192.168.2.1 ping statistics ---
50 packets transmitted, 48 packets received, 4.0% packet loss
round-trip min/avg/max/stddev = 0.105/4.136/46.755/8.910 ms

TEST2
--- 192.168.2.1 ping statistics ---
50 packets transmitted, 50 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.116/2.888/20.880/5.121 ms

@192.168.1.254-QOS_DISABLED:

TEST1
ping -c50 192.168.1.254
--- 192.168.1.254 ping statistics ---
50 packets transmitted, 50 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.614/3.516/78.450/10.933 ms
TEST2
--- 192.168.1.254 ping statistics ---
50 packets transmitted, 50 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.646/2.797/30.649/4.756 ms
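
To compare runs like these side by side, a minimal Python sketch can parse the `round-trip min/avg/max/stddev` summary lines. The `parse_rtt` helper and the `runs` labels are names made up for illustration, not part of any tool; the two sample lines are copied from the TEST1 results for 192.168.2.1 above.

```python
import re

# Matches the summary line printed by BSD ping, e.g.
# "round-trip min/avg/max/stddev = 0.117/1.591/14.669/3.278 ms"
RTT_RE = re.compile(r"= ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")


def parse_rtt(line):
    """Return (min, avg, max, stddev) in ms from a ping summary line."""
    match = RTT_RE.search(line)
    if match is None:
        raise ValueError("not a ping summary line: %r" % line)
    return tuple(float(value) for value in match.groups())


# Hypothetical labels; the lines themselves come from the tests above.
runs = {
    "QoS enabled": "round-trip min/avg/max/stddev = 0.117/1.591/14.669/3.278 ms",
    "QoS disabled": "round-trip min/avg/max/stddev = 0.105/4.136/46.755/8.910 ms",
}

for label, line in runs.items():
    rtt_min, rtt_avg, rtt_max, rtt_sd = parse_rtt(line)
    # A large max/min ratio on an idle LAN points at queueing, not distance.
    print(f"{label}: avg {rtt_avg:.3f} ms, max is {rtt_max / rtt_min:.0f}x the min")
```

The minimum barely moves between runs, so the interesting figures are the average and the max/min ratio.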

@johnpoz:

And what macs are you seeing on those IPs from arp -a on 2.5 pinging 2.1

arp -a
? (192.168.2.1) at 00:0d:61:79:54:d8 on em1_vlan13 expires in 1175 seconds [vlan]
? (192.168.2.5) at 00:26:55:e3:3f:67 on em1_vlan13 permanent [vlan]

Yesterday I also updated to the 2.1.1 snapshot, but nothing changed.
I think the prioritization gives some benefit, but the problem is obviously not the ping itself; it's the internet traffic going through these links, which ends up slower. So if QoS is working, is there a bottleneck somewhere in my box? That seems strange, since the load is low and so is the number of connections.
Moreover, even if QoS reduces the problem, it is still present: it is inconceivable to go from 0.117 ms latency to 14.669 ms. :/

P.S. At this time every day (12.00am), ping and packet loss get worse:

PING 192.168.2.1 (192.168.2.1): 56 data bytes
64 bytes from 192.168.2.1: icmp_seq=0 ttl=64 time=49.547 ms
64 bytes from 192.168.2.1: icmp_seq=1 ttl=64 time=54.244 ms
64 bytes from 192.168.2.1: icmp_seq=2 ttl=64 time=51.494 ms
64 bytes from 192.168.2.1: icmp_seq=3 ttl=64 time=27.504 ms
64 bytes from 192.168.2.1: icmp_seq=4 ttl=64 time=119.251 ms
64 bytes from 192.168.2.1: icmp_seq=5 ttl=64 time=43.744 ms
64 bytes from 192.168.2.1: icmp_seq=7 ttl=64 time=8.241 ms
64 bytes from 192.168.2.1: icmp_seq=8 ttl=64 time=0.118 ms
64 bytes from 192.168.2.1: icmp_seq=9 ttl=64 time=90.099 ms

--- 192.168.2.1 ping statistics ---
10 packets transmitted, 9 packets received, 10.0% packet loss
round-trip min/avg/max/stddev = 0.118/49.360/119.251/35.273 ms
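
The lost packet in a capture like this can be pinned down from the gap in the `icmp_seq` numbers. A minimal Python sketch, assuming the ping output has been saved as text; `missing_seqs` is a hypothetical helper name, and the sample lines are the replies shown above.

```python
import re


def missing_seqs(ping_output, sent):
    """Return the icmp_seq numbers in range(sent) that never got a reply."""
    seen = {int(seq) for seq in re.findall(r"icmp_seq=(\d+)", ping_output)}
    return [seq for seq in range(sent) if seq not in seen]


sample = """\
64 bytes from 192.168.2.1: icmp_seq=0 ttl=64 time=49.547 ms
64 bytes from 192.168.2.1: icmp_seq=1 ttl=64 time=54.244 ms
64 bytes from 192.168.2.1: icmp_seq=2 ttl=64 time=51.494 ms
64 bytes from 192.168.2.1: icmp_seq=3 ttl=64 time=27.504 ms
64 bytes from 192.168.2.1: icmp_seq=4 ttl=64 time=119.251 ms
64 bytes from 192.168.2.1: icmp_seq=5 ttl=64 time=43.744 ms
64 bytes from 192.168.2.1: icmp_seq=7 ttl=64 time=8.241 ms
64 bytes from 192.168.2.1: icmp_seq=8 ttl=64 time=0.118 ms
64 bytes from 192.168.2.1: icmp_seq=9 ttl=64 time=90.099 ms
"""

sent = 10  # "10 packets transmitted" in the summary
lost = missing_seqs(sample, sent)
print(f"missing icmp_seq: {lost}, loss {100 * len(lost) / sent:.1f}%")
```

Here the gap at icmp_seq=6 accounts for the 10.0% loss in the summary.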

top

last pid: 55487;  load averages:  0.00,  0.03,  0.01    up 0+19:20:19  12:23:35
39 processes:  1 running, 38 sleeping
CPU:  0.0% user,  0.0% nice,  0.4% system,  0.0% interrupt, 99.6% idle
Mem: 63M Active, 213M Inact, 143M Wired, 1052K Cache, 112M Buf, 1570M Free

Swap:

vmstat -i

interrupt                          total       rate
irq1: atkbd0                           3          0
irq14: ata0                           57          0
irq19: uhci1+                     135782          1
cpu0: timer                     27847076        399
irq256: em0                      4073157         58
irq257: em1                     44002413        631
irq258: em2                       818545         11
irq259: em3                     40608091        583
cpu3: timer                     27847053        399
cpu2: timer                     27847052        399
cpu1: timer                     27847052        399
Total                          201026281       2886

/0 /1 /2 /3 /4 /5 /6 /7 /8 /9 /10
Load Average

Interface Traffic Peak Total
em1_vlan13 in 216.004 KB/s 233.244 KB/s 2.379 GB
out 10.831 KB/s 11.537 KB/s 1.078 GB

em0_vlan10 in 0.146 KB/s 2.091 KB/s 175.861 MB
out 1.430 KB/s 13.097 KB/s 3.474 GB

lo0 in 0.000 KB/s 0.000 KB/s 957.725 KB
out 0.000 KB/s 0.000 KB/s 957.725 KB

em3 in 126.926 KB/s 165.866 KB/s 2.608 GB
out 316.054 KB/s 316.054 KB/s 702.344 MB

em2 in 0.600 KB/s 0.600 KB/s 70.714 MB
out 0.744 KB/s 0.744 KB/s 286.328 MB

em1 in 316.593 KB/s 316.593 KB/s 2.063 GB
out 123.062 KB/s 129.154 KB/s 2.080 GB

em0 in 1.521 KB/s 4.879 KB/s 376.255 MB

netstat -i -b -n -I em1_vlan13

Name               Mtu Network       Address              Ipkts Ierrs Idrop     Ibytes    Opkts Oerrs     Obytes  Coll
em1_vlan13        1496 <link#11>     00:26:55:e3:3f:67 14875307     0     0 2561511047  8441171 72643 1157981020     0
em1_vlan13        1496 fe80::226:55f fe80::226:55ff:fe        0     -     -          0        2     -        152     -
em1_vlan13        1496 192.168.2.0/2 192.168.2.5           7945     -     -     508860       20     -       1680     -

netstat -i -b -n -I em1

Name               Mtu Network       Address              Ipkts Ierrs Idrop     Ibytes    Opkts Oerrs     Obytes  Coll
em1               1500 <link#2>      00:26:55:e3:3f:67 32198966     0     0 2232366523 19473909     0 2238003408     0
em1               1500 fe80::226:55f fe80::226:55ff:fe        0     -     -          0        1     -         96     -
em1               1500 192.168.1.0/2 192.168.1.130        25921     -     -    3280518       10     -        840     -
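
One detail that is easy to miss in the two netstat outputs is the Oerrs column: em1_vlan13 reports 72643 output errors, while the parent em1 reports 0. A small Python sketch to put that number in proportion, using only the link-level counters shown; `output_error_ratio` is a made-up helper name, and the thread does not establish what causes these errors.

```python
def output_error_ratio(opkts, oerrs):
    """Output errors per transmitted packet (0.0 if nothing was sent)."""
    if opkts == 0:
        return 0.0
    return oerrs / opkts


# (Opkts, Oerrs) from the link-level rows of the netstat output.
counters = {
    "em1_vlan13": (8441171, 72643),
    "em1": (19473909, 0),
}

for name, (opkts, oerrs) in counters.items():
    ratio = output_error_ratio(opkts, oerrs)
    print(f"{name}: {oerrs} output errors over {opkts} packets ({ratio:.2%})")
```

Comparing this ratio before and after a configuration change is one way to see whether the change affected the error counter at all.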

dreamslacker

@bullet92: I meant prioritizing the ping packets rather than disabling QoS as a whole. That is, for the purpose of testing, set floating rules on each interface with direction out, protocol ICMP, and the interface address as the source, and place them in the highest-priority queue.

Edit: By any chance, do you have Upperlimit, or Limiter, or PowerD with P4TCC, or any combination of those enabled?

bullet92

@dreamslacker
Yes, I understand that; in fact that's what I've done!

I set up the traffic shaper with the wizard, and it automatically set an Upperlimit.

So now I tried removing the shaper entirely, and this is what I get:

--- 192.168.2.1 ping statistics ---
50 packets transmitted, 50 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.100/0.174/0.251/0.039 ms

SOLVED!!!! ;D ;D ;D ;D

PS: So now I only need to configure the traffic shaping correctly.

johnpoz

Glad you got your issue solved. I want to do some more testing in my own DC, trying to find these sub-0.2 ms response times. Yesterday I pinged from one Cisco switch to another. The IPs were SVIs on a VLAN, but with the switches not doing anything and directly connected, I was seeing roughly 0.4 ms, which is a bit of a difference from 0.1 ms ;)

I will be keeping an eye out. I just personally don't recall seeing such low response times, even on a local LAN with directly connected equipment, unless you were pinging loopback, a local IP, etc.

But I will be paying more attention in the future on the hunt ;)

dreamslacker

@bullet92:

@dreamslacker
Yes, I understand that; in fact that's what I've done!

I set up the traffic shaper with the wizard, and it automatically set an Upperlimit.

So now I tried removing the shaper entirely, and this is what I get:

--- 192.168.2.1 ping statistics ---
50 packets transmitted, 50 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.100/0.174/0.251/0.039 ms

SOLVED!!!! ;D ;D ;D ;D

PS: So now I only need to configure the traffic shaping correctly.

I dislike the wizard, it never gives an optimal solution (nor should it since every use-case is different).

Nevertheless, you can run the wizard but remove all upperlimits in all queues once it's done. I've found that upperlimit increases latencies across the board (usually when it's active on the queue, and occasionally at low loads) for some unknown reason.

I've settled on manually setting up my own queues and settings damn nearly since day 1 (pfSense 1.2rc2).

stephenw10

Good call. :)

Time for a low ping contest Johnpoz?

Steve

bullet92

@dreamslacker:

I dislike the wizard, it never gives an optimal solution (nor should it since every use-case is different).
Nevertheless, you can run the wizard but remove all upperlimits in all queues once it's done. I've found that upperlimit increases latencies across the board (usually when it's active on the queue, and occasionally at low loads) for some unknown reason.
I've settled on manually setting up my own queues and settings damn nearly since day 1 (pfSense 1.2rc2).

I will try your advice, thanks ;)

Glad you got your issue solved. I want to do some more testing in my own DC, trying to find these sub-0.2 ms response times. Yesterday I pinged from one Cisco switch to another. The IPs were SVIs on a VLAN, but with the switches not doing anything and directly connected, I was seeing roughly 0.4 ms, which is a bit of a difference from 0.1 ms ;)
I will be keeping an eye out. I just personally don't recall seeing such low response times, even on a local LAN with directly connected equipment, unless you were pinging loopback, a local IP, etc.
But I will be paying more attention in the future on the hunt ;)

Thx :) I'd advise you to try pinging a Zeroshell box! Just out of curiosity :D