2.5.2: IPv4 gateway status unknown / dpinger mystery

Juve

After upgrading from 2.4.5p1 to 2.5.2, I have a cluster showing strange behaviour on both nodes.

Gateway monitoring shows an unknown status. After restarting, the gateway comes up, but only for a short time and with a high RTTsd.

I captured traffic: I see echo requests and echo replies.
I pinged manually from the shell and it works.
I ran dpinger manually and it returns 0 0 0; it also returns 0 0 0 in the sock file.
dpinger gives no error.
I ran truss on dpinger to analyse its calls.
I checked the dpinger source code.
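
For anyone reading along, here is a small sketch of how a "0 0 0" report can be read. It assumes the dpinger report format is an optional identifier followed by average latency (microseconds), latency standard deviation (microseconds) and loss (percent), as far as I recall from the dpinger documentation. The names `DpingerReport`, `parse_report` and `looks_like_stuck_clock` are made up for this example and are not part of dpinger or pfSense.

```python
from typing import NamedTuple


class DpingerReport(NamedTuple):
    # Field meanings are an assumption based on dpinger's documented output.
    latency_avg_us: int
    latency_stddev_us: int
    loss_percent: int


def parse_report(line: str) -> DpingerReport:
    """Parse one report line, ignoring an optional leading identifier."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields, got: {line!r}")
    avg, stddev, loss = (int(f) for f in fields[-3:])
    return DpingerReport(avg, stddev, loss)


def looks_like_stuck_clock(report: DpingerReport) -> bool:
    # Replies arrive (0% loss) yet the measured round trip is exactly zero:
    # that points at time measurement rather than at the network path.
    return report.loss_percent == 0 and report.latency_avg_us == 0


if __name__ == "__main__":
    print(looks_like_stuck_clock(parse_report("0 0 0")))
```

Read this way, "0 0 0" says that no packets are lost but every measured latency is zero, which is consistent with the capture showing both echo requests and echo replies.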

And here I am, still trying to figure out what is happening, because I can't understand why dpinger isn't able to read the received values.

Has anyone experienced this already?

Juve

Hmm,
I have just noticed that even the ping command can't measure the time:

PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: icmp_seq=0 ttl=113 time=0.000 ms
64 bytes from 8.8.8.8: icmp_seq=1 ttl=112 time=0.000 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=113 time=0.000 ms

Whatever I ping, the time is 0 ms.
The TTL also oscillates between 112 and 113, while another box shows a stable 114, so something is going on.

A traceroute, in UDP or ICMP mode, also reports every latency as 0 ms.

This is scary

Juve

Just look at the result of a traceroute (UDP mode, not ICMP).
The machine can't compute durations, that's why dpinger is failing to report the latency.

I am in the process of comparing every file checksum with another 2.5.2.

traceroute to 8.8.8.8 (8.8.8.8), 64 hops max, 40 byte packets
1 xxxxx.atlas.cogentco.com (xxxxx) 0.000 ms 0.000 ms 0.000 ms
2 xxxxx.atlas.cogentco.com (xxxxx) 0.000 ms
xxxxx.atlas.cogentco.com (xxxxx) 0.000 ms
xxxxx.atlas.cogentco.com (xxxxx) 0.000 ms
3 xxxxx.atlas.cogentco.com (xxxxx) 0.000 ms
xxxxx.atlas.cogentco.com (xxxxx) 0.000 ms
xxxxx.atlas.cogentco.com (xxxxx) 0.000 ms
4 be2471.ccr41.par01.atlas.cogentco.com (130.117.49.37) 0.000 ms
be2472.ccr42.par01.atlas.cogentco.com (130.117.49.121) 0.000 ms 0.000 ms
5 be2102.ccr32.par04.atlas.cogentco.com (154.54.61.18) 0.000 ms
be3183.ccr31.par04.atlas.cogentco.com (154.54.38.66) 0.000 ms
be3184.ccr31.par04.atlas.cogentco.com (154.54.38.158) 0.000 ms
6 be2151.agr21.par04.atlas.cogentco.com (154.54.61.34) 0.000 ms
be3169.agr21.par04.atlas.cogentco.com (154.54.37.238) 0.000 ms
be2151.agr21.par04.atlas.cogentco.com (154.54.61.34) 0.000 ms
7 tata.par04.atlas.cogentco.com (130.117.15.70) 0.000 ms 0.000 ms 0.000 ms
8 72.14.212.77 (72.14.212.77) 0.000 ms 1000.000 ms 0.000 ms
9 108.170.244.225 (108.170.244.225) 0.000 ms
108.170.245.1 (108.170.245.1) 0.000 ms
108.170.244.161 (108.170.244.161) 0.000 ms
10 142.251.49.133 (142.251.49.133) 0.000 ms
209.85.244.155 (209.85.244.155) 0.000 ms
216.239.59.209 (216.239.59.209) 0.000 ms
11 dns.google (8.8.8.8) 0.000 ms 0.000 ms 0.000 ms
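
Tools such as ping derive the round-trip time by subtracting a send timestamp from a receive timestamp, so a clock that does not advance between the two reads gives exactly 0.000 ms. A rough way to test the clock itself, independent of the network, is to sleep for a known interval and compare what the clocks report. This is only a sketch, assuming a Python 3 interpreter is available on the box; `measure_sleep` and `clock_is_advancing` are names invented for this example, and the 50% tolerance is an arbitrary choice.

```python
import time


def measure_sleep(expected_s: float = 0.2) -> dict:
    """Sleep for expected_s and record how much time each clock says elapsed."""
    mono_start = time.monotonic()
    wall_start = time.time()
    time.sleep(expected_s)
    return {
        "expected_s": expected_s,
        "monotonic_s": time.monotonic() - mono_start,
        "wall_s": time.time() - wall_start,
    }


def clock_is_advancing(sample: dict, tolerance: float = 0.5) -> bool:
    # Healthy clocks should report at least (1 - tolerance) of the sleep time.
    floor = sample["expected_s"] * (1.0 - tolerance)
    return sample["monotonic_s"] >= floor and sample["wall_s"] >= floor


if __name__ == "__main__":
    for _ in range(5):
        sample = measure_sleep()
        status = "ok" if clock_is_advancing(sample) else "SUSPECT"
        print(f"monotonic={sample['monotonic_s']:.6f}s "
              f"wall={sample['wall_s']:.6f}s {status}")
```

If the elapsed values come back at or near zero while the sleep clearly took time, the problem is in the system's timekeeping and not in dpinger, ping or traceroute.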

Juve

I narrowed it down to an SMP issue.
After reverting to 1 CPU, the behaviour no longer appears.