New latency every 30 seconds with 2.4.2 caused by radvd 2.17_3

  • Since upgrading to 2.4.2 I've been getting nasty latency spikes almost exactly every 30 seconds.  I've tried pinging the firewall's internal IP from multiple hosts, and they all see the same behavior (I've got 3 VLANs, and the IP on each VLAN exhibits the same thing).  Pings between hosts on the same subnet do not see this behavior; pings that cross subnets (and have to go through the router) do.  The spikes are really nasty too: 300ms+ on most occasions.

    Anyone else seeing similar behavior?  It's made my network almost unusable for anything WAN facing.

    I'm running a Supermicro SYS-5018D-FN4T with a Chelsio T520-CR, if that makes a difference.  Again, this JUST started after updating to 2.4.2; I had 0 issues with prior versions of pfSense.

    Happy to grab logs if they'd help, but I don't see anything obvious.
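
    For anyone who wants to check whether their spikes follow the same roughly 30-second pattern, here is a rough Python sketch that pulls round-trip times out of saved ping output and reports the gaps between slow replies. The function names and the 100 ms threshold are my own assumptions, and the regex expects the `icmp_seq=... time=... ms` format that FreeBSD, Linux and macOS ping print (Windows ping output would need a different pattern).

```python
import re

# Matches lines such as: "64 bytes from 192.168.1.1: icmp_seq=5 ttl=64 time=312.4 ms"
RTT_RE = re.compile(r"icmp_seq=(\d+).*?time=([\d.]+)\s*ms")


def parse_ping(lines):
    """Return (seq, rtt_ms) pairs for every reply line; other lines are skipped."""
    samples = []
    for line in lines:
        match = RTT_RE.search(line)
        if match:
            samples.append((int(match.group(1)), float(match.group(2))))
    return samples


def spike_intervals(samples, threshold_ms=100.0):
    """Return sequence-number gaps between consecutive replies at or above threshold_ms.

    With ping's default of one probe per second, a gap of 30 means the slow
    replies were about 30 seconds apart.
    """
    spikes = [seq for seq, rtt in samples if rtt >= threshold_ms]
    return [later - earlier for earlier, later in zip(spikes, spikes[1:])]
```

    Run something like `ping -c 300 <firewall IP>` from a host on another VLAN, save the output to a file, and pass its lines to `parse_ping`. A spike that lasts longer than one probe shows up as a gap of 1, so look for the larger repeating value.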

  • I guess I'll reply to my own post.  I've narrowed it down: every time there is a latency spike, radvd jumps to 30% CPU usage, so I think I've found the culprit.  Now the question is what changed in 2.4.2 that might be causing that behavior.  I'm assuming radvd is running for an IPv6 tunnel I use.

    I see the radvd version changed with the upgrade, I think I'm closing in on the issue at least :)

    Nov 21 21:56:30 kernel radvd: 1.9.1 -> 2.17_3 [pfSense]
    Nov 21 21:56:57 radvd 76672 version 2.17 started

    **so I'm not spamming here, I'll just reply to this.  Killing radvd stopped all of my issues.  I'm going to leave it down for now - I can live without IPv6 for the moment - let me know what else I can provide to help the troubleshooting.
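
    If anyone wants to line up radvd's CPU usage against the spikes on their own box, a rough sketch like the one below would do it. It assumes Python 3.7+ is available on the firewall (not a given on a stock install) and that `ps -axo pcpu,comm` prints a CPU percentage followed by the command name; the function names are made up for illustration.

```python
import subprocess
import time


def radvd_cpu(ps_output):
    """Sum the %CPU column for every radvd process in `ps -axo pcpu,comm` output."""
    total = 0.0
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or parts[1].strip() != "radvd":
            continue
        try:
            total += float(parts[0])
        except ValueError:
            # Not a numeric CPU column (for example the header line); skip it.
            continue
    return total


def sample(count=60, interval_s=1.0):
    """Print a timestamped radvd CPU reading once per interval."""
    for _ in range(count):
        result = subprocess.run(
            ["ps", "-axo", "pcpu,comm"],
            capture_output=True, text=True, check=True,
        )
        print(f"{time.strftime('%H:%M:%S')} radvd {radvd_cpu(result.stdout):.1f}%")
        time.sleep(interval_s)
```

    Calling `sample()` while a ping runs in another session gives timestamps that can be compared against the slow replies.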

  • bump - anybody?

  • Alright, so I'm hoping someone from the Netgate crew can help me understand protocol here.  I've seen REPEATEDLY people get flamed for posting to the bug tracker without posting an issue on the forum first.  I track down almost exactly where the issue resides and post it on the forum and get nothing but crickets.  Not so much as a "hey, give us X log" from anyone from Netgate.  This is a very easily reproducible issue: I can literally downgrade and have it go away, then upgrade and have it reappear.  So what are the steps to getting this fixed?  This is feeling like a Plex issue at this point… aka: acknowledged as broken but nobody cares enough to bother fixing it.

  • I'm not seeing this. And with the lack of response I'm guessing nobody else is either.

    Talk more about your total network.  No Puma-equipped cable modems, do ya?

  • pfSense is directly attached to a pair of ADSL modems that are in bridge mode (Netgear 7550).  I won't say the rest of the network is irrelevant, but it's kind of irrelevant given the behavior exists even directly attached to the pfSense box.

  • Having the same issue here, did this ever get fixed?

  • I missed the response 8 mos. ago. Is this happening with 2.4.3 for you? Are you on a bonded connection or just load balancing?

  • I'm using LACP with 4 interfaces and VLANs on the bond, running pfSense 2.4.3. I started radvd in debug mode but saw nothing that indicates what might cause the problem. Whenever a spike happens, pings get lost or go all the way up to 750ms. I also have some messages in dmesg about a "listen queue overflow". I am using Snort, but not on the interfaces that I am having problems with, so I don't think it is related (just wanted to mention it as it's an installed package).

  • Unfortunately this still exists in 2.4.4. I also noticed some latency spikes when PPPoE reconnects, so maybe this is related to IPv6 in general? dpinger and dhcp are using lots of CPU on the PPPoE reconnection event.

    Only thing in dmesg that could be related is

    sa6_recoverscope: embedded scope mismatch: xxxxxx sin6_scope_id was overridden

    a few times.
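
    To put a number on "a few times", a small Python sketch along these lines could count how often the two messages mentioned in this thread show up in saved dmesg output. The substrings are taken from the posts above and matched case-insensitively; the function name is hypothetical, and the exact wording of the kernel messages may differ between versions.

```python
# Substrings quoted in this thread; adjust them if your kernel words them differently.
PATTERNS = ("listen queue overflow", "sa6_recoverscope")


def count_messages(dmesg_text, patterns=PATTERNS):
    """Count the lines of dmesg_text that contain each pattern, ignoring case."""
    counts = {pattern: 0 for pattern in patterns}
    for line in dmesg_text.splitlines():
        lowered = line.lower()
        for pattern in patterns:
            if pattern.lower() in lowered:
                counts[pattern] += 1
    return counts
```

    Save the output of `dmesg` to a file, read it into a string, and pass it to `count_messages`; comparing the counts before and after a batch of spikes would show whether either message tracks them.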