Gigaclear & ip6 - loss of connectivity after *exactly* 5 minutes

marcosm

When a WAN only has a link-local IPv6 address, pfSense should use it as the source address of neighbor advertisement messages, not pick a random LAN interface address as the source.

I tested this on 25.03 and pfSense always sent NAs sourced from the correct link-local address. This sounds more like a routing/NAT issue. There's a default rule that allows any source/destination, though, so the traffic should go through regardless.

Regarding https://redmine.pfsense.org/issues/16123#note-3
I see the ISP device sending NS for pfSense's link-local address without a response from pfSense:

138	2025-04-04 01:51:01.778710	0.020680	2a02:fb8::11	ff02::1:ff21:6b74	ICMPv6	86	Neighbor Solicitation for fe80::6662:66ff:fe21:6b74 from 4e:6d:58:87:10:17
145	2025-04-04 01:51:04.730835	0.972840	2a02:fb8::11	ff02::1:ff21:6b74	ICMPv6	86	Neighbor Solicitation for fe80::6662:66ff:fe21:6b74 from 4e:6d:58:87:10:17
152	2025-04-04 01:51:07.730711	0.359749	2a02:fb8::11	ff02::1:ff21:6b74	ICMPv6	86	Neighbor Solicitation for fe80::6662:66ff:fe21:6b74 from 4e:6d:58:87:10:17
209	2025-04-04 01:51:11.150835	0.015817	2a02:fb8::11	ff02::1:ff21:6b74	ICMPv6	86	Neighbor Solicitation for fe80::6662:66ff:fe21:6b74 from 4e:6d:58:87:10:17
214	2025-04-04 01:51:14.138711	1.003684	2a02:fb8::11	ff02::1:ff21:6b74	ICMPv6	86	Neighbor Solicitation for fe80::6662:66ff:fe21:6b74 from 4e:6d:58:87:10:17

While this is happening, check that pfSense has joined the respective multicast group with ifmcstat -i igc0 -f inet6. If it has joined as expected then it should show something like:

inet6 fe80::6662:66ff:fe21:6b74%igc0 scopeid 0x3
mldv2 flags=2<USEALLOW> rv 2 qi 125 qri 10 uri 3
...
        group ff02::1:ff21:6b74%igc0 scopeid 0x3 mode exclude
                mcast-macaddr 33:33:ff:21:6b:74
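
For anyone unsure which group to look for: the solicited-node group is the prefix ff02::1:ff00:0/104 plus the low 24 bits of the address being solicited, and the Ethernet multicast MAC is 33:33 followed by the low 32 bits of the group. A minimal Python sketch of that mapping (the function names are my own, not part of any pfSense or FreeBSD tool):

```python
import ipaddress

def solicited_node_group(addr: str) -> ipaddress.IPv6Address:
    """Solicited-node multicast group for a unicast IPv6 address."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)

def multicast_mac(group: ipaddress.IPv6Address) -> str:
    """Ethernet multicast MAC: 33:33 followed by the low 32 bits of the group."""
    low32 = int(group) & 0xFFFFFFFF
    octets = [0x33, 0x33] + list(low32.to_bytes(4, "big"))
    return ":".join(f"{o:02x}" for o in octets)

# Link-local address taken from the capture above
group = solicited_node_group("fe80::6662:66ff:fe21:6b74")
print(group, multicast_mac(group))
```

For the address in the capture this gives ff02::1:ff21:6b74 and 33:33:ff:21:6b:74, matching the ifmcstat lines shown.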

A similar issue, which has since been fixed, existed in older versions:
https://redmine.pfsense.org/issues/13423

Irata

When this is happening I see this:

vtnet0.3:
inet6 fe80::1c1e:54ff:fe8a:705%vtnet0.3 scopeid 0x19
mldv2 flags=2<USEALLOW> rv 2 qi 125 qri 10 uri 3
group ff02::1:ff00:1%vtnet0.3 scopeid 0x19 mode exclude
mcast-macaddr 33:33:ff:00:00:01
group ff01::1%vtnet0.3 scopeid 0x19 mode exclude
mcast-macaddr 33:33:00:00:00:01
group ff02::2:1861:20ce%vtnet0.3 scopeid 0x19 mode exclude
mcast-macaddr 33:33:18:61:20:ce
group ff02::2:ff18:6120%vtnet0.3 scopeid 0x19 mode exclude
mcast-macaddr 33:33:ff:18:61:20
group ff02::1%vtnet0.3 scopeid 0x19 mode exclude
mcast-macaddr 33:33:00:00:00:01
group ff02::1:ff8a:705%vtnet0.3 scopeid 0x19 mode exclude
mcast-macaddr 33:33:ff:8a:07:05
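
As a rough way to check that listing without reading it by eye, the sketch below parses the `group` lines and tests whether the solicited-node group for the interface's link-local address is among them. It assumes the output format shown above (group address followed by `%interface`); the helper names are hypothetical, and the embedded text is just the group lines from this post.

```python
import ipaddress

# Group lines copied from the ifmcstat output above (mcast-macaddr lines omitted)
IFMCSTAT_OUTPUT = """\
inet6 fe80::1c1e:54ff:fe8a:705%vtnet0.3 scopeid 0x19
group ff02::1:ff00:1%vtnet0.3 scopeid 0x19 mode exclude
group ff01::1%vtnet0.3 scopeid 0x19 mode exclude
group ff02::2:1861:20ce%vtnet0.3 scopeid 0x19 mode exclude
group ff02::2:ff18:6120%vtnet0.3 scopeid 0x19 mode exclude
group ff02::1%vtnet0.3 scopeid 0x19 mode exclude
group ff02::1:ff8a:705%vtnet0.3 scopeid 0x19 mode exclude
"""

def joined_groups(text: str) -> set:
    """Collect the multicast groups listed on 'group ...' lines."""
    groups = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "group":
            # Drop the %interface zone suffix before parsing
            groups.add(ipaddress.IPv6Address(fields[1].split("%")[0]))
    return groups

def solicited_node_group(addr: str) -> ipaddress.IPv6Address:
    """ff02::1:ff00:0/104 plus the low 24 bits of the unicast address."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    return ipaddress.IPv6Address(int(ipaddress.IPv6Address("ff02::1:ff00:0")) | low24)

expected = solicited_node_group("fe80::1c1e:54ff:fe8a:705")
print(expected, expected in joined_groups(IFMCSTAT_OUTPUT))
```

On this output the expected group ff02::1:ff8a:705 is present, so the multicast membership for the link-local address looks as it should.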

The only clue I have is that, just like others, I must add a static route to the GUA of the ISP router to get pfSense to reply out of the correct WAN interface, and I also have to manually add a permanent NDP entry for that device. I am running multi-WAN. I also don't understand why the NDP table is not being updated for that ISP device, even when I ping it.

It's really strange, but pfSense definitely does not send NAs without the above encouragement. I can reproduce this issue easily, every time without fail.

The only other oddity I've noticed is that running:

ndp -s 2a02:fb8::32 4a:5a:0d:5a:f2:b7

gives this sometimes:

ndp: delete: cannot locate 2a02:fb8::32

But I'm not trying to delete, I'm trying to add. Running the same command just after adding the static route makes it work.

marcosm

@Irata Would you share the output of uname -a?

Irata

FreeBSD pfSense 15.0-CURRENT FreeBSD 15.0-CURRENT #0 plus-RELENG_25_03-n256497-da24eca0fcd2: Mon Apr 14 19:32:49 UTC 2025     root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-25_03-main/obj/amd64/ILoDLiJx/var/jenkins/workspace/pfSense-Plus-snapshots-25_03-main/sources/FreeBSD-src-plus-RELENG_25_03/amd64.amd64/sys/pfSense amd64

Thought I should add some info of my setup:

I have 4 WAN interfaces, all dual-stack IPv4 and IPv6 and all with a delegated /48 prefix. I use the IPv6 prefix of the first WAN for all LAN interfaces, and I use NPT to translate it to the other 3 WAN prefixes when they are used. I use a failover group, with each WAN interface in a different tier.

The ISP link causing this issue is not the first interface, so I am not using that /48 prefix on the LAN. This troublesome ISP link is the only one where IPv6 is native on the WAN interface and the ISP sends NS from a GUA. The other IPv6 interfaces are either L2TP tunnels, or the ISP uses a link-local address for its Neighbor Solicitations/Advertisements.

It appears others have a much simpler setup than mine and still see the issue.

If I shut down my pfSense VM and start up a simple OpenWrt VM, everything is stable and the IPv6 interface just works for this ISP; NAs are sent correctly to the GUA. All I do is swap one VM for another running a different OS; no other network changes are made.

To me, the only common factor when NAs are not sent (without some tinkering) is that I'm using pfSense and the ISP sends NS from a GUA. This does appear to be typical behaviour for Juniper switches, as I also see this issue on a different test bed.
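
To pick out this pattern from a capture, a small sketch like the following can flag any NS whose source is a GUA while the solicited target is link-local. It is only an illustration: the function names are made up, and "GUA" here simply means an address inside 2000::/3.

```python
import ipaddress

GLOBAL_UNICAST = ipaddress.ip_network("2000::/3")   # currently allocated GUA space
LINK_LOCAL = ipaddress.ip_network("fe80::/10")

def scope_of(addr: str) -> str:
    """Classify an IPv6 address as 'link-local', 'GUA' or 'other'."""
    ip = ipaddress.IPv6Address(addr)
    if ip in LINK_LOCAL:
        return "link-local"
    if ip in GLOBAL_UNICAST:
        return "GUA"
    return "other"

def is_mixed_scope_ns(ns_source: str, ns_target: str) -> bool:
    """True when an NS is sourced from a GUA but asks for a link-local target."""
    return scope_of(ns_source) == "GUA" and scope_of(ns_target) == "link-local"

# Source and target from the NS lines quoted earlier in this thread
print(is_mixed_scope_ns("2a02:fb8::11", "fe80::6662:66ff:fe21:6b74"))
```

The NS packets quoted earlier in the thread match this pattern, whereas an NS sourced from the ISP's link-local address would not.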

I have also just discovered that, while this issue is occurring on this version of pfSense, opening Diagnostics > NDP Table in the pfSense UI causes the UI to lock up and eventually crash. It sends me to a crash report, but that just says:

Crash report begins.  Anonymous machine information:

amd64
15.0-CURRENT
FreeBSD 15.0-CURRENT #0 plus-RELENG_25_03-n256497-da24eca0fcd2: Mon Apr 14 19:32:49 UTC 2025     root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-25_03-main/obj/amd64/ILoDLiJx/var/jenkins/workspace/pfSense-Plus-snapshots-25_03-main/sources/FreeB

Crash report details:

No PHP errors found.

No FreeBSD crash data found.

I'm unsure if this is related.

marcosm

I was able to reproduce the issue. The behavior is intended and a workaround exists - see:
https://redmine.pfsense.org/issues/16146

I expect this to work with the ISP Gigaclear as well.

ahxcjay

@marcosm thank you!

(5 minutes later) Confirmed: it works!

Irata

@ahxcjay Check after 24 hours, as even with this change I still see the NDP entry for the first hop going stale, and it may get deleted after 24 hours.

ahxcjay

@Irata yes - 2h53m left…

ahxcjay

@ahxcjay NDP renewed for another 24h…! 15+ minutes in. No drops.

Irata

Unfortunately, it does not work for me on 25.03-BETA. The setting has not changed the behaviour. I have tried a reboot after adding the setting under System > Advanced > System Tunables.

I did try this setting weeks ago, but since it made no difference I discarded it.

I still need to add the static NDP entry, even with this setting.

I wonder if this setting is no longer working on 25.03-BETA. However, that NDP entry should not be going stale for 24 hours anyway and so something still isn't right. It's responding to NS, so why isn't the NDP entry updating every minute whenever they are responded to?

There is something additional I've spotted with the NA from the ISP: the source IP is a GUA but the target address is link-local. I don't mean the destination IP, I mean the target address within the ICMPv6 payload, shown before the (rtr, sol) below:

4	13:45:21.68	2a02:fb8::32	fe80::1c1e:54ff:fe8a:705	ICMPv6	82	Neighbor Advertisement fe80::4a5a:dff:fe5a:f2b7 (rtr, sol)

The NDP table is being updated with the target address, but the source IP is not added to the NDP table. That might be correct behaviour, but if so, what is updating the source IP entry in the NDP table after 24 hours (in the cases where it does get updated)?
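
To make the source-versus-target distinction concrete: in a Neighbor Advertisement the GUA only appears in the IPv6 header, while the ICMPv6 body carries the R/S/O flags and the target address, and it is the target's neighbor cache entry that an NA updates. The sketch below builds and parses a bare NA body (type 136, no options, checksum left at zero). It is an illustration of the message layout only, not how pfSense or FreeBSD processes the packet, and the function names are hypothetical.

```python
import ipaddress
import struct

NA_TYPE = 136  # ICMPv6 Neighbor Advertisement

def build_na_body(target: str, router: bool, solicited: bool, override: bool) -> bytes:
    """Pack an NA body: type, code, checksum (0 here), flags, target address."""
    flags = (int(router) << 31) | (int(solicited) << 30) | (int(override) << 29)
    header = struct.pack("!BBHI", NA_TYPE, 0, 0, flags)
    return header + ipaddress.IPv6Address(target).packed

def parse_na_body(body: bytes) -> dict:
    """Extract the flags and the target address from an NA body."""
    msg_type, _code, _checksum, flags = struct.unpack("!BBHI", body[:8])
    if msg_type != NA_TYPE:
        raise ValueError("not a Neighbor Advertisement")
    return {
        "target": str(ipaddress.IPv6Address(body[8:24])),
        "rtr": bool(flags & 0x80000000),
        "sol": bool(flags & 0x40000000),
        "ovr": bool(flags & 0x20000000),
    }

# Values from the NA above: target is link-local, flags are (rtr, sol).
# The GUA 2a02:fb8::32 is the IPv6 header source and never appears in this body.
body = build_na_body("fe80::4a5a:dff:fe5a:f2b7", router=True, solicited=True, override=False)
print(parse_na_body(body))
```

Nothing in the body names 2a02:fb8::32, which is consistent with only the link-local target showing up as refreshed in the NDP table.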

I have also seen the situation where the NDP entry was stale for 24 hours and, strangely, was then updated, which kept it working while I was using my previous tricks to get it to work. However, it was not consistent after every reboot and connectivity was still unreliable.

I still believe we are walking around some other root cause here, possibly two issues.

I do admit the spec is ambiguous and this is not prohibited by it, but this implementation is a good example of exactly what the spec did not intend. However, it should still be working.

We should not be seeing the first hop GUA going stale for 24 hours anyway.