No connection after WAN IP change
Hello,
I have a network of 4 pfsense nodes connected with wireguard links, with frr bgp routing on top. The links are configured as a full mesh: each box has a link to the other 3 nodes. Each link is configured as a dedicated tunnel in pfsense and has a pfsense interface assigned to ease routing configuration.
This setup normally works well, and route failover works nicely when a link drops for any reason. I have, however, observed some issues with one link in the network. This link is between a node behind CGNAT and a node that has a dynamic IP (fiber using pppoe, with a weekly forced disconnect by the ISP).
After a WAN IP change on box A (pppoe fiber), the link between this node and box B (behind CGNAT) won't come up: wireguard shows no handshake on this link.
Things I checked while troubleshooting:
- Restarting the wireguard service on the node behind CGNAT (box B), or even restarting the whole box, does not help.
- It doesn't seem to be an FRR issue: I can't ping the IP assigned to the other end of the wireguard link. The connection to box B is fine otherwise, but it is obviously routed through other links (e.g. box A -> box C -> box B) instead of directly.
- The route for the wireguard network is present on both ends of the link.
- The wireguard service is up on both ends; the peer reports no handshake since the WAN IP changed on box A.
- Other links involving box A do not seem to be affected; they come up nicely after the WAN IP change.
- Note that all other wireguard links are to hosts (box C and box D) that have publicly routable IPs, so no CGNAT. It may well be that "inbound" connections to box A are broken and I just don't see it, because "outbound" connections are successful.
- I ruled out a firewall rule issue: as soon as I restart the wireguard service on the box where the WAN IP has changed, things go back to normal
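For anyone who wants to reproduce the "no handshake" check without clicking through the GUI, here is a rough sketch of how it could be scripted. It parses the tab-separated output of `wg show <interface> dump` and lists peers whose last handshake is missing or old. The interface name `tun_wg0` and the 180-second threshold are my assumptions, not values from this setup, and the dump text can be captured on the box and parsed elsewhere if Python is not installed there.

```python
import subprocess
import time

# Assumption: with traffic flowing, WireGuard re-handshakes about every
# 2 minutes, so anything older than 3 minutes is treated as stale.
STALE_AFTER = 180


def stale_peers(dump_text, now=None, stale_after=STALE_AFTER):
    """Return (public_key, endpoint, age) for peers without a recent handshake.

    dump_text is the output of `wg show <interface> dump`. age is the number
    of seconds since the last handshake, or None if there has never been one.
    """
    now = time.time() if now is None else now
    stale = []
    lines = dump_text.strip().splitlines()
    for line in lines[1:]:  # the first line describes the interface itself
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # not a peer line
        public_key, _psk, endpoint, _allowed_ips, handshake = fields[:5]
        last = int(handshake)  # Unix timestamp, 0 means "never"
        age = None if last == 0 else int(now - last)
        if age is None or age > stale_after:
            stale.append((public_key, endpoint, age))
    return stale


if __name__ == "__main__":
    # "tun_wg0" is a hypothetical interface name; use the tunnel to check.
    out = subprocess.run(
        ["wg", "show", "tun_wg0", "dump"],
        capture_output=True, text=True, check=True,
    ).stdout
    for key, endpoint, age in stale_peers(out):
        print(f"{key} endpoint={endpoint} last_handshake_age={age}")
```

The endpoint column is worth a look on box B in particular: if it still shows box A's old WAN IP, that would point at stale endpoint information rather than at routing.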
Any suggestions on how I could diagnose wireguard on box A? Why does it get into a seemingly broken state after a WAN IP change?
UPDATE: I think I found something. If I look at the firewall rule for wireguard on box A, I see:
The first line is the impacted link (from box B), the second line is a working link (from box C).
Looks like nothing is accepting the traffic? Which is weird, the wireguard process looks to be fine (green) in the Status > Services screen.