Comcast IPv6 works for 1-2 days, then stops routing

STS-134

So I'm having an issue with Comcast IPv6 on my SG-3100 running 2.4.5-p1. IPv6 comes up, the SG-3100 gets a /59 block of addresses from the cable modem, it delegates a /64 block to each VLAN as it's supposed to, and everything works for 1-2 days. Then it just stops working.
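For reference, here is the arithmetic behind the /59 split. A /59 leaves 64 - 59 = 5 bits for the subnet ID, so there are 32 possible /64s to hand out, one per VLAN. The sketch below uses Python's standard ipaddress module with a made-up prefix from the 2001:db8::/32 documentation range (the real Comcast prefix isn't in the post). The prefix_for_id helper is hypothetical, a stand-in for however each interface picks its /64 out of the block.

```python
import ipaddress

# Hypothetical delegated block (documentation range); the real one isn't given.
delegated = ipaddress.IPv6Network("2001:db8:0:20::/59")

# 5 subnet-ID bits -> 2**5 = 32 possible /64s.
lan_prefixes = list(delegated.subnets(new_prefix=64))


def prefix_for_id(block: ipaddress.IPv6Network, prefix_id: int) -> ipaddress.IPv6Network:
    """Return the /64 chosen by a (hypothetical) per-VLAN prefix ID."""
    subnet_bits = 64 - block.prefixlen
    if not 0 <= prefix_id < 2 ** subnet_bits:
        raise ValueError("prefix ID does not fit in the delegated block")
    base = int(block.network_address) + (prefix_id << 64)
    return ipaddress.IPv6Network((base, 64))


print(len(lan_prefixes))             # 32
print(lan_prefixes[0])               # 2001:db8:0:20::/64
print(lan_prefixes[-1])              # 2001:db8:0:3f::/64
print(prefix_for_id(delegated, 1))   # 2001:db8:0:21::/64
```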

However, when I run a sniffer on the WAN port of the SG-3100, I can see that the DHCPv6 packets are still being exchanged between pfSense and the cable modem, and the cable modem is replying that the renewal of the /59 address block was successful. Yet at the same time, when devices behind the pfSense router try to use any of these IPv6 addresses, their traffic doesn't route. For example, I can run a ping from one of these addresses behind the router and see in the sniffer logs that the echo requests are going out to the cable modem, but no reply ever comes back. By contrast, when I ping from the cable modem's /64 subnet (i.e. the pfSense router's WAN IPv6 address, or any computer connected directly to the cable modem) at the same time, I do get replies. It's as if the modem is routing only its own /64 subnet but refusing to route traffic for the addresses it delegated.

The strange thing is, this all works for 1-2 days, and THEN it stops working after that. Rebooting pfSense does not help; the traffic still does not route. However if I turn off IPv6 for a while or pfSense finally gets a different block of IPv6 addresses (maybe the cable modem "forgets" the DUID?) then it will work again. Well, for 1-2 days, before the same thing happens.

See attached sniffer capture image. Note that there are ping requests going out, but no replies coming back, on one of the /64s that has been delegated to pfSense, and this was literally seconds after pfSense had renewed the addresses and the modem had replied in the affirmative to the renewal request.
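If you export the ICMPv6 fields from a capture like that (source, destination, type, identifier, sequence), the "requests out, no replies back" pattern can be listed mechanically. A minimal sketch, assuming the fields have already been extracted; the Icmp6Packet record and all addresses are hypothetical. Type numbers 128/129 are the ICMPv6 Echo Request/Reply types from RFC 4443.

```python
from dataclasses import dataclass

# ICMPv6 message types (RFC 4443).
ECHO_REQUEST = 128
ECHO_REPLY = 129


@dataclass(frozen=True)
class Icmp6Packet:
    src: str
    dst: str
    icmp_type: int
    ident: int
    seq: int


def unanswered_pings(packets):
    """Return echo requests with no matching echo reply in the same capture.

    A reply matches a request when it travels the opposite direction
    (reply src == request dst, reply dst == request src) and carries the
    same identifier and sequence number.
    """
    replies = {
        (p.src, p.dst, p.ident, p.seq)
        for p in packets
        if p.icmp_type == ECHO_REPLY
    }
    return [
        p
        for p in packets
        if p.icmp_type == ECHO_REQUEST
        and (p.dst, p.src, p.ident, p.seq) not in replies
    ]


# Hypothetical addresses from the 2001:db8::/32 documentation range.
WAN = "2001:db8:ffff::10"     # stands in for pfSense's WAN address on the modem's /64
LAN = "2001:db8:0:21::1"      # stands in for an address in a delegated /64
REMOTE = "2001:db8:beef::1"   # stands in for an Internet host

capture = [
    Icmp6Packet(WAN, REMOTE, ECHO_REQUEST, 1, 1),
    Icmp6Packet(REMOTE, WAN, ECHO_REPLY, 1, 1),
    Icmp6Packet(LAN, REMOTE, ECHO_REQUEST, 2, 1),
    Icmp6Packet(LAN, REMOTE, ECHO_REQUEST, 2, 2),
]

for p in unanswered_pings(capture):
    print(f"no reply: {p.src} -> {p.dst} seq={p.seq}")
```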

Comcast IPv6 Problem.jpg

Is this a Comcast modem issue, or could pfSense be doing something improper that results in loss of connectivity? Comcast is giving me the runaround about demarcation but I suspect it could be a problem with the modem's routing table.

More tidbits of info:

IPv6 worked fine up until about late February, when I upgraded to 21.02, at which point I had the infamous no IPv6 gateway problem. I then downgraded to 2.4.5-p1 and reloaded my configs, and everything started working again, but only for 1-2 days. Since then, it's been stuck in the current situation where it works for at most 1-2 days before it stops working.
One Comcast rep claims that they pushed a firmware update to the modem in the late February timeframe, which roughly aligns with the time IPv6 stopped working. However, as I also upgraded to 21.02 at the same time, I can't entirely point the finger at the modem firmware update. It seems possible that some configuration was not reloaded exactly the same way when I restored from the saved configuration file after I wiped pfSense from the SG-3100 and installed 2.4.5-p1 from scratch, then reloaded the configs.

Tzvia

@sts-134 Have you looked at the logs?
You could try an IPv6 ping using pfSense's Diagnostics > Ping, from your LAN interface or a VLAN, to the WAN IPv6 address, just to see if pfSense can get that far. If it can't, then I would think the issue is in the pfSense box, not the modem or ISP. If you can ping the WAN address, you could then try something like google.com, both by name and by IPv6 address.

JKnott

@sts-134

This sounds like a problem I had with my ISP a couple of years ago, where the WAN address worked, but not the LAN addresses. You can do testing if you have another IPv6 connection available. I get 2 connections from my ISP, which allowed me to connect my notebook computer directly to the modem while the firewall/router was also connected. I have also tethered to my cell phone, as my phone gets IPv6 and can pass it to tethered devices. If you don't have either of those, then perhaps you can test with someone else. Regardless of how it's done, try pinging a LAN address from outside your pfSense connection. If the pings don't arrive, then it's a problem with your ISP or beyond. Also ping the WAN address, to ensure that works.

I eventually determined where the problem was by using Wireshark to examine the DHCPv6-PD sequence, which identified the failing system at my ISP by host name. Also, when this happened, my next-door neighbour had the same problem, and when a senior tech came out, he also had it with his own modem and computer. Another test you can try is to put your modem back into gateway mode and see if the LAN side still fails. If it does, it's definitely an ISP problem and not pfSense. In my case, it wasn't a matter of working for a couple of days and then failing; it was solid for about 3 months.

STS-134

@jknott That's a good idea. I believe I've tried this before and it still worked. I'll have to try it again when I am able to (currently, I just rebooted the router and IPv6 happens to be working again, although I am not sure why. IPv6 routing should break by Tuesday morning at the latest if the past behavior does not change.)

I did finally get those packet sniffer logs over to Comcast, though. I eventually got a rep who didn't really understand what I was telling her, but I convinced her to put certain things in the account notes and said that tier 2 tech support would know what those words mean. Tier 2 called, asked me for the logs, and has apparently forwarded the issue to their engineering team.

JKnott

@sts-134

I started with tier 2, as I don't waste my time with tier 1. I was able to show them the problem, as well as the senior tech who came to my home. The problem was getting the network guys to accept they had a problem. Initially they refused to do anything, as I had my own router. This despite the fact I had identified the failing system. What it took to convince them was the senior tech had the same failure with his own modem/firewall and computer. He then took them to the head end I'm connected to and tried 4 different CMTS and only the one I was connected to and had identified had failed. The network guys then got off their butts and fixed the problem.

BTW, shortly before I had this problem, I was doing some work for that same ISP, in a few different head ends, though not mine, so I had a bit more knowledge of the situation than the average customer. This was in addition to my using Wireshark to identify the failing systems and decades of experience in telecom, computers and networks. I doubt an ordinary customer would have had much luck with this sort of problem, when even the ISPs techs & support don't fully understand the way things work.

STS-134

@jknott Yep, it's frustrating because their CS agents aren't properly trained. They'll rattle off everything about "demarcation" but don't even know where their own responsibilities begin and end. I told one CS agent that a phone company refusing to support my case because "you're using your own telephone" doesn't apply so long as that telephone is sending the proper tones and pulses to the system. If it's sending the proper tones and the system isn't doing what it's supposed to, it's a telephone system issue.

It seems like testing over at Comcast is lacking. They initially gave me a Comcast Business Router, which had a very similar bug: it delegates IPv6 blocks to downstream routers but never routes those addresses, only its own /64 block. This was over a year ago, mind you, and they apparently STILL haven't fixed the problem: https://forums.businesshelp.comcast.com/conversations/ipv6/can-not-get-internal-ipv6-traffic-to-route-with-the-cga4131com/5fe0a62cc5375f08cd960e81

I told them that I wanted to go back to a Cisco Business Wireless Gateway modem (I just disable the WiFi) because that one routes IPv6 properly, but it seems their latest firmware update even screwed that up.

I'm wondering how long I should give engineering to analyze my packet capture logs and hopefully reproduce and actually fix the issue before I bug them again.

JKnott

@sts-134

I remember those days when people were advised to keep one phone company phone, just in case. As for your captures, do they show the failure?

I have attached my capture that shows the failure, from packet #29, where the Status Message says no prefix is available:

bootup_capture.pcapng

That error message should have told the network guys exactly where to look, but they refused to do anything, until the senior tech demonstrated the problem was only on the CMTS I was connected to. As I said, I doubt many other customers could have provided that sort of detail.

STS-134

@jknott I suppose you could say they show the failure. They show my router coming up and requesting a /59 via DHCPv6-PD, and the modem replying with a /59 prefix. Unlike in your case, where you had a "No Prefix Available" status code, I get a Success status code, so the modem did in fact acknowledge that the process of delegating the address block was successful.
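The Success vs. NoPrefixAvail distinction lives in the DHCPv6 Status Code option (option code 13 in RFC 8415), which can appear at the top level of a Reply or nested inside the IA_PD option. A minimal decoder sketch, assuming you already have the raw option bytes (for example copied out of Wireshark); the build_status_code helper and the "no prefixes" message text are made up for the example.

```python
import struct

OPTION_STATUS_CODE = 13  # DHCPv6 Status Code option (RFC 8415)

STATUS_NAMES = {
    0: "Success",
    1: "UnspecFail",
    2: "NoAddrsAvail",
    3: "NoBinding",
    4: "NotOnLink",
    5: "UseMulticast",
    6: "NoPrefixAvail",
}


def parse_status_code(option: bytes) -> tuple[str, str]:
    """Decode a raw Status Code option: code, length, status, UTF-8 message."""
    if len(option) < 6:
        raise ValueError("too short for a Status Code option")
    code, length = struct.unpack("!HH", option[:4])
    if code != OPTION_STATUS_CODE:
        raise ValueError(f"expected option {OPTION_STATUS_CODE}, got {code}")
    body = option[4:4 + length]
    if length < 2 or len(body) != length:
        raise ValueError("option length field does not match the data")
    (status,) = struct.unpack("!H", body[:2])
    message = body[2:].decode("utf-8")
    return STATUS_NAMES.get(status, f"unknown ({status})"), message


def build_status_code(status: int, message: str = "") -> bytes:
    """Build a Status Code option (hypothetical test helper)."""
    msg = message.encode("utf-8")
    return struct.pack("!HHH", OPTION_STATUS_CODE, 2 + len(msg), status) + msg


print(parse_status_code(build_status_code(0)))                 # ('Success', '')
print(parse_status_code(build_status_code(6, "no prefixes")))  # ('NoPrefixAvail', 'no prefixes')
```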

But then the logs show me running an IPv6 ping test from the router's WAN port (on the modem's /64 subnet) to google.com and the replies coming back. After that, they show me running an IPv6 ping test from one of the delegated addresses, but those packets appear to disappear into a black hole and no replies ever come back.

I also gave them two traceroutes, one from the router's WAN port (successful) and the other from one of my VLANs on one of the delegated IPv6 prefixes (packets go as far as the cable modem and then just stop).

JKnott

@sts-134

As I mentioned above, try pinging from elsewhere and see if it appears at your WAN interface. If it doesn't, it's not a pfsense problem.
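The isolation logic in that suggestion can be written down as a small decision function. This is only a sketch of the reasoning; the function name, inputs and verdict strings are hypothetical. The first two inputs come from a capture on pfSense's WAN port and the last from the outside machine doing the ping.

```python
def isolate_fault(request_seen_on_wan: bool,
                  reply_seen_on_wan: bool,
                  reply_received_outside: bool) -> str:
    """Rough fault isolation for a ping sent from outside to a delegated address."""
    if not request_seen_on_wan:
        # The request never reached pfSense, so pfSense can't be the cause.
        return "ISP or beyond"
    if not reply_seen_on_wan:
        # pfSense got the request but nothing went back out the WAN port.
        return "pfSense or LAN side"
    if not reply_received_outside:
        # The reply left pfSense and was lost on the way back.
        return "ISP or beyond"
    return "path OK"


print(isolate_fault(False, False, False))  # ISP or beyond
print(isolate_fault(True, False, False))   # pfSense or LAN side
```

Keep in mind that when the capture runs on pfSense itself, the "seen on WAN" inputs are only as trustworthy as that capture.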

STS-134

@jknott
I tried pinging a server on the cable modem's /64 subnet from one of the router's delegated addresses, using pfSense's ping tool, and it's failing.

What seems to be happening is that the cable modem is sending an ICMPv6 Redirect frame with both Target Address and Destination Address equal to pfSense's delegated address. The source MAC of this ICMPv6 Redirect frame is the MAC address of the cable modem, and the destination MAC is the MAC address of the server.

The cable modem is then sending a Neighbor Solicitation frame asking for the IPv6 address of pfSense (presumably the one it saw during the ping attempt?), and no reply to this Neighbor Solicitation is ever received.
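One possible reading of those two frames, offered as a hypothesis rather than a diagnosis. In RFC 4861, a Redirect whose Target Address equals its Destination Address tells the host that the destination is on-link. The follow-up Neighbor Solicitation for that address goes to its solicited-node multicast group, which RFC 4291 defines as ff02::1:ff00:0/104 plus the low 24 bits of the address. The delegated address lives on one of pfSense's VLAN interfaces rather than on the WAN link, so pfSense would not normally answer that solicitation on WAN. That fits a modem treating the delegated prefix as on-link instead of routing it to pfSense. The helper names and the address below are hypothetical.

```python
import ipaddress

SOLICITED_NODE_PREFIX = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def solicited_node(addr: str) -> ipaddress.IPv6Address:
    """Solicited-node multicast group for a unicast address (RFC 4291):
    ff02::1:ff00:0/104 followed by the low-order 24 bits of the address."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    return ipaddress.IPv6Address(SOLICITED_NODE_PREFIX | low24)


def redirect_says_on_link(target: str, destination: str) -> bool:
    """RFC 4861: a Redirect whose Target equals its Destination tells the
    receiving host that the destination is a neighbor (on-link)."""
    return ipaddress.IPv6Address(target) == ipaddress.IPv6Address(destination)


# Hypothetical delegated address on one of pfSense's VLANs.
delegated_addr = "2001:db8:0:21::1"
print(redirect_says_on_link(delegated_addr, delegated_addr))  # True
print(solicited_node(delegated_addr))                         # ff02::1:ff00:1
```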

JKnott

@sts-134

Try the pings from outside your prefix. As I mentioned, I get 2 connections through my cable modem and used 1 for testing. I have also tethered to my cell phone. The idea is to completely isolate the system. In fact, for much of my testing I used a data tap and separate computer running Wireshark, rather than using Packet Capture. When you're using something to test itself, you can sometimes get erroneous results.

If you want, you can open a chat to me, to pass me your addresses and I can try pinging them.

STS-134

@jknott
Ping to server on cable modem's /64 from computer attached to cell phone: successful
Ping to pfSense's interface on /59 delegated to it by cable modem: no reply

But this is getting interesting. I don't have a data tap, however when I run the packet capture function on pfSense's WAN port, I can see the incoming ping packets from the cell phone. I cannot see any ping replies being sent back. So either pfSense is failing to log its own replies and the cable modem has a one-way (outbound) routing problem, or it's a pfSense issue. Do I need to enable anything to make sure pfSense replies to IPv6 ping packets sent to its interface addresses for internal VLANs?

JKnott

@sts-134

Well, start analyzing those captures. You can also try running them on the LAN interface, to see if they're arriving there. You will want to ping to some device on the LAN, not the interface though. As my link shows, making a data tap is easy with a managed switch. A few years back I bought a cheap 5 port switch just for that purpose. Are there any floating rules that might interfere?

STS-134

@jknott I don't see how floating rules could possibly be the problem, given that it works for a few days before it breaks.

This did seem to start when I updated from 2.4.5-p1 to 21.02, which of course broke IPv6. I then went back to 2.4.5-p1 and loaded my configuration file that I took from before the upgrade, but IPv6 never worked properly after that. Comcast does claim that they pushed an update to the cable modem at around the same time, so I thought it definitely had to do with that. I wonder if it's possible that the configuration reload after the reinstall didn't set something properly?

JKnott

@sts-134

I'm just tossing out ideas of things to consider. Is there anyone else here on Comcast with the same problem? How do the packet captures compare when it's working vs. when it's not? Given it fails after 2-3 days, it might be something with the lease time, if it's that long. Have you captured the DHCPv6 sequence? You'll find the lease times in one of the reply XID packets. What happens if you disconnect/reconnect the WAN cable?

STS-134

@jknott No, disconnecting and reconnecting the cable does not cause the behavior to change. Even a full reboot does not seem to fix the issue. The DHCPv6 packets contain a Preferred lifetime of 86400 and Valid lifetime of 172800.
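Converting those lifetimes makes them easier to compare with the symptom. The preferred lifetime is exactly 1 day and the valid lifetime exactly 2 days, the same span as the "works for 1-2 days" window. That may be coincidence, since the captures show the renewals succeeding, but it is cheap to rule out. The sketch also computes renewal (T1) and rebind (T2) times using the 0.5 and 0.8 multipliers that RFC 8415 recommends relative to the shortest preferred lifetime. The actual T1/T2 values the modem sends aren't given in the thread, so those two numbers are illustrative only.

```python
from datetime import timedelta

# Lifetimes reported from the DHCPv6 capture (seconds).
PREFERRED_LIFETIME = 86400
VALID_LIFETIME = 172800

# Illustrative T1/T2 using RFC 8415's recommended 0.5 and 0.8 multipliers;
# integer arithmetic avoids float rounding.
T1 = PREFERRED_LIFETIME * 5 // 10
T2 = PREFERRED_LIFETIME * 8 // 10

for label, secs in [("preferred", PREFERRED_LIFETIME), ("valid", VALID_LIFETIME),
                    ("T1 (renew)", T1), ("T2 (rebind)", T2)]:
    print(f"{label:12} {secs:>7} s = {timedelta(seconds=secs)}")
```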

STS-134

@jknott Update: it's been working for about 5 days now. I was trying to get more packet capture logs and noticed something strange: IPv6 pings were failing from devices behind pfSense, but succeeding from pfSense's ping tool when sourced from the interface associated with their VLAN. This was unlike in the past, where both seemed to succeed or fail together.

Digging into why the pings would succeed from pfSense itself but fail for devices behind the router, I looked into the firewall rules. Eventually I ended up removing a rule that was blocking IPv6 traffic sent to fe80::/10. (Strictly speaking, the last rule in the chain for that VLAN passed all traffic sent to anything NOT in my list of "private addresses", so anything sent to a private address was not passed, and fe80::/10 was on that list.) Once I removed fe80::/10 from the definition of "private address", things actually started working, and they have now been working for 5 days straight.
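A minimal model of the rule logic described above. It assumes the alias held the RFC 1918 blocks, fc00::/7 and fe80::/10; the exact list isn't given, so this is a hypothetical reconstruction. It also assumes that anything the last rule doesn't pass falls through to pfSense's default deny.

```python
import ipaddress

# Hypothetical reconstruction of the original "private address" alias,
# including the link-local block.
PRIVATE_ALIAS = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
    "fc00::/7", "fe80::/10",
)]


def passed_by_last_rule(dst: str, alias) -> bool:
    """Model of a final 'pass if destination is NOT in alias' rule.

    Returns False when the packet would fall through to the default deny.
    IPv4/IPv6 mismatches in the membership test simply evaluate to False.
    """
    addr = ipaddress.ip_address(dst)
    return not any(addr in net for net in alias)


without_link_local = [n for n in PRIVATE_ALIAS
                      if n != ipaddress.ip_network("fe80::/10")]

print(passed_by_last_rule("2001:db8::1", PRIVATE_ALIAS))   # True
print(passed_by_last_rule("fe80::1", PRIVATE_ALIAS))       # False (falls to default deny)
print(passed_by_last_rule("fe80::1", without_link_local))  # True
```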

I'm still trying to figure out why it ever worked for so many months (actually 2+ years) with this rule in place, if that was actually the problem. It should also be noted that when traffic was refusing to route before, I never was able to get pings through, even from pfSense's "ping" tool when I selected individual VLANs as the source address, so I also wonder if Comcast actually fixed something. It's possible that there were two simultaneous issues here.

JKnott

@sts-134

Yeah, blocking link local addresses would cause problems, as IPv6 relies on them for so much.

My rule for private addresses includes the RFC1918 blocks and all ULA. As link local doesn't pass through a router, there's no need to block it.
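A sketch of the alias described here, built with Python's ipaddress module: RFC 1918 plus all ULA (fc00::/7) and nothing else. It also shows how link-local can end up in a list like this. A generic "private address" table, such as the one behind Python's own is_private check, counts fe80::/10 as private even though, as noted above, there's no need to block it. The alias name is hypothetical.

```python
import ipaddress

# RFC 1918 IPv4 blocks plus all ULA, as described above; no link-local.
ALIAS_RFC1918_ULA = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "fc00::/7",
)]


def in_alias(addr: str) -> bool:
    """True if addr falls inside any network in the alias."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in ALIAS_RFC1918_ULA)


print(in_alias("fd12:3456:789a::1"))  # True  (ULA)
print(in_alias("fe80::1"))            # False (link-local stays out)

# A generic "private" table is broader: Python counts link-local as private too.
link_local = ipaddress.ip_address("fe80::1")
print(link_local.is_link_local, link_local.is_private)  # True True
```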

STS-134

@jknott Yeah, I should have known that. But I simply looked at the table of "private addresses" and blindly added them all to a rule.

Do you have a clue why it would have worked for so long before failing? Why it even worked at all (for approximately 2 days) prior to February of this year? How could it have worked at all if IPv6 needs link-local addresses in order to operate?

JKnott

@sts-134

No idea.