Diagnosing IPTV (IGMP + multicast) issues
-
Query is somewhat related to this thread https://forum.pfsense.org/index.php?topic=71806.0
Have discovered that I'm missing at least two TV channels with the IPTV service since switching to pfSense from the ISP's router. I know that many of the settings are highly specific to the ISP and the type of IPTV service. That information is contained in this guide: https://www.dropbox.com/s/zg9ju9373t0fnpu/GoogleFiberRouterGuide.pdf. I've double checked and re-checked that everything is set the way the guide has it.
I have a switch capable of mirroring ports, and I have been able to run Wireshark and watch the traffic going to the IPTV tuner. I notice that when a channel is active, there is a stream of data (makes sense) coming from a specific IP address in the 10.16.0.0/8 or 10.30.0.0/8 subnets, depending on which channel is tuned. This stream stops when I change to a "missing" channel. I'm not sure what this is, but with the streaming data no longer present, Wireshark shows basically just this, repeated:
src | dst | proto | len | info
10.30.254.1 | 225.2.100.1 | UDP | 76 | Source port: lds-distrib Destination port: lds-distrib
I'd be grateful for any suggestions on how to properly troubleshoot this issue, including how to properly leverage my pfSense system for help in resolving it. I'm not terribly familiar with IGMP, multicast, or IPTV but I more or less understand my way around Wireshark at least at a basic level.
One thing to note - I am subscribed to these channels and should be receiving them; this isn't an attempt to bypass any sort of provider restrictions. Of the two in question, one is a local channel and the other is a cable news channel.
-
Your TV box subscribes to specific Multicast groups depending on the channel which is selected. The subscribing of multicast routes generally works in your setup otherwise you wouldn't be able to watch any channel.
You have some logs from the IGMP proxy? This is in the system.log. Perhaps the TV box requests channels from addresses outside of the specified subnets? These might not be in the upstream interface of the IGMP proxy or might be missing from the routes?
src | dst | proto | len | info
10.30.254.1 | 225.2.100.1 | UDP | 76 | Source port: lds-distrib Destination port: lds-distrib
How does this differ from the "stream of data" when a channel is displayed? At which rate do these packets arrive? lds-distrib is port 6543, by the way, which is shown in the firewall rules in the guide. To me this looks exactly like such a data stream; only the length of the packets seems short. Is it possible that the missing channels are actually displayed but are showing a blank screen?
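One way to answer the rate question independently of the TV box is to join the group from a PC and count packets. The following is only a minimal sketch: it assumes a host on the same LAN segment as the tuner, behind the IGMP proxy, and it uses the group and port from the capture above. The helper names `build_mreq` and `measure_rate` are made up for this illustration. Note that joining the group from a PC is itself an IGMP subscription, so it may change what the proxy requests upstream.

```python
import socket
import time

GROUP = "225.2.100.1"  # multicast group seen in the capture
PORT = 6543            # lds-distrib


def build_mreq(group: str, local_ip: str = "0.0.0.0") -> bytes:
    """Pack the ip_mreq structure: group address followed by local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(local_ip)


def measure_rate(group: str, port: int, seconds: float = 10.0) -> float:
    """Join the group, count UDP datagrams for `seconds`, return packets per second."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # This setsockopt is what makes the kernel send the IGMP membership report.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, build_mreq(group))
    sock.settimeout(1.0)
    count = 0
    deadline = time.monotonic() + seconds
    try:
        while time.monotonic() < deadline:
            try:
                sock.recvfrom(2048)
                count += 1
            except socket.timeout:
                pass  # nothing arrived in the last second, keep waiting
    finally:
        sock.close()  # closing the socket drops the group membership
    return count / seconds
```

Calling `measure_rate(GROUP, PORT)` while the tuner sits on a working channel and again on a "missing" one would give two numbers to compare against the roughly constant trickle seen in Wireshark.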
Two minor things I noticed:
I took a short look at the guide you referenced. It shows firewall rules for a 225.0.0.0/4 subnet. This is strange and should probably be 224.0.0.0/4 which is the reserved address space for multicast. It does not make a difference, however, because with a /4 mask both notations cover the same address range.
You mentioned subnets 10.16.0.0/8 and 10.30.0.0/8. These are actually the same. In the guide there are subnets 10.16.0.0/16 and 10.30.0.0/16. This shouldn't make a difference either however.
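Both observations can be checked with Python's standard `ipaddress` module. This is just an illustrative sketch; the helper name `normalize` is not from the guide or from pfSense.

```python
import ipaddress


def normalize(cidr: str) -> ipaddress.IPv4Network:
    """Return the network a prefix actually denotes, masking off any host bits."""
    return ipaddress.ip_network(cidr, strict=False)


# A /8 only keeps the first octet, so both prefixes collapse to 10.0.0.0/8.
same_slash8 = normalize("10.16.0.0/8") == normalize("10.30.0.0/8")

# With /16 the second octet is significant, so the two networks are distinct.
same_slash16 = normalize("10.16.0.0/16") == normalize("10.30.0.0/16")

# A /4 only keeps the top four bits; 225 is 0b11100001, which masks to 224.
multicast = normalize("225.0.0.0/4")

print(same_slash8, same_slash16, multicast)  # True False 224.0.0.0/4
```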
-flo-
-
Thanks for the reply.
The subscribing of multicast routes generally works in your setup otherwise you wouldn't be able to watch any channel.
Right, so it is mostly working. Oddly, by removing the pfSense and putting the ISP's router back in (which restored the channels immediately), and then putting the pfSense back in (w/o rebooting it), the missing channels stayed available. However, I noticed last night that one of the two missing channels is missing again.
I can't imagine this is coincidence, but I'm noticing issues with the ISP's DVR functionality as well - one of the two missing channels that is back still isn't recording the scheduled programs it should be (but was).
You have some logs from the IGMP proxy? This is in the system.log. Perhaps the TV box requests channels from addresses outside of the specified subnets? These might not be in the upstream interface of the IGMP proxy or might be missing from the routes?
I do have the logs. Nothing is jumping out at me - though I'm not precisely clear what to look for. I don't see references to subnets that are out of the expected ranges. I've been trying to get the logs into Splunk Storm, but so far I'm having a hard time.
How does this differ from the "stream of data" when a channel is displayed?
The stream of data when a channel is displayed usually shows up as much more traffic (many, many more packets), and wireshark sees a lot of encrypted MPEG (MPEG TS, I think it calls the packets) for the normal cable channels and something else on the local network channels.
I will look at this a little bit closer to see if maybe the traffic for the missing channels is using a different port (not 2000 or 6543). One thing you make me think of is a situation we ran into at work where we tried to set up an application's health check on port 2000 (mostly a random choice), but could not for the life of us figure out why it wasn't working properly. It turned out that 2000 is some type of management port for the Cisco gear, and the switch was intercepting the traffic. Given that most channels are working, I wonder if these are hitting something like that where the Netgear switch is misbehaving. I can remove it from the network, put the OTN directly into the pfSense WAN interface to see if that makes any difference. I may need to let it sit for a few days to have the problem come back.
At which rate do these packets arrive?
The traffic from 10.30.254.1 is probably ~1-2 packets/second. Not very fast or very much. I'll have to dig into this to see if this traffic is also present when a channel is streaming like it should be (but buried, so I'm just not noticing it, hence trying to get the logs into Splunk).
It shows firewall rules for a 225.0.0.0/4 subnet. This is strange and should probably be 224.0.0.0/4 which is the reserved address space for multicast.
Just in case, I switched it to 224.0.0.0/4 with no luck.
You mentioned subnets 10.16.0.0/8 and 10.30.0.0/8. These are actually the same. In the guide there are subnets 10.16.0.0/16 and 10.30.0.0/16.
Good catch. You're right. That was a typo in writing up my post.
-
You asked about logs. I'm not sure what these mean, because the entries don't seem to be complete (no source address?) but I do see some IGMP traffic being blocked. nfe0_vlan2 is the WAN interface. Multicast traffic doesn't seem to be blocked. That is, I can't find anything this is harming.
pfSense says these are coming from a default deny rule, but I haven't located it yet.
Mar 21 21:55:20 192.168.2.1 Mar 21 21:55:55 pf: 00:00:00.011877 rule 3/0(match): block in on nfe0_vlan2: (tos 0xc0, ttl 1, id 0, offset 0, flags [none], proto IGMP (2), length 36, options (RA))
Mar 21 21:54:20 192.168.2.1 Mar 21 21:54:54 pf: 00:00:01.275459 rule 3/0(match): block in on nfe0_vlan2: (tos 0xc0, ttl 1, id 0, offset 0, flags [none], proto IGMP (2), length 36, options (RA))
Mar 21 21:53:56 192.168.2.1 Mar 21 21:54:31 pf: 00:00:00.645101 rule 3/0(match): block in on nfe0_vlan2: (tos 0xc0, ttl 1, id 54404, offset 0, flags [DF], proto IGMP (2), length 36, options (RA))
Mar 21 21:53:19 192.168.2.1 Mar 21 21:53:54 pf: 00:00:00.765942 rule 3/0(match): block in on nfe0_vlan2: (tos 0xc0, ttl 1, id 0, offset 0, flags [none], proto IGMP (2), length 36, options (RA))
Mar 21 21:52:20 192.168.2.1 Mar 21 21:52:55 pf: 00:00:02.712292 rule 3/0(match): block in on nfe0_vlan2: (tos 0xc0, ttl 1, id 0, offset 0, flags [none], proto IGMP (2), length 36, options (RA))
Mar 21 21:51:51 192.168.2.1 Mar 21 21:52:26 pf: 00:00:01.588239 rule 3/0(match): block in on nfe0_vlan2: (tos 0xc0, ttl 1, id 41278, offset 0, flags [DF], proto IGMP (2), length 36, options (RA))
-
Oddly, by removing the pfSense and putting the ISP's router back in (which restored the channels immediately), and then putting the pfSense back in (w/o rebooting it), the missing channels stayed available.
That's odd. Channels stay available even when channels are switched? I would assume that if you switch to a different channel then the new channel is subscribed to and the old channel is unsubscribed (via IGMP). So either the unsubscribing does not happen or there is something else that your ISP's box does.
I can remove it from the network, put the OTN directly into the pfSense WAN interface to see if that makes any difference.
If it works with the switch and your ISP's router then I guess the switch isn't the problem. But try nonetheless.
Mar 21 21:51:51 192.168.2.1 Mar 21 21:52:26 pf: 00:00:01.588239 rule 3/0(match): block in on nfe0_vlan2: (tos 0xc0, ttl 1, id 41278, offset 0, flags [DF], proto IGMP (2), length 36, options (RA))
I'm not sure what this is either. IGMP should not be blocked at all. Can you just open the firewall completely temporarily? This could rule out that the firewall is causing the problems.
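To see at a glance which rule, interface and protocol the blocked entries have in common, the log lines can be summarized with a small script. This is a rough sketch that assumes the log format matches the lines quoted in this thread; the regular expression and the names `parse_pf_line` and `summarize` are illustrative and not part of pfSense.

```python
import re
from collections import Counter

# Pattern written against the log lines quoted above; other pf log formats may differ.
PF_LINE = re.compile(
    r"rule (?P<rule>\d+)/\d+\(match\): "
    r"(?P<action>\w+) (?P<direction>in|out) on (?P<iface>[^:]+): "
    r"\(.*?ttl (?P<ttl>\d+),.*?proto (?P<proto>\w+) \(\d+\), "
    r"length (?P<length>\d+)(?:, options \((?P<options>[^)]*)\))?"
)

SAMPLE = (
    "Mar 21 21:51:51 192.168.2.1 Mar 21 21:52:26 pf: 00:00:01.588239 "
    "rule 3/0(match): block in on nfe0_vlan2: (tos 0xc0, ttl 1, id 41278, "
    "offset 0, flags [DF], proto IGMP (2), length 36, options (RA))"
)


def parse_pf_line(line):
    """Return the interesting fields of one pf log line as a dict, or None if no match."""
    match = PF_LINE.search(line)
    return match.groupdict() if match else None


def summarize(lines):
    """Count log entries per (rule, action, interface, protocol, IP options)."""
    counts = Counter()
    for line in lines:
        entry = parse_pf_line(line)
        if entry:
            key = (entry["rule"], entry["action"], entry["iface"],
                   entry["proto"], entry["options"])
            counts[key] += 1
    return counts


print(parse_pf_line(SAMPLE))
```

The rule number extracted here (3 in the quoted lines) should be comparable against the numbered rule listing printed by `pfctl -vvsr` on the firewall, which may help locate the rule that the UI does not show.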
-flo-
-
After having the switch out of the mix for a week or so, as expected, it made no difference. OTN physically/directly attached to the pfSense and channels still went missing. I was hoping maybe somehow the switch was caching something in its internal routing or ARP table, but that doesn't seem to be the case.
That's odd. Channels stay available even when channels are switched? I would assume that if you switch to a different channel then the new channel is subscribed to and the old channel is unsubscribed (via IGMP). So either the unsubscribing does not happen or there is something else that your ISP's box does.
Yep, it seems as if the channels vanish after a while - not right away. I don't know exactly when they stop working, but initially they all seem to work fine switching through them.
I put the ISP's router back in (so pfSense out), and have a packet sniffer set up like a mouse trap with peanut butter, trying to grab anything to/from what appears to be a management port, 4567. I'm hoping there is a clue, or a way to access that ISP device's internal configuration to see if I'm missing something in my multicast setup.
Can you just open the firewall completely temporarily? This could rule out that the firewall is causing the problems.
Thanks for the suggestion. I've put that on my list of things to try. I need to look at it again, but IIRC there are rules showing up in the pfSense logs that do not seem to be accessible anywhere in the UI, at least not that I've been able to find.
Each new configuration takes some amount of time (have been giving it a few days or so) for the channels to stop working, which is making this difficult to sort out.
I'm not sure if this is related but when I was looking at some packet traces a couple of weeks ago with the ISP's box in place, I think I noticed something that may be different about two of the channels I'm having trouble with - each of these trouble channels has the same source IP as at least one adjacent channel. I haven't gone through all channels recording their IP addresses, I just happened to notice when changing channels on these particular ones, the source IP wasn't changing (but the channel/programming changes just fine). On other channels that always work, they don't (from what I could see) have the same IP.
The two problem channels do not share a source IP address with each other (one is channel 13 and the other channel 119); each just seems to share one with an adjacent channel (i.e., I don't remember the exact specifics, something like channels 13 and 14, and channels 118 and 119).
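If the source IP for every channel does get recorded, finding the channels that share a source is a simple grouping exercise. The sketch below is only an illustration: the function name `shared_sources` is made up, and the addresses are placeholders from the 192.0.2.0/24 documentation range, not values observed on this network.

```python
from collections import defaultdict


def shared_sources(observations):
    """Map each source IP to the channels using it, keeping only IPs seen on 2+ channels."""
    by_source = defaultdict(set)
    for channel, source_ip in observations:
        by_source[source_ip].add(channel)
    return {ip: sorted(chs) for ip, chs in by_source.items() if len(chs) > 1}


# PLACEHOLDER data: (channel number, source IP) pairs as they might be noted from captures.
observations = [
    (13, "192.0.2.10"),
    (14, "192.0.2.10"),
    (15, "192.0.2.11"),
    (118, "192.0.2.20"),
    (119, "192.0.2.20"),
]

print(shared_sources(observations))
```

With real observations in place of the placeholders, the result would show whether shared source IPs line up only with the problem channels or also occur on channels that always work.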