23.01.b.20230106.0600 IGMP proxy stops TV stream
-
@stephenw10 said in 23.01.b.20230106.0600 IGMP proxy stops TV stream:
Yeah, this seems very unlikely to be hardware, given it was working fine in 22.05.
I could believe the updated driver is doing something to mangle the IGMP packets, though.
Check the enabled options on the NIC with:
ifconfig -vvvm igc0
Perhaps it's not correctly disabling checksum offload. Or maybe some new option is present in 23.01 that wasn't in 22.05 at all.
Morning @stephenw10, below are the output and some screenshots from a diff tool. They do differ; can you see any issues?
[23.01-BETA][admin@pfSense.high.local]/root: pciconf -lv igc0
igc0@pci0:2:0:0: class=0x020000 rev=0x04 hdr=0x00 vendor=0x8086 device=0x125c subvendor=0x8086 subdevice=0x0000
    vendor     = 'Intel Corporation'
    class      = network
    subclass   = ethernet
[23.01-BETA][admin@pfSense.high.local]/root: ifconfig -vvvm igc0
igc0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    description: Server
    options=48020b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_MAGIC,NOMAP>
    capabilities=4f43fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWTSO,NETMAP,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
    ether 7c:2b:e1:13:7a:db
    inet6 fe80::7e2b:e1ff:fe13:7adb%igc0 prefixlen 64 scopeid 0x1
    inet6 fe80::1:1%igc0 prefixlen 64 scopeid 0x1
    inet6 2a02:a<bla> prefixlen 64
    inet 172.16.1.1 netmask 0xffffff00 broadcast 172.16.1.255
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active
    supported media:
        media autoselect
        media 2500Base-T
        media 1000baseT
        media 1000baseT mediaopt full-duplex
        media 100baseTX mediaopt full-duplex
        media 100baseTX
        media 10baseT/UTP mediaopt full-duplex
        media 10baseT/UTP
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
[23.01-BETA][admin@pfSense.high.local]/root:
-
OK, so interestingly, we can see that the VLAN hardware filtering capability (VLAN_HWFILTER) has been removed, and it was previously in use. But I don't think you are using VLANs?
The NOMAP (unmapped mbufs) capability has been added and is enabled by default. I don't believe that would affect only multicast traffic, and it appears to be a transmit-side option anyway.
Are you able to test using a different interface?
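The options= and capabilities= values in the ifconfig output are hex bitmasks of FreeBSD's IFCAP_* interface capability flags, so the two can also be compared numerically rather than by eye. The following is a minimal Python sketch that assumes the bit values from FreeBSD's <net/if.h>. Only the flags that appear in this thread are listed, and `decode` and `compare` are hypothetical helper names.

```python
# IFCAP_* bit values as defined in FreeBSD's <net/if.h>.
# Assumption: only the flags seen in this thread are listed; any other set
# bit is reported as a raw hex value instead of a name.
IFCAP_NAMES = {
    0x1: "RXCSUM",
    0x2: "TXCSUM",
    0x8: "VLAN_MTU",
    0x10: "VLAN_HWTAGGING",
    0x20: "JUMBO_MTU",
    0x80: "VLAN_HWCSUM",
    0x100: "TSO4",
    0x200: "TSO6",
    0x400: "LRO",
    0x800: "WOL_UCAST",
    0x1000: "WOL_MCAST",
    0x2000: "WOL_MAGIC",
    0x10000: "VLAN_HWFILTER",
    0x40000: "VLAN_HWTSO",
    0x100000: "NETMAP",
    0x200000: "RXCSUM_IPV6",
    0x400000: "TXCSUM_IPV6",
    0x800000: "HWSTATS",
    0x4000000: "NOMAP",
}


def decode(mask: int) -> list[str]:
    """Return the names of the bits set in mask, lowest bit first."""
    names = []
    bit = 1
    while bit <= mask:
        if mask & bit:
            names.append(IFCAP_NAMES.get(bit, hex(bit)))
        bit <<= 1
    return names


def compare(options: int, capabilities: int) -> dict[str, list[str]]:
    """Split a NIC's capabilities into enabled and supported-but-disabled."""
    return {
        "enabled": decode(options),
        "disabled": decode(capabilities & ~options),
    }


if __name__ == "__main__":
    # Values copied from the igc0 output above.
    for state, flags in compare(0x48020b8, 0x4f43fbb).items():
        print(state, flags)
```

For igc0, this puts RXCSUM and TXCSUM in the supported-but-disabled list, and VLAN_HWFILTER is absent from the capabilities mask entirely. Bit 0x800000 (HWSTATS) is set in both masks but does not appear in ifconfig's printed list.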
-
To clarify, I made the diff against YOUR output ;-) Yes, I'm using a handful of VLANs on LAN and two on WAN.
Could you elaborate on NOMAP?
Do you mean another interface on the same board, like moving from igc0 to igc2? Or fitting another NIC? The latter is impossible, as it is a fixed housing with built-in NICs.
-
The thing is, it's not easy to roll back to 22.05 to check the previous interface output or capture the IGMP packets there.
-
The NOMAP capability should be easy to disable, but it appears ifconfig may not have caught up yet; I can't find a syntax that works. I doubt it would make any difference here anyway.
I also compared the options between 22.05 and 23.01 using the same config on an 8200 and saw the same differences.
If it's a ZFS install, you should be able to roll back to a BE snapshot.
-
@stephenw10 Same results with the RC build, 23.01.r.20230202.0019 ;(
-
@stephenw10 Could this IGMP issue be added to the list of issues? https://redmine.pfsense.org/projects/pfsense/issues?query_id=186
-
If an IGMP proxy bug report can be opened, then it can be added to that list.
Were you able to confirm that the queries from the proxy are generated with a correct checksum in 22.05?
If so I'd go ahead and open a bug report for it.
Do you see the bad checksum on all IGMP packets generated by the proxy? Upstream and downstream?
-
Hi Steve, the thing is that the config has changed a lot, so my 22.05 backup is too old. And the 226V NIC is not supported yet, so a fresh 22.05 installation would take quite a lot of time.
My action plan:
- Fresh install from the final 23.01 USB installer image. Test with just a clean install: configure IPTV_WAN/LAN and IGMP.
- If that works, load the backup file and check again.
- If neither works, I'll install 2.7 and try to "upgrade" to 22.05, which is in fact a downgrade. Otherwise my only option left is a fresh install of 2.6.0 with a USB NIC?
I'll capture the WAN traffic to check the checksum.
-
You don't have a 22.05 ZFS BE snapshot you can roll back to?
-
You mentioned that before. I've got no clue how to do that. I tried some CLI commands but never found the courage to press Enter.
Is there some more explanation, or a documented how-to?
EDIT: BE seems to mean Boot Environment? Interesting, I was on the wrong path with the CLI commands.
-
Well, you can do it from the CLI using bectl, but it's much easier in the GUI.
See: https://docs.netgate.com/pfsense/en/latest/backup/zfsbe/gui.html
Also note that you cannot roll back to a 22.01 BE snapshot; it will fail to boot. That compatibility was added in 22.05.
-
@stephenw10 It's definitely a 23.01 bug and not a glitch in the TV: I reverted to 22.05, the corrupt IGMP packet is gone, and the STB answers. In 23.01, because the IGMP request to the listeners is corrupt, the STB does not answer back and the TV stops playing.
How does filing a bug work?
22.05: full set of requests plus answers from the STB, and a valid packet.
23.01: full set of requests but missing answers from the STB, and the corrupt packet shows up.
-
Bug looks good, thanks.
-
Hi, I just came across this same issue, and it's a blocking issue for me as well.
I hope it can be fixed in 23.01 rather than 23.05, where it is currently scheduled.
-
Just to report that I also seem to have an issue with the IGMP proxy generating packets with invalid checksums, although in my case I am running 2.6.0 and using a completely different network adapter.
As I am only now configuring the IGMP proxy for the first time (the router has been installed for a few months), I do not have a previously working installation to compare with, and I had been troubleshooting for some time before noticing the checksum errors. Tcpdump running on the same device shows:
tcpdump: listening on ix0, link-type EN10MB (Ethernet), capture size 262144 bytes
11:21:25.462732 IP (tos 0xc0, ttl 1, id 13, offset 0, flags [none], proto IGMP (2), length 32, options (RA)) 10.0.2.3 > 224.0.0.1: igmp query v2
11:21:26.999364 IP (tos 0x0, ttl 1, id 58030, offset 0, flags [none], proto IGMP (2), length 32, options (RA), bad cksum 0 (->5624)!) 10.0.1.254 > 224.0.0.7: igmp v2 report 224.0.0.7
11:21:27.601357 IP (tos 0x0, ttl 1, id 35790, offset 0, flags [none], proto IGMP (2), length 32, options (RA), bad cksum 0 (->ac0f)!) 10.0.1.254 > 224.0.0.252: igmp v2 report 224.0.0.252
11:21:30.205921 IP (tos 0x0, ttl 1, id 33675, offset 0, flags [none], proto IGMP (2), length 32, options (RA), bad cksum 0 (->b453)!) 10.0.1.254 > 224.0.0.251: igmp v2 report 224.0.0.251
11:21:36.336285 IP (tos 0x0, ttl 1, id 37154, offset 0, flags [none], proto IGMP (2), length 32, options (RA), bad cksum 0 (->a5fc)!) 10.0.1.254 > 224.0.1.187: igmp v2 report 224.0.1.187
11:21:36.672559 IP (tos 0x0, ttl 1, id 59654, offset 0, flags [none], proto IGMP (2), length 32, options (RA), bad cksum 0 (->3fd9)!) 10.0.1.254 > 239.255.255.250: igmp v2 report 239.255.255.250
Tcpdump running on another pfSense router connected to the same switch does not show these packets at all, so presumably they are being dropped by the switch or by the receiving router because of the checksum error. In any case, multicast routing between VLANs is not working.
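In tcpdump's `bad cksum 0 (->5624)!`, the first number is the IPv4 header checksum stored in the captured packet. The value after `->` is what tcpdump computed it should be. That computed value can be reproduced with the standard RFC 1071 Internet checksum.

Here is a minimal Python sketch. It assumes a 24-byte header: the 20-byte base plus the 4-byte Router Alert option that tcpdump prints as `options (RA)`. It also takes flags and fragment offset as zero, as printed. `igmp_ip_header` is a hypothetical helper that rebuilds the header from the fields tcpdump shows.

```python
import ipaddress
import struct


def ip_header_checksum(header: bytes) -> int:
    """RFC 1071 one's-complement checksum over an IPv4 header."""
    if len(header) % 2:
        header += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    # Fold carries back into the low 16 bits (end-around carry).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def igmp_ip_header(src: str, dst: str, ident: int, tos: int = 0,
                   ttl: int = 1, total_len: int = 32) -> bytes:
    """Hypothetical helper: rebuild the IPv4 header of an IGMP packet
    from the fields tcpdump prints, with the checksum field set to 0."""
    router_alert = bytes([0x94, 0x04, 0x00, 0x00])  # RFC 2113 Router Alert
    ihl = (20 + len(router_alert)) // 4             # header length in 32-bit words
    base = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | ihl, tos, total_len, ident,
        0,            # flags + fragment offset
        ttl, 2,       # protocol 2 = IGMP
        0,            # checksum placeholder
        ipaddress.IPv4Address(src).packed,
        ipaddress.IPv4Address(dst).packed,
    )
    return base + router_alert


if __name__ == "__main__":
    hdr = igmp_ip_header("10.0.1.254", "224.0.0.7", ident=58030)
    print(hex(ip_header_checksum(hdr)))  # 0x5624, matching "(->5624)"
```

The stored value is exactly 0 rather than some slightly wrong number. That is consistent with the field having been left empty for something later in the transmit path (such as the NIC) to fill in, rather than with the stack computing it incorrectly.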
Here is the network card configuration:
pciconf -lv ix0
ix0@pci0:1:0:0: class=0x020000 card=0x031b1dcf chip=0x15288086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Controller 10-Gigabit X540-AT2'
    class      = network
    subclass   = ethernet
ifconfig -vvvm ix0
ix0: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> metric 0 mtu 1500
    description: LAN
    options=e138bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6>
    capabilities=f53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,NETMAP,RXCSUM_IPV6,TXCSUM_IPV6>
    ether 98:b7:85:89:7f:74
    inet6 fe80::9ab7:85ff:fe89:7f74%ix0 prefixlen 64 scopeid 0x1
    inet 10.0.1.254 netmask 0xffff0000 broadcast 10.0.255.255
    media: Ethernet autoselect (10Gbase-T <full-duplex>)
    status: active
    supported media:
        media autoselect
        media 100baseTX
        media 1000baseT
        media 10Gbase-T
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
I have not yet tried disabling hardware checksum offloading. I only just discovered this thread in the middle of a work day and can't reboot right now, but I could try rebooting this evening if it might provide a usable workaround.
The hardware is a quad-core 3.5 GHz Xeon server, so software checksum generation should be OK, I would think?
-
@dbmandrake I think a reboot should be done. With offloading enabled, the packets on the wire could still be valid even though this capture shows bad checksums. Without offloading, pfSense generates the checksums itself.
We can't draw a conclusion without disabling offloading.
-
@thebear Yes, I will disable checksum offloading, reboot after hours, and test again tomorrow.
Also, unless I have missed something, pfSense is still sending IGMPv2 reports with corrupt checksums even after I disabled the IGMP proxy service?
Such as:
12:29:38.863070 IP (tos 0x0, ttl 1, id 33267, offset 0, flags [none], proto IGMP (2), length 32, options (RA), bad cksum 0 (->b5eb)!) 10.0.1.254 > 224.0.0.251: igmp v2 report 224.0.0.251
Could this be related to the Avahi service (which is still running), since 224.0.0.251 is the multicast address used by mDNS? If so, what is generating the IGMP report: the Avahi service itself, or something at a lower level in the OS?
-
Yes, with hardware checksum offloading enabled the pcap will show bad checksums, because the packet is captured before the hardware calculates them on the way to the wire.
You might not expect to see replies to reports like that either, unlike the queries shown above.
Steve
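When a capture contains many `bad cksum` lines, it can help to separate two cases. One is a zero checksum on a packet the firewall sent itself, which is the pattern expected when checksum calculation is deferred to the NIC. The other is any other mismatch, which is worth a closer look. The following is a minimal Python sketch of that triage. It is a heuristic, not a diagnosis: it assumes tcpdump's `bad cksum X (->Y)!` text format as seen in this thread, and `classify` and the category labels are hypothetical.

```python
import re
from typing import Optional, Tuple

# tcpdump's "bad cksum <stored> (-><computed>)!" annotation.
BAD_CKSUM = re.compile(r"bad cksum ([0-9a-f]+) \(->([0-9a-f]+)\)!")
# First "a.b.c.d > e.f.g.h:" pair on the line (source > destination).
ADDRS = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3}) > (\d{1,3}(?:\.\d{1,3}){3}):")


def parse_bad_cksum(line: str) -> Optional[Tuple[int, int]]:
    """Return (stored, computed) checksums, or None if tcpdump found no error."""
    m = BAD_CKSUM.search(line)
    return (int(m.group(1), 16), int(m.group(2), 16)) if m else None


def classify(line: str, local_addrs: set[str]) -> str:
    """Heuristic triage of one tcpdump line (hypothetical labels)."""
    cksum = parse_bad_cksum(line)
    if cksum is None:
        return "ok"
    stored, _computed = cksum
    addrs = ADDRS.search(line)
    src = addrs.group(1) if addrs else None
    # Zero stored checksum on a locally sent packet: likely filled in later
    # by the NIC when TXCSUM offload is enabled, so not necessarily bad on the wire.
    if stored == 0 and src in local_addrs:
        return "offload-artifact"
    return "suspect"


if __name__ == "__main__":
    line = ("12:29:38.863070 IP (tos 0x0, ttl 1, id 33267, offset 0, flags [none], "
            "proto IGMP (2), length 32, options (RA), bad cksum 0 (->b5eb)!) "
            "10.0.1.254 > 224.0.0.251: igmp v2 report 224.0.0.251")
    print(classify(line, {"10.0.1.254"}))
```

As the thread concludes, the only reliable confirmation is to capture on another device, or with offloading disabled.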
-
@stephenw10 Sorry for the noise in the thread; you were right about the checksum offloading.
I temporarily disabled IGMP snooping on the switch (to stop it filtering/handling the requests itself) and was able to observe, at another device, the IGMP reports from the upstream side of the IGMP proxy arriving safely across the network with correct checksums. So checksums are not my issue, and I never bothered to try turning off hardware checksumming.
However, something really weird is going on. When the switch that is normally the IGMP querier (10.0.2.1) is the querier, it somehow suppresses IGMP membership reports from the upstream interface of the IGMP proxy on pfSense (10.0.1.254). But when I make a different switch the querier (10.0.2.3, the switch pfSense is connected to), the IGMP proxy suddenly starts sending the upstream membership reports it should, and it starts to work, at least partially: I was able to find and play an SSDP-advertised multicast stream between VLANs.
A lot more debugging is required on my part to figure out what's going on. One problem is that I have a large mix of switch models, and some of the older ones don't necessarily play nicely. I suspect there is at least one other switch on the network that is not following the gentleman's rule of "don't be the IGMP querier when a lower-IP-address device is already sending IGMP queries". I have seen intermittent changes in the multicast router destination pointing towards these suspect downstream switches, even though they are explicitly configured never to be an IGMP querier!
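The "gentleman's rule" above is the IGMPv2 querier election in RFC 2236. Every router starts as a querier, and one that hears a query from a numerically lower IP address must become a non-querier. That makes a log of query sources enough to spot a device that keeps querying when it shouldn't.

The following is a minimal Python sketch under simplifying assumptions. The query log format and the example timestamps are hypothetical, not taken from the thread. The sketch also ignores the Other Querier Present Interval (255 s with RFC 2236 defaults), after which a silent querier may legitimately be replaced.

```python
import ipaddress
from collections.abc import Iterable


def elect_querier(sources: Iterable[str]) -> str:
    """RFC 2236 election: the lowest IP address among query senders wins.
    Addresses are compared numerically, not as strings."""
    return str(min(map(ipaddress.IPv4Address, sources)))


def rogue_queriers(query_log: list[tuple[float, str]]) -> list[str]:
    """Flag sources that keep sending queries after the elected (lowest)
    querier has been heard. Simplification: ignores the Other Querier
    Present timeout, so a legitimate takeover would also be flagged."""
    querier = elect_querier(src for _, src in query_log)
    first_seen = min(t for t, src in query_log if src == querier)
    rogues = {src for t, src in query_log if src != querier and t > first_seen}
    return sorted(rogues, key=ipaddress.IPv4Address)


if __name__ == "__main__":
    # Hypothetical (seconds, source) pairs, e.g. gathered from tcpdump query lines.
    log = [(0.0, "10.0.2.3"), (1.0, "10.0.2.1"), (126.0, "10.0.2.1"), (130.0, "10.0.2.3")]
    print(elect_querier(src for _, src in log))  # 10.0.2.1
    print(rogue_queriers(log))                   # ['10.0.2.3']
```

Converting to `IPv4Address` before comparing matters here. As plain strings, "10.0.2.10" would sort before "10.0.2.9" and the wrong device would be treated as the rightful querier.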