23.01.b.20230106.0600 IGMP proxy stops TV stream
-
@thebear There are reports that the Intel i226-V has issues like the 1st revision of the Intel i225-V. From what I recall, it was only fully resolved with a hardware fix with Revision 3.
https://linustechtips.com/topic/1483003-raptor-lake-motherboards-allegedly-hit-with-ethernet-controller-flaw-intel-i226-v-25gbe-has-a-connection-drop-issue-no-fix-currently/
-
@lnguyen said in 23.01.b.20230106.0600 IGMP proxy stops TV stream:
@thebear There are reports that the Intel i226-V has issues like the 1st revision of the Intel i225-V. From what I recall, it was only fully resolved with a hardware fix with Revision 3.
https://linustechtips.com/topic/1483003-raptor-lake-motherboards-allegedly-hit-with-ethernet-controller-flaw-intel-i226-v-25gbe-has-a-connection-drop-issue-no-fix-currently/
Yeah, saw that, but where are the facts? All the websites were copying the news and spreading rumors last week; clickbait. My pfSense unit on 22.05 never had a single disconnect; it was rock stable.
And now only the IGMP joins are not received by the proxy, which could be caused by something like kernel IGMP snooping, for example -> software ;-)
I don't believe the news sites, as my unicast internet is rock stable with 23.01 and the i226-V. It's more clickbait news ;-)
Nevertheless, thanks for sharing, all pieces could help.
Nothing going on -> link logs over the last uptime of around 3.5 days:
2023-01-22 08:49:16.956753+01:00 php-fpm 363 /rc.newwanip: rc.newwanip: Info: starting on igc1.4.
2023-01-22 08:49:15.952055+01:00 check_reload_status 401 rc.newwanip starting igc1.4
2023-01-22 08:49:14.525924+01:00 check_reload_status 401 Linkup starting igc1.6
2023-01-22 08:49:14.520114+01:00 check_reload_status 401 Linkup starting igc1.4
2023-01-22 08:49:14.514628+01:00 kernel - igc1.6: link state changed to UP
2023-01-22 08:49:14.514596+01:00 kernel - igc1.4: link state changed to UP
2023-01-22 08:49:14.514537+01:00 kernel - igc1: link state changed to UP
2023-01-22 08:49:14.514245+01:00 check_reload_status 401 Linkup starting igc1
2023-01-22 08:48:59.918274+01:00 check_reload_status 401 Linkup starting igc1.6
2023-01-22 08:48:59.916272+01:00 check_reload_status 401 Linkup starting igc1.4
-
Yeah this seems very unlikely to be hardware given it was working fine in 22.05.
I could believe the updated driver is doing something to mangle the igmp packets though.
Check the enabled options in the NIC with:
ifconfig -vvvm igc0
Perhaps it's not correctly disabling checksum offload. Or maybe some new option is present in 23.01 that wasn't in 22.05 at all.
For example, the 8200 i226 NICs:
[22.05.1-RELEASE][root@8200-2.stevew.lan]/root: pciconf -lv igc0
igc0@pci0:4:0:0: class=0x020000 card=0x00008086 chip=0x125c8086 rev=0x04 hdr=0x00
    vendor = 'Intel Corporation'
    class = network
    subclass = ethernet
[22.05.1-RELEASE][root@8200-2.stevew.lan]/root: ifconfig -vvvm igc0
igc0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=e120bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6>
    capabilities=f53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,NETMAP,RXCSUM_IPV6,TXCSUM_IPV6>
    ether 90:ec:77:47:5c:e8
    inet6 fe80::92ec:77ff:fe47:5ce8%igc0 prefixlen 64 scopeid 0x1
    inet6 fe80::1:1%igc0 prefixlen 64 scopeid 0x1
    inet 192.168.92.1 netmask 0xffffff00 broadcast 192.168.92.255
    media: Ethernet autoselect
    status: no carrier
    supported media:
        media autoselect
        media 2500Base-T
        media 1000baseT
        media 1000baseT mediaopt full-duplex
        media 100baseTX mediaopt full-duplex
        media 100baseTX
        media 10baseT/UTP mediaopt full-duplex
        media 10baseT/UTP
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
Steve
-
@stephenw10 said in 23.01.b.20230106.0600 IGMP proxy stops TV stream:
Yeah this seems very unlikely to be hardware given it was working fine in 22.05.
I could believe the updated driver is doing something to mangle the igmp packets though.
Check the enabled options in the NIC with:
ifconfig -vvvm igc0
Perhaps it's not correctly disabling checksum offload. Or maybe some new option is present in 23.01 that wasn't in 22.05 at all.
Morning @stephenw10, below is the output, along with some screenshots from a diff tool. The outputs do differ; can you see any issues?
[23.01-BETA][admin@pfSense.high.local]/root: pciconf -lv igc0
igc0@pci0:2:0:0: class=0x020000 rev=0x04 hdr=0x00 vendor=0x8086 device=0x125c subvendor=0x8086 subdevice=0x0000
    vendor = 'Intel Corporation'
    class = network
    subclass = ethernet
[23.01-BETA][admin@pfSense.high.local]/root: ifconfig -vvvm igc0
igc0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    description: Server
    options=48020b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_MAGIC,NOMAP>
    capabilities=4f43fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWTSO,NETMAP,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
    ether 7c:2b:e1:13:7a:db
    inet6 fe80::7e2b:e1ff:fe13:7adb%igc0 prefixlen 64 scopeid 0x1
    inet6 fe80::1:1%igc0 prefixlen 64 scopeid 0x1
    inet6 2a02:a<bla> prefixlen 64
    inet 172.16.1.1 netmask 0xffffff00 broadcast 172.16.1.255
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active
    supported media:
        media autoselect
        media 2500Base-T
        media 1000baseT
        media 1000baseT mediaopt full-duplex
        media 100baseTX mediaopt full-duplex
        media 100baseTX
        media 10baseT/UTP mediaopt full-duplex
        media 10baseT/UTP
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
[23.01-BETA][admin@pfSense.high.local]/root:
-
Ok, so we can see that, interestingly, the VLAN hardware filtering capability has been removed. And it was previously in use but I don't think you are using VLANs?
The NOMAP (unmapped mbufs) capability has been added and is enabled by default. I don't believe that would affect only multicast traffic. And it appears to be a transmit option.
Are you able to test using a different interface?
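For anyone who wants to repeat this comparison without a diff tool, here is a minimal Python sketch. It only looks at the two capabilities lines posted earlier in this thread; the helper name parse_flags is made up for this illustration and is not part of any pfSense tooling.

```python
import re


def parse_flags(ifconfig_line, field):
    """Return the flag names inside e.g. 'capabilities=f53fbb<RXCSUM,TXCSUM>'."""
    pattern = rf"\b{re.escape(field)}=[0-9a-f]+<([^>]*)>"
    match = re.search(pattern, ifconfig_line)
    return set(match.group(1).split(",")) if match else set()


# Capability lines copied from the two ifconfig outputs in this thread.
CAPS_2205 = (
    "capabilities=f53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,"
    "VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,"
    "VLAN_HWTSO,NETMAP,RXCSUM_IPV6,TXCSUM_IPV6>"
)
CAPS_2301 = (
    "capabilities=4f43fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,"
    "VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWTSO,"
    "NETMAP,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>"
)

old = parse_flags(CAPS_2205, "capabilities")
new = parse_flags(CAPS_2301, "capabilities")
removed = sorted(old - new)  # present in 22.05 only
added = sorted(new - old)    # present in 23.01 only

print("removed in 23.01:", removed)
print("added in 23.01:", added)
```

Passing "options" instead of "capabilities" would compare what is actually enabled rather than what the driver advertises.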
-
@stephenw10 said in 23.01.b.20230106.0600 IGMP proxy stops TV stream:
Ok, so we can see that, interestingly, the VLAN hardware filtering capability has been removed. And it was previously in use but I don't think you are using VLANs?
The NOMAP (unmapped mbufs) capability has been added and is enabled by default. I don't believe that would affect only multicast traffic. And it appears to be a transmit option.
Are you able to test using a different interface?
To clarify: the diff was made against YOUR output ;-) Yes, I'm using a handful of VLANs on LAN and two on WAN.
Would you be able to elaborate more on the NOMAP?
Do you mean another interface on the same board, like moving from igc0 to igc2? Or putting in another interface board? The latter is impossible, as it is a fixed housing with built-in NICs.
-
The thing is that it's not easy to roll back to 22.05 to test the previous interface output or capture IGMP packets.
-
The nomap capability should be easy to disable but it appears ifconfig may not have caught up yet. I can't find a syntax that works. But I doubt that would make any difference here.
I also compared the options between 22.05 and 23.01 from the same config in the 8200 and saw the same.
If it's a ZFS install you should be able to roll back the BE snapshot.
-
@stephenw10 23.01.r.20230202.0019 same results with the RC build ;(
-
@stephenw10 could this IGMP issue be added to the list of issues?
https://redmine.pfsense.org/projects/pfsense/issues?query_id=186
-
If an IGMPproxy bug report can be opened then it can be added to that list.
Were you able to confirm that the queries from the proxy are generated with a correct checksum in 22.05?
If so I'd go ahead and open a bug report for it.
Do you see the bad checksum on all IGMP packets generated by the proxy? Upstream and downstream?
-
@stephenw10 said in 23.01.b.20230106.0600 IGMP proxy stops TV stream:
If an IGMPproxy bug report can be opened then it can be added to that list.
Were you able to confirm that the queries from the proxy are generated with a correct checksum in 22.05?
If so I'd go ahead and open a bug report for it.
Do you see the bad checksum on all IGMP packets generated by the proxy? Upstream and downstream?
Hi Steve, the thing is that the config has changed a lot. My 22.05 backup is too old. And the i226-V NIC is not supported yet, so a fresh 22.05 installation takes quite a lot of time.
My action plan:
- Fresh install from the final 23.01 USB installer image. Test with just a clean install, configuring only IPTV_WAN/LAN and IGMP.
- If that works, load the backup file and check again.
- If neither works, I'll install 2.7 and try to upgrade to 22.05; well, in fact that's a downgrade. Is my only option left a fresh install from 2.6.0 with a USB NIC?
I'll capture the WAN traffic to check the checksum.
-
You don't have a 22.05 ZFS BE snapshot you can roll back to?
-
@stephenw10 said in 23.01.b.20230106.0600 IGMP proxy stops TV stream:
You don't have a 22.05 ZFS BE snapshot you can roll back to?
You mentioned that before. I've got no clue how to do that. I tested some CLI commands but never found the courage to press Enter.
Is there some more explanation, or a documented how-to?
EDIT: BE seems to be Boot Environment? Interesting, I was on the wrong path with the CLI commands.
-
Well you can do it from the CLI using bectl but it's much easier using the GUI.
See: https://docs.netgate.com/pfsense/en/latest/backup/zfsbe/gui.html
Also note that you cannot roll back to a 22.01 snapshot BE. It will fail to boot. The compatibility was added in 22.05.
-
@stephenw10 it's definitely a 23.01 bug and not a glitch in the TV. I reverted to 22.05 and the corrupt IGMP packet is gone, so the STB answers again on 22.05. Because the IGMP request sent to the listeners is corrupt, the STB does not answer back on 23.01 and the TV stops playing.
How does filing a bug work?
22.05: full set of requests + answers from the STB + valid packet
23.01: full set of requests and missing answers from the STB + the corrupt packet shows up
-
Bug looks good, thanks.
-
Hi, I just came across this same issue, and it is a blocking issue for me as well.
I hope it can be fixed in 23.01 instead of 23.05, where it is scheduled for right now.
-
Just to report that I also seem to be having an issue with the IGMP Proxy generating packets with invalid checksums, although in my case I am running 2.6.0, and I am using a completely different network adaptor.
As I am only just configuring the IGMP proxy for the first time (after the router has been installed for a few months) I do not have a previously working installation to compare with, and I have been troubleshooting for some time before noticing the checksum errors. Tcpdump running on the same device shows:
tcpdump: listening on ix0, link-type EN10MB (Ethernet), capture size 262144 bytes
11:21:25.462732 IP (tos 0xc0, ttl 1, id 13, offset 0, flags [none], proto IGMP (2), length 32, options (RA)) 10.0.2.3 > 224.0.0.1: igmp query v2
11:21:26.999364 IP (tos 0x0, ttl 1, id 58030, offset 0, flags [none], proto IGMP (2), length 32, options (RA), bad cksum 0 (->5624)!) 10.0.1.254 > 224.0.0.7: igmp v2 report 224.0.0.7
11:21:27.601357 IP (tos 0x0, ttl 1, id 35790, offset 0, flags [none], proto IGMP (2), length 32, options (RA), bad cksum 0 (->ac0f)!) 10.0.1.254 > 224.0.0.252: igmp v2 report 224.0.0.252
11:21:30.205921 IP (tos 0x0, ttl 1, id 33675, offset 0, flags [none], proto IGMP (2), length 32, options (RA), bad cksum 0 (->b453)!) 10.0.1.254 > 224.0.0.251: igmp v2 report 224.0.0.251
11:21:36.336285 IP (tos 0x0, ttl 1, id 37154, offset 0, flags [none], proto IGMP (2), length 32, options (RA), bad cksum 0 (->a5fc)!) 10.0.1.254 > 224.0.1.187: igmp v2 report 224.0.1.187
11:21:36.672559 IP (tos 0x0, ttl 1, id 59654, offset 0, flags [none], proto IGMP (2), length 32, options (RA), bad cksum 0 (->3fd9)!) 10.0.1.254 > 239.255.255.250: igmp v2 report 239.255.255.250
Tcpdump running on another pfSense router connected to the same switch does not show these packets at all, so presumably they are being dropped by the switch or the receiving router due to the checksum error. In any case, multicast routing between VLANs is not working.
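As a cross-check on what the "bad cksum 0 (->5624)!" field in the capture above refers to, here is a small Python sketch. It assumes the value after the arrow is the expected IPv4 header checksum, computed over a 24-byte header that carries the Router Alert option (the "options (RA)" part), with the checksum field itself set to 0 as captured. The function names are made up for this illustration. Under that assumption the RFC 1071 ones' complement sum reproduces the values tcpdump printed.

```python
import socket
import struct


def internet_checksum(data):
    """RFC 1071 ones' complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def igmp_ip_header(src, dst, ident):
    """Build the IPv4 header as captured: checksum field left at 0."""
    version_ihl = (4 << 4) | 6              # IPv4, 6 x 32-bit words = 24 bytes
    router_alert = bytes([0x94, 0x04, 0x00, 0x00])
    fixed = struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0x00,                  # version/IHL, tos 0x0
        32, ident, 0,                       # total length, id, flags/offset
        1, 2, 0,                            # ttl 1, proto IGMP (2), cksum 0
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    return fixed + router_alert


# (id, destination) pairs taken from the capture in this post.
for ident, dst in [(58030, "224.0.0.7"), (35790, "224.0.0.252"), (33675, "224.0.0.251")]:
    header = igmp_ip_header("10.0.1.254", dst, ident)
    print(dst, f"{internet_checksum(header):04x}")
```

If the assumption holds, this suggests the flagged field is the IP header checksum rather than the checksum inside the IGMP message itself.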
Here is the network card configuration:
pciconf -lv ix0
ix0@pci0:1:0:0: class=0x020000 card=0x031b1dcf chip=0x15288086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Ethernet Controller 10-Gigabit X540-AT2'
    class = network
    subclass = ethernet
ifconfig -vvvm ix0
ix0: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> metric 0 mtu 1500
    description: LAN
    options=e138bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,RXCSUM_IPV6,TXCSUM_IPV6>
    capabilities=f53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,NETMAP,RXCSUM_IPV6,TXCSUM_IPV6>
    ether 98:b7:85:89:7f:74
    inet6 fe80::9ab7:85ff:fe89:7f74%ix0 prefixlen 64 scopeid 0x1
    inet 10.0.1.254 netmask 0xffff0000 broadcast 10.0.255.255
    media: Ethernet autoselect (10Gbase-T <full-duplex>)
    status: active
    supported media:
        media autoselect
        media 100baseTX
        media 1000baseT
        media 10Gbase-T
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
I have not yet tried disabling Hardware Checksum Offloading, as I've only just discovered this thread in the middle of a work day and can't reboot right now. I could try rebooting in the evening if this might provide a usable workaround.
The hardware is a quad-core 3.5 GHz Xeon server, so it should be OK to do software-based checksum generation, I would think?
-
@dbmandrake I think a reboot should be done. With the setting enabled, this could still be a valid capture. Without offloading, pfSense generates the checksums itself.
We can't draw a conclusion without disabling offloading.
-
@thebear Yes I will disable checksum offloading and reboot after hours and test again tomorrow.
Also unless I have missed something, PFSense is still sending out igmp v2 reports with corrupt checksums even after I have disabled the IGMP proxy service ?
Such as:
12:29:38.863070 IP (tos 0x0, ttl 1, id 33267, offset 0, flags [none], proto IGMP (2), length 32, options (RA), bad cksum 0 (->b5eb)!) 10.0.1.254 > 224.0.0.251: igmp v2 report 224.0.0.251
Could this be related to the Avahi service (which is still running) since 224.0.0.251 is the multicast address used by mDNS ? If so what is generating the igmp report ? The Avahi service itself or something within a lower level in the OS ?
-
Yes, with hardware checksum offloading enabled the pcap will show bad checksums, because the hardware only calculates them after the capture point, just before the packet goes onto the wire.
You might not expect to see replies to reports like that either. Unlike the queries shown above.
Steve
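When reading a longer capture, it can help to see which sources tcpdump flags at all. Below is a rough Python sketch that counts "bad cksum" annotations per source address. The regular expression is an assumption based only on the line format shown in this thread, and the function name is made up. A flagged packet in a local capture is not proof of a problem on the wire, for the offloading reason described above.

```python
import re
from collections import Counter

# Matches the tail of the IP header annotation seen in this thread, e.g.
# "... options (RA), bad cksum 0 (->5624)!) 10.0.1.254 > 224.0.0.7: ..."
BAD_CKSUM = re.compile(
    r"bad cksum (?P<got>[0-9a-f]+) \(->(?P<want>[0-9a-f]+)\)!\)\s+"
    r"(?P<src>\d+(?:\.\d+){3}) > (?P<dst>\d+(?:\.\d+){3}):"
)


def bad_checksums_by_source(capture_text):
    """Count packets that tcpdump marked with 'bad cksum', keyed by source IP."""
    return Counter(m.group("src") for m in BAD_CKSUM.finditer(capture_text))


# Three lines copied from the capture posted earlier in this thread.
SAMPLE = """\
11:21:25.462732 IP (tos 0xc0, ttl 1, id 13, offset 0, flags [none], proto IGMP (2), length 32, options (RA)) 10.0.2.3 > 224.0.0.1: igmp query v2
11:21:26.999364 IP (tos 0x0, ttl 1, id 58030, offset 0, flags [none], proto IGMP (2), length 32, options (RA), bad cksum 0 (->5624)!) 10.0.1.254 > 224.0.0.7: igmp v2 report 224.0.0.7
11:21:27.601357 IP (tos 0x0, ttl 1, id 35790, offset 0, flags [none], proto IGMP (2), length 32, options (RA), bad cksum 0 (->ac0f)!) 10.0.1.254 > 224.0.0.252: igmp v2 report 224.0.0.252
"""

counts = bad_checksums_by_source(SAMPLE)
for src, n in counts.items():
    print(f"{src}: {n} packet(s) flagged")
```

Only the locally generated reports from 10.0.1.254 are flagged here; the query received from 10.0.2.3 is not. Comparing against a capture taken on a second device shows what actually reached the wire.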
-
@stephenw10 Sorry for the noise in the thread, you were right about the checksum offloading.
I temporarily disabled IGMP snooping on the switch (to stop it filtering/handling the requests itself) and was able to observe the IGMP reports from the upstream side of IGMP proxy arriving safely at another device across the network, with correct checksums. So checksums are not my issue, and I never bothered to try turning off hardware checksumming.
However, something really weird is going on. When the switch that is normally the IGMP querier (10.0.2.1) is the querier, it somehow suppresses IGMP membership reports from the upstream interface of IGMP proxy on pfSense (10.0.1.254). But when I make a different switch the querier (10.0.2.3, the switch pfSense is connected to), IGMP proxy suddenly starts sending the upstream IGMP membership reports that it should, and it starts to work, at least partially: I was able to find and play an SSDP-advertised multicast stream between VLANs.
A lot more debugging is required on my part to figure out what's going on. One problem I have is a large mix of different switch models, with some of the older ones not necessarily playing nicely. I suspect there is at least one other switch on the network that is not following the gentleman's rule of "don't be IGMP querier when a lower IP address device is already sending IGMP queries", as I have seen intermittent changes in the multicast router destination pointing towards these suspect downstream switches even though they are explicitly configured to never be an IGMP querier!
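The "lowest IP address wins" rule mentioned above is the IGMPv2 querier election from RFC 2236. Here is a minimal Python sketch of just the comparison, not of any switch's actual implementation. The function name is made up, and 10.0.10.1 is a hypothetical address added only to show why the comparison has to be numeric rather than textual.

```python
import ipaddress


def elect_querier(candidates):
    """Return the candidate with the numerically lowest IPv4 address."""
    return min(candidates, key=ipaddress.IPv4Address)


# The two switch addresses mentioned in this post.
print(elect_querier(["10.0.2.3", "10.0.2.1"]))

# A plain string comparison would wrongly pick 10.0.10.1 here,
# because "10.0.1..." sorts before "10.0.2..." as text.
print(elect_querier(["10.0.10.1", "10.0.2.1"]))
```

A switch that keeps sending queries despite hearing a lower address would break this election, which fits the suspicion described above.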
-
Unfortunately, I have the same problem here. I just updated from 22.05 to:
23.01-RC (amd64)
built on Wed Feb 08 14:19:05 UTC 2023
I can confirm that the IGMP proxy does not work, and therefore, unfortunately, neither does IPTV with this version.
Then the bug probably also applies to "CE 2.7.0", right?
-
I would expect it to be the same in 2.7 currently, yes.
-
Is there any way the fix will make it into 23.01? That would be much appreciated.
https://redmine.pfsense.org/issues/13929
-
Not at this point, at least not in the release build. There is no fix yet as I understand it and the release is imminent. If it's something that can be patched at runtime it can be added to the system patches package later.
-
This is not good news...
Then I have to skip this update. :(
-
I agree it's unfortunate but we couldn't hold the release for this. As soon as we pin down the issue we'll know more.
-
@stephenw10 is this issue affecting all IGMP proxy setups, not just particular drivers, configurations, or combinations?
-
I haven't been able to replicate it here yet so I can't say for sure. However I would expect it to be something that affects all NICs/drivers. Other multicast traffic, like CARP, does not appear affected so it looks like something in igmpproxy or the libraries it uses.
Steve
-
I faced the same issue when I upgraded to 23.01. Disabling hardware checksum offload didn't help; the bad checksum was still in the packet capture.
After reverting to the 22.05 BE there is no bad checksum and IPTV works.
My interface is an I226-V.
-
It would be nice if this issue was included in the release notes for 23.01. I read through all the documentation for 23.01 and then proceeded with the upgrade, only to find that igmp-proxy did not work. Several hours of troubleshooting led me to this forum post, where it appears that the issue was known for some time before the release of 23.01. This is not the introduction to Netgate and pfSense that I wanted, having just migrated to Netgate products from Ubiquiti Unifi products specifically for the igmp-proxy functionality. Oh well, now I'll get an early introduction to the process for rolling back upgrades.
-
@jrueger said in 23.01.b.20230106.0600 IGMP proxy stops TV stream:
Oh well, now I'll get an early introduction to the process for rolling back upgrades.
Well, that's not new to UniFi users, right ;-) But yes, I think the devs underestimate the impact of this bug on home deployments. The list gets longer and longer as users hit the update button.
At the very least it could be documented under known issues.
-
@thebear - no kidding with the Unifi gateways ... I finally abandoned Ubiquiti because of the opaqueness of their processes and capabilities. They did finally introduce igmp-proxy functionality in the latest release but with no documentation ... only a checkbox in the GUI and no explanation of just what exactly was enabled when checking the box. So far I am enjoying pfSense immensely so I can live with the igmp-proxy issue since I get to tinker more with things. My wife wasn't happy that her television show was interrupted!
-
Yes, I apologise, it should have been in the notes. The issue was undefined, and still is to some extent. Hopefully we can pin it down shortly.
Steve
-
@stephenw10
The issue seems to come from upstream FreeBSD 14. I was able to reproduce the checksum errors on FreeBSD-14.0-CURRENT-amd64-20230216-2894c8c96b9b-260969, while FreeBSD 13.1-RELEASE doesn't have the issue. Pcaps attached:
13.pcap
14.pcap
-
I think I found the issue and updated the bug tracker. I am attaching a patch + binary; it is working for me now. If someone else can confirm, that would be great.
-
@mxt3rs said in 23.01.b.20230106.0600 IGMP proxy stops TV stream:
I think i found the issue updated bugtracker, I am attaching patch + binary , it is working for me now, if someone else can confirm that would be great
Sure happy to test, TV is currently in use so need to wait until tomorrow. File is loaded and ready for testing.
-
@mxt3rs very good work! IGMP is flowing, and so is the TV stream.
Every membership query is answered: