SG-1000 VLANs not working unless PROMISC is set



  • Hi,

    Hardware: SG-1000.

    I am hitting this issue:
    https://redmine.pfsense.org/issues/7645
    https://forum.netgate.com/topic/116846/sg-1000-vlans-don-t-seem-to-work

    I am running pfSense 2.4.4_3.

    One thing to add: I have a LAGG over the two cpsw NICs, with VLANs built on top of it.

    The issue report says to open a forum thread, so here I am...

    Thank you very much,
    Daniele


  • Netgate Administrator

    Hmm, that may not be a supported configuration on the SG-1000.
    Can you test traffic across the LAGG directly without VLANs?
    Where are you setting PROMISC exactly? Which interface(s)?

    The ports on the SG-1000 are actually switch ports although the driver reports them to the OS as individual NICs.

    Steve



  • Hi Steve,

    I have set PROMISC on cpsw0 and cpsw1.

    The LAGG is configured with LACP.
    Without PROMISC on cpsw*, one port of the LAGG (cpsw0 or cpsw1) never reached the distributing state.
    Only one port failed at a time, normally cpsw1, but if I take cpsw0 out of the LAGG then cpsw1 works.
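
    For anyone checking the same symptom, here is a minimal sketch for spotting a LAGG member that is not distributing. It assumes the per-member `laggport: <name> flags=...<...>` lines that FreeBSD's ifconfig prints for a lagg interface; the interface name lagg0 and the helper name not_distributing are illustrative and do not come from this thread.

```shell
#!/bin/sh
# Hypothetical helper: read `ifconfig laggN` output on stdin and print the
# name of every member port whose flag list does not include DISTRIBUTING.
not_distributing() {
    awk '$1 == "laggport:" && $0 !~ /[<,]DISTRIBUTING[,>]/ { print $2 }'
}

# Assumed usage on the firewall (lagg0 is an assumed interface name):
#   ifconfig lagg0 | not_distributing
```

    An empty result would mean every member reports DISTRIBUTING.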

    "Can you test traffic across the LAGG directly without VLANs?"

    This works; only the VLANs are affected.

    Thanks,
    Daniele


  • Netgate Administrator

    Yes, I don't think that's a supported mode.
    The VLANs will apply settings to their parent interface, and if that's a LAGG they are normally passed on to its members. But the cpsw driver is somewhat unique and may not work with that.
    If you need to use that setup, I suspect you will have to add the ifconfig command as a shellcmd to correct it.
    https://docs.netgate.com/pfsense/en/latest/development/executing-commands-at-boot-time.html

    If you use the afterfilterchangeshellcmd type, it should be reapplied whenever you make changes, so the interface does not come out of promiscuous mode.
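
    A minimal sketch of what such a shellcmd entry could run, assuming the two ports are cpsw0 and cpsw1 as in this thread. The function name set_promisc and the RUN switch are illustrative only; by default the script just prints the commands so they can be reviewed before anything is changed.

```shell
#!/bin/sh
# Hypothetical helper: for each interface named on the command line, print
# the ifconfig command that enables promiscuous mode, or execute it when
# RUN=1 is set in the environment.
set_promisc() {
    for ifn in "$@"; do
        if [ "${RUN:-0}" = "1" ]; then
            ifconfig "$ifn" promisc
        else
            echo "ifconfig $ifn promisc"
        fi
    done
}

# Interface names taken from this thread.
set_promisc cpsw0 cpsw1
```

    The one-line equivalent, `ifconfig cpsw0 promisc; ifconfig cpsw1 promisc`, should also work directly as the command of the shellcmd entry.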

    Steve



  • About the supported mode:
    I was actually using it.
    Then I upgraded and possibly changed something, and pfSense broke.
    So at least it was working before :)

    I have made a further test, and this really seems not linked to the LAGG configuration.
    I removed cpsw1 from the LAGG and moved it to another physical port on the switch.
    Then I added the VLAN on top of it and assigned it to a network interface in pfSense.

    [root@pf2-tos ~]# ping 172.16.82.241
    PING 172.16.82.241 (172.16.82.241): 56 data bytes
    ^C
    --- 172.16.82.241 ping statistics ---
    4 packets transmitted, 0 packets received, 100.0% packet loss
    [root@pf2-tos ~]# ifconfig cpsw1 promisc
    [root@pf2-tos ~]# ping 172.16.82.241
    PING 172.16.82.241 (172.16.82.241): 56 data bytes
    64 bytes from 172.16.82.241: icmp_seq=0 ttl=64 time=1.447 ms
    64 bytes from 172.16.82.241: icmp_seq=1 ttl=64 time=0.697 ms
    ^C
    --- 172.16.82.241 ping statistics ---
    2 packets transmitted, 2 packets received, 0.0% packet loss
    round-trip min/avg/max/stddev = 0.697/1.072/1.447/0.375 ms
    [root@pf2-tos ~]#

    [root@pf2-tos ~]# ifconfig cpsw1
    cpsw1: flags=28943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST,PPROMISC> metric 0 mtu 1500
            options=8000b<RXCSUM,TXCSUM,VLAN_MTU,LINKSTATE>
            ether c8:df:84:c1:16:39
            hwaddr c8:df:84:c1:16:39
            inet6 fe80::cadf:84ff:fec1:1639%cpsw1 prefixlen 64 scopeid 0x2
            media: Ethernet autoselect (1000baseT <full-duplex>)
            status: active
            nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
    [root@pf2-tos ~]# ifconfig cpsw1.12
    cpsw1.12: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
            options=80000<LINKSTATE>
            ether c8:df:84:c1:16:39
            inet6 fe80::cadf:84ff:fec1:1639%cpsw1.12 prefixlen 64 scopeid 0x10
            inet 172.16.82.242 netmask 0xffffff00 broadcast 172.16.82.255
            groups: vlan
            vlan: 12 vlanpcp: 0 parent interface: cpsw1
            media: Ethernet autoselect (1000baseT <full-duplex>)
            status: active
            nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
    [root@pf2-tos ~]#
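
    The state of each interface is visible in the `flags=` line of the output above: cpsw1 lists PROMISC and PPROMISC, while cpsw1.12 does not. Here is a small sketch for checking that from a script. has_flag is a hypothetical helper, not an existing tool, and it only looks at the first line of the ifconfig output.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the flag named in $2 appears in the
# <...> list of an ifconfig "flags=" line passed as $1.
has_flag() {
    flags=${1#*"<"}        # drop everything up to and including the first "<"
    flags=${flags%%">"*}   # drop the first ">" and everything after it
    case ",$flags," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Assumed usage on the firewall:
#   has_flag "$(ifconfig cpsw1 | head -n 1)" PROMISC && echo "cpsw1 is promiscuous"
```

    Matching on whole comma-separated names keeps PROMISC and PPROMISC apart.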

    So it seems not to be bonded to the bonding configuration (ah, I like stupid wording jokes :D)


  • Netgate Administrator

    Ah, interesting. Do you know what version you were running when it was working as expected?

    Steve



  • I recall that the issue first appeared back in April 2019.
    According to https://docs.netgate.com/pfsense/en/latest/releases/versions-of-pfsense-and-freebsd.html,
    I was moving from 2.4.4-p1 to 2.4.4-p2.
    I'm now on 2.4.4-p3.

    But I'm not 100% sure about it...

    Back to https://redmine.pfsense.org/issues/7645: what was the changelog for that issue?
    I'm trying to work out whether this is a regression.

    Thanks,
    Daniele


  • Netgate Administrator

    Just spent a while looking for it and failed to track it down. I'll have to ask someone who might know directly.



  • Hi there,

    Any news?

    Thank you very much,
    Daniele


  • Netgate Administrator

    I've been unable to replicate this in 2.4.4-p3 or 2.5.

    [2.4.4-RELEASE][root@ufw3.stevew.lan]/root: ifconfig
    cpsw0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
            options=8000b<RXCSUM,TXCSUM,VLAN_MTU,LINKSTATE>
            ether 0c:b2:b7:af:2f:4f
            hwaddr 0c:b2:b7:af:2f:4f
            inet6 fe80::eb2:b7ff:feaf:2f4f%cpsw0 prefixlen 64 scopeid 0x1 
            inet 172.21.16.80 netmask 0xffffff00 broadcast 172.21.16.255 
            media: Ethernet autoselect (1000baseT <full-duplex,master>)
            status: active
            nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
    cpsw1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
            options=8000b<RXCSUM,TXCSUM,VLAN_MTU,LINKSTATE>
            ether 0c:b2:b7:af:2f:51
            hwaddr 0c:b2:b7:af:2f:51
            inet 192.168.80.1 netmask 0xffffff00 broadcast 192.168.80.255 
            inet6 fe80::1:1%cpsw1 prefixlen 64 duplicated scopeid 0x2 
            media: Ethernet autoselect (1000baseT <full-duplex>)
            status: active
            nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
    enc0: flags=0<> metric 0 mtu 1536
            groups: enc 
            nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
    lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
            options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
            inet6 ::1 prefixlen 128 
            inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4 
            inet 127.0.0.1 netmask 0xff000000 
            groups: lo 
            nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
    pfsync0: flags=0<> metric 0 mtu 1500
            syncpeer: 224.0.0.240 maxupd: 128 defer: on
            syncok: 1
            groups: pfsync 
    pflog0: flags=100<PROMISC> metric 0 mtu 33184
            groups: pflog 
    cpsw1.50: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
            options=80000<LINKSTATE>
            ether 0c:b2:b7:af:2f:51
            inet6 fe80::eb2:b7ff:feaf:2f51%cpsw1.50 prefixlen 64 scopeid 0x7 
            inet 172.18.10.11 netmask 0xffffff00 broadcast 172.18.10.255 
            groups: vlan 
            vlan: 50 vlanpcp: 0 parent interface: cpsw1
            media: Ethernet autoselect (1000baseT <full-duplex>)
            status: active
            nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
    
    [2.4.4-RELEASE][root@ufw3.stevew.lan]/root: ping 172.18.10.1
    PING 172.18.10.1 (172.18.10.1): 56 data bytes
    64 bytes from 172.18.10.1: icmp_seq=0 ttl=64 time=0.621 ms
    64 bytes from 172.18.10.1: icmp_seq=1 ttl=64 time=0.972 ms
    64 bytes from 172.18.10.1: icmp_seq=2 ttl=64 time=0.491 ms
    ^C
    --- 172.18.10.1 ping statistics ---
    3 packets transmitted, 3 packets received, 0.0% packet loss
    round-trip min/avg/max/stddev = 0.491/0.695/0.972/0.203 ms
    

    Capturing at the other side:

    16:36:42.590615 0c:b2:b7:af:2f:51 > 00:90:0b:76:8e:52, ethertype 802.1Q (0x8100), length 60: vlan 50, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 9584, offset 0, flags [none], proto ICMP (1), length 28)
        172.18.10.11 > 172.18.10.1: ICMP echo request, id 52423, seq 628, length 8
    16:36:42.590667 00:90:0b:76:8e:52 > 0c:b2:b7:af:2f:51, ethertype 802.1Q (0x8100), length 46: vlan 50, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 59789, offset 0, flags [none], proto ICMP (1), length 28)
        172.18.10.1 > 172.18.10.11: ICMP echo reply, id 52423, seq 628, length 8
    

    Requiring promiscuous mode like that usually implies the wrong MAC address or at least an unexpected MAC. I wonder if the lagg on there altered it for that driver?
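
    One way to test that idea, sketched under the assumption that ifconfig prints both an `ether` line (the address currently in use) and an `hwaddr` line (the address read from the hardware), as it does in the outputs earlier in this thread. mac_matches_hw is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `ifconfig <iface>` output on stdin and succeed
# only if both an "ether" and an "hwaddr" line are present and identical.
mac_matches_hw() {
    awk '$1 == "ether"  { e = $2 }
         $1 == "hwaddr" { h = $2 }
         END { exit !(e != "" && e == h) }'
}

# Assumed usage on the firewall:
#   ifconfig cpsw1 | mac_matches_hw || echo "cpsw1: MAC differs from hardware address"
```

    Members of a lagg typically share a single MAC, so a mismatch on one member would not be surprising by itself; the check is more telling once the port has been removed from the lagg.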

    Steve



  • @stephenw10 the lagg has not been configured there. Without the lagg, I confirm that this works.

    My lagg is configured with LACP.

    Thanks,


  • Netgate Administrator

    Oh, OK, so this is only a problem over lagg? In your previous post it looked like you removed cpsw1 from the lagg and a VLAN on that port still only worked with promiscuous mode enabled.

    Steve



  • Shame on me,

    "I have made a further test, and this really seems not linked to the LAGG configuration."
    So I'll try to upgrade it from scratch and see what happens.

    I will keep you posted.

    Thanks,
    Daniele

