Losing connection on all VLANs of a LAGG after changing its em members



  • Hello,

    Today I had a problem with my pfSense box. em3 was misbehaving, so I decided to change lagg1 from em2+em3 to em2+em4.

    Before:

    lagg1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
            options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO>
            ether <deleted>
            inet6 <deleted> prefixlen 64 scopeid 0xf
            nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
            media: Ethernet autoselect
            status: active
            laggproto lacp
            laggport: em3 flags=18<COLLECTING,DISTRIBUTING>
            laggport: em2 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
    lagg1_vlan15: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
            options=103<RXCSUM,TXCSUM,TSO4>
            ether <deleted>
            inet6 <deleted>%lagg1_vlan15 prefixlen 64 scopeid 0x11
            inet 192.168.15.1 netmask 0xffffff00 broadcast 192.168.15.255
            nd6 options=1<PERFORMNUD>
            media: Ethernet autoselect
            status: active
            vlan: 15 vlanpcp: 0 parent interface: lagg1
    

    After changing the LAGG members, I was first surprised that lagg1 went down, taking every VLAN with it. I didn't expect that: the LAGG should stay up as long as at least one member stays the same. I can live with that, but the next problem was even worse: lagg1 came back up, but the VLANs weren't reachable :(

    lagg1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
            options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO>
            ether <deleted>
            inet6 <deleted>%lagg1 prefixlen 64 scopeid 0xf
            nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
            media: Ethernet autoselect
            status: active
            laggproto lacp
            laggport: em4 flags=0<>
            laggport: em2 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
    lagg1_vlan15: flags=8803<UP,BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
            options=103<RXCSUM,TXCSUM,TSO4>
            ether <deleted>
            inet6 <deleted>%lagg1_vlan15 prefixlen 64 scopeid 0x11
            inet 192.168.15.1 netmask 0xffffff00 broadcast 192.168.15.255
            nd6 options=1<PERFORMNUD>
            vlan: 0 vlanpcp: 0 parent interface:
    

    As you can see, the VLAN ID and the parent interface are no longer set. Any idea what happened?
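    To spot this state quickly on a box with many VLANs, here is a small diagnostic sketch (not part of pfSense; the helper name `find_orphaned_vlans` is hypothetical). It scans the text of `ifconfig` output and reports every interface whose VLAN line shows tag 0 or an empty/`<none>` parent interface, which is what the broken interfaces in this thread look like. It assumes the FreeBSD layout where each interface block starts at column 0 with `name: flags=` and the VLAN details sit on an indented `vlan: ... parent interface: ...` line.

```python
import re

# An interface block starts at column 0 with "<name>: flags=".
HEADER = re.compile(r"^(\S+): flags=")
# Indented VLAN detail line. Tolerates extra fields (e.g. vlanproto on newer
# FreeBSD) between the tag and "parent interface:"; the parent may be empty
# or the literal "<none>".
VLAN_LINE = re.compile(r"^\s+vlan: (\d+)\b.*parent interface:\s*(\S*)")


def find_orphaned_vlans(ifconfig_output: str) -> list[str]:
    """Return names of VLAN interfaces whose tag is 0 or whose parent is unset."""
    orphaned: list[str] = []
    current = None  # name of the interface block currently being read
    for line in ifconfig_output.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        vlan = VLAN_LINE.match(line)
        if vlan and current is not None:
            tag, parent = int(vlan.group(1)), vlan.group(2)
            if tag == 0 or parent in ("", "<none>"):
                orphaned.append(current)
    return orphaned
```

    Pass it the captured output of `ifconfig` as a string; interfaces with a valid tag and parent (like lagg1_vlan15 in the "Before" listing) are not reported.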


  • Netgate Administrator

    How did you make the change? Was the box rebooted? (Not implying it has to be.)
    I imagine the LAGG interface was destroyed and rebuilt to make the change, and hence anything using it as a parent interface became invalid.

    Steve



  • I made the change through the web GUI (I only had the console open because I was checking the state of em3).
    I just deselected em3 and selected em4.

    I'll try to reproduce it if I find time to build a test box.


  • Netgate Administrator

    I would imagine that if you had rebooted the box, or in some other way recreated the VLAN interface, it would come up with no problems. The issue you describe seems like it would only happen if the box was running continuously throughout. However, what you describe could be seen as a bug: if a VLAN's parent interface is brought down for some reason, perhaps the VLAN should be rebuilt. A question for the devs, I think.

    Steve
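    Short of a reboot, one way to restore the lost association by hand might be FreeBSD's `ifconfig <vlan-if> vlan <tag> vlandev <parent>` (the `vlan` and `vlandev` parameters of ifconfig(8)). The sketch below only builds that command line; it does not run it. It assumes pfSense's `<parent>_vlan<tag>` naming seen in the output in this thread (e.g. `lagg1_vlan15`), and `reattach_command` is a hypothetical helper. This is untested on the pfSense releases discussed here, and pfSense may overwrite manual `ifconfig` changes the next time it reconfigures the interface.

```python
import re
import shlex

# Hypothetical: relies on the "<parent>_vlan<tag>" naming pfSense uses
# for VLAN interfaces (e.g. lagg1_vlan15 -> parent lagg1, tag 15).
VLAN_NAME = re.compile(r"^(?P<parent>.+)_vlan(?P<tag>\d+)$")


def reattach_command(vlan_if: str) -> list[str]:
    """Build an ifconfig argv that re-binds a VLAN interface to its parent."""
    m = VLAN_NAME.match(vlan_if)
    if m is None:
        raise ValueError(f"not a <parent>_vlan<tag> name: {vlan_if!r}")
    tag = int(m.group("tag"))
    if not 1 <= tag <= 4094:  # valid 802.1Q VLAN IDs
        raise ValueError(f"VLAN tag out of range: {tag}")
    return ["ifconfig", vlan_if, "vlan", str(tag), "vlandev", m.group("parent")]


if __name__ == "__main__":
    # prints: ifconfig lagg1_vlan15 vlan 15 vlandev lagg1
    print(shlex.join(reattach_command("lagg1_vlan15")))
```

    Returning an argument list rather than a shell string keeps the command safe to pass to `subprocess.run` later; `shlex.join` is only used for display.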



  • Not sure if you were the one who filed a similar bug report, but I confirmed this issue and verified that it's fixed in 2.2.



  • I can reproduce this at will on an embedded alix2d13 running 2.2.6-RELEASE (i386):

    lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
            options=8280b<RXCSUM,TXCSUM,VLAN_MTU,WOL_UCAST,WOL_MAGIC,LINKSTATE>
            ether <removed>
            inet6 <removed> prefixlen 64 scopeid 0x8
            inet <removed> netmask 0xffffff00 broadcast 192.168.17.255
            nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
            media: Ethernet autoselect
            status: active
            laggproto lacp lagghash l2,l3,l4
            laggport: vr2 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
            laggport: vr1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
    lagg0_vlan9: flags=8803<UP,BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
            ether 00:00:00:00:00:00
            inet6 <removed> prefixlen 64 scopeid 0xa
            inet <removed> netmask 0xffffff00 broadcast 192.168.18.255
            nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
            vlan: 0 vlanpcp: 0 parent interface: <none>
    lagg0_vlan10: flags=8803<UP,BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
            ether 00:00:00:00:00:00
            inet6 <removed> prefixlen 64 scopeid 0x9
            inet <removed> netmask 0xffffff00 broadcast 192.168.19.255
            nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
            vlan: 0 vlanpcp: 0 parent interface:
    

    The VLAN ID is 0 and the parent interface is none in such cases. Sadly, this happens on every reboot or change to the LAGG. This is the first time I'm using such a setup on pfSense, so I don't know whether it ever worked on this platform.

    I left a comment on the related issue, https://redmine.pfsense.org/issues/3976, but I don't think I have permission to reopen the ticket.

