[SOLVED] VLAN/802.1Q/Trunk + Custom MAC Addresses Requires Promiscuous Mode on the Parent Port



  • EDIT: Using custom MAC addresses on VLAN interfaces seems to cause outgoing packets to disappear between em0_vlanX and em0.

    This should go into a VLAN category but I couldn't find one ;D
    
    I'm trying to set up a pfSense box as a regular NAT router for a very small ISP.
    
    Our customer base is divided into 3 segments (10.12.0.0/16, 10.13.0.0/16 and 10.14.0.0/16), each of which will come into the pfSense box on its own VLAN (912, 913, 914). All outbound traffic will then go out through VLAN 911, via one of several 64.146.180.x IP addresses (set up as Virtual IPs).
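    A quick way to sanity-check an addressing plan like this is Python's ipaddress module. This is a minimal sketch, assuming the three segments map to VLANs 912, 913 and 914 in the order listed; the mapping table and the client addresses are illustrative only, not a pfSense configuration format.

```python
import ipaddress
from typing import Optional

# Assumed mapping: segments arrive on VLANs 912, 913 and 914 in the order listed.
SEGMENT_VLANS = {
    ipaddress.ip_network("10.12.0.0/16"): 912,
    ipaddress.ip_network("10.13.0.0/16"): 913,
    ipaddress.ip_network("10.14.0.0/16"): 914,
}

def vlan_for(client_ip: str) -> Optional[int]:
    """Return the VLAN whose /16 contains client_ip, or None if no segment does."""
    addr = ipaddress.ip_address(client_ip)
    for net, vlan in SEGMENT_VLANS.items():
        if addr in net:
            return vlan
    return None

# The gateway on em0_vlan912 must sit inside the same /16 as its clients.
GATEWAY_912 = ipaddress.ip_address("10.12.0.1")
assert GATEWAY_912 in ipaddress.ip_network("10.12.0.0/16")

# Hypothetical client addresses.
print(vlan_for("10.12.0.37"))   # 912
print(vlan_for("10.14.3.200"))  # 914
print(vlan_for("192.0.2.10"))   # None: not in any customer segment
```

    Note that a /16 such as 10.12.0.0/16 covers 10.12.0.0 through 10.12.255.255, not just 10.12.0.x.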
    
    Specifically, a group of customers will have IPs in the range of 10.12.0.1-254/16 with a default gateway of 10.12.0.1, which is the IP assigned to em0_vlan912 on the pfSense box.
    
    These clients cannot even ping 10.12.0.1. The arp-who-has packets (according to tcpdump) come in through em0 tagged, then em0_vlan912 untagged, then the replies are seen on em0_vlan912 but never show up on em0.
    
    All of the clients in the 10.12.0.2-254 range are on vlan912 and should be able to directly ping 10.12.0.1, which is the IP assigned to em0_vlan912.
    
    I have completely wide-open accept rules ("accept from/to anywhere, any protocol") under the firewall filter for vlan912 as well as vlan913 and vlan911.
    
    The log does not seem to show my ping or ARP packets, either as accepted or as denied.
    
    The pfsense box cannot ping out over the vlan interfaces either. The same thing happens - I see the arp who-has packets on em0_vlan913 (for example) but not on em0.
    
    It's as if the kernel doesn't know where to send tagged frames.
    
    Hardware is Netgate FW-7535 which is supposed to support trunking.
    
    Any clues would be very much appreciated.
    
    Thanks,
    ~Jesse
    
    PS/EDIT: I forgot one detail - I've assigned a unique mac address to each vlan interface, because some switches don't like the same mac address on two vlans. I salvaged some mac addresses from old dead network devices.


  • It's currently looking like user-specified MACs and VLANs don't mix, or else there is a trick to make them work, a trick I'm still searching for.

    Thanks,

    ~Jesse



  • @Jesse:

    PS/EDIT: I forgot one detail - I've assigned a unique mac address to each vlan interface, because some switches don't like the same mac address on two vlans.

    Do you need this kludge? Without knowing exactly what "don't like" means I would guess such switches probably aren't 802.1Q compliant, at least as far as 802.1Q is commonly implemented.

    I run VLANs. The VLANs inherit the MAC address from the parent interface. I don't have the trouble you describe.

    @Jesse:

    Specifically, a group of customers will have IPs in the range of 10.12.0.1-254/16 with a default gateway of 10.12.0.1, which is the IP assigned to em0_vlan912 on the pfSense box.

    These clients cannot even ping 10.12.0.1. The arp-who-has packets (according to tcpdump) come in through em0 tagged, then em0_vlan912 untagged, then the replies are seen on em0_vlan912 but never show up on em0.

    This may not mean what you think it means.

    On my pfSense box rl0 is the parent interface for two VLANs. I set up a ping to a particular host on one of the VLANs. If I run tcpdump on the VLAN interface, filtering on the IP address that is the ping destination, the tcpdump output shows the traffic (ping and response). If I re-run the same command and just change the interface name to rl0 (the VLAN parent interface), I don't see any traffic logged (though I am warned that rl0 doesn't have an IPv4 address). If I remove the filter from the tcpdump command and look carefully through the output, I can see the ping and response.

    One problem I come across from time to time when trying to interpret missing transmit packets from a tcpdump trace is not knowing precisely what packets are logged:

    • all transmit packets given to the driver

    • all transmit packets given to the hardware ("ill formed" packets not logged)

    • all packets the hardware has processed

    • all packets the hardware reports it has successfully transmitted



  • Thank you very much for the response!

    @wallabybob:

    @Jesse:

    PS/EDIT: I forgot one detail - I've assigned a unique mac address to each vlan interface, because some switches don't like the same mac address on two vlans.

    Do you need this kludge? Without knowing exactly what "don't like" means I would guess such switches probably aren't 802.1Q compliant, at least as far as 802.1Q is commonly implemented.

    By "don't like" I mean that our upstream provider (which provides internet access as well as fiber transport to several access points) complains about us having the same MAC address on multiple VLANs. It works, but I gather their gear creates log entries about it. (They use Cisco. Remember, Cisco is the company that sells massively expensive high-end network gear with a serial port that doesn't support flow control, which has been a standard for probably 50 years and is an extremely important feature when you're in emergency recovery mode trying to load a new config by pasting it into a serial terminal :0)

    In any case, our goal is to not use the same MAC address on more than one VLAN. Letting the user specify a MAC address for each interface is definitely a standard router feature, and the pfSense GUI makes it very easy to set up; it just doesn't work on VLAN interfaces.

    @wallabybob:

    I run VLANs. The VLANs inherit the MAC address from the parent interface. I don't have the trouble you describe.

    Exactly right. If I remove the user-specified mac addresses, then my setup works.
    But, I need the user supplied mac addresses :)

    @wallabybob:

    @Jesse:

    Specifically, a group of customers will have IPs in the range of 10.12.0.1-254/16 with a default gateway of 10.12.0.1, which is the IP assigned to em0_vlan912 on the pfSense box.

    These clients cannot even ping 10.12.0.1. The arp-who-has packets (according to tcpdump) come in through em0 tagged, then em0_vlan912 untagged, then the replies are seen on em0_vlan912 but never show up on em0.

    This may not mean what you think it means.

    I have been using tcpdump on a daily basis for years but I'm open to suggestions.

    @wallabybob:

    On my pfSense box rl0 is the parent interface for two VLANs. I set up a ping to a particular host on one of the VLANs. If I run tcpdump on the VLAN interface, filtering on the IP address that is the ping destination, the tcpdump output shows the traffic (ping and response). If I re-run the same command and just change the interface name to rl0 (the VLAN parent interface), I don't see any traffic logged (though I am warned that rl0 doesn't have an IPv4 address). If I remove the filter from the tcpdump command and look carefully through the output, I can see the ping and response.

    Yeah, I noticed Ethernet devices get different names on different boxes (the name comes from the NIC driver, e.g. rl for Realtek and em for Intel). In any case, mine are em0, em1, etc. (I have a 6-port Gigabit Lanner box, so I have em0 through em5.)

    In any case, the reason "host 1.2.3.4" doesn't work on a trunk is that the 4-byte 802.1Q tag is inserted in front of the EtherType, so the IP header starts 4 bytes later than usual. tcpdump's filter looks at fixed offsets in the frame, so it doesn't realize it's looking in the wrong place.

    The fix is to put "vlan" before your IP-matching expression; the first "vlan" keyword in a filter shifts the decoding offsets for the rest of the expression, so the IP matching works again.
    Thus, these two commands would both work:

    tcpdump -lni rl0 vlan and host 1.2.3.4
    tcpdump -lni rl0_vlan123 host 1.2.3.4

    You can also get creative and choose which vlan you'd like to view, by sniffing the trunk but specifying "vlan 123 and host 1.2.3.4" or whatever.
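    To see why the offsets go wrong, here is a small Python sketch that builds an untagged and an 802.1Q-tagged frame and reads the IPv4 destination from each. It is a simplified model: the frame layout follows 802.1Q, but the two reader functions only approximate what a BPF filter does (a real "host" filter also covers ARP, and hardware tag offload is ignored), and all MAC and IP addresses are made up.

```python
import ipaddress
import struct
from typing import Optional, Tuple

ETH_P_IP = 0x0800     # EtherType for IPv4
ETH_P_8021Q = 0x8100  # TPID that marks an 802.1Q tag

def build_frame(vlan_id: Optional[int], src_ip: str, dst_ip: str) -> bytes:
    """Build a minimal Ethernet frame carrying a bare 20-byte IPv4 header.
    If vlan_id is given, insert an 802.1Q tag after the source MAC."""
    macs = bytes.fromhex("020000000002") + bytes.fromhex("020000000001")  # dst, src (made up)
    ip_hdr = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20, 0, 0, 64, 1, 0,  # ver/IHL, TOS, length, id, frag, TTL, proto, checksum (not computed)
        ipaddress.IPv4Address(src_ip).packed,
        ipaddress.IPv4Address(dst_ip).packed,
    )
    if vlan_id is None:
        return macs + struct.pack("!H", ETH_P_IP) + ip_hdr
    # TPID, then TCI (priority 0, DEI 0, 12-bit VLAN ID), then the real EtherType
    return macs + struct.pack("!HHH", ETH_P_8021Q, vlan_id & 0x0FFF, ETH_P_IP) + ip_hdr

def naive_dst_ip(frame: bytes) -> Optional[str]:
    """Assume an untagged frame: EtherType at offset 12, IPv4 header at 14."""
    if struct.unpack_from("!H", frame, 12)[0] != ETH_P_IP:
        return None  # on a trunk this is 0x8100, so a plain "host" test never matches
    return str(ipaddress.IPv4Address(frame[14 + 16:14 + 20]))

def vlan_aware_dst_ip(frame: bytes) -> Tuple[Optional[int], Optional[str]]:
    """Skip one 802.1Q tag if present, roughly what "vlan and host ..." does."""
    offset, vlan_id = 12, None
    if struct.unpack_from("!H", frame, offset)[0] == ETH_P_8021Q:
        vlan_id = struct.unpack_from("!H", frame, offset + 2)[0] & 0x0FFF
        offset += 4  # everything after the tag is shifted by 4 bytes
    if struct.unpack_from("!H", frame, offset)[0] != ETH_P_IP:
        return vlan_id, None
    ip_start = offset + 2
    return vlan_id, str(ipaddress.IPv4Address(frame[ip_start + 16:ip_start + 20]))

untagged = build_frame(None, "10.0.0.5", "1.2.3.4")
tagged = build_frame(123, "10.0.0.5", "1.2.3.4")
print(naive_dst_ip(untagged))     # 1.2.3.4
print(naive_dst_ip(tagged))       # None
print(vlan_aware_dst_ip(tagged))  # (123, '1.2.3.4')
```

    The VLAN ID read from the tag is also what lets a filter like "vlan 123 and host 1.2.3.4" pick out a single VLAN on the trunk.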

    And in a pinch, if you are trying to figure out where traffic is going and you don't know whether it's tagged or not, you can use grep, like this:

    tcpdump -lni rl0 | grep -F '1.2.3.4'

    and it'll display just the IP address you're after, regardless of whether it's tagged or not. (The -F makes grep treat the dots literally instead of as "any character".)
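    One caveat with the grep fallback: without -F, grep treats its pattern as a regular expression, so each "." in 1.2.3.4 matches any character, and even a fixed-string match also catches longer addresses such as 1.2.3.45. The Python sketch below compares the three behaviors on made-up tcpdump-style lines; the last pattern is an illustrative stricter match that also requires digit boundaries around the address.

```python
import re

# Made-up lines in the style of tcpdump output (ports are appended with a dot).
lines = [
    "IP 10.0.0.5 > 1.2.3.4: ICMP echo request",     # wanted
    "IP 10.0.0.5.51515 > 1.2.3.4.80: Flags [S]",    # wanted
    "IP 10.172.3.4 > 8.8.8.8: ICMP echo request",   # "172.3.4" matches the regex 1.2.3.4
    "IP 10.0.0.5 > 1.2.3.45: ICMP echo request",    # 1.2.3.4 is only a prefix here
]

regex_hits = [ln for ln in lines if re.search("1.2.3.4", ln)]  # like grep '1.2.3.4'
fixed_hits = [ln for ln in lines if "1.2.3.4" in ln]           # like grep -F '1.2.3.4'
# Escaped dots; no digit or dot just before the address, no digit just after it.
exact = re.compile(r"(?<![\d.])1\.2\.3\.4(?!\d)")
exact_hits = [ln for ln in lines if exact.search(ln)]

print(len(regex_hits), len(fixed_hits), len(exact_hits))  # 4 3 2
```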

    However, in my case the packets show up on em0_vlanXXX but not on em0, tagged or otherwise.

    @wallabybob:

    One problem I come across from time to time when trying to interpret missing transmit packets from a tcpdump trace is not knowing precisely what packets are logged:

    • all transmit packets given to the driver

    • all transmit packets given to the hardware ("ill formed" packets not logged)

    • all packets the hardware has processed

    • all packets the hardware reports it has successfully transmitted

    **Aaahhhh! After writing the above comments, I noticed that while tcpdump was running on em0 traffic would flow, and when I stopped tcpdump it would stop. That can only mean one thing, promiscuous mode, because tcpdump puts the card into promiscuous mode. I know I had already run tcpdump on em0 about a hundred times over the course of 3 days, hoping it would make things work, and it never did. But this time it did.

    So I typed (in the shell) ifconfig em0 promisc
    and everything started working!!

    So now I need to figure out how to keep em0 in promisc across reboots.**

    Thanks,

    ~Jesse



  • A package called shellcmd would do the job, but it's for pfSense 1.2, so I don't know whether it works on 2.0.



  • @Metu69salemi:

    A package called shellcmd would do the job, but it's for pfSense 1.2, so I don't know whether it works on 2.0.

    Thanks a million! It seems to be all working now!

    As you said, I just installed shellcmd which was available for 2.0 and seems to work fine.

    After installing it, I put in ifconfig em0 promisc as a "shellcmd" (rather than an early shellcmd), saved, rebooted, and now it works.

    For newbies such as myself, to install shellcmd go to "System" then "Packages" then click the "Available Packages" tab and scroll down or search and find "shellcmd" and click the little Plus (+) Icon to the right of it, and then follow the prompts.

    Then once it's installed, click "Services" then click "shellcmd" and you can add the command there.

    Thanks again!

    ~Jesse



  • Good to hear. You can edit the subject of the first post in this thread to add [SOLVED].



  • That you needed the interface to go into promiscuous mode suggests that it isn't being programmed with the MAC address of the different VLAN interfaces - which may be a bug or it may be a limitation of the hardware.

    Depending on the network gear this interface is connected to, operating in promiscuous mode MIGHT add significant overhead because the software might have to deal with lots of received packets that it discards.
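    The hypothesis above can be pictured with a toy model of a NIC receive filter: the card passes up broadcasts and frames addressed to MACs programmed into its filter, and in promiscuous mode it passes up everything. This is a hypothetical Python sketch, not how the em(4) driver is written, and the addresses are made up; it only illustrates the receive side of the hypothesis and does not explain the missing outbound packets described earlier in the thread.

```python
from dataclasses import dataclass, field
from typing import Set

BROADCAST = "ff:ff:ff:ff:ff:ff"

@dataclass
class RxFilter:
    """Toy model of a NIC's destination-MAC receive filter (hypothetical, not em(4) code)."""
    programmed: Set[str] = field(default_factory=set)  # MACs loaded into the hardware filter
    promisc: bool = False

    def __post_init__(self) -> None:
        self.programmed = {m.lower() for m in self.programmed}

    def accepts(self, dst_mac: str) -> bool:
        dst = dst_mac.lower()
        if self.promisc or dst == BROADCAST:
            return True  # promiscuous mode passes everything up to software
        return dst in self.programmed

# Hypothetical addresses: em0's own MAC is programmed, the custom VLAN MAC is not.
parent_mac = "00:11:22:33:44:55"
vlan912_mac = "02:00:00:00:09:12"
other_host = "00:aa:bb:cc:dd:ee"
nic = RxFilter(programmed={parent_mac})

frames = [BROADCAST, parent_mac, vlan912_mac, other_host]
print([nic.accepts(m) for m in frames])  # [True, True, False, False]

nic.promisc = True
print([nic.accepts(m) for m in frames])  # [True, True, True, True]
# In promiscuous mode the frame for other_host is accepted too and has to be
# discarded in software, which is where the extra overhead comes from.
```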



  • @wallabybob:

    That you needed the interface to go into promiscuous mode suggests that it isn't being programmed with the MAC address of the different VLAN interfaces - which may be a bug or it may be a limitation of the hardware.

    Well, user-supplied mac address works fine for actual interfaces, just not vlan interfaces.

    Is detrunking done by the card itself, or entirely in the kernel?

    I think I saw in my reading on pfSense that trunking over a card that doesn't support it will work for small packets but not for larger ones, so as far as I know, a card's ability to trunk is primarily related to its ability to carry oversized frames.
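    The arithmetic behind "small packets work, large ones don't": an 802.1Q tag adds 4 bytes, so a full 1500-byte packet becomes a 1522-byte frame instead of the usual 1518, and a card limited to 1518 bytes can only carry tagged frames with smaller payloads. (As far as I know, FreeBSD's ifconfig lists the ability to handle the extra bytes as VLAN_MTU among an interface's options, and VLAN_HWTAGGING means the card itself inserts and strips tags; otherwise the vlan(4) code does it in software.) A worked sketch, assuming standard Ethernet framing:

```python
ETH_HEADER = 14  # destination MAC (6) + source MAC (6) + EtherType (2)
FCS = 4          # frame check sequence
DOT1Q_TAG = 4    # TPID (2) + TCI (2)

def max_frame(mtu: int, tagged: bool) -> int:
    """Largest on-the-wire frame for a given IP MTU, with or without an 802.1Q tag."""
    return ETH_HEADER + (DOT1Q_TAG if tagged else 0) + mtu + FCS

def max_tagged_payload(nic_limit: int = 1518) -> int:
    """Largest payload a tagged frame can carry if the NIC only accepts nic_limit bytes."""
    return nic_limit - ETH_HEADER - DOT1Q_TAG - FCS

print(max_frame(1500, tagged=False))  # 1518
print(max_frame(1500, tagged=True))   # 1522
print(max_tagged_payload())           # 1496
```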

    Also, when I was sniffing around with tcpdump, I'd see the arp-reply outbound on em0_vlan912 but not on em0 – so it appears that it was actually at a software/driver stage where the packets were being dropped, somewhere in the re-trunking code. But then I'm assuredly no kernel programmer!

    I've seen before a case where bridging in Linux required the card to be left in promisc mode, which sort of makes sense since in a bridge configuration the card must deal with every packet that comes even if it's not to its own MAC Address.

    But in the case of pfSense losing packets that are actually addressed to the MAC address assigned to the interface, it does sound like a bug in the kernel. I wonder whether a bug report should be filed, either with pfSense so they can put in a fix (like a checkbox to leave the card in promisc mode for VLAN + custom MAC address) or with the BSD folks about fixing the actual problem. (Or maybe it's an artifact of another feature!)

    In any case, my VLAN interfaces do respond with the MAC address I assigned them, and all seems to be working nicely.

    Depending on the network gear this interface is connected to, operating in promiscuous mode MIGHT add significant overhead because the software might have to deal with lots of received packets that it discards.

    Yeah, I thought about that. The vast majority of traffic hitting the trunk port should be destined for this box, however, so there shouldn't be too much of a performance hit.

    Thanks very much,

    ~Jesse



  • Yesterday the pfsense box went back to dropping all packets on trunk interfaces, even though the card was in promisc.
    The only thing we did that could have upset it was to unplug its trunk port for a while, but repeated tests have not reproduced the problem.

    A reboot brought it back to full functionality.

    Evidently there must be some bugs in the kernel network code, but as long as I can't figure out how to reliably reproduce the problem, there's a very low chance of it getting fixed. (Although maybe if the network developers fixed the trunk + custom MAC + promisc problem, they'd stumble across the cause of this other problem :)

    In any case, I'm brand new to the pfSense/BSD world. (I'm very familiar with Linux and with networking at the packet-header level, so I understand the general concepts.)

    Does anyone have advice for me? Is this a bug that can likely be fixed by the wonderful volunteers who write BSD kernel drivers, or am I pretty much stuck, especially so long as I can't easily reproduce the problem?

    I really do need to use custom mac addresses and vlan interfaces together, and it certainly wouldn't do to have a router that arbitrarily stops passing traffic for an unknown reason :-)

    Thanks a million!

    ~Jesse

