Incoming traffic on vlan not recognized

podilarius

Did you upgrade your 2.0.1 config or did you start from scratch?

dotOne

I created the lan and wan interfaces including vlans during install.
Then I imported the 2.0.1 config.

After I found out things didn't work as they should, I deleted the WAN interface and recreated it, but this did not solve the problem.

Starting from scratch is an awful lot of work.
But if it has to be done… Still, I'm not convinced it's a configuration issue.

@ndre

dotOne

Another update.

Because I wanted to rule out any hardware issues, I tried another system (a Jetway NF92 board), also with an Intel 3-port Ethernet expansion board.
The on-board interface has a Broadcom chipset.

On this system I used re0 with VLANs as the WAN interface.
Again, this system shows exactly the same symptoms: incoming traffic is seen on the main interface but not on the VLAN interface.

Here I also created a firewall rule allowing any traffic, so that I could test with ICMP (ping) as well.

As a final test I did the same with the 2.0.1 software, and everything works like a charm.

In the end, my only conclusion can be that the problem is in the VLAN code of the 2.1 software.
The build I tested with was from Sat Aug 25 13:20:25 EDT 2012.

I hope this will be fixed soon.

@andre

dotOne

After reading all that I've done and again carefully reading epek's answer I must admit he was right.

Setting the PCP bits to 6 (Internetwork control) I see ARP replies.
But… 'normal' traffic has a PCP of 0. The reply should have PCP 0 also.
This makes it more or less unworkable. All network control traffic should have PCP 6 and regular traffic should have PCP 0.

I tested this by starting a ping.
Since the firewall doesn't know the MAC address of the opposite switch, it sends an ARP Request with PCP 6.
The switch replies with an ARP Reply having PCP 6, but since the firewall has PCP 0 on the vlan, the traffic is dropped.
During the ping I change the vlan PCP to 6. The ARP Reply is then accepted and the ping starts running, with PCP 6.

Then I stop the ping and start an SSH session from the firewall console to the switch.
This won't work because the SSH session is initiated with PCP 0 from the firewall, while the vlan PCP is 6 because I just configured it for the ARP.
After setting PCP to 0 on the vlan, SSH works, but only until the ARP cache times out. Then the ARP requests start again, demanding PCP 6.
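
The test sequence above can be replayed as a toy model. This is only a sketch of the observed behavior, not pfSense or FreeBSD code: the function names `accept_observed` and `accept_expected` are made up for illustration, and the model assumes ARP frames arrive with PCP 6 and regular traffic such as SSH with PCP 0, as described in the test.

```shell
#!/bin/sh
# Toy model only: this is not pfSense or FreeBSD code.

# Observed behavior: a tagged frame is accepted only when its PCP
# equals the PCP configured on the vlan interface.
accept_observed() {  # args: configured_pcp frame_pcp
    [ "$1" -eq "$2" ]
}

# Expected behavior: PCP influences queueing, never acceptance.
accept_expected() {  # args: configured_pcp frame_pcp (both ignored)
    return 0
}

# Replay the test: ARP arrives with PCP 6, regular traffic with PCP 0.
for configured in 0 6; do
    for frame in 6 0; do
        if accept_observed "$configured" "$frame"; then
            verdict=accepted
        else
            verdict=dropped
        fi
        echo "vlan pcp=$configured frame pcp=$frame: $verdict"
    done
done
```

With either setting one of the two traffic classes is dropped, which is why no single PCP value on the vlan makes both ARP and SSH work.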

For now I have a workaround: a switch with an ACL that resets PCP to 0 for all traffic.

This is an unworkable situation.

Sometimes your own knowledge gets in your way.
Being a network engineer for a carrier Ethernet vendor, I expected the regular behavior: the PCP bits have no bearing on whether traffic is accepted.
I must agree with epek that the current implementation is a bug, or at least a misinterpretation of the 802.1Q standard.

The PCP bits tell the switch that the traffic has a certain level of priority. Traffic with a high PCP value should take precedence over traffic with lower values.
Depending on the PCP bits, traffic can be directed to a specific queue.
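
For reference, the PCP bits sit in the 16-bit Tag Control Information (TCI) field of the 802.1Q tag: 3 bits of PCP, 1 bit of DEI, and 12 bits of VLAN ID. A minimal sketch of splitting a TCI value into those fields follows. The function name `decode_tci` and the sample values (VLAN ID 100) are hypothetical and not taken from the setup in this thread.

```shell
#!/bin/sh
# Split a 16-bit 802.1Q Tag Control Information value into its fields.
# Layout: PCP = top 3 bits, DEI = next bit, VID = low 12 bits.
decode_tci() {  # arg: TCI as a number, e.g. 0xC064
    tci=$(($1))
    pcp=$(( ($tci >> 13) & 7 ))
    dei=$(( ($tci >> 12) & 1 ))
    vid=$(( $tci & 0xFFF ))
    echo "pcp=$pcp dei=$dei vid=$vid"
}

decode_tci 0xC064   # same VLAN, PCP 6 (network control)
decode_tci 0x0064   # same VLAN, PCP 0 (regular traffic)
```

Both sample values carry the same VLAN ID and differ only in the priority bits, which is why the PCP should affect queueing and not which VLAN interface receives the frame.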

@ndre

epek

@avink:

After reading all that I've done and again carefully reading epek's answer I must admit he was right.

I have to admit, I would have preferred not to be right.
Your observations support my assumption that something in these PCP patches has gone badly wrong.
It somehow reminds me of the ECN problems of the late 1990s and early 2000s.

@avink:

Setting the PCP bits to 6 (Internetwork control) I see ARP replies.
But… 'normal' traffic has a PCP of 0. The reply should have PCP 0 also.

Some providers, as far as I have read, deliberately set priority tags for their IPTV services.
As long as this traffic is also separated by VLANs, it should not matter: set vlanpcp only on the VLAN interface in question.
Other traffic should arrive on the other VLAN and can stay tagged with PCP 0.

In my scenario, I would have to do some bridging between an untagged port on OpenWrt and the VLAN on WAN. :-/

@avink:

…

For now I have a workaround: a switch with an ACL that resets PCP to 0 for all traffic.

I tried this too, on a Cisco SLM200-8T, but was unsuccessful.

@avink:

This is an unworkable situation.

I absolutely agree.

@avink:

Sometimes your own knowledge gets in your way.
Being a network engineer for a carrier Ethernet vendor, I expected the regular behavior: the PCP bits have no bearing on whether traffic is accepted.
I must agree with epek that the current implementation is a bug, or at least a misinterpretation of the 802.1Q standard.

But not only here… See Openwrt - why do untagged ports send pcp 1 instead of 0?
I fear that these problems will arise as soon as more OSes start supporting PCP instead of ignoring it.

@avink:

The PCP bits tell the switch that the traffic has a certain level of priority. Traffic with a high PCP value should take precedence over traffic with lower values.
Depending on the PCP bits, traffic can be directed to a specific queue.

While tags and meaning differ in case of '0' and '1' in respect of 802.1q/p …
pfSense has rewrite functionality built into the web interface, but it doesn't work: the values are always identical for in and out after the settings have been saved. I guess that a newly introduced default pf rule is the culprit, not the patch itself. Being a newbie to pfSense and FreeBSD, I have not yet figured it out.

Epek

P.S. Thanks Andre for filing this bug report: http://redmine.pfsense.org/issues/2613

epek

I have not received an answer of any kind from a developer yet.

Bump

dotOne

It's very quiet indeed.
Still waiting for a solution.

dhatz

Have you tried the latest 2.1-BETA build?

The ticket has been marked as fixed.

dotOne

When I checked tonight (CET) there wasn't a new build yet.
Let me check.

It still says you're on the latest version:

Version 2.1-BETA0 (amd64)
built on Mon Aug 27 14:57:37 EDT 2012
FreeBSD 8.3-RELEASE-p4

You are on the latest version.

podilarius

Should be a new version out there for you.

dotOne

No, unfortunately still no new version.

2.1-BETA0 (amd64)
built on Mon Aug 27 14:57:37 EDT 2012
FreeBSD 8.3-RELEASE-p4

You are on the latest version.
This was on Friday 08:00 CET

Still waiting….:(

phil.davis

The 32-bit nanobsd has had regular builds the last few days:

Current version: 2.1-BETA0
  NanoBSD Size : 2g
       Built On: Thu Aug 30 02:36:04 EDT 2012
    New version: Thu Aug 30 06:26:55 EDT 2012

So maybe something is going wrong building 64-bit?

podilarius

It must be the embedded build; the full one is there.

2.1-BETA0 (amd64) 
built on Thu Aug 30 06:54:02 EDT 2012 
FreeBSD 8.3-RELEASE-p4

dotOne

Then something must be wrong on my side.
I'm still told that I'm on the latest version… Let's do the update by hand.

Updated without problems.
This afternoon I will test the PCP issue.

dotOne

I can confirm that the issue has been fixed.
I will do more elaborate testing this weekend with different PCPs and PCP combinations.
For now it looks promising.

@ndre

epek

Confirmed. '2.1-BETA0 (amd64) built on Fri Aug 31 11:22:13 EDT 2012' seems to work.
Thanks to everyone involved!
What exactly went wrong?

Update: the web interface for special rules for 802.1p still does not save different values. So shifting pcp on incoming/outgoing packets is still unsupported (through the gui).

frater

Have you tried turning off VLAN_HWTAGGING?

I have an Atom-based machine with 2 Intel NICs, and VLANs don't work until I turn this feature off.

I had to put this in a cron job:

ifconfig em0 | grep -q VLAN_HWTAG && ifconfig em0 -vlanhwtag  
ifconfig em1 | grep -q VLAN_HWTAG && ifconfig em1 -vlanhwtag
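
The same check can be wrapped in a small helper so the cron job covers any number of NICs. This is only a sketch: the function names `hwtag_enabled` and `disable_hwtag` are made up for illustration, and it assumes, as the two lines above do, that `ifconfig` lists VLAN_HWTAGGING in its output when the feature is on.

```shell
#!/bin/sh
# Succeeds when the ifconfig output read from stdin lists VLAN_HWTAGGING.
hwtag_enabled() {
    grep -q 'VLAN_HWTAGGING'
}

# Turn off hardware VLAN tagging on each named interface where it is on.
disable_hwtag() {
    for nic in "$@"; do
        if ifconfig "$nic" | hwtag_enabled; then
            ifconfig "$nic" -vlanhwtag
        fi
    done
}
```

Calling `disable_hwtag em0 em1` from the cron job would then replace the two separate lines.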

Worth a try….

dhatz

@frater:

Have you tried turning off VLAN_HWTAGGING?
I have an Atom-based machine with 2 Intel NICs, and VLANs don't work until I turn this feature off.

I had to put this in a cron job:
ifconfig em0 | grep -q VLAN_HWTAG && ifconfig em0 -vlanhwtag  
ifconfig em1 | grep -q VLAN_HWTAG && ifconfig em1 -vlanhwtag  

This is an issue that should be investigated …

What is the exact model of your Intel NIC (output of dmesg) and your mainboard ?

frater

@dhatz:

This is an issue that should be investigated …

What is the exact model of your Intel NIC (output of dmesg) and your mainboard ?

I brought it up before ( http://forum.pfsense.org/index.php/topic,52224.0.html ) and made a bug report asking for an option to turn off hardware vlan tagging.

http://redmine.pfsense.org/issues/2577

I would welcome some follow-up, but don't want to hijack this thread.

dotOne

Well, this is the dmesg output for my NICs:

em0: <Intel(R) PRO/1000 Network Connection 7.3.2> port 0xcc00-0xcc1f mem 0xfe7e0000-0xfe7fffff,0xfe7dc000-0xfe7dffff irq 18 at device 0.0 on pci3
em0: Using MSIX interrupts with 3 vectors
em0: [ITHREAD]
em0: [ITHREAD]
em0: [ITHREAD]
pcib4: <ACPI PCI-PCI bridge> irq 19 at device 28.3 on pci0
pci4: <ACPI PCI bus> on pcib4
em1: <Intel(R) PRO/1000 Network Connection 7.3.2> port 0xdc00-0xdc1f mem 0xfe8e0000-0xfe8fffff,0xfe8dc000-0xfe8dffff irq 19 at device 0.0 on pci4
em1: Using MSIX interrupts with 3 vectors
em1: [ITHREAD]
em1: [ITHREAD]
em1: [ITHREAD]
em2: <Intel(R) PRO/1000 Legacy Network Connection 1.0.4> port 0xec00-0xec3f mem 0xfebe0000-0xfebfffff,0xfebc0000-0xfebdffff irq 18 at device 4.0 on pci6
em2: [FILTER]
em3: <Intel(R) PRO/1000 Legacy Network Connection 1.0.4> port 0xe880-0xe8bf mem 0xfeb80000-0xfeb9ffff,0xfeb60000-0xfeb7ffff irq 19 at device 6.0 on pci6
em3: [FILTER]
em4: <Intel(R) PRO/1000 Legacy Network Connection 1.0.4> port 0xe800-0xe83f mem 0xfeb20000-0xfeb3ffff,0xfeb00000-0xfeb1ffff irq 16 at device 7.0 on pci6

em0 and em1 are the on-board NICs. em2..5 are the NICs on the expansion board.

VLAN_HWTAGGING is off by default. I checked again to be sure.

@ndre