DiffServ Code

markn62

So did the issue discussed at http://forum.pfsense.org/index.php/topic,63580.0.html get resolved?

It seemed the DiffServ Code Point dropdown was going to be reduced to choices CS1 through CS7 and the AF's and TOS 0x0? were going away. I still see all of them in the final 2.1 release. So do I have to test all of them to determine which ones work or is it documented? I just realized today the EF code was catching traffic it shouldn't. For example, port 123 packets had an EF tag. Unless the Unbound package tags EF then something isn't quite right. Tried CS7 which seemed to behave like EF.

Thanks.

markn62

It seems apparent to me the DiffServ codes still don't work. I put CS5-7 along with an EF rule, all quick, and the same connection hits all four rules within the same second and to the same destination address, see attached.

Best I can tell any DiffServ setting hits on any DiffServ value. And yes, I did clear states after modifying rules.

[Attachment: ScreenShot002a.JPG]

markn62

Nobody uses DiffServ for shaping? I find that hard to believe.

Klaws

@markn62:

Nobody uses DiffServ for shaping? I find that hard to believe.

Because of the Great TOS/DiffServ/ECN Fuckup, everyone now uses VLANs to tag and prioritize traffic. Or they do it by port numbers, if they need to filter by application.

Of course there are PBXes which use DSCP to tag their VoIP traffic. And some applications which use TOS. And sometimes VLANs are too cumbersome. In these cases, most admins appear to draw the "I don't care" card.

I was dumb enough to care.

On the positive side, I belong to the exclusive club of DiffServ/TOS users (last member count was three; that includes you and me).

Some background: the reason why I tackled the DSCP issue was that the beta of pfSense 2.1 blew up with very visible error messages in the WebGUI (plus the firewall/NAT rules wouldn't load completely). Apparently, I was the only person using DiffServ on 2.1 before it was released. However, I was not running the 2.1 beta/RCs in a production environment, and I had no time to set up a complete test rig, so I tested mostly for the absence of negative side effects and for (very) basic functioning.

During the last few months, I was moving home and office, so today is the first time in a few months that I have had a look into the pfSense forums. Today I also managed to upgrade a production system to 2.1. Finally!

So maybe I'll have time to look into this DiffServ issue during the next few days.

markn62

Sure hope you can, Klaws. It's sorely needed. I have CPEs that tag traffic at the network edge and could use the tags to prioritize if pfSense would act on them accordingly. For now, I only have dst port and IP to match to queues. L7 is too hit-n-miss.

Klaws

Unfortunately, I currently have no time to give this issue the required attention. So far, I've only done static code analysis.

Now, if anyone already has a "DSCP test environment" and is willing to help me with some testing…

http://pfsense.stock-consulting.com/

These are standard i386 builds. The file http://pfsense.stock-consulting.com/pfSense-Full-Update-2.1-RELEASE-i386-20140120-1658.tgz can be used for a manual firmware upgrade; I tried it on a freshly set up pfSense 2.1 VM, and both the firmware update and very basic operation worked. This is an unsigned image, so pfSense will ask for additional confirmation before the upgrade starts.

Update: I have deployed the update image to my live pfSense box. Everything works as usual, so I guess I got the build right. No time for DSCP tests yet, though.

Klaws

While I was at it, I took care of the missing VA option in the pf kernel module as well. Just in the kernel, not in the GUI, but adding it to the GUI is a piece of cake (can be done via the WebGUI of a live system - which I actually did for testing).

Here's the updated updater, unsigned of course again: http://pfsense.stock-consulting.com/pfSense-Full-Update-2.1-RELEASE-i386-20140126-1152.tgz

I mentioned testing in my second sentence. Yes, I have managed to find time to test my changes this time. After verifying that the updater worked on a test VM, I updated my live prod system to perform a "real life test". Which went through successfully.

Well, I had to wrestle with Windows 7 first.

What I did for the first basic test was to launch gpedit.msc on my Win7 machine, navigate to Computer Configuration, Windows Settings, Policy-based QoS and generate some policies with different DiffServ Code Points (one of them for "all traffic", assigned DSCP CS2). I then rebooted the Win7 machine, just to be sure.

After that, I could see that traffic indeed went into the CS2 queue of my pfSense box. Hooray! What didn't work were the application-specific policies. I suspect that was Windows' fault.

While I was adding new floating rules on the pfSense end, I suddenly noticed that the traffic through the CS2 queue had dropped to zero and everything was going through the default queue again. Whatever I tried, I couldn't get it working again. What had happened? I had no idea. Anyhow, I finally got the brilliant idea to do a packet capture. And discovered that all packets coming from my Win7 machine had a DSCP value (ToS value) of 0x00. That's definitely not CS2, and it was therefore totally correct that all traffic landed in the default queue.

But why had my Win7 box suddenly reverted to DSCP 0x00 instead of CS2 (which had worked perfectly initially)? I had not changed anything. A few Google searches later, I had learned some new myths about Windows 7 and DiffServ.
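For anyone repeating the capture check, here is a minimal sketch (Python, not part of pfSense) of how the ToS/DS byte shown in a packet capture splits into the 6-bit DSCP and the 2-bit ECN field. The name table is only illustrative and lists just the codepoints mentioned in this thread; VA is assumed here to be VOICE-ADMIT, DSCP 44.

```python
# Sketch: split the IPv4 ToS / DS byte from a capture into DSCP and ECN.
# DSCP_NAMES is an illustrative table, not a complete registry.

DSCP_NAMES = {0: "CS0 (default)", 44: "VA", 46: "EF"}
# CS1..CS7 are the class number shifted into the top three DSCP bits.
DSCP_NAMES.update({n << 3: f"CS{n}" for n in range(1, 8)})


def decode_tos(tos_byte):
    """Return (dscp, ecn, name) for a raw ToS/DS byte (0..255)."""
    dscp = tos_byte >> 2      # upper six bits
    ecn = tos_byte & 0b11     # lower two bits
    return dscp, ecn, DSCP_NAMES.get(dscp, f"DSCP {dscp}")


for tos in (0x00, 0x40, 0xB8):
    print(hex(tos), decode_tos(tos))
# 0x0 (0, 0, 'CS0 (default)')
# 0x40 (16, 0, 'CS2')
# 0xb8 (46, 0, 'EF')
```

So a CS2-tagged packet should show up with a ToS byte of 0x40 in the capture, while 0x00 means the sender is not tagging at all.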

First rumor: DiffServ is only active if the machine is a domain member.

This is of course 100% BS, as my machine is just a workgroup member and DSCP had worked.

Second rumor: Windows 7 automatically finds out if your network supports DiffServ/QoS and deactivates it if this is not the case.

This is of course 300% bullshit. First of all, how can Windows find out if a network supports DSCP? Well, it can't. My pfSense installation supports DSCP, but it is configured only for egress traffic. Oh, you might think up some pretty convoluted scheme where Windows tries to fill queues by generating dummy traffic with different DSCP values…but that's 100% BS.

Second, why should Windows deactivate DSCPs if the network doesn't support them and there would be zero impact if packets are DSCP-tagged anyhow? Another 100% of BS.

And then, why should anyone introduce another layer of complexity into a system, which is totally useless and can do nothing else but cause headache and random failure? Yup, here's the third 100% BS.

Apparently, the observed behaviour could be described by assuming that 300% of BS is just what some Microsoft engineer would think up. I mean, my Win7 box did generate DSCP-tagged traffic for some time and then suddenly decided to cease to do so. On its own. Go figure.

Whatever. If there's a dumb, um, "feature" in Windows which needs tweaking, the primary user interface is regedit. So I tried some registry settings I compiled from diverse rumors and conspiracy theories:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\Tcpip\Parameters]
"DisableUserTOSSetting"=dword:00000000

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\Tcpip\QoS]
"Do not use NLA"="1"

and rebooted, and suddenly my traffic from the Win7 machine appeared in the correct queue again.

Oh boy. I mean, am I testing a pfSense kernel patch or is it more like Win7 debugging?

Anyhow, I could confirm at least that my patches worked.

Klaws

Yesterday, I tried to install the experimental firmware into a prod environment where the WAN side is connected via PPPoE via VLAN7. Result was a complete failure of Internet access, with me having no clue what went wrong, not even after looking into the logs. However, I could confirm that the physical WAN port worked, as I could access the modem's web interface (untagged WAN, that is without VLAN). So I went back to 2.1 (simply restored the backup made during the firmware upgrade process).

While the restore was running, I scrolled down the web page with the modem's status, noticing that the SNR was down way below -3000dB. Yup, that was more than three thousand decibel below zero.

It appears that the modem broke while pfSense was rebooting after the firmware upgrade (I later confirmed that a capacitor was blown).

Anyhow, I decided that it was definitely time to replace the modem, so I did. Well, it took me some time to get to the modem, as someone had put much effort into providing a nice new decoration in the corridor, which unfortunately was in the way between me and the modem. No big deal getting through the decoration, though.

I then discovered that the decoration was there for a purpose. Which was to hide the pile of trash which someone had placed with scientific precision in front of the infrastructure racks. Luckily, I had my trusty flashlight with me, as someone had decided to cut electrical power to that area, presumably in order to "save energy because the trash doesn't need lighting". But at least I now knew that it had been smart to power our infrastructure equipment via a UPS located in a totally different area of the building, with independent power.

But I'm getting carried away.

After downgrading to plain 2.1 and swapping the modem (and confirming that everything was back to normal, except that the modem had the wrong firmware version), I re-upgraded to my experimental version again.

About two minutes after the reboot, the replacement modem decided to experience a mechanical power switch failure. I have a strong opinion about power switches on infrastructure devices - when I want to power down a device which usually runs 24/7, I can tolerate simply pulling the plug. But a power switch adds an additional point of failure.

I, of course, was already restoring to 2.1 again. I mean, spare modem, right out of the box, and failure within minutes of installing it? Not bloody likely. Whatever, I had by then decided to call it a day and drive home. No more testing after fixing the spare modem.

But I'm getting carried away.

Let's get to the test result: it appears that in the two minutes or so during which my experimental version was up and the spare modem was still working, pfSense could not establish the PPPoE connection. I know that this is not what one would call a solid result, but it's enough to warn against use of my version in a production environment which uses PPPoE or VLANs.

It does, though, work reliably on my home pfSense box, where the WAN connection uses DHCP (no VLANs).

Klaws

Okay, just figured out some, hm, dumbness of mine.

I was assuming that I was working on the 2.1 release branch. I wasn't; I was working on the 2.1.1 pre-release branch. So the above images/updates are not based on stable releases. Use with caution.

markn62

You've had quite the unusual setbacks Klaws, sorry to hear that. What of DiffServ are you trying to test?

Klaws

The main issue with DiffServ under 2.1 was, according to your report, that some rules matched even when the DSCP values in the rule and the traffic were different. Thank you for your detailed error report; that set me off in the right direction right away.

So what I needed to test was (a) that I didn't break any vital functions of pfSense and (b) that the DSCP values now match exactly.

Was able to perform limited testing for both cases, so I'm now pretty confident that my changes are okay. However, I have not done real stress tests with all kinds of traffic. I have not, for example, sent real-life VoIP traffic.
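If someone wants to generate tagged test traffic without fighting Windows policies, a rough sketch in Python could look like the following. It is not something I used for the tests above. It assumes a Linux or BSD sender, since as far as I know Windows tends to ignore this socket option; send_probe, the target address and the port are made-up placeholders.

```python
import socket


def dscp_to_tos(dscp):
    """Shift a 6-bit DSCP value into the upper bits of the ToS/DS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def send_probe(dscp, host, port, payload=b"dscp-probe"):
    """Send one UDP datagram with the requested DSCP (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # IP_TOS takes the whole ToS byte, so the DSCP has to be shifted first.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
        sock.sendto(payload, (host, port))


# Example call (192.0.2.10 is a documentation address, replace it with a
# host behind the firewall under test):
#   send_probe(16, "192.0.2.10", 9)   # one packet tagged CS2
print([hex(dscp_to_tos(d)) for d in (8, 16, 46)])
```

Sending one probe per codepoint and watching the queue counters (or a packet capture) should show whether each rule only catches its own codepoint.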

The "real concern" is probably not that my changes break something, but that something slipped my attention - or that of the person who originally wrote these kernel patches.

As my builds are based on 2.1.1, which is currently evolving, I provide new (mostly untested) builds every few days.

I have opened four pull requests for my changes: an 8.3 kernel patch for the 2.1.* versions; a 10.0 kernel patch for future versions of pfSense (AFAIR, FreeBSD 9.* will never be used for pfSense); a GUI patch for the 2.1 branch; and the same GUI patch for the master branch, for pfSense 2.2 and later.

markn62

Wasn't sure if you were still working on the DSCP values or other shaping issues. Glad to hear you're tackling the DiffServ issue. Wish I understood the underlying code as well as you do. If I can test or help in any way, let me know. I have a non-production box I can evaluate with.

Klaws

Some background:

I am not a member of the pfSense development team. I am just contributing to the DiffServ feature because I use it and found it broken in the 2.1 prereleases. It might have been broken for longer, though.

FreeBSD doesn't support DiffServ. So a kernel patch is required to add this feature to pf. I have no idea where these patches originally came from, but I found some issues there - especially that the GUI code of pfSense (the PHP GUI code) did not fit together with the kernel code. So that was what I initially fixed and what went into the 2.1 version.

This version then had the issue pointed out dead on target by markn62. It appears that this issue was built into the original kernel patches on purpose, as the TOS code (which is not used in pfSense) also shows a similar alteration. I suspect that the original author tried to separate classes (CS1..CS7) from the other bits in the TOS byte - for both TOS and DiffServ. While this might have made sense for TOS, it makes just about no sense for DSCP. And it violates the RFCs. Plus, it didn't work correctly anyway, causing, for example, a CS7 rule to match everything from CS1..CS7 (plus all AF codepoints and VA/EF), while a CS2 rule would match CS2, CS3, CS6 and CS7 (plus some AF codepoints). Not useful for real-life situations, and totally not intuitive, IMHO. Nevertheless, I re-checked all relevant RFCs, which specify only exact matching of codepoints, nothing else.

So the core of my recent edits was simply to change a few & into a few ==. I also added the VA codepoint, which I had accidentally left out of the edits for 2.1.
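To illustrate the difference between & and == here, a small Python sketch (not the actual kernel code, which is C). The loose_match function is only an assumption about the old behaviour, chosen because it reproduces the overlaps described above; exact_match is the plain comparison.

```python
# Class selector codepoints: class number in the top three DSCP bits.
CS = {f"CS{n}": n << 3 for n in range(1, 8)}


def loose_match(rule_dscp, pkt_dscp):
    # Assumed pre-fix behaviour: any bit shared with the rule counts as a hit.
    return (pkt_dscp & rule_dscp) != 0


def exact_match(rule_dscp, pkt_dscp):
    # Post-fix behaviour: the codepoints must be identical.
    return pkt_dscp == rule_dscp


def hits(match, rule_name):
    """List the CS codepoints that a rule for rule_name would catch."""
    return [name for name, value in CS.items() if match(CS[rule_name], value)]


print(hits(loose_match, "CS7"))  # ['CS1', 'CS2', 'CS3', 'CS4', 'CS5', 'CS6', 'CS7']
print(hits(loose_match, "CS2"))  # ['CS2', 'CS3', 'CS6', 'CS7']
print(hits(exact_match, "CS2"))  # ['CS2']
print(loose_match(CS["CS7"], 46), exact_match(CS["CS7"], 46))  # EF packet: True False
```

With the bitwise test, a CS5, CS6, CS7 and EF rule can all hit the same packet, which is what the screenshot earlier in this thread showed.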

Simple changes, really.

Now, if a developer talks of "simple changes, really", do not trust him. So, first thing I did was not to trust me, and I still don't.

However, Ermal appears to have trusted me, as he merged my pull requests two days ago. So my changes are now in the official pfSense 2.1.1 prerelease snapshots.

markn62

That's great news Klaws and thanks for taking the time to share your background on this.

Looks like the most recent 64-bit full update is dated the 5th, so your changes should be in this file:
pfSense-Full-Update-2.1.1-PRERELEASE-amd64-20140205-1408.tgz?

I'm surprised more aren't interested. I'll give it a test drive first chance I get.

Klaws

I currently have no idea what's going on on the "64 bit front".

An easy way to spot if my changes are included is to check if the DSCP list (in the WebGUI, Firewall - Rules - add new rule via the plus sign - DiffServ Code Point - Advanced) contains the VA code point.