Best way to find out who (and what) needs to be shaped?
We would like to hear from people on this list what they consider the best way to find out who (source IP address) and what (destination IP address & port) needs to be shaped.
Let me explain: one of our customers has a 2 Mbit/s internet circuit and from time to time it gets overloaded. Since the overload is not permanent - it lasts only a few minutes (or sometimes hours) - we've been trying to find a way in pfSense 2.1 to pinpoint exactly who is using all the available bandwidth (and for what) at that specific time.
We've been playing with pfTop and bandwidthd, but they don't seem to provide a straightforward answer - or maybe we need guidance on how to properly use them. :-[
Anyway, any help would be appreciated.
If it is mainly HTTP traffic which clogs the pipes, you might try to install Squid plus (optionally) some Squid monitoring package. With Squid in transparent mode, you'd be able to see (and log…) all HTTP requests.
Note that this is a violation of privacy unless the users are informed before you take such a measure!
For other kinds of traffic, I'd try some bandwidth monitor. I think bandwidthd can show traffic by internal IP, so you might be able to track down the machine which bogs down your network. "Hi, do you have an idea why your machine bogs down the entire network during the lunch break?" - "Can you wait until I have finished downloading this collection of funny cat videos?"
Both approaches are actually rather blunt weapons, but quick to deploy and maybe a bit more helpful than staring at pftop and trying not to blink.
Thank you Klaws for your suggestions, but I'm afraid they won't be of that much help in our situation.
We already have Squid in place, working in proxy mode, not transparent. Logs are analyzed using SARG (https://sourceforge.net/projects/sarg/), but these reports are inherently inaccurate regarding spent user time and exchanged bytes. We are trying to get a single, global look at all protocols in use. With Squid, as you mentioned, it's all about HTTP.
Bandwidthd is what we have been using so far, with little success, because it's meant to store historical data, not real-time data or at least smaller time periods like seconds, minutes or hours (please correct me if I'm wrong). The smallest time window available in bandwidthd is a day, so it won't help us much. Plus, our pfSense appliances only have 4 GB of disk space on a CompactFlash card to store bandwidthd's log files.
With pfTop, to see the top data exchangers, we are sorting reports by their 'Rate'. Would that be the best way to detect these bandwidth suckers?
And what about pfflowd? Could we get something like this in pfSense?
It's been a few months since I had to use bandwidthd myself, but I vaguely remember that it could display a graph covering two days or so, with, dunno, five minute intervals or so.
The page below seems to confirm that; there's a screenshot of a graph which looks like that:
Of course, still by host only, not by protocol or anything, but maybe a good starting point. A 4 GB card should be sufficient to collect log data for a few days or weeks (assuming that you have a rather lean pfSense installation), so this is a solution which is not totally suited for permanent use in your situation.
The page mentioned above also mentions darkstat. I have no experience with that.
pfflowd is available as a package in pfSense. It is meant to supply data to another machine, which runs some Netflow-compatible monitoring software. Probably a reasonable solution if your routers cannot store the monitoring data by themselves. However, I have no experience with pfflowd and no significant experience with any monitoring tools out there. I don't even know if Cacti natively supports Netflow (it was announced at some time) or if it has to load a plugin for that.
Just for kicks, I installed the softflowd package on pfSense. Thought setting up the Netflow emitter and some free, perhaps even open source monitoring software on a Windows or Linux VM would be done in five minutes. Ten, max.
Well, not quite. Tried Cacti first. Installation on Ubuntu 13.x went smoothly, then I downloaded and installed the Netflow plugin. The required procedure differed from the documentation, so it took some time. Then Cacti began throwing errors. After quite some time, way beyond the five minute estimate, and way beyond the ten minute estimate as well, I gave up.
After searching for some simple solution other than PTRG on a Windows box, I finally gave in and downloaded and installed PTRG on a Windows 8 VM. The website claims that you can get PTRG running within two minutes. Yep, I guess I can confirm that. But had like five minutes of major annoyance because the Microsoft concept of enabling even the dumbest user to work as efficiently as a pro user with 25+ years of experience works very well on Windows 8. Obviously not by elevating the skills of the below average user, but by lowering the skills of the pro user to the level of an idiot.
I chose the freeware license for PTRG, not the 30 day full trial one you automatically receive when you download. Initial setup via the "Guru" went smoothly, and PTRG started discovering all kinds of network devices besides pfSense. I quickly deleted the auto-discovered group, and also most sensors for the local node PTRG runs on. As the freeware license has a ten sensor limit, and the local node already adds quite a few sensors, I got rid of as many automatically added sensors as I could delete. Three local node sensors cannot be deleted, leaving seven for pfSense. I had SNMP running on pfSense, so I had quite a few auto-added sensors there as well, which I also deleted. softflowd on pfSense was already configured to send to the Windows VM's IP, port 2055, Netflow V9. PTRG hadn't auto-discovered that, so I manually added a sensor for flows. Quite simple, and, to my surprise, it worked right away.
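Before pointing a full monitoring package at the exporter, it can help to confirm that flow datagrams actually arrive on the collector machine. The following is only a minimal sketch under assumptions: the function names are made up for illustration, and it relies only on the fact that NetFlow export datagrams start with a 16-bit big-endian version number. Port 2055 is the one mentioned above.

```python
import socket
import struct


def netflow_version(datagram: bytes) -> int:
    """Return the NetFlow version stored in the first two bytes of a datagram."""
    if len(datagram) < 2:
        raise ValueError("datagram too short to carry a NetFlow header")
    (version,) = struct.unpack("!H", datagram[:2])  # network byte order
    return version


def listen(bind_addr: str = "0.0.0.0", port: int = 2055) -> None:
    """Print one line per received datagram (hypothetical helper, runs until interrupted)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind_addr, port))
        while True:
            data, (sender, _sender_port) = sock.recvfrom(65535)
            print(f"{sender}: NetFlow v{netflow_version(data)}, {len(data)} bytes")
```

Calling `listen()` on the collector and watching for lines from the pfSense IP should show whether softflowd is sending anything, and with which version, before any sensor is configured.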
After waiting some time, I then checked the data the Netflow sensor had collected. Well - it was actually useful. I discovered quite some traffic from some stupid advertising server I hadn't blocked yet. Nifty, although the kind of presentation in PTRG may take a few minutes to get used to. Haven't checked if it offers drill-down, though.
Nice to see interest in this. I've been asking about this very need: a way to see, in real time and by time range (shorter than 4 hrs), traffic by protocol and by rate, including the local IP generating the traffic. I'm only interested in monitoring outbound. Currently using ntop, pfTop, packet capture and Wireshark in combination. Not a very user-friendly approach to addressing this need.
Okay…just noticed that I had a 100% success rate in mistyping PRTG in my previous post ;)
Anyhow, my Windows 8 VM went belly-up. Some sort of activation issue. Oh man, how I hate being a legitimate Microsoft customer (I won't admit to being a Microsoft Partner, actually).
SolarWinds is an application recommended by quite a few people. I think the freeware license limits it to a 24 hour rolling backlog only - which would be okay for real-time analysis. However, I have no experience with it. I vaguely remember that I installed it once, but either I disliked it or ran into some trouble.
Got a bit fed up with the available tools when trying to analyze traffic patterns myself. Yep, there are nice tools which provide cool dashboards and graphics and everything, cost more than a new car, and still provide no means to drill down beyond the colourful graphics. There are rather primitive tools, which cost slightly less than a new car. And there are free tools, which plainly suck, even if you manage to get them to compile without errors.
Why is there no tool which just collects the raw data and shoves it into a database, so the network admin can create the reports he needs with a few simple SQL queries?
Additionally, in my case, I happen to have a Windows Server box running 24/7, so a Windows tool might be nice.
So, here's a simple NetFlow v5 datagram collector which shoves the raw data via ODBC into a database: http://pfsense.stock-consulting.com/
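For readers who want to see what "collect the raw data and shove it into a database" amounts to, here is a rough sketch. It is not the collector linked above: it assumes Python with SQLite instead of ODBC, the table layout and function names are invented for illustration, and only the NetFlow v5 datagram layout (24-byte header followed by 48-byte flow records) is taken from the published format.

```python
import socket
import sqlite3
import struct

# NetFlow v5 layout: 24-byte header, then `count` flow records of 48 bytes each.
HEADER = struct.Struct("!HHIIIIBBH")
RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")


def parse_v5(datagram: bytes) -> list:
    """Decode one NetFlow v5 datagram into a list of flow dicts."""
    version, count, _uptime, unix_secs, *_rest = HEADER.unpack_from(datagram, 0)
    if version != 5:
        raise ValueError(f"not a NetFlow v5 datagram (version {version})")
    flows = []
    for i in range(count):
        f = RECORD.unpack_from(datagram, HEADER.size + i * RECORD.size)
        flows.append({
            "ts": unix_secs,                  # export time from the header
            "src": socket.inet_ntoa(f[0]),    # source IP
            "dst": socket.inet_ntoa(f[1]),    # destination IP
            "packets": f[5],
            "bytes": f[6],
            "src_port": f[9],
            "dst_port": f[10],
            "proto": f[13],                   # IP protocol number (6 = TCP, 17 = UDP)
        })
    return flows


def store(conn: sqlite3.Connection, flows: list) -> None:
    """Append flows to a hypothetical `flows` table, creating it if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS flows (ts INTEGER, src TEXT, dst TEXT, "
        "src_port INTEGER, dst_port INTEGER, proto INTEGER, "
        "packets INTEGER, bytes INTEGER)"
    )
    conn.executemany(
        "INSERT INTO flows VALUES "
        "(:ts, :src, :dst, :src_port, :dst_port, :proto, :packets, :bytes)",
        flows,
    )
    conn.commit()
```

In a real collector, each datagram read from a UDP socket would be passed to `parse_v5` and the result to `store`. Note that this handles v5 only; NetFlow v9 is template-based and needs a different parser.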
It's accompanied by a batch of simple SQL statements to provide a starting point for writing a useful report. It's written for MS SQL Server, because that was already up and running. I guess other RDBMSes may work with the collector tool as well, as long as they support ODBC. However, the sample report is rather specific to MS SQL. I suspect rewriting it for Oracle might suck hard, as timestamp arithmetic is not as straightforward as in MS SQL.
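To give an idea of the kind of report this makes possible, the sketch below answers the original "who and what" question for one time window. It is not the sample report shipped with the collector: it assumes SQLite via Python rather than MS SQL, a hypothetical `flows` table with Unix timestamps, and made-up sample rows.

```python
import sqlite3


def top_talkers(conn, start_ts, end_ts, limit=10):
    """Sum bytes per (source IP, destination IP, destination port) within a time window."""
    return conn.execute(
        """
        SELECT src, dst, dst_port, SUM(bytes) AS total_bytes
        FROM flows
        WHERE ts >= ? AND ts < ?
        GROUP BY src, dst, dst_port
        ORDER BY total_bytes DESC
        LIMIT ?
        """,
        (start_ts, end_ts, limit),
    ).fetchall()


# Hypothetical sample data standing in for collected flow records.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flows (ts INTEGER, src TEXT, dst TEXT, dst_port INTEGER, bytes INTEGER)"
)
conn.executemany(
    "INSERT INTO flows VALUES (?, ?, ?, ?, ?)",
    [
        (1000, "192.168.1.10", "203.0.113.5", 443, 900000),
        (1060, "192.168.1.10", "203.0.113.5", 443, 600000),
        (1030, "192.168.1.20", "198.51.100.7", 80, 200000),
        (5000, "192.168.1.30", "198.51.100.9", 25, 999999999),  # outside the window
    ],
)

for row in top_talkers(conn, 1000, 1300):
    print(row)
```

Narrowing `start_ts` and `end_ts` to the few minutes of an overload is what turns the raw data into a list of candidates for shaping.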
FYI, PRTG will give you a free 30 sensor license if you put a link to their site on a public webpage.