How to know what websites generate traffic?



  • Hi,

    I have to diagnose what generates WAN traffic on my network. I installed ntop, and now I know that most of the traffic comes from HTTP. Now, how can I find out which websites are generating this traffic? Is there a plugin for that?

    Thank you,

    Charles.



  • You should be able to see the hosts that are using the traffic using ntop, without the need for anything else.



  • The problem is that the host generating almost all the traffic is the IP of my ISP gateway, an address in the 24.0.0.0 range. This is why I'm not able to use this information.

    Charles.


  • Rebel Alliance Developer Netgate

    If you want to know which website is being used a lot, you'll need to run your users through a proxy like squid to actually get the details of where they are going. Knowing an IP of a web server doesn't tell you much, there could be hundreds or thousands of sites run off a single IP address.
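
    As a rough illustration of getting per-site totals out of squid, here is a minimal sketch that sums bytes per host from an access log. It assumes squid's default native log format (size in the fifth field, URL in the seventh); the function names `host_of` and `bytes_per_host` are made up for this example. Keep in mind that the size field counts bytes delivered to the client, which is not necessarily what squid fetched over the WAN.

```python
import sys
from collections import Counter
from urllib.parse import urlsplit


def host_of(url):
    """Return the host part of a logged URL (hypothetical helper)."""
    if "://" in url:
        return urlsplit(url).hostname or url
    # CONNECT requests are logged as "host:port" without a scheme.
    return url.split("/", 1)[0].rsplit(":", 1)[0]


def bytes_per_host(lines):
    """Sum the size field per host, assuming squid's native log format."""
    totals = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip truncated or unexpected lines
        try:
            size = int(fields[4])
        except ValueError:
            continue
        totals[host_of(fields[6])] += size
    return totals


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python top_sites.py <path to access.log>
    with open(sys.argv[1], errors="replace") as log:
        for host, size in bytes_per_host(log).most_common(20):
            print(f"{size / 1e6:10.1f} MB  {host}")
```

    Comparing the grand total from such a script with the WAN counters shows how much traffic never reaches the clients through the proxy.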



  • I already use the squid proxy. But lightsquid reports only around 1 to 3 GB of traffic per day, while the actual data transfer is around 10 to 20 GB per day. I'm not able to see where the difference comes from.


  • Rebel Alliance Developer Netgate

    If you are using squid in transparent mode, then it could be HTTPS traffic, which the transparent proxy does not intercept. If you are configuring squid directly on the clients, then it could be other things that are not web sites, such as streaming audio, bittorrent, etc.

    For that you'd need something like ntop to break it down by port/protocol.



  • ntop tells me it is http traffic. Is it possible it could be some kind of traffic like video streaming or p2p, but using port 80? If so, how could I detect that?

    Charles.


  • Rebel Alliance Developer Netgate

    It could be, though if it's on port 80 and you have squid enabled in transparent mode, it should be logging the traffic no matter what.

    To be sure, you'd have to get a packet capture of the traffic while it's happening and inspect it in something like Wireshark.



  • I still don't understand what is happening. Yesterday, I had 66 GB of traffic on WAN. Most of it is recorded as HTTP download in ntop. However, lightsquid only reports 1 GB of data for the same day. squid is working as a transparent proxy. Is it that lightsquid doesn't report all traffic? Is it possible that someone is bypassing squid?

    Another strange thing is that Bytes In on my WAN interface is 6 times greater than Bytes Out on my LAN interfaces. Is that normal?

    Help!


  • Rebel Alliance Developer Netgate

    Do you have squid set to try to cache windows updates or anything similar? I have heard of that happening in that situation, something in squid is trying to (re?)download the items into the cache.



  • Hi,

    I mentioned this behaviour with squid in the past.
    I used squid to cache Windows updates and I was using "range_offset_limit -1", which forces squid to download the complete file instead of only the part the client requested. This is good for Windows updates but can be a problem for other sites and files, which will also be downloaded and cached completely, even if the client aborted the download, browsed away from the site or stopped the stream.

    I reverted back to "range_offset_limit 0" because before that I had 30 GB of WAN download and only 12 GB on LAN.

    You should google for these squid options:

    quick_abort_min 0 KB;
    quick_abort_max 0 KB;
    quick_abort_pct 70;
    

    I am now using "quick_abort_pct 70;" which means that if a file is downloaded 70% or more, squid will finish the download. If the download of a client is less than 70% of the whole file, squid will abort the download.

    It would be interesting if I could use "range_offset_limit -1" for Windows updates only and "range_offset_limit 0" for all other downloads. But I didn't find an answer on google.
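
    To see why "range_offset_limit -1" can inflate WAN traffic so much, here is a minimal back-of-the-envelope sketch. The workload numbers are invented, and the model is deliberately crude: it assumes no cache hits, one upstream fetch per request, and that with "range_offset_limit 0" squid fetches no more than the client takes (quick_abort effects are ignored). `wan_bytes` is a hypothetical helper, not anything from squid.

```python
def wan_bytes(requests, fetch_whole_object):
    """Estimate upstream traffic for a list of (object_size, client_bytes).

    fetch_whole_object=True models "range_offset_limit -1" (squid pulls the
    complete object); False models "range_offset_limit 0" (squid pulls only
    what the client took). Simplified model, see the assumptions above.
    """
    total = 0
    for object_size, client_bytes in requests:
        total += object_size if fetch_whole_object else client_bytes
    return total


# Hypothetical workload, sizes in MB: (object size, bytes the client took).
requests = [(700, 50), (300, 300), (1200, 100)]

lan = sum(client for _, client in requests)
print("LAN delivered:            ", lan, "MB")
print("WAN, range_offset_limit 0:", wan_bytes(requests, False), "MB")
print("WAN, range_offset_limit -1:", wan_bytes(requests, True), "MB")
```

    In this made-up case the WAN side carries 2200 MB to deliver 450 MB to clients, the same kind of gap as the WAN/LAN difference described above.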


  • Rebel Alliance Developer Netgate



  • @jimp:

    I added those last bits here:

    http://doc.pfsense.org/index.php/Squid_Package_Tuning#Tweaking_Update_Caching_.2F_Squid_seems_to_download_on_its_own
    

    There was a little misunderstanding of my post:

    You have to use either:

    quick_abort_min 0 KB;
    quick_abort_max 0 KB;
    

    or

    quick_abort_pct 70;
    

    Both together will not work. Squid first applies quick_abort_min / quick_abort_max, and only if these are not in the config will it use quick_abort_pct.

    I copied this from squid-cache.org. It explains the options in detail:

    	If the transfer has less than 'quick_abort_min' KB remaining,
    	it will finish the retrieval.
    
    	If the transfer has more than 'quick_abort_max' KB remaining,
    	it will abort the retrieval.
    
    	If more than 'quick_abort_pct' of the transfer has completed,
    	it will finish the retrieval
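
    The three rules quoted above can be sketched as a small decision function. This is only a model of the quoted text, not squid's actual code: the order in which the rules are checked is an assumption, `quick_abort_decision` is a made-up name, and passing `None` to skip a rule is a simplification (real squid has built-in defaults for these directives).

```python
def quick_abort_decision(total_kb, received_kb, min_kb=None, max_kb=None, pct=None):
    """Return "finish" or "abort" after the client has disconnected.

    Models the three quoted rules, checked in the order min, max, pct
    (assumed order). A value of None means the rule is not applied.
    """
    remaining_kb = total_kb - received_kb
    if min_kb is not None and remaining_kb < min_kb:
        return "finish"  # less than quick_abort_min KB remaining
    if max_kb is not None and remaining_kb > max_kb:
        return "abort"   # more than quick_abort_max KB remaining
    if pct is not None and total_kb > 0 and received_kb * 100 / total_kb > pct:
        return "finish"  # more than quick_abort_pct percent completed
    return "abort"


# With min and max both 0, any unfinished transfer is aborted before the
# percentage rule is ever consulted.
print(quick_abort_decision(1000, 800, min_kb=0, max_kb=0, pct=70))
# With only the percentage rule, an 80% complete transfer is finished.
print(quick_abort_decision(1000, 800, pct=70))
```

    Under this model, combining "quick_abort_min 0 KB" and "quick_abort_max 0 KB" with "quick_abort_pct 70" leaves the percentage setting without any effect.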
    

    In my case I am using:

    
    quick_abort_pct 70;
    range_offset_limit 0;
    
    

    PS: If range_offset_limit is set to -1, the quick abort options will NOT work.


  • Rebel Alliance Developer Netgate

    Fixed.



  • As mentioned in http://forum.pfsense.org/index.php/topic,38406.0.html , the given regular expressions no longer work in pfSense 2. Also, I'm crossing my fingers, but all my traffic problems seem to have been resolved since I removed the range_offset_limit -1 option. Maybe it's not a good idea to keep it on the wiki…


  • Rebel Alliance Developer Netgate

    I added that one to the GUI as well.

    If someone wants to go through and test all that out and recommend a good sane default, the docs can be changed.

