Resolving IP addresses from media providers

dotOne

Being extremely privacy- and security-conscious, I route all my outbound connections through VPNs.
Most if not all media providers (Netflix, Amazon Prime, etc.) are allergic to VPN connections because they don't own the rights to offer every movie and TV series in every country.
One option is to use a VPN provider that they don't block.
Since I'm also a big supporter of IPv6, that is pretty much impossible: the majority of VPN providers do not support IPv6.

In the end I decided to actively collect the IP addresses of the media providers and use them in pfBlockerNG IP match lists.
This way I can select the WAN gateway for this traffic instead of the default VPN gateway.
My initial process was pretty straightforward:

Collect the FQDNs used by the media providers
Resolve the FQDNs to IP addresses
Add these IP addresses to pfBlockerNG match lists
Create a firewall policy with the IP match list to select the correct gateway

Initially I used a TAP and a packet broker to capture the traffic from my Apple TVs, and filtered out the DNS requests and HTTPS Client Hello packets to collect all the FQDNs.
It worked pretty well and provided a lot of information, but it is a cumbersome and laborious process, so it definitely wasn't the way forward.
My second approach was to run tcpdump on the firewall and capture all DNS requests sent by the Apple TVs.
This worked pretty well too, but I found tcpdump not particularly flexible, especially once you know how much more convenient tshark is, so I abandoned this method as well.
Then I realised that all DNS requests are redirected by the NAT rules to the unbound DNS resolver on the firewall, so its log would make a great source for all the FQDNs.

A script can continuously read the unbound log file, extract the FQDNs and save them in a per-provider file.
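
The extraction step can be tried on a single log line first. The sketch below is only an illustration: extract_fqdn is a hypothetical helper, and the sample line assumes that unbound writes each query as "<name> <type> IN" at the end of the line. Check your own resolver.log before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the queried name from unbound log lines.
# Assumption: a query line ends in "<name> <type> IN", for example
#   ... info: resolving www.netflix.com. A IN
extract_fqdn() {
        awk 'NF >= 3 && $NF == "IN" && ($(NF-1) == "A" || $(NF-1) == "AAAA") { print $(NF-2) }'
}

# Example with an assumed log line (not taken from a real log)
SAMPLE='Jan 10 12:00:00 fw unbound[1234]: [1234:0] info: resolving www.netflix.com. A IN'
echo "${SAMPLE}" | extract_fqdn
```

Counting fields from the end of the line avoids depending on the exact number of syslog header fields in front of the query.
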
With the FQDNs in hand, I only have to resolve them to get the IP addresses.
It turned out that Amazon returns different IP addresses every time an A or AAAA request is received, so multiple requests are needed to collect all possible IP addresses.
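
To illustrate why repeated lookups help, here is a minimal sketch that queries one name several times and keeps the union of the answers. The names collect_ips, resolve_with_dig and RESOLVER are assumptions made for this example; they are not part of the scripts below.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: query a name several times and keep the union of answers.
# RESOLVER names a command that prints one address per line for "<command> <name>".
RESOLVER=${RESOLVER:-resolve_with_dig}

resolve_with_dig() {
        # Keep only lines that look like IPv4 addresses (dig +short also prints CNAMEs)
        dig -t a +short "$1" | grep -E '^[0-9]{1,3}\.'
}

# collect_ips NAME [ROUNDS]: print the unique addresses seen in ROUNDS lookups
collect_ips() {
        local name=$1 rounds=${2:-5} i
        for ((i = 0; i < rounds; i++)); do
                "${RESOLVER}" "${name}"
        done | sort -u
}

# Usage (needs network access):
#   collect_ips www.amazon.com 10
```

How many rounds are enough depends on how the provider rotates its answers, which is why the second script keeps accumulating addresses over time.
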
In the end I created two scripts. The first reads the unbound log file for the required domain names. The second reads all domain names from the files created by the first script, periodically resolves them to IP addresses and provides these to pfBlockerNG.

The first script:

#!/usr/local/bin/bash
#
# name: extractMediaDomains
# extract the FQDN's from the media providers and store them in a per provider file
#
#
declare -A PROVIDER
PROVIDER+=(["Netflix"]='(netflix\.com|nflxvideo\.net|nflxso\.net)')
PROVIDER+=(["AmazonPrime"]='(amazon\.com|amazon\.co\.uk|amazonvideo\.com|media-amazon\.com|aiv-cdn\.net|aiv-delivery\.net|pv-cdn\.net|cloudfront\.net|llnwnd\.net)')
PROVIDER+=(["NPOplus"]='(npo\.nl|npoplayer\.nl|bitmovin\.com|nepworldwide\.nl|scalia\.network|streamgate\.nl|2cnt\.net|npostart)')

TMPDIR='/local/db/pfbng'
LOCK="${TMPDIR}/lock"
EXT='Domains.txt'
PIDFILE="${TMPDIR}/${0##*/}.pid"

[ -d ${TMPDIR} ] || mkdir -p ${TMPDIR}

# Check if we are not already running
# (kill -0 only tests whether the process exists, it sends no signal)
[ -f ${PIDFILE} ] && PID=$(cat ${PIDFILE})
if [ -n "${PID}" ] && kill -0 "${PID}" 2>/dev/null; then
        echo "Already running"
        exit
else
        echo $BASHPID > ${PIDFILE}
fi


function process() {
        while IFS= read -r DOMAIN
        do
                # Check for termination request
                if [ -f ${TMPDIR}/terminate ]; then
                        rm -f ${PIDFILE}
                        exit 1
                fi

                if [ ! -f ${LOCK} ]; then
                        for INDEX in ${!PROVIDER[@]}
                        do
                                FQDN="${PROVIDER[${INDEX}]}\.$"
                                # -x -F: compare whole lines literally, so dots are not wildcards
                                if echo "${DOMAIN}" | grep -qE "${FQDN}" && ! grep -qxF "${DOMAIN}" ${TMPDIR}/${INDEX}${EXT}; then
                                        /usr/bin/logger "Adding domain ${DOMAIN} to ${INDEX}"
                                        echo ${DOMAIN} >> ${TMPDIR}/${INDEX}${EXT}
                                fi
                        done
                fi
        done
}


# Create output files if they do not exist
for INDEX in ${!PROVIDER[@]}
do
        [ -f ${TMPDIR}/${INDEX}${EXT} ] || touch ${TMPDIR}/${INDEX}${EXT}
done

tail -F /var/log/resolver.log | grep -E ".*resolv.*[[:blank:]](A{1}|A{4})[[:blank:]]IN" | cut -d" " -f 11 |process
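
The provider matching can be tested in isolation. The following is a sketch with a shortened pattern table, not a drop-in replacement: match_provider is a hypothetical helper, and unlike the script above it also anchors the pattern on the left, so that a name such as notnetflix.com. is not counted as Netflix.

```shell
#!/usr/bin/env bash
# Sketch of the provider matching, using a shortened pattern table.
declare -A PROVIDER=(
        ["Netflix"]='(netflix\.com|nflxvideo\.net|nflxso\.net)'
        ["NPOplus"]='(npo\.nl|npoplayer\.nl)'
)

# match_provider FQDN: print the provider whose pattern matches the end of
# the FQDN (written with a trailing dot, as unbound logs it).
match_provider() {
        local domain=$1 index re
        for index in "${!PROVIDER[@]}"; do
                # Start of name or a label boundary, then the pattern, then the final dot
                re='(^|\.)'"${PROVIDER[${index}]}"'\.$'
                if [[ ${domain} =~ ${re} ]]; then
                        echo "${index}"
                        return 0
                fi
        done
        return 1
}

match_provider "occ-0-123.1.nflxso.net."
```
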

The second script:


#!/usr/local/bin/bash
#
# name: resolvMediaDomains
# resolve the FQDN's provided by the extractMediaDomains script and feed them to pfBlockerNG
#
#
PROVIDER=("Netflix" "AmazonPrime" "NPOplus")
TMPDIR='/local/db/pfbng'
TMPv4="${TMPDIR}/IPv4.tmp"
TMPv6="${TMPDIR}/IPv6.tmp"
LOCK="${TMPDIR}/lock"
PFBDIR='/var/db/pfblockerng'
EXT='Domains.txt'
PIDFILE="${TMPDIR}/${0##*/}.pid"

[ -d ${TMPDIR} ] || mkdir -p ${TMPDIR}

# Check if we are not already running
# (kill -0 only tests whether the process exists, it sends no signal)
[ -f ${PIDFILE} ] && PID=$(cat ${PIDFILE})
if [ -n "${PID}" ] && kill -0 "${PID}" 2>/dev/null; then
        echo "Already running"
        exit
else
        echo $BASHPID > ${PIDFILE}
fi


IFS=$'\n'

# Create output files if they do not exist
for INDEX in ${PROVIDER[@]}
do
        [ -f ${TMPDIR}/${INDEX}Domains.txt ] || touch ${TMPDIR}/${INDEX}Domains.txt
        [ -f ${TMPDIR}/${INDEX}IPv4.org ]    || touch ${TMPDIR}/${INDEX}IPv4.org
        [ -f ${TMPDIR}/${INDEX}IPv6.org ]    || touch ${TMPDIR}/${INDEX}IPv6.org
done

while true
do
        touch ${LOCK}
        for INDEX in ${PROVIDER[@]}
        do
                # Check for termination request
                if [ -f ${TMPDIR}/terminate ]; then
                        rm -f ${LOCK}
                        rm -f ${PIDFILE}
                        exit
                fi

                # clean up temp files for next iteration (truncate without writing an empty line)
                : > ${TMPv4}
                : > ${TMPv6}

                # logger "Resolving ${INDEX} media hosts"
                [ -f ${TMPDIR}/${INDEX}IPv4.txt ] && cp ${TMPDIR}/${INDEX}IPv4.txt ${TMPDIR}/${INDEX}IPv4.org
                [ -f ${TMPDIR}/${INDEX}IPv6.txt ] && cp ${TMPDIR}/${INDEX}IPv6.txt ${TMPDIR}/${INDEX}IPv6.org

                for DOMAIN in $(cat ${TMPDIR}/${INDEX}Domains.txt)
                do
                        # ignore comment and empty lines
                        FQDN=$(echo ${DOMAIN} | sed 's/^[[:blank:]]*//;s/[[:blank:]]*$//')
                        if [[ -n ${FQDN} && ${FQDN::1} != '#' ]]; then
                                # Process IPv4
                                dig -t a +short ${FQDN} |grep -E '^[0-9]{1,3}\.' >> ${TMPv4}

                                # Process IPv6
                                dig -t aaaa +short ${FQDN} |grep -E '^[0-9a-fA-F]{1,4}:' >> ${TMPv6}
                        fi
                done
                cat ${TMPDIR}/${INDEX}IPv4.org ${TMPv4} |sort -u > ${TMPDIR}/${INDEX}IPv4.txt
                cat ${TMPDIR}/${INDEX}IPv6.org ${TMPv6} |sort -u > ${TMPDIR}/${INDEX}IPv6.txt

                # Copy updated IP lists to pfBlockerNG
                cp ${TMPDIR}/${INDEX}IPv?.txt ${PFBDIR}

                rm -f ${TMPDIR}/${INDEX}IPv?.org
        done

        sleep 5
        rm -f ${LOCK}
        sleep 25
done
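
The grep filters after dig are there because dig +short also prints any CNAME targets it follows, not only addresses. A slightly stricter variant of those filters could look like the sketch below. filter_ipv4 and filter_ipv6 are hypothetical helper names, and the patterns only check the general shape of an address, not its validity.

```shell
#!/usr/bin/env bash
# Hypothetical filters for "dig +short" output read from stdin.

# Keep lines made of four dot-separated groups of 1 to 3 digits
filter_ipv4() {
        grep -E '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# Keep lines made only of hex digits and colons, with at least one colon
filter_ipv6() {
        grep -E '^[0-9a-fA-F:]*:[0-9a-fA-F:]*$'
}

# Example input in the style of dig +short (documentation addresses, not real answers)
SAMPLE=$'dualstack.example.net.\n192.0.2.10\n2001:db8::1\n198.51.100.7'
echo "${SAMPLE}" | filter_ipv4
echo "${SAMPLE}" | filter_ipv6
```
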

My initial experiment capturing the DNS requests and Client Hello packets had already provided me with the base domains (netflix.com, nflxvideo.net and so on).
These are used in the first script to select the required FQDNs.

I created a startup script to easily start and stop both scripts:

#!/usr/local/bin/bash
#

BASEDIR='/local'
TMPDIR='/local/db/pfbng'
EXTRACT_PIDFILE='extractMediaDomains.pid'
RESOLVE_PIDFILE='resolvMediaDomains.pid'


startme() {
        ${BASEDIR}/bin/extractMediaDomains &
        ${BASEDIR}/bin/resolvMediaDomains &
}


stopme() {
        # Graceful stop
        touch ${TMPDIR}/terminate
        COUNT=60
        while [ ${COUNT} != 0 ]
        do
                if [ -f ${TMPDIR}/${EXTRACT_PIDFILE} ] || [ -f ${TMPDIR}/${RESOLVE_PIDFILE} ]; then
                        ((COUNT--))
                else
                        rm -f ${TMPDIR}/terminate
                        # return instead of exit, so that "restart" can continue with startme
                        return
                fi

                echo -n .
                sleep 2
        done
        echo "processes not stopping... going to kill"
        killme
}

killme() {
        # Kill whatever is still running and clean up the PID and flag files
        for FILE in ${EXTRACT_PIDFILE} ${RESOLVE_PIDFILE}
        do
                [ -f ${TMPDIR}/${FILE} ] && kill -9 $(cat ${TMPDIR}/${FILE})
                rm -f ${TMPDIR}/${FILE}
        done
        rm -f ${TMPDIR}/terminate
}

statusme() {
        PID=''
        [ -f ${TMPDIR}/${EXTRACT_PIDFILE} ] && PID=$(cat ${TMPDIR}/${EXTRACT_PIDFILE})
        if [ -n "${PID}" ] && kill -0 "${PID}" 2>/dev/null; then
                echo "extract process running"
        else
                echo "extract process not running"
        fi

        # Reset PID so a value left over from the extract check is not reused
        PID=''
        [ -f ${TMPDIR}/${RESOLVE_PIDFILE} ] && PID=$(cat ${TMPDIR}/${RESOLVE_PIDFILE})
        if [ -n "${PID}" ] && kill -0 "${PID}" 2>/dev/null; then
                echo "resolve process running"
        else
                echo "resolve process not running"
        fi
}

case "$1" in
        start)          startme ;;
        stop)           stopme ;;
        kill)           killme ;;
        restart)        stopme; startme ;;
        status)         statusme ;;
        *)              echo "Usage: $0 start|stop|kill|restart|status" >&2
                        exit 1
                        ;;
esac

This setup works pretty well. I hardly ever get locked out by a "VPN in use" message.
If that happens, it is usually resolved once the IP match list has been updated with the latest IP addresses.

The next version will probably be Python-based and will keep timestamps on the FQDNs and IP addresses, so that entries can be removed when they are no longer refreshed by a capture or an address lookup.
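
The timestamp idea can be sketched in bash as well, to stay with the language of the scripts above. This is only an illustration of the bookkeeping, not the planned implementation. The file format ("<epoch seconds> <entry>", one per line) and the helpers refresh and prune are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: keep a "last seen" timestamp next to each entry.
# Assumed file format: "<epoch seconds> <entry>", one record per line.

# refresh FILE ENTRY [NOW]: add ENTRY or update its timestamp
refresh() {
        local file=$1 entry=$2 now=${3:-$(date +%s)}
        touch "${file}"
        # Drop any previous record for this entry, then append a fresh one
        awk -v e="${entry}" '$2 != e' "${file}" > "${file}.new"
        echo "${now} ${entry}" >> "${file}.new"
        mv "${file}.new" "${file}"
}

# prune FILE MAX_AGE [NOW]: remove entries not refreshed within MAX_AGE seconds
prune() {
        local file=$1 max_age=$2 now=${3:-$(date +%s)}
        awk -v now="${now}" -v max="${max_age}" 'now - $1 <= max' "${file}" > "${file}.new"
        mv "${file}.new" "${file}"
}

# Usage: refresh on every successful lookup, prune once per cycle
#   refresh /tmp/NetflixIPv4.seen 192.0.2.10
#   prune /tmp/NetflixIPv4.seen 604800      # keep entries seen in the last 7 days
```

The address list for pfBlockerNG would then be the second column of the pruned file.
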