Amazon VPC Routing: OSPF and IPsec Backup



  • Hello all, I thought I'd stick my head in here, as I'm looking at an issue with our setup.

    We have:

    Two pfSense firewall/routers connected to our network.

    Four interfaces:

    em0 - Internet
    em1 - VPLS layer 2 with OSPF routing to our provider, which connects us to our VPC in Amazon
    em2 - local admin LAN
    em3 - pfsync

    I've got OSPF routing set up in area 0, and it correctly routes to our Amazon virtual private cloud (VPC) via Amazon's Direct Connect. Our provider is connected to Direct Connect and peers with AWS via BGP on their side.

    Amazon recommends that when you use Direct Connect (DX for short) you also set up an IPsec VPN backup from your device to your VPC in case the DX goes down. (We've done this and tested it, and we have also seen a real outage: those in the UK who saw Telecity lose power last week will know about that.)

    1. My IPsec tunnel to the AWS VPN endpoint is good. It's bound to the virtual private gateway (VGW) in our VPC, as is the BGP session with our provider, so that part is largely transparent.

    2. The AWS route back to our data centre is configured, and I can ping our DC LAN gateway without issues when I test the VPN-only connection by dropping the VPLS/OSPF interface on both firewalls, because then there's no OSPF traffic going over those links.

    Now, here's the interesting part.

    Scenario: shut down em1 (the VPLS interface) to test VPN failover, i.e. check that traffic from the DC goes out via the VPN while the VPLS interface is not running.

    First, here is the VPLS interface (I look for the VPC subnet in the routing table further down):
    [2.2.4-RELEASE][root@ee-dr-fw1-adm.fmlocal]/root: ifconfig em1

    em1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500

    options=209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>

    ether d2:34:c0:1e:6d:ac

    inet6 fe80::d034:c0ff:fe1e:6dac%em1 prefixlen 64 scopeid 0x2

    inet 192.168.5.132 netmask 0xffffff80 broadcast 192.168.5.255

    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

    media: Ethernet autoselect (1000baseT <full-duplex>)

    status: active

    (At this point I run ifconfig em1 down.)

    em2 is my admin LAN interface.
    I look at the routing table and see this:

    [2.2.4-RELEASE][root@ee-dr-fw1-adm.fmlocal]/root: netstat -nra | grep 10.99

    10.99.0.0/16      172.29.33.5        UG1        em2

    So it's pushing the route for the VPC via the second firewall (172.29.33.5, reached over em2), which still has its VPLS interface up.

    So I bring the interface (em1) up again:
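Rather than eyeballing the grep output each time, the egress interface for the VPC prefix can be pulled out with a small helper. This is only a sketch in plain sh and awk: `route_egress` is a made-up name, and it assumes the four-column layout shown above (destination, gateway, flags, interface).

```shell
# Print the egress interface for an exact prefix, reading
# `netstat -rn` style lines on stdin.
# Assumption: column 1 is the destination and column 4 is the interface,
# as in the output pasted above.
route_egress() {
    awk -v prefix="$1" '$1 == prefix { print $4; exit }'
}

# Possible use on the firewall (not run here):
#   netstat -rn | route_egress 10.99.0.0/16
# Polling once a second while the interface is flapped:
#   while :; do netstat -rn | route_egress 10.99.0.0/16; sleep 1; done
```

It prints nothing when the prefix is missing from the table, which makes the window where the route has been withdrawn easy to spot.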

    [2.2.4-RELEASE][root@ee-dr-fw1-adm.fmlocal]/root: ifconfig em1 up

    [2.2.4-RELEASE][root@ee-dr-fw1-adm.fmlocal]/root: netstat -nra | grep 10.99

    10.99.0.0/16      172.29.33.5        UG1        em2

    [2.2.4-RELEASE][root@ee-dr-fw1-adm.fmlocal]/root:

    It's still showing the OSPF route via the second firewall.

    In another SSH session, I run netstat to see what's happening with my VPC route (10.99/16):

    [2.2.4-RELEASE][root@ee-dr-fw1-adm.fmlocal]/root:

    [2.2.4-RELEASE][root@ee-dr-fw1-adm.fmlocal]/root:

    [2.2.4-RELEASE][root@ee-dr-fw1-adm.fmlocal]/root: netstat -nra | grep 10.99

    {nothing there}

    [2.2.4-RELEASE][root@ee-dr-fw1-adm.fmlocal]/root: netstat -nra | grep 10.99

    [2.2.4-RELEASE][root@ee-dr-fw1-adm.fmlocal]/root: netstat -nra | grep 10.99

    I check again

    [2.2.4-RELEASE][root@ee-dr-fw1-adm.fmlocal]/root: netstat -nra | grep 10.99

    10.99.0.0/16      192.168.5.129      UG1        em1

    VPLS interface is back up and OSPF injects the route back via em1.
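As a sanity check on that route: the next hop 192.168.5.129 should sit in the same subnet as em1's own address (192.168.5.132, netmask 0xffffff80, i.e. a /25). A rough sketch of that arithmetic in plain sh follows; `ip_to_int` and `same_subnet` are hypothetical helper names, not anything shipped with pfSense.

```shell
# Pack a dotted-quad IPv4 address into a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if two addresses share a network under the given netmask.
# $1, $2: dotted-quad addresses; $3: netmask in hex, as ifconfig prints it.
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    mask=$(( $3 ))
    [ $(( a & mask )) -eq $(( b & mask )) ]
}

# Example (not run here):
#   same_subnet 192.168.5.132 192.168.5.129 0xffffff80 && echo "next hop is on em1's subnet"
```

With the mask 0xffffff80 both addresses reduce to the network 192.168.5.128, whereas the earlier next hop 172.29.33.5 does not, which matches it being reached over em2 instead.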

    However, a tcpdump on enc0 (my IPsec VPN interface) still shows traffic leaving over the VPN:

    [2.2.4-RELEASE][root@ee-dr-fw1-adm.fmlocal]/root: tcpdump -li enc0 -n| head

    tcpdump: WARNING: enc0: no IPv4 address assigned

    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

    listening on enc0, link-type ENC (OpenBSD encapsulated IP), capture size 65535 bytes

    capability mode sandbox enabled

    10:20:34.175102 (authentic,confidential): SPI 0x014e0213: IP 172.29.33.37.13541 > 10.99.101.90.7301: Flags [P.], seq 689148127:689148227, ack 3747528986, win 517, length 100

    10:20:34.243513 (authentic,confidential): SPI 0x014e0213: IP 172.29.33.37.13541 > 10.99.101.90.7301: Flags [P.], seq 100:1107, ack 1, win 517, length 1007

    10:20:35.006258 (authentic,confidential): SPI 0x014e0213: IP 172.29.33.37.13541 > 10.99.101.90.7301: Flags [P.], seq 1107:1203, ack 1, win 517, length 96

    10:20:35.086412 (authentic,confidential): SPI 0x014e0213: IP 172.29.33.37.13541 > 10.99.101.90.7301: Flags [P.], seq 1203:1301, ack 1, win 517, length 98

    And on the VPLS interface I see the ACKs coming back:

    [2.2.4-RELEASE][root@ee-dr-fw1-adm.fmlocal]/root: tcpdump -lni em1 net 10.99.101.0/24 and port 7301|head

    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

    listening on em1, link-type EN10MB (Ethernet), capture size 65535 bytes

    capability mode sandbox enabled

    10:20:26.644723 IP 10.99.101.90.7301 > 172.29.33.37.13541: Flags [.], ack 689131649, win 516, length 0

    10:20:26.652621 IP 10.99.101.90.7301 > 172.29.33.37.13541: Flags [.], ack 1459, win 517, length 0

    10:20:26.708810 IP 10.99.101.90.7301 > 172.29.33.37.13541: Flags [.], ack 1762, win 515, length 0

    10:20:27.101622 IP 10.99.101.90.7301 > 172.29.33.37.13541: Flags [.], ack 3249, win 517, length 0

    10:20:27.102206 IP 10.99.101.90.7301 > 172.29.33.37.13541: Flags [.], ack 4644, win 517, length 0

    10:20:27.114552 IP 10.99.101.90.7301 > 172.29.33.37.13541: Flags [.], ack 5247, win 514, length 0

    10:20:27.220642 IP 10.99.101.90.7301 > 172.29.33.37.13541: Flags [.], ack 5348, win 514, length 0

    Yet the routes still look good, and I haven't changed anything in the config.
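To make the asymmetry in the two captures explicit, here is a rough sketch in sh and awk that pairs up flow directions across interfaces. The function names `flow_dirs` and `asymmetric_flows` are invented for illustration, and the parsing assumes `tcpdump -n` text lines of the shape pasted above (an `IP` token followed by `src > dst:`).

```shell
# Emit "src dst iface" once per distinct flow direction found in
# tcpdump -n text on stdin. $1 is the interface the capture came from.
flow_dirs() {
    awk -v iface="$1" '
        {
            for (i = 1; i < NF - 2; i++) {
                if ($i == "IP" && $(i + 2) == ">") {
                    src = $(i + 1)
                    dst = $(i + 3)
                    sub(/:$/, "", dst)   # tcpdump ends the destination with ":"
                    key = src " " dst " " iface
                    if (!(key in seen)) { seen[key] = 1; print key }
                    break
                }
            }
        }'
}

# Read "src dst iface" lines and report flows whose two directions
# were seen on different interfaces. If one direction shows up on
# several interfaces, only the last one read is kept.
asymmetric_flows() {
    awk '
        { dir[$1 " " $2] = $3 }
        END {
            for (k in dir) {
                split(k, ep, " ")
                rev = ep[2] " " ep[1]
                # ep[1] < ep[2] reports each pair once, not once per direction
                if ((rev in dir) && dir[rev] != dir[k] && ep[1] < ep[2])
                    print ep[1] " > " ep[2] " on " dir[k] "; " ep[2] " > " ep[1] " on " dir[rev]
            }
        }'
}
```

With the two captures saved to files, something like `{ flow_dirs enc0 < enc0.txt; flow_dirs em1 < em1.txt; } | asymmetric_flows` (file names are placeholders) should flag the 172.29.33.37.13541 / 10.99.101.90.7301 flow as leaving on enc0 and returning on em1.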

    So what I'm wondering is: is this symptomatic of something Amazon is doing, of our firewall, or both?

    Can anyone tell me why the traffic does not fail back to em1 and instead keeps going out over the IPsec VPN? To stop it I have to disable IPsec manually.

    I have also noticed that when I shut down em1 on the primary firewall, OSPF re-routes traffic for 10.99/16 to the second firewall, because that host still has a link to the target. But once all the links are correctly up again, I see traffic for 10.99/16 on the VPN, the LAN and the VPLS interfaces at the same time. What is going on?

    While I know the traffic is going over the VPN, I can run a traceroute from a host on the admin LAN that's sending data to the VPC:

    WinMTR statistics

    Host                               %  Sent  Recv  Best  Avrg  Wrst  Last
    enf-dr-fw1-adm.fmlocal             0     2     2     0     0     0     0
    No response from host            100     1     0     0     0     0     0
    No response from host            100     1     0     0     0     0     0
    No response from host            100     1     0     0     0     0     0
    aws-dev-p-app-1.fmlocal            0     2     2    12    12    13    13

    WinMTR v0.92 GPL V2 by Appnor MSP - Fully Managed Hosting & Cloud Provider

    When the VPLS is working, it shows the route going over the 8 hops to Amazon:

    WinMTR statistics

    Host                               %  Sent  Recv  Best  Avrg  Wrst  Last
    enf-dr-fw1-adm.fmlocal             0     5     5     0     0     0     0
    expo-e-router1-vpls-vrf.fmlocal    0     5     5     1     2     7     1
    192.168.0.2                        0     5     5     1     1     1     1
    192.168.0.1                        0     5     5     1     1     1     1
    80.85.65.157                       0     5     5     1     1     1     1
    80.85.65.158                       0     5     5     1     1     1     1
    aws-dev-p-app-1.fmlocal            0     5     5    11    11    11    11

    WinMTR v0.92 GPL V2 by Appnor MSP - Fully Managed Hosting & Cloud Provider

