Intermittent connection initiation problem when using CARP + NAT



  • EDIT[22 May 2013]: Clarified that I watch the WAN interface with tcpdump. Also added a link to another user with a similar problem.
    EDIT[23 May 2013]: Able to reproduce it on the secondary as well.

    (Another user is experiencing a very similar problem: http://forum.pfsense.org/index.php/topic,62590.0.html i.e. he/she watches the WAN interface and sees the LAN address as the source when it should be the NAT'ed address instead.)

    We are experiencing an intermittent problem with initiating connections (both UDP and TCP) to a remote host.
    We use two Dell PowerEdge R200s in a failover configuration with CARP VIPs for the LAN and WAN.
    It only happens when I use the LAN VIP (which is 192.168.100.1) as the LAN gateway address.
    If I use the LAN interface address of the firewall (which is 192.168.100.250), it does not happen.
    It happens mostly on the primary firewall.
    It is reproducible on the secondary as well, but happens far less often there.

    While monitoring packets on the WAN interface with tcpdump, I noticed the following:
    When the intermittent problem occurs, the client's LAN address appears as the source address, when the WAN VIP should be the source address.
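
As an illustration (not something from the original troubleshooting), a WAN-side capture can be checked for this symptom mechanically. The sketch below is a minimal Python example that scans tcpdump text lines and reports packets whose source is an RFC 1918 private address while the destination is not. The function names and the sample addresses (203.0.113.80 standing in for the remote host, 198.51.100.7 standing in for the WAN VIP) are assumptions made up for the example.

```python
import ipaddress
import re

# RFC 1918 private ranges. A packet leaving the WAN side after NAT
# should not carry one of these as its source address.
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

# Matches the "src.port > dst.port:" part of a tcpdump IPv4 TCP/UDP line.
ADDR_PAIR = re.compile(
    r"(\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):")


def is_rfc1918(addr):
    """True if addr falls inside one of the RFC 1918 ranges."""
    return any(addr in net for net in RFC1918)


def leaked_sources(lines):
    """Return (src, dst) pairs with a private source and a non-private destination."""
    leaks = []
    for line in lines:
        m = ADDR_PAIR.search(line)
        if m is None:
            continue  # not an IPv4 TCP/UDP line, or the addresses were redacted
        try:
            src = ipaddress.ip_address(m.group(1))
            dst = ipaddress.ip_address(m.group(3))
        except ValueError:
            continue  # looked like an address but was not a valid one
        if is_rfc1918(src) and not is_rfc1918(dst):
            leaks.append((str(src), str(dst)))
    return leaks


# Hypothetical sample lines in tcpdump's output format (placeholder addresses).
SAMPLE = [
    "15:23:11.995862 00:1e:c9:ba:f4:12 > 6c:20:56:fd:99:01, ethertype IPv4 (0x0800), "
    "length 74: 192.168.100.19.41440 > 203.0.113.80.80: Flags [S], length 0",
    "15:24:41.927803 00:1e:c9:ba:f4:12 > 6c:20:56:fd:99:01, ethertype IPv4 (0x0800), "
    "length 74: 198.51.100.7.41440 > 203.0.113.80.80: Flags [S], length 0",
]

for src, dst in leaked_sources(SAMPLE):
    print(f"un-NATed packet on WAN: {src} -> {dst}")
```

The ranges are listed explicitly rather than relying on `ipaddress`'s `is_private`, because that attribute also covers other reserved blocks (including the documentation ranges used as placeholders here). In practice the lines could come from something like `tcpdump -n -l -i <WAN interface>` piped into the script.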

    Legend:
    XX.XX.XX.XX = IP address of remote host
    YY.YY.YY.YY = External CARP VIP of firewall.
    192.168.100.19 = One of many PCs or servers on the LAN
    192.168.100.1 = Internal LAN CARP VIP
    192.168.100.250 = Internal LAN interface address (i.e. not a CARP VIP)

    Problem: 192.168.100.19 ---> 192.168.100.1   ---> YY.YY.YY.YY ---> XX.XX.XX.XX
    Works:   192.168.100.19 ---> 192.168.100.250 ---> YY.YY.YY.YY ---> XX.XX.XX.XX

    This is strange as the source address should be the external VIP rather than the IP address of a PC on the LAN.
    When using TCP, repeated SYN packets are sent until the connection works. It looks like this (I have removed public IPs):

    
    15:23:08.996358 00:1e:c9:ba:f4:12 > 6c:20:56:fd:99:01, ethertype IPv4 (0x0800), length 74: 192.168.100.19.41440 > XX.XX.XX.XX.80: Flags [S], seq 2988224571, win 5840, options [mss 1460,sackOK,TS val 14569825 ecr 0,nop,wscale 2], length 0
    15:23:11.995862 00:1e:c9:ba:f4:12 > 6c:20:56:fd:99:01, ethertype IPv4 (0x0800), length 74: 192.168.100.19.41440 > XX.XX.XX.XX.80: Flags [S], seq 2988224571, win 5840, options [mss 1460,sackOK,TS val 14572825 ecr 0,nop,wscale 2], length 0
    15:23:17.991411 00:1e:c9:ba:f4:12 > 6c:20:56:fd:99:01, ethertype IPv4 (0x0800), length 74: 192.168.100.19.41440 > XX.XX.XX.XX.80: Flags [S], seq 2988224571, win 5840, options [mss 1460,sackOK,TS val 14578825 ecr 0,nop,wscale 2], length 0
    15:23:29.982908 00:1e:c9:ba:f4:12 > 6c:20:56:fd:99:01, ethertype IPv4 (0x0800), length 74: 192.168.100.19.41440 > XX.XX.XX.XX.80: Flags [S], seq 2988224571, win 5840, options [mss 1460,sackOK,TS val 14590825 ecr 0,nop,wscale 2], length 0
    15:23:53.967700 00:1e:c9:ba:f4:12 > 6c:20:56:fd:99:01, ethertype IPv4 (0x0800), length 74: 192.168.100.19.41440 > XX.XX.XX.XX.80: Flags [S], seq 2988224571, win 5840, options [mss 1460,sackOK,TS val 14614825 ecr 0,nop,wscale 2], length 0
    15:24:41.927803 00:1e:c9:ba:f4:12 > 6c:20:56:fd:99:01, ethertype IPv4 (0x0800), length 74: YY.YY.YY.YY.41440 > XX.XX.XX.XX.80: Flags [S], seq 2988224571, win 5840, options [mss 1460,sackOK,TS val 14662825 ecr 0,nop,wscale 2], length 0
    15:24:41.931087 6c:20:56:fd:99:01 > 00:00:5e:00:01:05, ethertype IPv4 (0x0800), length 66: XX.XX.XX.XX.80 > YY.YY.YY.YY.41440: Flags [S.], seq 1716458126, ack 2988224572, win 5840, options [mss 1460,nop,nop,sackOK,nop,wscale 4], length 0
    
    Note how the first five SYN attempts use the client's IP address as the source address. On the sixth attempt the connection works and the source address is now correct (i.e. it is the external CARP VIP).
    I also ran tcpdump on the remote host (XX.XX.XX.XX), and the packets do not appear there until the sixth attempt.
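
All six SYNs in the capture carry the same source port and sequence number, which suggests they are retransmissions of a single SYN by the client's TCP stack rather than separate connection attempts. The short Python sketch below checks the spacing between them using the `TS val` fields copied from the capture. It assumes the client's TCP timestamp clock ticks once per millisecond, which the wall-clock times in the capture appear to support but which is not stated in the post.

```python
# TS val fields copied from the six SYN packets in the capture.
TS_VALS = [14569825, 14572825, 14578825, 14590825, 14614825, 14662825]


def gaps(values):
    """Differences between consecutive values."""
    return [b - a for a, b in zip(values, values[1:])]


def is_doubling(intervals):
    """True when every interval is exactly twice the previous one."""
    return all(b == 2 * a for a, b in zip(intervals, intervals[1:]))


intervals = gaps(TS_VALS)
print(intervals)               # [3000, 6000, 12000, 24000, 48000]
print(is_doubling(intervals))  # True
# Total wait before the translated SYN went out, assuming 1 tick = 1 ms.
print(sum(intervals) / 1000)   # 93.0
```

Under that assumption the retries follow a 3, 6, 12, 24, 48 second backoff, so the client waited roughly 93 seconds before a SYN left the WAN interface with the translated source address.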
    
    When using UDP, a similar problem occurs.
    
    Things I have tried in order:
    
    1. Tested the Broadcom NICs using the official diagnostic tool from Dell. All low-level tests passed.
    2. Tested the RAM using memtest.
    3. Upgraded the BIOS of the motherboard.
    4. Re-installed pfSense.
    5. Took the configuration from the correctly working secondary, modified it accordingly and restored it on the newly rebuilt primary.
    
    I have also tried watching pflog0 with tcpdump. There is no NAT translation logged when the problem occurs. 
    Only on the sixth attempt (i.e. when it uses the correct source address) is the NAT translation logged.
    
    No success.
    
    Details of the installation are:
    - Two Dell PowerEdge R200s
    - pfSense 2.0.2 i386
    - Broadcom NetXtreme1 NICs
    - Dedicated 80 Mb/s leased line
    - Static port on outgoing NAT
    
    If this problem has been solved before, please let me know.
    
    


  • Possibly resolved.

    I read somewhere that bridges do not work too well with my setup.
    I deleted the bridge that I was using for OpenVPN and created a new, separate OpenVPN server for remote users to connect to.
    The problem has not reappeared after testing for 20 minutes.
    In the past, it took less than a minute for it to happen.
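
The post does not say how the test was run. As one possible approach (an assumption, not the method actually used), the Python sketch below opens TCP connections repeatedly from a LAN client and records any handshake that fails or takes unusually long. The function names, the 2.5 second threshold and the pause between attempts are all choices made for the example.

```python
import socket
import time


def connect_time(host, port, timeout=120.0):
    """Return the seconds taken to complete a TCP handshake, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None  # refused, unreachable or timed out


def run_probe(host, port, attempts, slow_after=2.5, pause=1.0):
    """Make `attempts` connections and return (index, elapsed) for failed or slow ones."""
    suspicious = []
    for i in range(attempts):
        elapsed = connect_time(host, port)
        if elapsed is None or elapsed > slow_after:
            suspicious.append((i, elapsed))
        time.sleep(pause)
    return suspicious


# Example call (placeholder host, as in the legend):
#   run_probe("XX.XX.XX.XX", 80, attempts=1200)
```

The 2.5 second threshold is chosen because the first SYN retransmission in the capture came about 3 seconds after the original, so a handshake slower than that has probably lost at least one SYN. An empty result after a long run would be consistent with the problem being gone, though it does not prove it.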



  • Left my test running overnight. The issue has not reappeared, so I am going to consider it resolved.

