IPSEC + CARP = missing traffic in April 25, 2011 build on i386 (running as a VM)
-
While working with a developer build of beta-2 (built yesterday, April 25, 2011), I found that outbound IPsec traffic appears to be going missing. I am using CARP for several interfaces, including the WAN, and the IPsec tunnels are built on this WAN CARP interface. This worked fine in the official RC1.
With two different endpoints (two different IPsec tunnels), I was unable to get phase 1 to complete (it timed out). In Packet Capture I saw outbound UDP/500 traffic from the WAN CARP address but nothing coming back, and the remote side did not see our traffic at all. At first the other side's vendor thought the ISP was blocking our UDP/500 traffic, but we have more than 100 tunnels working in the same subnet through a different VPN endpoint. While troubleshooting, I changed the WAN CARP IP address to something else and assigned the old WAN CARP IP address to the WAN interface itself. The tunnels to both endpoints then came up.
Below is a snippet of the logs while I was experiencing the error:
Apr 26 10:54:32 racoon: ERROR: phase1 negotiation failed due to send error. 2b157c3497a45a82:0000000000000000
Apr 26 10:54:32 racoon: INFO: begin Identity Protection mode.
Apr 26 10:54:32 racoon: [xxxxtunnelnamexxxx remotehost-ip.33.10 Realtime, remotehost-ip.33.45 FTP)]: INFO: initiate new phase 1 negotiation: CARP-WAN-IP.228.90[500]<=>peer-ip.39.72[500]
Apr 26 10:54:32 racoon: [xxxxtunnelnamexxxx remotehost-ip.33.10 Realtime, remotehost-ip.33.45 FTP)]: INFO: IPsec-SA request for peer-ip.39.72 queued due to no phase1 found.
Apr 26 10:54:27 racoon: ERROR: phase1 negotiation failed due to send error. 5605c438dfa3163b:0000000000000000
Apr 26 10:54:19 racoon: INFO: delete phase 2 handler.
Apr 26 10:54:19 racoon: [Emdeon Memphis remote-host.33.10 Realtime, remote-host.33.45 FTP)]: [peer-ip.39.72] ERROR: phase2 negotiation failed due to time up waiting for phase1 [Remote Side not responding]. ESP peer-ip.39.72[0]->CARP-WAN-IP.228.90[0]
Apr 26 10:53:47 racoon: INFO: begin Identity Protection mode.
Apr 26 10:53:47 racoon: [Emdeon Memphis remote-host.33.10 Realtime, remote-host.33.45 FTP)]: INFO: initiate new phase 1 negotiation: CARP-WAN-IP.228.90[500]<=>peer-ip.39.72[500]
Apr 26 10:53:47 racoon: [Emdeon Memphis remote-host.33.10 Realtime, remote-host.33.45 FTP)]: INFO: IPsec-SA request for peer-ip.39.72 queued due to no phase1 found.
Apr 26 10:52:35 racoon: ERROR: phase1 negotiation failed due to time up. 2884dc1be98f0c19:0000000000000000
Thank you,
KB at PA
-
If you see it in tcpdump, it's on the wire. It may be going out but getting dropped somewhere in between. It may not be specific to UDP/500 but to that CARP VIP. Can you ping them from your WAN IP? If so, can you ping them from your CARP VIP?
Try it from the shell:
ping -S <wan.carp.ip> <their ip>
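`ping -S` tests ICMP sourced from a chosen address. The equivalent for UDP, which is what racoon's IKE traffic uses, is to bind the socket to the source address before sending. The Python below is a minimal sketch of that idea, not a pfSense tool; the function name `send_udp_from` is hypothetical, and it binds an ephemeral source port rather than 500 so that it does not need root.

```python
import socket


def send_udp_from(src_ip: str, dst_ip: str, dst_port: int, payload: bytes) -> tuple[str, int]:
    """Send one UDP datagram using src_ip as the source address.

    Binding to (src_ip, 0) before sending is the UDP analogue of `ping -S`:
    the kernel uses src_ip instead of choosing an address from the routing
    table. Port 0 lets the kernel pick an ephemeral source port, so this
    runs without root (racoon itself sends from port 500).
    Returns the (address, port) the socket was actually bound to.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((src_ip, 0))
        sock.sendto(payload, (dst_ip, dst_port))
        return sock.getsockname()
```

On the firewall you could call it once with the WAN IP and once with the CARP VIP as `src_ip`, then check in a capture (ideally on the remote side) which of the two datagrams actually arrives.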
-
I can ping them from the WAN IP, but I cannot ping them from the CARP address. I have tried changing both IP addresses, and the problem follows CARP, not the IP address (so we know that the IP address is not being blocked).
Something I've noticed that may be related: my system logs show a few errors:
php: : There were error(s) loading the rules: /tmp/rules.debug:159: rule label too long (max 63 chars) /tmp/rules.debug:165: rule label too long (max 63 chars) pfctl: Syntax error in config file: pf rules not loaded - The line in question reads [159]: pass out on $WAN route-to ( em0 wan-gw-ip.228.65 ) proto esp from any to peer-ip.39.72 keep state label "IPsec: peer name host-ip.33.10 Realtim - outbound esp proto"
php: : The command '/sbin/pfctl -o basic -f /tmp/rules.debug' returned exit code '1', the output was '/tmp/rules.debug:159: rule label too long (max 63 chars) /tmp/rules.debug:165: rule label too long (max 63 chars) pfctl: Syntax error in config file: pf rules not loaded'
There are a few similar errors. Any chance this is related?
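pfctl rejects the whole ruleset when any label is longer than 63 characters, as the error above shows. One way to list every offending line before reloading is to scan the generated ruleset. This Python sketch assumes labels appear as `label "..."` with no escaped quotes inside; `find_long_labels` is a hypothetical helper, not part of pfSense.

```python
import re

# Limit reported by pfctl: "rule label too long (max 63 chars)".
PF_MAX_LABEL_LEN = 63

# Matches: label "..." (assumes no escaped double quotes inside the label).
LABEL_RE = re.compile(r'\blabel\s+"([^"]*)"')


def find_long_labels(rules_text: str, max_len: int = PF_MAX_LABEL_LEN) -> list[tuple[int, int, str]]:
    """Return (line_number, length, label) for every label longer than max_len.

    Line numbers are 1-based, like the '/tmp/rules.debug:159:' prefix pfctl prints.
    """
    problems = []
    for lineno, line in enumerate(rules_text.splitlines(), start=1):
        for match in LABEL_RE.finditer(line):
            label = match.group(1)
            if len(label) > max_len:
                problems.append((lineno, len(label), label))
    return problems
```

Fed the contents of `/tmp/rules.debug`, the line numbers it returns should correspond to the ones pfctl complains about (159 and 165 here).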
-
Can you ping anything from the CARP VIP? It may be that your ISP isn't properly routing that CARP VIP back to you.
It could very well be related to the filter reload issue. I thought we had a fix in the code for that rule descr length. Try shortening the description on that IPsec tunnel and it should be ok.
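A fix for the description length would presumably shorten the tunnel description before it goes into the label. Below is a minimal sketch of one such policy, assuming the `IPsec: <description> - <suffix>` layout seen in the failing rule: keep the fixed prefix and suffix, and cut the description to whatever room remains. `ipsec_rule_label` is hypothetical and not taken from the pfSense source.

```python
PF_MAX_LABEL_LEN = 63


def ipsec_rule_label(description: str, suffix: str, max_len: int = PF_MAX_LABEL_LEN) -> str:
    """Build 'IPsec: <description> - <suffix>' no longer than max_len characters.

    Only the description is shortened; the prefix and suffix are kept intact.
    """
    prefix = "IPsec: "
    tail = " - " + suffix
    room = max_len - len(prefix) - len(tail)
    if room < 0:
        raise ValueError("prefix and suffix alone exceed the pf label limit")
    # Drop double quotes (they would end the quoted label early), then truncate.
    desc = description.replace('"', "")[:room].rstrip()
    return prefix + desc + tail
```

Shortening the description by hand in the tunnel settings, as suggested above, has the same effect without any code change.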
-
Thanks for your help. For anyone who runs into something similar: the cause turned out to be a VHID conflict. Unrelated CARP groups using the same VHIDs on the same network segment caused all kinds of problems. Once I changed the VHID to a unique value, the problems went away.
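This explains why only the CARP address was affected: CARP borrows VRRP's virtual MAC range, 00:00:5e:00:01:<vhid>, so two unrelated CARP groups with the same VHID on one segment claim the same MAC and also receive each other's advertisements. Traffic to the VIP can then end up at the wrong host, while the interface's own address keeps working. The Python sketch below derives the virtual MAC and flags VHIDs shared between clusters; the inventory dictionary and cluster names are hypothetical and would have to be filled in by hand (or from captured CARP advertisements).

```python
from collections import defaultdict


def carp_virtual_mac(vhid: int) -> str:
    """CARP uses the VRRP virtual MAC range: 00:00:5e:00:01:<vhid in hex>."""
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be in 1..255")
    return f"00:00:5e:00:01:{vhid:02x}"


def find_vhid_conflicts(clusters: dict[str, set[int]]) -> dict[int, list[str]]:
    """Map each VHID used by more than one cluster to the sorted cluster names."""
    users: dict[int, set[str]] = defaultdict(set)
    for name, vhids in clusters.items():
        for vhid in vhids:
            users[vhid].add(name)
    return {vhid: sorted(names) for vhid, names in sorted(users.items()) if len(names) > 1}
```

For example, `find_vhid_conflicts({"our-pair": {1, 2, 3}, "other-pair": {3, 10}})` reports VHID 3, whose virtual MAC is `00:00:5e:00:01:03`; moving one group to a VHID nobody else uses is the fix described above.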