pfSense 2.1 simple site-to-site VPN: possible bug



  • Howdy folks!

    I set up a site-to-site VPN between two pfSense 2.1 boxes about three days ago. The tunnel worked perfectly until this morning, when I came to work and found the VPN down and unable to reconnect. I checked connectivity between the two routers and everything was OK. I double-checked phase 1 and phase 2, and the only mismatch was the NAT traversal option, which was enabled at one site and disabled at the other. Even after I corrected that setting, the tunnel still would not establish. All I got in the logs was this:

    racoon: []: [{Site2IP}] ERROR: phase2 negotiation failed due to time up waiting for phase1 [Remote Side not responding]. ESP {Site2IP}[0]->{Site1IP}[0]

    (By the way, in debug mode I also got a bunch of "Unknown Dynamic Gateway" error messages.)

    Restarting racoon didn't help.

    However, after I deleted the IPsec phase 1 and phase 2 entries on both sites and configured them again, the tunnel came back up with no errors. Am I doing something wrong?

    Settings:

    Router 01
    WAN: static IP
    LAN: static IP

    Router 02
    WAN: PPPoE (static IP)
    LAN: static IP
    OPT1: DHCP (Site 2 has an ADSL backup line)
    Default gateway: WAN gateway

    Phase 1:
    IPv4
    Interface: WAN
    Remote GW: {Site2IP or Site1IP}
    Authentication method: mutual PSK
    Negotiation mode: main
    My ID: IP address
    Peer ID: IP address
    key: key
    Policy : Default
    Proposal : Default
    Encryption: 3DES
    HASH: SHA1
    Lifetime: 28800
    NAT-T : Disable
    DPD: Enable 10,5

    Phase 2:

    Tunnel IPv4
    Local Network: Type: LAN subnet
    Nat/Binat: None
    Remote Network: {Remote network and Mask}
    Protocol: ESP
    Encryption Algorithms: AES
    Hash: MD5
    PFS Key Group: OFF
    Lifetime 3600
    Automatically ping host (APH): blank

    An IPsec firewall rule has been created (allow any on IPsec).
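    Since the one mismatch found here was NAT-T, a side-by-side check of the phase 1 values can catch this kind of drift. The Python below is a minimal sketch, assuming you copy the settings from each router's VPN > IPsec page into dictionaries by hand; the field names, the choice of which fields must match, and which router had NAT-T enabled are all assumptions, not pfSense's own configuration keys.

```python
from typing import Any

# Assumption: the phase 1 parameters this sketch treats as "must be identical"
# on both routers. Key names are illustrative, not pfSense config.xml tags.
FIELDS_TO_MATCH = ("negotiation_mode", "encryption", "hash", "lifetime", "nat_t")


def phase1_mismatches(site_a: dict[str, Any], site_b: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {field: (value_a, value_b)} for each compared field that differs."""
    return {
        field: (site_a.get(field), site_b.get(field))
        for field in FIELDS_TO_MATCH
        if site_a.get(field) != site_b.get(field)
    }


# Values transcribed by hand; which side had NAT-T enabled is a hypothetical guess.
router01 = {"negotiation_mode": "main", "encryption": "3DES", "hash": "SHA1",
            "lifetime": 28800, "nat_t": "enable"}
router02 = {"negotiation_mode": "main", "encryption": "3DES", "hash": "SHA1",
            "lifetime": 28800, "nat_t": "disable"}

print(phase1_mismatches(router01, router02))  # {'nat_t': ('enable', 'disable')}
```

    An empty result only means the compared fields agree; it says nothing about fields left out of the list.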

    I really want this to be stable  :(


  • Rebel Alliance Developer Netgate

    Were you seeing any messages in the system logs on the firewall tab about blocked traffic on either side? Maybe ESP traffic getting dropped?

    OpenVPN copes much better with dynamic WANs, so you might consider switching to that. Even though your PPPoE line has a static IP, it's still dynamic to pfSense because it's PPPoE.



  • @jimp:

    OpenVPN copes much better with dynamic WANs, so you might consider switching to that.

    OK, I've just done that. Hope it stays up  :-X



  • Hello,

    Just wanted to let you know that we have the same problem with 2.1. Here is our situation (sorry for my bad English):

    We have a main firewall running 2.0.1 and two branch offices (one on 2.0.1, one on 2.1). Every night at 0:00 the dynamic WAN connections reset. The first branch office (2.0.1) reconnects within a few seconds after the new WAN connection is established. The second office (2.1) does not, so every morning an employee reboots the firewall there.
    Last night I watched the firewall to see what happens.
    This is what happened at headquarters:

    Nov 5 00:03:42	racoon: [Aussenlager02 Branch Office]: [177.144.168.227] ERROR: phase2 negotiation failed due to time up waiting for phase1 [Remote Side not responding]. ESP 177.144.168.227[0]->86.129.941.240[0]
    Nov 5 00:03:11	racoon: INFO: begin Aggressive mode.
    Nov 5 00:03:11	racoon: [Aussenlager02 Branch Office]: INFO: initiate new phase 1 negotiation: 86.129.941.240[500]<=>177.144.168.227[500]
    Nov 5 00:03:11	racoon: [Aussenlager02 Branch Office]: INFO: IPsec-SA request for 177.144.168.227 queued due to no phase1 found.
    Nov 5 00:03:01	racoon: ERROR: phase1 negotiation failed due to time up. 48927a5202a3e806:0000000000000000
    Nov 5 00:02:43	racoon: INFO: delete phase 2 handler.
    Nov 5 00:02:43	racoon: [Aussenlager02 Branch Office]: [177.144.168.227] ERROR: phase2 negotiation failed due to time up waiting for phase1 [Remote Side not responding]. ESP 177.144.168.227[0]->86.129.941.240[0]
    Nov 5 00:02:11	racoon: INFO: begin Aggressive mode.
    Nov 5 00:02:11	racoon: [Aussenlager02 Branch Office]: INFO: initiate new phase 1 negotiation: 86.129.941.240[500]<=>177.144.168.227[500]
    Nov 5 00:02:11	racoon: [Aussenlager02 Branch Office]: INFO: IPsec-SA request for 177.144.168.227 queued due to no phase1 found.
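    To see how often, and for which peer, racoon gives up waiting for phase 1, the log can be filtered for that one error. A minimal Python sketch, assuming the racoon message format shown in these logs (other racoon versions may word it differently):

```python
import re
from collections import Counter

# Matches racoon's "waiting for phase1" error as worded in the logs above;
# the peer address is the bracketed value right before "ERROR:".
PHASE1_TIMEOUT = re.compile(
    r"\[(?P<peer>[^\]]+)\] ERROR: phase2 negotiation failed due to time up waiting for phase1"
)


def phase1_timeouts(log_lines):
    """Count 'time up waiting for phase1' failures per peer address."""
    counts = Counter()
    for line in log_lines:
        match = PHASE1_TIMEOUT.search(line)
        if match:
            counts[match.group("peer")] += 1
    return counts


sample = [
    "Nov 5 00:03:42\tracoon: [Aussenlager02 Branch Office]: [177.144.168.227] ERROR: phase2 negotiation failed due to time up waiting for phase1 [Remote Side not responding].",
    "Nov 5 00:03:11\tracoon: INFO: begin Aggressive mode.",
    "Nov 5 00:02:43\tracoon: [Aussenlager02 Branch Office]: [177.144.168.227] ERROR: phase2 negotiation failed due to time up waiting for phase1 [Remote Side not responding].",
]
print(phase1_timeouts(sample))  # Counter({'177.144.168.227': 2})
```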
    

    And this is what happened at the 2.1 branch office (that's right, nothing):

    Nov 5 00:00:46	racoon: [Hauptstelle Headquarter]: INFO: ISAKMP-SA deleted 88.64.116.125[500]-86.129.941.240[500] spi:8ad16106915e6828:586cf85d9885594c
    Nov 5 00:00:46	racoon: INFO: purged ISAKMP-SA spi=8ad16106915e6828:586cf85d9885594c.
    Nov 5 00:00:46	racoon: INFO: purged IPsec-SA spi=265111803.
    Nov 5 00:00:46	racoon: INFO: purged IPsec-SA spi=192301555.
    Nov 5 00:00:46	racoon: INFO: purging ISAKMP-SA spi=8ad16106915e6828:586cf85d9885594c.
    Nov 5 00:00:46	racoon: [Hauptstelle Headquarter]: [86.129.941.240] INFO: DPD: remote (ISAKMP-SA spi=8ad16106915e6828:586cf85d9885594c) seems to be dead.
    

    I checked the firewall logs twice and there were no entries at all, so nothing was blocked.

    But I also ran a packet capture at the branch office and saw a lot of UDP port 500 requests from HQ, none of which were answered.

    00:25:11.409211 IP 86.129.941.240.500 > 177.144.168.227.500: UDP, length 288
    00:25:21.421175 IP 86.129.941.240.500 > 177.144.168.227.500: UDP, length 288
    00:25:31.441285 IP 86.129.941.240.500 > 177.144.168.227.500: UDP, length 288
    00:25:41.461160 IP 86.129.941.240.500 > 177.144.168.227.500: UDP, length 288
    00:25:51.481516 IP 86.129.941.240.500 > 177.144.168.227.500: UDP, length 288
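    The capture above shows ISAKMP (UDP 500) packets arriving from HQ with nothing going back. A minimal Python sketch that flags one-way UDP 500 flows in a capture, assuming tcpdump's default one-line output formatted like the lines above:

```python
import re
from collections import Counter

# tcpdump's one-line IPv4/UDP summary: "IP <src>.<port> > <dst>.<port>: UDP, ..."
LINE = re.compile(r"IP (?P<src>\S+)\.(?P<sport>\d+) > (?P<dst>\S+)\.(?P<dport>\d+): UDP")


def unanswered_isakmp(lines, port=500):
    """Return {(src, dst): packets} for UDP flows on `port` that never saw a reply."""
    flows = Counter()
    for line in lines:
        m = LINE.search(line)
        if m and port in (int(m["sport"]), int(m["dport"])):
            flows[(m["src"], m["dst"])] += 1
    # A flow is "unanswered" when no packet ever went the opposite direction.
    return {flow: n for flow, n in flows.items() if (flow[1], flow[0]) not in flows}


capture = [
    "00:25:11.409211 IP 86.129.941.240.500 > 177.144.168.227.500: UDP, length 288",
    "00:25:21.421175 IP 86.129.941.240.500 > 177.144.168.227.500: UDP, length 288",
    "00:25:31.441285 IP 86.129.941.240.500 > 177.144.168.227.500: UDP, length 288",
    "00:25:41.461160 IP 86.129.941.240.500 > 177.144.168.227.500: UDP, length 288",
    "00:25:51.481516 IP 86.129.941.240.500 > 177.144.168.227.500: UDP, length 288",
]
print(unanswered_isakmp(capture))  # {('86.129.941.240', '177.144.168.227'): 5}
```

    This only reads the summary text, so it cannot tell whether a reply was sent out a different interface; it just confirms what the capture shows on the interface it was taken on.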
    

    Second, I found that as soon as the midnight reconnect finishes, I can still reach the 2.1 box via the WebGUI but not via SSH (the 2.0.1 office still accepts SSH sessions!). I tried everything. After I reboot the firewall, SSH works again.

    I also found that there is no need to reboot the firewall to get IPsec working again. Restarting racoon is enough: the VPN comes back instantly and stays up for the next 24 hours.

    I know the right solution would be to switch to site-to-site OpenVPN, and we certainly will. But first I would like to get this fixed, because in my opinion it's a bug. It looks to me like racoon in 2.1 is still listening on the IP address it had before the reconnect.

    Thank you very much!



  • Same problem here.
    I have two pfSense 2.1 routers. Side A is PPPoE (ADSL) and side B is DHCP (cable modem).
    When I try to connect from A to B, I get the "ERROR: phase2 negotiation failed due to time up waiting for phase1 [Remote Side not responding]" message.
    I think restarting the firewall resolves the problem, because when I turn "System: Advanced: Firewall and NAT > Disable Auto-added VPN rules" on and then off again, the tunnel starts working.
    Disabling and re-enabling the whole firewall also helps.

    What could it be?


  • Rebel Alliance Developer Netgate

    I suspect it is something to do with the automatic VPN rules, which is why I was asking earlier about dropped ESP traffic.

    Can you get a copy of /tmp/rules.debug when it's broken and then a new copy when it works? It would help to compare them. Something must not be getting updated in the auto VPN rules when an IP changes.
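    One way to compare the two copies is to diff them and keep only the auto-added IPsec rules, which carry a label "IPsec: ..." tag. A minimal Python sketch using the standard difflib module; the sample rules and addresses are made up (documentation ranges), and in practice you would read your two saved copies of /tmp/rules.debug with something like Path(...).read_text():

```python
import difflib


def ipsec_rule_changes(old_text: str, new_text: str) -> list[str]:
    """Return added/removed diff lines that belong to auto-added IPsec rules."""
    diff = difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile="rules.debug.old", tofile="rules.debug", lineterm="",
    )
    return [
        line for line in diff
        if line.startswith(("+", "-"))
        and not line.startswith(("+++", "---"))  # skip the file header lines
        and 'label "IPsec:' in line
    ]


# Made-up example: the peer address in the IPsec rule changed between copies.
old = ('pass in on $LAN from any to any keep state\n'
       'pass out on $WAN proto udp from any to 198.51.100.20 port = 500 keep state '
       'label "IPsec: 198.51.100.20 - outbound isakmp"')
new = ('pass in on $LAN from any to any keep state\n'
       'pass out on $WAN proto udp from any to 203.0.113.30 port = 500 keep state '
       'label "IPsec: 203.0.113.30 - outbound isakmp"')

for line in ipsec_rule_changes(old, new):
    print(line)
```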



    I have compared rules.debug with rules.debug.old. The only difference is in this part of the files:

    pass out on $WAN  route-to ( dc0 xx.xxx.xxx.x )  proto udp from any to xx.xx3.26.170 port = 500 keep state label "IPsec: xx.xx3.26.170 - outbound isakmp"
    pass in on $WAN  reply-to ( dc0 xx.xxx.xxx.x )  proto udp from xx.xx3.26.170 to any port = 500 keep state label "IPsec: xx.xx3.26.170 - inbound isakmp"
    pass out on $WAN  route-to ( dc0 xx.xxx.xxx.x )  proto esp from any to xx.xx3.26.170 keep state label "IPsec: xx.xx3.26.170 - outbound esp proto"
    pass in on $WAN  reply-to ( dc0 xx.xxx.xxx.x )  proto esp from xx.xx3.26.170 to any keep state label "IPsec: xx.xx3.26.170 - inbound esp proto"
    pass out on $WAN  route-to ( dc0 xx.xxx.xxx.x )  proto udp from any to xx.xx.107.215 port = 500 keep state label "IPsec: xx.xx.107.215 - outbound isakmp"
    pass in on $WAN  reply-to ( dc0 xx.xxx.xxx.x )  proto udp from xx.xx.107.215 to any port = 500 keep state label "IPsec: xx.xx.107.215 - inbound isakmp"
    pass out on $WAN  route-to ( dc0 xx.xxx.xxx.x )  proto esp from any to xx.xx.107.215 keep state label "IPsec: xx.xx.107.215 - outbound esp proto"
    pass in on $WAN  reply-to ( dc0 xx.xxx.xxx.x )  proto esp from xx.xx.107.215 to any keep state label "IPsec: xx.xx.107.215 - inbound esp proto"

    Both files are dated this morning, so I guess that is when the firewall was reset. If I'm wrong, I can get a copy tomorrow as you suggested.
    These are two IPsec tunnels. Maybe these rules are generated from the peer gateway's dynamic hostname when the firewall starts and are not updated when the tunnel later tries to connect?
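    To test the theory that the rules are built from an address resolved once at startup, one could compare the peer addresses in the IPsec rule labels against the addresses the peers have now. A minimal Python sketch; the current addresses are passed in by hand (resolving each phase 1 remote gateway hostname, e.g. with socket.gethostbyname, is left as an assumption), and the sample addresses are made up:

```python
import re

# Peer address as it appears in the auto-added rule labels,
# e.g. label "IPsec: 198.51.100.20 - outbound isakmp".
LABEL = re.compile(r'label "IPsec: (?P<peer>\S+) - ')


def compare_peers(rules_text: str, current_peers: set[str]) -> tuple[set[str], set[str]]:
    """Return (stale, missing) peer addresses for the auto-added IPsec rules."""
    in_rules = {m["peer"] for m in LABEL.finditer(rules_text)}
    stale = in_rules - current_peers    # rules still point at an address no peer uses now
    missing = current_peers - in_rules  # a current peer address has no auto-added rules
    return stale, missing


rules = (
    'pass out on $WAN proto udp from any to 198.51.100.20 port = 500 keep state '
    'label "IPsec: 198.51.100.20 - outbound isakmp"\n'
    'pass in on $WAN proto esp from 198.51.100.20 to any keep state '
    'label "IPsec: 198.51.100.20 - inbound esp proto"\n'
)
# Hard-coded "current" address so the sketch runs offline.
print(compare_peers(rules, {"203.0.113.30"}))  # ({'198.51.100.20'}, {'203.0.113.30'})
```

    A non-empty stale set paired with a missing address would fit the theory that the rules were not regenerated after the peer's IP changed.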



    I have opened a bug report on Redmine for this:

    https://redmine.pfsense.org/issues/3321



  • This is broken again in 2.1.2



    Is this bug fixed as of 2.1.4? I have one IPsec tunnel that always seems to go down after a while, and nothing short of fully rebooting the router gets it running again. This is an APU2 router in our office running 2.1.4, tunneling into our DC, which also runs pfSense. The DC router has 5 IPsec tunnels set up, all configured the same way; only this one seems problematic.

