Netgate Discussion Forum

    Firewall stops routing completely...almost.

    Routing and Multi WAN
    • b_levitt

      We have now had several morning outages in which we lose all internet access. We can still connect to the firewall's admin interface and everything looks OK, including the gateway status, which shows online. If we wait, access usually comes back within 30 minutes to an hour.

      The firewall is running version 2.4.4-RELEASE-p1. It is connected to a Comcast cable modem with a static pass-through IP. Other devices connected directly to the cable modem are able to reach the internet.

      Most of the time, nothing in the logs indicates a status change of any kind.

      However, I have posted this in the Multi WAN section because a third NIC is connected to a Cradlepoint router that provides internet via 4G and is configured as a gateway. We had a gateway group containing this gateway for failover. Even though its failover tier was set to "never" (as opposed to tier 2), and the default gateway was set to the primary gateway rather than the gateway group, the Cradlepoint suddenly handed the pfSense box an IP yesterday and then went offline. That event did coincide with an outage. In all other cases, including the most recent one, there has been no such event.

      The only clue I have is that an outbound IPsec connection attempt IS leaving the firewall, because I can see it arrive at the other end of the tunnel:

      Other end:

      May 1 12:17:55	charon		14[IKE] <con2|643> giving up after 5 retransmits
      May 1 12:16:49	charon		14[IKE] <648> IKE_SA (unnamed)[648] state change: CONNECTING => DESTROYING
      May 1 12:16:49	charon		14[JOB] <648> deleting half open IKE_SA with x.x.x.36 after timeout
      May 1 12:16:40	charon		04[NET] sending packet: from y.y.y.130[500] to x.x.x.36[500]
      May 1 12:16:40	charon		14[NET] <con2|643> sending packet: from y.y.y.130[500] to x.x.x.36[500] (336 bytes)
      May 1 12:16:40	charon		14[IKE] <con2|643> retransmit 5 of request with message ID 0
      May 1 12:16:25	charon		14[CFG] ignoring acquire, connection attempt pending
      May 1 12:16:25	charon		10[KNL] creating acquire job for policy y.y.y.130/32|/0 === x.x.x.36/32|/0 with reqid {2}
      May 1 12:16:19	charon		04[NET] sending packet: from y.y.y.130[500] to x.x.x.36[500]
      May 1 12:16:19	charon		10[NET] <648> sending packet: from y.y.y.130[500] to x.x.x.36[500] (336 bytes)
      May 1 12:16:19	charon		10[ENC] <648> generating IKE_SA_INIT response 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(FRAG_SUP) N(HASH_ALG) N(MULT_AUTH) ]
      <<<proposal and hash negotiations>>
      May 1 12:16:19	charon		10[CFG] <648> selecting proposal:
      May 1 12:16:19	charon		10[IKE] <648> IKE_SA (unnamed)[648] state change: CREATED => CONNECTING
      May 1 12:16:19	charon		10[IKE] <648> x.x.x.36 is initiating an IKE_SA
      May 1 12:16:19	charon		10[CFG] <648> found matching ike config: y.y.y.130...XXX.dynu.net with prio 3100
      May 1 12:16:19	charon		10[CFG] <648> candidate: y.y.y.130...XXX.dynu.net, prio 3100
      May 1 12:16:19	charon		10[CFG] <648> candidate: %any...%any, prio 24
      May 1 12:16:19	charon		10[CFG] <648> looking for an ike config for y.y.y.130...x.x.x.36
      May 1 12:16:19	charon		10[ENC] <648> parsed IKE_SA_INIT request 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(FRAG_SUP) N(HASH_ALG) N(REDIR_SUP) ]
      May 1 12:16:19	charon		10[NET] <648> received packet: from x.x.x.36[500] to y.y.y.130[500] (336 bytes)
      May 1 12:16:19	charon		03[NET] waiting for data on sockets
      May 1 12:16:19	charon		03[NET] received packet: from x.x.x.36[500] to y.y.y.130[500]
      

      "Down" end:

      May 1 08:17:34	charon		12[IKE] <con1000|11> establishing IKE_SA failed, peer not responding
      May 1 08:17:34	charon		12[IKE] <con1000|11> giving up after 5 retransmits
      May 1 08:17:12	charon		12[CFG] ignoring acquire, connection attempt pending
      May 1 08:17:12	charon		02[KNL] creating acquire job for policy x.x.x.36/32|/0 === y.y.y.130/32|/0 with reqid {1}
      May 1 08:16:47	charon		02[CFG] ignoring acquire, connection attempt pending
      May 1 08:16:47	charon		12[KNL] creating acquire job for policy x.x.x.36/32|/0 === y.y.y.130/32|/0 with reqid {1}
      May 1 08:16:20	charon		12[CFG] ignoring acquire, connection attempt pending
      May 1 08:16:20	charon		02[KNL] creating acquire job for policy x.x.x.36/32|/0 === y.y.y.130/32|/0 with reqid {1}
      May 1 08:16:19	charon		02[NET] <con1000|11> sending packet: from x.x.x.36[500] to y.y.y.130[500] (336 bytes)
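
The asymmetry between the two excerpts above (the "down" end logs packets sent but none received, while the other end logs both) can be checked mechanically. The following is a minimal sketch and not part of the original troubleshooting: the names `summarize_ike_traffic` and `one_way_peers` are hypothetical, and the regular expression assumes the charon `[NET]` line layout seen in the logs above.

```python
import re
from collections import Counter

# Assumed layout, taken from the log lines above, e.g.:
#   "02[NET] <con1000|11> sending packet: from x.x.x.36[500] to y.y.y.130[500] (336 bytes)"
NET_RE = re.compile(
    r"\[NET\].*?(?P<verb>sending|received) packet: "
    r"from (?P<src>\S+?)\[\d+\] to (?P<dst>\S+?)\[\d+\]"
)


def summarize_ike_traffic(lines):
    """Count charon [NET] log lines per (verb, src, dst).

    These are log-line counts, not unique packets: charon may log the
    same packet from more than one thread.
    """
    counts = Counter()
    for line in lines:
        match = NET_RE.search(line)
        if match:
            counts[(match["verb"], match["src"], match["dst"])] += 1
    return counts


def one_way_peers(counts):
    """Return sorted (local, peer) pairs that sent packets but logged none received."""
    sent = {(src, dst) for (verb, src, dst) in counts if verb == "sending"}
    # A received packet goes from peer to local, so swap to (local, peer).
    received = {(dst, src) for (verb, src, dst) in counts if verb == "received"}
    return sorted(sent - received)
```

Fed the "down" end excerpt, `one_way_peers` would report the pair `("x.x.x.36", "y.y.y.130")`, meaning requests left the firewall but no replies were logged. Fed the other end's excerpt, it would report nothing, since that side logs traffic in both directions.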
      

      The biggest problem I have here is the lack of information. Apart from the one time the secondary gateway came back online, nothing unusual shows up in the system log under general, gateway, or routing. The gateway shows online according to its monitor (which is the next hop). The firewall appears to be using the correct IP based on the IPsec log. I didn't get a chance to run a basic ping or tracert from the server before the connection came back up, but I will next time. Is there something else I can look for, or some service I can reset, to narrow down the issue? Rebooting always fixes it.
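
The ping checks planned for the next outage could be scripted so that nothing is missed while the window is open. This is a minimal sketch under stated assumptions, not a tested procedure for this setup: the function names are hypothetical, the hop names and addresses are whatever the operator supplies, and it relies only on the system `ping` command and its exit code, which is a rough signal (some platforms return success for replies such as "destination unreachable").

```python
import platform
import subprocess


def ping_command(host, count=3):
    """Build a ping command line; Windows uses -n for the count, others use -c."""
    flag = "-n" if platform.system() == "Windows" else "-c"
    return ["ping", flag, str(count), host]


def probe(targets, run=subprocess.run):
    """Ping each target and return {name: True if ping exited with 0}.

    `targets` maps a label (e.g. "next_hop") to an address. `run` is
    injectable so the logic can be exercised without sending packets.
    """
    results = {}
    for name, host in targets.items():
        proc = run(
            ping_command(host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        results[name] = proc.returncode == 0
    return results


def first_failure(results, order):
    """Return the first label in `order` that did not reply, or None."""
    for name in order:
        if not results.get(name, False):
            return name
    return None
```

For example, calling `probe({"firewall": "192.0.2.1", "next_hop": "198.51.100.1", "remote": "203.0.113.10"})` and passing the result to `first_failure` with the labels ordered from nearest to farthest would show where along the path replies stop. The addresses here are documentation placeholders, not values from this network.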

      • b_levitt

        We now think it was the Comcast router/modem that stopped routing. As soon as we disconnected another device that was connected to it, the problem went away.

        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.