Firewall stops routing completely...almost.



  • We have now had several events in the morning where we have no internet access. We can connect to the admin interface of the firewall and everything looks OK, including the gateway status, which shows online. If we wait, it usually comes back online within 30 minutes to an hour.

    The firewall is running version 2.4.4-RELEASE-p1. It is connected to a Comcast cable modem with a static pass-through IP. We have other devices connected directly to the cable modem that are able to connect to the internet.

    Most of the time, nothing in the logs indicates a status change of any kind.

    However, I have placed this in the multi-WAN section because there is a third NIC connected to a Cradlepoint router that provides internet via 4G and is configured as a gateway. We had a gateway group with this gateway for failover. Despite the failover being set to "never" (as opposed to tier 2), and the default gateway being set to the primary gateway and not the gateway group, the Cradlepoint suddenly gave the pfSense box an IP yesterday and then went offline. This event did correspond to an outage. In all other cases, including the most recent one, there has been no such event.

    The only tidbit of info I have is that an outbound IPsec connection request IS leaving the firewall, because I can see it arrive at the other end of the tunnel. The other end answers, but the "down" end never sees a response:

    Other end:

    May 1 12:17:55	charon		14[IKE] <con2|643> giving up after 5 retransmits
    May 1 12:16:49	charon		14[IKE] <648> IKE_SA (unnamed)[648] state change: CONNECTING => DESTROYING
    May 1 12:16:49	charon		14[JOB] <648> deleting half open IKE_SA with x.x.x.36 after timeout
    May 1 12:16:40	charon		04[NET] sending packet: from y.y.y.130[500] to x.x.x.36[500]
    May 1 12:16:40	charon		14[NET] <con2|643> sending packet: from y.y.y.130[500] to x.x.x.36[500] (336 bytes)
    May 1 12:16:40	charon		14[IKE] <con2|643> retransmit 5 of request with message ID 0
    May 1 12:16:25	charon		14[CFG] ignoring acquire, connection attempt pending
    May 1 12:16:25	charon		10[KNL] creating acquire job for policy y.y.y.130/32|/0 === x.x.x.36/32|/0 with reqid {2}
    May 1 12:16:19	charon		04[NET] sending packet: from y.y.y.130[500] to x.x.x.36[500]
    May 1 12:16:19	charon		10[NET] <648> sending packet: from y.y.y.130[500] to x.x.x.36[500] (336 bytes)
    May 1 12:16:19	charon		10[ENC] <648> generating IKE_SA_INIT response 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(FRAG_SUP) N(HASH_ALG) N(MULT_AUTH) ]
    <<<proposal and hash negotiations>>
    May 1 12:16:19	charon		10[CFG] <648> selecting proposal:
    May 1 12:16:19	charon		10[IKE] <648> IKE_SA (unnamed)[648] state change: CREATED => CONNECTING
    May 1 12:16:19	charon		10[IKE] <648> x.x.x.36 is initiating an IKE_SA
    May 1 12:16:19	charon		10[CFG] <648> found matching ike config: y.y.y.130...XXX.dynu.net with prio 3100
    May 1 12:16:19	charon		10[CFG] <648> candidate: y.y.y.130...XXX.dynu.net, prio 3100
    May 1 12:16:19	charon		10[CFG] <648> candidate: %any...%any, prio 24
    May 1 12:16:19	charon		10[CFG] <648> looking for an ike config for y.y.y.130...x.x.x.36
    May 1 12:16:19	charon		10[ENC] <648> parsed IKE_SA_INIT request 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(FRAG_SUP) N(HASH_ALG) N(REDIR_SUP) ]
    May 1 12:16:19	charon		10[NET] <648> received packet: from x.x.x.36[500] to y.y.y.130[500] (336 bytes)
    May 1 12:16:19	charon		03[NET] waiting for data on sockets
    May 1 12:16:19	charon		03[NET] received packet: from x.x.x.36[500] to y.y.y.130[500]
    

    "Down" end:

    May 1 08:17:34	charon		12[IKE] <con1000|11> establishing IKE_SA failed, peer not responding
    May 1 08:17:34	charon		12[IKE] <con1000|11> giving up after 5 retransmits
    May 1 08:17:12	charon		12[CFG] ignoring acquire, connection attempt pending
    May 1 08:17:12	charon		02[KNL] creating acquire job for policy x.x.x.36/32|/0 === y.y.y.130/32|/0 with reqid {1}
    May 1 08:16:47	charon		02[CFG] ignoring acquire, connection attempt pending
    May 1 08:16:47	charon		12[KNL] creating acquire job for policy x.x.x.36/32|/0 === y.y.y.130/32|/0 with reqid {1}
    May 1 08:16:20	charon		12[CFG] ignoring acquire, connection attempt pending
    May 1 08:16:20	charon		02[KNL] creating acquire job for policy x.x.x.36/32|/0 === y.y.y.130/32|/0 with reqid {1}
    May 1 08:16:19	charon		02[NET] <con1000|11> sending packet: from x.x.x.36[500] to y.y.y.130[500] (336 bytes)
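
    The two excerpts are easier to compare on a single timeline. The following is a minimal sketch of one way to do that, not a tested tool. It assumes the four-hour gap between the two logs is a clock or time zone difference (the minutes and seconds match), that lines follow the layout pasted above, and it uses a placeholder year because these syslog lines carry none. The names `parse_line` and `merge` are made up for this example.

```python
import re
from datetime import datetime, timedelta

# Matches lines shaped like the excerpts above, e.g.
#   "May 1 12:16:19<TAB>charon<TAB><TAB>10[NET] <648> received packet: ..."
LINE_RE = re.compile(
    r"^\s*(?P<ts>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+charon\s+"
    r"(?P<thread>\d+)\[(?P<subsys>[A-Z]+)\]\s+(?P<msg>.*)$"
)


def parse_line(line, year=2000):
    """Return (timestamp, subsystem, message), or None if the line does not match.

    `year` is an arbitrary placeholder: the log lines do not include one.
    """
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
    return ts, m["subsys"], m["msg"]


def merge(down_lines, other_lines, other_offset_hours):
    """Merge both logs oldest-first, expressed on the "down" end's clock.

    `other_offset_hours` is how far the other end's clock runs ahead
    (assumed to be 4 for the excerpts in this thread).
    """
    events = []
    for line in down_lines:
        parsed = parse_line(line)
        if parsed:
            events.append((parsed[0], "down", parsed[1], parsed[2]))
    shift = timedelta(hours=other_offset_hours)
    for line in other_lines:
        parsed = parse_line(line)
        if parsed:
            events.append((parsed[0] - shift, "other", parsed[1], parsed[2]))
    # sorted() is stable, so lines with equal timestamps keep their input order
    return sorted(events, key=lambda e: e[0])


# A few lines copied from the two excerpts above
DOWN = [
    "May 1 08:16:19\tcharon\t\t02[NET] <con1000|11> sending packet: from x.x.x.36[500] to y.y.y.130[500] (336 bytes)",
    "May 1 08:17:34\tcharon\t\t12[IKE] <con1000|11> giving up after 5 retransmits",
]
OTHER = [
    "May 1 12:16:19\tcharon\t\t10[NET] <648> received packet: from x.x.x.36[500] to y.y.y.130[500] (336 bytes)",
    "May 1 12:16:19\tcharon\t\t10[NET] <648> sending packet: from y.y.y.130[500] to x.x.x.36[500] (336 bytes)",
]

for ts, side, subsys, msg in merge(DOWN, OTHER, other_offset_hours=4):
    print(ts.strftime("%H:%M:%S"), f"{side:5}", subsys, msg)
```

    Read this way, the request leaves the "down" end, arrives at the other end, and is answered in the same second, yet the "down" end still gives up. That would fit replies being lost on the way back in, though the logs alone do not show where.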
    

    The biggest problem I have here is the lack of information. Other than the one time with the secondary gateway coming back online, all other times show nothing unusual in the system log under general, gateway, or routing. The gateway shows online according to its monitor (which is the next hop). It appears to be using the correct IP based on the IPsec log. I didn't get a chance to do a basic ping or tracert from the server before it came back up, but I will next time. Is there something else I can look for, or some service to reset, to narrow down the issue? Rebooting always fixes it.
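
    For the "ping next time" step, a small script left running on a LAN host can record where connectivity stops when an outage hits. This is a rough sketch under several assumptions: the function names are made up, the host has a `ping` command that accepts `-c 1` (Linux, macOS, and FreeBSD do; Windows uses `-n`), and the three addresses you pass in are your own firewall LAN address, the WAN next hop, and some internet host.

```python
import subprocess
import time
from datetime import datetime


def ping(host, timeout_s=5):
    """Return True if a single ICMP echo to `host` succeeds.

    Relies on the system `ping` binary; `-c 1` is assumed to mean
    "send one packet", which is not true on Windows.
    """
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def diagnose(lan_ok, gateway_ok, internet_ok):
    """Turn three probe results into a rough statement of where traffic stops."""
    if not lan_ok:
        return "cannot reach firewall LAN address: local network or firewall problem"
    if not gateway_ok:
        return "firewall reachable, next hop not: WAN link or modem problem"
    if not internet_ok:
        return "next hop reachable, internet not: problem beyond the next hop"
    return "all probes succeeded"


def watch(lan_ip, gateway_ip, internet_ip, interval_s=60, rounds=1, probe=ping):
    """Probe the three targets `rounds` times and return timestamped verdicts."""
    history = []
    for i in range(rounds):
        verdict = diagnose(probe(lan_ip), probe(gateway_ip), probe(internet_ip))
        stamp = datetime.now().isoformat(timespec="seconds")
        print(f"{stamp} {verdict}")
        history.append((stamp, verdict))
        if i < rounds - 1:
            time.sleep(interval_s)
    return history
```

    Calling something like `watch("192.0.2.1", "198.51.100.1", "203.0.113.1", interval_s=60, rounds=120)` (placeholder documentation addresses, substitute real ones) would log one verdict a minute for two hours. A run of "next hop reachable, internet not" during an outage would match the symptom described here, where the gateway monitor stays online while nothing routes.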



  • We are now thinking it was the Comcast router/modem that stopped routing. As soon as we disconnected another device that was connected to it, the problem went away.

