Firewall stops routing completely...almost.
-
Several times now, always in the morning, we have lost internet access. We can still reach the firewall's admin interface and everything looks OK, including the gateway status, which shows online. If we wait, it usually comes back online within 30 minutes to an hour.
The firewall is running 2.4.4-RELEASE-p1. It is connected to a Comcast cable modem with a static pass-through IP. Other devices connected directly to the cable modem can still reach the internet during these outages.
Most of the time, nothing in the logs indicates a status change of any kind.
However, I have posted this in the Multi-WAN section because a third NIC is connected to a Cradlepoint router that provides internet via 4G and is configured as a gateway. We had a gateway group containing this gateway for failover. Even though failover for it was set to "Never" (as opposed to Tier 2), and the default gateway was set to the primary gateway rather than the gateway group, the Cradlepoint suddenly handed the pfSense box an IP yesterday and then went offline. That event did coincide with an outage. In every other case, including the most recent one, there has been no such event.
The only clue I have is that outbound IPsec connection attempts ARE leaving the firewall, because I can see them arriving at the other end of the tunnel:
Other end:
May 1 12:17:55 charon 14[IKE] <con2|643> giving up after 5 retransmits
May 1 12:16:49 charon 14[IKE] <648> IKE_SA (unnamed)[648] state change: CONNECTING => DESTROYING
May 1 12:16:49 charon 14[JOB] <648> deleting half open IKE_SA with x.x.x.36 after timeout
May 1 12:16:40 charon 04[NET] sending packet: from y.y.y.130[500] to x.x.x.36[500]
May 1 12:16:40 charon 14[NET] <con2|643> sending packet: from y.y.y.130[500] to x.x.x.36[500] (336 bytes)
May 1 12:16:40 charon 14[IKE] <con2|643> retransmit 5 of request with message ID 0
May 1 12:16:25 charon 14[CFG] ignoring acquire, connection attempt pending
May 1 12:16:25 charon 10[KNL] creating acquire job for policy y.y.y.130/32|/0 === x.x.x.36/32|/0 with reqid {2}
May 1 12:16:19 charon 04[NET] sending packet: from y.y.y.130[500] to x.x.x.36[500]
May 1 12:16:19 charon 10[NET] <648> sending packet: from y.y.y.130[500] to x.x.x.36[500] (336 bytes)
May 1 12:16:19 charon 10[ENC] <648> generating IKE_SA_INIT response 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(FRAG_SUP) N(HASH_ALG) N(MULT_AUTH) ]
<<<proposal and hash negotiations>>
May 1 12:16:19 charon 10[CFG] <648> selecting proposal:
May 1 12:16:19 charon 10[IKE] <648> IKE_SA (unnamed)[648] state change: CREATED => CONNECTING
May 1 12:16:19 charon 10[IKE] <648> x.x.x.36 is initiating an IKE_SA
May 1 12:16:19 charon 10[CFG] <648> found matching ike config: y.y.y.130...XXX.dynu.net with prio 3100
May 1 12:16:19 charon 10[CFG] <648> candidate: y.y.y.130...XXX.dynu.net, prio 3100
May 1 12:16:19 charon 10[CFG] <648> candidate: %any...%any, prio 24
May 1 12:16:19 charon 10[CFG] <648> looking for an ike config for y.y.y.130...x.x.x.36
May 1 12:16:19 charon 10[ENC] <648> parsed IKE_SA_INIT request 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(FRAG_SUP) N(HASH_ALG) N(REDIR_SUP) ]
May 1 12:16:19 charon 10[NET] <648> received packet: from x.x.x.36[500] to y.y.y.130[500] (336 bytes)
May 1 12:16:19 charon 03[NET] waiting for data on sockets
May 1 12:16:19 charon 03[NET] received packet: from x.x.x.36[500] to y.y.y.130[500]
"Down" end:
May 1 08:17:34 charon 12[IKE] <con1000|11> establishing IKE_SA failed, peer not responding
May 1 08:17:34 charon 12[IKE] <con1000|11> giving up after 5 retransmits
May 1 08:17:12 charon 12[CFG] ignoring acquire, connection attempt pending
May 1 08:17:12 charon 02[KNL] creating acquire job for policy x.x.x.36/32|/0 === y.y.y.130/32|/0 with reqid {1}
May 1 08:16:47 charon 02[CFG] ignoring acquire, connection attempt pending
May 1 08:16:47 charon 12[KNL] creating acquire job for policy x.x.x.36/32|/0 === y.y.y.130/32|/0 with reqid {1}
May 1 08:16:20 charon 12[CFG] ignoring acquire, connection attempt pending
May 1 08:16:20 charon 02[KNL] creating acquire job for policy x.x.x.36/32|/0 === y.y.y.130/32|/0 with reqid {1}
May 1 08:16:19 charon 02[NET] <con1000|11> sending packet: from x.x.x.36[500] to y.y.y.130[500] (336 bytes)
The biggest problem here is the lack of information. Apart from the one time the secondary gateway came online, the System, Gateways, and Routing logs show nothing unusual. The gateway shows online according to its monitor (which pings the next hop), and, judging by the IPsec log, the firewall is using the correct WAN IP. I didn't get a chance to run a basic ping or tracert from the server before it came back up, but I will next time. Is there anything else I can look for, or some service I can restart, to narrow down the issue? Rebooting always fixes it.
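Since the outages are intermittent, a timestamped record of which hop stops answering first could help narrow things down. Below is a minimal Python sketch, meant to run on a LAN machine, that pings the firewall's LAN address, the WAN next hop, and an external host every 30 seconds and logs the first one that fails. The three addresses in `TARGETS` are hypothetical placeholders, and the script assumes the system `ping` command is on the PATH (it passes `-n 1` on Windows and `-c 1` elsewhere).

```python
import datetime
import platform
import subprocess
import time

# Hypothetical placeholders: replace with your firewall's LAN IP, the WAN
# next hop shown by the gateway monitor, and an external host.
TARGETS = [
    ("firewall LAN", "192.168.1.1"),
    ("WAN next hop", "203.0.113.1"),
    ("internet", "8.8.8.8"),
]


def ping(host, timeout_s=5.0):
    """Send one ICMP echo via the system ping; True if it exits with status 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def first_failure(results):
    """Return the label of the first hop that did not answer, or None if all did."""
    for label, ok in results:
        if not ok:
            return label
    return None


def main(interval_s=30.0):
    # Probe the hops in order, innermost first, and log one line per round.
    while True:
        results = [(label, ping(host)) for label, host in TARGETS]
        failed = first_failure(results)
        stamp = datetime.datetime.now().isoformat(timespec="seconds")
        status = "all reachable" if failed is None else f"first failure at: {failed}"
        print(f"{stamp} {status}", flush=True)
        time.sleep(interval_s)


if __name__ == "__main__":
    main()
```

If the firewall and the WAN next hop keep answering while the external host stops, the break is probably beyond the next hop rather than on the LAN side. Ping exit codes are not perfectly consistent across platforms (Windows can return 0 for some "destination unreachable" replies), so treat the result as a rough signal alongside tracert.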
-
We now think it was the Comcast router/modem that stopped routing: as soon as we disconnected another device that was connected to it, the problem went away.