System crash on gateway alarm?



  • Hi folks,

    My pfSense system (RCC-VE 2440) has gone offline three times in as many days. It's unpingable on the LAN interface, no services are reachable, and the serial console is blank and unresponsive. I have to power-cycle it to get it back online and everything is fine after that (until the next crash). Here are the last few lines in the system log before the reboot:

    Jan 18 23:58:43 cerberus rc.gateway_alarm[71831]: >>> Gateway alarm: WAN_DHCP6 (Addr:REDACTED Alarm:1 RTT:12894ms RTTsd:3415ms Loss:21%)
    Jan 18 23:58:43 cerberus check_reload_status: updating dyndns WAN_DHCP6
    Jan 18 23:58:43 cerberus check_reload_status: Restarting ipsec tunnels
    Jan 18 23:58:43 cerberus check_reload_status: Restarting OpenVPN tunnels/interfaces
    Jan 18 23:58:43 cerberus check_reload_status: Reloading filter
    Jan 18 23:58:44 cerberus rc.gateway_alarm[72628]: >>> Gateway alarm: WAN_DHCP (Addr:REDACTED Alarm:1 RTT:10710ms RTTsd:3043ms Loss:22%)
    Jan 18 23:58:44 cerberus check_reload_status: updating dyndns WAN_DHCP
    Jan 18 23:58:44 cerberus check_reload_status: Restarting ipsec tunnels
    Jan 18 23:58:44 cerberus check_reload_status: Restarting OpenVPN tunnels/interfaces
    Jan 18 23:58:44 cerberus check_reload_status: Reloading filter
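
For anyone trying to line these alarms up against crash times, here is a minimal Python sketch that pulls the gateway name, RTT, and loss out of a gateway alarm line. The regular expression is an assumption based only on the lines quoted in this thread (other pfSense versions may format them differently), it expects each alarm on a single line (rejoin any console-wrapped lines first), and `parse_alarm` is a hypothetical helper name, not part of pfSense.

```python
import re

# Assumed format, inferred from the alarm lines quoted in this thread.
ALARM_RE = re.compile(
    r"Gateway alarm: (?P<gateway>\S+) \("
    r"Addr:(?P<addr>\S+) Alarm:(?P<alarm>\d+) "
    r"RTT:(?P<rtt_ms>\d+(?:\.\d+)?)ms "
    r"RTTsd:(?P<rttsd_ms>\d+(?:\.\d+)?)ms "
    r"Loss:(?P<loss_pct>\d+(?:\.\d+)?)%\)"
)


def parse_alarm(line):
    """Return the alarm fields as a dict, or None if the line is not an alarm."""
    match = ALARM_RE.search(line)
    if match is None:
        return None
    return {
        "gateway": match["gateway"],
        "addr": match["addr"],
        "alarm": int(match["alarm"]),
        "rtt_ms": float(match["rtt_ms"]),
        "rttsd_ms": float(match["rttsd_ms"]),
        "loss_pct": float(match["loss_pct"]),
    }


sample = (
    "Jan 18 23:58:43 cerberus rc.gateway_alarm[71831]: >>> Gateway alarm: "
    "WAN_DHCP6 (Addr:REDACTED Alarm:1 RTT:12894ms RTTsd:3415ms Loss:21%)"
)
print(parse_alarm(sample))
```

Lines that are not gateway alarms (the `check_reload_status` entries, for example) return `None`, so the function can be run over a whole log and the results filtered.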

    I expect the WAN interface to go offline if there's a connectivity issue, but certainly not the whole system. And the internet connection seems to be fine after a power-cycle, so I suspect something else is going on here.

    igb0 is connected to an SB6120 (Comcast). igb1 and igb2 are LAGGed (LACP) to a UniFi managed switch. I'm running 2.4.2-RELEASE-p1. Any suggestions?

    EDIT: I may need to undo what I did here, or perhaps I need to set the net.inet.tcp.tso tunable to 0. I'll try those one at a time to see if one or the other prevents a fourth or fifth crash.



  • Welp, this is still happening about once a day, and the two things I thought might be responsible turned out to be non-factors (the tso tunable is set to zero and vlanhwtso is left alone and it's still happening). I also no longer believe it has anything to do with the gateway alarm.

    I'm looking for any clues as to why this might be happening. I'm going to try to capture the console messages at the time of the crash using ttylog, but I'm not hopeful.





  • @mwp821 Hi, did you find a solution?

    This is the first time I've seen this message. I've been running this box for almost 2 months.

    Jun 28 10:34:21 check_reload_status Reloading filter
    Jun 28 10:34:21 php-fpm 4876 /rc.newipsecdns: IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing.
    Jun 28 10:34:05 php-fpm 27845 /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use LAN_GW.
    Jun 28 10:34:04 check_reload_status Reloading filter
    Jun 28 10:34:04 check_reload_status Restarting OpenVPN tunnels/interfaces
    Jun 28 10:34:04 check_reload_status Restarting ipsec tunnels
    Jun 28 10:34:04 check_reload_status updating dyndns LAN_GW
    Jun 28 10:34:04 rc.gateway_alarm 85513 >>> Gateway alarm: LAN_GW (Addr:REDACTED Alarm:1 RTT:4006ms RTTsd:4053ms Loss:21%)



  • Any resolution to this issue? I am experiencing the same thing.


  • Netgate Administrator

    So you also have LACP to a UniFi switch?

    The console becomes unresponsive?

    Steve



  • UniFi switch(es) yes, but LACP no. And it's the GUI that becomes unresponsive. I didn't try the console.

