Running pfsense 2.7.0-release (amd64) and it randomly fails losing connection to ISP
-
-
@stephenw10
Read that, and wasn't sure of everything, but then I went to the Gateway settings and changed the monitor IP to 8.8.8.8 for testing. So I'll let that run for a while, unless you have better ideas.
But I have to admit, I've learned something here about ISP behaviors. I can now understand why certain things I was doing for heartbeat testing a year or two ago were having problems.
Wylbur
-
Had it happen again after changing the monitor to 8.8.8.8. I've been a bit busy, which is why it has taken me a while to get the log copied. I think I caught the problem:
Sep 25 18:45:00 sshguard 97559 Exiting on signal.
Sep 25 18:45:00 sshguard 16795 Now monitoring attacks.
Sep 25 19:18:00 sshguard 16795 Exiting on signal.
Sep 25 19:18:00 sshguard 62985 Now monitoring attacks.
Sep 25 20:19:26 php-fpm 52303 /widgets/widgets/snort_alerts.widget.php: Session timed out for user 'admin' from: 192.168.1.21 (Local Database)
Sep 25 20:29:26 php-fpm 52303 /status_services.php: Successful login for user 'admin' from: 192.168.1.21 (Local Database)
Sep 26 00:23:00 sshguard 62985 Exiting on signal.
Sep 26 00:23:00 sshguard 98064 Now monitoring attacks.
Sep 26 02:46:22 rc.gateway_alarm 92587 >>> Gateway alarm: WAN_DHCP (Addr:8.8.8.8 Alarm:1 RTT:24.661ms RTTsd:.925ms Loss:22%)
Sep 26 02:46:22 check_reload_status 64002 updating dyndns WAN_DHCP
Sep 26 02:46:22 check_reload_status 64002 Restarting IPsec tunnels
Sep 26 02:46:22 check_reload_status 64002 Restarting OpenVPN tunnels/interfaces
Sep 26 02:46:22 check_reload_status 64002 Reloading filter
Sep 26 02:46:23 php-fpm 65862 /rc.openvpn: Gateway, none 'available' for inet, use the first one configured. 'WAN_DHCP'
Sep 26 02:46:23 php-fpm 65862 /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. 'WAN_DHCP6'
Sep 26 03:52:30 php-fpm 52220 /status_interfaces.php: Session timed out for user 'admin' from: 192.168.1.21 (Local Database)
Sep 26 03:52:33 php-fpm 52220 /status_interfaces.php: Successful login for user 'admin' from: 192.168.1.21 (Local Database)
Sep 26 03:54:14 php-fpm 52220 /diag_reboot.php: Stopping all packages.
Sep 26 03:54:17 reroot 40140 rerooted by root
Sep 26 03:54:21 syslogd kernel boot file is /boot/kernel/kernel
Sep 26 03:54:21 kernel pflog0: promiscuous mode disabled
Sep 26 03:54:21 kernel Trying to mount root from zfs:pfSense/ROOT/default []...
Sep 26 03:54:21 kernel CPU: AMD Ryzen 5 5500 (3593.25-MHz K8-class CPU)
Sep 26 03:54:21 kernel Origin="AuthenticAMD" Id=0xa50f00 Family=0x19 Model=0x50 Stepping=0
Sep 26 03:54:21 kernel Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
Sep 26 03:54:21 kernel Features2=0x7ef8320b<SSE3,PCLMULQDQ,MON,SSSE3,FMA,CX16,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
Sep 26 03:54:21 kernel AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
Sep 26 03:54:21 kernel AMD Features2=0x75c237ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT,TCE,Topology,PCXC,PNXC,DBE,PL2I,MWAITX,ADMSKX>
Sep 26 03:54:21 kernel Structured Extended Features=0x219c97a9<FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,PQE,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA>
Sep 26 03:54:21 kernel Structured Extended Features2=0x40069c<UMIP,PKU,OSPKE,VAES,VPCLMULQDQ,RDPID>
Sep 26 03:54:21 kernel Structured Extended Features3=0x10<FSRM>
Sep 26 03:54:21 kernel XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
Sep 26 03:54:21 kernel AMD Extended Feature Extensions ID EBX=0x191ef657<CLZERO,IRPerf,XSaveErPtr,RDPRU,WBNOINVD,IBPB,IBRS,STIBP,STIBP_ALWAYSON,PREFER_IBRS,SSBD>
Sep 26 03:54:21 kernel SVM: (disabled in BIOS) NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
Sep 26 03:54:21 kernel TSC: P-state invariant, performance statistics
Sep 26 03:54:21 check_reload_status 14298 rc.newwanip starting re1
Sep 26 03:54:21 php-cgi 19716 rc.bootup: calling interface_dhcpv6_configure.
Sep 26 03:54:21 php-cgi 19716 rc.bootup: Accept router advertisements on interface re1
Sep 26 03:54:21 php-cgi 19716 rc.bootup: Starting DHCP6 client for interfaces re1 in DHCP6 without RA mode
Sep 26 03:54:21 php-cgi 19716 rc.bootup: Starting rtsold process on wan(re1)
Sep 26 03:54:22 php-fpm 4257 /rc.newwanip: rc.newwanip: Info: starting on re1.
Sep 26 03:54:22 php-fpm 4257 /rc.newwanip: rc.newwanip: on (IP address: 100.66.97.204) (interface: WAN[wan]) (real interface: re1).
Sep 26 03:54:22 php-fpm 4257 /rc.newwanip: Removing static route for monitor 8.8.8.8 and adding a new route through 100.66.96.1
Sep 26 03:54:23 kernel done.
Sep 26 03:54:23 kernel pflog0: promiscuous mode enabled
Sep 26 03:54:23 php-cgi 19716 rc.bootup: Resyncing OpenVPN instances.
Sep 26 03:54:23 kernel ....
Sep 26 03:54:24 php-cgi 19716 rc.bootup: Removing static route for monitor 8.8.8.8 and adding a new route through 100.66.96.1
Sep 26 03:54:24 kernel .done.
Sep 26 03:54:24 kernel done.
Sep 26 03:54:24 php-cgi 19716 rc.bootup: Gateway, NONE AVAILABLE
Sep 26 03:54:24 php-cgi 19716 rc.bootup: Default gateway setting Interface WAN_DHCP Gateway as default.
Sep 26 03:54:24 php-cgi 19716 rc.bootup: Gateway, none 'available' for inet6, use the first one configured. 'WAN_DHCP6'
Sep 26 03:54:24 kernel done.
Sep 26 03:54:25 php-cgi 19716 rc.bootup: Unbound start waiting on dhcp6c.
Sep 26 03:54:26 php-cgi 19716 rc.bootup: Unbound start waiting on dhcp6c.
Sep 26 03:54:27 php-cgi 19716 rc.bootup: Unbound start waiting on dhcp6c.
Sep 26 03:54:28 php-cgi 19716 rc.bootup: Unbound start waiting on dhcp6c.
Sep 26 03:54:29 php-cgi 19716 rc.bootup: Unbound start waiting on dhcp6c.
Sep 26 03:54:30 php-cgi 19716 rc.bootup: Unbound start waiting on dhcp6c.
Sep 26 03:54:31 php-cgi 19716 rc.bootup: Unbound start waiting on dhcp6c.
Sep 26 03:54:32 php-cgi 19716 rc.bootup: Unbound start waiting on dhcp6c.
Sep 26 03:54:33 php-cgi 19716 rc.bootup: Unbound start waiting on dhcp6c.
Sep 26 03:54:34 php-cgi 19716 rc.bootup: Unbound start waiting on dhcp6c.
Sep 26 03:54:35 php-cgi 19716 rc.bootup: sync unbound done.
Sep 26 03:54:35 kernel done.
Sep 26 03:54:36 kernel done.
Sep 26 03:54:42 kernel done.
Sep 26 03:54:42 kernel done.
Sep 26 03:54:42 php-cgi 19716 rc.bootup: NTPD is starting up.
Sep 26 03:54:43 kernel done.
Sep 26 03:54:43 check_reload_status 14298 Updating all dyndns
Sep 26 03:54:43 kernel done.
Sep 26 03:54:43 kernel ....
Sep 26 03:54:44 php-cgi 19716 rc.bootup: The command '/usr/local/sbin/strongswanrc stop' returned exit code '1', the output was 'strongswan not running? (check /var/run/daemon-charon.pid).'
Sep 26 03:54:44 kernel .done.
Sep 26 03:54:48 php-cgi 19716 rc.bootup: Creating rrd update script
Sep 26 03:54:48 syslogd exiting on signal 15
Sep 26 03:54:48 syslogd kernel boot file is /boot/kernel/kernel
Sep 26 03:54:48 kernel done.
Sep 26 03:54:48 php-fpm 4258 /rc.start_packages: Restarting/Starting all packages.
Sep 26 03:54:48 php-fpm 4258 /rc.start_packages: [zeek] Removing cronjobs ...
Sep 26 03:54:48 root 57927 Bootup complete
Sep 26 03:54:50 login 77807 login on ttyv0 as root
Sep 26 03:54:50 sshguard 80105 Now monitoring attacks.
Sep 26 03:55:13 php-fpm 4258 /pkg_mgr_install.php: Successful login for user 'admin' from: 192.168.1.21 (Local Database)
-
Hmm, nothing is shown there really. The gateway monitoring shows packet loss and marked the gateway as down. Then you logged in and rebooted.
The NIC did not lose link. Nothing else is logged. Was anything else shown at the time? Or did the modem show any unusual behaviour perhaps?
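The gateway alarm line in the log above carries the monitor's whole view in one record. A minimal Python sketch for pulling the fields out of such lines follows. The regular expression is fitted only to the alarm lines quoted in this thread, and the 20% figure is the stock pfSense high packet-loss threshold, assumed here rather than read from this system's gateway settings.

```python
import re

# Pattern fitted to the rc.gateway_alarm lines quoted in this thread.
ALARM_RE = re.compile(
    r"Gateway alarm: (?P<gateway>\S+) \(Addr:(?P<addr>\S+) Alarm:(?P<alarm>\d) "
    r"RTT:(?P<rtt>[\d.]+)ms RTTsd:(?P<rttsd>[\d.]+)ms Loss:(?P<loss>\d+)%\)"
)

# Assumption: default high packet-loss threshold, not this box's actual setting.
ASSUMED_HIGH_LOSS_PCT = 20


def parse_alarm(line):
    """Return the alarm fields as a dict, or None if the line is not an alarm."""
    m = ALARM_RE.search(line)
    if m is None:
        return None
    return {
        "gateway": m["gateway"],
        "addr": m["addr"],
        "alarm": m["alarm"] == "1",
        "rtt_ms": float(m["rtt"]),
        "rttsd_ms": float(m["rttsd"]),
        "loss_pct": int(m["loss"]),
    }


def loss_over_threshold(record, threshold=ASSUMED_HIGH_LOSS_PCT):
    """True when the reported loss exceeds the (assumed) high threshold."""
    return record["loss_pct"] > threshold
```

Applied to the Sep 26 alarm line, this yields an RTT of about 25 ms with a sub-millisecond deviation and 22% loss, which fits the reading above: latency looked normal and only the loss figure crossed the line.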
-
To answer your question directly - The modem looked like it was functional. It had the normal data light flickering. No alarm or error light lit.
However, DNS and streaming stopped: after I restarted the streaming device, it could not resolve addresses. A ping of 8.8.8.8 also failed (run from a command prompt on my W11 system).
So I checked to see what errors I could find looking at the logs.
And that is all I had. So I rebooted with re-root (maybe just plain reboot would also work). And we've been up and running ever since.
I will be shutting down the system tomorrow afternoon before we leave for a conference. And I'll bring it all back online when I get back.
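The two symptoms described above (name resolution failing and 8.8.8.8 itself being unreachable) can be told apart with a small check run from a LAN client. This is a rough Python sketch only: the function names, the example.com test name, and the use of a TCP connection to port 53 in place of an ICMP ping are all choices made for illustration.

```python
import socket


def can_reach(ip="8.8.8.8", port=53, timeout=2.0):
    """Try a TCP connection to a public resolver (stand-in for a ping)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def can_resolve(name="example.com"):
    """Try to resolve a hostname using the system's configured DNS."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def diagnose(reach, resolve):
    """Map the two probe results to a rough description of the failure."""
    if reach and resolve:
        return "ok"
    if reach:
        return "dns-only"
    if resolve:
        return "unreachable, name answered from a cache"
    return "no upstream connectivity"
```

Calling `diagnose(can_reach(), can_resolve())` during an outage like the one described would be expected to report "no upstream connectivity", since both the resolver lookup and the direct probe of 8.8.8.8 failed.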
-
Hmm, what NICs do you have in that?
Do you see errors or collisions in Status > Interfaces?
-
I'm not sure what the NICs are. I have a card with two ports; I don't know who the maker is or what chips they used. The MOBO also has a port on it.
This is what the status shows:
WAN Interface (wan, re1)
Status: up
DHCP: up
MAC Address: 00:e0:4c:61:b4:94
IPv4 Address: 100.66.97.204
Subnet mask IPv4: 255.255.240.0
Gateway IPv4: 100.66.96.1
IPv6 Link Local: fe80::2e0:4cff:fe61:b494%re1
DNS servers: 206.225.75.225, 206.225.75.226
MTU: 1500
Media: 1000baseT <full-duplex>
In/out packets: 1034069/552948 (1.13 GiB/49.02 MiB)
In/out packets (pass): 1034069/552948 (1.13 GiB/49.02 MiB)
In/out packets (block): 205/0 (18 KiB/0 B)
In/out errors: 0/0
Collisions: 0
Interrupts: 1400102 (60/s)

LAN Interface (lan, re0)
Status: up
MAC Address: 00:e0:4c:61:b4:93
IPv4 Address: 192.168.1.1
Subnet mask IPv4: 255.255.255.0
IPv6 Link Local: fe80::2e0:4cff:fe61:b493%re0
MTU: 1500
Media: 1000baseT <full-duplex>
In/out packets: 494979/978908 (53.66 MiB/1.13 GiB)
In/out packets (pass): 494979/978908 (53.66 MiB/1.13 GiB)
In/out packets (block): 1342/0 (113 KiB/0 B)
In/out errors: 0/0
Collisions: 0
Interrupts: 1083589 (46/s)
-
Disable IPv6 and reboot the firewall.
-
Thanx.
Interesting. I had disabled IPv6 on all interfaces and thought I had disabled it on the WAN as well. But there it is: the IPv6 configuration type was not set to None.
So I set the WAN IPv6 configuration type to None and applied the change. But I won't be able to do a reboot for another hour or so.
Wylbur
-
That's only a link local address.
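A link-local address of this form is normally generated by the interface itself from its MAC address (the modified EUI-64 scheme), so it says nothing about IPv6 service from the ISP. A minimal Python sketch that rebuilds the address from the WAN MAC posted earlier; the helper name is made up for illustration:

```python
import ipaddress


def eui64_link_local(mac):
    """Build the fe80::/64 address derived from a MAC via modified EUI-64."""
    octets = [int(part, 16) for part in mac.split(":")]
    octets[0] ^= 0x02  # flip the universal/local bit of the first octet
    interface_id = octets[:3] + [0xFF, 0xFE] + octets[3:]  # insert ff:fe in the middle
    packed = bytes([0xFE, 0x80] + [0] * 6 + interface_id)  # fe80:: prefix + 64-bit ID
    return ipaddress.IPv6Address(packed)


addr = eui64_link_local("00:e0:4c:61:b4:94")
print(addr, addr.is_link_local)
```

For the MAC 00:e0:4c:61:b4:94 this reproduces fe80::2e0:4cff:fe61:b494, the address shown on re1 (the `%re1` suffix is only the interface scope).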
Both of those interfaces are Realtek NICs, re0 and re1, which are known to have issues.
Are those both the add-on NICs? Is the other NIC different? If it is I would try that as WAN.
Steve
-
Note: I will be gone to a convention. I will have this system powered down while I am gone. Will be back 10-OCT-23, but I will have some access to email.
Wylbur.
-
@Wylbur I made the switch from the dual-port card (Realtek, MAC 00:e0:4c:61:b4:94) to the port on the MOBO (chipset unknown). It took me a bit to figure out a few changes that had to be made, but so far so good (I did this about 30 minutes ago).
This was forced by a weird problem I am having with a government site, so I decided now was the time to do it, just in case the problem is with the WAN port.
Nice idea while it lasted. -- But things are otherwise working with this swap.
I don't see collisions, but I do see a large number of interrupts. I don't exactly know what those are (I work with interrupt driven mainframes, so I expect to see interrupts coming out my ears. Every I/O is at least 1 interrupt, then there are various caused by instruction streams, the system timer generates many time interrupts for dispatcher processing...).
Wylbur.
-
@Wylbur said in Running pfsense 2.7.0-release (amd64) and it randomly fails losing connection to ISP:
I don't see collisions, but I do see a large number of interrupts. I don't exactly know what those are (I work with interrupt driven mainframes, so I expect to see interrupts coming out my ears. Every I/O is at least 1 interrupt, then there are various caused by instruction streams, the system timer generates many time interrupts for dispatcher processing...).
Exactly, interrupts are required for the NIC to function so that's not a problem unless the rate is very high.
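To put the interrupt counters from the Status > Interfaces output posted earlier in proportion, they can be set against the packet counters from the same output. A small Python sketch of that arithmetic; the helper name is made up, and the ratio is only a rough sanity check, not a documented pfSense metric:

```python
def interrupts_per_packet(interrupts, packets_in, packets_out):
    """Interrupts divided by total packets seen on the interface."""
    return interrupts / (packets_in + packets_out)


# Counters copied from the WAN (re1) and LAN (re0) status output in this thread.
wan = interrupts_per_packet(1_400_102, 1_034_069, 552_948)
lan = interrupts_per_packet(1_083_589, 494_979, 978_908)

print(f"WAN: {wan:.2f} interrupts per packet")
print(f"LAN: {lan:.2f} interrupts per packet")
```

Both ratios come out below one interrupt per packet, so the counts track traffic volume rather than pointing at an interrupt storm.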
Steve
-
@stephenw10 It has finally locked up twice now since the change to using the MOBO ethernet port. This is what I captured before a reboot (and I do not understand the error):
Oct 28 04:37:18 kernel .done.
Oct 28 04:37:22 php-cgi 482 rc.bootup: Creating rrd update script
Oct 28 04:37:22 kernel done.
Oct 28 04:37:23 syslogd exiting on signal 15
Oct 28 04:37:23 syslogd kernel boot file is /boot/kernel/kernel
Oct 28 04:37:23 php-fpm 382 /rc.start_packages: Restarting/Starting all packages.
Oct 28 04:37:23 php-fpm 382 /rc.start_packages: [zeek] Removing cronjobs ...
Oct 28 04:37:23 root 45144 Bootup complete
Oct 28 04:37:25 login 56559 login on ttyv0 as root
Oct 28 04:37:25 sshguard 65433 Now monitoring attacks.
Oct 28 04:44:00 sshguard 65433 Exiting on signal.
Oct 28 04:44:00 sshguard 11155 Now monitoring attacks.
Oct 28 07:04:00 sshguard 11155 Exiting on signal.
Oct 28 07:04:00 sshguard 70639 Now monitoring attacks.
Oct 28 19:46:00 sshguard 70639 Exiting on signal.
Oct 28 19:46:00 sshguard 70135 Now monitoring attacks.
Oct 28 20:48:00 sshguard 70135 Exiting on signal.
Oct 28 20:48:00 sshguard 95525 Now monitoring attacks.
Oct 29 08:36:00 sshguard 95525 Exiting on signal.
Oct 29 08:36:00 sshguard 22645 Now monitoring attacks.
Oct 29 12:44:00 sshguard 22645 Exiting on signal.
Oct 29 12:44:00 sshguard 65234 Now monitoring attacks.
Oct 29 17:36:52 rc.gateway_alarm 27177 >>> Gateway alarm: WAN_DHCP (Addr:8.8.8.8 Alarm:1 RTT:15.698ms RTTsd:.952ms Loss:22%)
Oct 29 17:36:52 check_reload_status 443 updating dyndns WAN_DHCP
Oct 29 17:36:52 check_reload_status 443 Restarting IPsec tunnels
Oct 29 17:36:52 check_reload_status 443 Restarting OpenVPN tunnels/interfaces
Oct 29 17:36:52 check_reload_status 443 Reloading filter
Oct 29 17:36:53 php-fpm 382 /rc.openvpn: Gateway, none 'available' for inet, use the first one configured. 'WAN_DHCP'
Oct 29 17:36:53 php-fpm 382 /rc.openvpn: Gateway, NONE AVAILABLE
Oct 29 19:29:43 php-fpm 382 /index.php: Session timed out for user 'admin' from: 192.168.1.21 (Local Database)
Oct 29 19:29:48 php-fpm 382 /index.php: Successful login for user 'admin' from: 192.168.1.21 (Local Database)
-
@Wylbur re1 = Realtek 8168/8111, as is re0. re0 is now the port on the MOBO.
What does anyone recommend for a better adapter? I'd like to try to replace the dual port card I have.
Wylbur.
-
Any Intel 1G NIC would be far better.
Nothing really logged there beyond the packet loss alarm. No watchdog timeouts logged.
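A quick way to confirm that nothing driver-related is hiding in a longer log is to filter it for NIC-level kernel messages. A minimal Python sketch follows; the two patterns ("watchdog timeout" and "link state changed") are the usual FreeBSD kernel wordings as far as I know, so treat them as assumptions and adjust them to whatever the log actually contains.

```python
import re

# Assumed wording of the kernel messages of interest; adjust as needed.
NIC_EVENT_RE = re.compile(r"watchdog timeout|link state changed", re.IGNORECASE)


def nic_events(lines):
    """Return only the log lines that look like NIC driver or link events."""
    return [line for line in lines if NIC_EVENT_RE.search(line)]
```

Run over the Oct 29 excerpt above, this returns an empty list, which matches the observation that only the packet loss alarm was logged.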
-
I have had this problem happen again: loss of connection with the ISP (within the last 15 minutes), and I now have an Intel-based dual-port 1Gb Ethernet card. The following is what the syslog shows (I did a reroot reboot):
Nov 20 19:46:04 php-fpm 60708 [Snort] Snort STOP for WAN(igb1)...
Nov 20 19:46:05 snort 68178 *** Caught Term-Signal
Nov 20 19:46:05 kernel igb1: promiscuous mode disabled
Nov 20 20:03:00 sshguard 85448 Exiting on signal.
Nov 20 20:03:00 sshguard 55524 Now monitoring attacks.
Nov 20 21:11:00 sshguard 55524 Exiting on signal.
Nov 20 21:11:00 sshguard 38547 Now monitoring attacks.
Nov 20 21:16:00 sshguard 38547 Exiting on signal.
Nov 20 21:16:00 sshguard 43043 Now monitoring attacks.
Nov 21 00:20:00 kernel pid 77019 (php), jid 0, uid 0: exited on signal 6 (core dumped)
Nov 21 01:10:00 sshguard 43043 Exiting on signal.
Nov 21 01:10:00 sshguard 18059 Now monitoring attacks.
Nov 21 05:33:00 sshguard 18059 Exiting on signal.
Nov 21 05:33:00 sshguard 6020 Now monitoring attacks.
Nov 21 10:09:00 sshguard 6020 Exiting on signal.
Nov 21 10:09:00 sshguard 6274 Now monitoring attacks.
Nov 21 10:47:00 sshguard 6274 Exiting on signal.
Nov 21 10:47:00 sshguard 32946 Now monitoring attacks.
Nov 21 11:36:52 rc.gateway_alarm 70343 >>> Gateway alarm: WAN_DHCP (Addr:8.8.8.8 Alarm:1 RTT:24.697ms RTTsd:.955ms Loss:21%)
Nov 21 11:36:52 check_reload_status 443 updating dyndns WAN_DHCP
Nov 21 11:36:52 check_reload_status 443 Restarting IPsec tunnels
Nov 21 11:36:52 check_reload_status 443 Restarting OpenVPN tunnels/interfaces
Nov 21 11:36:52 check_reload_status 443 Reloading filter
Nov 21 11:36:53 php-fpm 60708 /rc.openvpn: Gateway, none 'available' for inet, use the first one configured. 'WAN_DHCP'
Nov 21 11:36:53 php-fpm 60708 /rc.openvpn: Gateway, NONE AVAILABLE
Nov 21 11:39:47 php-fpm 60708 /status_dhcp_leases.php: Session timed out for user 'admin' from: 192.168.1.37 (Local Database)
Nov 21 11:39:49 php-fpm 60708 /status_dhcp_leases.php: Successful login for user 'admin' from: 192.168.1.37 (Local Database)
Nov 21 11:41:16 php-fpm 83302 /diag_reboot.php: Stopping all packages.
Note that I stopped Snort because of some anomalies with a US Gov't web site. Snort was not causing them; I just haven't turned it back on.
-
Are you still running the Realtek NICs?
-
Negative. I am running an INTEL dual port NIC. Both LAN and WAN go through that card. The MOBO has a port, but it is Realtek based so I decided to not use it.
-
Ah OK. So the WAN shows a gateway alarm there then you logged in and rebooted. I assume after reboot the WAN gateway shows as up? And if you did not reboot it stays down?