Running pfsense 2.7.0-release (amd64) and it randomly fails, losing connection to ISP
-
Hmm, what NICs do you have in that?
Do you see errors or collisions in Status > Interfaces?
-
I'm not sure what the NICs are. I have a card with two ports, but I don't know who the maker is or what chips they used. The MOBO also has a port on it.
This is what the status shows:
WAN Interface (wan, re1)
Status: up
DHCP: up
MAC Address: 00:e0:4c:61:b4:94
IPv4 Address: 100.66.97.204
Subnet mask IPv4: 255.255.240.0
Gateway IPv4: 100.66.96.1
IPv6 Link Local: fe80::2e0:4cff:fe61:b494%re1
DNS servers: 206.225.75.225, 206.225.75.226
MTU: 1500
Media: 1000baseT <full-duplex>
In/out packets: 1034069/552948 (1.13 GiB/49.02 MiB)
In/out packets (pass): 1034069/552948 (1.13 GiB/49.02 MiB)
In/out packets (block): 205/0 (18 KiB/0 B)
In/out errors: 0/0
Collisions: 0
Interrupts: 1400102 (60/s)

LAN Interface (lan, re0)
Status: up
MAC Address: 00:e0:4c:61:b4:93
IPv4 Address: 192.168.1.1
Subnet mask IPv4: 255.255.255.0
IPv6 Link Local: fe80::2e0:4cff:fe61:b493%re0
MTU: 1500
Media: 1000baseT <full-duplex>
In/out packets: 494979/978908 (53.66 MiB/1.13 GiB)
In/out packets (pass): 494979/978908 (53.66 MiB/1.13 GiB)
In/out packets (block): 1342/0 (113 KiB/0 B)
In/out errors: 0/0
Collisions: 0
Interrupts: 1083589 (46/s)
-
Disable IPv6 and reboot the firewall.
-
Thanx.
Interesting. I had disabled IPv6 on all interfaces and thought I had disabled it on the WAN as well. But there it is: the IPv6 configuration type was not set to None.
So I set the WAN IPv6 configuration type to None and applied the change. But I won't be able to do a reboot for another hour or so.
Wylbur
-
That's only a link local address.
Both of those interfaces are Realtek NICs, re0 and re1, which are known to have issues.
Are those both the add-on NICs? Is the other NIC different? If it is, I would try that as WAN.
Steve
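To illustrate why that address is "only" link-local: the fe80:: address in the status output is what you get by deriving a modified EUI-64 interface identifier from the NIC's MAC address, so it exists whenever IPv6 is enabled on the interface and says nothing about IPv6 service from the ISP. A minimal sketch, using the MAC addresses posted above (the function name `eui64_link_local` is made up for this example):

```python
import ipaddress


def eui64_link_local(mac: str) -> ipaddress.IPv6Address:
    """Derive the fe80::/64 link-local address from a MAC (modified EUI-64)."""
    octets = [int(part, 16) for part in mac.split(":")]
    octets[0] ^= 0x02  # flip the universal/local bit of the first octet
    # Insert ff:fe between the two halves of the MAC to get 64 bits.
    interface_id = bytes(octets[:3] + [0xFF, 0xFE] + octets[3:])
    return ipaddress.IPv6Address((0xFE80 << 112) | int.from_bytes(interface_id, "big"))


wan = eui64_link_local("00:e0:4c:61:b4:94")
print(wan)                # fe80::2e0:4cff:fe61:b494
print(wan.is_link_local)  # True
```

The result matches the "IPv6 Link Local" value shown for re1 (without the `%re1` scope suffix).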
-
Note: I will be gone to a convention. I will have this system powered down while I am gone. Will be back 10-OCT-23, but I will have some access to email.
Wylbur.
-
@Wylbur I made the switch from the dual-port Realtek card (MAC 00:e0:4c:61:b4:94) to the port on the MOBO (chipset unknown). It took me a bit to figure out a few changes that had to be made, but so far so good (I did this about 30 minutes ago).
This was forced because of some weird problem I am having with a government site so I decided now is the time to do this just in case the problem is with the WAN port.
Nice idea while it lasted. -- But things are otherwise working with this swap.
I don't see collisions, but I do see a large number of interrupts. I don't exactly know what those are (I work with interrupt driven mainframes, so I expect to see interrupts coming out my ears. Every I/O is at least 1 interrupt, then there are various caused by instruction streams, the system timer generates many time interrupts for dispatcher processing...).
Wylbur.
-
@Wylbur said in Running pfsense 2.7.0-release (amd64) and it randomly fails, losing connection to ISP:
I don't see collisions, but I do see a large number of interrupts. I don't exactly know what those are (I work with interrupt driven mainframes, so I expect to see interrupts coming out my ears. Every I/O is at least 1 interrupt, then there are various caused by instruction streams, the system timer generates many time interrupts for dispatcher processing...).
Exactly, interrupts are required for the NIC to function so that's not a problem unless the rate is very high.
Steve
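For a sense of scale, the counters posted earlier can be turned into a rough running time. This is only a sketch: it assumes the displayed rate is an average since boot (as with FreeBSD's `vmstat -i`), and the rate is rounded, so the result is approximate. The helper name `hours_running` is made up for this example.

```python
def hours_running(total_interrupts: int, rate_per_second: int) -> float:
    """Approximate hours of counting implied by a total and an average rate."""
    return total_interrupts / rate_per_second / 3600


wan_hours = hours_running(1_400_102, 60)  # re1 figures from the status output
lan_hours = hours_running(1_083_589, 46)  # re0 figures from the status output
print(f"WAN: {wan_hours:.1f} h, LAN: {lan_hours:.1f} h")
```

Both interfaces work out to roughly the same number of hours, which is what you would expect if the totals simply accumulate at a steady, modest rate.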
-
@stephenw10 It has finally locked up twice now since the change to using the MOBO ethernet port. This is what I captured before a reboot (and I do not understand the error):
Oct 28 04:37:18 kernel .done.
Oct 28 04:37:22 php-cgi 482 rc.bootup: Creating rrd update script
Oct 28 04:37:22 kernel done.
Oct 28 04:37:23 syslogd exiting on signal 15
Oct 28 04:37:23 syslogd kernel boot file is /boot/kernel/kernel
Oct 28 04:37:23 php-fpm 382 /rc.start_packages: Restarting/Starting all packages.
Oct 28 04:37:23 php-fpm 382 /rc.start_packages: [zeek] Removing cronjobs ...
Oct 28 04:37:23 root 45144 Bootup complete
Oct 28 04:37:25 login 56559 login on ttyv0 as root
Oct 28 04:37:25 sshguard 65433 Now monitoring attacks.
Oct 28 04:44:00 sshguard 65433 Exiting on signal.
Oct 28 04:44:00 sshguard 11155 Now monitoring attacks.
Oct 28 07:04:00 sshguard 11155 Exiting on signal.
Oct 28 07:04:00 sshguard 70639 Now monitoring attacks.
Oct 28 19:46:00 sshguard 70639 Exiting on signal.
Oct 28 19:46:00 sshguard 70135 Now monitoring attacks.
Oct 28 20:48:00 sshguard 70135 Exiting on signal.
Oct 28 20:48:00 sshguard 95525 Now monitoring attacks.
Oct 29 08:36:00 sshguard 95525 Exiting on signal.
Oct 29 08:36:00 sshguard 22645 Now monitoring attacks.
Oct 29 12:44:00 sshguard 22645 Exiting on signal.
Oct 29 12:44:00 sshguard 65234 Now monitoring attacks.
Oct 29 17:36:52 rc.gateway_alarm 27177 >>> Gateway alarm: WAN_DHCP (Addr:8.8.8.8 Alarm:1 RTT:15.698ms RTTsd:.952ms Loss:22%)
Oct 29 17:36:52 check_reload_status 443 updating dyndns WAN_DHCP
Oct 29 17:36:52 check_reload_status 443 Restarting IPsec tunnels
Oct 29 17:36:52 check_reload_status 443 Restarting OpenVPN tunnels/interfaces
Oct 29 17:36:52 check_reload_status 443 Reloading filter
Oct 29 17:36:53 php-fpm 382 /rc.openvpn: Gateway, none 'available' for inet, use the first one configured. 'WAN_DHCP'
Oct 29 17:36:53 php-fpm 382 /rc.openvpn: Gateway, NONE AVAILABLE
Oct 29 19:29:43 php-fpm 382 /index.php: Session timed out for user 'admin' from: 192.168.1.21 (Local Database)
Oct 29 19:29:48 php-fpm 382 /index.php: Successful login for user 'admin' from: 192.168.1.21 (Local Database)
-
@Wylbur re1 = Realtek 8168/8111, as is re0. re0 is now on the MOBO.
What does anyone recommend for a better adapter? I'd like to try to replace the dual port card I have.
Wylbur.
-
Any Intel 1G NIC would be far better.
Nothing really logged there beyond the packet loss alarm. No watchdog timeouts logged.
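If it helps to track these alarms over time, the fields can be pulled out of the log line with a regular expression. This is a rough sketch based only on the format of the alarm lines posted in this thread; the pattern and the name `parse_gateway_alarm` are assumptions, not anything provided by pfSense.

```python
import re

# Pattern inferred from the rc.gateway_alarm lines quoted in this thread.
ALARM_RE = re.compile(
    r"Gateway alarm: (?P<gateway>\S+) \(Addr:(?P<addr>\S+) Alarm:(?P<alarm>\d+) "
    r"RTT:(?P<rtt>[\d.]+)ms RTTsd:(?P<rttsd>[\d.]+)ms Loss:(?P<loss>\d+)%\)"
)


def parse_gateway_alarm(line: str):
    """Return the alarm fields as a dict, or None if the line is not an alarm."""
    match = ALARM_RE.search(line)
    if match is None:
        return None
    return {
        "gateway": match["gateway"],
        "addr": match["addr"],
        "alarm": int(match["alarm"]),
        "rtt_ms": float(match["rtt"]),
        "rttsd_ms": float(match["rttsd"]),
        "loss_pct": int(match["loss"]),
    }


sample = (
    "Oct 29 17:36:52 rc.gateway_alarm 27177 >>> Gateway alarm: WAN_DHCP "
    "(Addr:8.8.8.8 Alarm:1 RTT:15.698ms RTTsd:.952ms Loss:22%)"
)
alarm = parse_gateway_alarm(sample)
print(alarm["gateway"], alarm["loss_pct"])
```

Lines that are not gateway alarms (the sshguard entries, for instance) simply return `None`.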
-
The problem has happened again (loss of connection with the ISP, within the last 15 minutes), and I now have an Intel-chip dual-port 1Gb Ethernet card. The following is what the syslog shows (I did a reroot reboot):
Nov 20 19:46:04 php-fpm 60708 [Snort] Snort STOP for WAN(igb1)...
Nov 20 19:46:05 snort 68178 *** Caught Term-Signal
Nov 20 19:46:05 kernel igb1: promiscuous mode disabled
Nov 20 20:03:00 sshguard 85448 Exiting on signal.
Nov 20 20:03:00 sshguard 55524 Now monitoring attacks.
Nov 20 21:11:00 sshguard 55524 Exiting on signal.
Nov 20 21:11:00 sshguard 38547 Now monitoring attacks.
Nov 20 21:16:00 sshguard 38547 Exiting on signal.
Nov 20 21:16:00 sshguard 43043 Now monitoring attacks.
Nov 21 00:20:00 kernel pid 77019 (php), jid 0, uid 0: exited on signal 6 (core dumped)
Nov 21 01:10:00 sshguard 43043 Exiting on signal.
Nov 21 01:10:00 sshguard 18059 Now monitoring attacks.
Nov 21 05:33:00 sshguard 18059 Exiting on signal.
Nov 21 05:33:00 sshguard 6020 Now monitoring attacks.
Nov 21 10:09:00 sshguard 6020 Exiting on signal.
Nov 21 10:09:00 sshguard 6274 Now monitoring attacks.
Nov 21 10:47:00 sshguard 6274 Exiting on signal.
Nov 21 10:47:00 sshguard 32946 Now monitoring attacks.
Nov 21 11:36:52 rc.gateway_alarm 70343 >>> Gateway alarm: WAN_DHCP (Addr:8.8.8.8 Alarm:1 RTT:24.697ms RTTsd:.955ms Loss:21%)
Nov 21 11:36:52 check_reload_status 443 updating dyndns WAN_DHCP
Nov 21 11:36:52 check_reload_status 443 Restarting IPsec tunnels
Nov 21 11:36:52 check_reload_status 443 Restarting OpenVPN tunnels/interfaces
Nov 21 11:36:52 check_reload_status 443 Reloading filter
Nov 21 11:36:53 php-fpm 60708 /rc.openvpn: Gateway, none 'available' for inet, use the first one configured. 'WAN_DHCP'
Nov 21 11:36:53 php-fpm 60708 /rc.openvpn: Gateway, NONE AVAILABLE
Nov 21 11:39:47 php-fpm 60708 /status_dhcp_leases.php: Session timed out for user 'admin' from: 192.168.1.37 (Local Database)
Nov 21 11:39:49 php-fpm 60708 /status_dhcp_leases.php: Successful login for user 'admin' from: 192.168.1.37 (Local Database)
Nov 21 11:41:16 php-fpm 83302 /diag_reboot.php: Stopping all packages.
Note that I stopped Snort because of some anomalies with a US Gov't web site. Snort was not causing them; I just haven't turned it back on.
-
Are you still running the Realtek NICs?
-
Negative. I am running an Intel dual-port NIC. Both LAN and WAN go through that card. The MOBO has a port, but it is Realtek-based, so I decided not to use it.
-
Ah OK. So the WAN shows a gateway alarm there, then you logged in and rebooted. I assume after reboot the WAN gateway shows as up? And if you do not reboot, it stays down?
-
That is correct.
I know something is not right when a streaming device stops, or an email client asked to fetch mail says it can't connect to ...., or a browser says the server is no longer responding....
Then I open the pfSense tab, check the status, and look at the log. When I see that message, I know the only way out (at this time) is a reboot, so I also select reroot. Maybe that is overkill, but things come back up quickly after that and nothing is hung.
-
@Wylbur said in Running pfsense 2.7.0-release (amd64) and it randomly fails, losing connection to ISP:
When I see that message, I know the only way out (at this time) is reboot
Which message specifically? The dpinger packet loss alarm?
-
This one or one like it:
Gateway alarm: WAN_DHCP (Addr:8.8.8.8 Alarm:1 RTT:24.697ms RTTsd:.955ms Loss:21 <<< Loss will be at 21 or higher
-
OK, I think we're going to need to dig into exactly what is failing here. Since there's nothing else logged when this happens, it doesn't appear to be a NIC link issue, a routing change, etc.
I think I would try running a packet capture on the WAN when it's in that state. See what's actually leaving there and if anything is coming back.
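As a sketch of what such a capture could look like: the gateway monitor sends ICMP echo requests to the monitor address (8.8.8.8 in the alarms above), so filtering on ICMP to and from that host shows whether requests are leaving and whether replies are coming back. The snippet below only builds and prints a `tcpdump` command line. The interface name `igb1` is taken from the Snort log line above; the packet count, the output path and the helper name `build_capture_command` are assumptions for illustration.

```python
import shlex


def build_capture_command(interface, monitor_ip, count=200,
                          outfile="/tmp/wan-outage.pcap"):
    """Build a tcpdump argv capturing ICMP to/from the monitor IP."""
    return [
        "tcpdump",
        "-n",                # do not resolve names
        "-i", interface,     # capture on the WAN interface
        "-c", str(count),    # stop after this many packets
        "-w", outfile,       # write raw packets to a file for later review
        f"icmp and host {monitor_ip}",
    ]


cmd = build_capture_command("igb1", "8.8.8.8")
print(shlex.join(cmd))
```

Run while the WAN is in the failed state, a capture like this should show one of two pictures: requests going out with no replies, or nothing leaving at all.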
-
I've been looking at tracing and packet captures, but I'm not seeing what I would have expected, and it may be because of a difference in terminology. For NDM or Connect:Direct (a Managed File XFER product) I would turn on tracing for a specific thing, such as the handshake or TCP|UDP packets for a specific address. In this case it is the WAN port that I need to trace. Is this Dataplane packet tracing? Also note, I have blocked IPv6 in/out for our environment, should that be a possible problem. And if I understand correctly this is all CLI, so it can't be set up from the GUI, right?