Individual Interfaces stop connecting to internet
-
I have a firebox 550 with 2gb ram running 2.3.2-RELEASE-p1 (i386) that loses internet connectivity every other day or so. It isn't predictable, everything will be fine and I'll be out of town on business and my wife will call and tell me the "internet is broken". It has been going on for over a year to varying degrees over the previous pfsense releases as well. I have in that time replaced the wifi ap, the switch, the obi box, the cable modem, even some cables to fix the problem. Sometimes I'll go for two weeks without the drop. Lately it has been every couple days. Traditionally I've just told my wife to reset the power switch on the firebox to "fix the broken internet". Rebooting the firewall will always fix the issue.
Here's the quirk. Recently, while I was home, I lost internet on my laptop that was using wireless from LAN2. When I looked up, the SmartTV (hardwired on LAN1) netflix was still happily streaming. Plus I was still able to login to pfsense wirelessly, so I assume the AP is fine. When I logged in, the dashboard will show all three interfaces as green arrows, the Gateway (WAN) as "Online" and the cable modem lights are normal. I'm convinced that all the hardware is fine, that I have a setting that is causing my headache. It can happen the other away around as well. LAN1 will lose internet, but LAN2(my wireless) will be humming along fine.
I can also "fix" the problem by hot unplugging the offending interface's cable and plugging it back in. Bam!, internet is back.
Summary: The firewall says an interface is up, I can communicate through the interface with pfsense, just not the internet. Rebooting or resetting the interface fixes the problem.
My best guess is that this is a DNS problem. I just found an old post suggesting changing my monitor IP of the WAN gateway to something other than itself like google's 8.8.8.8. (BTW I Use Google's DNS servers I just applied that change. Fingers crossed. If I make it to SuperBowl without a drop, I'll let you guys know that was the problem.
If you're still reading, thank you very much. I want to offer up as much info as you may need short of dumping all my logs. Let me know if a specific command readout will tell you what you want to know about my setup.
Quick rundown of hardware setup:
Comcast 200 MBPS "Blast" Cable data plan Cable -> Arris SB6183 Cable modem -> Firebox Firebox port 1 is Wan IP assigned by comcast, this port takes the cable from the cable modem Firebox port 2 is Lan1 LAN 192.168.2.1, this then connects to a Netgear smart switch (almost entirely on default settings), and about 20 cat6 lines I ran all throughout the house. They run everything from my son's playstation, smartTV's, 2 printers, a couple workstations, my garage door gateway, an OBI box, etc. Firebox port 3 is Lan2 OPT1 192.168.3.1, which connects to a ddwrt equipped TP-LINK Archer C7 AC1750 running in AP mode, which is for wireless clients (many phones, tablets, and laptops in my home) I separated the wifi onto its own subnet in an earlier attempt to troubleshoot the connectivity problem. Too lazy to merge it back with the wired network through the switch. Firebox port 4 is OPT2 and is empty.
current sysinfo (everything is working currently)
Name xxxx System embedded (unknown) Serial: xxxx Version 2.3.2-RELEASE-p1 (i386) built on Tue Sep 27 12:13:32 CDT 2016 FreeBSD 10.3-RELEASE-p9 The system is on the latest version. Platform nanobsd (2g) NanoBSD Boot Slice pfsense0 / ada0s1(rw) CPU Type Intel(R) Pentium(R) M processor 2.00GHz Current: 1600 MHz, Max: 2000 MHz Uptime 03 Hours 50 Minutes 42 Seconds Current date/time Tue Jan 31 23:20:06 PST 2017 DNS server(s) 127.0.0.1 8.8.8.8 8.8.4.4 Last config change Tue Jan 31 22:21:47 PST 2017 State table size 0% (113/200000) Show states MBUF Usage 7% (1776/26368) Load average 0.08, 0.12, 0.09 CPU usage 3% Memory usage 5% of 2007 MiB Disk usage ( / ) 64% of 908MiB - ufs Disk usage ( /cf ) 9% of 49MiB - ufs Disk usage ( /tmp ) 0% of 38MiB - ufs in RAM Disk usage ( /var ) 35% of 58MiB - ufs in RAM
netstat -m (everything working)
1847/688/2535 mbufs in use (current/cache/total) 1434/342/1776/26368 mbuf clusters in use (current/cache/total/max) 1434/337 mbuf+clusters out of packet secondary zone in use (current/cache) 1/21/22/13184 4k (page size) jumbo clusters in use (current/cache/total/max) 0/0/0/3906 9k jumbo clusters in use (current/cache/total/max) 0/0/0/2197 16k jumbo clusters in use (current/cache/total/max) 3346K/940K/4286K bytes allocated to network (current/cache/total) 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters) 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters) 0/0/0 requests for jumbo clusters delayed (4k/9k/16k) 0/0/0 requests for jumbo clusters denied (4k/9k/16k) 0/7/6656 sfbufs in use (current/peak/max) 0 requests for sfbufs denied 0 requests for sfbufs delayed 0 requests for I/O initiated by sendfile
ifconfig -a (everything working)
sk0: flags=8843 <up,broadcast,running,simplex,multicast>metric 0 mtu 1500 options=8000b <rxcsum,txcsum,vlan_mtu,linkstate>ether 00:90:7f:45:1f:05 inet6 fe80::290:7fff:fe45:1f05%sk0 prefixlen 64 scopeid 0x1 inet 73.xx.xx.xx netmask 0xfffffe00 broadcast 255.255.255.255 nd6 options=21 <performnud,auto_linklocal>media: Ethernet autoselect (1000baseT <full-duplex,master>) status: active sk1: flags=8943 <up,broadcast,running,promisc,simplex,multicast>metric 0 mtu 1500 options=8000b <rxcsum,txcsum,vlan_mtu,linkstate>ether 00:90:7f:45:1f:04 inet6 fe80::290:7fff:fe45:1f04%sk1 prefixlen 64 scopeid 0x2 inet 192.168.2.1 netmask 0xffffff00 broadcast 192.168.2.255 nd6 options=21 <performnud,auto_linklocal>media: Ethernet autoselect (1000baseT <full-duplex>) status: active sk2: flags=8943 <up,broadcast,running,promisc,simplex,multicast>metric 0 mtu 1500 options=8000b <rxcsum,txcsum,vlan_mtu,linkstate>ether 00:90:7f:45:1f:03 inet6 fe80::290:7fff:fe45:1f03%sk2 prefixlen 64 scopeid 0x3 inet 192.168.3.1 netmask 0xffffff00 broadcast 192.168.3.255 nd6 options=21 <performnud,auto_linklocal>media: Ethernet autoselect (1000baseT <full-duplex,master>) status: active sk3: flags=8943 <up,broadcast,running,promisc,simplex,multicast>metric 0 mtu 1500 options=8000b <rxcsum,txcsum,vlan_mtu,linkstate>ether 00:90:7f:45:1f:02 inet6 fe80::290:7fff:fe45:1f02%sk3 prefixlen 64 scopeid 0x4 nd6 options=21 <performnud,auto_linklocal>media: Ethernet autoselect (none) status: no carrier pflog0: flags=100 <promisc>metric 0 mtu 33184 pfsync0: flags=0<> metric 0 mtu 1500 syncpeer: 224.0.0.240 maxupd: 128 defer: on syncok: 1 enc0: flags=0<> metric 0 mtu 1536 nd6 options=21 <performnud,auto_linklocal>lo0: flags=8049 <up,loopback,running,multicast>metric 0 mtu 16384 options=600003 <rxcsum,txcsum,rxcsum_ipv6,txcsum_ipv6>inet 127.0.0.1 netmask 0xff000000 inet6 ::1 prefixlen 128 inet6 fe80::1%lo0 prefixlen 64 scopeid 0x8 nd6 options=21 <performnud,auto_linklocal>bridge0: flags=8843 <up,broadcast,running,simplex,multicast>metric 0 mtu 1500 ether 02:c6:fa:fa:15:00 nd6 options=1 <performnud>id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15 maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200 root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0 member: sk3 flags=143 <learning,discover,autoedge,autoptp>ifmaxaddr 0 port 4 priority 128 path cost 55 member: sk2 flags=143 <learning,discover,autoedge,autoptp>ifmaxaddr 0 port 3 priority 128 path cost 55 member: sk1 flags=143 <learning,discover,autoedge,autoptp>ifmaxaddr 0 port 2 priority 128 path cost 55</learning,discover,autoedge,autoptp></learning,discover,autoedge,autoptp></learning,discover,autoedge,autoptp></performnud></up,broadcast,running,simplex,multicast></performnud,auto_linklocal></rxcsum,txcsum,rxcsum_ipv6,txcsum_ipv6></up,loopback,running,multicast></performnud,auto_linklocal></promisc></performnud,auto_linklocal></rxcsum,txcsum,vlan_mtu,linkstate></up,broadcast,running,promisc,simplex,multicast></full-duplex,master></performnud,auto_linklocal></rxcsum,txcsum,vlan_mtu,linkstate></up,broadcast,running,promisc,simplex,multicast></full-duplex></performnud,auto_linklocal></rxcsum,txcsum,vlan_mtu,linkstate></up,broadcast,running,promisc,simplex,multicast></full-duplex,master></performnud,auto_linklocal></rxcsum,txcsum,vlan_mtu,linkstate></up,broadcast,running,simplex,multicast>
-
I've been seeing a similar problem on version 2.3.2, and it's been random virtual interfaces that cut out for a moment and then come back (ie: probably not hardware related). I posted details here in case our issues turn out to have similar roots:
https://forum.pfsense.org/index.php?topic=124909.0
-
Followup, everything was working fine since I made my original post, and was actually starting to believe that the monitor IP on the WAN gateway was the fix. Alas, came home and OPT1 was down, although the other internal interface (LAN) was up. Unplugging the cable from the nic assigned to opt1 and reinserting brought the internet back. So very annoying. I hot unplugged and plugged at 18:09. The recurring error is "arpresolve: can't allocate llinfo" And what I've read this can point to a driver error (old firebox not keeping up with new revisions). As usual, any help is appreciated.
Here's the error log, looks like the interface was down since before 15:00.
Feb 6 18:10:12 php-fpm 38594 /index.php: Successful login for user 'admin' from: 192.168.3.29 Feb 6 18:09:42 php lcdproc: Start client procedure. Error counter: (0) Feb 6 18:09:40 php-fpm 38594 /rc.start_packages: Restarting/Starting all packages. Feb 6 18:09:40 php-fpm 38594 lcdproc: Sync: Begin package sync Feb 6 18:09:40 php-fpm 38594 lcdproc: Sync: End package sync Feb 6 18:09:40 check_reload_status Syncing firewall Feb 6 18:09:39 php-fpm 46277 /rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - 0.0.0.0 -> 192.168.3.1 - Restarting packages. Feb 6 18:09:39 check_reload_status Starting packages Feb 6 18:09:37 php-fpm 46277 /rc.newwanip: Resyncing OpenVPN instances for interface OPT1. Feb 6 18:09:37 php-fpm 46277 /rc.newwanip: Creating rrd update script Feb 6 18:09:36 xinetd 16971 Starting reconfiguration Feb 6 18:09:36 xinetd 16971 Swapping defaults Feb 6 18:09:36 xinetd 16971 readjusting service 6969-udp Feb 6 18:09:36 xinetd 16971 Reconfigured: new=0 old=1 dropped=0 (services) Feb 6 18:09:36 php-fpm 46277 /rc.newwanip: Removing static route for monitor 8.8.8.8 and adding a new route through "My external IP" Feb 6 18:09:36 xinetd 16971 Starting reconfiguration Feb 6 18:09:36 xinetd 16971 Swapping defaults Feb 6 18:09:36 xinetd 16971 readjusting service 6969-udp Feb 6 18:09:36 xinetd 16971 Reconfigured: new=0 old=1 dropped=0 (services) Feb 6 18:09:35 php-fpm 46277 /rc.newwanip: rc.newwanip: Info: starting on sk2. Feb 6 18:09:35 php-fpm 46277 /rc.newwanip: rc.newwanip: on (IP address: 192.168.3.1) (interface: OPT1[opt1]) (real interface: sk2). Feb 6 18:09:35 php-fpm 46277 /rc.newwanip: IP has changed, killing states on former IP 0.0.0.0. Feb 6 18:09:34 php-fpm 46277 /rc.linkup: Hotplug event detected for OPT1(opt1) static IP (192.168.3.1 ) Feb 6 18:09:34 check_reload_status rc.newwanip starting sk2 Feb 6 18:09:34 check_reload_status Reloading filter Feb 6 18:09:33 check_reload_status Linkup starting sk2 Feb 6 18:09:33 kernel sk2: link state changed to UP Feb 6 15:00:17 php lcdproc: Start client procedure. Error counter: (0) Feb 6 15:00:15 check_reload_status Syncing firewall Feb 6 15:00:14 php-fpm 46277 /rc.start_packages: Restarting/Starting all packages. Feb 6 15:00:14 php-fpm 46277 lcdproc: Sync: Begin package sync Feb 6 15:00:14 php-fpm 46277 lcdproc: Sync: End package sync Feb 6 15:00:13 php-fpm 64300 /rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - 192.168.100.11 -> "My new external IP" - Restarting packages. Feb 6 15:00:13 check_reload_status Starting packages Feb 6 15:00:11 php-fpm 64300 /rc.newwanip: Resyncing OpenVPN instances for interface WAN. Feb 6 15:00:11 php-fpm 64300 /rc.newwanip: Creating rrd update script Feb 6 15:00:10 kernel arpresolve: can't allocate llinfo for 192.168.100.1 on sk0 Feb 6 15:00:10 php-fpm 64300 /rc.newwanip: rc.newwanip: Info: starting on sk0. Feb 6 15:00:10 php-fpm 64300 /rc.newwanip: rc.newwanip: on (IP address: "My new external IP") (interface: WAN[wan]) (real interface: sk0). Feb 6 15:00:10 php-fpm 64300 /rc.newwanip: IP has changed, killing states on former IP 192.168.100.11. Feb 6 15:00:10 xinetd 16971 Starting reconfiguration Feb 6 15:00:10 xinetd 16971 Swapping defaults Feb 6 15:00:10 xinetd 16971 readjusting service 6969-udp Feb 6 15:00:10 xinetd 16971 Reconfigured: new=0 old=1 dropped=0 (services) Feb 6 15:00:10 php-fpm 64300 /rc.newwanip: ROUTING: setting default route to "My external IP" Feb 6 15:00:10 php-fpm 64300 /rc.newwanip: Removing static route for monitor 8.8.8.8 and adding a new route through "My external IP" Feb 6 15:00:09 kernel arpresolve: can't allocate llinfo for 192.168.100.1 on sk0 Feb 6 15:00:09 check_reload_status rc.newwanip starting sk0 Feb 6 15:00:09 kernel arpresolve: can't allocate llinfo for 192.168.100.1 on sk0 Feb 6 15:00:08 kernel arpresolve: can't allocate llinfo for 192.168.100.1 on sk0 Feb 6 15:00:08 kernel arpresolve: can't allocate llinfo for 192.168.100.1 on sk0 Feb 6 15:00:07 kernel arpresolve: can't allocate llinfo for 192.168.100.1 on sk0 Feb 6 15:00:07 kernel arpresolve: can't allocate llinfo for 192.168.100.1 on sk0 Feb 6 14:59:58 kernel arpresolve: can't allocate llinfo for "My external IP" on sk0 Feb 6 14:59:54 kernel arpresolve: can't allocate llinfo for "My external IP" on sk0 Feb 6 14:59:50 kernel arpresolve: can't allocate llinfo for "My external IP" on sk0 Feb 6 14:59:47 kernel arpresolve: can't allocate llinfo for "My external IP" on sk0 Feb 6 14:59:46 kernel arpresolve: can't allocate llinfo for "My external IP" on sk0 Feb 6 14:59:42 kernel arpresolve: can't allocate llinfo for "My external IP" on sk0 Feb 6 14:59:38 kernel arpresolve: can't allocate llinfo for "My external IP" on sk0 Feb 6 14:59:38 kernel arpresolve: can't allocate llinfo for "My external IP" on sk0 Feb 6 14:59:34 kernel arpresolve: can't allocate llinfo for "My external IP" on sk0 Feb 6 14:59:30 kernel arpresolve: can't allocate llinfo for "My external IP" on sk0 Feb 6 14:59:28 kernel arpresolve: can't allocate llinfo for "My external IP" on sk0 Feb 6 14:59:27 kernel arpresolve: can't allocate llinfo for "My external IP" on sk0 Feb 6 14:59:26 kernel arpresolve: can't allocate llinfo for "My external IP" on sk0 Feb 6 14:59:26 kernel arpresolve: can't allocate llinfo for "My external IP" on sk0 Feb 6 14:59:23 kernel arpresolve: can't allocate llinfo for "My external IP" on sk0 Feb 6 14:59:21 kernel arpresolve: can't allocate llinfo for "My external IP" on sk0 Feb 6 14:59:19 kernel arpresolve: can't allocate llinfo for "My external IP" on sk0 Feb 6 14:59:19 kernel arpresolve: can't allocate llinfo for "My external IP" on sk0 Feb 6 14:59:19 kernel arpresolve: can't allocate llinfo for "My external IP" on sk0 Feb 6 14:59:19 kernel arpresolve: can't allocate llinfo for "My external IP" on sk0 Feb 6 14:59:18 kernel arpresolve: can't allocate llinfo for "My external IP" on sk0 Feb 6 14:59:14 kernel arpresolve: can't allocate llinfo for "My external IP" on sk0 Feb 6 14:59:14 kernel arpresolve: can't allocate llinfo for "My external IP" on sk0 Feb 6 14:59:12 php lcdproc: Start client procedure. Error counter: (0) Feb 6 14:59:10 xinetd 16971 Starting reconfiguration Feb 6 14:59:10 xinetd 16971 Swapping defaults Feb 6 14:59:10 xinetd 16971 readjusting service 6969-udp Feb 6 14:59:10 xinetd 16971 Reconfigured: new=0 old=1 dropped=0 (services) Feb 6 14:59:10 php-fpm 22955 /rc.start_packages: Restarting/Starting all packages. Feb 6 14:59:10 php-fpm 22955 lcdproc: Sync: Begin package sync Feb 6 14:59:10 php-fpm 22955 lcdproc: Sync: End package sync Feb 6 14:59:10 kernel arpresolve: can't allocate llinfo for "My external IP" on sk0 Feb 6 14:59:10 check_reload_status Syncing firewall Feb 6 14:59:09 check_reload_status updating dyndns WAN_DHCP Feb 6 14:59:09 check_reload_status Restarting ipsec tunnels Feb 6 14:59:09 check_reload_status Restarting OpenVPN tunnels/interfaces Feb 6 14:59:09 check_reload_status Reloading filter Feb 6 14:59:09 kernel arpresolve: can't allocate llinfo for "My external IP" on sk0 Feb 6 14:59:09 php-fpm 22955 /rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - 0.0.0.0 -> 192.168.100.11 - Restarting packages. Feb 6 14:59:09 check_reload_status Starting packages Feb 6 14:59:07 kernel arpresolve: can't allocate llinfo for "My external IP" on sk0 Feb 6 14:59:07 kernel arpresolve: can't allocate llinfo for "My external IP" on sk0 Feb 6 14:59:07 php-fpm 22955 /rc.newwanip: Resyncing OpenVPN instances for interface WAN. Feb 6 14:59:07 php-fpm 22955 /rc.newwanip: Creating rrd update script Feb 6 14:59:06 xinetd 16971 readjusting service 6969-udp Feb 6 14:59:06 xinetd 16971 Reconfigured: new=0 old=1 dropped=0 (services) Feb 6 14:59:06 kernel arpresolve: can't allocate llinfo for "My external IP" on sk0
The gateway error log shows a latency alarm at 14:59, so could this trigger the interface going down, or is it a symptom of it being down? Also, I've replaced the wan IP with "My external IP". But it should be said the IP in the below log was different. I gather this means that Comcast assigned me a new external IP, and the interface could not handle the change.
Time Process PID Message Feb 6 18:09:36 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 8.8.8.8 bind_addr "My external IP" identifier "WAN_DHCP " Feb 6 15:00:10 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 8.8.8.8 bind_addr "My external IP" identifier "WAN_DHCP " Feb 6 14:59:09 dpinger WAN_DHCP 8.8.8.8: Alarm latency 0us stddev 0us loss 100% Feb 6 14:59:06 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 8.8.8.8 bind_addr 192.168.100.11 identifier "WAN_DHCP " Feb 6 14:58:28 dpinger WAN_DHCP 8.8.8.8: Alarm latency 15968us stddev 866us loss 21% Feb 3 11:23:50 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 8.8.8.8 bind_addr "My external IP" identifier "WAN_DHCP " Feb 3 11:23:49 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 8.8.8.8 bind_addr "My external IP" identifier "WAN_DHCP "
-
1 more thing, this is from the cable modem log:
Mon Feb 06 14:59:54 2017 Critical (3) Received Response to Broadcast Maintenance Request, But no Unicast Maintenance opportunities received - T4 time out
So the cable modem had a hiccup with the comcast signal, and when the gateway (wan) handled the change, one interface (lan) was able to cope with it, but one (opt1) was not. I fear I'm out of my league here as they're setup the same.