Individual Interfaces stop connecting to internet



  • I have a firebox 550 with 2gb ram running 2.3.2-RELEASE-p1 (i386) that loses internet connectivity every other day or so.  It isn't predictable, everything will be fine and I'll be out of town on business and my wife will call and tell me the "internet is broken".  It has been going on for over a year to varying degrees over the previous pfsense releases as well.  I have in that time replaced the wifi ap, the switch, the obi box, the cable modem, even some cables to fix the problem.  Sometimes I'll go for two weeks without the drop.  Lately it has been every couple days. Traditionally I've just told my wife to reset the power switch on the firebox to "fix the broken internet". Rebooting the firewall will always fix the issue.

    Here's the quirk.  Recently, while I was home, I lost internet on my laptop that was using wireless from LAN2.  When I looked up, the SmartTV (hardwired on LAN1) netflix was still happily streaming.  Plus I was still able to login to pfsense wirelessly, so I assume the AP is fine.  When I logged in, the dashboard will show all three interfaces as green arrows, the Gateway (WAN) as "Online" and the cable modem lights are normal.  I'm convinced that all the hardware is fine, that I have a setting that is causing my headache.  It can happen the other away around as well.  LAN1 will lose internet, but LAN2(my wireless) will be humming along fine.

    I can also "fix" the problem by hot unplugging the offending interface's cable and plugging it back in.  Bam!, internet is back.

    Summary: The firewall says an interface is up, I can communicate through the interface with pfsense, just not the internet. Rebooting or resetting the interface fixes the problem.

    My best guess is that this is a DNS problem. I just found an old post suggesting changing my monitor IP of the WAN gateway to something other than itself like google's 8.8.8.8. (BTW I Use Google's DNS servers  I just applied that change.  Fingers crossed.  If I make it to SuperBowl without a drop, I'll let you guys know that was the problem.

    If you're still reading, thank you very much.  I want to offer up as much info as you may need short of dumping all my logs.  Let me know if a specific command readout will tell you what you want to know about my setup.

    Quick rundown of hardware setup:

    Comcast 200 MBPS "Blast" Cable data plan
    Cable -> Arris SB6183 Cable modem -> Firebox 
    
    Firebox port 1 is Wan IP assigned by comcast, this port takes the cable from the cable modem
    Firebox port 2 is Lan1 LAN 192.168.2.1, this then connects to a Netgear smart switch (almost entirely on default settings), and about 20 cat6 lines I ran all throughout the house.  They run everything from my son's playstation, smartTV's, 2 printers, a couple workstations, my garage door gateway, an OBI box, etc.
    Firebox port 3 is Lan2 OPT1 192.168.3.1,  which connects to a ddwrt equipped TP-LINK Archer C7 AC1750 running in AP mode, which is for wireless clients (many phones, tablets, and laptops in my home)  I separated the wifi onto its own subnet in an earlier attempt to troubleshoot the connectivity problem.  Too lazy to merge it back with the wired network through the switch.
    Firebox port 4 is OPT2 and is empty.
    

    current sysinfo (everything is working currently)

    Name	xxxx
    System	embedded (unknown)	
    Serial: xxxx
    Version	2.3.2-RELEASE-p1 (i386) 
    built on Tue Sep 27 12:13:32 CDT 2016 
    FreeBSD 10.3-RELEASE-p9 
    
    The system is on the latest version.
    Platform	nanobsd	(2g)
    NanoBSD Boot Slice	pfsense0 / ada0s1(rw)
    CPU Type	Intel(R) Pentium(R) M processor 2.00GHz
    Current: 1600 MHz, Max: 2000 MHz
    Uptime	03 Hours 50 Minutes 42 Seconds
    Current date/time	
    Tue Jan 31 23:20:06 PST 2017
    DNS server(s)	
    127.0.0.1
    8.8.8.8
    8.8.4.4
    Last config change	Tue Jan 31 22:21:47 PST 2017
    State table size	
    0% (113/200000) Show states
    MBUF Usage	
    7% (1776/26368)
    Load average	
    0.08, 0.12, 0.09
    CPU usage	
    3%
    Memory usage	
    5% of 2007 MiB
    Disk usage ( / )	
    64% of 908MiB - ufs
    Disk usage ( /cf )	
    9% of 49MiB - ufs
    Disk usage ( /tmp )	
    0% of 38MiB - ufs in RAM
    Disk usage ( /var )	
    35% of 58MiB - ufs in RAM
    

    netstat -m (everything working)

    1847/688/2535 mbufs in use (current/cache/total)
    1434/342/1776/26368 mbuf clusters in use (current/cache/total/max)
    1434/337 mbuf+clusters out of packet secondary zone in use (current/cache)
    1/21/22/13184 4k (page size) jumbo clusters in use (current/cache/total/max)
    0/0/0/3906 9k jumbo clusters in use (current/cache/total/max)
    0/0/0/2197 16k jumbo clusters in use (current/cache/total/max)
    3346K/940K/4286K bytes allocated to network (current/cache/total)
    0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
    0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
    0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
    0/0/0 requests for jumbo clusters denied (4k/9k/16k)
    0/7/6656 sfbufs in use (current/peak/max)
    0 requests for sfbufs denied
    0 requests for sfbufs delayed
    0 requests for I/O initiated by sendfile
    

    ifconfig -a (everything working)

    sk0: flags=8843 <up,broadcast,running,simplex,multicast>metric 0 mtu 1500
    	options=8000b <rxcsum,txcsum,vlan_mtu,linkstate>ether 00:90:7f:45:1f:05
    	inet6 fe80::290:7fff:fe45:1f05%sk0 prefixlen 64 scopeid 0x1
    	inet 73.xx.xx.xx netmask 0xfffffe00 broadcast 255.255.255.255
    	nd6 options=21 <performnud,auto_linklocal>media: Ethernet autoselect (1000baseT <full-duplex,master>)
    	status: active
    sk1: flags=8943 <up,broadcast,running,promisc,simplex,multicast>metric 0 mtu 1500
    	options=8000b <rxcsum,txcsum,vlan_mtu,linkstate>ether 00:90:7f:45:1f:04
    	inet6 fe80::290:7fff:fe45:1f04%sk1 prefixlen 64 scopeid 0x2
    	inet 192.168.2.1 netmask 0xffffff00 broadcast 192.168.2.255
    	nd6 options=21 <performnud,auto_linklocal>media: Ethernet autoselect (1000baseT <full-duplex>)
    	status: active
    sk2: flags=8943 <up,broadcast,running,promisc,simplex,multicast>metric 0 mtu 1500
    	options=8000b <rxcsum,txcsum,vlan_mtu,linkstate>ether 00:90:7f:45:1f:03
    	inet6 fe80::290:7fff:fe45:1f03%sk2 prefixlen 64 scopeid 0x3
    	inet 192.168.3.1 netmask 0xffffff00 broadcast 192.168.3.255
    	nd6 options=21 <performnud,auto_linklocal>media: Ethernet autoselect (1000baseT <full-duplex,master>)
    	status: active
    sk3: flags=8943 <up,broadcast,running,promisc,simplex,multicast>metric 0 mtu 1500
    	options=8000b <rxcsum,txcsum,vlan_mtu,linkstate>ether 00:90:7f:45:1f:02
    	inet6 fe80::290:7fff:fe45:1f02%sk3 prefixlen 64 scopeid 0x4
    	nd6 options=21 <performnud,auto_linklocal>media: Ethernet autoselect (none)
    	status: no carrier
    pflog0: flags=100 <promisc>metric 0 mtu 33184
    pfsync0: flags=0<> metric 0 mtu 1500
    	syncpeer: 224.0.0.240 maxupd: 128 defer: on
    	syncok: 1
    enc0: flags=0<> metric 0 mtu 1536
    	nd6 options=21 <performnud,auto_linklocal>lo0: flags=8049 <up,loopback,running,multicast>metric 0 mtu 16384
    	options=600003 <rxcsum,txcsum,rxcsum_ipv6,txcsum_ipv6>inet 127.0.0.1 netmask 0xff000000
    	inet6 ::1 prefixlen 128
    	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x8
    	nd6 options=21 <performnud,auto_linklocal>bridge0: flags=8843 <up,broadcast,running,simplex,multicast>metric 0 mtu 1500
    	ether 02:c6:fa:fa:15:00
    	nd6 options=1 <performnud>id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
    	maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
    	root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
    	member: sk3 flags=143 <learning,discover,autoedge,autoptp>ifmaxaddr 0 port 4 priority 128 path cost 55
    	member: sk2 flags=143 <learning,discover,autoedge,autoptp>ifmaxaddr 0 port 3 priority 128 path cost 55
    	member: sk1 flags=143 <learning,discover,autoedge,autoptp>ifmaxaddr 0 port 2 priority 128 path cost 55</learning,discover,autoedge,autoptp></learning,discover,autoedge,autoptp></learning,discover,autoedge,autoptp></performnud></up,broadcast,running,simplex,multicast></performnud,auto_linklocal></rxcsum,txcsum,rxcsum_ipv6,txcsum_ipv6></up,loopback,running,multicast></performnud,auto_linklocal></promisc></performnud,auto_linklocal></rxcsum,txcsum,vlan_mtu,linkstate></up,broadcast,running,promisc,simplex,multicast></full-duplex,master></performnud,auto_linklocal></rxcsum,txcsum,vlan_mtu,linkstate></up,broadcast,running,promisc,simplex,multicast></full-duplex></performnud,auto_linklocal></rxcsum,txcsum,vlan_mtu,linkstate></up,broadcast,running,promisc,simplex,multicast></full-duplex,master></performnud,auto_linklocal></rxcsum,txcsum,vlan_mtu,linkstate></up,broadcast,running,simplex,multicast>
    


  • I've been seeing a similar problem on version 2.3.2, and it's been random virtual interfaces that cut out for a moment and then come back (ie: probably not hardware related). I posted details here in case our issues turn out to have similar roots:

    https://forum.pfsense.org/index.php?topic=124909.0



  • Followup, everything was working fine since I made my original post, and was actually starting to believe that the monitor IP on the WAN gateway was the fix.  Alas, came home and OPT1 was down, although the other internal interface (LAN) was up.  Unplugging the cable from the nic assigned to opt1 and reinserting brought the internet back.  So very annoying.  I hot unplugged and plugged at 18:09.  The recurring error is "arpresolve: can't allocate llinfo"  And what I've read this can point to a driver error (old firebox not keeping up with new revisions).  As usual, any help is appreciated.

    Here's the error log, looks like the interface was down since before 15:00.

    Feb 6 18:10:12	php-fpm	38594	/index.php: Successful login for user 'admin' from: 192.168.3.29
    Feb 6 18:09:42	php		lcdproc: Start client procedure. Error counter: (0)
    Feb 6 18:09:40	php-fpm	38594	/rc.start_packages: Restarting/Starting all packages.
    Feb 6 18:09:40	php-fpm	38594	lcdproc: Sync: Begin package sync
    Feb 6 18:09:40	php-fpm	38594	lcdproc: Sync: End package sync
    Feb 6 18:09:40	check_reload_status		Syncing firewall
    Feb 6 18:09:39	php-fpm	46277	/rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - 0.0.0.0 -> 192.168.3.1 - Restarting packages.
    Feb 6 18:09:39	check_reload_status		Starting packages
    Feb 6 18:09:37	php-fpm	46277	/rc.newwanip: Resyncing OpenVPN instances for interface OPT1.
    Feb 6 18:09:37	php-fpm	46277	/rc.newwanip: Creating rrd update script
    Feb 6 18:09:36	xinetd	16971	Starting reconfiguration
    Feb 6 18:09:36	xinetd	16971	Swapping defaults
    Feb 6 18:09:36	xinetd	16971	readjusting service 6969-udp
    Feb 6 18:09:36	xinetd	16971	Reconfigured: new=0 old=1 dropped=0 (services)
    Feb 6 18:09:36	php-fpm	46277	/rc.newwanip: Removing static route for monitor 8.8.8.8 and adding a new route through "My external IP"
    Feb 6 18:09:36	xinetd	16971	Starting reconfiguration
    Feb 6 18:09:36	xinetd	16971	Swapping defaults
    Feb 6 18:09:36	xinetd	16971	readjusting service 6969-udp
    Feb 6 18:09:36	xinetd	16971	Reconfigured: new=0 old=1 dropped=0 (services)
    Feb 6 18:09:35	php-fpm	46277	/rc.newwanip: rc.newwanip: Info: starting on sk2.
    Feb 6 18:09:35	php-fpm	46277	/rc.newwanip: rc.newwanip: on (IP address: 192.168.3.1) (interface: OPT1[opt1]) (real interface: sk2).
    Feb 6 18:09:35	php-fpm	46277	/rc.newwanip: IP has changed, killing states on former IP 0.0.0.0.
    Feb 6 18:09:34	php-fpm	46277	/rc.linkup: Hotplug event detected for OPT1(opt1) static IP (192.168.3.1 )
    Feb 6 18:09:34	check_reload_status		rc.newwanip starting sk2
    Feb 6 18:09:34	check_reload_status		Reloading filter
    Feb 6 18:09:33	check_reload_status		Linkup starting sk2
    Feb 6 18:09:33	kernel		sk2: link state changed to UP
    Feb 6 15:00:17	php		lcdproc: Start client procedure. Error counter: (0)
    Feb 6 15:00:15	check_reload_status		Syncing firewall
    Feb 6 15:00:14	php-fpm	46277	/rc.start_packages: Restarting/Starting all packages.
    Feb 6 15:00:14	php-fpm	46277	lcdproc: Sync: Begin package sync
    Feb 6 15:00:14	php-fpm	46277	lcdproc: Sync: End package sync
    Feb 6 15:00:13	php-fpm	64300	/rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - 192.168.100.11 -> "My new external IP" - Restarting packages.
    Feb 6 15:00:13	check_reload_status		Starting packages
    Feb 6 15:00:11	php-fpm	64300	/rc.newwanip: Resyncing OpenVPN instances for interface WAN.
    Feb 6 15:00:11	php-fpm	64300	/rc.newwanip: Creating rrd update script
    Feb 6 15:00:10	kernel		arpresolve: can't allocate llinfo for 192.168.100.1 on sk0
    Feb 6 15:00:10	php-fpm	64300	/rc.newwanip: rc.newwanip: Info: starting on sk0.
    Feb 6 15:00:10	php-fpm	64300	/rc.newwanip: rc.newwanip: on (IP address: "My new external IP") (interface: WAN[wan]) (real interface: sk0).
    Feb 6 15:00:10	php-fpm	64300	/rc.newwanip: IP has changed, killing states on former IP 192.168.100.11.
    Feb 6 15:00:10	xinetd	16971	Starting reconfiguration
    Feb 6 15:00:10	xinetd	16971	Swapping defaults
    Feb 6 15:00:10	xinetd	16971	readjusting service 6969-udp
    Feb 6 15:00:10	xinetd	16971	Reconfigured: new=0 old=1 dropped=0 (services)
    Feb 6 15:00:10	php-fpm	64300	/rc.newwanip: ROUTING: setting default route to "My external IP"
    Feb 6 15:00:10	php-fpm	64300	/rc.newwanip: Removing static route for monitor 8.8.8.8 and adding a new route through "My external IP"
    Feb 6 15:00:09	kernel		arpresolve: can't allocate llinfo for 192.168.100.1 on sk0
    Feb 6 15:00:09	check_reload_status		rc.newwanip starting sk0
    Feb 6 15:00:09	kernel		arpresolve: can't allocate llinfo for 192.168.100.1 on sk0
    Feb 6 15:00:08	kernel		arpresolve: can't allocate llinfo for 192.168.100.1 on sk0
    Feb 6 15:00:08	kernel		arpresolve: can't allocate llinfo for 192.168.100.1 on sk0
    Feb 6 15:00:07	kernel		arpresolve: can't allocate llinfo for 192.168.100.1 on sk0
    Feb 6 15:00:07	kernel		arpresolve: can't allocate llinfo for 192.168.100.1 on sk0
    Feb 6 14:59:58	kernel		arpresolve: can't allocate llinfo for "My external IP" on sk0
    Feb 6 14:59:54	kernel		arpresolve: can't allocate llinfo for "My external IP" on sk0
    Feb 6 14:59:50	kernel		arpresolve: can't allocate llinfo for "My external IP" on sk0
    Feb 6 14:59:47	kernel		arpresolve: can't allocate llinfo for "My external IP" on sk0
    Feb 6 14:59:46	kernel		arpresolve: can't allocate llinfo for "My external IP" on sk0
    Feb 6 14:59:42	kernel		arpresolve: can't allocate llinfo for "My external IP" on sk0
    Feb 6 14:59:38	kernel		arpresolve: can't allocate llinfo for "My external IP" on sk0
    Feb 6 14:59:38	kernel		arpresolve: can't allocate llinfo for "My external IP" on sk0
    Feb 6 14:59:34	kernel		arpresolve: can't allocate llinfo for "My external IP" on sk0
    Feb 6 14:59:30	kernel		arpresolve: can't allocate llinfo for "My external IP" on sk0
    Feb 6 14:59:28	kernel		arpresolve: can't allocate llinfo for "My external IP" on sk0
    Feb 6 14:59:27	kernel		arpresolve: can't allocate llinfo for "My external IP" on sk0
    Feb 6 14:59:26	kernel		arpresolve: can't allocate llinfo for "My external IP" on sk0
    Feb 6 14:59:26	kernel		arpresolve: can't allocate llinfo for "My external IP" on sk0
    Feb 6 14:59:23	kernel		arpresolve: can't allocate llinfo for "My external IP" on sk0
    Feb 6 14:59:21	kernel		arpresolve: can't allocate llinfo for "My external IP" on sk0
    Feb 6 14:59:19	kernel		arpresolve: can't allocate llinfo for "My external IP" on sk0
    Feb 6 14:59:19	kernel		arpresolve: can't allocate llinfo for "My external IP" on sk0
    Feb 6 14:59:19	kernel		arpresolve: can't allocate llinfo for "My external IP" on sk0
    Feb 6 14:59:19	kernel		arpresolve: can't allocate llinfo for "My external IP" on sk0
    Feb 6 14:59:18	kernel		arpresolve: can't allocate llinfo for "My external IP" on sk0
    Feb 6 14:59:14	kernel		arpresolve: can't allocate llinfo for "My external IP" on sk0
    Feb 6 14:59:14	kernel		arpresolve: can't allocate llinfo for "My external IP" on sk0
    Feb 6 14:59:12	php		lcdproc: Start client procedure. Error counter: (0)
    Feb 6 14:59:10	xinetd	16971	Starting reconfiguration
    Feb 6 14:59:10	xinetd	16971	Swapping defaults
    Feb 6 14:59:10	xinetd	16971	readjusting service 6969-udp
    Feb 6 14:59:10	xinetd	16971	Reconfigured: new=0 old=1 dropped=0 (services)
    Feb 6 14:59:10	php-fpm	22955	/rc.start_packages: Restarting/Starting all packages.
    Feb 6 14:59:10	php-fpm	22955	lcdproc: Sync: Begin package sync
    Feb 6 14:59:10	php-fpm	22955	lcdproc: Sync: End package sync
    Feb 6 14:59:10	kernel		arpresolve: can't allocate llinfo for "My external IP" on sk0
    Feb 6 14:59:10	check_reload_status		Syncing firewall
    Feb 6 14:59:09	check_reload_status		updating dyndns WAN_DHCP
    Feb 6 14:59:09	check_reload_status		Restarting ipsec tunnels
    Feb 6 14:59:09	check_reload_status		Restarting OpenVPN tunnels/interfaces
    Feb 6 14:59:09	check_reload_status		Reloading filter
    Feb 6 14:59:09	kernel		arpresolve: can't allocate llinfo for "My external IP" on sk0
    Feb 6 14:59:09	php-fpm	22955	/rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - 0.0.0.0 -> 192.168.100.11 - Restarting packages.
    Feb 6 14:59:09	check_reload_status		Starting packages
    Feb 6 14:59:07	kernel		arpresolve: can't allocate llinfo for "My external IP" on sk0
    Feb 6 14:59:07	kernel		arpresolve: can't allocate llinfo for "My external IP" on sk0
    Feb 6 14:59:07	php-fpm	22955	/rc.newwanip: Resyncing OpenVPN instances for interface WAN.
    Feb 6 14:59:07	php-fpm	22955	/rc.newwanip: Creating rrd update script
    Feb 6 14:59:06	xinetd	16971	readjusting service 6969-udp
    Feb 6 14:59:06	xinetd	16971	Reconfigured: new=0 old=1 dropped=0 (services)
    Feb 6 14:59:06	kernel		arpresolve: can't allocate llinfo for "My external IP" on sk0
    

    The gateway error log shows a latency alarm at 14:59, so could this trigger the interface going down, or is it a symptom of it being down?  Also, I've replaced the wan IP with "My external IP".  But it should be said the IP in the below log was different.  I gather this means that Comcast assigned me a new external IP, and the interface could not handle the change.

    Time	Process	PID	Message
    Feb 6 18:09:36	dpinger		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 8.8.8.8 bind_addr "My external IP" identifier "WAN_DHCP "
    Feb 6 15:00:10	dpinger		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 8.8.8.8 bind_addr "My external IP" identifier "WAN_DHCP "
    Feb 6 14:59:09	dpinger		WAN_DHCP 8.8.8.8: Alarm latency 0us stddev 0us loss 100%
    Feb 6 14:59:06	dpinger		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 8.8.8.8 bind_addr 192.168.100.11 identifier "WAN_DHCP "
    Feb 6 14:58:28	dpinger		WAN_DHCP 8.8.8.8: Alarm latency 15968us stddev 866us loss 21%
    Feb 3 11:23:50	dpinger		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 8.8.8.8 bind_addr "My external IP" identifier "WAN_DHCP "
    Feb 3 11:23:49	dpinger		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 8.8.8.8 bind_addr "My external IP" identifier "WAN_DHCP "
    


  • 1 more thing, this is from the cable modem log:

    Mon Feb 06 14:59:54 2017	Critical (3)	Received Response to Broadcast Maintenance Request, But no Unicast Maintenance opportunities received - T4 time out
    

    So the cable modem had a hiccup with the comcast signal, and when the gateway (wan) handled the change, one interface (lan) was able to cope with it, but one (opt1) was not.  I fear I'm out of my league here as they're setup the same.