Internet Connection Drops. PFSense 2.1



  • First let me say I have spent about an hour searching the forums for similar cases. I have found a few, but nothing that exactly fits.

    Problem: I have a cable internet service that has been having issues. For the last 5-7 weeks the connection has dropped at random times. The cable company has acknowledged the issue and is sure it has been fixed, but I still see internet outages with some frequency (due to packet loss).

    Background info: PFSense is running directly behind the cable modem. I have set up smokeping on 2 computers behind PFSense to monitor half a dozen sites, both by DNS name and by actual IP. They report packet loss which agrees with PFSense's packet loss RRD graph. I have been running PFSense with the same setup for the last 2 years (updating as required), so I am now on PFSense 2.1. I have taken the extraordinary step of installing a second PFSense instance and turning off the first, thinking that maybe upgrading between versions left some cruft that was exposed by my ISP's issues.

    PFSense problems: I believe that at this point it is actually PFSense's fault. The modem's uptime has been good, the signal-to-noise ratio has normalized, etc.

    Troubleshooting done: I put a switch between the modem and the PFSense box (my internet provider allows me to pull up to 3 IPs off my modem). One port of the switch goes to PFSense and then to the rest of the house. The other port goes to a spare machine which I have wiped, hardened and installed smokeping on. Over the last several days I have noted 100% packet loss 2-3 times a day for PFSense and the machines behind it. On the hardened machine on the switch, no such service interruptions have been seen. This is why I believe it to be a PFSense issue.

    I have not changed anything on the PFSense box since its initial setup, so I am not really sure where the problem is. I have changed the machines (and thus the NICs), and I have changed the Ethernet cables as well. I have tried virtualizing PFSense and importing the configs, with the same result.

    I am not doing anything overly complex with PFSense. I have half a dozen forwarding rules, I am running OpenVPN, and I have only 3 packages installed: OpenVPN, BandwidthD and RRD Summary.

    I have tried dropping the interfaces to 100TX/Full Duplex, as suggested in a thread here, but that has not made any appreciable difference.

    Observation: the problem clears itself in 15-25 minutes. If I reboot PFSense instead, the connection is restored as soon as PFSense comes back online.

    I would appreciate any troubleshooting hints/tips. I am a Linux admin by trade, but this is my only experience with anything BSD, so when saying things like "check this log" I would appreciate the kindness of specifying the location of said resource.

    Cheers!



  • So I have confirmed that it is definitely a PFSense issue.

    The little machine, while registering some packet loss, does not go down. I have noticed that my modem attempts to hand out an RFC1918 IP, which I have since blocked. I have wiped the box and started fresh instead of importing my previous settings. This did not seem to make a difference.

    PFSense thinks that the WAN gateway is going down:

    Mar 5 21:29:05	check_reload_status: Reloading filter
    Mar 5 21:28:55	apinger: alarm canceled: WAN(198.84.152.1) *** down ***
    Mar 5 21:23:15	check_reload_status: Reloading filter
    Mar 5 21:23:05	apinger: ALARM: WAN(198.84.152.1) *** down ***
    Mar 5 21:09:08	check_reload_status: Reloading filter
    Mar 5 21:08:58	apinger: alarm canceled: WAN(198.84.152.1) *** down ***
    Mar 5 21:02:19	check_reload_status: Reloading filter
    Mar 5 21:02:09	apinger: ALARM: WAN(198.84.152.1) *** down ***
    Mar 5 18:49:35	check_reload_status: Reloading filter
    Mar 5 18:49:25	apinger: alarm canceled: WAN(198.84.152.1) *** down ***
    Mar 5 18:32:04	check_reload_status: Reloading filter
    Mar 5 18:31:54	apinger: ALARM: WAN(198.84.152.1) *** down ***
    

    However, there is no outage reported by the machine that is exposed to the web. I have taken the step of wiping the machine again and installing 2.0.3 to see if it is a regression in 2.1. So far I am at a loss.
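    To put numbers on those alarms, here is a minimal sketch that pairs each apinger "ALARM" line with the following "alarm canceled" line and reports how long the gateway was marked down. The log lines are copied from the excerpt above; the function name `outage_durations` and the year 2014 are assumptions (the system log lines carry no year).

```python
from datetime import datetime

# apinger lines copied from the system log excerpt above (newest first).
LOG = """\
Mar 5 21:29:05 check_reload_status: Reloading filter
Mar 5 21:28:55 apinger: alarm canceled: WAN(198.84.152.1) *** down ***
Mar 5 21:23:05 apinger: ALARM: WAN(198.84.152.1) *** down ***
Mar 5 21:08:58 apinger: alarm canceled: WAN(198.84.152.1) *** down ***
Mar 5 21:02:09 apinger: ALARM: WAN(198.84.152.1) *** down ***
Mar 5 18:49:25 apinger: alarm canceled: WAN(198.84.152.1) *** down ***
Mar 5 18:31:54 apinger: ALARM: WAN(198.84.152.1) *** down ***
"""


def outage_durations(log_text, year=2014):
    """Return (start, duration) pairs for each ALARM / alarm canceled pair.

    The year is an assumption because the log lines do not include one.
    """
    events = []
    for line in log_text.splitlines():
        parts = line.split(None, 3)  # month, day, time, rest of the line
        if len(parts) < 4 or not parts[3].startswith("apinger:"):
            continue
        stamp = datetime.strptime(
            f"{year} {parts[0]} {parts[1]} {parts[2]}", "%Y %b %d %H:%M:%S"
        )
        if "alarm canceled" in parts[3]:
            events.append((stamp, "up"))
        elif "ALARM" in parts[3]:
            events.append((stamp, "down"))

    events.sort()  # the log is newest first; walk it oldest first
    outages = []
    down_since = None
    for stamp, kind in events:
        if kind == "down":
            down_since = stamp
        elif down_since is not None:
            outages.append((down_since, stamp - down_since))
            down_since = None
    return outages


for start, duration in outage_durations(LOG):
    print(start.strftime("%b %d %H:%M:%S"), "down for", duration)
```

    The same function can be pointed at a longer export of the system log to see whether the down periods cluster at particular times of day.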

    Any help would really be appreciated



  • If you disable gateway monitoring, does the connection stabilize? Have you tried going to an earlier version of pfSense to see if there was a regression?



  • Thanks for the reply.

    I will try disabling the gateway monitoring at a later time. As I stated, I have actually installed 2.0.3 just last night. I am going to run it for a few days to see what happens. If the problem persists, I will disable gateway monitoring as you suggest.

    If disabling the gateway monitoring stabilizes things, I will try moving to 2.1 and run with gateway monitoring disabled for several days as well.

    Thanks for the suggestion.

    It will probably be a few days before I post again, as I would like to establish a pattern on PFSense, given that there is indeed a problem on my ISP's side as well.

    I find it interesting that these problems crop up now after being stable for almost 3 years. Is PFSense really that sensitive to packet loss?



  • Default values are, but you can change them in the gateway monitoring advanced settings menu.



  • Are you saturating your upload when this happens?



  • No, it happens randomly throughout the day/night cycle. Sometimes it happens when my wife is playing WoW, other times when we are using Netflix, and I have also recorded outages when no one is home (I have smokeping set up on 2 computers and confirmed the outages with the RRD graphs in PFSense).



  • So in the last day and a bit I have had 2 more instances of complete internet outages as seen by PFSense.

    I have gone to System -> Routing -> Edit Gateway and checked "Disable Gateway Monitoring".

    We'll see what happens.



  • After a few days with gateway monitoring disabled, I have noticed no appreciable change (still running 2.0.3). I have re-enabled gateway monitoring and set Frequency Probe to 3 and Down to 20, as well as the low water mark for Packet Loss to 25%.

    I am really hoping someone can point me in the right direction on this. I would hate to dump PFSense for another solution. The interruptions to service are very disheartening, especially considering I have migrated hardware and cables. The box outside of PFSense still has not registered a downed state during the same time periods.



  • Thought I would post an observation. Every time PFSense marks the WAN interface as offline, I see the following in the modem's log:

    
    Sun Mar 09 11:02:50 2014  	 Notice (6) 	 TLV-11 - unrecognized OID;CM-MAC=8c:04:ff:1a:ee:fa;CMTS-MAC=0... 
     Sun Mar 09 11:02:49 2014  	 Error (4) 	 Missing BP Configuration Setting TLV Type: 17.9;CM-MAC=8c:04:... 
     Sun Mar 09 11:02:49 2014  	 Error (4) 	 Missing BP Configuration Setting TLV Type: 17.8;CM-MAC=8c:04:... 
     Sun Mar 09 11:02:49 2014  	 Error (4) 	 Missing BP Configuration Setting TLV Type: 17.7;CM-MAC=8c:04:... 
     Sun Mar 09 11:02:49 2014  	 Error (4) 	 Missing BP Configuration Setting TLV Type: 17.6;CM-MAC=8c:04:... 
     Sun Mar 09 11:02:49 2014  	 Error (4) 	 Missing BP Configuration Setting TLV Type: 17.5;CM-MAC=8c:04:... 
     Sun Mar 09 11:02:49 2014  	 Error (4) 	 Missing BP Configuration Setting TLV Type: 17.4;CM-MAC=8c:04:... 
     Sun Mar 09 11:02:49 2014  	 Error (4) 	 Missing BP Configuration Setting TLV Type: 17.3;CM-MAC=8c:04:... 
     Sun Mar 09 11:02:49 2014  	 Error (4) 	 Missing BP Configuration Setting TLV Type: 17.2;CM-MAC=8c:04:... 
     Sun Mar 09 11:02:49 2014  	 Error (4) 	 Missing BP Configuration Setting TLV Type: 17.1;CM-MAC=8c:04:... 
     Sun Mar 09 11:02:49 2014  	 Warning (5) 	 DHCP WARNING - Non-critical field invalid in response ;CM-MAC... 
     Sun Mar 09 11:02:40 2014  	 Critical (3) 	 No Ranging Response received - T3 time-out;CM-MAC=8c:04:ff:1a... 
     Sat Mar 08 23:43:27 2014  	 Notice (6) 	 TLV-11 - unrecognized OID;CM-MAC=8c:04:ff:1a:ee:fa;CMTS-MAC=0... 
     Sat Mar 08 23:43:27 2014  	 Error (4) 	 Missing BP Configuration Setting TLV Type: 17.9;CM-MAC=8c:04:... 
     Sat Mar 08 23:43:27 2014  	 Error (4) 	 Missing BP Configuration Setting TLV Type: 17.8;CM-MAC=8c:04:... 
     Sat Mar 08 23:43:27 2014  	 Error (4) 	 Missing BP Configuration Setting TLV Type: 17.7;CM-MAC=8c:04:... 
     Sat Mar 08 23:43:27 2014  	 Error (4) 	 Missing BP Configuration Setting TLV Type: 17.6;CM-MAC=8c:04:... 
     Sat Mar 08 23:43:27 2014  	 Error (4) 	 Missing BP Configuration Setting TLV Type: 17.5;CM-MAC=8c:04:... 
     Sat Mar 08 23:43:27 2014  	 Error (4) 	 Missing BP Configuration Setting TLV Type: 17.4;CM-MAC=8c:04:... 
     Sat Mar 08 23:43:27 2014  	 Error (4) 	 Missing BP Configuration Setting TLV Type: 17.3;CM-MAC=8c:04:... 
     Sat Mar 08 23:43:27 2014  	 Error (4) 	 Missing BP Configuration Setting TLV Type: 17.2;CM-MAC=8c:04:... 
     Sat Mar 08 23:43:27 2014  	 Error (4) 	 Missing BP Configuration Setting TLV Type: 17.1;CM-MAC=8c:04:... 
     Sat Mar 08 23:43:27 2014  	 Warning (5) 	 DHCP WARNING - Non-critical field invalid in response ;CM-MAC... 
     Sat Mar 08 23:43:18 2014  	 Critical (3) 	 No Ranging Response received - T3 time-out;CM-MAC=8c:04:ff:1a... 
     Sat Mar 08 17:45:02 2014  	 Notice (6) 	 TLV-11 - unrecognized OID;CM-MAC=8c:04:ff:1a:ee:fa;CMTS-MAC=0... 
     Sat Mar 08 17:45:02 2014  	 Error (4) 	 Missing BP Configuration Setting TLV Type: 17.9;CM-MAC=8c:04:... 
     Sat Mar 08 17:45:02 2014  	 Error (4) 	 Missing BP Configuration Setting TLV Type: 17.8;CM-MAC=8c:04:... 
     Sat Mar 08 17:45:02 2014  	 Error (4) 	 Missing BP Configuration Setting TLV Type: 17.7;CM-MAC=8c:04:... 
     Sat Mar 08 17:45:02 2014  	 Error (4) 	 Missing BP Configuration Setting TLV Type: 17.6;CM-MAC=8c:04:... 
     Sat Mar 08 17:45:02 2014  	 Error (4) 	 Missing BP Configuration Setting TLV Type: 17.5;CM-MAC=8c:04:... 
     Sat Mar 08 17:45:02 2014  	 Error (4) 	 Missing BP Configuration Setting TLV Type: 17.4;CM-MAC=8c:04:... 
     Sat Mar 08 17:45:02 2014  	 Error (4) 	 Missing BP Configuration Setting TLV Type: 17.3;CM-MAC=8c:04:... 
    
    

    Is there anything that I can do to reduce the impact of this? Alternatively, is there a way to cycle the interface after it's been down for say 2 minutes?
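    One way the "cycle the interface after 2 minutes down" idea could be sketched is a small watchdog. This is only an illustration under several assumptions: Python is not part of a stock PFSense install, the names `DownTimer`, `gateway_reachable`, `cycle_interface` and `run` are made up for this example, the interface name passed to `run` (for example em0) is hypothetical, and on a DHCP WAN a plain `ifconfig` down/up may not be enough to renew the lease. Whether cycling the interface helps with this particular problem is also untested.

```python
import subprocess
import time


class DownTimer:
    """Tracks how long the gateway has been continuously unreachable."""

    def __init__(self, limit_s=120.0):
        self.limit_s = limit_s
        self.down_since = None

    def observe(self, reachable, now):
        """Record one probe result; return True when the interface should be cycled."""
        if reachable:
            self.down_since = None
            return False
        if self.down_since is None:
            self.down_since = now
        if now - self.down_since >= self.limit_s:
            self.down_since = None  # start a fresh countdown after cycling
            return True
        return False


def gateway_reachable(gateway):
    """Send a single ICMP echo request and report whether it was answered."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", gateway],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=5,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def cycle_interface(iface):
    """Take the interface down and bring it back up (needs root)."""
    subprocess.run(["ifconfig", iface, "down"], check=True)
    time.sleep(2)
    subprocess.run(["ifconfig", iface, "up"], check=True)


def run(gateway, iface, poll_s=10.0):
    """Poll the gateway forever and cycle the interface after 2 minutes down."""
    timer = DownTimer(limit_s=120.0)
    while True:
        if timer.observe(gateway_reachable(gateway), time.monotonic()):
            cycle_interface(iface)
        time.sleep(poll_s)
```

    Calling `run("198.84.152.1", "em0")` as root would start the loop, with the gateway address taken from the apinger log earlier in the thread. The timing decision lives in `DownTimer` so it can be checked without touching the network.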



  • I made the following adjustment yesterday:

    Routing -> Edit Gateway

    
    Probe Interval: 3
    
    Down: 60
    
    

    I don't know if it is just a fluke or not, but I did not register any outages last night. I will continue to monitor and update this post as I discover things.
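    A rough way to see why these values might ride through the short dropouts, as a toy model only and not a description of how apinger works internally. It assumes the probe interval is the spacing between pings in seconds, that Down is the number of seconds of failed probes before the alarm is raised, and that the defaults are 1 and 10; the function names are made up for this example. The 10-second outage used below is a guess based on the gap between the T3 time-out and the entries that follow it in the modem log posted earlier.

```python
import math


def probes_before_alarm(probe_interval_s, down_s):
    """Consecutive lost probes needed before the gateway is marked down."""
    return math.ceil(down_s / probe_interval_s)


def alarm_would_fire(outage_s, probe_interval_s, down_s):
    """Toy model: does a continuous outage of outage_s seconds trip the alarm?"""
    lost_probes = outage_s // probe_interval_s
    return lost_probes >= probes_before_alarm(probe_interval_s, down_s)


# Assumed defaults (interval 1 s, down 10 s) versus the values from this post.
for label, interval, down in (("defaults", 1, 10), ("adjusted", 3, 60)):
    print(
        label,
        "probes before alarm:", probes_before_alarm(interval, down),
        "| 10 s outage trips alarm:", alarm_would_fire(10, interval, down),
    )
```

    Under these assumptions a dropout of about 10 seconds is enough to mark the gateway down with the default values, while the adjusted values need roughly a full minute of lost probes.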



  • I have noted the same errors showing up in my modem's log, but this time I did not suffer the corresponding disruption in service.

    I will continue to monitor for the next few days and report back, but I am hopeful that this actually worked around the problem.



  • I think it might help work around the problem, but the issue with the ISP remains if you are also noticing issues in the modem.



  • @podilarius:

    I think it might help work around the problem, but the issue with the ISP remains if you are also noticing issues in the modem.

    This is true enough. My goal was to mitigate the problem. The ISP has already acknowledged a problem with a cable in the ground, which cannot be fixed until the ground thaws enough for them to dig.

    I have had another day without service interruption (aside from dropped packets). I hope this may be a viable workaround for others. A couple more days and I will mark this as solved.



  • @stratus:

    I made the following adjustment yesterday:

    Routing -> Edit Gateway

    
    Probe Interval: 3
    
    Down: 60
    
    

    I don't know if it is just a fluke or not, but I did not register any outages last night. I will continue to monitor and update this post as I discover things.

    This worked for me. Made an account just to thank you for it. Had been troubleshooting it for 2 days.