My marriage is in trouble - Resolving host…



  • Without stable internet my marriage may dissolve. In all seriousness, having two people working remotely who rely on a connection that is unstable is never fun. To provide some background:

    I've had pfsense in one form or another running for about 7 years (on the same hardware):

    SUPERMICRO SYS-5015A-PHF

    I have never had any issues related to PFSense until this June. Out of the blue (no changes to the box, network or connection/modem), I started having random drops. Basically, any time a client attempted to connect it would get the "Resolving host…" message. I'd connect to my PFSense box and see there was a random drop in connection (packet loss on the WAN interface). The WAN interface still showed it had an IP from the ISP. This would resolve itself in 30-60 seconds, but it would happen numerous times a day. My first thought was an issue with my cable connection/modem. I had the ISP come out to check the lines, replace the line to the house and check the signal.  All fine. I even changed out the cable modem as well.

    I did further troubleshooting by restoring my PFSense box to defaults; it would be fine for a day, then the drops would return. I stopped using third-party DNS, but the issue still remained. Finally, just to see if PFSense was the issue, I took it out of the equation entirely and used my Asus 68R Wireless Router as the firewall. I left this in place for a couple of months (against better judgment) but the connection was rock solid. No drops and everything worked as expected.

    At that point I decided to replace the hard drive in the PFSense box with a new SSD. Installed PFSense 2.1.5 and put everything back in place. The issue with the resolving host unfortunately returned. Tried updating to PFSense 2.2 Beta and same thing.

    So at this point it is likely the hardware (though I'm not 100% certain, which is why this is posted to General), but how can I be certain before I spend a few hundred dollars on another server?

    My network configuration is simple:
    Cable Modem --> PF Sense --> Gigabit switch --> Patch Panel --> Cat 6 to all end points including a couple of wireless APs.

    Any help would be GREATLY appreciated. Thanks.



  • logs would be helpful … it could be lots of things



  • @heper:

    logs would be helpful … it could be lots of things

    Sorry for not providing detailed logs. Let me know what would be useful and I'll post them in. Keep in mind, what I am currently on is a fresh install so the logs will be limited (no real history). But since this issue continues to happen (even in the last hour) maybe there is something there that I am overlooking.


  • Rebel Alliance Netgate Administrator

    it could be many many many things

    Who is your upstream DNS provider?  I assume PfSense is your internal network, and that forwards along to a 3rd party.

    Find the IP of this place and do a ping… my gut says you are still having ISP issues (cable modems can have many issues even after a tech visit. I have 7 years experience with the RF side of things)...

    I would try google dns 8.8.8.8, 8.8.4.4 or opendns 208.67.222.222, 208.67.220.220....
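The ping test suggested above is easier to read if it is left running and the failures are collapsed into outage windows. Here is a minimal Python sketch, assuming a Unix-like machine on the LAN with the system `ping` command available; the function names, the 5-second interval and the one-hour run length are illustrative choices, not anything from pfSense.

```python
import subprocess
import time
from datetime import datetime


def probe(host, timeout_s=3):
    """Send one ICMP echo with the system `ping`; True if a reply came back."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],  # -c 1: one echo request (Unix-like ping)
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def find_outages(samples):
    """Collapse (timestamp, ok) samples into (first_failure, first_recovery) pairs."""
    outages = []
    start = None
    for when, ok in samples:
        if not ok and start is None:
            start = when                    # first failed probe of a new outage
        elif ok and start is not None:
            outages.append((start, when))   # first good probe ends the outage
            start = None
    if start is not None:
        outages.append((start, None))       # still down when sampling stopped
    return outages


def monitor(host, count=720, interval_s=5):
    """Probe `host` `count` times (720 x 5 s is about an hour) and return the outages."""
    samples = []
    for _ in range(count):
        samples.append((datetime.now(), probe(host)))
        time.sleep(interval_s)
    return find_outages(samples)
```

Running `monitor("8.8.8.8")` and `monitor` against the WAN gateway address side by side, once behind pfSense and once behind the other router, would show whether the drops follow the firewall or the upstream link.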



  • Is the VPN running on a port other than 443/80 TCP?

    If so, it's common that ISPs will for some unknowable reason try to block VPNs.

    I assume it's because only a "bad" person would need a vpn…

    I know my ISP on both ends of the connection is guilty of this sort of thing as well as throttling and doing a crap job of it.



  • @chrismacmahon:

    it could be many many many things

    Who is your upstream DNS provider?  I assume PfSense is your internal network, and that forwards along to a 3rd party.

    Find the IP of this place and do a ping… my gut says you are still having ISP issues (cable modems can have many issues even after a tech visit. I have 7 years experience with the RF side of things)...

    I would try google dns 8.8.8.8, 8.8.4.4 or opendns 208.67.222.222, 208.67.220.220....

    I've used Open DNS for many years and tried Google for a period of time as well.  When this issue manifested itself one of the first things I did was take them out of the equation. I used my ISPs DNS, but same issue.  All the lines in the house are pretty much brand new (remodel 3 years ago). The run from the outside line to the modem is short (about 15 feet). My signal from the Cable Modem is fantastic.

    When I replaced my PFSense box with my Asus 68R wireless router as the firewall and dns, everything was perfectly stable for months. Putting back PFSense (with a fresh install of 2.1.5 and 2.2 and factory default settings) and the issue started again. So it seems to be something specific to PFSense or the hardware that I'm using, the trouble is narrowing it down.



  • @kejianshi:

    Is the VPN running on a port other than 443/80 TCP?

    If so, it's common that ISPs will for some unknowable reason try to block VPNs.

    I assume it's because only a "bad" person would need a vpn…

    I know my ISP on both ends of the connection is guilty of this sort of thing as well as throttling and doing a crap job of it.

    Yep, did have VPN running on a non-standard port but I also removed that out of the equation as well when I did a fresh install and didn't have it in place. Same issue manifested itself. It is really quite odd and almost makes me think it is hardware related at this point but I want to be certain before investing a few hundred dollars in another server.



  • Question - Is it flakey for internet in general or only when openvpn client is connected?



  • @kejianshi:

    Question - Is it flakey for internet in general or only when openvpn client is connected?

    Unfortunately in general. For the gateways it would report this during one of these down periods (IPs intentionally removed):

    Oct 26 17:53:21 apinger: ALARM: WAN_DHCP(x.x.x.x) *** down ***
    Oct 26 17:53:35 apinger: alarm canceled: WAN_DHCP(x.x.x.x) *** down ***
    Oct 26 17:54:33 apinger: ALARM: WAN_DHCP(x.x.x.x) *** down ***
    Oct 26 17:54:42 apinger: alarm canceled: WAN_DHCP(x.x.x.x) *** down ***
    Oct 26 17:54:52 apinger: ALARM: WAN_DHCP(x.x.x.x) *** down ***
    Oct 26 17:55:03 apinger: alarm canceled: WAN_DHCP(x.x.x.x) *** down ***
    Oct 26 17:56:18 apinger: ALARM: WAN_DHCP(x.x.x.x) *** down ***
    Oct 26 17:56:26 apinger: alarm canceled: WAN_DHCP(x.x.x.x) *** down ***
    Oct 26 17:57:34 apinger: ALARM: WAN_DHCP(x.x.x.x) *** down ***
    Oct 26 17:57:53 apinger: alarm canceled: WAN_DHCP(x.x.x.x) *** down ***

    But if I then take it down and put my wireless router in place, there would be absolutely no drops.  This is in a fresh install of PFSense with no packages installed on both 2.1.5 and 2.2 Beta.

    RRD Graph of WAN DHCP_Quality: (see attached), just imagine that happening dozens of times per day
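To put numbers on the log excerpt above, the ALARM and "alarm canceled" lines can be paired up and the length of each outage computed. This is a rough Python sketch that only understands the two line shapes shown in this post; the log lines carry no year, so `assumed_year` is a placeholder and does not affect the durations.

```python
import re
from datetime import datetime

LOG = """\
Oct 26 17:53:21 apinger: ALARM: WAN_DHCP(x.x.x.x) *** down ***
Oct 26 17:53:35 apinger: alarm canceled: WAN_DHCP(x.x.x.x) *** down ***
Oct 26 17:54:33 apinger: ALARM: WAN_DHCP(x.x.x.x) *** down ***
Oct 26 17:54:42 apinger: alarm canceled: WAN_DHCP(x.x.x.x) *** down ***
Oct 26 17:54:52 apinger: ALARM: WAN_DHCP(x.x.x.x) *** down ***
Oct 26 17:55:03 apinger: alarm canceled: WAN_DHCP(x.x.x.x) *** down ***
Oct 26 17:56:18 apinger: ALARM: WAN_DHCP(x.x.x.x) *** down ***
Oct 26 17:56:26 apinger: alarm canceled: WAN_DHCP(x.x.x.x) *** down ***
Oct 26 17:57:34 apinger: ALARM: WAN_DHCP(x.x.x.x) *** down ***
Oct 26 17:57:53 apinger: alarm canceled: WAN_DHCP(x.x.x.x) *** down ***
"""

# Matches only the two event shapes that appear in the excerpt above.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d) apinger: "
    r"(?P<event>ALARM|alarm canceled): (?P<gateway>[^(]+)\("
)


def outage_durations(log_text, assumed_year=2014):
    """Return (start_time, seconds) for each ALARM / 'alarm canceled' pair."""
    started = {}      # gateway name -> time the current alarm was raised
    durations = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        when = datetime.strptime(
            f"{assumed_year} {match['stamp']}", "%Y %b %d %H:%M:%S"
        )
        gateway = match["gateway"]
        if match["event"] == "ALARM":
            started[gateway] = when
        elif gateway in started:
            start = started.pop(gateway)
            durations.append((start, int((when - start).total_seconds())))
    return durations


durations = outage_durations(LOG)
for start, seconds in durations:
    print(f"{start:%H:%M:%S} down for {seconds}s")
total_down = sum(seconds for _, seconds in durations)
print(f"total: {total_down}s across {len(durations)} alarms")
```

For the ten lines above this gives alarms lasting 14, 9, 11, 8 and 19 seconds, 61 seconds in total inside a window of about four and a half minutes.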




  • Ohhhhhhhh….  That....

    Apinger has a threshold issue sometimes.

    Maybe ask what can be done to adjust or correct apinger issue.

    Or, try this.  Set the gateway monitor to not monitored.

    system > routing >

    then click the little "e" to the right of wan_dhcp gateway (if yours is named that)

    Check disable gateway monitor and see if problem goes away.



  • It's nice to have a gateway monitor, but not if it's actually the cause of issues, right?

    It could be.  Try that.  Change it back if you don't get good results.



  • @kejianshi:

    It's nice to have a gateway monitor, but not if it's actually the cause of issues, right?

    It could be.  Try that.  Change it back if you don't get good results.

    Thanks for the suggestion as it is something I never tried. I did read about the apinger inconsistencies in some threads but it seemed mostly prevalent in multi wan setups. Just tried it but unfortunately it didn't work. Didn't have any connection at all with it disabled. I have two showing there, one for ipv4 and one for ipv6. Everything in pfsense is default values after a fresh install.



  • Sorry - Maybe you have actual hardware issues.  I don't know.



  • apinger IMHO can be way toooo sensitive. I had no issues until sometime around 2.1.x

    These are my settings for IPv4 and IPv6 and my connection is stable except for overnight maintenance

    Latency thresholds 1000 1300
    Packet Loss thresholds 20 30
    Probe Interval 5
    Down 30

    Also, add the Service WatchDog package and configure apinger to restart if it goes down..

    Once this is done, restart apinger or restart your box
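As a way to reason about what the settings above change, here is a small Python model of how a window of probe results might be judged against them. It is a simplified illustration and not apinger's actual algorithm. It assumes the latency pairs are low/high thresholds in milliseconds, the loss pairs are percentages, and "Down" is the number of seconds of failed probes before the gateway is declared down. The class, field and state names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitorSettings:
    # Values from the post above; units are assumptions (ms, %, seconds).
    latency_low_ms: int = 1000
    latency_high_ms: int = 1300
    loss_low_pct: int = 20
    loss_high_pct: int = 30
    probe_interval_s: int = 5
    down_s: int = 30


def classify(settings, rtts_ms):
    """Judge recent probes, oldest first; each entry is an RTT in ms or None if lost."""
    if not rtts_ms:
        return "ok"

    # With a 5 s interval and Down = 30 s, six lost probes in a row mean "down".
    failures_for_down = settings.down_s // settings.probe_interval_s
    trailing_losses = 0
    for rtt in reversed(rtts_ms):
        if rtt is not None:
            break
        trailing_losses += 1
    if trailing_losses >= failures_for_down:
        return "down"

    replies = [rtt for rtt in rtts_ms if rtt is not None]
    loss_pct = 100 * (len(rtts_ms) - len(replies)) / len(rtts_ms)
    avg_ms = sum(replies) / len(replies) if replies else 0

    if loss_pct >= settings.loss_high_pct or avg_ms >= settings.latency_high_ms:
        return "alarm"
    if loss_pct >= settings.loss_low_pct or avg_ms >= settings.latency_low_ms:
        return "warning"
    return "ok"
```

Under this model, an 8 to 19 second blip like the ones in the gateway log earlier in the thread would cover at most a few 5-second probes and would not reach the 30-second "down" mark, which is the point of loosening the values.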



  • @Cino:

    apinger IMHO can be way toooo sensitive. I had no issues until sometime around 2.1.x

    These are my settings for IPv4 and IPv6 and my connection is stable except for overnight maintenance

    Latency thresholds 1000 1300
    Packet Loss thresholds 20 30
    Probe Interval 5
    Down 30

    Also, add the Service WatchDog package and configure apinger to restart if it goes down..

    Once this is done, restart apinger or restart your box

    Really appreciate the specific advice. I've made the changes, added service watchdog and will monitor over the next 24 hours. It will be a good test tomorrow since I am on all day remotely.



  • did you try replacing the cable between pfsense & modem ? could also be a faulty network card. (since you don't have issues with another router)



  • If this doesn't work, you may have to fine-tune the settings. If it goes down, check your gateway log. The times should match up with when you lose internet… Take note of whether it was ping or packet loss, then increase either one or both. Latency I would increase by 100 for both settings and packet loss by 10.

    Another thing you could try is to disable monitoring altogether... I didn't want to do this, which is why I've tweaked the settings
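The tuning step described above amounts to a small, repeatable adjustment, sketched here in Python. The dictionary keys and the `cause` labels are hypothetical, and applying the +10 to both packet loss thresholds is an assumption, since the post only spells out "both settings" for latency.

```python
def loosen(thresholds, cause):
    """Return a relaxed copy of `thresholds`; `cause` is "latency", "loss" or "both"."""
    updated = dict(thresholds)  # leave the caller's values untouched
    if cause in ("latency", "both"):
        updated["latency_low_ms"] += 100
        updated["latency_high_ms"] += 100
    if cause in ("loss", "both"):
        # Packet loss is a percentage, so never go past 100.
        updated["loss_low_pct"] = min(updated["loss_low_pct"] + 10, 100)
        updated["loss_high_pct"] = min(updated["loss_high_pct"] + 10, 100)
    return updated


# Starting point: the values posted earlier in the thread.
current = {
    "latency_low_ms": 1000,
    "latency_high_ms": 1300,
    "loss_low_pct": 20,
    "loss_high_pct": 30,
}

# Example: the gateway log showed the last alarm was caused by packet loss.
after_loss = loosen(current, "loss")
print(after_loss)
```

Changing one cause at a time and checking the gateway log again after each change makes it easier to tell which threshold was actually being tripped.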



  • @Cino:

    If this doesn't work, you may have to fine-tune the settings. If it goes down, check your gateway log. The times should match up with when you lose internet… Take note of whether it was ping or packet loss, then increase either one or both. Latency I would increase by 100 for both settings and packet loss by 10.

    Another thing you could try is to disable monitoring altogether... I didn't want to do this, which is why I've tweaked the settings

    Unfortunately it did not work. I am still having regular drops, though there was a period of a few hours today when things were fine. The apinger logs do not necessarily correlate with every dropped event or packet loss (as shown in the quality graphs), which is odd.  I have temporarily put the wireless router back in place and all is well. That tells me that it's not a problem with the ISP or the cables being used, as the same ones are used with the wireless router.

    I wish I had another machine to install pfsense as a test. I do have a mac mini lying around that I can use temporarily though I would need another nic.



  • @heper:

    did you try replacing the cable between pfsense & modem ? could also be a faulty network card. (since you don't have issues with another router)

    As a test I did but no change (also the same cables work fine when my wireless router is used in place of pfsense). I definitely think it is likely hardware related at this point but really wanted to confirm before I purchase an updated supermicro server.



  • Another thing you can try to totally rule out apinger… Tick Disable Gateway Monitoring and see how that works for you..  If it still drops, then it is possibly a hardware issue.

    Anything in your system log?


  • Netgate Administrator

    @mulder00:

    Just tried it but unfortunately it didn't work. Didn't have any connection at all with it disabled.

    If this indicates you tried disabling gateway monitoring already the result doesn't look right. You should have at least the same connectivity as before.

    Steve