Workstation dropping inet connectivity, pfsense box is not



  • I got a strange problem going on here.

    I am not a proficient Linux user, so for anything you need me to do, you will have to give me command lines or refer me to a how-to of sorts. I am quite proficient on Windows systems, however, so it's not that I can't; I just don't know the file structure or commands.

    Anyhow, this is pfSense 2.0.2 on a 2 GHz Celeron with, I believe, 1 GB of RAM. The OS has been on for I would say at least a year, maybe more. It has been stable until recently. A week or two ago, I was dropping internet connectivity from my main machine. Rebooted the router, problem was gone.

    Topology is DSL modem > pfsense WAN > pfsense LAN > switch > PCs. PPPoE WAN address, with DHCP leases assigned and many restrictions in place. However, for simplicity I turned off the DHCP server and DNS forwarding services, and the problem persists.

    Of course the wan ip is valid. My local nic settings are all correct, using 192.168.x.x with 255.255.255.0 and the correct gateway. My pfsense uses manual dns addresses (norton), but my machine uses the isp ones. Iptable rules allow any port 53 from my ip (192.168.1.10) and the rest of the lan is limited to only norton dns addresses on port 53.

    So, for a year or more now, it has all worked. Tonight it is dropping on average every 2 minutes. Sometimes 5 minutes, sometimes 1 minute. Most of the "drops" in ability to ping a wan address last only 20 seconds. Some last maybe 3 minutes.

    While the "drop" is going on, I can ping my NAS box, any other computer and the router. Other computers in the house are not affected by this, only mine it seems. While the "drop" is going on, I can be in the webadmin to the router, and even ping google.com. It is only my machine that cannot ping or otherwise get out.

    If I am downloading something, or connected to something like a teamspeak server, my connection drops. I cannot ping by dns name nor ip address outside of the lan.

    I've been watching the system logs, and while I have seen a lot of values, only two keep coming up that seem odd. I look at the system logs a lot although I don't keep them. Here they are.

    Sep 23 22:34:37 dhcpleases: Could not deliver signal HUP to process because its pidfile does not exist, No such file or directory.

    Sep 23 22:34:52 miniupnpd[44995]: SendNATPMPPublicAddressChangeNotification: sendto(s_udp=14): No route to host

    And after, let's say, 5 iterations of the "drop", this is all the system log shows:

    Sep 23 23:56:48	php: : Resyncing OpenVPN instances for interface WAN.
    Sep 23 23:56:43	check_reload_status: Reloading filter
    Sep 23 23:56:42	php: : ROUTING: setting default route to 72.160.18.1
    Sep 23 23:56:39	miniupnpd[6300]: SendNATPMPPublicAddressChangeNotification: sendto(s_udp=14): No route to host
    Sep 23 23:56:39	miniupnpd[6300]: SendNATPMPPublicAddressChangeNotification: sendto(s_udp=14): No route to host
    Sep 23 23:56:35	miniupnpd[6300]: SendNATPMPPublicAddressChangeNotification: sendto(s_udp=14): No route to host
    Sep 23 23:56:35	miniupnpd[6300]: SendNATPMPPublicAddressChangeNotification: sendto(s_udp=14): No route to host
    Sep 23 23:56:35	check_reload_status: Rewriting resolv.conf
    Sep 23 23:50:21	syslogd: kernel boot file is /boot/kernel/kernel
    

    I do have a fair number of packages installed and a lot of customized settings in place. Here are the packages

    anyterm
    arping
    arpwatch
    cron
    file manager
    ipguard-dev (never been started)
    lightsquid
    nmap
    notes
    pfblocker
    sarg
    squid
    squidguard
    

    I have reinstalled squidguard. Rebooted the modem. Rebooted the router. Rebooted the computer. The "drops" keep occurring, many exactly 2 minutes apart to the second.

    I am at a loss as to what might be causing this. In the time I have been writing this and watching the issue, this is what the system logs show (see attachment).

    I'm not here at home to work on this that much, but I can run diagnostics from work remotely to get output. I won't be able to check the issue though until later in the evenings from my pc.

    So, I guess I need some expert help on where to proceed from here.

    I don't believe it to be hardware, as the rest of the network seems unaffected and I don't actually lose the NIC on my PC, only outbound internet. And for what it's worth, there has been no new hardware on the network. I have it set to deny unknown MACs anyway.

    Also, I turned off the upnp service, problem persists. I did some nat work to allow a game through, but set that back to default, problem persists.

    Thanks to anyone willing to help.

    pfs_drops.txt
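
One way to check whether events really recur on a fixed interval ("many exactly 2 minutes apart to the second") is to measure the gaps between matching log timestamps. The following is only a minimal sketch: the function names are hypothetical, the sample lines are copied from the log excerpt above, and the year is an assumption because these syslog lines do not carry one.

```python
from datetime import datetime

# Lines copied from the system log excerpt in the post above.
SAMPLE = [
    "Sep 23 23:56:42 php: : ROUTING: setting default route to 72.160.18.1",
    "Sep 23 23:56:39 miniupnpd[6300]: SendNATPMPPublicAddressChangeNotification: sendto(s_udp=14): No route to host",
    "Sep 23 23:56:39 miniupnpd[6300]: SendNATPMPPublicAddressChangeNotification: sendto(s_udp=14): No route to host",
    "Sep 23 23:56:35 miniupnpd[6300]: SendNATPMPPublicAddressChangeNotification: sendto(s_udp=14): No route to host",
    "Sep 23 23:56:35 miniupnpd[6300]: SendNATPMPPublicAddressChangeNotification: sendto(s_udp=14): No route to host",
]


def parse_timestamp(line, year=2013):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog-style line.

    The year is an assumption: the log lines themselves do not include it.
    """
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def gaps_in_seconds(lines, keyword):
    """Return the gaps, in seconds, between consecutive lines containing keyword."""
    times = sorted(parse_timestamp(line) for line in lines if keyword in line)
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


def is_regular(gaps, period=120, tolerance=2):
    """True if there is at least one gap and every gap is within tolerance of period."""
    return bool(gaps) and all(abs(gap - period) <= tolerance for gap in gaps)


print(gaps_in_seconds(SAMPLE, "miniupnpd"))  # -> [0, 4, 0]
```

Feeding the full attached log through `gaps_in_seconds` with a keyword such as `"Reloading filter"` and then through `is_regular` would show whether a timer-driven process, rather than a random fault, lines up with the drops.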



  • Leaving my router running all night with nothing but my NAS box (Synology) actually powered on (which is typical) shows these logs. Constant errors with squid and constant renewing of the WAN IP from the ISP.

    Some friends also on same ISP do not have this issue. They are not on same circuit as I am most likely, but are local in town, so not a general ISP issue although could be local to my circuit.

    The pfsense box has been running fine. Not overheating at all, and I don't do torrents or other things really. Mostly general browsing by wife and me and some homework for kids. I do game a little bit with some friends at night. No activity is really going on when this problem cropped up last night.

    Was going to isolate everything tonight, but after seeing such log activity (in the attached file) I decided maybe there is an issue with my pfsense install.

    It is fair to say that I have many rules in place, many with squid and squidguard, but I also make heavy use of aliases and iptable entries. Some port forwarding has been done mostly for remote access and only one NAT/UPNP instance.

    Some of the packages I installed (e.g. ipguard) have been more of a learning exercise than something I actually use. Computers are my hobby, so I like to learn, which is why I haven't implemented them yet. I don't believe they are the culprit. Nothing has changed in the last few months other than one NAT rule I put in place, which was simply to allow static mapping of a small range of ports. UPNP has been turned on since perhaps Feb. of 2012 with no issues all year.

    Any thoughts on what the issue might be? The "url_rewriter" issue I have seen many times, but never been able to find info to define and therefore fix that issue. It is unfortunate that I never invested the time into unix based systems or I might be able to do more.

    Also, I would prefer to fix the situation rather than just reinstall. I like learning, so reinstalling doesn't teach me anything. As well, I have tweaked this install enough to know my backup xml files will not make reinstalling an easy process. Been there and done that, and would prefer not to.

    Actually, it would be neat to make an image of a fresh install with most of my settings. While I do that all the time in the windows workstations, I haven't a clue if I can even do that with pfsense or not.

    pfs_sep22-23.txt


  • Netgate Administrator

    Hmm, well that squid stuff doesn't look good but I suspect it's unrelated since it would affect all your internal clients.
    I would look for a problem with NAT. Possibly something is causing upnp to change the NAT rules, affecting your connectivity. That can happen spontaneously, triggered by some outside event.
    When you are 'dropped' and you try to ping something external what is the reported failure?

    Steve



  • Kalispell... That's the party capital of the USA...

    Anyway - maybe your Centurylink connection's latency is occasionally getting high enough to make your system think it's lost connection?

    Then causing all that other stuff to happen?



  • @stephenw10

    I would think the squid issues would affect everyone else, as I am the only one who does not go through that proxy, which is why I also have static DNS servers configured. My iptable rules make sure ports 80 and 443 are denied for the other machines. Well, my wife's machine is exempt from squid too, but she does get the assigned Norton DNS servers via DHCP.

    I've disabled squid and squidguard and removed most packages that might interfere. No change in issue.

    I've a batch file set up to ping test. My machine is A, another is B. When machine A "drops", continuous ping to machine B and the router is good, but ping to google just times out, both when using name and ip. However, when machine A is "dropped" out, machine B can continuously ping machine A, the router and google by both name and ip.

    I would presume then that my nic on machine A is good as it does work. And the pppoe wan connection remains as well, as machine B can get online without issue.

    I've examined my iptable rules and there is not a restriction for machine A, but many for machine B. I can see nothing that should affect machine A. I turned off the only NAT rule I had, which was basically to route any port to machine A so I could get static port mapping. No change again.

    At this point I am out of ideas. Like I said, I'm not versed in unix based systems so if it isn't logical, it ain't going to get fixed. I'm contemplating upgrading the pfsense box to the latest release. I usually install fresh as upgrades, in my experience, are only marginally trustworthy. But that's a lot of work to set it all up again. I've always been the type of guy that thinks if it ain't broken don't fix it. I was under the assumption that, as with a consumer router (e.g. linksys etc.), a properly set up pfsense box would "just work". It could be hardware, but the symptoms are strange and don't lean that way.

    Thanks for the reply though.

    @kejianshi

    You know Kalispell then?

    I thought the same thing about the Centurylink connection. The logs from night before last show a ton of WAN IP changes when no box was on except the NAS. Yesterday morning though I logged into the router from work and did not have one WAN IP change the whole day. Really no logs to speak of. Last night I was removing packages and shutting off services and the issue persisted - my PC loses internet but others in the network keep it, while my PC is not malfunctioning NIC wise. It's the strangest thing really.

    As I said, the reported failure for a ping to google is just a timeout. Happens to domain name ping or ip ping. Only internal ips are unaffected.

    I'm a hardcore Windows geek. I thought perhaps I had picked up a nasty, so I restored my clean system image, which was made without the computer being online. It's squeaky clean now. Problem persists.

    Doesn't look like I'm going to learn much about this, so I'll probably reinstall the OS tonight.

    Thanks for the replies.
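
The batch-file test described in the post above (ping the router, another LAN machine, and an external host, then compare) can be sketched in Python as well. This is an illustration under stated assumptions, not the poster's actual batch file: `ping_once` and `diagnose` are hypothetical names, and the wording of each verdict is only a rough reading of the symptoms described in this thread.

```python
import platform
import subprocess


def ping_once(host, timeout_s=3):
    """Send a single echo request; return True if the ping command exits with 0.

    Caveat: on Windows, ping may exit with 0 even when the reply text is
    "Destination host unreachable", so treat this as a rough signal only.
    """
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def diagnose(router_ok, lan_peer_ok, external_ok):
    """Turn three reachability results into a rough pointer for where to look."""
    if not router_ok and not lan_peer_ok:
        return "local NIC, cable, or switch"
    if not router_ok:
        return "router unreachable from this machine"
    if not external_ok:
        return "LAN fine, outbound traffic failing at or beyond the router"
    return "no drop"


# Example call (addresses are placeholders, not taken from the thread):
#   diagnose(ping_once("192.168.1.1"), ping_once("192.168.1.20"), ping_once("8.8.8.8"))
```

With the results reported in the post (router and machine B reachable, external host timing out), `diagnose(True, True, False)` lands on the third verdict, which matches the conclusion that the NIC and cabling are not the first suspects.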



  • You don't have to have any computers active on the internet for pfSense to detect latency or an offline WAN. It COULD have been just a bad day for internet in Kalispell. (Do they have good nights?) pfSense pings its gateway IP constantly; if apinger sees a loss of packets or a WAN IP change, it will react to it no matter what is or isn't up on your LAN or VPN.

    PS.  Did you check the WAN IP?  Was it changing?

    Yeah - I know that area. It's all nature all the time in that area.
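
The idea of a gateway monitor that reacts to packet loss regardless of LAN activity can be illustrated with a rolling-window loss check. This is a generic sketch only: it is not apinger's actual algorithm or configuration, and the class name, window size, and alarm threshold are all assumptions chosen for the example.

```python
from collections import deque


class LossMonitor:
    """Track recent ping results and flag the gateway as down on high loss.

    Illustrative only; window and loss_alarm are made-up defaults.
    """

    def __init__(self, window=10, loss_alarm=0.2):
        self.samples = deque(maxlen=window)  # True = reply received
        self.loss_alarm = loss_alarm

    def loss(self):
        """Fraction of probes in the window that got no reply."""
        if not self.samples:
            return 0.0
        return self.samples.count(False) / len(self.samples)

    def is_down(self):
        """True when the loss fraction reaches the alarm threshold."""
        return self.loss() >= self.loss_alarm

    def record(self, replied):
        """Store one probe result and return the current down/up verdict."""
        self.samples.append(bool(replied))
        return self.is_down()
```

A monitor like this fires on a burst of lost probes even when no client is generating traffic, which is consistent with WAN events showing up in the logs overnight while only the NAS was powered on.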



  • Have you tried changing the cable?


  • Netgate Administrator

    Did you try disabling UPNP?
    You didn't say what the ping failure method is; no route to host? no reply? no DNS? All useful clues.
    Check the logs while you are trying to ping, anything useful there?

    Possible causes of your symptoms are:
    Firewall rule blocking traffic. Though you can still ping the LAN interface which would be blocked and logged if that were the case.
    NAT to your machine not working.
    Your machine lost its gateway information for some reason.
    Software firewall on the local box.

    Just for information pfSense is built on FreeBSD which isn't Linux. Consequently it doesn't use iptables but we all knew what you meant.  ;)

    Steve
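
The distinction asked for above (no route to host, no reply, or no DNS) can be read straight from the text that the Windows `ping` command prints. A minimal sketch follows; the function name and category labels are hypothetical, and the matching relies on the English-language messages of Windows `ping`, so a localized system would need different strings.

```python
def classify_ping_failure(output):
    """Map the text printed by Windows ping to a rough failure category."""
    text = output.lower()
    if "could not find host" in text:
        return "no DNS"          # name resolution failed before any packet was sent
    if "destination host unreachable" in text or "destination net unreachable" in text:
        return "no route"        # a hop (or the local stack) reported no path
    if "request timed out" in text:
        return "no reply"        # packet sent, nothing came back
    if "general failure" in text:
        return "local stack"     # the local network stack refused to send
    if "reply from" in text and "ttl=" in text:
        return "ok"
    return "unknown"
```

As a rough guide, "no reply" fits traffic being silently dropped (a firewall rule or broken NAT), "no route" points more toward missing gateway information on the client, and "no DNS" narrows things to name resolution.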



  • To address all the questions:

    The pfSense box can ping anywhere without issue when the issue happens. The WAN connection does not go down at all.

    The pings I am referring to are from my machine, the one being affected, and from another machine in the house.

    I have disabled UPNP and removed nat rules. I have not removed lan iptable rules though as there are none that should affect the machine in question. It has a static ip, they all do really.

    I have not switched the cable because I don't lose connection with the affected machine, and there are no errors reported at all, as in dropped packets etc. It could be the cable but I doubt it in this case.

    I have turned off the gateway monitor option.

    I have removed squid and squid guard. Down to only the filemanager package now.

    Enabling logging on the default block rule shows nothing. The system log really shows nothing when it happens. Enabling logging on various firewall rules shows, again, nothing.

    There is no software firewall on the local box. It's a heavily tweaked win7 install. I do mean heavily. Lean and mean as I can make it. Been tweaking Windows boxes for many years and not seen this issue. Besides, I just put a clean image back on, and it still happens, so not workstation software related I would bet.

    I knew pfSense was BSD derived. I guess I assumed it used iptables lol. I'll file that away in the archives ;)

    Brief conclusion:

    Problem started a few days ago, and continues. Making changes like stopping services and rebooting does not work. Removing packages and rebooting does not work. Turning off or on many different options such as apinger, NAT, UPNP, DHCP server, etc. has no effect. Only one machine is affected for some reason.

    As a troubleshooting method, I hooked up a dlink dir655 router, a wr54g, and a dlink gamers lounge. They each could get a WAN IP. They each functioned well. My connection on my machine was not affected. Rebooted the pfsense box, and the same issue is back. I have no idea what changed, as it has been static for a number of months with the exception of one outbound NAT rule I put in place about a week ago. It was a simple rule to allow any port to my machine's IP and to statically map the ports. Used it many times in the past. I don't normally leave things open like that, so I usually delete or disable that outbound NAT stuff after I am done. That's it, nothing else changed.

    I am going to try a firmware update and see what happens. I guess if I have to do a total rebuild I will. I certainly don't want to be without a pfsense box. It's so much better than most else I've used that I guess I am addicted  ;D

    Thanks for the ideas and replies!



  • Updating the firmware to the latest release appears to have fixed whatever issue it was. I wish I was able to determine what happened, but don't have the knowledge on BSD to do so.

    I will put all my settings/packages back in place and see if this happens again. If it does not after a few weeks, then I will wipe the disc and reinstall fresh and chalk it up to some obscure setting or config that developed on the pfSense install. A hardware issue on the machine it was affecting would have been better as I would have found the issue I suppose. I don't like "not knowing" what would cause such a weird situation really. Oh well.

    Thank you for all the responses.

