Random client IPs just stop working - No Internet



  • Hi all, I don't know if this is a pfSense issue. I have 2.4.3 running on my ESXi host and have had it this way for about four years. About six weeks ago I upgraded to the current 2.4.3 version. Now, this may just be coincidental, but since then I get random calls two or three times a day where a customer loses their internet connection. I then change their IP address and, presto, it begins working again.

    I have several subnets: my main LAN is 172.16.0.0/16, plus three 10.1.x.x/22 subnets. The 10.1.x.x subnets first connect to a Mikrotik gateway before routing over my core pfSense and out onto the net. The problem happens on any IP from any subnet. I have rebooted all my switches and routers.

    When the problem arises, I connect to the problem client. I can ping everything on my LAN, and I can even ping the WAN IP of my pfSense, but I cannot ping anything on the internet, like 8.8.8.8.

    Change the customer's IP and everything works again. This happens three or four times a day, on random IPs. This morning I took a backup of my pfSense config, rebuilt the box from ISO installation media, and restored the config, but the problem persists.

    Does anyone have any advice here, please? What else can I check? What does it sound like? I first thought ARP, but the pfSense only holds ARP entries for my main subnet, and all subnets are affected. I don't have any weird rules on the box, and I haven't changed anything that I can think of that could cause this...



  • Any packages installed such as Snort, Suricata or pfBlockerNG?



  • Nope, none of those; I only have the vmtools and cron packages.



  • Just had another client fail, so I tested one more thing: I unchecked "Block bogon networks" and "Block private networks and loopback addresses" on my WAN interface, and it resolved the issue right away. It's as if the LAN IP going out the WAN interface is being seen as coming IN!

    I have around 1,500 live IPs, and 3 to 4 are affected each day!



  • Is your WAN on RFC 1918 space?

    That setting should only apply to unsolicited incoming connections. If you create a connection state outbound from the LAN, the return traffic is automatically allowed. The setting would affect any port forwards you might have, where the user is the one initiating the connection inbound; that would be blocked on WAN with "Block private networks" checked.

    I don't think this setting is causing your problem from what I see so far.



  • My WAN has my ISP's public IPs directly on it. In all other cases I've had to change the IP; in this case, unchecking those options got the IP going again. I'm not convinced either, though, and don't see the correlation. I guess I'll know more over the next 24 hours.


  • Netgate

    If that is the reason, it will be logged in the firewall logs by default.

    block anything from private networks on interfaces with the option set

    block in log quick on $WAN from 10.0.0.0/8 to any tracker 12000 label "Block private networks from WAN block 10/8"
    block in log quick on $WAN from 127.0.0.0/8 to any tracker 12000 label "Block private networks from WAN block 127/8"
    block in log quick on $WAN from 172.16.0.0/12 to any tracker 12000 label "Block private networks from WAN block 172.16/12"
    block in log quick on $WAN from 192.168.0.0/16 to any tracker 12000 label "Block private networks from WAN block 192.168/16"
    block in log quick on $WAN from fc00::/7 to any tracker 12000 label "Block ULA networks from WAN block fc00::/7"

    It could just be that reloading the filter was enough to fix whatever wasn't working, which happens when you change that setting.

    You can try that manually in Status > Filter Reload.

    I would packet capture one of the hosts that is giving you trouble before you do anything to fix it, check the firewall logs, make sure you are not running out of states or something, and narrow it down to exactly what is failing for them: DNS, routing, DHCP (it could be something that happens only when the lease expires), etc.



  • You're right. Just had another one. I then re-checked the bogon options; the filter reloaded and the IP had outside access again.
    My states only hover around 36% at peak times. I don't use DHCP or DNS. I have a constant 700 Mb/s going out my WAN interface, so looking at logs might be a bit hectic.

    I'm considering rebuilding my pfSense on a previous version. Would a config backup from the current version restore onto a previous version?



  • For what it's worth, I've been experiencing something similar recently: random IP access lost.

    In my case, I have a single LAN bridged across three ports on a Jetway motherboard (quad core, 8 GB memory, 256 GB SSD, i.e. plenty of horsepower).

    When sitting at my PC, most things work, but I won't be able to access SOME sites. For example, I can get to www.usps.com but not their "manage PO Box" page. I've had access to Google stop. I'm currently having a problem getting to fedex.com. These are mainstay sites.

    I have about 150 computers, each doing a bit of traffic around 18 times a minute, so a bit of a load, but nothing extraordinary.

    It gets weirder: I've been monitoring a Linux box that does GPU cryptocurrency mining. Access to the site it uses comes and goes, without much of a pattern. My personal PC, on the same LAN (although a different port), can sometimes access the site and sometimes not, and its outages don't seem to correlate with when the Linux box has successful access. If you look at the chart, access doesn't go to zero, because my machine in another town is running without problems (i.e. the site itself is fine).

    I have tried bumping Firewall Maximum Table Entries up to 1,200,000.

    I just tried unchecking "Block private networks and loopback addresses" and "Block bogon networks" without any impact.

    Symptoms cleared up a lot when my ISP changed my external static IP, and they often change when I reboot the pfSense box.

    All suggestions highly welcomed. I've used pfSense for years, but this is impacting my business.


  • Netgate

    "Not being able to get to a web page" is simply too vague a trouble description.

    You need to start at layer 1 (is it plugged in) and move up and figure out exactly what is failing.

    Is it plugged in?
    Does it get a DHCP address?
    Can it ping its gateway by IP address (if allowed by rules)?
    Does it get proper ARP for things on the local segment?
    Can it ping out by IP address?
    Can it resolve names?
    Do all of its configured DNS servers work? Do they give consistent answers to the same queries?
    Can it connect out by IP address? By name?
    What do the states look like when a connection is attempted from the problem PC at the time it is having problems?

    https://doc.pfsense.org/index.php/Connectivity_Troubleshooting
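    The lower rungs of that checklist (name resolution, then an outbound TCP connection) can be sketched as a small script run from the problem host. This is only an illustration using stdlib Python; the target name and port in the example are placeholders, not anything from this thread:

```python
import socket

def resolve(name):
    """Can the host resolve the name? Returns an IPv4 address string or None."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None

def tcp_connect(addr, port, timeout=3.0):
    """Can the host open a TCP connection to addr:port? Returns True/False."""
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            return True
    except OSError:
        return False

def check(name, port=443):
    """Walk the layers: DNS first, then a TCP connection to the resolved address."""
    addr = resolve(name)
    if addr is None:
        return "DNS failed for %s" % name
    if not tcp_connect(addr, port):
        return "resolved %s to %s, but TCP %d failed" % (name, addr, port)
    return "ok: %s -> %s:%d" % (name, addr, port)

# Example with a placeholder target:
# print(check("www.example.com"))
```

    If resolve() succeeds but tcp_connect() fails only for certain destinations, that points at routing or filtering rather than DNS.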



  • Let me summarize:

    The vast majority of functionality is just fine, so layer 1 appears healthy.

    From a statically addressed PC: sometimes SOME internet sites are unreachable, as described below, but most work just fine. Thus DNS, DHCP, cabling, DNAT rules, etc. are unlikely to be the problem.

    From a statically addressed Linux box: I've noticed intermittent access to zec.slushpool.com port 4444. I have 100% access from St. Louis, and "sometimes" access, lasting minutes to days, from a Linux box behind the pfSense firewall of concern. A PC on a different port of that same firewall also has "sometimes" access to zec.slushpool.com port 4444, and the access outages do not correlate between the PC and the Linux box. I don't think there is anything special about zec.slushpool.com; it just happens to be the site the Linux box and PC are configured to use.

    From my 160+ DHCP-addressed processing machines, all Linux based, I've seen a couple of instances of not being able to reach their primary site, oh1.kano.is, and I have confirmed with the operator of that site that they were not experiencing any issues. Their backup site, stratum.kano.is, functions fine when needed, so I only lose about 5 minutes of failover time. I'm stating this just because it's likely related.

    DNS resolution works fine ALL the time. Pinging zec.slushpool.com fails when access stops.

    Access to both zec.slushpool.com and oh1.kano.is will randomly and independently toggle, without any administrative changes occurring on the pfSense box. (Note that oh1.kano.is is AWS based and requires a TCP ping, not ICMP.) Normally access is stable for hours, but the duration follows a curve; I've seen access last anywhere from a few minutes to days.
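    By "TCP ping" I just mean timing a plain connection attempt instead of sending ICMP. A minimal sketch in stdlib Python, for illustration only (the slushpool host/port in the comment are the ones mentioned above):

```python
import socket
import time

def tcp_ping(host, port, timeout=2.0):
    """Time a TCP connection attempt; return seconds on success, None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None

# e.g. tcp_ping("zec.slushpool.com", 4444) returns a float when the site is
# reachable and None when it is not, so it can be polled to chart the outages.
```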

    I have not specifically checked whether the Linux box can ping the firewall, but SSH sessions continue to work. Clearly the PC can reach the firewall, since most web browsing functions.

    Rebooting the pfSense box will sometimes resolve the access issues, although it's become a guessing game as to whether any individual website will work or not. Most do.

    Changing my external static address resolved about 90% of the access issues, at least for now, but that only occurred a few days ago.

    ALL of these problems started when I upgraded recently. Prior to that I had no problems accessing anything.

    P.S. I've disabled Snort blocking just to eliminate it from suspicion; Snort is the only add-on package installed. I also switched to 8.8.8.8 and 8.8.4.4 to minimize the chance of this being a DNS issue, although the pfSense DNS Resolver is enabled (it provides effective caching for most of my machines).

    P.P.S. Basic firewall health stats: