pfSense loses connection every 28-30 days.

jacksnack2

Thank you all for the quick feedback.

I am a Linux admin by trade, although networking does occasionally fall under my purview :>

I blocked the modem IP as suggested. I have also re-enabled dpinger, as this allows the router to re-obtain a lease. However, the issue is that while obtaining a lease via dpinger, DNS resolution fails for internal clients. A router reboot is required.

I don't see a solution here. But again, I do appreciate the help.

johnpoz

@jacksnack2 said in pfSense loses connection every 28-30 days.:

while obtaining a lease via dpinger,

Huh? Dpinger doesn't have anything to do with renewing a dhcp lease??

jacksnack2

@johnpoz dpinger does not directly deal with leases, but it does fire actions:

/usr/local/sbin/pfSctl \
    -c "service reload dyndns ${GW}" \
    -c "service reload ipsecdns" \
    -c "service reload openvpn ${GW}" \
    -c "filter reload" >/dev/null 2>&1

I can state that when dpinger was enabled, the router held an IP address, even though DNS did not work internally.

This allowed me to ssh into the machine and reboot.

Once I disabled dpinger, no IP address existed for the WAN.

Are you saying this is a coincidence?

johnpoz

None of that would have anything to do with dhcp lease renew..

stephenw10

Indeed the dhclient is independent of dpinger. Something it triggered may have restarted the dhclient perhaps but if it was able to pull a lease it would have done so anyway.

You said that rebooting the modem also allowed it to come back up. I would try simply pulling the WAN cable from either the modem or pfSense and reconnecting it. Does that also bring back the connection?

Are you running 2.4.4p3 now? It's possible you're hitting this: https://redmine.pfsense.org/issues/9267

That is fixed in current 2.5 snapshots if you're able to test one.

Steve

jacksnack2

@stephenw10 Bug #9267 does give me hope for a resolution.

I will look into this.

Thank You.

Derelict

That redmine does not seem to match. The packet capture is pretty clear. At least it seems pretty clear to me.

Another test would be not rebooting anything and simply disconnecting the coax from the modem, letting it drop, and reconnecting it. That would eliminate any interface bounces unless the modem does one in that case.

stephenw10

Yes, you're right. In the case of that bug the client stops requesting a new lease but here it clearly continues. Ignore me!

Steve

jacksnack2

https://redmine.pfsense.org/issues/9267
"...DHCP timeout occurs and the cached gateway address is not pingable. This results in a case where the cached IP is removed from the interface, but dhclient is informed via the exit status of 0 that the IP was added successfully. As a result, the impacted interface remains without an IPv4 address..."

Seems plausible this is the issue.

Why remove the cached IP?

Derelict

I have no doubt that might cause some people problems, but I don't see how it will make your modem stop responding to DHCPREQUEST/DHCPDISCOVER as it apparently does.

Kimberly3475

What is happening when those ARP resolve messages start? You showed the end, what about the beginning? Is the MTU showing anything strange on the interface when it is not working?

dpinger is trying to ping the gateway address but it cannot because it is not receiving an ARP response for it on WAN. Then it miraculously does for some reason.

If it were me I'd packet capture for ARP on WAN and see what is happening. I'd just set interface WAN, protocol ARP, and a packet count of 100000 or 1000000 and let it run. Then get the times of the start and end of the "can't allocate llinfo" logs and see what's happening there in Wireshark.
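
To pin down the start and end times of those log messages, something like the following could help. It is only a sketch: llinfo_window is a made-up helper name, and it assumes the log lines begin with the usual syslog "Mon DD HH:MM:SS" timestamp.

```shell
#!/bin/sh
# Print the timestamps of the first and last "can't allocate llinfo"
# lines read from standard input.
# Assumption: each line starts with a syslog-style "Mon DD HH:MM:SS" stamp.
llinfo_window() {
    grep "can't allocate llinfo" | awk '
        NR == 1 { first = $1 " " $2 " " $3 }
                { last  = $1 " " $2 " " $3 }
        END     { if (NR > 0) print first " -> " last }'
}
```

Feed it whatever prints your system log as text, then look at that time window in the capture. It prints nothing if no matching lines are found.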

jacksnack2

@Kimberly3475 Thanks.

I will most likely start a packet capture in the next several weeks, as that is when the next event is due.

Derelict

I would run it on the command line and capture both DHCP and ARP.

Something like this should work:

Stop any running capture in the GUI
SSH or console in
Menu option 8
nohup /usr/sbin/tcpdump -i eth0 -c 1000000 -s 0 -w /root/packetcapture.cap arp or port 67 &
exit

eth0 needs to be replaced with your WAN interface name (em0, igb1, etc). You can get that interface name from Status > Interfaces.

You should be able to log out and the GUI should show the capture running there. Should be able to stop it and view it normally when the time comes.

You might want to start one, let it soak, and stop it to see how much ARP there is out there. It might be a lot and will vary due to the design at the ISP. You might want to up that count to 10000000.
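
To judge whether a given count will last until the next event, a rough estimate can be worked out from the soak run. This is a sketch with made-up numbers: est_hours is a hypothetical helper, and 5000 packets in 10 minutes is only an illustration, not a measurement from this thread.

```shell
#!/bin/sh
# Estimate how many hours a tcpdump packet count (-c) will last,
# based on how many packets a short soak capture collected.
# usage: est_hours COUNT SOAK_PACKETS SOAK_MINUTES
est_hours() {
    if [ "$2" -le 0 ]; then
        echo "soak packet count must be positive" >&2
        return 1
    fi
    # minutes covered = COUNT * SOAK_MINUTES / SOAK_PACKETS, then convert to hours
    echo $(( $1 * $3 / $2 / 60 ))
}

# Hypothetical soak: 5000 packets in 10 minutes, capture started with -c 1000000
est_hours 1000000 5000 10
```

With those example numbers a count of 1000000 would cover only about 33 hours, so plug in your own soak figures before deciding on the count.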

jacksnack2

@Derelict hello.

I am going to do exactly that, but maybe from a Linux box on the network via ssh.

I’ll post back when something comes up.

Thanks again for all the help.

jacksnack2

@Derelict Thanks again for the reply.

However, I think there is an error in your command:

What does '1.' represent?

I ran this via ssh without the '1.'

ARP captures are brutal... filling the logs. Already at 554684. Is there any way to minimize the captures to something more specific?

Thanks.

Derelict

Yeah that's a mistake. Corrected.

Not that I can think of. You can do a circular capture that keeps overwriting the older files but you can miss the event if you don't stop it soon enough after it happens. See if adding -p helps:

nohup /usr/sbin/tcpdump -i eth0 -p -c 1000000 -s 0 -w /root/packetcapture.cap arp or port 67 &
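
For the circular capture, tcpdump's -C (file size in millions of bytes) and -W (number of files to keep) options can be combined so the oldest file gets overwritten. A rough sketch, run as an sh script; ring_capture_cmd is a made-up helper, and em0, the sizes and the output path are placeholder assumptions:

```shell
#!/bin/sh
# Build a ring-buffer tcpdump command line.
# usage: ring_capture_cmd IFACE FILE_MB FILE_COUNT OUTPUT_BASE
ring_capture_cmd() {
    # -C starts a new file after roughly FILE_MB million bytes,
    # -W caps the number of files so the oldest one is overwritten,
    # -p leaves the interface out of promiscuous mode.
    # No -c here, so the capture runs until it is stopped.
    echo "/usr/sbin/tcpdump -i $1 -p -s 0 -C $2 -W $3 -w $4 arp or port 67"
}

# em0, 100, 10 and the path below are placeholder values (assumptions).
CMD=$(ring_capture_cmd em0 100 10 /root/wan-ring.cap)
echo "$CMD"

# To actually start it in the background, uncomment:
# nohup $CMD >/dev/null 2>&1 &
```

As written it only prints the command so you can check it first. Ten files of about 100 MB each is a guess; size it from how fast your own captures grow.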