DHCP causing internet outage
-
I'm trying to track down the specifics of the problem, but from what I can gather my ISP is doing something funky with DHCP leases that causes pfSense to suddenly "lose internet" every now and then (usually 1-2 times a day, though seemingly not at a set time).
Firewall: Netgate SG-3100 (had the same issue with my SG-2220 before upgrading a few weeks ago)
Modem: Smart/RG SR808ac (bridge mode)
ISP: TekSavvy (reselling Rogers cable internet)
TCP dump from the WAN interface when the outage is occurring:
Things that resolve the outage:
- unplug the cable from modem -> pfSense
- ifconfig mvneta2 down; ifconfig mvneta2 up
- the "Release WAN" button on the Status -> Interfaces tab
I've tried to automate the renewal, but that doesn't work:
[2.4.5-RELEASE][admin@pfSense.localdomain]/root: dhclient mvneta2
dhclient already running, pid: 74535.
exiting.
[2.4.5-RELEASE][admin@pfSense.localdomain]/root: service dhclient restart mvneta2
'mvneta2' is not a DHCP-enabled interface
dhclient already running? (pid=74535).
[2.4.5-RELEASE][admin@pfSense.localdomain]/root:
I've tried messing with the DHCP values for the WAN and that didn't change anything:
I've pretty much reached the end of my technical knowledge for resolving this. My brute-force attempt (run ping, then renew if the ping fails) doesn't work because I can't figure out a command that will renew the DHCP lease, and I don't know enough about how DHCP works (especially with an ISP) to debug anything further. These daily outages are becoming more problematic as I'm getting cut off in the middle of video calls for work etc.
Any suggestions on how to further debug or fix this would be much appreciated!
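The ping-then-recover idea described above could be sketched roughly as follows. This is only a sketch under assumptions: it falls back on the interface bounce already listed under "Things that resolve the outage" instead of a real DHCP renew, the probe address 1.1.1.1 is a hypothetical choice, and `check_and_recover` and `run` are illustrative names, not part of pfSense.

```python
import subprocess

IFACE = "mvneta2"       # WAN interface name taken from the post
PROBE_HOST = "1.1.1.1"  # hypothetical probe target; any reliable host would do


def run(cmd):
    """Run a command quietly and report whether it exited with status 0."""
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def check_and_recover(run_cmd=run, iface=IFACE, host=PROBE_HOST):
    """Ping the probe host; if that fails, bounce the WAN interface."""
    if run_cmd(["ping", "-c", "3", host]):
        return "ok"
    # Same workaround as the manual fix: ifconfig down, then up.
    run_cmd(["ifconfig", iface, "down"])
    run_cmd(["ifconfig", iface, "up"])
    return "bounced"
```

Calling `check_and_recover()` every minute or so from cron would be one way to use it. Bouncing the interface is heavier than a renew, but it is the one command-line action in this thread that is known to bring the link back.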
-
@endlessdiy said in DHCP causing internet outage:
TCP dump
Filter for DHCP traffic. The usual ARP and HTTPS traffic is not related to DHCP.
Use "UDP" and "the DHCP ports" (67 and 68) as a filter.
-
@gertjan sorry, I should have clarified: that is all the traffic. It was a 1000-packet dump, ~5s of traffic, and DNS and ARP are the only things in the capture, nothing else at all.
I'll try and get a longer capture next time it breaks and see if there's anything being missed.
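For that longer capture, restricting tcpdump to the DHCP ports as suggested would keep the DNS and ARP noise out. A small sketch that builds such a command line; the interface name comes from the first post, while `capture_command` and the optional output file are illustrative assumptions:

```python
import shlex

# BPF expression matching DHCP: UDP on the server (67) and client (68) ports.
DHCP_FILTER = "udp and (port 67 or port 68)"


def capture_command(iface="mvneta2", outfile=None):
    """Build a tcpdump argument list that captures only DHCP traffic."""
    cmd = ["tcpdump", "-n", "-i", iface]
    if outfile:
        cmd += ["-w", outfile]   # save raw packets for later inspection
    else:
        cmd.append("-vvv")       # decode the DHCP options on screen
    cmd.append(DHCP_FILTER)
    return cmd


# Print a copy-pasteable shell command.
print(" ".join(shlex.quote(part) for part in capture_command()))
# prints: tcpdump -n -i mvneta2 -vvv 'udp and (port 67 or port 68)'
```

Left running across an outage, a capture like this should show whether pfSense sends any renew request at all before the lease runs out.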
-
@gertjan Okay so I got a much longer capture this time and it caught 2 DHCP packets. Pictures below, again I'm not familiar enough with DHCP yet to know what is or isn't good here. I pulled just under 2 min of packets without limit and this is all I saw. Let me know if there's anything else I can pull for troubleshooting or if you have any ideas what is wrong here.
Edit: I'm pretty sure these DHCP packets are from when I manually hit the renew button in the GUI. So it would seem that pfSense isn't even attempting a renew until I tell it to, since these are the only DHCP packets.
-
An expansion of the response in case it helps:
-
Ran fresh pcaps tonight, and after 2 minutes of pcap data there was no DHCP traffic on the WAN interface. The 2 packets above are definitely from when I manually release/renew in the UI (got them in a separate capture tonight).
So basically pfSense doesn't seem to be respecting the DHCP renewal time, because the capture yesterday said renew time 14h, but I'm still here with an expired lease.
Today's DHCP renew said Lease 2 days, Renew 1 day, Rebinding 1 day 18 hours. But when I look in /var/db/dhclient.leases.mvneta2 it looks like the renew is more than 1 day out:
[2.4.5-RELEASE][admin@pfSense.localdomain]/root: date
Sat Mar 20 17:09:34 PDT 2021
[2.4.5-RELEASE][admin@pfSense.localdomain]/root: less /var/db/dhclient.leases.mvneta2
lease {
  <snip>
}
lease {
  interface "mvneta2";
  <snip>
  option dhcp-lease-time 172800;
  option dhcp-message-type 5;
  option dhcp-server-identifier 7.127.2.90;
  option dhcp-renewal-time 86400;
  option dhcp-rebinding-time 151200;
  renew 0 2021/3/21 23:53:46;
  rebind 1 2021/3/22 17:53:46;
  expire 1 2021/3/22 23:53:46;
}
Is this intentional? Am I misunderstanding how DHCP renews are supposed to work? Or is pfSense just ignoring the values sent by the server and making up its own?
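One possible explanation for the apparent mismatch is a timezone difference: the renew/rebind/expire timestamps in the lease file may be in UTC, while `date` printed PDT. The following is a worked check of that assumption, not a confirmed diagnosis. The numbers are copied from the lease block above, and the helper name `parse_lease_stamp` is illustrative.

```python
from datetime import datetime, timedelta, timezone

# Option values copied from the second lease block in the post.
LEASE_TIME = 172800      # option dhcp-lease-time (2 days)
RENEWAL_TIME = 86400     # option dhcp-renewal-time (1 day)
REBINDING_TIME = 151200  # option dhcp-rebinding-time (1 day 18 hours)


def parse_lease_stamp(stamp):
    # ASSUMPTION: lease-file timestamps are UTC, not local time.
    parsed = datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


renew = parse_lease_stamp("2021/3/21 23:53:46")
rebind = parse_lease_stamp("2021/3/22 17:53:46")
expire = parse_lease_stamp("2021/3/22 23:53:46")

# Work backwards from the expiry to the moment the lease was obtained.
acquired = expire - timedelta(seconds=LEASE_TIME)

# `date` printed Sat Mar 20 17:09:34 PDT 2021; PDT is UTC-7.
pdt = timezone(timedelta(hours=-7))
now = datetime(2021, 3, 20, 17, 9, 34, tzinfo=pdt)

print("acquired (UTC):    ", acquired)
print("renew matches T1:  ", acquired + timedelta(seconds=RENEWAL_TIME) == renew)
print("rebind matches T2: ", acquired + timedelta(seconds=REBINDING_TIME) == rebind)
print("lease age at date: ", now - acquired)  # 0:15:48
print("time until renew:  ", renew - now)     # 23:44:12
```

Read as UTC, the lease was obtained about 16 minutes before `date` was run and the renew is a little under 24 hours away, which agrees with the server's 1 day / 1 day 18 hours / 2 days values. Read as local PDT time, the lease would have been obtained in the future, so the timestamps cannot be local. If that reading is right, the lease file is consistent with what the server sent, and what remains unexplained is why no renew traffic shows up on the wire.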