Verizon FiOS and pfSense DHCP Issue



  • Hello everyone. I have an issue that started around 8 PM last night and I am not really sure what to do. Basically, every 1-2 hours (it's been 2 hours so far this morning) the WAN_DHCP gateway on pfSense goes offline and I get packet loss. At that point the internet is down. If I wait long enough it seems like it will eventually come back up (at least an hour, sometimes longer). If I just disable the WAN interface on pfSense and then re-enable it, the internet comes back up and is good for another two hours.

    My setup is fairly simple. I have Ethernet from the ONT ----> pfSense WAN interface. The Verizon router is connected through its LAN port to a switch, which uplinks to pfSense. I followed this guide when I set everything up. All the TV services work great, and even the caller ID and remote DVR functions are working.

    Now I looked at the DHCP logs, and it seems like it is trying to renew DHCP every 3600 seconds. From what I have seen this morning, the first renewal goes through and it binds to the same IP it already has. However, when it tries again an hour later, it keeps sending requests to the gateway but gets no response. Eventually it sends a request and gets a response from a gateway. This is also what happens when I power cycle the ONT or disable/re-enable the WAN interface on pfSense. I will post a small amount of logs from this morning below, but I can post more if needed. I am blanking out my WAN IPs and such with x.x.x.x.
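For context on the 3600-second figure: RFC 2131 gives defaults of T1 (renewal) = 0.5 × lease and T2 (rebinding) = 0.875 × lease when the server does not send explicit values in options 58 and 59. Assuming Verizon's server relies on those defaults (the logs do not confirm this), a 3600-second renewal implies a 7200-second lease. A minimal sketch of that arithmetic; the function names are illustrative only:

```python
# Sketch: RFC 2131 default DHCP timers. Assumes the server does not override
# T1/T2 with options 58 (renewal time) and 59 (rebinding time).

def default_timers(lease_seconds: int) -> tuple[int, int]:
    """Return (T1, T2) in seconds using the RFC 2131 default fractions."""
    t1 = lease_seconds // 2           # 0.5 * lease: unicast renewal to the leasing server
    t2 = lease_seconds * 7 // 8       # 0.875 * lease: broadcast rebinding to any server
    return t1, t2


def lease_from_renewal(renewal_seconds: int) -> int:
    """Infer the lease length from a logged renewal delay, assuming T1 = 0.5 * lease."""
    return renewal_seconds * 2


lease = lease_from_renewal(3600)      # "renewal in 3600 seconds" from the dhclient log
t1, t2 = default_timers(lease)
print(f"lease={lease}s T1={t1}s T2={t2}s")  # lease=7200s T1=3600s T2=6300s
```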

    Any idea what I should do here? This has been working fine for 2+ years, and all of a sudden last night at 8 all hell broke loose. I haven't called Verizon yet, but I may have to soon. It is just difficult to talk to them when you have your own router, because sometimes they don't want to help. Sorry for the long post. I appreciate your help and time. Thank you.

    pfSense DHCP Log - Pastebin Link



  • @capn783

    My guess would be the DHCP server. There is a difference between getting an address from start and renewing. From a cold start, the process goes through the full discover, offer, request, ack sequence, but only request, ack when renewing. Use Packet Capture to see what's actually happening.
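The two paths described above can be written down as a simple lookup. This is a simplified model of RFC 2131 that ignores error paths such as DHCPNAK and DHCPDECLINE; the names are hypothetical:

```python
from enum import Enum


class Start(Enum):
    COLD = "cold start"           # no lease yet
    RENEW = "renewal at T1"       # lease exists; ask the same server to extend it


def expected_exchange(start: Start) -> list[str]:
    """Messages a client and server exchange on each path (error paths omitted)."""
    if start is Start.COLD:
        return ["DHCPDISCOVER", "DHCPOFFER", "DHCPREQUEST", "DHCPACK"]
    return ["DHCPREQUEST", "DHCPACK"]


for s in Start:
    print(f"{s.value}: {' -> '.join(expected_exchange(s))}")
```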

    Do other devices fail in the same way?



  • Hi JKnott,

    I just ran a packet capture but I am not sure I did it right. What I did was run it on the WAN interface for any protocol and I did not specify a specific host. I did not enable Promiscuous mode either. I ran it for almost 20 minutes and it really doesn't seem like there was much in the log. What I am seeing is every 2 hours, on the dot pretty much, I get this message in the gateway log:

    WAN_DHCP 96.232.195.1: Alarm latency 4450us stddev 8794us loss 22%

    That is when the WAN interface goes down and I sit at around 40% packet loss on it. Shortly after, I start seeing DHCPREQUEST messages on igb0 (WAN) to the WAN_DHCP gateway on port 67. It stays like this for a while, but eventually it sends requests to 255.255.255.255 and gets an ACK from one of Verizon's servers. Below is what I see in the gateway and DHCP logs after I restart the WAN interface. I am also attaching the packet capture.

    pcap.txt
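The alarm line can be checked against the thresholds dpinger prints in its startup lines in the gateway log in this post (latency_alarm 500ms, loss_alarm 20%). A minimal sketch, assuming the log format shown here and treating any value strictly above a threshold as an alarm; the regex and function name are illustrative and not part of pfSense:

```python
import re

# Thresholds copied from the dpinger startup line in the gateway log.
LATENCY_ALARM_US = 500_000   # latency_alarm 500ms, expressed in microseconds
LOSS_ALARM_PCT = 20          # loss_alarm 20%

ALARM_RE = re.compile(
    r"Alarm latency (?P<lat>\d+)us stddev (?P<sd>\d+)us loss (?P<loss>\d+)%"
)


def alarm_reasons(line: str) -> list[str]:
    """Return which thresholds an 'Alarm' line exceeds (empty if none or no match)."""
    m = ALARM_RE.search(line)
    if m is None:
        return []
    reasons = []
    if int(m["lat"]) > LATENCY_ALARM_US:
        reasons.append("latency")
    if int(m["loss"]) > LOSS_ALARM_PCT:
        reasons.append("loss")
    return reasons


line = "WAN_DHCP 96.232.195.1: Alarm latency 4450us stddev 8794us loss 22%"
print(alarm_reasons(line))  # ['loss']
```

Here the 4.45 ms latency is far below the 500 ms threshold, so it is the 22% loss alone that trips the alarm.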

    EDIT: Forgot to answer your question about other devices. All my internal equipment continues to work fine. pfSense is hosting a few VLANs. I have a 2-gig Ethernet bond from my pfSense box to my Cisco Nexus 3048 switch as a trunk for the VLAN routing. I also have a single trunk back to pfSense for the native LAN interface. I have a Windows Server 2016 DC that runs DHCP and handles all my internal DHCP.

    Gateway

    Dec 5 14:16:54 dpinger WAN_DHCP 96.232.195.1: sendto error: 65
    Dec 5 14:16:54 dpinger WAN_DHCP 96.232.195.1: sendto error: 65
    Dec 5 14:16:55 dpinger WAN_DHCP 96.232.195.1: sendto error: 65
    Dec 5 14:16:55 dpinger WAN_DHCP 96.232.195.1: sendto error: 65
    Dec 5 14:16:56 dpinger WAN_DHCP 96.232.195.1: sendto error: 65
    Dec 5 14:16:56 dpinger WAN_DHCP 96.232.195.1: sendto error: 65
    Dec 5 14:17:26 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 96.232.195.1 bind_addr 96.232.195.135 identifier "WAN_DHCP "
    Dec 5 14:17:28 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 96.232.195.1 bind_addr 96.232.195.135 identifier "WAN_DHCP "
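The dpinger startup line is a flat list of setting/value pairs. Parsing it makes the loss threshold concrete: with send_interval 500ms and time_period 60000ms, about 120 probes fall in each averaging window, so loss_alarm 20% works out to roughly 24 lost probes. This arithmetic assumes loss is averaged over the whole time_period window and ignores loss_interval, so treat it as an approximation rather than dpinger's exact algorithm:

```python
import shlex


def parse_dpinger_settings(line: str) -> dict[str, str]:
    """Turn a dpinger startup log line into a {setting: value} dict.

    Assumes the layout seen in this thread: '<timestamp> dpinger key value key value ...'.
    """
    _, _, settings = line.partition(" dpinger ")
    tokens = shlex.split(settings)            # keeps identifier "WAN_DHCP " as one token
    return dict(zip(tokens[0::2], tokens[1::2]))


def ms(value: str) -> int:
    """Convert a value such as '500ms' to an integer number of milliseconds."""
    return int(value.removesuffix("ms"))


line = ('Dec 5 14:17:26 dpinger send_interval 500ms loss_interval 2000ms '
        'time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms '
        'latency_alarm 500ms loss_alarm 20% dest_addr 96.232.195.1 '
        'bind_addr 96.232.195.135 identifier "WAN_DHCP "')
cfg = parse_dpinger_settings(line)
probes = ms(cfg["time_period"]) // ms(cfg["send_interval"])       # probes per window
lost_for_alarm = probes * int(cfg["loss_alarm"].rstrip("%")) // 100
print(probes, lost_for_alarm)  # 120 24
```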

    DHCP

    Dec 5 14:17:21 dhclient 6879 DHCPREQUEST on igb0 to 255.255.255.255 port 67
    Dec 5 14:17:21 dhclient 6879 DHCPNAK from 72.69.227.1
    Dec 5 14:17:21 dhclient 6879 DHCPDISCOVER on igb0 to 255.255.255.255 port 67 interval 1
    Dec 5 14:17:21 dhclient 18142 connection closed
    Dec 5 14:17:21 dhclient 18142 exiting.
    Dec 5 14:17:22 dhclient PREINIT
    Dec 5 14:17:22 dhclient 24391 DHCPREQUEST on igb0 to 255.255.255.255 port 67
    Dec 5 14:17:22 dhclient 24391 DHCPNAK from 72.69.227.1
    Dec 5 14:17:22 dhclient 24391 DHCPDISCOVER on igb0 to 255.255.255.255 port 67 interval 2
    Dec 5 14:17:22 dhclient 24391 DHCPOFFER from 96.232.195.1
    Dec 5 14:17:22 dhclient ARPSEND
    Dec 5 14:17:24 dhclient ARPCHECK
    Dec 5 14:17:24 dhclient 24391 DHCPREQUEST on igb0 to 255.255.255.255 port 67
    Dec 5 14:17:24 dhclient 24391 DHCPACK from 96.232.195.1
    Dec 5 14:17:24 dhclient BOUND
    Dec 5 14:17:24 dhclient Starting add_new_address()
    Dec 5 14:17:24 dhclient ifconfig igb0 inet 96.232.195.135 netmask 255.255.255.0 broadcast 96.232.195.255
    Dec 5 14:17:24 dhclient New IP Address (igb0): 96.232.195.135
    Dec 5 14:17:24 dhclient New Subnet Mask (igb0): 255.255.255.0
    Dec 5 14:17:24 dhclient New Broadcast Address (igb0): 96.232.195.255
    Dec 5 14:17:24 dhclient New Routers (igb0): 96.232.195.1
    Dec 5 14:17:24 dhclient Adding new routes to interface: igb0
    Dec 5 14:17:24 dhclient /sbin/route add default 96.232.195.1
    Dec 5 14:17:24 dhclient Creating resolv.conf
    Dec 5 14:17:24 dhclient 24391 bound to 96.232.195.135 -- renewal in 3600 seconds.
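Grouping the DHCP lines by dhclient process ID makes the sequence easier to follow: process 6879 has its DHCPREQUEST answered with a DHCPNAK, and process 24391 is NAKed once more before completing a fresh DISCOVER, OFFER, REQUEST, ACK exchange. A sketch that does the grouping, assuming the '<date> dhclient <pid> <message>' layout in the log above; the function name is illustrative:

```python
import re
from collections import defaultdict

# Matches e.g. "Dec 5 14:17:21 dhclient 6879 DHCPNAK from 72.69.227.1"
MSG_RE = re.compile(r"dhclient (?P<pid>\d+) (?P<msg>DHCP[A-Z]+)")


def dhcp_sequences(log: str) -> dict[str, list[str]]:
    """Group DHCP message types by dhclient PID, keeping log order."""
    seqs = defaultdict(list)
    for line in log.splitlines():
        m = MSG_RE.search(line)
        if m:                                  # lines like "connection closed" are skipped
            seqs[m["pid"]].append(m["msg"])
    return dict(seqs)


log = """\
Dec 5 14:17:21 dhclient 6879 DHCPREQUEST on igb0 to 255.255.255.255 port 67
Dec 5 14:17:21 dhclient 6879 DHCPNAK from 72.69.227.1
Dec 5 14:17:21 dhclient 6879 DHCPDISCOVER on igb0 to 255.255.255.255 port 67 interval 1
Dec 5 14:17:21 dhclient 18142 connection closed
Dec 5 14:17:22 dhclient 24391 DHCPREQUEST on igb0 to 255.255.255.255 port 67
Dec 5 14:17:22 dhclient 24391 DHCPNAK from 72.69.227.1
Dec 5 14:17:22 dhclient 24391 DHCPDISCOVER on igb0 to 255.255.255.255 port 67 interval 2
Dec 5 14:17:22 dhclient 24391 DHCPOFFER from 96.232.195.1
Dec 5 14:17:24 dhclient 24391 DHCPREQUEST on igb0 to 255.255.255.255 port 67
Dec 5 14:17:24 dhclient 24391 DHCPACK from 96.232.195.1
"""
for pid, seq in dhcp_sequences(log).items():
    print(pid, " -> ".join(seq))
```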



  • @capn783 said in Verizon FiOS and pfSense DHCP Issue:

    Dec 5 14:17:21 dhclient 6879 DHCPREQUEST on igb0 to 255.255.255.255 port 67
    Dec 5 14:17:21 dhclient 6879 DHCPNAK from 72.69.227.1
    Dec 5 14:17:21 dhclient 6879 DHCPDISCOVER on igb0 to 255.255.255.255 port 67 interval 1

    Why is there a request to 255.255.255.255? Only the discover should do that. By the time it gets to the request, the client should have the server address. The NAK means the server doesn't like what's being requested.
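For comparison, RFC 2131 does use a broadcast DHCPREQUEST in several client states: when answering an offer (SELECTING), when restarting with a remembered lease (INIT-REBOOT), and when rebinding after T2 (REBINDING). Only a renewal at T1 (RENEWING) is unicast to the leasing server. A broadcast REQUEST answered by a NAK right after dhclient restarts may therefore be dhclient asking for its previous address, although the log alone doesn't show which state it was in. A summary of that rule as code; the server address in the usage line is simply the one from this log:

```python
from enum import Enum


class ClientState(Enum):
    SELECTING = "SELECTING"        # answering a DHCPOFFER
    INIT_REBOOT = "INIT-REBOOT"    # restarting with a remembered address
    RENEWING = "RENEWING"          # T1 reached; the leasing server is known
    REBINDING = "REBINDING"        # T2 reached; any server may answer


LIMITED_BROADCAST = "255.255.255.255"


def request_destination(state: ClientState, server_ip: str) -> str:
    """Return where a client sends its DHCPREQUEST in the given state (RFC 2131)."""
    if state is ClientState.RENEWING:
        return server_ip               # only renewal is unicast to the leasing server
    return LIMITED_BROADCAST


for st in ClientState:
    print(f"{st.value:12} -> {request_destination(st, '96.232.195.1')}")
```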

    When you capture, you can filter on port 67, which will also capture 68 for both sides of the DHCP process. You can also select IPv4 and UDP, though port 67 by itself should be enough. Also, Packet Capture by itself doesn't show all the details, so it helps to download the capture and view it with Wireshark.
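A port 67 filter also shows the port 68 side because a port filter matches when either the source or the destination port equals the given value, and every DHCP packet between client (port 68) and server (port 67) has 67 on one end. A standard-library sketch of that matching on a raw frame, assuming an untagged Ethernet frame carrying IPv4 (VLAN-tagged and IPv6 frames are simply rejected); `fake_udp_frame` is a hypothetical helper that builds test frames:

```python
import struct
from typing import Optional

ETHERTYPE_IPV4 = 0x0800
PROTO_UDP = 17


def udp_ports(frame: bytes) -> Optional[tuple[int, int]]:
    """Return (src_port, dst_port) for an untagged Ethernet/IPv4/UDP frame, else None."""
    if len(frame) < 14:
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IPV4:
        return None                      # VLAN-tagged, IPv6, ARP, ... are not handled
    ip = frame[14:]
    if len(ip) < 20 or ip[9] != PROTO_UDP:
        return None
    ihl = (ip[0] & 0x0F) * 4             # IPv4 header length in bytes (includes options)
    if len(ip) < ihl + 8:
        return None
    src, dst = struct.unpack("!HH", ip[ihl:ihl + 4])
    return src, dst


def matches_port(frame: bytes, port: int) -> bool:
    """Mimic a 'port N' capture filter: true if either UDP port equals N."""
    ports = udp_ports(frame)
    return ports is not None and port in ports


def fake_udp_frame(src_port: int, dst_port: int) -> bytes:
    """Hypothetical test helper: minimal frame with zeroed addresses and checksums."""
    eth = b"\xff" * 6 + b"\x00" * 6 + struct.pack("!H", ETHERTYPE_IPV4)
    ip = bytes([0x45]) + b"\x00" * 8 + bytes([PROTO_UDP]) + b"\x00" * 10
    udp = struct.pack("!HHHH", src_port, dst_port, 8, 0)
    return eth + ip + udp


client_to_server = fake_udp_frame(68, 67)   # e.g. DHCPREQUEST
server_to_client = fake_udp_frame(67, 68)   # e.g. DHCPACK
print(matches_port(client_to_server, 67), matches_port(server_to_client, 67))  # True True
```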

    Also, that sendto error: 65 means there's no route to that address. Is that a valid address? I assume that's what you're using for the monitor address. You generally don't have to monitor the connection.
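The 65 in sendto error: 65 is a FreeBSD errno value, and in FreeBSD's <sys/errno.h> 65 is EHOSTUNREACH ("No route to host"), which fits the reading above. Errno numbers are OS-specific (Linux uses 113 for EHOSTUNREACH), so decoding the number with Python's errno module on a Linux machine would give the wrong name. A sketch with a small hard-coded subset of the FreeBSD table; the function name is illustrative:

```python
import re
from typing import Optional

# Small subset of FreeBSD <sys/errno.h>. These numbers are BSD-specific; on Linux,
# for example, EHOSTUNREACH is 113, so Python's errno module there would not match.
FREEBSD_ERRNO = {
    49: ("EADDRNOTAVAIL", "Can't assign requested address"),
    50: ("ENETDOWN", "Network is down"),
    51: ("ENETUNREACH", "Network is unreachable"),
    55: ("ENOBUFS", "No buffer space available"),
    64: ("EHOSTDOWN", "Host is down"),
    65: ("EHOSTUNREACH", "No route to host"),
}

SENDTO_RE = re.compile(r"sendto error: (\d+)")


def decode_sendto(line: str) -> Optional[str]:
    """Translate the errno in a dpinger 'sendto error: N' line, or return None."""
    m = SENDTO_RE.search(line)
    if m is None:
        return None
    code = int(m.group(1))
    name, text = FREEBSD_ERRNO.get(code, (f"errno {code}", "not in this table"))
    return f"{name} ({text})"


print(decode_sendto("Dec 5 14:16:54 dpinger WAN_DHCP 96.232.195.1: sendto error: 65"))
# EHOSTUNREACH (No route to host)
```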



  • Hi JKnott

    I just called Verizon. They reset the ONT again and then my phone stopped working, so I was on the phone with the guy for a while. Apparently there is a phone outage. He worked with someone who said the internet should be OK now, and he is supposed to call me back in 2 hours to verify. I will run the packet capture again in 2 hours and filter by port. I actually did view the capture in Wireshark, but it wouldn't let me upload the .cap file. I guess I should have just zipped it; I will do that next time. As for the 255.255.255.255 requests, I have no idea why it is doing that. I just noticed in the DHCP log that this is what comes up when I disable and re-enable the WAN interface to get the internet back up. The only other thing I did last night when this started was swap my rack switch from a Catalyst 2960S to the Nexus 3048. The configs are the same, though, and I never lose anything internal. I will post again in 2 hours with an update. Thank you for your help.



  • @JKnott Alright, so quick update. Verizon wanted me to connect their router directly to the ONT as a test (I knew they were going to ask me to do this). I did that, and I have about an hour left to reach two hours and see whether it stays up this way or still has the issue. I ran the capture the last time it failed and am attaching it here. You can ignore the entries for 192.168.1.1; that's when they had me switch back to their router on the ONT. I guess if I don't get dropped through their router, then it could be something with pfSense. I never spoofed my MAC address. I read way back when I was setting this up that you needed to do that, but I never had any issues, so I never did. I will post back once this test is done.

    packetcapture (1).zip



  • @JKnott OK, so with their router connected directly to the ONT and pfSense behind it, the internet has now stayed up for almost 3 hours. I guess on Saturday I will reconnect everything the way I had it. I will try the MAC spoofing too. I really don't understand what could be causing this, but I will troubleshoot further over the weekend and update as I learn more. If you think of anything I should look at, let me know. Thank you, JKnott, for your help.



  • @capn783

    I have uploaded capture files from Packet Capture. There used to be a problem where the file extension had to be changed, but I thought that had been fixed.



  • @JKnott Hi JKnott. This morning I reconnected everything back through pfSense and tried the MAC spoof. It still went down right at the 2-hour mark. A few minutes ago I removed the new Nexus 3048 switch I put in and put my old Catalyst 2960S back. That is the only change that was made when this all started happening. I really don't know why that would cause any issues, but I figured I'd just try it. If this doesn't work, I have no idea what could be causing this. It goes down on the dot every 2 hours: it just goes offline, starts experiencing packet loss to the gateway, and therefore can't renew its DHCP address. I will update further as I know more. If you think of anything I should check, let me know. Thank you.



  • @JKnott OK, quick update. Putting the old switch back did not change anything, which is what I expected. What I am trying now is an off-the-shelf router (Linksys AC1200). I have my FiOS ONT connected to it and a single LAN connection from the router to a PC here. I have internet right now. I am going to wait 2-3 hours to see whether it stays up. If not, then Verizon must have changed something. If it does stay up, then I have a problem somewhere in pfSense and I will probably need some help, because I changed nothing on pfSense recently other than some of my VLANs. The VLAN numbers didn't change; I just changed some interface addresses. For example, I changed my wired subnet from 192.168.20.0/24 to 172.16.20.0/24. All my internal VLAN routing and connections have been fine, though. No issues there.

    Do you think it would be worth it to just reinstall pfSense? I am not really running any major packages right now. I use Avahi, DHCP Relay (to my Windows DHCP server), Squid (mainly just for the AV), and I have the DNS Resolver on, which I don't even really need anymore. I had it on for when I was messing around with pfBlockerNG. I will update after this test. If anyone has any ideas, I would appreciate it. Thank you for your help and time.



  • @capn783

    See what happens with the other router. I don't know that reinstalling pfSense would help, but it wouldn't hurt.



  • @JKnott Hi JKnott. So the Linksys router never went offline. It was online for about 4 hours with no issues before I went back to pfSense. I really have no idea what to check at this point. Exactly two hours after I reset the WAN interface, I get packet loss according to the gateway monitor (I tried turning the monitoring off, but that didn't help), and then the WAN goes offline. I can simply release and renew the WAN in Status > Interfaces and it comes back up. Anything else I should check short of just reinstalling pfSense from scratch? Thank you for your help.



  • Hi @capn783, a fellow pfSense user here with a FiOS internet connection. I have not seen troubles like the ones you describe, though. Quick question: do you have IPv6 enabled on the pfSense firewall (e.g. the WAN interface)? If yes, do you see any difference if you disable it? Also, just to be sure, have you already tried a different Ethernet cable between the ONT and the pfSense router?

    Hope this helps.



  • @JKnott @tman222 Hi guys. Thank you for sticking with me on this. As of this morning it looks like I have it fixed. I have had the ONT connected directly to pfSense since 12:25 AM, and the internet has not gone down since. Here is what happened.

    The box I use for pfSense is a Supermicro SYS-5018A-FTN4. When I initially set it up, I had 5 connections: WAN on igb0, LAN (management) on igb1, and lagg0 for the VLAN routing (igb2 and igb3). The 5th connection was an Ethernet cable from the IPMI port to my switch. A little while ago I forgot the IPMI login password, so I stopped using it. When I put in my new Nexus switch, I never reconnected the IPMI cable, because I figured I didn't remember the password anyway and would just leave it disconnected and fix it later.

    After some research last night, I ran across this post on this forum. A little way down, user bamhm 182 mentions having the same issue, which ended up being caused by having IPMI enabled on his R210 with nothing connected to it. This is what I believe caused my issue. The IPMI defaults to failover mode when enabled on the Supermicro box and will use igb0 if there is no connection to the IPMI port. I reconnected the IPMI last night at 12:25 (even though I can't log in because I can't remember the password, lol), and since then I have had 0 issues. I will continue to monitor, but I think this is solved now.

    Again, thank you for your help, time, and recommendations.



  • Ah, I see. I actually use a Supermicro 5018D-FN8T as my pfSense box, so I know the setting you are referring to. It looks like the IPMI and pfSense may have been competing for the WAN IP address, causing the intermittent connectivity. This probably also means that at some point your IPMI admin interface may have been publicly exposed. Try to see if there is a way to reset the IPMI password (e.g. through the BIOS, perhaps) if you can't remember it. Once you have done that, log into the IPMI admin interface and change the IPMI LAN interface from "failover" to "dedicate".
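One way to confirm that two devices are competing for the WAN lease is to check whether a WAN capture contains DHCP client messages from more than one MAC address (the BMC has its own MAC, separate from igb0's). A sketch, assuming you have already exported (source MAC, message type) pairs from the capture, for example out of Wireshark; the function name is hypothetical and the MAC addresses are made-up, locally administered examples:

```python
# Message types only a DHCP client sends; server replies (OFFER/ACK/NAK) are ignored.
CLIENT_MESSAGES = {"DHCPDISCOVER", "DHCPREQUEST", "DHCPDECLINE", "DHCPRELEASE", "DHCPINFORM"}


def competing_clients(packets: list[tuple[str, str]]) -> list[str]:
    """Return the sorted client MACs if more than one sends DHCP client messages, else []."""
    macs = {mac.lower() for mac, msg in packets if msg in CLIENT_MESSAGES}
    return sorted(macs) if len(macs) > 1 else []


# Hypothetical capture: the firewall's WAN NIC plus a second device on the same port.
packets = [
    ("02:00:00:00:00:01", "DHCPDISCOVER"),
    ("02:00:00:00:00:01", "DHCPREQUEST"),
    ("02:00:00:00:00:02", "DHCPREQUEST"),
    ("02:00:00:00:00:99", "DHCPACK"),       # server reply, not counted
]
print(competing_clients(packets))  # ['02:00:00:00:00:01', '02:00:00:00:00:02']
```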

    https://serverfault.com/questions/361940/configuring-supermicro-ipmi-to-use-one-of-the-lan-interfaces-instead-of-the-ipmi

    Hope this helps.



  • I know this is old, but I found this thread through a Google search, so I thought I would comment here. I too replaced my Verizon FiOS router with a pfSense firewall/router, and sure enough, every 24 hours I received the sendto error: 65 message and everything would hang until I rebooted pfSense. I started playing around with the DHCP settings on the WAN interface. When I set them to "FreeBSD Default", the problem went away. The defaults listed in my Protocol Timing section are timeout=60, retry=300, select timeout=0, reboot=10, backoff cutoff=120, and initial interval=10.
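The initial interval and backoff cutoff settings control how fast dhclient retransmits when nobody answers. As dhclient.conf(5) describes it, the client waits about initial-interval seconds before its second attempt and then grows the wait randomly (roughly doubling on average), capping it at backoff-cutoff. The sketch below replaces the randomness with exact doubling, so it only approximates the real schedule; with the values from this post, three attempts fit in the 60-second timeout:

```python
def send_times(initial_interval: int, backoff_cutoff: int, timeout: int) -> list[int]:
    """Seconds (from the first attempt) at which requests are sent before `timeout`.

    Simplification: the interval doubles exactly each time; real dhclient randomizes it.
    """
    times, t, interval = [], 0, initial_interval
    while t < timeout:
        times.append(t)
        t += interval
        interval = min(interval * 2, backoff_cutoff)   # never wait longer than the cutoff
    return times


print(send_times(initial_interval=10, backoff_cutoff=120, timeout=60))  # [0, 10, 30]
# Hypothetical comparison: a 1-second interval and cutoff retries every second.
print(len(send_times(initial_interval=1, backoff_cutoff=1, timeout=60)))  # 60
```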

    So I am on day 3 with no problems. I have asked the quality assurance team (my 14-year-old daughter, who is home 24/7 now because of COVID-19, with her iPhone and iPad) to let me know of any problems. So far she has not generated any bug reports :)

