WAN DHCP Problem



  • Hi All,

    My pfSense box has been driving me nuts for the last couple of months. I have 2 WAN connections: 1 x ADSL and 1 x cable.

    The ADSL line is always fine with its static IP, but I seem to have major problems getting a DHCP address for the cable connection.

    When first booted it works fine, but after a few hours/days/weeks pfSense just seems to get stuck in a loop trying to get an IP. The cable NIC shows as down and I see lots of DHCP errors in the pfSense log and the cable modem log:

    pfsense:

    php: : The command '/sbin/dhclient -c /var/etc/dhclient_wan.conf ue0 > /tmp/ue0_output > /tmp/ue0_error_output' returned exit code '1', the output was ''

    I also get exit code 15 a lot too.

    cable modem:

    DHCP FAILED - Requested Info not supported.;CM-MAC=removed;CMTS-MAC=removed;CM-QOS=1.1;CM-VER=3.0;

    I am currently running 2.0.1-RELEASE (i386) but I have also tried 2.1 and have the same issues.

    I have recently had a new cable modem, but I was having this issue way before that too. I was fine with pfSense 1 and indeed in the early days of 2, but this has been an issue for quite a while now.

    I have searched and searched. I am not using MAC spoofing and I have sticky connections turned off.

    Has anyone got any ideas? Are there any known issues with WAN DHCP?



  • @jp141:

    The ADSL line is always fine with its static IP, but I seem to have major problems getting a DHCP address for the cable connection.

    When first booted it works fine but after a few hours/days/weeks pfsense just seems to get stuck in a loop trying to get an ip,

    Have you tried swapping the NICs over? That is, connect the NIC that now goes to the cable modem to the ADSL modem, connect the NIC that now goes to the ADSL modem to the cable modem, and adjust the pfSense configuration to suit.
    If the problem goes away, you probably have a problem with the NIC.

    @jp141:

    php: : The command '/sbin/dhclient -c /var/etc/dhclient_wan.conf ue0 > /tmp/ue0_output > /tmp/ue0_error_output' returned exit code '1', the output was ''

    Is there anything interesting in /tmp/ue0_output or /tmp/ue0_error_output?

    @jp141:

    cable modem:

    DHCP FAILED - Requested Info not supported.;CM-MAC=removed;CMTS-MAC=removed;CM-QOS=1.1;CM-VER=3.0;

    I don't know how to decode that. Perhaps the modem documentation will help.

    @jp141:

    are there any known issues with WAN DHCP?

    I don't know of any.



  • Thanks wallabybob. This is on a WatchGuard Firebox so I have 8 NICs. I have tried a few of the others, same problem :(

    I have also tried:

    New lan cable
    Putting a switch in between pfsense and the cable modem (this does seem to make it happen less often but could be coincidence)
    Setting MTU manually
    Setting link speed/duplex manually
    Deleting and recreating interfaces and routing groups etc

    Running out of things to try. Next time I get the error I will have a look in /tmp.



  • I found this in a file called sk1_error_output (IPs removed and replaced with xxx for obvious reasons). Looks pretty standard to me?

    sk1: no link …... got link
    dhclient: PREINIT
    DHCPDISCOVER on sk1 to 255.255.255.255 port 67 interval 2
    DHCPDISCOVER on sk1 to 255.255.255.255 port 67 interval 2
    DHCPOFFER from xxxx
    DHCPREQUEST on sk1 to 255.255.255.255 port 67
    DHCPACK from xxxx
    bound to xxxx -- renewal in 236053 seconds.
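
    For scale, the renewal interval in that last line is a little under three days. A quick Python sketch of the conversion follows; `renewal_delay` is just an illustrative helper name, not anything that ships with pfSense or dhclient.

```python
import re
from datetime import timedelta

# Last line of the dhclient output quoted above (address masked as in the post).
BOUND_LINE = "bound to xxxx -- renewal in 236053 seconds."


def renewal_delay(line: str) -> timedelta:
    """Extract the 'renewal in N seconds' value from a dhclient 'bound' line."""
    match = re.search(r"renewal in (\d+) seconds", line)
    if match is None:
        raise ValueError("not a dhclient 'bound' line")
    return timedelta(seconds=int(match.group(1)))


print(renewal_delay(BOUND_LINE))  # 2 days, 17:34:13
```

    So dhclient only goes back to the server every few days on this lease. That would at least be consistent with the problem turning up after days rather than minutes, though nothing in the output shows that a renewal is what triggers it.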



  • Do you have an interpretation of the cable modem's none too elegant speech:
    @jp141:

    DHCP FAILED - Requested Info not supported.;CM-MAC=removed;CMTS-MAC=removed;CM-QOS=1.1;CM-VER=3.0;



  • If I google it, there are lots of posts from people with cable line quality issues, so it could be that. I will log a call with the cable supplier and see what they say.

    But the way that error reads to me is that it doesn't like something about the request sent from pfsense?


  • Netgate Administrator

    I see that other people with similar problems (as you say, there are many) have the date/time on the modem logs completely wrong. Is it possible that pfSense is requesting a DHCP lease with a timestamp that is way in the future from the modem's point of view? I.e. an excessive time discrepancy causes DHCP to fail.

    Steve



  • Yeah, I saw that too. I will see if I can sync pfSense with the cable modem's time. I also think there might be something going on like them blocking DHCP requests that are too many or too frequent.



  • Well it seems to have been ok for 24 hours now but that is the annoying thing with this problem, you think you have it fixed for days/weeks then it resurfaces to bite you! :(



  • I knew it was too good to be true!!

    It has just started again. In the sk1_error_output log file I get the output below. It seems to try twice, then the interface goes down for about 1 second (not sure if that is the cable modem or pfSense taking it down), then it loops again:

    sk1: no link ….. got link
    dhclient: PREINIT
    DHCPREQUEST on sk1 to 255.255.255.255 port 67
    DHCPREQUEST on sk1 to 255.255.255.255 port 67

    The only way to get it working again is to reboot pfsense :(

    A reboot of the cable modem does not fix it, any more ideas guys?



  • I wonder if it is timing out waiting for a reply as there doesn't seem to be much of a delay between DHCPREQUEST attempts.

    Is it possible to change the dhclient wait time?



  • I also seem to get stuck in this loop of doom if I reboot or unplug the cable modem. The only way to get it back is to reboot pfSense.

    When it happens here is the pfsense log.

    This just loops constantly:

    Apr 13 19:56:31 kernel: sk1: link state changed to DOWN
    Apr 13 19:56:31 check_reload_status: Linkup starting sk1
    Apr 13 19:56:31 php: : HOTPLUG: Configuring interface opt1
    Apr 13 19:56:31 php: : DEVD Ethernet attached event for opt1
    Apr 13 19:56:27 php: : The command '/sbin/dhclient -c /var/etc/dhclient_opt1.conf sk1 > /tmp/sk1_output > /tmp/sk1_error_output' returned exit code '15', the output was ''
    Apr 13 19:56:27 dhclient[16283]: exiting.
    Apr 13 19:56:27 dhclient[16283]: exiting.
    Apr 13 19:56:27 dhclient[16283]: connection closed
    Apr 13 19:56:27 dhclient[16283]: connection closed
    Apr 13 19:56:27 php: : DEVD Ethernet detached event for opt1
    Apr 13 19:56:27 dhclient[31620]: DHCPREQUEST on sk1 to 255.255.255.255 port 67
    Apr 13 19:56:26 dhclient[31620]: DHCPREQUEST on sk1 to 255.255.255.255 port 67
    Apr 13 19:56:26 dhclient: PREINIT
    Apr 13 19:56:25 kernel: sk1: link state changed to UP
    Apr 13 19:56:25 check_reload_status: Linkup starting sk1
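
    To put numbers on a loop like this, the log can be tallied for link transitions and dhclient exit codes. Below is a minimal Python sketch, assuming the log has been saved as plain text in the format shown above (newest entries first); `summarize` and the two regexes are illustrative names, not part of pfSense.

```python
import re
from collections import Counter

# A few lines copied from the log above (the log lists newest entries first).
SAMPLE_LOG = """\
Apr 13 19:56:31 kernel: sk1: link state changed to DOWN
Apr 13 19:56:27 php: : The command '/sbin/dhclient -c /var/etc/dhclient_opt1.conf sk1 > /tmp/sk1_output > /tmp/sk1_error_output' returned exit code '15', the output was ''
Apr 13 19:56:27 dhclient[31620]: DHCPREQUEST on sk1 to 255.255.255.255 port 67
Apr 13 19:56:25 kernel: sk1: link state changed to UP
"""

LINK_RE = re.compile(r"kernel: (\w+): link state changed to (UP|DOWN)")
EXIT_RE = re.compile(r"/sbin/dhclient .* returned exit code '(\d+)'")


def summarize(log_text):
    """Count link state changes per (interface, state) and dhclient exit codes."""
    links = Counter()
    exits = Counter()
    for line in log_text.splitlines():
        link = LINK_RE.search(line)
        if link:
            links[(link.group(1), link.group(2))] += 1
            continue
        failed = EXIT_RE.search(line)
        if failed:
            exits[int(failed.group(1))] += 1
    return links, exits


links, exits = summarize(SAMPLE_LOG)
print(dict(links))
print(dict(exits))
```

    Running it over a longer capture shows how many UP/DOWN cycles there are and whether each one ends with the same dhclient exit code.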



  • Right hopefully a bit of success!

    I got it into a position where it was stuck in the loop of doom.

    Then I changed the interface from autoselect to 1000BASE-T full duplex.

    That brought it out of its loop of death and it is working without a reboot!

    So fingers crossed this is now fixed! :D

    I think there is a problem with it trying to negotiate speed/duplex and get an IP at the same time, possibly only a problem with certain cable modems?


  • Netgate Administrator

    Hmm, seems odd.
    When you change the connection type like that the interface is brought down and back up so that may have triggered the reconnect.
    Good luck!

    Steve



  • Yeah, that's true, but the connection was going up and down like a yo-yo before. Unless it is a full reset of the connection as opposed to just the link state?


  • Netgate Administrator

    The link state is reported by the driver to the OS in order to trigger the DHCP client etc. Bringing the interface up and down is more than that. You can test it easily enough if it starts flapping again:

    
    ifconfig sk1 down
    ifconfig sk1 up
    
    

    Steve



  • Ok thanks I will keep an eye on it and keep you posted.

    I really hope this is fixed. I love pfSense, but it is driving me absolutely barmy, almost to the point where I want to scrap this FW and get something else :(



  • Well, that does indeed seem to have fixed it! Fingers crossed :)



  • I have this EXACT same issue. It only affects DHCP on the WAN. My static interfaces are not affected. Only a reboot will correct it.

    I too found the manual duplex workaround, and that does seem to work. However, here is the problem.

    So, you force pfSense to a manual duplex setting. You had better hope you can do the same on the cable modem, or you WILL have a duplex mismatch. In my case, I cannot manually set the duplex on my cable modem. Since the cable modem only supports auto-negotiation on its interface, the instant you manually set your duplex in pfSense the cable modem's NIC will fall back to half duplex. So you have 100 full duplex set in pfSense, and the cable modem will be running at 100 half duplex. This is by design.

    I'm left with two options here: a duplex mismatch, or forcing everything to 100 half duplex. This isn't a great workaround.

    Now, what I said works a little differently with gig links (I'm limited to 10/100), but the bottom line is that you should always use the same mode on both ends, either auto on both or manual on both. You should never set auto on one end and manual on the other.
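
    The mismatch rule described above can be written down as a tiny model. This is only a sketch for 10/100 links, assuming that a port left on auto falls back to half duplex when its peer is forced, and that two auto ports both advertise full duplex; `Port`, `effective_duplex` and `has_mismatch` are made-up names for illustration.

```python
from typing import NamedTuple


class Port(NamedTuple):
    autoneg: bool               # True = port is left on auto-negotiation
    full_duplex: bool = True    # only meaningful when the port is forced


def effective_duplex(port: Port, peer: Port) -> str:
    """Duplex a 10/100 port ends up using against the given peer."""
    if not port.autoneg:
        # A forced port uses exactly what it was told to use.
        return "full" if port.full_duplex else "half"
    if peer.autoneg:
        # Both ends negotiate; assumed here that both advertise full duplex.
        return "full"
    # Auto port facing a forced peer: it cannot learn the peer's duplex,
    # so it falls back to half duplex.
    return "half"


def has_mismatch(a: Port, b: Port) -> bool:
    """True when the two ends of the link settle on different duplex modes."""
    return effective_duplex(a, b) != effective_duplex(b, a)


pfsense_forced_full = Port(autoneg=False, full_duplex=True)
pfsense_forced_half = Port(autoneg=False, full_duplex=False)
modem_auto_only = Port(autoneg=True)

print(has_mismatch(pfsense_forced_full, modem_auto_only))  # True
print(has_mismatch(pfsense_forced_half, modem_auto_only))  # False
```

    The two calls correspond to the two options above: forcing full duplex against an auto-only modem gives a mismatch, while forcing half duplex keeps both ends in agreement at the cost of running half duplex.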

    I'd love to know what is causing this so I don't have to fool with manually adjusting my duplex settings.



  • @stephenw10:

    The link state is reported by the driver to the OS in order to trigger the DHCP client etc. Bringing the interface up and down is more than that. You can test it easily enough if it starts flapping again:

    
    ifconfig sk1 down
    ifconfig sk1 up
    
    

    Steve

    I tried this the last time the problem occurred, and it made no difference. I'd down the interface, but something kept automatically bringing it back up - and the link state kept flapping.



  • It's a NIC driver issue that's outside of our control. You may want to try a 2.1 snapshot, since it has a newer base OS where that may not be an issue. See the 2.1 board here.



  • Thanks for the info. I've been searching for problems related to the vr(4) driver, which supports the VT6105M NIC, on FreeBSD 8.1, but I'm coming up empty. Can you share where you found that? I'm curious as to the exact issue, and in what release it's fixed.



  • An earlier post reported this problem on sk interfaces. Are you seeing a similar problem on vr interfaces?



  • I have the same issue with my cable modem and em interface when I do MAC address cloning. The interface flaps up and down sometimes. I don't know what triggers it. Now that I have selected 100BT half duplex, it seems to be working well. I don't think it is as simple as driver issues in FreeBSD.


  • Netgate Administrator

    Did you ever try JimP's suggestion, here?

    Steve



  • @wallabybob:

    An earlier post reported this problem on sk interfaces. Are you seeing a similar problem on vr interfaces?

    Can you reference the post, and I'll take a look?



  • @stephenw10:

    Did you ever try JimP's suggestion, here?

    Steve

    Yes, I did make those changes to my inc files, but no help. Still the same up and down flapping loop.



  • @yaw:

    @wallabybob:

    An earlier post reported this problem on sk interfaces. Are you seeing a similar problem on vr interfaces?

    Can you reference the post, and I'll take a look?

    See replies 3, 9 and 11 of this topic. I'm curious if this is a problem of a particular class of NIC or a more generic DHCP problem.



  • @GoldServe:

    I have the same issue with my Cable Modem and em interface when I do mac address cloning. Interface flaps up and down sometimes.

    Versions before 2.0.1 would flap repeatedly when MAC spoofing in some circumstances; upgrade to 2.0.1 to fix that.



  • @wallabybob:

    @yaw:

    @wallabybob:

    An earlier post reported this problem on sk interfaces. Are you seeing a similar problem on vr interfaces?

    Can you reference the post, and I'll take a look?

    See replies 3, 9 and 11 of this topic. I'm curious if this is a problem of a particular class of NIC or a more generic DHCP problem.

    Sorry, I thought you were talking about another thread. Yes, this is the same issue except with the vr interfaces. It only happens with DHCP, and I can replicate it every time.

