Dhclient on WAN occasionally fails to renew lease with cable ISP



  • Hi,

    First, let me describe my setup:

    ISP (Unitymedia Germany) <-> Cisco EPC3212 <-> pfSense <-> LAN
    The Cisco EPC3212 modem is in bridge mode, and the pfSense box uses a LAN-side bridge (no filtering between LAN ports) to connect to various devices and switches.
    The ISP very rarely changes my WAN IPv4 address (every few months) but only hands out 1-hour leases, via a DHCP server that sits behind a relay in the 10/8 space (the DHCP server itself is on 81.210.x.x).

    My problem is that from time to time (about 1 to 4 times a day) dhclient fails to extend the WAN lease. dhclient then reports an "EXPIRE", and WAN connections drop until a new lease is obtained (see log below).

    
    Mar 10 12:43:01 	dhclient 		Creating resolv.conf
    Mar 10 12:43:01 	dhclient 		/sbin/route add default 176.198.32.1
    Mar 10 12:43:01 	dhclient 		Adding new routes to interface: re0
    Mar 10 12:43:01 	dhclient 		New Routers (re0): 176.198.32.1
    Mar 10 12:43:01 	dhclient 		New Broadcast Address (re0): 255.255.255.255
    Mar 10 12:43:01 	dhclient 		New Subnet Mask (re0): 255.255.252.0
    Mar 10 12:43:01 	dhclient 		New IP Address (re0): 176.198.x.y
    Mar 10 12:43:01 	dhclient 		ifconfig re0 inet 176.198.x.y netmask 255.255.252.0 broadcast 255.255.255.255
    Mar 10 12:43:01 	dhclient 		Starting add_new_address()
    Mar 10 12:43:01 	dhclient 		BOUND
    Mar 10 12:43:00 	dhclient 		ARPCHECK
    Mar 10 12:42:58 	dhclient 		ARPSEND
    Mar 10 12:42:58 	dhclient 		PREINIT
    Mar 10 12:42:58 	dhclient 		Deleting old routes
    Mar 10 12:42:58 	dhclient 		EXPIRE
    Mar 10 11:42:57 	dhclient 		Creating resolv.conf
    Mar 10 11:42:57 	dhclient 		RENEW
    
    

    I then ran a packet capture to find out why the ISP's DHCP server doesn't answer dhclient's requests. It seems my ISP uses a DHCP relay on 10.207.0.1 that only answers requests sent via broadcast. However, the dhclient in pfSense typically sends its requests repeatedly via unicast to this relay, and so fails to get a new lease. Most of the time, dhclient sends one broadcast after many failed unicasts, which finally gets a unicast answer from the ISP's DHCP relay. Every now and then this broadcast request is not sent, which leads to the behaviour described above: the lease expires and Internet connectivity is lost briefly until a new lease is obtained. Below is dhclient's log of a regular renewal (newest entries first).

    
    Mar 11 12:11:07 	dhclient 	3976 	bound to 176.198.x.y -- renewal in 1800 seconds.
    Mar 11 12:11:07 	dhclient 		Creating resolv.conf
    Mar 11 12:11:07 	dhclient 		RENEW
    Mar 11 12:11:07 	dhclient 	3976 	DHCPACK from 10.207.0.1
    Mar 11 12:11:07 	dhclient 	3976 	DHCPREQUEST on re0 to 255.255.255.255 port 67
    Mar 11 12:04:14 	dhclient 	3976 	DHCPREQUEST on re0 to 10.207.0.1 port 67
    Mar 11 11:58:43 	dhclient 	3976 	DHCPREQUEST on re0 to 10.207.0.1 port 67
    Mar 11 11:55:54 	dhclient 	3976 	DHCPREQUEST on re0 to 10.207.0.1 port 67
    Mar 11 11:54:13 	dhclient 	3976 	DHCPREQUEST on re0 to 10.207.0.1 port 67
    Mar 11 11:52:49 	dhclient 	3976 	DHCPREQUEST on re0 to 10.207.0.1 port 67
    Mar 11 11:50:44 	dhclient 	3976 	DHCPREQUEST on re0 to 10.207.0.1 port 67
    Mar 11 11:46:49 	dhclient 	3976 	DHCPREQUEST on re0 to 10.207.0.1 port 67
    Mar 11 11:45:12 	dhclient 	3976 	DHCPREQUEST on re0 to 10.207.0.1 port 67
    Mar 11 11:43:46 	dhclient 	3976 	DHCPREQUEST on re0 to 10.207.0.1 port 67
    Mar 11 11:43:04 	dhclient 	3976 	DHCPREQUEST on re0 to 10.207.0.1 port 67
    Mar 11 11:42:36 	dhclient 	3976 	DHCPREQUEST on re0 to 10.207.0.1 port 67
    Mar 11 11:42:10 	dhclient 	3976 	DHCPREQUEST on re0 to 10.207.0.1 port 67
    Mar 11 11:42:00 	dhclient 	3976 	DHCPREQUEST on re0 to 10.207.0.1 port 67
    Mar 11 11:41:55 	dhclient 	3976 	DHCPREQUEST on re0 to 10.207.0.1 port 67
    Mar 11 11:41:51 	dhclient 	3976 	DHCPREQUEST on re0 to 10.207.0.1 port 67
    Mar 11 11:41:49 	dhclient 	3976 	DHCPREQUEST on re0 to 10.207.0.1 port 67
    Mar 11 11:41:48 	dhclient 	3976 	DHCPREQUEST on re0 to 10.207.0.1 port 67
    Mar 11 11:41:47 	dhclient 	3976 	DHCPREQUEST on re0 to 10.207.0.1 port 67
    Mar 11 11:11:47 	dhclient 	3976 	bound to 176.198.x.y -- renewal in 1800 seconds.
    Mar 11 11:11:47 	dhclient 		Creating resolv.conf
    Mar 11 11:11:47 	dhclient 		RENEW
    Mar 11 11:11:47 	dhclient 	3976 	DHCPACK from 10.207.0.1
    Mar 11 11:11:47 	dhclient 	3976 	DHCPREQUEST on re0 to 255.255.255.255 port 67
    Mar 11 11:11:07 	dhclient 	3976 	DHCPREQUEST on re0 to 10.207.0.1 port 67
    Mar 11 11:10:47 	dhclient 	3976 	DHCPREQUEST on re0 to 10.207.0.1 port 67 
    
    

    Any idea how this can be fixed? Is there perhaps a way to force dhclient on pfSense to always send broadcast DHCP requests, so that it gets an answer from the ISP's DHCP server behind the relay? If necessary, I can also provide a packet capture that clearly shows the DHCP relay (and the DHCP server behind it) only answering broadcasts.

    Thanks and kind regards,
    Mr.Goodcat



  • To force dhclient to always send its requests via broadcast (to get an answer from the ISP's DHCP relay), I added the following to /etc/inc/interfaces.inc:

    
    $dhclientconf .= <<<EOD
    interface "{$wanif}" {
    supersede dhcp-server-identifier 255.255.255.255;     <--- This is what I added
    

    pfSense successfully transfers the statement into dhclient_wan.conf, but it is NOT applied: dhclient keeps sending unicasts to the DHCP relay during the renew phase.
    Years ago there was a patch to fix this (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=126999), so I'm not sure why it doesn't work here.

    The general problem of a DHCP relay (often with a 10/8 IP) that only answers broadcasts, causing WAN IP leases to expire, seems to have been around for several years:

    https://forum.pfsense.org/index.php?topic=79015.0
    https://dev.openwrt.org/ticket/17279#no1

    Unfortunately, no solution has been found so far, so I'd like to ask whether any dev/expert around here could take a look so this finally gets fixed. It seems to affect customers of cable ISPs, often Unitymedia and other Liberty Global subsidiaries such as UPC, Virgin and Ziggo.

    I can provide logs, captures and the like to help get this fixed. Thanks!



  • OK, I got it fixed.

    First off, the line "supersede dhcp-server-identifier 255.255.255.255" can also be added via Interfaces -> WAN -> Advanced Configuration -> Option Modifiers, which is certainly easier than editing pfSense's files from the shell.

    After taking a look at https://github.com/pfsense/FreeBSD-src/blob/RELENG_2_3_3/sbin/dhclient/dhclient.c, it became clear that the supersede option is not implemented for dhcp-server-identifier, which is why this didn't work…

    So I made an addition to the following function in dhclient.c:

    
    void
    state_bound(void *ipp)
    {
    	struct interface_info *ip = ipp;
    
    	ASSERT_STATE(state, S_BOUND);
    
    	/* T1 has expired. */
    	make_request(ip, ip->client->active);
    	ip->client->xid = ip->client->packet.xid;
    
    	/* Start of the updated section. */
    
    	/*
    	 * Honour "supersede dhcp-server-identifier" from dhclient.conf by
    	 * sending the RENEW to the configured address (e.g. 255.255.255.255).
    	 */
    	if (ip->client->config->default_actions[DHO_DHCP_SERVER_IDENTIFIER] ==
    	    ACTION_SUPERSEDE &&
    	    ip->client->config->defaults[DHO_DHCP_SERVER_IDENTIFIER].len == 4) {
    		memcpy(ip->client->destination.iabuf, ip->client->config->
    		    defaults[DHO_DHCP_SERVER_IDENTIFIER].data, 4);
    		ip->client->destination.len = 4;
    	} else if (ip->client->active->options[DHO_DHCP_SERVER_IDENTIFIER].len == 4) {
    		/* Otherwise unicast to the server identifier from the lease. */
    		memcpy(ip->client->destination.iabuf, ip->client->active->
    		    options[DHO_DHCP_SERVER_IDENTIFIER].data, 4);
    		ip->client->destination.len = 4;
    	} else {
    		/* No usable server identifier: broadcast. */
    		ip->client->destination = iaddr_broadcast;
    	}
    
    	/* End of the updated section. */
    
    	ip->client->first_sending = cur_time;
    	ip->client->interval = ip->client->config->initial_interval;
    	ip->client->state = S_RENEWING;
    
    	/* Send the first packet immediately. */
    	send_request(ip);
    }
    
    

    After compiling dhclient with this change, the supersede option works for dhcp-server-identifier and dhclient sends its renew requests via broadcast, as intended. This gets the DHCP server behind the relay to extend the lease before it expires.

    A compiled dhclient is attached. If anyone has the same problem (e.g. uses Unitymedia), just back up your current dhclient (/sbin/dhclient) and replace it with the attached file (renamed to just dhclient, of course). This works at least on my pfSense 2.3.3_p1 box.

    @pfSense devs: can you please verify this and include it in the next point release? Or, even better, submit it upstream? Thanks!

    dhclient.super.txt



  • Thanks for this! From my initial testing, it appears to have resolved a similar issue I was facing with my cable internet provider.



  • @UnCheckd:

    Thanks for this! From my initial testing, it appears to have resolved a similar issue I was facing with my cable internet provider.

    Great! May I ask which provider you are using?



  • My ISP is Eastlink, based in Canada. Using your modified binary to force the server identifier to the broadcast address did allow dhclient to renew the address at the appropriate time. For the first time since switching to pfSense, I didn't have to manually renew the WAN interface the next day to restore the Internet connection.

    Unfortunately, the problem recurred on the third day (this morning). :( Based on the timings in the lease file, I suspect the original issue may now be occurring during the rebind stage.

      option dhcp-lease-time 216000;
      option dhcp-renewal-time 100832;
      option dhcp-rebinding-time 189000;
    

    To dig deeper, I think I need to understand what pfSense/dhclient does differently when making a request compared with a consumer router, or even a PC plugged directly into the cable modem, neither of which exhibits this behavior. Since this issue is highly repeatable, I'm going to collect some packet captures over the next week and see if I can learn anything. I will post my findings here.



  • ISP (Charter in USA) <-> Technicolor (actually Cisco) DPC3216 cable modem <-> SG-1000 with pfSense

    I think I am seeing the same thing, although I have never been issued an address at all. The DHCP request on WAN is simply ignored. Some people online say Charter requires a broadcast request.
    http://www.techsupportforum.com/forums/f135/dhcp-problem-with-charter-modems-322842.html

    When I replace the SG-1000 with the funky Sagemcom 5260 router supplied by Charter, everything works fine. Unfortunately, that router's user interface is very primitive (no logs, etc.), so I can't tell whether it is doing anything else. From what I read, no hostname or anything else of that nature is required.

    I haven't tried your change on the SG-1000; that is a bit beyond my abilities (it would need compiling for the ARM processor).



  • I think the issue is the DHCP client itself sending its renew request way too late: it only sends the renew message to the DHCP server in the last 1-5 minutes of the lease (very random). Usually, with dnsmasq, the renew message always goes out at half of the lease time (e.g. with a 30-minute lease, renewal begins after 15 minutes, i.e. 15 minutes before the lease expires), and it is very persistent: it resends every 30 seconds to 1 minute, depending on how it was compiled, until the client receives a successful renewal.

    In pfSense, my IP changes every week, sometimes every day.

    With my Tomato-firmware router, by contrast, my WAN IP stays the same for many months; sometimes I have even kept the same IP address for a year.



  • @Paul47:

    ISP (Charter in USA) <-> Technicolor (actually Cisco) DPC3216 cable modem <-> SG-1000 with pfSense

    I think I am seeing the same thing, although I have never been issued an address at all. The DHCP request on WAN is simply ignored. Some people online say Charter requires a broadcast request.
    http://www.techsupportforum.com/forums/f135/dhcp-problem-with-charter-modems-322842.html

    When I replace the SG-1000 with the funky Sagemcom 5260 router supplied by Charter, everything works fine. Unfortunately, that router's user interface is very primitive (no logs, etc.), so I can't tell whether it is doing anything else. From what I read, no hostname or anything else of that nature is required.

    I haven't tried your change on the SG-1000; that is a bit beyond my abilities (it would need compiling for the ARM processor).

    If you want to find out why one router works and the other doesn't, it might help to capture the packets with Wireshark.



  • Well, I gave that a try. I made an Ethernet tap that seems to work if I just wire it between the modem and router (any two connections to the tap work). Strangely, though, if I add the third cable, whether or not it is connected to my Wireshark laptop, the modem loses its connection with the router. I'm scratching my head on this. Maybe a homemade tap causes poor termination?

    -later-

    Ah, looking at the manuals, both devices are Gigabit Ethernet, but my laptop is only 10/100, so that's not going to work! And even with faster Ethernet on the Wireshark machine, a passive tap probably won't cut it either. Back to the drawing board…

