Dhclient doesn't work after boot up for WAN conected to modem



  • My setup

    • Pfsense WAN (igb0) connected directly to ISP modem (configured as bridge)
    • Pfesnse LAN (igb1 - with a few VLANs) connected to managed switch - not important for this problem
    • WAN is configured to get IP from DHCP
    • Modem is Compal ch7465lg from UPC

    Short description of the problem

    • if I power off both pfSense and modem and after that start them at the same time dhclient won't start properly
    • on dashboard WAN will get n/a instead of IP and gateway will get Unknown status.
    • after that if I pull out the cable from igb0 and plug it back I will get IP immediately from DHCP
    • instead of that I can also use following commands
    ifconfig igb0 down
    ifconfig igb0 up
    
    • ... or just click renew in "Interfaces"

    Probably important details

    • pfSense starts really fast (about 20s)
    • on the other hand modem is not so fast
      ** after turning it on it takes 40s to get light next to ethernet port (link up?)
      ** after that I have to wait next 50s to see blinking light next to ethernet port - indicating any communication
      ** traffic mentioned above is only arp who has (if I remember correctly)... nothing from pfSense WAN

    Logs indicating problem

    There is nothing about dhclient in dhcp logs and in system logs I can find following

    Apr 18 17:41:05 pfSense check_reload_status: Linkup starting igb1
    Apr 18 17:41:05 pfSense kernel: igb1: link state changed to UP
    Apr 18 17:41:05 pfSense kernel: igb1.50: link state changed to UP
    Apr 18 17:41:05 pfSense kernel: igb1.200: link state changed to UP
    Apr 18 17:41:05 pfSense kernel: igb1.10: link state changed to UP
    Apr 18 17:41:05 pfSense check_reload_status: Linkup starting igb1.50
    Apr 18 17:41:05 pfSense check_reload_status: Linkup starting igb1.200
    Apr 18 17:41:05 pfSense check_reload_status: Linkup starting igb1.10
    Apr 18 17:41:13 pfSense php-cgi: rc.bootup: The command '/sbin/dhclient -c /var/etc/dhclient_wan.conf igb0 > /tmp/igb0_output 2> /tmp/igb0_error_output' returned exit code '1', the output was ''
    

    and in /tmp/igb0_output I get this

    igb0: no link .............. giving up
    

    My observations

    • as already mentioned this problem occurs only while booting... I tried different experiments (turing off only modem or only pfSense) and each time pfSense worked perfectly
    • Line from /tmp/igb0_output made me look into dhclient.c sources and I believe that dhclient exits because of this code in main function (after not being able to get link status)
        if (!interface_link_status(ifi->name)) {
    		fprintf(stderr, "%s: no link ...", ifi->name);
    		fflush(stderr);
    		sleep(1);
    		while (!interface_link_status(ifi->name)) {
    			fprintf(stderr, ".");
    			fflush(stderr);
    			if (++i > 10) {
    				fprintf(stderr, " giving up\n");
    				exit(1);
    			}
    			sleep(1);
    		}
    		fprintf(stderr, " got link\n");
    	}
    	ifi->linkstat = 1;
    
    • I'm able to reproduce this problem only with mentioned modem, so it might be hard to check it by anyone else - for example I wasn't able to force this error using switch and playing with cables and turning it on and off at specific times during boot.

    My workaround
    Put switch between pfSense and modem. With this in logs I get (proper last line instead of failing dhclient)

    Apr 18 17:48:29 pfSense check_reload_status: Linkup starting igb1
    Apr 18 17:48:29 pfSense kernel: igb1: link state changed to UP
    Apr 18 17:48:29 pfSense kernel: igb1.50: link state changed to UP
    Apr 18 17:48:29 pfSense kernel: igb1.200: link state changed to UP
    Apr 18 17:48:29 pfSense kernel: igb1.10: link state changed to UP
    Apr 18 17:48:29 pfSense check_reload_status: Linkup starting igb1.50
    Apr 18 17:48:29 pfSense check_reload_status: Linkup starting igb1.200
    Apr 18 17:48:29 pfSense check_reload_status: Linkup starting igb1.10
    Apr 18 17:48:29 pfSense check_reload_status: Linkup starting igb0
    

    and /tmp/igb0_output starts with

    igb0: no link ...... got link
    dhclient: PREINIT
    DHCPREQUEST on igb0 to 255.255.255.255 port 67
    

    My question
    Is there easy fix for this problem so I don't have to use additional switch? Or maybe I'll have to wait longer, because it won't be fixed without code change? I tried to find something in the code, but unfortunately it looks like I don't understand it well enough :)



  • You shouldn't need a switch between your bridged ISP modem & pfSense box to get a public IP on the WAN side. The output you show doesn't seem to indicate the switch in the middle is helping. If your modem/router is correctly in bridge mode then your ISP issued public IP address should flow through the modem and be picked up on the pfSense WAN port. I don't understand why you turn them on simultaneously. Normally you would cycle on the modem until it if fully booted up (wait a few minutes). Then power on your pfSense box. Assuming your modem/router bridge mode settings are all correct, pfSense should work like a charm. If your modem/router is not not correctly in bridge mode and you have it trying to route along with pfSense routing, then you get confusing double NAT going on creating problems. Anyways I'm not the expert most are around here but this is my train of thought on your issue.



  • I don't understand why you turn them on simultaneously. Normally you would cycle on the modem until it if fully booted up (wait a few minutes).

    I agree, and at the same time I want to be prepared for situation in case of power outage. Modem and pfSense will start automatically when power is back.

    I believe there are other scenarios possible. I just described my situation.

    Also we can look at this without my context. Basically it is possible (with proper timing?) to boot pfSense and don't get IP from IPS's DHCP without manual interwention - dhclient dies and never comes back (if I understand it correctly). And I have just found one scenario which triggers it. Apart from that pfSense works like a charm



  • Hi,

    You're using the DHCP Client on WAN.

    Did you try the advanced configuration settings ? It's all about time outs etc.

    9c0ce035-0418-4d3b-9e23-e4b67403b437-image.png

    @tomashk said in Dhclient doesn't work after boot up for WAN conected to modem:

    if I power off ... pfSense

    How ?
    A "power" down like 'rip out the power' ?



  • if I power off ... pfSense

    How ?
    A "power" down like 'rip out the power' ?

    Sorry for not being precise. For experiments I shut it down with Halt system in Diagnostics and after that I take out the power cord. But just to be sure, once I turned off the power without proper shutdown. After restoring power it behaved the same. There will be UPS to make sure that it is done properly in case of power outage.

    You're using the DHCP Client on WAN.

    Did you try the advanced configuration settings ? It's all about time outs etc.

    Yes, I have Timeout set to 900 due to other problems already mentioned in pfSense forum (I can't find it now but it was something about loosing gateway after 60s if there is no DHCP response, but I don't want to make offtopic).

    But here it won't help for this problem. Values for Protocol timing are used by dhclient only if it is properly started.

    If I understand source code of dhclient correctly, in main function:

    • it makes some initialization
    • when all is OK at the end of main function it goes into background to run dhclient as a daemon

    but

    igb0: no link .............. giving up
    

    in /tmp/igb0_output means that initialization has never finished and dhclient process is closed and won't work in background. So values from Protocol timing won't be used.

    Of course I might be wrong - it is first time I look into this code.



  • In the first post I should have posted a little more of the log - because it might be important for someone with proper knowledge.

    Apr 18 17:41:05 pfSense check_reload_status: Linkup starting igb1
    Apr 18 17:41:05 pfSense kernel: igb1: link state changed to UP
    Apr 18 17:41:05 pfSense kernel: igb1.50: link state changed to UP
    Apr 18 17:41:05 pfSense kernel: igb1.200: link state changed to UP
    Apr 18 17:41:05 pfSense kernel: igb1.10: link state changed to UP
    Apr 18 17:41:05 pfSense check_reload_status: Linkup starting igb1.50
    Apr 18 17:41:05 pfSense check_reload_status: Linkup starting igb1.200
    Apr 18 17:41:05 pfSense check_reload_status: Linkup starting igb1.10
    Apr 18 17:41:13 pfSense php-cgi: rc.bootup: The command '/sbin/dhclient -c /var/etc/dhclient_wan.conf igb0 > /tmp/igb0_output 2> /tmp/igb0_error_output' returned exit code '1', the output was '' 
    Apr 18 17:41:13 pfSense php-cgi: rc.bootup: Resyncing OpenVPN instances.
    Apr 18 17:41:13 pfSense kernel: pflog0: promiscuous mode enabled
    Apr 18 17:41:13 pfSense php-cgi: rc.bootup: Gateway, none 'available' for inet, use the first one configured. 'WAN_DHCP'
    Apr 18 17:41:13 pfSense kernel: DUMMYNET 0 with IPv6 initialized (100409)
    Apr 18 17:41:13 pfSense kernel: load_dn_sched dn_sched FIFO loaded
    Apr 18 17:41:13 pfSense kernel: load_dn_sched dn_sched QFQ loaded
    Apr 18 17:41:13 pfSense kernel: load_dn_sched dn_sched RR loaded
    Apr 18 17:41:13 pfSense kernel: load_dn_sched dn_sched WF2Q+ loaded
    Apr 18 17:41:13 pfSense kernel: load_dn_sched dn_sched PRIO loaded
    Apr 18 17:41:13 pfSense kernel: load_dn_sched dn_sched FQ_CODEL loaded
    Apr 18 17:41:13 pfSense kernel: load_dn_sched dn_sched FQ_PIE loaded
    Apr 18 17:41:13 pfSense kernel: load_dn_aqm dn_aqm CODEL loaded
    Apr 18 17:41:13 pfSense kernel: load_dn_aqm dn_aqm PIE loaded
    Apr 18 17:41:13 pfSense php-cgi: rc.bootup: Gateway, none 'available' for inet6, use the first one configured. ''
    Apr 18 17:41:14 pfSense php-cgi: rc.bootup: sync unbound done.
    Apr 18 17:41:15 pfSense php-cgi: rc.bootup: NTPD is starting up.
    Apr 18 17:41:15 pfSense check_reload_status: Updating all dyndns
    Apr 18 17:41:15 pfSense check_reload_status: Linkup starting igb0
    Apr 18 17:41:15 pfSense kernel: igb0: link state changed to UP
    

    At the beginning you can see how dhclient can't start, but at the end we can see that OS see linkup for igb0

    Apr 18 17:41:15 pfSense check_reload_status: Linkup starting igb0
    Apr 18 17:41:15 pfSense kernel: igb0: link state changed to UP
    

    I have no idea what (not) happened, but my first guesses are

    • some kind of race condition and check_reload_status did not work properly
    • devd did not use info from pfSense-devd.conf to start dhclient again
    #
    # Signal upper levels that an event happened on ethernet class interface
    #
    notify 0 {
            match "system"          "IFNET";
            match "type"            "LINK_UP";
            media-type              "ethernet";
            action "/usr/local/sbin/pfSctl -c 'interface linkup start $subsystem'";
    };
    
    notify 0 {
            match "system"          "IFNET";
            match "type"            "LINK_DOWN";
            media-type              "ethernet";
            action "/usr/local/sbin/pfSctl -c 'interface linkup stop $subsystem'";
    };
    

    or maybe commands listed above are not responsible for starting dhclient

    • ... anything else I don't know about :)


  • As here I don't see any discussion bug was reported - https://redmine.pfsense.org/issues/9484



  • I have not idea what changed but with version 2.4.4-RELEASE-p3 problem no longer exists.



  • Yes it started to work but I wanted to understand why. So I was changing some options and I found out that it won't be fixed for default protocol timings in advanced options of WAN interface.

    To solve it is enough to set timeout to higher value like below
    fedad313-5538-4b2d-b3f6-e510782574b8-image.png
    I had it like that from the start.

    ... and now I don't understand it at all :)


Log in to reply