WAN DHCP - N/A IP
-
My current setup is a pfSense machine plus a UniFi Switch US-16-150W. The nuance is that I'm using the switch as an optical media converter for the WAN interface (there is a separate VLAN, defined only on the switch, between the SFP port and an RJ-45 port). Everything works as expected except after power outages. The pfSense machine is set to always power on, and it boots much faster than the UniFi switch. After power is restored, pfSense can't obtain a WAN IP via DHCP because the switch is not yet initialized, which leaves the interface in an uninitialized state (n/a is shown as the interface's IP). If I manually power up pfSense after the UniFi switch has booted, everything is fine. I've also tried various dhclient parameters (timeout, etc.) but nothing seems to help; dhclient doesn't retry obtaining a lease.
I've read a couple of similar cases on the internet, but they are a bit different from mine.
Looking at system.log, I found the following error:
pfSense php[430]: rc.bootup: The command '/sbin/dhclient -c /var/etc/dhclient_wan.conf igc0 > /tmp/igc0_output 2> /tmp/igc0_error_output' returned exit code '1', the output was ''
The naive workaround that comes to mind is to somehow delay the boot/init process of pfSense so the UniFi switch can fully initialize, but I'm not sure how to implement it properly. If anyone can suggest a cleaner solution, I would highly appreciate it.
-
There may be better/cleaner solutions to this, but try this to delay the boot: edit /boot/loader.conf and set
autoboot_delay="n"
where n is the number of seconds.
This might be reset to the default after an update.
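As a rough sketch of that setting: the value 30 below is only an illustration (it is not a figure from this thread), so pick a delay that covers your switch's boot time. The pfSense documentation describes /boot/loader.conf.local as the place for custom loader settings that should survive updates; treat that as something to verify for your version.

```sh
# Sketch of a loader setting, e.g. in /boot/loader.conf.local (assumed location).
# 30 is an illustrative number of seconds, not a recommendation.
autoboot_delay="30"
```

The delay only holds the boot menu for longer before the kernel loads, so it postpones everything, not just the WAN DHCP request.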
-
Does the WAN appear to be down when dhclient initially runs then?
I recall seeing this before, where you can hit a timing issue if the WAN comes UP after dhclient initially fails but before boot has finished.
If you replug the WAN cable after this has happened, does that trigger dhclient to start again and work correctly?
Steve
-
@stephenw10 Replugging the cable helps.
It seems the WAN interface somehow appears to be up during boot (even though the switch is not yet initialized), if I understand these log lines correctly:
Nov 8 12:17:30 pfSense check_reload_status[408]: Linkup starting igc0
Nov 8 12:17:30 pfSense kernel:
Nov 8 12:17:30 pfSense kernel: igc0: link state changed to UP
And after that there is an error:
pfSense php[430]: rc.bootup: The command '/sbin/dhclient -c /var/etc/dhclient_wan.conf igc0 > /tmp/igc0_output 2> /tmp/igc0_error_output' returned exit code '1', the output was ''
-
If the link is actually down the dhclient will fail because it cannot run. If the interface is up dhclient can run even if it can't pull a lease and should just continue to try until it gets a lease.
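To illustrate the "link has to be up before dhclient can run" point, here is a minimal sketch of a wait-then-start loop. The interface name and config path are copied from the log line in this thread; the helper names wait_for and link_is_active are made up for illustration, and the retry counts are arbitrary. It assumes ifconfig reports "status: active" for a live link, as FreeBSD normally does, and it is not how pfSense itself starts the client.

```sh
#!/bin/sh
# Sketch only: wait for the WAN link, then start dhclient by hand.
# WAN_IF and CONF come from the log line in this thread; adjust as needed.
WAN_IF="igc0"
CONF="/var/etc/dhclient_wan.conf"

# link_is_active IFACE (hypothetical helper):
# succeeds when ifconfig reports "status: active" for the interface.
link_is_active() {
    ifconfig "$1" 2>/dev/null | grep -q 'status: active'
}

# wait_for TRIES DELAY CMD... (hypothetical helper):
# run CMD until it succeeds, at most TRIES times, sleeping DELAY seconds
# after each failed attempt. Returns 0 on success, 1 after giving up.
wait_for() {
    tries="$1"; delay="$2"; shift 2
    i=0
    while [ "$i" -lt "$tries" ]; do
        "$@" && return 0
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example invocation, left commented out so the file can be sourced safely:
# wait_for 60 5 link_is_active "$WAN_IF" && /sbin/dhclient -c "$CONF" "$WAN_IF"
```

Note that a link reported as active does not prove the switch is forwarding traffic yet, so this only covers the "link down" case described above.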
The fact it's failing implies igc0 was down at that point, but the logs showing it coming up are from before that, at boot?
-
@stephenw10 Yes, looks like it. Attaching a more complete log: system-boot.log.zip
-
Yes, that looks familiar. So igc1 and the VLANs on it are connected to the same switch?
-
@stephenw10 Yes, everything is on one switch.
igc0 - WAN interface (connected to RJ-45 port 16, which is on the same VLAN as SFP port 18).
igc1 - LAN trunk port, connected to port 15 of the switch.
-
Hmm, what you're describing is exactly this: https://redmine.pfsense.org/issues/9484
That is reported as solved some time ago though. What advanced settings in the dhcp client config did you test?
Steve
-
@stephenw10 Initially I tried the default options. Then I tried an increased timeout (set to 120) and also tried setting reboot=60.
If there are any suggestions for dhclient, I can try them.
-
Ah, yes I knew this was familiar. Setting a boot delay will probably fix this, see:
https://forum.netgate.com/post/1038593
This probably needs a bug opening/re-opening though, let me see....
-
@stephenw10 Yes, the boot delay solves the issue. I tested it today, thanks for the help @biggsy
If there is any additional info I can provide to help isolate/understand the exact cause of the issue, I would be happy to help.
-
It would be interesting to try a much longer timeout in the DHCP settings instead, the suggested 900s for example. However, I don't expect that to work since the timing difference in the logs is only ~15s. A setting of 120s would have worked if it could.
Steve