Unable to re-issue static IP to awakening client without pfSense reboot

Pennyless

Our system is simple and straightforward.
One box, 8 cores, 29 Intel gigabit NICs.

1 NIC is configured with a static IP as the default gateway "wan".
12 NICs are configured with static IPs as gateways, which are combined with "wan" into a single gateway group and balanced in one tier.
1 NIC is configured with a static IP as "lan", which has its DHCP server enabled.
14 NICs are configured as "none" and are bridged into a single subnet along with LAN.

A number of servers attached to the bridge are configured by DHCP and are issued "static mapped" IPs, whose reservations lie outside the LAN DHCP range.
Other devices attached to the bridge are issued dynamic IPs from within the DHCP range.

With the exception of the following, pfSense works as advertised.

Any time a bridge-attached client device has been issued a "static mapped" IP, pfSense will not reissue that same static IP to the device when it is awakened from a dormant state.
The only way we have found to re-establish communications with the awakened device ON THE DESIRED STATIC IP is to reboot the pfSense box.

This condition appears to be explained by Section 21.1.1.14 of the Definitive Guide, which states:

"This IP address is really a preference, and not a reservation. Assigning an IP address here will not prevent someone else from using the same IP address. If this IP address is in use when this client requests a lease, it will instead receive one from the general pool."

Although irrelevant to our requirement for static IPs, it appears that a re-awakening client device is NOT reissued an IP from within the "Pool" UNTIL the original DHCP lease time has expired.

Does a method exist which enables the pfSense box to reissue a "static mapped" IP to a reawakening client device prior to DHCP lease expiration and without operator intervention?

wallabybob

@Pennyless:

Although irrelevant to our requirement for static IPs, it appears that a re-awakening client device is NOT reissued an IP from within the "Pool" UNTIL the original DHCP lease time has expired.

It is my understanding that the DHCP client is supposed to request renewal of the lease - it is not the responsibility of the DHCP server to monitor the network for systems waking up and PUSH an IP address to them.

On my home network I have observed a number of DHCP client behaviours that I consider bugs:

Ubuntu 10.04 system - DHCP client gives up "too quickly" if it doesn't get an answer (e.g. the DHCP server is rebooting after a power fail); system doesn't "reconnect" to the network if the DHCP server is inaccessible for too long then comes back online.
Ubuntu 11.? system - Realtek motherboard NIC doesn't recover after a "suspend" shutdown, Intel fxp NIC on PCI card does. If the system is connected to the network through the Realtek NIC it doesn't come back online when it is resumed after a suspend shutdown.

I suspect that what is supposed to happen when a system wakes up after being suspended is the interface transitions to UP state and that kicks the DHCP client into requesting an IP address. On your systems that don't get an IP address after waking up:

Does the interface recover to up/running state?
Is a DHCP request issued?
Does the request get logged in the pfSense DHCP log?

Pennyless

@wallabybob:

@Pennyless:

Does the interface recover to up/running state?

Is a DHCP request issued?

Does the request get logged in the pfSense DHCP log?

Yes, continuous cycling for 2 seconds up then 5 seconds down
Yes
Yes

The following pattern repeats without ending about every 7 seconds.
Network connection is established for about 2 seconds then lost for 5, over and over.
Rebooting the client machine has no effect.
Rebooting pfSense reconnects the client using the desired static IP with no further operator action required.

08:52:29 check_reload_status: Configuring interface opt24
08:52:29 check_reload_status: Linkup starting em13
08:52:29 kernel: em13: link state changed to UP
08:52:31 php: : Hotplug event detected for opt24 but ignoring since interface is configured with static IP ()
08:52:31 check_reload_status: rc.newwanip starting em13
08:52:34 check_reload_status: Linkup starting em13
08:52:34 kernel: em13: link state changed to DOWN
08:52:34 php: : The command '/sbin/ifconfig bridge0 addm em13' returned exit code '1', the output was 'ifconfig: BRDGADD em13: File exists'
08:52:34 php: : Hotplug event detected for opt24 but ignoring since interface is configured with static IP ()
08:52:34 check_reload_status: rc.newwanip starting em13
08:52:36 check_reload_status: Linkup starting em13
08:52:36 kernel: em13: link state changed to UP
08:52:36 php: : rc.newwanip: Informational is starting em13.
08:52:36 php: : rc.newwanip: on (IP address: ) (interface: opt24) (real interface: em13).
08:52:36 php: : rc.newwanip: Failed to update opt24 IP, restarting…

All we want to do is simply turn off a client (server) and have it reconnect to the network through pfSense when it is re-awakened, without operator intervention.
With numerous servers attached to our pfSense box, rebooting pfSense every time a "mapped IP" client is shut down and restarted is not viable.

wallabybob

I take it em13 is the physical interface connected to a recently awakened server. Correct?

I don't know if

08:52:34 php: : The command '/sbin/ifconfig bridge0 addm em13' returned exit code '1', the output was 'ifconfig: BRDGADD em13: File exists'

causes

08:52:36 php: : rc.newwanip: Failed to update opt24 IP, restarting…

Is em13 directly connected to another computer or does it go through a switch/hub/bridge/repeater?

I suspect the problem is that the "link up" handler isn't smart enough to realise that the interface already being a bridge member is not really an error and so it disables then enables the interface in a vain attempt to clear the "error" condition. That theory needs to be checked.

An alternate theory is that the previously discussed error report is ignored, and the link UP, DOWN, UP cycle is caused by the other computer being unhappy with the DHCP responses and cycling its interface to try to clear the condition. It could be helpful to see the pfSense DHCP log at the time (pfSense web GUI: Status -> System Logs, click on the DHCP tab) and the DHCP log from the newly awakened server.

Pennyless

Thank you again for the timely and thoughtful response.

em13 is indeed directly connected to a server.
Everything about the connection operates correctly with the exception of the wakeup static IP issue.

This "reacquisition failure" behavior is typical of ALL the servers which are directly connected to the bridge subnet via STATIC IP (8 dissimilar servers in total).
The problem is not unique to this connection, thus it seems unlikely that the issue is "caused by the other computer being unhappy with DHCP".
But stranger things have happened.

As indicated earlier, any and all client devices which are served dynamic IPs from this same bridge by the same DHCP server perform flawlessly upon wakeup.

I will capture the "DHCP log from the newly awakened server" and post it.

It is reasonable to want to simply turn on a fully functional server and have it operate as it did 5 minutes earlier WITHOUT having to reboot the pfSense box, isn't it?

Your thoughts are helpful.

Thanks again.

wallabybob

@Pennyless:

It is reasonable to want to simply turn on a fully functional server and have it operate as it did 5 minutes earlier WITHOUT having to reboot the pfSense box, isn't it?

I agree. Unfortunately sometimes seemingly insignificant details can get in the way. I expect you are running a very uncommon configuration in the number of interfaces bridged and using "direct connection" rather than connecting your computers through a switch.

Now that you have confirmed the computers are directly connected I expect my first suggestion is close to describing the problem. Can you put a switch between one of the bridged ports and its corresponding server. This should stop pfSense thinking the link has gone down when the server is suspended and operation should resume successfully when the server is resumed.

If that seems to fix the problem, then you can work on persuading the pfSense developers to produce a patch: perhaps modifying the "link down" handler to remove an interface from a bridge when the interface goes down, so that the "link up" handler doesn't get an error when it tries to add the interface to the bridge (or stop attempting to add an interface to a bridge when it is already a member of the bridge).
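The second option above (skip the "add to bridge" step when the interface is already a member) could look roughly like the following. This is a minimal PHP sketch, not pfSense code: the function names `bridge_has_member()` and `bridge_add_member_if_needed()` are hypothetical, and it assumes that FreeBSD's `ifconfig <bridge>` output lists each member on a line beginning `member: <iface>`. It has not been tried against pfSense 2.0.1.

```php
<?php
// Hypothetical helper (not part of pfSense): returns true if $iface is listed
// as a "member:" line in the text printed by FreeBSD's `ifconfig <bridge>`.
function bridge_has_member(string $ifconfigOutput, string $iface): bool
{
    // Require whitespace after the name so "em1" does not match "em13".
    $pattern = '/^\s*member:\s+' . preg_quote($iface, '/') . '\s/m';
    return preg_match($pattern, $ifconfigOutput) === 1;
}

// Hypothetical link-up step: run "ifconfig <bridge> addm <iface>" only when
// the interface is not already a bridge member, so the "File exists" error
// never reaches the handler.
function bridge_add_member_if_needed(string $bridge, string $iface): bool
{
    $status = [];
    $rc = 0;
    exec('/sbin/ifconfig ' . escapeshellarg($bridge), $status, $rc);
    if ($rc !== 0) {
        return false; // bridge missing or ifconfig failed
    }
    if (bridge_has_member(implode("\n", $status), $iface)) {
        return true; // already a member: nothing to do
    }
    $addOut = [];
    exec('/sbin/ifconfig ' . escapeshellarg($bridge) . ' addm ' . escapeshellarg($iface), $addOut, $rc);
    return $rc === 0;
}
```

Skipping `addm` for an existing member avoids the `BRDGADD em13: File exists` error seen in the log; whether that alone stops the link cycling is the theory that still needs checking.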

cmb

"If this IP address is in use when this client requests a lease, it will instead receive one from the general pool." refers to ISC dhcpd's behavior of not assigning IPs that are actively in use somewhere else. As far as I'm aware, that means something other than the host requesting the lease is already using that IP, which suggests you have an IP conflict somewhere.

On the link recovery, is it going into a constant state of link down/up there? Seems to do that at least once but not sure if it's more than that.

wallabybob

@cmb:

On the link recovery, is it going into a constant state of link down/up there? Seems to do that at least once but not sure if it's more than that.

Earlier reply says @Pennyless:

The following pattern repeats without ending about every 7 seconds.
Network connection is established for about 2 seconds then lost for 5, over and over.

which strongly suggests to me it's more than once.

cmb

Ah yeah I didn't see that. The problem at one point in the past was the Intel drivers cycle link when you add an interface to a bridge, which then causes the interface to be added back to the bridge, which cycles link, which causes the interface to be added to the bridge, and repeats the process over and over endlessly. Though the OP makes no mention of which version, 2.0 and 2.0.1 release versions shouldn't exhibit this behavior. We ran into that scenario on a customer's system pre-2.0 release and fixed it prior to release.

Pennyless

@cmb:

Ah yeah I didn't see that. The problem at one point in the past was the Intel drivers cycle link when you add an interface to a bridge, which then causes the interface to be added back to the bridge, which cycles link, which causes the interface to be added to the bridge, and repeats the process over and over endlessly. Though the OP makes no mention of which version, 2.0 and 2.0.1 release versions shouldn't exhibit this behavior. We ran into that scenario on a customer's system pre-2.0 release and fixed it prior to release.

Thank you

Version 2.0.1
Intel Pro 16.8 drivers
This behavior accurately describes the issue.
Is there anything that I can do to resolve this cycling, short of rebooting pfSense every time a client is awakened?

wallabybob

@Pennyless:

Is there anything that I can do to resolve this cycling, short of rebooting pfSense every time a client is awakened?

Are you able to try my suggestion in reply 5?

Pennyless

Thank you, not yet.
But it's underway.
It will take a couple of days.

Pennyless

@wallabybob:

@Pennyless:

Is there anything that I can do to resolve this cycling, short of rebooting pfSense every time a client is awakened?

Are you able to try my suggestion in reply 5?

As you recommended, a switch was placed between the pfSense nic and a Server.
This test was carried out with 1 to 5 dissimilar servers, individually and concurrently.
In each case all servers operated correctly with no "NIC link-up cycling" whatsoever.
Every configuration we could think of performed correctly.
Each client device was cycled from a cold shutdown to a stable connection numerous times with static IP assignment, dynamic IP assignment, and "statically mapped" dynamic assignment.
We were completely unable to reproduce the "cycling" with an intermediate switch in place.

However immediately upon removal of the recommended switch, the exact same previous behavior returned.

We need these servers connected directly to the pfSense bridge nics without intermediate switches.
That should be a perfectly acceptable configuration, because on a cold startup everything does exactly what it is supposed to do and no "cycling" occurs. So it does work without a switch, just not on a restart of any of the client devices.

Do you have any ideas how to stop this "cycling" upon reconnect?

Thank you!

wallabybob

@Pennyless:

We need these servers connected directly to the pfSense bridge nics without intermediate switches.

Why? If readers know a bit more about your requirements they might be better able to suggest a solution.

@Pennyless:

Do you have any ideas how to stop this "cycling" upon reconnect?

1. Unfortunately you have already rejected what seemed to me (in my ignorance of the nuances of your requirements) a quite workable solution.

2. The log reports @Pennyless:

08:52:36 php: : rc.newwanip: Informational is starting em13.
08:52:36 php: : rc.newwanip: on (IP address: ) (interface: opt24) (real interface: em13).
08:52:36 php: : rc.newwanip: Failed to update opt24 IP, restarting…

The section of /etc/rc.newwanip that reports this reads:

```
if ($curwanip == "0.0.0.0" || !is_ipaddr($curwanip)) {
    log_error("rc.newwanip: Failed to update {$interface} IP, restarting...");
    send_event("interface reconfigure {$interface}");
    exit;
}
```

> 08:52:31  php: : Hotplug event detected for opt24 but ignoring since interface is configured with static IP ()

which might imply the interface has an "empty" IP address. At this point I'm thinking it's not a good use of my volunteer time to pursue this any further, especially when I have a couple of still unresolved pfSense issues of my own.

3\. Wait for another reader to come up with a solution.

4\. File a pfSense bug report on [http://redmine.pfsense.org](http://redmine.pfsense.org)

5\. Offer a bounty to have it fixed

6\. Purchase pfSense support

etc.

cmb

Disabling the link-up actions entirely may suffice for your scenario; check the source where wallabybob gave you some pointers.

Pennyless

OK, understood.
Thank you both again for your time.

I apologize for not making our requirements clear.

We simply needed to connect a static IP server directly to a bridged subnet within our pfSense box, shut the server down in the evening, and wake it up in the AM…without restarting pfSense.

Pennyless

For anyone else having trouble with this particular NIC cycling issue, this problem has been previously documented.
It was marked as resolved 2 months ago by Chris Buechler,
just as this thread indicates:

http://redmine.pfsense.org/issues/1572

The only solution in my particular case was setting "Speed and Duplex" to anything other than "default".
All cycling vanished.

v2.0.1