The link state of an interface (bridge member) goes up/down continuously

CS

Hi all!
I have the following problem. I have set up a bridge with two interfaces (opt1/2), but when I plug a device into opt2 I get the following errors.

…
Sep 20 14:51:38 check_reload_status: Linkup starting em1
Sep 20 14:51:38 php: rc.interfaces_wan_configure: The command '/sbin/ifconfig 'em1' inet delete' returned exit code '1', the output was 'ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address'
Sep 20 14:51:35 check_reload_status: Configuring interface opt2
Sep 20 14:51:35 php: rc.newwanip: rc.newwanip: Failed to update opt2 IP, restarting...
Sep 20 14:51:35 php: rc.newwanip: rc.newwanip: on (IP address: ) (interface: opt2) (real interface: em1).
Sep 20 14:51:35 php: rc.newwanip: rc.newwanip: Informational is starting em1.
Sep 20 14:51:32 check_reload_status: rc.newwanip starting em1
Sep 20 14:51:32 php: rc.linkup: Hotplug event detected for OPT1(opt2) but ignoring since interface is configured with static IP ( )
Sep 20 14:51:30 php: rc.linkup: Hotplug event detected for OPT1(opt2) but ignoring since interface is configured with static IP ( )
Sep 20 14:51:29 kernel: em1: link state changed to UP
Sep 20 14:51:29 check_reload_status: Linkup starting em1
Sep 20 14:51:27 kernel: em1: link state changed to DOWN
Sep 20 14:51:27 check_reload_status: Linkup starting em1
Sep 20 14:51:27 php: rc.interfaces_wan_configure: The command '/sbin/ifconfig 'em1' inet delete' returned exit code '1', the output was 'ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address'
Sep 20 14:51:24 check_reload_status: Configuring interface opt2
Sep 20 14:51:24 php: rc.newwanip: rc.newwanip: Failed to update opt2 IP, restarting…
Sep 20 14:51:24 php: rc.newwanip: rc.newwanip: on (IP address: ) (interface: opt2) (real interface: em1).
Sep 20 14:51:24 php: rc.newwanip: rc.newwanip: Informational is starting em1.
Sep 20 14:51:21 check_reload_status: rc.newwanip starting em1
Sep 20 14:51:21 php: rc.linkup: Hotplug event detected for OPT1(opt2) but ignoring since interface is configured with static IP ( )
Sep 20 14:51:18 kernel: em1: link state changed to UP
Sep 20 14:51:18 check_reload_status: Linkup starting em1

The cycle shown in the log above repeats continuously, so I need to unplug the cable from OPT2. If I don't, it may cause the same issue on OPT1, which normally has no issue. If both are unstable I need to reboot pfSense, and then OPT1 is fine again. I read several posts in this forum about this loop issue but found no concrete resolution. Do you have any idea?
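To put a number on how often the link flaps, the kernel "link state changed" lines can be pulled out of a log excerpt like the one above. This is only a minimal sketch: it assumes the syslog format shown in this post, the function names are made up for illustration, and the year is an assumption because the log lines do not carry one.

```python
import re
from datetime import datetime

# Matches lines such as:
#   Sep 20 14:51:29 kernel: em1: link state changed to UP
LINK_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+kernel:\s+(\w+): link state changed to (UP|DOWN)"
)

def link_events(lines, year=2013):
    """Return sorted (timestamp, interface, state) tuples for kernel link events.

    The year is an assumption: syslog timestamps in this format omit it.
    """
    events = []
    for line in lines:
        m = LINK_RE.match(line.strip())
        if m:
            ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            events.append((ts, m.group(2), m.group(3)))
    events.sort()  # the GUI shows newest first, so put events in time order
    return events

def up_intervals(events, iface):
    """Seconds between consecutive UP events on one interface."""
    ups = [ts for ts, name, state in events if name == iface and state == "UP"]
    return [(b - a).total_seconds() for a, b in zip(ups, ups[1:])]

# Kernel lines copied from the log above.
sample = [
    "Sep 20 14:51:29 kernel: em1: link state changed to UP",
    "Sep 20 14:51:27 kernel: em1: link state changed to DOWN",
    "Sep 20 14:51:18 kernel: em1: link state changed to UP",
]
events = link_events(sample)
print(up_intervals(events, "em1"))  # [11.0]
```

With a longer excerpt, a steady run of similar intervals would point to a repeating cycle rather than an occasional cable glitch.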

My setup/config:
pfSense 2.1 on soekris net6501-50 board (Intel 82574L Gigabit Ethernet ports)
OPT1/OPT2 are the two interfaces of the LAN bridge (the bridge has been created following the tutorial which has been posted in this forum and several other users have followed).
OPT1/OPT2: IPv4/6 none, MTU/MSS blank, speed/duplex auto.
LAN has static IPv4 and DHCP server running.
Cables and devices have been replaced and they are not responsible for the errors.
No spoofing in place.

Mavy

I just posted a thread about this same issue here.

frizouille

I'm seeing the same thing. I have 4 LAN ports and created two bridges with two ports each, and after applying the configuration I get disconnections every 4 or 5 seconds.
It is impossible to use VoIP.

stephenw10

Do you have a link to the tutorial you followed?

Sep 20 14:51:30    php: rc.linkup: Hotplug event detected for OPT1(opt2) but ignoring since interface is configured with static IP ( )

Seems a bit odd that OPT1 is referencing opt2 here. Have you renamed the interfaces?

Steve

jcrutchf

This may not be reproducible, but I may have found a workaround:

The build is on a re-purposed machine while we find hardware to dedicate:
pfSense 2.1-RELEASE (amd64)
built on Wed Sep 11 18:17:34 EDT 2013
FreeBSD 8.3-RELEASE-p11
4-port Gigabit NIC (igb driver, interfaces igb0-3)
1 integrated Broadcom NetXtreme Gigabit Ethernet Controller
Remainder is commodity hardware (4x procs, 300 GB storage, 12 GB RAM)

igb0=WAN (static IP)
igb1=LAN (no IP)
OPT1=bridge0 (no IP)
igb0 and igb1 are members of bridge0

I wanted to use the 4-port NIC exclusively, and had enormous difficulty getting the bridge configuration to work at all: the configuration would set but all four ports would activate and deactivate every second or two. Rebooting after setting the configuration seemed to placate the igb drivers somewhat - the WAN interface was steady, but as others have seen, the LAN side was up for 10 seconds, down for two, up for ten, etc. Unsuitable. Since the drivers had already caused a problem, I moved the LAN interface to the integrated Broadcom NIC at bge0, and retained all other settings. It seems that using different drivers for different sides of the bridge fixed it for me, no errors for the last 30 minutes.

Try using NICs with different drivers for each side of the bridge, and see if that doesn't help.

CS

@stephenw10:

Do you have a link to the tutorial you followed?
Sep 20 14:51:30    php: rc.linkup: Hotplug event detected for OPT1(opt2) but ignoring since interface is configured with static IP ( )
Seems a bit odd that OPT1 is referencing opt2 here. Have you renamed the interfaces?

Steve

@Steve,
I followed your guide: http://forum.pfsense.org/index.php/topic,48947.msg269592.html#msg269592
I had renamed them in the past because my setup was different and my bridge had 3 ifaces. Then I changed it and now I have 2 for the bridge and 1 for the DMZ. My WAN and DMZ are stable.
I don't think that the names could be a problem, because in the log files I can see the real interfaces (em1) too.
I noticed that since the upgrade to 2.1 I very often get the same error on OPT1 too, even when OPT2 is unplugged. This never happened before.

@jcrutchf,
At the moment I can't use different NICs.

stephenw10

I hadn't checked, as I only have some unused ports set up as a bridge for occasional use. I am seeing the same behaviour: the connected bridge member continually cycles up/down with a period of approximately 10s.
To be clear, this is on a 2.1 32-bit Nano install that was upgraded from RC0 and, before that, 2.0.3.
The bridge is using all Intel Gigabit NICs, em(4) driver.

Sep 21 17:16:03	kernel: em3: link state changed to DOWN
Sep 21 17:16:03	check_reload_status: Linkup starting em3
Sep 21 17:16:03	php: rc.interfaces_wan_configure: The command '/sbin/ifconfig 'em3' inet delete' returned exit code '1', the output was 'ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address'
Sep 21 17:16:01	check_reload_status: Configuring interface opt3
Sep 21 17:16:01	php: rc.newwanip: rc.newwanip: Failed to update opt3 IP, restarting...
Sep 21 17:16:01	php: rc.newwanip: rc.newwanip: on (IP address: ) (interface: opt3) (real interface: em3).
Sep 21 17:16:01	php: rc.newwanip: rc.newwanip: Informational is starting em3.
Sep 21 17:15:58	check_reload_status: rc.newwanip starting em3
Sep 21 17:15:58	php: rc.linkup: Hotplug event detected for OPT3(opt3) but ignoring since interface is configured with static IP ( )
Sep 21 17:15:56	kernel: em3: link state changed to UP
Sep 21 17:15:56	check_reload_status: Linkup starting em3
Sep 21 17:15:54	php: rc.linkup: Hotplug event detected for OPT3(opt3) but ignoring since interface is configured with static IP ( )
Sep 21 17:15:52	check_reload_status: Linkup starting em3

Is everyone else here seeing this on em NICs?

Steve

frizouille

My two bridges use an Intel Gigabit card with 4 ports.
First bridge: igb0 and igb1
Second bridge: igb2 and igb3
Before creating the bridges, I had no up/down cycling.

CS

Exactly! That's the same problem I have. Before creating the bridge there was no issue. Do you have problems with specific bridge members or all of them?

Mavy

There has been a similar case in the past here.

In it SMB mentions:

Ah yeah I didn't see that. The problem at one point in the past was the Intel drivers cycle link when you add an interface to a bridge, which then causes the interface to be added back to the bridge, which cycles link, which causes the interface to be added to the bridge, and repeats the process over and over endlessly. Though the OP makes no mention of which version, 2.0 and 2.0.1 release versions shouldn't exhibit this behavior. We ran into that scenario on a customer's system pre-2.0 release and fixed it prior to release.

I guess it was solved but for some reason got introduced again in the latest version?
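The feedback loop described in that quote can be illustrated with a toy model. To be clear, this is not pfSense code and makes no claim about how the problem was or will be fixed. It only assumes, as the quote does, that the NIC bounces its link whenever it is added to the bridge and that every link-up event triggers another add. The `skip_if_member` guard and all names are hypothetical.

```python
def simulate(max_events=10, skip_if_member=False):
    """Toy model of the loop: return how many times the link bounces.

    max_events caps the simulation so the unguarded loop terminates.
    """
    members = set()
    pending = ["linkup"]  # initial hotplug event, e.g. a cable being plugged in
    bounces = 0
    while pending and bounces < max_events:
        pending.pop()
        if skip_if_member and "em1" in members:
            continue  # hypothetical guard: already a member, so do nothing
        members.add("em1")        # (re-)add the NIC to the bridge
        bounces += 1              # assumed NIC behaviour: the add bounces the link
        pending.append("linkup")  # the bounce produces a new link-up event
    return bounces

print(simulate())                     # 10 (stopped only by the cap)
print(simulate(skip_if_member=True))  # 1
```

Without a guard the event queue never drains, which matches the endless up/down cycling seen in the logs. With one, the first bounce is the last.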

frizouille

I tried creating only one bridge with two ports and the problem still appears.

stephenw10

The linked problem is similar but not exactly the same. The symptoms are the same but the cause appears to be adding the NIC to a bridge rather than forcing the mtu or speed/duplex. Setting the speed made no difference to my box.

Steve

CS

I left only one bridge interface plugged in and still every morning I get the same error and I need to reboot pfsense. This never happened with 2.0.3 and now it seems to be a serious problem…

stephenw10

With similar problems I've experienced, it made a difference which order the links were powered up in. If you have the two machines connected to OPT1 and OPT2 already up when pfSense boots, do the links still flap?
I agree this seems like a problem. It's not a show stopper for me, as I don't use those links regularly, but I can see it will be for others. The more useful information we can gather here for the developers, the easier and quicker it will be for them to resolve.

Steve

CS

Let's provide some feedback to help developers find a solution:

Bridge0 (LAN) has 2 members: OPT1 and OPT2

1st scenario:
OPT1 is up and I see no errors, OPT2 is down/unplugged.
Problem: Without doing any changes, suddenly (random time) OPT1 starts going down/up continuously.
Resolution: Reboot pfsense.

2nd scenario:
OPT1 is up and I see no errors, OPT2 is down/unplugged.
Problem: I plug in OPT2 and it starts going down/up continuously.
Resolution: Unplug OPT2.

3rd scenario:
OPT1 and OPT2 are down/unplugged.
Problem: I plug in OPT2 and it starts going down/up continuously. I unplug OPT2 and then I plug in OPT1. It also starts going down/up continuously.
Resolution: Reboot pfsense.

4th scenario:
I boot pfsense with OPT1 and OPT2 plugged in.
Problem: N/A. No errors in the system logs, no errors in the interface statistics, OPT1 and OPT2 seem to be up and both devices connected to them have access to the network. I will keep an eye on it and update this post if I see anything abnormal.
Resolution: N/A.

stephenw10

Yes, I'm seeing the same behaviour. If I boot the box with the bridge connected NICs already up everything seems fine. I'll leave it up for a while and monitor things.

Steve

CS

So far when the box boots with all the NICs up it works fine. The problem is that I don't have all the NICs up and running 24x7…which means I need to reboot pfsense every time a NIC goes up. :(

chpalmer

@/CS:

Let's provide some feedback to help developers find a solution:

Bridge0 (LAN) has 2 members: OPT1 and OPT2

2.1 release on a Watchguard X550e with Marvell interfaces.

I've got a similar setup to yours. I've renamed the interfaces V1 and V2 and the bridge Voipbridge.

I can duplicate your findings exactly.

One thing that's interesting to note is that Siproxd still sees the OPT interfaces as OPT (n) and not by their renamed names. I had no problems with this setup under 2.0.3.

jimp

Try this fix:
https://github.com/pfsense/pfsense/commit/f3a4601c85c4de78caa4f12fefd64067fd83dbe8

It seems to only affect certain NICs that bounce their link on some configuration operations.

CS

@jimp:

Try this fix:
https://github.com/pfsense/pfsense/commit/f3a4601c85c4de78caa4f12fefd64067fd83dbe8

It seems to only affect certain NICs that bounce their link on some configuration operations.

Many thanks jimp! It works fine for me. ;D