The link state of an interface (bridge member) goes up/down continuously



  • Hi all!
    I have the following problem. I have set up a bridge with two interfaces (opt1/2), but when I plug a device into opt2 I get the following error.


    Sep 20 14:51:38 check_reload_status: Linkup starting em1
    Sep 20 14:51:38 php: rc.interfaces_wan_configure: The command '/sbin/ifconfig 'em1' inet delete' returned exit code '1', the output was 'ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address'
    Sep 20 14:51:35 check_reload_status: Configuring interface opt2
    Sep 20 14:51:35 php: rc.newwanip: rc.newwanip: Failed to update opt2 IP, restarting...
    Sep 20 14:51:35 php: rc.newwanip: rc.newwanip: on (IP address: ) (interface: opt2) (real interface: em1).
    Sep 20 14:51:35 php: rc.newwanip: rc.newwanip: Informational is starting em1.
    Sep 20 14:51:32 check_reload_status: rc.newwanip starting em1
    Sep 20 14:51:32 php: rc.linkup: Hotplug event detected for OPT1(opt2) but ignoring since interface is configured with static IP ( )
    Sep 20 14:51:30 php: rc.linkup: Hotplug event detected for OPT1(opt2) but ignoring since interface is configured with static IP ( )
    Sep 20 14:51:29 kernel: em1: link state changed to UP
    Sep 20 14:51:29 check_reload_status: Linkup starting em1
    Sep 20 14:51:27 kernel: em1: link state changed to DOWN
    Sep 20 14:51:27 check_reload_status: Linkup starting em1
    Sep 20 14:51:27 php: rc.interfaces_wan_configure: The command '/sbin/ifconfig 'em1' inet delete' returned exit code '1', the output was 'ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address'
    Sep 20 14:51:24 check_reload_status: Configuring interface opt2
    Sep 20 14:51:24 php: rc.newwanip: rc.newwanip: Failed to update opt2 IP, restarting…
    Sep 20 14:51:24 php: rc.newwanip: rc.newwanip: on (IP address: ) (interface: opt2) (real interface: em1).
    Sep 20 14:51:24 php: rc.newwanip: rc.newwanip: Informational is starting em1.
    Sep 20 14:51:21 check_reload_status: rc.newwanip starting em1
    Sep 20 14:51:21 php: rc.linkup: Hotplug event detected for OPT1(opt2) but ignoring since interface is configured with static IP ( )
    Sep 20 14:51:18 kernel: em1: link state changed to UP
    Sep 20 14:51:18 check_reload_status: Linkup starting em1

    The blue part is repeated continuously, so I have to unplug the cable from OPT2. If I don't, it may cause the same issue on OPT1, which normally has no problem. If both become unstable I need to reboot pfSense, after which OPT1 is fine again. I read several posts in this forum about this loop issue but found no concrete resolution. Do you have any idea?

    My setup/config:
    pfSense 2.1 on a Soekris net6501-50 board (Intel 82574L Gigabit Ethernet ports)
    OPT1/OPT2 are the two interfaces of the LAN bridge (the bridge has been created following the tutorial which has been posted in this forum and several other users have followed).
    OPT1/OPT2: IPv4/6 none, MTU/MSS blank, speed/duplex auto.
    LAN has static IPv4 and DHCP server running.
    Cables and devices have been replaced and they are not responsible for the errors.
    No spoofing in place.
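
    To put a number on the flapping, the kernel lines in a log like the one above can be counted per interface. This is a minimal sketch in Python, assuming the syslog format shown in this post; the function name `count_link_changes` and the sample list are illustrative only and not part of pfSense.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "Sep 20 14:51:29 kernel: em1: link state changed to UP"
LINK_RE = re.compile(r"kernel: (\w+): link state changed to (UP|DOWN)")


def count_link_changes(lines):
    """Return a Counter mapping (interface, state) to the number of transitions."""
    counts = Counter()
    for line in lines:
        match = LINK_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Sample lines copied from the log above (hypothetical selection).
sample = [
    "Sep 20 14:51:29 kernel: em1: link state changed to UP",
    "Sep 20 14:51:29 check_reload_status: Linkup starting em1",
    "Sep 20 14:51:27 kernel: em1: link state changed to DOWN",
    "Sep 20 14:51:18 kernel: em1: link state changed to UP",
]
counts = count_link_changes(sample)
```

    Feeding it the full system log instead of `sample` would show whether only one bridge member is cycling or several.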



  • I just posted a thread about this same issue here.



  • Same thing here. I have 4 LAN ports and created two bridges with two ports each. After applying the configuration I get disconnections every 4 or 5 seconds.
    It is impossible to use VoIP.


  • Netgate Administrator

    Do you have a link to the tutorial you followed?

    Sep 20 14:51:30    php: rc.linkup: Hotplug event detected for OPT1(opt2) but ignoring since interface is configured with static IP ( )
    

    Seems a bit odd that OPT1 is referencing opt2 here. Have you renamed the interfaces?

    Steve



  • This may not be reproducible, but I may have found a workaround:

    The build is on a re-purposed machine while we find hardware to dedicate:
        pfSense 2.1-RELEASE (amd64)
        built on Wed Sep 11 18:17:34 EDT 2013
        FreeBSD 8.3-RELEASE-p11
        4-port Gigabit NIC (igb driver, interfaces igb0-3)
        1 integrated Broadcom NetXtreme Gigabit Ethernet Controller
        Remainder is commodity hardware (4x procs, 300 GB storage, 12 GB RAM)

    igb0=WAN (static IP)
    igb1=LAN (no IP)
    OPT1=bridge0 (no IP)
    igb0 and igb1 are members of bridge0

    I wanted to use the 4-port NIC exclusively, and had enormous difficulty getting the bridge configuration to work at all: the configuration would set but all four ports would activate and deactivate every second or two. Rebooting after setting the configuration seemed to placate the igb drivers somewhat - the WAN interface was steady, but as others have seen, the LAN side was up for 10 seconds, down for two, up for ten, etc. Unsuitable. Since the drivers had already caused a problem, I moved the LAN interface to the integrated Broadcom NIC at bge0, and retained all other settings. It seems that using different drivers for different sides of the bridge fixed it for me, no errors for the last 30 minutes.

    Try using NICs with different drivers for each side of the bridge, and see if that doesn't help.



  • @stephenw10:

    Do you have a link to the tutorial you followed?

    Sep 20 14:51:30    php: rc.linkup: Hotplug event detected for OPT1(opt2) but ignoring since interface is configured with static IP ( )
    

    Seems a bit odd that OPT1 is referencing opt2 here. Have you renamed the interfaces?

    Steve

    @Steve,
    I followed your guide: http://forum.pfsense.org/index.php/topic,48947.msg269592.html#msg269592
    I had renamed them in the past because my setup was different and my bridge had 3 ifaces. Then I changed it and now I have 2 for the bridge and 1 for the DMZ. My WAN and DMZ are stable.
    I don't think that the names could be a problem, because in the log files I can see the real interfaces (em1) too.
    I noticed that after the upgrade to 2.1 I very often get the same error with OPT1 too, even when OPT2 is unplugged. This never happened before.

    @jcrutchf,
    At the moment I can't use different NICs.


  • Netgate Administrator

    I hadn't checked, as I have some unused ports set up as a bridge for occasional use. I am seeing the same behaviour: the connected bridge member continually cycles up/down with a period of approximately 10s.
    To be clear this is on a 2.1 32bit Nano install that was upgraded from RC0 and before that 2.0.3.
    The bridge is using all Intel Gigabit NICs, em(4) driver.

    Sep 21 17:16:03	kernel: em3: link state changed to DOWN
    Sep 21 17:16:03	check_reload_status: Linkup starting em3
    Sep 21 17:16:03	php: rc.interfaces_wan_configure: The command '/sbin/ifconfig 'em3' inet delete' returned exit code '1', the output was 'ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address'
    Sep 21 17:16:01	check_reload_status: Configuring interface opt3
    Sep 21 17:16:01	php: rc.newwanip: rc.newwanip: Failed to update opt3 IP, restarting...
    Sep 21 17:16:01	php: rc.newwanip: rc.newwanip: on (IP address: ) (interface: opt3) (real interface: em3).
    Sep 21 17:16:01	php: rc.newwanip: rc.newwanip: Informational is starting em3.
    Sep 21 17:15:58	check_reload_status: rc.newwanip starting em3
    Sep 21 17:15:58	php: rc.linkup: Hotplug event detected for OPT3(opt3) but ignoring since interface is configured with static IP ( )
    Sep 21 17:15:56	kernel: em3: link state changed to UP
    Sep 21 17:15:56	check_reload_status: Linkup starting em3
    Sep 21 17:15:54	php: rc.linkup: Hotplug event detected for OPT3(opt3) but ignoring since interface is configured with static IP ( )
    Sep 21 17:15:52	check_reload_status: Linkup starting em3
    

    Is everyone else here seeing this on em NICs?

    Steve



  • My two bridges use a 4-port Intel Gigabit card.
    First bridge: igb0 and igb1
    Second bridge: igb2 and igb3
    Before creating the bridges, I had no up/down cycling.



  • Exactly! That's the same problem I have. Before creating the bridge there was no issue. Do you have problems with specific bridge members or all of them?



  • There has been a similar case in the past here.

    In it SMB mentions:

    Ah yeah I didn't see that. The problem at one point in the past was the Intel drivers cycle link when you add an interface to a bridge, which then causes the interface to be added back to the bridge, which cycles link, which causes the interface to be added to the bridge, and repeats the process over and over endlessly. Though the OP makes no mention of which version, 2.0 and 2.0.1 release versions shouldn't exhibit this behavior. We ran into that scenario on a customer's system pre-2.0 release and fixed it prior to release.

    I guess it was solved but for some reason got introduced again in the latest version?



  • I tried creating only one bridge with two ports and the problem still appears.


  • Netgate Administrator

    The linked problem is similar but not exactly the same. The symptoms are the same but the cause appears to be adding the NIC to a bridge rather than forcing the mtu or speed/duplex. Setting the speed made no difference to my box.

    Steve



  • I left only one bridge interface plugged in and still every morning I get the same error and I need to reboot pfsense. This never happened with 2.0.3 and now it seems to be a serious problem…


  • Netgate Administrator

    With similar problems I've experienced, it made a difference which order the links were powered up in. If the two machines connected to OPT1 and OPT2 are already up when pfSense boots, do the links still flap?
    I agree this seems like a problem. It's not a show stopper for me, as I don't use those links regularly, but I can see it will be for others. The more useful information we can gather here for the developers, the easier and quicker it will be for them to resolve.

    Steve



  • Let's provide some feedback to help developers find a solution:

    Bridge0 (LAN) has 2 members: OPT1 and OPT2

    1st scenario:
    OPT1 is up and I see no errors, OPT2 is down/unplugged.
    Problem: Without doing any changes, suddenly (random time) OPT1 starts going down/up continuously.
    Resolution: Reboot pfsense.

    2nd scenario:
    OPT1 is up and I see no errors, OPT2 is down/unplugged.
    Problem: I plug in OPT2 and it starts going down/up continuously.
    Resolution: Unplug OPT2.

    3rd scenario:
    OPT1 and OPT2 are down/unplugged.
    Problem: I plug in OPT2 and it starts going down/up continuously. I unplug OPT2 and then I plug in OPT1. It also starts going down/up continuously.
    Resolution: Reboot pfsense.

    4th scenario:
    I boot pfsense with OPT1 and OPT2 plugged in.
    Problem: N/A. No errors in the system logs, no errors in the interface statistics, OPT1 and OPT2 seem to be up and both devices connected to them have access to the network. I will keep an eye on it and update this post if I see anything abnormal.
    Resolution: N/A.


  • Netgate Administrator

    Yes, I'm seeing the same behaviour. If I boot the box with the bridge connected NICs already up everything seems fine. I'll leave it up for a while and monitor things.

    Steve



  • So far when the box boots with all the NICs up it works fine. The problem is that I don't have all the NICs up and running 24x7…which means I need to reboot pfsense every time a NIC goes up.  :(



  • @/CS:

    Let's provide some feedback to help developers find a solution:

    Bridge0 (LAN) has 2 members: OPT1 and OPT2

    2.1 release on a Watchguard X550e with Marvell interfaces.

    I've got a similar setup to yours. I've renamed the interfaces V1 and V2 and the bridge Voipbridge.

    I can duplicate your findings exactly.

    One thing that's interesting to note is that Siproxd still sees the OPT interfaces as OPT (n) and not by their renamed names. I had no problems with this setup under 2.0.3.


  • Rebel Alliance Developer Netgate

    Try this fix:
    https://github.com/pfsense/pfsense/commit/f3a4601c85c4de78caa4f12fefd64067fd83dbe8

    It seems to only affect certain NICs that bounce their link on some configuration operations.



  • @jimp:

    Try this fix:
    https://github.com/pfsense/pfsense/commit/f3a4601c85c4de78caa4f12fefd64067fd83dbe8

    It seems to only affect certain NICs that bounce their link on some configuration operations.

    Many thanks jimp! It works fine for me.  ;D


  • Netgate Administrator

    Nice.  :)
    I'll have to give this a try when I get home.
    Thanks Jim.

    Steve



  • @/CS:

    Many thanks jimp! It works fine for me.  ;D

    Ditto!  8)



  • Fixed it for me as well  :)


  • Netgate Administrator

    Thought I'd report back that this worked for me too.
    I have another bridge of fxp(4) NICs that didn't have that problem so as you say it's not all NICs that are affected.

    First chance I've had to try the System Patches package. Nice.  :)

    Steve



  • @jimp:

    Try this fix:
    https://github.com/pfsense/pfsense/commit/f3a4601c85c4de78caa4f12fefd64067fd83dbe8

    It seems to only affect certain NICs that bounce their link on some configuration operations.

    This worked on some interfaces for me, but not others.
    Looking at the system logs and the patch code, I could see that it was still going through the same path, and the IP address was empty:

    Oct 17 16:59:32 	php: rc.newwanip: rc.newwanip: Informational is starting em1.
    Oct 17 16:59:32 	php: rc.newwanip: rc.newwanip: on (IP address: ) (interface: opt1) (real interface: em1).
    Oct 17 16:59:32 	php: rc.newwanip: rc.newwanip: Failed to update opt1 IP, restarting...
    

    I then looked at /cf/conf/config.xml and found that there were empty <ipaddr></ipaddr> XML tags on the interfaces that still had problems.
    The IP address was set at some point on those in the process of creating the bridge.
    I removed the empty tags from the config.xml file, rebooted, and the problem went away:

    Oct 17 19:09:16 	php: rc.newwanip: rc.newwanip: Informational is starting em1.
    Oct 17 19:09:16 	php: rc.newwanip: Interface does not have an IP address, nothing to do.
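
    The check for leftover empty tags described above could be automated along these lines. This is a minimal Python sketch, not a pfSense tool: it assumes the interface entries sit under `<pfsense><interfaces>` in config.xml, and the sample document (including the 192.168.1.1 address) is made up for illustration. It only reports the affected interfaces and does not modify the file.

```python
import xml.etree.ElementTree as ET


def interfaces_with_empty_ipaddr(config_xml):
    """Return names of <interfaces> children whose <ipaddr> tag exists but is empty."""
    root = ET.fromstring(config_xml)
    flagged = []
    for iface in root.findall("./interfaces/*"):
        ipaddr = iface.find("ipaddr")
        # An empty element has text None (or only whitespace).
        if ipaddr is not None and not (ipaddr.text or "").strip():
            flagged.append(iface.tag)
    return flagged


# Hypothetical, heavily trimmed config used only to exercise the function.
sample = """<pfsense>
  <interfaces>
    <lan><if>bridge0</if><ipaddr>192.168.1.1</ipaddr></lan>
    <opt1><if>em1</if><ipaddr></ipaddr></opt1>
    <opt2><if>em2</if></opt2>
  </interfaces>
</pfsense>"""
flagged = interfaces_with_empty_ipaddr(sample)
```

    Here `opt1` is flagged because its tag is present but empty, while `opt2` is not because the tag is absent altogether. Back up config.xml before editing it by hand.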
    

    I'm not familiar with php at all, but I assume the isset function perhaps doesn't account for empty tags and returns true:

      if (($curwanip == "") && !(isset($config['interfaces'][$interface]['ipaddr']))) {
        log_error("Interface does not have an IP address, nothing to do.");
        return;
      } 
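
    PHP's isset() does return true for a key that holds an empty string; it is only false for a missing key or a null value. The effect can be illustrated with a Python analogy, where key membership in a dict behaves the same way. This is a sketch only, not pfSense code: the function names are hypothetical, and it assumes the config parser stores an empty tag as an empty string.

```python
def should_skip(curwanip, iface_cfg):
    """Mirror of the quoted condition: skip only when there is no current IP
    and the 'ipaddr' key is absent (presence check, like PHP isset())."""
    return curwanip == "" and "ipaddr" not in iface_cfg


def should_skip_strict(curwanip, iface_cfg):
    """Variant that also treats an empty 'ipaddr' value as absent."""
    return curwanip == "" and not iface_cfg.get("ipaddr")


clean = {"if": "em1"}                    # no ipaddr tag at all
leftover = {"if": "em1", "ipaddr": ""}   # empty tag left behind in the config

results = (
    should_skip("", clean),              # True: early return is taken
    should_skip("", leftover),           # False: falls through to the restart path
    should_skip_strict("", leftover),    # True: empty value treated as "no IP"
)
```

    With the presence-only check, the interface with the leftover empty tag is not skipped, which matches the "Failed to update opt1 IP, restarting..." lines seen before the tags were removed.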
    

    FYI, I'm running the same configuration as the OP:
    pfSense 2.1 on a Soekris net6501-50 (Intel 82574L Gigabit Ethernet ports)
    I have WAN assigned to em0, OPT1-7 assigned to em1-7 with IPv4 and IPv6 set to none, and LAN is assigned to Bridge0, consisting of OPT1-7.

    Anyway, thanks for the patch, that made things work a lot better. ;D

    Also, the way the System Patches process works is rather impressive.


  • Netgate Administrator

    Interesting, thanks for the heads up.  :)

    No spurious tags left in my config file. I don't think I ever had them set as anything but bridged on that box though.

    Steve



  • Thanks Steve.
    I'll try it.  Some others tried the system patches feature.  I guess that's the way to do it.  There I go re-inventing the wheel again.



  • The official fix works for the msk driver.  I like the system patch tool.  It lets you use your own patches or official github links.



  • @/CS:

    @jimp:

    Try this fix:
    https://github.com/pfsense/pfsense/commit/f3a4601c85c4de78caa4f12fefd64067fd83dbe8

    It seems to only affect certain NICs that bounce their link on some configuration operations.

    Many thanks jimp! It works fine for me.  ;D

    Please pardon my ignorance, but how do I apply this fix?

    Do I just replace my existing /etc/rc.newwanip with this file?



  • @mattlach:

    Please pardon my ignorance, but how do I apply this fix?

    Do I just replace my existing /etc/rc.newwanip with this file?

    So, I just tried doing this, rebooted the pfSense box, but my devices keep going up and down.

    My configuration is an AMD E350 board with two Intel Pro/1000 dual NICs and the onboard Realtek 8111C junk.

    Any thoughts?



  • I added the 4 highlighted lines to my existing file.

    Otherwise you could chance it and gitsync…



  • @mattlach:

    @mattlach:

    Please pardon my ignorance, but how do I apply this fix?

    Do I just replace my existing /etc/rc.newwanip with this file?

    So, I just tried doing this, rebooted the pfSense box, but my devices keep going up and down.

    My configuration is an AMD E350 board with two Intel Pro/1000 dual NICs and the onboard Realtek 8111C junk.

    Any thoughts?

    So, making these changes to /etc/rc.newwanip did not fix the problem on its own, but then I decided to try disabling the onboard Ethernet (Realtek 8111C), and that actually did the trick. The interfaces are no longer cycling for me.

    I had left the Realtek interface on just as an extra port in case I ever needed it, not something to rely on, as I know they are garbage, but I hadn't considered that it could interfere with the proper working of my good NICs.

    Thanks for the help folks.


  • Netgate Administrator

    Just for reference, using the System Patches package takes away the possibility of typos. You just enter the commit ID and everything is done for you.

    Steve



  • @stephenw10:

    Thought I'd report back that this worked for me too.
    I have another bridge of fxp(4) NICs that didn't have that problem so as you say it's not all NICs that are affected.

    First chance I've had to try the System Patches package. Nice.  :)

    Steve

    I have an issue with almost the same symptoms on Sun hardware, part 501-5406-07. I know there have to be hundreds of units out there with this hardware in them; I have personally built about 25 units with this quad Fast Ethernet card. I tried to submit it to Redmine but it was rejected. I can fix it myself, but I was wondering whether it should be fixed in the pfSense code or in the hme driver. Also, being new at reporting bugs, I am curious whether I should try to contact the author of the driver if Redmine rejects it. It is a real bug in my opinion, and I am a little surprised I haven't heard anyone else complaining. Here is the Redmine issue with the rejection by Chris Buechler: https://redmine.pfsense.org/issues/3373
    It defines the issue.


  • Netgate Administrator

    Sounds exactly like this thread: http://forum.pfsense.org/index.php/topic,69125.15.html
    Unfortunately that wasn't really resolved usefully in my opinion.

    Steve


  • Rebel Alliance Developer Netgate

    If you are using the System Patches package, make sure you get not only patch 58ee84b4b2f9daba87e44abf663026c6266a7cd8 but also 793299b8f5bdc0fd167093cc5ab9f3f30f0d77ac.

    Both could be relevant to similar issues, but they would still only apply to interfaces that were set to an IP type of "none" (typically bridged interfaces).



  • I would like to chime in and also confirm this fix worked for me.

    I am using a Soekris Net6501-50 with the Intel 82574L 4 port Gigabit Ethernet card.  I experienced the same issue with an interface configured into a bridge group bouncing continuously when connecting a device and establishing link.  This patch worked for me.

    I applied it using the System Patch package.  For anyone who is unfamiliar with it, it's pretty easy.  Install the "System Patch" package, and then under the Patches screen, add the link from jimp.  Fetch the patch, test it (making sure it doesn't have any issues applying), and then apply the patch.  It seemed to work without rebooting, but I did anyway just for good measure.
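
    For anyone who wants to read the change before applying it: GitHub serves a plain-text patch when `.patch` is appended to a commit URL. A small Python sketch that builds that URL from a commit link such as the one jimp posted; the helper name `patch_url` is hypothetical, and nothing here is claimed about how the System Patches package fetches patches internally.

```python
import re

# Owner, repository and commit hash from a GitHub commit link.
COMMIT_RE = re.compile(r"github\.com/([\w.-]+)/([\w.-]+)/commit/([0-9a-f]{7,40})")


def patch_url(commit_url):
    """Return the plain-text .patch URL for a GitHub commit link."""
    match = COMMIT_RE.search(commit_url)
    if match is None:
        raise ValueError(f"not a GitHub commit URL: {commit_url!r}")
    owner, repo, sha = match.groups()
    return f"https://github.com/{owner}/{repo}/commit/{sha}.patch"


url = patch_url(
    "https://github.com/pfsense/pfsense/commit/f3a4601c85c4de78caa4f12fefd64067fd83dbe8"
)
```

    Opening the resulting URL in a browser shows exactly which lines of which files the patch touches.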

    Thanks for those who contributed to this thread and thanks to jimp for the fix!



  • I have a bridged setup (WAN - DMZ) on an ASUS RS300-E7 server with 4 NICs. I have applied this patch: https://github.com/pfsense/pfsense/commit/f3a4601c85c4de78caa4f12fefd64067fd83dbe8

    Unfortunately I still receive a lot of timeouts on every server behind the DMZ.

    The first rule on the WAN is Allow ICMP from all to all. Are there any options I can test?

    P.S. It looks like the more data pfSense processes, the more timeouts occur. Logging is turned off on all the rules. pftop shows an OK status.



  • I am almost certain that the more connections go through the pfSense firewall, the more timeouts we get. The server hardware is 3 years old with 32GB of RAM. The state count is now 6125 and I receive some pings; traffic in is 1MB, traffic out is 3MB.


  • Netgate Administrator

    Where are you seeing these 'timeouts'?

    Steve

