The link state of an interface (bridge member) goes up/down continuously
-
I tried creating only one bridge with two ports and the problem appears
-
The linked problem is similar but not exactly the same. The symptoms are the same but the cause appears to be adding the NIC to a bridge rather than forcing the mtu or speed/duplex. Setting the speed made no difference to my box.
Steve
-
I left only one bridge interface plugged in and still every morning I get the same error and I need to reboot pfsense. This never happened with 2.0.3 and now it seems to be a serious problem…
-
With similar problems I've experienced, it made a difference which order the links were powered up in. If you have the two machines connected to OPT1 and OPT2 already up when pfSense boots, do the links still flap?
I agree this seems like a problem. It's not a show stopper for me, I don't use those links regularly, but I can see it will be for others. The more useful information we can gather here for the developers, the easier/quicker it will be for them to resolve.
Steve
-
Let's provide some feedback to help developers find a solution:
Bridge0 (LAN) has 2 members: OPT1 and OPT2
1st scenario:
OPT1 is up and I see no errors, OPT2 is down/unplugged.
Problem: Without doing any changes, suddenly (random time) OPT1 starts going down/up continuously.
Resolution: Reboot pfsense.
2nd scenario:
OPT1 is up and I see no errors, OPT2 is down/unplugged.
Problem: I plug in OPT2 and it starts going down/up continuously.
Resolution: Unplug OPT2.
3rd scenario:
OPT1 and OPT2 are down/unplugged.
Problem: I plug in OPT2 and it starts going down/up continuously. I unplug OPT2 and then I plug in OPT1. It also starts going down/up continuously.
Resolution: Reboot pfsense.
4th scenario:
I boot pfsense with OPT1 and OPT2 plugged in.
Problem: N/A. No errors in the system logs, no errors in the interface statistics, OPT1 and OPT2 seem to be up and both devices connected to them have access to the network. I will keep an eye and update this post if I see anything abnormal.
Resolution: N/A.
-
Yes, I'm seeing the same behaviour. If I boot the box with the bridge connected NICs already up everything seems fine. I'll leave it up for a while and monitor things.
Steve
-
So far when the box boots with all the NICs up it works fine. The problem is that I don't have all the NICs up and running 24x7…which means I need to reboot pfsense every time a NIC goes up. :(
-
@/CS:
Let's provide some feedback to help developers find a solution:
Bridge0 (LAN) has 2 members: OPT1 and OPT2
2.1 release on a Watchguard X550e with Marvell interfaces.
I've got a similar setup to yours. I've renamed the interfaces V1 and V2 and the bridge Voipbridge.
I can duplicate your findings exactly.
One thing that's interesting to note is that Siproxd still sees the OPT interfaces as OPT (n) and not by their renamed names. I had no problems with this setup under 2.0.3.
-
Try this fix:
https://github.com/pfsense/pfsense/commit/f3a4601c85c4de78caa4f12fefd64067fd83dbe8
It seems to only affect certain NICs that bounce their link on some configuration operations.
-
Many thanks jimp! It works fine for me. ;D
-
Nice. :)
I'll have to give this a try when I get home.
Thanks Jim.
Steve
-
@/CS:
Many thanks jimp! It works fine for me. ;D
Ditto! 8)
-
Fixed it for me as well :)
-
Thought I'd report back that this worked for me too.
I have another bridge of fxp(4) NICs that didn't have that problem so, as you say, it's not all NICs that are affected.
First chance I've had to try the System Patches package. Nice. :)
Steve
-
This worked on some interfaces for me, but not others.
Looking at the system logs and the patch code, I could see that it was still going through the same path, and the IP address was empty:
Oct 17 16:59:32 php: rc.newwanip: rc.newwanip: Informational is starting em1.
Oct 17 16:59:32 php: rc.newwanip: rc.newwanip: on (IP address: ) (interface: opt1) (real interface: em1).
Oct 17 16:59:32 php: rc.newwanip: rc.newwanip: Failed to update opt1 IP, restarting...
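A rough way to check whether a bridge member is stuck in this restart loop is to count the "Failed to update ... IP, restarting" lines per interface. This is only a minimal Python sketch: the function name and the regular expression are illustrative (not part of pfSense), and it assumes the log wording matches the lines quoted in this post.

```python
import re
from collections import Counter

# Pattern modelled on the rc.newwanip line quoted in this post; other
# pfSense versions may word the message differently (assumption).
RESTART_RE = re.compile(r"rc\.newwanip: Failed to update (\S+) IP, restarting")

def count_restarts(log_lines):
    """Count how often rc.newwanip reported a restart for each interface."""
    counts = Counter()
    for line in log_lines:
        match = RESTART_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# The three log lines from this post, used as sample input.
sample = [
    "Oct 17 16:59:32 php: rc.newwanip: rc.newwanip: Informational is starting em1.",
    "Oct 17 16:59:32 php: rc.newwanip: rc.newwanip: on (IP address: ) (interface: opt1) (real interface: em1).",
    "Oct 17 16:59:32 php: rc.newwanip: rc.newwanip: Failed to update opt1 IP, restarting...",
]

for iface, hits in count_restarts(sample).items():
    print(iface, hits)
```

An interface whose count keeps climbing while nothing is being reconfigured is a likely candidate for the flapping described in this thread.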
I then looked at /cf/conf/config.xml and found that there were empty <ipaddr> XML tags on the interfaces that still had problems.
An IP address had been set on those interfaces at some point in the process of creating the bridge.
I removed the empty tags from the config.xml file, rebooted, and the problem went away:
Oct 17 19:09:16 php: rc.newwanip: rc.newwanip: Informational is starting em1.
Oct 17 19:09:16 php: rc.newwanip: Interface does not have an IP address, nothing to do.
I'm not familiar with PHP at all, but I assume the isset function perhaps doesn't account for empty tags and returns true:
if (($curwanip == "") && !(isset($config['interfaces'][$interface]['ipaddr']))) {
    log_error("Interface does not have an IP address, nothing to do.");
    return;
}
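For what it's worth, PHP's isset() is true for any value that exists and is not null, which includes an empty string, so the guess is plausible if the config parser turns an empty tag into an empty string (an assumption here, not something checked against the pfSense source). The difference between an empty and a missing tag can be sketched in Python. The XML below is a simplified, hypothetical excerpt of config.xml, and classify_ipaddr is an illustrative helper, not a pfSense function.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical excerpt of /cf/conf/config.xml; a real file
# carries many more tags per interface.
SAMPLE_CONFIG = """<pfsense>
  <interfaces>
    <wan><if>em0</if><ipaddr>dhcp</ipaddr></wan>
    <opt1><if>em1</if><ipaddr></ipaddr></opt1>
    <opt2><if>em2</if></opt2>
  </interfaces>
</pfsense>"""

def classify_ipaddr(xml_text):
    """Map each interface to 'set', 'empty' or 'missing' for its <ipaddr> tag."""
    root = ET.fromstring(xml_text)
    result = {}
    for iface in root.find("interfaces"):
        tag = iface.find("ipaddr")
        if tag is None:
            result[iface.tag] = "missing"   # tag absent: a presence check fails
        elif (tag.text or "").strip():
            result[iface.tag] = "set"       # tag present with a value
        else:
            result[iface.tag] = "empty"     # tag present but blank
    return result

print(classify_ipaddr(SAMPLE_CONFIG))
```

Interfaces reported as "empty" are the ones where a presence-only check and an emptiness check disagree; removing the tag by hand, as described above, moves them to "missing".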
FYI, I'm running the same configuration as the OP:
pfSense 2.1 on a Soekris net6501-50 (Intel 82574L Gigabit Ethernet ports)
I have WAN assigned to em0, OPT1-7 assigned to em1-7 with IPv4 and IPv6 set to none, and LAN is assigned to Bridge0, consisting of OPT1-7.
Anyway, thanks for the patch, that made things work a lot better. ;D
Also, the way the System Patches process works is rather impressive.
-
Interesting, thanks for the heads up. :)
No spurious tags left in my config file. I don't think I ever had them set as anything but bridged on that box though.
Steve
-
Thanks Steve.
I'll try it. Some others tried the system patches feature. I guess that's the way to do it. There I go re-inventing the wheel again.
-
The official fix works for the msk driver. I like the system patch tool. It lets you use your own patches or official github links.
-
@/CS:
Try this fix:
https://github.com/pfsense/pfsense/commit/f3a4601c85c4de78caa4f12fefd64067fd83dbe8
It seems to only affect certain NICs that bounce their link on some configuration operations.
Many thanks jimp! It works fine for me. ;D
Please pardon my ignorance, but how do I apply this fix?
Do I just replace my existing /etc/rc.newwanip with this file?
-
So, I just tried doing this and rebooted the pfSense box, but my devices keep going up and down.
My configuration is an AMD E350 board with two Intel Pro/1000 dual NICs and the onboard Realtek 8111C junk.
Any thoughts?
-
I added the 4 highlighted lines to my existing file.
Otherwise you could chance it and gitsync…
-
So, making these changes to /etc/rc.newwanip did not fix the problem on its own, but then I decided to try just disabling the onboard Ethernet (Realtek 8111C) and that actually did the trick. The interfaces are no longer cycling for me.
I had left the Realtek interface on just as an extra port in case I ever needed it, not something to rely on, as I know they are garbage, but I hadn't considered that it could interfere with the proper working of my good NICs.
Thanks for the help folks.
-
Just for reference, using the System Patches package takes away the possibility of typos. You just enter the commit ID and everything is done for you.
Steve
-
I have an issue with almost the same symptoms on Sun hardware, part 501-5406-07. I know there have to be hundreds of units out there with this hardware in them. I have personally built about 25 units with this quad Fast Ethernet card. I tried to submit it to Redmine but it was rejected. I can fix it myself, but I was wondering whether it should be fixed in the pfSense code or in the hme driver. Also, being new at reporting bugs, I am curious whether I should try to contact the author of the driver if Redmine rejects it. It is a real bug in my opinion. I am a little surprised I haven't heard anyone else complaining. Here is the Redmine issue with the rejection by Chris Buechler: https://redmine.pfsense.org/issues/3373
It defines the issue.
-
Sounds exactly like this thread: http://forum.pfsense.org/index.php/topic,69125.15.html
Unfortunately that wasn't really resolved usefully in my opinion.
Steve
-
If you are using the System Patches package, make sure you get not only patch 58ee84b4b2f9daba87e44abf663026c6266a7cd8 but also 793299b8f5bdc0fd167093cc5ab9f3f30f0d77ac.
Both could be relevant to similar issues, but they would still only apply to interfaces that were set to an IP type of "none" (typically bridged interfaces).
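Since the commit IDs are long and easy to mistype, a small check before pasting them into the System Patches package may help. This is only a sketch: commit_url is an illustrative helper, the IDs are the three posted in this thread, and the URL form is the one already used for the first patch.

```python
import re

BASE_URL = "https://github.com/pfsense/pfsense/commit/"

# The three commit IDs mentioned in this thread.
COMMITS = [
    "f3a4601c85c4de78caa4f12fefd64067fd83dbe8",
    "58ee84b4b2f9daba87e44abf663026c6266a7cd8",
    "793299b8f5bdc0fd167093cc5ab9f3f30f0d77ac",
]

def commit_url(commit_id):
    """Return the GitHub commit URL, rejecting anything that is not a full SHA-1."""
    if not re.fullmatch(r"[0-9a-f]{40}", commit_id):
        raise ValueError(f"not a 40-character hex commit ID: {commit_id!r}")
    return BASE_URL + commit_id

for cid in COMMITS:
    print(commit_url(cid))
```

A truncated or mistyped ID raises ValueError instead of producing a link that the package would fail to fetch.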
-
I would like to chime in and also confirm this fix worked for me.
I am using a Soekris Net6501-50 with the Intel 82574L 4 port Gigabit Ethernet card. I experienced the same issue with an interface configured into a bridge group bouncing continuously when connecting a device and establishing link. This patch worked for me.
I applied it using the System Patch package. For anyone who is unfamiliar with it, it's pretty easy. Install the "System Patch" package, and then under the Patches screen, add the link from jimp. Fetch the patch, test it (making sure it doesn't have any issues applying), and then apply the patch. It seemed to work without rebooting, but I did anyway just for good measure.
Thanks for those who contributed to this thread and thanks to jimp for the fix!
-
I have a bridged setup (WAN - DMZ) on an ASUS RS300-E7 server with 4 NICs. I have applied this patch: https://github.com/pfsense/pfsense/commit/f3a4601c85c4de78caa4f12fefd64067fd83dbe8
Unfortunately I still get a lot of timeouts on every server behind the DMZ.
The first rule on the WAN is allow ICMP from all to all. Are there any options which I can test?
P.S. It looks like the more data is being processed by pfSense, the more timeouts it gets… All logging is turned off on the rules. pftop is showing OK status.
-
I am almost certain that the more connections there are through the pfSense firewall, the more timeouts we get. The server hardware is 3 years old with 32GB of RAM… The state count is now 6125 and I receive some pings; traffic in is 1MB, traffic out is 3MB.
-
Where are you seeing these 'timeouts'?
Steve
-
When I ping servers behind the pfSense firewall. (I now have 2 servers behind it.)
The pfSense box itself does not time out, and when I ping other servers (not behind pfSense) no timeouts occur.
I also have a server which gets ping timeouts when I put it behind the pfSense firewall, and when I put it back behind the SonicWall firewall no ping timeouts occur.
Like I said, the more traffic over pfSense, the more timeouts occur… The hardware should not be a problem:
24GB memory, Intel Xeon 3470, 4x 1TB in RAID 10
When the pings time out, Apache etc. also stops connecting.
-
That doesn't sound like the interface flapping problem described in this thread. Are you seeing the interface going up and down in the logs?
Steve
-
I had that until I applied:
https://github.com/pfsense/pfsense/commit/f3a4601c85c4de78caa4f12fefd64067fd83dbe8
Now I don't have that anymore, but I still get the ping timeouts when more traffic comes over the firewall.
-
The LAN cards in the ASUS server are Intel 82574L. Are there known issues with that type of NIC?
-
No, the 82574L is well tested and widely used.
I think this is probably unrelated to the bridge issue, maybe start a new thread.
Interestingly I see that JimP has actually posted three patches for this issue:
Initially:
https://github.com/pfsense/pfsense/commit/f3a4601c85c4de78caa4f12fefd64067fd83dbe8
and then:
https://github.com/pfsense/pfsense/commit/58ee84b4b2f9daba87e44abf663026c6266a7cd8
and
https://github.com/pfsense/pfsense/commit/793299b8f5bdc0fd167093cc5ab9f3f30f0d77ac
Have you done this?:
https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards#Intel_igb.284.29_and_em.284.29_Cards
Steve
-
Thank you Stephan for your time!
I will check those links out and will post the result.
-
Unfortunately the links you provided did not solve the problem.
When I ping my server behind the pfSense bridge it pings fine and then sometimes loses 5 pings or more… I do not understand why. The server is down for some time. I put the same server behind another firewall (SonicWall) and there are no problems...
I ping the server from different locations and different internet providers. The pfSense logs do not show a thing.
-
OK, what I did:
Reinstalled pfSense 2.1, now with the i386 version instead of amd64.
After the install I added NO packages.
I did a WAN -> DMZ setup and a LAN setup (3x NIC).
I added all 3 GitHub patches (as mentioned before).
So: I set up one non-busy server behind the DMZ, looks OK; a second one (also non-busy), looks OK. Then a third server with some more traffic, and there is ping loss, more down than up. (Load on the pfSense firewall is 0.2.)
If I put only the busier server behind pfSense, all is fine…
So I put the busier server behind the SonicWall again and all goes smoothly.
It looks like the error occurs if I get over 10000 states... In the firewall settings I see this:
Note: Leave this blank for the default. On your system the default size is: 326000
so that should be OK...
Fact is that the more traffic goes over the pfSense firewall, the more timeouts and problems I have...
-
Have you tried the tuning options on your em NICs?
It's not surprising that, whatever your problem is, once you hit it, it shows more the more traffic you push through the box.
Is there anything in the logs? As you've said, 10000 states should be no problem for your hardware.
Steve
-
Hi Stephen, I made a new topic because this looks like another problem:
http://forum.pfsense.org/index.php/topic,71432.0.html