Unplugging LAN cable requires reboot to reconnect, 2.1RC1



  • Well, let's start again, shall we, with the truncated version.  The 60-minute logout deleted my composition.  I have an AMD64 ver 2.1RC1 w/dual (egb0) and single port (em0) Intel adapters.  When I don't have a client LAN device connected while booting pfSense, or I unplug it for a few seconds and then plug it back in, I get popup errors that cycle every 2 seconds between "local area connection is now connected" and "a network cable is unplugged".

    Tried reassigning LAN from the dual port adapter to the single port one, no change.
    LAN has the DHCP server enabled.
    Problem occurs with a lease from the free pool or from the reserved lease pool.

    I don't get this behavior on a 32-bit ver 2.0.3 using the same config…xml.



  • Reformatted and reinstalled to factory defaults and it resolved this issue.  Without forum tips on what to look for, my only thought is to regress back through the DHCP server setup first by removing all lease reserves by MAC and just relying on the lease pool.  I hope I'm on the right track narrowing down the problem area.


  • Netgate Administrator

    @markn62:

    I have an AMD64 ver 2.1RC1 w/dual (egb0) and single port (em0) Intel adapters.

    You mean igb(4)?

    @markn62:

    I get popup errors that cycle every 2 seconds between "local area connection is now connected" and "a network cable is unplugged".

    I take it that's on client machine behind pfSense? That's with the cable connected directly to the pfSense box from the client?

    That is a speed/duplex negotiation error. Does it stop happening if you have a switch between pfSense and the client?

    Steve



  • Steve, yes, it's a laptop behind pfSense and I did mean igb.  I'll try a switch in between.  But since it doesn't happen with factory defaults, I don't think there is anything wrong with the adapters, except perhaps the use of 64-bit drivers.  It doesn't happen on the 32-bit box using the same config file either.  I should mention that when the adapter ping-pongs to "local area connection is now connected" it also tries to get a DHCP lease for a second or two, then reports unplugged.

    I forced 100M full duplex on the client adapter, didn't help.  I also removed all DHCP reserved leases, set the DHCP server page to factory defaults, and removed everything from PPTP that calls on DHCP, and no joy.



  • Tried 10M full on the client, no go.  Put a switch between the client and pfSense.  Then I can pull the cable between laptop and switch and plug it back in without a problem, as expected.  When I pull the plug between the switch and pfSense, the three front panel lights on the switch all light up, then go dark, every couple of seconds.  Rebooting pfSense restores normal ethernet function.

    Any other suggestions?


  • Netgate Administrator

    Don't unplug the switch?  ;)

    It's an interesting problem. The drivers control how the NIC hardware negotiates the speed and duplex, so changing to a 64-bit version or simply a newer one could well be causing this. Hard to see how DHCP changes could do that, though, since DHCP works at a higher level.  I have a box with an Intel NIC that will not come up unless it's plugged in at boot. I have no idea why it does it, and I only found out it did it at all by accident when I unplugged the wrong cable. The cables on the pfSense box are very rarely plugged/unplugged whilst it's up.

    One thing you can try is forcing the speed on the pfSense NIC: http://doc.pfsense.org/index.php/Forcing_Interface_Speed_or_Duplex_Settings

    That should stop it trying to re-negotiate the connection continually. However it didn't work in my case.

    Steve

    Edit: Typo



  • I found out the same way you did, by booting pfSense without a client plugged into its ethernet adapter.  That prompted me to unplug while active, and I discovered the ping-pong effect.

    Don't think I can tolerate a 2 minute reboot dropping customer connections just to swap out a cable or reboot a modem that can occur in a fraction of this time often without connections timing out.  I would sooner revert back to 32-bit and toss 4G of ram out the window.

    I was wanting to try forcing the pfSense 1G Intel adapter to 100M but didn't know how.  Thanks for the link, I'll try it.  Can't imagine we are the only ones to have encountered this.  With Intel recommended, there are only 3 models of NICs that work in PCIe: a single, dual and quad port.  Perhaps I'll also try the integrated RealTek adapter to see if it exhibits the same behavior.


  • Netgate Administrator

    There has been some recent work on the igb driver in 2.1. The driver was updated to a new version and then it was backed out as it was found to have broken AltQ. Which 2.1 snapshot were you running exactly?
    Trying a different NIC, even if it is a Realtek, would be a good test.
    Another thing to consider might be whether your NIC is set up for WOL. If it is (and possibly if it isn't) then it is able to negotiate a link, or at least maintain it, even when the machine is in standby. Thus when the machine is turned on, the NIC establishes a connection before the OS and any drivers have loaded. Then when the drivers load they may reset the NIC and establish a new connection based on your configuration. This is different behaviour to a NIC that is completely off before boot. Might be completely irrelevant but it's something I've always found curious.

    Steve



  • I'm running pfSense-LiveCD-2.1-RC1-amd64-20130827-1655.iso.gz @ http://snapshots.pfsense.org/FreeBSD_RELENG_8_3/amd64/pfSense_RELENG_2_1/livecd_installer/?C=M;O=D

    I tried reassigning NICs and all four behave the same.  That tells me it's not related to any one particular driver and that something else is the cause.  Doesn't matter if it's em0, rl0, or igb.

    I tried "ifconfig em0 media 10baseT/UTP mediaopt full-duplex" at the command line, the message on my laptop said 10mbps for a couple seconds, then onto the ping-pong.

    Dang, I don't wanna rebuild the config from scratch.  It's gotta be several hours of work, more if I test ethernet as I go.  I'm still suspicious that the config transfer from 32-bit to 64-bit isn't 100%, since 64-bit factory defaults don't exhibit this problem and 32-bit with said config doesn't do it either.

    Before I redo the config, my next test may be putting pfSense-LiveCD-2.1-RC1-i386-20130826-1650.iso.gz on this AMD64 box.  Means I'll have to reinstall TRIM for the SSD but may shed some light on this problem.


  • Netgate Administrator

    Good test.
    All of your config (except TRIM) should be in the config.xml file so re-installing shouldn't be too much of a problem.
    If all of your interfaces are doing this then, yes, something's gone badly wrong.

    Steve



  • Reinstalling is very much a problem without a good XML.  I've already reinstalled using an XML backed up just before the reinstall, and it just brings the same issue back.  The only other idea I've come up with is to compare a factory 64-bit XML with a factory 32-bit XML for additional lines, then add those to my custom config if the 64-bit upload didn't add them.

    I've gained much knowledge reading forum posts, and the cookbook purchased, not so much posting my own questions.  I assumed more were active on this forum.

    Thanks for your help Steve.


  • Netgate Administrator

    The update URL is stored in the XML file so if you switch architectures then it may cause a problem, particularly on 2.1 with daily updates. If you restore an old config file make sure you manually select the update URL afterwards.

    Steve



  • Where would I go to "manually select the update URL", Stephen, the WebGUI?


  • Netgate Administrator

    Yes, sorry.
    System: Firmware: Updater Settings: Select the update URL from the drop down.

    Steve



  • Did an update.  Now when ethernet is unplugged I still get popup errors every 2 seconds, but instead of toggling between "local area connection is now connected" and "a network cable is unplugged" it toggles between no error popup (normal desktop) and the "a network cable is unplugged" popup.  So I guess the RC1 snapshot fixed it halfway, but it's still just as dysfunctional.

    I installed Notepad++ to do XML comparisons between a working factory config and my custom config.  The free app has a nice compare plugin and a search-to-next-difference feature, which makes short work of a 4000 line file.  Before I tweak too much with this tool, I think I'll first work through a section restore of my custom config onto the working factory config, one section at a time, until the ethernet toggle issue comes to life.  I'm hoping this will narrow down my search area in Notepad++.

    Any other suggestions welcome.
    Mark



  • What is reported in the pfSense system log(s) when the ping-ponging occurs?
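
    One way to answer that question is to count link transitions per interface in the system log. The sketch below is Python and assumes the usual FreeBSD kernel message "link state changed to UP/DOWN". The sample lines are made up for illustration; no log output was posted in this thread.

```python
import re
from collections import Counter

# Illustrative lines only: hostname, timestamps and interface are invented.
SAMPLE_LOG = """\
Aug 28 10:15:01 pfsense kernel: em0: link state changed to DOWN
Aug 28 10:15:03 pfsense kernel: em0: link state changed to UP
Aug 28 10:15:05 pfsense kernel: em0: link state changed to DOWN
Aug 28 10:15:07 pfsense kernel: em0: link state changed to UP
Aug 28 10:15:09 pfsense kernel: some unrelated message
"""

# Capture the interface name and the new state from each matching line.
LINK_RE = re.compile(r"\b(\w+): link state changed to (UP|DOWN)\b")


def count_link_changes(log_text):
    """Return {interface: number of link state changes} for the given log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINK_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return dict(counts)


print(count_link_changes(SAMPLE_LOG))  # {'em0': 4}
```

    A count that climbs every couple of seconds on one interface would point to the link itself flapping rather than to DHCP.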



  • FINALLY !!!  I did a section restore of my custom config onto the working factory config, one section at a time, until the ethernet toggle issue came to life.  The "interfaces" section was the culprit.  Went into the GUI to look for something that "should" have been obvious and found it.  Set the speed and duplex from auto to 100TxFull on LAN and WAN, since that's their capability, and the problem disappeared.

    I can't believe the solution was this simple…. aarrrg!!!
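
    The section-by-section hunt described above can also be sketched in code. This is a minimal Python example, not what Mark actually ran: it lists which top-level sections differ between two config files. The two XML strings are hypothetical, heavily trimmed stand-ins for a factory and a custom config.xml. Only the "interfaces" section name comes from this thread; the other element names and values are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for a factory config and a customised one.
FACTORY_XML = """<pfsense>
  <system><hostname>pfSense</hostname></system>
  <interfaces>
    <lan><if>em0</if></lan>
  </interfaces>
</pfsense>"""

CUSTOM_XML = """<pfsense>
  <system><hostname>pfSense</hostname></system>
  <interfaces>
    <lan><if>em0</if><media>autoselect</media></lan>
  </interfaces>
  <dhcpd><lan><enable/></lan></dhcpd>
</pfsense>"""


def canonical(elem):
    """Turn an element into nested tuples, ignoring indentation-only text."""
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        tuple(canonical(child) for child in elem),
    )


def sections(xml_text):
    """Map each top-level section name to its canonical form.

    If a section name repeats, the last occurrence wins.
    """
    root = ET.fromstring(xml_text)
    return {child.tag: canonical(child) for child in root}


def differing_sections(xml_a, xml_b):
    """Return the sorted names of sections that differ or exist on one side only."""
    a, b = sections(xml_a), sections(xml_b)
    return sorted(tag for tag in a.keys() | b.keys() if a.get(tag) != b.get(tag))


print(differing_sections(FACTORY_XML, CUSTOM_XML))  # ['dhcpd', 'interfaces']
```

    This only narrows the search to sections. Which difference actually triggers the problem still has to be tested on the box, one section at a time.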



  • @markn62:

    Went into the GUI to look for something that "should" have been obvious and found it.

    And the obvious thing was ….?


  • Netgate Administrator

    Yes, more details please.  :)

    I wouldn't fix the speed/duplex on any interface you don't have to. It can cause problems later.

    Steve



  • I read a post some time back that said there were occasional problems with pfSense negotiating a connection and that fixing the negotiated speed was the cure.  It was the first thing I tried, on the interface page, and the problem went away.  It negotiates the link and DHCP in seconds now.

    I tried "no preference" and it works fine too.  Just the auto-negotiate that seems to give it fits.  I'm using 3 Intel adapters and one onboard RealTek.  All four adapters exhibit the same behavior using 3 different drivers.  Steve, your point is well taken, I set mine to "no preference".  Not sure what will happen when I switch cables from a 100M to a 1G adapter.  It might invoke auto-negotiate and choke again.  Yet to be seen.  Can always revert to fixed-rate.

    Thanks for working through this with me…
    Mark



  • @markn62:

    I tried "no preference" and it works fine too.  Just the auto-negotiate that seems to give it fits.

    Thanks for the details.


  • Netgate Administrator

    Yes, thanks for coming back. A couple of things confuse me here though. This seemed like a speed/duplex negotiation problem from the beginning which is why I suggested:
    @stephenw10:

    One thing you can try is forcing the speed on the pfSense NIC: http://doc.pfsense.org/index.php/Forcing_Interface_Speed_or_Duplex_Settings

    That should stop it trying to re-negotiate the connection continually.

    I was forgetting that there is a gui option for this in 2.1 at that point.  ::)

    However you then reported:
    @markn62:

    I tried "ifconfig em0 media 10baseT/UTP mediaopt full-duplex" at the command line, the message on my laptop said 10mbps for a couple seconds, then onto the ping-pong.

    It seems like that should have worked. Maybe an ifconfig down/up was required.  :-\

    Anyway glad you got it sorted, just trying to get a better idea of what happened.  :)

    Steve
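
    For anyone wanting to test that guess, here is a small Python sketch that only builds and prints the commands; it does not run them. The first command is the one Mark quoted. Following it with ifconfig down/up is Steve's untested suggestion, so treat the sequence as an assumption rather than a known fix.

```python
import shlex


def media_reset_commands(interface, media, mediaopt):
    """Build argv lists: force the media setting, then bounce the interface."""
    return [
        ["ifconfig", interface, "media", media, "mediaopt", mediaopt],
        ["ifconfig", interface, "down"],
        ["ifconfig", interface, "up"],
    ]


# Dry run: print the commands so they can be reviewed before use on the box.
for argv in media_reset_commands("em0", "10baseT/UTP", "full-duplex"):
    print(shlex.join(argv))
```

    Taking an interface down drops whatever is using it, so this belongs in a maintenance window, not on a production link.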



  • Ya Steve, I didn't spend long on forcing 10M.  It may have needed an ifconfig down/up.  Though I was looking for a solution that didn't require anything manual after connection.  When I have to cold swap between the current and backup router, I don't want any issues adding to customer downtime.

    One more tidbit about how "no preference" behaves when plugged into a 1G adapter.  Today I finally got brave enough to take the 2.0.3 box out of production and…cringe...swap in the 2.1 box.  I forgot my cable modem has a 1G adapter.  It appears pfSense had no problem negotiating a link.  The swap-over took <5 seconds and no connections were lost.

    Happy days again...  8)


  • Netgate Administrator

    Ha, nice.  :)

    Steve

