Help, Random "Hot Plug" Events!
-
Yes, I'm assuming some low level issue that a lagg might work around. And, yes, if both links went down at the same time that might tell us more. Cheaper than replacing the switch.
-
Just an update:
I had a few network disconnects last night and this morning on BOTH ports on the router (igc1 and igc2) at the same time for every disconnect. I'm not sure exactly what that behavior could indicate, but I'm guessing LAGG isn't going to work if both ports like to disconnect at the same time. Maybe there is a way to see more verbose logging in pfSense outside of the Status>System Logs>System>General logs? Just a thought...
I could still try:
- Replacing the switch with another 2.5gbe switch (different model, but still QNAP). I'll have to do this over the weekend.
- Explicitly set the port speed to 2.5gbe. When I tried this before, I only set the port speed on the switch end. I just realized I didn't set it on the router end. I'll try setting the port speed on both ends and see what happens. Probably won't help, but I'll try anything at this point.
Thanks again, for all of the suggestions.
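One way to confirm whether igc1 and igc2 really drop together is to export the system log and group the link-down lines by time. Here is a minimal sketch, assuming the log lines carry a syslog-style timestamp followed by a FreeBSD-style "igcN: link state changed to DOWN" message. The sample lines, the hostname, the `year` default (syslog timestamps carry no year) and the 2-second window are all made up for illustration, so adjust them to whatever your logs actually look like.

```python
import re
from datetime import datetime

# Assumed line shape: "<Mon DD HH:MM:SS> <host> kernel: <nic>: link state changed to UP|DOWN"
LINK_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s.*?"
    r"(?P<nic>[a-z]+\d+): link state changed to (?P<state>UP|DOWN)"
)


def parse_link_events(lines, year=2023):
    """Return (timestamp, nic, state) tuples for every link state line."""
    events = []
    for line in lines:
        m = LINK_RE.search(line)
        if m:
            ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
            events.append((ts, m["nic"], m["state"]))
    return events


def simultaneous_drops(events, window_s=2):
    """Group DOWN events that start within window_s seconds of each other
    and keep only the groups that involve more than one NIC."""
    downs = sorted((ts, nic) for ts, nic, state in events if state == "DOWN")
    groups = []
    for ts, nic in downs:
        if groups and (ts - groups[-1][0]).total_seconds() <= window_s:
            groups[-1][1].add(nic)
        else:
            groups.append((ts, {nic}))
    return [(ts, sorted(nics)) for ts, nics in groups if len(nics) > 1]


# Hypothetical sample lines in the assumed format (not real log output).
sample = [
    "Jan 10 03:14:15 pfSense kernel: igc1: link state changed to DOWN",
    "Jan 10 03:14:15 pfSense kernel: igc2: link state changed to DOWN",
    "Jan 10 03:14:19 pfSense kernel: igc1: link state changed to UP",
    "Jan 10 03:14:19 pfSense kernel: igc2: link state changed to UP",
    "Jan 10 07:02:41 pfSense kernel: igc1: link state changed to DOWN",
]

result = simultaneous_drops(parse_link_events(sample))
for ts, nics in result:
    print(ts, nics)
```

If every drop shows up in the output with both ports listed, that points at something shared (switch, power, the box itself) rather than a single port or cable.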
-
Hmm, is there any logging in the switch? Does any other port on the switch show the link bounce?
Unfortunately there really isn't any additional logging to be had there. It sees the link go down. Nothing is logged in pfSense that might be a cause of it.
-
Yeah, the current QNAP switch I have does have logs, but it only logs when a port goes down/up and what time. The other QNAP switch doesn't even have logs (at least in their GUI).
Well, I guess I'll keep trying some of those suggestions and report back. I figure documenting my journey trying to get this exact 2.5gbe router/switch combo running might be helpful in case someone else runs into the same issue. Sometimes it's nice to know you're not alone when stuff like this happens.
Thanks!
-
UPDATE:
Hey all, I tried the following over the weekend to no avail:
- Replaced my current 2.5gbe QNAP switch with another (different model) 2.5gbe QNAP managed switch. I still saw random "hot plug" events even with a different switch. Granted, this wasn't the best test because there is a high chance that QNAP uses the same NICs and software in all of their 2.5gbe managed switches.
- Explicitly set the port speed on both the router and the switch. Unfortunately, this didn't help any.
Things I could still try (and thoughts):
- I'm tempted to get my hands on a Netgate 4100/6100 to see if it will work with my QNAP switch. Although, both the 4100 and the 6100 have the same NIC chipset (i225) as the fanless Chinese boxes I'm using. I suppose if the 4100/6100 works, then that would mean there is probably some low level incompatibility with the fanless Chinese boxes that I'm using (e.g. BIOS, firmware, etc.)?
- I'm tempted to try a new 2.5gbe managed switch by another brand. Although, I'm not sure at the moment which brand other than QNAP has an affordable 2.5gbe managed switch. I suppose if a new switch works, then that probably means a low level incompatibility with the QNAP switches I'm using.
I really wish I could find someone out there running a stable 2.5gbe router/switch setup to see what gear they're running. Anyways, I'll try to post back when I have more updates.
-
-
We actually use a QNAP switch for testing 2.5G NICs here. It's a QSW-M2108-2C. No problems between that and the i225 or i226 NICs in the Netgate hardware.
-
@uplink Just wondering if you figured this out. I have just set up a Trigkey mini PC (i225-V) with dual NICs. I also get the exact same issue on my LAN interface. It's currently hooked up to a new D-Link switch that supports 2.5GbE. Debating putting my old switch back in, but it's only gigabit.....
-
It's a good test even if you don't keep it there permanently.
-
Hey @mark77ap
It seems I forgot to give an update on where I ended up, thanks for the reminder!
Let's see... When I last posted I was going to try a Netgate appliance and/or try another 2.5gbe switch (non-QNAP). Unfortunately, I didn't do either. However, I did find a quasi-solution that seemed to work. I ended up buying an SFP+ RJ45 module for the switch and have the Chinese fanless router plugged into that. So far, it's been about a month and a half without a single drop. I'm glad it works, but it's not ideal. I don't like that it's occupying one of my 10gbe SFP+ ports when it could be using one of the 16 2.5gbe ports. I also don't like that I kinda have to run a "router on a stick" setup because the other 2 available LAN ports on the router are essentially useless with my switch.
Hey @stephenw10 It's interesting to know that you've used the QNAP QSW-M2108-2C in your testing. That switch is very close in spec to the 2 QNAP switches that I have. I assume that you were testing with Netgate hardware? This makes me think the issue might be on my router end. I may have to pick up a 4100 someday and give that a try :)
@mark77ap - That's a good test, I'd be curious to know if you have better luck on your old gigabit switch.
-
@uplink Thanks for the update.
The switch I have (D-Link DMS-107) has 2 x 2.5GbE ports and 5 x 1GbE ports. I was getting nonstop drops when my LAN connection was plugged into the 2.5GbE port. I moved it to a 1GbE port and it has been stable so far for 12 hours (but at 1GbE :( ). My router is plugged into my modem's 2.5GbE port and has not seen any drops, which is odd.
I have since reinstalled pfSense and upgraded to Plus, and I'm retrying the 2.5GbE ports. Fingers crossed, but it seems unlikely a re-install is going to fix this.
I did check in my BIOS and the I225 is the third revision, so rumour is it should be ok, but based on Google results these NICs seem to be plagued with issues.
Was the SFP module 2.5GbE or 10GbE?
-
If you have any power saving options in the BIOS for the NICs or PCIe bus, like ASPM, I would try disabling that. I have seen that resolve link issues in some NICs.
-
Yeah, I tried the same thing (upgrading to Plus) and it didn't work for me. Hope you have better luck than I did. If I remember correctly, I think I also tried a 1GbE switch and had success there too, so that doesn't surprise me. Of course that's not ideal, since it wastes the 2.5GbE port on the router.
"My router is plugged into my modem's 2.5GbE port and has not seen any drops, which is odd."
Were you getting drops on the WAN to your modem too? Is your router WAN 2.5GbE and your modem 2.5GbE? I thought the drops were only on the LAN interface on the router?
So, my switch is reporting that my SFP+ RJ45 module is connected at 10Gbe and pfsense is reporting 2.5Gbe. My SFP+ module is capable of negotiating down to 2.5Gbe so I think it's just the switch reporting incorrectly (which is common). I also tested the throughput and it's indeed 2.5gbe.
@stephenw10
Good idea, I might take a look for that in the BIOS later today and see if I have any power saving options like that.
-
"If you have any power saving options in the BIOS for the NICs or PCIe bus, like ASPM, I would try disabling that. I have seen that resolve link issues in some NICs."
Surprisingly, I do not have any options in the BIOS for the onboard ethernet. Nothing, I can't even see a place to disable the adapters let alone any energy saving options.
As for the PCI bus options, I did see 4 "PCI Express Root Port" options which I presume are my 4x 2.5Gbe NICs. I checked each one and they all have ASPM disabled already. However, I did see a "DMI Link ASPM Control" option set to "L1". If I understand this PCI stuff correctly, that "DMI Link" is the link between the Southbridge and the CPU and the "PCI Express Root Port" is the link between the Southbridge and the device.
Maybe I'll try setting the DMI Link to "disable" and see if that helps? Haha, I'll try anything, I can always change it back.
-
Yeah, it likely wouldn't be a setting for the NIC(s) specifically but rather for the PCIe bus/lanes. If it is exposed in that BIOS at all.
-
Same here, the BIOS is definitely not the same as a normal PC lol. I did manage to change the turbo efficient mode to off, but I'm not sure that is really going to do much other than give me some more CPU cycles.
Even though I had fewer drops yesterday with pfSense Plus, I did still get some overnight. I have switched my ethernet cable (really have my doubts this is it) and will see how it goes.
Only thing I have left to try is to buy another 2.5GbE switch and try that.
Mark
-
Well, it has been 3 days with the new cable and no connection drops. Not sure how this cable just went bad, or maybe it always was bad and would work at 1GbE but not at 2.5GbE.
Mark
-
@mark77ap That's exactly why cables are rated at different "cats". Lots of esoteric sounding calculations relating to transmission lines come into play at different frequencies, so yes a cable may be perfectly fine at 1G and fail at 2.5G or higher.
-
@mer This was a cat6 rated cable and only 3 ft long so surely it "should" have been ok. There must have been some issue with it that wasn't noticeable until it was being used at 2.5GbE.
-
@mark77ap "Just because the packaging said cat6 rated doesn't mean it was tested at cat6" :) In the long run, cables are usually the cheapest part so having extras on hand and swapping them out is usually step 2 after making sure everything is really plugged in.
-
Wow, the cable? Well, I tried two different 6 foot CAT6 cables (one unknown brand and another Amazon Basics). Both were UTP (unshielded). Maybe I'll have to try some other cable brands, or different types of cables like CAT6a or CAT7 or even some STP (shielded) cables. I'll have to swing by my work and "borrow" a few to test with.
I'm glad you were able to solve your dropouts with just a cable, that's great news! I hope I have the same luck!
-