Intel i350 NIC troubles



  • Hopefully I'm in the right place for this. I thought it was a Suricata problem, so I started a thread over here. But now we're thinking it might be a NIC-related issue.

    I have a Supermicro X9DRD-LF motherboard with dual Intel i350 NICs, an E5-2609 v2, and 8GB RAM running pfSense 2.4.4-RELEASE-p1.

    The last two times I had issues, ping times went through the roof and the gateway status showed down even though I could still get out to the internet. CPU usage was high with almost no traffic flowing. Restarting the Suricata service seemed to fix the problem, although it appears to have happened two more times later in the day; I did nothing then and it seems to have resolved on its own. This only happens every few weeks, and I haven't found a way to trigger it. Normal idle CPU usage is 3-7% and doesn't really get above 20% under full WAN saturation.

    Here are some logs and graphs that will hopefully show what was going on while things weren't working correctly.
    Attachments: pfsense help.txt, 2018-12-31 05_33_09-Window.png, 2019-01-01 22_27_56-Window.png, 2019-01-01 22_28_25-Window.png, 2019-01-01 22_28_42-Window.png, 2019-01-01 22_28_56-Window.png

    Anyone know what is breaking and how to fix it?


  • Netgate Administrator

    What sort of internet connection do you have? What bandwidth is it?

    When this happens, can you actually still ping the gateway IP from a client behind pfSense? If you run a packet capture, do you see pfSense pinging the gateway IP and getting no replies, as you would expect when the gateway shows as down?

    If the gateway is down or is going up and down then CPU usage will be higher. A number of events are triggered each time that happens.

    Steve



  • @stephenw10 My ISP is Spectrum Cable with a 100/10 connection.

    I am able to browse the internet, and I think I was still able to do a ping test out to 8.8.8.8 or something like that. Web pages load slowly and the ping results were high. I don't remember what the RTT was on the gateway monitor, but the RTTsd was several hundred ms. I will have to wait until it happens again to look at a packet capture; it only acts up every few weeks.
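For anyone wondering how RTTsd can reach several hundred ms while pages still load: as far as I know, the gateway monitor reports RTTsd as the standard deviation of the probe round-trip times, so a few very slow replies are enough to inflate it. A minimal sketch with made-up numbers (the samples below are hypothetical, not taken from this system):

```python
import statistics

# Hypothetical probe round-trip times in milliseconds (illustrative only):
# mostly ~10 ms, with one very slow reply mixed in.
samples_ms = [10, 11, 9, 12, 10, 900, 11, 10]

# Average round-trip time over the window.
mean_ms = statistics.mean(samples_ms)

# Population standard deviation, assumed here to be what "RTTsd" represents.
sd_ms = statistics.pstdev(samples_ms)

print(f"mean RTT: {mean_ms:.1f} ms, RTT standard deviation: {sd_ms:.1f} ms")
```

With these numbers the standard deviation comes out larger than the mean, which is the same pattern as a connection that is mostly fine but has occasional long stalls.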



  • It's acting up again. Here is a packet capture of the gateway ping, showing what I was talking about with the gateway showing offline. It looks like everything gets through; the response just takes a while. So, is the issue on my end or theirs?

    20:01:42.278213 IP 71.89.24.110 > 71.89.24.1: ICMP echo request, id 23662, seq 16787, length 8
    20:01:42.289592 IP 71.89.24.1 > 71.89.24.110: ICMP echo reply, id 23662, seq 16787, length 8
    20:01:42.779324 IP 71.89.24.110 > 71.89.24.1: ICMP echo request, id 23662, seq 16788, length 8
    20:01:42.788557 IP 71.89.24.1 > 71.89.24.110: ICMP echo reply, id 23662, seq 16788, length 8
    20:01:43.280583 IP 71.89.24.110 > 71.89.24.1: ICMP echo request, id 23662, seq 16789, length 8
    20:01:43.290698 IP 71.89.24.1 > 71.89.24.110: ICMP echo reply, id 23662, seq 16789, length 8
    20:01:43.781401 IP 71.89.24.110 > 71.89.24.1: ICMP echo request, id 23662, seq 16790, length 8
    20:01:43.825586 IP 71.89.24.1 > 71.89.24.110: ICMP echo reply, id 23662, seq 16790, length 8
    20:01:44.288002 IP 71.89.24.110 > 71.89.24.1: ICMP echo request, id 23662, seq 16791, length 8
    20:01:44.303712 IP 71.89.24.1 > 71.89.24.110: ICMP echo reply, id 23662, seq 16791, length 8
    20:01:44.789178 IP 71.89.24.110 > 71.89.24.1: ICMP echo request, id 23662, seq 16792, length 8
    20:01:44.805659 IP 71.89.24.1 > 71.89.24.110: ICMP echo reply, id 23662, seq 16792, length 8
    20:01:45.291172 IP 71.89.24.110 > 71.89.24.1: ICMP echo request, id 23662, seq 16793, length 8
    20:01:45.569230 IP 71.89.24.1 > 71.89.24.110: ICMP echo reply, id 23662, seq 16793, length 8
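The request/reply pairs in the capture above can be turned into round-trip times by matching each echo reply to its request on the ICMP id and seq fields. This is only a rough sketch: it assumes the tcpdump text format shown in this post, the function name `pair_echoes` is made up for illustration, and it does not handle a capture that crosses midnight.

```python
import re
from datetime import datetime

# The capture lines from this post, copied verbatim.
CAPTURE = """\
20:01:42.278213 IP 71.89.24.110 > 71.89.24.1: ICMP echo request, id 23662, seq 16787, length 8
20:01:42.289592 IP 71.89.24.1 > 71.89.24.110: ICMP echo reply, id 23662, seq 16787, length 8
20:01:42.779324 IP 71.89.24.110 > 71.89.24.1: ICMP echo request, id 23662, seq 16788, length 8
20:01:42.788557 IP 71.89.24.1 > 71.89.24.110: ICMP echo reply, id 23662, seq 16788, length 8
20:01:43.280583 IP 71.89.24.110 > 71.89.24.1: ICMP echo request, id 23662, seq 16789, length 8
20:01:43.290698 IP 71.89.24.1 > 71.89.24.110: ICMP echo reply, id 23662, seq 16789, length 8
20:01:43.781401 IP 71.89.24.110 > 71.89.24.1: ICMP echo request, id 23662, seq 16790, length 8
20:01:43.825586 IP 71.89.24.1 > 71.89.24.110: ICMP echo reply, id 23662, seq 16790, length 8
20:01:44.288002 IP 71.89.24.110 > 71.89.24.1: ICMP echo request, id 23662, seq 16791, length 8
20:01:44.303712 IP 71.89.24.1 > 71.89.24.110: ICMP echo reply, id 23662, seq 16791, length 8
20:01:44.789178 IP 71.89.24.110 > 71.89.24.1: ICMP echo request, id 23662, seq 16792, length 8
20:01:44.805659 IP 71.89.24.1 > 71.89.24.110: ICMP echo reply, id 23662, seq 16792, length 8
20:01:45.291172 IP 71.89.24.110 > 71.89.24.1: ICMP echo request, id 23662, seq 16793, length 8
20:01:45.569230 IP 71.89.24.1 > 71.89.24.110: ICMP echo reply, id 23662, seq 16793, length 8
"""

LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d{6}) IP \S+ > \S+: "
    r"ICMP echo (?P<kind>request|reply), id (?P<id>\d+), seq (?P<seq>\d+)"
)


def to_microseconds(ts):
    """Convert an HH:MM:SS.ffffff timestamp to integer microseconds since midnight."""
    t = datetime.strptime(ts, "%H:%M:%S.%f")
    return ((t.hour * 60 + t.minute) * 60 + t.second) * 1_000_000 + t.microsecond


def pair_echoes(capture):
    """Return ([(seq, rtt_in_microseconds), ...], [seq of unanswered requests])."""
    pending = {}  # (id, seq) -> request time in microseconds
    rtts = []
    for line in capture.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not an ICMP echo line
        key = (int(m["id"]), int(m["seq"]))
        when = to_microseconds(m["ts"])
        if m["kind"] == "request":
            pending[key] = when
        elif key in pending:
            rtts.append((key[1], when - pending.pop(key)))
    unanswered = sorted(seq for (_id, seq) in pending)
    return rtts, unanswered


rtts, unanswered = pair_echoes(CAPTURE)
for seq, rtt_us in rtts:
    print(f"seq {seq}: {rtt_us / 1000:.3f} ms")
print("unanswered requests:", unanswered)
```

On this capture every request is answered. Most replies come back within roughly 9 to 44 ms, but seq 16793 takes about 278 ms, which matches "everything gets through, the response just takes a while".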
    

    Attachments: Selection_001.png, Selection_003.png



  • @mouseskowitz said in Intel i350 NIC troubles:

    Spectrum Cable

    I'll ask before @chpalmer does: What cable modem are you using?

    Something on this list?



  • @biggsy Yes, it is an Arris TM1602A MTA. I'll have to see if I can get something not on the list. I've heard that they are starting to give out DOCSIS 3.1 modems. Maybe that'll fix things.



  • Well, I hope that's the problem. It does look like the prime suspect.



  • New modem is installed. It's a Spectrum E31U2V1 DOCSIS 3.1. We'll give it a couple of days and see what happens.


  • Netgate Administrator

    Yes, given your symptoms, a suspect modem is exactly what I was going to suggest too.
    It will be interesting to see how this new modem behaves.

    Steve



  • That was quick! Maybe they have them on-hand to replace the Intel Puma-based ones. 2018 really was not a good year for Intel.

    Do let us know how it goes.



  • Well, it wasn't the old modem. Had problems with the connection after replacing it. I called the ISP and we couldn't find any issues with my laptop connected directly to the modem, but we did have to reboot the modem to get my laptop to connect. Once I got pfsense hooked back up everything was normal as well.

    I have a spare UniFi managed switch lying around. Anyone know how I would configure it to mirror a port so I can test from pfsense and my laptop at the same time direct to the router? I'm trying to nail down whether this is a router issue or an ISP issue.


  • Netgate Administrator

    Although the ports should be identical, you could try re-assigning them the other way around just to be sure it's not a hardware problem.

    Steve



  • @stephenw10 I'll have to try that next. Right now I've figured out how to switch between pfsense and my laptop being directly connected to the modem without resetting the modem. Of course, as soon as I got that figured out, everything started working properly. The hope is to be able to switch between devices quickly and isolate the issue to my router or the ISP.

    There does seem to be a pattern of certain blocks of time during the day when there are issues. I'm not sure yet if that is from usage within the network or from ISP issues.



  • I'm pretty sure it's a router problem. I've swapped igb0 and igb1 around, and we'll see what happens. If I still have issues I might have to swap out the mobo and see what happens.



  • Things have been working well since I swapped the NICs around. I'm not sure if it was the swap or the reboots for the reconfiguration and recent update. I guess time will tell. If it was the swap that fixed it, does that mean that one of the NICs is failing or what?


  • Netgate Administrator

    It seems unlikely but I guess it could be. Some small variation in the components.
    If it works reliably connected that way I would just move on TBH.

    Steve