[SOLVED] WAN goes DOWN on MiniPC box (Intel network adapter)



  • Hello everyone,

    Looking for help with the following issue:

    I have purchased the following MiniPC to run pfSense:
    Intel-J1900-Mini-PC-4-LAN-Micro-Computer
    Configuration: 2 GB RAM, 16 GB SSD, with WiFi module

    Getting the following issue:
    After some time (around 8-10 hours), the WAN interface goes down (ping to 8.8.8.8 doesn’t go from the box itself). Meanwhile the LAN interface remains accessible (so I can still log in to the admin panel). A reboot doesn’t help, but if I switch the box OFF and switch it back ON several hours later, everything works fine again.

    I have tried wiping and reinstalling pfSense from scratch, and also tried different ports (igb0, igb3), with the same result. I tested by downloading 14 torrents (~18 GB), which worked fine at an average download speed of around 10 MB/s (on a 100 Mbps Internet link).

    I am continuing to debug and would appreciate advice on which logs to check or which settings to adjust. At the moment I suspect that something is wrong with the hardware itself (overheating, etc.).

    Thank you in advance!

    P.S.

    1. pfSense v2.4.3_1

    2. According to the site the following network adapter is installed:
      4 x Intel I211-AT- 10/100/1000 Controller
      During device boot the network adapters are shown as (igb0 - igb3):
      <Intel (R) PRO/1000 Network Connection, Version – 2.5.3-k>

    3. SYSTEM log file with the issue is attached. 0_1534062140190_system.txt

    • You may see the line:
      Aug 10 10:12:35 kv2 rc.gateway_alarm[51708]: >>> Gateway alarm: GW_WAN (Addr:94.XXX.YYY.1 Alarm:1 RTT:920ms RTTsd:3197ms Loss:22%)
    • That is the moment when WAN went down. (FYI: after that, this box was disconnected and the old router was connected, which is why the rest of the interfaces also went down starting from 10:15.)
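
For anyone scanning a long system log for the same event, here is a minimal sketch that pulls the fields out of a gateway alarm line. It assumes every alarm line has exactly the layout of the one quoted above; the function name `parse_gateway_alarm` and the dictionary keys are made up for illustration and are not part of pfSense.

```python
import re

# Pattern built from the single alarm line quoted in this post; other
# pfSense versions may format the line differently (assumption).
ALARM_RE = re.compile(
    r"Gateway alarm: (?P<name>\S+) \(Addr:(?P<addr>\S+) Alarm:(?P<alarm>\d+) "
    r"RTT:(?P<rtt>[\d.]+)ms RTTsd:(?P<rttsd>[\d.]+)ms Loss:(?P<loss>\d+)%\)"
)


def parse_gateway_alarm(line):
    """Return the alarm fields as a dict, or None if the line is not an alarm."""
    m = ALARM_RE.search(line)
    if m is None:
        return None
    return {
        "gateway": m["name"],
        "addr": m["addr"],
        "alarm": int(m["alarm"]),
        "rtt_ms": float(m["rtt"]),
        "rttsd_ms": float(m["rttsd"]),
        "loss_pct": int(m["loss"]),
    }


if __name__ == "__main__":
    sample = (
        "Aug 10 10:12:35 kv2 rc.gateway_alarm[51708]: >>> Gateway alarm: GW_WAN "
        "(Addr:94.XXX.YYY.1 Alarm:1 RTT:920ms RTTsd:3197ms Loss:22%)"
    )
    print(parse_gateway_alarm(sample))
```

Running it over each line of the attached log and keeping the non-None results would give a timeline of when the gateway started losing packets.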

  • Netgate Administrator

    @andrewgr said in WAN goes DOWN on MiniPC box (Intel network adapter):

    ping to 8.8.8.8 doesn’t go from the box itself

    Ping requests actually don't leave or just replies don't come back?

    You are not using 8.8.8.8 as the monitor IP so I assume 94.x.x.x is your real gateway address. Have you tried using a different monitor address such as 8.8.8.8? The gateway itself may not reliably respond to ping or may even react badly to being pinged.

    Does this happen regardless of the traffic level at the time?

    Steve



  • @stephenw10, thank you for your reply! I have just switched back to the MiniPC box, and once this happens again I will do more debugging (I have to hurry because the office gets cut off from the Internet, so I do not have much time for deep debugging). Will update this post accordingly.

    As for the question "Does this happen regardless of the traffic level at the time?": yes, my impression is that it happens regardless of the traffic level at the time.

    Will provide more details soon!

    Thank you!



  • I have got the same situation again and have gathered some more diag data:

    1. pfSense on the MiniPC box worked well for the whole business day and only failed at around 5am the next day, so it does not look related to network activity (the issue seems to happen after some period of time: 15...20 hours).
    2. I can access the admin console from LAN.
    3. I can ping the WAN address from LAN.
    4. Status -> Gateways shows the main GW as Offline with Loss = 100% (see attached screen)
      0_1534249535803_20180814_081746-Window.jpg
    5. I can ping neither the provider GW (94.XX.YY.1) nor 8.8.8.8, and the provider cannot ping the WAN address either.
    6. A reboot doesn't help; after the reboot the situation remains the same. The issue will probably go away after the MiniPC box has remained switched OFF for some time.
    7. The provider told me that they cannot see the WAN's MAC. They said they saw the MAC during the reboot, but after the reboot it is no longer seen online.
    8. The WAN interface is shown as UP on the pfSense Status page (I tried disabling/enabling the WAN interface, with no effect; I also tried deleting and re-adding the main GW, again with no effect).

    Any ideas what could be causing the issue? Is it SW (pfSense) or HW (MiniPC) related?

    Any more ideas on how to isolate and fix the issue?

    Thank you so much!


  • Netgate Administrator

    Ok, that's some useful info.

    Does it recover if you disconnect and re-connect the WAN Ethernet?

    Does it correctly show as down when it's disconnected?

    What is the WAN actually attached to? Can you power cycle that device?

    If you run a packet capture on WAN do you see any packets leaving at all? Do the NIC activity LEDs show traffic on WAN?

    Steve



  • @stephenw10, thank you for your questions! Here are my answers (updated):

    • Does it recover if you disconnect and re-connect the WAN Ethernet?
      No. (checked)

    • Does it correctly show as down when it's disconnected?
      Yes. (checked)

    • What is the WAN actually attached to? Can you power cycle that device?
      WAN is attached to a TP-LINK TL-SG1005D switch.
      One cable goes to the Internet provider, and two routers with public IP addresses are connected to the switch.
      This setup is absolutely stable with the old router (Cisco RV220W), which I was planning to replace with this MiniPC+pfSense box.
      Yes, I have power cycled the TP-Link switch (and changed the port as well), but that didn't help.

    • If you run a packet capture on WAN do you see any packets leaving at all? Do the NIC activity LEDs show traffic on WAN?
      Packet capture: yes, I can see the packets leaving the WAN. See 0_1534322188558_packetcapture.txt
      NIC activity: yes, LEDs show the traffic activity properly.

    I will switch back to the MiniPC box after EoB today; hopefully the issue will reappear tomorrow morning around 10am.

    Any suggestions on other diag data / logs to check and collect?

    Thank you, Steve, for your time! Really appreciate your help!



  • Okay, it failed again with the same issue after around 15 hours.

    I have updated my answers above according to the latest checks. I am getting the impression that the provider's switch and my MiniPC+pfSense box just conflict with each other...

    Additionally I did the following checks:

    1. I reassigned the WAN interface from igb3 to igb2 and rebooted pfSense, without success: the main GW was still shown as offline.
    2. I connected the MiniPC box directly to the Internet provider's cable (bypassing the intermediate switch), also without success.

    Thanks!


  • Netgate Administrator

    Hmm, so the packet capture shows pings leaving as expected. It does not show it ARPing for the gateway for example so it appears to be working correctly at layer 2.

    Are you able to ping the other router on the switch?

    Can you try swapping out the switch?

    Can you packet capture anywhere else, on a switch mirror port perhaps?

    Since you see link activity at the switch end during this outage we have to assume pfSense really is sending those ping requests and the replies are just not coming back. You might check the packet capture to make sure it is sending to the correct MAC though.

    Steve
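
One way to do the MAC check mentioned above is to tally the destination MACs in the capture text. This is only a sketch: it assumes the capture was saved with link-level headers in the usual tcpdump `-e` style (`src > dst, ethertype ...`), and the sample MAC addresses in it are invented. The helper name `destination_macs` is hypothetical.

```python
import re
from collections import Counter

# Matches "aa:bb:cc:dd:ee:ff > 11:22:33:44:55:66" as printed by tcpdump -e
# (assumed format of the saved capture text).
MAC = r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}"
LINK_RE = re.compile(rf"(?P<src>{MAC}) > (?P<dst>{MAC})", re.IGNORECASE)


def destination_macs(capture_text):
    """Count how many captured frames were sent to each destination MAC."""
    counts = Counter()
    for line in capture_text.splitlines():
        m = LINK_RE.search(line)
        if m:
            counts[m["dst"].lower()] += 1
    return counts


if __name__ == "__main__":
    # Invented sample lines, for illustration only.
    sample = (
        "08:17:01.000001 00:e0:4c:00:00:01 > 00:1b:21:00:00:02, ethertype IPv4 (0x0800), length 98: ...\n"
        "08:17:02.000001 00:e0:4c:00:00:01 > 00:1b:21:00:00:02, ethertype IPv4 (0x0800), length 98: ...\n"
    )
    for mac, n in destination_macs(sample).items():
        print(mac, n)
```

If the pings are all going to one MAC, that address can be compared against the gateway's entry in the ARP table.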



  • Thank you, Steve! I feel that I am about to give up fighting this issue. I will try the suggested tests, but my next plan is to install this MiniPC box in another location and test it there with another provider.
    Will update this post once I have new input.

    Thank you!

    --
    Andrew



  • By the way, a quick clarification:

    1. The MiniPC box has 4 LAN ports; 1 of them is used for WAN, and the remaining 3 are added to a bridge.
    2. The MiniPC box also has a WiFi card, identified during boot as Ralink 802.11 n WLAN, RT3070, which is also added to the bridge together with the 3 LAN ports. I have also tested with the WiFi card left out of the bridge and given a separate network, but the issue still occurred.

    If these facts might somehow be related to the issue discussed in this topic, please let me know!

    Thank you!


  • Netgate Administrator

    The bridge won't help with performance (much better to use a switch) but it shouldn't affect WAN connectivity at all.

    Steve



  • Another question: I am using MAC address spoofing on the WAN interface (as that MAC is registered with my ISP). Could that be the cause of the issue?
    Thanks!



  • Hi everyone,

    It seems that I have managed to fix the issue! I recently made a set of changes that stabilized the MiniPC box.

    The changes were:

    1. I stopped using MAC spoofing on WAN (by updating the MAC registered on the ISP side to the WAN port's hardware MAC). This is most likely the cause of the issue described in this topic!
    2. I physically removed the WiFi adapter and, as a result, didn’t set up a wireless interface.
    3. I did not assign the remaining 2 LAN ports (just left the two default interfaces, WAN and LAN).
    4. As a result, I didn’t create a bridge for the internal interfaces (LAN1, LAN2, LAN3, WIFI).

    Now I will add items 2-4 back one by one and check the result. Most probably the issue was caused by MAC spoofing: both ports (WAN and LAN) were heavily used (with traffic like IP CCTV via IPsec), but only WAN had a spoofed MAC address, and only WAN was going down.

    Will update this post once items 2-4 are back in place.

    Thanks everyone!



  • The latest update: it looks like the issue is solved!

    I have restored all the initial settings except MAC address cloning (spoofing) on the WAN interface, and now the MiniPC box works absolutely fine!
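
For anyone repeating this fix, it may help to confirm that the MAC registered with the ISP really matches the port's hardware MAC, since the two are often written in different notations. A minimal sketch, assuming the addresses are available as plain strings; `normalize_mac` and `same_mac` are hypothetical helper names and the sample addresses are invented:

```python
import re


def normalize_mac(mac):
    """Return a MAC address as six lowercase hex pairs joined by colons."""
    # Drop separators such as ":", "-" or "." so all notations compare equal.
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a valid MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def same_mac(a, b):
    """True if two MAC strings denote the same address, whatever the notation."""
    return normalize_mac(a) == normalize_mac(b)


if __name__ == "__main__":
    # Invented example values: one as an ISP portal might show it,
    # one as it appears in the interface details.
    print(same_mac("00-1B-44-11-3A-B7", "00:1b:44:11:3a:b7"))
```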