Netgate Discussion Forum

    (2.2.4) Loss of WAN link brings VLAN interfaces down temporarily

    • ajrg

      Couldn't find a suitable place to post, so I've stuck it here.

      Today, whilst messing about with a test rig, I noticed that pulling the WAN cable from the box caused all interfaces to stop passing traffic for a few tens of seconds.
      During this time, ifconfig showed the NIC as still having an IP and an active link.

      When the system eventually noticed, the IP and link would get cleared, and internal network traffic would start flowing again.

      Reconnecting the cable has no effect at first, but again, after a few tens of seconds, internal traffic stops for a few seconds, then starts again, once a WAN DHCP lease has been taken.

      Has anyone seen this before, or got any ideas? No fancy dual-WAN or CARP/HA setup, and curiously, no output at all to the local console while this happens.

      I'm gonna trawl the logs tomorrow while I fiddle around with cables, but I just wondered if this has already been seen.

      Quick system specs:
      pfSense 2.2.4-RELEASE
      Pentium G 3.5GHz Dual Core CPU
      16GB ECC RAM
      Dual 1TB SATA in GEOM Mirror
      Quad Intel onboard NIC
      MBUF system tunable to 1 million
      Packages: Avahi, Captive Portal, Cron, DHCP Server, DNS Resolver, FreeRADIUS, FTP Client Proxy, Squid3+SquidGuard, Service Watchdog, Shellcmd, Snort

      Typically sits around 25% CPU usage (but shows ~52% thanks to idlepoll), 9GB RAM usage. Plenty of headroom - or so it seems.
      I had initially wondered if Snort was loading the system when interfaces changed, but top (via serial port) showed no such event.

      igb0 WAN - Straight to ISP
      igb1 LAN - Untagged to OOB switch (only used for management) - this loses the ability to pass traffic
      igb2+igb3 OPT1-100 - Tagged VLANs via LACP - these lose the ability to pass traffic
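To pin down the delay between pulling the cable and the system noticing, the ifconfig state described above can be polled and timestamped. This is only a rough sketch: it assumes Python 3 is available on the box (or that captured ifconfig output is parsed elsewhere), and the names parse_ifconfig, diff_states, and watch are made up for illustration, not part of pfSense.

```python
import re
import subprocess
import time

# FreeBSD-style ifconfig output: an unindented "igb0: flags=..." header line,
# followed by indented "inet ..." and "status: ..." lines.
IFACE_RE = re.compile(r"^(\S+): flags=")
INET_RE = re.compile(r"^\s+inet (\d+\.\d+\.\d+\.\d+)")
STATUS_RE = re.compile(r"^\s+status: (.+)$")


def parse_ifconfig(text):
    """Return {iface: {"inet": [addresses], "status": str or None}}."""
    state = {}
    current = None
    for line in text.splitlines():
        m = IFACE_RE.match(line)
        if m:
            current = m.group(1)
            state[current] = {"inet": [], "status": None}
            continue
        if current is None:
            continue
        m = INET_RE.match(line)
        if m:
            state[current]["inet"].append(m.group(1))
            continue
        m = STATUS_RE.match(line)
        if m:
            state[current]["status"] = m.group(1).strip()
    return state


def diff_states(old, new):
    """List readable changes between two parse_ifconfig() results."""
    changes = []
    for iface in sorted(set(old) | set(new)):
        before = old.get(iface)
        after = new.get(iface)
        if before != after:
            changes.append(f"{iface}: {before} -> {after}")
    return changes


def watch(interval=1.0):
    """Poll ifconfig forever, printing a timestamped line on every change."""
    previous = {}
    while True:
        out = subprocess.run(["ifconfig"], capture_output=True, text=True).stdout
        current = parse_ifconfig(out)
        for change in diff_states(previous, current):
            print(time.strftime("%H:%M:%S"), change)
        previous = current
        time.sleep(interval)
```

Calling watch() over the serial console while unplugging igb0 should show when the address and "status: active" actually disappear, which can then be compared with when internal traffic stops and resumes.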

      • johnpoz LAYER 8 Global Moderator

        To simulate what it sounds like you did, I got a few side-by-side pings going from my workstation (LAN, 192.168.9.100/24, em1 on pfSense): one to a public IP, one to another segment (printer, 192.168.2.50/24, em2 on pfSense), and one to the guest WLAN (192.168.6.101/24, em2_vlan300), which is a VLAN on the same pfSense interface as the printer. Then I pulled the Ethernet cable out of my cable modem that goes to the pfSense WAN (em0).

        I didn't want to leave it disconnected for long, but you can see the ping to 8.8.8.8 time out without any blips to the other segments.

        Do you have pfSense set to reset states on loss of gateway? That would be my guess; the setting is under Advanced, Misc. I guess I could turn that on and try the same test, but it should only kill states of traffic going to that gateway.

        State Killing on Gateway Failure
        The monitoring process will flush states for a gateway that goes down if this box is not checked. Check this box to disable this behavior.

        So I removed the check mark there and saved, then did the test again.. No loss of connectivity between lan interfaces when wan goes away that I can see.

        wanpulltest.png
        killstatesunchecked.png
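For anyone repeating this test, the side-by-side ping results can be boiled down to outage windows per target. The sketch below is just one way to do that: the function name outage_windows and the (timestamp, target, ok) sample format are assumptions for illustration, and the sample values are made up rather than taken from the screenshots.

```python
from collections import defaultdict


def outage_windows(samples):
    """Summarise ping results as outage windows per target.

    samples: iterable of (timestamp_seconds, target, ok) in time order.
    Returns {target: [(first_failed_ts, recovered_ts_or_None), ...]}.
    """
    windows = defaultdict(list)
    open_since = {}  # target -> timestamp of first failure in the current outage
    for ts, target, ok in samples:
        windows[target]  # touch the key so targets with no outage still appear
        if not ok and target not in open_since:
            open_since[target] = ts
        elif ok and target in open_since:
            windows[target].append((open_since.pop(target), ts))
    for target, start in open_since.items():
        windows[target].append((start, None))  # still failing when the data ends
    return dict(windows)


# Hypothetical samples: WAN target drops for two probes, LAN-side target never does.
samples = [
    (0, "8.8.8.8", True), (0, "192.168.2.50", True),
    (1, "8.8.8.8", False), (1, "192.168.2.50", True),
    (2, "8.8.8.8", False), (2, "192.168.2.50", True),
    (3, "8.8.8.8", True), (3, "192.168.2.50", True),
]
result = outage_windows(samples)
```

If only the public target shows a window, inter-segment routing is unaffected by the WAN pull, which is what the test above found. Windows on the internal targets as well would match the behaviour in the original post.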


        • ajrg

          Thanks for trying that out! It's an issue I've never seen (or at least noticed) on this rig before.

          Yeah, I do have state killing turned on. During one of the times I tried this, I did quit the pings and restart them several times - no dice… So, I don't think it's a state issue (but I'll switch it off and try again).

          Having thought about it more (but not being near the box at the moment), I am wondering if it's got something to do with device polling. Not sure how the Intel drivers are built, but could it be feasible that link detection/notifications are interrupt based?

          I'll turn device polling off and give it another go tomorrow - and report back :)

          • ajrg

            Tried disabling state killing - no difference.
            Tried disabling device polling (and rebooting) - no difference.

            Bizarre. I'll be messing about more tomorrow - been a busy day, today!

            • ajrg

              Finally solved this problem - seems like the onboard NICs (Intel) had some fault or pathology.

              Disabled the onboard NICs, installed a four port Intel server card, and it's working fine now.

              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.