Netgate Discussion Forum

    Clients experience interruptions and timeouts when using Multi WAN

    Routing and Multi WAN
      fbmm

      Hey everyone,

      I'm currently setting up Multi WAN on a school network to meet the bandwidth demand we are experiencing due to the current situation (lots of video conferencing going on at the moment, inbound and outbound).

      Here is our network

      100 Mbit   50 Mbit   50 Mbit  6 Mbit  16 Mbit
      Fritzbox  Fritzbox  Fritzbox Fritzbox bintec
        WAN 1     WAN 2     WAN 3    WAN 4   WAN 5
          |         |         |        |       |
          |         |         |        |       |
           \        |         |        |       /
             \      |         |        |      /
           =====================================
           ||              pfSense            ||
           ||    LAN 1             LAN 2      ||
           =====================================
                /                      \
              /                          \
      Internal network          USG (Unifi Security Gateway)
          |                       |         |        |
      Clients                    AP1       AP2       AP3
                                  |          |        |
                                 Clients  Clients  Clients
      

      We mainly use WAN 1-3 (as they have the most capacity) in a Gateway Group; WAN 4 and 5 are only for backup.
      On the network we have about 100 wired clients on LAN 1 and about 300-400 wifi clients connected through Unifi Access Points controlled by a Unifi Controller in conjunction with a USG that is connected to pfSense on LAN 2.
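
As a rough sanity check on those numbers, the nominal downstream capacity of WAN 1-3 can be divided by the client count. This is only a back-of-the-envelope sketch in Python: it assumes every client is active at once and that each line delivers its nominal rate, neither of which has been measured here. The names `MAIN_WAN_MBIT` and `per_client_mbit` are made up for illustration.

```python
# Nominal downstream rates of the three main lines (from the diagram above), in Mbit/s.
MAIN_WAN_MBIT = {"WAN 1": 100, "WAN 2": 50, "WAN 3": 50}

WIRED_CLIENTS = 100               # "about 100 wired clients"
WIFI_CLIENTS_RANGE = (300, 400)   # "about 300-400 wifi clients"


def per_client_mbit(total_mbit: float, clients: int) -> float:
    """Fair share per client if all clients are active simultaneously (an assumption)."""
    return total_mbit / clients


total = sum(MAIN_WAN_MBIT.values())
for wifi in WIFI_CLIENTS_RANGE:
    clients = WIRED_CLIENTS + wifi
    print(f"{clients} clients: {per_client_mbit(total, clients):.2f} Mbit/s each")
```

Under these assumptions the share works out to roughly 0.4 to 0.5 Mbit/s per client, which gives a feel for how little headroom there is when everyone is online at once.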

      The WAN routers are all consumer routers called Fritzbox - we got them from the ISP, even though we are a school.

      Here is the problem: since I started setting up Multi WAN, I seem to have made internet access for our wifi clients worse than it was before (when just WAN 1 and WAN 2 were connected to the USG, configured for load balancing).

      Users report repeated interruptions and timeouts (a website does not load for a number of seconds until it is refreshed). Browsing often works like a charm and is really quick, but when I visit random webpages, roughly every 6th or 7th page load gets stuck. Refreshing helps, but the problem keeps recurring, and the students are quite irritated when using Microsoft Teams, for example, with documents not loading or changes not syncing.

      Also, during breaks, when all students are using the wifi at the same time, the gateways quickly become unresponsive: packet loss and RTT rise so fast that the gateways are marked as offline.
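
For context on how a gateway ends up marked offline: pfSense's gateway monitoring compares measured latency and packet loss against configurable low and high thresholds. The Python sketch below only illustrates that idea. The threshold values (200/500 ms, 10/20 % loss), the class `Thresholds` and the function `classify_gateway` are assumptions for illustration, not values read from this installation, and the real monitoring works on averaged probe results rather than single samples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Assumed example values; check the gateway's advanced settings for the real ones.
    latency_low_ms: float = 200.0
    latency_high_ms: float = 500.0
    loss_low_pct: float = 10.0
    loss_high_pct: float = 20.0


def classify_gateway(rtt_ms: float, loss_pct: float, t: Thresholds = Thresholds()) -> str:
    """Return 'down', 'warning' or 'online' for one averaged measurement."""
    # Crossing either high threshold is enough to treat the gateway as down.
    if rtt_ms >= t.latency_high_ms or loss_pct >= t.loss_high_pct:
        return "down"
    # Crossing either low threshold only raises a warning.
    if rtt_ms >= t.latency_low_ms or loss_pct >= t.loss_low_pct:
        return "warning"
    return "online"


# Hypothetical samples: an idle line, a loaded line, and a saturated line.
for rtt, loss in [(25, 0), (260, 4), (620, 25)]:
    print(f"rtt={rtt} ms loss={loss} % -> {classify_gateway(rtt, loss)}")
```

A saturated consumer router may queue or drop the monitoring pings along with everything else, so a gateway can be flagged as down while the line itself is still up. Raising the thresholds would change when that happens, but not the underlying congestion.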

      Here are some screenshots I took during a 10 minute break.

      [Three screenshots; images not available]

      The thing is, we've always experienced outages and slow connections during breaks. What is new is that interruptions and timeouts now also occur when the load on the network is low, e.g. in the afternoon when few students are left at school.

      Here are my questions:

      • why will my WAN routers go down so quickly? Is it the fault of the consumer routers? Or is pfSense flooding them with too much traffic at a time? I've got traffic shaping in place, but it doesn't seem to help.
      • how can the occasional timeouts be explained, especially during times with a low network load?
      • how can I find bottlenecks on pfSense?

      Ideas I have:

      • is the triple NAT (router, pfSense, USG) a problem?
      • is the USG seen as one device (not as the 300 individual clients behind it), so that load balancing won't really kick in?
      • is DNS resolution a problem? Clients query 8.8.8.8 and 1.1.1.1 directly (assigned by DHCP)
      • playing around with the latency thresholds in the Gateway config
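
On the second idea: pfSense gateway groups are generally understood to balance per connection rather than per client, so many connections arriving from a single NATed address (the USG) can still be spread across the WANs. The exception is when sticky connections are enabled, in which case a source address stays on one gateway. The simulation below is only a toy model in Python to show that difference. The 2:1:1 weights, the address 192.168.2.2 and the function names are assumptions, and it does not reproduce pf's actual algorithm.

```python
import itertools
from collections import Counter

# Assumed weights, roughly proportional to line speed (100:50:50 Mbit/s).
WEIGHTS = {"WAN1": 2, "WAN2": 1, "WAN3": 1}


def weighted_cycle(weights):
    """Endless round-robin in which each gateway appears `weight` times per round."""
    order = [gw for gw, w in weights.items() for _ in range(w)]
    return itertools.cycle(order)


def balance(connections, sticky=False):
    """Assign each (source_ip, connection_id) to a gateway and count the results."""
    rr = weighted_cycle(WEIGHTS)
    pinned = {}          # source_ip -> gateway, used only in sticky mode
    counts = Counter()
    for src, _conn in connections:
        if sticky:
            # The first connection from a source picks a gateway; later ones reuse it.
            if src not in pinned:
                pinned[src] = next(rr)
            gw = pinned[src]
        else:
            # Every new connection takes the next gateway in the weighted rotation.
            gw = next(rr)
        counts[gw] += 1
    return counts


# 400 connections, all seen as coming from one NATed address (hypothetical USG WAN IP).
conns = [("192.168.2.2", i) for i in range(400)]
print("per connection:", dict(balance(conns)))
print("sticky source: ", dict(balance(conns, sticky=True)))
```

In this toy model the per-connection case spreads the 400 connections 200/100/100 across the three WANs, while the sticky case puts all 400 on a single gateway.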

      Any suggestions will be much appreciated.

        fbmm

        So it seems the problem was the triple NAT. I changed the topology so that, after authentication, clients are placed into a VLAN directly connected to pfSense, with pfSense acting as the DHCP server. Clients no longer experience timeouts or interruptions, at least not when the network load is low.

        The issue of the WAN routers going down fairly quickly remains, however, even though they now withstand the load a bit longer than before.

        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.