Netgate Discussion Forum

    NanoBSD CPU usage high on MultiWan, PHP processes abound

    • phil.davis

      The Mac is connected to the Alix router through two unmanaged switches, and a firewall rule sends it through the secondary WAN.

      Maybe the rule is only for TCP/UDP and ICMP ping from the Mac is still going out the default gateway? That would explain the ping being so good. You could traceroute from the Mac to an external host to see which way the ICMP is really going.
      It does sound like the link is so saturated that the apinger monitoring is struggling to see any decent numbers and is deciding the link is bad.
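      For example (8.8.8.8 is just a stand-in for any external host), the two protocols can be compared directly from the Mac, since BSD/macOS traceroute sends UDP probes by default and ICMP echoes with -I; the second or third hop should reveal which WAN each one really leaves through:

        traceroute 8.8.8.8      # UDP probes: matched by a TCP/UDP-only policy-routing rule
        traceroute -I 8.8.8.8   # ICMP echoes: may still go out the default gateway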

      As the Greek philosopher Isosceles used to say, "There are 3 sides to every triangle."
      If I helped you, then help someone else - buy someone a gift from the INF catalog http://secure.inf.org/gifts/usd/

      • arriflex

        @phil.davis:

        The Mac is connected to the Alix router through two unmanaged switches, and a firewall rule sends it through the secondary WAN.

        Maybe the rule is only for TCP/UDP and ICMP ping from the Mac is still going out the default gateway? That would explain the ping being so good. You could traceroute from the Mac to an external host to see which way the ICMP is really going.
        It does sound like the link is so saturated that the apinger monitoring is struggling to see any decent numbers and is deciding the link is bad.

        You are correct, I noticed before the reboot that my rule was limited to TCP/UDP, and updated it to "any." That explains the new consistency. Nice catch!
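        In case it helps anyone else, the loaded ruleset can be double-checked from Diagnostics > Command Prompt or an SSH shell (just a quick sanity check, not an official procedure); a rule still limited to TCP/UDP should show up with explicit "proto tcp"/"proto udp" entries, while the "any" version should list no protocol at all:

          pfctl -sr | grep route-to   # show the currently loaded policy-routing (route-to) rules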

        It's working fine with the ludicrously high numbers for the monitoring delay thresholds and a reduced number of checks on that gateway. If I find the time, I'll give a snapshot a try, as this is the box to do that on.

        arri

        • bryan.paradis

          @arriflex:

          @phil.davis:

          The Mac is connected to the Alix router through two unmanaged switches, and a firewall rule sends it through the secondary WAN.

          Maybe the rule is only for TCP/UDP and ICMP ping from the Mac is still going out the default gateway? That would explain the ping being so good. You could traceroute from the Mac to an external host to see which way the ICMP is really going.
          It does sound like the link is so saturated that the apinger monitoring is struggling to see any decent numbers and is deciding the link is bad.

          You are correct, I noticed before the reboot that my rule was limited to TCP/UDP, and updated it to "any." That explains the new consistency. Nice catch!

          It's working fine with the ludicrously high numbers for the monitoring delay thresholds and a reduced number of checks on that gateway. If I find the time, I'll give a snapshot a try, as this is the box to do that on.

          arri

           I am not sure if it's going to change anything. Turn off the gateway monitor on that WAN, then test with Backblaze and ping out, and see if you notice a difference. There could be something going on where the added load on the box is degrading the connection further. If you get the same loss/latency without apinger running on that WAN, then you are probably dealing with normal saturation, which will need to be fixed using one of the things listed in my post above.
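           Something along these lines should do for the test (8.8.8.8 is only an example target, <secondary_WAN_IP> is a placeholder, and as far as I know the -S flag on FreeBSD's ping is roughly what the Diagnostics > Ping page uses when you pick a source interface):

             # From the pfSense shell, while the secondary WAN's gateway monitor is off
             # and Backblaze is saturating its upload:
             ping -S <secondary_WAN_IP> -c 20 8.8.8.8

             # And from the Mac that is policy-routed out that WAN:
             ping -c 20 8.8.8.8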

          • arriflex

            @bryan.paradis:

             I am not sure if it's going to change anything. Turn off the gateway monitor on that WAN, then test with Backblaze and ping out, and see if you notice a difference. There could be something going on where the added load on the box is degrading the connection further. If you get the same loss/latency without apinger running on that WAN, then you are probably dealing with normal saturation, which will need to be fixed using one of the things listed in my post above.

             I finally got around to trying this for you. With the gateway monitor off on the secondary WAN and the upstream traffic out of it fully saturated, ping times from both the configurator page (using the secondary WAN gateway) and from the client routed through the firewall (this time all traffic, not just TCP/UDP ;) are two to five seconds.

             I think it's clearly just a congested interface, given the way Backblaze saturates it through their SSL tunnel.

            • bryan.paradis

              @arriflex:

              @bryan.paradis:

               I am not sure if it's going to change anything. Turn off the gateway monitor on that WAN, then test with Backblaze and ping out, and see if you notice a difference. There could be something going on where the added load on the box is degrading the connection further. If you get the same loss/latency without apinger running on that WAN, then you are probably dealing with normal saturation, which will need to be fixed using one of the things listed in my post above.

               I finally got around to trying this for you. With the gateway monitor off on the secondary WAN and the upstream traffic out of it fully saturated, ping times from both the configurator page (using the secondary WAN gateway) and from the client routed through the firewall (this time all traffic, not just TCP/UDP ;) are two to five seconds.

               I think it's clearly just a congested interface, given the way Backblaze saturates it through their SSL tunnel.

               Is that the same as, worse than, or better than the results with apinger on? It would have been interesting to see whether apinger perpetuates the problem under the greatly increased load.

              • AIMS-Informatique

                We are seeing the same rising-CPU behaviour on our PC Engines ALIX with a NanoBSD install (2.1.4 stable).

                In France we run into a lot of trouble over xDSL connections: mainly packet loss, caused either by a bad sync or by a user saturating the line with big downloads/uploads.
                This causes pfSense to hit a high CPU load whenever the gateway is considered offline.

                We applied the gateway-polling trick in "Routing->Gateways->Edit gateway":

                - Advanced->Packet Loss Thresholds = 20% / 40% (default 10% / 20%)
                - Probe Interval = 5s (default = 1s)
                - Down = 60s (default = 10s)

                What we have seen so far with those values is a more responsive PHP UI, and the RRD graphs show a drop in CPU load. We are still running these settings as a test for a few hours on the boxes that are experiencing DSL sync difficulties.

                It looks good so far; spacing out the apinger probes and the failure decision seems to give the ALIX more time to execute what it has to execute, and the CPU graph falls dramatically (so far…).

                Playing with the values above seems like a good, quick thing to try.
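                A quick way to sanity-check the effect from an SSH shell, using standard FreeBSD tools (the config path below is where our 2.1.x NanoBSD installs write the generated monitor settings, so treat it as an assumption for other versions):

                  cat /var/etc/apinger.conf                        # confirm the new probe interval and loss thresholds were applied
                  top -S -o cpu                                    # see whether apinger and the PHP alarm handlers still top the CPU list
                  ps auxww | egrep 'apinger|php' | grep -v egrep   # the processes this thread's title complains about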
