Netgate Discussion Forum
    An earnest appeal - please do fix APINGER in 2.2

    2.2 Snapshot Feedback and Problems - RETIRED
      pubmsu

      Hi there,

      There have been numerous posts in various places on the forum about apinger issues that started appearing mostly from 2.1.x. These issues still exist in 2.2 Alpha (as of a snapshot from a couple of weeks back).

      1. The main issue is that after some time (this period varies), apinger shows wrong RTT and loss numbers in the dashboard, or displays "Pending", and eventually excludes the gateway from gateway groups, even though there is no actual RTT delay or packet loss on that WAN, as confirmed by pinging from any LAN client.

      2. Restarting apinger doesn't help much. In some cases, within a minute of being restarted, it reports the wrong RTT and loss, goes to "Pending", and then marks the gateway as down (while, in the default gateway's case, pfSense is still using that gateway for its own traffic to the Internet).

      3. The problem seems to be more prevalent in multi-WAN setups where the monitor IP is an Internet host with a normal RTT of more than 50 ms. The problem lessens if the gateway IP is used as the monitor IP (the default option), but then the check says much less about whether Internet connectivity is actually available.

      4. There's a weird workaround that isn't practical for everyone: if you have more than one WAN, put another router (such as a Wi-Fi router) between the Internet connection and pfSense. Surprisingly, this stabilizes apinger and it doesn't drift much. In addition to this workaround, you can add a cron job that automatically restarts apinger every hour or so, to further limit its wrong behavior.
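To make the symptoms above concrete, here is a minimal Python sketch of how a gateway monitor of this kind might turn one window of ping results into the statuses seen on the dashboard. It is not apinger's code: the window handling, the function names, and the threshold numbers (200/500 ms latency, 10/20% loss) are assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from statistics import mean

# Illustrative thresholds (assumptions, not apinger's actual defaults).
LATENCY_LOW_MS = 200.0
LATENCY_HIGH_MS = 500.0
LOSS_LOW_PCT = 10.0
LOSS_HIGH_PCT = 20.0


@dataclass
class WindowStats:
    avg_rtt_ms: float
    loss_pct: float


def window_stats(rtts_ms):
    """Summarize one window of probes; None marks a probe that got no reply."""
    if not rtts_ms:
        raise ValueError("need at least one probe")
    replies = [r for r in rtts_ms if r is not None]
    loss = 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms)
    avg = mean(replies) if replies else float("inf")
    return WindowStats(avg, loss)


def classify(stats):
    """Map averaged RTT and loss onto a dashboard-style status string."""
    if stats.loss_pct >= LOSS_HIGH_PCT or stats.avg_rtt_ms >= LATENCY_HIGH_MS:
        return "down"
    if stats.loss_pct >= LOSS_LOW_PCT:
        return "packet loss"
    if stats.avg_rtt_ms >= LATENCY_LOW_MS:
        return "latency"
    return "online"
```

With a monitor host whose normal RTT is around 60 ms and no lost replies, this sketch reports "online"; if a real monitor reports "down" under those conditions, that would point at the monitor's own bookkeeping rather than at the thresholds.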

      apinger is the basic mechanism that multi-WAN setups need to function smoothly, so it is critical that it works flawlessly. Many users are affected, and not everyone is reporting it. Many hours of productivity are being lost because gateways are falsely marked inactive. It should be fixed in 2.2.

      Here's a list of forum postings regarding various apinger-related issues:

      https://forum.pfsense.org/index.php?topic=68637
      https://forum.pfsense.org/index.php?topic=66328
      https://forum.pfsense.org/index.php?topic=69533
      https://forum.pfsense.org/index.php?topic=74914
      https://forum.pfsense.org/index.php?topic=72085
      https://forum.pfsense.org/index.php?topic=72314
      https://forum.pfsense.org/index.php?topic=76770
      https://forum.pfsense.org/index.php?topic=77266
      https://forum.pfsense.org/index.php?topic=73009
      https://forum.pfsense.org/index.php?topic=70441
      https://forum.pfsense.org/index.php?topic=72455
      https://forum.pfsense.org/index.php?topic=73109
      https://forum.pfsense.org/index.php?topic=72303
      https://forum.pfsense.org/index.php?topic=69261
      https://forum.pfsense.org/index.php?topic=71879
      https://forum.pfsense.org/index.php?topic=67354
      https://forum.pfsense.org/index.php?topic=65505
      https://forum.pfsense.org/index.php?topic=63470.0

      Thanks,
      msu

        pubmsu

        (I'm not able to attach a screenshot showing the issue because of a 500 Internal Server Error.) Here it is:

        http://imgur.com/GCqD1CV

          ggzengel

          I second pubmsu.
          I get more than 100 emails every day about these unstable cable providers.
          On some days I get more than 2,000 emails, all sent within 5 minutes.

            naras

            +1
            This is quite an annoying issue.

              akeness

              I'm also having this issue. I have 3 WANs with combined bandwidth, but 2 of my gateways are always shown as down, which they are not. :(

                Brutal

                They've had the gall to offer 4 general releases with this problem.  It makes anything higher than 2.0.3 worthless for anything more serious than a simple home router.

                Shame.  This is where software ideas like this go to die.

                  jimp (Rebel Alliance Developer, Netgate)

                  The problem is that none of us can reproduce this on demand in an environment we control and can debug. Yes, we know some people have problems, but they don't affect the majority of users.

                  Multi-WAN works fine here, and for many others. Ermal has been working on fixing this up but it's been a long process since the exact parameters to reproduce the problem have never been clearly identified or replicated.

                  The notifications are a separate issue entirely; cleaning them up is not slated for 2.2 but for sometime afterward.

                  Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

                  Need help fast? Netgate Global Support!

                  Do not Chat/PM for help!

                    pubmsu

                    Thanks for your update, jimp. At least we now know why it's difficult to fix.

                    If you want, I can give you access to an otherwise good multi-WAN test environment that has this issue and is an exact replica of our production environment. Let me know by PM if this would help.

                    We have been using pfSense for the last 7 years, I guess, and really need this to be resolved.

                      jimp (Rebel Alliance Developer, Netgate)

                      Getting access wouldn't help as much as definitively identifying, if possible, the specific condition leading to the problem (e.g. a latency over X for Y amount of time, or Z gateways with Q latency, etc.).

                      Depending on how long it takes for the problem to repeat it could still be difficult for us to find time to watch it closely enough to find when the problem starts specifically.

                        pubmsu

                        Got it, Jim. In our case, though, the problem starts within 5 to 10 minutes.

                          jimp (Rebel Alliance Developer, Netgate)

                          Seeing the quality graph for your gateways may help as well, with notes about where apinger was restarted and where the problem was first noticed.

                          I tried to artificially induce latency by putting one firewall in front of another and increasing the delay on a limiter for ICMP traffic. Each time I let it run for 15+ minutes at various latencies and then lifted the limiter. Each time, it bounced back to close to 0 for me; I never saw it get stuck, so there must be a few different factors at work that make it get stuck over time for others.

                            athurdent

                            Maybe this is something to consider: I never had any problems with my setup running NanoBSD for the last year. I recently switched to a full install (the CF card died, so I bought an SSD), and now I am seeing packet loss steadily increasing for my HENet tunnel: 120% packet loss at the moment, with a firewall uptime of 4 days. I am pinging the same IPv6 host via Smokeping from a Linux host behind the pfSense gateway, and the graphs look a little different. This is on 2.1.4, not 2.2.
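A loss figure above 100% cannot come from a correct count of probes sent versus replies received, so a reading like 120% suggests the counters themselves are drifting. As an illustration only (an assumed design, not apinger's implementation), the Python sketch below tracks a fixed window of probe sequence numbers and ignores late or duplicate replies, which keeps the reported loss between 0% and 100% and lets it recover once good probes replace the bad ones. The `LossWindow` class and its default window size are hypothetical.

```python
from collections import OrderedDict


class LossWindow:
    """Track the last `size` probes by sequence number (hypothetical helper)."""

    def __init__(self, size=60):
        self.size = size
        self.probes = OrderedDict()  # seq -> True once a reply has arrived

    def sent(self, seq):
        self.probes[seq] = False
        # Forget the oldest probe once the window is full.
        while len(self.probes) > self.size:
            self.probes.popitem(last=False)

    def received(self, seq):
        # Replies for probes that already left the window, and duplicate
        # replies, change nothing instead of being counted again.
        if seq in self.probes:
            self.probes[seq] = True

    def loss_pct(self):
        if not self.probes:
            return 0.0
        lost = sum(1 for answered in self.probes.values() if not answered)
        return 100.0 * lost / len(self.probes)
```

Because loss is always "unanswered probes in the window divided by probes in the window", it is bounded by construction, and a burst of loss during a WAN flap ages out after one window of healthy probes.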






                              Jeremy11one

                              I have this problem too.  Apinger reports that my WAN connection keeps going up and down several times every hour.  It started a few months ago.  I have not switched ISPs or anything.  I installed the latest snapshot (built on Mon Jul 28 12:22:20 CDT 2014) and still have the problem.

                              I do not use multiple WANs.  Just one.

                                ridnhard19

                                I had this same issue as well and ended up entering the cable modem's local/private IP address (192.168.100.1) into the config; that was the workaround I used. It doesn't do anything for monitoring the connection, but at least it's not constantly bouncing the connection up and down.

                                  Jeremy11one

                                  When you say "the config," do you mean the "Monitor IP"?  My config was monitoring the default gateway IP, which is on the cable modem and I still had the problem.

                                    georgeman

                                    I don't think this is related to the main issue described here, but I have observed similar behavior under high network load while using the traffic shaper: the ping probes are put in the default queue instead of the one specified by the floating rule on WAN that is supposed to handle them. This probably happens because apinger starts before the firewall itself, since killing the related states makes them go into the correct queue immediately.

                                    If it ain't broke, you haven't tampered enough with it

                                      naras

                                      The issue still exists in recent builds in my testing environments.

                                        Guest

                                        …and it frequently results in tunnels (IPsec or OpenVPN) going down for no obvious reason, except for apinger freaking out.

                                        I increased the apinger alarm times significantly; that helps at least a little...
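Lengthening the alarm times helps because a gateway is then only declared down (or back up) after the bad (or good) condition has persisted for a while, which filters out short spikes. A minimal Python sketch of that hold-down idea follows; the `HoldDownAlarm` class and its `hold_s` parameter are hypothetical names, and one status sample per second is assumed.

```python
class HoldDownAlarm:
    """Change alarm state only after a condition persists for hold_s seconds."""

    def __init__(self, hold_s):
        self.hold_s = hold_s
        self.active = False
        self._since = None  # when samples started disagreeing with the state

    def update(self, now_s, bad):
        """Feed one sample; return True only when the alarm state changes."""
        if bad == self.active:
            # Sample agrees with the current state: reset the timer.
            self._since = None
            return False
        if self._since is None:
            self._since = now_s
        if now_s - self._since >= self.hold_s:
            self.active = bad
            self._since = None
            return True
        return False
```

Since any sample that agrees with the current state resets the timer, a link that flaps every few seconds with `hold_s=10` produces no alarm transitions at all, rather than one notification per flap.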

                                          Supermule (Banned)

                                          I don't have these problems at all, running 40+ pfSense boxes….

                                          I use traceroute to monitor the wanted IP upstream to decide if the GW is down.

                                          All are stable, currently running at 0% packet loss..... No change from 2.0.x.

                                          I don't like the idea of monitoring other external hosts that are not in your upstream environment. That way you don't get a real picture of your gateway status.


                                            Raul Ramos

                                            @Supermule:

                                            I don't have these problems at all, running 40+ pfSense boxes….

                                            What's your config for the WAN interfaces? I see a lot of people write that they have problems but don't post their configs to help troubleshoot the problem.

                                            Sometimes I have the problem on my multi-WAN setup (a PPPoE WAN where I only configure the username and password, and a PPP (LTE) WAN where I only configure the default number). I don't see the problem when I disconnect the PPP link.

                                            pfSense:
                                            ASRock -> Wolfdale1333-D667 (2GB TeamElite Ram)
                                            Marvell 88SA8040 Sata to CF(Sandisk 4GB) Controller
                                            NIC's: RTL8100E (Internal ) and Intel® PRO/1000 PT Dual (Intel 82571GB)

                                              Supermule (Banned)

                                              More or less the same for all 40+….


                                                naras

                                                @Supermule:

                                                I use traceroute to monitor the wanted IP upstream to decide if the GW is down.

                                                All are stable, currently running at 0% packet loss….. No change from 2.0.x.

                                                I don't like the idea of monitoring other external hosts that are not in your upstream environment. That way you don't get a real picture of your gateway status.

                                                Could you please tell us how to use traceroute to monitor the wanted IP upstream to decide if the GW is down?

                                                Multi-WAN with static IPs and a different gateway within each WAN subnet is stable, at least in my tests. But I use PPPoE connections and we get the same gateway IP almost all the time, so we have to set at least one monitor IP outside the WAN subnet, and the line with the outside monitor IP always gets reported offline by apinger even though the connection works as normal.

                                                If there is another way to monitor the gateway, that would really help.

                                                  Supermule (Banned)

                                                  http://ping.eu/traceroute/

                                                  Use the first one that's not in your WAN subnet.

                                                    naras

                                                    @Supermule:

                                                    http://ping.eu/traceroute/

                                                    Use the first one that's not in your WAN subnet.

                                                    So it's not within pfSense, and it's not done automatically either?

                                                      Supermule (Banned)

                                                      I understand why you are confused…

                                                      I use traceroute to monitor the wanted IP upstream to decide if the GW is down.

                                                      I use traceroute to locate the IP to monitor and then use the built-in gateway monitoring tool in pfSense.

                                                      Works fine here.
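The "first hop outside the WAN subnet" selection described above can be sketched in Python with the standard `ipaddress` module. The function name is hypothetical, and the hop list in the example uses documentation addresses; in practice the hops would come from a traceroute run over the WAN in question, and the chosen address would then be entered as the gateway's monitor IP.

```python
import ipaddress


def first_hop_outside(hops, wan_cidr):
    """Return the first traceroute hop not inside the WAN subnet, else None.

    hops: hop addresses in traceroute order; unanswered hops ("*") are skipped.
    """
    wan = ipaddress.ip_network(wan_cidr, strict=False)
    for hop in hops:
        if hop == "*":
            continue
        addr = ipaddress.ip_address(hop)
        # An address of the other IP version is never "in" the network.
        if addr in wan:
            continue
        return str(addr)
    return None
```

For example, `first_hop_outside(["203.0.113.1", "*", "198.51.100.9"], "203.0.113.0/24")` returns `"198.51.100.9"`. On a PPPoE link whose gateway already sits outside the WAN subnet, the sketch simply returns the gateway itself.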

                                                        naras

                                                        @Supermule:

                                                        I understand why you are confused…

                                                        I use traceroute to monitor the wanted IP upstream to decide if the GW is down.

                                                        I use traceroute to locate the IP to monitor and then use the built-in gateway monitoring tool in pfSense.

                                                        Works fine here.

                                                        OK, I did that several months ago, but it didn't help.
                                                        The next-hop routers are always outside my WAN subnet :(

                                                        Thanks anyway.

                                                          dennypage

                                                          I've been running into quite a few problems with apinger in the 2.1 series with a simple single-WAN configuration.

                                                          https://redmine.pfsense.org/issues/3692

                                                          I was thinking that I would try to debug it and went looking for the apinger source code, but wasn't able to find it in the 2.1.5 main or packages repos. Can someone send me a pointer to it?

                                                          Thanks

                                                            bmeeks

                                                            @dennypage:

                                                            I've been running into quite a few problems with apinger in the 2.1 series with a simple single wan configuration.

                                                            https://redmine.pfsense.org/issues/3692

                                                            I was thinking that I would try to debug it and went looking for the source code to apinger but wasn't able to find it in the 2.1.5 main or packages. Can someone send me a pointer to it?

                                                            Thanks

                                                            The source code is in the pfsense-tools repo.  You must complete a couple of electronic documents in order to access it.  Access is controlled via SSH public keys.  It is based off the FreeBSD port here: http://www.freshports.org/net/apinger/

                                                            Information on what is required to get access is posted in a Sticky Thread at the top of the Development sub-forum here: https://forum.pfsense.org/index.php?topic=76132.0.

                                                            Bill

                                                              dennypage

                                                              @bmeeks:

                                                              The source code is in the pfsense-tools repo.  You must complete a couple of electronic documents in order to access it.  Access is controlled via SSH public keys.  It is based off the FreeBSD port here: http://www.freshports.org/net/apinger/

                                                              Information on what is required to get access is posted in a Sticky Thread at the top of the Development sub-forum here: https://forum.pfsense.org/index.php?topic=76132.0.

                                                              Thanks. I downloaded what I thought was the packages distribution from

                                                              https://github.com/pfsense/pfsense-packages/releases/tag/RELENG_2_1_5

                                                              but apinger wasn't in there. Is there a different repo?

                                                                heper

                                                                The packages repo is only for optional add-ons (stuff that is not included in a base install from CD).

                                                                I believe bmeeks said which repo you can find apinger in, and how to access it.

                                                                  bmeeks

                                                                  @heper:

                                                                  The packages repo is only for optional add-ons (stuff that is not included in a base install from CD).

                                                                  I believe bmeeks said which repo you can find apinger in, and how to access it.

                                                                  Correct. It's in the pfsense-tools repo, which is NOT hosted on GitHub directly. It's hosted on a server operated by the pfSense team, and you must follow the instructions in the link I posted above to gain access to the repo.

                                                                  Bill

                                                                    eri--

                                                                    I pushed a fix.

                                                                    Please try with next snapshots and see if you still get the same issues.

                                                                      applerule

                                                                      @ermal:

                                                                      I pushed a fix.

                                                                      Please try with next snapshots and see if you still get the same issues.

                                                                      Still broken for me. :(  See attachment.  Let me know if there is anything else I can provide.


                                                                        eman_resu

                                                                        Any update on the matter?

                                                                        My box just went dark after apinger decided that the WAN was not up, and then apinger remained stuck at 99% CPU load, refusing to be restarted.

                                                                          Roots0

                                                                          apinger doesn't cope well with my WAN flapping: whenever this happens, my HENet IPv6 gateway starts reporting loss forever, even though the IPv6 tunnel is back up and stable!

                                                                          Mobile Computer & Network Support Stockport, UK
                                                                          www.timotten.co.uk

                                                                            grandrivers

                                                                            Why wouldn't adding some other methods of detection be a good idea? Years ago I had a Xincom (same as Syswan) that used several different means of detection (ICMP ping, HTTP heartbeat, traffic flow, DPD (RFC 3706)). It was something like a $250 router, which was nice because my ISP used to block ICMP.
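The idea of combining several detection methods can be sketched as a simple vote across independent checks. In the Python sketch below, the `gateway_up` function, the check names, and the `min_ok` policy are all hypothetical; real checks would wrap an ICMP ping, an HTTP request, and so on.

```python
from typing import Callable, Dict


def gateway_up(checks: Dict[str, Callable[[], bool]], min_ok: int = 1) -> bool:
    """Return True if at least min_ok of the named checks succeed.

    Each check is a zero-argument callable returning True on success
    (e.g. a wrapper around an ICMP ping or an HTTP request). A check that
    raises is treated as a failed check rather than aborting the vote.
    """
    passed = 0
    for check in checks.values():
        try:
            ok = check()
        except Exception:
            ok = False  # treat probe errors as failures
        if ok:
            passed += 1
    return passed >= min_ok
```

With `min_ok=1`, an ISP that blocks ICMP would not by itself get the gateway marked down as long as the HTTP check still succeeds; a stricter policy such as a majority vote trades that tolerance for faster failure detection.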

                                                                            pfsense 2.4 super micro A1SRM-2558F
                                                                            C2558 8gig ECC  60gig SSD
                                                                            Triple WAN, dual PPPoE

                                                                              eman_resu

                                                                              @naras:

                                                                              +1
                                                                              This is quite an annoying issue.

                                                                              And the outcome in my case is a non-functional apinger and miles of log entries like: apinger: No usable targets found, exiting

                                                                                applerule

                                                                                I'm having better luck with the 9/15 build.  I am not sure if something changed between 9/13 and 9/15 that would have affected apinger, but the status seems to be working better.  Still not 100% correct though.  Before, the connection would show insanely high RTT in the console; now, that isn't the case.  However, when I look at the logs I am getting a lot of apinger "down" log messages.




                                                                                  applerule

                                                                                  @applerule:

                                                                                  I'm having better luck with the 9/15 build.

                                                                                  Never mind. I've been checking it on and off and hadn't seen it go to "Latency" or "Offline" today. It finally did: the typical stuck-at-really-high-RTT behavior, jumping between latency and offline. I got too excited, considering it hadn't gotten "stuck" for nearly a day. :(


                                                                                    naras

                                                                                    Still no luck with the Sep 14 09:09:38 CDT 2014 build. Both PPPoE links are online before my ISP resets my connections; after the reset, the default gateway comes back online normally, but OPT1 gets stuck even though the link reconnected successfully.

                                                                                    OPT1 comes back online after I restart apinger.

