Netgate Discussion Forum

Bug in apinger halts load balancing and failover

Routing and Multi WAN
10 Posts 3 Posters 3.7k Views
    lsoltero
    last edited by Dec 18, 2010, 9:19 PM

    on pfSense 1.2.3-RELEASE

    Running in failover mode between WAN and OPT1, I noticed that once in a while monitoring of the pool stops after the following error appears in slbd.log:

    Dec 18 01:15:53 webxaccelerator apinger: 208.67.222.222: Lost packet count mismatch (-20!=0)!
    Dec 18 01:15:53 webxaccelerator apinger: 208.67.222.222: Received packets buffer: ################################################## ####################

    ps aux | grep apinger shows that apinger is no longer running. This causes failover and load balancing to stop working, since there is no process monitoring the interfaces.

    Looking at the source code of apinger.c, around line 854, we see that apinger exits on this error.

                    if (t->recently_lost!=really_lost){
                            fprintf(f,"  lost packet count mismatch (%i!=%i)!\n",t->recently_lost,really_lost);
                            logit("%s: Lost packet count mismatch (%i!=%i)!",t->name,t->recently_lost,really_lost);
                            logit("%s: Received packets buffer: %s %s\n",t->name,buf2,buf1);
                            err=1;
                    }
                    free(buf1);
                    free(buf2);

                    fprintf(f,"\n");
            }
            fclose(f);
            if (err) abort();

    Patching apinger.c as follows

    vmmail3# diff apinger.c apinger.c.orig
    858,859c858
    < t->recently_lost = really_lost = 0;
    < // err=1;
    ---
    > err=1;

    prevents apinger from exiting on this error. Load balancing and failover now work as expected even when the mismatch is flagged:

    Dec 18 20:52:37 webxaccelerator apinger: 208.67.222.222: Lost packet count mismatch (-21!=0)!
    Dec 18 20:52:37 webxaccelerator apinger: 208.67.222.222: Received packets buffer: ################################################## ####################
    Dec 18 21:05:55 webxaccelerator apinger: ALARM: 208.67.220.220(208.67.220.220)  *** down ***
    Dec 18 21:06:03 webxaccelerator apinger: alarm canceled: 208.67.220.220(208.67.220.220)  *** down ***
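
To make the idea behind the patch concrete, here is a minimal standalone sketch in C. It is not apinger's code: `target_state`, `WINDOW`, `count_really_lost` and `check_and_resync` are hypothetical names made up for illustration. Unlike the patch above, which zeroes both counters, this variant resyncs the running counter to the freshly recounted value. The only point is that a bookkeeping mismatch gets logged and repaired instead of ending in abort().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define WINDOW 8 /* hypothetical window size, not apinger's real buffer length */

/* Hypothetical stand-in for apinger's per-target bookkeeping. */
struct target_state {
    char received[WINDOW]; /* 1 = reply received, 0 = reply lost */
    int recently_lost;     /* running counter kept up to date elsewhere */
};

/* Recount the losses directly from the window of recent replies. */
static int count_really_lost(const struct target_state *t)
{
    int lost = 0;
    for (int i = 0; i < WINDOW; i++)
        if (!t->received[i])
            lost++;
    return lost;
}

/*
 * Compare the running counter with the recount. On a mismatch, log it and
 * resync the counter so monitoring can continue. Returns 1 if a mismatch
 * was repaired, 0 if the counters already agreed. Never aborts on mismatch.
 */
static int check_and_resync(struct target_state *t)
{
    assert(t != NULL);
    int really_lost = count_really_lost(t);
    if (t->recently_lost != really_lost) {
        fprintf(stderr, "lost packet count mismatch (%i!=%i), resyncing\n",
                t->recently_lost, really_lost);
        t->recently_lost = really_lost;
        return 1;
    }
    return 0;
}
```

Called once per status update, `check_and_resync` would report the first mismatch and then return 0 on later calls until the counters drift apart again.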

    Hope this helps.

    --luis

      lsoltero
      last edited by Dec 21, 2010, 8:55 AM

      So I have looked at this a little more closely. The version of apinger included in pfPorts seems to have the same issue. Basically, if an inconsistency is found in the number of lost packets, apinger exits. In my mind apinger should ** NEVER ** exit.

      It seems that the apinger in pfPorts is used when building pfSense 2.0.  1.2.3-RELEASE uses the FreeBSD ports version.

      Following is a patch against the FreeBSD ports version of apinger that resolves my issues with failover pools halting when inconsistent packet loss is detected.  I don't currently do any work with 2.0 but it would be good if one of the maintainers applied the following patch to apinger included in pfPorts.

      --- apinger.c  2010-12-21 08:47:22.000000000 +0000
      +++ apinger.c.new      2010-12-21 08:47:15.000000000 +0000
      @@ -787,7 +787,6 @@
              time_t tm;
              int i,qp,really_lost;
              char *buf1,*buf2;
      -       int err=0;

              if (config->status_file==NULL) return;

      @@ -855,7 +854,7 @@
                              fprintf(f,"  lost packet count mismatch (%i!=%i)!\n",t->recently_lost,really_lost);
                              logit("%s: Lost packet count mismatch (%i!=%i)!",t->name,t->recently_lost,really_lost);
                              logit("%s: Received packets buffer: %s %s\n",t->name,buf2,buf1);
      -                      err=1;
      +                      t->recently_lost = really_lost = 0;
                      }
                      free(buf1);
                      free(buf2);
      @@ -863,7 +862,6 @@
                      fprintf(f,"\n");
              }
              fclose(f);
      -      if (err) abort();
      }

      #ifdef FORKED_RECEIVER

        jeebsion
        last edited by Dec 23, 2010, 8:54 AM

        lsoltero,

        No wonder the BETA4 build I've been testing fell back to only one link after a while. I hope the port gets patched as soon as possible. Thank you for identifying this.

        regards ..

          lsoltero
          last edited by Dec 23, 2010, 9:11 AM

          I posted a bug report here.
          http://redmine.pfsense.org/issues/1127

          Hopefully someone will look at it and take appropriate action. Otherwise your only option is to build a DevISO and then patch apinger yourself.

          Good luck.

          --luis

            gergero
            last edited by Dec 23, 2010, 10:55 AM

            I use pfSense 1.2.3 embedded - can you recommend a workaround to automate restart of apinger once it has exited?

            Regards,
            gergero
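
For the restart workaround asked about here, one possible shape is a small supervisor that relaunches a program whenever it dies. The sketch below is only an illustration under assumptions: `supervise` is a hypothetical helper, not part of pfSense or apinger, and it only works if the monitored program stays in the foreground as a direct child (a daemon that detaches itself would need something like a PID-file check instead). It also assumes you have a way to compile it for your box.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] as a child and relaunch it each time it exits abnormally
 * (non-zero exit status or killed by a signal such as SIGABRT).
 * Stops after a clean exit or after max_restarts relaunches.
 * Returns the number of restarts performed, or -1 on fork/waitpid failure.
 */
static int supervise(char *const argv[], int max_restarts, unsigned delay_s)
{
    int restarts = 0;

    for (;;) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            return -1;
        }
        if (pid == 0) {
            execv(argv[0], argv); /* only returns if exec failed */
            _exit(127);
        }

        int status;
        if (waitpid(pid, &status, 0) < 0) {
            perror("waitpid");
            return -1;
        }
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return restarts; /* clean exit: nothing to restart */
        if (restarts >= max_restarts)
            return restarts; /* give up instead of looping forever */

        restarts++;
        sleep(delay_s); /* short pause so a crash loop does not spin */
    }
}
```

A caller would pass the full path and arguments of the program to keep alive, for example an argv array starting with the path to the monitoring binary, plus a restart limit and a delay in seconds. Fixing the abort() in apinger itself remains the cleaner solution; this only papers over it.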

              lsoltero
              last edited by Dec 23, 2010, 3:38 PM

              The best and simplest way to address this is to apply the patch to apinger.c, recompile, and then swap out the apinger executable that comes with 1.2.3-RELEASE for the new version.

              I have uploaded a patched apinger binary for pfSense 1.2.3 here:

              http://www.globalmarinenet.com/downloads/wxa/apinger

              You are welcome to use that if you like. Download the new apinger and copy it to /usr/local/sbin on your box.

              My understanding is that maintenance on 1.2.3 has stopped, so you will need to apply the patch manually.

              take care.

              --luis

                jeebsion
                last edited by Dec 23, 2010, 3:43 PM

                lsoltero,

                by any chance, can I apply that file you uploaded to v2BETA4?

                regards,

                Najib

                  lsoltero
                  last edited by Dec 23, 2010, 3:59 PM

                  I don't think so, but you can try… I have no experience with 2.0, but I did look at the apinger.c code in pfTools and it is quite different from the one in the FreeBSD ports. I did patch and compile the 2.0 version and tried to run it on 1.2.3, and watched it core dump on startup.

                  Here is the patch for the 2.0 version in pfTools:


                  --- apinger.c 2010-12-21 08:41:44.000000000 +0000
                  +++ apinger.c.new 2010-12-23 15:54:35.000000000 +0000
                  @@ -805,7 +805,6 @@
                          time_t tm;
                          int i,qp,really_lost;
                          char *buf1,*buf2;
                  -       int err=0;

                          if (config->status_file==NULL) return;

                  @@ -867,7 +866,7 @@
                                  if (t->recently_lost!=really_lost){
                                          logit("Target \"%s\": Lost packet count mismatch (%i(recently_lost) != %i(really_lost))!",t->name,t->recently_lost,really_lost);
                                          logit("Target \"%s\": Received packets buffer: %s %s\n",t->name,buf2,buf1);
                  -                       err=1;
                  +                       t->recently_lost = really_lost = 0;
                                  }
                                  free(buf1);
                                  free(buf2);
                  @@ -875,7 +874,6 @@
                                  fprintf(f,"\n");
                          }
                          fclose(f);
                  -       if (err) abort();
                   }

                   void main_loop(void){


                  You can download a binary version of this from
                  http://www.globalmarinenet.com/downloads/wxa/apinger2.0

                  You will need to download this and copy it to /usr/local/sbin/apinger on your box. Note that I can't test this since I am not running 2.0. This version of apinger was compiled under FreeBSD 7.3, so I'm not sure whether it will run under 8.0. Try it out and let us know if it works...

                  --luis

                    jeebsion
                    last edited by Dec 23, 2010, 4:22 PM

                    luis,

                    I saw your bug post on redmine. Perhaps this issue will be fixed in the next upgrade image (tomorrow). Waiting for that, I shall :-)

                      gergero
                      last edited by Dec 23, 2010, 4:58 PM

                      @lsoltero:

                      I have uploaded a version of the patch apinger for pfSense 1.2.3 to here…

                      THANK YOU!!!
