    Bug in apinger halts load balancing and failover

    Routing and Multi WAN
    • L
      lsoltero last edited by

      on pfSense 1.2.3-RELEASE

      Running in failover mode between WAN and OPT1, I noticed that once in a while monitoring of the pool stops after the following error appears in slbd.log:

      Dec 18 01:15:53 webxaccelerator apinger: 208.67.222.222: Lost packet count mismatch (-20!=0)!
      Dec 18 01:15:53 webxaccelerator apinger: 208.67.222.222: Received packets buffer: ################################################## ####################

      ps aux | grep apinger shows that apinger is no longer running. This causes failover and load balancing to stop working, since there is no process left monitoring the interfaces.

      Looking at the source of apinger.c, around line 854, we see that apinger sets an error flag when this mismatch is detected and then calls abort():

                      if (t->recently_lost!=really_lost){
                              fprintf(f,"  lost packet count mismatch (%i!=%i)!\n",t->recently_lost,really_lost);
                              logit("%s: Lost packet count mismatch (%i!=%i)!",t->name,t->recently_lost,really_lost);
                              logit("%s: Received packets buffer: %s %s\n",t->name,buf2,buf1);
                              err=1;
                      }
                      free(buf1);
                      free(buf2);

                      fprintf(f,"\n");
              }
              fclose(f);
              if (err) abort();

      Patching apinger.c as follows

      vmmail3# diff apinger.c apinger.c.orig
      858,859c858
      < t->recently_lost = really_lost = 0;
      < // err=1;
      ---
      > err=1;

      prevents apinger from exiting on error. Load balancing and failover now work as expected even when a condition occurs to flag this error.
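      To make the before/after behavior concrete, here is a minimal, self-contained sketch of the check. It is not the apinger source: `struct target_sketch`, `count_lost` and `check_lost_count` are hypothetical names, and treating `#` as a received reply and `.` as a lost one is an assumption based on the log output above. The only point is that a mismatch gets logged and the counter reset, instead of the process aborting.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, trimmed-down stand-in for apinger's per-target state. */
struct target_sketch {
    const char *name;
    int recently_lost;   /* running loss counter kept while pinging */
};

/* Recount losses from a received-packets buffer.
 * Assumption: '#' marks a received reply, '.' marks a lost one. */
static int count_lost(const char *buf)
{
    int lost = 0;

    assert(buf != NULL);
    for (; *buf != '\0'; buf++) {
        if (*buf == '.')
            lost++;
    }
    return lost;
}

/* Compare the running counter with the recount.
 * On mismatch: log it and reset the counter (as the patch does)
 * instead of aborting. Returns 1 if a mismatch was repaired, else 0. */
static int check_lost_count(struct target_sketch *t, const char *buf)
{
    int really_lost = count_lost(buf);

    if (t->recently_lost != really_lost) {
        fprintf(stderr, "%s: Lost packet count mismatch (%i!=%i)!\n",
                t->name, t->recently_lost, really_lost);
        t->recently_lost = 0;   /* mirrors "t->recently_lost = really_lost = 0;" */
        return 1;
    }
    return 0;
}
```

      With a counter that has drifted to -20 and an all-`#` buffer, the first call to `check_lost_count` logs the mismatch and resets the counter; a second call on the same buffer finds nothing to report.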

      Dec 18 20:52:37 webxaccelerator apinger: 208.67.222.222: Lost packet count mismatch (-21!=0)!
      Dec 18 20:52:37 webxaccelerator apinger: 208.67.222.222: Received packets buffer: ################################################## ####################
      Dec 18 21:05:55 webxaccelerator apinger: ALARM: 208.67.220.220(208.67.220.220)  *** down ***
      Dec 18 21:06:03 webxaccelerator apinger: alarm canceled: 208.67.220.220(208.67.220.220)  *** down ***

      Hope this helps.

      --luis

      • L
        lsoltero last edited by

        So I have looked at this a little more closely.  The version of apinger included in the pfPorts seems to have the same issue.  Basically if an inconsistency is found in the number of packets lost then apinger exits.  In my mind apinger should ** NEVER ** exit.

        It seems that the apinger in pfPorts is used when building pfSense 2.0.  1.2.3-RELEASE uses the FreeBSD ports version.

        Following is a patch against the FreeBSD ports version of apinger that resolves my issues with failover pools halting when inconsistent packet loss is detected.  I don't currently do any work with 2.0 but it would be good if one of the maintainers applied the following patch to apinger included in pfPorts.

        --- apinger.c  2010-12-21 08:47:22.000000000 +0000
        +++ apinger.c.new      2010-12-21 08:47:15.000000000 +0000
        @@ -787,7 +787,6 @@
        time_t tm;
        int i,qp,really_lost;
        char *buf1,*buf2;
        -int err=0;

        if (config->status_file==NULL) return;

        @@ -855,7 +854,7 @@
                                fprintf(f,"  lost packet count mismatch (%i!=%i)!\n",t->recently_lost,really_lost);
                                logit("%s: Lost packet count mismatch (%i!=%i)!",t->name,t->recently_lost,really_lost);
                                logit("%s: Received packets buffer: %s %s\n",t->name,buf2,buf1);
        -                      err=1;
        +                      t->recently_lost = really_lost = 0;
                        }
                        free(buf1);
                        free(buf2);
        @@ -863,7 +862,6 @@
                        fprintf(f,"\n");
                }
                fclose(f);
        -      if (err) abort();
        }

        #ifdef FORKED_RECEIVER

        • J
          jeebsion last edited by

          lsoltero,

          No wonder the BETA4 I've been testing fell back to only one link after a while. I hope the port gets patched as soon as possible. Thank you for identifying this.

          regards ..

          • L
            lsoltero last edited by

            I posted a bug report here.
            http://redmine.pfsense.org/issues/1127

            Hopefully someone will look at it and take appropriate action.  Otherwise your only option is to build a DevISO and then patch apinger yourself.

            Good luck.

            --luis

            • G
              gergero last edited by

              I use pfSense 1.2.3 embedded - can you recommend a workaround to automate restart of apinger once it has exited?

              Regards,
              gergero

              • L
                lsoltero last edited by

                The best way to address this is to apply the patch to apinger.c, recompile, and then replace the apinger executable that ships with 1.2.3-RELEASE with the new build.  This is the simplest way to fix the issue.
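                If swapping the binary is not possible, the restart workaround asked about above could look roughly like the supervisor loop below. This is a sketch under assumptions, not something tested on pfSense: `run_once` and `supervise` are hypothetical helpers, the command to run is whatever you pass in, and it only works if the supervised program stays in the foreground (a program that daemonizes itself returns immediately and would just be relaunched over and over).

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] as a child and wait for it to finish.
 * Returns the raw wait status, or -1 if fork/waitpid failed. */
static int run_once(char *const argv[])
{
    int status = 0;
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(argv[0], argv);
        _exit(127);              /* only reached if execv failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}

/* Relaunch the command every time it exits.
 * max_launches == 0 means "keep going forever".
 * Returns the number of times the command was launched. */
static int supervise(char *const argv[], int max_launches, unsigned delay_s)
{
    int launches = 0;

    for (;;) {
        int status = run_once(argv);

        launches++;
        if (status == -1)
            fprintf(stderr, "supervisor: could not run %s\n", argv[0]);
        else if (WIFSIGNALED(status))   /* abort() shows up here as SIGABRT */
            fprintf(stderr, "supervisor: %s killed by signal %d\n",
                    argv[0], WTERMSIG(status));
        else
            fprintf(stderr, "supervisor: %s exited with status %d\n",
                    argv[0], WEXITSTATUS(status));

        if (max_launches > 0 && launches >= max_launches)
            break;
        sleep(delay_s);          /* avoid a tight respawn loop */
    }
    return launches;
}
```

                Keep in mind this only hides the symptom: monitoring is still down between the crash and the relaunch, so the patched binary remains the better fix.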

                I have uploaded a patched build of apinger for pfSense 1.2.3 here…

                http://www.globalmarinenet.com/downloads/wxa/apinger

                you are welcome to use that if you like.  Download the new apinger and copy it to /usr/local/sbin on your box.

                My understanding is that maintenance on 1.2.3 has stopped so you will need to apply the patch manually.

                take care.

                --luis

                • J
                  jeebsion last edited by

                  lsoltero,

                  by any chance, can I apply that file you uploaded to v2BETA4?

                  regards,

                  Najib

                  • L
                    lsoltero last edited by

                    I don't think so, but you can try…  I have no experience with 2.0, but I did look at the apinger.c code in pfTools and it is quite different from the one in the FreeBSD ports.  I did patch and compile the 2.0 version and tried to run it on 1.2.3, and watched it core dump on startup.

                    here is the patch for the 2.0 version in pfTools


                    --- apinger.c 2010-12-21 08:41:44.000000000 +0000
                    +++ apinger.c.new 2010-12-23 15:54:35.000000000 +0000
                    @@ -805,7 +805,6 @@
                    time_t tm;
                    int i,qp,really_lost;
                    char *buf1,*buf2;
                    -int err=0;

                    if (config->status_file==NULL) return;

                    @@ -867,7 +866,7 @@
                    if (t->recently_lost!=really_lost){
                    logit("Target \"%s\": Lost packet count mismatch (%i(recently_lost) != %i(really_lost))!",t->name,t->recently_lost,really_lost);
                    logit("Target \"%s\": Received packets buffer: %s %s\n",t->name,buf2,buf1);
                    -err=1;
                    +t->recently_lost = really_lost = 0;
                    }
                    free(buf1);
                    free(buf2);
                    @@ -875,7 +874,6 @@
                    fprintf(f,"\n");
                    }
                    fclose(f);
                    -if (err) abort();
                    }

                    void main_loop(void){


                    You can download a binary version of this from
                    http://www.globalmarinenet.com/downloads/wxa/apinger2.0

                    You will need to download this and copy it to /usr/local/sbin/apinger on your box.  Note that I can't test this since I am not running 2.0.  This version of apinger was compiled under FreeBSD 7.3, so I am not sure if it will run under 8.0.  Try it out and let us know if it works...

                    --luis

                    • J
                      jeebsion last edited by

                      luis,

                      I saw your bug post on redmine. Perhaps this issue will be rectified in the next upgrade image (tomorrow). Waiting for that, I shall :-)

                      • G
                        gergero last edited by

                        @lsoltero:

                        I have uploaded a version of the patch apinger for pfSense 1.2.3 to here…

                        THANK YOU!!!
