CARP notifications: more than 300 emails for a single CARP switch

  • Hello,
        I'll try to explain a problem I have in my production environment… I have about 80 VIPs managed via CARP... when something happens and the two routers switch CARP master/backup roles, I receive:

    1. When the switch happens, about 80 emails from the new master and 80 from the old master
    2. When the roles are restored, another 80 emails per machine

    In the end it's about 300 emails for a single role switch and restore...

    What do you think about sending one notification per "switch" rather than one for "each IP switched"? The body of the email could list the VIPs that generated the alert...


  • Rebel Alliance Developer Netgate

    Unfortunately it's not quite that simple. No single thing gets triggered to tell the system as a whole that a switch has occurred; each VIP generates its own independent event to the OS.

    We have toyed with the idea of summarizing the notifications though, so they come in batches every minute or two, rather than immediately, to help avoid spamming like that.

  • Well… it could work like this. In the global "send notification" function, the system could:

    1. Create a table with two fields: date/time and text of the notification. If the table already exists, do nothing
    2. Create a record with the date/time and the text of the notification
    3. Create a cron job that will execute a "send all the notifications" task after 2 minutes. If the job already exists, just reschedule it to "after 2 minutes"
    4. Just consider the notification as sent

    The cron job, once launched, will send all the notifications in the table in one email, then delete the records in the "Alerts table".

    The only difficult thing is to create a lock/unlock semaphore for the table creation and deletion.

    What do you think about that? :)
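
    A minimal sketch of the table idea above, under some loud assumptions: SQLite is used as the store (the post does not name a database), `send` is a hypothetical callback standing in for the mailer, and the 2-minute scheduling is left out. With a database, the transaction locking takes over the role of the hand-made lock/unlock semaphore.

```python
import sqlite3
import time

# Assumed schema: one row per notification (date/time + text).
SCHEMA = "CREATE TABLE IF NOT EXISTS alerts (ts REAL NOT NULL, text TEXT NOT NULL)"


def queue_alert(db_path, text, now=None):
    """Create the table if needed and store one notification record."""
    con = sqlite3.connect(db_path)
    try:
        with con:  # commits on success, rolls back on error
            con.execute(SCHEMA)
            con.execute(
                "INSERT INTO alerts (ts, text) VALUES (?, ?)",
                (time.time() if now is None else now, text),
            )
    finally:
        con.close()


def flush_alerts(db_path, send):
    """Send all queued records as one message, then delete them.

    Returns the number of records sent. `send` is a hypothetical
    callable that takes the message body as a string.
    """
    con = sqlite3.connect(db_path, isolation_level=None)  # manual transactions
    try:
        con.execute(SCHEMA)
        # Take the write lock up front so no record can slip in between
        # the SELECT and the DELETE.
        con.execute("BEGIN IMMEDIATE")
        try:
            rows = con.execute(
                "SELECT ts, text FROM alerts ORDER BY ts, rowid"
            ).fetchall()
            if rows:
                body = "\n".join(
                    time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(ts)) + " " + text
                    for ts, text in rows
                )
                send(body)
                con.execute("DELETE FROM alerts")
            con.execute("COMMIT")
        except Exception:
            con.execute("ROLLBACK")  # keep the records if sending failed
            raise
    finally:
        con.close()
    return len(rows)
```

    Here 80 calls to `queue_alert` followed by one `flush_alerts` would produce a single message with 80 lines.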


  • Rebel Alliance Developer Netgate

    Overcomplicated. :-)

    Simply dump the alerts with a timestamp into a text file. After X minutes, send the contents of the text file and delete it.

    No need for a table or database or anything similar. Sure, it could be used, but sounds like overkill to me.
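
    The text-file approach could look roughly like the sketch below. The function names, the spool path and the `send` callback are all hypothetical; this is not pfSense's actual notification code. Something external (a cron job, for example) is assumed to call `flush_alerts` every X minutes.

```python
import os
import time


def queue_alert(spool_path, text, now=None):
    """Append one timestamped alert line to the spool file."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    with open(spool_path, "a", encoding="utf-8") as spool:
        spool.write(f"{stamp} {text}\n")


def flush_alerts(spool_path, send):
    """Send the whole spool file as one message body, then delete it.

    Returns the number of alert lines sent (0 if nothing was queued).
    `send` is a hypothetical callable taking the body as a string.
    """
    try:
        with open(spool_path, encoding="utf-8") as spool:
            body = spool.read()
    except FileNotFoundError:
        return 0  # nothing queued since the last run
    if not body:
        return 0
    send(body)
    # Caveat: an alert appended between the read above and this delete
    # would be lost; this simple version does no locking.
    os.remove(spool_path)
    return len(body.splitlines())
```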

  • mmmhhh… yes, today was probably a hard day for me, so I came up with complicated stuff...

    Yes, the idea of "just use a text file" is nice. The only problem is that it does not rule out pfSense sending multiple emails instead of just one (if you don't also set up a cron job)... or, on the other hand, you could have a cron job always running (every 1-2 minutes) even when there are no notifications to send.

    Combining both solutions, it could work like this. When the "send notification" function is called, it will:

    1. Create the text file, or append to it, with the date/time and the text of the notification
    2. Create a cron job that will execute a "send all the notifications" task after 2 minutes. If the job already exists, do nothing
    3. Just consider the notification as sent

    Then the Cron job will just:

    1. Send the notification using the text file as body of the email
    2. Delete the text file
    3. Delete the Cron job

    Is it simple enough? ;)
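
    The two lists above can be sketched as follows. This is only an illustration: a marker file next to the spool stands in for "the job already exists", and `schedule_flush` and `send` are hypothetical callbacks for "arm something that runs in 2 minutes" and "send the email".

```python
import os
import time


def notify(spool_path, text, schedule_flush, now=None):
    """Queue one alert; arm the delayed send only if none is pending.

    Returns True if this call scheduled the job, False otherwise.
    """
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    with open(spool_path, "a", encoding="utf-8") as spool:
        spool.write(f"{stamp} {text}\n")
    try:
        # O_EXCL makes creation fail if the marker already exists, so
        # only the first alert of a burst schedules the job.
        fd = os.open(spool_path + ".pending", os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    schedule_flush()
    return True


def flush(spool_path, send):
    """What the delayed job would run: send one email, then clean up."""
    # Drop the marker first: an alert arriving from now on arms a new
    # job, which at worst finds nothing left to send.
    try:
        os.remove(spool_path + ".pending")
    except FileNotFoundError:
        pass
    batch = spool_path + ".sending"
    try:
        os.replace(spool_path, batch)  # later alerts start a fresh file
    except FileNotFoundError:
        return 0
    with open(batch, encoding="utf-8") as f:
        body = f.read()
    send(body)
    os.remove(batch)
    return len(body.splitlines())
```

    With this shape, a burst of 80 alerts calls `schedule_flush` once and ends up as one message.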

  • Rebel Alliance Developer Netgate

    No need to add/remove the cron job so much; it can be there all the time if needed, or the "at" command could be used to set a timer, e.g.

        echo '/usr/local/bin/send_all_notification_mail.php' | at now +2 minutes

    An always-present cron job would be fine though, to constantly check for notifications and send them if needed.

    EDIT: we apparently don't include all of the binary bits for 'at' to work correctly, so a simple cron job would suffice.

  • Ok… I didn't know that adding/removing a Cron job with a max. frequency of 2 minutes could be a problem or something to avoid...

    Also, what do you think is better: using a single text file to store the notifications, or using separate small text files, one for each notification? I am just worried that when 300 or more notifications are generated in a few seconds there could be some kind of problem in managing the single text file...
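
    For the single-file option, one possible way to cope with a burst is an advisory lock around each append, sketched below. This assumes a Unix-like system where `flock` is available and that every writer and the sender cooperate by taking the lock; the function names are hypothetical.

```python
import fcntl


def append_alert(spool_path, line):
    """Append one alert while holding an exclusive lock on the file."""
    with open(spool_path, "a", encoding="utf-8") as spool:
        fcntl.flock(spool, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            spool.write(line + "\n")
            spool.flush()  # make sure the data is written before unlocking
        finally:
            fcntl.flock(spool, fcntl.LOCK_UN)


def take_batch(spool_path):
    """Return all queued lines and empty the file, under the same lock.

    The file is truncated rather than deleted, so a writer that already
    has it open still appends to the file the next batch will read.
    """
    try:
        spool = open(spool_path, "r+", encoding="utf-8")
    except FileNotFoundError:
        return []
    with spool:
        fcntl.flock(spool, fcntl.LOCK_EX)
        lines = spool.read().splitlines()
        spool.seek(0)
        spool.truncate()
        # Closing the file at the end of the with-block releases the lock.
    return lines
```

    Several writers appending at once then end up with whole, non-interleaved lines, and the sender gets them all in one batch.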