Multi-WAN Failover States for VoIP $300

kapara

Problem:

When WAN1 fails and WAN2 takes over VoIP calls for the cloud Mitel system, everything is fine. When WAN1 comes back up, sessions are established over both WAN1 and WAN2: the phones ring, but neither party can hear the other.

Solution:

When phones connected to WAN1 fail over to WAN2 and then fail back, I want all of the phones' states on WAN2 to be reset/killed. The phones will be on their own subnet, attached to either a physical interface or a virtual interface.

Being able to specify either an interface or a subnet in relation to a specific gateway is fine. Another option is to specify an IP range whose connections are killed when the primary gateway comes back up. I usually set up two gateway groups so that each WAN can fail over to the other if needed, but I only need states reset for the phones when failing back to WAN2, as in the example below.

Wan1Wan2: Computers (WAN1 primary, WAN2 secondary)

Wan2Wan1: Phones (WAN2 primary, WAN1 secondary)

As others have had this issue, I am hoping that it will either be integrated as a feature in a future release or be easy to implement, so that deploying to multiple units is not difficult. It must of course be rock solid and not prone to failure, and easy to edit if a change of IP or interface is required.

Doing this via the Diagnostics: Execute command menu is fine as long as there are directions for inserting and removing it easily.

kapara

Is this even possible?

cmb

There's a feature request open for pretty close to what you're looking for.
https://redmine.pfsense.org/issues/855

No plans to implement at this time, though that should be more practical going forward with the replacement of apinger with dpinger in 2.3. apinger's bad math issues would probably have caused serious problems with excessive state killing if such a feature had existed before. If someone wanted to submit a pull request, we'd get it merged post-2.3 release.

The most practical way of doing that would be adding an option to kill all states once a gateway comes back up - same as what happens when one goes down if that's enabled. Doing it for specific IPs is significantly more complicated.

kapara

Hoping to find someone who can create some kind of script for this. All my phones will be on a specific interface or virtual interface.

All I am looking for is if the following occurs:

1. The primary interface (gateway group) for the VoIP VLAN goes down and the phones fail over to the secondary gateway. (This works perfectly!)

2. The primary gateway comes back up and the phones fail back. (This works, with issues: the states do not move.)

I need a script for this.

The system checks whether both gateways are up/up and kills any states on the secondary gateway that are associated with a specific interface, virtual interface, or subnet.
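A rough sketch of that check, in Python only for illustration. The gateway names, the subnet, and the function names are all hypothetical, and how the up/down status is obtained on a real pfSense box is not shown. It assumes `pfctl -k <network>` (kill states originating from that network) is acceptable. That kills every state sourced from the phone subnet, not only those on the secondary gateway, and it only fires on the down-to-up transition of the primary so that states are not killed over and over while both gateways are up.

```python
import ipaddress


def build_kill_command(phone_subnet):
    """Return the pfctl argv that kills states originating from phone_subnet."""
    # Validate the subnet first so a typo never reaches pfctl.
    net = ipaddress.ip_network(phone_subnet, strict=True)
    return ["pfctl", "-k", str(net)]


def plan_failback_reset(previous, current, primary, secondary, phone_subnet):
    """Return a pfctl command when the primary has just recovered, else None.

    previous/current map a gateway name to "up" or "down" (hypothetical
    format; fill it from whatever gateway monitoring is available).
    """
    primary_recovered = (
        previous.get(primary) == "down" and current.get(primary) == "up"
    )
    both_up = current.get(primary) == "up" and current.get(secondary) == "up"
    if primary_recovered and both_up:
        return build_kill_command(phone_subnet)
    return None


if __name__ == "__main__":
    cmd = plan_failback_reset(
        {"GW_PRIMARY": "down", "GW_SECONDARY": "up"},
        {"GW_PRIMARY": "up", "GW_SECONDARY": "up"},
        "GW_PRIMARY",
        "GW_SECONDARY",
        "10.0.20.0/24",  # example phone subnet, not from this thread
    )
    # The command is only printed here; running it would need root on the firewall.
    print(cmd)
```

The command is returned as a list rather than executed, so it can be reviewed or logged before anything is actually killed.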

I really need to find a solution to this as soon as possible.

kapara

Since there have been no takers on this, I have posted a job on Upwork to try to get this done. If the script works well, it would be great to see whether it could be incorporated into pfSense!

https://www.upwork.com/job/freebsd-script-kill-states-pfsense-firewall_~013d5c1aed488b4691/

luckman212

kapara, did you hire someone for this yet? I am making some progress on hacking gwlb.inc and rc.gateway_alarm but hitting some roadblocks. Same situation as you: trying to be sure that all phones are on the primary gateway after a fail-back.

I would be willing to donate $200-250 towards this project as well.

Brutal

I also work with Mitel Teleworker phones. Killing the states to switch back to the primary as soon as it comes back up is not ideal. It would unnecessarily interrupt any phone calls in progress.

A cron job to reset states in the middle of the night may be a better solution to get the phones back to the primary to minimize disruption.
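If the reset were driven by a script instead of a fixed cron time, the same "only at night" rule could be a simple time-window check. A minimal sketch, assuming a 02:00 to 04:00 quiet window (the hours and the function name are made up for illustration):

```python
from datetime import time


def in_quiet_window(now, start=time(2, 0), end=time(4, 0)):
    """Return True if `now` falls inside the quiet window [start, end)."""
    if start <= end:
        # Ordinary window within one day, e.g. 02:00-04:00.
        return start <= now < end
    # Window that wraps past midnight, e.g. 23:00-01:00.
    return now >= start or now < end


if __name__ == "__main__":
    print(in_quiet_window(time(3, 0)))
    print(in_quiet_window(time(12, 0)))
```

A script would defer the state reset until this returns True, so calls during business hours are left alone.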

What I think is needed, and sorely lacking in the failover capability, is a way to force all communication from a given internal source address out the same interface. The phones hold a TCP control connection whose state stays up until the phone is rebooted. After failover, that state goes out the secondary. Once the primary comes back up, the TCP control session is still going out the secondary, but each phone call sets up a new UDP session for voice RTP. Because the primary is back up, the router sends the new voice session out the primary. Audio through the security proxy (Mitel Border Gateway) doesn't work because it is now receiving RTP packets from a different IP address than the one used by the TCP control session.

Same principle as an SSL connection to a bank: the communication cannot be split between two different NATed IP addresses.
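To make the split concrete, here is a toy model of a state table (not pfSense code; the class, flow labels, and IP address are invented for illustration). It assumes an existing state keeps using the WAN it was created on, while a new flow goes to the primary whenever the primary is up.

```python
class ToyRouter:
    """Very simplified dual-WAN router with a per-flow state table."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.up = {primary: True, secondary: True}
        self.states = {}  # (src_ip, flow_label) -> WAN the state is bound to

    def set_gateway(self, wan, is_up):
        self.up[wan] = is_up

    def send(self, flow):
        """Return the WAN a packet of `flow` leaves on, creating state if new."""
        if flow not in self.states:
            wan = self.primary if self.up[self.primary] else self.secondary
            self.states[flow] = wan
        return self.states[flow]

    def kill_states(self, src_ip):
        """Drop every state whose source address is src_ip."""
        self.states = {f: w for f, w in self.states.items() if f[0] != src_ip}


def demo():
    phone = "10.0.20.15"  # example address only
    control = (phone, "tcp-control")
    rtp = (phone, "udp-rtp")

    r = ToyRouter("WAN1", "WAN2")
    r.set_gateway("WAN1", False)   # primary fails
    during_outage = r.send(control)  # control session created on the secondary
    r.set_gateway("WAN1", True)    # primary recovers
    after_recovery = (r.send(control), r.send(rtp))  # old state vs. new flow

    r.kill_states(phone)           # reset the phone's states
    after_kill = (r.send(control), r.send(rtp))
    return during_outage, after_recovery, after_kill


if __name__ == "__main__":
    print(demo())
```

In this model the control session and the RTP stream end up on different WANs after recovery, and both land on the same WAN again once the phone's states are killed and the phone reconnects.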

I haven't found "Sticky Connections" to fix this in the past.

luckman212

It would be great if the "state killing" function could be defined per-interface or per-VLAN. Typically you should have VoIP on its own VLAN anyway, and that would make the state killing easier and less disruptive to other services that might be running on other subnets.

kapara

Yes, I hired someone and he created a script. Worked like a charm! It was only deployed to one location so far, and then there was an SSD failure! I have to re-engage the engineer, but I can forward his info to you.

Would be great if this could be officially supported!