Dual WAN failover, PAP2T and asterisk won't register unless reset states

hoba

That is correct. However, keep in mind that if you just kill ALL the states, you will also kill established connections on other interfaces (for example, connections already running from LAN to WAN2 when WAN1 fails). Depending on how reliable your WANs usually are, this should not happen too often. Have a look at Status > System Logs, Load Balancer tab, to see how reliable your WANs have been in the past.

ilko

OPT1 (ADSL) is playing up once per day, usually at the end of office hours (office next to us turning on their alarm…who knows?). It happens for a minute or two, but that's enough for all our VoIP phones to go crazy for that period. For now I just moved them to WAN1FailsToWAN2, but resetting all states would be a problem.

How do I find out in a script which interface has gone DOWN, so that I can reset only its states?

Another thing: could this indicate a problem?

check_reload_status.log

02-18-2008_at_091500 There appears to be 2 or more check_reload_status processes. Forcing kill and restart of all now...
02-18-2008_at_130000 There appears to be 2 or more check_reload_status processes. Forcing kill and restart of all now...
02-18-2008_at_153000 There appears to be 2 or more check_reload_status processes. Forcing kill and restart of all now...
02-18-2008_at_204000 There appears to be 2 or more check_reload_status processes. Forcing kill and restart of all now...
02-19-2008_at_002500 There appears to be 2 or more check_reload_status processes. Forcing kill and restart of all now...
02-19-2008_at_090001 There appears to be 2 or more check_reload_status processes. Forcing kill and restart of all now...
02-19-2008_at_161500 There appears to be 2 or more check_reload_status processes. Forcing kill and restart of all now...
02-19-2008_at_165500 There appears to be 2 or more check_reload_status processes. Forcing kill and restart of all now...
02-20-2008_at_150000 There appears to be 2 or more check_reload_status processes. Forcing kill and restart of all now...
02-21-2008_at_043000 There appears to be 2 or more check_reload_status processes. Forcing kill and restart of all now...
02-21-2008_at_152500 There appears to be 2 or more check_reload_status processes. Forcing kill and restart of all now...
02-21-2008_at_193500 There appears to be 2 or more check_reload_status processes. Forcing kill and restart of all now...
02-21-2008_at_211001 There appears to be 2 or more check_reload_status processes. Forcing kill and restart of all now...
02-21-2008_at_215501 There appears to be 2 or more check_reload_status processes. Forcing kill and restart of all now...
02-22-2008_at_053000 There appears to be 2 or more check_reload_status processes. Forcing kill and restart of all now...

parrotscience

I'd like a solution for this too. When my DSL connection (my main one) goes down, my SIP connections stay stuck on the DSL line instead of moving over to my cable connection (the backup). I was using the reset states as well, but would like something a little more automatic.

ilko

This is what I tried yesterday. Mind you, Linux/BSD/pfSense is pretty new to me, and I am not a programmer at all ::)

/conf/config.xml

 <system>...
....
<afterfilterchangeshellcmd>/usr/local/bin/reset_states.sh</afterfilterchangeshellcmd></system>

/usr/local/bin/reset_states.sh (chmod 755)

#!/bin/sh
sleep 70
/sbin/pfctl -F state
sleep 40
/sbin/pfctl -F state

I had to reset twice because those PAP2 devices register every 30 seconds. If the next registration does not come right after resetting the states, the device uses the old gateway again, or for some reason some of them won't re-register. Resetting twice seems to work for now. Hmm, is there anything else to be flushed? NAT rules? Why are connections established on the failed gateway even after resetting states?

What I saw: all devices register via WAN, which fails and comes up again in 2 seconds. Loss of audio is only for these 2 seconds. Resetting states later seems NOT to break audio, but that is to be reconfirmed.

If WAN fails for longer, it takes a 1:50 min wait until all devices are re-registered. I prefer it that way, instead of resetting states and moving to another gateway for every 1-2 second line break. Also, resetting states causes the firewall rules about preferred gateway order to be reapplied.
I didn't have much time to test yesterday; during the week I will play with it more and report. At least I can start with something, thanks for the clues :)

eri--

Try pfctl -F all -i {$interface_that_goes_down}
It is better and should avoid having to run it twice.
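
A minimal sketch of the per-interface flush suggested above, assuming the name of the failed interface is already known and passed in as an argument (the thread does not say how to obtain it). Note that this sketch swaps in `-F states` for `-F all`, on the assumption that only the state table needs clearing; `-F all` also flushes rules and NAT. The `PFCTL` override variable and the function name are hypothetical, added only so the command can be dry-run.

```shell
#!/bin/sh
# Sketch: flush pf states on one interface only.
# PFCTL may be overridden for a dry run (hypothetical convention, not from the thread).
PFCTL="${PFCTL:-/sbin/pfctl}"

flush_iface_states() {
    iface="$1"
    if [ -z "$iface" ]; then
        echo "usage: flush_iface_states <interface>" >&2
        return 1
    fi
    # -i restricts the operation to the named interface;
    # -F states flushes only the state table, leaving rules and NAT alone.
    $PFCTL -i "$iface" -F states
}
```

For example, `flush_iface_states fxp1` would flush only the states bound to `fxp1` (a placeholder interface name), and `PFCTL="echo pfctl" flush_iface_states fxp1` just prints the command instead of running it.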

ilko

Umm, is $interface_that_goes_down a system variable, or do I need to replace it with something? If the latter, how do I find out which interface went down?

sullrich

Look in /var/db/pingstatus

Each monitored item will appear there. Simply look for DOWN in the files. You could easily parse each file looking for DOWN and then resolve the IP back to the interface.
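
A rough sketch of the parsing step described above. The layout of /var/db/pingstatus is an assumption here, not something the thread confirms: one file per monitored IP, named after that IP, containing the word DOWN when the monitor has failed. The function name is hypothetical, and mapping the IP back to an interface is left out because it depends on the local configuration.

```shell
#!/bin/sh
# Sketch: print the names of monitor files that currently contain "DOWN".
# Assumed layout (not confirmed by the thread): one file per monitored IP.
list_down_monitors() {
    dir="${1:-/var/db/pingstatus}"
    for f in "$dir"/*; do
        # Skip the unexpanded glob when the directory is empty or missing.
        [ -f "$f" ] || continue
        if grep -q "DOWN" "$f"; then
            basename "$f"
        fi
    done
}
```

Calling `list_down_monitors` with no argument reads /var/db/pingstatus; if the directory is empty, it prints nothing.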

ilko

That directory is empty, same as pingmsstatus. No such files in /var/db.

ilko

@ermal:

Try pfctl -F all -i {$interface_that_goes_down}
Is better and should avoid running it twice.

Using
#!/bin/sh
sleep 5
/sbin/pfctl -F all

causes no new states to be created. Diagnostics: Show States reports "No states were found."

back to

#!/bin/sh
sleep 60
/sbin/pfctl -F state
sleep 40
/sbin/pfctl -F state

This also means that when WAN or OPT1 comes back online, all connections use their preferred gateway again, which is good.
If we reset states on the failed gateway only, that will not happen.
I need more time to study the negative effects of resetting states.

ilko

It has been working fine in the 3 weeks since I added this, although OPT1 has failed only a few times during office hours. No complaints about loss of internet or failed PAP2T devices so far. I will stick with this workaround until a better solution comes up.

In short:

Change /conf/config.xml

 <system>...
....
<afterfilterchangeshellcmd>/usr/local/bin/reset_states.sh</afterfilterchangeshellcmd></system>

Create /usr/local/bin/reset_states.sh

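#!/bin/sh
# Wait, then flush all pf states.
sleep 60
/sbin/pfctl -F state
# Flush a second time to catch PAP2 devices that re-registered via the old gateway in between.
sleep 40
/sbin/pfctl -F state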

chmod 755 /usr/local/bin/reset_states.sh