SIP registration timeout due to stale entry in pfsense state table

PeeZee

I just noticed something today which I thought I'd share to see if anyone else encountered this problem already:

I have an ADSL internet connection with a dynamic IP, pfsense as router/firewall and a debian based asterisk PBX sitting in the LAN.
Port forwarding and firewall rules are set up to allow UDP/TCP SIP traffic from my VoIP provider to reach my asterisk box.
This has all been working perfectly for a few months now, with one exception:

Every now and then, my SIP registration to my VoIP provider times out (I see this in the asterisk log) and it takes a while to re-register.
This can take from a few minutes to a few hours, and this weekend it took 2 days to get registered again.

Using tcpdump and wireshark I tracked the problem down to the change of IP address on my PPPoE ADSL line and a stale entry in the pfsense state table.

Since it is a dynamic IP subscription, every time the ADSL reconnects the IP changes.
However, somehow, the OLD IP address remains in the pfsense state table.
I believe this has to do with the asterisk box retrying the connection every 20 seconds, and keeping the old state alive in the state table.

Here is an example to make it more clear:

Last week I received IP address 77.109.121.166 from my ADSL provider through PPPoE.
My local asterisk PBX has IP 10.99.0.8, my VoIP provider (3starsnet) has IP 85.119.188.3
The pfsense state table (filtered on port 5060) contained:

udp  	10.99.0.8:5060 -> 77.109.121.166:5060 -> 85.119.188.3:5060  	MULTIPLE:MULTIPLE

All working great. Now, this weekend my ADSL reconnected, changing my IP address to 77.109.121.219.
But the state table still said:

udp  	10.99.0.8:5060 -> 77.109.121.166:5060 -> 85.119.188.3:5060  	MULTIPLE:MULTIPLE

so the returned SIP packets from my VoIP provider never reached my firewall or PBX.

After manually deleting that state entry from the firewall state table today using the X button on the right, a new state line appeared:

udp  	10.99.0.8:5060 -> 77.109.121.219:5060 -> 85.119.188.3:5060  	MULTIPLE:MULTIPLE

And the SIP registration succeeded right away.

Now the question is: why is pfsense keeping states alive with the wrong WAN IP address?
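
For anyone wanting to check for this condition from a shell, here is a minimal sketch. It assumes state lines use the three-address layout quoted above (LAN address, translated address, remote address, separated by `->`), optionally with a leading interface column. The function name `stale_states` is made up for illustration, and it only reports entries; it does not delete anything.

```sh
#!/bin/sh
# Print state lines whose translated (middle) address is not the current WAN IP.
# Assumed input layout, taken from the lines quoted in the post:
#   proto  lan_ip:port -> nat_ip:port -> remote_ip:port  STATE
# stale_states is a hypothetical helper name, not part of pfsense.
stale_states() {
    awk -v wan="$1" '
        {
            for (i = 2; i + 2 <= NF; i++) {
                # Look for "X -> NAT -> Y": two arrows with one field between.
                if ($i == "->" && $(i + 2) == "->") {
                    split($(i + 1), nat, ":")
                    if (nat[1] != wan) print
                    break
                }
            }
        }'
}
```

Usage would be something like `pfctl -ss | stale_states 77.109.121.219`, provided `pfctl -ss` prints this layout on your version. Lines without a translated address are ignored.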

fnbisson

I have exactly the same issue. Did you fix it?

I just flushed my states and now it's working great.

PeeZee

Nope, never got this fixed. The workaround is indeed flushing the state table, but having to do that often is quite annoying.
Luckily my ADSL connection doesn't disconnect that often so my WAN IP doesn't change often, but still…

I never got any replies here either, I'm not sure what I should try to get some dev attention here :-)

kongar

Have the same issue.
I tried restarting the SIP PBX (asterisk) to check whether it can be solved from the PBX side. It did not help.
A reliable automated solution is needed.

danswartz

Do you have asterisk set up to do 'qualify=yes' on the trunk? I believe that refreshes the SIP registration…

Perry

Might help

#!/bin/sh
#
# Clear VoIP phone state entries when the WAN IP changes.
#
# HowTo:
#       - From the pfSense shell, run ee
#       - Paste this code
#       - Change the values of ext_if, local_voip_ip and provider_voip_ip
#       - Press Esc, a, a
#       - Save as /usr/local/etc/rc.d/voipstate.sh
#       - chmod 744 /usr/local/etc/rc.d/voipstate.sh
#
# Cronjob:
#       - In the pfSense webGUI, go to Diagnostics -> Edit File
#       - Load /cf/conf/config.xml
#       - Under cron add
#			<minute>*/1</minute>
#			<hour>*</hour>
#			<mday>*</mday>
#			<month>*</month>
#			<wday>*</wday>
#			<who>root</who>
#			<command>/usr/local/etc/rc.d/voipstate.sh</command>
#       - Save the config.xml
#       - Reboot pfSense
#
ext_if="vlan5" # Enter your WAN NIC name, e.g. em0, vlan1
voip_file="/var/run/voip_file.ip"
local_voip_ip="192.168.1.199" # Enter your phone IP
provider_voip_ip="66.197.246.248" # Enter your VoIP provider's IP
EXIT_SUCCESS=0
EXIT_FAILURE=1

if [ "$(id -u)" -ne 0 ]
then
    echo "Only root may run this program."
    exit $EXIT_FAILURE
fi

usage(){
    echo "Usage: $0"
}

# Read the last recorded WAN IP and the one currently on the interface.
get_ip(){
    if [ -f "$voip_file" ]
    then
        registered_ip=$(cat "$voip_file")
    else
        registered_ip=""
    fi
    # Take only the first inet address in case the interface has several.
    current_ip=$(ifconfig "$ext_if" | awk '/inet / { print $2; exit }')
}

# Kill the states from the phone to the provider if the WAN IP changed.
update_hosts(){
    # Do nothing while the interface has no address (e.g. link down).
    if [ -n "$current_ip" ] && [ "$registered_ip" != "$current_ip" ]
    then
        logger "WAN ip address changed, clearing states entries.. "
        /sbin/pfctl -k "$local_voip_ip" -k "$provider_voip_ip"
        echo "$current_ip" > "$voip_file"
        logger "done."
    fi
}

#
# Main
#
get_ip
update_hosts
exit $EXIT_SUCCESS

voipstate.sh.txt

danswartz

Interesting idea. I don't think you need to edit the config.xml to set cron jobs though, since a package is available, no?

maxthetor

I have the same issue, but in my case the packets are not being translated.

my voip server = 10.0.0.9
my voip-trunk-server = 201.86.87.5 (vono)

Everything works fine for a while, then the SIP registration times out, forever.

voip01*CLI> sip show registry
Host        Username   Refresh   State          Reg.Time
VONO:5060   XXXXX      105       Request Sent   Sat, 09 Jan 2010 11:55:07

On my pfsense box, tcpdump on the WAN interface shows:

11:55:05.105388 IP 10.0.0.9.5060 > 201.86.87.5.5060: SIP, length: 584
11:55:07.106127 IP 10.0.0.9.5060 > 201.86.87.5.5060: SIP, length: 584

my nat rule.

Outbound NAT rules

nat on $wan from 10.0.0.9/32 to any -> 189.XX.XX.XX/32 static-port

PS: my IP 189.XX.XX.XX is a Virtual IP (routed)

my state table.

all udp 201.86.87.5:5060 <- 10.0.0.9:5060 NO_TRAFFIC:SINGLE
all udp 10.0.0.9:5060 -> 201.86.87.5:5060 SINGLE:NO_TRAFFIC

anyone have any ideas?
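
One thing that stands out in the capture above is that the packets on the WAN side still carry the private source address 10.0.0.9. A rough way to spot that automatically is sketched below. It assumes tcpdump's default one-line text output as shown above (timestamp, `IP`, source, `>`, destination); the function name `private_sources` is made up for illustration.

```sh
#!/bin/sh
# Print tcpdump lines whose source address is in an RFC 1918 private range.
# Assumed line layout (from the capture above):
#   11:55:05.105388 IP 10.0.0.9.5060 > 201.86.87.5.5060: SIP, length: 584
# private_sources is a hypothetical helper name.
private_sources() {
    awk '$2 == "IP" && $3 ~ /^(10\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)/'
}
```

Piping a WAN-side capture through it (for example `tcpdump -n -l -i <wan_if> udp port 5060 | private_sources`, with your own interface name) should print nothing when outbound NAT is being applied.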

gnail

@Perry:

Might help

Thank you Perry, that worked perfectly.

PS: This has been in development for a while now hasn't it. :)

PeeZee

@Perry:

Might help
…

Only noticed your post now, still had the problem this morning, tried the script now, works perfectly

Thanks!

Would you happen to know if this problem is addressed in 2.0 ? If not, your script may be a good starting point for a built-in solution.

Perry

http://redmine.pfsense.org/issues/show/8 should cover all dead states problems.

I've added the script to the fit123 package (cass).

g4m3c4ck

I had a similar problem to this, but a little different. Wasted half a day figuring out what the problem was, but hey, that's life...

My setup is as follows: Pfsense 1.2.3-Release, Single WAN Static IP, I have pfsense running everything through a single interface using VLANs.

I have my asterisk box and the VOIP phones in their own VLAN/subnet with all the proper inbound and outbound NAT setup and with everything working properly.

Well, on my LAN I foolishly opened my old asterisk test VM to see how I had some extensions configured, and it attempted to register with my VOIP Provider. I realized what was happening and shut down the VM. At this point I receive calls but there is no sound. I ran a packet capture on both interfaces and saw incoming RTP packets from the WAN and outgoing RTP packets from the Asterisk box on the VOIP VLAN.
All outgoing calls work fine. I check the asterisk box and it seems to still be registered with my VOIP Provider. Reboot the Asterisk box. Same problem. Clear all states in Pfsense. Same problem. Grr...I had been doing some work on the Asterisk box so I thought I found a bug or made a mistake. I simplify everything and make sure everything is working properly with Asterisk. Still didn't work.

Reboot Pfsense. Everything works again......

So I guess the dead state problem is with any pfsense setup with more than two interfaces?

Will Perry's little hack help me too?

danswartz

Hmmm, I just had this happen yesterday. My WAN IP changed, but for some reason the SIP registration entry didn't get punted. I deleted it manually and all was good. I looked at Perry's script, and while it looks fine, I sure think it would be nice if we could put scripts somewhere that they would be executed automatically when the filter is reloaded. I was looking at /etc/inc/filter.inc and saw that packages can put custom scripts in /usr/local/pkg/pf to do stuff like this - is there any reason we can't have a generic version of this? e.g. something like /usr/local/pf or some-such?

danswartz

Ironically, the dyndns component does exactly what I need, i.e. it detects that the WAN IP has changed and sends an update request to my dyndns account to update the name of my gateway. I was looking at where this gets called, and it is very specific to the dyndns code. When I was using clarkconnect as my gateway, there was a script (/etc/rc.local, if memory serves) that it would call when it detected that the WAN IP had changed, and you could hook whatever you wanted there. Maybe I am misreading the code, but it looks like pfsense doesn't make any attempt to detect this event and just calls the interface configuration code. I would dearly love the same functionality in pfsense (and would be happy to take a shot at coding it up, if needed).
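
To make the idea concrete, here is a rough sketch of what such a generic hook could look like. It is not how pfsense works today: the function name `run_hooks_if_changed`, the hook directory and the state file are all hypothetical. It compares the current WAN IP with the last one recorded and, on a change, runs every executable file in the hook directory with the old and new address as arguments.

```sh
#!/bin/sh
# Run every executable in a hook directory when the WAN IP differs from the
# last one recorded. All names and paths used here are hypothetical.
run_hooks_if_changed() {
    current_ip="$1"   # address currently on the WAN interface
    state_file="$2"   # file remembering the last address seen
    hook_dir="$3"     # directory holding user-supplied hook scripts

    previous_ip=""
    [ -f "$state_file" ] && previous_ip=$(cat "$state_file")

    # Nothing to do if the address is unknown or unchanged.
    [ -n "$current_ip" ] || return 0
    [ "$current_ip" != "$previous_ip" ] || return 0

    for hook in "$hook_dir"/*; do
        # Skip anything that is not an executable regular file.
        [ -f "$hook" ] && [ -x "$hook" ] || continue
        "$hook" "$previous_ip" "$current_ip"
    done

    # Remember the new address for the next run.
    printf '%s\n' "$current_ip" > "$state_file"
}
```

A hook dropped into that directory could then kill the SIP states, much like Perry's script does, without each script having to track the address itself.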

cmb

There is a custom option to pfctl that we have, -b IIRC, that kills all states on a specified interface. This is new in 2.0, and is supposed to be run whenever an IP changes, as well as after failover for a multi-WAN setup. There's a todo item open to test it: http://redmine.pfsense.org/issues/show/8. I suspect it has some outstanding issues.

States don't get deleted in 1.2.x when an IP changes or in any other scenario, so anything that stays active will retain the former NAT association.

danswartz

Hmmm, that is interesting, since I am running 2.0 (a snapshot from 4/24, IIRC.) I will take a look at this and see if it is not working right all the time. Thanks!

danswartz

So I guess the dead state problem is with any pfsense setup with more than two interfaces?

(From a couple of posts back by someone else). I ignored this initially, because I don't have multiwan, but I suddenly realized I do have more than two interfaces! I have LAN and WAN as always, but my workaround for the havp issues was to install havp on my freebsd server and use re2 (wan is re0 and lan re1) to talk to that server on a dedicated subnet. So, in fact I have 3 interfaces live. I'm wondering if that is a big clue. I am at work now, but I will take a look at this later and post my findings…

danswartz

Hmmm, I've tried unplugging the wan cable and plugging it back in 10 seconds or so later - I see warnings about the gateway being down, but it apparently didn't get a new IP - not sure how to force that to happen (I have a PPPoE WAN). I guess I can just wait a few days for it to change…

cmb

@danswartz:

Hmmm, I've tried unplugging the wan cable and plugging it back in 10 seconds or so later - I see warnings about the gateway being down, but it apparently didn't get a new IP - not sure how to force that to happen (I have a PPPoE WAN). I guess I can just wait a few days for it to change…

Depends on your ISP, but if you disconnect and reconnect PPPoE (reboot, or do so under Status>Interfaces) you'll usually get a new IP.

danswartz

Okay, maybe I'll give the reboot a try. My concern with rebooting the gateway was that I thought it might do too much stuff, so it might not prove anything if it did get a new IP (and also, there would not be any states to kill). I think when I get home today, I will try to kill PPPoE and reconnect it as you suggested. Thx!