SIP registration timeout due to stale entry in pfsense state table



  • I just noticed something today which I thought I'd share to see if anyone else encountered this problem already:

    I have an ADSL internet connection with a dynamic IP, pfsense as router/firewall and a debian based asterisk PBX sitting in the LAN.
    Port forwarding and firewall rules are set up to allow UDP/TCP SIP traffic from my VoIP provider to reach my asterisk box.
    This has all been working perfectly for a few months now, with one exception:

    Every now and then, my SIP registration to my VoIP provider times out (I see this in the asterisk log) and it takes a while to re-register.
    This can take from a few minutes to a few hours, and this weekend it took 2 days to get registered again.

    Using tcpdump and wireshark I tracked the problem down to the change of IP address on my PPPoE ADSL line and a stale entry in the pfsense state table.

    Since it is a dynamic IP subscription, every time the ADSL reconnects the IP changes.
    However, somehow, the OLD IP address remains in the pfsense state table.
    I believe this has to do with the asterisk box retrying the connection every 20 seconds, and keeping the old state alive in the state table.

    Here is an example to make it more clear:

    Last week I received IP address  77.109.121.166 from my ADSL provider through PPPoE.
    My local asterisk PBX has IP 10.99.0.8, my VoIP provider (3starsnet) has IP 85.119.188.3
    The pfsense state table (filtered on port 5060) contained:

    udp  	10.99.0.8:5060 -> 77.109.121.166:5060 -> 85.119.188.3:5060  	MULTIPLE:MULTIPLE
    

    All working great. Now, this weekend my ADSL reconnected, changing my IP address to 77.109.121.219.
    But the state table still said:

    udp  	10.99.0.8:5060 -> 77.109.121.166:5060 -> 85.119.188.3:5060  	MULTIPLE:MULTIPLE
    

    so the returned SIP packets from my VoIP provider never reached my firewall or PBX.

    After manually deleting that entry from the firewall state table today using the X button on the right, a new state line appeared:

    udp  	10.99.0.8:5060 -> 77.109.121.219:5060 -> 85.119.188.3:5060  	MULTIPLE:MULTIPLE
    

    And the SIP registration succeeded right away.

    Now the question is, why is pfsense keeping states alive with the wrong WAN IP address?
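
    For anyone who wants to spot this without eyeballing the table, here is a minimal sketch. It assumes the state lines look like the ones quoted above (protocol, source, ->, NAT address, ->, destination, state); raw pfctl output may be laid out differently depending on the version. The function name stale_states is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read state lines on stdin, in the layout quoted in this
# post, and print those whose NAT address is not the current WAN IP ($1).
stale_states() {
    awk -v wan="$1" '
        $3 == "->" && $5 == "->" {
            split($4, nat, ":")        # nat[1] is the translated (WAN) address
            if (nat[1] != wan) print
        }'
}
```

    With the state table saved to a file, something like `stale_states 77.109.121.219 < states.txt` would list only the entries still tied to an old WAN address.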



  • I have exactly the same issue. Did you fix it?

    I just flushed my states and now it's working great.



  • Nope, never got this fixed. The workaround is indeed flushing the state table, but having to do that often is quite annoying.
    Luckily my ADSL connection doesn't disconnect that often so my WAN IP doesn't change often, but still…

    I never got any replies here either, I'm not sure what I should try to get some dev attention here :-)



  • Have the same issue.
    I tried restarting the SIP PBX (asterisk) to check whether it can be solved from the PBX side. It did not help.
    A reliable automated solution is needed.



  • Do you have asterisk set up to do 'qualify=yes' on the trunk?  I believe that refreshes the SIP registration…



  • Might help

    #!/bin/sh
    #
    # Clear VoIP phone state entries when the WAN IP changes.
    #
    # HowTo:
    #       - From the pfSense shell, run ee
    #       - Paste this code
    #       - Change the values of ext_if, local_voip_ip and provider_voip_ip
    #       - Press esc a a
    #       - Save as /usr/local/etc/rc.d/voipstate.sh
    #       - chmod 744 /usr/local/etc/rc.d/voipstate.sh
    #
    # Cronjob:
    #       - In the pfSense webgui go to Diagnostics -> Edit File
    #       - Load /cf/conf/config.xml
    #       - Under cron add
    #               <item>
    #                       <minute>*/1</minute>
    #                       <hour>*</hour>
    #                       <mday>*</mday>
    #                       <month>*</month>
    #                       <wday>*</wday>
    #                       <who>root</who>
    #                       <command>/usr/local/etc/rc.d/voipstate.sh</command>
    #               </item>
    #       - Save the config.xml
    #       - Reboot pfSense
    
    #
    ext_if="vlan5"                       # WAN interface name, e.g. em0 or vlan1
    voip_file="/var/run/voip_file.ip"    # remembers the WAN IP seen on the last run
    local_voip_ip="192.168.1.199"        # IP of your phone or PBX
    provider_voip_ip="66.197.246.248"    # IP of your VoIP provider
    EXIT_SUCCESS=0
    EXIT_FAILURE=1

    # pfctl needs root
    if [ "$(id -u)" -ne 0 ]
    then
        echo "Only root may run this program."
        exit $EXIT_FAILURE
    fi

    usage(){
        echo "Usage: $0"
    }

    # Load the WAN IP recorded on the last run and read the current one
    get_ip(){
        if [ -f "$voip_file" ]
        then
            registered_ip=$(cat "$voip_file")
        else
            registered_ip=""
        fi
        current_ip=$(ifconfig "$ext_if" | awk '/inet / { print $2 }')
    }

    # If the WAN IP changed, kill the states from the phone to the provider
    update_hosts(){
        if [ "$registered_ip" != "$current_ip" ]
        then
            echo "WAN ip address changed, clearing states entries.. " | logger
            /sbin/pfctl -k "$local_voip_ip" -k "$provider_voip_ip"
            echo "$current_ip" > "$voip_file"
            echo "done." | logger
        fi
    }

    #
    # Main
    #
    get_ip
    update_hosts
    exit $EXIT_SUCCESS
    

    voipstate.sh.txt
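
    One edge case that may be worth guarding against: if the interface has no address at the moment cron runs the script, current_ip comes back empty, the comparison sees a "change", and the empty value is written to the file. A minimal sketch of a stricter check follows; the helper name wan_ip_changed is made up and is not part of the script above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only when the new IP is non-empty
# and differs from the previously recorded one.
wan_ip_changed() {
    old_ip="$1"
    new_ip="$2"
    [ -n "$new_ip" ] && [ "$new_ip" != "$old_ip" ]
}
```

    In update_hosts the test would then become `if wan_ip_changed "$registered_ip" "$current_ip"`, so nothing is killed or recorded while the WAN is down.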



  • Interesting idea.  I don't think you need to edit the config.xml to set cron jobs though, since a package is available, no?



  • I have the same issue, but in my case the packets are not being translated.

    my voip server = 10.0.0.9
    my voip-trunk-server = 201.86.87.5 (vono)

    Everything works fine for a while, then the SIP registration times out and never recovers.

    voip01*CLI> sip show registry
    Host          Username    Refresh  State          Reg.Time
    VONO:5060     XXXXX       105      Request Sent   Sat, 09 Jan 2010 11:55:07

    On my pfsense box, tcpdump on the WAN interface shows:

    11:55:05.105388 IP 10.0.0.9.5060 > 201.86.87.5.5060: SIP, length: 584
    11:55:07.106127 IP 10.0.0.9.5060 > 201.86.87.5.5060: SIP, length: 584

    My NAT rule:

    Outbound NAT rules

    nat on $wan  from 10.0.0.9/32 to any -> 189.XX.XX.XX/32 static-port

    PS: my IP 189.XX.XX.XX is a Virtual IP (routed)

    My state table:

    all udp 201.86.87.5:5060 <- 10.0.0.9:5060      NO_TRAFFIC:SINGLE
    all udp 10.0.0.9:5060 -> 201.86.87.5:5060      SINGLE:NO_TRAFFIC

    Does anyone have any ideas?



  • @Perry:

    Might help

    Thank you Perry, that worked perfectly.

    PS: This has been in development for a while now hasn't it.  :)



  • @Perry:

    Might help

    Only noticed your post now, still had the problem this morning, tried the script now, works perfectly

    Thanks!

    Would you happen to know if this problem is addressed in 2.0 ? If not, your script may be a good starting point for a built-in solution.



  • http://redmine.pfsense.org/issues/show/8 should cover all dead states problems.

    I've added the script to the fit123 package (cass).



  • I had a similar problem to this but a little different. Wasted half a day figuring out what the problem was, but hey, that's life...

    My setup is as follows: Pfsense 1.2.3-Release, Single WAN Static IP, I have pfsense running everything through a single interface using VLANs.

    I have my asterisk box and the VOIP phones in their own VLAN/subnet with all the proper inbound and outbound NAT setup and with everything working properly.

    Well, on my LAN I foolishly opened my old asterisk test VM to see how I had some extensions configured, and it attempted to register with my VOIP Provider. I realized what was happening and shut down the VM. At this point I receive calls but no sound. I run a packet capture on both interfaces and see incoming RTP packets from the WAN and outgoing RTP packets from the Asterisk box on the VOIP VLAN.
    All outgoing calls work fine. I check the asterisk box and it seems to still be registered with my VOIP Provider. Reboot the Asterisk box. Same problem. Clear all states in Pfsense. Same problem. Grr... I had been doing some work on the Asterisk box, so I thought I had found a bug or made a mistake. I simplify everything and make sure everything is working properly with Asterisk. Still didn't work.

    Reboot Pfsense. Everything works again......

    So I guess the dead state problem is with any pfsense setup with more than two interfaces?

    Will Perry's little hack help me too?



  • Hmmm, I just had this happen yesterday.  My WAN IP changed, but for some reason the SIP registration entry didn't get punted.  I deleted it manually and all was good.  I looked at Perry's script, and while it looks fine, I sure think it would be nice if we could put scripts somewhere that they would be executed automatically when the filter is reloaded.  I was looking at /etc/inc/filter.inc and saw that packages can put custom scripts in /usr/local/pkg/pf to do stuff like this - is there any reason we can't have a generic version of this?  e.g. something like /usr/local/pf or some-such?



  • Ironically, the dyndns component does exactly what I need: it detects that the WAN IP has changed and sends an update request to my dyndns account to update the name of my gateway.  I was looking at where this gets called, and it is very specific to the dyndns code.  When I was using clarkconnect as my gateway, there was a script (/etc/rc.local, if memory serves) that they would call when they detected that the WAN IP had changed, and you could hook whatever you wanted there.  Maybe I am misreading the code now, but it looks like it doesn't make any attempt to detect this event, but just calls the interface configuration code.  I would dearly love the same functionality in pfsense (and would be happy to take a shot at coding it up, if needed.)



  • There is a custom option to pfctl that we have, -b IIRC, that kills all states on a specified interface. This is new in 2.0, and supposed to be run whenever an IP changes, as well as after failover for a multi-WAN setup. There's a todo item open to test it. http://redmine.pfsense.org/issues/show/8  I suspect it has some outstanding issues.

    States don't get deleted in 1.2.x when an IP changes or in any other scenario, so anything that stays active will retain the former NAT association.
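
    To illustrate what "stays active" means here: pf resets a state's expiry timer whenever a matching packet passes, so a client that retries more often than the timeout never lets the state expire. A tiny sketch, assuming the 20 second retry mentioned in the first post and a 60 second UDP timeout (an assumption; `pfctl -s timeouts` shows the real values on your box). The function name is made up.

```shell
#!/bin/sh
# Hypothetical check: succeeds if a client retrying every $1 seconds keeps a
# state with an idle timeout of $2 seconds alive indefinitely.
state_stays_alive() {
    retry_interval="$1"
    idle_timeout="$2"
    [ "$retry_interval" -lt "$idle_timeout" ]
}

# Asterisk retrying every 20 s against an assumed 60 s UDP timeout
if state_stays_alive 20 60; then
    echo "state never expires; it has to be killed explicitly"
fi
```

    Under those assumptions the old NAT association only goes away if something kills the state or the PBX stops sending for longer than the timeout.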



  • Hmmm, that is interesting, since I am running 2.0 (a snapshot from 4/24, IIRC.)  I will take a look at this and see if it is not working right all the time.  Thanks!



  • So I guess the dead state problem is with any pfsense setup with more than two interfaces?

    (From a couple of posts back by someone else).  I ignored this initially, because I don't have multiwan, but I suddenly realized I do have more than two interfaces!  I have LAN and WAN as always, but my workaround for the havp issues was to install havp on my freebsd server and use re2 (wan is re0 and lan re1) to talk to that server on a dedicated subnet.  So, in fact I have 3 interfaces live.  I'm wondering if that is a big clue.  I am at work now, but I will take a look at this later and post my findings…



  • Hmmm, I've tried unplugging the wan cable and plugging it back in 10 seconds or so later - I see warnings about the gateway being down, but it apparently didn't get a new IP - not sure how to force that to happen (I have a PPPoE WAN).  I guess I can just wait a few days for it to change…



  • @danswartz:

    Hmmm, I've tried unplugging the wan cable and plugging it back in 10 seconds or so later - I see warnings about the gateway being down, but it apparently didn't get a new IP - not sure how to force that to happen (I have a PPPoE WAN).  I guess I can just wait a few days for it to change…

    Depends on your ISP, but if you disconnect and reconnect PPPoE (reboot, or do so under Status > Interfaces) you'll usually get a new IP.



  • Okay, maybe I'll give the reboot a try. My concern with rebooting the gateway was that I thought it might do too much stuff, so it might not prove anything if it did get a new IP (and also, there would not be any states to kill).  I think when I get home today, I will try to kill PPPoE and reconnect it as you suggested.  Thx!



  • Oh, yeah what you're looking at is states, rebooting will wipe those. Disconnect/reconnect under Status > Interfaces



  • Cool, will do.  Thanks again!



  • I ran out of time this morning, but I did get some useful information.  Basically, the routine that gets called to flush states for downed gateways has a number of issues.  I will try to complete my analysis today and follow up on redmine.



  • I was not able to get totally to the bottom of this (e.g. figure out the fix), due to not knowing the code, but I think I did figure out two reasons why the code is not doing what is intended.  I updated issue 8 on redmine.  Let me know if there is anything more I can do on this front…



  • I've recently come to think that I'm seeing the same problem.  I recently switched from DSL to Cable.  When using DSL, the modem was configured as a bridge and either I got an address or I didn't.  With the cable modem, if I don't get an address from the cable "head end" for any reason, the modem itself will give me a private address (192.168.100.x).  In talking to the ISP, it seems they have been doing maintenance on their systems at night.  During that time, my modem thinks the network is down and ends up giving me a private address.  The lease time is only 30 seconds and usually I get a public address again within 2-10 minutes.

    However, during that time, my trixbox/asterisk system tries to re-register with my VOIP provider.  Now, pfsense keeps an outbound state from my trixbox to the provider using the PRIVATE address.  I see the packets being sent out the WAN, but the source address is the private address even though I've already obtained a new public address.  This results in all my inbound phone calls being rejected.

    I need a way of clearing the state table when the address changes.  In the short term, I would like to prevent pfsense from obtaining that private dhcp address from the cable modem.
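
    As a starting point, a script could at least recognise when the WAN is holding one of those modem-assigned private addresses. A minimal sketch, assuming it is enough to match the RFC 1918 ranges (10/8, 172.16/12, 192.168/16); the function name is_private is made up.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the IPv4 address in $1 falls in an
# RFC 1918 private range.
is_private() {
    case "$1" in
        10.*|192.168.*)                         return 0 ;;
        172.1[6-9].*|172.2[0-9].*|172.3[01].*)  return 0 ;;
        *)                                      return 1 ;;
    esac
}
```

    For example, `is_private 192.168.100.20 && echo "WAN has a private address"` could gate whatever clean-up is run once the public address comes back.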



  • Can anybody post a working script suitable for 2.0 ?



  • I have a static WAN IP over a PPPoE connection that periodically drops. Upon moving to v2.0RC3 I experienced the problem described in this thread. Solution was to run pfctl -b on the WAN interface IP (or to manually reset all states in the web GUI, or restart the PFSense box which does the same, as already discussed).

    Basically I want the states between the SIP server and the Asterisk box cleared when the PPP interface comes back up. pfctl -b will clear ALL existing states but it is the only method I have found that reliably works.

    cat > /usr/local/sbin/voip-wan-wipe
    #!/bin/sh
    sleep 30 # Give the WAN routes time to take effect
    pfctl -b 202.116.181.110 # Clear all existing connection states for my WAN IP

    Chmod that to 755. Add the following line to the /usr/local/sbin/ppp-linkup file just before the exit line:

    /usr/local/sbin/voip-wan-wipe & # Run as a separate script to execute in a separate process

    I can verify this works for my setup. I don't understand why the problem did not present in v1.2.3 for me though.

    I did also try pfctl -k <asterisk box> -k <sip peer> but it didn't work: it said that it cleared some states but it did not result in the SIP registration coming back.

