Some filtered states can only be killed individually, not in bulk

micro8765

I have a VOIP phone connected via my backup 4G WAN that I'm trying to force back onto the recovered main WAN connection. VOIP phones are stubborn about holding their connections it seems.

I went to Diagnostics > States and set a filter for the VOIP phone's IP address, which showed 8 states: one in and one out for each of its 4 lines.

I then hit the 'Kill States' button which should have killed all filtered states, but it left 4 of them there and the phone held its connection to the 4G WAN. Only when I went and clicked each of the remaining states individually by using their trashcan icon was I able to kill the states. After this the phone reconnected to the main WAN as per my firewall rules.

I'm not sure if this is a bug or not - I would guess so since in my mind 'Kill States' should effectively be the same as clicking all the trash icons. But I'm no dev and have no expectation of this behaviour changing. I haven't tested recently but I believe I've seen similar behaviour when trying 'Reset States' for the entire firewall - i.e. that the phones hold their connections regardless.

I'm just glad I now have a method to bump phones that is less drastic than rebooting the router, which was my former sledgehammer approach to this issue! Just reporting it in case it is useful to developers or other users.

Derelict

This perhaps?

https://redmine.pfsense.org/issues/8554

micro8765

Yes looks like a solid candidate. I'll test this after the next release.

micro8765

I've tested this on 244_2 and it still fails to kill 4 of the states. I have to trash the remaining states individually.

jimp

Might be this, then: https://redmine.pfsense.org/issues/9270

micro8765

Ok. I'll test on 245 when released and update.

micro8765

This problem is still dogging me.

I suspect that this:

https://redmine.pfsense.org/issues/9270

could be a dup of this:

https://redmine.pfsense.org/issues/4674 ??

I will test when 2.5 comes out.

jimp

No, those are not related.

Ximulate

There are a number of posts on this issue, and several scripts posted on this forum that kill the lingering connections. The problem with the scripts is that they will also kill an active phone conversation. Not sure how to resolve that.

micro8765

@Ximulate said in Some filtered states can only be killed individually, not in bulk:

There are several scripts posted on this forum that kill the lingering connections.

Are you referring to scripts like the following:

pfctl -k 192.168.1.41
pfctl -k 0.0.0.0/0 -k 192.168.1.41

Because I've tried that as a potential quick workaround for staff when facing this situation, but it doesn't work - the same 'sticky' states seem to be left intact as when using the dialog's 'Kill States' button on a filtered state.

In fact, even a full state table reset (Diagnostics -> States -> Reset States -> Reset the firewall state table) failed to kill the 'sticky' states in at least some of my tests.
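One way to confirm that the states really are left behind is to count them before and after the two kills. A minimal sketch, assuming a POSIX shell on the firewall; the function names `count_host_states` and `kill_host_states` are made up for illustration, and the count is a plain text match against the `pfctl -ss` listing rather than anything pf-aware:

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of pfSense.

# Count lines of a `pfctl -ss` listing (read from stdin) that mention the
# given IPv4 address followed by a port, e.g. "192.168.1.41:5060".
count_host_states() {
    # Escape the dots so they only match literal dots.
    pattern=$(printf '%s' "$1" | sed 's/\./\\./g')
    # grep -c still prints 0 when nothing matches; ignore its exit status.
    grep -c -E "(^|[^0-9.])${pattern}:" || true
}

# Kill states from the host, then states to the host (the same two
# commands quoted above), and report how many states are listed
# before and after.
kill_host_states() {
    host=$1
    before=$(pfctl -ss | count_host_states "$host")
    pfctl -k "$host"
    pfctl -k 0.0.0.0/0 -k "$host"
    after=$(pfctl -ss | count_host_states "$host")
    echo "states for $host: $before before, $after after"
}

# Example (on the firewall): kill_host_states 192.168.1.41
```

A non-zero "after" count only shows that states exist for the address at that moment; it cannot tell a state that survived from one the phone re-created in the meantime.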

If you know of another script that does work, please let me know. I can live with killing active phone conversations in many circumstances. As of now the only two workarounds that actually work are:

1. Manually kill the states via the dialog as outlined earlier in this thread. This scares and confuses non-technical staff way too much, and must be done one IP at a time so it is a bit click intensive and time consuming.
2. Reboot the router. This is the path we generally take despite the outage, as it is easy to understand and execute. It has the distinct advantage of always resolving the issue.

Ximulate

I'm by no means an expert at any of this, just on a quest to solve my own similar situation.

The Reset States GUI command always worked for me, though it's a bit like using a sledgehammer for a thumbtack. I wonder if the device(s) on your network are reestablishing the connection so quickly that it just appears not to kill the state? The web GUI on your VoIP phone may have a setting that lets you adjust its re-registration time. If you can access it, perhaps increase the time.
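One way to tell a state that survived from one that was quickly re-created is to look at state ages: anything older than the moment Kill States was pressed was not killed. A rough sketch, assuming the verbose listing from `pfctl -ss -v` puts an indented `age HH:MM:SS, ...` line under each state (that is how I understand the format); `host_state_ages` is a made-up name:

```shell
#!/bin/sh
# Hypothetical helper: read `pfctl -ss -v` output on stdin and print
# "<age> <state line>" for every state that mentions the given address.
host_state_ages() {
    awk -v host="$1" '
        # State lines start in column one; detail lines are indented.
        /^[^ \t]/ {
            state = $0
            # Plain substring match on "<host>:" (address followed by a port),
            # either as its own field or inside a NAT parenthesis.
            wanted = (index(" " $0, " " host ":") > 0 || index($0, "(" host ":") > 0)
            next
        }
        # Detail line that carries the age, e.g. "age 00:01:23, expires in ..."
        wanted && $1 == "age" {
            age = $2
            sub(/,$/, "", age)
            print age, state
            wanted = 0
        }
    '
}

# Example (on the firewall): pfctl -ss -v | host_state_ages 192.168.1.41
```

Running it right after a kill and seeing ages of several minutes would point to states that were never removed; ages of a few seconds would point to the phone re-registering.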

The scripts I've used are run automatically by cron (install the crontab package to easily manage cron jobs). Still, not an ideal solution. You could also schedule a reboot via cron.

You might look into whether you can disable and then reset the failover WAN interface via a command (not sure if you can).

Look at your Firewall Optimization Options under System > Advanced > Firewall & NAT. If it's not already, try Normal or Aggressive. This changes the state timeouts. Scroll to the bottom, and you can fine-tune those even further.

In Diagnostics > Command Prompt, run "pfctl -st" to see your actual state timeouts. Changing the above from, say, Normal to Aggressive should reduce the timeouts.

Hope this helps.

Here are my state timeouts with Aggressive:
tcp.first 30s
tcp.opening 5s
tcp.established 18000s
tcp.closing 60s
tcp.finwait 30s
tcp.closed 30s
tcp.tsdiff 10s
udp.first 60s
udp.single 30s
udp.multiple 60s
icmp.first 20s
icmp.error 10s
other.first 60s
other.single 30s
other.multiple 60s
frag 30s
interval 10s
adaptive.start 241200 states
adaptive.end 482400 states
src.track 0s
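
If the phones register over UDP (an assumption; the thread doesn't say which transport they use), the udp.* timers are the ones to watch. A small sketch that pulls just those out of a listing like the one above; `udp_timeouts` is a made-up name:

```shell
#!/bin/sh
# Hypothetical helper: read `pfctl -st` output on stdin and print the
# udp.* timers as "name seconds", dropping the trailing "s".
udp_timeouts() {
    awk '$1 ~ /^udp\./ { v = $2; sub(/s$/, "", v); print $1, v }'
}

# Example (on the firewall): pfctl -st | udp_timeouts
```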