SCTP Session Timeouts
-
Is this perhaps a bug report you submitted?
https://redmine.pfsense.org/issues/15661#change-74029
If not, then it looks like someone else has recognized this issue and created a Redmine bug report on it.
-
@bmeeks The bug mentioned applies only to the GUI problem. The SCTP state "SCTP Closing" after an ABORT message is a separate issue. What should the session state be after an ABORT? Maybe the session should be deleted altogether?
-
@rafal-arciszewski said in SCTP Session Timeouts:
But the SCTP state "SCTP Closing" after an ABORT message is a separate issue. What should the session state be after an ABORT? Maybe the session should be deleted altogether?
Sorry, but I don't know the answer to that one. SCTP is not something I'm very familiar with.
-
@rafal-arciszewski I'm on vacation until around the 12th, but in the meantime can you clarify a few things?
After upgrade from 23.09.1 to 24.03
Do you mean that you did not have this issue in 23.09.1 and did in 24.03?
We use a lot of SCTP connections where both ends always use the same SCTP port and IP address.
So you're re-using both source and destination IP and port number? Why?
Shouldn't the session be in "SCTP Closed" or even be removed in that case?
This is going to be the same thing as a TCP closing session, basically to make sure that any in-transit packets for the closed connection get caught.
-
After upgrade from 23.09.1 to 24.03
Do you mean that you did not have this issue in 23.09.1 and did in 24.03?
Yes. The problem started after the upgrade. I can't check it on 23.09.1 again, but I presume that either the default SCTP timers were different or the SCTP session state after an ABORT was different.
We use a lot of SCTP connections where both ends always use the same SCTP port and IP address.
So you're re-using both source and destination IP and port number? Why?
We use a lot of telco equipment, and in the telco world the endpoints are usually configured with a fixed IP address and SCTP port. In my example endpointA always uses 10.102.81.36:36412 and endpointB always uses 192.168.99.22:36412.
When we restart endpointA, it sends an ABORT message and then, after about 3 minutes, starts sending INIT messages to reconnect. From the firewall's point of view this is exactly the same session, and during the 900 s timeout the firewall blocks those messages.
Shouldn't the session be in "SCTP Closed" or even be removed in that case?
This is going to be the same thing as a TCP closing session, basically to make sure that any in-transit packets for the closed connection get caught.
So the only way is to decrease the SCTP closing timeout? The bug report is already solved (the fix will be included in 24.08), but is there any other way to change this timeout (perhaps by manually modifying config.xml)? I tried to use the session timeout in the firewall rule, but it also seems not to work with SCTP.
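The timing in this scenario can be put into numbers with a small sketch. It assumes, as reported in this post, that a state entry left behind by the ABORT causes later INIT chunks on the same address and port pair to be dropped until the entry expires. The function names are hypothetical and are not part of pf or pfSense.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: is a state entry that was last refreshed at
 * `last_seen` with a lifetime of `timeout` still present at `now`?
 * All values are in seconds.
 */
bool state_still_present(long last_seen, long timeout, long now)
{
    return (now - last_seen) < timeout;
}

/*
 * Hypothetical helper: for how many seconds are reconnect attempts
 * dropped if the endpoint starts sending INIT `restart_delay` seconds
 * after its ABORT and the leftover state lives for `timeout` seconds?
 */
long blocked_seconds(long restart_delay, long timeout)
{
    long remaining = timeout - restart_delay;
    return remaining > 0 ? remaining : 0;
}
```

With the figures from this post (ABORT at 0 s, first INIT at about 180 s, closing timeout of 900 s), `blocked_seconds(180, 900)` gives 720, so the reconnect would be held up for roughly 12 more minutes. A closing timeout shorter than the restart delay gives 0.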
-
@rafal-arciszewski I've added https://cgit.freebsd.org/src/commit/?id=82e021443a76b1f210cfb929a495185179606868 to cope with this.
The TCP path already had a similar thing for similarly broken TCP stacks.
It'll be part of the upcoming plus release too.
-
@kprovost Thank you very much!
-
Hi @kprovost @rafal-arciszewski
I have exactly the same issue as this in pfSense+ 24.11 - did https://cgit.freebsd.org/src/commit/?id=82e021443a76b1f210cfb929a495185179606868 get implemented?
I see that the issue with saving non-default timers is fixed in this release, but I still see SCTP INIT packets being blocked forever after an association ABORT.
Surely no hack is required here, as the SCTP association isn't misbehaving, it's doing what SCTP associations often do. Shouldn't pfSense purge the state immediately when an ABORT is received for the association?
Thanks
-
Actually, this is a bug in FreeBSD, introduced in D42393 - https://reviews-dev.freebsd.org/D42393
In this change, SCTP-specific state timers have been introduced in pf. However, the ABORT mechanism is not handled correctly.
RFC4960 section 3.3.7 states that “under any circumstances, an endpoint that receives an ABORT MUST NOT respond to that ABORT by sending an ABORT of its own.”
Basically, an ABORT kills an association immediately with no acknowledgement necessary or possible.
This is in contrast to SHUTDOWN, which is acknowledged with SHUTDOWN ACK and only then finished with SHUTDOWN COMPLETE.
In terms of firewall states, this means that the state should move to CLOSED immediately upon receipt of a correctly formatted ABORT chunk.
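As a rough sketch of the distinction drawn above, the teardown chunks can be sorted by whether more chunks are still expected afterwards. The chunk type values are the ones assigned in RFC 4960 section 3.2; the `teardown_phase` enum and `classify_chunk` function are hypothetical names used only for illustration and do not exist in pf.

```c
#include <stdint.h>

/* Chunk type values as assigned in RFC 4960, section 3.2. */
enum sctp_chunk_type {
    CHUNK_INIT              = 1,
    CHUNK_ABORT             = 6,
    CHUNK_SHUTDOWN          = 7,
    CHUNK_SHUTDOWN_ACK      = 8,
    CHUNK_SHUTDOWN_COMPLETE = 14
};

/* Hypothetical teardown phases a stateful firewall might track. */
enum teardown_phase {
    PHASE_NONE,     /* chunk does not take part in teardown */
    PHASE_CLOSING,  /* graceful shutdown under way, more chunks expected */
    PHASE_CLOSED    /* association is gone, nothing further expected */
};

/* Map a chunk type to the teardown phase it implies. */
enum teardown_phase classify_chunk(uint8_t type)
{
    switch (type) {
    case CHUNK_SHUTDOWN:
    case CHUNK_SHUTDOWN_ACK:
        return PHASE_CLOSING;
    case CHUNK_ABORT:              /* no reply is expected or allowed */
    case CHUNK_SHUTDOWN_COMPLETE:  /* last step of the graceful handshake */
        return PHASE_CLOSED;
    default:
        return PHASE_NONE;
    }
}
```

In this model an ABORT lands in the same phase as SHUTDOWN COMPLETE, which is the behaviour argued for here.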
The current code in this change moves the state to CLOSING upon receipt of an ABORT.

sys/netpfil/pf/pf.c

    5933  if (pd->sctp_flags & (PFDESC_SCTP_SHUTDOWN | PFDESC_SCTP_ABORT |
    5944      PFDESC_SCTP_SHUTDOWN_COMPLETE)) {
    5945      if (src->state < SCTP_SHUTDOWN_PENDING) {
    5946          pf_set_protostate(*state, psrc, SCTP_SHUTDOWN_PENDING);
    5947          (*state)->timeout = PFTM_SCTP_CLOSING;
    5948      }
    5949  }
    5950  if (pd->sctp_flags & (PFDESC_SCTP_SHUTDOWN_COMPLETE)) {
    5951      pf_set_protostate(*state, psrc, SCTP_CLOSED);
    5952      (*state)->timeout = PFTM_SCTP_CLOSED;
    5953  }
The first IF statement in line 5933 should not include PFDESC_SCTP_ABORT as a valid flag to move to PFTM_SCTP_CLOSING.
The second IF statement in line 5950 should include PFDESC_SCTP_ABORT as a valid flag.
In the above implementation, when an ABORT is sent, the state machine will be stuck in PFTM_SCTP_CLOSING forever (or until it times out).
Corrected code should look like this:

sys/netpfil/pf/pf.c

    5933  if (pd->sctp_flags & (PFDESC_SCTP_SHUTDOWN |
    5944      PFDESC_SCTP_SHUTDOWN_COMPLETE)) {
    5945      if (src->state < SCTP_SHUTDOWN_PENDING) {
    5946          pf_set_protostate(*state, psrc, SCTP_SHUTDOWN_PENDING);
    5947          (*state)->timeout = PFTM_SCTP_CLOSING;
    5948      }
    5949  }
    5950  if (pd->sctp_flags & (PFDESC_SCTP_SHUTDOWN_COMPLETE | PFDESC_SCTP_ABORT)) {
    5951      pf_set_protostate(*state, psrc, SCTP_CLOSED);
    5952      (*state)->timeout = PFTM_SCTP_CLOSED;
    5953  }
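To compare the two variants side by side, here is a minimal standalone model of the logic. It is only a sketch: the flag values, the `model_*` types, the ordering of the state enum and the `abort_closes` switch are all assumptions made for illustration, and none of them are taken from the real pf headers.

```c
#include <stdbool.h>

/* Hypothetical flag bits standing in for the PFDESC_SCTP_* flags. */
#define F_SHUTDOWN          0x1u
#define F_ABORT             0x2u
#define F_SHUTDOWN_COMPLETE 0x4u

/* Hypothetical state ordering; only "established < shutdown pending" matters. */
enum model_state   { M_ESTABLISHED, M_SHUTDOWN_PENDING, M_CLOSED };
enum model_timeout { T_ESTABLISHED, T_CLOSING, T_CLOSED };

struct model_entry {
    enum model_state   state;
    enum model_timeout timeout;
};

/*
 * Apply teardown flags to a state entry.
 * abort_closes == false models the code as currently written (ABORT is
 * grouped with SHUTDOWN); abort_closes == true models the proposed fix
 * (ABORT is grouped with SHUTDOWN COMPLETE).
 */
void apply_teardown(struct model_entry *e, unsigned flags, bool abort_closes)
{
    unsigned closing_flags = F_SHUTDOWN | F_SHUTDOWN_COMPLETE;
    unsigned closed_flags  = F_SHUTDOWN_COMPLETE;

    if (abort_closes)
        closed_flags |= F_ABORT;
    else
        closing_flags |= F_ABORT;

    if (flags & closing_flags) {
        if (e->state < M_SHUTDOWN_PENDING) {
            e->state = M_SHUTDOWN_PENDING;
            e->timeout = T_CLOSING;
        }
    }
    if (flags & closed_flags) {
        e->state = M_CLOSED;
        e->timeout = T_CLOSED;
    }
}
```

Feeding `F_ABORT` to an established entry leaves it at `M_SHUTDOWN_PENDING` with `T_CLOSING` when `abort_closes` is false, and at `M_CLOSED` with `T_CLOSED` when it is true. `F_SHUTDOWN_COMPLETE` ends in `M_CLOSED` either way.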
-
New issue raised for the above - https://redmine.pfsense.org/issues/15924
-
@JustinSims Here is the bug report.
https://redmine.pfsense.org/issues/15924