Snort drops connections on CARP IP after failover
Hello everyone!
Let's cut straight to the chase!
I have a dual-box HA/redundancy setup with pfSense. The boxes are identical and almost everything works (including failover and state synchronization).
I also have net.inet.carp.preempt set to 1.
On the WAN, each pfSense box has its own public IP (x.x.x.1 and x.x.x.2). Then I have a CARP virtual IP x.x.x.3, which is used for internet connectivity. I also have CARP virtual IPs x.x.x.4 (for VPN LAN NAT) and x.x.x.5 (for LAN1 NAT), which are both on the WAN interface as well, and both fail over just fine.
When I use a testbox in LAN1 to reach LAN2 and send a file, failover happens just fine and the connection stays up because the states are synced.
When I use the testbox in LAN1 to reach the world and download an .iso file, the NAT to x.x.x.5 is used and the testbox receives the file.
During this download from the internet I can see that the states (with the NAT indication) are synced and appear on the slave machine. However, when the failover happens, the connection stalls (I use wget for the download and it shows a stalled connection). The download does not continue through the slave pfSense box, even though it has the synced state, but the moment the master box comes back to life, the connection resumes and the download finishes.
The same problem persists when I change the NAT address to be an IP alias to the WAN CARP IP.
Also, when the failover happens, the new master drops the state from its state table, while the previous master keeps it. When it fails back, the slave that dropped the state has it again.
I tried having only one CARP IP on the WAN interface and having NAT translate the addresses to that, but it still doesn't fail over. The moment I enter persistent CARP maintenance mode, the other box drops the NAT states.
I managed to resolve this problem when I found that the Snort package, which I had installed and configured on WAN, was dropping the state on the slave (the box that becomes the master on failover). The start of the ongoing download had only been seen by Snort on the previous master, so Snort on the slave treated the traffic as intrusive and denied the connection.
That also explains why it was all good again on the master box after failing back.
Now the question is:
Is there a way to make Snort synchronize its decisions about connections to the slave box, so that when a failover happens, "Snort2" wouldn't drop the state from the table?
If no such thing is possible, maybe someone could point me to an article that discusses which rules to disable so that no such drops occur.
I feel that disabling rules to get things to work shouldn't be the only solution.
Anyway, thank you for your time!
bmeeks last edited by
Are you 100% positive it is Snort that is breaking the failover? All it really does is analyze packets, and if an alert is triggered, it puts the IP address of the offender (either SRC, DST or BOTH depending on how it is configured) into the pfSense packet filter's snort2c table. It is up to pfSense and pf to manage states from that point on for that session. Snort should be totally oblivious to the failover process and also to state table synchronization.
Snort would only clear existing states under these two conditions: (1) you have that option configured on the INTERFACE SETTINGS tab; and (2) a new alert is detected on the IP address. So one of the things Snort does when inserting an IP into that snort2c table is clear any existing states for the IP (when the option is configured).
Everything you have written is true.
The alert is triggered (and therefore the state is dropped) because Snort on the slave box, which takes over on failover, only sees the file transfer from the halfway point.
The transfer is initiated through the master box, so Snort on the slave sees the continuation of the transfer as someone sending a file that nobody requested.
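To make the reasoning above concrete, here is a toy model in Python. It is not Snort code and does not reflect how Snort is actually implemented; the class `ToyHttpInspector`, the flow label and the payloads are all hypothetical. It only shows why an inspector that never saw the response headers can misread the remaining body data as a response without Content-Length or Transfer-Encoding.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHttpInspector:
    """Hypothetical stand-in for the HTTP inspection running on one firewall node."""
    name: str
    seen_headers: set = field(default_factory=set)  # flows whose response headers were parsed
    alerts: list = field(default_factory=list)

    def inspect(self, flow_id: str, payload: bytes) -> None:
        if flow_id in self.seen_headers:
            return  # body data of a response this node already parsed
        # First server payload this node sees for the flow: treat it as the
        # start of an HTTP response and look for a length indication.
        head = payload.split(b"\r\n\r\n", 1)[0].lower()
        if b"content-length:" in head or b"transfer-encoding:" in head:
            self.seen_headers.add(flow_id)
        else:
            self.alerts.append((flow_id, "no content-length or transfer-encoding"))


# Two segments of one download: headers plus first bytes, then more body data.
segments = [
    b"HTTP/1.1 200 OK\r\nContent-Length: 8\r\n\r\nISO-",
    b"DATA",
]
flow = "x.x.x.5:50000 <- server:80"  # placeholder flow label

master = ToyHttpInspector("master")
backup = ToyHttpInspector("backup")

master.inspect(flow, segments[0])  # master sees the start of the response
backup.inspect(flow, segments[1])  # after failover, backup sees only the continuation

print("master alerts:", master.alerts)
print("backup alerts:", backup.alerts)
```

In this sketch the pf state is present on both nodes, but the inspection context exists only on the node that saw the headers, which matches the behaviour described in this thread.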
The triggering rule is "NO CONTENT-LENGTH OR TRANSFER-ENCODING IN HTTP RESPONSE". Its ID is 120:3 (GID 120, SID 3).
That is the rule that was causing problems.
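For anyone who wants to silence just this one alert rather than disable whole rule categories, a suppress entry is one option. The fragment below is a sketch using standard Snort 2.x `suppress` syntax, assuming the suppress list assigned to the WAN interface in the pfSense package accepts it unchanged. The commented second line is a narrower variant; tracking by destination and the x.x.x.5 address are assumptions based on the NAT setup described in this thread.

```
# Suppress GID 120, SID 3 (NO CONTENT-LENGTH OR TRANSFER-ENCODING IN HTTP RESPONSE)
suppress gen_id 120, sig_id 3

# Narrower alternative (assumption): only suppress for the LAN1 NAT address
# suppress gen_id 120, sig_id 3, track by_dst, ip x.x.x.5
```

A suppressed alert no longer puts the IP into the snort2c table, so the existing states for that IP should not be cleared on failover. The trade-off is that this alert is then hidden for all traffic it would otherwise match.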
bmeeks last edited by
Humm… OK. I did just find this link about a limited experimental high availability option for the Snort Stream5 preprocessor: https://www.snort.org/faq/readme-ha.
I've never used this option nor investigated integrating it into the pfSense package. Might be potentially useful, though. Would you like to perhaps contribute some code to the project if you have the time and expertise? I'm going to be busy for a while looking for the Snort bugs on the armv7 hardware used in the SG-1000 and SG-3100 Netgate appliances.
Thanks for introducing me to the High Availability of Snort.
I will look into it, although I will not be able to do any coding for it, because I lack the expertise.
High availability for Snort would only be required in pfSense for systems with a very tight tolerance for failure (whether that is downtime, lost packets or dropped connections). The community around the paid version (if there is one) is probably already looking into this.