HA randomly BACKUP goes to MASTER state
-
@Przemyslaw85, I'm using a monitoring system too, and when the problem occurs, lots of hosts (VLANs) are not visible in it for a while.
@b_it, thank you for the link. Please let us know how your test goes.
-
@m4rek11 @Przemyslaw85 Sure, I will. I hope to be back with good news. Stay tuned.
-
Last Saturday I added the two patches I mentioned in the previous comment, and so far it looks much better. I don't see too many unnecessary messages, and both firewalls have been stable over these few days. Here are all the patches I added directly to mitigate the CARP issue:
Fix CARP event storm when leaving persistent CARP maintenance mode 1/2
https://github.com/pfsense/pfsense/commit/8a906fba5e42d391227dfc39311d02b570576d50.patch
Fix CARP event storm when leaving persistent CARP maintenance mode 2/2
https://github.com/pfsense/pfsense/commit/3c15b353c6968801cfffb7d3b30a7069d2330a3e.patch
While patching on Saturday I also manually added this one:
Fix Clicking Save & Force Update on a Dynamic DNS entry results in a GUI timeout
https://github.com/pfsense/pfsense/commit/bdffb77d1aa21770b23ef408ad9fba79d0825ec5.patch
And I applied these three patches from the recommended section:
Disable pf counter data preservation to temporarily work around latency when reloading large rulesets (Redmine #12827)
Fix Captive Portal handling of non-TCP traffic after login (Redmine #12834)
Fix OpenVPN dashboard widget client termination (Redmine #12817)
To sum up: for now I will stay on version 2.6.0 with these patches.
-
@b_it As I understand it, I have to make the changes for both 1/2 and 2/2.
For 1/2, do I have to do the steps on server 1 only, or on both?
-
@przemyslaw85 I think every node should have the same set of patches, so I patched the first node and then the second node.
The 1/2 and 2/2 naming is just my own convention:
Fix CARP event storm when leaving persistent CARP maintenance mode 1/2
Fix CARP event storm when leaving persistent CARP maintenance mode 2/2
For the CARP issue, the second patch is not going to apply without the first one. This is the view from one node (the second has the same set of patches).
-
@b_it I confirm that the patches work.
Yesterday I made a few changes to the original files using the file editor. I didn't know there was such a module as patches. I had to revert to the original files from a copy made before editing.
After adding the 1/2 and 2/2 patches and the Dynamic DNS patch, I did not notice any improvement. Only after I added patches #12827, #12834, #12816 and #12817 can I say that the system now works as it should.
-
@przemyslaw85 It seems to me that once I started applying the CARP patches, the firewall was more responsive while I made the later changes (patching), but I didn't wait too long. I just rebooted both nodes to be sure that all the selected patches were fully applied.
I have to admit that I only started more thorough tests after rebooting the firewalls (with the patch set mentioned above), so I can't be sure what really helped and how much.
BTW, the patching mechanism was introduced around version 2.5, and I learned from the start that I have to be careful when selecting patches.
-
@Przemyslaw85, @B_IT, after those changes, did you still see a CARP storm in the logs, or the MASTER -> BACKUP, BACKUP -> MASTER changes lasting a short time?
-
@m4rek11 Looking at the logs, I see some entries from while the patches were being applied, but after patching I see only a few, and they all look as they should (at least to me) and have a reason (e.g. a rebooted node). I wouldn't call them a storm, and I definitely don't see MASTER/BACKUP flipping entries now.
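If you want to check for flapping without reading the whole log by eye, a rough sketch like the one below counts state transitions per VHID and interface. It assumes the kernel logs CARP transitions in a form like `carp: 1@igb0: BACKUP -> MASTER (...)`; the sample lines, host name and interface names are made up for illustration, so adjust the pattern to whatever your system log actually contains.

```python
import re
from collections import Counter

# Assumed line shape: "... carp: <vhid>@<interface>: <OLD> -> <NEW> (reason)"
TRANSITION = re.compile(r"carp: (\d+)@([^:\s]+): (\w+) -> (\w+)")

def count_transitions(lines):
    """Count CARP state transitions per (vhid, interface) pair."""
    counts = Counter()
    for line in lines:
        match = TRANSITION.search(line)
        if match:
            vhid, iface, _old, _new = match.groups()
            counts[(int(vhid), iface)] += 1
    return counts

# Hypothetical sample log lines, not taken from a real firewall.
sample = [
    "Mar 10 12:00:01 fw1 kernel: carp: 1@igb0: BACKUP -> MASTER (master timed out)",
    "Mar 10 12:00:03 fw1 kernel: carp: 1@igb0: MASTER -> BACKUP (more frequent advertisement received)",
    "Mar 10 12:00:05 fw1 kernel: carp: 2@igb1: BACKUP -> MASTER (master timed out)",
    "Mar 10 12:00:06 fw1 kernel: igb0: link state changed to UP",
]

for (vhid, iface), n in sorted(count_transitions(sample).items()):
    print(f"vhid {vhid} on {iface}: {n} transition(s)")
```

A handful of transitions around a reboot is expected; many transitions on the same VHID within a short time is what I would call flapping.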
-
@m4rek11 After applying the patches, I have not seen the routers change roles (Master -> Backup, Backup -> Master).
All the problems showed up whenever I made any changes to the rules, DNS or DHCP.
I found my configuration error early on: for some unknown reason, I had set the same VHID on the Virtual IPs of 2 different networks. But the problems were still there after fixing that. After applying the patches, the problem was gone.
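For anyone who wants to rule out the same VHID mistake, here is a minimal sketch that flags any VHID used on more than one network. The list of virtual IPs is hypothetical and typed in by hand; the function name and the (interface, vhid, address) tuple layout are my own convention, not anything from pfSense.

```python
from collections import defaultdict

def find_shared_vhids(virtual_ips):
    """Return {vhid: [interfaces]} for every VHID used on more than one interface."""
    by_vhid = defaultdict(set)
    for interface, vhid, _address in virtual_ips:
        by_vhid[vhid].add(interface)
    return {
        vhid: sorted(ifaces)
        for vhid, ifaces in by_vhid.items()
        if len(ifaces) > 1
    }

# Hypothetical CARP virtual IPs: (interface, vhid, address)
vips = [
    ("lan", 1, "192.168.1.1"),
    ("vlan20", 2, "10.0.20.1"),
    ("vlan30", 2, "10.0.30.1"),  # same VHID as vlan20: the kind of mistake described above
]

shared = find_shared_vhids(vips)
print(shared)  # {2: ['vlan20', 'vlan30']}
```

An empty result means every VHID is used on one interface only.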