As this problem bit me several times now when upgrading CARP-clusters I tried to troubleshoot it a bit more. Here is what I found out:
I read somewhere that there were protocolchanges in pfsync between 2.7.0 and 2.7.2. So if you upgrade the nodes one by one there is the situation where one system is already on 2.7.2 and the other one still on 2.7.0. During this time 2.7.0 tries to sync states to 2.7.2 and vice versa. Due to the versionmismatch the statetable seems to have invalid records in the list that causes the too many creators error and causes the statesync to fail. Even after both nodes reach 2.7.2 those invalid records still remain in the statetables, which then still breaks the sync.
The resolution to this is:
Either reboot both nodes at the same time, so the broken records are not synced back and forth. Downtime about 1-2 minutes on fast systems.
Or disable the statesync via pfsync on both nodes (system>high availability), reset the statetables on both nodes (Diagnostics>States, Reset states). Then reenable the statesync again. Downtime a few seconds as only the dropped connections need to be reestablished.
Unfortunately both procedures cause a disruption of traffic, though the second way of getting pfsync going again is rather short. An additional impacti is, that during the upgradeprocess statesync is broken from the time the first node reboots till the states get cleared on both nodes after both nodes are upgraded.
Hope this helps some people that run into the same issues.
Regards
Holger