Sync error with packages since today


  • LAYER 8 Moderator

    Hi all,

    Since today we've been getting errors in CARP SYNC whenever something belonging to a package is synced:

    A communications error occurred while attempting to call XMLRPC method merge_installedpackages_section: Unable to connect to tls://192.168.168.2:443. Error: Operation timed out @ 2020-02-10 10:15:19
    A communications error occurred while attempting to call XMLRPC method merge_installedpackages_section: Unable to connect to tls://192.168.168.2:443. Error: Operation timed out @ 2020-02-10 10:15:29
    A communications error occurred while attempting to call XMLRPC method exec_php: Unable to connect to tls://192.168.168.2:443. Error: Operation timed out @ 2020-02-10 10:15:39
    A communications error occurred while attempting to call XMLRPC method exec_php: Unable to connect to tls://192.168.168.2:443. Error: Operation timed out @ 2020-02-10 10:15:49
    

    This only started today and wasn't a problem last week, when we edited e.g. Radius users or OVPN servers.

    Both nodes are on the latest stable release (2.4.x branch) and have the same packages (only acme is missing on the secondary node), and all packages are at their latest versions, too. It's also pretty reproducible. Create a new cert? No problem, it's synced to the standby node. Edit or create a FreeRadius user? You get the four errors above, but the user still shows up on the standby almost normally. Edit a VPN server configuration? You get the four errors. Add filter rules? No problems.

    At first I thought it could be related to the OVPN client export package, but as stated above, FreeRadius users also trigger the same problem.

    Any insights?

    Greets
    Jens


  • Rebel Alliance Developer Netgate

    If it says it timed out, that implies it couldn't connect. If you didn't change anything on the systems, then you need to check what could have changed otherwise.

    The most common thing we've seen cause this kind of problem is when you have the firewall set to kill states for down gateways. If there is a gateway on the secondary which is in a down state, the initial config sync would trigger a filter reload, which resets states. That may have happened at just the wrong moment and cut off connectivity to the secondary.

    That's just one possibility, but something to consider.
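
To test the "timed out means it couldn't connect" reading directly, one option is a small TCP probe run from a machine that can reach the sync interface. The following is only a sketch: `probe_tcp` is a hypothetical helper name, not part of pfSense, and it checks plain TCP reachability only (no TLS handshake, no XMLRPC call).

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify one TCP connection attempt.

    Returns "open", "refused", "timeout", or "error: <details>".
    """
    try:
        # create_connection resolves the host and applies the timeout to connect().
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        # No answer at all: packets dropped, wrong port behind a filter, host down.
        return "timeout"
    except ConnectionRefusedError:
        # Host reachable, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # Anything else (no route, name resolution failure, ...).
        return f"error: {exc}"
```

Calling `probe_tcp("192.168.168.2", 443)` with the address and port from the error messages would show whether that target answers at all. A "timeout" result matches the logged error and points at filtering or a wrong target rather than at the packages themselves.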


  • LAYER 8 Moderator

    @jimp said in Sync error with packages since today:

    That's just one possibility, but something to consider.

    Absolutely, thanks. As this was somewhat escalated by my boss because of the constant nagging ;) I can report that Paighton from Support has found the cause. To my surprise, our old FreeRadius configuration (dating back to the FR2 package days) contained a manual sync setting instead of using the system's sync settings (which would be the right way, but I remember it sometimes being buggy in the beginning). So after we switched the UI port a few weeks ago, we never had any problem until there was a request for a new customer VPN server and Radius user. Didn't see that coming, and perhaps we would have found it in the end after hours of debugging, but I'm happy to say that support got there faster :) So always check your packages that allow syncing to the cluster peer, and make sure the sync is using the right IP/port/credentials or is using the system ones in the first place :)

    Should have found that myself, but sometimes especially in your own setup environments you get stuck in a rut... In a customer setup I'm fairly certain we would've found that ;)
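
The "check your packages" advice above can be pictured as a comparison between the system sync target and any target a package defines for itself. This is a minimal sketch under loud assumptions: `SyncTarget`, `find_mismatched_targets`, the package names used as keys and the port `8443` are all hypothetical, and the values are typed in by hand rather than read from pfSense's configuration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SyncTarget:
    """Where a config sync is sent (hypothetical model, not a pfSense structure)."""
    host: str
    port: int
    username: str


def find_mismatched_targets(
    system: SyncTarget,
    package_targets: Dict[str, Optional[SyncTarget]],
) -> Dict[str, List[str]]:
    """Map each package with its own, differing sync target to the fields that differ.

    A value of None means the package defers to the system sync settings,
    so there is nothing to compare.
    """
    mismatches: Dict[str, List[str]] = {}
    for name, target in package_targets.items():
        if target is None:
            continue
        diffs = [
            field
            for field in ("host", "port", "username")
            if getattr(target, field) != getattr(system, field)
        ]
        if diffs:
            mismatches[name] = diffs
    return mismatches


# Assumed example values: the GUI was moved to a new port (8443 is made up),
# while one package still carries a manual target on the old port 443.
system = SyncTarget("192.168.168.2", 8443, "admin")
packages = {
    "freeradius": SyncTarget("192.168.168.2", 443, "admin"),
    "openvpn-client-export": None,
}
report = find_mismatched_targets(system, packages)
```

With these assumed values, `report` flags only the package with the stale manual port, which is the kind of leftover setting that is easy to miss after changing the GUI port.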

