CARP sync not working, slave machine misbehaving



  • Hi.

    I've recently ran into CARP problem on my pfSense machines. It started after upgrading from 2.1-RC1 (AFAIR) to 2.1-RELEASE. First symptoms noticed: dozens of notices (per day) on master:

    "An error code was received while attempting XMLRPC sync with username admin https://10.10.10.3:443 - Code 2: Invalid return payload: enable debugging to examine incoming payload"

    Next symptoms notices: slave configuration is out of date (from before 2.1-RELEASE upgrade) and I have problems accessing slave machine via https. I tried to debug the problem and found following things:

    1. Running /etc/rc.filter_synchronize by hand on master gives following output (I've removed sensitive data) and hangs.

    –-SENT---
    POST /xmlrpc.php HTTP/1.0
    User-Agent: PEAR XML_RPC
    Host: 10.10.10.3
    Authorization: Basic SomeBase64EncodedThingHere
    Content-Type: text/xml
    Content-Length: 209
    
     <methodcall><methodname>pfsense.host_firmware_version</methodname>
     <params><param>
    <value><string>ClearTextPasswordHere</string></value></params></methodcall> 
    
    ---END---
    

    2. After doing above on master, webConfigurator on slave stops responding and does nothing. Only restarting it from console makes it work again. Following entries are logged (on slave) after restarting webConfigurator:

    Oct 16 14:29:37 172.16.16.253 lighttpd[26530]: (mod_fastcgi.c.2543) unexpected end-of-file (perhaps the fastcgi process died): pid: 26551 socket: unix:/tmp/php-fastcgi.socket-0
    Oct 16 14:29:37 172.16.16.253 lighttpd[26530]: (mod_fastcgi.c.3329) response not received, request sent: 801 on socket: unix:/tmp/php-fastcgi.socket-0 for /xmlrpc.php?, closing connection
    Oct 16 14:29:37 172.16.16.253 lighttpd[26530]: (mod_fastcgi.c.2543) unexpected end-of-file (perhaps the fastcgi process died): pid: 26551 socket: unix:/tmp/php-fastcgi.socket-0
    Oct 16 14:29:37 172.16.16.253 lighttpd[26530]: (mod_fastcgi.c.3329) response not received, request sent: 826 on socket: unix:/tmp/php-fastcgi.socket-0 for /index.php?, closing connection
    Oct 16 14:29:37 172.16.16.253 lighttpd[26530]: (mod_fastcgi.c.2543) unexpected end-of-file (perhaps the fastcgi process died): pid: 26551 socket: unix:/tmp/php-fastcgi.socket-0
    Oct 16 14:29:37 172.16.16.253 lighttpd[26530]: (mod_fastcgi.c.3329) response not received, request sent: 802 on socket: unix:/tmp/php-fastcgi.socket-0 for /xmlrpc.php?, closing connection
    Oct 16 14:29:37 172.16.16.253 lighttpd[26530]: (mod_fastcgi.c.2543) unexpected end-of-file (perhaps the fastcgi process died): pid: 26551 socket: unix:/tmp/php-fastcgi.socket-0
    Oct 16 14:29:37 172.16.16.253 lighttpd[26530]: (mod_fastcgi.c.3329) response not received, request sent: 802 on socket: unix:/tmp/php-fastcgi.socket-0 for /xmlrpc.php?, closing connection
    Oct 16 14:29:37 172.16.16.253 lighttpd[26530]: (mod_fastcgi.c.2543) unexpected end-of-file (perhaps the fastcgi process died): pid: 26551 socket: unix:/tmp/php-fastcgi.socket-0
    Oct 16 14:29:37 172.16.16.253 lighttpd[26530]: (mod_fastcgi.c.3329) response not received, request sent: 802 on socket: unix:/tmp/php-fastcgi.socket-0 for /xmlrpc.php?, closing connection
    Oct 16 14:29:37 172.16.16.253 lighttpd[26530]: (mod_fastcgi.c.2543) unexpected end-of-file (perhaps the fastcgi process died): pid: 26551 socket: unix:/tmp/php-fastcgi.socket-0
    Oct 16 14:29:37 172.16.16.253 lighttpd[26530]: (mod_fastcgi.c.3329) response not received, request sent: 802 on socket: unix:/tmp/php-fastcgi.socket-0 for /xmlrpc.php?, closing connection
    Oct 16 14:29:37 172.16.16.253 lighttpd[26530]: (mod_fastcgi.c.2543) unexpected end-of-file (perhaps the fastcgi process died): pid: 26551 socket: unix:/tmp/php-fastcgi.socket-0
    Oct 16 14:29:37 172.16.16.253 lighttpd[26530]: (mod_fastcgi.c.3329) response not received, request sent: 802 on socket: unix:/tmp/php-fastcgi.socket-0 for /xmlrpc.php?, closing connection
    Oct 16 14:29:37 172.16.16.253 lighttpd[26530]: (mod_fastcgi.c.2543) unexpected end-of-file (perhaps the fastcgi process died): pid: 26551 socket: unix:/tmp/php-fastcgi.socket-0
    Oct 16 14:29:37 172.16.16.253 lighttpd[26530]: (mod_fastcgi.c.3329) response not received, request sent: 802 on socket: unix:/tmp/php-fastcgi.socket-0 for /xmlrpc.php?, closing connection
    Oct 16 14:29:37 172.16.16.253 lighttpd[26530]: (mod_fastcgi.c.2543) unexpected end-of-file (perhaps the fastcgi process died): pid: 26551 socket: unix:/tmp/php-fastcgi.socket-0
    Oct 16 14:29:37 172.16.16.253 lighttpd[26530]: (mod_fastcgi.c.3286) child signaled: 9

    3. I can log via SSH to slave machine, but connection is killed some seconds later (it disappears from netstat -an, checked via serial console).

    4. When I'll disconnect master machine then slave properly takes over and SSH connection are not dropped anymore. After reconnecting master it takes over and slave properly falls back into "backup" state in CARP status.

    To me it looks like slave box while being in CARP backup state kills web/SSH connections established with it. This would explain SSH being dropped and web sync being interrupted. I completly have no idea why it may behave like that. I'll appreciate any help to debug and fix these issues. Note: my setup was running more or less fine for a looong time and there were no changes recently except for 2.1-RELEASE upgrade.



  • Old problem and seems that no one cares… Anyway... I've reinstalled slave router from scratch some time ago and it was working just fine for about two weeks. Now lighttpd stops working on it few seconds after restart (both, web configurator restart and system reboot). It gives error 500 when trying to access and logs are filled with entries like:

    Dec 23 09:02:20 lighttpd[47357]: (mod_fastcgi.c.3329) response not received, request sent: 871 on socket: unix:/tmp/php-fastcgi.socket-0 for /firewall_aliases.php?, closing connection
    Dec 23 09:02:20 lighttpd[47357]: (mod_fastcgi.c.2543) unexpected end-of-file (perhaps the fastcgi process died): pid: 47576 socket: unix:/tmp/php-fastcgi.socket-0

    and then

    Dec 23 09:02:20 lighttpd[47357]: (network_openssl.c.118) SSL: 5 -1 1 Operation not permitted
    Dec 23 09:02:20 lighttpd[47357]: (connections.c.637) connection closed: write failed on fd 22

    BTW, are there any updates to 2.1 STABLE? I'm on "built on Wed Sep 11 18:17:37 EDT 2013" and it says "You are on the latest version."


Log in to reply