Crashes do not recover due to a faulty code in /etc/inc/config.lib.inc line 383



  • In the past month I have had many crashes. Most of the time pfSense somehow recovered from them.

    Often, however, the recovery does not work. The system restarts and then gets stuck in an endless loop, printing:
    "Warning: Cannot use a scalar value as an array in /etc/inc/config.lib.inc on line 383"

    So I transferred that file to my PC and copied the loop in question (below).

    Line 383 =
    $config['version'] = sprintf('%.1f', $next / 10);

    So something in that line is "not ok", or at least there should be a test for this exceptional case so that it can be handled differently.

    Note that if pfSense is in this loop, a restart does not help. The restart will end with the same endless loop.

    Netgate, please change this code!

    Louis

    /* Loop and run upgrade_VER_to_VER() until we're at current version */
    while ($config['version'] < $g['latest_config']) {
    	$cur = $config['version'] * 10;
    	$next = $cur + 1;
    	$migration_function = sprintf('upgrade_%03d_to_%03d', $cur,
    	    $next);
    
    	foreach (array("", "_custom") as $suffix) {
    		$migration_function .= $suffix;
    		if (!function_exists($migration_function)) {
    			continue;
    		}
    		if (isset($already_run[$migration_function])) {
    			/* Already executed, skip now */
    			unset($config['system']
    			    ['already_run_config_upgrade']
    			    [$migration_function]);
    		} else {
    			$migration_function();
    		}
    	}
    	$config['version'] = sprintf('%.1f', $next / 10);
    	if (platform_booting()) {
    		echo ".";
    	}
    }
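
    The kind of test asked for above could look roughly like this. It is a minimal sketch, not the pfSense code: config_is_usable() and run_config_upgrades() are hypothetical names, the migration calls are left out, and it assumes the warning appears because $config holds a scalar (for example an error code from a failed parse) instead of an array when the loop starts.

```php
<?php
/* Hypothetical guard: true only if $config is an array carrying a numeric version. */
function config_is_usable($config): bool
{
    return is_array($config)
        && isset($config['version'])
        && is_numeric($config['version']);
}

/*
 * Hypothetical, simplified stand-in for the upgrade loop.
 * Returns the upgraded config array, or false when the input cannot be upgraded.
 */
function run_config_upgrades($config, float $latest)
{
    if (!config_is_usable($config)) {
        /* A scalar is rejected here, before any array write is attempted. */
        return false;
    }

    /* Work in integer tenths so the loop bound does not depend on float rounding. */
    $cur = (int) round((float) $config['version'] * 10);
    $target = (int) round($latest * 10);

    while ($cur < $target) {
        $next = $cur + 1;
        /* The real loop would call upgrade_XXX_to_YYY() here; omitted in this sketch. */
        $config['version'] = sprintf('%.1f', $next / 10);
        $cur = $next;
    }

    return $config;
}
```

    With a scalar such as -1 the function returns false instead of looping; with array('version' => '18.9') and a latest version of 19.1 it steps through 19.0 to 19.1. What the caller should do on false (fall back to a backup, stop with a clear message) is a separate design decision.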


  • I should have added some hints ☺

    I think the problem can occur when there is no valid config, probably as a consequence of a sequence of events such as:

    switch off an interface
    switch on an interface => crash
    the system should write the new config, but that is no longer possible because it has already crashed
    

    In the same context:

    can not determine kernel version
    can not find config
    

    The situation can sometimes be fixed by inserting a USB stick containing a recent config file.

    Louis



  • When you lose the config.xml file on the disk, all bets are off at that point. That is the most critical file for a pfSense installation. It contains everything for how the firewall should configure itself.

    If it is missing, can't be read, or is corrupt, all kinds of bad things will follow.

    So all of the problems you listed with that restart loop are due to the missing config.xml. Providing one on a USB stick solves the restart loop problem.

    As you stated up front, the root issue is something with turning interfaces on and off that causes a system crash. Once the system crashes, then you can certainly have follow-on events. But in your case, I think this restart loop is not relevant. It is a symptom of the other disease.



  • @bmeeks said in Crashes do not recover due to a faulty code in /etc/inc/config.lib.inc line 383:

    When you lose the config.xml file on the disk, all bets are off at that point. That is the most critical file for a pfSense installation. It contains everything for how the firewall should configure itself.

    A few remarks

    • This is probably what happens.
    • Even if that is true, this is not the correct way of handling things! The platform should always recover (perhaps with the previous config, but that is a separate matter).

    To guarantee that:

    • no action of any kind should be executed before a valid copy of the previous config file has been made
    • if the action leads to a crash, then the saved old config should be loaded
    • a second config change should never start before the previous one has finished

    And I am sure that this is more or less what Netgate is doing. I know there is a config.old.

    Whatever the case, something within that implementation is not "waterproof".
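
    To make the three guarantees above concrete, here is a rough sketch. It does not describe how pfSense actually stores its config: save_with_backup(), first_valid_config(), the ".bak", ".tmp" and ".lock" suffixes and the $is_valid callback are all hypothetical, and the sketch does not force the data to disk, so it only illustrates the ordering of the steps.

```php
<?php
/*
 * Hypothetical helper: replace $path with $contents.
 * The previous file is copied to "$path.bak" first, but only if it passes
 * $is_valid, so a corrupt file never overwrites a good backup.
 * A lock file makes a second change wait until the first one has finished.
 */
function save_with_backup(string $path, string $contents, callable $is_valid): bool
{
    $lock = fopen($path . '.lock', 'c');
    if ($lock === false) {
        return false;
    }
    if (!flock($lock, LOCK_EX)) {
        fclose($lock);
        return false;
    }
    try {
        /* 1. Keep a valid copy of the previous config before changing anything. */
        if (is_file($path) && $is_valid($path)) {
            if (!copy($path, $path . '.bak')) {
                return false;
            }
        }
        /* 2. Write to a temporary file, then rename it into place, so a crash
              during the write does not leave a truncated main file. */
        $tmp = $path . '.tmp';
        if (file_put_contents($tmp, $contents) === false) {
            return false;
        }
        return rename($tmp, $path);
    } finally {
        flock($lock, LOCK_UN);
        fclose($lock);
    }
}

/* Hypothetical helper: return the first candidate file that passes $is_valid, or null. */
function first_valid_config(array $candidates, callable $is_valid): ?string
{
    foreach ($candidates as $path) {
        if (is_readable($path) && $is_valid($path)) {
            return $path;
        }
    }
    return null;
}
```

    At boot, something like first_valid_config(array($main, $main . '.bak'), $is_valid) would then pick the saved copy whenever the main file is missing or fails validation. How strict $is_valid should be (non-empty, parses as XML, has a version entry) is left open here.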

    Louis

