Netgate Discussion Forum

    Absolute Worst Experience Trying to Upgrade to 2.4.4_3

    Problems Installing or Upgrading pfSense Software
    8 Posts 4 Posters 952 Views
    • rmaeder

      The attempt to upgrade from 2.4.4_2 to 2.4.4_3 on an APU1 (AMD G-T40E, mSATA SSD) failed with several fatal problems.

      1. After the unit rebooted following the first step of the upgrade, it produced a page fault:
      Fatal trap 12: page fault while in kernel mode
      cpuid = 1; apic id = 01
      fault virtual address   = 0x800e29000
      fault code              = supervisor write data, page not present
      instruction pointer     = 0x20:0xffffffff81189dde
      stack pointer           = 0x28:0xfffffe0093dd3940
      frame pointer           = 0x28:0xfffffe0093dd3940
      code segment            = base 0x0, limit 0xfffff, type 0x1b
                              = DPL 0, pres 1, long 1, def32 0, gran 1
      processor eflags        = interrupt enabled, resume, IOPL = 0
      current process         = 748 (logger)
      

      This was right after some messages about microcode updating. This also corrupted the root file system beyond repair. I reinstalled from scratch, which went fine.

      2. Trying to restore the previous config resulted in a broken system

      This is even worse. Every attempt to load the configuration I had saved a couple of hours earlier on the running 2.4.4_2 system failed and permanently damaged the freshly installed system. After that, the web configurator would only spit out error messages in my browser and would not recover.

      Fatal error: Uncaught Exception: XML error: RRDDATA at line 3995 cannot occur more than once in /etc/inc/xmlparse.inc:87
      Stack trace:
      #0 [internal function]: startElement(Resource id #6, 'RRDDATA', Array)
      #1 /etc/inc/xmlparse.inc(186): xml_parse(Resource id #6, '/UH252lklZZqR63...', false)
      #2 /etc/inc/xmlparse.inc(147): parse_xml_config_raw('/conf/config.xm...', Array, 'false')
      #3 /etc/inc/config.lib.inc(132): parse_xml_config('/conf/config.xm...', Array)
      #4 /etc/inc/config.inc(159): parse_config()
      #5 /etc/inc/gwlb.inc(23): require_once('/etc/inc/config...')
      #6 /etc/inc/functions.inc(33): require_once('/etc/inc/gwlb.i...')
      #7 /etc/inc/notices.inc(24): require_once('/etc/inc/functi...')
      #8 /etc/inc/config.gui.inc(37): require_once('/etc/inc/notice...')
      #9 /etc/inc/auth.inc(31): require_once('/etc/inc/config...')
      #10 /etc/inc/authgui.inc(25): include_once('/etc/inc/auth.i...')
      #11 /usr/local/www/guiconfig.inc(51): require_once('/etc/inc/authgu...')
      #12 /usr/local/www/index.php(44): require_once('/usr/local/www/...')
      #13 {m in /etc/inc/xmlparse.inc on line 87

      PHP ERROR: Type: 1, File: /etc/inc/xmlparse.inc, Line: 87, Message: Uncaught Exception: XML error: RRDDATA at line 3995 cannot occur more than once in /etc/inc/xmlparse.inc:87
      Stack trace:
      #0 [internal function]: startElement(Resource id #6, 'RRDDATA', Array)
      #1 /etc/inc/xmlparse.inc(186): xml_parse(Resource id #6, '/UH252lklZZqR63...', false)
      #2 /etc/inc/xmlparse.inc(147): parse_xml_config_raw('/conf/config.xm...', Array, 'false')
      #3 /etc/inc/config.lib.inc(132): parse_xml_config('/conf/config.xm...', Array)
      #4 /etc/inc/config.inc(159): parse_config()
      #5 /etc/inc/gwlb.inc(23): require_once('/etc/inc/config...')
      #6 /etc/inc/functions.inc(33): require_once('/etc/inc/gwlb.i...')
      #7 /etc/inc/notices.inc(24): require_once('/etc/inc/functi...')
      #8 /etc/inc/config.gui.inc(37): require_once('/etc/inc/notice...')
      #9 /etc/inc/auth.inc(31): require_once('/etc/inc/config...')
      #10 /etc/inc/authgui.inc(25): include_once('/etc/inc/auth.i...')
      #11 /usr/local/www/guiconfig.inc(51): require_once('/etc/inc/authgu...')
      #12 /usr/local/www/index.php(44): require_once('/usr/local/www/...')
      #13 {m
      

      I mean, even if an uploaded config file had been corrupt (which it wasn't), it must never damage the running system (think of DoS, etc.). It could simply report that the file is corrupt and refuse to load it; this seems rather obvious, doesn't it?
      The problem turned out to be the long rrd section in the file; after I edited that out, I could finally load the config.
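For anyone who would rather not hand-edit a multi-megabyte backup, the "edit the rrd section out" step can be sketched in a few lines of Python. This is only a rough sketch under assumptions: it treats the backup as plain text, assumes the sections are written literally as `<rrddata>...</rrddata>` (or an empty `<rrddata/>`), and the function name `strip_rrddata` is made up for this example. It is not a pfSense tool.

```python
import re

# Matches one whole <rrddata>...</rrddata> block (or an empty <rrddata/>),
# together with its leading indentation and trailing newline.
# Assumption: the tag is written exactly like this, without attributes.
RRDDATA_BLOCK = re.compile(
    r"[ \t]*(?:<rrddata>.*?</rrddata>|<rrddata\s*/>)[ \t]*\r?\n?",
    re.DOTALL,
)


def strip_rrddata(config_text: str) -> tuple[str, int]:
    """Return (config text without rrddata blocks, number of blocks removed)."""
    return RRDDATA_BLOCK.subn("", config_text)
```

To use it, read the backup into a string, call `strip_rrddata`, and write the result to a new file, so the original backup stays untouched. Everything outside the removed blocks is left byte-for-byte as it was.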

      The system isn't very robust when it comes to the config file in general. This is horrible, too:

      Enter an option: 4
      
      Config.xml is corrupted and is 0 bytes.  Could not restore a previous backup.Config.xml is corrupted and is 0 bytes.  Could not restore a previous backup.
       0) Logout (SSH only)                  9) pfTop
       1) Assign Interfaces                 10) Filter Logs
       2) Set interface(s) IP address       11) Restart webConfigurator
       3) Reset webConfigurator password    12) PHP shell + pfSense tools
       4) Reset to factory defaults         13) Update from console
       5) Reboot system                     14) Enable Secure Shell (sshd)
       6) Halt system                       15) Restore recent configuration
       7) Ping host                         16) Restart PHP-FPM
       8) Shell
      

      So it doesn't seem to keep a basic factory config somewhere for such cases. Restoring to factory defaults should always work, right?

      Roman

      • Gertjan @rmaeder

        @rmaeder said in Absolute Worst Experience Trying to Upgrade to 2.4.4_3:

        I mean, even if an uploaded config file had been corrupt (which it wasn't),

        But :

        @rmaeder said in Absolute Worst Experience Trying to Upgrade to 2.4.4_3:

        The problem turned out to be the long rrd section in the file

        so it was a corrupt config file. The fact that it is huge (mine are 720 MB) is normal: that happens when RRD files are included.
        But it shouldn't contain multiple identical sections: "RRDDATA at line 3995 cannot occur more than once" (same RRD info).

        This can happen if the files on disk are in bad shape.

        Restart pfSense using the console, stop booting at the first menu and run "fsck" (see the manual for all this).
        I guess you will find issues with the disk structure.
        Running Windows or any other OS on a disk in the same shape will produce the same results.

        @rmaeder said in Absolute Worst Experience Trying to Upgrade to 2.4.4_3:

        Restoring to factory defaults should always work, right?

        This is NOT a desktop OS !!!!

        Default settings exist, but not for assigning which interface goes to which NIC.
        Think about it for a minute and you'll understand why.
        Hint : this is a firewall.

        No "help me" PM's please. Use the forum, the community will thank you.
        Edit : and where are the logs ??

        • jimp Rebel Alliance Developer Netgate

          Do you still have the full backtrace for that crash you posted? There isn't enough to go on to accurately speculate about causes without that.

          That said, from the general description of symptoms and the other errors I'd say you have a disk problem.


          • rmaeder @Gertjan

            @Gertjan said in Absolute Worst Experience Trying to Upgrade to 2.4.4_3:

            @rmaeder said in Absolute Worst Experience Trying to Upgrade to 2.4.4_3:

            I mean, even if an uploaded config file had been corrupt (which it wasn't),

            But :

            @rmaeder said in Absolute Worst Experience Trying to Upgrade to 2.4.4_3:

            The problem turned out to be the long rrd section in the file

            so it was a corrupt config file. The fact that it is huge (mine are 720 MB) is normal: that happens when RRD files are included.
            But it shouldn't contain multiple identical sections: "RRDDATA at line 3995 cannot occur more than once" (same RRD info).

            This can happen if the files on disk are in bad shape.

            My config backups were only about 6MB.
            But the fact that pfSense can produce backups it cannot read later on is really bad, isn't it?
            The fact that it contains two rrd sections shouldn't be fatal.
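To make the point concrete, here is roughly what "check first, then replace" could look like. It is a sketch under assumptions, not how pfSense handles restores: the names `validate_config` and `install_config` are invented for this example, and the only rule checked beyond well-formedness is the "rrddata at most once" rule from the error message above.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Sections that this sketch assumes may appear only once in a config.
SINGLETON_SECTIONS = ("rrddata",)


def validate_config(data: bytes) -> list[str]:
    """Return a list of problems; an empty list means the config looks usable."""
    if not data.strip():
        return ["config is empty"]
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    problems = []
    for tag in SINGLETON_SECTIONS:
        count = sum(1 for _ in root.iter(tag))
        if count > 1:
            problems.append(f"<{tag}> occurs {count} times")
    return problems


def install_config(data: bytes, target_path: str) -> list[str]:
    """Replace target_path with data only if data validates; return problems."""
    problems = validate_config(data)
    if problems:
        return problems  # the existing file is left untouched
    directory = os.path.dirname(os.path.abspath(target_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        # Swap the new file in; the rename is atomic on POSIX filesystems.
        os.replace(tmp_path, target_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return []
```

The design choice is that a rejected upload never touches the live file, and an accepted one is written to a temporary file in the same directory before being swapped in, so a crash halfway through cannot leave a 0-byte config behind.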

            Restart pfSense using the console, stop booting at the first menu and run "fsck" (see the manual for all this).
            I guess you will find issues with the disk structure.
            Running Windows or any other OS on a disk in the same shape will produce the same results.

            @rmaeder said in Absolute Worst Experience Trying to Upgrade to 2.4.4_3:

            Restoring to factory defaults should always work, right?

            This is NOT a desktop OS !!!!

            Default settings exist, but not for assigning which interface goes to which NIC.
            Think about it for a minute and you'll understand why.
            Hint : this is a firewall.

            Closed firewall/router products usually have a button you can press with a paper clip that restores things.
            For pfSense this isn't feasible, I admit, and we could just consider a reinstall to be some kind of factory reset. I wouldn't have made a fuss about it, if the backup had worked.

            Roman

            • rmaeder @jimp

              @jimp said in Absolute Worst Experience Trying to Upgrade to 2.4.4_3:

              Do you still have the full backtrace for that crash you posted? There isn't enough to go on to accurately speculate about causes without that.

              The terminal program I used to talk to the serial console of the unit didn't support scrolling, so I don't have more than I posted.

              That said, from the general description of symptoms and the other errors I'd say you have a disk problem.

              Whatever it was, I eventually reinstalled 2.4.4_3 on both of my units and restored the hand-edited backup (without the rrd sections). I just downloaded a config backup after about two days of operation, and it contains a single <rrddata> section with a couple of (distinct) files.

              From now on, I'll always make two backups, one without the rrd data, just in case.

              Roman

              • stephenw10 Netgate Administrator

                Did it actually have more than one rrddata section in it?
                There was a bug for that but it was fixed some time ago: https://redmine.pfsense.org/issues/8994

                I agree I would expect it to be able to restore the factory settings from any config. The default config is stored in /conf.default/config.xml. Unless that was also damaged I can only think the lack of a current good config or any backup configs (other than default) caused it to hit an error before it actually restored the default.
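As an illustration only, the fallback order described here (current config, then backups, then the default) could be sketched like this in Python. The function name `first_usable_config` is hypothetical, "usable" here just means "exists and is not 0 bytes", and this is not a description of what the pfSense code does.

```python
import os
from typing import Iterable, Optional


def first_usable_config(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate path that exists and is not 0 bytes."""
    for path in candidates:
        try:
            if os.path.getsize(path) > 0:
                return path
        except OSError:
            # Missing or unreadable file: move on to the next candidate.
            continue
    return None
```

Called with something like `["/conf/config.xml", "/conf.default/config.xml"]` (with any backup files listed in between), a 0-byte current config is simply skipped and the default is still reached at the end of the list.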

                Steve

                • rmaeder @stephenw10

                  @stephenw10 said in Absolute Worst Experience Trying to Upgrade to 2.4.4_3:

                  Did it actually have more than one rrddata section in it?
                  There was a bug for that but it was fixed some time ago: https://redmine.pfsense.org/issues/8994

                  I've looked at all my older backups. Here is the situation:

                  A backup of 20181204 (before upgrading to 2.4.4_1) contains just one rrddata section.
                  On 20181211 I replaced the SD card with an mSATA SSD and reinstalled pfSense. The backup made just before that also contains one rrddata section.

                  However, a backup I made on 20190109, before upgrading to 2.4.4_2, contains two such sections. The upgrade went fine, so I never used that backup.
                  All backups since then contain two rrddata sections, which I didn't notice until I tried to load one after the failed 2.4.4_3 upgrade.
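Going through a pile of old backups by hand is tedious, so here is a small Python sketch of the same check. It assumes the section is written literally as `<rrddata>` or `<rrddata/>` and simply counts opening tags; the function names are invented for this example.

```python
import re
from pathlib import Path

# Opening tag of an rrddata section; does not match <rrddatafile>.
RRDDATA_OPEN = re.compile(rb"<rrddata(?:\s*/)?>")


def count_rrddata_sections(data: bytes) -> int:
    """Count how many rrddata sections start in the given config bytes."""
    return len(RRDDATA_OPEN.findall(data))


def audit_backups(paths) -> dict[str, int]:
    """Map each backup file path to its number of rrddata sections."""
    return {str(p): count_rrddata_sections(Path(p).read_bytes()) for p in paths}
```

Any backup for which `audit_backups` reports a count above 1 would hit the "cannot occur more than once" error on restore and is worth cleaning up before it is needed.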

                  The symptoms are similar to what the bug mentions, but that should have been fixed in 2.4.4_1?

                  Roman

                  • Gertjan @rmaeder

                    @rmaeder said in Absolute Worst Experience Trying to Upgrade to 2.4.4_3:

                    but that should have been fixed in 2.4.4_1?

                    Yes, it was, but the fix didn't clean your existing config file.

                    Open a console, choose option 8 (Shell) and run

                    viconfig
                    

                    Add an empty

                    	<donothing>
                    	</donothing>
                    

                    And save the config (:wq).
                    From now on, it will be part of your config file and will persist there.

                    Note that reading in the existing config while booting doesn't break things, but importing does (when a double rrddata exists).

                    Also: https://github.com/pfsense/pfsense/blob/master/src/usr/local/www/diag_backup.php#L204 looks nice, but I guess (again, I'm just thinking out loud here) that a second <rrddata></rrddata> isn't removed. So a config backup made with the RRD data excluded always kept an empty rrddata section, which silently failed to restore the actual RRD data files on disk when pfSense booted.
                    Then actually export the RRD data:

                    [screenshot: 57f1302b-8da6-490b-b8d5-55cad482b471-image.png]

                    and you wind up with two rrddata sections again, and that went wrong.


                    Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.