Netgate Discussion Forum

    DHCP config is apparently not updated in a safe fashion

    DHCP and DNS
    13 Posts 4 Posters 986 Views
    • tenortim

      I ran into a nasty issue today where ntopng filled the drive on my pfSense router (I've posted about that in the ntopng forum).
      The nasty part was that this only became apparent when I deleted a static DHCP definition for a laptop that no longer exists, and the result was that it destroyed most of my DHCP configuration (only the first 18 entries were left).
      The only safe way to perform an update is to write a new file (flush it and verify it was fully written) and then atomically move it into place. It seems that is not happening for the DHCP config update.
      I have backups, but it would be really great if the update code were more careful/paranoid.
      Should I open an issue?

      Another thought. Given that the consumption of space happened over an extended period, is there any way to get email alerts when utilization goes over a threshold (e.g. 90%)?

      Tim

      • KOM

        You probably won't get much action on the DHCP server issue here. Perhaps upstream at FreeBSD?

        As for alerts, I run the Zabbix network monitor and use the Zabbix agent package on pfSense for system metrics etc.

        • tenortim @KOM

          @KOM It's not the upstream server that's at fault, because that code never changes the config; it's the way the UI/PHP code updates the config. The config itself was trashed because it wasn't updated in a way that's safe if the disk is full. Does that make sense?

          • KOM

            Makes sense now. I thought you were complaining about the DHCP service in FreeBSD. Open an issue on Redmine if you like.

            • tenortim

              Thanks @KOM. Will do. I switched over to my old system temporarily, so I can spend a bit more time looking at which files got damaged compared to my backup from last week.

              • KOM

                pfSense has an automatic backup feature that saves the last n copies of your config. You might have been able to roll back to a previous config via Diagnostics > Backup & Restore > Config History.

                • Gertjan @tenortim

                  @tenortim said in DHCP config is apparently not updated in a safe fashion:

                  and the result was that it destroyed most of my dhcp configuration

                  Your disk ran out of space.
                  This was logged, I guess.
                  pfSense, like "any other device with an OS", will carry on until the bitter end.

                  In your case, things went wrong when the dhcp.conf file was rewritten.
                  Next time it could be the pfSense config.xml file, or any other config file based on config.xml (a couple of hundred).

                  Your dhcp.conf file was probably written correctly, as far as PHP can check, but the underlying OS died when it was closing the file. These actions are done in parallel: your file system becomes 'dirty' and files that have not been closed are at risk.

                  pfSense itself logs to circular log files, because it's known to run on machines with limited RAM and disk.
                  Packages that are based on 'tracking' should log to a dedicated external destination (a syslog server, some NAS, wherever), because if something goes wrong and pfSense dies, it takes your logs with it: the same logs that could explain, post mortem, what actually happened.

                  See https://www.test-domaine.fr/munin/brit-hotel-fumel.net/index.html for an example. I do receive alert mails if some values go over a predefined limit.

                  No "help me" PM's please. Use the forum, the community will thank you.
                  Edit: and where are the logs??

                  • tenortim

                    @KOM, yes I have backups, and, once I manually deleted the 9.2GB of rrd logs that ntopng had generated going back over a year, restoring the backup is easy.

                    @Gertjan, no, that is not what happened. The OS didn't die. It was just fine. The system was up, just no longer operating correctly. What I think happened (I haven't looked at the code yet) was that the UI tried to overwrite the file in place and this failed partway through for lack of space, leaving a truncated file.

                    On a POSIX-compliant system (such as FreeBSD), it is entirely possible to do this in a safe way:

                    1. Create a new temporary file.
                    2. Write contents, checking the error return from write().
                    3. Call fsync() on the file and check the error return.
                    4. Close the file and check the error return.
                    If all of the above succeeds, you now have the new config written to stable storage and even if the OS crashes, that data is safe.

                    5. Finally, rename the temp file over the config file.
                    Again, POSIX guarantees that rename() is atomic and that either the original file or the replacement file will exist regardless of whether the system crashes at any point during the rename call.

                    If we're already doing that, then there's an OS/filesystem bug. If we're not, then we're not updating safely and are susceptible to failure if/when the filesystem fills.
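The five steps above can be sketched roughly as follows. This is only an illustration in Python, not the pfSense code (which is PHP and which I have not read); the function name `atomic_write` and the `.tmp-` prefix are made up for the example. It assumes the temporary file lives in the same directory as the target, so that the rename never crosses a filesystem boundary.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data`, or leave the existing file untouched on failure.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # 1. Create a new temporary file next to the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        try:
            # 2. Write the contents. os.write may write fewer bytes than
            #    requested, and raises OSError (e.g. ENOSPC) on failure.
            view = memoryview(data)
            while view:
                written = os.write(fd, view)
                view = view[written:]
            # 3. Flush the data to stable storage; raises OSError on failure.
            os.fsync(fd)
        finally:
            # 4. Close the file; raises OSError on failure.
            os.close(fd)
        # 5. Rename the temp file over the target in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Something failed before the rename completed: the original file
        # has not been modified, so just discard the temporary file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If the disk is full, the error surfaces at step 2, 3 or 4 and the old config is still intact. Two caveats: `mkstemp` creates the file with owner-only permissions, so real code would need to set the intended mode and ownership before the rename, and some filesystems also need an `fsync()` on the containing directory before the rename itself is durable across a crash. Both are left out here.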

                    • KOM

                      For what it's worth, I have NEVER seen an operating system gracefully handle a full system disk. Not one. They all hang, barf or choke in one way or another.

                      • tenortim @KOM

                        @KOM said in DHCP config is apparently not updated in a safe fashion:

                        For what it's worth, I have NEVER seen an operating system gracefully handle a full system disk. Not one. They all hang, barf or choke in one way or another.

                        I would agree with you there with the fine distinction that the kernel handles it just fine, but generally, userspace doesn't do so well. But hanging would have been benign. Truncating critical files, less so. And I'm painfully aware just how little userspace code is written to be highly resilient/safe in the face of errors (it's tedious and painful to do).

                        • KOM

                          I use Zabbix to monitor my infrastructure, and pfSense has Zabbix packages. It would notify you if the disk got below 10% free space, for instance. Not exactly an ideal fix for your issue but at least you would know you were getting close to full before a major incident happened.
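For anyone who wants the idea without a full monitoring stack, a threshold check of this kind can be sketched in a few lines. This is not how Zabbix does it; it is a minimal Python illustration, and the function names and the 90% default are assumptions made up for the example. Delivering the alert (email or otherwise) is left out.

```python
import shutil


def usage_percent(path: str = "/") -> float:
    """Return how full the filesystem containing `path` is, in percent."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


def over_threshold(path: str = "/", threshold_percent: float = 90.0) -> bool:
    """True when usage is at or above the (hypothetical) alert threshold."""
    return usage_percent(path) >= threshold_percent


if __name__ == "__main__":
    # Run this from cron or similar and hook in whatever notification you use.
    if over_threshold("/", 90.0):
        print(f"WARNING: / is {usage_percent('/'):.1f}% full")
```

The figure can differ slightly from what `df` reports, since `df` typically accounts for space reserved for root.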

                          • tenortim @KOM

                            @KOM Thanks. That's a really great suggestion. We use Zabbix to monitor our infrastructure at my day job, and the infrastructure team seem happy with it. Time to roll it out at home!

                              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.