Netgate Discussion Forum

    Package deletion then have to restart web configurator

    2.1 Snapshot Feedback and Problems - RETIRED
    25 Posts 3 Posters 8.4k Views
      phil.davis

      Here is 1 problem with the shared memory reference count implementation in /etc/inc/util.inc
      The data in the shared memory is actually string data. The PHP routine shmop_write just writes a string to the shared memory, with no null terminator or anything. 0 becomes "0" in the first byte, followed by whatever happened to be in the following bytes of memory. If the reference count goes from 9 to 10, the memory goes from "9 " to "10". When it is decremented, 10-1=9, "9" goes in the first byte, the second byte stays "0". Next time the value is looked at, it returns "90"! This is a recipe for getting the reference count wrong and thus it never returns to zero. When that happens, conf_mount_ro will not actually re-mount read-only - it only switches back to read-only when the reference count goes back to zero.
      The attached bit of code calls refcount_reference and refcount_unreference in a way that demonstrates the problem. You can just save it somewhere and run it from the command line to get:

      [2.1-BETA0][admin@test02.homedomain]/var/log(117): php shmop_demo.php
      Content-type: text/html
      
      refcount_read: 0
      refcount_reference: 1
      refcount_read: 1
      refcount_reference: 2
      refcount_read: 2
      refcount_reference: 3
      refcount_read: 3
      refcount_reference: 4
      refcount_read: 4
      refcount_reference: 5
      refcount_read: 5
      refcount_reference: 6
      refcount_read: 6
      refcount_reference: 7
      refcount_read: 7
      refcount_reference: 8
      refcount_read: 8
      refcount_reference: 9
      refcount_read: 9
      refcount_reference: 10
      refcount_read: 10
      refcount_unreference:
      refcount_read: 90
      refcount_reference: 91
      refcount_read: 91
      refcount_reference: 92
      refcount_read: 92
      refcount_reference: 93
      refcount_read: 93
      refcount_reference: 94
      refcount_read: 94
      refcount_reference: 95
      refcount_read: 95
      refcount_reference: 96
      refcount_read: 96
      refcount_reference: 97
      refcount_read: 97
      refcount_reference: 98
      refcount_read: 98
      refcount_reference: 99
      refcount_read: 99
      refcount_reference: 100
      refcount_read: 100
      refcount_unreference:
      refcount_read: 990
      refcount_reference: 991
      refcount_read: 991
      refcount_reference: 992
      refcount_read: 992
      refcount_reference: 993
      refcount_read: 993
      refcount_reference: 994
      refcount_read: 994
      refcount_reference: 995
      refcount_read: 995
      refcount_reference: 996
      refcount_read: 996
      refcount_reference: 997
      refcount_read: 997
      refcount_reference: 998
      refcount_read: 998
      refcount_reference: 999
      refcount_read: 999
      refcount_reference: 1000
      refcount_read: 1000
      refcount_unreference:
      refcount_read: 9990
      
      

      There are 10 bytes allocated for this shared memory section, so I guess other bad things happen once this sort of process pushes the ref count past 10 digits. Also, the initialisation just puts a "0" in the first byte. I think there is the potential for the other 9 bytes to be random rubbish at startup, which could cause problems that are really hard to track down and reproduce.
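
The 9 to 10 to "90" jump in the output above can be reproduced with a small model. This is a Python sketch, not the PHP code from /etc/inc/util.inc: the helper names `write_unpadded` and `read_count` are made up for illustration, and it assumes the 10-byte segment starts out zero-filled and that a read parses only the leading digits.

```python
SEGMENT_SIZE = 10  # bytes allocated for the counter, per the post above


def write_unpadded(segment: bytearray, value: int) -> None:
    """Overwrite only as many leading bytes as the decimal string needs."""
    text = str(value).encode("ascii")
    segment[0:len(text)] = text  # bytes after the string are left untouched


def read_count(segment: bytearray) -> int:
    """Parse the leading run of ASCII digits and ignore whatever follows."""
    digits = bytearray()
    for byte in segment:
        if not 48 <= byte <= 57:  # stop at the first non-digit byte
            break
        digits.append(byte)
    return int(digits.decode("ascii") or "0")


segment = bytearray(SEGMENT_SIZE)  # assumption: zero-filled at creation
write_unpadded(segment, 0)

history = []
for _ in range(10):  # ten references: 1, 2, ... 10
    write_unpadded(segment, read_count(segment) + 1)
history.append(read_count(segment))

write_unpadded(segment, read_count(segment) - 1)  # one unreference writes "9"
history.append(read_count(segment))  # the stale "0" is still in byte two

print(history)  # [10, 90]
```

The stale second byte is the whole problem: the shorter string "9" does not erase the "0" left behind by "10".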
      I'll have a go at fixing this up better - it might solve the issues with nanobsd systems getting left in RW after package installs etc. I can't see how it would solve the slow switching back to RO though, but who knows?
      Edit: I rebooted and monitored the ref count in memory while installing and removing blinkled - it just went 0, 1, 2, 1, 0. But the same problem with a bunch of sig11 process exits. So I don't think fixing the ref count issue described here will fix the sig11 exits.

      shmop_demo.txt

      As the Greek philosopher Isosceles used to say, "There are 3 sides to every triangle."
      If I helped you, then help someone else - buy someone a gift from the INF catalog http://secure.inf.org/gifts/usd/

        jimp Rebel Alliance Developer Netgate

        Well I committed a fix for that, but it didn't help the crashes.

        https://github.com/bsdperimeter/pfsense/commit/a9f250d6a3372404cb7adb9c6d870eb085f566d0
        https://github.com/bsdperimeter/pfsense/commit/780705e9b8058130fa6b9e15dcca46f85df23395

        Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

        Need help fast? Netgate Global Support!

        Do not Chat/PM for help!

          phil.davis

          I applied your changes and they fix up the ref count issue if it gets big. But the problem is still there - from my monitoring the ref count only goes to 2 anyway.
          Interestingly, I was using a little script to show me the ref count, and while removing blinkled it even gave me a segmentation fault. Also, the blinkled processes hung around a long time after the removal script had said that the pbi was deleted, I guess running in memory without looking to read pages out of the file on the CF card. They even survived after the segmentation fault that I got interactively.

          [2.1-BETA0][root@test02.homedomain]/var/log(14): php rr.php
          Content-type: text/html
          
          refcount_read: 0001
          [2.1-BETA0][root@test02.homedomain]/var/log(16): ps ax | grep blink
           5588  ??  Ss     0:36.21 /usr/local/bin/blinkled -i vr0 -l /dev/led/led2
           5876  ??  Ss     0:36.21 /usr/local/bin/blinkled -i vr1 -l /dev/led/led3
          62799  u0  S+     0:00.01 grep blink
          [2.1-BETA0][root@test02.homedomain]/var/log(17): php rr.php
          Segmentation fault
          [2.1-BETA0][root@test02.homedomain]/var/log(18): ps ax | grep blink
           5588  ??  Ss     0:37.99 /usr/local/bin/blinkled -i vr0 -l /dev/led/led2
           5876  ??  Ss     0:38.00 /usr/local/bin/blinkled -i vr1 -l /dev/led/led3
          12199  u0  S+     0:00.01 grep blink
          [2.1-BETA0][root@test02.homedomain]/var/log(19): ps ax | grep blink
          13010  u0  S+     0:00.01 grep blink
          [2.1-BETA0][root@test02.homedomain]/var/log(20): php rr.php
          Content-type: text/html
          
          refcount_read: 0000
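
The four-digit values above ("0001", "0000") suggest the counter is now written at a fixed width. The sketch below shows why that avoids stale bytes. It is a Python model only: the width of 4 is inferred from this output, and `write_padded` and `read_count` are hypothetical names, so the committed PHP may well differ in detail.

```python
SEGMENT_SIZE = 10
WIDTH = 4  # assumption: inferred from the "0001" output, not from the commit


def write_padded(segment: bytearray, value: int) -> None:
    """Always overwrite the same WIDTH bytes, zero-padded on the left."""
    if not 0 <= value < 10 ** WIDTH:
        raise ValueError("count does not fit in the fixed-width field")
    segment[0:WIDTH] = str(value).zfill(WIDTH).encode("ascii")


def read_count(segment: bytearray) -> int:
    """Read back exactly the WIDTH bytes that a write always fills."""
    return int(segment[0:WIDTH].decode("ascii"))


segment = bytearray(b"?" * SEGMENT_SIZE)  # pretend the segment holds garbage
write_padded(segment, 0)

for _ in range(10):  # count up to 10
    write_padded(segment, read_count(segment) + 1)
write_padded(segment, read_count(segment) - 1)  # and back down to 9

print(read_count(segment), bytes(segment[:WIDTH]))  # 9 b'0009'
```

Because every write covers the full field, neither leftover digits nor uninitialised bytes inside the field can leak into the value that is read back.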
          
          

            phil.davis

            In /etc/inc/pkg-utils.inc function uninstall_package I commented out:

            	//	exec("/usr/bin/tar xzPfU /tmp/pkg_libs.tgz -C /");
            	//	exec("/usr/bin/tar xzPfU /tmp/pkg_bins.tgz -C /");
            	//	@unlink("/tmp/pkg_libs.tgz");
            	//	@unlink("/tmp/pkg_bins.tgz");
            
            

            This stuff is backed up earlier in the routine and then restored for some reason.
            But it includes things like /usr/local/lib/php/20090626 which has a bunch of "so" files related to php.
            With this restore commented out, I don't get the sig11 crashes, the web configurator stays available.
            I wonder why lots of stuff from /usr/local/lib is being backed up and restored during every package uninstall?
            The "so" files would go missing for a moment as "tar" deletes the original on disk and then restores the one from backup.

              jimp Rebel Alliance Developer Netgate

              That was next on my list of things to try but had some customers to attend to.

              The libraries are backed up in case a package removed a file that was required for the system to function properly. The restore process could probably be tweaked a bit somehow though.

                phil.davis

                I made a couple of pull requests, which Seth has committed, to make the reference count in the shared memory section even more robust - locking it while it is incremented and decremented. Hopefully now, as long as all code actually calls conf_mount_rw followed by conf_mount_ro once it has finished changing things, the filesystem will always end up read-only again on nanobsd.
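
As a rough illustration of the locking idea, here is a Python sketch of a reference count whose increment and decrement each happen under a lock, and which only flips back to read-only when the count returns to zero. The class `MountRefCount` and its methods are stand-ins invented for this example. The real code locks a shared memory segment between PHP processes, whereas this model uses threads in one process.

```python
import threading


class MountRefCount:
    """Toy model: read-write while count > 0, read-only again at zero."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.count = 0
        self.read_only = True

    def mount_rw(self) -> None:
        with self._lock:  # the read-modify-write is one atomic step
            self.count += 1
            self.read_only = False

    def mount_ro(self) -> None:
        with self._lock:
            if self.count > 0:
                self.count -= 1
            if self.count == 0:  # only the last caller remounts read-only
                self.read_only = True


def worker(counter: MountRefCount, rounds: int) -> None:
    for _ in range(rounds):
        counter.mount_rw()
        counter.mount_ro()


counter = MountRefCount()
threads = [threading.Thread(target=worker, args=(counter, 1000)) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter.count, counter.read_only)  # 0 True
```

Without the lock, two callers could read the same old value and one update would be lost, which is another way for the count to never reach zero.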
                Also tidied up an unnecessary rw then ro mount in pkg-utils which was slowing things down every time the user navigated to System:Packages and it listed Installed Packages.
                This is all good stuff for nanobsd in general, but it doesn't fix the sig11. To fix that I still need to comment out the file restore from pkg-utils.inc:

                	//	exec("/usr/bin/tar xzPfU /tmp/pkg_libs.tgz -C /");
                	//	exec("/usr/bin/tar xzPfU /tmp/pkg_bins.tgz -C /");
                	//	@unlink("/tmp/pkg_libs.tgz");
                	//	@unlink("/tmp/pkg_bins.tgz");
                

                Now it needs someone like JimP who has the big picture, to work out how best to trim down this restore - maybe can do a comparison, work out if anything essential has gone missing, and only put the missing things back?

                  jimp Rebel Alliance Developer Netgate

                  I'm going over that with the other devs.

                  The problem is likely the U flag to tar, which unlinks files before restoring. I wager if that were removed, and even better if "k" were put in its place, it may behave better.

                  "k" would cause it to keep existing files, and since this is restoring a backup after files were removed, that seems to make more sense to me. I'm just not sure if there are any edge cases I'm forgetting that required U to restore properly.

                  If the pkg uninstall corrupted the file, U would be better… but I'm not sure if that was one of the reasons.
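
To make the "keep existing files" behaviour concrete, here is a Python sketch of a restore that only puts back files that are missing and never touches files that are still present. It is a model of the idea, not of tar itself: `restore_missing` is a hypothetical helper, it handles regular files only, and it uses relative paths in a temporary directory rather than the absolute paths the real command restores.

```python
import os
import tarfile
import tempfile


def restore_missing(archive_path: str, dest_root: str) -> list:
    """Restore regular files absent under dest_root; return their names."""
    restored = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            # Skip unsafe names instead of writing outside dest_root.
            if os.path.isabs(member.name) or ".." in member.name.split("/"):
                continue
            target = os.path.join(dest_root, member.name)
            if os.path.exists(target):
                continue  # keep the existing file, never unlink it
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with tar.extractfile(member) as src, open(target, "wb") as dst:
                dst.write(src.read())
            restored.append(member.name)
    return restored


workdir = tempfile.mkdtemp()
root = os.path.join(workdir, "root")
os.makedirs(os.path.join(root, "lib"))
for name in ("a.so", "b.so"):
    with open(os.path.join(root, "lib", name), "wb") as fh:
        fh.write(b"backup copy")

archive = os.path.join(workdir, "pkg_libs.tgz")
with tarfile.open(archive, "w:gz") as tar:  # back up before the "uninstall"
    tar.add(os.path.join(root, "lib"), arcname="lib")

os.remove(os.path.join(root, "lib", "b.so"))  # the uninstall removed b.so
with open(os.path.join(root, "lib", "a.so"), "wb") as fh:
    fh.write(b"live copy")  # a.so is still present and in use

restored = restore_missing(archive, root)
with open(os.path.join(root, "lib", "a.so"), "rb") as fh:
    live_content = fh.read()

print(restored, live_content)  # ['lib/b.so'] b'live copy'
```

The trade-off is the one raised above: a file that is present but corrupted stays corrupted, because it is never replaced from the backup.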

                    jimp Rebel Alliance Developer Netgate

                    A quick test of this on NanoBSD is promising. Without U, but with k (just replaced U with k in the tar command), the GUI is fine and everything seems happy.

                    The blinkled process still crashed but since it was uninstalled I expected as much.

                    Not sure how that might help/hurt a full install. I've got a discussion started with the other devs to see if I'm missing anything there. More testing would be appreciated before I commit it.

                      phil.davis

                      Good news - I did various install/uninstalls with the new code and there are no problems. webConfigurator keeps running fine, no sig11 (apart from blinkled, which is a different issue).

                      Installed the following:
                      blinkled
                      openvpn client export utility
                      pfblocker
                      squid3

                      then did a firmware upgrade to the build that just finished:
                      2.1-BETA0 (i386)
                      built on Tue Jul 31 19:07:11 EDT 2012
                      FreeBSD 8.3-RELEASE-p3

                      All went well. It reinstalled all the packages and the webConfigurator stayed up. After the package installs were finished all services were running, openvpn link up, web browsing from client… working.

                      During the package installs it does a package removal first for each package. That now spews out a lot of "file exists" messages from the tar restore with k option, but it works.
                      I guess the package removal is just in case, but if the code could detect that it is a package install from the first boot after a firmware upgrade then it could know that the package removal step is not needed.

                      I have attached a log of the serial console output for the record.

                      bootlog-upgrade-2012-08-01.txt

                        jimp Rebel Alliance Developer Netgate

                        Looks like we might just need "2>/dev/null" at the end of the tar command to silence the errors.

                        If you want to test that, edit the command(s) in pkg-utils.inc and add it, and then:

                        touch /conf/needs_package_sync
                        

                        Then reboot, that'll make it do the pkg reinstall when it boots back up.

                          phil.davis

                          Ermal has already committed a change to use mwexec() instead of plain exec() - that should stop the loads of console output. He also fixed up the %age display on the console so it doesn't keep spewing out all across the line. I'll give those a try.

                            jimp Rebel Alliance Developer Netgate

                            ok, I've been a bit busy this morning so I didn't review the commit log yet.

                              jimp Rebel Alliance Developer Netgate

                              I did a gitsync on my ALIX and ran the test I mentioned and it worked fine with the latest changes Ermal made, so this may be all solved now. (knock on wood)

                                phil.davis

                                Upgraded to:
                                2.1-BETA0 (i386)
                                built on Wed Aug 1 16:50:12 EDT 2012
                                FreeBSD 8.3-RELEASE-p3

                                Copy of console output is attached - cleaner than before. If I am feeling OCD I will find the appropriate places to put "\n" so that the %age output doesn't scribble over the beginning of a line of text.

                                Everything reinstalled fine - blinkled, openvpn, pfblocker, squid3

                                Note: squid3 left the mount ref count at 2 after its install. Thus the mount points do not go back to read-only. It has missing and mis-placed conf_mount_ro() calls. I have submitted a separate pull request to fix that. Not an issue for this thread.
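
A count stuck at 2 is what you would expect when two read-write requests are never matched by a read-only call. The Python sketch below shows that failure and one way to make the pairing hard to forget. Everything in it is hypothetical: `mount_rw`, `mount_ro`, `writable`, `leaky_install` and `careful_install` are stand-ins for illustration, not the pfSense functions or the squid3 package code.

```python
from contextlib import contextmanager

state = {"count": 0, "read_only": True}


def mount_rw() -> None:
    state["count"] += 1
    state["read_only"] = False


def mount_ro() -> None:
    state["count"] = max(0, state["count"] - 1)
    if state["count"] == 0:  # read-only only when nobody needs read-write
        state["read_only"] = True


@contextmanager
def writable():
    """Pair every mount_rw() with a mount_ro(), even if the body raises."""
    mount_rw()
    try:
        yield
    finally:
        mount_ro()


def leaky_install() -> None:
    mount_rw()
    mount_rw()  # a nested helper also asks for read-write
    return      # both matching mount_ro() calls were forgotten


def careful_install() -> None:
    with writable():
        with writable():
            raise RuntimeError("install step failed")


leaky_install()
after_leak = dict(state)

state.update(count=0, read_only=True)  # reset the model between runs
try:
    careful_install()
except RuntimeError:
    pass
after_careful = dict(state)

print(after_leak)     # {'count': 2, 'read_only': False}
print(after_careful)  # {'count': 0, 'read_only': True}
```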

                                bootlog-upgrade-2012-08-02.txt

                                  phil.davis

                                  Yep, I did feel OCD! I have put in a couple of pull requests to tidy up the console output some more. Everything functional seems fine. I'll see how the console output looks tomorrow after the next snapshot/upgrade/package install sequence.
                                  After the conf_mount_ro fixups for squid and squid3 it should also leave the filesystem mounted read-only when finished.

                                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.