Netgate Discussion Forum

Package deletion then have to restart web configurator

2.1 Snapshot Feedback and Problems - RETIRED
This topic has been deleted. Only users with topic management privileges can see it.
    phil.davis
    last edited by Jul 5, 2012, 5:44 PM

    I am now on nanobsd:
    2.1-BETA0 (i386)
    built on Thu Jul 5 02:36:34 EDT 2012

    After deleting a package I no longer get any response from the web interface. The package deletion itself finishes fine with the usual set of messages; for example, here is a simple cron deletion:

    Backing up libraries... 
    Removing package...
    Removing Cron components...
    Tabs items... done.
    Menu items... done.
    Services... done.
    Loading package instructions...
    Deinstall commands... done.
    Removing package instructions...done.
    Auxiliary files... done.
    Package XML... done.
    Configuration... done.
    Cleaning up... 
    Package deleted.
    

    Any attempt to navigate to another page or type in the router address to start at the beginning again just times out.
    I go to a console session and restart the WebConfigurator and it works fine.
    Package installs do not have this problem.
    I can't see anything nasty in the system log or in /tmp/PHP-errors.log.
    I have noticed this since the July builds, after the big rebuild when there was a 48-hour gap between snapshots. But I can't guarantee that that is exactly when it started.
    The problem is very easily reproducible on my system - just install something small and quick like cron, then delete it.
    I'm not sure exactly where to look to debug this one.

    As the Greek philosopher Isosceles used to say, "There are 3 sides to every triangle."
    If I helped you, then help someone else - buy someone a gift from the INF catalog http://secure.inf.org/gifts/usd/

      jimp Rebel Alliance Developer Netgate
      last edited by Jul 5, 2012, 7:39 PM

      I've been installing and uninstalling packages like crazy the last couple weeks and haven't hit anything like that yet.

      Are you sure nothing shows up in the system log after that? (The log would be wiped on a reboot, so you'd have to check over SSH.)

      Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

      Need help fast? Netgate Global Support!

      Do not Chat/PM for help!

        Nachtfalke
        last edited by Jul 5, 2012, 8:39 PM

        And if it is always the same package, please tell us which package it is.

          phil.davis
          last edited by Jul 6, 2012, 5:58 AM

          I have seen the problem with cron, pfBlocker and bandwidthd, a few packages that are easy to install and remove. But the problem seems to be per-system. I have 6 Alix 2D13 nanobsd systems running 2.1, all on July builds. I went to every system, installed cron, then removed it: 4 systems have the problem, 2 don't.

          All the systems have combinations of OpenVPN servers and clients doing site-to-site shared-key OpenVPN. 2 systems are just ordinary WAN/LAN, and both of those have the problem. Of the 4 systems with WAN/OPT1/LAN, 2 have the problem and 2 do not. The 4 systems with dual ISP access (WAN+OPT1) have default gateway switching on and gateway groups to feed LAN traffic into. In all 4 cases the OpenVPN clients are on LAN and go out the default gateway. All of that is working fine. I am struggling to see what the common difference is between the systems with and without the problem.

          The system log does have nasty errors in it. Here are examples from 2 of the 4 systems that do this:

          Jul  6 05:14:21 ikp-rt-01 php: /pkg_mgr_install.php: The command '/usr/local/etc/rc.d/cron.sh stop' returned exit code '1', the output was 'kill: 59619: No such process'
          Jul  6 05:14:23 ikp-rt-01 check_reload_status: Syncing firewall
          Jul  6 05:14:25 ikp-rt-01 kernel: pid 27898 (php), uid 0: exited on signal 11 (core dumped)
          Jul  6 05:14:53 ikp-rt-01 check_reload_status: Reloading filter
          Jul  6 05:14:54 ikp-rt-01 check_reload_status: Reloading check_reload_status because it exited from an error!
          Jul  6 05:14:54 ikp-rt-01 kernel: pid 290 (check_reload_status), uid 0: exited on signal 11
          Jul  6 05:14:54 ikp-rt-01 kernel: pid 292 (check_reload_status), uid 0: exited on signal 11
          Jul  6 05:14:55 ikp-rt-01 kernel: pid 31913 (lighttpd), uid 0: exited on signal 11
          Jul  6 05:14:56 ikp-rt-01 kernel: pid 29092 (php), uid 0: exited on signal 11
          Jul  6 05:15:54 ikp-rt-01 kernel: pid 63445 (rrdtool), uid 0: exited on signal 11
          
          
          Jul  6 11:18:37 idp-rt-01 php: /pkg_mgr_install.php: The command '/usr/local/etc/rc.d/cron.sh stop' returned exit code '1', the output was 'kill: 3134: No such process'
          Jul  6 11:18:38 idp-rt-01 check_reload_status: Syncing firewall
          Jul  6 11:18:41 idp-rt-01 kernel: pid 11216 (php), uid 0: exited on signal 11
          Jul  6 11:19:16 idp-rt-01 check_reload_status: Reloading filter
          Jul  6 11:19:17 idp-rt-01 check_reload_status: Reloading check_reload_status because it exited from an error!
          Jul  6 11:19:17 idp-rt-01 kernel: pid 278 (check_reload_status), uid 0: exited on signal 11
          Jul  6 11:19:17 idp-rt-01 kernel: pid 280 (check_reload_status), uid 0: exited on signal 11
          Jul  6 11:19:18 idp-rt-01 kernel: pid 50054 (lighttpd), uid 0: exited on signal 11
          Jul  6 11:19:20 idp-rt-01 kernel: pid 50953 (php), uid 0: exited on signal 11
          Jul  6 11:19:48 idp-rt-01 kernel: pid 27191 (rrdtool), uid 0: exited on signal 11
          
          

          All the hardware is the same: 2GB CF cards, 256MB Alix boards. I can't believe that I have 4 out of 6 systems with a similar hardware issue.

          I removed all the OpenVPN and DynDNS stuff from my test router. It still has the problem! I guess I will flash another CF card for the test router and see how it behaves.

          If anyone else has experienced this issue I would love to hear about your logs and config.

            Nachtfalke
            last edited by Jul 6, 2012, 9:11 AM

            Could this be a read/write access problem on nanobsd systems in the install/deinstall routine?

            But this would not explain why it works on some and not on others…

              jimp Rebel Alliance Developer Netgate
              last edited by Jul 6, 2012, 4:41 PM

              Yeah that's definitely a nanobsd issue.

              I think there is still a ticket open for that very problem. It happens on 2.0.x as well, though I've only ever seen it after a firmware upgrade, not normally from installing a package.

                phil.davis
                last edited by Jul 30, 2012, 1:52 PM

                Got another example of this just now. The nanobsd system is running the snapshot from 20120727-1520.
                This system had never had any package installed. I installed blinkled, rebooted to test that the pbi db survived the reboot, then removed blinkled. Now the web configurator does not respond. The system log has:

                Jul 30 17:56:51 test02 php: /index.php: Successful login for user 'admin' from: 192.168.12.10
                Jul 30 17:56:51 test02 php: /index.php: Successful login for user 'admin' from: 192.168.12.10
                Jul 30 18:10:23 test02 check_reload_status: Syncing firewall
                Jul 30 18:11:05 test02 check_reload_status: Syncing firewall
                Jul 30 18:11:46 test02 check_reload_status: Reloading filter
                Jul 30 18:11:48 test02 check_reload_status: Reloading check_reload_status because it exited from an error!
                Jul 30 18:11:48 test02 kernel: pid 283 (check_reload_status), uid 0: exited on signal 11
                Jul 30 18:11:48 test02 kernel: pid 285 (check_reload_status), uid 0: exited on signal 11
                Jul 30 18:11:49 test02 kernel: pid 24777 (lighttpd), uid 0: exited on signal 11
                Jul 30 18:11:49 test02 kernel: pid 2722 (blinkled), uid 0: exited on signal 11
                Jul 30 18:11:49 test02 kernel: pid 2252 (blinkled), uid 0: exited on signal 11
                Jul 30 18:11:53 test02 kernel: pid 25625 (php), uid 0: exited on signal 11
                Jul 30 18:12:01 test02 kernel: pid 5355 (rrdtool), uid 0: exited on signal 11
                

                I can't see anything in /var/log/* that logs the package removal. As an aside, it might be nice if package installation and removal were logged somewhere (maybe they are already).
                Here is what the system looks like now:

                [2.1-BETA0][admin@test02.homedomain]/root(2): ps aux
                USER     PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND
                root      10 96.0  0.0     0     8  ??  RL    5:55PM  96:39.89 [idle]
                root       0  0.0  0.0     0    56  ??  DLs   5:55PM   0:00.02 [kernel]
                root       1  0.0  0.2  1888   480  ??  ILs   5:55PM   0:00.15 /sbin/init --
                root       2  0.0  0.0     0     8  ??  DL    5:55PM   0:00.08 [g_event]
                root       3  0.0  0.0     0     8  ??  DL    5:55PM   0:00.70 [g_up]
                root       4  0.0  0.0     0     8  ??  DL    5:55PM   0:03.28 [g_down]
                root       5  0.0  0.0     0     8  ??  DL    5:55PM   0:00.00 [crypto]
                root       6  0.0  0.0     0     8  ??  DL    5:55PM   0:00.00 [crypto returns]
                root       7  0.0  0.0     0     8  ??  DL    5:55PM   0:00.08 [pfpurge]
                root       8  0.0  0.0     0     8  ??  DL    5:55PM   0:00.00 [xpt_thrd]
                root       9  0.0  0.0     0     8  ??  DL    5:55PM   0:00.01 [pagedaemon]
                root      11  0.0  0.0     0   104  ??  WL    5:55PM   0:35.10 [intr]
                root      12  0.0  0.0     0     8  ??  DL    5:55PM   0:00.00 [ng_queue]
                root      13  0.0  0.0     0     8  ??  DL    5:55PM   0:02.40 [yarrow]
                root      14  0.0  0.0     0    64  ??  DL    5:55PM   0:00.22 [usb]
                root      15  0.0  0.0     0     8  ??  DL    5:55PM   0:00.00 [vmdaemon]
                root      16  0.0  0.0     0     8  ??  DL    5:55PM   0:00.01 [idlepoll]
                root      17  0.0  0.0     0     8  ??  DL    5:55PM   0:00.00 [pagezero]
                root      18  0.0  0.0     0     8  ??  DL    5:55PM   0:00.04 [bufdaemon]
                root      19  0.0  0.0     0     8  ??  DL    5:55PM   0:00.21 [syncer]
                root      20  0.0  0.0     0     8  ??  DL    5:55PM   0:00.05 [vnlru]
                root      21  0.0  0.0     0     8  ??  DL    5:55PM   0:00.05 [softdepflush]
                root      29  0.0  0.0     0     8  ??  DL    5:55PM   0:00.35 [md0]
                root      35  0.0  0.0     0     8  ??  DL    5:55PM   0:01.00 [md1]
                root     158  0.0  0.8  4600  1920  ??  S     6:14PM   0:00.07 /usr/local/bin/rrdtool -
                root     303  0.0  0.9  3936  2256  ??  Is    5:55PM   0:00.01 /sbin/devd
                root    6401  0.0  1.3  5344  3032  ??  Is    5:55PM   0:00.01 /usr/sbin/sshd
                root    6847  0.0  0.5  3328  1268  ??  Is    5:55PM   0:00.03 /usr/local/sbin/dhcp6c -d -c /var/etc/dhcp6c_wan.conf vr1
                root    9854  0.0  0.4  3328  1000  ??  Is    5:56PM   0:00.00 /usr/local/bin/minicron 240 /var/run/ping_hosts.pid /usr/local/bin/ping_hosts.sh
                root   10087  0.0  0.4  3328  1044  ??  I     5:56PM   0:00.01 minicron: helper /usr/local/bin/ping_hosts.sh  (minicron)
                root   10373  0.0  0.4  3328  1000  ??  Is    5:56PM   0:00.00 /usr/local/bin/minicron 3600 /var/run/expire_accounts.pid /etc/rc.expireaccounts
                root   10723  0.0  0.4  3328  1044  ??  I     5:56PM   0:00.00 minicron: helper /etc/rc.expireaccounts  (minicron)
                root   10741  0.0  0.4  3328  1000  ??  Is    5:56PM   0:00.00 /usr/local/bin/minicron 86400 /var/run/update_alias_url_data.pid /etc/rc.update_alias_url_data
                root   11082  0.0  0.4  3328  1044  ??  I     5:56PM   0:00.00 minicron: helper /etc/rc.update_alias_url_data  (minicron)
                root   13766  0.0  0.9  4976  2284  ??  Ss    5:56PM   0:00.16 /usr/sbin/syslogd -c -c -l /var/dhcpd/var/run/log -f /var/etc/syslog.conf
                root   13976  0.0  0.5  3544  1188  ??  Is    5:56PM   0:00.02 /usr/local/sbin/sshlockout_pf 15
                root   14664  0.0  0.3  1576   784  ??  SN    7:38PM   0:00.00 sleep 60
                root   15162  0.0  1.3  5136  3080  ??  Ss    5:56PM   0:00.27 /usr/local/sbin/openvpn --config /var/etc/openvpn/client1.conf
                root   16302  0.0  0.6  3448  1356  ??  Is    5:56PM   0:00.01 /usr/sbin/inetd -wW -R 0 -a 127.0.0.1 /var/etc/inetd.conf
                root   19581  0.0  0.5  3328  1260  ??  Ss    5:56PM   0:00.80 /usr/local/sbin/apinger -c /var/etc/apinger.conf
                root   25020  0.0  6.7 38504 16144  ??  I     5:56PM   0:01.27 /usr/local/bin/php
                dhcpd  32970  0.0  2.1  8448  5164  ??  Ss    5:56PM   0:00.13 /usr/local/sbin/dhcpd -user dhcpd -group _dhcp -chroot /var/dhcpd -cf /etc/dhcpd.conf -pf /var/run/dhcpd.pid vr
                nobody 34877  0.0  0.9  5576  2236  ??  I     5:56PM   0:00.14 /usr/local/sbin/dnsmasq --local-ttl 1 --all-servers --rebind-localhost-ok --stop-dns-rebind --dns-forward-max=5
                root   47276  0.0  2.6  6132  6156  ??  SNs   5:56PM   0:00.62 /usr/local/bin/ntpd -g -c /var/etc/ntpd.conf
                root   57909  0.0  0.6  3420  1364  ??  Ss    5:56PM   0:00.02 /usr/sbin/cron -s
                root   59317  0.0  1.5  8096  3636  ??  Ss    7:37PM   0:00.31 sshd: admin@pts/0 (sshd)
                root   13821  0.0  0.9  5928  2204  u0- S     5:56PM   0:00.14 /usr/sbin/tcpdump -s 256 -v -l -n -e -ttt -i pflog0
                root   13900  0.0  0.7  3784  1668  u0  Is    5:56PM   0:00.03 login [pam] (login)
                root   13906  0.0  0.4  3328   892  u0- I     5:56PM   0:00.02 logger -t pf -p local0.info
                root   14265  0.0  0.6  3708  1356  u0  I     5:56PM   0:00.01 -sh (sh)
                root   16275  0.0  0.6  3708  1360  u0  I     5:56PM   0:00.02 /bin/sh /etc/rc.initial
                root   53737  0.0  1.0  4760  2388  u0  I+    6:22PM   0:00.06 /bin/tcsh
                root   56042  0.0  0.6  3708  1376  u0- SN    5:56PM   0:02.00 /bin/sh /var/db/rrd/updaterrd.sh
                root   14680  0.0  0.5  3468  1224   0  R+    7:38PM   0:00.01 ps aux
                root   60066  0.0  0.6  3708  1516   0  Ss    7:37PM   0:00.02 /bin/sh /etc/rc.initial
                root   63382  0.0  1.1  4760  2544   0  S     7:38PM   0:00.04 /bin/tcsh
                
                

                I will leave it this way for a while. If anyone has suggestions for more data to collect, I can do that. I could even find a way to let someone access it remotely, if that would help whoever is trying to track this down. You never know: if this gets fixed, the similar things that happen on the initial boot and on package reinstall after an upgrade might have the same fix.

                  jimp Rebel Alliance Developer Netgate
                  last edited by Jul 30, 2012, 2:34 PM Jul 30, 2012, 2:33 PM

                  Been chasing that for years… never have been able to narrow it down myself.

                  Since the mount ro/rw calls use a shared memory reference, perhaps some digging with ipcs might help:

                  fetch -o /usr/bin/ipcs http://files.chi.pfsense.org/jimp/ipcs.i386
                  chmod a+x /usr/bin/ipcs
                  rehash
                  

                  And then check the output of:

                  
                  ipcs -m
                  ipcs -pt
                  ipcs -T
                  

                  Might not hurt to compare the output from that when it's running normally (/ is read-only, after a reboot with no pkg operations) and then again when it's in the crashy state.
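Comparing the two states by eye is fiddly, so here is a minimal Python sketch for diffing saved snapshots. It assumes you save the ipcs -pt output as text in each state, and that the shared-memory rows keep the 11-column layout of the output posted later in this thread (T, ID, KEY, MODE, OWNER, GROUP, CPID, LPID, ATIME, DTIME, CTIME). The function names are hypothetical and not part of pfSense.

```python
from typing import Dict, Optional, Tuple

# Column names from the "Shared Memory:" header of the ipcs -pt output in this thread.
SHM_COLUMNS = ["T", "ID", "KEY", "MODE", "OWNER", "GROUP",
               "CPID", "LPID", "ATIME", "DTIME", "CTIME"]


def parse_shm_rows(ipcs_output: str) -> Dict[str, Dict[str, str]]:
    """Return {segment ID: {column: value}} for the 'm' rows of saved ipcs -pt output."""
    segments: Dict[str, Dict[str, str]] = {}
    in_shm = False
    for line in ipcs_output.splitlines():
        stripped = line.strip()
        if stripped.endswith(":"):  # section headers such as "Message Queues:" or "Shared Memory:"
            in_shm = stripped == "Shared Memory:"
            continue
        fields = stripped.split()
        # Skip the column header row (starts with "T") and anything outside the shared memory section.
        if in_shm and len(fields) == len(SHM_COLUMNS) and fields[0] == "m":
            row = dict(zip(SHM_COLUMNS, fields))
            segments[row["ID"]] = row
    return segments


def diff_snapshots(before: str, after: str) -> Dict[str, Tuple[Optional[dict], Optional[dict]]]:
    """Report segments whose row changed, appeared or disappeared between two snapshots."""
    old, new = parse_shm_rows(before), parse_shm_rows(after)
    changes = {}
    for seg_id in sorted(set(old) | set(new)):
        if old.get(seg_id) != new.get(seg_id):
            changes[seg_id] = (old.get(seg_id), new.get(seg_id))
    return changes
```

A segment whose LPID or attach/detach times differ between the normal and crashy snapshots then shows up as a single entry, with the old and new rows side by side.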

                    phil.davis
                    last edited by Jul 30, 2012, 3:06 PM

                    For the record, here is the output in the broken state:

                    [2.1-BETA0][admin@test02.homedomain]/root(14): ipcs -m
                    Shared Memory:
                    T           ID          KEY MODE        OWNER    GROUP
                    m        65536         1000 --rw-r--r-- root     wheel
                    
                    [2.1-BETA0][admin@test02.homedomain]/root(15): ipcs -pt
                    Message Queues:
                    T           ID          KEY MODE        OWNER    GROUP           LSPID        LRPID STIME    RTIME    CTIME
                    
                    Shared Memory:
                    T           ID          KEY MODE        OWNER    GROUP            CPID         LPID ATIME    DTIME    CTIME
                    m        65536         1000 --rw-r--r-- root     wheel             288        20297 20:38:18 20:38:18 17:55:38
                    
                    Semaphores:
                    T           ID          KEY MODE        OWNER    GROUP    OTIME    CTIME
                    
                    [2.1-BETA0][admin@test02.homedomain]/root(16): ipcs -T
                    msginfo:
                            msgmax:        16384    (max characters in a message)
                            msgmni:           40    (# of message queues)
                            msgmnb:         2048    (max characters in a message queue)
                            msgtql:           40    (max # of messages in system)
                            msgssz:            8    (size of a message segment)
                            msgseg:         2048    (# of message segments in system)
                    
                    shminfo:
                            shmmax:     33554432    (max shared memory segment size)
                            shmmin:            1    (min shared memory segment size)
                            shmmni:          192    (max number of shared memory identifiers)
                            shmseg:          128    (max shared memory segments per process)
                            shmall:         8192    (max amount of shared memory in pages)
                    
                    seminfo:
                            semmap:           30    (# of entries in semaphore map)
                            semmni:           10    (# of semaphore identifiers)
                            semmns:           60    (# of semaphores in system)
                            semmnu:           30    (# of undo structures in system)
                            semmsl:           60    (max # of semaphores per id)
                            semopm:          100    (max # of operations per semop call)
                            semume:           10    (max # of undo entries per process)
                            semusz:          136    (size in bytes of undo structure)
                            semvmx:        32767    (semaphore maximum value)
                            semaem:        16384    (adjust on exit max value)
                    
                    

                    Now to contemplate what to try/collect next before rebooting and losing the real-time data.

                      phil.davis
                      last edited by Jul 30, 2012, 7:13 PM Jul 30, 2012, 3:21 PM

                      It seems odd that blinkled appears in the list of processes that sig11:

                      Jul 30 18:11:49 test02 kernel: pid 2722 (blinkled), uid 0: exited on signal 11
                      Jul 30 18:11:49 test02 kernel: pid 2252 (blinkled), uid 0: exited on signal 11
                      

                      To me, that means either:
                      a) The sig11 is happening before blinkled is stopped and the package removed; or
                      b) The blinkled package is getting removed without its processes being stopped first, so the proverbial rug is pulled from under them: their executable disappears from storage while they are still running in memory.
                      Edit: From Status > Services I couldn't get blinkled to stop or start, so there seems to be some problem with the stop/start code for this package.

                        phil.davis
                        last edited by Jul 30, 2012, 7:22 PM Jul 30, 2012, 5:39 PM

                        Here is one problem with the shared memory reference count implementation in /etc/inc/util.inc.
                        The data in the shared memory is actually string data. The PHP function shmop_write just writes the string into the shared memory, with no null terminator or length. A count of 0 becomes "0" in the first byte, followed by whatever happened to be in the following bytes. When the count goes from 9 to 10, the memory goes from "9" to "10". When it is then decremented, 10-1=9, so "9" is written to the first byte and the second byte stays "0". The next time the value is read, it returns 90!
                        This is a recipe for getting the reference count wrong, so that it never returns to zero. When that happens, conf_mount_ro does not actually remount read-only, because it only switches back to read-only when the reference count reaches zero.
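The effect can be reproduced without any shared memory at all. This Python sketch models the 10-byte segment as a byte array, writes each count as a string with no terminator (as described above for shmop_write), and reads back only the leading digits. Two things are assumptions: that the segment starts zeroed, and that the read behaves like PHP's intval() on the raw bytes. The function names mirror the ones in the thread, but this is a toy model, not the util.inc code. Ten references followed by one unreference, repeated three times, gives the same 90, 990, 9990 pattern as the demo output.

```python
SEGMENT_SIZE = 10  # the thread says 10 bytes are allocated for the segment
DIGITS = b"0123456789"

# Toy stand-in for the shared memory segment (assumption: it starts zeroed).
segment = bytearray(SEGMENT_SIZE)


def shm_write(text: str) -> None:
    """Copy the string's bytes to offset 0 with no terminator, as described for shmop_write."""
    data = text.encode("ascii")
    segment[0:len(data)] = data


def refcount_read() -> int:
    """Keep only the leading digits (assumption: similar to PHP intval() on the raw bytes)."""
    digits = bytearray()
    for byte in segment:
        if byte not in DIGITS:
            break
        digits.append(byte)
    return int(digits.decode("ascii")) if digits else 0


def refcount_reference() -> int:
    shm_write(str(refcount_read() + 1))
    return refcount_read()


def refcount_unreference() -> int:
    # Writing "9" over "10" only replaces the first byte, leaving "90" behind.
    shm_write(str(refcount_read() - 1))
    return refcount_read()


shm_write("0")  # the initialisation writes only the first byte

observed = []
for _ in range(3):
    for _ in range(10):
        refcount_reference()
    observed.append(refcount_unreference())
print(observed)  # [90, 990, 9990]
```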
                        The attached bit of code calls refcount_reference and refcount_unreference in a way that demonstrates the problem. You can just save it somewhere and run it from the command line to get:

                        [2.1-BETA0][admin@test02.homedomain]/var/log(117): php shmop_demo.php
                        Content-type: text/html
                        
                        refcount_read: 0
                        refcount_reference: 1
                        refcount_read: 1
                        refcount_reference: 2
                        refcount_read: 2
                        refcount_reference: 3
                        refcount_read: 3
                        refcount_reference: 4
                        refcount_read: 4
                        refcount_reference: 5
                        refcount_read: 5
                        refcount_reference: 6
                        refcount_read: 6
                        refcount_reference: 7
                        refcount_read: 7
                        refcount_reference: 8
                        refcount_read: 8
                        refcount_reference: 9
                        refcount_read: 9
                        refcount_reference: 10
                        refcount_read: 10
                        refcount_unreference:
                        refcount_read: 90
                        refcount_reference: 91
                        refcount_read: 91
                        refcount_reference: 92
                        refcount_read: 92
                        refcount_reference: 93
                        refcount_read: 93
                        refcount_reference: 94
                        refcount_read: 94
                        refcount_reference: 95
                        refcount_read: 95
                        refcount_reference: 96
                        refcount_read: 96
                        refcount_reference: 97
                        refcount_read: 97
                        refcount_reference: 98
                        refcount_read: 98
                        refcount_reference: 99
                        refcount_read: 99
                        refcount_reference: 100
                        refcount_read: 100
                        refcount_unreference:
                        refcount_read: 990
                        refcount_reference: 991
                        refcount_read: 991
                        refcount_reference: 992
                        refcount_read: 992
                        refcount_reference: 993
                        refcount_read: 993
                        refcount_reference: 994
                        refcount_read: 994
                        refcount_reference: 995
                        refcount_read: 995
                        refcount_reference: 996
                        refcount_read: 996
                        refcount_reference: 997
                        refcount_read: 997
                        refcount_reference: 998
                        refcount_read: 998
                        refcount_reference: 999
                        refcount_read: 999
                        refcount_reference: 1000
                        refcount_read: 1000
                        refcount_unreference:
                        refcount_read: 9990
                        
                        

                        There are 10 bytes allocated for this shared memory segment, so I guess other bad things happen once this process pushes the ref count past 10 digits. Also, the initialisation only puts a "0" in the first byte, so I think the other 9 bytes could contain random rubbish at startup, which could cause really hard-to-track and hard-to-reproduce problems.
                        I'll have a go at fixing this properly. It might solve the issue of nanobsd systems being left in RW after package installs etc. I can't see how it would solve the slow switching back to RO, though, but who knows?
                        Edit: I rebooted and monitored the ref count in memory while installing and removing blinkled. It just went 0, 1, 2, 1, 0, but I still got the same bunch of sig11 process exits. So I don't think fixing the ref count issue described here will fix the sig11 exits.

                        shmop_demo.txt
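One way to avoid both problems (a shorter string leaving stale digits behind, and uninitialised bytes at startup) is to always write the full segment width as zero-padded digits. The later "refcount_read: 0001" output in this thread hints that the committed fix also pads the value, but the Python below is a hypothetical sketch, not that commit. It assumes the count is stored as decimal digits filling the whole segment, clamps at zero on unreference, and does not model locking between processes that update the count at the same time.

```python
def encode_count(n: int, width: int) -> bytes:
    """Encode n as exactly `width` ASCII digits, zero-padded on the left."""
    if not 0 <= n < 10 ** width:
        raise ValueError(f"reference count {n} does not fit in {width} bytes")
    return str(n).zfill(width).encode("ascii")


class RefCount:
    """Reference count kept in a fixed-size byte buffer (a toy stand-in for the shared segment)."""

    def __init__(self, segment: bytearray):
        self.segment = segment
        # Overwrite every byte at start-up so leftover rubbish can never be read back.
        self._store(0)

    def _store(self, n: int) -> None:
        # Always write the full width, so a shorter number cannot leave stale digits behind.
        self.segment[:] = encode_count(n, len(self.segment))

    def read(self) -> int:
        return int(bytes(self.segment).decode("ascii"))

    def reference(self) -> int:
        n = self.read() + 1
        self._store(n)
        return n

    def unreference(self) -> int:
        n = max(self.read() - 1, 0)  # assumption: clamp at zero instead of going negative
        self._store(n)
        return n


rc = RefCount(bytearray(b"rubbish!!!"))  # 10 bytes of leftover junk
for _ in range(10):
    rc.reference()
print(rc.unreference(), bytes(rc.segment))  # 9 b'0000000009'
```

Starting from a segment full of junk, ten references and one unreference now read back as 9 rather than 90, and a count too large for the segment raises an error instead of silently corrupting it.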

                          jimp Rebel Alliance Developer Netgate
                          last edited by Jul 30, 2012, 7:29 PM

                          Well I committed a fix for that, but it didn't help the crashes.

                          https://github.com/bsdperimeter/pfsense/commit/a9f250d6a3372404cb7adb9c6d870eb085f566d0
                          https://github.com/bsdperimeter/pfsense/commit/780705e9b8058130fa6b9e15dcca46f85df23395

                            phil.davis
                            last edited by Jul 30, 2012, 9:20 PM

                            I applied your changes and they fix the ref count issue when it gets big. But the problem is still there; from my monitoring, the ref count only goes to 2 anyway.
                            Interestingly, I was using a little script to show me the ref count, and while blinkled was being removed it even gave me a segmentation fault. Also, the blinkled processes hung around for a long time after the removal script had said that the pbi was deleted, presumably running from memory without needing to read any more pages from the file on the CF card. They even survived the segmentation fault that I got interactively.

                            [2.1-BETA0][root@test02.homedomain]/var/log(14): php rr.php
                            Content-type: text/html
                            
                            refcount_read: 0001
                            [2.1-BETA0][root@test02.homedomain]/var/log(16): ps ax | grep blink
                             5588  ??  Ss     0:36.21 /usr/local/bin/blinkled -i vr0 -l /dev/led/led2
                             5876  ??  Ss     0:36.21 /usr/local/bin/blinkled -i vr1 -l /dev/led/led3
                            62799  u0  S+     0:00.01 grep blink
                            [2.1-BETA0][root@test02.homedomain]/var/log(17): php rr.php
                            Segmentation fault
                            [2.1-BETA0][root@test02.homedomain]/var/log(18): ps ax | grep blink
                             5588  ??  Ss     0:37.99 /usr/local/bin/blinkled -i vr0 -l /dev/led/led2
                             5876  ??  Ss     0:38.00 /usr/local/bin/blinkled -i vr1 -l /dev/led/led3
                            12199  u0  S+     0:00.01 grep blink
                            [2.1-BETA0][root@test02.homedomain]/var/log(19): ps ax | grep blink
                            13010  u0  S+     0:00.01 grep blink
                            [2.1-BETA0][root@test02.homedomain]/var/log(20): php rr.php
                            Content-type: text/html
                            
                            refcount_read: 0000
                            
                            

                              phil.davis
                              last edited by Jul 30, 2012, 9:30 PM

                              In /etc/inc/pkg-utils.inc, in the function uninstall_package, I commented out:

                              	//	exec("/usr/bin/tar xzPfU /tmp/pkg_libs.tgz -C /");
                              	//	exec("/usr/bin/tar xzPfU /tmp/pkg_bins.tgz -C /");
                              	//	@unlink("/tmp/pkg_libs.tgz");
                              	//	@unlink("/tmp/pkg_bins.tgz");
                              
                              

                              This stuff is backed up earlier in the routine and then restored for some reason.
                              But the backup includes things like /usr/local/lib/php/20090626, which has a bunch of ".so" files related to PHP.
                              With this restore commented out, I don't get the sig11 crashes, and the webConfigurator stays available.
                              I wonder why lots of stuff from /usr/local/lib is being backed up and restored during every package uninstall?
                              The ".so" files would go missing for a moment as tar deletes the original on disk and then restores the copy from the backup.

                                jimp Rebel Alliance Developer Netgate
                                last edited by Jul 30, 2012, 9:53 PM

                                That was next on my list of things to try, but I had some customers to attend to.

                                The libraries are backed up in case a package removed a file that was required for the system to function properly. The restore process could probably be tweaked a bit somehow though.

                                Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

                                Need help fast? Netgate Global Support!

                                Do not Chat/PM for help!

                                  phil.davis
                                  last edited by Jul 31, 2012, 9:43 AM

                                  I made a couple of pull requests, which Seth has committed, to make the reference count in the shared memory section even more robust by locking it while it is incremented and decremented. Hopefully, as long as all code actually calls conf_mount_rw followed by conf_mount_ro after it has finished changing stuff, the filesystem will now always end up read-only again on nanobsd.
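The idea behind a locked reference count can be sketched in a few lines of shell. This is not the pfSense implementation: mount_rw and mount_ro are hypothetical stand-ins for conf_mount_rw and conf_mount_ro, a counter file replaces the shared-memory segment, an atomic mkdir serves as the lock, and echo replaces the actual remount. The first caller remounts read-write, and only the call that brings the count back to zero remounts read-only.

```shell
#!/bin/sh
# Sketch of a lock-protected reference count for nested read-write mounts.
# Assumptions (not pfSense code): a counter file stands in for the
# shared-memory segment, mkdir stands in for the lock, and echo stands in
# for the real remount command.

RC_DIR=$(mktemp -d)
RC_FILE="$RC_DIR/refcount"
RC_LOCK="$RC_DIR/lock"
echo 0 > "$RC_FILE"
trap 'rm -rf "$RC_DIR"' EXIT

# mkdir either creates the directory or fails, atomically, so only one
# caller at a time gets past rc_lock; the others poll until it is released.
rc_lock()   { until mkdir "$RC_LOCK" 2>/dev/null; do sleep 1; done; }
rc_unlock() { rmdir "$RC_LOCK"; }

# Hypothetical stand-in for conf_mount_rw: only the 0 -> 1 transition remounts.
mount_rw() {
    rc_lock
    n=$(cat "$RC_FILE")
    if [ "$n" -eq 0 ]; then echo "remount read-write"; fi
    echo $((n + 1)) > "$RC_FILE"
    rc_unlock
}

# Hypothetical stand-in for conf_mount_ro: only the 1 -> 0 transition remounts.
# An unbalanced call at zero is ignored instead of going negative.
mount_ro() {
    rc_lock
    n=$(cat "$RC_FILE")
    if [ "$n" -gt 0 ]; then
        n=$((n - 1))
        echo "$n" > "$RC_FILE"
        if [ "$n" -eq 0 ]; then echo "remount read-only"; fi
    fi
    rc_unlock
}

refcount() { cat "$RC_FILE"; }

# Two overlapping writers: one remount each way.
mount_rw    # prints "remount read-write"
mount_rw
mount_ro
mount_ro    # prints "remount read-only"
```

Holding the lock across both the read and the write of the counter is what stops two overlapping callers from reading the same value and losing an increment or decrement.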
                                  Also tidied up an unnecessary rw then ro mount in pkg-utils which was slowing things down every time the user navigated to System:Packages and it listed Installed Packages.
                                  This is all good stuff for nanobsd in general, but it doesn't fix the sig11. To fix that I still need to comment out the file restore from pkg-utils.inc:

                                  	//	exec("/usr/bin/tar xzPfU /tmp/pkg_libs.tgz -C /");
                                  	//	exec("/usr/bin/tar xzPfU /tmp/pkg_bins.tgz -C /");
                                  	//	@unlink("/tmp/pkg_libs.tgz");
                                  	//	@unlink("/tmp/pkg_bins.tgz");
                                  

                                  Now it needs someone like JimP, who has the big picture, to work out how best to trim down this restore. Maybe it could compare the backup with what is on disk, work out whether anything essential has gone missing, and put back only the missing things?
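The "only put the missing things back" idea could look roughly like this. It is a sketch, not pfSense code: restore_missing is a hypothetical helper, and it assumes the backup archive holds paths relative to the root passed in (the real pkg_libs.tgz and pkg_bins.tgz are extracted with P and absolute paths). It lists the archive with tar tzf and extracts only the entries that no longer exist on disk, so files that are still present, and possibly in use, are never rewritten.

```shell
#!/bin/sh
# Sketch: restore only the archive entries that are missing under a root.
# restore_missing and the demo file names are hypothetical.

restore_missing() {
    archive=$1   # backup tarball with paths relative to $root (assumption)
    root=$2      # directory the backup was taken from
    tar tzf "$archive" | while IFS= read -r entry; do
        case $entry in
            */) continue ;;            # directory entries: leave existing dirs alone
        esac
        if [ ! -e "$root/$entry" ]; then
            echo "restoring $entry"
            tar xzf "$archive" -C "$root" "$entry"   # extract just this member
        fi
    done
}

# Demo in a scratch directory: back up two files, lose one, restore.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/root/lib"
echo a > "$work/root/lib/liba.so"
echo b > "$work/root/lib/libb.so"
tar czf "$work/backup.tgz" -C "$work/root" lib
rm "$work/root/lib/libb.so"              # simulate a package removal deleting it
restore_missing "$work/backup.tgz" "$work/root"
```

Extracting one member at a time rereads the archive for each missing file, which is fine when only a handful are missing; a bigger restore would collect the names first and pass them to a single tar invocation.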

                                    jimp Rebel Alliance Developer Netgate
                                    last edited by Jul 31, 2012, 5:50 PM

                                    I'm going over that with the other devs.

                                    The problem is likely the U flag to tar, which unlinks files before restoring. I wager that if that were removed, and even better if "k" were put in its place, it would behave better.

                                    "k" would cause it to keep existing files, and since this is restoring a backup after files were removed, that seems to make more sense to me. I'm just not sure if there are any edge cases I'm forgetting that required U to restore properly.

                                    If the pkg uninstall corrupted the file, U would be better… but I'm not sure if that was one of the reasons.
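The difference between the two flags can be seen in a scratch directory. This sketch backs up two files, deletes one (as a package removal might), holds the other open (standing in for a running process that has the library loaded), and then restores with either U or k. The file names and the restore_demo function are hypothetical; it uses relative paths with -C instead of the P flag, and it assumes GNU tar or bsdtar, both of which accept -U and -k.

```shell
#!/bin/sh
# Sketch: compare tar's U (unlink first) and k (keep existing files) flags
# when restoring a backup over a tree. File names are hypothetical.

restore_demo() {
    mode=$1                                  # "U" or "k"
    work=$(mktemp -d) || return 1
    mkdir -p "$work/root/lib"
    echo "php extension" > "$work/root/lib/libkept.so"
    echo "shared lib"    > "$work/root/lib/libgone.so"

    # Back up both files, relative to the scratch root.
    tar czf "$work/backup.tgz" -C "$work/root" lib/libkept.so lib/libgone.so

    # Simulate the package removal deleting one of them.
    rm "$work/root/lib/libgone.so"

    # Keep the surviving file open, like a process using the library.
    exec 3< "$work/root/lib/libkept.so"
    before=$(ls -i "$work/root/lib/libkept.so" | awk '{print $1}')

    # With k, tar complains about every file that already exists and may exit
    # non-zero, so stderr is silenced and the status ignored for this demo.
    tar -x -z -"$mode" -f "$work/backup.tgz" -C "$work/root" 2>/dev/null || true

    after=$(ls -i "$work/root/lib/libkept.so" | awk '{print $1}')
    exec 3<&-

    if [ "$before" = "$after" ]; then kept="same-inode"; else kept="new-inode"; fi
    if [ -f "$work/root/lib/libgone.so" ]; then gone="restored"; else gone="missing"; fi
    echo "$mode: libkept.so $kept, libgone.so $gone"
    rm -rf "$work"
}

restore_demo U
restore_demo k
```

With U, tar unlinks the surviving file and writes a fresh copy, so its inode number changes while the old one is still held open; with k, the surviving file is left alone and only the deleted file is recreated. Either way the missing file comes back, which is the job the restore is there to do.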

                                      jimp Rebel Alliance Developer Netgate
                                      last edited by Jul 31, 2012, 6:52 PM

                                      A quick test of this on NanoBSD is promising. Without U, but with k (just replaced U with k in the tar command), the GUI is fine and everything seems happy.

                                      The blinkled process still crashed but since it was uninstalled I expected as much.

                                      Not sure how that might help/hurt a full install. I've got a discussion started with the other devs to see if I'm missing anything there. More testing would be appreciated before I commit it.

                                        phil.davis
                                        last edited by Aug 1, 2012, 6:21 AM

                                        Good news - I did various install/uninstalls with the new code and there are no problems. webConfigurator keeps running fine, no sig11 (apart from blinkled, which is a different issue).

                                        Installed the following:
                                        blinkled
                                        openvpn client export utility
                                        pfblocker
                                        squid3

                                        then did a firmware upgrade to the build that just finished:
                                        2.1-BETA0 (i386)
                                        built on Tue Jul 31 19:07:11 EDT 2012
                                        FreeBSD 8.3-RELEASE-p3

                                        All went well. It reinstalled all the packages and the webConfigurator stayed up. After the package installs were finished all services were running, openvpn link up, web browsing from client… working.

                                        During the package installs it does a package removal first for each package. That now spews out a lot of "file exists" messages from the tar restore with k option, but it works.
                                        I guess the package removal is just in case, but if the code could detect that it is a package install from the first boot after a firmware upgrade then it could know that the package removal step is not needed.

                                        I have attached a log of the serial console output for the record.

                                        bootlog-upgrade-2012-08-01.txt

                                          jimp Rebel Alliance Developer Netgate
                                          last edited by Aug 1, 2012, 12:23 PM

                                          Looks like we might just need "2>/dev/null" at the end of the tar command to silence the errors.

                                          If you want to test that, edit the command(s) in pkg-utils.inc and add it, and then:

                                          touch /conf/needs_package_sync
                                          

                                          Then reboot; that will make it do the package reinstall when it boots back up.

                                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.