Netgate Discussion Forum
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Search
    • Register
    • Login

    /usr/local/etc/rc.d bugs

    Scheduled Pinned Locked Moved 2.0-RC Snapshot Feedback and Problems - RETIRED
    4 Posts 3 Posters 2.8k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • H
      Hawq
      last edited by

      I've mentioned some rc.d problems in other thread but decided to create new one as I've debugged problem a little more. Feel free to merge threads if you think they should be merged :-)

      The bugs are…

      1. Function restart_packages() from rc.newwanip doesn't do restart properly
      2. Packages aren't stopped on system reboot/halt
      3. Backup router forks lots of php causing OOM error and system failure

      Now... rc.newwanip has function restart_packages which calls rc.start_packages (in background). Restarting packages should first stop them waiting for processes to finish and then start them again. Unfortunately calling "stop" was removed from rc.start_packages long long time ago and placed in rc.stop_packages. Its fine, but rc.stop_packages is never called or grep is cheating me :-) Because of that few thing doesn't work as they should.

      1. User scripts may missbehave, ie. I've placed in /usr/localt/etc/rc.d simple script that launches tcpdump into background to log some traffic. Script has proper handling of start, stop and restart commands. After reboot I have lots and lots instances of tcpdump running because stop wasn't called before start when exectued from restart_packages().
      2. System services aren't restarted. Calling just "start" to restart service won't work if it checks if its already running. Thus no restart ever occurs for such services.
      3. Services aren't properly stopped on system reboot/halt. That means processes are simply killed at some point. This may lead to data corruption depending on services used and how they handle signals.
      4. Any service related operation (start/stop/restart) shoudn't be lanuched into background. If there are many interfaces used in system rc.start_packages is called in background many times. Even with proper implementation of start/stop it won't work because script processes will interfere with each other (one stopping, second starting, third stopping and so on).

      Here is little patch that fixed bugs #1 and #2: http://pastebin.com/ARVfDvLs

      Finally bug #3 about which I wrote in other thread, but back then I didn't knew reason of such behavior. Now after little research I know exactly what happens. My setup has total of 51 network interfaces reported by ifconfig. After each click on "Apply changes" in any part of pfSense web configurator on my master router following things happen on backup router:

      1. /usr/local/sbin/check_reload_status is run
      2. for each network interface /etc/rc.newwanip is called, by now we have 51 php processes spawned
      3. each copy of  /etc/rc.newwanip calls /etc/rc.start_packages shell script, we have 51 of those too but they quickly disappear because…
      4. each copy of /etc/rc.start_packages executes each *.sh file from /usr/local/etc/rc.d in background so each *.sh script is spawned 51 times almost simultaneously

      Lets say I'm working on adding aliases, rules, configuring VPNs etc. and lets say I'll click "Apply changes" 5 times in 5 minutes. I'll have 255 php processes which will launch each script from /usr/local/etc/rc.d 255 times. Usually thats enough to hog C2D CPU, eat 4 gigs of ram, 8 gigs of swap effectively killing machine.

      I'm wondering if sources of check_reload_status are available somewhere? In worst case (no fix in mainstream pfSense) I'll be able to fix this problem on my own.

      1 Reply Last reply Reply Quote 0
      • jimpJ
        jimp Rebel Alliance Developer Netgate
        last edited by

        The source for the check_reload_status binary is in the pfsense-tools repo.

        https://github.com/bsdperimeter/pfsense-tools/tree/master/pfPorts/check_reload_status/files

        Usually rc.newwanip only gets run on interfaces that have a gateway assigned, though I can see how that would have scaling issues in the case you are seeing. Ermal is the one who wrote the check_reload_status program, he'll probably have some ideas on a fix for this. He's traveling right now (returning from BSDCan), I'll open a ticket so this doesn't get lost.

        Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

        Need help fast? Netgate Global Support!

        Do not Chat/PM for help!

        1 Reply Last reply Reply Quote 0
        • jimpJ
          jimp Rebel Alliance Developer Netgate
          last edited by

          http://redmine.pfsense.org/issues/1534

          Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

          Need help fast? Netgate Global Support!

          Do not Chat/PM for help!

          1 Reply Last reply Reply Quote 0
          • S
            sullrich
            last edited by

            Thanks for digging into the code and such a detailed report.  It helps quite a bit.

            1 Reply Last reply Reply Quote 0
            • First post
              Last post
            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.