Netgate Discussion Forum

    25.07 unbound - pfblocker - python - syslog

    General pfSense Questions
    39 Posts 4 Posters 1.1k Views 5 Watching
    • J Offline
      jrey @jrey
      last edited by

      said in 25.07 unbound - pfblocker - python - syslog:

      Everything ... I can add this to the list of things to try however, Thanks

      @Gertjan @stephenw10

      WTH - (I may have used stronger language here)

      but selecting "Everything" and restarting syslogd and unbound seems to have stopped the duplicates.

      So now the question is why? The previously unselected items on the list are not even running (their associated log files are still all zero bytes). It must be in the syslog handler init (the order things start, maybe?), or at the top level, where selecting everything versus individual items ends up creating a duplicate handler?

      Since that seemed to work, I've shut the syslog server down yet again to see whether, when it comes back up, the Netgate resumes without manual intervention on syslogd and unbound.
      The answer to that is apparently no, because the syslogd service died.

      Screenshot 2025-08-08 at 12.31.18 PM.png

      required a restart of both the syslogd and unbound services to make things "normal" again

      • stephenw10S Online
        stephenw10 Netgate Administrator
        last edited by

        Hmm, WTH indeed!

        Good clue though.

        Seems like that's probably independent of syslogd dying locally. 🤔

        • J Offline
          jrey @stephenw10
          last edited by

          @stephenw10 said in 25.07 unbound - pfblocker - python - syslog:

          It still logs locally though? Just not sending to the remote syslog server?

          yes - still locally - local logging hasn't changed or stopped

          yes not sending to syslog (ie didn't recover after the syslog server came back up)

          After switching to "Everything", the duplicates have stopped, but syslogd just outright died again when I took the syslog server down.

          I restarted both syslogd and unbound; still no duplicates at this point.

          something is still not playing nice together. syslog itself or syslog/unbound

          and looking at the live stream of data coming into the syslog server
          I suddenly remembered why I had turned off "Everything" and selected individual services instead.

          I'm now seeing a /usr/sbin/cron message for newsyslog (checking every minute whether it needs to roll local files over), and nginx is sending its log entries whenever I navigate to any web page.

          So "everything" likely includes some things that are not included/available in individual selection.

          Changing anything on the selection list and hitting save requires a manual restart of unbound. Looking at the live stream on the syslog server, other events are still coming through.

          I temporarily changed the setting from Everything back to my list so I could grab the appropriate pfSense.conf file that is generated for syslog.d
          and then changed it back to Everything -

          I was looking for a specific reference to cron and/or nginx logging. Leaving the dashboard open is currently flooding the syslog stream with everything being done. But hey, no duplicates.

          Still looking for clues as to why it might hang/die if the remote server goes off line. And if I could specifically stop the nginx messaging that would be cool,

          Pondering which is worse, actually: duplicate messages from filterdns/unbound, or hundreds of nginx messages per minute that I will simply never use. Sitting on the dashboard = bad, sitting on any static page = no messages.
          The /usr/sbin/cron messages are 1 per minute minimum, more if other cron jobs run. Running /usr/sbin/newsyslog every minute seems a bit aggressive just to determine whether log files need to roll over.

          The hanging/dead syslogd is still going to be a problem. I can likely "fix" that temporarily with a monitor on the service.
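
A minimal sketch of such a service monitor, assuming the process is named `syslogd` and that `pfSsh.php playback svc restart` is an acceptable way to restart it (both are assumptions, as are the function names and the `CHECK_NOW` switch). It only covers the case where the process has actually exited:

```sh
#!/bin/sh
# Sketch: restart a service if its process has disappeared.
# Assumptions (not from the thread): the process name matches the service
# name, and "pfSsh.php playback svc restart <name>" restarts it.

SERVICE="${SERVICE:-syslogd}"

# Returns 0 when a process with exactly this name exists.
is_running() {
    pgrep -x "$1" >/dev/null 2>&1
}

# Hypothetical restart helper; swap in whatever restart method you use.
restart_service() {
    pfSsh.php playback svc restart "$1"
}

# Check once; log a line and restart only when the service is down.
check_once() {
    if is_running "$1"; then
        return 0
    fi
    echo "$(date): $1 is not running, restarting"
    restart_service "$1"
}

# Only act when explicitly asked, so the file can be sourced safely.
if [ "${CHECK_NOW:-0}" = "1" ]; then
    check_once "$SERVICE"
fi
```

Run it from cron with `CHECK_NOW=1` set; without that the file only defines the functions. A check like this will not notice a syslogd that is still running but has stopped sending to a remote server.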

          In hindsight I now say to myself: you had to push upgrade, didn't you? The system was working perfectly with 255+ days of uptime and zero issues. The other side says: where is the fun in that?

          • J Offline
            jrey @jrey
            last edited by

            @stephenw10

            For what it is worth

            After it had sat turned off for almost 260 days, I started a virtual machine and upgraded it from a 2.7.x install to 2.8.

            The same "duplicate" syslog problem exists there
            when I have the same individual syslog options checked as on my production box (which is the way the virtual was last turned off).

            duplicates from unbound

            Screenshot 2025-08-08 at 4.12.06 PM.png

            changed setting to "everything".
            Duplicates gone

            Screenshot 2025-08-08 at 4.13.45 PM.png

            and "bonus" cron and nginx messages
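
To put a number on the duplicates, a capture from the syslog server can be checked for lines that exactly repeat the line before them. A rough sketch, assuming the capture is a plain text file and that duplicates arrive back to back with identical timestamps (the function name is made up):

```sh
#!/bin/sh
# Count lines that are exact repeats of the immediately preceding line.
# Assumption: duplicated messages are byte-for-byte identical and adjacent.

count_repeats() {
    awk 'NR > 1 && $0 == prev { n++ } { prev = $0 } END { print n + 0 }' "$1"
}

# Only act when explicitly asked, so the file can be sourced safely.
if [ "${CHECK_NOW:-0}" = "1" ]; then
    count_repeats "$1"
fi
```

Comparing the count for a capture taken with individual services selected against one taken with "Everything" selected would show whether the setting really changes the duplication.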

            • P Offline
              postilion @jrey
              last edited by

              @jrey
              Yesterday we upgraded three recently installed Netgate 8300-Max firewalls from 24.11 to 25.07. One of the problems we noticed was that our syslogd would die shortly after starting. Last messages in /var/log/system.log were always "syslogd: sendto: Connection refused"

              A fourth firewall, which was upgraded to 25.07 prior to deployment did not exhibit this problem.

              After reading your experiences I went to the syslog config page and found that an old Graylog server was still referred to in the three problem installs. I removed those entries, and syslogd has been running just fine ever since.

              So I can confirm that an unreachable target seems to be enough to kill the syslogd process. This is a bug.
              -nic
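
One way to check for this symptom is to count the quoted error in the log. A small sketch; the function name and the `CHECK_NOW` switch are illustrative, and the log is assumed to be plain text:

```sh
#!/bin/sh
# Count "Connection refused" errors logged by syslogd.
# The message text and default path are the ones quoted in the post above.

count_refused() {
    # grep -c prints the number of matching lines (0 when none match).
    grep -c 'syslogd: sendto: Connection refused' "$1"
}

# Only act when explicitly asked, so the file can be sourced safely.
if [ "${CHECK_NOW:-0}" = "1" ]; then
    count_refused "${1:-/var/log/system.log}"
fi
```

A non-zero count shortly before syslogd stops would match the behaviour described here.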

              • stephenw10S Online
                stephenw10 Netgate Administrator
                last edited by

                Hmm, just to be clear is the remote syslog server here in a subnet local to pfSense?

                I'm failing to replicate it so far with a missing server. I just get 'host is down' log spam.

                • P Offline
                  postilion @stephenw10
                  last edited by postilion

                  @stephenw10
                  No, in this case the remote syslog receiver was remote, on a different subnet. And for completeness, the destination subnet is not a connected network from the firewall perspective.
                  -nic

                  • J Offline
                    jrey @postilion
                    last edited by

                    @postilion

                    Interesting - by syslog config page you mean system logs settings ?

                    So I have two syslog servers so there are two IP addresses listed there.

                    Netgate -> syslog1
                    Netgate -> syslog2

                    Here is a different observation -
                    syslog1 shuts itself down and does some automated system maintenance twice a week (the downtime usually lasts just a few minutes) and then comes back up. Other systems in the network have no issue logging to it before it goes down and/or after it auto restarts.

                    as syslog2 is running throughout -- it never misses a record (including from the netgate)

                    From the Netgate, everything that would normally go to both syslog servers continues to be written to the local files and to syslog2 while syslog1 is offline. When syslog1 is back online, the Netgate does not resume sending to syslog1.

                    In my case restarting syslogd alone does not resolve the problem, restarting unbound alone does not resolve the problem
                    But if I restart syslogd and unbound in that order (from the services page) - it resumes

                    I think that when syslog1 goes down, syslogd and unbound are not playing well together: one of them, or the two in combination, is not re-establishing the connection after it fails and the server returns.

                    Because syslog1 maintenance happens in the wee hours of the morning, I wrote a little script to monitor syslog1 going down and then take the appropriate action when it comes back up. Because syslogd and unbound don't actually stop (they are both still running throughout), Service Watchdog, for example, won't see a problem.

                    some output from a test run of the script. (192.168.0.35 is my syslog1 server)

                    ++ date
                    + echo 'Mon 11 Aug 2025 08:43:30 EDT: 192.168.0.35 is offline.'
                    Mon 11 Aug 2025 08:43:30 EDT: 192.168.0.35 is offline.
                    + ip_status=1
                    + sleep 300
                    + true
                    + ping -c 1 192.168.0.35
                    + '[' 1 -eq 1 ']'
                    ++ date
                    + echo 'Mon 11 Aug 2025 08:48:30 EDT: 192.168.0.35 is back online. Restarting syslogd and unbound...'
                    Mon 11 Aug 2025 08:48:30 EDT: 192.168.0.35 is back online. Restarting syslogd and unbound...
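
A sketch of a loop consistent with the trace above. It is not the original script: the variable and function names, the `RUN_LOOP` switch, and the use of `pfSsh.php playback svc restart` are all assumptions.

```sh
#!/bin/sh
# Watchdog sketch: notice when the remote syslog server goes offline and
# restart syslogd, then unbound, once it answers ping again.

TARGET="${TARGET:-192.168.0.35}"   # syslog1 in the post
INTERVAL="${INTERVAL:-300}"        # seconds between checks, as in the trace
ip_status=0                        # 1 = target was last seen offline

# Returns 0 when the target answers a single ping.
is_up() {
    ping -c 1 "$1" >/dev/null 2>&1
}

# Hypothetical restart helper. The order follows the post:
# syslogd first, then unbound.
restart_logging() {
    pfSsh.php playback svc restart syslogd
    pfSsh.php playback svc restart unbound
}

# One iteration: record an outage, restart once when the target returns.
check_target() {
    if is_up "$TARGET"; then
        if [ "$ip_status" -eq 1 ]; then
            echo "$(date): $TARGET is back online. Restarting syslogd and unbound..."
            restart_logging
            ip_status=0
        fi
    else
        echo "$(date): $TARGET is offline."
        ip_status=1
    fi
}

# Loop only when explicitly asked, so the file can be sourced safely.
if [ "${RUN_LOOP:-0}" = "1" ]; then
    while true; do
        check_target
        sleep "$INTERVAL"
    done
fi
```

Start it with `RUN_LOOP=1`. Because the restart is keyed on the remote server's reachability rather than on the local service state, it covers the case where both services are still running but have stopped sending.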
                    

                    The network and various systems are very stable: 260+ days on 24.11, never failed, 0 issues. Data tells the story. (The change in colour is based on the unbound pid, which only ever changed as a result of pfblocker restarting unbound when needed, usually every couple of days.)

                    above the graph I have drawn some lines, from left to right
                    the first 2 I was upgrading the syslog system the small gap represents the reboots of that system (the colour in the graph didn't change so the pid didn't change).
                    the third small gap, (aug 4) - again no change in pid
                    the 4th line, just after the Aug 6 marker, is where I upgraded from 24.11 to 25.07
                    the horizontal line is where the whole doubling of data thing was happening. Lots of bouncing around trying various things. Within that horizontal the large empty block starting at the "8" in "Aug 8" 1:30am mark is where the first automated syslog1 maintenance happened. No data from 1:30am until I kicked the services in the morning.

                    The last vertical line, at the end of the horizontal, is where I corrected the doubling-up issue. That is also not quite the right fix if you don't want "everything", but for now the "Everything" setting stops the duplicates, the data level returns to "normal", and I've been playing

                    Screenshot 2025-08-11 at 10.11.36 AM.png

                    • J Offline
                      jrey @stephenw10
                      last edited by

                      @stephenw10

                      For clarity, in my case the two syslog servers I referenced in my response to @postilion are both in the same subnet.

                      • stephenw10S Online
                        stephenw10 Netgate Administrator
                        last edited by

                        Hmm, maybe I misread this then. What circumstances cause syslogd to stop in pfSense?

                        • P Offline
                          postilion @stephenw10
                          last edited by

                          @stephenw10
                          In our case syslogd would stop shortly after boot. Restarting syslogd, it would stop again a short time later. I don't have exact timings, as I was dealing with other update-related issues at the time, and this was a lower priority.

                          After reading @jrey 's message, however, I went back into the syslog settings panel, removed the graylog server, since it's currently down, and saved. This caused syslogd to start again, and it's now been running smoothly for the past two hours.

                          Note: This is on three (3) separate 8300-Max units, all installed within the past month, and all updated from 24.11 to 25.07 yesterday (8/10/25).
                          -nic

                          • stephenw10S Online
                            stephenw10 Netgate Administrator
                            last edited by

                            Ah, OK I think I replicated it. Digging....

                            • J Offline
                              jrey @stephenw10
                              last edited by

                              @stephenw10
                              Generally taking one or the other of the two syslog servers offline, but I don't know the cause yet.

                              It appears it is not continuing or re-establishing the connection to the one that went down and came back up.

                              I've also tried having just one syslog server - it is broken in that case too.

                              I've had the syslogd service actually stop (shows service down) twice since the update, but always in relationship to when one of the syslog servers goes off line

                              I'm really not sure why at this point (at least for me) just restarting syslogd isn't enough to make it go..
                              restarting syslogd only doesn't help -- I "Always" have to restart both syslogd and unbound.

                              My script will temporarily work around the not sending by just restarting them in order (even if they are running, because mostly they are; they just stop communicating with the one server even though it is back online). Other systems in the network resume just fine.

                              Interestingly enough, we know that from time to time, pfblocker will restart unbound, when that happens - there has never been an issue and still isn't. in that case unbound can come and go as it pleases and there is no issue.

                              But some combination of the syslog server going away temporarily and returning causes syslogd and unbound to stop sending to it (while continuing to write to local files and the other syslog server), so both of the processes are generally running, just in limp mode.

                              • J Offline
                                jrey @stephenw10
                                last edited by

                                @stephenw10 @postilion

                                Cool, that you can replicate..

                                Just for a little more clarity
                                One of my servers is a Graylog, the other is not (it is a Synology repository; all it does is collect the same data). The two are not linked in any way; they are completely different systems.

                                It doesn't matter which one I take offline, however; the result is the same: when it comes back, comms do not resume to that one, but still continue to the other.

                                So don't think we can assume it is specifically related to Graylog

                                I can run for days, as long as I don't take one of the syslog servers offline (which in the case of one server happens twice a week, Monday and Thursday).

                                Again based on unbound pid -
                                It had been running fine until yesterday morning, then a little bit of colour while I was writing a script and trying some things until about noon, then from noonish yesterday until the gap this morning. The orange(ish) on the right is after the script figured out it needed to take action, and the green is when pfblocker thought unbound needed a restart because of some change that had been downloaded. All pretty normal, except now needing the script to intervene and restart the syslogd/unbound combo.

                                Screenshot 2025-08-11 at 11.27.49 AM.png

                                It will likely run clean now until Thursday, unless I see something else I want to "try" sometime between now and then. Unbound will normally be restarted at least once based on a pfblocker download/change; just like this morning, I don't expect it will cause any issues other than a noted change in pid for unbound.

                                • stephenw10S Online
                                  stephenw10 Netgate Administrator
                                  last edited by

                                  Yeah, it looks like this only happens when the remote host actually responds with 'refused'. So local hosts that don't respond to ARP, or just don't respond at all to the syslog packets, will not trigger it.

                                  • J Offline
                                    jrey @stephenw10
                                    last edited by

                                    @stephenw10

                                    thanks for digging in

                                    Interesting - that honestly seems different than the previous version.

                                    The behaviour of the server goes down/ comes up hasn't really changed.

                                    Once it gets a "refused", does it ever retry? (Say, when the next message is sent.)

                                    A "default" syslog system supports retry interval and max retries options for "refused connections"; are those options available to us?

                                    It seems it was working (without any options) before, because under 24.11, after any logging server went down, logging just resumed when the server came back, as you would expect a syslog sender to do without intervention, and as is the case with every other system on the network sending logs to the server.

                                    I'm guessing the answer to supporting those options is no?
                                    Hint: I tried adding them to a new conf file I created in the /var/etc/syslog.d directory, assuming it would process any/all conf files in the directory. Sadly the service would not start, so I removed that config file and restarted it. I'm guessing either pfSense syslog doesn't support those options or it didn't like the second conf file in the directory? I also tried adding the options to the existing conf file (ya, the one that says do not edit 😊). It started, but when I checked the conf the options had been removed, so it didn't like me adding them there. 😢 😞

                                    I'll just use my script in the interim; it works. It monitors, and after the server goes down and then comes back up, it waits a couple of minutes and restarts syslogd and unbound. Everything works perfectly from then on.

                                    • stephenw10S Online
                                      stephenw10 Netgate Administrator
                                      last edited by

                                      Whether or not it retries, it definitely shouldn't kill syslogd! https://redmine.pfsense.org/issues/16362

                                      I would expect it to keep trying though.

                                      • J Offline
                                        jrey @stephenw10
                                        last edited by

                                        @stephenw10

                                        but that's not exactly the case -- it only stops logging and does not resume to the server that went down and came back up --

                                        I would not say it killed syslogd completely, because it is still logging to a second server if one is configured. Even though it may have received a "refused connection" from either one of the two configured, it is only the one that went down that does not resume. The other just carries on happily receiving logs.

                                        Now perhaps if both remote servers go offline it might stop the syslog service completely (or maybe if there is only one) - I haven't tried shutting them both down at the same time and I haven't tried only having one remote configured - I guess I could try that when things are a little less busy some evening.

                                        I guess you are saying that the retry options are not available in the pfSense version. In the documentation of a standard syslog setup, these options are specifically referenced in the context of a "refused connection" and how many times it should retry at what interval, which is exactly the case here. Oddly enough, none of the other systems I have that are sending logs to the same servers are having a problem, and they have no specific options set.

                                        either way thanks for the investigation. I appreciate it.

                                        • stephenw10S Online
                                          stephenw10 Netgate Administrator
                                          last edited by

                                          Well in my test setup I can reliably reproduce it killing syslogd. It's fixed in internal dev versions though so something needs back porting.

                                          Now it could be that it keeps functioning as long as at least one remote server is available... 🤔

                                          • P Offline
                                            postilion @stephenw10
                                            last edited by postilion

                                            @stephenw10
                                            In our experience syslogd dies if any target is unreachable, as noted above.
                                            -nic

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.