Netgate Discussion Forum

    25.07 unbound - pfblocker - python - syslog

      jrey @kmp

      @kmp said in 25.07 unbound - pfblocker - python - syslog:

      I can also add that it appears that pfSense syslogd doesn't die at the time the target service goes down - it dies when the service is transitioning to being up.

      The behaviour seems to differ slightly depending on whether 1 or 2 receiving systems are set up, specifically as to whether syslogd continues, dies outright, and/or actually logs anything about the trouble it has encountered. Clearly, however, if one remote dies the other continues, so syslogd doesn't really die. @stephenw10 never really mentioned whether it was going down or trying to recover, but in that redmine there is a brief comment about "it takes about 10 minutes".

      I think we are just in a hold-and-see state waiting on what happens upstream, as the root cause is clearly in syslogd. The redmine has been created (link above in thread), but unless they work some release magic, it will likely be a while. I had actually thought of using the Boot Environment and just rolling back to 24.11, where it worked fine, but the little script I wrote is working fine at recovering from the issue, so here we are.

      thanks for the offer though.

        stephenw10 Netgate Administrator

        It seemed to be around 10mins at the time but having tried to replicate it with debugging it's not that simple. I failed to do so in the time available. I'll be re-testing that next week.

          jrey @stephenw10

          @stephenw10

          I noticed that 2.8.1 RC was released with a note about syslogd, so I went to track down what actually changed from the redmine numbers, and finally looked here to see "this change and description last week":
          Screenshot 2025-08-26 at 3.30.39 PM.png

          Did this make the 2.8.1 RC release? It wasn't clear from the timing and notes. I had previously upgraded to and tested 2.8 as part of this issue and confirmed the failure there.

          The diff associated specifically with 2.8.1 RC doesn't list this, so I'm guessing it is not there (but sometimes stuff gets built and not included in the notes).

          If this change made the cut, I'll jump through the hoops on 2.8.1 RC and test it, but if it definitely missed this RC, I'll wait.

          Thanks

            stephenw10 Netgate Administrator

            I'm away at the moment and can't check directly. However I don't believe it includes that. 2.8.1 is intended to be as close to 25.07.1 as possible so that testing/bugs apply similarly to both. It looks like it was the source address binding that was fixed there.
            I'll be back on this next week.

              stdanro

              I'm having the same issue.
              I'm sending to Elastic Agent, and each time I kill the Elastic Agent and it stops listening on port 9001, syslog stops sending.
              bb6ad2a7-520a-4e80-a232-90f669d1f822-image.png

              I do see a weird destination unreachable ICMP before that on my log collector host.
              b52b6957-82ed-4f03-a6ee-49ff94f15c09-image.png

                jrey @stdanro

                @stdanro

                It would be the recovery from that. Look at the code referenced: the syslogd changes are specifically related to EAGAIN and ECONNREFUSED errors, which were not being handled.

                Not having them handled causes all kinds of interesting artifacts, and they differ when sending to a single server versus multiple servers. (I have 2, and the order in which they are listed also changes the behaviour when one goes down and the other does not, which is my case.)

                Because I know exactly when the issue is going to occur in my environment (a fixed-schedule maintenance window on one of the syslog servers), I have a script that monitors that receiving device and restarts syslog once it detects that the system and port are back and available.
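
                A minimal sketch of that kind of watcher, not the actual script from this thread. It assumes the receiver can be probed over TCP (a UDP-only listener would need a different probe), and the host, port, and restart command in the trailing comment are hypothetical placeholders to verify on your own system.

```python
import socket
import time
from typing import Callable, Optional


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class RecoveryWatcher:
    """Tracks up/down state and reports each down -> up transition once."""

    def __init__(self, initially_up: bool = True) -> None:
        self.was_up = initially_up

    def update(self, is_up: bool) -> bool:
        # A recovery is "up now, down on the previous check".
        recovered = is_up and not self.was_up
        self.was_up = is_up
        return recovered


def watch(
    probe: Callable[[], bool],
    on_recover: Callable[[], None],
    interval: float = 30.0,
    max_checks: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll probe() and call on_recover() whenever the receiver comes back."""
    watcher = RecoveryWatcher()
    checks = 0
    while max_checks is None or checks < max_checks:
        if watcher.update(probe()):
            on_recover()
        checks += 1
        sleep(interval)


# Hypothetical usage (address, port, and command are assumptions):
#   import subprocess
#   cmd = ["/usr/local/sbin/pfSsh.php", "playback", "svc", "restart", "syslogd"]
#   watch(lambda: tcp_port_open("192.0.2.10", 514),
#         lambda: subprocess.run(cmd, check=False))
```

                Restarting only on the down-to-up transition matches the observation earlier in the thread that the trouble shows up when the target service is coming back, not when it goes away.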

                Other than that "tiny little issue" as far as I can tell it is rock solid in processing messages. 😊

                The only option for us currently is to wait for the new build of syslogd - so that it just recovers like it did before, back in the 24.11 days.

                  OffstageRoller

                  I'm running into a similar issue after upgrading to 25.07.1.

                  I see this in my logs:

                  2025-09-02 10:51:05.870660-07:00	syslogd	-	kernel boot file is /boot/kernel/kernel
                  2025-08-30 04:48:40.984805-07:00	syslogd	-	sendto: Connection refused
                  2025-08-29 02:28:48.148564-07:00	syslogd	-	kernel boot file is /boot/kernel/kernel
                  2025-08-23 04:47:30.000287-07:00	syslogd	-	sendto: Connection refused
                  2025-08-19 11:35:12.527643-07:00	syslogd	-	kernel boot file is /boot/kernel/kernel
                  2025-08-19 11:30:45.904435-07:00	syslogd	-	exiting on signal 15
                  2025-08-18 18:21:26.032560-07:00	syslogd	-	kernel boot file is /boot/kernel/kernel
                  2025-08-16 04:47:40.955468-07:00	syslogd	-	sendto: Connection refused
                  

                  I get sendto: Connection refused in my logs, and that's when syslogd dies.

                  You can see this almost always happens for me around the same time at 4:48AM. It's also every 7 days.
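
                  As a quick check of that weekly pattern, here is a small sketch that pulls the "Connection refused" timestamps out of the log excerpt above and prints the gaps between them. The function names are made up for illustration, and it assumes the tab-separated layout shown in the excerpt.

```python
from datetime import datetime, timedelta
from typing import List

# The three "Connection refused" lines copied from the log excerpt above.
LOG = (
    "2025-08-30 04:48:40.984805-07:00\tsyslogd\t-\tsendto: Connection refused\n"
    "2025-08-23 04:47:30.000287-07:00\tsyslogd\t-\tsendto: Connection refused\n"
    "2025-08-16 04:47:40.955468-07:00\tsyslogd\t-\tsendto: Connection refused\n"
)


def refused_times(log_text: str) -> List[datetime]:
    """Return sorted timestamps of lines whose message ends in 'Connection refused'."""
    times = []
    for line in log_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 4 and fields[-1].strip().endswith("Connection refused"):
            times.append(datetime.fromisoformat(fields[0].strip()))
    return sorted(times)


def gaps(times: List[datetime]) -> List[timedelta]:
    """Return the time difference between each consecutive pair of timestamps."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for gap in gaps(refused_times(LOG)):
        print(gap)
```

                  Both gaps come out within a couple of minutes of exactly 7 days, which is consistent with a weekly job on the receiving side.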

                  I run Graylog via Docker on my server. Every Saturday morning I back up all of my Docker containers starting at 4 AM, which means stopping each container while it's being backed up.

                  I tested this today by stopping my Graylog container, and in less than 2 hours syslogd had stopped running on pfSense.

                    stephenw10 Netgate Administrator

                    Yup, I'm back on this now. Trying to replicate with debugging....

                      OffstageRoller @stephenw10

                      @stephenw10 I tried testing this a few times today. Twice it took about 2 hours before it stopped, and a third time it took maybe 4 hours before it stopped.

                      I'll try testing some more, but it's not consistent.

                      pfSense will immediately post that sendto: Connection refused log event once I shut down Graylog. But then 1 to maybe 4 hours later, the syslogd service stops.

                      Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.