Netgate Discussion Forum

    NUT randomly dies - long run problem

    pfSense Packages
      Criggie

      My firewall is an ALIX 2d2 running pfSense, and it's powered through an old serial APC Smart-UPS 620.

      Since the only serial port on the ALIX is used for the console, I've used a USB-to-serial adapter to provide a second serial port. That shows up as /dev/cuaU0 and it works fine.

      Occasionally it will simply stop monitoring.  I have found no way to trigger it manually.
      When it does, this appears on every SSH session and on the console, repeated every 5 minutes:

      Broadcast Message from root@pfsense2.criggie.org.nz                            
              (no tty) at 17:21 NZST...                                              
      
      Communications with UPS smartups@localhost lost                                
      
      Broadcast Message from root@pfsense2.criggie.org.nz                            
              (no tty) at 17:26 NZST...                                              
      
      UPS smartups@localhost is unavailable
      ...
      
      

      My pfSense box syslogs to another host, and this appears in that log file:

      
      Jun 29 06:08:39 pfsense2 apcsmart[33125]: update_status: apc_write failed: Device not configured
      Jun 29 06:08:39 pfsense2 apcsmart[33125]: update_status: apc_write failed: Device not configured
      Jun 29 06:08:39 pfsense2 kernel: ugen0.2: <Prolific Technology Inc.> at usbus0 (disconnected)
      Jun 29 06:08:39 pfsense2 kernel: uplcom0: at uhub0, port 2, addr 2 (disconnected)
      Jun 29 06:08:39 pfsense2 upsd[33503]: Data for UPS [smartups] is stale - check driver
      Jun 29 06:08:39 pfsense2 upsd[33503]: Data for UPS [smartups] is stale - check driver
      Jun 29 06:08:39 pfsense2 apcsmart[33125]: smartmode: issuing 'Y' failed: Device not configured
      Jun 29 06:08:39 pfsense2 apcsmart[33125]: smartmode: issuing 'Y' failed: Device not configured
      Jun 29 06:08:39 pfsense2 apcsmart[33125]: smartmode: issuing 'Y' failed: Device not configured
      Jun 29 06:08:39 pfsense2 apcsmart[33125]: smartmode: issuing 'Y' failed: Device not configured
      Jun 29 06:08:39 pfsense2 apcsmart[33125]: smartmode: issuing 'Y' failed: Device not configured
      Jun 29 06:08:39 pfsense2 apcsmart[33125]: smartmode: issuing 'Y' failed: Device not configured
      ...
      

      That last line is repeated approximately 2500-3000 times per second. Last week's syslog file shows 13,426,870 copies of the same message!
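For anyone wanting to measure a flood like this on the remote log host, here is a minimal Python sketch. It assumes the classic BSD syslog layout shown in the excerpt above ("Mon DD HH:MM:SS host program[pid]: message"); the function name `summarise` and the regular expression are illustrative, not part of NUT or pfSense.

```python
import re
from collections import Counter

# Assumed layout (taken from the excerpt above), e.g.:
#   Jun 29 06:08:39 pfsense2 apcsmart[33125]: smartmode: issuing 'Y' failed: ...
SYSLOG_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"(?P<prog>[^\s\[:]+)(?:\[\d+\])?:\s(?P<msg>.*)$"
)

def summarise(lines):
    """Count identical (program, message) pairs and lines per timestamp."""
    per_message = Counter()
    per_second = Counter()
    for line in lines:
        m = SYSLOG_LINE.match(line.strip())
        if not m:
            continue  # skip anything that is not in the assumed layout
        per_message[(m["prog"], m["msg"])] += 1
        per_second[m["stamp"]] += 1
    return per_message, per_second
```

Passing an open log file to `summarise` and then calling `most_common(5)` on each counter should show which message dominates and how many lines arrive in the busiest second.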

      The only fix I've found is to either restart the NUT service or kill and restart /usr/pbi/nut-i386/libexec/nut/apcsmart.
      Once I do that it works fine for some time, from hours to months.
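The manual restart described above could in principle be automated with a small watchdog. The following is only a sketch under assumptions: it relies on NUT's `upsc` client exiting with a non-zero status when upsd cannot serve fresh data, and the `restart` callable is a placeholder for whatever restarts NUT on your system (it is deliberately left undefined here). The names `ups_is_reachable` and `watchdog` are hypothetical.

```python
import subprocess
import time

def ups_is_reachable(ups="smartups@localhost", run=subprocess.run):
    """Return True if `upsc` can read ups.status (assumes non-zero exit on failure)."""
    result = run(["upsc", ups, "ups.status"], capture_output=True, text=True)
    return result.returncode == 0

def watchdog(check, restart, interval=60, max_cycles=None, sleep=time.sleep):
    """Call restart() each time check() fails; return the number of restarts.

    check and restart are supplied by the caller; restart is a placeholder
    for the site-specific way of restarting the NUT service or driver.
    """
    restarts = 0
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        if not check():
            restart()
            restarts += 1
        cycles += 1
        sleep(interval)
    return restarts
```

This only papers over the symptom: if the serial device itself has gone away, a restart will not help until the device node is back.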

      I've had this issue in both 2.0.x and 2.1

      So I'm now running  /usr/pbi/nut-i386/libexec/nut/apcsmart -a smartups -D

      and now I bet it will run fine for months….

      
      Network UPS Tools - APC Smart protocol driver 3.04 (2.6.5)
      APC command table version 3.0
         0.000000     debug level is '1'
         0.269746     attempting firmware lookup using command 'V'
         0.319701     APC - attempting to find command set
         0.990117     APC - Parsing out supported cmds and vars
         2.048696     protocol_verify - APC: [d] unrecognized
         3.167895     APC - About to get capabilities string
         5.246675     supported capability: 75 (I) - input.transfer.high
         5.246860     supported capability: 6c (I) - input.transfer.low
         5.246985     supported capability: 65 (4) - battery.charge.restart
         5.247102     supported capability: 6f (I) - output.voltage.nominal
         5.247173     supported capability: 73 (4) - input.sensitivity
         5.247283     supported capability: 71 (4) - battery.runtime.low
         5.247416     supported capability: 70 (4) - ups.delay.shutdown
         5.247539     supported capability: 6b (4) - battery.alarm.threshold
         5.248445     supported capability: 72 (4) - ups.delay.start
         5.248588     supported capability: 45 (4) - ups.test.interval
         5.248702     APC - UPS capabilities determined
         5.248746     detected Smart-UPS 620    [QS0230242266] on /dev/cuaU0
      
      Broadcast Message from root@pfsense2.criggie.org.nz                            
              (no tty) at 12:57 NZST...                                              
      
      Communications with UPS smartups@localhost established                         
      
      

      So I'm trying to work out whether it's the USB bit, the USB-serial bit, or something in NUT.

      If this rings a bell please let me know, but I'm still investigating.

        doktornotor

        Have you tried to simply use the ports the other way round?

          Criggie

          @doktornotor:

          Have you tried to simply use the ports the other way round?

          Nope - I don't want to lose the console access, and I guess it's convoluted to move the system console onto a USB serial port.

          Thing is, it DOES work this way for a bit.

          Yeah it happened again about half an hour ago, and my ssh session only showed half a second worth of scrollback.  So I'm logging it to a file now  :-\

            Criggie

            So here's the debug from nut:

            
               0.000000     debug level is '1'
               0.285332     attempting firmware lookup using command 'V'
               0.345285     APC - attempting to find command set
               1.025723     APC - Parsing out supported cmds and vars
               2.084389     protocol_verify - APC: [d] unrecognized
               3.772458     APC - About to get capabilities string
               5.860236     supported capability: 75 (I) - input.transfer.high
               5.870168     supported capability: 6c (I) - input.transfer.low
               5.879415     supported capability: 65 (4) - battery.charge.restart
               5.889014     supported capability: 6f (I) - output.voltage.nominal
               6.198564     supported capability: 73 (4) - input.sensitivity
               6.210537     supported capability: 71 (4) - battery.runtime.low
               6.222520     supported capability: 70 (4) - ups.delay.shutdown
               6.234367     supported capability: 6b (4) - battery.alarm.threshold
               6.246193     supported capability: 72 (4) - ups.delay.start
               6.258536     supported capability: 45 (4) - ups.test.interval
               6.270410     APC - UPS capabilities determined
               6.282198     detected Smart-UPS 620    [QS0230242266] on /dev/cuaU0
             921.945335     Communications with UPS lost: timeout
             921.960375     smartmode: issuing 'Y' failed: Device not configured
             921.972454     smartmode: issuing 'Y' failed: Device not configured
             921.984275     smartmode: issuing 'Y' failed: Device not configured
             921.996115     smartmode: issuing 'Y' failed: Device not configured
             922.007944     smartmode: issuing 'Y' failed: Device not configured
             922.019743     smartmode: issuing 'Y' failed: Device not configured
            ...
            281,453 lines of that same message
            
            

            Which is pretty useless… However, the system dmesg shows:

            
            ugen0.2: <Prolific Technology Inc.> at usbus0 (disconnected)
            uplcom0: at uhub0, port 2, addr 2 (disconnected)
            ugen0.2: <Prolific Technology Inc.> at usbus0
            uplcom0: <Prolific Technology Inc. USB-Serial Controller, class 0/0, rev 1.10/3.00, addr 2> on usbus0
            

            So that makes it look like the USB/serial adapter is going away and coming back - any suggestions on how to make the uplcom module more stable?
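To check how often the adapter drops off the bus, the detach and attach lines can be pulled out of dmesg or the remote syslog. This is a minimal sketch, assuming the usual FreeBSD message form "uplcom0: at uhub0, ... (disconnected)" for a detach and "uplcom0: <description> on usbus0" for an attach; the patterns and the function name `usb_serial_events` are illustrative.

```python
import re

# Assumed message forms:
#   "uplcom0: at uhub0, port 2, addr 2 (disconnected)"  -> detach
#   "uplcom0: <...> on usbus0"                          -> attach
DETACH = re.compile(r"\b(uplcom\d+): at \S+, .*\(disconnected\)")
ATTACH = re.compile(r"\b(uplcom\d+): <[^>]*> ?on (usbus\d+)")

def usb_serial_events(lines):
    """Return (device, 'detach' | 'attach') tuples in the order they appear."""
    events = []
    for line in lines:
        m = DETACH.search(line)
        if m:
            events.append((m.group(1), "detach"))
            continue
        m = ATTACH.search(line)
        if m:
            events.append((m.group(1), "attach"))
    return events
```

A detach followed immediately by an attach, as in the dmesg output above, suggests the adapter is resetting or briefly losing power or connection. The driver process that still holds the old device then fails with "Device not configured", which would explain why restarting it helps.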

              doktornotor

              Half of these USB->serial adapters are half-broken at best… Very much doubt it's the driver fault.

                Criggie

                @doktornotor:

                Half of these USB->serial adapters are half-broken at best… Very much doubt it's the driver fault.

                Fair enough - I've borrowed a different brand etc to see if that's the problem.

                  Criggie

                  OK, I've tried a bunch of different adapters, and only the original PL2303 one does anything at all.  I tried an old USB/serial adapter that our Cisco expert swears by, and a funny looking stacking Belkin one that the other Cisco wally likes.

                  In the NUT config screen, the "Local UPS port" list always shows these choices:
                  Auto (USB Only)
                  cuau0
                  cuac1
                  ttyu0
                  ttyu1

                  When I have the working adapter plugged in, the list gains two more entries (notice the capitalisation):
                  Auto (USB Only)
                  cuaU0
                  cuau0
                  cuac1
                  ttyU0
                  ttyu0
                  ttyu1
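On the capitalisation: in FreeBSD's device naming, nodes with an upper-case U (cuaU0, ttyU0) belong to USB serial adapters handled through ucom, while lower-case u nodes (cuau0, ttyu0) belong to uart ports; the cua prefix is the call-out node and tty the call-in node. A small sketch of that convention follows; the function name `describe` and the returned labels are illustrative only.

```python
import re

# cua = call-out node, tty = call-in node
# U   = USB serial (ucom), u = uart serial port
NODE = re.compile(r"^(?P<kind>cua|tty)(?P<bus>[Uu])(?P<unit>\d+)$")

def describe(node):
    """Return (direction, driver, unit) for a serial node name, or None."""
    m = NODE.match(node)
    if not m:
        return None  # name does not follow the cua/tty + U/u + unit pattern
    direction = "call-out" if m["kind"] == "cua" else "call-in"
    driver = "ucom (USB serial)" if m["bus"] == "U" else "uart (serial port)"
    return direction, driver, int(m["unit"])
```

So the two entries that appear only with the working adapter are the ones created once a USB serial driver has attached to it.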

                  dmesg entries, working USB/serial adapter
                  ugen0.2: <Prolific Technology Inc.> at usbus0
                  uplcom0: <Prolific Technology Inc. USB-Serial Controller, class 0/0, rev 1.10/3.00, addr 2> on usbus0

                  dmesg entries, failing Belkin USB/serial adapter
                  ugen0.2: <Belkin Components> at usbus0

                  So there's no uplcom0 line for the non-working ones.  I'm stumped now - not sure if it's a kernel thing not finding the right module, or a PHP/GUI thing.

                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.