Netgate Discussion Forum

    Netgate 6100 Max unresponsive after "pool I/O failure, zpool=pfSense error=97" error found in the logs

    Official Netgate® Hardware
    15 Posts 4 Posters 726 Views
    • M
      michmoor LAYER 8 Rebel Alliance @keyser

      @keyser said in Netgate 6100 Max unresponsive after "pool I/O failure, zpool=pfSense error=97" error found in the logs:

      Use server-grade hardware with ECC memory and redundant power supplies, uplink using LAGGs across multiple discrete NICs, and create the ZFS mirror zpool across two different disks on two different controllers (not ports on the same controller).

      In fairness, he's using a 6100, which is official hardware. I would think that official components fail infrequently, so maybe just RMA the device.

      Firewall: NetGate,Palo Alto-VM,Juniper SRX
      Routing: Juniper, Arista, Cisco
      Switching: Juniper, Arista, Cisco
      Wireless: Unifi, Aruba IAP
      JNCIP,CCNP Enterprise

      • M
        michmoor LAYER 8 Rebel Alliance @maxferrario

        @maxferrario said in Netgate 6100 Max unresponsive after "pool I/O failure, zpool=pfSense error=97" error found in the logs:

        My HA setup (standard, as per https://docs.netgate.com/pfsense/en/latest/recipes/high-availability.html) was completely useless in this situation, and I had to come to our office to reset the unit.

        Couldn't you have logged into the secondary to force a manual failover?


        • keyserK
          keyser Rebel Alliance @michmoor

          @michmoor said in Netgate 6100 Max unresponsive after "pool I/O failure, zpool=pfSense error=97" error found in the logs:

          In fairness, he's using a 6100, which is official hardware. I would think that official components fail infrequently, so maybe just RMA the device.

          Sure, but it can be a little tricky to get an RMA ticket from support on issues like this without any troubleshooting.

          Love the no fuss of using the official appliances :-)

          • M
            maxferrario @michmoor

            @michmoor I'm new to pfSense: can you please explain to me how to perform a manual failover?
            I had a look at the official docs but couldn't find anything useful.

            • M
              michmoor LAYER 8 Rebel Alliance @maxferrario

              @maxferrario I assume you can go into maintenance mode from the secondary/passive firewall?

              If not, I would invest in an out-of-band (OOB) management system so you can console into your firewalls remotely. Along the lines of what @keyser recommended for designing high-availability systems, I would also invest in smart PDUs so you can remotely cut power to the outlets feeding your devices if needed.


              • stephenw10S
                stephenw10 Netgate Administrator

                @keyser said in Netgate 6100 Max unresponsive after "pool I/O failure, zpool=pfSense error=97" error found in the logs:

                faulty NICs

                That would not normally be one of the causes, because a failed interface should cause the primary node to demote itself. It could still be the cause if the NIC somehow continued to show as UP, though.

                To fail over manually you would need to put the primary into maintenance mode, which requires some access to it, for example SSH, which would usually still work. If you had cross-connected the serial consoles, you could log in to the secondary to reach the primary's console.

                Steve

                • M
                  maxferrario @stephenw10

                  @stephenw10
                  What do you mean by "cross connecting the serial consoles", and would that help me to remotely access the primary? When this issue happened, I couldn't use OpenVPN because the primary was unresponsive, so I had no access to either the primary or the secondary box.

                  Massimo

                  • stephenw10S
                    stephenw10 Netgate Administrator

                    Connect the serial console port of the primary to a USB port on the secondary, and vice versa.

                    Then you can SSH into one node and reach the serial console on the other node using:
                    cu -l cuaU0 -s 115200

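                    Use ~~. to exit cu. The doubled tilde is needed because SSH would otherwise intercept a single ~. as its own escape sequence.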
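
                    A minimal sketch of how the command above could be wrapped, assuming the USB serial adapter shows up as /dev/cuaU0 on the node you are logged in to. The function name peer_console is hypothetical and not part of pfSense; it only checks that the device node exists before handing off to cu, so a missing cable or adapter produces a clear message.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with pfSense): open the peer node's
# serial console through a USB serial adapter on this node.
# Defaults match the device and speed quoted in this thread.
peer_console() {
    device="${1:-cuaU0}"
    speed="${2:-115200}"

    # Refuse to continue if the character device is not present,
    # e.g. because the cross-connect cable is unplugged.
    if [ ! -c "/dev/${device}" ]; then
        echo "serial device /dev/${device} not found" >&2
        return 1
    fi

    # Attach to the peer's console; leave with ~. (or ~~. over SSH).
    cu -l "${device}" -s "${speed}"
}
```

                    After sourcing this in an SSH session on one node, running peer_console with no arguments would attach to the other node's console. A different device name or speed can be passed as the first and second argument.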

                    • M
                      maxferrario @stephenw10

                      Thanks @stephenw10, good to know.
                      But this would not help me if the issue described above happens again: even the console was unresponsive.

                      • stephenw10S
                        stephenw10 Netgate Administrator

                        Yes, if the console is completely unresponsive then it won't help, but the console is often the last thing to stop working.

                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.