Netgate Discussion Forum

    ahcich0: Timeout on slot.....

    • glippi

      Hello,

      I recently migrated my servers from ESXi 6.7 to 8.0u2.
      As part of this I also installed a new pfSense VM and restored the configuration from the old VM onto the new one.

      It all works as intended except for one thing.
      Once in a while it becomes unreachable and then comes back. The console shows this:
      Feb 7 01:13:16 kernel ahcich0: Timeout on slot 7 port 0
      Feb 7 01:13:16 kernel ahcich0: is 00000003 cs 00000000 ss 00000000 rs 00000180 tfd 40 serr 00000000 cmd 0004df17
      Feb 7 01:13:16 kernel ahcich0: ... waiting for slots 00000100
      Feb 7 01:13:16 kernel ahcich0: Timeout on slot 8 port 0
      Feb 7 01:13:16 kernel ahcich0: is 00000003 cs 00000000 ss 00000000 rs 00000180 tfd 40 serr 00000000 cmd 0004df17

      One from 2 days earlier:
      Feb 5 03:08:10 kernel ahcich0: Timeout on slot 5 port 0
      Feb 5 03:08:10 kernel ahcich0: is 00000003 cs 00000000 ss 00000000 rs 00000060 tfd 40 serr 00000000 cmd 0004df17
      Feb 5 03:08:10 kernel ahcich0: ... waiting for slots 00000040
      Feb 5 03:08:10 kernel ahcich0: Timeout on slot 6 port 0
      Feb 5 03:08:10 kernel ahcich0: is 00000003 cs 00000000 ss 00000000 rs 00000060 tfd 40 serr 00000000 cmd 0004df17

      I am not sure myself, but I think it could be the card I use, an Intel 82599 10 gigabit, which may be failing,
      or something wrong with PCI passthrough on ESXi 8. Any help would be appreciated :)

      The version of pfSense I use:
      2.7.2-RELEASE (amd64)
      FreeBSD 14.0-CURRENT

      • glippi @glippi

        @glippi There seem to be no events for the VM on my ESXi host.

        • stephenw10, Netgate Administrator

          Is the VM storage accessed via that NIC?

          • glippi @stephenw10

            @stephenw10 It is a VM disk directly attached to the server, no iSCSI or anything. Other VMs on the same datastore have no issues.
            The VM has the dual-port Intel 82599 10 gigabit card directly attached to it; one port is WAN and the other LAN. That is why I suspect the card is actually getting disconnected and reconnected, but the events on the ESXi host show nothing like this. So maybe it is a driver issue, but it is the same card, same pfSense build and all.

            • stephenw10, Netgate Administrator

              Ah, OK. Seems unlikely to be NIC related then; those messages are from the SATA controller.

              Does it actually cause an issue when that is shown?
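
The hex fields in the quoted timeout lines can be lined up with the reported slot numbers. The sketch below assumes that `rs` and the "waiting for slots" value are per-slot bitmasks, where bit N set means command slot N is still outstanding. That reading is an assumption drawn from the numbers in this thread, not from the driver source, and `slots_in_mask` is a hypothetical helper name.

```python
def slots_in_mask(mask_hex):
    """Return the slot numbers whose bits are set in a 32-bit hex bitmask."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(32) if mask & (1 << bit)]


# Values copied from the log lines quoted in the first post.
print(slots_in_mask("00000180"))  # rs on Feb 7 -> [7, 8]
print(slots_in_mask("00000100"))  # waiting for slots on Feb 7 -> [8]
print(slots_in_mask("00000060"))  # rs on Feb 5 -> [5, 6]
print(slots_in_mask("00000040"))  # waiting for slots on Feb 5 -> [6]
```

Under that assumption the masks agree with the "Timeout on slot 7" / "slot 8" and "slot 5" / "slot 6" messages: two queued commands timed out together in each event.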

              • glippi @stephenw10

                @stephenw10 Internet and LAN go down, but the VM and pfSense stay up, so it seems there is a hard disconnect on both network ports assigned to the VM from the Intel NIC.

                • stephenw10, Netgate Administrator

                  If it was losing the NIC, or even the link, there would be a lot more logged. Is it really only those ahci errors shown?

                  • glippi @stephenw10

                    @stephenw10 This is everything logged around the time it occurs:
                    Feb 5 02:00:01 php 35301 [pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
                    Feb 5 02:04:00 sshguard 81577 Exiting on signal.
                    Feb 5 02:04:00 sshguard 67907 Now monitoring attacks.
                    Feb 5 02:31:00 sshguard 67907 Exiting on signal.
                    Feb 5 02:31:00 sshguard 27224 Now monitoring attacks.
                    Feb 5 02:32:00 sshguard 27224 Exiting on signal.
                    Feb 5 02:32:00 sshguard 55819 Now monitoring attacks.
                    Feb 5 02:33:00 sshguard 55819 Exiting on signal.
                    Feb 5 02:33:00 sshguard 86073 Now monitoring attacks.
                    Feb 5 03:00:00 php 89853 [pfBlockerNG] Starting cron process.
                    Feb 5 03:00:01 php 89853 [pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
                    Feb 5 03:08:10 kernel ahcich0: Timeout on slot 5 port 0
                    Feb 5 03:08:10 kernel ahcich0: is 00000003 cs 00000000 ss 00000000 rs 00000060 tfd 40 serr 00000000 cmd 0004df17
                    Feb 5 03:08:10 kernel ahcich0: ... waiting for slots 00000040
                    Feb 5 03:08:10 kernel ahcich0: Timeout on slot 6 port 0
                    Feb 5 03:08:10 kernel ahcich0: is 00000003 cs 00000000 ss 00000000 rs 00000060 tfd 40 serr 00000000 cmd 0004df17
                    Feb 5 03:08:21 login 33166 login on ttyv0 as root
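
One way to check an excerpt like this for both kinds of event is sketched below: it groups the AHCI timeout lines by timestamp and channel, and separately collects any line containing "link state changed", which is the wording FreeBSD normally uses when an interface goes up or down. The sample lines are copied from the excerpt above; the `summarize` function and the regular expressions are illustrative assumptions, not part of pfSense.

```python
import re
from collections import defaultdict

# Sample lines copied from the log excerpt in this thread.
SAMPLE = """\
Feb 5 03:00:01 php 89853 [pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
Feb 5 03:08:10 kernel ahcich0: Timeout on slot 5 port 0
Feb 5 03:08:10 kernel ahcich0: ... waiting for slots 00000040
Feb 5 03:08:10 kernel ahcich0: Timeout on slot 6 port 0
Feb 5 03:08:21 login 33166 login on ttyv0 as root
""".splitlines()

# "Mon D HH:MM:SS process message" as it appears in the excerpt.
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s+(.*)$")
TIMEOUT_RE = re.compile(r"(ahcich\d+): Timeout on slot (\d+)")


def summarize(lines):
    """Return (timeouts grouped by (timestamp, channel), link-state events)."""
    timeouts = defaultdict(list)
    link_events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not in the expected layout
        timestamp, _process, message = m.groups()
        hit = TIMEOUT_RE.search(message)
        if hit:
            timeouts[(timestamp, hit.group(1))].append(int(hit.group(2)))
        elif "link state changed" in message:
            link_events.append((timestamp, message))
    return dict(timeouts), link_events


timeouts, link_events = summarize(SAMPLE)
print(timeouts)     # {('Feb 5 03:08:10', 'ahcich0'): [5, 6]}
print(link_events)  # []
```

For this excerpt the link-event list is empty, which matches what was posted: only the AHCI timeouts appear around 03:08:10.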

                    • stephenw10, Netgate Administrator

                      Hmm, nothing shown there then.

                      When this happens do you have to do anything to restore the connection?

                      • glippi @stephenw10

                        @stephenw10 No, it all just comes back. pfSense uptime shows that it did not go down; WAN and LAN connection uptime shows that WAN and LAN did.

                        • stephenw10, Netgate Administrator

                          Hmm, so it's just down for those 10s or so and nothing else is logged in pfSense?

                          I would think it pretty much has to be something in the hypervisor with those symptoms.

                          • glippi @stephenw10

                            @stephenw10 My thoughts too, but no hardware events are reported there. I will gather some more logs soon and share them with you here. For now it has not recurred.

                            • glippi @stephenw10

                              @stephenw10 Hello, just a small update. I found 2 out of 6 SSDs in my RAID with a SMART alert flag (it is RAID6, and I honestly think a disk problem should not show up like this on pfSense and not on other VMs on the same drive span),
                              and I am moving the links away from the Intel NIC to the onboard NIC for WAN and a VM NIC for LAN. So far, moving LAN off the Intel NIC has made the connection more stable and I have not seen the error since.

                              It seems that the Intel NIC (Intel 82599 10 gigabit) is badly supported in ESXi 8u2 compared with 6.7u2, so that might also be part of the issue,
                              since the hypervisor did change and the error never occurred on the previous one (same pfSense config, other ESXi version).

                              I therefore think at the moment that it is the NIC and its support in ESXi 8 vs 6.7 that has caused this error. Today I am swapping out the drives flagged by SMART.

                              • glippi @glippi

                                @glippi Hello, after swapping out the dual-port Intel 82599 10 gigabit card and moving to the onboard Ethernet with PCI passthrough to the VM, everything seems stable now.
                                The AHCI errors are normally associated with disks (as far as I can see), but in this case the cause seems to have been the Ethernet controller.
                                My guess is that it was just a faulty card, as I have had no issues for 18 days since moving over.

                                If you would like any logs, please let me know which one you would like to see for further investigation on this.

                                Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.