Netgate Discussion Forum

    Bad disk issue with FreeBSD 6.1 based builds

      k3rmit

      I'm still using build 01-24-07 since my last bad experience with an upgrade, but I'm now getting strange errors under heavy disk activity, as described here:

      http://www.freebsd.org/cgi/query-pr.cgi?pr=103435

      These cause temporary disk deadlocks, which drop one of my network interfaces and break its CARP virtual IP.

      In particular, I was just looking at the Snort blocked addresses when all our network connections were dropped simultaneously (I had to relaunch pfsync to make it work correctly again, as it was stuck in INIT). The system logs show:

      re0: watchdog timeout
      re0: link state changed to DOWN
      ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
      ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
      re0: watchdog timeout
      ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
      ad4: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
      re0: watchdog timeout
      ad4: WARNING - SET_MULTI taskqueue timeout - completing request directly
      ad5: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
      re0: watchdog timeout
      ad5: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
      ad5: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
      re0: watchdog timeout
      ad5: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
      re0: watchdog timeout
      ad5: WARNING - SET_MULTI taskqueue timeout - completing request directly
      ad5: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=33122223
      re0: 10 link states coalesced
      re0: link state changed to DOWN
      re0: link state changed to UP
      arp_rtrequest: bad gateway 62.2.160.66 (!AF_LINK)
      arp_rtrequest: bad gateway 10.100.0.1 (!AF_LINK)
      re0: watchdog timeout
      ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
      ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
      re0: watchdog timeout
      ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
      ad4: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
      re0: watchdog timeout
      ad4: WARNING - SET_MULTI taskqueue timeout - completing request directly
      ad5: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
      re0: watchdog timeout
      ad5: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
      re0: watchdog timeout
      ad5: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
      ad5: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
      re0: watchdog timeout
      ad5: WARNING - SET_MULTI taskqueue timeout - completing request directly
      ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=33122223
      re0: 11 link states coalesced
      re0: link state changed to DOWN
      re0: link state changed to UP
      arp_rtrequest: bad gateway 62.2.160.66 (!AF_LINK)
      arp_rtrequest: bad gateway 10.100.0.1 (!AF_LINK)
      re0: watchdog timeout
      ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
      ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
      re0: watchdog timeout
      ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
      ad4: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
      re0: watchdog timeout
      ad4: WARNING - SET_MULTI taskqueue timeout - completing request directly
      re0: watchdog timeout
      ad5: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
      ad5: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
      re0: watchdog timeout
      ad5: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout - completing request directly
      ad5: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout - completing request directly
      re0: watchdog timeout
      ad5: WARNING - SET_MULTI taskqueue timeout - completing request directly
      ad5: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=33122223
      re0: 11 link states coalesced
      re0: link state changed to DOWN
      re0: link state changed to UP
      arp_rtrequest: bad gateway 62.2.160.66 (!AF_LINK)
      arp_rtrequest: bad gateway 10.100.0.1 (!AF_LINK)
      re0: watchdog timeout
      re0: link state changed to DOWN
      re0: link state changed to UP
      arp_rtrequest: bad gateway 62.2.160.66 (!AF_LINK)
      arp_rtrequest: bad gateway 10.100.0.1 (!AF_LINK)
      re0: watchdog timeout
      re0: link state changed to DOWN
      ad5: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=156301487
      re0: link state changed to UP
      arp_rtrequest: bad gateway 62.2.160.66 (!AF_LINK)
      arp_rtrequest: bad gateway 10.100.0.1 (!AF_LINK)
      arp_rtrequest: bad gateway 10.100.0.1 (!AF_LINK)
      arp_rtrequest: bad gateway 10.100.0.1 (!AF_LINK)
      arp_rtrequest: bad gateway 62.2.160.66 (!AF_LINK)
      arp_rtrequest: bad gateway 62.2.160.97 (!AF_LINK)
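
A log like the one above is easier to reason about once the messages are tallied per device. The following is only a minimal sketch: the category names, the `classify` and `summarize` helpers, and the shortened `LOG` sample are illustrative assumptions, not part of pfSense or FreeBSD.

```python
import re
from collections import Counter

# Shortened sample taken from the log above (not the full excerpt).
LOG = """\
re0: watchdog timeout
re0: link state changed to DOWN
ad4: WARNING - SETFEATURES SET TRANSFER MODE taskqueue timeout - completing request directly
ad5: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=33122223
re0: link state changed to UP
arp_rtrequest: bad gateway 62.2.160.66 (!AF_LINK)
"""

# "<source>: <message>", where source is a device or kernel routine name.
LINE_RE = re.compile(r"^\s*(?P<source>[A-Za-z_]+\d*):\s*(?P<message>.*)$")


def classify(message):
    """Map a kernel message to a coarse, hypothetical category name."""
    if "watchdog timeout" in message:
        return "nic_watchdog"
    if "taskqueue timeout" in message:
        return "ata_taskqueue_timeout"
    if message.startswith("TIMEOUT - WRITE_DMA"):
        return "ata_write_dma_retry"
    if "link state changed" in message:
        return "link_flap"
    if message.startswith("bad gateway"):
        return "arp_bad_gateway"
    return "other"


def summarize(log_text):
    """Count (source, category) pairs across all parseable lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip blank or unrecognized lines
        counts[(match.group("source"), classify(match.group("message")))] += 1
    return counts


if __name__ == "__main__":
    for (source, category), n in sorted(summarize(LOG).items()):
        print(f"{source:15s} {category:25s} {n}")
```

Run against the full log, a tally like this would show whether the `re0` watchdog timeouts and the `ad4`/`ad5` ATA timeouts occur in roughly equal numbers, which is what one would expect if they share a common cause.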

      It seems this was solved in the 6.2 kernels from last November, so I'd like to know whether the latest pfSense build is reliable for production use and whether the "preemption" kernel config option is still disabled (see http://forum.pfsense.org/index.php/topic,3664.0.html).

      Regards!!

        sullrich

        Set up the BIOS so the NICs are not sharing the hard disk's IRQ.

          cmb

          You can try a snapshot from http://snapshots.pfsense.org/FreeBSD6/RELENG_1/ for a 6.2-based version. No debugging is enabled.

            k3rmit

            Thanks for the support and here's the update.

            I've managed to update the BIOS; by the way, it's a Dell machine and there are no IRQ settings available. I found out that two of the installed network cards share the same IRQ. I hope that's not a big issue, as I think it's common practice.

            Anyway, I've also installed the latest build and it worked flawlessly; I just had a duplication problem with the services list (multiple "squid" rows). I've rerun the file system tests that previously failed (though only after a long uptime) and they now seem to work fine, but before praising it too much I'd like to see what happens under heavy load, including heavy network load.

            Also, the latest build seems to have the issue outlined here, which was solved in the last post of that thread:

            http://forum.pfsense.org/index.php/topic,3325.0.html

            By the way, what stops you from releasing the latest builds?

            Alberto

              cmb

              Known bugs are why the snapshots are not yet a release version. The first 1.2 beta will be out soon and will be free of all known issues. The 1.2 release should follow shortly after.

              As for IRQ sharing, yeah, it shouldn't cause any problems, but it will reduce performance. I'm not sure by how much, but with devices as active as a disk and a NIC, both of which can be interrupt-heavy under load, the impact could be significant.
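
When the BIOS offers no IRQ settings, it can still help to see which devices actually share an interrupt line. The sketch below parses text in the style of FreeBSD's `vmstat -i` output and reports IRQs with more than one handler. The `SAMPLE` text is invented for illustration (it is not from the machine in this thread), the `shared_irqs` helper is hypothetical, and the exact output layout is an assumption that may differ between FreeBSD versions.

```python
import re

# Invented sample in the style of `vmstat -i`; not real data from this thread.
SAMPLE = """\
interrupt                          total       rate
irq14: ata0                        12345          3
irq16: re0 ata2                    67890         15
irq17: em0 em1                      1111          0
cpu0: timer                       999999       1000
"""

# "irqN: <handler names>   <total>   <rate>"
IRQ_RE = re.compile(r"^(irq\d+):\s+(.+?)\s+(\d+)\s+(\d+)\s*$")


def shared_irqs(text):
    """Return {irq: [handlers]} for every IRQ line listing 2+ handlers."""
    shared = {}
    for line in text.splitlines():
        match = IRQ_RE.match(line)
        if match is None:
            continue  # header line or non-IRQ source such as cpu0: timer
        # Assumption: a trailing '+' or '*' marks a truncated handler list.
        handlers = [name.rstrip("+*") for name in match.group(2).split()]
        if len(handlers) > 1:
            shared[match.group(1)] = handlers
    return shared


if __name__ == "__main__":
    for irq, handlers in shared_irqs(SAMPLE).items():
        print(f"{irq} is shared by: {', '.join(handlers)}")
```

In the invented sample, `irq16` is the interesting case, since it pairs a NIC (`re0`) with a disk controller (`ata2`), which is the combination the advice above is concerned with.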

              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.