Random System Crash



  • Version: 2.3.2-RELEASE (amd64)
    CPU Type: AMD Opteron™ Processor 246 (2 CPUs: 2 package(s) x 1 core(s))
    1U rack-mount Tyan board, Adaptec AIC7902 Ultra320 SCSI adapter on-board.
    36G SCSI Drive
    Broadcom Gigabit Ethernet Controller, ASIC rev. 0x002003 on-board (LAN)
    Intel 82551 Pro/100 Ethernet on-board (WAN)

    This router shuts down at random, roughly once a day. It has been in a production environment for about a month, and the issue started this past weekend.

    Features it uses: Firewall, NAT, 1:1NAT, dpinger, ntpd, unbound, Limiter

    It's a pretty basic setup.

    Last few log entries before the crash (newest first):
    Sep 7 04:58:50 kernel arp: 24.X.X.81 moved from 00:e0:81:2c:7c:e2 to 00:e0:81:2c:7c:ba on fxp0
    Sep 7 04:58:49 kernel arp: 24.X.X.81 moved from 00:e0:81:2c:7c:ba to 00:e0:81:2c:7c:e2 on fxp0
    Sep 7 04:38:53 kernel arp: 24.X.X.81 moved from 00:e0:81:2c:7c:ba to 00:e0:81:2c:7c:e2 on fxp0
    Sep 7 04:18:57 kernel arp: 24.X.X.81 moved from 00:e0:81:2c:7c:ba to 00:e0:81:2c:7c:e2 on fxp0
    Sep 7 03:59:00 kernel arp: 24.X.X.81 moved from 00:e0:81:2c:7c:ba to 00:e0:81:2c:7c:e2 on fxp0
    Sep 7 03:39:04 kernel arp: 24.X.X.81 moved from 00:e0:81:2c:7c:ba to 00:e0:81:2c:7c:e2 on fxp0
    Sep 7 03:19:08 kernel arp: 24.X.X.81 moved from 00:e0:81:2c:7c:e2 to 00:e0:81:2c:7c:ba on fxp0
    Sep 7 03:19:03 kernel arp: 24.X.X.81 moved from 00:e0:81:2c:7c:ba to 00:e0:81:2c:7c:e2 on fxp0
    Sep 7 03:05:10 root rc.update_bogons.sh is ending the update cycle.
    Sep 7 03:05:09 root rc.update_bogons.sh is beginning the update cycle.
    Sep 7 03:01:00 root rc.update_bogons.sh is sleeping for 249
    Sep 7 03:01:00 root rc.update_bogons.sh is starting up.
    Sep 7 02:59:08 kernel arp: 24.X.X.81 moved from 00:e0:81:2c:7c:e2 to 00:e0:81:2c:7c:ba on fxp0
    Sep 7 02:59:03 kernel arp: 24.X.X.81 moved from 00:e0:81:2c:7c:ba to 00:e0:81:2c:7c:e2 on fxp0
    Sep 7 02:42:08 kernel ahd0: Address or Write Phase Parity Error Detected in TARG.
    Sep 7 02:42:08 kernel <<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
    Sep 7 02:42:08 kernel 0x0 0x0
    Sep 7 02:42:08 kernel 0x0ahd1: Address or Write Phase Parity Error Detected in TARG.
    Sep 7 02:42:08 kernel 0x0<<<<<<<<<<<<<<<<< Dump Card State Ends >>>>>>>>>>>>>>>>>>
    Sep 7 02:42:08 kernel 0x0STACK: 0x0 0x23 0x0 0x0 0x0 0x0 0x0 0x0
    Sep 7 02:42:08 kernel 0x0CDB a 0 b4 80 8 18
    Sep 7 02:42:08 kernel 0x0ahd0: SCBPTR == 0x1f0, SCB_NEXT == 0xff00, SCB_NEXT2 == 0xff0c
    Sep 7 02:42:08 kernel 0x0ahd0: REG0 == 0x9460, SINDEX = 0x10e, DINDEX = 0x10e
    Sep 7 02:42:08 kernel STACK:
    Sep 7 02:42:08 kernel CCSCBCTL[0x0]CDB 12 0 0 0 24 0
    Sep 7 02:42:08 kernel ahd1: SCBPTR == 0x1f1, SCB_NEXT == 0xff40, SCB_NEXT2 == 0xff15
    Sep 7 02:42:08 kernel ) ahd1: REG0 == 0x6bfd, SINDEX = 0x102, DINDEX = 0x102
    Sep 7 02:42:08 kernel CCSCBCTL[0x4]SIMODE0[0xc]:(CCSCBDIR:(ENOVERRUN) |ENIOERR
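
    The log above shows one IP address alternating between two MAC addresses on fxp0. As a rough aid for reading a longer log, here is a minimal Python sketch that tallies those "moved from" messages per IP and interface. The function name and the sample list are hypothetical, and the regex only assumes the line format shown above.

```python
import re
from collections import Counter

# Matches kernel messages of the form seen in the log above:
#   arp: <ip> moved from <old mac> to <new mac> on <iface>
ARP_MOVE = re.compile(
    r"arp: (?P<ip>\S+) moved from (?P<old>[0-9a-f:]{17}) "
    r"to (?P<new>[0-9a-f:]{17}) on (?P<iface>\S+)"
)

def summarize_arp_moves(lines):
    """Count ARP moves per (ip, iface) and collect the MACs involved."""
    flips = Counter()
    macs = {}
    for line in lines:
        m = ARP_MOVE.search(line)
        if not m:
            continue  # not an ARP move message
        key = (m["ip"], m["iface"])
        flips[key] += 1
        macs.setdefault(key, set()).update((m["old"], m["new"]))
    return flips, macs

# Hypothetical sample: three lines copied from the log above.
sample = [
    "Sep 7 04:58:50 kernel arp: 24.X.X.81 moved from 00:e0:81:2c:7c:e2 to 00:e0:81:2c:7c:ba on fxp0",
    "Sep 7 04:58:49 kernel arp: 24.X.X.81 moved from 00:e0:81:2c:7c:ba to 00:e0:81:2c:7c:e2 on fxp0",
    "Sep 7 03:05:10 root rc.update_bogons.sh is ending the update cycle.",
]

flips, macs = summarize_arp_moves(sample)
for key, count in flips.items():
    print(key, count, sorted(macs[key]))
```

    This only counts the messages. It does not say why the address is moving between the two MACs.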

    SMART checks out on the drive.

    Not sure what the issue is. We have another router with identical hardware and a similar setup that is not having this issue.

    Any ideas?

    Thanks



  • Okay, so the random shutdowns were not because of… 0x0ahd1: Address or Write Phase Parity Error Detected in TARG.

    Yesterday evening the power supply failed. We replaced it, and the system has not gone down since.

    However, we still get the "0x0ahd1: Address or Write Phase Parity Error Detected in TARG." errors in the logs.

    Are we looking at an impending HDD failure?
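
    To see whether the parity errors keep recurring after the power supply swap, one option is to count them per day and per controller. Below is a minimal Python sketch, assuming the syslog line format shown in the first post. The function name and sample lines are hypothetical, and the regex tolerates the stray "0x0" that the interleaved kernel output puts in front of the controller name.

```python
import re
from collections import Counter

# Day, time, "kernel", optional interleaved junk, then "ahdN: Address or Write Phase Parity Error".
PARITY = re.compile(
    r"^(?P<day>\w{3}\s+\d{1,2})\s+\d{2}:\d{2}:\d{2}\s+kernel\b.*?"
    r"(?P<ctrl>ahd\d+): Address or Write Phase Parity Error"
)

def parity_errors_per_day(lines):
    """Return a Counter keyed by (day, controller) for parity error lines."""
    counts = Counter()
    for line in lines:
        m = PARITY.search(line.strip())
        if m:
            counts[(m["day"], m["ctrl"])] += 1
    return counts

# Hypothetical sample: lines copied from the log in the first post.
sample = [
    "Sep 7 02:42:08 kernel ahd0: Address or Write Phase Parity Error Detected in TARG.",
    "Sep 7 02:42:08 kernel 0x0ahd1: Address or Write Phase Parity Error Detected in TARG.",
    "Sep 7 02:42:08 kernel 0x0CDB a 0 b4 80 8 18",
]

counts = parity_errors_per_day(sample)
for (day, ctrl), n in sorted(counts.items()):
    print(day, ctrl, n)
```

    A count that stays flat or drops after the swap would point one way and a rising count the other, but the script itself only summarizes the log and does not identify the failing component.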