94-024r1 SDA ASC/ASCQ Codes

Doug, dtn 237-2145 Flames to NL: 08-Feb-1994 1756 hagerman at starch.enet.dec.com
Tue Feb 8 14:56:04 PST 1994









        Date:           February 8, 1994                 X3T10/94-024 Rev 1
        To:             X3T10 Committee (SCSI)
        From:           Doug Hagerman (Digital)
        Subject:        Error Handling for SCSI Controllers

        This paper is a proposal for some additional error codes to
        handle situations encountered in storage subsystems,
        particularly RAID subsystems.



        1  OVERVIEW

        In the current SCSI standard the description of each ASC/ASCQ
        code is very brief, and there is no requirement that a disk
        vendor use the codes in any particular way.  In the case of
        RAID subsystems, one of the original RAB goals was to allow
        third party configuration software to be used to control RAID
        boxes from several vendors.  In this light I suggest that we
        may wish to arrive at both a detailed list of ASC/ASCQ codes
        and also some specific information about how they are used in
        processing each command.

        The goal is to arrive at a list of ASC/ASCQ codes suitable for
        addition to the current SCSI-3 list.



        2  SOURCES OF ERRORS

        An initiator talking to a subsystem can get back errors from
        several sources in the subsystem; these can be classified as
        follows.

             1.  Device errors reported in pass-through mode (D).
                 These match the codes for the devices exactly--if you
                 talk to a disk using the pass-through addressing
                 mechanism you get the errors directly from the
                 device.  Nothing new is needed in the ASC/ASCQ table
                 for these codes, and since they won't be reported
                 during normal operation (they are mapped into some
                 kind of a subsystem error), they don't need to be in
                 the list.

             2.  Bus and general SCSI errors (B).  These codes are the
                 ones that are common to most SCSI devices and relate
                 to various things that can go wrong in any SCSI
                 device.  A clue to these is that if a code in the current
                 ASC/ASCQ table can be used by any device type, it is
                 a general SCSI error.  A few new ones are
                 proposed.

             3.  Subsystem command errors (C).  These are generated in
                 response to RAID configuration commands with errors.
                 These codes are all new, and will be added as the
                 detailed list of commands is worked out.  There will
                 be a number of these, I would guess at least two
                 dozen.

             4.  Errors that occur during operation of the RAID device
                 (R).  Many of these are the same as those used for
                 disk devices since RAID boxes look a lot like disks.
                 Others are new, depending on the various functions
                 performed by the controller as it runs.
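
        The four sources above can be sketched as a set of flags, since
        a single code may be attributed to more than one source.  This
        is only an illustration: the names ErrorSource and
        parse_sources are hypothetical and are not part of any SCSI
        document.  The column order R, C, B, D is the one used in the
        draft list later in this paper.

```python
from enum import Flag, auto


class ErrorSource(Flag):
    """The four error sources described above (names are illustrative)."""
    RAID_OPERATION = auto()     # R: errors during operation of the RAID device
    SUBSYSTEM_COMMAND = auto()  # C: errors in RAID configuration commands
    BUS = auto()                # B: bus and general SCSI errors
    DEVICE = auto()             # D: device errors seen in pass-through mode


# Column order of the "RCBD" field in the draft list.
_COLUMNS = (
    ("R", ErrorSource.RAID_OPERATION),
    ("C", ErrorSource.SUBSYSTEM_COMMAND),
    ("B", ErrorSource.BUS),
    ("D", ErrorSource.DEVICE),
)


def parse_sources(column: str) -> ErrorSource:
    """Turn a four-character RCBD field such as 'R  D' into a flag set."""
    if len(column) != 4:
        raise ValueError("expected exactly four characters")
    result = ErrorSource(0)  # empty set of sources
    for char, (letter, flag) in zip(column, _COLUMNS):
        if char == letter:
            result |= flag
        elif char != " ":
            raise ValueError(f"unexpected {char!r} in column for {letter}")
    return result
```

        For example, parse_sources("R  D") yields the RAID operation
        and device flags together.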


        [I'm not sure how to handle the case of SCSI errors that occur
        inside the RAID box.  For example, should the host see a
        distinction between "4A:  command phase error on host bus" and
        "4A:  command phase error on internal bus"?]



        3  DRAFT LIST OF ASC/ASCQ CODES

        The list below includes the existing SCSI-3 ASC/ASCQ codes
        that should be applicable to SDA devices, plus a number of new
        ones.  I tried to classify each code into the four classes
        listed above (D, B, C, R) in order to show why it is
        needed.

        The codes caused by commands (C) will not be understood in
        detail until the commands themselves are figured out.  The
        codes from normal RAID operation (R) need to be reviewed in
        terms of their usefulness across all implementations.



        4  PROPOSED TEXT

        The following text is intended to be incorporated into the
        SCSI Controller Commands (SCC) document.



        8.0  Subsystem Environment

        SCC describes subsystems that consist of addressable devices
        including SDAs (SCSI Disk Arrays), disks, power supplies,
        fans, and operator consoles.  Conventional SCSI devices,
        including all these except SDAs, may be considered as
        independent units since each device reports only its own
        errors.  The SDA device type is unique because it reports not
        only its own errors but also those resulting from events on
        lower level devices.  An SDA is a controller, and has a
        slightly more complicated error reporting scheme as a result.

        Note that from the viewpoint of the initiator, there is no
        distinction between "controller errors" and "device errors
        handled by controller".  Both types are reported to the
        initiator from the SDA LUN.



        8.1  Controller Errors

        Subsystem controller (SDA) errors are those that occur in the
        controller itself.  They are reported to the initiator using
        the appropriate SCSI mechanism, and the error type is
        indicated by the appropriate ASC/ASCQ combination for the SCC
        device type.  An example would be a controller memory error,
        which is not traceable to any underlying subsystem device.

        This category includes errors that occur on the SCSI bus
        connection between the initiator and the controller.

        In the case of a RAID subsystem, since the subsystem nominally
        represents itself to the initiator as a disk, many of the disk
        device type codes are used.  A new column will be added to the
        SCSI-3 list of ASC/ASCQs indicating those suitable for use by
        SDAs.  Additional error codes for situations specific to SDAs
        are listed below and will be added to the SCSI-3 list.



        8.2  Device Errors Handled By Controller

        Errors in an underlying device can be handled automatically by
        the controller and reported to the initiator as subsystem
        exception conditions.  An example of this situation is a disk
        error in a RAID subsystem, which would be handled
        automatically by some method that was pre-arranged when the
        RAID subsystem was set up.  The initiator would see only a
        subsystem exception condition, without the information about
        the details of the underlying disk error itself.

        The subsystem can also optionally maintain a log of underlying
        device errors so that the initiator can find out the details
        of those errors for maintenance reasons.
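
        The behavior described in this section can be sketched as
        follows.  This is a minimal illustration under stated
        assumptions, not a proposed implementation: the Controller
        class, the policy callback, and the device name are all
        hypothetical.  The device error in the usage note (11h/00h,
        UNRECOVERED READ ERROR) is taken from the list in 8.4, and
        the exception text is one of the proposed new events.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass(frozen=True)
class DeviceError:
    device: str  # hypothetical name of the underlying device
    asc: int     # additional sense code reported by that device
    ascq: int    # additional sense code qualifier reported by that device


class Controller:
    """Hides device errors behind subsystem exceptions and logs the details."""

    def __init__(self, policy: Callable[[DeviceError], str], log_size: int = 100):
        # The policy stands in for whatever handling was pre-arranged
        # when the RAID subsystem was set up.
        self._policy = policy
        # Bounded log: the oldest entries are dropped when it is full.
        self._log: Deque[DeviceError] = deque(maxlen=log_size)

    def on_device_error(self, error: DeviceError) -> str:
        """Handle a device error; return the exception the initiator sees."""
        self._log.append(error)     # details are kept for maintenance
        return self._policy(error)  # the initiator sees only this

    def read_log(self) -> List[DeviceError]:
        """Return the logged device errors, oldest first."""
        return list(self._log)


def degrade_policy(error: DeviceError) -> str:
    """Example policy: any disk error lowers the redundancy level."""
    return "Redundancy level got worse"
```

        With Controller(degrade_policy), a call to
        on_device_error(DeviceError("disk3", 0x11, 0x00)) returns only
        the subsystem exception, while read_log() still exposes the
        underlying 11h/00h error.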



        8.3  Device Errors Handled By Initiator

        An initiator may communicate with a device connected to an SDA
        by using the SCSI-3 SDA pass-through addressing mechanism.
        This method would typically be used for diagnostic or
        maintenance operations.  SCSI-3 SDA addressing allows an
        initiator to send commands directly to any addressable device
        in the subsystem by simply specifying the LUN that represents
        the device.  Errors that occur during such commands cause a
        contingent allegiance condition on that LUN (task set, really)
        which is handled by the initiator in the normal SCSI-3
        fashion.

        In this situation the SDA reports the ASC/ASCQ codes that are
        native to the device.  No new codes will be needed for
        existing device types (disk, tape, etc.).



        8.4  Status Values, Sense Key Codes, And ASC/ASCQ Values

        Status values and sense key codes are used in the same way as
        they are used in other SCSI devices.
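
        As a reminder of how an initiator obtains these values, the
        sketch below pulls the sense key, ASC, and ASCQ out of sense
        data.  It assumes the fixed sense data format of SCSI-2
        (response code 70h or 71h, sense key in the low four bits of
        byte 2, ASC in byte 12, ASCQ in byte 13).  The function and
        type names are hypothetical.

```python
from typing import NamedTuple


class SenseCodes(NamedTuple):
    sense_key: int
    asc: int
    ascq: int


def decode_fixed_sense(data: bytes) -> SenseCodes:
    """Extract sense key, ASC and ASCQ from fixed-format sense data."""
    if len(data) < 14:
        raise ValueError("sense data too short to contain ASC/ASCQ")
    response_code = data[0] & 0x7F  # bit 7 is the VALID bit
    if response_code not in (0x70, 0x71):
        raise ValueError(f"not fixed-format sense data: {response_code:#04x}")
    return SenseCodes(sense_key=data[2] & 0x0F, asc=data[12], ascq=data[13])


# Example: ILLEGAL REQUEST (sense key 5h) with 24h/00h, INVALID FIELD IN CDB.
example = bytearray(18)
example[0] = 0x70   # current error, fixed format
example[2] = 0x05   # sense key
example[7] = 10     # additional sense length
example[12] = 0x24  # ASC
example[13] = 0x00  # ASCQ
codes = decode_fixed_sense(bytes(example))
```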

        ASC/ASCQ codes for disks, tapes, and other devices connected
        to a SCSI controller are used in the same way as they are used
        on the native devices and are not listed here.

        The table of ASC/ASCQ codes in SCSI-3 will have to be modified
        to indicate the use of the codes by SCSI controllers.  The
        ASC/ASCQ codes proposed for use by SCSI controllers are listed
        below.  The actual values for the codes will be determined
        later.

        RCBD  ASC ASCQ  DTLPWRSOMC  Event
        ----  --- ----  ----------  -----
        (This first section includes all the codes that are currently
        defined in SCSI-3 that are used by all devices, plus many that are
        used by disk devices, plus some others that seem likely to be
        relevant to subsystems. All these would have an "R" added to
        the current table in SCSI-3.)

           D  13h  00h  D   W  O    ADDRESS MARK NOT FOUND FOR DATA FIELD
           D  12h  00h  D   W  O    ADDRESS MARK NOT FOUND FOR ID FIELD
          B   29h  03h *DTLPWRSOMC* BUS DEVICE RESET MESSAGE OCCURRED
           D  11h  0Eh *DT  WR O  * CANNOT DECOMPRESS USING DECLARED ALGORITHM
         C    30h  06h *DT  W  O  * CANNOT FORMAT MEDIUM - INCOMPATIBLE MEDIUM
         C    30h  02h  DT  WR O    CANNOT READ MEDIUM - INCOMPATIBLE FORMAT
         C    30h  01h  DT  WR O    CANNOT READ MEDIUM - UNKNOWN FORMAT
         C    30h  05h *DT  W  O  * CANNOT WRITE MEDIUM - INCOMPATIBLE FORMAT
         C    30h  04h *DT  W  O  * CANNOT WRITE MEDIUM - UNKNOWN FORMAT
         CB   3Fh  02h  DTLPWRSOMC  CHANGED OPERATING DEFINITION
           D  30h  03h  DT          CLEANING CARTRIDGE INSTALLED
         CB   4Ah  00h  DTLPWRSOMC  COMMAND PHASE ERROR
         CB   2Ch  00h  DTLPWRSOMC  COMMAND SEQUENCE ERROR
          B   2Fh  00h  DTLPWRSOMC  COMMANDS CLEARED BY ANOTHER INITIATOR
           D  0Ch  04h *DT  W  O  * COMPRESSION CHECK MISCOMPARE ERROR
         CB   2Bh  00h  DTLPWRSO C  COPY CANNOT EXECUTE SINCE HOST
                                       CANNOT DISCONNECT
          B   4Bh  00h  DTLPWRSOMC  DATA PHASE ERROR
         C    16h  03h *D   W  O  * DATA SYNC ERROR - DATA AUTO-REALLOCATED
         C    16h  01h *D   W  O  * DATA SYNC ERROR - DATA REWRITTEN
         C    16h  04h *D   W  O  * DATA SYNC ERROR - RECOMMEND REASSIGNMENT
         C    16h  02h *D   W  O  * DATA SYNC ERROR - RECOMMEND REWRITE
         C    16h  00h  D   W  O    DATA SYNCHRONIZATION MARK ERROR
         C    11h  0Dh *DT  WR O  * DE-COMPRESSION CRC ERROR
         C    71h  00h * T        * DECOMPRESSION EXCEPTION LONG ALGORITHM ID
         C    70h  NNh * T        * DECOMPRESSION EXCEPTION SHORT ALGORITHM
                                       ID OF NN
         C    19h  00h  D      O    DEFECT LIST ERROR
         C    19h  03h  D      O    DEFECT LIST ERROR IN GROWN LIST
         C    19h  02h  D      O    DEFECT LIST ERROR IN PRIMARY LIST
         C    19h  01h  D      O    DEFECT LIST NOT AVAILABLE
         C    1Ch  00h  D      O    DEFECT LIST NOT FOUND
         C    32h  01h  D   W  O    DEFECT LIST UPDATE FAILURE
        R     40h  NNh  DTLPWRSOMC  DIAGNOSTIC FAILURE ON COMPONENT NN (80H-FFH)
        R     0Ah  00h  DTLPWRSOMC  ERROR LOG OVERFLOW
           D  11h  02h  DT  W SO    ERROR TOO LONG TO CORRECT
        R     03h  02h   T          EXCESSIVE WRITE ERRORS
        R  D  5Dh  00h *DTLPWRSOMC* FAILURE PREDICTION THRESHOLD EXCEEDED
        R  D  31h  01h  D L    O    FORMAT COMMAND FAILED
           D  1Ch  02h  D      O    GROWN DEFECT LIST NOT FOUND
           D  09h  04h *DT  WR O  * HEAD SELECT FAULT
          B   00h  06h  DTLPWRSOMC  I/O PROCESS TERMINATED
           D  10h  00h  D   W  O    ID CRC OR ECC ERROR
         CB   5Eh  03h *DTLPWRSO C* IDLE CONDITION ACTIVATED BY COMMAND
          B   5Eh  01h *DTLPWRSO C* IDLE CONDITION ACTIVATED BY TIMER
        R  D  30h  00h  DT  WR OM   INCOMPATIBLE MEDIUM INSTALLED
        R  D  11h  08h   T          INCOMPLETE BLOCK READ
        R B   48h  00h  DTLPWRSOMC  INITIATOR DETECTED ERROR MESSAGE RECEIVED
        RCB   3Fh  03h  DTLPWRSOMC  INQUIRY DATA HAS CHANGED
        R     44h  00h  DTLPWRSOMC  INTERNAL TARGET FAILURE
          B   3Dh  00h  DTLPWRSOMC  INVALID BITS IN IDENTIFY MESSAGE
         CB   20h  00h  DTLPWRSOMC  INVALID COMMAND OPERATION CODE
        RC    21h  01h *DT  WR OM * INVALID ELEMENT ADDRESS
         CB   24h  00h  DTLPWRSOMC  INVALID FIELD IN CDB
         CB   26h  00h  DTLPWRSOMC  INVALID FIELD IN PARAMETER LIST
          B   49h  00h  DTLPWRSOMC  INVALID MESSAGE ERROR
        R     5Bh  02h  DTLPWRSOM   LOG COUNTER AT MAXIMUM
        R     5Bh  00h  DTLPWRSOM   LOG EXCEPTION
        R     5Bh  03h  DTLPWRSOM   LOG LIST CODES EXHAUSTED
        R     2Ah  02h  DTL WRSOMC  LOG PARAMETERS CHANGED
        RCB   21h  00h  DT  WR OM   LOGICAL BLOCK ADDRESS OUT OF RANGE
        R     08h  00h  DTL WRSOMC  LOGICAL UNIT COMMUNICATION FAILURE
        R     08h  02h  DTL WRSOMC  LOGICAL UNIT COMMUNICATION PARITY ERROR
        R     08h  01h  DTL WRSOMC  LOGICAL UNIT COMMUNICATION TIME-OUT
        R     05h  00h  DTL WRSOMC  LOGICAL UNIT DOES NOT RESPOND TO SELECTION
        R     4Ch  00h  DTLPWRSOMC  LOGICAL UNIT FAILED SELF-CONFIGURATION
        R     3Eh  00h  DTLPWRSOMC  LOGICAL UNIT HAS NOT SELF-CONFIGURED YET
        R     04h  01h  DTLPWRSOMC  LOGICAL UNIT IS IN PROCESS OF BECOMING READY
        R     04h  00h  DTLPWRSOMC  LOGICAL UNIT NOT READY, CAUSE NOT REPORTABLE
        R     04h  04h  DTL    O    LOGICAL UNIT NOT READY, FORMAT IN PROGRESS
         C    04h  02h  DTLPWRSOMC  LOGICAL UNIT NOT READY, INITIALIZING
                                       COMMAND REQUIRED
         C    04h  03h  DTLPWRSOMC  LOGICAL UNIT NOT READY, MANUAL
                                       INTERVENTION REQUIRED
         C    25h  00h  DTLPWRSOMC  LOGICAL UNIT NOT SUPPORTED
        R     5Eh  00h *DTLPWRSO C* LOW POWER CONDITION ACTIVE
           D  15h  01h  DTL WRSOM   MECHANICAL POSITIONING ERROR
        RC D  53h  00h  DTL WRSOM   MEDIA LOAD OR EJECT FAILED
        R  D  3Bh  0Dh *DT  WR OM * MEDIUM DESTINATION ELEMENT FULL
        R  D  31h  00h  DT  W  O    MEDIUM FORMAT CORRUPTED
        RC    3Ah  00h  DTL WRSOM   MEDIUM NOT PRESENT
        R     53h  02h  DT  WR OM   MEDIUM REMOVAL PREVENTED
        R     3Bh  0Eh *DT  WR OM * MEDIUM SOURCE ELEMENT EMPTY
          B   43h  00h  DTLPWRSOMC  MESSAGE ERROR
        R     3Fh  01h  DTLPWRSOMC  MICROCODE HAS BEEN CHANGED
        R     1Dh  00h  D   W  O    MISCOMPARE DURING VERIFY OPERATION
          B   11h  0Ah  DT     O    MISCORRECTED ERROR
          B   2Ah  01h  DTL WRSOMC  MODE PARAMETERS CHANGED
        R     11h  03h  DT  W SO    MULTIPLE READ ERRORS
          B   00h  00h  DTLPWRSOMC  NO ADDITIONAL SENSE INFORMATION
        R     32h  00h  D   W  O    NO DEFECT SPARE LOCATION AVAILABLE
           D  01h  00h  D   W  O    NO INDEX/SECTOR SIGNAL
           D  02h  00h  D   WR OM   NO SEEK COMPLETE
        R     28h  00h  DTLPWRSOMC  NOT READY TO READY TRANSITION, MEDIUM
                                       MAY HAVE CHANGED
        R     5Ah  01h  DT  WR OM   OPERATOR MEDIUM REMOVAL REQUEST
        R     5Ah  00h  DTLPWRSOM   OPERATOR REQUEST OR STATE CHANGE
                                       INPUT (UNSPECIFIED)
        RC    5Ah  03h  DT  W  O    OPERATOR SELECTED WRITE PERMIT
        RC    5Ah  02h  DT  W  O    OPERATOR SELECTED WRITE PROTECT
          B   4Eh  00h  DTLPWRSOMC  OVERLAPPED COMMANDS ATTEMPTED
         CB   1Ah  00h  DTLPWRSOMC  PARAMETER LIST LENGTH ERROR
         CB   26h  01h  DTLPWRSOMC  PARAMETER NOT SUPPORTED
         CB   26h  02h  DTLPWRSOMC  PARAMETER VALUE INVALID
          B   2Ah  00h  DTL WRSOMC  PARAMETERS CHANGED
        R     03h  00h  DTL W SO    PERIPHERAL DEVICE WRITE FAULT
        R B   29h  01h *DTLPWRSOMC* POWER ON OCCURRED
        R B   29h  00h  DTLPWRSOMC  POWER ON, RESET, OR BUS DEVICE
                                       RESET OCCURRED
        R     40h  00h  D           RAM FAILURE (SHOULD USE 40 NN)
        R     14h  01h  DT  WR O    RECORD NOT FOUND
        R     14h  00h  DTL WRSO    RECORDED ENTITY NOT FOUND
        R     18h  02h  D   WR O    RECOVERED DATA - DATA AUTO-REALLOCATED
        R     18h  05h  D   WR O    RECOVERED DATA - RECOMMEND REASSIGNMENT
        R     18h  06h  D   WR O    RECOVERED DATA - RECOMMEND REWRITE
        R     17h  05h  D   WR O    RECOVERED DATA USING PREVIOUS SECTOR ID
        R     18h  03h       R      RECOVERED DATA WITH CIRC
        R     18h  07h *D   W  O  * RECOVERED DATA WITH ECC - DATA REWRITTEN
        R     18h  01h  D   WR O    RECOVERED DATA WITH ERROR CORRECTION
                                       & RETRIES APPLIED
        R     18h  00h  DT  WR O    RECOVERED DATA WITH ERROR CORRECTION
                                       APPLIED
        R     18h  04h       R      RECOVERED DATA WITH L-EC
        R  D  17h  03h  DT  WR O    RECOVERED DATA WITH NEGATIVE HEAD OFFSET
        R  D  17h  00h  DT  WRSO    RECOVERED DATA WITH NO ERROR CORRECTION
                                       APPLIED
        R  D  17h  02h  DT  WR O    RECOVERED DATA WITH POSITIVE HEAD OFFSET
        R     17h  01h  DT  WRSO    RECOVERED DATA WITH RETRIES
        R     17h  04h      WR O    RECOVERED DATA WITH RETRIES AND/OR
                                       CIRC APPLIED
        R     17h  06h  D   W  O    RECOVERED DATA WITHOUT ECC - DATA
                                       AUTO-REALLOCATED
        R     17h  09h *D   W  O  * RECOVERED DATA WITHOUT ECC - DATA REWRITTEN
        R  D  17h  07h  D   W  O    RECOVERED DATA WITHOUT ECC -
                                       RECOMMEND REASSIGNMENT
        R     17h  08h  D   W  O    RECOVERED DATA WITHOUT ECC - RECOMMEND
                                       REWRITE
        R     1Eh  00h  D   W  O    RECOVERED ID WITH ECC CORRECTION
         CB   37h  00h  DTL WRSOMC  ROUNDED PARAMETER
         CB   39h  00h  DTL WRSOMC  SAVING PARAMETERS NOT SUPPORTED
        R B   29h  02h *DTLPWRSOMC* SCSI BUS RESET OCCURRED
        R B   47h  00h  DTLPWRSOMC  SCSI PARITY ERROR
        R B   45h  00h  DTLPWRSOMC  SELECT OR RESELECT FAILURE
        R     5Ch  02h  D      O    SPINDLES NOT SYNCHRONIZED
        R     5Ch  01h  D      O    SPINDLES SYNCHRONIZED
          B   5Eh  04h *DTLPWRSO C* STANDBY CONDITION ACTIVATED BY COMMAND
          B   5Eh  02h *DTLPWRSO C* STANDBY CONDITION ACTIVATED BY TIMER
        R B   1Bh  00h  DTLPWRSOMC  SYNCHRONOUS DATA TRANSFER ERROR
         CB   4Dh  NNh *DTLPWRSOMC* TAGGED OVERLAPPED COMMANDS (NN = QUEUE TAG)
          B   3Fh  00h  DTLPWRSOMC  TARGET OPERATING CONDITIONS HAVE CHANGED
        R     5Bh  01h  DTLPWRSOM   THRESHOLD CONDITION MET
        R     26h  03h  DTLPWRSOMC  THRESHOLD PARAMETERS NOT SUPPORTED
           D  09h  00h  DT  WR O    TRACK FOLLOWING ERROR
        R     11h  00h  DT  WRSO    UNRECOVERED READ ERROR
        R     11h  04h  D   W  O    UNRECOVERED READ ERROR - AUTO
                                       REALLOCATE FAILED
        R     11h  0Bh  D   W  O    UNRECOVERED READ ERROR - RECOMMEND
                                       REASSIGNMENT
        R     11h  0Ch  D   W  O    UNRECOVERED READ ERROR -
                                       RECOMMEND REWRITE THE DATA
          B   46h  00h  DTLPWRSOMC  UNSUCCESSFUL SOFT RESET
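
        Because the list above is laid out in fixed columns, it could
        be read by a small program, for example to check it against
        the SCSI-3 table.  The sketch below is an illustration only:
        the names are hypothetical, the column positions are taken
        from the layout of this paper (including its eight-space
        indent), and only rows that fit on one line are handled.  The
        meaning of the asterisks is not stated here, so the sketch
        merely records whether they are present.

```python
from typing import NamedTuple, Optional

SOURCE_COLUMNS = "RCBD"        # class column of the draft list
DEVICE_COLUMNS = "DTLPWRSOMC"  # device-type column of the SCSI-3 table


class Row(NamedTuple):
    sources: str         # letters present from SOURCE_COLUMNS
    asc: int
    ascq: Optional[int]  # None when the list shows "NNh"
    device_types: str    # letters present from DEVICE_COLUMNS
    starred: bool        # device-type field is wrapped in asterisks
    event: str


def _letters(field: str, columns: str) -> str:
    """Keep the letters of a positional field, checking each column."""
    for char, column in zip(field, columns):
        if char not in (" ", column):
            raise ValueError(f"unexpected {char!r} in column {column}")
    return field.replace(" ", "")


def _hex_byte(field: str) -> Optional[int]:
    """Convert '4Ah' to 0x4A; 'NNh' stands for a variable value."""
    if len(field) != 3 or field[2] != "h":
        raise ValueError(f"malformed code {field!r}")
    return None if field[:2] == "NN" else int(field[:2], 16)


def parse_row(line: str) -> Row:
    """Parse one single-line row, including its eight-space indent."""
    asc = _hex_byte(line[14:17])
    if asc is None:
        raise ValueError("ASC must be a fixed value")
    return Row(
        sources=_letters(line[8:12], SOURCE_COLUMNS),
        asc=asc,
        ascq=_hex_byte(line[19:22]),
        device_types=_letters(line[24:34], DEVICE_COLUMNS),
        starred=line[23] == "*",
        event=line[36:].strip(),
    )
```

        Applied to the COMMAND PHASE ERROR row, parse_row gives
        sources "CB", ASC 4Ah, ASCQ 00h, and all ten device types.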


        (This next section has new codes that may be applicable to controller
        devices that use the new SCC device model. These were generated
        by taking a cut at what errors could be caused by the commands
        proposed by George Penokie, plus a review of some VU codes that
        are in use in existing RAID implementations.)

         C                          Invalid P-LUI identifier.
         C                          No P-extents to report.
         C                          Various mode page error codes needed.
         C                          Invalid R-LUI identifier.
         C                          Invalid parameter.
         C                          Invalid P-extent identifier.
         C                          Invalid P-LBA identifier.
         C                          Invalid unit range.
         C                          Redundancy group cannot be deleted because
                                       of [various reasons].
         C                          Check data cannot be recalculated because
                                       of [various reasons].
         C                          Check data cannot be verified because
                                       of [various reasons].
         C                          Check data not specified for this R-LUI.
         C                          P-extent cannot be rebuilt because
                                       of [various reasons].
         C                          P-LUI cannot be rebuilt because
                                       of [various reasons].
         C                          Invalid V-LUI identifier.
         C                          Invalid PS-extent parameter.
         C                          Invalid PS-extent identifier.
         C                          V-LUI check data cannot be recalculated
                                       because of [various reasons].
         C                          Invalid V-LUI number.
         C                          Invalid V-LBA identifier.
         C                          Invalid V-LBA number.
         C                          V-LBA check data cannot be verified because
                                       of [various reasons].
         C                          Invalid S-LUI identifier.
         C                          Invalid P-extent identifier.
         C                          Invalid S-LUI identifier.
         C                          P-LUI spare cannot be created/modified
                                       because of [various reasons].
         C                          S-LUI cannot be deleted because
                                       of [various reasons].

        RC                          LUN already exists; cannot do "Add LUN"
                                       function
        RC                          LUN does not exist; cannot do "Replace
                                       LUN" function
        RC                          Drive already exists; cannot do "Add
                                       drive" function
        RC                          Drive does not exist; cannot do requested
                                       function for it
        RC                          Drive can't be deleted; it's part of a LUN
        RC                          Drive can't be failed; it's formatting
        RC                          Drive can't be replaced; it's not marked as
                                       failed or replaced
        RC                          Invalid action to take
        RC                          Invalid reconstruction amount
        RC                          Invalid reconstruction frequency






                                                                Page 9


        RC                          Invalid LUN block size
        RC                          Invalid LUN type
        RC                          Invalid segment size
        RC                          Invalid number of drives in LUN
        RC                          Invalid number of LUN blocks
        RC                          Invalid RAID level
        RC                          Invalid drive sector size
        RC                          Invalid LUN block size/drive sector
                                       size modulo
        RC                          No disks defined for LUN
        RC                          Disk defined multiple times for LUN
        RC                          Drive cannot be included in rank because
                                       rank is full
        RC                          Ranks have different numbers of disks defined
        RC                          Multiple disks on same channel within
                                       same rank
        RC                          Mirrored disks on the same channel
        RC                          No parity disk defined
        RC                          No data disks defined
        RC                          Too many disks defined
        RC                          No space available for LUN
        RC                          Drive status cannot be changed to GOOD
        RC                          Error in processing a subsystem mode page
        RC                          Drive INQUIRY data mismatch between drives
                                       in the LUN
        RC                          Drive capacity mismatch between drives
                                       in the LUN
        RC                          Drive block size mismatch between drives
                                       in the LUN


        RC    xxh  xxh              Rebuild in progress
        RC                          Recalculation in progress
        RC                          Spare not available - ?
        RC                          Check data error
         C                          Invalid bit specified - ?
         C                          Text string overflow - ?
        R     xxh                   State change has occurred
        R          00h              Informational, refer to log to find cause of
                                       state change
        R          01h              Intervention required
        R          02h              SDA available
        R          03h              Redundancy level got better
        R          04h              Redundancy level got worse
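
        The "State change has occurred" entry above uses the ASCQ to
        say what kind of change took place.  A host program might
        decode it along the following lines.  This is a sketch only:
        the ASC value is still undetermined (shown as xxh above), so
        the function assumes the caller has already matched the ASC,
        and the names used are hypothetical.

```python
# ASCQ values proposed above for "State change has occurred".
STATE_CHANGE_QUALIFIERS = {
    0x00: "Informational, refer to log to find cause of state change",
    0x01: "Intervention required",
    0x02: "SDA available",
    0x03: "Redundancy level got better",
    0x04: "Redundancy level got worse",
}


def describe_state_change(ascq: int) -> str:
    """Return the proposed meaning of a state-change ASCQ."""
    try:
        return STATE_CHANGE_QUALIFIERS[ascq]
    except KeyError:
        # Not one of the five qualifiers listed in this proposal.
        return f"Unknown state change qualifier {ascq:02X}h"
```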


        (The following codes are currently in use in various RAID
        controller implementations. These controllers are not based on the SCC
        device model, so obviously some of them are not likely to
        be generally applicable.)

        R B   xxh  xxh              Request Sense command to drive failed
        R                           Premature completion of a drive command
        R                           Drive returned vendor unique sense data
        RC                          Command timeout
        R B                         Bus errors
        R                           Non-SCSI bus parity error
        R                           No command control structures available
        R                           Nonvolatile parameter memory component
                                       event report
        R                           Watchdog timer timeout
        R  D                        Disconnect timeout
        R                           Backup battery failure event report
        R                           Chip command timeout
        R B                         Unexpected bus phase
        R B                         Message Reject received on a valid message
        RCB                         Byte transfer timeout
        RCB                         Command failed--SCSI ID verification failed
        R                           Data transfer request error
        RCB                         ID Message not sent
        R                           Data returned from drive is invalid
        R B                         Synchronous negotiation error
        R B                         Maximum number of errors for
                                       this I/O exceeded
        R                           Drive reported recovered error without
                                       transferring all data
        R B                         Unexpected disconnect
        RCB                         Unexpected message
        RCB                         Unexpected Tag message
        R B                         Test Unit Ready or Read Capacity
                                       command failed
        R                           Drive failed due to a deferred error
                                       reported by drive
        R                           Unrecovered Read/Write error
        R                           No response from one or more drives
        R                           NV memory and drive metadata indicate
                                       conflicting drive configurations
        R B                         Synchronous Transfer value differences
                                       between drives

        R                           Drive failed because of a failed write
                                       operation (replace drive now)
        R                           Drive failed because automatic reallocation
                                       failed
        R                           Drive failed because reconstruction failed
                                       on drive being reconstructed
        R                           Drive failed because reconstruction failed
                                       because of read error on source drive
        R                           Drive failed due to hardware component
                                       diagnostics failure
        R                           Drive failed because it failed a Test Unit
                                       Ready command or Read Capacity command
                                       or during format or reconstruction
                                       operation
        R                           Drive failed because it failed a Format
                                       Unit command
        R                           Drive failed because of a deferred error
                                       reported by drive
        R                           Excessive media error rate
        R                           Excessive Seek error rate
        R                           Excessive grown defects
        R                           No response from one or more drives
        R                           ROM code indicates no drive is present
                                       although information stored on disks
                                       indicates drive should be present
        RCB                         Mode parameters for drives in LUN
                                       don't match
        R                           Wrong drive was replaced
        R                           Component failure affecting multiple
                                       channels
        R B                         Reservation conflict
        RC                          Operation not allowed during reconstruction
        R                           Non-failed drive was unavailable
                                       for operations
        R B                         Drive not returning required mode sense page
        R                           Parity/data mismatch
                        




        8.5  Error Control Mode Page

        Because of the possible complexity of a SCSI controller device
        and the requirement for sophisticated tracking of the device's
        internal state transitions, a new mode page is required for
        controlling the error reporting system.  TBD.

        [State reporting and tracking method needed.  Ability to
        define "levels" of states.  Ability to inquire all possible
        states.  Ability to force state machine into a certain state
        (or maybe this is a command.) Method of referring to log to
        find out reason for state change.]



        8.6  Error Logging

        Because of the desire by the committee to support standardized
        user interface programs to control RAID subsystems, a
        standardized method for managing and using error logs is
        needed.



