Error Handling for SCSI Controllers
Doug, dtn 237-2145 Flames to NL: 07-Jan-1994 1059
hagerman at starch.enet.dec.com
Fri Jan 7 07:59:30 PST 1994
Date: January 5, 1994 X3T10/94-___ Rev 0
To: X3T10 Committee (SCSI)
From: Doug Hagerman (Digital)
Subject: Error Handling for SCSI Controllers
This paper proposes additional error codes to handle situations
encountered in storage subsystems, particularly RAID subsystems.
It is intended to be incorporated into the SCSI Controller
Commands (SCC) document.
8.0 Subsystem Environment
SCC describes subsystems that consist of addressable devices including
DACLs (Disk Array Conversion Layers), disks, power supplies, fans,
and operator consoles. Conventional SCSI devices, including all these
except DACLs, may be considered as independent units since each reports
only its own errors. The DACL device type is unique because it reports
not only its own errors but also those resulting from events on
lower level devices. A DACL is a controller, and has a slightly more
complicated error reporting scheme as a result.
(Note that from the viewpoint of the initiator, there is no
distinction between "controller errors" and "device errors handled
by controller". Both types are reported to the initiator from the
DACL LUN.)
8.1 Controller Errors
Subsystem controller (DACL) errors are those that occur in the
controller itself. They are reported to the initiator using the
appropriate SCSI mechanism, and the error type is indicated by the
appropriate ASC/ASCQ combination for the SCC device type. An example
is a controller memory error, which is not traceable to any
underlying subsystem device.
In the case of a RAID subsystem, since the subsystem nominally represents
itself to the initiator as a disk, the disk device type codes will be used.
Additional error codes for controller specific situations are
listed below.
8.2 Device Errors Handled by Controller
Errors in an underlying device can be handled automatically by the
controller and reported to the initiator as subsystem exception conditions.
An example of this situation is a disk error in a RAID subsystem, which
would be handled by some method that was pre-arranged when the
RAID subsystem was set up. The initiator would see only a subsystem
exception condition, without the details of the underlying disk
error itself. Error codes and how they relate to underlying device
errors are listed below.
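As an illustration only, the handling described in 8.2 can be sketched
as follows. The class names, the function name, and the policy itself
(regenerate data if redundancy is intact, otherwise report a redundancy
failure) are assumptions made for this sketch and are not defined by
this proposal. The ASC/ASCQ fields are left empty because the proposal
has not yet assigned values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeviceError:
    """Native error from an underlying device (hypothetical shape)."""
    lun: int
    sense_key: str
    detail: str


@dataclass(frozen=True)
class SubsystemException:
    """What the initiator sees; carries no underlying device detail."""
    sense_key: str
    event: str
    asc: Optional[int] = None   # unassigned ("xxh") in this proposal
    ascq: Optional[int] = None  # unassigned ("xxh") in this proposal


def handle_disk_error(err: DeviceError,
                      redundancy_intact: bool) -> SubsystemException:
    """Apply a policy fixed at setup time (assumed for this sketch).

    The details in `err` are deliberately not copied into the result.
    """
    if redundancy_intact:
        return SubsystemException("RECOVERED ERROR",
                                  "Device unavailable, data regenerated.")
    return SubsystemException("MEDIUM ERROR", "Redundancy failure.")
```

The initiator receives only the sense key and event of the subsystem
exception. The native disk error stays inside the controller.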
8.3 Device Errors Handled by Initiator
Errors in an underlying device can also be handled by a pass-through
mechanism at the controller. This method would typically be
used for diagnostic or maintenance operations. The SCC addressing mechanism
allows an initiator to send commands directly to any addressable device in
the subsystem by simply specifying the LUN that represents the device.
See the relevant addressing document. Errors that occur in this
process cause a contingent allegiance condition on that LUN (task set,
really) which is handled by the initiator in the normal SCSI fashion.
The controller's pass-through mechanism will report the ASC/ASCQ codes that
are native to the device. No new codes will be needed for existing
device types (disk, tape, etc.).
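A minimal sketch of the pass-through behavior described in 8.3 is
shown here. The class and method names are hypothetical, and the
contingent allegiance condition is simplified to a per-LUN slot that
holds the sense codes until the initiator retrieves them. The example
device returns 3h/11h/00h (MEDIUM ERROR, unrecovered read error), an
existing disk code, to show that native codes are forwarded unchanged.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

SenseCodes = Tuple[int, int, int]  # (sense key, ASC, ASCQ)


@dataclass(frozen=True)
class Result:
    status: str                         # "GOOD" or "CHECK CONDITION"
    sense: Optional[SenseCodes] = None  # native codes from the device


class PassThroughController:
    """Routes commands by LUN and forwards native sense codes unchanged."""

    def __init__(self) -> None:
        self._devices: Dict[int, Callable[[bytes], Result]] = {}
        self._pending: Dict[int, SenseCodes] = {}  # per-LUN sense data

    def attach(self, lun: int, device: Callable[[bytes], Result]) -> None:
        self._devices[lun] = device

    def send(self, lun: int, cdb: bytes) -> Result:
        result = self._devices[lun](cdb)
        if result.status == "CHECK CONDITION" and result.sense is not None:
            # Simplified contingent allegiance: hold the sense for this LUN.
            self._pending[lun] = result.sense
        return result

    def request_sense(self, lun: int) -> Optional[SenseCodes]:
        # Retrieving the sense data clears the condition for that LUN.
        return self._pending.pop(lun, None)


def failing_disk(cdb: bytes) -> Result:
    # Existing disk code: MEDIUM ERROR, unrecovered read error.
    return Result("CHECK CONDITION", (0x03, 0x11, 0x00))
```

After attaching failing_disk at some LUN, a call to send() returns
CHECK CONDITION, and a following request_sense() on the same LUN
returns the disk's own codes rather than any controller code.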
8.4 Logging Device Errors
The subsystem can also optionally maintain a log of underlying device
errors so that the initiator can find out the details of those errors
for maintenance reasons.
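One possible shape for such a log is sketched below. The record
fields, the fixed capacity, and the read interface are all assumptions
made for illustration. This proposal does not specify the log format
or how the initiator retrieves it.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass(frozen=True)
class DeviceErrorRecord:
    """One underlying device error (hypothetical record layout)."""
    lun: int
    sense_key: int
    asc: int
    ascq: int


class DeviceErrorLog:
    """Bounded log; the oldest records are dropped when it is full."""

    def __init__(self, capacity: int = 64) -> None:
        self._records: Deque[DeviceErrorRecord] = deque(maxlen=capacity)

    def record(self, rec: DeviceErrorRecord) -> None:
        self._records.append(rec)

    def read(self, lun: Optional[int] = None) -> List[DeviceErrorRecord]:
        # Return all records, or only those for one underlying device.
        return [r for r in self._records if lun is None or r.lun == lun]
```

A bounded log is used here so that a failing device cannot exhaust
controller memory. Other retention policies are equally possible.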
8.5 Status Values, Sense Key Codes, and ASC/ASCQ Values
This list includes new codes for conditions native to SCSI controllers,
and those that the controller reports as a result of events triggered
by underlying devices. Codes for existing device types (disks, etc.)
are not listed here.
8.5.1 Status Values
A controller may return any of the status codes described in the
SCSI standard, including: GOOD, CHECK CONDITION, CONDITION MET,
BUSY, INTERMEDIATE, INTERMEDIATE - CONDITION MET, RESERVATION CONFLICT,
COMMAND TERMINATED, and QUEUE FULL. These status codes have the same
meanings as described in the SCSI standard.
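For reference, the status codes named above can be written out with
the byte values given in the SCSI-2 standard. The enum and function
names below are illustrative only. The mask assumes the SCSI-2 layout
in which bits 1 through 5 carry the status code and the remaining
bits are reserved.

```python
from enum import IntEnum


class ScsiStatus(IntEnum):
    """Status byte codes as defined in SCSI-2."""
    GOOD = 0x00
    CHECK_CONDITION = 0x02
    CONDITION_MET = 0x04
    BUSY = 0x08
    INTERMEDIATE = 0x10
    INTERMEDIATE_CONDITION_MET = 0x14
    RESERVATION_CONFLICT = 0x18
    COMMAND_TERMINATED = 0x22
    QUEUE_FULL = 0x28


def decode_status(status_byte: int) -> ScsiStatus:
    """Mask off the reserved bits, then look up the status code.

    Raises ValueError for a code that SCSI-2 does not define.
    """
    return ScsiStatus(status_byte & 0x3E)
```

For example, decode_status(0x02) yields ScsiStatus.CHECK_CONDITION,
which is the status that tells the initiator sense data is available.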
8.5.2 Sense Key Codes and ASC/ASCQ Values
A controller may return the following sense key codes and ASC/ASCQ values.
The following list shows the normal relationship between the
codes and values, and the class of events that cause them to be
reported. These sense key descriptions are in addition to the
descriptions in the SCSI standard.
Sense Key Code   ASC  ASCQ  Event
--------------   ---  ----  -----
NO SENSE                    No specific sense key information
                            to be reported.
RECOVERED ERROR             The last command completed successfully
                            without data loss, with some recovery
                            action performed by the controller.
                            Data was not lost.
                 xxh  xxh   Device unavailable, data regenerated.
                 xxh  xxh
NOT READY                   The logical unit is not ready.
                 xxh  xxh   Rebuild in progress.
                 xxh  xxh   Recalculation in progress.
                 xxh  xxh   Operator initiated activity.
MEDIUM ERROR                The last command terminated with a
                            non-recovered error condition that was
                            caused by a data storage condition.
                            Data may have been lost.
                 xxh  xxh   Redundancy failure.
                 xxh  xxh   Spare not available.
                 xxh  xxh   Check data error.
HARDWARE ERROR              The last command terminated with a
                            non-recovered error condition that was
                            caused by a non-data component of the
                            system. Data may have been lost.
                 xxh  xxh   Power supply failure.
                 xxh  xxh   Fan failure.
                 xxh  xxh
ILLEGAL REQUEST             There was an illegal parameter in the
                            command or in the additional parameters.
                 xxh  xxh   Invalid bit specified.
                 xxh  xxh   Text string overflow.
                 xxh  xxh   Invalid P-LUI.
                 xxh  xxh   Invalid P-extent.
                 xxh  xxh   Invalid R-LUI.
                 xxh  xxh   Incompatible redundancy group parameter.
                 xxh  xxh   Invalid V-LUI.
                 xxh  xxh   Incompatible volume set parameter.
                 xxh  xxh   Invalid S-LUI.
                 xxh  xxh   Incompatible spare parameter.
UNIT ATTENTION              A data storage element was changed,
                            or the device was reset.
DATA PROTECT                A command was attempted on a data area
                            that is protected from this operation.
                            The command is not executed.
BLANK CHECK                 Blank or missing data area encountered.
VENDOR-SPECIFIC             This sense key is available for reporting
                            vendor-specific conditions.
COPY ABORTED                Copy command aborted due to device error.
ABORTED COMMAND             The target aborted the command. The
                            initiator may be able to recover by trying
                            the command again.
EQUAL                       SEARCH DATA found matching data.
VOLUME OVERFLOW             Data buffer end encountered.
MISCOMPARE                  Data did not match.
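An initiator would extract the sense key and ASC/ASCQ values listed
above from the sense data returned by REQUEST SENSE. The sketch below
assumes the fixed sense data format of the SCSI-2 standard (error code
70h or 71h in byte 0, sense key in the low four bits of byte 2, ASC in
byte 12, ASCQ in byte 13). The function name is illustrative. Because
the new ASC/ASCQ values in this proposal are still unassigned, no
controller-specific values are decoded.

```python
from typing import Tuple

# Sense key names indexed by their 4-bit code, per SCSI-2.
SENSE_KEYS = (
    "NO SENSE", "RECOVERED ERROR", "NOT READY", "MEDIUM ERROR",
    "HARDWARE ERROR", "ILLEGAL REQUEST", "UNIT ATTENTION", "DATA PROTECT",
    "BLANK CHECK", "VENDOR-SPECIFIC", "COPY ABORTED", "ABORTED COMMAND",
    "EQUAL", "VOLUME OVERFLOW", "MISCOMPARE", "RESERVED",
)


def decode_fixed_sense(data: bytes) -> Tuple[str, int, int]:
    """Return (sense key name, ASC, ASCQ) from fixed-format sense data."""
    if len(data) < 14:
        raise ValueError("sense data too short to hold ASC/ASCQ")
    error_code = data[0] & 0x7F          # bit 7 is the VALID bit
    if error_code not in (0x70, 0x71):   # current or deferred error
        raise ValueError("not fixed-format sense data")
    sense_key = SENSE_KEYS[data[2] & 0x0F]
    return sense_key, data[12], data[13]
```

Once values are assigned, an initiator could compare the returned
ASC/ASCQ pair against the events in the list above to distinguish,
for example, a rebuild in progress from an operator initiated activity.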