new Data Recovery on deferred errors [94-067r2]

Gerry Houlder Gerry_Houlder at notes.seagate.com
Thu Aug 11 06:31:19 PDT 1994


I have revised this document per discussion at the July working group meeting.
The issue addressed by the changes is as follows:
 
A) There were comments against using a READ command to recover data from the
buffer that had not been written to disk. I have added a definition of a
RECOVER BUFFERED DATA command (which previously did not exist for block
devices, only sequential devices) and propose using this command instead of a
READ command.
 
Alternatives discussed that were rejected:
1) Use READ command -- some members felt that using ACA to modify the action of
this command (by returning data from the buffer that differs from what is
written on disk for that LBA & length) is wrong. I can see their point, so I am
no longer suggesting this approach.
2) Use READ BUFFER command -- this cannot be used because it is byte oriented,
not logical block oriented like READ and WRITE. Converting the LBA and length
values to byte offset location and byte length is too horrible to contemplate.
3) Use REASSIGN BLOCK followed by SYNCHRONIZE CACHE -- If the data could be
written to disk with this procedure, the target's "auto-reallocation" algorithm
would also have been successful and there wouldn't be a need to have the
unrecoverable write error in the first place. This procedure is inadequate.
4) Leave the recovery procedure in the description as "implementation
specific". This goes against the spirit of having a standard. A recovery method
must be suggested by the standard for the recovery action to be useful in a
multi-vendor environment.
 
--------------------------------------
Data Recovery on Deferred Errors                              X3T10/94-067r2
 
PROBLEM STATEMENT
 
We have a customer that is concerned about recovering write data after a
deferred error. When Write Caching is used, the WRITE command will return GOOD
status before the data is written to the disc. If an unrecoverable error occurs
during that write operation, a later command from the same initiator will end
with CHECK CONDITION status. The sense bytes for that error will report
Deferred Error and the Logical Block Address of the erroring block. When such
an error occurs, the customer wants to read the unwritten data back and attempt
to retry the write (possibly at a different location). The existing SCSI
direct access device model doesn't have enough information/commands to do this
to our customer's satisfaction.
 
SOLUTION
 
The Auto Contingent Allegiance (ACA) feature of SCSI-3 is almost enough. The
ACA procedure allows for retrieving sense data, then issuing a command to
"retrieve" the unwritten data from the target's buffer. Of course, the target
must be careful to retain all write data for failed write commands (and allow
that data to be used to satisfy subsequent read back requests) until after the
Contingent Allegiance is cleared. The information bytes of the returned sense
data will contain the first LBA that was not written to disc. This can be used
by the following command to retrieve the unwritten data. The only item that is
still unknown is the number of blocks of unwritten data that are available. I
propose using the command specific bytes in REQUEST SENSE data to indicate the
length of unwritten data when a deferred error occurs. Note that
this number of blocks can include data from other commands that has been
"merged" with the data in the failing command. Such merged data is always
sequentially after the data from the failing command.
 
Add the following wording to the error reporting section of the direct access
device model (equivalent to section 9.1.12 in SCSI-2 Standard):
 
Condition
Unrecovered write error, GOOD status already returned because write caching is
active.
 
Sense Bytes
Error Code = 0x71; Sense Key = MEDIUM ERROR; ASC & ASCQ contain appropriate
codes; Information = LBA of first block not written to medium; Command Specific
Information = Number of blocks not written to disk (these may be recovered via a
RECOVER BUFFERED DATA command if it is issued before the ACA condition is
cleared).
This number of blocks may include blocks from subsequent commands that were
merged with the block that encountered the unrecoverable write error.
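
As an illustration only, an initiator might pull the two proposed fields out of
the sense data roughly as follows. This is a minimal sketch that assumes the
SCSI-2 fixed sense data layout (Information in bytes 3-6, Command Specific
Information in bytes 8-11) and assumes the target sets the valid bit; the
function name is hypothetical and not part of the proposal.

```python
import struct

DEFERRED_ERROR = 0x71  # error code for a deferred error
MEDIUM_ERROR = 0x03    # sense key


def parse_deferred_write_error(sense: bytes):
    """Return (first_unwritten_lba, unwritten_block_count), or None if the
    sense data does not describe a deferred unrecovered write error."""
    if len(sense) < 14:
        return None
    valid = bool(sense[0] & 0x80)   # assumption: target sets the valid bit
    error_code = sense[0] & 0x7F
    sense_key = sense[2] & 0x0F
    if not valid or error_code != DEFERRED_ERROR or sense_key != MEDIUM_ERROR:
        return None
    # Information field: LBA of the first block not written to the medium.
    (first_lba,) = struct.unpack_from(">I", sense, 3)
    # Command Specific Information: number of blocks not written.
    (block_count,) = struct.unpack_from(">I", sense, 8)
    return first_lba, block_count
```

The returned pair could then be used as the logical block address and transfer
length of a RECOVER BUFFERED DATA command issued before the ACA condition is
cleared.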
 
Also add the following command description to the SBC document:
[The document editor shall propose an acceptable command op code to the working
committee when the proposal is accepted.]
 
x.x.x  RECOVER BUFFERED DATA command
 
                    Table xx - RECOVER BUFFERED DATA command
 
  Bit|  7  |  6  |  5  |  4  |  3  |  2  |  1  |  0  |
Byte |-----------------------------------------------|
  0  |            Operation Code (xxh)               |
-----|-----------------------------------------------|
  1  |                Reserved                       |
-----|-----------------------------------------------|
  2  | (MSB)                                         |
-----|--                                           --|
  3  |             Logical Block Address             |
-----|--                                           --|
  4  |                                               |
-----|--                                           --|
  5  |                                         (LSB) |
-----|-----------------------------------------------|
  6  |                Reserved                       |
-----|-----------------------------------------------|
  7  | (MSB)       Transfer Length                   |
-----|--                                           --|
  8  |                                         (LSB) |
-----|-----------------------------------------------|
  9  |                Control Byte                   |
-----|-----------------------------------------------|
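
For readers who prefer code to a byte table, the layout above could be packed
as in the sketch below. The operation code has not been assigned (xxh), so it
is passed in as a parameter; any value used with this sketch is a placeholder.
The function name is hypothetical.

```python
import struct


def build_recover_buffered_data_cdb(opcode: int, lba: int,
                                    transfer_length: int,
                                    control: int = 0) -> bytes:
    """Pack the 10-byte CDB from table xx. `opcode` is a placeholder because
    no operation code has been assigned yet."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must fit in one byte")
    if not 0 <= lba <= 0xFFFFFFFF:
        raise ValueError("logical block address must fit in four bytes")
    if not 0 <= transfer_length <= 0xFFFF:
        raise ValueError("transfer length must fit in two bytes")
    if not 0 <= control <= 0xFF:
        raise ValueError("control byte must fit in one byte")
    # Byte 0: operation code; byte 1: reserved; bytes 2-5: LBA (MSB first);
    # byte 6: reserved; bytes 7-8: transfer length (MSB first); byte 9: control.
    return struct.pack(">BBIBHB", opcode, 0, lba, 0, transfer_length, control)
```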
 
The RECOVER BUFFERED DATA command (table xx) is used to recover data that has
been transferred to the target's buffer but has not been successfully written
to the medium. It is normally used to recover from error or exception
conditions that make it impossible to write the buffered data to the medium.
The execution of this command is similar to the READ(10) command except that
the data is transferred from the target's buffer instead of the medium.
 
See x.x.x [reference to section for READ(10) command] for a definition of the
logical block address field and transfer length field.
 
If an attempt is made to recover data for an LBA that doesn't have any
associated unwritten data, CHECK CONDITION status shall be returned. The sense
key shall be set to ILLEGAL REQUEST and the additional sense bytes shall be
INVALID FIELD IN CDB.
 
If the transfer length is longer than the available unwritten data, the target
shall transfer the available unwritten data and return CHECK CONDITION status.
The valid bit shall be set, the information field shall be set to the number of
logical blocks that were actually returned, the sense key shall be set to
RECOVERED ERROR, and the additional sense bytes shall be NO ADDITIONAL SENSE
INFORMATION.
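
The three outcomes described above can be summarized in a small model of the
target's decision. This is only a sketch: it assumes the unwritten data forms
one contiguous range of blocks (consistent with merged data always following
the failing command's data), and the names used are hypothetical.

```python
from typing import NamedTuple, Optional


class Outcome(NamedTuple):
    status: str
    sense_key: Optional[str]
    additional_sense: Optional[str]
    blocks_transferred: int
    information: Optional[int]  # reported with the valid bit set, else None


def recover_buffered_data(first_unwritten: int, unwritten_count: int,
                          lba: int, transfer_length: int) -> Outcome:
    """Model the target's response to RECOVER BUFFERED DATA."""
    end = first_unwritten + unwritten_count
    # No unwritten data is associated with the requested LBA.
    if not first_unwritten <= lba < end:
        return Outcome("CHECK CONDITION", "ILLEGAL REQUEST",
                       "INVALID FIELD IN CDB", 0, None)
    available = end - lba
    # Request runs past the unwritten data: transfer what is available and
    # report the number of blocks actually returned.
    if transfer_length > available:
        return Outcome("CHECK CONDITION", "RECOVERED ERROR",
                       "NO ADDITIONAL SENSE INFORMATION", available, available)
    return Outcome("GOOD", None, None, transfer_length, None)
```

For example, with 8 unwritten blocks starting at LBA 100, a request for 8
blocks at LBA 104 would transfer 4 blocks and end with CHECK CONDITION and a
sense key of RECOVERED ERROR.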
--
Gerry Houlder -- Gerry_Houlder at notes.seagate.com
