X3T9.2/87-019 Rev 1

ANSI X3T9.2
John Lohmeyer, Chairman
NCR Corporation
3718 Rock Road
Wichita, KS 67226

January 6, 1987
February 19, 1988 First Revision, updated for Rev 3, corrected

Subject: Consideration of deferred error presentation

Dear Mr. Lohmeyer:

The new draft of X3T9.2/86-109, Revision 3, dated Oct 31, 1987, describes Deferred Errors in section 7.1.14.2 on page 7-63. The capability of presenting deferred errors is very important, but a number of other factors must be considered in their definition and presentation.

a) It is very important that the initiator that generated the command that failed in a deferred manner be notified that the command failed. If it is not notified, the initiator may continue operating without taking appropriate recovery steps and may cause later system integrity failures. In addition, it is important that another initiator attempting a command after a deferred error be notified so that it will be aware that the data it is requesting may not be up to date or valid. In some implementations, there may not even be a valid record of which initiator generated the command that failed in the deferred manner.

Considering these requirements, it is clear that deferred errors should be presented once to each attached initiator that may have been affected by the error, at the time the initiator next makes a request against the LUN for which a deferred error was detected. If the deferred error cannot be associated with a single LUN, it should be presented for all LUNs that may be affected.

b) A large class of deferred errors can be recovered completely by the SCSI target with no intervention required from the initiator. An example of such an error is a failure during cache write back, which can be recovered by writing to an alternate area from the still-available cache information. In cases like these, the deferred errors should not be presented unless presentation is specifically enabled by the appropriate Mode Select parameters.
If either statistical logging or individual error logging is supported by the target, such recoverable errors should be placed in the log.

c) A sufficiently sophisticated controller may choose to present to initiators other than the initiator associated with the failing command only those errors that may actually affect the data they request. As an example, if a write back failure occurs to a certain block and the error is unrecoverable, accesses to data other than that failing block from initiators other than the initiator associated with the write command should complete normally. Any command from the initiator generating the write command should get a Check Condition with sense information indicating a deferred error. Any command from any initiator that attempts to access the failing block should get a Check Condition with sense information indicating a deferred error.

d) There does not appear to be any architecturally practical way to protect a system from errors that are associated with long past commands unless the system guarantees that critical information will be protected by write through, journaling, or other special actions until the successful completion of all commands is explicitly detected. This is guaranteed on most systems by some sort of synchronization process.

The posting process for deferred errors is described as a generic function. Two mechanisms of posting may be used. If the initiator and target support Asynchronous Event Notification, that process may be used to post deferred errors to the affected initiators. If either the initiator or the target chooses not to support Asynchronous Event Notification, the normal CHECK CONDITION status presentation followed by a REQUEST SENSE command can be used to present the deferred error condition. In this case, the presentation of the deferred error information will be delayed until a command is offered to the affected LUN to which a CHECK CONDITION status can be returned.

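
As a rough illustration of the two posting mechanisms just described, the following Python sketch models a target that posts a deferred error through Asynchronous Event Notification when both sides support it, and otherwise holds the error until the affected initiator next offers a command to the affected LUN. Every name in it (`Target`, `post_deferred_error`, `handle_command`, `request_sense`, the status strings) is hypothetical and chosen only for illustration; none of it is taken from the draft standard, and it ignores details such as multiple accumulated errors.

```python
from dataclasses import dataclass, field

GOOD = "GOOD"
CHECK_CONDITION = "CHECK CONDITION"


@dataclass
class Target:
    """Toy model of a target that tracks pending deferred errors."""
    supports_aen: bool
    # (initiator_id, lun) -> sense description waiting to be presented
    pending: dict = field(default_factory=dict)
    # Record of notifications sent through AEN, kept for inspection
    aen_log: list = field(default_factory=list)

    def post_deferred_error(self, initiator_id, lun, sense, initiator_supports_aen):
        """Post a deferred error to one affected initiator."""
        if self.supports_aen and initiator_supports_aen:
            # Both sides support AEN: the target notifies at a time it selects.
            self.aen_log.append((initiator_id, lun, sense))
        else:
            # Otherwise hold the error until the next command to this LUN.
            self.pending[(initiator_id, lun)] = sense

    def handle_command(self, initiator_id, lun):
        """Return the status for the next command from an initiator."""
        if (initiator_id, lun) in self.pending:
            return CHECK_CONDITION
        return GOOD

    def request_sense(self, initiator_id, lun):
        """Return and clear the held deferred error, if any."""
        return self.pending.pop((initiator_id, lun), None)
```

In this sketch an initiator without AEN sees CHECK CONDITION on its next command to the LUN and recovers the held information with `request_sense`, after which its commands return GOOD again; other initiators are unaffected unless an error is posted to them as well.
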
To reflect these considerations, I recommend modifying section 7.1.14.2 on page 7-63 to read as follows:

7.1.14.2 Deferred Errors

Error Code 70h (Current Error) indicates that the CHECK CONDITION status returned is the result of an error or exception condition on the command that returned the CHECK CONDITION status. This includes errors generated during execution of the command by the actual execution process. It also includes errors not related to any command that are first observed during execution of a command. Examples of this latter type of error include disk servo off-track errors and bring-up test errors.

Error Code 71h (Deferred Error) indicates that the CHECK CONDITION status returned is the result of an error or exception condition that occurred during execution of a previous command for which GOOD status has already been presented. Such commands are associated with use of the immediate bit, with caching, and with multiple command buffering. The command, data, and status are exchanged, but command execution begins at a later time determined by the target.

The Deferred Error indication may be posted at a time selected by the Target through the Asynchronous Event Notification process (see section 6.4.4) if AEN is supported by both the initiator and target. If AEN is not supported by the initiator or the target, the Deferred Error indication may be posted through the presentation of CHECK CONDITION to the next command from the appropriate initiator. The subsequent execution of a REQUEST SENSE will recover the Deferred Error sense information. If a CHECK CONDITION for a deferred error is presented, the current command has not performed any storage operations or output operations to the media.

After the target detects a deferred error condition on a logical unit, it shall post a Deferred Error according to the rules described below.

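
As an aside for implementors, the distinction between Error Code 70h and Error Code 71h drawn above might be checked on the initiator side along the following lines. This is only a minimal Python sketch: the function name `classify_sense` and its return strings are hypothetical, and masking off the top bit of the first sense byte assumes that bit carries a separate validity flag in the extended sense format, which should be confirmed against the draft.

```python
CURRENT_ERROR = 0x70   # error on the command that returned CHECK CONDITION
DEFERRED_ERROR = 0x71  # error on an earlier command already given GOOD status


def classify_sense(sense: bytes) -> str:
    """Classify returned sense data by its leading error code."""
    if not sense:
        raise ValueError("empty sense data")
    # Assumption: bit 7 of byte 0 is a validity flag, so only the
    # low seven bits are compared against the error codes.
    code = sense[0] & 0x7F
    if code == CURRENT_ERROR:
        return "current"
    if code == DEFERRED_ERROR:
        return "deferred"
    return "unknown"
```

An initiator that receives "deferred" would apply its recovery to the earlier, already-acknowledged work rather than to the command that returned CHECK CONDITION.
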
The sense information associated with the Deferred Error shall begin with 71h and otherwise be identical in format to the extended sense format.

1) If a deferred error can be recovered with no external system intervention, a Deferred Error indication will not be posted unless required by the error handling parameters of the Mode Select command. The occurrence of the error may be logged if statistical or error logging is supported.

2) If a deferred error can be associated with a causing initiator and with a particular function or a particular subset of data, a Deferred Error indication shall be posted to the causing initiator. If an initiator other than the causing initiator attempts access to the particular function or subset of data associated with the deferred error, a CHECK CONDITION indication shall also be posted to that initiator in response to the command attempting the access. The deferred error information shall be presented to the subsequent REQUEST SENSE command.

   Note that not all devices may be sufficiently sophisticated to identify the function or data that has failed. Those that cannot must treat the error in the following fashion.

3) If a deferred error cannot be associated with a causing initiator or with a particular set of data, a Deferred Error indication shall be posted on behalf of the failing LUN to each initiator. If multiple deferred errors have accumulated for some initiators, only the last [ first? ] error will be provided in the Deferred Error posting process.

4) If a deferred error cannot be associated with a particular LUN, it shall be posted in the appropriate manner for all LUNs supported by the target.

5) If a deferred error occurs while a current command is operating, and if the current command has not performed any write operations that have changed the media, a CHECK CONDITION will terminate the command and the Deferred Error information will be presented in the subsequent REQUEST SENSE command.

   If a deferred error occurs while a current command is operating and the current command has changed the media or has also been affected by the error, a CHECK CONDITION will terminate the command and the subsequent REQUEST SENSE will contain Current Error information. In this case, if the Current Error information does not adequately define the Deferred Error condition, a Deferred Error may be posted after the Current Error information has been recovered.

   If a deferred error occurs while a current command is operating and the current command completes successfully, the Target may choose to post the Deferred Error information after the completion of the current command.

Implementor's Note: Deferred errors may indicate that an operation was unsuccessful long after the command performing the data transfer was posted as successful. If data that cannot be replicated or recovered from other sources is being stored using such buffered write operations, synchronization commands should be performed before the critical data is destroyed in the host initiator. This is necessary to be sure that recovery actions can be taken if deferred errors do occur in the storing of the data. If AEN is not implemented, the synchronizing process must provide the necessary commands to allow the posting of a CHECK CONDITION and subsequent posting of Deferred Error sense information after all buffered operations are guaranteed to be complete.

Thank you for your consideration of this proposal.

Sincerely,

Robert N. Snively