SCSI-3 FCP ACA/QErr abort process

Charles Monia Monia at mail.dec.com
Fri Aug 22 15:57:23 PDT 1997


* From the T10 (formerly SCSI) Reflector (t10 at symbios.com), posted by:
* Charles Monia <Monia at mail.dec.com>
*
See my .02 below. All of which is unofficial.
Charles
-----Original Message-----
From:	Joseph Carl Nemeth [SMTP:jnemeth at concentric.net]
Sent:	Friday, August 22, 1997 5:38 PM
To:	'T10 Reflector'
Subject:	SCSI-3 FCP ACA/QErr abort process
* From the T10 (formerly SCSI) Reflector (t10 at symbios.com), posted by:
* Joseph Carl Nemeth <jnemeth at concentric.net>
*
I am having a hard time determining exactly what it means for a target to "abort" a task in the case of the ACA condition with the Mode Select QErr bit set to 1. I would appreciate a response from anyone who knows exactly how this is supposed to work.
There appear to be two very different kinds of "abort" actions.
The first kind of "abort" is in response to any Task Control Function that aborts established tasks. In this case, it seems clear from the SCSI-3 SAM that once the Function Complete response is made for the Task Control Function itself, the aborted tasks are simply blown away -- they must not have any further interactions with the initiator.
Specifically, the target will not send any more FCP_XFER_RDY, FCP_DATA, or FCP_RSP IUs to the initiator for any task that has been aborted, and therefore cannot return any kind of status or autosense data for those tasks.
[CAM]  SAM defines behavior as seen by the "application client" -- analogous to NT's class driver, for example. With that in mind, I'd restate the above to say that no further status or autosense data should be sent to the "application client" or NT class driver (to continue the example). To see what's expected at the transport layer, you've got to look at the spec for the transport protocol -- FCP in this case.
The second kind of "abort" is in response to certain classes of error that are detected by the target. One example is the "overlapped command" condition, in which an initiator sends two overlapped Untagged commands.
In this case, it seems clear that both commands are actually "terminated" rather than "aborted," completing (albeit prematurely) with CHECK CONDITION/ABORTED COMMAND/OVERLAPPED COMMANDS ATTEMPTED error status -- the implementor's note makes it clear that the aborted (first) command may need to report a residue, and I don't know how it would do this in FCP without being able to post its status and autosense data.
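For reference, the error status named here is CHECK CONDITION with sense key ABORTED COMMAND (0Bh) and additional sense code OVERLAPPED COMMANDS ATTEMPTED (ASC 4Eh, ASCQ 00h). A minimal Python sketch of the fixed-format autosense bytes a target would return for such a terminated command, assuming the common 18-byte fixed format; the helper name is hypothetical. In FCP both these sense bytes and any residue count travel in the FCP_RSP IU, which is exactly the IU an aborted task never gets to send.

```python
SENSE_KEY_ABORTED_COMMAND = 0x0B
ASC_OVERLAPPED_COMMANDS_ATTEMPTED = 0x4E
ASCQ_OVERLAPPED_COMMANDS_ATTEMPTED = 0x00


def fixed_format_sense(sense_key, asc, ascq):
    """Build 18 bytes of fixed-format sense data (response code 70h)."""
    sense = bytearray(18)
    sense[0] = 0x70              # current error, fixed format, VALID bit clear
    sense[2] = sense_key & 0x0F  # sense key lives in the low nibble of byte 2
    sense[7] = len(sense) - 8    # additional sense length: bytes 8..17
    sense[12] = asc              # additional sense code
    sense[13] = ascq             # additional sense code qualifier
    return bytes(sense)


overlap_sense = fixed_format_sense(
    SENSE_KEY_ABORTED_COMMAND,
    ASC_OVERLAPPED_COMMANDS_ATTEMPTED,
    ASCQ_OVERLAPPED_COMMANDS_ATTEMPTED,
)
```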
[CAM]  There is a semantics problem here. As I interpret your scenario, one untagged command (command 1) is sent followed by another (command 2). The "duplicate command" error only applies to command 2. If QErr were clear, command 1 would be allowed to complete normally (once the ACA condition was cleared).
The resultant behavior with QErr set is:
Command 1 -- Terminated with CHECK CONDITION.
Command 2 -- Aborted.
The aborted command is treated as if it had been aborted with any one of the explicit task abort functions (CLEAR QUEUE, ABORT TASK, etc.). In that case, no residue or status from command 2 is returned to the application client/class driver.
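The two QErr settings discussed in this reply can be summarized in a minimal Python sketch. It assumes a single task set whose tasks are all blocked by the ACA condition, and it takes as input which task the target terminates with CHECK CONDITION. `resolve_aca` and the outcome labels are hypothetical; the model ignores per-initiator scoping and task attributes.

```python
from enum import Enum


class Outcome(Enum):
    CHECK_CONDITION = "terminated with CHECK CONDITION"
    ABORTED = "aborted; no status or residue returned"
    RESUME_AFTER_ACA = "blocked; resumes when ACA is cleared"


def resolve_aca(task_tags, terminated_tag, qerr):
    """Map each task tag to its fate once ACA is established.

    terminated_tag is the task the target completes with CHECK CONDITION.
    With qerr set, every other task is aborted as if by an explicit abort;
    with qerr clear, the others stay blocked until ACA is cleared.
    """
    outcomes = {}
    for tag in task_tags:
        if tag == terminated_tag:
            outcomes[tag] = Outcome.CHECK_CONDITION
        elif qerr:
            outcomes[tag] = Outcome.ABORTED
        else:
            outcomes[tag] = Outcome.RESUME_AFTER_ACA
    return outcomes
```

Called as `resolve_aca([1, 2], 1, qerr=True)`, it reproduces the table above: command 1 is terminated with CHECK CONDITION and command 2 is aborted.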
How, then, is the QErr "abort" handled? If there are multiple Simple (or Untagged) queue commands in the enabled state for a logical unit, and an ACA condition occurs due to an error on one of them, they all go from the enabled to the blocked state, and if the QErr flag is set, they must all be "aborted." Is this a silent abort, as though a kind of Abort Task Set had been issued from the host,
[CAM]  Yes -- just as if they had been explicitly aborted as described above.
or is it a noisy abort, in which each command actually terminates and returns error status? If the former, how is catastrophic data loss avoided?
[CAM]  Depends on the device type. For disks, the device driver could simply reissue all the unfinished commands. That's often how it's done. In your scenario, a duplicate untagged command indicates a bug in the host software, in which case I'd expect the O/S to crash the system with a bug check before more damage is done.
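The disk-driver recovery described here can be sketched minimally in Python, assuming the driver keeps its own record of commands outstanding to each logical unit and knows that QErr is set. Because silently aborted commands never complete, the driver has to infer them: when one command completes with CHECK CONDITION, every other command still outstanding to that logical unit is presumed aborted and queued for reissue after the ACA condition has been cleared. The names `Command`, `LunQueue` and their methods are hypothetical, and the approach assumes commands that are safe to repeat, such as disk reads and writes to fixed block addresses. The sketch also refuses a second untagged command, treating it as the host bug described above.

```python
from dataclasses import dataclass, field


@dataclass
class Command:
    tag: int              # driver-assigned identifier (hypothetical)
    cdb: bytes            # SCSI command descriptor block
    untagged: bool = False


@dataclass
class LunQueue:
    """Hypothetical host-side bookkeeping for one logical unit with QErr=1."""
    outstanding: dict = field(default_factory=dict)
    retry: list = field(default_factory=list)

    def submit(self, cmd):
        # Two overlapped untagged commands mean the host software is broken:
        # stop here instead of provoking the target into aborting work.
        if cmd.untagged and any(c.untagged for c in self.outstanding.values()):
            raise RuntimeError("overlapped untagged command: host software bug")
        self.outstanding[cmd.tag] = cmd

    def complete_good(self, tag):
        self.outstanding.pop(tag)

    def complete_check_condition(self, tag):
        # With QErr=1 the other outstanding commands were aborted silently and
        # no completion will ever arrive for them, so queue them (in original
        # submission order) for reissue once ACA has been cleared.
        failed = self.outstanding.pop(tag)
        self.retry.extend(self.outstanding.values())
        self.outstanding.clear()
        return failed
```

Whether the failed command itself is retried is left to the driver's sense-data handling; the sketch only recovers the commands that vanished without status.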
Hope this helps.
Charles
*
* For T10 Reflector information, send a message with
* 'info t10' (no quotes) in the message body to majordomo at symbios.com



