SCSI-3 FCP ACA/QErr abort process
Charles Monia
Monia at mail.dec.com
Fri Aug 22 18:45:59 PDT 1997
* From the T10 (formerly SCSI) Reflector (t10 at symbios.com), posted by:
* Charles Monia <Monia at mail.dec.com>
*
----------
From: Joseph Carl Nemeth[SMTP:jnemeth at concentric.net]
Sent: Friday, August 22, 1997 8:45 PM
To: Charles Monia; 'T10 Reflector'
Subject: RE: SCSI-3 FCP ACA/QErr abort process
----------
From: Charles Monia
Sent: Friday, August 22, 1997 4:57 PM
To: 'Joseph Carl Nemeth'; 'T10 Reflector'
Subject: RE: SCSI-3 FCP ACA/QErr abort process
<text deleted>
Specifically, the target will not send any more FCP_XFER_RDY, FCP_DATA,
or FCP_RSP IUs to the initiator for any task that has been aborted and
therefore cannot return any kind of status or autosense data for those tasks.
[CAM] SAM defines behavior as seen by the "application client", which is
analogous to, for example, NT's class driver. With that in mind, I'd
restate the above to say that no further status or autosense data should be
sent to the "Application Client" or NT class driver (to continue the
example). To see what's expected at the transport layer, you have to look
at the spec for the transport protocol, FCP in this case.
Right: FCP does the "recovery abort" thing, which involves some
chit-chat between the initiator and target at the transport layer, but I
was actually referring to the application client layer. As far as the
application client is concerned, the aborted exchange vanishes off the face
of the earth.
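To make the two layers concrete, here is a toy Python model, not taken from FCP or SAM (the names ToyTarget, ToyApplicationClient and SUPPRESSED_AFTER_ABORT are hypothetical). The target suppresses FCP_XFER_RDY, FCP_DATA and FCP_RSP IUs for an aborted exchange, so the application client simply never sees a completion for it. The transport-level "recovery abort" exchange between initiator and target is deliberately not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# IUs the message says the target stops sending for an aborted task.
SUPPRESSED_AFTER_ABORT = {"FCP_XFER_RDY", "FCP_DATA", "FCP_RSP"}


@dataclass
class ToyTarget:
    """Toy FCP target that tracks aborted exchanges (hypothetical model)."""
    aborted: Set[int] = field(default_factory=set)
    sent: List[Tuple[int, str]] = field(default_factory=list)

    def abort(self, exchange_id: int) -> None:
        self.aborted.add(exchange_id)

    def send_iu(self, exchange_id: int, iu: str) -> bool:
        # After an abort, no further data-phase or response IUs go out.
        if exchange_id in self.aborted and iu in SUPPRESSED_AFTER_ABORT:
            return False
        self.sent.append((exchange_id, iu))
        return True


@dataclass
class ToyApplicationClient:
    """Toy application client: it only learns a task's fate via FCP_RSP."""
    completions: Dict[int, str] = field(default_factory=dict)

    def collect(self, target: ToyTarget) -> None:
        for exchange_id, iu in target.sent:
            if iu == "FCP_RSP":
                self.completions[exchange_id] = "status received"


# Two exchanges are in flight; exchange 1 is then aborted.
target = ToyTarget()
target.send_iu(1, "FCP_XFER_RDY")
target.send_iu(2, "FCP_XFER_RDY")
target.abort(1)
target.send_iu(1, "FCP_RSP")   # suppressed: exchange 1 was aborted
target.send_iu(2, "FCP_RSP")

client = ToyApplicationClient()
client.collect(target)
print(client.completions)  # {2: 'status received'}
```

From the application client's side, exchange 1 has no entry at all: it neither completes nor fails.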
The second kind of "abort" is in response to certain classes of error that
are detected by the target. One example is the "overlapped command"
condition, in which an initiator sends two overlapped Untagged commands.
<text deleted>
[CAM] There is a semantics problem here. As I interpret your scenario,
one untagged command (command 1) is sent followed by another (command 2).
The "duplicate command" error only applies to command 2. If QErr were
clear, command 1 would be allowed to complete normally (once the ACA
condition was cleared).
The resultant behavior with QErr set is:
Command 1 -- Terminated with CHECK CONDITION.
Command 2 -- Aborted.
The aborted command is treated as if it had been aborted by any one of the
explicit task abort functions (CLEAR QUEUE, ABORT TASK, etc.). In that
case, no residue or status from command 2 is returned to the application
client/class driver.
AHA! Thank you! This is exactly the *opposite* of what I thought SAM was
saying the FIRST time I read it, and it now makes perfect sense. SAM 5.6.2:
"A logical unit that detects an overlapped command shall abort all tasks
for the initiator in the task set and shall return CHECK CONDITION status
for [that] command." You just explained which command "that" command was.
>> Sigh..... The above should have read:
>> "Command 2 -- Terminated with CHECK CONDITION.
>> Command 1 -- Aborted"
However, as I read this, QErr is irrelevant to this situation -- this
should happen even if the QErr bit is clear. ???
>> Nope: If QErr is cleared, Command 1 is not affected. Sorry for the typo.
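A minimal sketch of the overlapped-command handling as corrected in this exchange, assuming a single logical unit and untagged commands only. The Task record and handle_untagged function are hypothetical names, not from SAM. The newly arrived command (command 2) is terminated with CHECK CONDITION, and the initiator's existing tasks (command 1) are aborted only when QErr is set. ACA and sense data are not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    """Hypothetical task record; only untagged tasks are modeled here."""
    initiator: str
    state: str = "enabled"   # enabled | aborted | check_condition


def handle_untagged(task_set: List[Task], new: Task, qerr: bool) -> None:
    """Toy overlapped-command check, per the corrected reading in this thread.

    If the initiator already has an enabled untagged task, the NEW command
    is terminated with CHECK CONDITION. The initiator's existing tasks are
    aborted only when QErr is set; with QErr clear they are left alone.
    """
    overlapped = any(t.initiator == new.initiator and t.state == "enabled"
                     for t in task_set)
    if not overlapped:
        task_set.append(new)
        return
    new.state = "check_condition"
    if qerr:
        for t in task_set:
            if t.initiator == new.initiator and t.state == "enabled":
                t.state = "aborted"   # aborted silently: no status returned


cmd1, other = Task("A"), Task("B")
cmd2 = Task("A")
task_set = [cmd1, other]
handle_untagged(task_set, cmd2, qerr=True)
print(cmd1.state, cmd2.state, other.state)  # aborted check_condition enabled
```

Note that initiator B's task is untouched in this case; only the initiator that caused the overlap loses its tasks.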
<Stuff deleted>
I'm still very puzzled by the QErr thing. Let me describe a scenario.
Let's say I'm building a big RAID device. It has lots of spindles in it,
and I can get a whole lot of parallelism out of it -- that is, a whole lot
of Write commands could be issued to different logical sectors of a single
logical unit and all be executed concurrently.
Let's say multiple initiators send either Simple or Untagged commands
(properly, with no overlaps!) to this logical unit, and don't send any
Ordered or Head-of-queue commands. Let's say that the Mode Select
parameters are set to allow unrestricted reordering of commands, giving me
freedom to reorder these Simple commands and execute as many of them
concurrently as I can. Thus, in my understanding, each command joins the
set of enabled commands as soon as it is queued by the logical unit, and it
may begin executing at any time. Let's also say that NACA is supported, and
every one of these commands has the NACA bit set.
Now one of the spindles drops a bit, the faulting command reports Check
Condition, and the whole logical unit goes into the ACA condition. This
puts all of the enabled commands (which is *all* of them, even the ones
that are queued and have not yet started running) into the blocked state.
If QErr is clear, then when ACA is cleared, all of these blocked commands
return to the enabled state, and life goes on. However, if QErr is set,
clearing the ACA condition is supposed to abort all of the blocked
commands. Again, that's *all* of them, including commands from other
initiators.
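The scenario above can be sketched as a toy state machine, assuming (as in the message) that every task was sent with NACA=1 and that unrestricted reordering makes every queued task enabled. It follows the reading described here, in which QErr takes effect when ACA is cleared, and records a unit attention for each other initiator whose tasks are aborted. The Task and LogicalUnit classes are hypothetical, and rules about which initiator may clear ACA are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Task:
    initiator: str
    state: str = "enabled"   # enabled | blocked | aborted | done


@dataclass
class LogicalUnit:
    """Toy logical unit; every task is assumed to have NACA=1."""
    qerr: bool
    tasks: List[Task] = field(default_factory=list)
    aca: bool = False
    aca_initiator: Optional[str] = None
    unit_attention: Set[str] = field(default_factory=set)

    def check_condition(self, faulting: Task) -> None:
        # The faulting command ends with CHECK CONDITION and ACA is set.
        faulting.state = "done"
        self.aca = True
        self.aca_initiator = faulting.initiator
        for t in self.tasks:
            if t.state == "enabled":   # includes queued, not-yet-started tasks
                t.state = "blocked"

    def clear_aca(self) -> None:
        self.aca = False
        for t in self.tasks:
            if t.state != "blocked":
                continue
            if self.qerr:
                t.state = "aborted"
                # Other initiators learn of this only via a later unit attention.
                if t.initiator != self.aca_initiator:
                    self.unit_attention.add(t.initiator)
            else:
                t.state = "enabled"


lu = LogicalUnit(qerr=True)
a1, a2, b1 = Task("A"), Task("A"), Task("B")
lu.tasks.extend([a1, a2, b1])
lu.check_condition(a1)
lu.clear_aca()
print([t.state for t in lu.tasks], lu.unit_attention)
# ['done', 'aborted', 'aborted'] {'B'}
```

With qerr=False the same sequence returns the blocked tasks to "enabled" and records no unit attention.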
If this is a silent abort on all these commands, these other initiators
won't even know anything happened -- they'll hang, waiting for the data
transfer to resume, and will (hopefully) eventually time out. At that
point, they'll issue a command to the device, and finally get their first
error indication: a Unit Attention condition, indicating that some other
initiator aborted their command. This doesn't sound right at all. ??? What
am I missing here?
>> Other scenarios can be described which cause similar problems. Basically,
>> in a multi-initiator configuration, SCSI assumes it's the responsibility
>> of the initiators to coordinate device access as needed to handle these
>> possibilities (often through some sort of out-of-band communications channel).
>>
There is also an apparent new contradiction between SAM and the r11a
version of the SPC document, where the QErr bit is described. SAM clearly
implies in several places that QErr comes into play when ACA is cleared,
and an older revision of SPC (r7) agreed with this. The r11a version of SPC
now says that when QErr is set, tasks are aborted when Check Condition or
Command Terminated status is *sent*, which is what I thought *set* the ACA
condition. ???
>> While I believe the descriptions of behavior ought to be consistent in
>> all these specs, from the standpoint of the application client both are
>> equivalent, i.e., there's no way the application client can distinguish
>> between them. Specifically, there is no test the device driver can make
>> that allows the driver to tell which behavior was implemented by the device.
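To illustrate why the application client cannot tell the two readings apart, here is a hypothetical sketch (the function run and the policy names are invented for this example, with QErr set and NACA=1 assumed). It replays one ACA episode under the SAM/SPC r7 reading (abort when ACA is cleared) and the SPC r11a reading (abort when CHECK CONDITION is sent). The internal event order differs, but what the application client observes is identical: CHECK CONDITION for the faulting task and no status at all for the rest, because blocked tasks make no progress while ACA is in effect.

```python
from typing import Dict, List, Optional, Tuple


def run(policy: str, tasks: List[str],
        faulting: str) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Replay one ACA episode (hypothetical model, QErr set, NACA=1).

    Returns (observed, internal): `observed` is the status each task returns
    to the application client (None = never returns status); `internal` is
    the order of events inside the logical unit.
    """
    observed: Dict[str, Optional[str]] = {t: None for t in tasks}
    observed[faulting] = "CHECK CONDITION"
    internal = [f"{faulting}: CHECK CONDITION sent, ACA established"]
    others = [t for t in tasks if t != faulting]
    if policy == "abort_on_status":        # SPC r11a reading
        internal += [f"{t}: aborted" for t in others]
        internal.append("ACA cleared")
    elif policy == "abort_on_clear":       # SAM / SPC r7 reading
        internal += [f"{t}: blocked" for t in others]  # no progress during ACA
        internal.append("ACA cleared")
        internal += [f"{t}: aborted" for t in others]
    else:
        raise ValueError(f"unknown policy: {policy}")
    return observed, internal


tasks = ["A1", "A2", "B1"]
seen_r11a, log_r11a = run("abort_on_status", tasks, "A1")
seen_r7, log_r7 = run("abort_on_clear", tasks, "A1")
print(seen_r11a == seen_r7, log_r11a == log_r7)  # True False
```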
Charles
*
* For T10 Reflector information, send a message with
* 'info t10' (no quotes) in the message body to majordomo at symbios.com