SCSI-3 FCP ACA/QErr abort process

Charles Monia Monia at mail.dec.com
Fri Aug 22 18:45:59 PDT 1997


* From the T10 (formerly SCSI) Reflector (t10 at symbios.com), posted by:
* Charles Monia <Monia at mail.dec.com>
*
----------
From: 	Joseph Carl Nemeth[SMTP:jnemeth at concentric.net]
Sent: 	Friday, August 22, 1997 8:45 PM
To: 	Charles Monia; 'T10 Reflector'
Subject: 	RE: SCSI-3 FCP ACA/QErr abort process
----------
From: 	Charles Monia
Sent: 	Friday, August 22, 1997 4:57 PM
To: 	'Joseph Carl Nemeth'; 'T10 Reflector'
Subject: 	RE: SCSI-3 FCP ACA/QErr abort process
<text deleted>
Specifically, the target will not send any more FCP_XFER_RDY, FCP_DATA,
or FCP_RSP IUs to the initiator for any task that has been aborted, and
therefore cannot return any kind of status or autosense data for those tasks.
[CAM] SAM defines behavior as seen by the "application client" --
analogous to NT's class driver, for example. With that in mind, I'd
restate the above to say that no further status or autosense data should be
sent to the "application client" or NT class driver (to continue the
example). To see what's expected at the transport layer, you have to look
at the spec for the transport protocol -- FCP in this case.
Right - the FCP does the "recovery abort" thing, which involves some 
chit-chat between the initiator and target at the transport layer, but I 
was actually referring to the application client layer: so far as the 
application client is concerned, the aborted exchange vanishes off the face 
of the earth.
The second kind of "abort" is in response to certain classes of error that
are detected by the target. One example is the "overlapped command"
condition, in which an initiator sends two overlapped Untagged commands.
<text deleted>
[CAM] There is a semantics problem here. As I interpret your scenario,
one untagged command (command 1) is sent followed by another (command 2).
The "duplicate command" error only applies to command 2. If QErr were
clear, command 1 would be allowed to complete normally (once the ACA
condition was cleared).
The resultant behavior with QErr set is:
Command 1 -- Terminated with CHECK CONDITION.
Command 2 -- Aborted.
The aborted command is treated as if it were aborted with any one of the
explicit task abort functions (CLEAR QUEUE, ABORT TASK, etc.). In that
case, no residue or status from command 2 is returned to the application
client/class driver.
AHA! Thank you! This is exactly the *opposite* of what I thought SAM was
saying the FIRST time I read it, and it now makes perfect sense. SAM 5.6.2:
"A logical unit that detects an overlapped command shall abort all tasks
for the initiator in the task set and shall return CHECK CONDITION status
for [that] command." You just explained which one "that" command was.
>> Sigh..... The above should have read:
>> "Command 2 -- Terminated with CHECK CONDITION.
>> Command 1 -- Aborted"
However, as I read this, QErr is irrelevant to this situation -- this 
should happen even if the QErr bit is clear. ???
>> Nope:  If QErr is cleared, Command 1 is not affected.  Sorry for the typo.
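To make the corrected reading concrete, here is a minimal Python sketch of the outcome for two overlapped untagged commands from one initiator, following Charles's corrected table (Command 2 gets CHECK CONDITION, Command 1 is aborted) and his statement that Command 1 is unaffected when QErr is clear. The names `resolve_overlap`, `Outcome` and `UntaggedCommand` are hypothetical; the sketch models only what the application client sees, not the FCP-level exchanges, and encodes this thread's reading rather than normative SAM text.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    COMPLETES_NORMALLY = "completes normally once ACA is cleared"
    CHECK_CONDITION = "terminated with CHECK CONDITION"
    ABORTED = "aborted; no status reaches the application client"


@dataclass
class UntaggedCommand:
    initiator: str
    name: str


def resolve_overlap(first: UntaggedCommand, second: UntaggedCommand,
                    qerr: bool) -> dict[str, Outcome]:
    """Hypothetical model of the corrected reading given in this thread."""
    if first.initiator != second.initiator:
        raise ValueError("an overlap involves two commands from the same initiator")
    return {
        # The command that created the overlap is the one reported with CHECK CONDITION.
        second.name: Outcome.CHECK_CONDITION,
        # Per the thread, the earlier command is aborted only when QErr is set.
        first.name: Outcome.ABORTED if qerr else Outcome.COMPLETES_NORMALLY,
    }


cmd1 = UntaggedCommand("A", "command 1")
cmd2 = UntaggedCommand("A", "command 2")
print(resolve_overlap(cmd1, cmd2, qerr=True)["command 1"].name)   # ABORTED
print(resolve_overlap(cmd1, cmd2, qerr=False)["command 1"].name)  # COMPLETES_NORMALLY
```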
<Stuff deleted>
I'm still very puzzled by the QErr thing. Let me describe a scenario.
Let's say I'm building a big RAID device. It has lots of spindles in it, 
and I can get a whole lot of parallelism out of it -- that is, a whole lot 
of Write commands could be issued to different logical sectors of a single 
logical unit and all be executed concurrently.
Let's say multiple initiators send either Simple or Untagged commands 
(properly, with no overlaps!) to this logical unit, and don't send any 
Ordered or Head-of-queue commands. Let's say that the Mode Select 
parameters are set to allow unrestricted reordering of commands, giving me 
freedom to reorder these Simple commands and execute as many of them 
concurrently as I can. Thus, in my understanding, each command joins the 
set of enabled commands as soon as it is queued by the logical unit, and it 
may begin executing at any time. Let's also say that NACA is supported, and 
every one of these commands has the NACA bit set.
Now one of the spindles drops a bit, the faulting command reports Check 
Condition, and the whole logical unit goes into the ACA condition. This 
puts all of the enabled commands (which is *all* of them, even the ones 
that are queued and have not yet started running) into the blocked state. 
If QErr is clear, then when ACA is cleared, all of these blocked commands 
return to the enabled state, and life goes on. However, if QErr is set, 
clearing the ACA condition is supposed to abort all of the blocked 
commands. Again, that's *all* of them, including commands from other 
initiators.
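The RAID scenario above can be sketched as a toy state model, assuming the reading described here: every queued Simple task is enabled at once, the faulting task's CHECK CONDITION (NACA=1) establishes ACA and blocks every enabled task in the task set, and clearing ACA either re-enables the blocked tasks (QErr clear) or aborts them (QErr set). `LogicalUnit`, `TaskState` and `run_scenario` are hypothetical names; the model ignores ACA-tagged commands and all transport-level activity.

```python
from enum import Enum


class TaskState(Enum):
    ENABLED = "enabled"
    BLOCKED = "blocked"
    ABORTED = "aborted"  # silently aborted: no status reaches the application client
    ENDED = "ended"      # the faulting task, terminated with CHECK CONDITION


class LogicalUnit:
    """Toy model of one logical unit's task set; hypothetical, not from SAM or SPC."""

    def __init__(self, qerr: bool) -> None:
        self.qerr = qerr
        self.aca_active = False
        # Keyed by (initiator, tag); tasks from every initiator share this one task set.
        self.tasks: dict[tuple[str, int], TaskState] = {}

    def queue(self, initiator: str, tag: int) -> None:
        # Unrestricted reordering: a Simple task is enabled as soon as it is queued.
        self.tasks[(initiator, tag)] = TaskState.ENABLED

    def fault(self, initiator: str, tag: int) -> None:
        # The faulting task ends with CHECK CONDITION and establishes ACA,
        # which blocks every other enabled task, whichever initiator sent it.
        self.tasks[(initiator, tag)] = TaskState.ENDED
        self.aca_active = True
        for key, state in list(self.tasks.items()):
            if state is TaskState.ENABLED:
                self.tasks[key] = TaskState.BLOCKED

    def clear_aca(self) -> None:
        # QErr clear: blocked tasks are re-enabled. QErr set: blocked tasks are aborted.
        self.aca_active = False
        outcome = TaskState.ABORTED if self.qerr else TaskState.ENABLED
        for key, state in list(self.tasks.items()):
            if state is TaskState.BLOCKED:
                self.tasks[key] = outcome


def run_scenario(qerr: bool) -> LogicalUnit:
    """Initiators A and B queue Simple tasks; A's task 1 hits the dropped bit."""
    lu = LogicalUnit(qerr)
    for initiator, tag in [("A", 1), ("A", 2), ("B", 7)]:
        lu.queue(initiator, tag)
    lu.fault("A", 1)
    lu.clear_aca()
    return lu


print(run_scenario(qerr=False).tasks[("B", 7)].name)  # ENABLED
print(run_scenario(qerr=True).tasks[("B", 7)].name)   # ABORTED
```

The model makes the concern explicit: with QErr set, initiator B's task 7 ends up aborted even though B had nothing to do with the fault.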
If this is a silent abort on all these commands, these other initiators 
won't even know anything happened -- they'll hang, waiting for the data 
transfer to resume, and will (hopefully) eventually time out. At that 
point, they'll issue a command to the device, and finally get their first 
error indication: a Unit Attention condition, indicating that some other 
initiator aborted their command. This doesn't sound right at all. ??? What 
am I missing here?
>> Other scenarios can be described which cause similar problems. Basically,
>> in a multi-initiator configuration, SCSI assumes it's the responsibility
>> of the initiators to coordinate device access as needed to handle these
>> possibilities (often through some sort of out-of-band communications channel).
>> 
There is also a new apparent contradiction between SAM and the r11a version
of the SPC document, where the QErr bit is described: SAM clearly implies in
several places that QErr comes into play when ACA is cleared, and an older
rev of SPC (r7) agreed with this. The r11a version of SPC now says that
when QErr is set, tasks are aborted when Check Condition or Command Terminated
is *sent*, which is what I thought *set* the ACA condition. ???
>
>> While I believe the descriptions of behavior ought to be consistent in all these
>> specs, from the standpoint of the application client both are equivalent, i.e., there's
>> no way the application client can distinguish between them.
>> Specifically, there is no test the device driver can make that
>> allows the driver to tell which behavior was implemented by the device.
Charles
*
* For T10 Reflector information, send a message with
* 'info t10' (no quotes) in the message body to majordomo at symbios.com



