IBM's response to 12-405r0 SAM-5: UA and task set interaction clarification

Kevin D Butt kdbutt at us.ibm.com
Tue Nov 6 12:20:56 PST 2012



Fred,
Please see our responses in red.
Kevin D. Butt
SCSI & Fibre Channel Architect, Tape Firmware
Data Protection & Retention
MS 6TYA, 9000 S. Rita Rd., Tucson, AZ 85744
Tel: 520-799-5280
Fax: 520-799-2723 (T/L:321)
Email address: kdbutt at us.ibm.com
http://www-03.ibm.com/servers/storage/ 
From:	"Knight, Frederick" <Frederick.Knight at netapp.com>
To:	Kevin D Butt/Tucson/IBM at IBMUS, 
Date:	10/17/2012 11:40 AM
Subject:	RE: IBM's response to 12-405r0 SAM-5: UA and task set 
interaction clarification
I honestly don't understand some of your comments.  My questions are 
below:
From: owner-t10 at t10.org [mailto:owner-t10 at t10.org] On Behalf Of Kevin D 
Butt
Sent: Monday, October 15, 2012 7:47 PM
To: T10 Reflector; James P Allen
Subject: IBM's response to 12-405r0 SAM-5: UA and task set interaction 
clarification
IBM has comments related to 12-405r0 
SAM-5: UA and task set interaction clarification
(by: Frederick Knight)
T10/12-405r0   Uploaded: 2012/10/08   175917 bytes
http://www.t10.org/cgi-bin/ac.pl?t=d&f=12-405r0.pdf
The current SCSI standard can be interpreted in multiple ways for UA 
29/00. IBM would agree that clarification might be useful here, provided 
we can agree on what that should be. However, IBM does not think that 
clarification should be the current proposal in 12-405r0. 
Under this proposal, if a LUN has a UA 29/00 pending, multiple tasks would 
be allowed to be accepted by that LUN. The first task processed (enabled) 
would fail with UA 29/00. The subsequent ones would be governed by the 
QErr bit settings.
Actually, that is nothing new in this proposal.  That statement is already 
true.
IBM doesn't agree with this. There appears to be ambiguity in the standard 
on how/when tasks are placed in the task set in relation to pending 
errors.  In SAM section 5.4.2.3 (SCSI Command Received SCSI transport 
protocol service indication) it specifies that the task router passes a 
task to the task manager via the SCSI Command Received interface. This 
interface only allows one task at a time to be passed to the task manager. 
There is no interface to pass multiple tasks at once to the task manager. 
Thus the task router must call SCSI Command Received for each task it 
sends to the task manager, and it needs to wait for SCSI Command Received 
to complete before it can send another task to the task manager. From 
section 8.8 (Command state transitions) one would infer that the task 
received by the task manager from the SCSI Command Received interface 
would go immediately into the dormant state. However, in the case in point 
(a Unit Attention 29/00 pending with no other tasks in the task set), one 
could also infer that this task makes the S1:S2 transition immediately 
after it goes into the dormant state. When that happens, it should make 
the S2:S4 transition immediately.  
Thus the ambiguity in the standard is what the relation is between SCSI 
Command Received (of section 5.6) and the command state transitions (of 
section 8.8) when the task set is empty. Does the command ever complete 
the full state transition (when pending errors exist for an empty task 
set) before SCSI Command Received returns to the task router? This goes to 
the larger question of whether multiple commands can be accepted into the 
task set before any of them are completed/aborted with Unit Attention 
29/00.
If SCSI Command Received led to an immediate Command Aborted with Unit 
Attention 29/00, any subsequent task issued by the task router would 
then be governed by QErr and NACA.
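The two readings discussed above can be contrasted with a small toy model. 
This is only a sketch under stated assumptions: the class LogicalUnit, its 
"synchronous" flag, and the helper queued_behind_pending_ua are 
hypothetical names invented for illustration, not anything defined by 
SAM-5, and QErr/NACA handling is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalUnit:
    """Toy model of one logical unit as seen from a single initiator."""
    ua_pending: bool = True     # a UA 29/00 is waiting to be reported
    synchronous: bool = True    # True: UA reported before the call returns
    task_set: list = field(default_factory=list)
    completed: list = field(default_factory=list)   # (command, status) pairs

    def scsi_command_received(self, command: str) -> None:
        """Hypothetical stand-in for the SCSI Command Received indication."""
        self.task_set.append(command)   # command enters the task set
        if self.synchronous:
            self.process_next()         # run the state transitions right away

    def process_next(self) -> None:
        """Enable the oldest command in the task set and complete it."""
        command = self.task_set.pop(0)
        if self.ua_pending:
            self.ua_pending = False     # the UA is reported once, then cleared
            self.completed.append((command, "CHECK CONDITION (UA 29/00)"))
        else:
            # What happens here depends on QErr and NACA; not modeled.
            self.completed.append((command, "NOT MODELED (QErr/NACA)"))


def queued_behind_pending_ua(synchronous: bool) -> int:
    """Deliver two commands and count those queued while the UA is unreported."""
    lu = LogicalUnit(synchronous=synchronous)
    for command in ("WRITE #1", "WRITE #2"):
        lu.scsi_command_received(command)
    return len(lu.task_set) if lu.ua_pending else 0
```

With synchronous=True (the reading IBM prefers) no command is ever queued 
behind an unreported UA; with synchronous=False both commands sit in the 
task set before either has seen the UA.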
While the update to section 5.14 does not explicitly rule out the IBM 
Initiator preferred interpretation (i.e., SCSI Command Received for an 
empty task set results in an immediate Unit Attention 29/00 before another 
task can be issued by the task router to the task manager), it gives 
greater credibility/emphasis to an alternate interpretation.  Future SCSI 
updates could then leverage this update as a way to resolve the above 
ambiguity in favor of the alternate interpretation.
This seems problematic for a number of situations where a device's state 
may have substantially changed, thus compromising data from the 
initiator's perspective. The subsequent tasks may then be processed by a 
device that is in a questionable state (e.g., loss of reservations and 
thus no synchronization of reads/writes with other hosts, differences in 
mode settings, etc.). 
What you have described is how QERR=00b works.  If QERR is set to 00b then 
if ACA is being used, everything stops to wait for the host to do whatever 
it wants for cleanup/recovery and if ACA is not being used, then 
everything just proceeds (with all the problems you've mentioned).  There 
is nothing in this proposal that changes that operation.  So, I don't 
understand why this is a problem?
See the above description of the SAM ambiguity.  The standard's ambiguity 
goes beyond the semantics of QErr.
A way to avoid the above issue would be to have the device support the 
QErr=x1 settings. This, though, is problematic, since a device may not 
support this setting.
QERR=x1 is just a way to control where cleanup/recovery actions take place 
(QERR=x1 means the device server does the cleanup and QERR=x0 means the 
host does the cleanup).  And, BTW - everything described in the proposal 
will still occur.  AFTER the cleanup finishes (it does not matter what 
QERR is set to; no matter who does the cleanup), the UA 29/00 will still 
be queued for OTHER initiators, and so when those initiators finally have 
the UA delivered, it no longer reflects the current state of the device 
(which is all that this proposal is describing).  UAs do not describe a 
current state; they are an indication of a previous event.
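The point that a UA records a past event rather than the current state can 
be shown with a short sketch. Everything here is an assumption made for 
illustration: the class UnitAttentionQueue, its method names, and the 
logical event counter are hypothetical and are not taken from SAM-5.

```python
class UnitAttentionQueue:
    """Toy per-initiator queue of pending unit attention conditions."""

    def __init__(self, initiators):
        self.pending = {name: [] for name in initiators}
        self.event_counter = 0   # hypothetical logical clock for device events

    def raise_ua(self, code):
        """Queue the same UA for every initiator, stamped with the clock."""
        self.event_counter += 1
        for queue in self.pending.values():
            queue.append((code, self.event_counter))

    def tick(self):
        """Stand-in for any later device activity (cleanup, new commands)."""
        self.event_counter += 1

    def deliver(self, initiator):
        """Report the oldest UA to one initiator, with its age in events."""
        code, raised_at = self.pending[initiator].pop(0)
        return code, self.event_counter - raised_at


uas = UnitAttentionQueue(["host_a", "host_b"])
uas.raise_ua("29/00")
first = uas.deliver("host_a")    # delivered right after the event
uas.tick()                       # cleanup and other activity happen
uas.tick()
second = uas.deliver("host_b")   # same UA, but two events out of date
```

Both hosts receive the same 29/00 code, but by the time host_b sees it the 
device has already moved on.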
As mentioned above, QErr=x1 does not clarify the above ambiguity; it just 
works around the issue.
A more robust solution (from the initiator's perspective) is for the first 
command to be failed immediately with UA 29/00 before other commands are 
accepted. The subsequent commands would then be governed by whether the 
faulted command had the NACA=1 bit set. If so, then subsequent commands 
would be failed with ACA ACTIVE status. This is how some of IBM's 
engineers interpret the standard. Based on how SCSI evolved, IBM thinks 
that this is more consistent with the original intent of the standard. 
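The behavior described in this paragraph can be sketched as follows. This 
is a minimal illustration, assuming a single initiator, an initially empty 
task set, and an ACA condition that is never cleared during the run; the 
function name and the status strings are hypothetical, and commands sent 
with the ACA task attribute are not modeled.

```python
def statuses_under_immediate_ua(naca_bits):
    """Return one status per command for an LU with a pending UA 29/00.

    naca_bits holds the NACA bit (0 or 1) of each command, in arrival order.
    The first command is failed with the UA before the next one is accepted.
    """
    statuses = []
    ua_pending = True
    aca_active = False
    for naca in naca_bits:
        if aca_active:
            # An ACA was established by the faulted command.
            statuses.append("ACA ACTIVE")
        elif ua_pending:
            ua_pending = False
            aca_active = bool(naca)   # NACA=1 on the faulted command sets ACA
            statuses.append("CHECK CONDITION (UA 29/00)")
        else:
            # No ACA: the command is not blocked (outcome otherwise not modeled).
            statuses.append("NO ACA IN EFFECT")
    return statuses


with_naca = statuses_under_immediate_ua([1, 0, 0])
without_naca = statuses_under_immediate_ua([0, 0])
```

When the faulted command has NACA=1, every later command in this sketch is 
rejected with ACA ACTIVE until the initiator clears the condition, which is 
what gives the initiator a chance to recover before more I/O is processed.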
Thanks, 
Kevin D. Butt


