TMF, ordering across I_Ts, and ALUA questions
wrstuden at wasabisystems.com
Fri Dec 23 18:24:49 PST 2005
On Dec 16, 2005, at 11:51 AM, Knight, Frederick wrote:
> * From the T10 Reflector (t10 at t10.org), posted by:
> * "Knight, Frederick" <Frederick.Knight at netapp.com>
> I've been asked to gather these questions to get T10 opinions.
> They generally have to do with multi-port synchronization
> (questions 1-3) and ALUA operation (questions 4-5).
I find the initial questions very timely. The IPS working group of the
IETF is hashing out requirements in the iSCSI spec, RFC 3720, regarding
inter-nexus task abort. Some SCSI-level guidance would be very helpful.
> 1) Relative ordering of LUN-scoped TMFs and responses vs. affected
> commands on other I_T nexuses
> When a target receives a LOGICAL_UNIT_RESET or CLEAR_TASK_SET TMF
> on an I_T_L Nexus A, what actions MUST the target take on tasks
> on other I_T_L Nexuses BEFORE responding to the TMF?
> As described in SAM3R14, there are two Task Management Functions
> (TMFs) which, when issued on one I_T_L Nexus, will cause tasks
> active on other I_T_L Nexuses to be aborted:
Note: even though Target Warm Reset has been obsoleted, I understand
that some operating systems still use it, especially to clear
Also note that I personally believe that whatever behavior happens for
LU Reset is what should apply for Persistent Reserve Out Preempt and
Abort. Any other thoughts make my head hurt. :-)
> A) LOGICAL_UNIT_RESET
> Section 7.6 of SAM3R14 says
> "logical unit shall perform the logical unit reset functions
> specified in 6.3.3."
> Section 6.3.3 of SAM3R14 says
> "Abort all tasks"
> B) CLEAR_TASK_SET
> Section 7.5 of SAM3R14 says
> + "All tasks in the task set shall be aborted..."
> + "the task set shall be the one defined by the TST field in
> the Control mode page"
> So, if the target maintains one task set per LUN for all
> initiator ports (TST = 000b), then the CLEAR_TASK_SET TMF
> falls into this category.
> Consider the case of multiple I_T Nexuses submitting IO requests
> to the same LUN.
> Section 4.6.3 of SAM3R14 states the following:
> "The SCSI architecture model makes no assumption about and places
> no requirement on the ordering of requests or responses for
> different I_T nexuses."
> So, when an initiator sends one of the LUN-scoped TMFs over one
> I_T_L nexus, and receives a response from the target indicating
> that the TMF has successfully completed, which of the following
> is the initiator allowed to assume:
> [A1] All previously submitted tasks for that LUN on the issuing
> I_T nexus WILL EVENTUALLY be aborted
> [A2] All previously submitted tasks for that LUN on the issuing
> I_T nexus HAVE BEEN aborted
> [B1] All in-progress tasks for that LUN which were submitted on
> OTHER I_T nexuses WILL EVENTUALLY be aborted
> [B2] All in-progress tasks for that LUN which were submitted on
> OTHER I_T nexuses HAVE BEEN aborted
> By our interpretation, we are confident that A2 may be assumed.
> We are less confident about B1 vs. B2, but speculate that B1 is
> indicated by the specifications. That is, commands on other
> nexuses may still be in progress at the target when the
> TMF-issuing initiator receives its response. Software
> (e.g. an operating system multi-pathing IO layer) which wishes
> to guarantee that all IOs for a given LUN have been aborted
> MUST issue TMFs for that LUN over ALL I_T nexuses on which
> IO requests have been submitted.
I think what "aborting" means needs clarification. If we are talking
about a task changing data stored on the device, as in a WRITE command,
I think it is reasonable for "aborting" to mean that the task will no
longer change data stored on the device.
If, however, aborting means the task is no longer visible to either the
SCSI or transport layers, I think that answer B1 applies.
The issue that has come up on the IPS list is that iSCSI and iSER offer
data buffers to which the initiator sends data. Transfers destined for
these buffers probably will be in-flight when the abort happens. It is
much nicer on initiators to let the transfers to these buffers drain
off rather than suddenly invalidating them. For instance, in iSER, if
the buffers were suddenly revoked, the data mover levels would have to
abort the entire connection. Thus aborting one or more tasks becomes a
large transport event.
Other thoughts on this?
> As an extreme case, consider the following example:
> Assume 2 initiators submit the following commands in the following
> order for the same LUN, and also assume that the target receives
> the commands in the same order:
> Nexus A submits command A1
> Nexus B submits command B1
> Nexus A submits CLEAR_TASK_SET LUN task management command
> Target Aborts A1, marks LUN for 'need CLEAR_TASK_SET'
> Target sends CLEAR_TASK_SET response
> Nexus A receives CLEAR_TASK_SET response
> Nexus B submits command B2
> Does the specification mandate that command B2 must be unaffected
> by the CLEAR_TASK_SET? Or does the 'no requirement on ordering'
> statement in SAM3R14 section 4.6.3 allow even this level of
> asynchrony between the two nexuses?
I think one major issue in your explanation above is that you are
talking about (or it seems like you are talking about) what happens at
the initiator(s). I think it is clearer to talk in terms of what
happens at the target. For instance, if "Nexus X submits command Y"
becomes "Target receives command Y via nexus X", then we can clearly
state what happens. Namely, the CLEAR_TASK_SET TMF affects only
commands that have entered the task set at the target.
I would say that if B2 had entered the task set when the target
implemented the clear, then it should be affected. If it had not
entered the task set, then it should not be affected (it shouldn't
matter if it misses the abort by a microsecond or a minute).
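To make the target-side view concrete, here is a minimal Python sketch
of the example above, restated as "Target receives command Y via nexus
X". It is only an illustration under assumptions: the class and method
names (Target, receive, clear_task_set) are made up, the target is
assumed to keep one task set per LUN for all initiator ports
(TST=000b), the clear is modeled as a single atomic step, and "aborted"
just means removed from the task set.

```python
class Target:
    """Toy model of one LUN with a single shared task set (TST=000b assumed)."""

    def __init__(self):
        self.task_set = []   # (nexus, command) pairs that have entered the task set
        self.aborted = []    # tasks removed by a clear

    def receive(self, nexus, command):
        # A command can be hit by a clear only once it has entered the task set.
        self.task_set.append((nexus, command))

    def clear_task_set(self, nexus):
        # Abort everything in the task set at this instant, whatever nexus it came from.
        self.aborted.extend(self.task_set)
        self.task_set = []
        return "FUNCTION COMPLETE"


target = Target()
target.receive("A", "A1")
target.receive("B", "B1")
response = target.clear_task_set("A")
target.receive("B", "B2")   # enters the task set after the clear was implemented
```

In this model A1 and B1 are aborted and B2 is left alone, because B2
had not entered the task set when the clear was implemented. Whether a
real target may respond before B1 is actually gone is exactly the B1
vs. B2 question, and this sketch does not settle it.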
> If command B2 is required to be unaffected, then how is the 'no
> requirement on ordering' statement to be interpreted?
> 2) Support for TST=001b targets
> Do host-side SCSI stacks fully support SCSI targets with
> per-initiator port task sets (TST=001b)?
> Section 7.4.6 of SPC3R21B defines two Task Set Types, as indicated
> by the TST field in the Control Mode Page:
> TST=000b -- The logical unit maintains one task set for all
> initiator ports
> TST=001b -- The logical unit maintains separate task sets for
> each initiator port
> Network Appliance SCSI target devices currently implement a
> single task set per LUN, for all initiator ports. And the NetApp
> targets return TST=000b in the Control mode page to indicate this.
> 1) We believe that this is the behavior exhibited by most,
> if not all, other major-vendor SCSI target devices.
> Is this belief accurate? Or do there exist SCSI target devices
> with significant market penetration, which implement separate
> task sets per initiator port (TST=001b)?
The Storage Builder for iSCSI target only supports TST=001b.
> 2) Do the significant operating systems fully support TST=001b
> targets? Or, would NetApp run into interoperability issues
> if we decided to implement TST=001b behavior?
I am unaware of any issues resulting from our TST=001b behavior.
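For readers comparing the two TST values quoted above, the difference
in CLEAR_TASK_SET scope can be sketched in a few lines of Python. This
is a toy model, not a description of any real target: LogicalUnit,
receive, clear_task_set and run are hypothetical names, and aborting is
modeled simply as removing commands from a list.

```python
from collections import defaultdict

SHARED = 0b000     # one task set for all initiator ports
PER_PORT = 0b001   # separate task set for each initiator port


class LogicalUnit:
    """Hypothetical logical unit whose task set layout follows the TST value."""

    def __init__(self, tst):
        self.tst = tst
        self.task_sets = defaultdict(list)

    def _key(self, initiator_port):
        # With a shared task set every port maps to the same key.
        return "all" if self.tst == SHARED else initiator_port

    def receive(self, initiator_port, command):
        self.task_sets[self._key(initiator_port)].append(command)

    def clear_task_set(self, initiator_port):
        # Abort (remove) every command in the task set selected by TST.
        return self.task_sets.pop(self._key(initiator_port), [])


def run(tst):
    lu = LogicalUnit(tst)
    lu.receive("A", "A1")
    lu.receive("B", "B1")
    return lu.clear_task_set("A")   # the clear arrives via initiator port A
```

With TST=000b the clear issued via port A removes both A1 and B1; with
TST=001b it removes only A1, so commands from other initiator ports are
never touched by it.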