TMF, ordering across I_Ts, and ALUA questions

Knight, Frederick Frederick.Knight at netapp.com
Fri Dec 16 11:51:21 PST 2005


* From the T10 Reflector (t10 at t10.org), posted by:
* "Knight, Frederick" <Frederick.Knight at netapp.com>
*
I've been asked to gather these questions to get T10 opinions.
They generally have to do with multi-port synchronization
(questions 1-3) and ALUA operation (questions 4-5).

1) Relative ordering of LUN-scoped TMFs and responses vs. affected
   commands on other I_T nexuses
 
   Summary
   -------
   When a target receives a LOGICAL_UNIT_RESET or CLEAR_TASK_SET TMF
   on an I_T_L Nexus A, what actions MUST the target take on tasks from
   other I_T_L Nexuses BEFORE responding to the TMF?

   Details
   -------
   As described in SAM3R14, there are two Task Management Functions
   (TMFs) which, when issued on one I_T_L Nexus, will cause tasks
   active on other I_T_L Nexuses to be aborted:
 
   A) LOGICAL_UNIT_RESET
      Section 7.6 of SAM3R14 says
        "logical unit shall perform the logical unit reset functions
         specified in 6.3.3."
      Section 6.3.3 of SAM3R14 says
        "Abort all tasks"
 
   B) CLEAR_TASK_SET
      Section 7.5 of SAM3R14 says
        + "All tasks in the task set shall be aborted..."
        + "the task set shall be the one defined by the TST field in
           the Control mode page"
 
      So, if the target maintains one task set per LUN for all
      initiator ports (TST = 000b), then the CLEAR_TASK_SET TMF
      falls into this category.
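   The scoping rule described above can be sketched in a few lines. This is a hypothetical model, not any vendor's implementation. The `Task` type, the function name, and the simplification that each I_T nexus corresponds to one initiator port (a single target port) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical model: a task is identified by the I_T nexus it arrived on
# and the LUN it addresses. Assumes one target port, so each I_T nexus
# stands in for one initiator port.
@dataclass(frozen=True)
class Task:
    nexus: str   # illustrative I_T nexus name, e.g. "A" or "B"
    lun: int
    tag: int

def tasks_aborted_by(tmf: str, tst: int, issuing_nexus: str, lun: int,
                     tasks: List[Task]) -> List[Task]:
    """Return the tasks a LUN-scoped TMF selects for abort.

    LOGICAL_UNIT_RESET aborts every task on the LUN regardless of TST.
    CLEAR_TASK_SET aborts the task set chosen by TST: the shared set when
    TST=000b, only the issuing nexus's set when TST=001b.
    """
    on_lun = [t for t in tasks if t.lun == lun]
    if tmf == "LOGICAL_UNIT_RESET":
        return on_lun
    if tmf == "CLEAR_TASK_SET":
        if tst == 0b000:
            return on_lun
        if tst == 0b001:
            return [t for t in on_lun if t.nexus == issuing_nexus]
        raise ValueError(f"unsupported TST value {tst:03b}")
    raise ValueError(f"not a LUN-scoped TMF: {tmf}")
```

   The sketch only answers which tasks are selected. It says nothing about when the tasks from other nexuses are actually aborted relative to the TMF response, which is the open question here.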

   Consider the case of multiple I_T Nexuses submitting IO requests
   to the same LUN.
 
   Section 4.6.3 of SAM3R14 states the following:
 
     "The SCSI architecture model makes no assumption about and places
      no requirement on the ordering of requests or responses for
      different I_T nexuses."

   So, when an initiator sends one of the LUN-scoped TMFs over one
   I_T_L nexus, and receives a response from the target indicating
   that the TMF has successfully completed, which of the following
   is the initiator allowed to assume:
 
   [A1] All previously submitted tasks for that LUN on the issuing
        I_T nexus WILL EVENTUALLY be aborted
   [A2] All previously submitted tasks for that LUN on the issuing
        I_T nexus HAVE BEEN aborted
   [B1] All in-progress tasks for that LUN which were submitted on
        OTHER I_T nexuses WILL EVENTUALLY be aborted
   [B2] All in-progress tasks for that LUN which were submitted on
        OTHER I_T nexuses HAVE BEEN aborted
 
   By our interpretation, we are confident that A2 may be assumed.
 
   We are less confident about B1 vs. B2, but speculate that B1 is
   indicated by the specifications.  That is, commands on other
   nexuses may still be in progress at the target when the
   TMF-issuing initiator receives its response.  Software
   (e.g. an operating system multi-pathing IO layer) which wishes
   to guarantee that all IOs for a given LUN have been aborted
   MUST issue TMFs for that LUN over ALL I_T nexuses on which
   IO requests have been submitted.
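   A multi-pathing layer following reading B1 might look roughly like the sketch below. The `send_tmf` callback, the function names, and the use of the response string "FUNCTION COMPLETE" as the success test are hypothetical stand-ins for whatever the host transport provides. Only the shape of the logic (one TMF per nexus that carried IO, success only if every nexus reports completion) comes from the paragraph above.

```python
from typing import Callable, Dict, Iterable

# Hypothetical transport hook: send_tmf(nexus, tmf_name, lun) -> service response.
SendTmf = Callable[[str, str, int], str]

def abort_lun_everywhere(send_tmf: SendTmf, nexuses_with_io: Iterable[str],
                         lun: int, tmf: str = "CLEAR_TASK_SET") -> Dict[str, str]:
    """Issue `tmf` for `lun` on every I_T nexus that has carried IO for it.

    Under reading B1, a response on one nexus does not prove that tasks
    from other nexuses are gone, so every nexus is covered explicitly.
    """
    responses = {}
    for nexus in sorted(set(nexuses_with_io)):  # de-duplicate, fixed order
        responses[nexus] = send_tmf(nexus, tmf, lun)
    return responses

def lun_quiesced(responses: Dict[str, str]) -> bool:
    # Treat the LUN as quiesced only if every nexus reported completion.
    return bool(responses) and all(
        r == "FUNCTION COMPLETE" for r in responses.values())
```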
 
   As an extreme case, consider the following example:
   Assume 2 initiators submit the following commands in the following
   order for the same LUN, and also assume that the target receives
   the commands in the same order:
   
      Nexus A submits command A1
      Nexus B submits command B1
      Nexus A submits CLEAR_TASK_SET LUN task management command
         Target aborts A1, marks LUN for 'need CLEAR_TASK_SET'
        Target sends CLEAR_TASK_SET response
      Nexus A receives CLEAR_TASK_SET response
      Nexus B submits command B2
 
   Does the specification mandate that command B2 must be unaffected
   by the CLEAR_TASK_SET?  Or does the 'no requirement on ordering'
   statement in SAM3R14 section 4.6.3 allow even this level of
   asynchrony between the two nexuses?

   If command B2 is required to be unaffected, then how is the 'no
   requirement on ordering' statement to be interpreted?
  
2) Support for TST=001b targets
 
   Summary
   -------
   Do host-side SCSI stacks fully support SCSI targets with
   per-initiator port task sets (TST=001b)?
 
   Details
   -------
   Section 7.4.6 of SPC3R21B defines two Task Set Types, as indicated
   by the TST field in the Control Mode Page:
   
      TST=000b -- The logical unit maintains one task set for all
                  initiator ports
      TST=001b -- The logical unit maintains separate task sets for
                  each initiator port
 
   Network Appliance SCSI target devices currently implement a
   single task set per LUN for all initiator ports, and NetApp
   targets return TST=000b in the Control mode page to indicate this.
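   An initiator can read the TST value a target reports from the Control mode page (page code 0Ah), where TST occupies bits 7-5 of byte 2. The sketch below assumes the caller has already issued MODE SENSE and skipped the mode parameter header and any block descriptors, so the buffer starts at the page itself; the function and table names are illustrative.

```python
CONTROL_MODE_PAGE_CODE = 0x0A

TST_MEANING = {
    0b000: "one task set for all initiator ports",
    0b001: "separate task set for each initiator port",
}

def decode_tst(page: bytes) -> int:
    """Return the 3-bit TST field of a Control mode page.

    Assumes `page` starts at byte 0 of the mode page itself (PS, SPF and
    PAGE CODE), i.e. after the mode parameter header and block
    descriptors returned by MODE SENSE have been skipped.
    """
    if len(page) < 3:
        raise ValueError("buffer too short to hold the TST field")
    page_code = page[0] & 0x3F          # PAGE CODE is bits 5-0 of byte 0
    if page_code != CONTROL_MODE_PAGE_CODE:
        raise ValueError(f"not the Control mode page: {page_code:#04x}")
    return (page[2] >> 5) & 0x07        # TST is bits 7-5 of byte 2
```

   For example, `TST_MEANING.get(decode_tst(page), "reserved")` turns the raw field into one of the two descriptions listed above.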
 
   Questions:
   
   1) We believe that this is the behavior exhibited by most,
      if not all, other major-vendor SCSI target devices.
 
      Is this belief accurate?  Or do there exist SCSI target devices
      with significant market penetration, which implement separate
      task sets per initiator port (TST=001b)?
 
   2) Do the significant operating systems fully support TST=001b
      targets?  Or, would NetApp run into interoperability issues
      if we decided to implement TST=001b behavior? 
 
 
3) Relative ordering of ordered and head of queue tagged commands
 
   Summary
   -------
   When a TST=000b target receives a task with the Ordered or
   Head_Of_Queue task attribute on an I_T_L Nexus A, do the ordering
   requirements apply to tasks from other I_T_L Nexuses?
 
   Details
   -------
   Section 8 of SAM3R14 describes task set management, including
   relative ordering of tasks.  For example:
 
     - Tasks with the Ordered Task Attribute cannot enter the ENABLED
       state until all in-progress head of queue and all older tasks
       in the task set have completed (section 8.6.3)
     - Tasks with the Simple Task Attribute cannot enter the ENABLED
       state until all head of queue tasks and older ordered
       tasks have completed (section 8.6.2)
     - etc.
 
   If the target maintains one task set per LUN for all initiator
   ports (Control mode page TST = 000b), then the task set may
   include tasks submitted by multiple initiators, over multiple
   I_T nexuses.
 
   Since the ordering semantics described in SAM3R14 section 8 are
   described in terms of the task set, this implies an ordering
   constraint on commands and responses across different I_T
   nexuses.
 
   But section 4.6.3 of SAM3R14 states the following:
 
     "The SCSI architecture model makes no assumption about and places
      no requirement on the ordering of requests or responses for
      different I_T nexuses."
 
   So, consider the case of two I_T Nexuses submitting IO requests
   to the same LUN on a TST=000b target.  Assume the initiators
   submit the following commands in the following order, and also
   assume that the target receives the commands in the same order:
   
      Nexus A submits simple command A1
      Nexus B submits simple command B1
      Nexus B submits simple command B2
      Target moves all of A1, B1, and B2 to the Enabled state
        and starts work on them
      Nexus A submits ordered command A2
 
   When is the target allowed to begin processing A2:
   
   - After completion of A1 only?
   - Or after completion of A1, B1, and B2?
 
   Here's another case:
   
      Nexus A submits ordered command A3
      Target moves A3 to the Enabled state and starts work on it
      Nexus B submits simple command B3
      Nexus B submits simple command B4
 
   Does the presence in the task set of A3 prevent the target from
   beginning work on B3?
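   The two readings can be compared by applying the Section 8 rules summarized above either to the whole task set or only to tasks from the same nexus. The sketch models just the Simple and Ordered rules as paraphrased in this message (Head of Queue tasks only block others here). The `Task` type and the `cross_nexus` flag are hypothetical, and the code takes no position on which reading is correct.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Task:
    name: str
    nexus: str          # I_T nexus the task arrived on (illustrative)
    attr: str           # "SIMPLE", "ORDERED" or "HEAD_OF_QUEUE"
    seq: int            # arrival order in the task set; lower is older
    done: bool = False

def may_enable(task: Task, task_set: List[Task], cross_nexus: bool = True) -> bool:
    """Apply the Simple/Ordered rules paraphrased above to one task.

    cross_nexus=True treats the task set as shared by all I_T nexuses
    (the TST=000b reading); False only looks at tasks from the same
    nexus (the reading suggested by SAM-3 section 4.6.3).
    """
    pending = [t for t in task_set
               if t is not task and not t.done
               and (cross_nexus or t.nexus == task.nexus)]
    head_of_queue = [t for t in pending if t.attr == "HEAD_OF_QUEUE"]
    older = [t for t in pending if t.seq < task.seq]
    if task.attr == "ORDERED":
        # Wait for all head of queue tasks and all older tasks.
        return not head_of_queue and not older
    if task.attr == "SIMPLE":
        # Wait for all head of queue tasks and older ordered tasks.
        return not head_of_queue and not any(t.attr == "ORDERED" for t in older)
    raise ValueError(f"rule not modeled in this sketch: {task.attr}")
```

   In the first case, with A1 complete, A2 may not be enabled under the shared-task-set reading (B1 and B2 are older and still running) but may be enabled under the per-nexus reading. In the second case, B3 is held back by the older ordered task A3 only under the shared reading.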
 
4) Regarding the ALUA state of UNAVAILABLE:

   A) How should the task manager handle task management functions it
      receives while the target port is in the UNAVAILABLE state?

   Section 5.8.2.4.5 of the SPC-3 rev. 23 specification states the
   following about a logical unit in the UNAVAILABLE state:

    "The SCSI target device is not required to participate in all
     task management functions (see SAM-3 and the applicable SCSI
     transport protocol standards)."

   If the task manager receives a TMF request while the target port
   is in the UNAVAILABLE state, which of the following should the task
   manager perform?

     i) Return a FUNCTION REJECTED service response to the task
        management function?
    ii) Not respond to the task management function at all?
   iii) Return success to the task management function even though it
        could not process it?  For example, the task manager returns
        success to a LUN Reset request even though it could not clear
        out SCSI-2 reservations on the logical unit.

   B) How should the task manager handle TMF requests it receives right
      before the target port transitions to the ALUA state of
      UNAVAILABLE?

   Section 5.8.2.5 of the specification discusses transitions between
   ALUA states.  It states that:

     "If during the transition the logical unit is inaccessible, then
      the transition is performed as a single indivisible event and the
      device server shall respond by either returning BUSY status, or
      returning CHECK CONDITION status, with the sense key set to
      NOT READY, and the sense code set to LOGICAL UNIT NOT ACCESSIBLE,
      ASYMMETRIC ACCESS STATE TRANSITION; or

      If during the transition the target ports in a target port group
      are able to access the requested logical unit, then the device
      server shall support those of the following commands that it
      supports while in the active/optimized asymmetric access state:

      ...

      The SCSI target device is not required to participate in all task
      management functions."

   How does "not required to participate" translate into a command
   response?

   Suppose the following scenario occurs: 

    1) Target port is in the Active/Optimized State.
    2) Task Manager receives a TMF (ex. LUN Reset).
    3) Target port implicitly transitions to the UNAVAILABLE state
       before the TMF could be processed/completed.
    
   How should the task manager handle the TMF now that the LU is
   inaccessible?  Should the task manager:

     i) Return a FUNCTION REJECTED service response to the task
        management function?
    ii) Not respond to the task management function at all?
   iii) Return success to the task management function even though it
        could not finish processing it?

   C) How should the device server handle tasks that it had received
      before the target port transitioned to the ALUA state of
      UNAVAILABLE?

   Section 5.8.2.5 of the specification also states:

     "Once a transition is completed, the new target port asymmetric
      access state may apply to some or all tasks entered into the task
      set before the completion of the transition. The new target port
      asymmetric access state shall apply to all tasks received by the
      device server after completion of a transition."

   Suppose the following scenario occurs:

    1) Target port is in the Active/Optimized State.
    2) Device Server receives multiple SCSI Read and Write requests.
    3) Target port implicitly and successfully transitions to the
       UNAVAILABLE state before any of the requests complete.

   Is the device server allowed to implicitly abort all of those tasks?
   Section 5.8.2.5 of the specification states:

     "An implicit CLEAR TASK SET task management function may be
      performed following a transition failure."

   But this only pertains to transition FAILUREs and in this scenario,
   the transition succeeded.  Which of the following should the device
   server do?

     i) Implicitly generate a CLEAR TASK SET and abort all tasks without
        sending any response to any of the tasks?
    ii) Return a response of CHECK CONDITION with sense key NOT READY
        and additional sense code of LOGICAL UNIT NOT ACCESSIBLE,
        TARGET PORT IN UNAVAILABLE STATE?
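   For reference, the CHECK CONDITION responses named in 4B and in option ii) of 4C can be encoded as fixed-format sense data (response code 70h, sense key in byte 2, ASC and ASCQ in bytes 12 and 13). The ASC/ASCQ values below (04h/0Ah and 04h/0Ch) are the SPC-3 codes for the additional sense text quoted above; the helper names are hypothetical, and the sketch shows only the encoding, not which option the standard requires.

```python
from typing import Tuple

CHECK_CONDITION = 0x02   # SCSI status
BUSY = 0x08              # SCSI status; the other response 4B's quote allows (no sense data)
NOT_READY = 0x2          # sense key

# ASC/ASCQ pairs for the additional sense text quoted in 4B and 4C.
ALUA_TRANSITION = (0x04, 0x0A)   # LOGICAL UNIT NOT ACCESSIBLE, ASYMMETRIC ACCESS STATE TRANSITION
ALUA_UNAVAILABLE = (0x04, 0x0C)  # LOGICAL UNIT NOT ACCESSIBLE, TARGET PORT IN UNAVAILABLE STATE

def fixed_sense(sense_key: int, asc: int, ascq: int) -> bytes:
    """Build 18 bytes of fixed-format sense data (response code 70h)."""
    sense = bytearray(18)
    sense[0] = 0x70                # current error, fixed format
    sense[2] = sense_key & 0x0F    # SENSE KEY field
    sense[7] = len(sense) - 8      # ADDITIONAL SENSE LENGTH (bytes after byte 7)
    sense[12], sense[13] = asc, ascq
    return bytes(sense)

def not_ready_response(asc_ascq: Tuple[int, int]) -> Tuple[int, bytes]:
    """Status and sense data for a command refused because of ALUA state."""
    return CHECK_CONDITION, fixed_sense(NOT_READY, *asc_ascq)
```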

5) A) If the target port is in the ALUA state of STANDBY, does the
   device server need to be able to satisfy all the requirements of a
   Persistent Reservation request, including those that have the APTPL
   bit set?

   Section 5.8.2.4.4 of the SPC-3 rev. 23 specification states the
   following about the STANDBY state:

      "When being accessed through a target port in the standby target
       port asymmetric access state, the device server shall support
       those of the following commands that it supports while in the
       active/optimized target port asymmetric access state:
       ...
       l) PERSISTENT RESERVE IN;
       m) PERSISTENT RESERVE OUT;"

   Suppose the device server is able to support the APTPL bit when the
   target port is in the ACTIVE/OPTIMIZED state.  Section 6.12.3 on the
   Persistent Reserve Out command in the SPC-3 rev. 23 specification
   states the following about the APTPL bit:

      "If the last valid APTPL bit value received by the device server
       is zero, the loss of power in the SCSI target device shall
       release the persistent reservation for the logical unit and
       remove all registered reservation keys (see 5.6.6). If the last
       valid APTPL bit value received by the device server is one, the
       logical unit shall retain any persistent reservation(s) that may
       be present and all reservation keys (i.e., registrations) for all
       I_T nexuses even if power is lost and later returned (see 5.6.4)."

   If contact with the logical unit (which also happens to be the
   storage location of the registration keys of a persistent
   reservation command) is lost and the target port transitions to the
   ALUA state of STANDBY, does it still need to support the APTPL bit
   setting for persistent reservations?
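   The APTPL rule quoted above can be modeled as a small state holder: power loss clears the registrations and the reservation unless the last valid APTPL bit was one. The `PrState` class is a hypothetical illustration. It makes visible that honoring APTPL=1 requires the registrations to be kept somewhere that survives power loss, which is the storage the STANDBY port may no longer be able to reach.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class PrState:
    """Hypothetical persistent reservation state for one logical unit."""
    registrations: Dict[str, int] = field(default_factory=dict)  # I_T nexus -> reservation key
    holder: Optional[str] = None   # I_T nexus holding the reservation, if any
    aptpl: bool = False            # last valid APTPL bit received

    def power_loss(self) -> None:
        """Apply the quoted SPC-3 rule when power is lost and restored."""
        if not self.aptpl:
            # APTPL=0: release the reservation and remove all keys.
            self.registrations.clear()
            self.holder = None
        # APTPL=1: reservation and registrations are retained, which
        # implies they were kept in storage that survives power loss.
```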

5) B) What should an initiator port do when it receives back an ALUA
   state of STANDBY from the target port?  What should an initiator
   port do when it receives back an ALUA state of UNAVAILABLE from the
   target port?  What should an initiator port do when it receives back
   an ALUA state of TRANSITIONING?

   What does an initiator port do when all target ports are in STANDBY?
   Or if all target ports are in TRANSITIONING?

   Which ALUA state should the target port return if it wishes the
   initiator port to keep retrying I/O until it succeeds?
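   An initiator can learn each target port group's state from REPORT TARGET PORT GROUPS, where the low four bits of byte 0 of each 8-byte target port group descriptor carry the asymmetric access state (0h active/optimized, 1h active/non-optimized, 2h standby, 3h unavailable, Fh transitioning in SPC-3). The parser below is a sketch that assumes the SPC-3 header (a 4-byte RETURN DATA LENGTH) and uses each descriptor's TARGET PORT COUNT in byte 7 to skip its 4-byte target port descriptors. It decodes the state but deliberately does not choose a retry policy, which is what the questions above ask about.

```python
from typing import List, Tuple

# Asymmetric access state codes defined by SPC-3.
ALUA_STATES = {
    0x0: "ACTIVE/OPTIMIZED",
    0x1: "ACTIVE/NON-OPTIMIZED",
    0x2: "STANDBY",
    0x3: "UNAVAILABLE",
    0xF: "TRANSITIONING",
}

def parse_rtpg(data: bytes) -> List[Tuple[int, str]]:
    """Return (target port group, state name) for each descriptor."""
    length = int.from_bytes(data[0:4], "big")    # RETURN DATA LENGTH
    end = min(4 + length, len(data))
    groups = []
    off = 4
    while off + 8 <= end:
        state = data[off] & 0x0F                  # ASYMMETRIC ACCESS STATE
        tpg = int.from_bytes(data[off + 2:off + 4], "big")
        port_count = data[off + 7]                # TARGET PORT COUNT
        groups.append((tpg, ALUA_STATES.get(state, f"reserved ({state:#x})")))
        off += 8 + 4 * port_count                 # skip 4-byte port descriptors
    return groups
```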

	Thanks,
	Fred Knight
	Network Appliance