FCP RECOVERY ABORT DONE BY A TARGET

GFRAZIER at ausvm6.vnet.ibm.com
Mon Aug 1 08:40:12 PDT 1994


The attached note from Kurt Chan, HP, proposes removing the
requirement for FCP targets to perform recovery abort operations in
all but a few instances. (Please see the proposed FCP rewording
and justification toward the end of the attached note.) The
justification for the rewording does not seem to mention a large set
of ambiguous exchanges which the initiator will have to abort whenever
it performs various task management operations. Some of these
exchanges are listed below:

1) EXCHANGES EXISTING DURING ABORT TASK SET
If targets do not perform recovery aborts for ambiguous exchanges
existing when they receive this Task Management (TM) function, the
initiator will need to perform recovery abort on all open exchanges
with the target since it has not yet received FCP_RSP frames
for these open exchanges.

2) EXCHANGES EXISTING DURING CLEAR TASK SET, TARGET RESET
Initiators which initiate these Task Management functions must also
perform recovery aborts on every exchange which they have open with
the target since they have not received FCP_RSP frames for these open
exchanges.

3) EXCHANGES EXISTING DURING A CLEAR ACA WITH QERR=1
Initiators will need to perform recovery aborts on every exchange
which they have open with the LUN. Recall that QErr=1 means:
"...all blocked tasks in the task set shall be aborted after an ACA
condition is cleared." (SPC, rev 1, Section 8.3.1)

Since an initiator may have a very large number of open exchanges with
a target during any of the above operations, a very large number of
recovery abort operations may occur whenever one of the above
functions is performed. On the other hand, if targets performed
recovery aborts on ambiguous exchanges as now required, only a very
few recovery abort operations would occur. Instead of assuming that
all open exchanges are ambiguous, the initiator can depend on the
target to abort the exchanges which it knows are ambiguous (i.e. with
unacknowledged frames), and the other exchanges can simply be erased
from memory.
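
The difference in recovery abort counts can be illustrated with a
minimal sketch. The OpenExchange record, its has_unacked_frames
field, and the numbers used are hypothetical and chosen only for
illustration; none of them come from FCP.

```python
from dataclasses import dataclass


@dataclass
class OpenExchange:
    ox_id: int
    # Hypothetical flag: the target knows this exchange has unacknowledged frames.
    has_unacked_frames: bool


def aborts_if_target_recovers(exchanges):
    """Target aborts only the exchanges it knows are ambiguous."""
    return sum(1 for x in exchanges if x.has_unacked_frames)


def aborts_if_initiator_recovers(exchanges):
    """Initiator must assume every open exchange is ambiguous."""
    return len(exchanges)


# Illustrative numbers only: 100 open exchanges, 2 with unacknowledged frames.
exchanges = [OpenExchange(ox_id=i, has_unacked_frames=(i < 2)) for i in range(100)]
```

With these assumed numbers the target-side policy issues 2 recovery
aborts, while the initiator-side policy issues 100.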

For this reason, I favor leaving FCP exactly as it is in rev 8b.

                                Giles Frazier
                                IBM Austin
                                gfrazier at ausvm6.vnet.ibm.com

**************************REFERENCED NOTE***********************


 From:  Kurt Chan
 To:    FC/SCSI Reflectors
 Subj:  Target-Initiated ABTS and FCP
 Date:  7/28/94

 After discussing this subject with Bob Snively at the FCSI meeting
 yesterday, Bob and I agree that some changes are needed to FCP rev 9
 if Targets want to claim compliance to FCP and simultaneously avoid
 initiating ABTS.

 The rationale is that FCP currently allows Initiators to RELY on
 Target Initiated ABTS, and therefore Exchanges could inadvertently be
 left Open when Clear Task Set or Target Reset is received by a
 multiple-initiator Target. Inadvertent reuse of an open OX_ID
 is the data integrity hazard to be avoided.

 In studying this issue further, I think I also now understand some of
 the difficulty in getting class 3 and class 1/2 implementors to read
 the same meaning into the words in FCP.  I've attempted to clarify
 the essential behaviors of both below.

 For those anxious for a bottom line, I believe my conclusions lead to
 a happy ending for all:

 - class 1/2 Responders indeed must be able to initiate ABTS when
   responding to Task Management Flags (although I believe that there
   is only ONE "ambiguous" Exchange state that the Responder is
   REQUIRED to resolve because the Originator cannot).

 - class 3 Responders do not have to initiate ABTS since, by design, they
   do not have reliable knowledge of Exchange closure at the Originator.

 PS (In the following discussion, Targets are Exchange Responders and
     SCSI Initiators are Exchange Originators)

 My objective is to obtain consensus on the FCP wording changes below
 among as many implementors as possible so as to effect a change to rev
 9 before Aug 12. Please read and comment.

 FCP OVERVIEW
  -----------
 FCP Rev8b currently defines "Exchange ambiguity" from the Responder
 perspective as follows (paraphrased):

 > - If the Responder is in the process of transferring Sequence
 >   Initiative back to the Originator but has not yet been able to
 >   confirm the transfer, then the Exchange associated with that
 >   Sequence is in an "ambiguous" state.
 >
 > - If a Responder Reset or Clear Task Set has been received from an
 >   Originator, then all Exchanges with Originators other than the one
 >   that sent the Task Management flag are in an "ambiguous" state.

 My personal feeling is that the state of Sequence Initiative within
 the Exchange has nothing to do with the Exchange STATE.  I believe the
 STATE of an Exchange is either OPEN or CLOSED, and those states can be
 unambiguously defined from either Originator or Responder context based
 on FC-PH rules.

 I believe the current FCP document, when it refers to Exchange state, is
 actually referencing SEQUENCE state (i.e., whether or not the Sequence is
 deliverable or whether Initiative has been successfully transferred).
 However, I claim that the state of SEQUENCES within the Exchange is
 uninteresting when the unit of recovery is the EXCHANGE, as it is with
 ABTS-LS.  Attempting to use Sequence Initiative as the determining factor
 for deciding who is obligated to originate ABTS is overly complex,
 particularly when the ultimate goal is Exchange termination not Sequence
 termination.

 If the unit of recovery were the Sequence, then the status of
 intermediate Sequences would be important.  However, when dealing with
 EXCHANGE recovery this concern degrades to simply worrying about the
 state of the FINAL Sequence of the Exchange:  FCP_RSP.

 CLOSED vs OPEN EXCHANGE, CLASS 3
  -------------------------------
 Consider the following definition of OPEN and CLOSED Exchanges from the
 Originator perspective in class 3:

 - An OPEN Exchange is one for which the Originator has transmitted an
   FCP_CMD but has not received an FCP_RSP for that Exchange.

 - A CLOSED Exchange is any other Exchange:
   a) FCP_CMD has not yet been transmitted by the Originator
   b) FCP_RSP has already been received for a transmitted FCP_CMD
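
 The two definitions above can be written as a small predicate. This is
 only a sketch: the ExchangeState enum and the function and parameter
 names are hypothetical and do not appear in FCP.

```python
from enum import Enum


class ExchangeState(Enum):
    OPEN = "open"
    CLOSED = "closed"


def originator_state_class3(cmd_transmitted: bool, rsp_received: bool) -> ExchangeState:
    """Exchange state as seen by the Originator in class 3."""
    if cmd_transmitted and not rsp_received:
        return ExchangeState.OPEN
    # Either (a) FCP_CMD has not yet been transmitted, or
    # (b) FCP_RSP has already been received for a transmitted FCP_CMD.
    return ExchangeState.CLOSED
```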

 Exchange State can only be reliably defined by the ORIGINATOR of the
 Exchange.  The Responder can never reliably determine whether or not
 an Exchange is Closed at the Originator, but the Originator can ALWAYS
 reliably determine whether or not an Exchange is Closed at the
 Responder.  It is this advantage that the Originator holds over the
 Responder that requires the Originator to be responsible for recovery
 operations, not the Responder.

 The reason it is difficult for the RESPONDER to define Exchange state
 is due to the definition of "Closed" at the Responder.  The Responder
 considers the Exchange CLOSED in class 3 when it transmits the FCP_RSP.
 However, since the FCP_RSP may never reach the Originator, not all
 Exchanges which are Closed at the Responder can also be considered
 Closed at the Originator.

 THIS IS THE PRIMARY DIFFERENCE - in class 3 the Originator knows for
 certain if an Exchange is truly Closed at the Responder by virtue of
 receiving FCP_RSP, but the only time a Class 3 Responder knows that an
 Exchange has been Closed at the Originator is when it sees the
 Originator reuse the OX_ID!  Note that this means Class 3 Exchange
 Responders must clear all Exchange resources as FCP_RSP is
 *transmitted*.

 The other boundary condition occurs when the Originator and Responder
 disagree about whether or not an Exchange has been OPENed.  However,
 this is not nearly as dangerous.  The worst case that can happen is
 that the Originator believes it has opened an Exchange when it really
 has not (since the FCP_CMD was lost).  In this case, the Originator
 performs a Recovery Abort to a Responder which has no knowledge of the
 Exchange being aborted, which is harmless.

 In summary, since class 3 Responders are always ignorant of Exchange
 Completion state at the Originator, I propose that only the Originator
 can reliably perform Recovery Abort on Open Exchanges, since only it
 knows which Exchanges are not yet Closed.

 CLOSED vs OPEN EXCHANGE, CLASS 2
  --------------------------------
 In classes 1 or 2, an Exchange is Closed at the Responder when it
 receives the ACK to FCP_RSP. An Exchange is Open at the Responder
 from the time it receives an FCP_CMD until it receives the ACK to
 FCP_RSP.

 Like class 3, the Originator considers an Exchange Open as soon as it
 transmits FCP_CMD.  It considers an Exchange Closed when it either
 receives the FCP_RSP or transmits the ACK to the FCP_RSP (depending on
 whether it's from the vantage point of the FC-2 or ULP within the
 Originator).

 Therefore the roles are now reversed!  Only the RESPONDER can reliably
 measure Exchange completion in Classes 1/2.  The Originator may close
 the Exchange at its end when it transmits the ACK to FCP_RSP, but if
 the Responder never receives the ACK then

 a) the Exchange could still be open at the Originator if the FCP_RSP
    was lost, or
 b) the Exchange might be closed if it was the ACK that was lost.

 The danger is that if the Responder cannot initiate Recovery Abort,
 then it may have Exchange resources dangling that will never get
 cleaned up, and therefore may be INADVERTENTLY REUSED by the
 Originator, and could alias into the old Exchange.

 The good news is that there is only ONE condition where the Responder
 is actually required to initiate Recovery Abort, and that is when

 1) it has transmitted FCP_RSP, and
 2) a Task Management flag is received which clears that task before the
    ACK can be returned

 For all other Exchange states or boundary conditions in class 1/2, the
 Originator knows with certainty whether or not the Responder MAY have
 an Open Exchange, and therefore the Originator can initiate Recovery
 Abort reliably.
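
 The single class 1/2 condition described above reduces to a
 three-input predicate. The sketch below uses hypothetical names and
 models only the decision, not the ABTS exchange itself.

```python
def responder_must_initiate_abort(rsp_transmitted: bool,
                                  ack_received: bool,
                                  cleared_by_task_management: bool) -> bool:
    """True only when FCP_RSP was sent, its ACK has not come back,
    and a Task Management flag has cleared the task in the meantime."""
    return rsp_transmitted and not ack_received and cleared_by_task_management
```

 In every other combination the Originator is the one expected to
 initiate Recovery Abort.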

 PATHOLOGICAL BEHAVIOR
  --------------------
 A question arose during "Devil's Advocate" discussions with Bob
 regarding what happens if a Host loses context of its Exchanges
 (i.e., goes insane or powerfails).  The concern is that without
 Responder-initiated Recovery Abort there are data integrity problems
 at the FC-PH level.

 My recommended behavior in the Loop profile will be as follows:

 1) Wait a minimum of R_A_TOV

 2) Initiate fabric discovery protocol (FLOGI)

 3) Initiate N_Port physical discovery protocol
     - OPN to AL_PA space on local loop topology
     - Server query for attached D_IDs on fabric topology

 4) Initiate FC-2 discovery protocol (PLOGI)

 5) Initiate FC-4 discovery protocol (PRLI)

 This procedure has the effect of clearing all outstanding
 Exchanges between N_Ports and clearing all pending frames
 for all Exchanges that may reside in the fabric without
 having to perform explicit Recovery Aborts for each
 potentially open Exchange.
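
 The ordering of the five steps above can be sketched as a simple
 driver. The run_step callback and the function name are hypothetical,
 and the R_A_TOV value is left as a parameter because the note does not
 state one.

```python
import time

# Steps 2 through 5 of the procedure, in the order given in the note.
RECOVERY_STEPS = (
    "FLOGI",                      # fabric discovery
    "N_Port physical discovery",  # OPN on a loop, or server query on a fabric
    "PLOGI",                      # FC-2 discovery
    "PRLI",                       # FC-4 discovery
)


def recover_lost_context(r_a_tov_seconds, run_step, sleep=time.sleep):
    """Wait at least R_A_TOV (step 1), then run each discovery step in order.

    run_step is a caller-supplied callable that performs the named step.
    """
    sleep(r_a_tov_seconds)
    for step in RECOVERY_STEPS:
        run_step(step)
```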

 SUGGESTED FCP WORDING
  --------------------
 Ref: Pages 27-8 of FCP Rev8b

 Current wording for TARGET RESET:

 | TARGET RESET, when set to one, performs a reset to the SCSI device as
 | defined in SAM.  TARGET RESET resets all tasks for all initiators and
 | resets all internal states of the target to their initial power on
 | and default values as established by PRLI.  A unit attention condition
 | is created for all initiators.
 |
 | The TARGET RESET is transmitted by the initiator (Exchange Originator)
 | using a new Exchange.  The initiator and target clear all resources
 | that can be cleared unambiguously.  Any open Exchanges that are in an
 | ambiguous state shall be terminated by whichever port detects the
 | ambiguous state using a Recovery Abort.  For a target or initiator
 | FCP_Port, an Exchange is in an ambiguous state if the FCP_Port has
 | sequence initiative and there exists an unacknowledged frame for the
 | sequence or if the FCP_Port has transferred sequence initiative but
 | the transfer of the initiative has not been confirmed.  For a target
 | FCP_Port, an Exchange is also in an ambiguous state if the Exchange
 | exists between the target FCP_Port and an initiator other than the
 | initiator FCP_Port that performed the TARGET RESET.

 A few comments:

 1)  Ownership of Sequence Initiative on intermediate Sequences of an
     Exchange is irrelevant when the unit of recovery is the Exchange.

 2)  The term "unacknowledged frame" assumes class 1/2, and does
     not address class 3.

 3)  Intermediate unacknowledged frames are irrelevant when the unit of
     recovery is the Exchange.

 One of the compelling reasons we decided to make the Exchange the unit
 of recovery is to avoid dealing with Frames and Sequences where
 possible.  I suggest we look only at the boundary conditions where
 Exchanges are opened and closed, and put required behavior only in
 those places.  Therefore I suggest the second paragraph above be
 replaced with:

 "The TARGET RESET is transmitted by the SCSI Initiator (Exchange
  Originator) using a new Exchange.  The SCSI Initiator and Target
  shall clear all resources that can be cleared unambiguously.  The
  Target shall perform Recovery Abort on Exchanges for which an FCP_RSP
  frame has been transmitted in classes 1 or 2, but no ACK has been
  received.  Upon discovery of a Unit Attention condition with a
  Target, SCSI Initiators shall perform Recovery Abort on all Exchanges
  with that Target for which FCP_CMD has been transmitted, but FCP_RSP
  has not been received."
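
 The Initiator obligation in the proposed wording amounts to a filter
 over the Initiator's own Exchange table. The sketch below assumes a
 hypothetical table that maps each OX_ID to a pair of flags; the
 function name and table layout are illustrative only.

```python
def initiator_recovery_aborts(exchanges):
    """Return the OX_IDs to abort after a Unit Attention is discovered.

    exchanges maps OX_ID -> (fcp_cmd_transmitted, fcp_rsp_received).
    """
    return sorted(
        ox_id
        for ox_id, (cmd_transmitted, rsp_received) in exchanges.items()
        if cmd_transmitted and not rsp_received
    )
```

 For example, a table of {0x10: (True, False), 0x11: (True, True)}
 yields only 0x10, since FCP_RSP has already arrived for 0x11.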

 Similarly, Clear Task Set should be reworded from:

 | The CLEAR TASK SET is transmitted by the initiator (Exchange
 | Originator) using a new Exchange.  The initiator and target clear all
 | resources that can be cleared unambiguously.  Any open Exchanges that
 | are in an ambiguous state shall be terminated by whichever port
 | detects the ambiguous state using a Recovery Abort.  For a target or
 | initiator FCP_Port, an Exchange is in an ambiguous state if the
 | FCP_Port has sequence initiative and there exists an unacknowledged
 | frame for the sequence or if the FCP_Port has transferred sequence
 | initiative but the transfer of the initiative has not been confirmed.
 | For a target FCP_Port, an Exchange is also in an ambiguous state if
 | the Exchange exists between the target FCP_Port and an initiator
 | other than the initiator FCP_Port that performed the CLEAR TASK SET.

 to the identical wording as suggested above, except replacing TARGET
 RESET with CLEAR TASK SET.

 Also note that these words do not prohibit a Target from aborting more
 Exchanges than just the ones for which an ACK to FCP_RSP is
 outstanding - having the Target abort ALL Exchanges which are still
 Open at the Target would be compliant (albeit redundant) behavior.

 Implementing just the minimal requirement set also reduces the
 likelihood of "ABTS collisions" (see FCSI SCSI Profile for discussion
 of this subject).

  --

 Kurt Chan
 kc at core.rose.HP.com
**********************END OF REFERENCED NOTE**************




