[T11.3] Re: [Fwd: FCP-2: Lost FCP_CMND, Unacknowledged classes.]

Dave Peterson dap at cisco.com
Tue Jul 9 13:51:14 PDT 2002


INCITS T11.3 Mail Reflector
********************************
Howdy Santosh,

Comments below...dap

> -----Original Message-----
> From: santoshr at hpcuhe.cup.hp.com [mailto:santoshr at hpcuhe.cup.hp.com]On
> Behalf Of Santosh Rao
> Sent: Friday, June 28, 2002 5:42 PM
> To: T10 Reflector; Dave Peterson; Robert Snively
> Subject: [Fwd: FCP-2: Lost FCP_CMND, Unacknowledged classes.]
>
>
> Hello,
>
> I did not see any response to this and hence, am raising this issue
> again. We would appreciate any clarifications from FCP-2 editors and
> implementors. Can this be fixed in FCP-2 ?
>
> Thanks,
> Santosh
>
>
> Santosh Rao wrote:
> >
> > Hello,
> >
> > We have 3 issues regarding Annex C Fig C.2 and the text in Section 8.2
> > for REC.
> >
> > Issue 1
> > =======
> > Section 8.2 states :
> >
> > "If the destination FCP_Port of the REC request determines that the
> > originator S_ID, OX_ID, RX_ID or task retry id are inconsistent, it
> > shall respond with a FCP_RJT with a rsn_code of "unable to perform
> > command request" and rsn_expln of "invalid OXID-RXID combination".
> >
> > Annex C Fig C.2 states :
> >
> > "The LS_RJT (Logical Error, Invalid OXID-RXID combination) for the REC
> > indicates that the exchange is unknown."
> >
> > The 2 quoted sections above are inconsistent about which reason code the
> > FCP_RJT should carry. From the target's perspective, when it receives a
> > REC with an OXID-RXID combination for which it has no exchange state,
> > both of the above sections of FCP-2 apply.
> >
> > Which is the reason code to be returned in this case ?
> >

Per the current definition in FC-FS (and FCP-2r7a):
'00000011' b Logical error
The request identified by the Command code is invalid or logically
inconsistent for the conditions present.

'00001001' b Unable to perform command request
The Recipient of a Link Service command is unable to perform the request at
this time.

Logical error should be the correct response in this case; it is also the
specified response to an RRQ when the RX_ID, other than FFFFh, is unknown to
the target.

Unable to perform command request is typically returned when the link
service recipient has some sort of resource issue, is not logged in, etc.

But I wouldn't be surprised if there is (other) inconsistent usage regarding
Logical Error and Unable to Perform Command Request. More guidance
surrounding FC "reason code" usage would be beneficial in FC-FS (at
minimum).

We need to see what was actually implemented and then do the right thing.
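
To make the distinction between the two reason codes concrete, here is a
minimal sketch in Python. The two values are the ones quoted from FC-FS
above; the function name and its parameters are hypothetical and only model
the usage described in this message, not any actual implementation.

```python
# Reason code values as quoted from FC-FS above.
LOGICAL_ERROR = 0b00000011               # request invalid or logically inconsistent
UNABLE_TO_PERFORM_REQUEST = 0b00001001   # recipient cannot perform the request now


def rec_reject_reason(exchange_known, logged_in=True, resources_available=True):
    """Hypothetical helper: reason code for rejecting a REC, or None to accept.

    Assumption: "unable to perform" covers transient conditions on the
    recipient (not logged in, no resources), while an unknown exchange
    is a logical error.
    """
    if not logged_in or not resources_available:
        return UNABLE_TO_PERFORM_REQUEST
    if not exchange_known:
        return LOGICAL_ERROR
    return None


# REC for an OXID-RXID combination the target has no state for.
print(format(rec_reject_reason(exchange_known=False), "08b"))
```

Under this reading, an unknown exchange always maps to Logical Error, and
Unable to Perform Command Request is kept for conditions that may clear up
later.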

> > Issue 2
> > =======
> > How should the initiator differentiate between an FCP_RJT from a target due
> > to a lost FCP_CMND (the scenario described in Annex C Fig C.2) and the
> > case where the target has discarded exchange state due to the expiration
> > of RR_TOV after sending FCP_RSP?
> >
> > In both the above cases, our interpretation of FCP-2 is that the
> > initiator will see a FCP_RJT response to the REC with :
> > rsn_code = "Logical Error" or "Unable to perform command request"
> > rsn_expln = "Invalid OXID-RXID combination"
> >
> > In this case, the initiator cannot apply the same error recovery for the
> > 2 cases. In the lost FCP_CMND case, the initiator may safely re-issue
> > the command. The latter case could occur in the following manner :
> >
> > - Initiator issues a command which does not involve data xfer.
> > - Target sends FCP_RSP, FCP_RSP is lost.
> > - Initiator REC_TOV timer pops and initiator sends REC.
> > - REC times out after RA_TOV(els) (which is > RR_TOV, for fabric)
> > - Initiator aborts REC and issues another REC
> > - Target sends FCP_RJT response since it has discarded the exchange
> > state.
> >
> > In the above case, the initiator MUST NOT re-issue the FCP_CMND, since
> > this can potentially cause data corruption with tape devices (e.g.,
> > re-issuing a SCSI command like SPACE or WRITE FILEMARKS after it has
> > already executed successfully can corrupt tape data).
> >
> > Can someone clarify how FCP-2 differentiates these 2 cases? Without
> > the ability to differentiate between these 2 cases, the use of SLER in a
> > lost FCP_CMND scenario can result in potential data corruption with tape
> > devices.
> >

It appears a change will be required here to make this work. Some options:

A. request an FCP_CONF on every exchange - not too attractive for non-tagged
command operation (i.e., will affect performance).

B. use CRN and add more text regarding target behavior when a duplicate CRN
is received - problem with this is that application clients today do not yet
generate a CRN to use. I've always contended that (for the FC realm) the CRN
should be a transport-based entity (i.e., the FCP driver should be
generating the CRN), given that the SCSI protocol provides no built-in
ordering mechanism :( I lost that battle and thus lobbied for a hook to
provide a CRN in SAM-2. But recent discussion of CRN as related to SAM-2
may cause the CRN hook to be removed, effectively forcing CRN back into the
FCP driver. The fate of CRN is TBD until at least next week :)

C. modify/review the pertinent error detection and recovery timers - this
needs to be done anyway (per another issue that popped up regarding the time
to wait for an REC response). The value of RR_TOV is plain wrong (e.g., it
does not take into account the R_A_TOV or 2*R_A_TOV wait time) and is (on the
verge of?) being overloaded.

This issue needs to be worked out. FCP-2 implementors feel free to speak up.
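
The ambiguity in Issue 2, and how a duplicate-CRN check (option B) could
resolve it, can be illustrated with a toy model. This is a sketch only: the
class, its methods and the "duplicate CRN" result are hypothetical, and the
target behavior for a duplicate CRN is exactly the text that would still have
to be written.

```python
class ToyTarget:
    """Toy target that keeps exchange state only until it is discarded."""

    def __init__(self):
        self.exchanges = {}     # OX_ID -> command, while state is retained
        self.executed = []      # commands actually executed, in order
        self.seen_crns = set()  # CRNs already executed (option B, hypothetical)

    def fcp_cmnd(self, ox_id, command, crn=None):
        # Option B sketch: do not execute a command whose CRN was already seen.
        if crn is not None and crn in self.seen_crns:
            return "duplicate CRN"
        if crn is not None:
            self.seen_crns.add(crn)
        self.exchanges[ox_id] = command
        self.executed.append(command)
        return "executed"

    def discard_state(self, ox_id):
        # Models expiration of RR_TOV after FCP_RSP has been sent.
        self.exchanges.pop(ox_id, None)

    def rec(self, ox_id):
        # Unknown exchange: reject, as in Annex C Fig C.2.
        if ox_id not in self.exchanges:
            return ("RJT", "Invalid OXID-RXID combination")
        return ("ACC", self.exchanges[ox_id])


# Case 1: FCP_CMND lost in transit, so the target never saw the exchange.
lost_cmnd = ToyTarget()
case1 = lost_cmnd.rec(ox_id=0x10)

# Case 2: command executed, FCP_RSP lost, state discarded before the REC.
lost_rsp = ToyTarget()
lost_rsp.fcp_cmnd(0x10, "WRITE FILEMARKS", crn=7)
lost_rsp.discard_state(0x10)
case2 = lost_rsp.rec(ox_id=0x10)

print(case1 == case2)  # True: the initiator sees the same reject either way

# Re-issuing in case 2 with the same CRN is caught, not executed twice.
print(lost_rsp.fcp_cmnd(0x11, "WRITE FILEMARKS", crn=7))  # duplicate CRN
print(lost_rsp.executed)  # ['WRITE FILEMARKS']
```

The point of the sketch is that the REC response alone carries no information
that separates the two cases; something that outlives the exchange state,
such as a CRN remembered by the target, is needed.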


> > Issue 3
> > =======
> > Section 12.5.2 states that if a REC response is not received within
> > RA_TOV(els), the initiator shall abort the REC and send another REC in a
> > new exchange.
> >
> > Since the initiator detects the REC timeout only after RA_TOV (or 2 *
> > RA_TOV, as per proposed change in FCP-3) and this time value is larger
> > than RR_TOV, the target would have discarded exchange information after
> > RR_TOV.
> >
> > Hence, what is the point in retrying the REC ? It only exposes the
> > initiator to the issue described under "Issue 2".
> >
> > Any clarifications would be appreciated.
>
>

Again, the timer values need to be reviewed...dap
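
The timing argument in Issue 3 can be written out as simple arithmetic. The
sketch below uses a simplified model and placeholder timer values in seconds;
neither the values nor the function are taken from FCP-2 or FC-FS, they are
chosen only so that the REC timeout exceeds RR_TOV as the issue describes.

```python
def retried_rec_finds_state(t_rsp_sent, rr_tov, t_first_rec, rec_timeout):
    """True if the target still holds exchange state when the retried REC is sent.

    Simplified model (assumption): the target discards state rr_tov seconds
    after sending FCP_RSP, the initiator sends the retry rec_timeout seconds
    after the first REC, and transit delays are ignored.
    """
    t_state_discarded = t_rsp_sent + rr_tov
    t_retry_rec = t_first_rec + rec_timeout
    return t_retry_rec < t_state_discarded


# Placeholder values, for illustration only.
R_A_TOV = 10.0
RR_TOV = 5.0
REC_TOV = 3.0

for label, rec_timeout in (("R_A_TOV", R_A_TOV), ("2 * R_A_TOV", 2 * R_A_TOV)):
    found = retried_rec_finds_state(
        t_rsp_sent=0.1, rr_tov=RR_TOV, t_first_rec=REC_TOV, rec_timeout=rec_timeout
    )
    print(label, found)
```

With any values where the REC timeout is larger than RR_TOV, the retried REC
arrives after the state is gone, which is the exposure described under
Issue 2.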

> --
> The world is so fast that there are days when the person who says
> it can't be done is interrupted by the person who is doing it.
> 	~ Anon

