FCP-2: Lost FCP_CMND, Unacknowledged classes.

Santosh Rao santoshr at cup.hp.com
Tue Jun 25 17:48:20 PDT 2002


* From the T10 Reflector (t10 at t10.org), posted by:
* Santosh Rao <santoshr at cup.hp.com>
*
Hello,

We have 3 issues regarding Annex C Fig C.2 and the text in Section 8.2
for REC.

Issue 1
=======
Section 8.2 states :

"If the destination FCP_Port of the REC request determines that the
originator S_ID, OX_ID, RX_ID or task retry id are inconsistent, it
shall respond with a FCP_RJT with a rsn_code of "unable to perform
command request" and rsn_expln of "invalid OXID-RXID combination".

Annex C Fig C.2 states :

"The LS_RJT (Logical Error, Invalid OXID-RXID combination) for the REC
indicates that the exchange is unknown."

The 2 passages quoted above are inconsistent about the reason code to be
used in the FCP_RJT. From the target's perspective, when it receives a
REC with an OXID-RXID combination for which it has no exchange state,
both of the above passages of FCP-2 apply.

Which reason code should be returned in this case?

Issue 2
=======
How should the initiator differentiate between a FCP_RJT from a target
due to a lost FCP_CMND (the scenario described in Annex C Fig C.2) and a
FCP_RJT sent because the target has discarded exchange state after
RR_TOV expired following FCP_RSP?

In both the above cases, our interpretation of FCP-2 is that the
initiator will see a FCP_RJT response to the REC with :
rsn_code = "Logical Error" or "Unable to perform command request"
rsn_expln = "Invalid OXID-RXID combination"

The initiator cannot apply the same error recovery to both cases. In the
lost FCP_CMND case, the initiator may safely re-issue the command. The
discarded-state case could occur in the following manner:

- Initiator issues a command which does not involve data transfer.
- Target sends FCP_RSP; the FCP_RSP is lost.
- Initiator's REC_TOV timer expires and the initiator sends REC.
- REC times out after RA_TOV(els) (which is > RR_TOV, for fabric).
- Initiator aborts the REC and issues another REC.
- Target sends FCP_RJT in response, since it has discarded the exchange
  state.

In the above case, the initiator MUST NOT re-issue the FCP_CMND, since
this can cause data corruption on tape devices (e.g., re-issuing a SCSI
command like SPACE or WRITE FILEMARKS after it has already executed
successfully can corrupt the data on tape).

Can someone clarify how FCP-2 differentiates these 2 cases? Without the
ability to differentiate between them, the use of SLER in a lost
FCP_CMND scenario can result in data corruption on tape devices.
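
To make the ambiguity concrete, here is a minimal sketch of a target that
keeps a table of exchange state. Everything in it is an illustrative
assumption rather than something defined by FCP-2: the ToyTarget class,
its method names, the placeholder strings for the reason code and
explanation, the ID values, and the timer value. It only shows that the
two scenarios can produce the same reject as seen by the initiator.

```python
from dataclasses import dataclass, field

# Illustrative value in seconds; not the real RR_TOV default.
RR_TOV = 5.0

# Placeholder strings standing in for the reason code and explanation
# discussed above.
REJECT = ("Logical Error or Unable to perform command request",
          "Invalid OXID-RXID combination")


@dataclass
class ToyTarget:
    """Hypothetical target: maps (OX_ID, RX_ID) to the time FCP_RSP was sent."""
    exchanges: dict = field(default_factory=dict)

    def receive_cmnd(self, ox_id, rx_id):
        # Exchange is open; no FCP_RSP has been sent yet.
        self.exchanges[(ox_id, rx_id)] = None

    def send_rsp(self, ox_id, rx_id, now):
        # Command has executed; remember when FCP_RSP went out.
        self.exchanges[(ox_id, rx_id)] = now

    def receive_rec(self, ox_id, rx_id, now):
        key = (ox_id, rx_id)
        rsp_time = self.exchanges.get(key)
        # Discard exchange state once RR_TOV has elapsed since FCP_RSP.
        if rsp_time is not None and now - rsp_time > RR_TOV:
            del self.exchanges[key]
        if key not in self.exchanges:
            return ("RJT",) + REJECT
        return ("ACC", "exchange known")


def lost_cmnd_case():
    # FCP_CMND never reached the target, so it has no state at all.
    target = ToyTarget()
    return target.receive_rec(0x10, 0xFFFF, now=3.0)


def expired_state_case():
    # Command executed, FCP_RSP was lost, and the second REC arrives
    # after the target has discarded the exchange.
    target = ToyTarget()
    target.receive_cmnd(0x10, 0x20)
    target.send_rsp(0x10, 0x20, now=0.5)
    return target.receive_rec(0x10, 0x20, now=13.0)
```

In this toy model, lost_cmnd_case() and expired_state_case() return the
same tuple, so an initiator that looks only at the reject contents has
nothing to base a "safe to re-issue" decision on.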


Issue 3
=======
Section 12.5.2 states that if a REC response is not received within
RA_TOV(els), the initiator shall abort the REC and send another REC in a
new exchange.

Since the initiator detects the REC timeout only after RA_TOV (or 2 *
RA_TOV, as per proposed change in FCP-3) and this time value is larger
than RR_TOV, the target would have discarded exchange information after
RR_TOV.

Hence, what is the point of retrying the REC? It only exposes the
initiator to the problem described under "Issue 2".
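
The timing argument can be checked with a few lines of arithmetic. This
is only a sketch: the function name and the numeric values are made up
for illustration, frame transit times are ignored, and it assumes the
target discards state exactly RR_TOV after sending FCP_RSP and the
initiator retries exactly RA_TOV after sending the first REC.

```python
def retry_rec_finds_state(rsp_sent, first_rec_sent, ra_tov, rr_tov):
    """Return True if the target still holds exchange state when the
    retried REC is sent (all times in seconds, hypothetical model)."""
    retry_sent = first_rec_sent + ra_tov      # initiator gives up on first REC
    state_discarded = rsp_sent + rr_tov       # target drops the exchange
    return retry_sent < state_discarded


# Illustrative numbers only: FCP_RSP sent at 0.5 s, first REC at 3.0 s,
# RA_TOV = 10 s, RR_TOV = 5 s.
example = retry_rec_finds_state(rsp_sent=0.5, first_rec_sent=3.0,
                                ra_tov=10.0, rr_tov=5.0)
```

With these values the retry goes out at 13.0 s while the state was
dropped at 5.5 s, so example is False. Under the stated assumptions the
same holds whenever RA_TOV > RR_TOV and the first REC is sent no earlier
than the FCP_RSP.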

Any clarifications would be appreciated.

Thanks,
Santosh



-- 
The world is so fast that there are days when the person who says 
it can't be done is interrupted by the person who is doing it.
	~ Anon
*
* For T10 Reflector information, send a message with
* 'info t10' (no quotes) in the message body to majordomo at t10.org



