FCP-2 problem

Terence Kelleher terryk at pathlight.com
Wed Jun 14 11:57:49 PDT 2000


* From the T10 Reflector (t10 at t10.org), posted by:
* "Terence Kelleher" <terryk at pathlight.com>
*
Does using FCP_CONF_REQ on the response resolve the issue? The target
releases the exchange resources once it receives FCP_CONF, which frees the
OXID. The REC sent after the lost command then does not receive an ACC,
because the OXID is no longer valid at the target.

CMD OXID = 1 ----->

             <----- FCP_RSP with FCP_CONF_REQ

FCP_CONF     ----->

CMD OXID = 1 ---X

REC          ----->

             <----- LS_RJT


If the FCP_CONF is lost, the target would issue REC to request
retransmission. If the OXID is reused before FCP_CONF is delivered to the
target, the command is rejected because the OXID refers to an exchange that
is still open.
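
As a rough illustration of the sequence above, here is a minimal Python
sketch of a target that holds exchange context until FCP_CONF arrives. The
class ConfirmingTarget, its method names, and the generic "reject" return
value are assumptions made for illustration. They are not taken from FCP-2,
and the model ignores timers, S_ID, and RX_ID.

```python
from dataclasses import dataclass, field


@dataclass
class ConfirmingTarget:
    """Toy target (hypothetical): an exchange stays open until FCP_CONF."""
    open_exchanges: dict = field(default_factory=dict)  # OXID -> command name

    def fcp_cmnd(self, ox_id, command):
        # A command whose OXID refers to a still-open exchange is rejected.
        if ox_id in self.open_exchanges:
            return "reject"
        self.open_exchanges[ox_id] = command
        return "FCP_RSP with FCP_CONF_REQ"

    def fcp_conf(self, ox_id):
        # Confirmation received: release the resources, the OXID is free.
        self.open_exchanges.pop(ox_id, None)

    def rec(self, ox_id):
        # REC for an OXID the target no longer knows gets a reject, not ACC.
        return "ACC" if ox_id in self.open_exchanges else "reject"


def conf_delivered():
    """FCP_CONF arrives, the reused-OXID command is lost, then REC is sent."""
    target = ConfirmingTarget()
    target.fcp_cmnd(1, "TEST UNIT READY")
    target.fcp_conf(1)
    # The second command with OXID = 1 is dropped and never reaches the target.
    return target.rec(1)


def conf_lost():
    """FCP_CONF is lost and the OXID is reused while the exchange is open."""
    target = ConfirmingTarget()
    target.fcp_cmnd(1, "TEST UNIT READY")
    return target.fcp_cmnd(1, "SPACE")
```

In this model both paths end in a reject, so the initiator is never handed
the stale response of the earlier command.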


=============== =   Terence M. Kelleher
=============  ==   Principal Engineer - Embedded Software Group
===========   ===   Pathlight Technology, Inc.
=========    ====   9 Brown Road
=======     =====   Ithaca, New York 14850
=====      ======   Tel	: (607) 266-4000 Ext. 424
===       =======   Fax  : (607) 266-0352
P A T H L I G H T
         =========   email: terryk at pathlight.com
        ==========   website http://www.pathlight.com



 > -----Original Message-----
 > From: owner-fc at storage.network.com
 > [mailto:owner-fc at storage.network.com]On Behalf Of Baldwin, Dave
 > Sent: Tuesday, June 13, 2000 11:04 PM
 > To: Fibre Reflector; T10 Reflector
 > Cc: Robert Snively (Brocade)
 > Subject: FCP-2 problem
 >
 >
 > *
 > * From the fc reflector, posted by:
 > * "Baldwin, Dave" <Dave.Baldwin at emulex.com>
 > *
 > A serious hole in FCP-2 error recovery has been discovered. I would like
 > to solicit input on this issue from concerned parties. The problem can
 > occur in many forms with single or multi-LUN targets. Here is the basic
 > problem:
 >
 > Initiator                                        Target
 >
 > CMD ---------------------------->
 >
 > 1. A command (e.g. Test Unit Ready) is sent to the target with
 > OX_ID = 1.
 >
 >            <---------------------------     Response
 >
 > 2. A "good" response is sent back to the initiator. The initiator gets
 > the response and knows the TUR command has been completed, so the
 > exchange resources are freed. The target has sent the response, so it
 > saves the exchange information just in case the initiator needs to
 > recover a dropped response with REC/SRR.
 >
 > CMD ---------------------------->  X (dropped frame)
 >
 > 3. A new command (e.g. SPACE forward 1 block) is sent to the target with
 > OX_ID = 1. This OX_ID reuse can occur for many reasons in various
 > systems. The command never makes it to the target because of a bit
 > error.
 >
 > REC ------------------------------>
 >
 > 4. The initiator sends an REC ELS command to the target to make sure all
 > is well with OX_ID 1.
 >
 >           <------------------------------   ACC
 >
 > 5. The target sends an ACC to the ELS saying that exchange 1 is complete
 > and the initiator has sequence initiative. Unfortunately, the target is
 > talking about the TUR command, while the initiator is talking about the
 > SPACE command.
 >
 > SRR -------------------------------->
 >
 > 6. The initiator sends SRR to get the target to resend the response to
 > the SPACE command that it thinks has been dropped.
 >
 >          <--------------------------------  ACC
 >
 > 7. The target says OK, I'll resend the response for the TUR command.
 >
 >           <------------------------------- RSP
 >
 > 8. The target resends the TUR response. The initiator sees a "good"
 > response (it thinks for the SPACE command), and moves on to the next
 > command (maybe a WRITE).
 >
 > The initiator can now write to the wrong block because it thinks the
 > tape has been properly positioned.
 >
 > I have some preliminary thoughts on what might be done to solve this
 > issue, but none of them involve easy fixes. I was hoping someone might
 > come up with a simple solution. Any opinions?
 >
 >
 > I have a suggestion for improving a related FCP-2 behavior:
 >
 > We need to guard against having several outstanding exchanges with the
 > same OX_ID from the target's point of view (e.g. 20 tape drives, each a
 > LUN within one target, whose most recent exchanges just happen to have
 > the same OX_ID within the timeout period). Otherwise, we have recovery
 > issues with REC/SRR because they are not LUN specific (yet ;-)).
 >
 > I think a good solution is for the target to release all resources
 > associated with the old command with OX_ID = n (which the target
 > believes has been completed) when it receives a new command frame
 > (R_CTL = 6) with OX_ID = n. The reuse of the OX_ID by the initiator is a
 > confirmation that the old command has been completed. Since the target
 > and initiator both think the old exchange is complete, this should be
 > sufficient confirmation to get rid of the old information in the target.
 >
 > Best regards,
 > Dave Baldwin
 > Emulex Corporation
 >
 >
 >
 >
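
The failure sequence and the release-on-reuse suggestion in the quoted
message can be sketched in a few lines of Python. This is a toy model under
stated assumptions, not FCP-2 behavior: ToyTarget, Context, the
release_on_reuse flag, the generic "reject" value, and the choice to have
SRR resend the oldest matching context are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    ox_id: int
    lun: int
    command: str
    status: str = "GOOD"


@dataclass
class ToyTarget:
    """Toy target (hypothetical) that retains contexts for REC/SRR recovery."""
    release_on_reuse: bool = False  # models the suggestion in the quoted message
    retained: list = field(default_factory=list)

    def fcp_cmnd(self, ox_id, lun, command):
        if self.release_on_reuse:
            # Reuse of the OX_ID is taken as confirmation of the old command.
            self.retained = [c for c in self.retained if c.ox_id != ox_id]
        ctx = Context(ox_id, lun, command)
        self.retained.append(ctx)
        return ctx.status

    def rec(self, ox_id):
        # REC is not LUN specific here: it matches on OX_ID only.
        matches = [c for c in self.retained if c.ox_id == ox_id]
        return ("ACC", matches) if matches else ("reject", [])

    def srr(self, ox_id):
        # Assumption: resend the response of the oldest matching context.
        _, matches = self.rec(ox_id)
        return (matches[0].command, matches[0].status) if matches else None


def replay_dropped_command(target):
    """Steps 1-8: TUR completes, SPACE with the same OX_ID is dropped."""
    target.fcp_cmnd(1, lun=0, command="TEST UNIT READY")
    # Step 3: the SPACE command with OX_ID = 1 never reaches the target.
    verdict, _ = target.rec(1)   # steps 4-5
    resent = target.srr(1)       # steps 6-8
    return verdict, resent


def contexts_after_two_luns(target):
    """Two LUNs whose last exchanges used the same OX_ID."""
    target.fcp_cmnd(1, lun=0, command="TEST UNIT READY")
    target.fcp_cmnd(1, lun=1, command="TEST UNIT READY")
    return len(target.rec(1)[1])
```

With the flag on, the target keeps only one context per OX_ID in the
multi-LUN case. In this model the dropped-command replay gives the same
result either way, since the SPACE command never reaches the target to
trigger the release.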

*
* For T10 Reflector information, send a message with
* 'info t10' (no quotes) in the message body to majordomo at t10.org



