FCP-2 problem
Terence Kelleher
terryk at pathlight.com
Wed Jun 14 11:57:49 PDT 2000
* From the T10 Reflector (t10 at t10.org), posted by:
* "Terence Kelleher" <terryk at pathlight.com>
*
Does setting FCP_CONF_REQ in the response resolve the issue? The target
releases the exchange resources only after it receives FCP_CONF, at which
point the OXID is free. An REC sent after the lost command then does not
receive an ACC, because the OXID is no longer valid.
Initiator                        Target
CMD OXID = 1     ----->
                 <-----          FCP_RSP with FCP_CONF_REQ
FCP_CONF         ----->
CMD OXID = 1     ---X
REC              ----->
                 <-----          LS_RJT
If the FCP_CONF is lost, the target would issue REC to request
retransmission. If the OXID is reused before FCP_CONF is delivered to the
target, the command is rejected because the OXID refers to an exchange that
is still open.
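The exchange sequence in the diagram can be modeled as a small state machine. The sketch below is hypothetical: `ConfTarget`, its methods and the return strings are illustrative names, not FCP-2 definitions, and it assumes a single initiator so that the OXID alone identifies an exchange. It holds each exchange open until FCP_CONF arrives, refuses an REC for a closed OXID, and rejects a command that reuses a still-open OXID.

```python
class ConfTarget:
    """Hypothetical target that holds each exchange open until FCP_CONF."""

    def __init__(self):
        # OX_ID -> command, for exchanges still awaiting FCP_CONF.
        # A single initiator is assumed, so OX_ID alone names an exchange.
        self.open = {}

    def receive_command(self, ox_id, command):
        if ox_id in self.open:
            # OX_ID reused before FCP_CONF reached the target.
            return "REJECT"
        # Execute the command, then respond with FCP_CONF_REQ set.
        self.open[ox_id] = command
        return "FCP_RSP with FCP_CONF_REQ"

    def receive_conf(self, ox_id):
        # FCP_CONF closes the exchange and frees the OX_ID.
        self.open.pop(ox_id, None)

    def receive_rec(self, ox_id):
        # An REC for a closed OX_ID is refused instead of accepted.
        return "ACC" if ox_id in self.open else "LS_RJT"


if __name__ == "__main__":
    t = ConfTarget()
    print(t.receive_command(1, "TEST UNIT READY"))  # FCP_RSP with FCP_CONF_REQ
    t.receive_conf(1)
    # The reused OX_ID = 1 command is lost, so the target never sees it.
    print(t.receive_rec(1))                         # LS_RJT
```

In this model the whole fix is the dictionary lookup in `receive_rec`: once FCP_CONF has removed the entry, no stale state remains for an REC to report on.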
Terence M. Kelleher
Principal Engineer - Embedded Software Group
Pathlight Technology, Inc.
9 Brown Road
Ithaca, New York 14850
Tel: (607) 266-4000 Ext. 424
Fax: (607) 266-0352
email: terryk at pathlight.com
website: http://www.pathlight.com
> -----Original Message-----
> From: owner-fc at storage.network.com
> [mailto:owner-fc at storage.network.com]On Behalf Of Baldwin, Dave
> Sent: Tuesday, June 13, 2000 11:04 PM
> To: Fibre Reflector; T10 Reflector
> Cc: Robert Snively (Brocade)
> Subject: FCP-2 problem
>
>
> *
> * From the fc reflector, posted by:
> * "Baldwin, Dave" <Dave.Baldwin at emulex.com>
> *
> A serious hole in FCP-2 error recovery has been discovered. I would like
> to solicit input on this issue from concerned parties. The problem can
> occur in many forms with single or multi-LUN targets. Here is the basic
> problem:
>
> Initiator                                 Target
>
> CMD ---------------------------->
>
> 1. A command (e.g. Test Unit Ready) is sent to the target with
> OX_ID = 1.
>
> <--------------------------- Response
>
> 2. A "good" response is sent back to the initiator. The initiator gets
> the response and knows the TUR command has been completed, so the
> exchange resources are freed. The target has sent the response, so it
> saves the exchange information just in case the initiator needs to
> recover a dropped response with REC/SRR.
>
> CMD ----------------------------> X (dropped frame)
>
> 3. A new command (e.g. SPACE forward 1 block) is sent to the target with
> OX_ID = 1. This OX_ID reuse can occur for many reasons in various
> systems. The command never makes it to the target because of a bit
> error.
>
> REC ------------------------------>
>
> 4. The initiator sends an REC ELS command to the target to make sure all
> is well with OX_ID 1.
>
> <------------------------------ ACC
>
> 5. The target sends an ACC to the ELS saying that exchange 1 is complete
> and the initiator has sequence initiative. Unfortunately, the target is
> talking about the TUR command, while the initiator is talking about the
> SPACE command.
>
> SRR -------------------------------->
>
> 6. The initiator sends SRR to get the target to resend the response to
> the SPACE command that it thinks has been dropped.
>
> <-------------------------------- ACC
>
> 7. The target says OK, I'll resend the response for the TUR command.
>
> <------------------------------- RSP
>
> 8. The target resends the TUR response. The initiator sees a "good"
> response (it thinks for the SPACE command), and moves on to the next
> command (maybe a WRITE).
>
> The initiator can now write to the wrong block because it thinks the
> tape has been properly positioned.
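The eight steps above can be replayed in a few lines of Python. This is a minimal sketch under stated assumptions: `NaiveTarget`, `replay` and the response strings are hypothetical names, and the target is assumed to index its saved REC/SRR state by OX_ID alone. It shows how an ACC and a retransmitted response belonging to the TUR reach an initiator that is asking about the SPACE command.

```python
class NaiveTarget:
    """Hypothetical target that saves each completed exchange for REC/SRR."""

    def __init__(self):
        self.saved = {}  # OX_ID -> (command, response)

    def command(self, ox_id, cmd):
        response = f"GOOD ({cmd})"
        # Saved in case the initiator must recover a dropped response.
        self.saved[ox_id] = (cmd, response)
        return response

    def rec(self, ox_id):
        # Reports "exchange complete" for any OX_ID that still has saved
        # state, whichever command that state belonged to.
        return "ACC" if ox_id in self.saved else "LS_RJT"

    def srr(self, ox_id):
        # Retransmit whatever response is saved under this OX_ID.
        return self.saved[ox_id][1]


def replay():
    target = NaiveTarget()
    # Steps 1-2: TUR on OX_ID 1 completes; the initiator frees the exchange.
    target.command(1, "TEST UNIT READY")
    # Step 3: SPACE reuses OX_ID 1, but the frame is dropped in transit,
    # so target.command() is never called for it.
    pending = "SPACE"
    # Steps 4-5: REC for OX_ID 1 is accepted, on behalf of the TUR.
    rec_reply = target.rec(1)
    # Steps 6-8: SRR makes the target resend the saved TUR response.
    resent = target.srr(1)
    return pending, rec_reply, resent


if __name__ == "__main__":
    pending, rec_reply, resent = replay()
    print(f"REC -> {rec_reply}; initiator treats {resent!r} as the {pending} status")
```

Nothing in the saved state records which command the initiator is currently asking about, which is exactly the gap the message describes.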
>
> I have some preliminary thoughts on what might be done to solve this
> issue, but none of them involve easy fixes. I was hoping someone might
> come up with a simple solution. Any opinions?
>
>
> I have a suggestion for improving a related FCP-2 behavior:
>
> We need to guard against the target holding several outstanding
> exchanges with the same OX_ID at once (for example, 20 tape drives, each
> a separate LUN within one target, whose most recently executed exchanges
> happen to share the same OX_ID within the timeout period). Otherwise,
> REC/SRR recovery becomes ambiguous, because those ELSs are not LUN
> specific (yet ;-)).
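The multi-LUN case can be illustrated with a per-LUN store of completed exchanges. In this hypothetical sketch (`SavedExchanges` and its methods are illustrative names), REC/SRR are modeled as identifying an exchange by OX_ID alone, following the message above, and every other ELS field is ignored. When two LUNs last completed exchanges with the same OX_ID, the lookup returns more than one candidate.

```python
class SavedExchanges:
    """Hypothetical per-LUN store of completed exchanges kept for REC/SRR."""

    def __init__(self):
        self.saved = {}  # (lun, ox_id) -> saved response

    def complete(self, lun, ox_id, response):
        # Remember the response in case REC/SRR recovery is requested.
        self.saved[(lun, ox_id)] = response

    def matches_for_rec(self, ox_id):
        """LUNs whose saved exchange an REC/SRR for ox_id could refer to."""
        return sorted(lun for (lun, oid) in self.saved if oid == ox_id)


if __name__ == "__main__":
    store = SavedExchanges()
    store.complete(lun=0, ox_id=7, response="GOOD (drive 0)")
    store.complete(lun=3, ox_id=7, response="GOOD (drive 3)")
    print(store.matches_for_rec(7))  # [0, 3]: the target cannot tell which
```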
>
> I think a good solution is for the target to release all resources
> associated with the old command with OX_ID = n (which the target
> believes has completed) when it receives a new frame with OX_ID = n
> carrying a new command (R_CTL = 6). The initiator's reuse of the OX_ID
> confirms that the old command has completed. Since the target and
> initiator both consider the old exchange complete, this should be
> sufficient grounds for discarding the old information in the target.
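The release-on-reuse rule can be sketched as follows. `ReleaseOnReuseTarget` and its methods are hypothetical names; the only value taken from the message is R_CTL = 6 for a new command. Saved state is keyed by (LUN, OX_ID), and a new command with OX_ID n discards every saved exchange n on any LUN before the new one is recorded, so at most one saved exchange per OX_ID remains. The rule only fires when the new command actually reaches the target.

```python
FCP_CMND_R_CTL = 0x06  # R_CTL value for a new command, as cited above


class ReleaseOnReuseTarget:
    """Hypothetical target applying the release-on-reuse rule."""

    def __init__(self):
        # (lun, ox_id) -> response saved for possible REC/SRR recovery
        self.saved = {}

    def receive(self, r_ctl, ox_id, lun, cdb):
        if r_ctl != FCP_CMND_R_CTL:
            return None  # only new-command frames are modeled here
        # Reuse of OX_ID n confirms every older exchange n is complete,
        # so release the saved state for n on every LUN.
        for key in [k for k in self.saved if k[1] == ox_id]:
            del self.saved[key]
        response = f"GOOD ({cdb})"
        self.saved[(lun, ox_id)] = response
        return response

    def luns_holding(self, ox_id):
        """LUNs that still hold saved state for this OX_ID."""
        return sorted(lun for (lun, oid) in self.saved if oid == ox_id)


if __name__ == "__main__":
    t = ReleaseOnReuseTarget()
    t.receive(FCP_CMND_R_CTL, ox_id=5, lun=0, cdb="READ")
    t.receive(FCP_CMND_R_CTL, ox_id=5, lun=3, cdb="WRITE")
    print(t.luns_holding(5))  # [3]: LUN 0's exchange 5 was released
```

Compared with a per-LUN store that never releases state, this keeps any REC/SRR for OX_ID n pointing at a single exchange, at the cost of a scan over saved entries on every new command.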
>
> Best regards,
> Dave Baldwin
> Emulex Corporation
*
* For T10 Reflector information, send a message with
* 'info t10' (no quotes) in the message body to majordomo at t10.org