FCP-2 problem

Neil T. Wanamaker ntw at crossroads.com
Wed Jun 14 10:56:56 PDT 2000

* From the T10 Reflector (t10 at t10.org), posted by:
* "Neil T. Wanamaker" <ntw at crossroads.com>
At 11:56 AM 6/14/00 -0500, Binford, Charles wrote:

>I agree with Jim that immediate OX_ID reuse by the initiator is 
>bad.  However, as Dave said in his original posting, the OX_ID may have 
>been reused for a wide variety of reasons.  I'd support the 'must use n+1 
>OX_IDs' solution, but let me throw out another possibility in case 
>others object to the OX_ID restriction.
>If my memory serves, SRR is FCP specific (i.e. an FC-4 link service, not a 
>generic ELS).  As such, it is reasonable to put FCP specific hooks 
>in.  The root problem is incorrect identification of the SCSI task to 
>retransmit data on behalf of.  It is being misinterpreted because of an 
>alias of the task tag (i.e. the OX_ID).  We could add Command Reference Number
>and LUN to the SRR request payload to avoid the alias problem.  Of course 
>this has two drawbacks:
>- requires use of CRN (otherwise not needed if single threaded I/O)
>- changes payload of SRR which has been stable for quite a while.

I would have to agree that if SRR were being done today, these would be 
included; at the time the REC/SRR were originally drawn up, there were no 
CRNs; the CRN isn't really useful without the associated LUN.

This, however, skirts the real issue here, which is that of avoiding the 
existence of frames with duplicate identifiers in the fabric at the same time.

Prior to FCP-2, the initiator correctly understood that when the target 
sent the response, he had completed the exchange, and so when the initiator 
received the response and released its resources (including OX_ID), that 
exchange was known to be complete by both parties. Reuse of the OX_ID was 
perfectly OK, since a new frame with that OX_ID unambiguously belonged to 
the new exchange.

With the advent of FCP-2, this is no longer the case. Receipt of the 
response does not imply that the target knows the exchange to be complete, 
and the initiator must assume the target is still retaining the original 
OX_ID until either receipt of confirmation that the target has received the 
next command (untagged) or REC_TOV has expired. This is admittedly untidy, 
but with the tape devices for which this was intended, not too onerous.
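
As a rough illustration of the hold rule just described, here is a minimal
Python sketch. The class and method names, and the REC_TOV value used in
the usage note, are hypothetical and not taken from FCP-2. It models only
the untagged case, where confirmation of the next command releases every
held OX_ID.

```python
from dataclasses import dataclass, field


@dataclass
class OxidHold:
    """Tracks OX_IDs whose response has arrived but which are not yet safe to reuse.

    Hypothetical model: it does not track exchanges that are still open.
    """
    rec_tov: float                              # hold time in seconds (assumed value)
    held: dict = field(default_factory=dict)    # ox_id -> time the response arrived

    def response_received(self, ox_id, now):
        # The response no longer proves that the target considers the
        # exchange complete, so start the hold instead of freeing the OX_ID.
        self.held[ox_id] = now

    def next_command_confirmed(self):
        # Untagged case: the target has confirmed receipt of the next
        # command, so it is no longer retaining any earlier OX_ID.
        self.held.clear()

    def hold_released(self, ox_id, now):
        # True when ox_id is not under a post-response hold at time `now`.
        started = self.held.get(ox_id)
        if started is None:
            return True
        if now - started >= self.rec_tov:
            del self.held[ox_id]
            return True
        return False
```

With rec_tov=3.0, an OX_ID whose response arrived at t=0 stays held at
t=1 and is released at t=3, or earlier if next_command_confirmed() is
called.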

>Charles Binford
>LSI Logic Storage Systems
>(316) 636-8566
>-----Original Message-----
>From: Jim McGrath 
>[mailto:Jim.McGrath at quantum.com]
>Sent: Wednesday, June 14, 2000 10:41 AM
>To: 'Baldwin, Dave'; Fibre Reflector; T10 Reflector
>Cc: Robert Snively (Brocade)
>Subject: RE: FCP-2 problem
>* From the T10 Reflector (t10 at t10.org), posted by:
>* Jim McGrath <Jim.McGrath at quantum.com>
>I don't think your solution works.  Specifically, how does the target know
>in this example that the initiator has reused OX_ID, since the target never
>received the frame containing the associated command?  As far as it knows,
>the REC ELS (which the initiator associates with the second, never received
>command) is still associated with the first command.
>Conversely, suppose the initiator sends the first command, the target
>completes and sends back status (which is then dropped, so the initiator
>never sees it), but then rather than sending the second command and then a
>REC ELS the initiator just sends a REC ELS?  The target cannot tell the
>difference between this and the first sequence of events.
>Maybe my concern is disallowed by another aspect of the error recovery
>protocol, or will be handled OK at a higher level of error recovery.  It
>just seems dangerous to make assumptions when frames are being dropped and
>so the recovery situation is very complicated.
>This is why I always advocate a simple, brute force error recovery whenever
>possible.  Specifically, the immediate reuse of OX_ID appears to be a very
>bad policy to follow.  Normally a window of values is established precisely
>to avoid these sorts of problems.  So another solution is to make sure you
>do not reuse an OX_ID until you are really sure that this sort of problem
>cannot occur.  In this case, using two values in alternation would work OK.
>In general use N+1 values if you want N commands outstanding at a time (and
>in practice I, being paranoid, would use N+m, where m is >1 to add some
>extra margin for strange situations I am too stupid to foresee).
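
The N+m scheme quoted above could be sketched roughly as follows. This is
an illustrative Python model only: OxidPool, its parameters, and the FIFO
free list are assumptions, not anything defined by FCP-2. The FIFO order
means a just-released OX_ID is the last one to be handed out again.

```python
from collections import deque


class OxidPool:
    """Hypothetical OX_ID allocator: N outstanding commands, N + m values."""

    def __init__(self, n_outstanding, margin=1, base=0):
        if n_outstanding < 1 or margin < 1:
            raise ValueError("need at least one command slot and one spare OX_ID")
        self.limit = n_outstanding
        # FIFO free list holding N + m consecutive values starting at `base`.
        self.free = deque(range(base, base + n_outstanding + margin))
        self.in_use = set()

    def allocate(self):
        # Refuse to exceed N outstanding commands; the spare values then
        # guarantee that the free list is never empty here.
        if len(self.in_use) >= self.limit:
            raise RuntimeError("too many outstanding commands")
        ox_id = self.free.popleft()
        self.in_use.add(ox_id)
        return ox_id

    def release(self, ox_id):
        # Return the value to the tail so it is reused as late as possible.
        self.in_use.remove(ox_id)
        self.free.append(ox_id)
```

With n_outstanding=1 and margin=1 this reduces to the two values used in
alternation: successive commands get 0, 1, 0, 1, and so on.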

* For T10 Reflector information, send a message with
* 'info t10' (no quotes) in the message body to majordomo at t10.org
