FCP-2 problem

Binford, Charles cbinford at lsil.com
Wed Jun 14 09:56:32 PDT 2000


* From the T10 Reflector (t10 at t10.org), posted by:
* "Binford, Charles" <cbinford at lsil.com>
*

I agree with Jim that immediate OX_ID reuse by the initiator is bad.
However, as Dave said in his original posting, the OX_ID may have been
reused for a wide variety of reasons.  I'd support the 'must use n+1
OX_IDs' solution, but let me throw out another possibility in case
others object to the OX_ID restriction.

If my memory serves, SRR is FCP-specific (i.e. an FC-4 link service, not
a generic ELS).  As such, it is reasonable to put FCP-specific hooks in.
The root problem is incorrect identification of the SCSI task on whose
behalf data is to be retransmitted.  The task is misidentified because
its task tag (i.e. the OX_ID) is aliased.  We could add the Command
Reference Number and LUN to the SRR request payload to avoid the alias
problem.
Of course this has two drawbacks:

- requires use of CRN (otherwise not needed if single threaded I/O) 
- changes payload of SRR which has been stable for quite a while. 
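
To make the alias problem concrete, here is a minimal sketch in Python. The `Task` record, both lookup helpers, and the field values are hypothetical; this is not the actual SRR payload layout, only a model of a target looking up a task by OX_ID alone versus by OX_ID plus LUN and CRN.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Task:
    """Hypothetical record of a SCSI task as the target remembers it."""
    ox_id: int
    lun: int
    crn: int
    description: str


def find_task_by_ox_id(tasks: List[Task], ox_id: int) -> Optional[Task]:
    """Lookup keyed on OX_ID only; a reused OX_ID matches a stale task."""
    for task in tasks:
        if task.ox_id == ox_id:
            return task
    return None


def find_task_by_full_key(tasks: List[Task], ox_id: int, lun: int,
                          crn: int) -> Optional[Task]:
    """Lookup keyed on OX_ID, LUN and CRN; a stale task no longer matches."""
    for task in tasks:
        if (task.ox_id, task.lun, task.crn) == (ox_id, lun, crn):
            return task
    return None


# The target only ever saw the first command (CRN 1). The second command,
# which reused OX_ID 0x10 with CRN 2, was lost in transit.
target_tasks = [Task(ox_id=0x10, lun=0, crn=1, description="first command")]

# The initiator's request is about the second command.
stale_match = find_task_by_ox_id(target_tasks, 0x10)
checked_match = find_task_by_full_key(target_tasks, 0x10, lun=0, crn=2)
```

With the OX_ID-only lookup the target silently picks the first command; with the extra fields it finds no match and can reject the request instead of retransmitting data for the wrong task.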

Charles Binford 
LSI Logic Storage Systems 
(316) 636-8566 


-----Original Message----- 
From: Jim McGrath [mailto:Jim.McGrath at quantum.com] 
Sent: Wednesday, June 14, 2000 10:41 AM 
To: 'Baldwin, Dave'; Fibre Reflector; T10 Reflector 
Cc: Robert Snively (Brocade) 
Subject: RE: FCP-2 problem 


* From the T10 Reflector (t10 at t10.org), posted by: 
* Jim McGrath <Jim.McGrath at quantum.com> 
* 

I don't think your solution works.  Specifically, how does the target know
in this example that the initiator has reused the OX_ID, since the target
never received the frame containing the associated command?  As far as it
knows, the REC ELS (which the initiator associates with the second,
never-received command) is still associated with the first command.

Conversely, suppose the initiator sends the first command, the target
completes it and sends back status (which is then dropped, so the initiator
never sees it), but then, rather than sending the second command followed
by a REC ELS, the initiator just sends a REC ELS?  The target cannot tell
the difference between this and the first sequence of events.
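
The two sequences can be compared with a small Python model. This is only a sketch: the frame tuples and the `target_received` helper are hypothetical, and sequence A is one reading of the original example (status for the first command arrives, the OX_ID is reused, and the second command is lost), since that posting is not reproduced here.

```python
from typing import List, Tuple

# (sender, frame kind, OX_ID, delivered?) -- a hypothetical frame model.
Frame = Tuple[str, str, int, bool]


def target_received(frames: List[Frame]) -> List[Tuple[str, int]]:
    """Return what the target actually sees: delivered initiator frames."""
    return [(kind, ox_id)
            for sender, kind, ox_id, delivered in frames
            if sender == "initiator" and delivered]


OX_ID = 0x10

# Sequence A: the OX_ID is reused, but the second command is dropped.
sequence_a: List[Frame] = [
    ("initiator", "CMD", OX_ID, True),    # first command arrives
    ("target", "STATUS", OX_ID, True),    # initiator sees status
    ("initiator", "CMD", OX_ID, False),   # second command, same OX_ID, lost
    ("initiator", "REC", OX_ID, True),    # REC meant for the second command
]

# Sequence B: no reuse; the status for the first command is dropped.
sequence_b: List[Frame] = [
    ("initiator", "CMD", OX_ID, True),    # first command arrives
    ("target", "STATUS", OX_ID, False),   # status lost
    ("initiator", "REC", OX_ID, True),    # REC meant for the first command
]

view_a = target_received(sequence_a)
view_b = target_received(sequence_b)
indistinguishable = view_a == view_b
```

In this model both views reduce to one command followed by one REC on the same OX_ID, so the target has nothing on which to base a different response.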

Maybe my concern is disallowed by another aspect of the error recovery
protocol, or will be handled OK at a higher level of error recovery.  It
just seems dangerous to make assumptions when frames are being dropped and
so the recovery situation is very complicated.

This is why I always advocate simple, brute-force error recovery whenever
possible.  Specifically, the immediate reuse of an OX_ID appears to be a
very bad policy to follow.  Normally a window of values is established
precisely to avoid these sorts of problems.  So another solution is to make
sure you do not reuse an OX_ID until you are really sure that this sort of
problem cannot occur.  In this case, using two values in alternation would
work OK.  In general, use N+1 values if you want N commands outstanding at
a time (and in practice I, being paranoid, would use N+m, where m > 1, to
add some extra margin for strange situations I am too stupid to foresee).
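
A minimal Python sketch of the N+m idea follows. The `OxIdPool` class and its method names are hypothetical, not taken from any driver or from FCP-2; it assumes released values go to the back of a FIFO free list so that a just-released OX_ID is never the next one handed out.

```python
from collections import deque


class OxIdPool:
    """Hypothetical allocator holding N + m OX_ID values for N commands."""

    def __init__(self, max_outstanding: int, margin: int = 1) -> None:
        if max_outstanding < 1 or margin < 1:
            raise ValueError("max_outstanding and margin must be >= 1")
        self._max_outstanding = max_outstanding
        # FIFO of free values: allocate from the front, release to the back.
        self._free = deque(range(max_outstanding + margin))
        self._in_use = set()

    def allocate(self) -> int:
        """Hand out the OX_ID that has been idle the longest."""
        if len(self._in_use) >= self._max_outstanding:
            raise RuntimeError("too many commands outstanding")
        ox_id = self._free.popleft()
        self._in_use.add(ox_id)
        return ox_id

    def release(self, ox_id: int) -> None:
        """Return an OX_ID to the back of the free list."""
        self._in_use.remove(ox_id)
        self._free.append(ox_id)


# One command outstanding with N + 1 = 2 values: the values alternate.
pool = OxIdPool(max_outstanding=1)
history = []
for _ in range(4):
    ox_id = pool.allocate()
    history.append(ox_id)
    pool.release(ox_id)
# history is [0, 1, 0, 1]
```

Capping the outstanding count at N keeps at least m values idle at all times, which is what guarantees that the value just released is not at the front of the free list.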

Jim 





