FCP-2 problem

Baldwin, Dave Dave.Baldwin at emulex.com
Wed Jun 14 10:42:00 PDT 2000


* From the T10 Reflector (t10 at t10.org), posted by:
* "Baldwin, Dave" <Dave.Baldwin at emulex.com>
*
Jim,

I'm sorry I confused you by putting two issues in the same email. I wasn't
proposing a solution to the first problem, only asking for input.

I had already considered an OX_ID reuse scheme, and a scheme to use CRN with
a modified REC and SRR payload. I suspect that for each solution there are
vendors who can't implement the required behavior.

OX_ID reuse is not easily controlled. From the initiator's perspective,
it is not immediately reusing the OX_ID. You send OX_ID = 1 to a tape device,
then send thousands of commands to other disk devices on the network, and
when OX_ID = 1 comes up for reuse it gets sent to the tape device, causing the
problem. So you would need to keep track of OX_ID use on a per-LUN basis and
put other target restrictions in place (for multi-LUN devices), as I suggested
in the second part of my email. The "n + 1" reuse policy is even uglier
from the initiator's perspective, but I can see why a target implementer
would vote for this solution (no work to do!). I don't think the resulting
performance degradation in the initiator would be acceptable.
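To make the per-LUN bookkeeping concrete, here is a minimal sketch, assuming a single shared OX_ID pool handed out round-robin. A candidate is skipped if its exchange is still open or if it was among the most recent OX_IDs sent to the same (target, LUN), so each LUN cycles through at least `window` distinct values (the "N+m" of the thread) before any value repeats. The class name, `window`, `pool_size`, and the tiny pool in the example are all hypothetical; this is not taken from FCP-2 or from any driver.

```python
from collections import defaultdict, deque


class PerLunOxIdAllocator:
    """Hypothetical initiator-side OX_ID allocator (illustration only).

    OX_IDs are handed out round-robin from one shared pool. A candidate is
    skipped if its exchange is still open, or if it is among the last
    (window - 1) OX_IDs sent to the same (target, LUN), so each LUN cycles
    through at least `window` distinct values before any value repeats.
    """

    def __init__(self, window: int = 2, pool_size: int = 0xFFFF):
        if window < 2:
            raise ValueError("window must be at least 2")
        self.window = window            # N + m in the thread's terms
        self.pool_size = pool_size      # assumption: usable OX_IDs are 0..pool_size-1
        self.next_candidate = 0
        self.outstanding = set()        # OX_IDs whose exchange is still open
        # Per-(target, LUN) history of the most recent OX_IDs sent there.
        self.recent = defaultdict(lambda: deque(maxlen=self.window - 1))

    def allocate(self, target: int, lun: int) -> int:
        history = self.recent[(target, lun)]
        for _ in range(self.pool_size):
            ox_id = self.next_candidate
            self.next_candidate = (self.next_candidate + 1) % self.pool_size
            if ox_id in self.outstanding or ox_id in history:
                continue  # still open, or recently used for this LUN
            self.outstanding.add(ox_id)
            history.append(ox_id)
            return ox_id
        raise RuntimeError("no OX_ID available for this LUN")

    def complete(self, ox_id: int) -> None:
        # The exchange is closed; the value returns to the shared pool, but
        # allocate() still withholds it from the same LUN until it ages out
        # of that LUN's history.
        self.outstanding.discard(ox_id)


# Dave's scenario with a tiny pool: OX_ID 0 goes to the tape drive, the
# disks consume the rest of the pool, and the wrap-around comes back to 0.
alloc = PerLunOxIdAllocator(window=2, pool_size=4)
tape, disk = (1, 0), (2, 0)
ox = alloc.allocate(*tape)                  # 0
alloc.complete(ox)
for _ in range(3):
    alloc.complete(alloc.allocate(*disk))   # 1, 2, 3
print(alloc.allocate(*tape))                # prints 1; plain round-robin would reuse 0
```

The extra per-LUN history and the scan on every allocation are exactly the initiator-side cost the paragraph above objects to.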

The CRN solution seems better to me, but requires changing the REC and SRR
ELS commands that have been implemented for a while. It requires driver,
firmware, and in some cases hardware changes to implement. Identifying the
exact command to perform recovery on seems very important to me.
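The difference between matching on OX_ID alone and matching on OX_ID plus LUN and CRN can be sketched on the target side as below. This is a rough sketch under stated assumptions: a single initiator (so OX_ID alone indexes the table), and a recovery request (REC or SRR) that is hypothetically extended to carry LUN and CRN. `TargetTaskTable` and its methods are invented names, and no real REC or SRR payload layout is modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TaskRecord:
    ox_id: int
    lun: int
    crn: int


class TargetTaskTable:
    """Hypothetical target-side record of the last task seen per OX_ID.

    Assumes a single initiator, so OX_ID alone indexes the table.
    """

    def __init__(self) -> None:
        self.by_ox_id: Dict[int, TaskRecord] = {}

    def command_received(self, ox_id: int, lun: int, crn: int) -> None:
        self.by_ox_id[ox_id] = TaskRecord(ox_id, lun, crn)

    def match_ox_id_only(self, ox_id: int) -> Optional[TaskRecord]:
        # OX_ID-only matching: a reused OX_ID aliases the older task.
        return self.by_ox_id.get(ox_id)

    def match_with_lun_crn(self, ox_id: int, lun: int, crn: int) -> Optional[TaskRecord]:
        # Sketch of the proposed fix: the recovery request also carries LUN
        # and CRN, and the target accepts only an exact match.
        rec = self.by_ox_id.get(ox_id)
        if rec is None or (rec.lun, rec.crn) != (lun, crn):
            return None
        return rec


table = TargetTaskTable()
# First command (OX_ID 1, CRN 5) arrives and completes; its status frame is lost.
table.command_received(ox_id=1, lun=0, crn=5)
# The initiator reuses OX_ID 1 for a new command with CRN 6, but that FCP_CMND
# frame is dropped, so the target never records it.

print(table.match_ox_id_only(1))          # the old CRN 5 task: wrong command
print(table.match_with_lun_crn(1, 0, 6))  # None: the new command never arrived
print(table.match_with_lun_crn(1, 0, 5))  # recovery aimed at the first command
```

With LUN and CRN in the request, the target can also tell apart the two sequences of events Jim describes in the quoted message, because a REC for the never-received command and a REC for the first command would carry different CRNs.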

Does anyone see a simpler solution?

Best regards,
Dave Baldwin
Emulex Corporation

"Binford, Charles" wrote:

> *
> * From the fc reflector, posted by:
> * "Binford, Charles" <cbinford at lsil.com>
> *
> I agree with Jim that immediate OX_ID reuse by the initiator is bad.
> However, as Dave said in his original posting, the OX_ID may have been
> reused for a wide variety of reasons. I'd support the 'must use n+1
> OX_IDs' solution, but let me throw out another possibility in case
> others object to the OX_ID restriction.
>
> If my memory serves, SRR is FCP specific (i.e. an FC-4 link service, not
> a generic ELS). As such, it is reasonable to put FCP-specific hooks in.
> The root problem is incorrect identification of the SCSI task on whose
> behalf data is to be retransmitted. It is being misinterpreted because
> of an alias of the task tag (i.e. the OX_ID). We could add the Command
> Reference Number (CRN) and LUN to the SRR request payload to avoid the
> alias problem. Of course this has two drawbacks:
> - requires use of CRN (otherwise not needed if single-threaded I/O)
> - changes the payload of SRR, which has been stable for quite a while.
>
> Charles Binford
> LSI Logic Storage Systems
> (316) 636-8566
>
> -----Original Message-----
> From: Jim McGrath [mailto:Jim.McGrath at quantum.com]
> Sent: Wednesday, June 14, 2000 10:41 AM
> To: 'Baldwin, Dave'; Fibre Reflector; T10 Reflector
> Cc: Robert Snively (Brocade)
> Subject: RE: FCP-2 problem
>
> * From the T10 Reflector (t10 at t10.org), posted by:
> * Jim McGrath <Jim.McGrath at quantum.com>
> *
>
> I don't think your solution works. Specifically, how does the target
> know in this example that the initiator has reused the OX_ID, since the
> target never received the frame containing the associated command? As
> far as it knows, the REC ELS (which the initiator associates with the
> second, never-received command) is still associated with the first
> command.
>
> Conversely, suppose the initiator sends the first command, and the target
> completes it and sends back status (which is then dropped, so the
> initiator never sees it), but then, rather than sending the second
> command followed by a REC ELS, the initiator just sends a REC ELS? The
> target cannot tell the difference between this and the first sequence
> of events.
>
> Maybe my concern is ruled out by another aspect of the error recovery
> protocol, or will be handled OK at a higher level of error recovery. It
> just seems dangerous to make assumptions when frames are being dropped
> and the recovery situation is therefore very complicated.
>
> This is why I always advocate simple, brute-force error recovery
> whenever possible. Specifically, immediate reuse of an OX_ID appears to
> be a very bad policy to follow. Normally a window of values is
> established precisely to avoid these sorts of problems. So another
> solution is to make sure you do not reuse an OX_ID until you are really
> sure that this sort of problem cannot occur. In this case, using two
> values in alternation would work OK. In general, use N+1 values if you
> want N commands outstanding at a time (and in practice I, being
> paranoid, would use N+m, where m > 1, to add some extra margin for
> strange situations I am too stupid to foresee).
>
> Jim

*
* For T10 Reflector information, send a message with
* 'info t10' (no quotes) in the message body to majordomo at t10.org
