FCP-2 problem

Jim McGrath Jim.McGrath at quantum.com
Wed Jun 14 18:13:43 PDT 2000


* From the T10 Reflector (t10 at t10.org), posted by:
* Jim McGrath <Jim.McGrath at quantum.com>
*

Actually, if you are doing 30K IO/s and only have 64K OX_IDs, then you have
a much bigger problem.

Anyone up for 32 bit OX_IDs?

Jim


-----Original Message-----
From: Baldwin, Dave [mailto:Dave.Baldwin at emulex.com]
Sent: Wednesday, June 14, 2000 2:09 PM
To: Robert Snively
Cc: Jim McGrath; Fibre Reflector; T10 Reflector; Charles Binford (E-mail)
Subject: Re: FCP-2 problem


* From the T10 Reflector (t10 at t10.org), posted by:
* "Baldwin, Dave" <Dave.Baldwin at emulex.com>
*
Robert,

Just to make it perfectly clear, the solution you listed under my name is
NOT a proposed solution to my first issue! It is a secondary suggestion to
resolve some multi-LUN issues.

See my previous email for why FCP_CONF doesn't fix everything. Also, we are
sending more than 30,000 I/Os per second, so it is very easy to burn through
64k OX_IDs in RR_TOV!
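
The arithmetic behind that claim can be checked with a short sketch. The
30,000 I/Os per second figure is from this message; treating "64k" as the
full 16-bit space of 65,536 values (ignoring any reserved values) and the
3-second hold window used in the last line are assumptions for
illustration only, not values taken from FCP-2.

```python
import math

IO_RATE = 30_000  # I/Os per second, the figure quoted in this message


def wrap_time_seconds(id_bits, ios_per_second):
    """Seconds until an ID space of 2**id_bits values is exhausted
    when one new ID is consumed per I/O."""
    return (2 ** id_bits) / ios_per_second


def ids_needed(ios_per_second, hold_seconds):
    """Distinct IDs needed so that no ID is reused within hold_seconds."""
    return math.ceil(ios_per_second * hold_seconds)


t16 = wrap_time_seconds(16, IO_RATE)  # 16-bit OX_ID, assumed 65,536 values
t32 = wrap_time_seconds(32, IO_RATE)  # the 32-bit OX_ID floated in the reply

print(f"16-bit OX_ID wraps after {t16:.2f} s")
print(f"32-bit OX_ID wraps after {t32 / 3600:.1f} h")

# Hypothetical hold window; substitute the RR_TOV actually in force.
print(ids_needed(IO_RATE, 3.0), "IDs needed for a 3 s window")
```

Under these assumptions a 16-bit space wraps in a little over two seconds,
so any hold time longer than that cannot be honored at this rate.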

Best regards,
Dave Baldwin

Robert Snively wrote:

> Proposed solutions:
>
> BALDWIN:
>
> >  I think a good solution is for the target to release all resources
> >  associated with the old command with OX_ID = n (which the target
> >  believes has been completed), when it gets a new OX_ID = n frame in
> >  with a new command (R_CTL = 6). The reuse of the OX_ID by the
> >  initiator is a confirmation that the old command has been completed.
> >  Since the target and initiator both think the old exchange is
> >  complete, this should be sufficient confirmation to get rid of the
> >  old information in the target.
>
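
A rough sketch of the target-side behavior described in the BALDWIN
proposal above. The class name, the state dictionary, and the choice to
raise an error when the old command is not yet complete are hypothetical;
only the rule "a new command frame (R_CTL = 6) on OX_ID = n releases the
completed old exchange on OX_ID = n" comes from the proposal.

```python
FCP_CMND_R_CTL = 0x06  # "R_CTL = 6" in the proposal: a new command frame


class TargetExchangeTable:
    """Hypothetical per-target table of retained exchange state."""

    def __init__(self):
        # Keyed by (S_ID, OX_ID), since OX_ID is qualified by the initiator.
        self._exchanges = {}

    def complete(self, s_id, ox_id):
        """Mark a command as completed; its state is retained for recovery."""
        self._exchanges[(s_id, ox_id)]["completed"] = True

    def on_frame(self, s_id, ox_id, r_ctl, payload):
        """Handle an incoming frame; return any old state that was released."""
        if r_ctl != FCP_CMND_R_CTL:
            return None  # only new command frames trigger the release
        key = (s_id, ox_id)
        old = self._exchanges.get(key)
        if old is not None and not old["completed"]:
            # Assumption: reuse of an OX_ID that is still open is an error.
            raise ValueError("OX_ID reused while old command is still open")
        # Reuse of the OX_ID confirms the old command is done on both sides.
        self._exchanges[key] = {"command": payload, "completed": False}
        return old


table = TargetExchangeTable()
table.on_frame(s_id=0x010203, ox_id=7, r_ctl=FCP_CMND_R_CTL, payload="cmd A")
table.complete(0x010203, 7)
released = table.on_frame(0x010203, 7, FCP_CMND_R_CTL, "cmd B")
print(released)
```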
> McGRATH:
>
> >
> >  This is why I always advocate a simple, brute force error recovery
> >  whenever possible.  Specifically, the immediate reuse of OX_ID
> >  appears to be a very bad policy to follow.  Normally a window of
> >  values is established precisely to avoid these sorts of problems.
> >  So another solution is to make sure you do not reuse an OX_ID until
> >  you are really sure that this sort of problem cannot occur.  In this
> >  case, using two values in alternation would work OK.  In general use
> >  N+1 values if you want N commands outstanding at a time (and in
> >  practice I, being paranoid, would use N+m, where m is >1 to add some
> >  extra margin for strange situations I am too stupid to foresee).
> >
>
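
The N+m window in the McGRATH proposal above could be sketched on the
initiator side as a pool that hands out the least recently released value
first. The class and method names are hypothetical, and the FIFO free
list is one possible way to implement the window, not something the
proposal specifies.

```python
from collections import deque


class OxidPool:
    """Hypothetical OX_ID allocator using N+m values for N outstanding
    commands, reusing the least recently released value first."""

    def __init__(self, max_outstanding, margin=1):
        if margin < 1:
            raise ValueError("margin must be at least 1 (the N+1 case)")
        self._max_outstanding = max_outstanding
        self._free = deque(range(max_outstanding + margin))
        self._in_use = set()

    def allocate(self):
        if len(self._in_use) >= self._max_outstanding:
            raise RuntimeError("too many commands outstanding")
        ox_id = self._free.popleft()  # oldest released value
        self._in_use.add(ox_id)
        return ox_id

    def release(self, ox_id):
        self._in_use.remove(ox_id)
        self._free.append(ox_id)  # goes to the back of the reuse queue


# One command outstanding at a time, two values: strict alternation.
pool = OxidPool(max_outstanding=1, margin=1)
sequence = []
for _ in range(4):
    ox_id = pool.allocate()
    sequence.append(ox_id)
    pool.release(ox_id)
print(sequence)  # [0, 1, 0, 1]
```

With this ordering a released value is handed out again only after at
least m other values have been allocated, which is the margin the
proposal asks for.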
> BINFORD:
>
> > If my memory serves, SRR is FCP specific (i.e. an FC-4 link service,
> > not a generic ELS).  As such, it is reasonable to put FCP specific
> > hooks in.  The root problem is incorrect identification of the SCSI
> > task to retransmit data on behalf of.  It is being misinterpreted
> > because of an alias of the task tag (i.e. the OX_ID).  We could add
> > Command Reference Number and LUN to the SRR request payload to avoid
> > the alias problem.  Of course this has two drawbacks:
>
> >   - requires use of CRN (otherwise not needed if single threaded I/O)
> >   - changes payload of SRR which has been stable for quite a while.
>
> AND NOW SNIVELY:
>
> While BS'ing about a similar problem, several things struck my
> mind:
>
>         This problem requires class 3 behavior (that is probably a
>         good idea anyway, because the complexity of class 2 behavior
>         during errors, including full recovery qualifier discarding,
>         is pretty significant)
>
>         It requires OX_ID re-use to be relatively frequent compared with
>         RR_TOV (probably a bad idea anyway, as shown by Jim.)
>         Fortunately, OX_ID is qualified by D_ID/S_ID, so it is not
>         a resource scarce in bits and extended periods between reuse are
>         easy to achieve.
>
>         This can really only occur on operations without data transfers.
>         Writes trade RX_IDs during XFER_RDY.  Reads trade RX_IDs during
>         read data transfer.  So it is really only in the no data case that
>         you can get a case where an operation was successfully completed
>         without getting hit by an REC to perform a recovery before
>         a new command reuses the OX_ID.
>
>         As a side issue, it becomes a bit trickier if you have lots
>         of logical units.  That increases the probability of
>         encountering a rapid enough turn-over of OX_IDs to
>         create a re-use before RR_TOV.
>
> With this all in mind, one possible solution is to require FCP_CONF
> on SCSI commands performing no data transfer in environments
> performing link-level recovery with rapid OX_ID turn-over.
>
>

*
* For T10 Reflector information, send a message with
* 'info t10' (no quotes) in the message body to majordomo at t10.org
*



