FCL Error Recovery Policy Recommendations
Binford, Charles
cbinford at ppdpost.ks.symbios.com
Wed Sep 24 13:39:00 PDT 1997
* From the T10 (formerly SCSI) Reflector (t10 at symbios.com), posted by:
* "Binford, Charles" <cbinford at ppdpost.ks.symbios.com>
*
See comments below.
Charles Binford
Symbios Logic
----------
>From: t10-owner
>To: disk_attach; scsi; fc; jmcgrath
>Subject: FCL Error Recovery Policy Recommendations
>Date: Wednesday, September 24, 1997 10:24AM
>
>------------------------------------------------------------------------------
>* From the T10 (formerly SCSI) Reflector (t10 at symbios.com), posted by:
>* jmcgrath at qntm.com (Jim Mcgrath)
>*
.... stuff deleted .....
>
> Second, we created a matrix for error recovery with two dimensions:
> the degree of threadedness between target and initiator
> (no queueing, so only one command outstanding at a time,
> and queueing, so multiple commands (and thus errors) are
> outstanding at a time), and
> the level of error recovery
> (sequence, which could mean data retransmission without
> respect to data sequence boundaries, and exchange, or
> simply retrying the command):
>
>
>
> Are IO's queued?   Sequence or Exchange    Comment
>                    Level Error recovery
>
> -----------------------------------------------------------------
> No                 Exchange                FC-AL today
> No                 Sequence                Tape today - Crossroads
>                                            class 3 proposal
> Yes                Exchange                Extend Crossroads and/or
>                                            Doug's class 2
> Yes                Sequence                NEVER DO
>
This chart may be a bit misleading to those who were not on the call.
Taken literally, one could come to the conclusion that the SSWG believes
there is no queuing in FC-AL today. That, of course, is not true. Here is
my interpretation of what we said plus some additional commentary:
    Are IO's queued?   Sequence or Exchange    Comment
                       Level Error recovery
-----------------------------------------------------------------
1)  No                 Exchange                Simple case of # 3) below
2)  No                 Sequence                Tape today - Crossroads
                                               class 3 proposal
3)  Yes                Exchange                Early Error Detect possible
4)  Yes                Sequence                NEVER DO
General Commentary:
(sorry if I got a little long winded, couldn't help myself :-) )
The backdrop for this entire discussion (at least for me) is developing a
mechanism to detect and recover from link-related errors before a ULP timer
expires. This issue was brought into focus by the tape guys because of the
length of the ULP timers required for tapes (minutes in some cases). In
addition to link error detection before the ULP timer expires, tape has an
additional need for sequence level recovery. A simple retry of the IO
(exchange) doesn't work with a sequential model. The Crossroads proposal is
aimed at solving both of these problems. It works because tapes are (today)
limited to a queue depth of 1.
At various times during the discussion of the Crossroads proposal, attempts
have been made to generalize it so us disk and RAID guys could take
advantage of it also. Even though our ULP timeouts are generally measured
in seconds instead of minutes, the need for link error detection and
recovery before a ULP timeout is still there (at least for some). However,
the Crossroads proposal just didn't scale to a generalized queued
environment. What came out of the SSWG call was that the reason the
Crossroads proposal didn't scale to queued IO lies in the sequence level
recovery. With sequence level recovery, targets are required to hold on to
exchange information AFTER completion of the IO in case the FCP_RSP was lost.
Holding on to exchange info for 2xR_A_TOV is not possible for a disk/RAID
device doing 5,000 to 10,000 IOs/sec. But if disk/RAID devices are going
to do exchange level recovery anyway, then holding on to the completed
exchange info IS NOT NECESSARY. For disk/RAID, "did the FCP_CMD or the
FCP_RSP get lost?" is a don't care. Just re-issue the IO.
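
To put a rough number on why holding completed exchange info doesn't scale,
here is a back-of-the-envelope sketch. The R_A_TOV value used is an
assumption picked purely for illustration (the note above does not give
one), and the function name is made up for this example.

```python
def retained_exchanges(ios_per_sec: int, r_a_tov_sec: float) -> int:
    """Completed exchange contexts a target would be holding at any moment
    if each one must be kept for 2 x R_A_TOV after the IO completes."""
    return int(ios_per_sec * 2 * r_a_tov_sec)


# ASSUMPTION: 10 seconds is an illustrative R_A_TOV, not a value from the note.
R_A_TOV_SEC = 10.0

for rate in (5_000, 10_000):
    print(f"{rate} IOs/sec -> {retained_exchanges(rate, R_A_TOV_SEC)} exchanges held")
```

Under that assumed timeout, the target would be carrying state for on the
order of a hundred thousand finished exchanges, all of which exchange level
recovery lets it throw away.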
Therefore, # 3) above is suggesting an error detection mechanism similar to
the Crossroads proposal, but with the error recovery side the same as in PLDA.
In other words, disk/RAID devices could implement the new REC link service.
If a host adapter/driver wanted to, it could time exchanges at an interval
which would allow link error detection and recovery before a ULP timeout.
If this new, shorter timer expired, the host would issue the REC. If the
drive answers "yes, I've still got the exchange", then wait. If the answer
is "don't know about that one", then ABTS and re-issue the IO as in PLDA.
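
The host-side decision described above could be sketched roughly as follows.
This is only an illustration of the control flow, not a driver
implementation: RecReply, on_detect_timer_expired, and the three callbacks
(send_rec, send_abts, reissue_io) are hypothetical names standing in for
whatever the host adapter/driver actually provides.

```python
from enum import Enum


class RecReply(Enum):
    """Hypothetical summary of the target's answer to a REC."""
    EXCHANGE_KNOWN = 1    # "yes, I've still got the exchange"
    EXCHANGE_UNKNOWN = 2  # "don't know about that one"


def on_detect_timer_expired(exchange_id, send_rec, send_abts, reissue_io):
    """Run when the short error-detect timer (shorter than the ULP timer)
    expires for an outstanding exchange.

    send_rec, send_abts and reissue_io are caller-supplied callables;
    they are assumptions made for this sketch.
    """
    reply = send_rec(exchange_id)
    if reply is RecReply.EXCHANGE_KNOWN:
        # Target is still working on the IO: nothing was lost, keep waiting.
        return "wait"
    # Target has no record of the exchange: abort it and retry the whole IO
    # (exchange level recovery, as in PLDA).
    send_abts(exchange_id)
    reissue_io(exchange_id)
    return "reissued"
```

Note that the host never needs to know whether it was the FCP_CMD or the
FCP_RSP that got lost; either way the answer is "unknown" and the IO is
simply re-issued.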