T10/97-184R2.TXT

Douglas Hagerman Hagerman at mail.dec.com
Mon Jul 28 21:31:45 PDT 1997


* From the T10 (formerly SCSI) Reflector (t10 at symbios.com), posted by:
* Douglas Hagerman <Hagerman at mail.dec.com>
*


Use of Class 2 for Fibre Channel Tapes              T10/97-184R2.TXT

970728

This paper describes the use of the Fibre Channel Class 2 protocol
when communicating to a tape device that implements the SCSI-3
Streaming Commands (SSC) device model. Some preliminary discussion
on this topic is in document T10/97-155R3.TXT.

This version incorporates a number of comments by Charles Binford
and Bill Martin, and some ideas stolen from Brian Smith. I did not
actually make it all the way through Charles's comments, and hope
to issue another revision in a few days. Hopefully this will suffice
for the Error Detection teleconference on July 29.

1. Scope

1.1. The basic proposal here is to "use Class 2 for tapes".

1.2. The protocol is intended to work using the FC-PH Class 2 behavior
as it applies to switched Fibre Channel, FC-AL, and FCL environments.
This has numerous implications such as, for example, the possibility
of out-of-order delivery of frames within a sequence.

1.3. While this discussion is narrowly targeted at SSC devices, it is
believed that the protocol described here would work for all SCSI-3
device models.

2. Other possibilities

Refer to 97-189Rx.TXT (the Crossroads proposal) for discussion of a
protocol for the use of Class 3 for tapes. This Class 3 proposal
is to poll the target device at certain times to determine the
status of a transfer.

3. FC/FCP/SCSI Reminder

According to the FCP mapping of SCSI to Fibre Channel, each
SCSI command is completely processed within a single Fibre
Channel exchange. If command sequentiality is required, it is to be
managed entirely by the initiator.

Throughout this discussion it is held as a fundamental assumption
that when a SCSI command, contained within the bounds of an exchange,
is issued by the initiator every possible attempt shall be made
to successfully transfer that command, and its associated data and
return status, between the initiator and the target.

Proposals that require commands to be retried at the ULP level, except
in the most severe cases, are not under discussion.

4. Proposal

4.1. Class 2 Concepts and Rules

- Use Class 2 ACK 0 model. This is one ACK per sequence.
- Use E_D_TOV to detect most errors.
- Use a specified large number "n" of retry sequences; n is currently TBD.
- Rely on ULP timeout for detection of certain remaining errors
(mostly target device failures).
- Use existing FCP Information Units (IU) and FC-PH features.
- If E_D_TOV expires before a given sequence is acknowledged, retransmit
the entire IU in a new sequence using new sequence ID and counts.
- Follow all existing Class 2 rules regarding the use of ABTS and RRQ.
- Use the same rules whether on a loop or a fabric.

The basic rules are as follows. It is intended that these not
conflict with the normal use of Class 2 as described in FC-PH.

a.) For each exchange, the exchange initiator starts a ULP timer
using a value defined by the SCSI command timeout for the given command.
If the timer expires before the SCSI status is successfully
returned in the FCP_RSP IU, then the exchange and SCSI command have
failed and this is reported to the user's program.

The ULP timer is primarily intended for detection of a failed
target device, and is not used in the error recovery process.

b.) For each sequence within the exchange, the sender starts a
sequence timer with a value of E_D_TOV (as defined on FC-PH page 261)
upon the transmission of the last frame of the sequence. If
the sequence ACK is not received by the time the timer expires, then
[the following is new] the initiator must attempt to determine whether
   i.) the FCP_CMD was lost, or
   ii.) the FCP_CMD was delivered but the ACK and FCP_RSP were lost,
        and the target thinks that the command completed successfully.
At this point the initiator sends an RES (Read Exchange Status) ELS
in a new sequence in the same exchange (passing SI to the recipient).
The reply indicates which case it is: LS_RJT implies i.), ACC implies ii.).
If the RES sequence fails (by E_D_TOV timeout), the initiator resends
the RES up to n times before terminating the exchange.

After the RES is finished, the initiator performs the ABTS process.
In case i.) the FCP_CMD is resent in a new sequence (which will be
the first sequence in the exchange). In case ii.) the initiator
can either [best approach tbd] inform the ULP that the command
was executed but that the results are unknown, or it can try
to get the FCP_RSP info again using an unknown method.
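
The RES decision above can be sketched as a small lookup. This is
only an illustration: the names ResReply, Recovery, and
classify_res_reply are hypothetical, and for case ii.) the sketch
assumes the "inform the ULP" option, which the text still marks as tbd.

```python
from enum import Enum


class ResReply(Enum):
    LS_RJT = "LS_RJT"  # target has no record of the exchange
    ACC = "ACC"        # target knows the exchange


class Recovery(Enum):
    RESEND_FCP_CMD = "case i: resend FCP_CMD in a new sequence"
    REPORT_RESULT_UNKNOWN = "case ii: tell the ULP the result is unknown"


def classify_res_reply(reply: ResReply) -> Recovery:
    """Map the RES reply to the action taken after the ABTS process."""
    if reply is ResReply.LS_RJT:
        # Case i.): the FCP_CMD never reached the target.
        return Recovery.RESEND_FCP_CMD
    # Case ii.): the command was delivered, ACK and FCP_RSP were lost.
    # Assumption: report to the ULP (the alternative is still tbd).
    return Recovery.REPORT_RESULT_UNKNOWN
```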

After the ABTS process completes, if the initiator has determined
that the exchange is still intact, then it sends the same IU in a
new sequence. [clarification needed because this isn't one of the
above cases]

The process (send-wait-resend) is repeated n times, where n is a number
that has yet to be determined but will be a fixed feature of
the protocol. The value of n is the same for both the initiator
and the target.

(Previously it was suggested to use the ULP timer for this purpose,
but there are two difficulties with this:

	- The target doesn't know the value of the ULP timer, and
	- Resending the same sequence for 10 minutes seems a bit excessive.)

The same rule is followed by the initiator and the target. Whoever
starts a sequence applies the E_D_TOV timeout test. Context for
each sequence is held until the timer expires or the ACK is received.

c.) If the target is acting as the Sequence Initiator and it is
unable to successfully send the sequence and get the associated ACK,
then the sequence timeout (also E_D_TOV) will cause the sequence
to be aborted.
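
The send-wait-resend rule in b.) can be sketched as a bounded loop.
This is a minimal sketch, not a definitive implementation. The
callbacks send_sequence and abort_sequence are hypothetical stand-ins
for the port hardware, the value 3 is only a placeholder for the
still-TBD "n", and "n" is assumed here to count total attempts.

```python
from typing import Callable, Optional

E_D_TOV_SECONDS = 2.0  # value quoted from profiles in 4.2.z
MAX_ATTEMPTS = 3       # placeholder only: "n" is TBD in this proposal


def send_iu_with_retry(send_sequence: Callable[[int], bool],
                       abort_sequence: Callable[[int], None],
                       first_seq_id: int = 1,
                       max_attempts: int = MAX_ATTEMPTS) -> Optional[int]:
    """Send one IU, retrying in a new sequence after each timeout.

    send_sequence(seq_id) must return True if the sequence ACK arrived
    within E_D_TOV and False on a sequence timeout.
    abort_sequence(seq_id) stands for the ABTS process.
    Returns the SEQ_ID that was acknowledged, or None if all attempts
    failed.
    """
    seq_id = first_seq_id
    for _ in range(max_attempts):
        if send_sequence(seq_id):
            return seq_id
        # Sequence timeout: abort the failed sequence, then reuse the
        # same IU contents under a new sequence ID.
        abort_sequence(seq_id)
        seq_id += 1  # simplification: ignores the 1-byte SEQ_ID wrap
    return None
```

The same function would serve the target when it is the Sequence
Initiator, since rule b.) applies to whoever starts the sequence.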

4.2. Notes on Rules

4.2.x. Use of Relative Offset

This proposal uses the offset information in each frame to define
where data is to be stored in the recipient's buffer. Out-of-order
delivery of frames in a sequence and sequences in an exchange is
supported. In case of an error, the entire sequence containing
the error is discarded, and the data is retransmitted in a new
sequence. There is no frame-level recovery. (Although a sequence
may in some cases only contain one frame.)

4.2.y. The transmission of a single SCSI command (in one exchange)
is performed using the following four steps:

a.) Normal exchange, sequence, and frame delivery, including
correct placement of data into the receiver's buffer.
b.) Detection of errors by missing ACKs or other methods.
c.) Run-down and clean-up of the remains of any failed exchanges,
sequences, and/or frames. This includes the use of ABTS and RRQ
processes.
d.) Retransmission of the SCSI command or data in a new sequence.

A goal of the comparison between this Class 2 proposal and the Class 3
proposal is to attempt to make steps c.) and d.) as similar as possible.

4.2.z. Timers

Review FC-PH 29.2. The E_D_TOV timer is for error detection, currently
said to be 2 seconds in various profiles.

The R_A_TOV timer is for "how long a frame may be held in a fabric".
Currently 120 seconds.

4.2.1. Discard Policy

We have not talked about this, but I think the policy of interest
is "discard a single sequence" (see FC-PH 29.6.1) since that allows
full out-of-order delivery at both the frame and sequence level.

4.2.2. Sequence initiative (SI)

The holder of SI is the sequence initiator. If a node holds SI and
receives a frame for that exchange, it transmits a P_RJT.

SI is considered accepted by the sequence recipient when the ACK for
the sequence is sent. Recovery of SI is the primary reason for
the ABTS requirement. See FC-PH 24.6.4.

This becomes an issue if a sequence that transfers SI is lost and
as a result neither side thinks it has SI.

4.2.3. Sequence Error Detection

Sequence timeout (FC-PH 29.2.4) is the basic Class 2 error detection
method. The sequence timer is started with a value of E_D_TOV, and
if no ACK is received for the sequence within that time, a sequence
timeout has occurred.

If a sequence timeout is detected by the sequence initiator, then
it performs the ABTS protocol. The first step in this protocol is
to send an ABTS Basic Link Services frame.

The sequence recipient also runs an E_D_TOV timer, and if all frames for
a sequence have not been received within E_D_TOV (if they arrive out
of order it's ok), a sequence timeout has occurred.

If a sequence timeout is detected by the sequence recipient, then
it performs the abnormal sequence termination protocol (FC-PH 29.7.1).
The first step in this process is to return an ACK with the Abort
Sequence Condition bits set.
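
The recipient-side check can be sketched as follows. This is an
illustration under stated assumptions: the function name and the
three status strings are hypothetical, and the recipient is assumed
to learn the last SEQ_CNT only when the frame marked as last arrives.

```python
from typing import Iterable, Optional

E_D_TOV_SECONDS = 2.0  # value quoted from profiles in 4.2.z


def recipient_sequence_status(received_cnts: Iterable[int],
                              first_cnt: int,
                              last_cnt: Optional[int],
                              elapsed_s: float,
                              e_d_tov_s: float = E_D_TOV_SECONDS) -> str:
    """Classify a sequence as seen by the sequence recipient.

    Returns "complete" when every SEQ_CNT from first_cnt to last_cnt
    has arrived (arrival order does not matter), "abort" when E_D_TOV
    has elapsed with frames still missing (the recipient would then
    return an ACK with the Abort Sequence Condition bits set), and
    "waiting" otherwise.
    """
    got = set(received_cnts)
    if last_cnt is not None:
        expected = set(range(first_cnt, last_cnt + 1))
        if expected <= got:
            return "complete"
    return "abort" if elapsed_s >= e_d_tov_s else "waiting"
```

With frames 1, 3, and 4 of a four-frame sequence received, the result
is "waiting" until E_D_TOV has elapsed and "abort" afterwards.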

4.2.4. ABTS process

This is used if the failure is detected by the initiator. See below
for the case where the failure is detected by the recipient.

ABTS may be sent without holding SI (FC-PH 21.2.2).

The sequence initiator sends ABTS, then starts an E_D_TOV timer.
The ABTS frame is considered part of the aborted sequence,
but runs under its own timer, since the sequence timer has
already expired (that's what got us here). The ABTS condition
is indicated by a bit in the F_CTL field of a frame.

After the ABTS is transmitted, the sequence is in an indeterminate
state.

At this point the recipient returns the Basic Accept (BA_ACC, FC-PH 21.2.2).
The BA_ACC payload includes the Recovery Qualifier, a data structure
that allows the initiator and recipient to synchronize their understanding
of the status of the sequence. This is done by providing a list
of SEQ_IDs that are non-deliverable, so that the initiator can be
sure to not send that sequence ID again.

In perhaps the most typical case, a single frame in a multi-frame
sequence will have vanished for some reason. After the timeout,
the initiator sends ABTS in a frame appended to the end of the sequence.
[Even though it has already sent the last frame of the sequence, which
is so marked.] Since at the recipient's end the frame never arrived,
the ABTS indicates a sequence to be aborted. [Note that the recipient
might time out the sequence before it receives the ABTS.] The SEQ_ID
is known. The high SEQ_CNT in the Recovery Qualifier range is
the SEQ_CNT of the ABTS frame, while the low SEQ_CNT of the Recovery
Qualifier is that of the first frame in the sequence (probably 1).

Thus by FC-PH 21.2.2.1 "a sequence is in error" (page 135--"ABTS
Recipient"), so a recovery range "shall be established for both N_Ports".

Assuming that the BA_ACC makes it back to the initiator, it then
knows that the sequence has been successfully aborted, and it may
proceed to send the same IU in a new sequence. After E_D_TOV expires
at the recipient end, the recipient may discard its context for that
aborted sequence and proceed.

4.2.5. What if the recipient has never heard of this sequence?

If the first frame of the sequence failed, the recipient will have
no information about the sequence. In this case, when it gets the
ABTS it returns a BA_RJT instead of a BA_ACC.

Regardless of whether it's a BA_RJT or a BA_ACC, when the response
to the ABTS arrives then the initiator knows it's ok to send a
new sequence.

4.2.6. What if the ABTS fails?

If, after sending ABTS, the E_D_TOV timer expires at the initiator before
a BA_ACC or BA_RJT is received, then the initiator sends another ABTS.

4.2.7. What if the BA_ACC fails?

From the initiator's viewpoint it's the same as above.

4.2.8. What if the BA_RJT fails?

From the initiator's viewpoint it's the same as above.

4.2.9. RRQ process

The recovery qualifier is a data structure provided by the recipient
to the initiator in the BA_ACC to the ABTS. (FC-PH 29.7.1.1) It describes
a completely qualified exchange and sequence and a range of frames.

The RRQ process is required because the fabric is allowed to hold
frames for up to R_A_TOV time (120 seconds). Thus if an ACK is lost
(i.e. not delivered within E_D_TOV) the goal is to construct a range
of frames that are declared invalid for R_A_TOV time.

For example, if the RRQ process were not used, if a frame were to
suddenly pop out of the fabric with a sequence ID unknown to the
recipient, should the recipient think that this is a mid-sequence
frame for a sequence whose first frame has not yet arrived? If so
it would hold on to it.

Using RRQ, the Recovery Qualifier allows invalid frames to be made
known to the recipient in advance.
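
A rough sketch of how a recipient might apply a Recovery Qualifier to
a late frame is shown below. The field names (ox_id, rx_id,
low_seq_cnt, high_seq_cnt, established_at) are assumptions chosen for
illustration, and the sketch lets the qualifier lapse by elapsed time
alone rather than modelling the RRQ exchange that actually retires it.

```python
from dataclasses import dataclass

R_A_TOV_SECONDS = 120.0  # value quoted in 4.2.z


@dataclass(frozen=True)
class RecoveryQualifier:
    ox_id: int             # originator exchange ID (assumed field)
    rx_id: int             # responder exchange ID (assumed field)
    low_seq_cnt: int       # first frame count in the invalid range
    high_seq_cnt: int      # last frame count in the invalid range
    established_at: float  # seconds, on the same clock as "now"

    def is_active(self, now: float) -> bool:
        """The range stays invalid for R_A_TOV after it is set up."""
        return now - self.established_at < R_A_TOV_SECONDS

    def covers(self, ox_id: int, rx_id: int, seq_cnt: int,
               now: float) -> bool:
        """True if a frame with these header values must be discarded."""
        return (self.is_active(now)
                and ox_id == self.ox_id
                and rx_id == self.rx_id
                and self.low_seq_cnt <= seq_cnt <= self.high_seq_cnt)
```

A frame that pops out of the fabric 10 seconds after the abort with a
count inside the range is discarded, while the same frame arriving
after R_A_TOV would no longer match.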

After R_A_TOV expires, the initiator issues an RRQ Link Service
(FC-PH 21.4.14) in a separate exchange. If the RRQ exchange fails,
a nested ABTS and new subsequent RRQ may be used, up to the nesting
capability of the hardware. [Which is determined how?]

4.2.10. Abnormal Sequence Termination

This is used if the failure is detected by the recipient. See 4.2.4
above for the case where the failure is detected by the initiator.

[more needed here]

4.2.11. R_RDY to Precede Every Expected ACK

It is implied in FC-PH, but not clearly stated (FC-PH 16.3.2, 20.2.1,
20.3.1, table 56) that an R_RDY must be transmitted before each
sequence for which an ACK is expected. This is to ensure that a buffer
is available for the ACK frame. This requirement implies that, since
the time until the ACK is returned is unpredictable, the sequence
initiator must have an available ACK buffer before sending
a sequence for which an ACK is expected.

4.2.12. Use of Relative Offset

This is intended to be a straightforward reading of FC-PH.

Sequences and frames are normally transmitted in sequential order
by the sequence initiator. The fabric may reorder both. Thus
the recipient must reassemble frames into sequences and must deliver
complete sequences of data to the upper layer in the proper order.
This is achieved by the use of Relative Offset (FC-PH 18.11, 27, 24).

(Sequence count (i.e. "frame id within sequence", FC-PH 18.8) could
also be used to define the storage location, but is not in this
proposal.)

(Sequence ID (i.e. "location of sequence within this exchange",
FC-PH 18.6) cannot be used to define the storage location because if
a sequence fails, the same data will be sent in a new sequence
with a new sequence ID.)

(Sequence Count:  2 Bytes
 Relative Offset: 4 Bytes
 Sequence ID:     1 Byte)

Relative Offset is defined with respect to the beginning of the
data to be transmitted in the SCSI command (i.e. Exchange), in
terms of Bytes. This rule defines the Relative Offset Space
(FC-PH 27.2). The use of this rule is indicated by word 2, bit 3 = 1
in the F_CTL field (FC-PH 18.5, table 37). Continuously Increasing
relative offset shall be used (FC-PH 27.6).

The sequence recipient shall present data to the ULP only
after a sequence has been completely received. If frames arrive
out of order, the contents of the frames are stored in the recipient's
buffer until the sequence is complete. Before data is presented
to the ULP, all previous data (as indicated by completeness of
the relative offset values) shall have been presented to the ULP.
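
The placement and delivery rules above can be sketched as follows.
This is a minimal sketch under stated assumptions: the class and
method names are hypothetical, the caller is assumed to signal when a
sequence is complete or aborted, and frames are assumed to tile the
exchange data without gaps or overlaps.

```python
class ExchangeReassembler:
    """Place frame payloads by Relative Offset; deliver in order."""

    def __init__(self, total_len: int) -> None:
        self.buffer = bytearray(total_len)
        self.extents = {}  # SEQ_ID -> list of (offset, length) stored
        self.ready = {}    # offset -> length, from complete sequences
        self.delivered_upto = 0

    def on_frame(self, seq_id: int, rel_offset: int,
                 payload: bytes) -> None:
        # Relative Offset alone decides where the data goes, so frames
        # may arrive in any order.
        self.buffer[rel_offset:rel_offset + len(payload)] = payload
        self.extents.setdefault(seq_id, []).append(
            (rel_offset, len(payload)))

    def on_sequence_aborted(self, seq_id: int) -> None:
        # Discard the whole sequence; the data will come again under a
        # new SEQ_ID and overwrite whatever was stored.
        self.extents.pop(seq_id, None)

    def on_sequence_complete(self, seq_id: int) -> bytes:
        """Return the bytes that may now be presented to the ULP."""
        for offset, length in self.extents.pop(seq_id, []):
            self.ready[offset] = length
        out = bytearray()
        # Present data only when all previous data has been presented.
        while self.delivered_upto in self.ready:
            length = self.ready.pop(self.delivered_upto)
            start = self.delivered_upto
            out += self.buffer[start:start + length]
            self.delivered_upto += length
        return bytes(out)
```

If a later sequence completes before an earlier one has been
retransmitted, on_sequence_complete returns nothing until the gap at
the lower offsets is filled.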





4.3. Examples

The following examples show the case of processing an error
frame that occurs during the transfer of an FCP_DATA sequence.
Errors that occur during the many other frames and sequences are
discussed in the notes following the examples.

============================================

Regardless of whether the command is a READ or a WRITE, there are
really only about three cases that need to be handled:

	- first sequence in an exchange, including exchange initiation
	- mid-exchange sequences going in either direction
	- last sequence in an exchange, including exchange completion

The next version of this document will cover these three cases
without specifying whether a given sequence is for READ data or
WRITE data, since they are really the same when considered at the
Fibre Channel level.

[not so sure about this any more]

4.3.1. First Sequence in an Exchange

In this case, in addition to the normal sequence processing there
is some overhead related to starting up the exchange. However, the
procedure for trying to get the sequence to the recipient is the same
as for the mid-exchange case. 

4.3.2. Mid-Exchange Sequences

This is the "normal" case.

4.3.3. Last Sequence in an Exchange

In this case the primary issue is running down the exchange and
releasing various resources.

The specific difficulty is at the target end, because the target
must at some point decide to discard the context for the current
exchange. This is done when the ACK for the last sequence is
received at the target. The last sequence in every exchange must
be from the target to the initiator so that the last event in
the exchange is the return of the ACK to the target. This is currently
the case with FCP.

If the ACK does not make it back to the target, then the target
will (after performing the ABTS protocol) attempt to send the data
again in a new sequence. This process is to be repeated for a number
of times--but that number of times is not currently defined. The
amount of time spent on an exchange before discarding it is related
to the characteristics of the errors on the interconnect. 

============================================

4.4. Mappings of FCP to Class 2

This section shows how FCP is used with Class 2.

4.4.1. WRITE

Example: Transfer of 2 data sequences, each
containing 4 frames of data. Target indicates its ability to accept
data by use of FCP_XFR_RDY IUs.

Initiator          Target
--------------------------------------------

FCP_CMD ---------->

        <----------  ACK
                     Receipt of ACK by initiator indicates FCP_CMD is ok.
                     (FCP_CMD is a single-frame sequence.)

                     A long period of time may be required here
                     for the target to find space for the data or to
                     do some preliminary media positioning.

        <----------  FCP_XFR_RDY
                     Target indicates readiness to accept one
                     sequence of data
ACK     ---------->

        ---------->  DATA sequence ID = 1, CNT = 1 (frame 1)
        ----->X      DATA sequence ID = 1, CNT = 2 (frame 2)
                     Error occurs on interconnect at "X". The frame is lost.

        ---------->  DATA sequence ID = 1, CNT = 3 (frame 3)
        ---------->  DATA sequence ID = 1, CNT = 4 (frame 4)

Initiator has sent all the data frames,
so it starts an E_D_TOV timer for this sequence.

                     Target does not get all the frames, so it doesn't
                     send an ACK. [Does it experience a sequence
                     timeout? Probably E_D_TOV.]

Initiator's timer expires. ACK not received,
so the sequence has failed.
Initiator sends ABTS (with LS bit set) to make sure that this
sequence is aborted. [Required by FC-PH but of no apparent value.]
Initiator retransmits the data in a new sequence.

[Note requirement for support of non-ascending transfers.]

        ---------->  DATA sequence ID = 2, CNT = 1 (frame 1)
        ---------->  DATA sequence ID = 2, CNT = 2 (frame 2)
        ---------->  DATA sequence ID = 2, CNT = 3 (frame 3)
        ---------->  DATA sequence ID = 2, CNT = 4 (frame 4)

        <----------  ACK

Receipt of ACK by initiator indicates that the sequence is ok.
Initiator may discard context for this sequence.

                     If no additional data for this [mumble] is
                     received by E_D_TOV after sending the ACK,
                     the target may discard context for this sequence.

                     When the tape is ready to receive additional
                     data, it sends another FCP_XFR_RDY.

[The wording of this example is not meant to imply that streaming is not allowed,
but we must verify that streaming works properly.]

        <----------  FCP_XFR_RDY
                     Target indicates readiness to accept one
                     sequence of data
ACK     ---------->

        ---------->  DATA sequence ID = 3, CNT = 1 (frame 1)
        ---------->  DATA sequence ID = 3, CNT = 2 (frame 2)
        ---------->  DATA sequence ID = 3, CNT = 3 (frame 3)
        ---------->  DATA sequence ID = 3, CNT = 4 (frame 4)

        <----------  ACK

                     Receipt of ACK indicates that the sequence is ok.
                     Proceed to next sequence.

                     Target observes that it has enough data to
                     satisfy the requirements of the SCSI WRITE command,
                     so it sends the SCSI status back.

        <----------  FCP_RSP
                     With SCSI Status.
ACK     ---------->
                     Target closes exchange and deletes command context.

Initiator waits for E_D_TOV after sending ACK to make sure no more
sequences are coming. (This would be a resend of the FCP_RSP if the
last ACK had been lost on its way to the target.) After timeout expires,
Initiator discards context for this exchange.

============================================

4.4.2. READ

Example: Transfer of 2 data sequences, each containing 4 frames
of data. Assume the host can accept all the data specified in the command.

Initiator          Target
--------------------------------------------

FCP_CMD ---------->
        <----------  ACK
                     Receipt of ACK by initiator indicates FCP_CMD is ok.
                     (FCP_CMD is a single-frame sequence.)

                     A long period of time may be required here
                     for the target to get the data from the media.

        <----------  DATA sequence ID = 1, CNT = 1 (frame 1)

             X<----  DATA sequence ID = 1, CNT = 2 (frame 2)
                     Error occurs on interconnect at "X". The frame is lost.

        <----------  DATA sequence ID = 1, CNT = 3 (frame 3)
        <----------  DATA sequence ID = 1, CNT = 4 (frame 4)

                     Target has sent all the data frames,
                     so it starts an E_D_TOV timer for this sequence.

Initiator does not get all the frames, so it doesn't send an ACK.
[Does it experience a sequence timeout? Probably E_D_TOV.]

                     Target's timer expires. ACK not received,
                     so the sequence has failed. Retransmit sequence.

[Does frame header give enough info to allow Initiator to put
the re-sent data in the right place?]
[Note requirement for support of non-ascending transfers.]

        <----------  DATA sequence ID = 2, CNT = 1 (frame 1)
        <----------  DATA sequence ID = 2, CNT = 2 (frame 2)
        <----------  DATA sequence ID = 2, CNT = 3 (frame 3)
        <----------  DATA sequence ID = 2, CNT = 4 (frame 4)

ACK     ---------->
                     Receipt of ACK indicates that the sequence is ok.
                     Proceed to next sequence. [Not meant to imply
                     that streaming is not allowed.]

        <----------  DATA sequence ID = 3, CNT = 1 (frame 1)
        <----------  DATA sequence ID = 3, CNT = 2 (frame 2)
        <----------  DATA sequence ID = 3, CNT = 3 (frame 3)
        <----------  DATA sequence ID = 3, CNT = 4 (frame 4)

ACK     ---------->
                     Receipt of ACK indicates that the sequence is ok.
                     Proceed to next sequence.

        <----------  FCP_RSP
                     With SCSI Status.
ACK     ---------->
                     Target closes exchange and deletes command context.

Initiator waits for E_D_TOV after sending ACK to make sure no more
sequences are coming. (This would be a resend of the FCP_RSP if the
last ACK had been lost on its way to the target.) After timeout expires,
Initiator discards context for this exchange.

============================================

4.5. Notes on Examples

Any frame in the exchange may fail. The following lists the
handling of each possible case. This is intended to follow
the FC-PH protocol exactly.

4.5.1. Loss of FCP_CMD single-frame sequence. In this case the
target never sees the command. When initiator's E_D_TOV timer
expires it first sends an ABTS with the LS bit set. This has
no effect on the target (which simply ignores this attempt to
abort a sequence in an exchange it has never heard of).
Then the initiator resends the FCP_CMD IU using the same information
and the same exchange identifier in a new sequence. 

[See FC-PH 29.7.1.1. for discussion of ABTS at start of exchange.]
[Note need for BA_ACC frame to acknowledge the ABTS. What's the point?]

4.5.2. Loss of ACK after FCP_CMD. In this case the target saw the command
and has constructed exchange context. When initiator's E_D_TOV timer
expires it first sends an ABTS with the LS bit set; this aborts
the sequence but maintains the exchange. Then it resends the FCP_CMD IU
using the same information and the same exchange identifier in a
new sequence. 

4.5.3. Loss of ABTS after loss of FCP_CMD sequence. In this case
the target has not seen the command anyway. [Check for ACCEPT needed
for handshake on ABTS.]

4.5.4. Loss of ABTS after loss of ACK after FCP_CMD sequence. In this
case the target has constructed exchange context. When the FCP_CMD
is resent by the initiator, the target observes that the exchange
identifier is for an existing exchange and thus ignores the redundant
information.
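
The duplicate check in 4.5.4 can be sketched as a table lookup at the
target. This is an illustration only: the class name, the method
name, and the choice of (initiator port ID, exchange ID) as the table
key are assumptions, not part of FCP.

```python
class TargetExchangeTable:
    """Track exchange context so a re-sent FCP_CMD is not run twice."""

    def __init__(self) -> None:
        # (initiator port ID, exchange ID) -> command bytes
        self.contexts = {}

    def on_fcp_cmd(self, initiator_id: int, ox_id: int,
                   cdb: bytes) -> bool:
        """Return True if the command is new and should be executed."""
        key = (initiator_id, ox_id)
        if key in self.contexts:
            # Exchange context already exists, so this FCP_CMD is a
            # resend after a lost ACK. Ignore the redundant copy.
            return False
        self.contexts[key] = cdb
        return True

    def on_exchange_closed(self, initiator_id: int, ox_id: int) -> None:
        """Delete command context when the exchange is run down."""
        self.contexts.pop((initiator_id, ox_id), None)
```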

4.5.5. Loss of ACK after FCP_CMD, then loss of FCP_RSP.
The initiator doesn't get anything back, so how does it know
whether the command was lost or executed? This is why the RES
is needed before the ABTS.


5. Advantages of Class 2

- Uses existing FC-PH features.
- Uses existing FCP Information Units.
- Good opportunity to automate the processing of each exchange.
- Sequence Initiative management is already defined in FC-PH.
- Sequence streaming is already defined in FC-PH.


------------------------------------------------
Doug Hagerman
Storage Architecture
Digital Equipment Corporation
+1 508 841 2145
please reply to hagerman at mail.dec.com
regardless of what the reply-to: field says
*
* For T10 Reflector information, send a message with
* 'info t10' (no quotes) in the message body to majordomo at symbios.com