discussion of tapes on fibre channel loops

Doug Hagerman, 508-841-2145, Flames to NL: 17-Jun-1996 1136 hagerman at starch.enet.dec.com
Mon Jun 17 08:38:06 PDT 1996

* From the SCSI Reflector, posted by:
* "Doug Hagerman, 508-841-2145, Flames to NL:  17-Jun-1996 1136" <hagerman at starch.ENET.dec.com>
This is in reference to comments from Mike O'Donnell regarding the
May, 1996, SSC discussion about Fibre Channel tapes.

Hi Mike.

(Michael O'Donnell, mike_odonnell at stortek.com              (303) 673-3286)

Thanks for your comments. As you know, the discussion continued at the
Fibre Channel working group meeting last week and will undoubtedly continue
at the SSC meeting next month.


>I assume the intent of the discussions that ensued (with
>regard to error recovery scenarios) was to define the
>minimum command set needed to support that error recovery.

My goal is to get an agreement between all players as to what the
basic error recovery model is for tapes, and to get it recorded somewhere.
Ultimately what would be cool would be an industry-wide tape model
that is independent of the underlying media. This doesn't seem to be
a popular idea right now but perhaps a step in that direction can be made.

>I would be very interested in understanding precisely what
>data integrity applications are willing to 'give up'. 
>A standard should not force an integrator into a minimum
>data integrity requirement, but understanding the system integrator's
>expectations and limitations would ease the task of determining
>minimum error recovery functionality. 

My understanding of the system integration expectation is that for
"data" applications the primary requirement is "no data loss" even
at the cost of some performance reduction, while for the "bandwidth"
applications the primary requirement is to maximize bandwidth at the
cost of potential loss of data.

>> 1. For the first type of system, transfer sizes as
>> described by a single SCSI command (i.e. one Fibre
>> Channel exchange) do not exceed 1 MByte in size. In fact,
>> the group chose that number as one with a considerable
>> safety factor since known applications specify less than
>> 64 kB in one command. For this type of system there is a
>> high expectation that the data must be recorded correctly
>> on the tape, and the operating system is willing to
>> manage this requirement. This will be done as follows.
>> If the tape drive detects any sort of error in
>> transferring the data over the interconnect, it stops
>> processing this command. Further data for this command is
>> discarded. The target returns an FCP_RSP indicating a
>> "CHECK CONDITION" status to the initiator. The drive
>> continues to process any previously issued commands. The
>Any previously issued commands that DO NOT affect the ability
>of the host to reliably recover the failed operations
>can be executed. Otherwise media alteration could render
>recovery attempts unusable.

The goal is to perform a READ POSITION to find out the result of
the previous commands. I do not think that the tape can predict
what operations should and should not be executed based on potential
future efforts to recover data.

>> host (initiator) will, in response to the status, issue a
>> READ POSITION command. The tape drive will eventually
>> complete all previously issued commands and then report
>> the tape position at that time. This position is the
>> position of the tape before the receipt of the failed
>> command. 
>Before the receipt of the failed command?  Wouldn't the 
>initiator rather know where the drive is now?  If the
>drive has other commands queued, it must now remember where
>it's positioned before executing EVERY command on the assumption
>that the command might fail, and the drive would have to 
>reposition back to that location.     

No, the initiator wants to know where the drive is when all the previous
commands complete. If the write-to-media operation fails (after
the command has already been reported as complete in the SCSI STATUS
message) then it's a deferred error, and the initiator may or may
not be sophisticated enough to reposition back to where it wanted
to be several commands ago.

>> At this point the tape drive does no further
>> activity. (The drive does not attempt any automatic
>> repositioning.)
>No automatic repositioning AFTER the Read Position command

Correct, but in normal operation the drive won't be doing any repositioning
anyway. The goal is to divide the problem into two parts: getting
the data into the drive's buffer and then, afterwards, getting the data
onto the media. The second part is completely the job of the drive.

>> Based on the reported position, the host will calculate
>> and issue a suitable new command to continue or restart
>> the failed command. Normally this will be a complete
>> retransmission of the command. Since the time to send a
>> single command of this size is relatively small, the
>> performance cost of this approach is minimal.
>> Note that this activity takes place at the device's input
>> buffer. If there are queued commands in the device, the
>> READ POSITION command shall report the position after all
>> the previous commands are completed. Obviously in a tape
>> environment command sequentiality must be maintained.
>This conflicts with the previous paragraphs and reinforces
>my assertion that the host really wants to know where the
>drive is positioned when the READ POSITION command is
>processed.  Drive manufacturers can implement either
>(with this second approach being the more informative).

Perhaps the earlier wording is unclear. The drive is to complete
all previous commands, then perform a READ POSITION. The drive may
decide to report the results of the READ POSITION before it completes
those commands if it is confident that it can get to that point on
the tape. The goal is to process the previous commands, report the
tape position, and then have the initiator send down a new command with
confidence that everything it's done before is complete.
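
The sequence just described can be sketched as a toy model in Python.
This is only an illustration under stated assumptions, not a description
of any real drive or driver: the names TapeDrive, write, read_position
and write_with_recovery are hypothetical, the tape position is modelled
as a simple count of blocks on the media, and each WRITE is assumed to
carry exactly one block.

```python
from collections import deque


class TapeDrive:
    """Toy drive: one WRITE command puts one block on the media (assumption)."""

    def __init__(self):
        self.position = 0     # logical block position on the media
        self.queue = deque()  # data accepted into the buffer, not yet on media

    def write(self, data, transfer_ok=True):
        # Interconnect error: discard the data for this command and
        # report CHECK CONDITION. Earlier commands stay queued.
        if not transfer_ok:
            return "CHECK CONDITION"
        self.queue.append(data)
        return "GOOD"

    def read_position(self):
        # Complete every previously accepted command, then report
        # where the tape is at that time.
        while self.queue:
            self.queue.popleft()
            self.position += 1
        return self.position


def write_with_recovery(drive, data, fail_first_attempt=False):
    """Host side: on CHECK CONDITION, READ POSITION, then resend the command."""
    status = drive.write(data, transfer_ok=not fail_first_attempt)
    if status != "CHECK CONDITION":
        return status, None
    position = drive.read_position()  # previous commands are now complete
    status = drive.write(data)        # complete retransmission
    return status, position
```

In this model the position returned after a failed third WRITE is 2:
the two earlier commands have completed and the failed one has left no
trace, so the host can resend it with confidence.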

>> Another case that was discussed was the possibility that
>> the interconnect successfully transfers the data but the
>> drive is unable to handle it. An example of this is when
>> a compression engine is unable to process incoming data
>> and the drive needs to request it again. The drive may
>> issue an FCP XFR_RDY specifying the retransmission of
>> data that has already been sent.
>Is there an open issue here?

I don't think so.
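
For what it is worth, the practical consequence on the initiator side
can be sketched in a few lines of Python. This is a minimal sketch under
assumptions: serve_xfer_rdy is a hypothetical helper, and a transfer
request is reduced to a relative offset and a burst length into the
command's write buffer. The point is only that the initiator has to keep
the whole buffer until the command completes, because the drive may ask
for a range it has already received.

```python
def serve_xfer_rdy(write_buffer, relative_offset, burst_length):
    """Return the bytes requested by one transfer-ready from the drive."""
    return write_buffer[relative_offset:relative_offset + burst_length]


data = bytes(range(16))
first = serve_xfer_rdy(data, 0, 8)   # first burst
second = serve_xfer_rdy(data, 8, 8)  # second burst
again = serve_xfer_rdy(data, 0, 8)   # drive requests the first burst again
```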

>> Another point that was discussed but not verified was the
>> operation of the FC-2 layer in the case of a detected EDC
>> mismatch (see error scenario 7 below). It was stated that
>Where is this error scenario described?

Further down in the original paper, which was attached to the
one you're commenting on--I hope so at least; it was supposed to be...

>> after the mismatch is detected, the FC-2 layer actually
>> swallows all further frames in that sequence and sends a
>> notification up to the next higher layer. This does not
>> change the description below except that the remaining
>> frames are not delivered to the upper layer.
>Which model (reliable or seismic data) is being referred
>to?  If an EDC mismatch is detected in the initial transfer
>of a record from a drive (read), not only is there a potential
>for drive repositioning, but a heck of a lot of data could have
>to be bit bucketed.  For short records bit bucketing is probably
>not an issue.  Repositioning drives for large seismic records

Both. The EDC is on the FC frame, and it won't make it up to
the upper layers. Drive repositioning may be a "potential", but is
specifically being excluded here in an attempt to make things
cleaner. "A heck of a lot of data" goes by in a heck of a short time
in Fibre Channel. At least that is the theory.

Bit bucketing at the Fibre Channel level is used regardless of the
record size.
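
The discard behaviour as it was stated at the meeting (and, as noted,
not yet verified) can be sketched in Python as follows. Everything here
is an assumption made for illustration: make_frame and receive_sequence
are hypothetical names, a frame is reduced to a (payload, check value)
pair, and zlib.crc32 stands in for the frame-level check. Real FC-2
framing is considerably more involved.

```python
import zlib


def make_frame(payload):
    """Build a toy frame: the payload plus its check value."""
    return (payload, zlib.crc32(payload))


def receive_sequence(frames):
    """Deliver payloads until a check mismatch; swallow everything after it.

    Returns (delivered_payloads, error_notified).
    """
    delivered = []
    for payload, check in frames:
        if zlib.crc32(payload) != check:
            # Mismatch: this frame and all further frames in the
            # sequence are discarded, and the upper layer is notified.
            return delivered, True
        delivered.append(payload)
    return delivered, False
```

Note that the function does not look at record size at all, which is the
point: the remaining frames are dropped the same way for a short record
or a very long one.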

>> Another related question is what the process is for
>> recovery if the SCSI status is not successfully returned
>> to the initiator in a case where the command itself
>> completed successfully. In this case it is important that
>> the initiator issue a READ POSITION command before
>> issuing further commands, in order to synchronize with
>> the tape position.
>It is unclear what you mean by 'synchronize'.  What is
>being synchronized here?

There are two intermixed things here (my fault). The question is about
the loss of the STATUS due to a fault at the interconnect level. This
is the root of the suggestions for using Class 2 (ACKed packets) instead
of Class 3 (datagrams) for tapes. However, the discussion isn't over yet!

What is being synchronized is the tape drive's understanding of where
the media is positioned and the initiator's understanding of the same thing.
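
One way an initiator might use the reported position when the STATUS
never arrives is sketched below in Python. This is an illustration under
assumptions, not an agreed procedure: resolve_lost_status is a
hypothetical helper, positions are counted in logical blocks, only one
command is outstanding, and the host remembers the position it believed
the tape was at before sending the command.

```python
def resolve_lost_status(position_before, blocks_in_command, reported_position):
    """Decide what to do after READ POSITION when a STATUS was lost."""
    if reported_position == position_before + blocks_in_command:
        return "completed"   # command reached the drive; do not resend
    if reported_position == position_before:
        return "resend"      # command had no effect; send it again
    return "unexpected"      # positions disagree; escalate to the application
```

For example, resolve_lost_status(10, 4, 14) reports that the command
completed, so the host and the drive agree on the position again without
any data being written twice.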

>> The conclusion to this is that tapes on FC-AL will use
>> either the "large buffer model" or the "keep writing
>> regardless model", as discussed above. Words to this
>> effect should be added to the PLDA profile.
>What about the 'keep READING regardless model'?  Error scenarios
>for write recovery are substantially different than read

I welcome any suggestion, particularly from knowledgeable tape folks.

Doug Hagerman
Digital Equipment
