Re comments from Bill Martin

rdv at ISI.EDU
Tue Jul 9 15:16:26 PDT 1996

* From the SCSI Reflector, posted by:
* rdv at ISI.EDU
(combining comments from a couple of pieces of email. Let me know if
I'm off in the weeds or covering too-well-trodden ground. Apologies
for the length.)

Charles said:

>I believe the "size of tape buffer" parameter represents the largest write
>command that is guaranteed to have the properties described in Doug's note. ie.
>As long as the amount of data to be written is equal to or less than this
>number, a write command failure due to a transport fault will not result in a
>partially written record.

This I believe to be true. Buffering space should be a non-problem; in
order to get a buffer that's _fast_ enough for a high-performance tape
drive, you're going to have to build it wide and therefore of
significant capacity.

Doug said:

>> >  I'm assuming that the "large buffer" model is in effect. (That is,
>>I can't quite tell if you're suggesting this to simplify the
>>discussion, or if you believe it's a practical approach. I'm assuming
>>the latter.
>This was what I proposed at the SSC meeting in May, and the tape
>representatives present at that time were comfortable with it.

Hey, if the tape drive manufacturers are comfortable with it, who am I
to object? That said, I'll open my mouth again :-).

>>Barring an infinite-speed interconnect (where "infinite" is a relative
>>term :-), it's unrealistic to expect the drive to wait until all
>>interconnect activity is complete before beginning to write the tape.
>>It just throws away too much performance. Minimally, if the drive is X
>>MB/sec and the interconnect is Y MB/sec., you're limiting yourself,
>>optimistically, to Y/(X+Y) of the drive's performance. Given that
>>current high-end drive/interconnect systems are working with Y/X
>>ratios of 1.3 (Ampex DST w/ FW SCSI) to 8 (Sony DTF w/ HiPPI), you're
>>talking about reducing performance to, respectively, 56% or 89% of
>>max. The latter may be acceptable, the former certainly is not.
>I agree with your analysis, but what we're talking about is full speed
>Fibre Channel at 100 MBps. This gets you well up into the 90s.

Okay, how about the Sony ID-1 @ 32 MB/sec. or the Datatape at 50
MB/sec.? Back down in the 65-75% range. HiPPI-6400 will qualify as
"infinite speed" for tape drives for the foreseeable future, though.
Also, if you're doing write-behind caching, you may have to pay this
penalty only once.
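To make the arithmetic concrete, here's a quick sketch of the Y/(X+Y)
figures (the two ratios are the ones from the exchange above; the 100
MB/s Fibre Channel link rate is the one Doug quoted, and the drive
rates are the ones I quoted):

```python
# Fraction of the drive's native rate you get if the drive waits for
# the whole transfer to complete before writing: Y/(X+Y), where
# X = drive rate and Y = interconnect rate.  In terms of the ratio
# r = Y/X this is r/(1+r).

def fraction_of_drive_rate(y_over_x):
    return y_over_x / (1.0 + y_over_x)

cases = [
    ("Ampex DST / FW SCSI, Y/X = 1.3",      1.3),
    ("Sony DTF / HiPPI,    Y/X = 8",        8.0),
    ("Sony ID-1 @ 32 MB/s on 100 MB/s FC",  100.0 / 32.0),
    ("Datatape @ 50 MB/s on 100 MB/s FC",   100.0 / 50.0),
]
for name, r in cases:
    print(f"{name}: {fraction_of_drive_rate(r):.0%}")
```

The first two come out near the 56% and 89% figures above; the two
Fibre Channel cases land in roughly the 67-76% range.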

>The current proposal is that tapes don't support the SCSI queueing model
>(although they could according to the standard). What they do is
>accept commands and data into an input buffer and return a success status
>when all the data is received. If an error occurs in trying to write
>to the media, it's a deferred error. We haven't actually talked about
>that yet, and yes I suspect it's a nightmare.

Ah, here's part of my misunderstanding.

Let's see if I can formulate this as a series of questions that might
help set a good framework for understanding the problem:


* Is it true that an error from the tape drive means the cartridge is
essentially unusable? What about VOLUME OVERFLOW (rare, thanks to the
extra room past EOT, but possible on write retries with lots of write
caching and long records)? What about RECOVERED ERRORs that the
system might want information about (e.g., to decide to make a second
copy if the error rate was high)?

* What do current systems do about tape errors? tar obviously handles
it poorly; VMS BACKUP, better. What about HSM packages and automated
backups? How do they deal with the perhaps gigabytes of good data on
the tape?

* Especially, how do current systems deal with deferred errors?  Tape
drives support nothing like SYNCHRONIZE CACHE (at least not in
SCSI-2); how do systems know the data is safe?

* What is the true performance impact of the "large buffer" model with
write caching and deferred errors (LBWCDE)?  Perhaps it's so small as
to be a non-problem.

* What errors does the LBWCDE model protect us against? Transport
problems only? I think this takes us back to where the discussion
started, with FC classes, which I'm not too conversant with; perhaps
picking a different class loosens the restrictions on the model? I
don't see that LBWCDE gains us much: in any environment in which the
host expects to execute any error recovery, what matters is not
receipt of the data, but its commit to stable storage.


Deferred errors and command queueing:

I think supporting write-behind caching and deferred errors is
equivalent in many ways to full command queueing, but is weaker in
several respects.

Both require that the device manage several buffers and be able to
report errors in execution of any of a series of commands. This means
the host must also retain some minimal information about any command
that might yet produce an error.

Arguably, write-behind caching with deferred errors is a restricted
form of full queueing.

Advantages of full command queueing:

* Command completion reports true status, allowing host to release
data buffers.

* More queueing options are allowed (although presumably strict
ordering will be most common for tape drives).

* Cleaner model (exactly one response per command).

For deferred errors, the host I/O algorithm looks something like
(algo. A):

loop {
	send write
	get status
} until done
wait for errors
process errors
release buffers

This gives high performance, but no real way to release buffers
safely; it's hard to know when things are really done. In the real
world I suspect it's done more like this (algo. A'):

loop {
	send write
	get status
	if error then die
	release buffer
} until done

which isn't all that great.

Now, if you add a sync command (algo. B):

loop {
	loop {
		send write
		get status
	} until no more buffers
	process errors
	release buffers
} until done

This puts in lots of syncs, each with an associated performance hit,
but it is safe.

Now, do it with command queueing (algo. C):

loop {
	if (buffer available and not all sent)
		send write (with strict queueing, if needed)
	if (response ready) {
		get status
		process errors
		release buffer
	}
} until done
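To convince myself algo. C behaves, here's a toy simulation (all the
names are hypothetical, and this "device" always succeeds and
completes commands strictly in order):

```python
from collections import deque

QUEUE_DEPTH = 4            # hypothetical device queue depth
records = list(range(10))  # records to write

in_flight = deque()        # (tag, record) pairs awaiting status
released = []              # buffer tags freed after good status
next_rec = 0

def device_poll(queue):
    """Complete the oldest queued command, if any (always succeeds here)."""
    return queue.popleft() if queue else None

while next_rec < len(records) or in_flight:
    # send a write whenever a queue slot is free and data remains
    if next_rec < len(records) and len(in_flight) < QUEUE_DEPTH:
        in_flight.append((next_rec, records[next_rec]))
        next_rec += 1
    # reap a completion; true status arrives per command, so the
    # buffer can be released immediately (no deferred errors)
    done = device_poll(in_flight)
    if done is not None:
        released.append(done[0])

print(released)  # buffers released in strict order
```

The point of the sketch is that sending and reaping are interleaved
in one loop, so a buffer is held only from its write until its own
status returns, not until some later sync.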

This releases buffers at close to the optimal time and is no more
complicated to implement than the deferred-error versions.

All three require a process error routine something like:

for each unfinished I/O {
	decide if it completed (else decide if it should be aborted)
	decide whether to report the error or retry
}
and in fact, the list of uncompleted I/Os is longer and potentially
more ambiguous in cases A and B.

In summary, I think:

(1) command queueing is cleaner AND more efficient than deferred
errors, and only slightly more complicated.

(2) the large buffer, delayed write model offers only minimal
improvement in operational integrity, while potentially making it more
difficult to achieve high performance.

That said, when _I_ was doing the device side, I was reluctant to
implement _either_, due to the complexity :-).

I hope this is useful in some fashion, and thanks for reading this far.


Rod Van Meter   USC/ISI   rdv at ISI.EDU   +1(310)822-1511x417
Douglas Adams        "In the beginning the Universe was created. This
has made a lot of people very angry and been widely regarded as a bad move."
