Third-party SCSI data transfers
Lansing J Sloan
ljsloan at anduin.ocf.llnl.gov
Fri Sep 9 10:14:34 PDT 1994
The Lawrence Livermore National Laboratory (LLNL) is working on
high-performance, cost-effective storage systems. Goals we
consider important include
o high performance (including large transfers),
o scalability (to many processors, many devices, large
interconnection fabrics, and high speeds),
o integrity and security,
o modular design that hides implementation details,
o conformance to accepted industry standards.
Our main interest is transfers between a processor and a storage
system. Most of our goals have been achieved by at least one
vendor product, using extensions of IPI. SCSI's market-driven
successes and price/performance advantages lead us to want to
achieve our goals with SCSI also. Similar goals for data
transfers directly between devices are also important, though
our focus here is on processor-device transfers.
The issues we raise arise from experience in the National Storage
Laboratory and with the High-Performance Storage System.
Storage systems that are front-ended by processors (e.g., Network
File System servers) would be suitable if they had sufficiently
high performance and low cost and otherwise met requirements.
(An important feature is that device management issues are local
to the storage system.) Their performance suffers from having
the data go through processors rather than directly from source
to destination. Their cost increases from requiring high-
performance data-handling processors. The IEEE 1244 Mass Storage
System Reference Model (MSSRM) allows for direct data paths while
retaining the good features of front-ended systems.
The significance of attachment to a large fabric is that there
may be many systems on the fabric that should not be allowed to
control devices (and many that will not want to be burdened with
the software to do so), but that should be able to exchange data
directly with devices. Possibly some systems on a fabric should
have no access at all to devices. The MSSRM allows solutions.
We believe that SCSI standards do not yet provide adequate
mechanisms for direct data paths. There are probably many
possible solutions. We hope satisfactory solutions can be
included in SCSI standards, to facilitate interoperability among
SCSI devices and device drivers that support such capabilities.
(We do not expect that all SCSI interfaces will support third-
party transfers, but we do want standards to guide
implementations, to aid interoperability among those who choose
to implement them.
It is possible that interoperability can be achieved while
leaving some details to implementors.)
I thank Bob Snively of Sun Microsystems and Peter Walford of the
Fibre Channel Systems Initiative for their influential thoughts.
They should not be blamed for the mutations, of course.
2. Transfers directly between processors and devices
Three general approaches are summarized here. For all these
approaches, some control information is exchanged, probably on
other paths and not necessarily using SCSI, before anything moves
on the data path.
Figure 1 helps to illustrate all three approaches. It shows a
client processor and a storage system with a processor and a
device. It shows several paths. The control path at the top
carries control information, such as the original client request,
status of the request, and information on location of resources.
SCSI is not necessarily used on this path, which is not discussed
further. We assume the control path on the right side of Figure
1 is trusted and (logically) strictly inside the storage system.
We believe SCSI should have standard rules for the control path
on the right and for the data path. (Incidentally, the trusted
control path need not be physically distinct from other paths.
The requirement is that the storage system processor and device
recognize which traffic is between themselves, with forgery
prevented, to a site-specific level of assurance. We would not
expect a standard to mandate physically distinct paths.)
Figure 1: Major modules and paths for data
transfers directly between a client and device.
+-------------+                  +-------------+
|             |     control      |             |
|             |     path         |   storage   |
|   client    |<---------------->|   system    |
|  processor  |  (not discussed) |  processor  |
|             |                  |             |
|             |                  +------+------+
|             |                         ^
|             |       trusted           |
|             |       control           |
|             |       path              |
|             |                         v
|             |                  +------+------+
|             |    data path     |             |
|             |<---------------->|   device    |
|             |                  |             |
+-------------+                  +-------------+
Very briefly, the three approaches follow.
o "Data-exchange path": Once the client processor is ready,
the storage system processor sends commands on the storage-
system control path to tell the device to initiate or
expect data transfer on the data path. The data path's
capabilities are limited to data exchange (which includes
some control information). Commands received on a data-
exchange path are rejected. We prefer this approach. The
major issue is how to do this with SCSI.
o "READ/WRITE": The storage system processor uses the trusted
control path to alert the device to accept specific
commands. The storage system processor then uses the top
control path to tell the client processor to transfer data,
and how. The client processor uses the data path to send
SCSI READs or WRITEs directly to the device. The device
rejects commands on the data path that don't match what it
has been alerted to expect. Third-party reservations are
used to alert the device.
o "COPY": The storage system processor uses the top control
path to alert the client processor to expect data transfer,
and how. Then the storage system processor uses the
trusted control path to send SCSI COPY commands to the
device. The device uses the data path to send READ or
WRITE commands to the client processor, in order to
transfer data. The client processor checks that it can
associate the READ and WRITE commands with one of its
requests. The device is not alerted to accept commands on
the data path.
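As a rough illustration only (not part of any proposal or
standard), the message flow of the three approaches can be
sketched as event traces. All actor and message names below are
our invention; only the ordering reflects the descriptions above.

```python
# Hypothetical event traces for the three approaches described above.
# Each trace entry is (sender, receiver, message); names are illustrative.

def data_exchange_path():
    """Device is prepared over the trusted path; data path carries data only."""
    return [
        ("storage_proc", "device", "trusted-ctl: initiate/expect transfer"),
        ("client", "device", "data path: data exchange (no commands accepted)"),
    ]

def read_write():
    """Device is alerted via third-party RESERVE; client issues READ/WRITE."""
    return [
        ("storage_proc", "device", "trusted-ctl: third-party RESERVE for client"),
        ("storage_proc", "client", "ctl: transfer parameters"),
        ("client", "device", "data path: SCSI READ/WRITE"),
        # Device rejects data-path commands that don't match the reservation.
    ]

def copy():
    """Storage processor sends COPY; the device commands the client."""
    return [
        ("storage_proc", "client", "ctl: expect transfer"),
        ("storage_proc", "device", "trusted-ctl: SCSI COPY"),
        ("device", "client", "data path: READ/WRITE to move the data"),
        # Client checks the command matches one of its outstanding requests.
    ]

for trace in (data_exchange_path(), read_write(), copy()):
    for src, dst, msg in trace:
        print(f"{src:12s} -> {dst:12s}: {msg}")
    print()
```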
Table 1 summarizes some problems and how well we think the
various approaches handle them. In Table 1, "I" denotes a major
issue, "i" denotes a minor issue, "D" denotes a major
disadvantage, "d" denotes a minor disadvantage, and "-" denotes
that an issue has been avoided. "Modified RESERVE" rules are
discussed following the table.
Table 1: Potential problems with various approaches
                                     data-     READ/WRITE  COPY
                                     exchange  w/modified  w/modified
problem:                             path      RESERVE     RESERVE
Identify I/O request                   I           -           I
Hazards after power-up, resets, etc.   i           i           i
Compatible with normal SCSI intrfc     I           i           i
Need Extent reservations               -           d           -
Overlap RESERVE cmds with xfers        -           D           -
Use RESERVE for other reasons also     -           I           I
Determine completion                   i           D           -
Error recovery                         I           i           i
Incompatible with SCSI standards       I           I           I
Multiple ports/domains required        i           i           i
Compatible with inter-device xfer      I           I           I
Require chained reservations           -           -           -
Prevent reserve fragmentation          -           -           -
Identify Ports Early                   d           d           d
Some current SCSI standard rules are incompatible with allowing
data, but not control, on some paths. For instance, with current
rules any Initiator can send TARGET RESET at any time over what
should be a data path, clearing all reservations and enabling
commands that may violate storage system policies, thus
preventing the storage system from ensuring its own integrity.
Data-exchange path approaches prevent this problem because
commands on data paths are rejected. The other approaches solve
the problem with "modified RESERVE" rules such as the following:
o Targets accept commands from trusted Initiators.
o Targets reject commands from other Initiators unless
alerted by third-party reservations (made by trusted
Initiators).
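A minimal sketch of the acceptance check a Target might apply
under such modified RESERVE rules follows; the Initiator names,
the reservation table, and the function itself are hypothetical,
intended only to make the two rules concrete.

```python
# Illustrative sketch of a "modified RESERVE" acceptance rule.
# All identifiers here are hypothetical, not from any SCSI standard.

TRUSTED_INITIATORS = {"storage_system_proc"}

# Third-party reservations made by a trusted Initiator on behalf of
# another Initiator: maps initiator -> commands it is alerted to send.
third_party_reservations = {
    "client_a": {"READ", "WRITE"},
}

def accept_command(initiator: str, command: str) -> bool:
    """Return True if the Target should accept this command."""
    if initiator in TRUSTED_INITIATORS:
        return True  # Rule 1: commands from trusted Initiators are accepted.
    # Rule 2: others are accepted only if alerted by a third-party reservation.
    return command in third_party_reservations.get(initiator, set())

# A TARGET RESET arriving over a data path from an untrusted Initiator
# is rejected, closing the loophole described above.
print(accept_command("client_a", "READ"))
print(accept_command("client_a", "TARGET RESET"))
```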
We prefer protocols to work both for transfers directly between
processor and device and for transfers directly between two
devices. We think either the data-exchange path approach or a
combination of the "READ/WRITE" and "COPY" approaches could
transfer data directly between two devices.
We are looking for people and vendors interested in solutions for
third-party SCSI data transfers, particularly solutions that
could be standard. Please get in touch if you are interested.
We are preparing a more extensive discussion of issues and
possible solutions.
For further information, contact
Lansing Sloan ljsloan at llnl.gov
Phone (510) 422-4356
Kim Minuzzo minuzzo1 at llnl.gov
Project Leader for Scalable I/O Facility
Phone (510) 422-2141
Information is also available on the World Wide Web.
Lawrence Livermore National Laboratory home page:
You can reach the following either directly or by starting from
the LLNL home page. (If you start from the home page, try
"Disciplines", "Computing and Networking", and then look at both
"Livermore Computing" and "National Storage Laboratory".)
National Storage Laboratory:
High-Performance Storage System:
Scalable I/O Facility:
More information about the T10