SBP-2: Drawbacks or Virtues?
PJohansson at aol.com
Thu Feb 27 17:20:22 PST 1997
* From the SCSI Reflector (scsi at symbios.com), posted by:
* PJohansson at aol.com
As David Wooten observed in a recent message on this same subject, in short
order these reflector debates get hard to follow. So, I've done some pruning.
The points below were raised by Frank Campbell this last week as serious
reservations about SBP-2. I think of some of them as virtues, others as
specious and some I just don't understand.
Frank's original remarks are bracketed with << and >>. See below....
<<SBP-2 requires that targets deal with scatter/gather and paging issues
within the host.>>
This is an aspect of 1394-1995, which has an underlying model of a memory
mapped bus, and it is part of 1394 that SBP-2 uses to great advantage. By
placing DMA responsibilities in the targets we have arrived at a better, more
scalable design than traditional I/O channels. In I/O channels, such as
parallel SCSI, all the intelligence to sort and route noncontiguous data to
scatter/gather buffers is in the host adapter. Translation: the bottleneck is
also in the host adapter as you add more peripherals to the system. By
distributing this DMA capability to the peripherals, SBP-2 lets the total DMA
capabilities of the system grow in tandem with the addition of peripherals.
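To make the scatter/gather point concrete, here is a minimal sketch of how a target might turn a host-supplied list of buffer segments into individual write transactions. The (base address, length) tuple layout, the plan_writes name and the 512-byte payload cap are illustrative assumptions, not the SBP-2 page table format.

```python
# Hypothetical segment list: (base_address, length_in_bytes) per entry.
# Illustrative only; this is not the SBP-2 wire format.

def plan_writes(segments, total_len, max_payload=512):
    """Return (address, data_offset, length) for each write the target issues."""
    writes = []
    offset = 0  # position within the data being transferred
    for base, seg_len in segments:
        pos = 0  # position within the current segment
        while pos < seg_len and offset < total_len:
            chunk = min(max_payload, seg_len - pos, total_len - offset)
            writes.append((base + pos, offset, chunk))
            pos += chunk
            offset += chunk
    if offset < total_len:
        raise ValueError("segments too small for transfer")
    return writes

# One 512-byte sector whose host buffer straddles a page boundary:
# 200 bytes at the end of one page, 312 bytes at the start of another.
segments = [(0x10F38, 200), (0x42000, 312)]
writes = plan_writes(segments, 512)
```

Each peripheral runs this kind of loop for its own transfers, which is why the work scales with the number of targets rather than piling up in one host adapter.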
<<SBP-2 requires that targets have the capability to break transfers up on
odd byte boundaries.>>
SBP-2 does not require this. This decision is left open on a device class by
device class basis. After vigorous discussions, disk vendors concluded that
a) the ability to handle byte-aligned 1394 transaction packets was not so
difficult as first believed and that b) it was in their own self-interest to
do so. Hence the mass storage profile for SBP-2 disks does not restrict data
transfers to 512 byte or even quadlet boundaries.
<<SBP-2 uses host addresses as handles with no means of identifying stale
handles, exposing the risk of stale handles causing corruption.>>
I think I don't understand what you mean. What is a "stale" handle and how
does it arise? I'll guess you mean a 64-bit address whose 6-bit physical ID
portion has been rendered invalid by a bus reset. Upon detection of a bus
reset, an SBP-2 device immediately aborts all I/O (other than isochronous
stream I/O, but that's a separate story...). When the initiator queues the
commands anew, the assumption is that the initiator has updated all relevant
data structures so that the 1394 addresses ("handles") are valid.
In fact, some thought went into the organization of the SBP-2 data
structures----ORB's, page tables and data descriptors----so that the
initiator has to update as few of these 6-bit physical ID's as possible.
Are there other conditions under which a handle goes "stale"?
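As a sketch of the update the initiator performs after a bus reset: a 64-bit 1394 address carries a 16-bit node ID (10-bit bus ID plus 6-bit physical ID) above a 48-bit offset, so only six bits need to change. The function names and the sample address below are hypothetical, chosen for illustration.

```python
PHY_ID_SHIFT = 48                    # physical ID sits just above the 48-bit offset
PHY_ID_MASK = 0x3F << PHY_ID_SHIFT   # six bits wide

def physical_id(addr):
    """Extract the 6-bit physical ID from a 64-bit 1394 address."""
    return (addr >> PHY_ID_SHIFT) & 0x3F

def retarget(addr, new_phy_id):
    """Return addr with its physical ID replaced; bus ID and offset are kept."""
    if not 0 <= new_phy_id <= 0x3F:
        raise ValueError("physical ID must fit in 6 bits")
    return (addr & ~PHY_ID_MASK) | (new_phy_id << PHY_ID_SHIFT)

# Sample address: local bus (bus ID 0x3FF), physical ID 2, arbitrary offset.
old_addr = (0xFFC2 << 48) | 0x000012345678
new_addr = retarget(old_addr, 5)     # same node, renumbered to 5 by the reset
```

An address that has not been passed through something like retarget after a reset is my best guess at what a "stale" handle would be.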
<<SBP-2 uses more bus transactions than necessary to perform I/O. As a worst
case example, a single sector read from disk requires up to 14 bus
arbitration cycles and 14 packet transfers.
host writes to doorbell address to notify target
target reads the most recent ORB to get the address of the new ORB
target reads the new ORB
target reads page map
target writes data up to page boundary
target writes data from page boundary to end
target writes status>>
Lies, damn lies and statistics.
If you pick assumptions that you like, whether they make common sense or not,
you can contrive to make anything look bad.
What sensible vendor who creates an SBP-2 device is going to design for a
split transaction in response to ANY of the writes to the target's various
On the host side, what host adapter designer is going to design for split
transaction writes to physical memory-----when that strategy is known,
beforehand, to produce poor performance?
What percentage of single-block READ's are going to cross page boundaries
(and hence require the second access to the page table)?
A more plausible sequence for a single-block read (if the disk fetch engine
has become idle) is:
a) Quadlet write to the DOORBELL (One arbitration)
b) 8-byte block read of the next_orb pointer from the last ORB executed
(Two arbitrations)
c) 32-byte block read of the new ORB to be executed (Two arbitrations)
d) 512-byte block write of the READ data to initiator memory (One
arbitration)
e) 16-byte block write of the status block to initiator memory (One
arbitration)
This is not an artificial worst case; this is a typical case: seven (7)
arbitrations. And the interesting thing about SBP-2 is that the busier you
keep the disk the MORE efficient it becomes----if there is always sufficient
work queued, step b) above is eliminated and you're down to five (5)
arbitrations.
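The tally can be checked with a few lines of Python. The per-step counts for b), d) and e) are inferred from the stated totals of seven and five, on the assumption that each read costs two arbitrations (request plus response) and each write costs one.

```python
# (label, description, arbitrations); counts for b, d and e are inferred.
steps = [
    ("a", "quadlet write to the DOORBELL", 1),
    ("b", "8-byte read of the next_orb pointer", 2),
    ("c", "32-byte read of the new ORB", 2),
    ("d", "512-byte write of the READ data", 1),
    ("e", "16-byte write of the status block", 1),
]

# Fetch engine idle: every step is needed.
idle_total = sum(count for _, _, count in steps)

# Work always queued: step b) drops out.
busy_total = sum(count for label, _, count in steps if label != "b")
```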
You're quite right, Frank, that single-block READ's are inefficient in
comparison to, say, 4 KB READ's. Based on some earlier calculations that
assumed a favorable, inside-the-enclosure topology with a wide fan-out for
PHY connections, I derived a sustainable throughput rate of 630 Mbps for 4KB
READ's with a Serial Bus cable speed of S800. In comparison, this drops
dramatically to 260 Mbps for 512 byte READ's. Is the moral that SBP-2 is a
poor design? No----it's that you ought to avoid small data transfers----which
is very compatible with the sorts of I/O requests that the File Systems of
most contemporary paged and cached operating systems do anyway!
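To see what those two figures imply per request, here is a small conversion sketch. It assumes 1 Mbps means 10^6 bits per second and that the quoted rates are payload throughput; the function names are mine.

```python
def microseconds_per_request(throughput_mbps, block_bytes):
    """Time one request occupies at the given sustained rate (bits / Mbps = us)."""
    return block_bytes * 8 / throughput_mbps

def requests_per_second(throughput_mbps, block_bytes):
    """Requests completed per second at the given sustained rate."""
    return throughput_mbps * 1_000_000 / (block_bytes * 8)

us_4k = microseconds_per_request(630, 4096)   # 4 KB READ's at 630 Mbps
us_512 = microseconds_per_request(260, 512)   # 512 byte READ's at 260 Mbps

# Cost of moving one 512-byte sector when it travels inside a 4 KB READ.
us_per_sector_in_4k = us_4k / 8
```

Under these assumptions a 4 KB READ takes roughly 52 microseconds and a 512 byte READ roughly 16, so each sector costs well over twice as much when fetched on its own.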
<<SBP-2 hosts use DMA addresses supplied by targets, creating security
What DMA addresses are being supplied by the targets? Are you referring to
the address of the fetch agent CSR's, such as the ORB_POINTER? These are
subject to the same safeguards as the buffer addresses supplied by the host.
Upon the occurrence of a bus reset, they're all invalidated. The host must
reexamine the EUI-64 in the bus information block to rediscover the new
physical address of the target.
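A rough sketch of that rediscovery step follows. The read_eui64 callback is a hypothetical stand-in for reading the EUI-64 out of a node's bus information block; the dictionary "bus" exists only so the example runs.

```python
def rediscover(known_eui64s, read_eui64, node_count):
    """After a bus reset, map each known EUI-64 to its new physical ID."""
    found = {}
    for phy_id in range(node_count):
        eui = read_eui64(phy_id)   # hypothetical bus info block read
        if eui in known_eui64s:
            found[eui] = phy_id
    return found

# Fake bus after a reset: physical ID -> EUI-64 (made-up values).
bus_after_reset = {
    0: 0x0011223344556677,
    1: 0x00AABBCCDDEEFF00,
    2: 0x0001020304050607,
}

disk_eui = 0x00AABBCCDDEEFF00
new_ids = rediscover({disk_eui}, bus_after_reset.__getitem__, len(bus_after_reset))
```

Targets that were unplugged simply do not appear in the result, so the host knows not to reuse any address it held for them.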
If I've misunderstood any of your objections, Frank, please explain them more
Congruent Software, Inc.
3998 Whittle Avenue
Oakland, CA 94602
(510) 531-2942 FAX
pjohansson at aol.com
* For SCSI Reflector information, send a message with
* 'info scsi' (no quotes) in the message body to majordomo at symbios.com