Object oriented issues

hafner at almaden.ibm.com
Thu Jul 20 16:37:35 PDT 2000

* From the T10 Reflector (t10 at t10.org), posted by:
* hafner at almaden.ibm.com


>First of all, thanks for your contributions to the OBSD effort.
You're welcome!

As for your other points.  I think the fundamental difference is
that I don't look at an OSD as a storage device (i.e., glorified
disk) but as a simplified filesystem (NAS).  I think that difference
of perspective is the "root" of our disagreement.

>I really have two reasons for preferring to start out with SCSI.
>First, it is the path of least resistance to consideration.
Least resistance from the point of view of customer migration or
standardization effort.  If the latter, then IETF would be much
faster than T10 (particularly if you need to change SAM).  If the
former, we're back to our fundamental difference.  A glorified
disk will require much more extensive OS changes (you still need
some filesystem layer above the OSD abstraction, but then you
need to also figure out how to get that rammed into SCSI drivers;
for NT that probably means an OSD device class, changes to the
IRP, SCSI SRB and other data structures, changes to SCSIPort,
changes to the Miniports and, if you need bi-directional, to the
HBA hardware as well). A simplified filesystem needs only an
IFS driver to deal with the layer between the OS-filesystem layer
and the OSD (so not too much work).  Think AFS or DFS client

In other words, as a SCSI device, there are extensive changes
required at all levels.  As a network based "filesystem", you
only need the client code for TCP protocol.

>Second, it should make a really tough migration issue at least a
>little simpler.
Again, are we migrating from a disk to an OSD or from NFS/AFS/DFS/CIFS
to an OSD?  Based on the list above, the latter sounds easier.

Aside: I've heard from one company that plans to build an OSD
and they want to do it with iSCSI (so they'll have TCP there

>From the start OBSD work was predicated on the premise that it would be
>transport neutral...
But some transports are better suited to the functional requirements of
an OSD than others....

>Suppose a new transport is called for. Well, just as the traditional block
>commands and everything else would have to be ported to a new transport,
>could OBSD.  I don't think that the work to put it on SCSI would have been
I'm not advocating a new transport.  In fact, I'm advocating an existing
transport (TCP).  I'm advocating a "protocol" to use over that transport.
In fact a lean and mean protocol that has just the stuff in it that
you need (and not the other baggage).  The rest of this paragraph
falls back on our "root" difference, namely, I don't see any reason
to deal with "block" stuff at all, as this OSD isn't a disk!
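
To make the "lean and mean protocol" idea a little more concrete, here is
a minimal sketch of length-prefixed request framing of the kind such a
protocol might carry over a TCP byte stream.  The opcode values, the header
layout and the function names are all hypothetical; none of them come from
any OSD draft.

```python
import io
import struct

# Hypothetical header: opcode (1 byte), object ID (8 bytes), payload length
# (4 bytes), in network byte order. Not taken from any OSD specification.
HEADER = struct.Struct("!BQI")

OP_CREATE = 0x01  # assumed opcode values, for illustration only
OP_WRITE = 0x02


def encode_request(opcode: int, object_id: int, payload: bytes = b"") -> bytes:
    """Build one framed request: fixed header followed by the payload."""
    return HEADER.pack(opcode, object_id, len(payload)) + payload


def read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes, since a byte stream may deliver data in pieces."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("stream closed mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def decode_request(stream):
    """Read one framed request from a byte stream and return its fields."""
    opcode, object_id, length = HEADER.unpack(read_exact(stream, HEADER.size))
    payload = read_exact(stream, length)
    return opcode, object_id, payload
```

Any file-like object with a read() method works for decode_request, for
example io.BytesIO in a test or the result of socket.makefile("rb") on a
connected TCP socket.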

>I also feel strongly that for ease of conversion reasons the OBSD storage
>has to operate on the same transport as its contemporary LBA cousins.
Again, our "root" difference rears its head!

>Until something changes, that means SCSI.  If I follow your point, we
>should look to use TCP/IP for storage when implementing OBSD.  The problem
>is that no one will make a switch from LBA's to objects instantly and
>completely.  It will take time.  If the two cannot coexist, then the user
>will have to maintain two separate infrastructures depending on which
>protocol is used.
But we already have two separate infrastructures, one for disks (SCSI) and
one for network mounted filesystems (NFS/CIFS/AFS/DFS over TCP).

[... deleted...]

>I think there could well be products that could operate
>in either mode (I know you don't agree) to accommodate users who want to
>stage a conversion to OBSD.
It's not that I don't agree with having products that can operate in
both modes.  That's far from what I support/don't support.  What I
objected to was SCSI commands in the OSD specification that did this.
I don't have any problem with a device that makes this conversion
in vendor-specific ways or a SCSI controller device that can create
an OSD logical unit out of some physical disks that it owns (via
SCSI controller commands in the standard).  It's more a question
of where this function of switching belongs in the standards.  The
only place in today's SCSI standards where a logical unit gets created
or destroyed (or gets physical storage to coordinate as a single
logical unit) is in the SCSI controller commands.  Additionally, there
is no existing SCSI command in any "device type" command set that
*changes* the SCSI device type of a logical unit.  SCSI logical units
don't change themselves; they get changed "from the outside". This is
really a minor point, but I wanted you to understand that my objection
was only the location of the function in the OSD command set, not
the function itself.

[... deleted...]

>You mentioned the maturity of TCP/IP.  True enough, though when I look at
>what is going on with SCSI over IP the impression I get is that (1) there
>is a lot of work to do before it will handle block access and (2) they are
>using SCSI as the appropriate payload. Isn't it because it embodies the
>solution of so many storage issues that the proponents do not want to
>waste/take the time to redo that work?
The hard work going into mapping SCSI over IP is essentially because
the inherent requirements of SAM don't map so easily on the nifty-ness
of TCP/IP *because* SAM was designed for the parallel bus, with all
its inherent limitations.

>A long winded way of saying that I hope we can consider SCSI until/unless
>we come to a point where it is obvious that it will not work.  Independent
>of that raise the question of another transport being more appropriate.
My opinion right now is that you can get an OSD defined in SCSI.  What
you won't get is a lot of higher level function you might want (e.g., what
I called compound commands that require bi-directional data), at least
not in less than a couple of years.   So, you can do it, more or less
as I describe, by breaking down commands into smaller steps.  You lose
in a few things (and security may be the death knell for this), but you
can get the object abstraction in place.
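
A minimal sketch of "breaking down commands into smaller steps", assuming
a toy in-memory object store in place of a real SCSI target.  The class
and method names are hypothetical.  Each step moves data in one direction
only, so no bi-directional command is needed.

```python
import itertools


class ToyOSD:
    """In-memory stand-in for an object store; not a real SCSI target."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._objects = {}

    def create(self) -> int:
        """One-way command: no DataOut, the new ObjectID comes back as DataIn."""
        object_id = next(self._next_id)
        self._objects[object_id] = bytearray()
        return object_id

    def write(self, object_id: int, offset: int, data: bytes) -> None:
        """One-way command: DataOut only, nothing but status comes back."""
        obj = self._objects[object_id]
        if len(obj) < offset:
            obj.extend(b"\x00" * (offset - len(obj)))  # zero-fill any gap
        obj[offset:offset + len(data)] = data

    def read(self, object_id: int) -> bytes:
        return bytes(self._objects[object_id])


def create_and_write(osd: ToyOSD, data: bytes) -> int:
    """Emulate the compound operation as two separate commands."""
    object_id = osd.create()        # step 1: data flows device -> host
    osd.write(object_id, 0, data)   # step 2: data flows host -> device
    return object_id
```

The two steps are not atomic: a failure between them leaves an empty
object behind, which is one of the things you give up with this approach.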

Security is a big issue in this space. But as I mentioned to someone else,
you might be able to get a good chunk of the security stuff you need
for OSDs on top of TCP/IP (more or less for free) by having the host
and OSD request security services of the underlying transport (TLS,
IPsec, etc.) and not have to put much into security in SCSI at all.
SCSI doesn't have any notion of security at all.  Other SCSI transports
like parallel bus and FC have little or no security.  TCP at least
has infrastructure for some of it, e.g., certificate authorities on
the net, TLS, IPsec, etc.
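
As a rough illustration of requesting security services from the
transport, the sketch below uses Python's standard ssl module to wrap a
TCP connection in TLS before any storage traffic is exchanged.  The host
and port are placeholders, and the helper names are hypothetical; nothing
here is specific to OSD.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Default client-side TLS settings: certificate and hostname checks on."""
    return ssl.create_default_context()


def open_secure_channel(host: str, port: int) -> ssl.SSLSocket:
    """Open a TCP connection and wrap it in TLS before any OSD traffic flows."""
    raw = socket.create_connection((host, port))
    return make_client_context().wrap_socket(raw, server_hostname=host)
```

The peer is authenticated against the system's trusted certificate
authorities and the channel is encrypted, all without the command set
above it knowing anything about security.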

Jim Hafner

.... lots of more stuff attached.....

hafner at almaden.ibm.com on 07/20/2000 04:13:17 PM

To:   Dave_B_Anderson at Seagate.COM
cc:   rsnively at Brocade.COM

Subject:  RE: Object oriented issues

Thanks for copying me on this note.  But I did have a comment.

You say that "so much is invested in SCSI".  That's true, but in the area
of block data on a private bus.   OSDs will get their real value in the
networked world (particularly after you get the security stuff in it).  And
the current (and perhaps very long term) networked world is TCP/IP.  We
already have very decent NAS boxes that shove data around (NFS/CIFS over
TCP).  With the expected hardware accelerators for TCP, these can only get
better.  TCP is also "mature", "stable" and has provided the flexibility to
provide numerous "solutions" over time (the quoted words are from your
[Yeah, I'm a real hard sell on this one!]

I guess I'm still looking at OSD as "down from NAS" and not "up from disk".
Can you "sing the praises" of SCSI and what it really buys you in the
context of OSDs (particularly with all the extra baggage that SCSI brings
with it)?

I'm willing to be convinced (and spend some effort helping out the OSD
standard), so go ahead, preach to the heretic in the back row!

Jim Hafner

Dave_B_Anderson at seagate.com on 07-20-2000 01:50:24 PM

To:   rsnively at Brocade.COM
cc:   Jim Hafner/Almaden/IBM at IBMUS, t10 at t10.org
Subject:  RE: Object oriented issues

Hi Bob,

Thanks for your comments.  This issue of data transfers in both directions
is a tough problem for SCSI, but at the same time, I think required for
efficiently implementing OBSD.  Perhaps Jim is right when, in another memo,
he wrote that some other protocol, not SCSI, probably would be more
appropriate.  Also, Jim's suggestions for alternative solutions to the
problem are worthwhile, though I actually think that more instances will
arise for employing bi-directional transfers.  If that is the case, then it
is better we come to this realization earlier than later.  Still, I would
really prefer to find a way to make it work in SCSI, if at all possible.
We can always port the definition to another protocol, and certainly there
are some candidates in the wings eager for a chunk of the storage
interconnect business - TCP/IP and InfiniBand, for example.  But there is
so much invested by the industry in SCSI, and it has proven itself over and
over to have the flexibility and extensibility to meet our needs for so
long, that it would seem really unfortunate if we cannot find a SCSI
solution to this requirement, as well.  I shudder to think about having to
bring another protocol up to the level of SCSI, in terms of maturity,
stability and solutions - i.e. all the problems that have been addressed
and resolved in it.

This is probably heresy to you, and most of the rest of T10 probably does
not want to hear anything that smells like changing SAM, but we have
actually developed and are shipping SPI drives that implement a command
that sends data in both directions.  (The host in this case is a large
system that has low level control over command processing and is able to
accept the second, in-bound data stream.)  It has worked very well for this
customer.  As this was done as a proprietary command for a single customer,
we have not pursued approaching T10 about accommodating the command.
Nevertheless, it was not that hard to do and, as I said, continues to work
very well.

Seagate would certainly be willing to propose the needed changes.

Appreciate your thoughts,

Robert Snively <rsnively at Brocade.COM> on 07/18/2000 10:40:23 AM

To:   "'hafner at almaden.ibm.com'" <hafner at almaden.ibm.com>, t10 at t10.org
cc:   Dave_Anderson at notes.Seagate.COM

Subject:  RE: Object oriented issues

Making data transfer flow both ways requires the management of TWO data
pointers, including the capability of explicitly modifying both of them.
While serial SCSI usually contains the pointer embedded in the data or
in the data request, parallel SCSI does not have that capability and must
create a new labeling process for the pointers.

Normally, the data pointers are actually DMA engines, implying that two
simultaneous DMA engines would be required for each command, one inbound
and one outbound.  That doubles the DMA state that has to be maintained for
both serial and parallel SCSI host adapters, assuming that there are a pair
of re-usable DMA engines on each host adapter.
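
A minimal sketch of the bookkeeping this implies, assuming a simplified
model in which each data pointer is just a buffer length and an offset.
The class and field names are hypothetical and do not reflect any real
host adapter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataPointer:
    """Progress of one transfer direction: buffer size and bytes moved so far."""
    length: int
    offset: int = 0

    def advance(self, nbytes: int) -> None:
        if self.offset + nbytes > self.length:
            raise ValueError("transfer overruns buffer")
        self.offset += nbytes


@dataclass
class CommandState:
    """Per-command state a host adapter would track (hypothetical layout)."""
    tag: int
    data_out: Optional[DataPointer] = None  # host -> device
    data_in: Optional[DataPointer] = None   # device -> host

    def active_pointers(self) -> int:
        """Number of data pointers (DMA contexts) this command keeps alive."""
        return sum(p is not None for p in (self.data_out, self.data_in))
```

A conventional command populates only one of the two fields.  A
bi-directional command populates both, and each must be saved, restored
and modified independently.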

And, as one of you folks has already pointed out, the error processing
opens a whole new can of worms.

The savings of overhead with a simultaneous CREATE AND WRITE is probably
not significant in serial SCSI.  The overheads on the link are low compared
with the processing overheads required to perform and commit the CREATE
function on the logical unit.  Note that the CREATE should probably be
pretty much an uninterruptible operation.  Locking the WRITE to it forces
the loss of a revolution on a disk device that doesn't cache.  Depending on
how you implement this, it could create an extended period of
or busyness for a device.  On devices that do cache, data integrity is
threatened because you must not only record the data, but record the
of the object and the descriptors of the recorded data before you can be
assured of data integrity.  And of course, at the individual disk level, the whole
oriented approach is somewhat suspect unless file system level mirroring
is provided.  RAIDs should be okay, since they implement recording
and redundant non-volatile caching.

If the overhead you are actually worrying about is related to the
and creation of each object oriented command, then you have a far more
problem anyway.

Doable, yes.  Wise, no.  Unless we can contain the data in a fixed maximum
sub-field of the command (say 32 bytes) so that it is never transmitted in
"data phase", let's instead look at Jim's solutions below.

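To illustrate the fixed sub-field idea, here is a minimal sketch that
carries up to 32 bytes of data inside the command itself.  The command
layout (opcode, length byte, 32-byte data field) and the opcode used in
the test are hypothetical and not taken from any SCSI command set.

```python
import struct

MAX_INLINE = 32  # the "say 32 bytes" bound suggested above

# Hypothetical layout: opcode (1 byte), inline length (1 byte), 32-byte field.
COMMAND = struct.Struct("!BB32s")


def pack_command(opcode: int, inline: bytes = b"") -> bytes:
    """Build a fixed-size command carrying the data in its own sub-field."""
    if len(inline) > MAX_INLINE:
        raise ValueError("data does not fit in the command; needs a data phase")
    return COMMAND.pack(opcode, len(inline), inline)  # 32s pads with zero bytes


def unpack_command(raw: bytes):
    """Recover the opcode and the inline data, dropping the zero padding."""
    opcode, length, field = COMMAND.unpack(raw)
    return opcode, field[:length]
```

Because the command has a fixed size, the data travels with it and the
second data pointer never comes into play.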

>  -----Original Message-----
>  From: hafner at almaden.ibm.com [mailto:hafner at almaden.ibm.com]
>  Sent: Monday, July 17, 2000 4:14 PM
>  To: t10 at t10.org
>  Cc: Dave_Anderson at notes.seagate.com
>  Subject:
>  * From the T10 Reflector (t10 at t10.org), posted by:
>  * hafner at almaden.ibm.com
>  *
>  Folks,
>  There was an interesting discussion at the last T10 meeting on OSDs
>  (osd-r01).  I presented some suggestions in 00-262r0, and a reply
>  (00-295r0) was supplied by Dave Anderson (Seagate).  In many cases, we
>  agreed on many things.  A few things I raised questions about because I
>  wasn't clear on the issues and requirements.  A few things we disagree on
>  mostly in terms of implementation. But....
>  One issue that I want to open for discussion here is the CREATE.  I didn't
>  see how a CREATE could include write data (DataOut) and still get back an
>  ObjectID in DataIn, so I suggested separating the two operations.
>  Dave remarked that "this seemed wasteful in an environment where there are
>  a lot of small file creates" (from 00-295r0, last page, item 6).
>  My response is:
>  1) if or until there is bidirectional data on a single SCSI command, I
>  don't see an immediate and good alternative but...
>  2) one could CREATE+WRITE with no returned ObjectID, and then follow that
>  with a second command to request a report on the created object's ID
>  (that's still two commands though there is better atomicity of
>  3) one could CREATE+WRITE w/suggested ObjectID and then the OSD can return
>  Status GOOD if the ObjectID was acceptable (to the OSD) and some other
>  status (CHECK CONDITION and include in the additional sense data the actual
>  ObjectID that was assigned by the OSD). This is a hack (in my opinion) but
>  workable, though it does distribute "namespace" responsibilities in a
>  different way.
>  4) The filesystem that expects to open lots of small files could issue a
>  number of CREATE commands and cache the ObjectIDs for when a real file
>  needs to be created.  This modifies the filesystem behavior, but is not
>  unreasonable.
>  5) We can mitigate the latency and overhead of many CREATES (as in (4)) by
>  having a CREATE MULTIPLE (create 'n' objects) which would return a list of
>  ObjectIDs.  Interesting error scenarios arise in this case, however.
>  Anybody else got thoughts on this?
>  Anybody want to bite off changing SAM to allow for bi-directional data?
>  Jim Hafner
>  *
>  * For T10 Reflector information, send a message with
>  * 'info t10' (no quotes) in the message body to majordomo at t10.org
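
Options (4) and (5) in the quoted message can be sketched together as a
filesystem-side pool of pre-created ObjectIDs that is refilled in batches.
This is only an illustration: the toy device, the create_multiple method
and the batch size are all assumptions, and the error scenarios mentioned
in (5) are not handled.

```python
import itertools
from collections import deque


class ToyOSD:
    """In-memory stand-in for a device that assigns ObjectIDs."""

    def __init__(self):
        self._ids = itertools.count(100)  # arbitrary starting ID
        self.commands_issued = 0

    def create_multiple(self, n: int) -> list:
        """Hypothetical CREATE MULTIPLE: one command, n new ObjectIDs in DataIn."""
        self.commands_issued += 1
        return [next(self._ids) for _ in range(n)]


class ObjectIdCache:
    """Filesystem-side pool of pre-created ObjectIDs."""

    def __init__(self, osd: ToyOSD, batch: int = 16):
        self._osd = osd
        self._batch = batch
        self._free = deque()

    def take(self) -> int:
        """Hand out a cached ObjectID, refilling the pool when it runs dry."""
        if not self._free:
            self._free.extend(self._osd.create_multiple(self._batch))
        return self._free.popleft()
```

With a batch size of 4, creating five small files costs two CREATE
MULTIPLE commands instead of five CREATEs, and each later WRITE is an
ordinary one-way command against an ObjectID the host already holds.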
