File system over IP and/or TCP, third-party, etc. discussions (long)

James E. [Jed] Donnelley jed at llnl.gov
Wed Sep 21 09:49:54 PDT 1994


I am happy to see a somewhat higher level issue like this raised
in this forum.  Whenever anyone steps out of the current model
(mode, rut, ...) it is pretty sure to disturb some thinking and
perhaps some work...  Of course, there may well be good reasons
for the current model - but this particular issue, file access protocols
over reliable WAN capable protocols, is one that I also think deserves
careful consideration.  I would like to share some thoughts on it.

In Don's latest note (I believe I have seen all the others) he points out
possible Cons to layering TCP/IP (I won't distinguish for the moment)
under SCSI:

>1. TCP and IP may add complexity and possible poor performance
>2. The storage device may not have enough intelligence to do TCP/IP
>3. TCP and IP are intended for point-to-point, with no provision for third
>party transfers

I believe that 2 and 3 are not serious considerations.  For #2 it would
be interesting to get recent feedback from a company like Maximum
Strategy that I believe has experience in this area.  In my opinion
TCP and IP together do not add substantial complexity over what
one has to do for SCSI anyway, particularly if all the error cases
(timeouts etc.) are dealt with properly in the SCSI over DLL case (using
Don's terminology).

For #3, third party transfers are made up of point-to-point transfers
anyway, A to B, B to C, and C to A (bidirectional "to").  I see no problem
with using TCP/IP for the point-to-point communication.
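
To make the decomposition above concrete, here is a minimal sketch of a third party copy built only from point-to-point messages. Everything in it is hypothetical: the Node class, the COPY/DATA/DONE message names, and the in-process "send" that stands in for a TCP connection are assumptions made for illustration, not part of SCSI or of any proposal discussed on this list.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    blocks: dict = field(default_factory=dict)     # block id -> data held here
    completed: list = field(default_factory=list)  # block ids reported done

    def send(self, peer, kind, payload, trace):
        # Each call stands in for one point-to-point message (e.g. over TCP).
        trace.append((self.name, peer.name, kind))
        peer.handle(self, kind, payload, trace)

    def handle(self, sender, kind, payload, trace):
        if kind == "COPY":      # initiator asks this node to ship a block
            block_id, target = payload
            data = self.blocks[block_id]
            self.send(target, "DATA", (block_id, data, sender), trace)
        elif kind == "DATA":    # this node receives the block, then reports
            block_id, data, initiator = payload
            self.blocks[block_id] = data
            self.send(initiator, "DONE", block_id, trace)
        elif kind == "DONE":    # initiator learns the copy finished
            self.completed.append(payload)
        else:
            raise ValueError(f"unknown message kind: {kind}")


def third_party_copy(initiator, source, target, block_id):
    """A asks B to copy a block to C; returns the messages exchanged."""
    trace = []
    initiator.send(source, "COPY", (block_id, target), trace)
    return trace


a, b, c = Node("A"), Node("B", blocks={7: b"payload"}), Node("C")
for hop in third_party_copy(a, b, c, 7):
    print(hop)
```

The data itself travels only on the B to C hop; A sees a command going out and a completion coming back.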

#1, the overhead aspect, is probably the most difficult to address.
One part of this, the data part - i.e. the IP and TCP headers themselves -
I don't believe is a significant problem.  Looked at in terms of percentages
they may be noticeable, but when considered with respect to the
bandwidth available on FC or other newer network technologies, I don't
believe this is a serious issue.  I'm staying at a pretty loose, high level
here.  I think an important aspect of this phase of this discussion is to get
as many as possible of the highest level issues on the table to see where
people think the most serious problems are.
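
As a rough illustration of the percentages involved, the sketch below computes the share of each packet taken by the minimum IPv4 and TCP headers (20 bytes each, with no options). The payload sizes are assumptions chosen only to show the trend; they are not measurements from any FC or HIPPI system.

```python
# Minimum header sizes in bytes (no IP or TCP options).
IPV4_HEADER = 20
TCP_HEADER = 20


def header_overhead_percent(payload_bytes: int) -> float:
    """Share of each packet taken up by the IP and TCP headers, in percent."""
    headers = IPV4_HEADER + TCP_HEADER
    return 100.0 * headers / (headers + payload_bytes)


# Payload sizes are illustrative assumptions.
for payload in (512, 1460, 4096, 32768):
    pct = header_overhead_percent(payload)
    print(f"{payload:6d} byte payload: {pct:5.2f}% headers")
```

The larger the block moved per packet, the smaller the header share becomes, which is why the header bytes matter less for bulk storage transfers than for small control messages.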

The other overhead aspect is processing time, which is a cost in and of
itself for a workstation or other computer that is also used for
computation, and which may also contribute to latency.  I believe that a great
deal can be done to optimize the TCP/IP portion of the processing
overhead (e.g. along the lines of what Van Jacobson suggested in some of
the optimizations that he described).  Essentially this amounts to expecting
the "normal" case and optimizing for it.  Despite such optimizations, some
processing will be left.

Both the CPU cycles lost and the latency costs are impacted by the
type of I/O being done.  For ordinary disk I/O the disk latency itself
is so high (typically 10 milliseconds or so) that I don't believe the
CPU cycles or latency lost to TCP/IP overhead would be significant.
However, there are various "solid state disk" systems,
or memory-disk facilities where memory is used essentially like a
very fast, very small disk.  In these cases the latency and possibly
the CPU costs can be important because the latency of the SSD is
essentially zero.
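
The contrast between the two cases can be put in numbers with a small sketch. Only the 10 millisecond disk figure comes from the text above; the per-request protocol cost and the SSD latency are hypothetical values assumed purely for illustration.

```python
def added_latency_percent(device_latency_us: float, protocol_cost_us: float) -> float:
    """Protocol processing cost as a percentage of the device's own latency."""
    return 100.0 * protocol_cost_us / device_latency_us


DISK_LATENCY_US = 10_000.0        # about 10 ms, as stated in the text
SSD_LATENCY_US = 10.0             # assumption: stand-in for "essentially zero"
ASSUMED_PROTOCOL_COST_US = 100.0  # assumption: TCP/IP processing per request

for name, latency in (("disk", DISK_LATENCY_US), ("SSD", SSD_LATENCY_US)):
    pct = added_latency_percent(latency, ASSUMED_PROTOCOL_COST_US)
    print(f"{name}: protocol cost adds {pct:.0f}% to device latency")
```

Under these assumed numbers the same protocol cost is lost in the noise for a rotating disk but dominates the total for a memory-based device.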

I personally believe that disk emulation is a poor use of "memory."
It would seem to me better to make better use of its low latency
by treating it as real memory.  However, this is an issue too complex
to deal with here.

When the SSD case is considered, I believe the TCP/IP processing/latency
overhead could well be a factor.

There are two other "Cons" that I would like to mention:

4.  Software structuring.  It may be difficult/awkward to access
TCP under SCSI under a device driver for a file system.  Of course
this is "only a matter of software."

5.  Using TCP/IP in some ways may constrain tuning options
like tuning timeouts for the typical local area case.  TCP/IP has to
deal with a very wide range of performance (that is exactly its typical
application).  Trying to tune its timeouts for the case of local
network disks may conflict with WAN tuning.  I'm not saying that
this case can't be dealt with.  It just may require some sophistication.

I believe it is illustrative to consider NFS (as Don mentioned) in these
deliberations.  NFS was built on top of UDP and not TCP for some reasons
that were probably similar to those being considered here.  Since NFS
has its own retransmission scheme, it can be tuned for its LAN environment.
Of course, this adds administration and code complexity.
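
To show what "its own retransmission scheme" means in practice, here is a minimal sketch of request/reply retransmission done above an unreliable datagram service. It is not NFS's actual algorithm: the function names, the default timeout, and the doubling backoff are assumptions, and the lossy transport is a simulated stand-in for a UDP socket.

```python
from typing import Callable, Optional, Tuple


def call_with_retransmit(
    send_and_wait: Callable[[bytes, float], Optional[bytes]],
    request: bytes,
    initial_timeout_s: float = 0.05,  # tunable: short for a LAN, longer for a WAN
    backoff: float = 2.0,
    max_attempts: int = 5,
) -> Tuple[bytes, int]:
    """Resend the request until a reply arrives; return (reply, attempts used)."""
    timeout_s = initial_timeout_s
    for attempt in range(1, max_attempts + 1):
        reply = send_and_wait(request, timeout_s)
        if reply is not None:
            return reply, attempt
        timeout_s *= backoff  # wait longer before the next retransmission
    raise TimeoutError(f"no reply after {max_attempts} attempts")


def make_lossy_transport(drops: int):
    """Simulated datagram transport that loses the first `drops` requests."""
    state = {"remaining": drops, "timeouts": []}

    def send_and_wait(request: bytes, timeout_s: float) -> Optional[bytes]:
        state["timeouts"].append(timeout_s)
        if state["remaining"] > 0:
            state["remaining"] -= 1
            return None  # models a timeout with no reply
        return b"reply:" + request

    return send_and_wait, state


transport, state = make_lossy_transport(drops=2)
reply, attempts = call_with_retransmit(transport, b"read block 7")
print(reply, attempts, state["timeouts"])
```

The initial timeout and backoff factor are exactly the knobs that can be tuned for a LAN here, and they are also the extra code and administration that TCP would otherwise provide.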

I would also like to take a bit of a cut at Don's list of advantages:

1. TCP is available on most machines and for most media
2. Can intermix media through bridges, routers and gateways
3. Easy coexistence with other networked devices and systems
4. Takes advantage of TCP/IP protocol suite and work by others
5. Provides wide area network capability
6. TCP and IP are robust with good error controls and recovery
7. TCP and IP are complete, tested, enhanced, and proven in practice
8. NFS is an example of a system running on top of UDP/IP

I believe there is substantial overlap in this list, so I don't think the
fact that there are eight items is important.  To deal with them
one at a time:

#1 availability - a DLL will be available on any
machine and media (in fact it will be used by IP) - so this isn't
really an advantage of TCP.

#2. I believe this is one of the strongest points favoring at least IP.

#3.  How does this differ from #1 and #2?

#4.  This relates to the discussion of NFS's own retransmission vs.
using TCP retransmission - it cuts both ways in my opinion.

#5.  Is WAN capability for SCSI really a significant value?  I think this
depends to some extent on some management aspects of storage.
SCSI is dealing with the lowest level access to the device.  It isn't
a file system with file names, user access controls, etc.  That level
of interface is left to systems like NFS, Andrew, DFS, etc.  Isn't it
more appropriate to use higher level systems like these with their
richer semantics for WAN access?   Of course, one can imagine
accessing a disk directly from a workstation across a WAN.  I consider
this scenario fairly unlikely.  One would at least need some sort of
(e.g. IP address based) access control on the access to the disk.  Would
anyone trust such control?

Don mentioned in his original note that he was driven to this suggestion
by some of his experiences with HIPPI.  I would like to ask specifically
what practical situations arose where it would have been useful
to have TCP and/or IP between HIPPI and its computers/storage systems.

#6  This is a strong point for TCP (not IP).  If you don't use TCP
retransmission, you need to do something else (e.g. as NFS does).

#7  Of course TCP and IP have been well proven, but simply not having
them in the stack simplifies things, cuts down complexity as well
as overhead.  Having them there certainly doesn't add to the reliability,
maintainability, etc. of the system (except perhaps by reducing the need
to redo the retransmission scheme - this was the point of #6)

#8  NFS is a good system to use for comparisons.  Remember, however, that
NFS is sitting at a somewhat higher level where naming and access control
are included in the semantics.  It is not clear that the value that NFS
gains from being layered on top of IP is comparably of value for SCSI
which, as noted above, may well have a more local scope of usage just
because of its semantics.

Then Don asks:

>1. What is the environment you are aiming for?
>    a. Heterogeneous or homogenous media?
>    b. Is it essentially error free?
>    c. Large or small number of interconnected systems?
>2. How much network capability, error recovery, etc. will need to be added
>to SCSI to work without TCP/IP?
>3. How easy would it be to add a third-party-transfer capability to TCP/IP?
>4. Would a non-TCP/IP solution run significantly faster than TCP/IP?

My thoughts would be (least significant first):

3.  It would not be difficult to layer third party transfer on top of TCP/IP.

1.  Doing peripheral access over a nicely behaved homogeneous LAN network
is already enough of a bite.  Trying to include wider area issues under SCSI
is probably not wise at this point for a "production" track.  For a "research"
track it seems to me worthwhile to investigate.

2.  For the nice ... LAN, I believe that essentially no error recovery is needed
beyond what the disks and file systems already support.

4.  I believe the performance advantage of a non-TCP/IP implementation
would only be significant for SSD-like (i.e. memory disk) use.  For ordinary
disk access I believe the difference would not be noticeable.

Finally, Don comments:

>The bottom line is that you pay your money and take your choice.  One
>choice is to provide for both capabilities.

I believe that an approach along the lines that Lance originally suggested
is the best one to pursue for the "production" track (i.e. we need a product
within the next 4 months or we go out of business) but that it makes sense
to allow for support of IP and/or TCP in the underlying layer and to
develop this capability either along with a specific product that needs
the additional coverage under SCSI or as part of a research effort.

_________________________  Separate topic

If you have read this much, you may be interested in the
computer and communication Web pages that I am supporting:

http://www-atp.llnl.gov/atp/telecom.html

In there you will find links to pages about: companies, media, organizations,
programs, projects, standards, and usenet groups related to computers and
communications.

If you represent a company that isn't listed or is inaccurately listed, I
would like to hear from you.

If you know of more effective links to standards information, I would like
to hear from you.

If you know of related organizations with Web pages or other Internet
accessible information (e.g. I just recently heard about and added the
HIPPI Networking forum) I would like to hear from you.

Finally, I am currently working on putting together a page of links to
"communication" products.  Many of you may have products listed in the
CNRI gigabit database (including mostly Postscript scanned images of
product literature, but recently also some html).  Unfortunately, access to
this database is restricted at the moment to testbed participants for what
are in many cases historical reasons.  I would like to at least put together
links to as much information as is currently included in this database.  If
your organization has access to this data, you will find it at:

http://www.cnri.reston.va.us:4000/gigabit/home.html

If you have product information that you would like to see listed,
please send me a URL (ftp URLs to Postscript files are OK).

James E. (Jed) Donnelley -  Staff Computer Scientist
Advanced Telecommunications Program
LAWRENCE LIVERMORE NATIONAL LABORATORY
On sabbatical until 4/95 to University of Stuttgart
Internet: jed at llnl.gov
http://www-atp.llnl.gov/atp/jed.html
Current phone: +49 711 685 4514
Current fax:      +49 711 678 7626





