unique worldwide tape names

Mike Wenzel mw at core.rose.hp.com
Mon Aug 19 23:13:38 PDT 1996


* From the SCSI Reflector, posted by:
* Mike Wenzel <mw at core.rose.hp.com>
*
Hi Bob,

I also have been following the discussion on unique tape names, but haven't
had a chance to respond until now.  Here's my home-spun "take" on some of
the issues.

At 10:52 AM 8/19/96 -0700, Bob Baird wrote:
>* From the SCSI Reflector, posted by:
>* Bob Baird <bbaird at hpmfas3.cup.hp.com>
>*
>To SSSWG followers,
>
>Since I seem to have created some controversy, I guess I should clarify my
>position. This will probably create more controversy --- good.
>
>I agree that addressing and naming are two different aspects of an object.
>However, they are not as different in practice as they seem. Also, most
>names are only meaningful within a context. The following summarizes my
>position:
>
>1. I think we should consolidate device ADDRESSING around IP or a compatible
>extension to IP. I believe that network attached devices are the way to go.

It seems like there are two overall paradigms for network-attached mass storage,
"dumb" vs. "smart" devices.  The addressing and "locational" naming method used
depends on the paradigm you assume.  

A "dumb" device would be kept as cheap as possible by having minimal-weight
protocols, access control features, etc. For FC, for example, it might do as
much as possible at the link level by handling only those transaction-ordering,
broken connection, etc., cases that can occur within a single fabric and show up
in the link protocol itself.  In order to reach a dumb device from anywhere in
the world, there would need to be a server of some sort on the same FC fabric.
The remote client would then use internet protocols (e.g., TCP/IP) to talk to
the server box over the internet, and the server box would use only local,
link-level protocols to talk to the dumb device.  The dumb device would not
speak the IP protocol and may not be assigned an IP address.  The server would
provide the access control features wrt the client and could rely on physical
security (e.g., "glass house") between the server and the device.  

A "smart" device would be prepared to handle all the race, duplication, and loss
conditions that can occur on the internet, as well as the local fabric.  It
would also contain access control features that would ensure that it could not
be affected by unauthorized users.  It would probably need to use the IP
protocol to expedite routing of requests and replies to/from the remote client.

Now, to the point, IP-based addressing of devices would be appropriate for the
"smart" device model; it may or may not be appropriate for the "dumb" device
model.  The "dumb" device model is sort of beyond IP: you may need some
additional name component for the hop between the server and the device.
Assuming there isn't some sort of inter-galactic directory service, to reach the
"dumb" mass storage device, the user would probably specify the domain name
(e.g., "core.rose.hp.com") for the server, plus a user-friendly name for the
device that is understood by the server.  The server's domain name would be
translated into the appropriate IP address which is used by the client machine
to talk to the server.  The device name would be passed to the server and
translated there into the FC port address used by the server to reach the
device.  
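A minimal sketch of the two-step lookup described above for the "dumb" device
model, in Python.  The tables, the IP address, the device names, and the FC port
address values are all made up for illustration; only the domain name comes from
the example in the text.  The point is just that the client resolves the server,
and the server resolves the device.

```python
# Hypothetical lookup tables; every value below is made up for illustration.
DNS_TABLE = {
    "core.rose.hp.com": "192.0.2.10",       # client side: domain name -> IP address
}
SERVER_DEVICE_TABLE = {
    "192.0.2.10": {                          # server side: device name -> FC port address
        "tape0": 0x010200,
        "disk3": 0x010300,
    },
}


def resolve_server(domain_name):
    """Step 1, done by the client: server domain name -> IP address."""
    return DNS_TABLE[domain_name]


def resolve_device(server_ip, device_name):
    """Step 2, done by the server: friendly device name -> FC port address."""
    return SERVER_DEVICE_TABLE[server_ip][device_name]


def locate(domain_name, device_name):
    """Return (server IP, FC port address) for a device behind a server."""
    server_ip = resolve_server(domain_name)
    fc_port = resolve_device(server_ip, device_name)
    return server_ip, fc_port


server_ip, fc_port = locate("core.rose.hp.com", "tape0")
```

Note that the device itself never appears in the IP-level table: its name only
has meaning in the context of the server that fronts it.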

Assuming "smart" devices speak IP, a smart device might as well have a domain
name.  This would be used by the client machine to find the IP address to talk
to the device directly.

By the way, in addition to Greg and Lester's comments on IP addresses, also
notice that IP addresses do NOT uniquely ID a multi-homed machine.  If a
device has multiple ports, it is likely to have multiple IP addresses:
at a MINIMUM, one for each network that the device is connected to.  So
you cannot tell from two different IP address values whether the same or
different nodes are being referenced.  This creates an ambiguity when
you need to ID the node.  Even if you pick one of the interfaces as the
"key" interface, for the purposes of IDing the node, what happens when the
key interface needs to be re-configured?  In short, from all that's been
said, IP addresses are in most ways too time- and location-dependent to
be used for many types of ID.
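
The multi-homed ambiguity can be shown with a small Python sketch.  The node
names and addresses are hypothetical, and the inventory table stands in for
out-of-band knowledge that a host normally does not have.

```python
# Hypothetical inventory: node name -> IP addresses of its interfaces.
INTERFACES = {
    "array-a": ["192.0.2.20", "198.51.100.20"],  # multi-homed: one address per network
    "array-b": ["192.0.2.21"],
}


def same_node_by_ip(ip1, ip2):
    """Naive test: assume equal IP addresses mean the same node."""
    return ip1 == ip2


def same_node_by_inventory(ip1, ip2):
    """Ground truth, available only if something else already IDs the node."""
    owner = {ip: node for node, ips in INTERFACES.items() for ip in ips}
    return owner[ip1] == owner[ip2]


naive = same_node_by_ip("192.0.2.20", "198.51.100.20")
truth = same_node_by_inventory("192.0.2.20", "198.51.100.20")
```

Here the naive comparison says "different" while the inventory says "same", so
the two address values alone cannot settle which node is being referenced.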

>2. I think that "device-files" should have names like normal files and be
>represented like normal files in a file-system directory. By "device-file",
>I mean a linear sequence of bytes on a unit of media having no file-system
>structure. Files on tapes and database files on raw disk partitions are
>two examples of device-files. An application should be able to open a
>device-file through the (Unix or NT) file directory and access the file with
>normal file semantics. In this case, users would know the name of a
>device-file just like they know the name of any other file. Do we wish to
>give universally unique names to normal files too? If so, device-files and
>normal files should be treated equally. If there is a need for UUIDs for
>device-files, the same should apply to normal files.

You seem to be mixing apples and oranges here!  Some of the other replies
touched on this already.  There are multiple entities involved here and each
entity may need multiple names or aliases for different uses:

 1. For now, I'll skip over several levels of entities related to the device:
    the port used to access the (multi-port) device, the controller involved,
    levels of enclosures, etc.  These are significant and each needs to be
    named reliably for certain, low-level operations, like high-availability
    path-selection or fault isolation.

 2. The key item that I've been getting at in my "Device ID" memos is
    what I've been calling the "storage entity" (other folks have suggested
    better terms from the SCSI glossary, but I won't switch right now to
    keep consistency in these subject chains).  To me, the "storage entity"
    is the "chunk" of physical or virtual storage being manipulated, for
    fixed media, or it is the load and access point for removable media.
    Storage entities need at least two types of names: one to identify it
    uniquely and unambiguously, and the other to locate it currently.

    a. Device ID.  The SCSI-3 "Device ID" is a world-wide unique name that
       is known by the storage entity itself and may be retrieved by a SCSI
       query over any path used to reach the entity.  The SAME value will
       be returned, regardless of path.  This is EXTREMELY important to 
       host computers in order to separate alternative paths to the SAME
       device from DIFFERENT devices.  (Picture getting this wrong: you put
       your accounting data on one disk and format the other.  If they end
       up being the same disk, whoops!)  "World Wide" is almost synonymous
       with "unfriendly".  Most of the algorithms for ensuring uniqueness
       are either registered bit-field values or long, unwieldy strings.
       When an IEEE format is used for the Device ID, the Device ID 
       consists of four bits to reference IEEE as the registration authority,
       24 bits to ID the manufacturer via IEEE, and 36 bits used by the
       manufacturer to ID the storage entity.  You wouldn't want to type
       this every time you open the device.

    b. Locational Name.  For Unix, this is typically the physical device 
       file name.  The name often references a path to a specific port,
       used to reach the storage entity.  Even without aliases, there
       are often multiple names that can be used to reach the same
       storage entity--especially among nodes in a cluster (or even more
       widely scattered).  You cannot compare these names for equality
       to be sure you're talking about the same storage entity.  Locational
       names (including aliases) should be user-friendly.  They ARE what you
       type when you open the device.  I think this is the type of name
       you're mainly talking about, Bob.

 3. For removable media, there is a need to identify the specific item
    currently loaded or to be loaded.  Sorry that I don't have the official
    terminology handy; let's call it the "media name" for now.  I have NOT
    thought as much about identifying media, since it is above the level
    I'm working at right now.  But it seems like the media would also
    need two types of name, similar to the storage entity: one to uniquely
    and unambiguously label the media instance, and another to locate it
    currently.

    a. "Media ID."  The Media ID should probably be a world-wide unique
       name that is assigned to the media instance during manufacture and
       readable from it at any time.  I don't think a 64-bit ID is big
       enough to structure the registration and still enumerate all the
       instances.  Maybe 128 bits would be safer.  This ID could be used
       to disambiguate locational names and aliases, and it could also be
       used for reliable media management (e.g., track the age of a tape
       that has had a series of things recorded on it over time).

    b. Locational Name.  This is where I'm really getting outside my area.
       But it seems like a given tape cartridge could migrate from one
       storage site to another (e.g., as the information it contains gets
       older, or as sub-organizations move).  So a given locational name
       could become obsolete by having a component that references a prior
       directory server.  I would guess that each name would be as
       persistent as possible, with forward-pointers being created to the
       next repository when the object is relocated.
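
As an illustration of the IEEE-format Device ID layout from item 2a (4 bits for
the registration authority, 24 bits for the manufacturer, 36 bits assigned by
the manufacturer), here is a minimal Python sketch that packs and unpacks the
64-bit value.  It assumes the fields are laid out most-significant first in the
order given in the text; the function names and the example field values are
arbitrary, not taken from any real device.

```python
AUTH_BITS, MFR_BITS, ENTITY_BITS = 4, 24, 36   # 4 + 24 + 36 = 64 bits total


def pack_device_id(authority, manufacturer, entity):
    """Pack the three fields into one 64-bit integer (assumed MSB-first order)."""
    for value, bits in ((authority, AUTH_BITS),
                        (manufacturer, MFR_BITS),
                        (entity, ENTITY_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError("field value does not fit in %d bits" % bits)
    return ((authority << (MFR_BITS + ENTITY_BITS))
            | (manufacturer << ENTITY_BITS)
            | entity)


def unpack_device_id(device_id):
    """Split a 64-bit Device ID back into (authority, manufacturer, entity)."""
    entity = device_id & ((1 << ENTITY_BITS) - 1)
    manufacturer = (device_id >> ENTITY_BITS) & ((1 << MFR_BITS) - 1)
    authority = device_id >> (MFR_BITS + ENTITY_BITS)
    return authority, manufacturer, entity


# Arbitrary example values, chosen only to make the hex digits easy to follow.
device_id = pack_device_id(0x5, 0x123456, 0x789ABCDEF)
as_text = "%016X" % device_id   # "5123456789ABCDEF"
```

Sixteen hex digits is what the "unfriendly" form looks like in practice, and the
36-bit field leaves each manufacturer 2**36 (about 68.7 billion) values to hand
out.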

>3. The first part of the IP address (depending on whether it is type 1, 2,
>or 3) is already a customer id of sorts. There is already a committee to
>assign those numbers. Why not use them for UUIDs and remove the need for
>another committee? Do we believe that tapes physically travel between
>customers so much that we need another standard for it?

I doubt very much that we would be doing the customers a service by
"allowing" them to maintain uniqueness of their world-wide names!  For
media IDs especially, it would be MUCH easier to number each one as it
rolls off the assembly line.  The locational names are user-friendly and
probably client-relative anyway.  A manufacturer could use one of his own
IP network numbers in place of the IEEE OUI.  But I don't see any advantage
one way or the other with using this over the IEEE's registration authority.
The IEEE scheme has also been working fine for years for IEEE-802 and Ethernet.

>4. Given my assumptions that raw "device-files" should be named in file
>directories, do we really need UNIVERSAL identifiers for removable media?
>Isn't  file name plus server sufficient for locating a tape, CDROM, disk
>partition, MO, etc. device-file? Once the device-file is located by
>file-name it can be addressed by IP.

What directory name do you use for the tape (media) when there is nothing
stored on the tape?  Do you give it a "temp" name?  What system does a
free tape belong to?  As I pointed out above, there may be advantages (e.g.,
media management) to being able to unambiguously refer to a media instance,
regardless of where it's been or what's been stored on it.

>5. Multiple paths to the same device certainly exist. However,
>multi-pathing is totally limited to a server (single computer or cluster).
>Having multiple paths to a device is like having multiple entrances to a
>house. Except for shared disk applications like OPS, only one path is active
>at the same time.

But what about shared-disk applications like OPS?  They do exist.  I don't
understand your assertion that "multi-pathing is totally limited to a 
server."  It's true that some arrays can only stand to have one port active
at a time for a given LUN, but there are others that can concurrently
access a LUN from multiple ports.  High-end arrays, like EMC, have up to
32 SCSI or FC ports that map flexibly to LUNs.

Anyway, it does not matter that only one path is active at a time during
run-time.  During the path discovery phase (e.g., boot-time probing),
alternative paths are all tried and identified.  This is the main place
that the world-wide unique Device ID is used.  (It could be used at other
times to ensure that the device has not been reconfigured between location
look-up and open access.)
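
A rough Python sketch of how the Device ID might be used during path discovery
and again just before open.  The device file names, the ID values, and the
`query_device_id` callback are hypothetical stand-ins for whatever the host
actually uses to probe a path; this is not any particular driver's interface.

```python
from collections import defaultdict


def group_paths_by_device_id(probe_results):
    """Group probed paths by the Device ID each one returned.

    probe_results: iterable of (path_name, device_id) pairs collected during
    boot-time probing.  Paths that share an ID lead to the SAME storage entity.
    """
    groups = defaultdict(list)
    for path, device_id in probe_results:
        groups[device_id].append(path)
    return dict(groups)


def verify_before_open(path, expected_id, query_device_id):
    """Re-check the ID between location look-up and open access."""
    return query_device_id(path) == expected_id


# Made-up probe results: two paths reach one entity, a third reaches another.
probed = [
    ("/dev/dsk/c0t1d0", 0x5123456789ABCDEF),
    ("/dev/dsk/c4t1d0", 0x5123456789ABCDEF),
    ("/dev/dsk/c0t2d0", 0x5123456000000001),
]
paths = group_paths_by_device_id(probed)

# Stand-in for a real query over the path; here it just replays the table.
current_ids = dict(probed)
still_same = verify_before_open("/dev/dsk/c4t1d0", 0x5123456789ABCDEF,
                                current_ids.get)
```

With this grouping, the host knows that the first two names are alternative
paths to one disk, so it will not format one while the accounting data sits on
"the other".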

>6. I definitely agree that media verification standards should exist within
>a site. Yes, you must know which paths lead to the same destination.
>However, I am not convinced that volume labelling is the only way to
>validate paths or that media labels must be universally unique.

I don't think that path identification involves the media label at all: it
involves the name of the Storage Entity (load and access point).  You
MUST be able to identify the paths even when there is NO media loaded.

I'm not sure that a world-wide unique media ID is a "must".  But it does
seem like a convenience.

Best Regards,

Mike

 ************* |  Mike Wenzel,
 *****   ***** |  Hewlett-Packard - NCD System Interconnect Lab,
 *** /_  _ *** |  Mailstop 5601,
 ** / / /_/ ** |  8000 Foothills Blvd.,
 ***   /   *** |  Roseville, CA 95747-5601
 *****   ***** |  E-mail: mw at core.rose.hp.com
 ************* |  Telephone (916) 785-5609  FAX (916) 785-2875




