Proposal for IEEE company_id based formats for FC-PH world-wide identifiers

Mike Wenzel mw at core.rose.hp.com
Tue Jan 7 18:15:17 PST 1997


* From the SCSI Reflector (scsi at symbios.com), posted by:
* Mike Wenzel <mw at core.rose.hp.com>
*
Hi Larry and Rod,

Just digging out from time off during the holidays.  Sorry I haven't
responded sooner.

I agree that being able to uniquely identify a chunk of storage regardless
of path is of paramount importance.  I'm very concerned that a storage 
device vendor is being encouraged to include 16-byte WWNs in the list on 
the Device ID page, that are based on the 8-byte WWNs of replaceable
components!  This will create ambiguities that I don't think host computers
will be able to resolve.  I think WWNs that correspond to chunks of storage
need to be based on something NON-replaceable.

In response to Larry's need to find controller-to-LUN associations, I think
this is just one example of a general need to identify the field-replaceable
units (FRU) anywhere in a given path to a storage chunk: IO cards,
port hardware, fabric components, etc.  For this, I think we need to use a
subset of the current path address information to find the FRU.  In general, 
I don't see any way to put all the associations of interest in the Device ID
page list!  Not only would this be messy, but it also would be bad layering,
impossible for most implementations, etc.

Please see below for more detail.  Sorry if the attached rambles a bit;
there seem to be many implications to consider.

Best Regards,

Mike

 ************* |  Mike Wenzel,
 *****   ***** |  Hewlett-Packard - NCD System Interconnect Lab,
 *** /_  _ *** |  Mailstop 5601,
 ** / / /_/ ** |  8000 Foothills Blvd.,
 ***   /   *** |  Roseville, CA 95747-5601
 *****   ***** |  E-mail: mw at core.rose.hp.com
 ************* |  Telephone (916) 785-5609  FAX (916) 785-2875

At 01:18 PM 1/6/97 CST, DeKoning, Rod wrote:
>* From the SCSI Reflector (scsi at symbios.com), posted by:
>* "DeKoning, Rod" <rdekonin at ppdpost.ks.symbios.com>
>*
>
>Larry, Mike, Bob, et.al.,
>
>Below, Larry discusses a mechanism to associate a controller (node) name to 
>a Logical Unit.  This concerns me (and possibly others?) for the following 
>reasons.

This concern also immediately sprang to mind when I read the description
of the proposed association.  This is certainly a valid way to assign
WWNs based on the current documents, but I'm concerned that it would be 
abused in a multi-controller topology.  However, after looking more closely, 
I get the feeling that the write-up wasn't complete enough to do justice to 
the thought process.  Maybe Larry or Bob can provide more detail after
reading this.

>I went back to my personal notes for the July Serial Concerns Meeting to be 
>sure we had discussed this point adequately, and I believe that we covered 
>this topic in some detail.  At that meeting, we discussed the fact that we 
>wanted to purposely break the link between the Volume Logical Unit WWN and 
>the Node Name of the device reporting the LUN's WWN.  This is to ensure that 
>we provide the following capabilities:
>
>1. Allow for dual controller environments in which the LUNs may be addressed 
>from different controllers with different node IDs.
>2. Allow for hot swapping of devices used to access the LUNs.
>
>In the final analysis, (at least as I recall and recorded it) we are trying 
>to encourage OS drivers and applications to avoid making a static link 
>between a Volume Logical Unit and the device (controller) that is used to 
>access the Volume Logical Unit.  In general the OS drivers should be 
>concerned with the WWN of the LUNs, and not the access devices since the 
>data they are concerned with is associated with the LUNs and not the access 
>device(s).

There are a large number of cases that need to be covered, including:
 A) Host computers can have multiple IO card interfaces into a high-
    availability storage network
 B) A device controller can have multiple port interfaces into the
    storage network (therefore, multiple link addresses, possibly
    concurrently active)
 C) A chunk of storage can be reached by multiple controllers

The classical probing method alluded to in previous memos will result in
a host computer finding multiple paths to a given chunk of storage.  The 
number of paths is mainly the product of the numbers in A-thru-C above.  
So when probing is complete, the host has a list of paths (possibly a 
long one!) that can be used to reach a given chunk of storage.
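
To show how quickly the path count grows, here is a minimal sketch of the
probing result for one chunk of storage.  The topology, the names, and the
enumerate_paths function are all hypothetical; a real probe would discover
these values rather than read them from tables.

```python
from itertools import product

# Hypothetical topology; every name here is illustrative only.
host_io_cards = ["ioc0", "ioc1"]                                  # case A
controller_ports = {"ctlA": ["p0", "p1"], "ctlB": ["p0", "p1"]}   # case B
controllers_reaching_chunk = ["ctlA", "ctlB"]                     # case C


def enumerate_paths(io_cards, ports_by_controller, controllers):
    """Return every (io_card, controller, port) path to one storage chunk."""
    paths = []
    for card, ctl in product(io_cards, controllers):
        for port in ports_by_controller[ctl]:
            paths.append((card, ctl, port))
    return paths


paths = enumerate_paths(host_io_cards, controller_ports,
                        controllers_reaching_chunk)
print(len(paths))  # 2 IO cards * 2 controllers * 2 ports each = 8
```

Even this small configuration gives eight paths to a single chunk.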

Now we can make the following observations:
 1) Much of the information in a given path record is non-permanent:
    a) Link addresses can be dynamically negotiated for both the devices
       and hosts (especially for FC loop and FireWire).
    b) Hardware components can be replaced: port hardware, controllers,
       IO cards, etc.
 2) Array controllers may often be configurable in terms of the LUN ID
    value assigned to a given chunk of storage.  This assignment may not
    be the same LUN ID value for all controllers!  A multi-controller
    array may be configurable in terms of which controllers can and cannot
    be used to access a given chunk of storage.
 3) For high-availability configurations, there need to be multiple paths
    that are concurrently usable by the host which involve a minimum number
    of shared hardware components (single points of failure).  These 
    redundant paths allow the chunk of storage to be accessed regardless
    of which single component fails and without manual intervention at
    the device.
 4) It is OS-dependent as to what criteria are used to select one path 
    over another to reach a given chunk of storage at any specific time.
    These criteria could be based on queue lengths, response times, or on
    data obtained by protocol-dependent queries of the data from the
    transport layers, etc.
 5) As Rod pointed out, the Serial Solutions group agreed that it was 
    absolutely crucial that each chunk of storage be labelled with
    an unambiguous world-wide name (WWN) so that the host can tell
    when a path reaches the SAME chunk of storage versus a DIFFERENT
    one!  The WWN on the chunk of storage is the key used to sort
    the multiple paths.  
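
A rough sketch of what "sorting the paths by WWN" could look like on the
host follows.  The record layout, the sample WWN values, and the function
name are assumptions made for illustration.  Note that the two paths to the
first chunk use different LUN IDs, as in observation 2.

```python
from collections import defaultdict

# Illustrative 16-byte WWNs; the values are made up.
WWN_X = bytes.fromhex("6000000000000001" "00000000000000aa")
WWN_Y = bytes.fromhex("6000000000000001" "00000000000000bb")

# Each probe result pairs a path (io_card, controller, lun_id) with the
# WWN reported by the chunk of storage reached over that path.
probe_results = [
    {"path": ("ioc0", "ctlA", 3), "lun_wwn": WWN_X},
    {"path": ("ioc1", "ctlB", 7), "lun_wwn": WWN_X},  # same chunk, other LUN ID
    {"path": ("ioc0", "ctlA", 4), "lun_wwn": WWN_Y},
]


def group_paths_by_wwn(records):
    """Group path records by the WWN of the storage chunk they reach."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["lun_wwn"]].append(rec["path"])
    return dict(groups)


groups = group_paths_by_wwn(probe_results)
for wwn, chunk_paths in groups.items():
    print(wwn.hex(), chunk_paths)
```

The WWN is the only key used; nothing in the path record itself is trusted
to identify the chunk.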

Background:  The Device Identification Page in SPC provides for a list of
identifiers to be returned as a result of an Inquiry command (see 
Table 108).  This list of identifiers could be of varying encoding or
Identifier type, since there are tag fields for both.  The encoding is 
mainly binary versus ASCII, and the Identifier type is primarily related to
the identifier encoding and/or registration authority.  Bob's tutorial
(97-101r1) further recommends that, for RAID devices, the IEEE
Registered-Extended format be used and says that the first 64 bits
could be equal to the WWN of the RAID controller.  

I think we need to refine our understanding as to what the Device ID list 
of identifiers may include and how the host should use the list.  

I propose that the WWNs in this list should be based only on values that
are invariant for a chunk of storage.  If the RAID controller is a
field-replaceable unit, then I strongly suggest that the WWN of a LUN
NOT be based on the WWN of the controller, as the tutorial and Larry's
memo would suggest, but rather on something that won't change--the WWN
of the chassis?  I also propose that the list mainly be used to give one
identifier value for each different format needed to correspond with 
the various transport conventions used to reach the storage chunk (e.g., 
an EUI-64 format identifier if a FireWire interface is present, an 
IEEE Registered-Extended format if a FibreChannel interface is present, 
etc.).  Personally, I would prefer to see a single, unique WWN of ANY
format for each chunk of storage and don't feel we need a list, unless
some additional 'association' or 'named entity' field is added. 
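
As a sketch of the chassis-based idea, the code below builds a 16-byte
identifier whose first 64 bits come from a non-replaceable component and
whose second 64 bits come from a table managed inside the array.  It assumes
the IEEE Registered-Extended layout is a 4-bit NAA value of 6, a 24-bit
company_id, a 36-bit vendor-specific identifier, and a 64-bit extension;
FC-PH is the authority on the field definitions.  The company_id value, the
chassis identifier, and the ExtensionTable class are hypothetical.

```python
NAA_IEEE_REGISTERED_EXTENDED = 0x6  # assumed NAA value for this format


def make_lun_wwn(company_id, chassis_id, extension):
    """Pack NAA, company_id, chassis-derived ID and extension into 16 bytes."""
    if not 0 <= company_id < (1 << 24):
        raise ValueError("company_id must fit in 24 bits")
    if not 0 <= chassis_id < (1 << 36):
        raise ValueError("chassis_id must fit in 36 bits")
    if not 0 <= extension < (1 << 64):
        raise ValueError("extension must fit in 64 bits")
    first64 = ((NAA_IEEE_REGISTERED_EXTENDED << 60)
               | (company_id << 36)
               | chassis_id)
    return first64.to_bytes(8, "big") + extension.to_bytes(8, "big")


class ExtensionTable:
    """Hypothetical array-managed table that never reuses an extension."""

    def __init__(self):
        self._next = 1

    def allocate(self):
        value = self._next
        self._next += 1
        return value


table = ExtensionTable()
# 0x123456 is a placeholder, not a real IEEE company_id assignment.
wwn_a = make_lun_wwn(0x123456, 0x000000001, table.allocate())
wwn_b = make_lun_wwn(0x123456, 0x000000001, table.allocate())
print(wwn_a.hex())
print(wwn_b.hex())
```

Because neither half depends on a controller, swapping a controller leaves
the identifier unchanged.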

I also think we need to refine our understanding of what entity the
Device ID is labelling.  The ANSI group agreed that the Device ID is
NOT a media label, but rather more of a physical device or mount
point (see the latest SPC).  

I propose that for an array, the actual storage devices are NOT the entities
being labelled, and if the storage devices are field-replaceable units,
then the IDs should NOT follow a storage device if it is moved to a new
array.  For RAID devices, the virtual volume IS the entity being labelled.
If a virtual volume is destroyed and the related storage devices and LUN
IDs are re-used to create a new virtual volume in the same or different
arrays, then a new unique set of WWNs needs to be created for the new volume.
That way there is no ambiguity, for example, when a device containing half
of a mirrored pair is moved to a new array.

This implies to me that the Device IDs are not useful DIRECTLY for giving
the LUN-to-controller associations--especially where the controller is
replaceable.  Instead, what about using other procedures to find the
association?

A. Inquiry Data from LUN 0 at the same port address as the path of interest
   to the LUN in question.  In other words, take the parallel SCSI target
   address, the FibreChannel port address, etc., from a path to the LUN
   and query LUN 0 at the same address.

B. A different (possibly new) Inquiry data page.

I think we need to do something like 'A' to find ANY of the FRUs in a given
path to a storage chunk, not just controllers but also port hardware,
IO cards, FibreChannel switch or hub components, etc.  In other words,
I think we need to take information from a subset of the path and use it
in a protocol-dependent way to find the identity, properties, etc., that
are of interest for some component in the path. 
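
Procedure 'A' might be sketched as follows.  This is only an illustration:
the Path record, the function names, and the send_inquiry callback are
hypothetical stand-ins for whatever the transport and SCSI layers actually
provide, and fake_inquiry exists only to make the example self-contained.

```python
from typing import Callable, NamedTuple


class Path(NamedTuple):
    io_card: str       # host IO card used for this path
    port_address: int  # target or port address on the interconnect
    lun: int           # LUN ID of the storage chunk on this path


def lun0_target(path: Path) -> Path:
    """Keep the port address of the path, but address LUN 0 instead."""
    return path._replace(lun=0)


def controller_identity(path: Path,
                        send_inquiry: Callable[[Path], str]) -> str:
    """Ask LUN 0 at the same port address who the controller is."""
    return send_inquiry(lun0_target(path))


def fake_inquiry(target: Path) -> str:
    # Stand-in for a real Inquiry; returns a made-up identity string.
    return f"controller@{target.port_address:#x}/lun{target.lun}"


chunk_path = Path(io_card="ioc0", port_address=0xE8, lun=5)
identity = controller_identity(chunk_path, fake_inquiry)
print(identity)
```

The same pattern, taking a subset of the path and querying it in a
protocol-dependent way, would apply to other FRUs in the path.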

Larry also pointed out the need to ensure that all of the commands in a
set go to the device via the same controller (e.g., for consistency, 
ordering, etc.).  I agree that this is needed, but I think it has more
to do with how the host uses path information.  For FC Loop for example,
I think the transport layer needs to give the SCSI layer a handle that
will not change when a FC Loop address is renegotiated and changes value.
This way, all SCSI commands using the same handle, will go to the same
port, regardless of address changes.  If the host wants to load-level
commands to a given controller among multiple paths, then it will need
some sort of data structures to remember what paths can be used for a
given controller, and other structures to map storage chunks onto
multiple controllers.  There are very good analogs to this in networking,
where packets can be multiplexed over multiple links between a pair of
routers without the applications being aware or involved.
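
One possible shape for such a handle is sketched below.  It assumes the
transport layer can recognize a port by some permanent name (for example a
port WWN) after a loop address is renegotiated; the class and method names
are hypothetical.

```python
class PortHandleTable:
    """Maps a stable handle to a port name, and the name to its address."""

    def __init__(self):
        self._name_by_handle = {}
        self._address_by_name = {}
        self._next_handle = 0

    def open(self, port_name, address):
        """Register a port and return a handle for the SCSI layer."""
        handle = self._next_handle
        self._next_handle += 1
        self._name_by_handle[handle] = port_name
        self._address_by_name[port_name] = address
        return handle

    def address_changed(self, port_name, new_address):
        """Record a renegotiated address; existing handles stay valid."""
        self._address_by_name[port_name] = new_address

    def resolve(self, handle):
        """Return the current address for the port behind this handle."""
        return self._address_by_name[self._name_by_handle[handle]]


table = PortHandleTable()
handle = table.open("port-name-1", 0x01)
before = table.resolve(handle)
table.address_changed("port-name-1", 0xE8)  # loop address renegotiated
after = table.resolve(handle)
print(before, after)
```

The SCSI layer only ever holds the handle, so commands issued with it keep
going to the same port regardless of address changes.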

I think we need to discuss exactly how we can use standard protocol features
to find the various FRUs in a path to a given storage chunk.  But I don't
think that the Device ID page itself is of much direct help.


BACKGROUND REASONING BEHIND PROPOSALS:

A) One must assume that the vendor for a multi-controller array will
   have SOME basis for ensuring the uniqueness of the WWNs assigned
   to chunks of storage.  For example, the LUN WWN could use the IEEE
   registered-extended format, then take the first 64-bits (company ID
   and vendor-specific identifier) from some NON-replaceable component
   in the array (chassis?), and the second 64-bits from a table that
   is managed within the array to ensure uniqueness.  If a controller
   is a replaceable component, it would be hazardous to give a chunk
   of storage a WWN that is based in any way on the controller's WWN--
   the controller could be swapped out of one array and into another
   one, creating the possibility of duplicate, conflicting LUN WWNs.
   I think it is too hard to require the controller design to ensure
   that any new WWNs created in the new array will not conflict with
   WWNs created in previous arrays.  Also, once a host has created
   a correspondence between a WWN and a file system object, I think it
   will be very tricky to change the correspondence to a new WWN, or
   reassign an existing WWN to a new file system object.

B) A device vendor is not required to use the 8-byte to 16-byte
   WWN relationship suggested in the tutorial.  If the controller-to-
   LUN association is vital, then either the host code must have some
   additional algorithms for finding this association for vendors that
   don't follow the suggestion, or else this sub-encoding needs to be
   required, rather than suggested (a bitter pill).

C) A host MUST NOT assume that two LUN WWNs are equivalent that have
   the same value for the second 64-bit field of the IEEE registered-
   extended format, but different values for the company-id and/or 
   vendor-specific identifier (first 64-bits).  The extension field
   is only guaranteed to be unique within the context of the first
   64-bits.  
   i) So an association cannot be made between multiple controllers 
      and a given chunk of storage based on an assumed sub-structuring
      of LUN WWNs.  For example, if a Device ID Inquiry command is 
      sent to a LUN via one controller and the value "x,y" is returned
      (where "x" is the first 64-bits of the registered-extended format
      and "y" is the second 64-bits), and an Inquiry sent to another 
      address returns the value "z,y", the host must NOT assume that 
      these are two paths to the same chunk of storage "y".
   ii) Giving the full list of controller-based LUN WWNs is not sufficient
      if the controllers are replaceable units.  For example, a Device ID
      Inquiry response received via one controller could contain the WWN 
      list "w,x;y,z" and another response could be received via another
      controller having the list "y,z;w,x".  If we assume the 
      substructuring indicated in the tutorial and Larry's earlier memo, 
      this COULD give the correspondences between the chunk of storage and 
      the controllers used to reach it (i.e., WWNs "y" and "w"), and yet 
      give the host enough information to know when the same chunk of 
      storage is being reached.  BUT, if both controllers are replaceable
      units, then:
      * each list would also need to contain a WWN that does not depend 
        on a (currently-present) controller, otherwise, how would the 
        host know when it had reached the same, previous chunk of storage 
        when both controllers have been replaced?  
      * the design of the controllers would need to ensure that new
        extension fields assigned after the controller is relocated 
        do not match any that were assigned in the previous array,
        otherwise duplicate WWNs would result.  The host can't use
        the full list to identify a chunk of storage because the
        chunk doesn't change if a new controller is added, or an old
        one is subtracted from the list.  Also, how would the host 
        handle having two WWNs that were previously together in the
        same list, but now are located in different lists?  So the host
        can't do anything with the list that would relieve the controllers
        from supporting this uniqueness requirement.  Therefore, I assume
        it would be easier not to have the WWN for a LUN depend on
        the WWN for a replaceable controller in the first place!
      * If the Device ID page list contains both WWN values based on an
        invariant and on replaceable controllers, then the host would need
        to know one from the other, in order to resolve the ambiguities
        just discussed.  But then the host could use just the invariant-
        based WWN as the key for path-sorting.           
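
The rule in C) can be sketched as a comparison over the full 16 bytes.
The function name and the sample values are hypothetical; "x", "y" and "z"
follow the naming used in the example in C) i).

```python
def same_chunk(wwn_a: bytes, wwn_b: bytes) -> bool:
    """Two paths reach the same chunk only if all 128 bits match."""
    if len(wwn_a) != 16 or len(wwn_b) != 16:
        raise ValueError("expected 16-byte registered-extended identifiers")
    return wwn_a == wwn_b


# Made-up field values: x and z are different first-64-bit fields,
# y is a shared second-64-bit extension field.
x = bytes.fromhex("6123456000000001")
z = bytes.fromhex("6123456000000002")
y = bytes.fromhex("00000000000000aa")

print(same_chunk(x + y, x + y))  # identical identifiers
print(same_chunk(x + y, z + y))  # same extension, different first 64 bits
```

Comparing only the extension field would wrongly treat "x,y" and "z,y" as
the same chunk of storage.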
 
>Please let me know if this is not consistent with the July Serial Concerns 
>discussion.
>
>Thanks,
>Rod D
>
>By the way, not linking the Logical Unit to the device is a key aspect to 
>some of the changes we are proposing in the SCC2 model using the ASSIGN and 
>DEASSIGN commands.  These commands allow Volumes to be assigned or 
>deassigned Logical Unit Numbers to different controllers connected to the 
>same storage.  To take advantage of such a capability, the system OS must be 
>concerned first and foremost with the Volumes WWN, and only then, its 
>physical path.
================
*
* For SCSI Reflector information, send a message with
* 'info scsi' (no quotes) in the message body to majordomo at symbios.com



