Proposal for IEEE company_id based formats for FC-PH world-wide identifiers
Mike Wenzel
mw at core.rose.hp.com
Tue Jan 7 18:15:17 PST 1997
* From the SCSI Reflector (scsi at symbios.com), posted by:
* Mike Wenzel <mw at core.rose.hp.com>
*
Hi Larry and Rod,
Just digging out from time off during the holidays. Sorry I haven't
responded sooner.
I agree that being able to uniquely identify a chunk of storage regardless
of path is of paramount importance. I'm very concerned that a storage
device vendor is being encouraged to include 16-byte WWNs in the list on
the Device ID page that are based on the 8-byte WWNs of replaceable
components! This will create ambiguities that I don't think host computers
will be able to resolve. I think WWNs that correspond to chunks of storage
need to be based on something NON-replaceable.
In response to Larry's need to find controller-to-LUN associations, I think
this is just one example of a general need to identify the field-replaceable
units (FRU) anywhere in a given path to a storage chunk: IO cards,
port hardware, fabric components, etc. For this, I think we need to use a
subset of the current path address information to find the FRU. In general,
I don't see any way to put all the associations of interest in the Device ID
page list! Not only would this be messy, but it also would be bad layering,
impossible for most implementations, etc.
Please see below for more detail. Sorry if what follows rambles a bit;
it seems like there are many implications to consider.
Best Regards,
Mike
************* | Mike Wenzel,
***** ***** | Hewlett-Packard - NCD System Interconnect Lab,
*** /_ _ *** | Mailstop 5601,
** / / /_/ ** | 8000 Foothills Blvd.,
*** / *** | Roseville, CA 95747-5601
***** ***** | E-mail: mw at core.rose.hp.com
************* | Telephone (916) 785-5609 FAX (916) 785-2875
At 01:18 PM 1/6/97 CST, DeKoning, Rod wrote:
>* From the SCSI Reflector (scsi at symbios.com), posted by:
>* "DeKoning, Rod" <rdekonin at ppdpost.ks.symbios.com>
>*
>
>Larry, Mike, Bob, et.al.,
>
>Below, Larry discusses a mechanism to associate a controller (node) name to
>a Logical Unit. This concerns me (and possibly others?) for the following
>reasons.
This concern also immediately sprang to mind when I read the description
of the proposed association. This is certainly a valid way to assign
WWNs based on the current documents, but I'm concerned that it would be
abused in a multi-controller topology. However, after looking more closely,
I get the feeling that the write-up wasn't complete enough to do justice to
the thought process. Maybe Larry or Bob can provide more detail after
reading this.
>I went back to my personal notes for the July Serial Concerns Meeting to be
>sure we had discussed this point adequately, and I believe that we covered
>this topic in some detail. At that meeting, we discussed the fact that we
>wanted to purposely break the link between the Volume Logical Unit WWN and
>the Node Name of the device reporting the LUN's WWN. This is to ensure that
>we provide the following capabilities:
>
>1. Allow for dual controller environments in which the LUNs may be addressed
>from different controllers with different node IDs.
>2. Allow for hot swapping of devices used to access the LUNs.
>
>In the final analysis, (at least as I recall and recorded it) we are trying
>to encourage OS drivers and applications to avoid making a static link
>between a Volume Logical Unit and the device (controller) that is used to
>access the Volume Logical Unit. In general the OS drivers should be
>concerned with the WWN of the LUNs, and not the access devices since the
>data they are concerned with is associated with the LUNs and not the access
>device(s).
There are a large number of cases that need to be covered, including:
A) Host computers can have multiple IO card interfaces into a high-
availability storage network
B) A device controller can have multiple port interfaces into the
storage network (therefore, multiple link addresses, possibly
concurrently active)
C) A chunk of storage can be reached by multiple controllers
The classical probing method alluded to in previous memos will result in
a host computer finding multiple paths to a given chunk of storage. The
number of paths is mainly the product of the numbers in A-thru-C above.
So when probing is complete, the host has a list of paths (possibly a
long one!) that can be used to reach a given chunk of storage.
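As a rough illustration of how the path count multiplies, here is a minimal
Python sketch. It assumes a fully connected storage network, and every name
in it (hba0, ctrlA, portA0, etc.) is hypothetical rather than taken from any
real configuration.

```python
from itertools import product

# Hypothetical topology; all names are illustrative only.
host_io_cards = ["hba0", "hba1"]                 # case A: host IO cards
controller_ports = {                             # case B: ports per controller
    "ctrlA": ["portA0", "portA1"],
    "ctrlB": ["portB0", "portB1"],
}
controllers_for_chunk = ["ctrlA", "ctrlB"]       # case C: controllers reaching the chunk


def enumerate_paths(io_cards, ports_by_controller, controllers):
    """Return every (io_card, controller, port) triple that reaches the chunk.

    Assumes every IO card can reach every controller port.
    """
    paths = []
    for card, ctrl in product(io_cards, controllers):
        for port in ports_by_controller[ctrl]:
            paths.append((card, ctrl, port))
    return paths


paths = enumerate_paths(host_io_cards, controller_ports, controllers_for_chunk)
print(len(paths))  # 2 cards * 2 controllers * 2 ports each = 8
```

Even this small example yields eight paths to one chunk of storage, which is
why the host needs a reliable key for sorting them.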
Now we can make the following observations:
1) Much of the information in a given path record is non-permanent:
a) Link addresses can be dynamically negotiated for both the devices
and hosts (especially for FC loop and FireWire).
b) Hardware components can be replaced: port hardware, controllers,
IO cards, etc.
2) Array controllers may often be configurable in terms of the LUN ID
value assigned to a given chunk of storage. This assignment may not
be the same LUN ID value for all controllers! A multi-controller
array may be configurable in terms of which controllers can and cannot
be used to access a given chunk of storage.
3) For high-availability configurations, there need to be multiple paths
that are concurrently usable by the host which involve a minimum number
of shared hardware components (single points of failure). These
redundant paths allow the chunk of storage to be accessed regardless
of which single component fails and without manual intervention at
the device.
4) It is OS-dependent as to what criteria are used to select one path
over another to reach a given chunk of storage at any specific time.
These criteria could be based on queue lengths, response times, or on
data obtained by protocol-dependent queries of the data from the
transport layers, etc.
5) As Rod pointed out, the Serial Solutions group agreed that it was
absolutely crucial that each chunk of storage be labelled with
an unambiguous world-wide name (WWN) so that the host can tell
when a path reaches the SAME chunk of storage versus a DIFFERENT
one! The WWN on the chunk of storage is the key used to sort
the multiple paths.
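Observation 5 can be sketched as follows. This is only an illustration: the
path tuples, the sample WWN values, and the function name are all
hypothetical, and a real host would obtain the WWNs from the Device ID page
rather than from a hard-coded list.

```python
from collections import defaultdict


def group_paths_by_wwn(probe_results):
    """Group probed paths by the WWN reported for the chunk of storage.

    probe_results: iterable of (path, wwn) pairs, where wwn is 16 bytes.
    Returns a dict mapping each WWN to the list of paths that reach it.
    """
    chunks = defaultdict(list)
    for path, wwn in probe_results:
        chunks[bytes(wwn)].append(path)
    return dict(chunks)


# Hypothetical probe results: (io_card, controller, LUN ID) plus reported WWN.
wwn_1 = bytes.fromhex("00112233445566770000000000000001")
wwn_2 = bytes.fromhex("00112233445566770000000000000002")
probe_results = [
    (("hba0", "ctrlA", 3), wwn_1),
    (("hba1", "ctrlB", 5), wwn_1),  # same chunk via another controller and LUN ID
    (("hba0", "ctrlA", 4), wwn_2),
]

by_chunk = group_paths_by_wwn(probe_results)
```

Note that the LUN ID differs between the two paths to the first chunk (see
observation 2), so only the WWN tells the host they lead to the same storage.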
Background: The Device Identification Page in SPC provides for a list of
identifiers to be returned as a result of an Inquiry command (see
Table 108). This list of identifiers could be of varying encoding or
Identifier type, since there are tag fields for both. The encoding is
mainly binary versus ASCII, and the Identifier type is primarily related to
the identifier encoding and/or registration authority. Bob's tutorial
(97-101r1) further recommends that, for RAID devices, the IEEE
Registered-Extended format be used and says that the first 64 bits
could be equal to the WWN of the RAID controller.
I think we need to refine our understanding as to what the Device ID list
of identifiers may include and how the host should use the list.
I propose that the WWNs in this list should be based only on values that
are invariant for a chunk of storage. If the RAID controller is a
field-replaceable unit, then I strongly suggest that the WWN of a LUN
NOT be based on the WWN of the controller, as the tutorial and Larry's
memo would suggest, but rather on something that won't change--the WWN
of the chassis? I also propose that the list mainly be used to give one
identifier value for each different format needed to correspond with
the various transport conventions used to reach the storage chunk (e.g.,
an EUI-64 format identifier if a FireWire interface is present, an
IEEE Registered-Extended format if a FibreChannel interface is present,
etc.). Personally, I would prefer to see a single, unique WWN of ANY
format for each chunk of storage and don't feel we need a list, unless
some additional 'association' or 'named entity' field is added.
I also think we need to refine our understanding of what entity the
Device ID is labelling. The ANSI group agreed that the Device
ID is NOT a media label, but rather more of a physical device or mount
point (see the latest SPC).
I propose that for an array, the actual storage devices are NOT the entities
being labelled, and if the storage devices are field-replaceable units,
then the IDs should NOT follow a storage device if it is moved to a new
array. For RAID devices, the virtual volume IS the entity being labelled.
If a virtual volume is destroyed and the related storage devices and LUN
IDs are re-used to create a new virtual volume in the same or different
arrays, then a new unique set of WWNs needs to be created for the new volume.
That way there is no ambiguity, for example, when a device containing half
of a mirrored pair is moved to a new array.
This implies to me that the Device IDs are not useful DIRECTLY for giving
the LUN-to-controller associations--especially where the controller is
replaceable. Instead, what about using other procedures to find the
association?
A. Inquiry Data from LUN 0 at the same port address as the path of interest
to the LUN in question. In other words, take the parallel SCSI target
address, the FibreChannel port address, etc., from a path to the LUN
and query LUN 0 at the same address.
B. A different (possibly new) Inquiry data page.
I think we need to do something like 'A' to find ANY of the FRUs in a given
path to a storage chunk, not just controllers but also port hardware,
IO cards, FibreChannel switch or hub components, etc. In other words,
I think we need to take information from a subset of the path and use it
in a protocol-dependent way to find the identity, properties, etc., that
are of interest for some component in the path.
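Procedure 'A' might look something like the sketch below. The Path fields,
the controller_identity function, and the fake_inquiry stand-in are all
hypothetical; a real host would issue an actual Inquiry command through its
transport, and what LUN 0 reports is device-dependent.

```python
from typing import Callable, NamedTuple


class Path(NamedTuple):
    host_port: str    # host IO card or initiator port (hypothetical label)
    target_addr: int  # parallel SCSI target ID, FibreChannel port address, etc.
    lun: int          # LUN ID as seen through this target address


def controller_identity(path: Path, inquiry: Callable[[Path], str]) -> str:
    """Query LUN 0 at the same target address as the given path."""
    lun0_path = path._replace(lun=0)  # keep the address subset, drop the LUN
    return inquiry(lun0_path)


def fake_inquiry(p: Path) -> str:
    """Stand-in for a real Inquiry command; returns made-up identities."""
    table = {(0x10, 0): "controller-A", (0x20, 0): "controller-B"}
    return table[(p.target_addr, p.lun)]


print(controller_identity(Path("hba0", 0x10, 7), fake_inquiry))  # controller-A
```

The same idea, taking a subset of the path and querying it in a
protocol-dependent way, would apply to other FRUs such as port hardware or
fabric components.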
Larry also pointed out the need to ensure that all of the commands in a
set go to the device via the same controller (e.g., for consistency,
ordering, etc.). I agree that this is needed, but I think it has more
to do with how the host uses path information. For FC Loop for example,
I think the transport layer needs to give the SCSI layer a handle that
will not change when a FC Loop address is renegotiated and changes value.
This way, all SCSI commands using the same handle will go to the same
port, regardless of address changes. If the host wants to load-level
commands to a given controller among multiple paths, then it will need
some sort of data structures to remember what paths can be used for a
given controller, and other structures to map storage chunks onto
multiple controllers. There are very good analogs to this in networking,
where packets can be multiplexed over multiple links between a pair of
routers without the applications being aware or involved.
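A minimal sketch of the stable-handle idea follows. It assumes the transport
layer can recognize a port by some permanent name after an address
renegotiation; that assumption, the class name, and the sample addresses are
all hypothetical and not taken from any standard.

```python
import itertools


class PortHandleTable:
    """Maps stable handles to the current link address of each port.

    Ports are keyed by a permanent port name (an assumption of this sketch),
    so the handle given to the SCSI layer survives address changes.
    """

    def __init__(self):
        self._next_handle = itertools.count(1)
        self._handle_by_name = {}
        self._addr_by_handle = {}

    def register(self, port_name, address):
        """Record the port's current address; return its stable handle."""
        handle = self._handle_by_name.get(port_name)
        if handle is None:
            handle = next(self._next_handle)
            self._handle_by_name[port_name] = handle
        self._addr_by_handle[handle] = address  # overwrite on renegotiation
        return handle

    def resolve(self, handle):
        """Return the link address currently associated with a handle."""
        return self._addr_by_handle[handle]


table = PortHandleTable()
h = table.register("port-name-1", 0xE8)  # initial loop address
table.register("port-name-1", 0xD5)      # address renegotiated
print(hex(table.resolve(h)))             # 0xd5: same handle, new address
```

The SCSI layer only ever holds the handle, so commands in a set keep going to
the same port even though the underlying address has changed.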
I think we need to discuss exactly how we can use standard protocol features
to find the various FRUs in a path to a given storage chunk. But I don't
think that the Device ID page itself is of much direct help.
BACKGROUND REASONING BEHIND PROPOSALS:
A) One must assume that the vendor for a multi-controller array will
have SOME basis for ensuring the uniqueness of the WWNs assigned
to chunks of storage. For example, the LUN WWN could use the IEEE
registered-extended format, then take the first 64-bits (company ID
and vendor-specific identifier) from some NON-replaceable component
in the array (chassis?), and the second 64-bits from a table that
is managed within the array to ensure uniqueness. If a controller
is a replaceable component, it would be hazardous to give a chunk
of storage a WWN that is based in any way on the controller's WWN--
the controller could be swapped out of one array and into another
one, creating the possibility of duplicate, conflicting LUN WWNs.
I think it is too hard to require the controller design to ensure
that any new WWNs created in the new array will not conflict with
WWNs created in previous arrays. Also, once a host has created
a correspondence between a WWN and a file system object, I think it
will be very tricky to change the correspondence to a new WWN, or
reassign an existing WWN to a new file system object.
B) A device vendor is not required to use the 8-byte to 16-byte
WWN relationship suggested in the tutorial. If the controller-to-
LUN association is vital, then either the host code must have some
additional algorithms for finding this association for vendors that
don't follow the suggestion, or else this sub-encoding needs to be
required, rather than suggested (a bitter pill).
C) A host MUST NOT assume that two LUN WWNs are equivalent that have
the same value for the second 64-bit field of the IEEE registered-
extended format, but different values for the company-id and/or
vendor-specific identifier (first 64-bits). The extension field
is only guaranteed to be unique within the context of the first
64-bits.
i) So an association cannot be made between multiple controllers
and a given chunk of storage based on an assumed sub-structuring
of LUN WWNs. For example, if a Device ID Inquiry command is
sent to a LUN via one controller and the value "x,y" is returned
(where "x" is the first 64-bits of the registered-extended format
and "y" is the second 64-bits), and an Inquiry sent to another
address returns the value "z,y", the host must NOT assume that
these are two paths to the same chunk of storage "y".
ii) Giving the full list of controller-based LUN WWNs is not sufficient
if the controllers are replaceable units. For example, a Device ID
Inquiry response received via one controller could contain the WWN
list "w,x;y,z" and another response could be received via another
controller having the list "y,z;w,x". If we assume the
substructuring indicated in the tutorial and Larry's earlier memo,
this COULD give the correspondences between the chunk of storage and
the controllers used to reach it (i.e., WWNs "y" and "w"), and yet
give the host enough information to know when the same chunk of
storage is being reached. BUT, if both controllers are replaceable
units, then:
* each list would also need to contain a WWN that does not depend
on a (currently-present) controller, otherwise, how would the
host know when it had reached the same, previous chunk of storage
when both controllers have been replaced?
* the design of the controllers would need to ensure that new
extension fields assigned after the controller is relocated
do not match any that were assigned in the previous array,
otherwise duplicate WWNs would result. The host can't use
the full list to identify a chunk of storage because the
chunk doesn't change if a new controller is added, or an old
one is subtracted from the list. Also, how would the host
handle having two WWNs that were previously together in the
same list, but now are located in different lists? So the host
can't do anything with the list that would relieve the controllers
from supporting this uniqueness requirement. Therefore, I assume
it would be easier not to have the WWN for a LUN depend on
the WWN for a replaceable controller in the first place!
* If the Device ID page list contains both WWN values based on an
invariant and on replaceable controllers, then the host would need
to know one from the other, in order to resolve the ambiguities
just discussed. But then the host could use just the invariant-
based WWN as the key for path-sorting.
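Points A and C can be sketched together as follows. This is only an
illustration under stated assumptions: the first 64 bits are treated as an
opaque value taken from a non-replaceable component (the chassis), the second
64 bits come from a counter that a real array would have to keep persistently
and never reuse, and all names and sample values are hypothetical.

```python
import struct


class LunWwnAllocator:
    """Builds 16-byte LUN WWNs from an invariant 64-bit base plus an extension."""

    def __init__(self, base_id):
        self.base_id = base_id  # first 64 bits, e.g. taken from the chassis
        self._next_ext = 1      # must be persistent and never reused in practice

    def new_wwn(self):
        """Return a fresh WWN: base in the first 64 bits, extension in the second."""
        ext = self._next_ext
        self._next_ext += 1
        return struct.pack(">QQ", self.base_id, ext)


def split_wwn(wwn):
    """Return (first 64 bits, second 64 bits) of a 16-byte WWN."""
    return struct.unpack(">QQ", wwn)


def same_chunk(wwn_a, wwn_b):
    """Two WWNs name the same chunk only if all 128 bits match."""
    return bytes(wwn_a) == bytes(wwn_b)


# Two hypothetical arrays, each with its own invariant base value.
array_x = LunWwnAllocator(0x1111111111111111)
array_z = LunWwnAllocator(0x2222222222222222)
wwn_xy = array_x.new_wwn()  # "x,y"
wwn_zy = array_z.new_wwn()  # "z,y": same extension, different first 64 bits

print(split_wwn(wwn_xy)[1] == split_wwn(wwn_zy)[1])  # True: extensions match
print(same_chunk(wwn_xy, wwn_zy))                    # False: different chunks
```

The extension is only meaningful within the context of its first 64 bits, so
the comparison always covers the whole identifier.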
>Please let me know if this is not consistent with the July Serial Concerns
>discussion.
>
>Thanks,
>Rod D
>
>By the way, not linking the Logical Unit to the device is a key aspect to
>some of the changes we are proposing in the SCC2 model using the ASSIGN and
>DEASSIGN commands. These commands allow Volumes to be assigned or
>deassigned Logical Unit Numbers to different controllers connected to the
>same storage. To take advantage of such a capability, the system OS must be
>concerned first and foremost with the Volume's WWN, and only then, its
>physical path.
================
*
* For SCSI Reflector information, send a message with
* 'info scsi' (no quotes) in the message body to majordomo at symbios.com