Proposed agenda for X3T10 Serial Issues Meeting (7/15/96)
dallas at zk3.dec.com
Tue Jul 9 07:07:42 PDT 1996
* From the SCSI Reflector, posted by:
* dallas at zk3.dec.com
*
Accredited Standards Committee
X3, Information Processing Systems
Doc. No.:
Date: July 8, 1996
Project:
To: Membership of X3T10
From: William Dallas
Subject: Agenda for the serial issues meeting July 15, 1996
AGENDA
1.0 Opening Remarks
2.0 Introductions
3.0 Document Distribution
4.0 Call for Patents
5.0 Approval of Agenda
6.0 Issues for discussion
6.1 Device Identification
6.2 Topology Probing/System Configuration
6.3 Execution order of SCSI tasks by a Logical Unit
6.4 In flight I/O tasks in relation to QUEUE FULL and BUSY
statuses.
6.5 Re-transmission of data
6.6 Minimizing loop disruption in case of device plugging
6.7 Poor documentation and understanding of global device ID
6.8 Delivered Error Rates
6.9 Out of order command reception and Abort and Clear Task Set
operations (issued by own initiator) - Determining what was
actually terminated.
6.10 Out of order command reception and Clear Task Set
operations (issued by another initiator) - Determining
what was actually terminated.
6.11 Out of order command reception and Async Event
Notification - Trying to track device state transitions.
6.12 System identification is unclear in the FC specifications
6.13 FC Link Level responses are independent of FCP.
6.14 Multiple path and/or multiple-host switchover - making sure
in flight ios from the "failed" path have been terminated.
8.0 Review of Action Items
9.0 Future Meeting Schedule (if any)
10.0 Adjournment
Issues Descriptions:
Issue: Device Identification
Cause(s)/Reason(s):
There is a fundamental need to identify a given box, media
instance, and/or interface. Operating systems and applications
need to know, for example, that a given disk is still the same,
even though its interface hardware may have been replaced (so a
disk can't take its identity from the interface). In
FibreChannel, since devices can be widely shared, the naming
mechanisms need to be independent of the OSs and file systems.
a. World-Wide Names
- Reportedly, some WWNs are guaranteed to be globally unique
via specific registration authorities and others are not.
* There is currently no requirement for global uniqueness,
only for uniqueness within a given fabric. But how can
a manufacturer in practicality accommodate the latter
without the former?
* There are conflicting schemes of determining WWNs from
time-of-day plus slot/box location to full IEEE extended
addressing.
* IEEE extended addressing is guaranteed unique, but contains
no media-relative geographic information.
- Which name format(s) can device vendors use and be sure they
will not collide with names from other vendors, as well as
meet the needs of the computer vendors?
b. Use of SCSI ID Serial Number and vendor information pages
- FCP encapsulates SCSI. Identification could be done via
the SCSI level, rather than the FC level. This might set
the stage for the same systems solutions working for other
encapsulations beyond FC.
Environment:
The problem is greatest in the largest configurations, but occurs
for all fabric sizes. Therefore, solutions are needed for use
in public loop and switched fabrics. Unique device names are
especially needed when there are alternative paths between a
computer and a device for high-availability (HA). The naming
scheme can also be helpful in maintaining the configuration in
environments where online replacement is used.
Example:
SCSI defines information pages for manufacturer's information
and serial number. Since these pages are optional and since the
content is unregistered, operating systems must attempt to match
the entire content of several of these pages, in order to ensure
that a device is uniquely identified. Some manufacturers have
reported the SAME serial number page for every device of given
type! Many older devices don't provide an information page at
all. Some dual-ported devices build their information page by
concatenating the serial number of the port queried, the serial
number of the second port, and the serial number of the
platform--meaning that the string looks different when queried
from the second port because the first two components are in
different positions. Without knowing the device intimately,
there is no way of knowing that this is the same device being
reached via multiple paths.
Justification for Discussing:
The "mess" described in the example must not be allowed to
re-occur for network-attached mass storage, where topologies are
far more complicated than they are for SPI. We must act NOW to
fix this before precedents are set that will be VERY difficult to
live with later.
A unique device ID (rather than for each interface of a multi-
ported device) is strongly needed for:
1. Discriminating between alternate routes to SAME box versus
multiple instances of DIFFERENT, similar boxes (e.g., could
format drive by mistake).
2. Identifying existing boxes at different port addresses if
fabric is reconfigured (e.g., a computer must find its
boot device even if the loop is reconfigured).
3. Identifying existing box if FC interface is replaced
(e.g., as a result of component failure)
Objectives in Fixing:
1. Use existing standards and registration authorities, where
possible.
2. Minimize expense and complexity for the manufacturer.
3. If possible, provide solutions that will work for future
topologies and media.
Suggested Correction:
1. Carefully distinguish between "naming" and "addressing":
"Naming" identifies WHAT something is; "addressing"
determines how to reach it. Fabric-relative geographical
information (e.g., whether one device is before or after
another on a loop) is more important for addressing than for
naming, because WHAT something is generally does not change
when its position changes in the loop. If geography is
important in naming, it is generally PHYSICAL geography
(e.g., where a disk is located in an identified rack for easy
service access). A name can potentially be more lengthy,
since it is done comparatively infrequently to find an
address. An address must often be used for every packet or
request.
2. Since World-Wide Names (WWN) are already specified by the
standards, and since FCP encapsulates SCSI, REQUIRE that
FC-based devices provide both a unique WWN via FC protocols
and a unique serial number via encapsulated SCSI. The serial
number reported by an inquiry may BE the platform WWN (not
interface WWN, especially for multi-ported devices).
3. Ensure global uniqueness of WWNs by using the IEEE registered
prefix method to identify the manufacturer, who then
administers his own unique suffix from his portion of the
number space.
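As a rough illustration of item 3, the sketch below composes a 64-bit name from a registered manufacturer prefix and a manufacturer-administered suffix. The bit layout used here (a 4-bit format tag, a 24-bit IEEE-assigned company identifier, and a 36-bit vendor suffix) and the names make_wwn and format_wwn are assumptions made for illustration; the prefix 0x123456 is a placeholder, not a real registration.

```python
# Sketch only: the 4/24/36-bit split and the tag value are assumptions,
# not a statement of what the FC standards require.
NAA_TAG = 0x5  # assumed 4-bit format tag


def make_wwn(company_id: int, vendor_suffix: int) -> int:
    """Combine a registered 24-bit prefix with a vendor-administered suffix."""
    if not 0 <= company_id < (1 << 24):
        raise ValueError("company_id must fit in 24 bits")
    if not 0 <= vendor_suffix < (1 << 36):
        raise ValueError("vendor_suffix must fit in 36 bits")
    return (NAA_TAG << 60) | (company_id << 36) | vendor_suffix


def format_wwn(wwn: int) -> str:
    """Render the 64-bit name as eight colon-separated hex bytes."""
    return ":".join(f"{b:02x}" for b in wwn.to_bytes(8, "big"))


# Two devices from the same (placeholder) manufacturer get distinct names
# as long as the manufacturer never reuses a suffix.
first = make_wwn(0x123456, 1)
second = make_wwn(0x123456, 2)
print(format_wwn(first))
print(format_wwn(second))
```

Because the prefix is unique per manufacturer and the suffix is unique within that manufacturer, two names can collide only if a manufacturer reuses a suffix.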
---------------------------------------------------------------------------
Issue: Topology Probing/System Configuration
Cause(s)/Reason(s):
- There are optional protocol facilities for a device to
register itself with a directory server. Unless ALL devices
provide this facility, at least one computer in the network
must probe the entire topology. It is not practical to
rapidly probe the entire port address space because of:
* Time needed to scan all possible FC addresses (2^24) delays
access to all devices and responsiveness of the system when
a device is added.
* Overhead for both devices and computers
- Unless there is a well-designated function that is provided
centrally in a fabric (possibly replicated), all computers
must fend for themselves. This is costly because of:
* Memory needed to store information for devices of no
(immediate) interest to a given computer
* Waste of having *all* computers on an FC fabric probe the
*same* topology, therefore getting the *same* answer!
Why not have the computer get the configuration from a
directory or other server?
- For switched fabrics, the device must log in with the fabric,
but there is currently no agreement wrt whether the device
registers itself with the directory or whether the fabric
registers the device.
Environment:
The problems are most severe for switched configurations or
large meshes (e.g., loops of loops). Single-loop
configurations are more limited in potential size and generally
provide addresses as part of loop initialization. These
addresses may then be queried directly for the related device's
name.
Example:
If all FibreChannel address components (one byte each for Domain,
Area and Port) are used by a fabric, the probing would involve
a host attempting a login with 2^24 addresses. Since each login
attempt to an absent port address involves waiting a reasonable
amount of time for the device to respond, the worst-case time
for the host to probe the fabric is 2^24 * <individual device
timeout>. This worst-case time could be reduced by some factor
by having N probes concurrently outstanding, at the cost of added
complexity in coding. If N=8 outstanding probes, the time required
to probe the fabric is still 2^21*<device timeout>. The device
timeout depends on the class of service, size of the fabric, and
the number of retries used for datagram classes of traffic. If the
timeout were 1 msec, then the probe time for the case where there
are 8 probes outstanding would be about 35 minutes and would cause
over 16 million packets to be injected into the fabric. It is
likely that the probing computer would be completely consumed by
the probing activity.
If instead, all devices log in with a directory server, only a
couple of packets per device are needed to build the data base,
and a host computer could retrieve the entire topology of the
fabric in the time it takes to transfer a file listing all the
devices, probably much less than one second.
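The arithmetic in the probing example can be checked with a few lines of Python. The function name probe_time_seconds is hypothetical; the inputs (a 24-bit address space, a 1 msec timeout, and 8 outstanding probes) are the figures used above, and the model assumes every probe runs to its full timeout.

```python
def probe_time_seconds(address_bits: int, timeout_s: float, outstanding: int) -> float:
    """Worst-case time to probe every address when each probe times out."""
    total_addresses = 2 ** address_bits
    return total_addresses * timeout_s / outstanding


packets = 2 ** 24                            # one login attempt per address
serial = probe_time_seconds(24, 0.001, 1)    # one probe at a time
parallel = probe_time_seconds(24, 0.001, 8)  # N = 8 probes outstanding

print(f"login attempts: {packets}")
print(f"serial probing: {serial / 3600:.1f} hours")
print(f"8 outstanding:  {parallel / 60:.1f} minutes")
```

With longer, more realistic timeouts the result scales linearly, so a 100 msec timeout would push the 8-probe case to roughly 58 hours.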
Justification for Discussing:
Without some sub-setting of optionality, ALL devices and
computers will have to be prepared to handle all options, in
order to ensure interoperation in the widest number of cases.
Devices that attempt to be good citizens of the system by
providing better facilities may be penalized slightly in cost,
as well as have their efforts be somewhat undermined by devices
that do not provide these facilities.
Objectives in Fixing:
See Issue 1.
Suggested Correction:
1. This is expected to be controversial: REQUIRE ALL devices
to attempt registration with the directory server.
2. Ensure that the directory server functions could be provided
as a network-like, user-level application on any operating
system. (In other words, ensure that any computer COULD
become a directory server with minimal (if any) addition of
cost to the lower, transport levels. This ensures that a
directory server will always be available in any
configuration.)
3. Develop standard computer-to-computer directory query calls and
responses (if they aren't defined already).
4. Provide more guidance wrt the information reported to the
directory server. It must include at least the device's WWN
and current address.
5. Ensure that the device login protocol contains the facilities
needed to provide flow control needed to deal with the storm
of device logins that would follow a fabric-wide power-up.
Ensure that the login directory server can be replicated for
high-availability configurations.
---------------------------------------------------------------------------
Issue: Execution order of SCSI tasks by a Logical Unit
Cause(s)/Reason(s):
In a Fibre Channel environment the SCSI Logical Unit
can receive SCSI tasks out of the order sent by the operating
system. This behavior presents the following problems
for the OS sending tagged commands to a Logical Unit:
* Task management functions (Abort Tagged Task) received
before the I/O Task being aborted. This creates several different
situations:
- An unexpected I/O Task is established in the Logical Unit
after the host told Logical Unit to abort the task.
- Host has difficulty ensuring that a Task/Tasks actually
terminated (CLEAR and ABORT Task sets).
- Logical Unit creates an invalid reconnection state
(SIP terminology) for a task that the host has aborted.
- May create data integrity problems depending on the
application.
* Ordered tagged tasks do not work as specified in both
SCSI-2 and SCSI-3.
* I/O Tasks received out of order for sequential stream devices
(tapes) or other device models that can be fixed by retrying
command:
- All the above issues
- Creates data integrity problems.
In a private loop delivery order is sequential but the same
types of problems still exist. Currently there is no detection
of lost SCSI Tasks due to bit errors on the transport media
and dropped frames due to hot plugging of devices and power cycles
of the device. This characteristic of dropped SCSI Tasks can also
be viewed as the Logical Unit receiving Tasks out of order.
The following order problems exist in private loops:
* Ordered tagged tasks do not work as specified in both
SCSI-2 and SCSI-3.
* May create data integrity problems depending on the
application (disks).
* For most other device types running tagged commands
creates data integrity problems and error recovery problems
for OS driver.
Environment:
Fibre Channel Class 2 and 3 devices.
Example:
See Cause(s)/Reason(s)
Justification for Discussing:
If this item is not discussed by this group and resolved by the
standards groups, I envision the following:
* Hosts must deal with invalid reconnections for aborted tasks.
* Host error recovery policy is at best a guess.
* Fibre Channel will be mostly a disk inter-connect; other types
of devices (tapes, printers, cd-r, etc.) will be extremely limited
or will be unable to function in the environment.
* Future inter-connects/protocols may be built upon Fibre Channel,
increasing OS problems.
Suggested correction:
Creation of FCP SCSI Task sequencing. Required for all Classes
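One way to picture task sequencing is a receiver that holds early arrivals until the gap before them is filled. The sketch below is only an illustration: the per-task sequence number and the TaskSequencer class are hypothetical and are not defined by FCP today.

```python
class TaskSequencer:
    """Releases tasks to the logical unit strictly in the order sent.

    Assumes the sender stamps each task with a consecutive sequence
    number starting at 0 (a hypothetical field, not part of FCP).
    """

    def __init__(self):
        self.next_seq = 0   # next sequence number allowed to execute
        self.pending = {}   # early arrivals, keyed by sequence number

    def receive(self, seq, task):
        """Accept one task; return the tasks that may now be executed."""
        self.pending[seq] = task
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


lun = TaskSequencer()
# The abort overtakes the write it refers to, so it is held back.
early = lun.receive(1, "ABORT TASK tag 7")
# Once the write arrives, both are released in the order sent.
released = lun.receive(0, "WRITE tag 7")
print(early, released)
```

A task that never arrives leaves a permanent gap, so the same mechanism also exposes lost tasks: anything still held in pending after a timeout points at a dropped frame.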
---------------------------------------------------------------------------
Issue: In flight I/O tasks in relation to QUEUE FULL and BUSY statuses.
Cause(s)/Reason(s):
In SIP, there is an interlock where the host's queue for a Logical
Unit can be frozen based upon the SCSI status and the needs of the
peripheral device model. In the serial channels this is no longer
possible for QUEUE FULL and BUSY statuses. Sequence ordered command
streams for tagged devices require an interlock so that tasks
in flight do not get entered in the task set.
The following problems are created in this situation:
* Ordered tagged tasks do not work as specified in both
SCSI-2 and SCSI-3.
* May create data integrity problems depending on the
application (disks).
* For most other device types running tagged commands
creates data integrity problems and error recovery problems
for OS driver.
Environment:
All serial channels.
Justification for Discussing:
If this item is not discussed by this group and resolved by the
standards groups, I envision the following:
* Hosts must deal with invalid reconnections for aborted tasks.
* Host error recovery policy is at best a guess.
* Fibre Channel will be mostly a disk inter-connect; other types
of devices (tapes, printers, cd-r, etc.) will be extremely limited
or will be unable to function in the environment.
Suggested correction:
Require all SCSI-3 devices to implement ACA. Change SAM to
reflect that all SCSI statuses other than GOOD create
an ACA condition.
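The interlock this correction is after can be modeled in a few lines. This is a simplified sketch of the proposed rule, not of SAM as written: it ignores multiple initiators and tasks sent with the ACA attribute, and the LogicalUnit class and its method names are hypothetical.

```python
GOOD = "GOOD"
ACA_ACTIVE = "ACA ACTIVE"


class LogicalUnit:
    """Toy model: any status other than GOOD freezes the task set."""

    def __init__(self):
        self.aca = False
        self.task_set = []

    def report(self, status):
        """Return a status to the host; non-GOOD establishes ACA (proposed rule)."""
        if status != GOOD:
            self.aca = True
        return status

    def submit(self, task):
        """Tasks arriving during ACA are refused instead of entering the task set."""
        if self.aca:
            return ACA_ACTIVE
        self.task_set.append(task)
        return "ACCEPTED"

    def clear_aca(self):
        """Host has finished recovery and releases the interlock."""
        self.aca = False


lu = LogicalUnit()
lu.submit("WRITE 1")
lu.report("QUEUE FULL")          # the status that used to freeze the queue in SIP
in_flight = lu.submit("WRITE 2") # was already on the wire when the status was sent
lu.clear_aca()
retried = lu.submit("WRITE 2")
print(in_flight, retried)
```

The point of the model is that the in-flight task is refused rather than silently entered, so the host can resend everything after the failure in its original order.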
---------------------------------------------------------------------------
Issue: Re-transmission of data
Cause(s)/Reason(s):
Fibre Channel Class 1 and 2 have frame (data) re-transmission which,
according to the hardware folks, for all practical purposes cannot
be done. FCP (protocol) has no provision to request data
re-transmission, which leaves the link/protocol no recovery method
other than a failed I/O Task.
For disk drives retrying the command is not an uncommon recovery
method, but for serial channels we have an issue with ordering
(issue #2). Other types of devices (tapes) do not have that
luxury; data transmission errors can cause partial media updates.
Recovery methods would have to guess as to whether data made it to the
media or not (data record request larger than the device's buffer).
How do the drivers determine there was a data transmission error?
Compressed tape drive engines normally request data re-transmission
when the data they are compressing expands (already compressed data).
Since there is not enough room in the drive's buffer to hold
both the compressed and uncompressed data the engines will ask
for the data again if it expands. This is accomplished in SPI
by doing a DISCONNECT without a SAVE DATA POINTERS and then
reconnecting to obtain the data again but not running the
data through the compression engine.
Environment:
Fibre Channel all Classes
Example:
See Cause(s)/Reason(s)
Justification for Discussing:
If this item is not discussed by this group and resolved by the
standards groups, I envision the following:
* Host error recovery policy is at best a guess.
* Fibre Channel will be mostly a disk inter-connect; other types
of devices (tapes, printers, cd-r, etc.) will be extremely limited
or will be unable to function in the environment.
* Future inter-connects/protocols may be built upon Fibre Channel,
increasing OS problems.
Suggested correction:
FCP (ULP) data re-transmission capabilities.
---------------------------------------------------------------------------
Issue: Minimizing loop disruption in case of device plugging
Cause(s)/Reason(s):
The current accepted approach to device swapping in FC-AL
environments is that the operating system will detect any
errors that occur and will correct them using the same method
it uses to handle errors that occur during normal operation.
No method is supported to attempt to minimize these errors
by active effort on the part of the system. In large systems
errors triggered by normal maintenance and operation may cause
a substantial impact on the overall performance of the system.
Environment:
FC-AL loops where devices are expected to be removed or replaced
during normal operation.
Example:
The simplest example is the case where a loop has bypass circuits
to protect each device, and a device is removed during the time
when the loop is passing data. The data frame(s) will be corrupted
because of the electrical operation of the bypass circuits,
and recovery may be difficult. More details are reported in
a T11 document on this topic.
Justification for Discussing:
It is assumed that the drivers will be able to handle situations
of this type. It seems reasonable that they will, but there
is a potential performance problem if, for example, the complete
LIP and address verification process must be used on each device
swap. Right now the scope of the problem is not well understood.
Objectives in Fixing:
1. Based on discussions that are currently taking place in T11,
a feel for the magnitude of the problem is being developed.
Currently there are no concrete proposals on the floor that are
directly related to this, although there are some error detection
options that have been generated as a result of the discussion.
An objective is to gain an industry consensus on what the scale
of difficulty is in this area.
2. Depending on whether it is felt that this is in fact a problem
area, proposals to better support device swapping must be developed.
Suggested Correction:
Continue the current discussions at T11 on this topic, with more
input from the operating system people and the performance analysts.
---------------------------------------------------------------------------
Issue: Poor documentation and understanding of global device ID
scheme in the presence of dual controller and dual loop FC-AL.
Cause(s)/Reason(s):
This is merely an extension of the general ID problem to point
out that there are significant problems in the areas of dual-loop
and dual redundant controller subsystems.
Environment:
Any loop system with dual loops or dual controllers in a subsystem.
Example:
One simple question is "where does the ID for the 'subsystem' reside"?
The current proposals assume that there is a single node with
one or more ports, and that the difficulties are in determining
paths between nodes or in identifying ports. The case where a
node is a distributed element made up of a set of components where
any component (up to a limit of "all" components) may be replaced
without causing the existence of the node itself to vanish has
not been considered.
Justification for Discussing:
This is just part of the general ID problem.
Objectives in Fixing:
Support dual controller subsystem configurations.
Suggested Correction:
Determine a model for dual controller subsystems that fits
within the current SCSI-3 Controller Commands document version -2
that is currently under development.
---------------------------------------------------------------------------
Issue: Delivered Error Rates
Cause(s)/Reason(s):
The specification of how error rates are to be measured is not
adequately spelled out in the Fibre Channel standards. Depending
on the interpretation, it may be that large scale systems will
not work because of excessive time spent recovering from errors.
Environment:
All Fibre Channel environments.
Example:
See T11 working paper for more discussion. The current FC-PH document
specifies an error rate but does not describe where it applies.
The FC protocol assumes that the specified rate will be the node-to-node
delivered rate, but the physical documents assume that the rate
specification refers to link-level connections. Because of the
"discard exchange" error handling protocol chosen by the various
profiles, for large configurations the system may spend all its time
recovering from errors.
Justification for Discussing:
Depending on how this works out, the solution to this problem may
require some rethinking of how the protocol works. If frame-level
retransmission is used, for example, this would affect the way
the driver software works. This difficulty has the potential to
be extremely disruptive to the orderly release of FC products.
Objectives in Fixing:
a. Clarify the FC standard so that there is no ambiguity about
the meaning of the error rate specification.
b. Verify that the protocol as currently defined works with the
newly-clarified standard.
c. Otherwise rewrite a potentially significant fraction of the
protocol and suffer the schedule consequences.
Suggested Correction:
Monitor the T11 discussion on this topic. If the committee decides
to make changes to the protocol because of this, prepare to
suffer the schedule consequences.
---------------------------------------------------------------------------
Issue: Out of order command reception and Abort and Clear Task Set
operations (issued by own initiator) - Determining what was
actually terminated.
Cause(s)/Reason(s):
* In flight race conditions caused by out of order delivery
allow Abort/Clear Task Set to arrive prior to command
reception.
* In flight race conditions caused by out of order delivery
allow Abort/Clear Task Set responses to bypass in-flight
command response/completion.
* There currently exists no mechanism to query what commands
are "active" on a device (that have been received and are
being processed).
* Even with a mechanism to query command statuses, that by itself
will not ensure that the response is correct as the query
itself may bypass in-flight commands.
* (Fibre Channel) - sequence/exchange id reuse of "assumed
terminated" ios leaves us open to inadvertently receiving
and accepting frames that may cause data corruption.
Alternative is to delay for some time period that would
ensure that any in-flight sequences/exchanges have been
terminated. Assumption is that E_D_TOV should be enough
time. Unfortunately, E_D_TOV is spec'd for 10 seconds, which
may be too long of a delay for upper layers and cause other
cascading errors.
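The delay-before-reuse alternative in the last item can be sketched as an identifier pool that quarantines each abandoned exchange id for E_D_TOV. The ExchangeIdPool class is hypothetical, time is passed in explicitly to keep the example deterministic, and the 10 second default is the figure quoted above.

```python
import heapq


class ExchangeIdPool:
    """Hands out exchange ids; abandoned ids rest for E_D_TOV before reuse."""

    def __init__(self, size, e_d_tov=10.0):
        self.free = list(range(size))
        self.quarantine = []   # min-heap of (release_time, exchange_id)
        self.e_d_tov = e_d_tov

    def abandon(self, xid, now):
        """An io was assumed terminated; stale frames may still be in flight."""
        heapq.heappush(self.quarantine, (now + self.e_d_tov, xid))

    def allocate(self, now):
        """Return a safe id, or None if every id is in use or quarantined."""
        while self.quarantine and self.quarantine[0][0] <= now:
            _, xid = heapq.heappop(self.quarantine)
            self.free.append(xid)
        return self.free.pop(0) if self.free else None


pool = ExchangeIdPool(size=1)
xid = pool.allocate(now=0.0)
pool.abandon(xid, now=1.0)
too_soon = pool.allocate(now=5.0)    # still inside the quarantine window
after_wait = pool.allocate(now=11.0) # E_D_TOV has elapsed since the abandon
print(too_soon, after_wait)
```

The sketch makes the cost visible: with a small id space and a burst of aborts, allocation stalls for the full timeout, which is the upper-layer delay the item warns about.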
Environment(s):
Fabric environments
Example:
Abort Task Set bypasses an in-flight command. The initiator
does not know which commands were terminated via the Abort
Task set.
Justification:
Although error recovery should be minimal, when it does happen,
it should be containable in scope (should not cascade into
additional errors) and should not freeze device io for significant
amounts of time.
Solution:
Target devices should mandatorily enable an ACA condition
after Abort and Clear Task Set operations.
There should be a way to obtain a list of all commands that
are pending during the ACA condition.
Other Notes:
* A mechanism utilizing FC2 level information (e.g. sequence
status blocks and Exchange status blocks) will most likely
be out of the question due to the following:
Much of this information may be embedded in FC2 and FC4
assist implementations and cannot be obtained
externally.
Current schemes to obtain this info operate on a one by one
sequence/exchange level. Given devices with possibly
hundreds of active ios (disk arrays), this may be too
cumbersome/time consuming.
---------------------------------------------------------------------------
Issue: Out of order command reception and Clear Task Set operations
(issued by another initiator) - Determining what was actually
terminated.
Cause(s)/Reason(s):
* In flight race conditions allow for new ios to be started on
the device while the "cleared by other initiator" response is
in transit to the host.
* In flight race conditions caused by out of order delivery
allow command responses to bypass earlier command responses.
* There currently exists no mechanism to query what commands
are "active" on a device (that have been received and are
being processed).
* Even with a mechanism to query command statuses, that by itself
will not ensure that the response is correct as the query
itself may bypass in-flight commands.
* (Fibre Channel) - sequence/exchange id reuse of "assumed
terminated" ios leaves us open to inadvertently receiving
and accepting frames that may cause data corruption.
Alternative is to delay for some time period that would
ensure that any in-flight sequences/exchanges have been
terminated. Assumption is that E_D_TOV should be enough
time. Unfortunately, E_D_TOV is spec'd for 10 seconds, which
may be too long of a delay for upper layers and cause other
cascading errors.
Environment(s):
Multi-initiator Fabric environments.
Example:
Clear task set is received by the target from the second initiator.
Command is then received from the first initiator. The target
generates a reply.
Another Command is then received from the first initiator. The
target acts on the command.
Justification:
Although error recovery should be minimal, when it does happen,
it should be containable in scope (should not cascade into
additional errors) and should not freeze device io for significant
amounts of time.
Solution:
Target devices should mandatorily enable an ACA condition
after Clear Task Set operations.
There should be a way to obtain a list of all commands that
are pending during the ACA condition.
An initiator should only be logged into a device (Fibre Channel)
if it is actively using the device.
Other Notes:
* A mechanism utilizing FC2 level information (e.g. sequence
status blocks and Exchange status blocks) will most likely
be out of the question due to the following:
Much of this information may be embedded in FC2 and FC4
assist implementations and cannot be obtained
externally.
Current schemes to obtain this info operate on a one by one
sequence/exchange level. Given devices with possibly
hundreds of active ios (disk arrays), this may be too
cumbersome/time consuming.
* Caution should be taken as an outstanding ACA condition for
an initiator that is not actively using the device may result
in the other initiators receiving an "ACA active" condition
that never clears.
---------------------------------------------------------------------------
Issue: Out of order command reception and Async Event Notification -
Trying to track device state transitions.
Cause(s)/Reason(s):
* Fabric environments may allow AEN's to arrive out of order.
* Frame errors may cause AEN's to arrive out of order (or to
be lost).
Environment(s):
Loop environments in the presence of frame errors.
Fabric environments.
Example:
<see cause>
Justification:
Many device state transitions are based on an explicit transition
table. Missing a transition (based on it not arriving at the
proper time) may cause the driver to misidentify the state that
the device is actually in.
Solution:
Implement an ordered delivery/execution scheme for FCP
Other Notes:
* Using FC'isms to solve this (sequence id's, etc) may be
unusable, as the upper layer receiving the statuses may have
no notion that it's using FC and what a sequence id is.
Also depends on the ability of the FC implementation to be
able to supply this information to upper layer, which some
FC4-assist implementations may not be able to do.
---------------------------------------------------------------------------
Issue: System identification is unclear in the FC specifications
Cause(s)/Reason(s):
* The definition of "Node" in FC-PH is :
"A collection of one or more N_Ports controlled by a level
above FC-2"
Thus, under this definition, the following may be a node:
An adapter with 1 or more N_Ports
A host with 1 or more adapters (if the adapters do not claim
to be the "node").
Environment(s):
All FC environments
Example:
A host has 2 FC adapters, each with a single N_Port. One adapter
claims itself to be the "node". The other adapter defers the
"node" to the host. A second host, attempting to correlate multiple
connections to the first host, obtains different Node Names from
each of the adapters. How does the second system correlate these
2 N_Ports as being part of the same system?
Justification:
Failure to properly understand what a "node" is will make
system identification extremely difficult or impossible.
Solution:
The definition of "Node" should be further refined.
Other Notes:
Care should be taken to define the definition in terms of the
existing or near-term system implementations rather than
defining an abstract definition that attempts to hide the system
implementation.
---------------------------------------------------------------------------
Issue: FC Link Level responses are independent of FCP.
Cause(s)/Reason(s):
* The FC response frames for F_BSY/P_BSY/F_RJT/P_RJT may be
generated independently from the FCP entity associated with
the frame that was rejected.
Environment(s):
All FC environments
Example:
An N_Port or F_Port entity is temporarily unable to deliver a
frame and thus terminates transmission/reception of the frame
and sends a F_BSY or P_BSY status.
-or-
A configuration change occurs on a public loop, forcing a
logout condition for all initiators. One of the public loop
devices receives a frame from a logged-out initiator and rejects
it with a F_RJT status.
Unfortunately, the BSY/RJT'd frame contained an ordered command,
which needed to be executed prior to the reception of a later
command (that the N_Port is able to receive).
Justification:
If ACA is required to be enabled to ensure some kind of ordering
of processing, how does this lower level FC2 indication get
propagated up to the FC4 (FCP) so that it can enable the ACA ?
Solution:
Implement an ordered delivery/execution scheme for FCP
---------------------------------------------------------------------------
Issue: Multiple path and/or multiple-host switchover - making sure
in flight ios from the "failed" path have been terminated.
Cause(s)/Reason(s):
* Clearing the "old" initiator's work queue with a target reset or
clear task set without holding a reservation will not be
successful due to in-flight commands for the older initiator.
* If you do not have a method for reserving a device, you must
delay an E_D_TOV (10 seconds) timeout to ensure that the older
initiator's ios have errored/terminated. This timeout may be beyond
the acceptable switchover time for the initiator.
* Persistent Reserve/Release is not mandated by the SPC.
SBC and SSC currently state that Persistent Reservation is optional.
* Reserve/Release is not mandated by the SPC.
Reserve/Release is tied to the physical address of the reserver,
making it susceptible to temporal id changes. Many race conditions
exist with Reserve/Release that make it difficult to implement a
robust reservation mechanism.
Reserve and Release are mandatory for SBC.
Reserve and Release are optional for SSC.
Environment(s):
Fabric environments.
Dual Ported device environments
Example:
Justification:
In order to make robust dual ported and/or dual initiator
configurations, some type of reservation must be mandated
for all device types.
Given the implementation history from parallel SCSI, persistent
reservation is the preferred mechanism.
Solution:
Mandate Reserve/Release and/or Persistent Reserve/Release for
all device models.