PLEASE EVALUATE RESPONSES, PERSISTENT RESERVE DESCRIPTION, 2nd DRAFT

Bob Snively Bob.Snively at Eng.Sun.COM
Wed Jan 25 22:44:57 PST 1995




Many comments and questions have been received about Persistent
Reserve.  I have provided a set of responses below, together with
a reworked draft of the description which will become a basis
for the standard proposal.  I expect that more may come.

The responses are summarized below and changes/clarifications are
included in the updated enclosed message.

1)	Optionality

>> 
>> The Persistent Reserve In/Out commands have the following functional
>> options:
>
>I'm not following what you mean by "options".  Do you mean to
>imply that a vendor who implements Persistent Reserve may be
>subsetting it, i.e., are some features "optional"?
>

	I have been sloppy about my wording here.  It is true
	that many SCSI functions are optional and need not
	be implemented by devices or invoked by hosts that
	don't require the special capabilities provided by the
	option.

	I will add my first impressions of what are really
	required by all devices and what could be optional.

	These are updated in the text below, but not marked,
	because they are so obvious.


2)	ID value

>> 	For ID Presentation:
>> 
>> 		Defines a 16-byte current ID for the
>> 		sourcing initiator to the attached device.
>
>Is the 16 byte identifier required to have any relationship at
>all to the SCSI address of the "sourcing initiator", or is
>the software free to use the 16 bytes in any way that it sees fit?
>(I prefer the later.)

	The ID is an arbitrary 16-byte value provided 
	by each initiator and understood by cooperating
	initiators to be associated with that initiator.
	They could use a configuration ID, a network
	ID, a dynamically assigned task ID, 
	or any other convention they like to generate
	the value. 

3)	Examples required to show why Check Condition is appropriate.

>> 	For over-ride of a present reservation:
>> 
>> .....
>> 
>> 		Provides a Check Condition if the current
>> 		reservation is for an initiator other than
>> 		that specified by the 16-byte ID provided by
>> 		the command.
>
>Is creating a Check Condition better than just returning a
>RESERVATION_CONFLICT error?  Does the Check Condition affect just the
>initiator who sent the request, or does it affect all initiators?

	This is a good question.  After a great deal of thought
	and work on the examples, I agree that a Reservation
	Conflict is at least as good as a Check Condition and
	probably more flexible.
	
	I will add an example usage paragraph to the outline.
	Note that this example usage also brings up another
	question.

4)	SHOULD A PR ID BE REQUIRED TO PERFORM A RESERVATION?

	If PR ID is required to be presented
	during a reservation, this provides an additional
	time-stamp-like authorization that can determine that there have
	been no power off conditions or system substitutions
	since the last initialization period. If a PR ID is
	required during the reservation, checking of the 
	previous PR ID may be required by some systems, but
	not required by other systems, so an appropriate
	PR_ID_VER bit would be added.  I think this is a
	valuable feature.


5)	Device internals
 
>> ......
>> The ABORT TASK SET, OTHER INITIATOR task management function
>> works very much like we have defined, except that it uses the
>> ID defined by the Persistent Reserve command to specify the
>> initiator for which the ABORT TASK SET will be performed.
>
>In the device's internals, does this mean that for each task,
>the device must keep track of which 16 byte ID issued that task?
>How does the implementation map from the ID in the Persistent
>Reserve command to the set of tasks that should be aborted?

	After reviewing the examples, it seems that the following
	device internals will be required:

	For each initiator that ever performs a PR OUT command
	to the target:
		PR_ID for that initiator, null if never contacted,
		maximum number of initiators is defined by table space.
		Should be > 16/port, but is not specified.  If
		not sufficient for a system, better not use
		PR on at least some of the initiators.

	For each Persistent Reserve condition that has been
	established:
		PR_ID for that reserve condition.
		Note that each Extent Reservation may have
			a different PR_ID.
		The maximum number is:
			1/initiator if extent reservation is not implemented.
			
			n/initiator if n extent reservations are permitted
				per initiator

			n if n extent reservations total are permitted

		Note that LUN reservations always conflict with
			extent reservations.

	For each Persistent Reserve condition that has been
	established:
		The proper set of conflict resolution parameters:
		For LUN reservations:  initiator to which LUN is reserved
		For Extent reservations:  Extent size/Extent properties/
					initiator to which extent is reserved.
		For third party reservations:  initiator to which LUN is
					 reserved.

	Unintended overflow conditions can be eliminated by Overriding
	PR IDs for which no reservations exist, since the 
	relevant table space will be released.

	The tasks are each associated with an initiator during
	the start of each command.  Thus it is only necessary to
	find out which initiator (if any) is referenced by the PR ID
	and to perform the appropriate set of ABORT functions.



6)	Protection against unauthorized hosts

>> 
>> 
>> Discussion with Roger Cummings and Bill Dallas on 
>> Persistent Reservation provided these additional recommendations
>> and clarifications:
>> 
>> ....
>> 
>> d)  Roger feels that the protection of peripherals from
>>     unauthorized hosts should be done by the switch.
>
>I'm confused by point (d), in particular, what is meant by "unauthorized".
>
>Consider a pair of server hosts that are both connected to the same set
>of disks.  Suppose host A initially has the disks Persistently
>Reserved.  Then host B comes to believe that A has crashed -- Host A
>may or may not have crashed, or A may be taking its sweet time to
>crash, or it may be very sick but still issuing some i/o commands,
>e.g., all user-level processes have frozen in the scheduler but device
>drivers are still running at interrupt level.  Host B will issue a
>Persistent Reserve to seize control of the disks and to prevent stale
>i/o commands from host A from interfering.  So in some sense B is using
>Persistent Reserve to revoke access to the no-longer-authorized host A.
>
>Is Roger saying that we shouldn't be using Persistent Reserve for this
>purpose?  Alternatively, perhaps by "unauthorized" Roger has in mind a
>malicious host?  Is he just saying that network security issues are beyond
>the scope of SCSI?

	Roger is saying that the network security issues are beyond
	the scope of SCSI, as you concluded. 

7)	How big should the PR ID be?


>> ......
>> 
>>     The id will be 8 bytes.
>>
>
>Please keep it at 16 bytes.  16 gives enough room for
>higher-level software to use some of it, by convention, as
>a system id and some of it as uniquifying timestamp.
> 

	Okay, 16 bytes it is.

8)	Task Management Functions and Reservations



>> 
>> f)  All the task management functions violate reservations.
>
>I don't follow (f).  Does this mean that the task management
>functions bypass the reservation checks?


	Yes.  All task management functions bypass reservation
	checks.



9)	Is Abort Task Set Other Initiator really critical?

>> g)  Abort Task Set Other Initiator is a critical function.
>
>I see that it's useful, but I'm not up to speed on why it is critical.
>Consider my Host A and B scenario above, where Host A is crashing, but
>perhaps slowly and issuing some i/o commands as it goes down.  Once A
>has finally halted, it will issue no more commands.  Can the
>not-yet-completed tasks that the device is doing, or trying to do, for
>Host A cause it to not respond to Host B?  Can the device get into a
>state where it is waiting for A to do something and therefore the
>device is hanging?  Can deadlock happen? 

	The Abort Task Set Other Initiator is critical to
	clear resources that have been tied up in the logical
	unit by a failing host that will never complete the
	commands that tied up those resources in the first place.
	Without that command, those resources may even try
	to reconnect when an initiator with the same id becomes
	active again.  In general, we have assumed that
	targets are too dumb to perform meaningful timeouts and
	know when to free these resources on their own.  In most
	cases, disks do not keep any clock at all.


10)	Are other priority reserve proposals being considered?

	Yes, but this one will supplant all the others.


11)	Behavior of commands by other initiators not clear

>Allow only get_pr, put_pr, and inquiry commands and ABORT TASK SET OTHER
>INITIATOR to work on a drive whose other port is reserved.

	If I have done the work correctly, the examples
	now tell you what you need to know about the behavior
	of devices other than those that hold a reservation.

12) 	Determination that PR OUT/IN commands are supported.

>The inquiry data provide information to determine that this is a
>dual ported drive.

	This is no longer a case of how many ports the device
	has, but rather whether the PR OUT/IN commands are
	supported.  I have added a statement that indicates
	that this behavior is indicated in the INQUIRY command.




----- Begin Included Message, updated to contain corrections -----
	Updates and changes are indicated with >>> markers <<<
	
To:		Distribution

From:		Bob Snively

Date:		Jan 10, 1995

Subject:	PLEASE EVALUATE proposal for "Persistent Reserve".


This document defines a new function called "Persistent Reserve".
It has all the properties of a normal reserve except that
it is not cleared by Target or LUN Reset (Bus Device Reset) or by 
SCSI RST for those protocols that have that capability.  
It is only cleared by a power cycle or by
a properly qualified Persistent Reserve from another 
Initiator.  It requires the definition of two new commands
having different properties than the present RESERVE/RELEASE
commands.  The commands are Persistent Reserve In and Persistent 
Reserve Out. 

This proposal replaces all other multi port and priority reserve
proposals for the SCSI-3 standard.

The Persistent Reserve In/Out commands have the following mandatory
and optional features:

	Implementation of the multi-port multi-initiator Persistent
	Reserve In/Out command (optional.  If implemented the 
	mandatory features must be implemented.)

	For Reservations (mandatory):

		Extent (optional)
		Third-Party  (mandatory)
		LUN  (mandatory)

		Release  (mandatory)

>>>>>>	PR_ID may be verified or not (capability mandatory)   <<<<<<<

	For ID Presentation (mandatory, need not be invoked):

		Defines a 16-byte current ID for the
		sourcing initiator to the attached device.

	For determination of present/last reservation (mandatory):

		Recovers the 16-byte current ID for the presently
		reserving initiator  (null if never set).

		Recovers the 16-byte ID for the immediately
		previous reserving initiator (null if never set).

		This uses the Persistent Reserve In command.

	For over-ride of a present reservation:

		Replaces the current reservation with a Persistent
		Reservation for the issuing initiator IF the
		16-byte current ID provided matches the ID
		of the currently reserving initiator (may not be
		invoked unless ID presentation was performed, mandatory).

		Creates a Persistent Reservation IF there is
		no present reservation (mandatory).

		Provides a Reservation Conflict if the current
		reservation is for an initiator other than
		that specified by the 16-byte ID provided by
		the command  or if other conflicting
		reservations must also be cleared.(mandatory).

This command has no effect on reservations established by the
normal RESERVE/RELEASE command and will receive a Reservation
Conflict indication if it is executed to a LUN having a normal
reservation in effect.  Multi-initiator environments must use
ONLY the new reservation command to operate correctly, since there
would be no Priority Reserve command to take over from the old
RESERVE commands.

The ABORT TASK SET, OTHER INITIATOR task management function
works very much like we have defined, except that it uses the
ID defined by the Persistent Reserve command to specify the
initiator for which the ABORT TASK SET will be performed. (mandatory)

>>>>>>>>>>>>>>>>>
The ABORT TASK SET, OTHER INITIATOR is ignored if the PR_ID does
not match some initiator.  For hosts that have never executed
a PR OUT and assigned a PR_ID, their tasks cannot be cleared by
the ABORT TASK SET, OTHER INITIATOR, but can be cleared by other
task management functions.  <<<<<<<<<<<<

>>>>>>>>>>>>>>>>>>
The INQUIRY command obtains Inquiry Data which contains an
indicator bit that tells whether or not the persistent reserve
function is supported.   <<<<<<<<<<<<<<<<<

>>>>>>>>

Examples of normal usage of PERSISTENT RESERVE IN  (PR IN) and 
PERSISTENT RESERVE OUT (PR OUT).

  A)	During initialization:

	No initialization is required. 


	A PR IN command is executed to determine if a
	Persistent Reserve ID (PR ID) has been established for that
	initiator.  If it has not and the host is expected to
	be using the persistent reservation function, a
	PR OUT command is executed to set the PR ID.


  B)    To create a reservation:

	A PR OUT command is executed to set the proper
	LUN PR, Extent PR or third party PR.  The first time that
	a PR OUT command is used to establish a persistent
	reservation, the null PR ID is replaced with the
	PR ID of the reservation command.  The PR ID is
	maintained by the target for that initiator until
	the next PR OUT command establishes a new persistent
	reservation, at which time the PR ID is replaced with
	the the new PR ID.  Normally, a host will always use
	the same PR ID from power on until the next major
	host reboot or reconfiguration, but that is not
	required.  

	Appropriate Extent ID and Third Party secondary initiator
	information, if relevant, is contained in the PR OUT command.

	The use of a null PR ID in a PR OUT command is allowed
	and is treated as a normal PR ID.

	A control bit is set in the command to demand
	that the PR ID be verified.  If the PR_ID_VER bit is
	set to zero, the PR ID will not be verified and 
	the new PR ID will simply replace the old PR ID for
	that initiator.  If the PR_ID_VER bit is set to one,
	the PR ID will be verified.  If the PR ID is the
	same as the previous PR ID received from that
	initiator, the PR OUT command will be accepted.  
	If the new PR ID is different from the previous PR ID
	from that initiator, the PR OUT command will be
	rejected with Reservation Conflict status.

	Recovery of an unexpected PR ID change while operating
	with the PR_ID_VER bit set to one is executed in the
	same manner as a reservation over-ride. An unexpected 
	PR ID change normally results from a re-configuration that 
	was not expected by the system.

	A third party reservation is established applying the PR_ID 
	to the secondary host.  If a different host may have been
	the previous primary host for a third party operation
	using the same secondary host, setting the PR_ID_VER
	bit to one will create a Reservation Conflict status
	to an otherwise legitimate PR OUT command.  This
	bit should only be used for third party reservations
	in environments that avoid this case or require detection
	of this case.

	If the target has no more space to maintain PR IDs
	or to maintain the extent reservation parameters, 
	a Check Condition with Illegal Request/Reservation Table Full
	is generated.  The override function clears the
	addressed reservation tables and initiator tables.

  C)	To release a reservation:

	A PR OUT command is executed to release a PR.  Appropriate
	Extent ID and Third Party secondary initiator information 
	is carried.  No PR ID is carried or used by the command.
	The PR_ID_VER bit is not defined for release.

  D)	Operation during PR:

	All commands for the destination from the reserving initiator
	are allowed to be performed.

	With the following exceptions, all commands from initiators
	other than the reserving initiator are rejected with
	Reservation Conflict status.

		PR IN			Executed
		INQUIRY			Executed
		REQUEST SENSE		Executed
		
		A number of commands are evaluated in a special manner
		as to whether or not they conflict with an Extent Reservation
		These behave as specified in the SCSI-2 standard
		and the corresponding SCSI-3 documents.

	PR IN and PR OUT commands are evaluated as normal reserve
	commands if a standard reservation has been established
	by a RESERVE command.  RESERVE commands are evaluated 
	against a persistent reservation status exactly as if 
	a standard reservation was active.  The RESERVE/RELEASE commands
	act on the "reserved" state of the device independently
	from the PR IN/OUT commands' actions on the "persistent
	reserved" state of the device.

  E)	Identification of configuration:

	The PR IN command obtains a header and a PR ID based
	on the setting of a number of control bits.  The
	PR IN command specifies whether LUN, Third-party, or
	extent reservations are being considered and contains
	the appropriate defining parameters.

	Present Reservation:
		Obtains PR ID of one present reservation that
		conflicts with the indicated reservation and
		indicates that a conflicting reservation is active.
		If more than one present reservation conflicts,
		only one of the PR IDs is provided, but a bit
		is set to indicate multiple conflicts.

	Last Reservation:
		
		Obtains PR ID of the last reservation of any
		type that was established.
	
	Own last reservation:

		Obtains present PR ID for the connected initiator,
		whether or not a reservation is active.

   F)	Override of reservations:

	A PR IN command is executed to determine the PR_ID
	of the initiator holding a conflicting reservation.

	A PR OUT command containing the Override control bit set
	to one is transmitted to the target.  The target 
	releases any conflicting reservations for the PR_ID
	specified by the PR OUT command and establishes the
	specified reservation with the PR_ID supplied by the
	initiator executing the command unless there are
	other conflicting reservations from other initiators.
	Non-conflicting reservations are uneffected.  Reservations
	generated by a RESERVE command and persistent reservations
	from initiators other than the one identified by the PR_ID
	are uneffected.  If uneffected reservations conflict with
	the persistent reservation being established by the 
	PR OUT command, the command is executed as far as
	possible, then terminated with Reservation Conflict status.

	Subsequent PR IN commands are required to determine
	additional reservations that may be in conflict with
	the PR OUT command that encounters Reservation Conflict status.

	Internal states for PR_IDs  that have been overridden are
	discarded regardless of whether or not a reservation was
	cleared.  This frees table space for the use of other
	initiators or for other extent reservations. 
	
	
	
<<<<<<<<<<<<<<<<<<<<<




Discussion with Roger Cummings and Bill Dallas on 
Persistent Reservation provided these additional recommendations
and clarifications:

a)  IPI allowed a target reservation, LUN reservation,
    and third-party reservation that was persistent until
    explicitly cleared or until Priority Reservation
    was performed.  Once established, a Priority Reservation
    had all the properties of a normal reservation, including
    the capability of being broken by a Priority Reservation.

    IPI also had a physical level reservation too, but
    Roger thought this was not an essential part of the
    structure.

b)  Bill was studying take over functions.  If a system
    goes down that has a reservation, a Bus Reset or
    Target Reset was required and screwed up everything.

    Bill feels it is useful to have a mechanism for
    identifying the system that has the reservation.
    Rogue systems may not have knowledge of the meaning
    of the identifier and may violate the convention.

c)  Roger feels that the IPI was deficient because the
    physical layer did not provide a response to
    a reservation, so that the reserved state looked
    like a dead device.

d)  Roger feels that the protection of peripherals from
    unauthorized hosts should be done by the switch.

e)  There was some discussion about what happens when
    there are queued tasks in place.  At present, SCSI
    treats reservation as an Ordered Queue operation.
    Bill wants to over-ride them, since queueing may be
    against an initiator that will not serve the queue.
    Roger says that they blasted off all tasks if 
    a reservation could not be made because other tasks
    were present.

    The id will be 16 bytes.

    Bill suggests a bit to indicate whether 
    the reservation will take place as an Ordered or
    Head of Queue command.

    Roger suggests that Lansing Sloane be consulted for
    corner case considerations.

f)  All the task management functions violate reservations.

g)  Abort Task Set Other Initiator is a critical function.

h)  Persistent Reserve does not need a priority reserve,
    since it has a Get PR Info.  The Get PR Info may be
    a list of extent reservations.


----- End Included Message -----




More information about the T10 mailing list