Would moving access restrictions from being based on the registration to
being based on a specific reservation type help with this?

Today, a bunch of initiators register, and that has basically no impact on
anyone's access. When one initiator does the reservation, that action
affects all previous registrations (allowing continued access) and all
other (non-registered) initiators (denying them access). Anyone who
registers after that immediately joins the existing reservation. That is
what this group reservation is trying to deal with: getting free access
under the reservation just by doing a register, which is easy to do.
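To make the current behavior concrete, here is a toy Python model of a
target under a registrants-only style reservation. It is only a sketch of
the paragraph above: the class, its method names, and the initiator labels
are assumptions made for illustration, not SPC-4 terminology, and real
targets track I_T nexuses and keys rather than plain names.

```python
# Toy model of today's behavior: registration alone changes nothing, but
# once one initiator reserves, every registrant is inside the reservation.
# All names here are illustrative assumptions.

class Target:
    def __init__(self):
        self.registered = set()   # initiators that have registered
        self.holder = None        # initiator that issued the reserve, if any

    def register(self, initiator):
        self.registered.add(initiator)

    def reserve(self, initiator):
        if initiator not in self.registered:
            raise PermissionError("reservation conflict: not registered")
        if self.holder is None:
            self.holder = initiator

    def has_access(self, initiator):
        # With no reservation in place, registration has no effect on access.
        if self.holder is None:
            return True
        # Once reserved, registrants keep access and everyone else is denied.
        return initiator in self.registered


target = Target()
target.register("A")
target.register("B")
before = target.has_access("C")   # no reservation yet
target.reserve("A")
denied = target.has_access("C")   # C is not registered
target.register("C")              # C simply registers...
joined = target.has_access("C")   # ...and is now inside the reservation
```

The last three lines are the "free access just by doing a register" case
that the group reservation is meant to close.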
Could we create reservation types for:

write exclusive - reserved only
exclusive access - reserved only

This would create reservation types that require reservation actions to
allow access. A simple registration all by itself would still have no
impact on access until an initiator also performed the reservation step.
Once any initiator uses this new reservation type, a registered node would
lose access (a reservation conflict status) until that initiator also
performed a reserve function (with type reserved only). This also means
there would be multiple reservation holders (since every initiator does a
reserve), so there is no need to deal with the one reservation holder case
(#5 below).

Once this reservation type (reserved only) is in place, an initiator that
is already registered but not reserved could not do I/O or change the
reservation type (reserves with other reservation types would fail). Only
a reserved initiator could change the reservation type (with a new
reserve).
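A rough Python sketch of how a target might enforce the "reserved only"
idea follows. Everything in it is hypothetical: the class, the type
strings, and the error text are invented for illustration, and it models
only the two rules stated above (access requires your own reserve, and
only a reserved initiator may change the type).

```python
# Hypothetical model of the proposed "reserved only" types. All names are
# illustrative; none of them come from SPC-4 or from a posted proposal.
RESERVED_ONLY = "write exclusive - reserved only"


class ReservedOnlyTarget:
    def __init__(self):
        self.registered = set()
        self.reserved = set()    # every initiator that did its own reserve
        self.rtype = None        # reservation type currently in place

    def register(self, initiator):
        self.registered.add(initiator)

    def reserve(self, initiator, rtype=RESERVED_ONLY):
        if initiator not in self.registered:
            raise PermissionError("reservation conflict: not registered")
        changing = self.rtype is not None and rtype != self.rtype
        if changing and initiator not in self.reserved:
            # Registered-but-not-reserved initiators cannot change the type.
            raise PermissionError("reservation conflict: not reserved")
        self.rtype = rtype
        self.reserved.add(initiator)

    def has_access(self, initiator):
        if self.rtype is None:
            return True          # no reservation: registration changes nothing
        return initiator in self.reserved


target = ReservedOnlyTarget()
target.register("A")
target.register("B")
target.reserve("A")
b_before = target.has_access("B")   # registered only: reservation conflict
target.reserve("B")
b_after = target.has_access("B")    # B has now done its own reserve

target.register("C")
try:
    target.reserve("C", "exclusive access - reserved only")
    c_changed_type = True
except PermissionError:
    c_changed_type = False
```

Note that nothing here stops an outsider from registering and then
reserving with the matching type, which is why the group ID question
still matters.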
This would cover all cases below (1-6) except for #5. As for #4 (preempt),
the process could be a little more protected. With all the existing
reservation types, the initiator just registers and preempts. With these
new types, the initiator would have to register, reserve, and then
preempt. Would that meet the #4 requirement, or do you feel preempt can't
have any changes at all? My opinion is that a new reservation type could,
when it is used, create new requirements. On the other hand, if you want
to have preempt without reserve, then we could exempt that one function
from the reserve requirement.

The question would be a group ID. Is one needed? Or would the simple
change to require a matching reservation (of type reserved only) be
enough? Using a simple shared value wouldn't work for this idea because of
the problem it would create for preempt (register, reserve with shared
value, then preempt): if you don't know the shared value, you can't
preempt, so that would make using a shared value impractical. If instead
we exempt preempt from the reserve requirement and just allow register
then preempt (without a reserve), this approach could work.

More comments below on the existing proposal.

Fred Knight
Ray,

Please see this font.

Kevin D. Butt
SCSI & Fibre Channel Architect, Tape Firmware
MS 6TYA, 9000 S. Rita Rd., Tucson, AZ 85744
Tel: 520-799-2869 / 520-799-5280
Fax: 520-799-2723 (T/L:321)
Email address: kdbutt@us.ibm.com
http://www-03.ibm.com/servers/storage/
From: "Raymond Gilson" <raymond_gilson@symantec.com>
Sent: 12/22/2007 07:06 AM
To: Kevin D Butt/Tucson/IBM@IBMUS
Cc: Christine R Knibloe/Tucson/IBM@IBMUS, "Knight, Frederick"
<Frederick.Knight@netapp.com>, "Roger Cummings"
<roger_cummings@symantec.com>, <t10@t10.org>
Subject: RE: Persistent Reservation Proposal - Group Reservations
Comments in line

From: Kevin D Butt [mailto:kdbutt@us.ibm.com]
Sent: Friday, December 21, 2007 5:33 PM
To: Raymond Gilson
Cc: Christine R Knibloe; Knight, Frederick; Roger Cummings; t10@t10.org
Subject: RE: Persistent Reservation Proposal - Group Reservations
Ray,

Thanks for joining in.

Let me summarize what I think has been said by all parties who have joined
in these discussions.

1) (From Ray) Some applications will have trouble providing a list of
Transport IDs.
2) (From Fred) There is a desire to allow members of a cluster that were
not active at the creation of the reservation to join in.
3) (From Kevin) Who can join or participate in a group reservation is
required to be controlled such that only those initiators that are part of
the cluster (i.e., group) can join.
4) PREEMPT must be allowed (i.e., what we do cannot lock out PREEMPT or
make it not work correctly).
5) Both Registrants Only and All Registrants types of functionality need
to be provided for.
6) There should be an option for all target ports.
If we can't require a list of Transport IDs, then it seems that the
suggested "shared secret" (not cryptographic, just some unique value that
is protected through obfuscation) is probably the best way to do this.
This would be something akin to requiring the same reservation key value.
However, using the reservation key does not seem plausible because of how
it is being used today. What we need is some different value that would
not be reportable via a Persistent Reserve In. This would keep third-party
initiators from joining the reservation. If this new Group Reservation
Identifier (GRID) were added, that would take care of #1, #2, and #3
above.
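As a minimal sketch of the GRID idea, assuming the first reserve
establishes the GRID and later reserves must match it: the Python below
keeps the reservation keys reportable and the GRID unreportable. The class
name, the `read_keys` stand-in for Persistent Reserve In, and the numeric
values are all assumptions for illustration only.

```python
# Hypothetical GRID check. The GRID is a plain shared value (not
# cryptographic) that the target never returns in any report.

class GridTarget:
    def __init__(self):
        self.keys = {}             # initiator -> reservation key (reportable)
        self.grid = None           # group identifier, kept out of all reports
        self.participants = set()

    def register(self, initiator, key):
        self.keys[initiator] = key

    def reserve(self, initiator, grid):
        if initiator not in self.keys:
            raise PermissionError("reservation conflict: not registered")
        if self.grid is None:
            self.grid = grid       # the first reserve creates the group
        elif grid != self.grid:
            raise PermissionError("reservation conflict: GRID mismatch")
        self.participants.add(initiator)

    def read_keys(self):
        # Stand-in for Persistent Reserve In: reports keys, never the GRID.
        return sorted(self.keys.values())


target = GridTarget()
target.register("A", 0x11)
target.register("B", 0x22)
target.register("X", 0x33)           # third party: registered, no GRID
target.reserve("A", grid=0xCAFE)
target.reserve("B", grid=0xCAFE)     # late joiner that knows the GRID
try:
    target.reserve("X", grid=0x0000)
    x_joined = True
except PermissionError:
    x_joined = False
```

No Transport ID list is needed (#1), B can join after the reservation
exists (#2), and X cannot join without the shared value (#3).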
For #5 above, it seems that we could provide that by adding a bit in the
reserve command that indicates "All Participants are reservation holders".
If set to one, then it acts like an All Registrants. If set to zero, then
it acts like Registrants Only. This includes the unregistering behavior.

I'm not sure a bit is the right place for this. Right now, it's specified
in the reservation type (registrants only, or all registrants). Creating a
bit creates a place for conflicting information to be supplied. Are you
suggesting this bit would apply to only some reservation types, and be
unused for other reservation types? What would it mean if you did a
registrants only reservation type, but set the all registrants bit?

Another method would be to use all the existing reservation types (for #5
above), but add a GRID bit to specify that the reservation applies to only
those that supply a matching GRID (all others get reservation conflict
until they supply a matching GRID). Then it could in fact apply to all
reservation types.

For #6 we use the ALL_TG_PT bit the same as other reservations today.
Issues still to be resolved:

a) Some systems won't want to require all initiators to send a Persistent
Reserve Out command. A possible solution is to allow reserving multiple
initiators if a Transport ID list is sent. Additional initiators could
join later if they have the GRID. However, this would make it more
complicated, and if it is not needed I would rather not add this option.

In a practical sense, I cannot see how this could be avoided (all
initiators sending PRO) -- since PR requires trust and good behavior, each
initiator must make no assumption about what the protection level is
currently set at -- so it must verify that the settings are correct and as
expected. If the settings aren't as expected, it must bail out, or go into
error recovery to attempt to avoid messing up some other application (a
fist fight on the SAN for device control does nobody any good). I see no
reason to provide for this (I do know that the current command allows a
registration for multiple ports, but I cannot imagine using it in the real
world).

I'm not sure I understand the issue here. How can a system that doesn't
want to send PR commands take advantage of the features offered by that
command? Are you thinking of multi-path systems (where a single host
system has multiple initiators with access to the same target)? How does
this new proposal make this different from the situation today (where they
need to use the transport ID list and the spec_i_pt bit), or send PR-OUT
from every initiator? I guess I'm mostly agreeing that good behavior is
already required.

<<kdbutt: I am certainly willing to agree. All could still be registered
by using the all_i_pt bit. However, I suspect there will be those that
will find this unacceptable. Anybody who needs a way to add all initiators
who are currently registered to the group reservation, please speak up
(and comment on a method to accomplish this).>>

I would suggest we do not want a way to add all currently registered
initiators to the group. This would tend to have the potential to enlarge
the group beyond what is intended. I'd prefer a method that requires
explicit action.
b) If the first I_T nexus sets the "All Participants are reservation
holders" to zero when it creates the reservation and then a subsequent I_T
nexus sets it to one, what is the behavior? Change the type? Reject? Also,
what is reported in the Report Full Status if All Participants are
reservation holders is set to zero?

I don't think this is a problem -- once a reservation is established it
cannot be changed without a preempt, clear, or removal of the old. If this
isn't true, I would want the attempt to change it to get rejected. I would
expect a change to require a preempt type operation.

<<kdbutt: I think the correct response for a new participant that attempts
to change the type is to reject the command that attempts to change the
type.
I don't think you can require a preempt/clear in order to change the type.
The whole point of PR is that a reservation is present at all times; you
can change the type, you can move the owner of the reservation (such as
preempt on a registrants only type), but you never want to lose the
protection provided by the PR (see note 10 in SPC4 - section 5.6 -
clearing).

For what to return in Report Full Status if "All Participants are
reservation holders" is set to zero, I am concerned about confusion. In
reality, only the first is a reservation holder and therefore only the
first should set the reservation_holder bit to one. However, there would
now be two groups that cannot be distinguished. The first is not the
reservation_holder but part of the reservation, and the second is not the
reservation_holder and not part of the reservation. I think we should
probably add a "group reservation participant" bit to distinguish the
two.>>
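The two groups can be pictured with a small Python sketch of what a full
status report might carry per registrant. The function and the
`group_participant` field name are invented for this example; only the
reservation_holder bit is mentioned in the thread, and the real descriptor
layout is not modeled.

```python
def full_status(registered, participants, holder):
    """Return one descriptor per registered initiator.

    The field names are made up for this sketch; "group_participant"
    stands for the additional bit suggested above.
    """
    return {
        initiator: {
            "reservation_holder": initiator == holder,
            "group_participant": initiator in participants,
        }
        for initiator in sorted(registered)
    }


status = full_status(
    registered={"A", "B", "C"},
    participants={"A", "B"},   # A created the reservation, B joined it
    holder="A",                # holders bit set to zero: only A is the holder
)
```

Without the extra field, B and C would report identically even though only
B is part of the reservation.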
c) If we go to this method of using the GRID to determine who can join,
then the Reservation Key may or may not be different.
c-1) If the Reservation Key is different, then a PREEMPT of a Reservation
Key will do what?
c-2) If the Reservation Key is the same, then a PREEMPT will act the same
as an AR or RO reservation today.
c-3) Do we require the Reservation Key to be the same?

Preempt is of a reservation, not a key. The keys currently are not
compared, and have no valid use (by the device) except that each initiator
has registered one, and only one at a time. We don't want to change this
behavior -- a key is a random number assigned for some external purpose
that the device records and reports. (My application requires this to
operate properly.)

<<kdbutt: Look at clause 5.6.10.4 of SPC-4r11. This looks to me like the
Reservation Key is used to decide between unregistering I_T nexuses with
the sent reservation key or, if the reservation key is that of the
reservation holder, removing the reservation and registrations of all that
have that reservation key. My intent is not to change the current
behavior.>>

Agreed, Kevin. A Preempt should impact the registration/reservation of all
those initiators with a key that matches the one that is being preempted -
the same as current behavior.
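The key-matching behavior just described can be sketched in Python as
below. This follows only the summary given in this thread, not the text of
the standard: the function name and arguments are assumptions, keeping the
preemptor's own registration is a choice made for the sketch, and it stops
at the removal step without modeling anything that happens to the
reservation afterward.

```python
def preempt(registrations, holder, preemptor, victim_key):
    """Apply a simplified PREEMPT and return (registrations, holder).

    registrations maps initiator -> reservation key; holder may be None.
    """
    if preemptor not in registrations:
        raise PermissionError("reservation conflict: preemptor not registered")
    # Every registration carrying the preempted key is removed (the
    # preemptor's own registration is kept in this sketch).
    survivors = {
        initiator: key
        for initiator, key in registrations.items()
        if initiator == preemptor or key != victim_key
    }
    # If the holder was among those removed, the reservation goes with it.
    if holder not in survivors:
        holder = None
    return survivors, holder


regs = {"A": 0x11, "B": 0x11, "C": 0x22}
after_regs, after_holder = preempt(regs, holder="A", preemptor="C", victim_key=0x11)
same_regs, same_holder = preempt(regs, holder="A", preemptor="C", victim_key=0x33)
```

In the first call A and B share the preempted key, so both registrations
and A's reservation are removed; in the second call no key matches and
nothing changes.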
d) Does this approach still have the issues that Roger was concerned about
(e.g., the corner cases)?

I hope the use of a GRID would not introduce any new issues to SPR -- it
only prevents a registrant from becoming a reservation participant without
some external knowledge. It doesn't prevent a registrant from preempt,
clear, or any other error recovery operations (and MUST not).

I think this is one of the questions. Error recovery is often one of the
cases where you end up with fist-fights out in the SAN over who owns the
device. Hosts do exactly what you suggested above (host 1 checks with
PR-IN, doesn't like what it sees, and preempts and "fixes" it; then, host
2 does exactly the same - and the fight is on). It's perfectly valid to
want to leave this working as is. I understand that desire. I just would
like to discuss the possibility of improving the situation. If we can't or
have other requirements not to change it, that's fine.
Thanks,
Kevin D. Butt
From: "Raymond Gilson" <raymond_gilson@symantec.com>
Sent: 12/21/2007 12:45 PM
To: "Roger Cummings" <roger_cummings@symantec.com>, "Knight, Frederick"
<Frederick.Knight@netapp.com>, Kevin D Butt/Tucson/IBM@IBMUS
Cc: <t10@t10.org>, Christine R Knibloe/Tucson/IBM@IBMUS
Subject: RE: Persistent Reservation Proposal - Group Reservations
Several years ago I was trying to figure out a way to introduce a "JOIN"
function to the SPR. The initiator would register, but that would not
grant it access to a reservation of the "joined only" type. To join it,
the initiator would have to send a join SPR command -- we could add a
"shared secret" field to the join, so that only those initiators that knew
the secret could join.

I think we will have a great deal of trouble with a "white list" approach
-- as an application, I have no idea what my port ID is (or anything else
for that matter).

Would something like this make sense?

Thanks,
Ray Gilson
From: owner-t10@t10.org [mailto:owner-t10@t10.org] On Behalf Of Roger Cummings
Sent: Tuesday, December 18, 2007 10:24 AM
To: Knight, Frederick; Kevin D Butt
Cc: t10@t10.org; Christine R Knibloe
Subject: RE: Persistent Reservation Proposal - Group Reservations
Fred,

The way you clean up from a disaster is to Preempt; that's what it's there
for. Most of the applications that I know that will actually issue a
Preempt make it a very special function that doesn't happen in the normal
flow, and one app at least DOES require manual intervention of an operator
before kicking off the preempt.

Yes, today, a Preempt has to be issued through a registered I_T nexus, but
a registration with the SPEC_I_PT bit doesn't have to come from an already
registered initiator - see Table 33 in SPC-4, and I don't believe Kevin
changed that in his proposal.

For the future, however we define a "group" for the purposes of new
reservation types, we will have to make sure that an Initiator outside of
the "group" can issue a Preempt to handle the disaster recovery case.

Regards,
Roger
From: Knight, Frederick [mailto:Frederick.Knight@netapp.com]
Sent: Tuesday, December 18, 2007 10:56 AM
To: Roger Cummings; Kevin D Butt
Cc: t10@t10.org; Christine R Knibloe
Subject: RE: Persistent Reservation Proposal - Group Reservations
My question has had to do with differentiating the disaster clean up case
from the non-cooperating host case.

How do I clean up from a disaster? If all my "reserved" initiators melt
down, and there aren't any of them left anymore (because of a site
disaster, or whatever), how does some other node come along and clean up
so it can gain access?

Would it require manual intervention? Or is there a way in the protocol
that I can register and preempt the group reservation (does the use of the
SPEC_I_PT bit allow this, as you have suggested, Roger)? I thought the
SPEC_I_PT had to come from an already registered initiator (which in a
disaster, none exist anymore).

Fred Knight
From: Roger Cummings [mailto:roger_cummings@symantec.com]
Sent: Tuesday, December 18, 2007 10:03 AM
To: Kevin D Butt; Knight, Frederick
Cc: t10@t10.org; Christine R Knibloe
Subject: RE: Persistent Reservation Proposal - Group Reservations
Kevin,

I'm sorry, I don't think it's as cut and dried as you make out. This gets
into some of the corner cases that I listed in my first response.

The point to be made in response to Fred's case is that a third-party can
create registrations for a downed initiator (via the SPEC_I_PT bit), so
that when it comes up again it will be able to participate in the
reservation without having to register itself.

Also, you say that "We have made provisions for adding members once the
reservation exists, but only one of the reservation holders can add
another entity." Two things in response to that:

1) I didn't see any specific provision for adding members in your
proposal, so I presume you'd just issue another RESERVE with the same type
and the whole list of transport IDs to be included again, and thus the
Target would have a whole lot of work to do again to set up another
reservation.
2) Is that really what you want, that a member of the existing group can
reissue the RESERVE with a whole bunch of different TransportIDs, perhaps
excluding some that were previously there?

Regards,
Roger
From: owner-t10@t10.org [mailto:owner-t10@t10.org] On Behalf Of Kevin D Butt
Sent: Monday, December 17, 2007 3:54 PM
To: Knight, Frederick
Cc: t10@t10.org; Christine R Knibloe
Subject: RE: Persistent Reservation Proposal - Group Reservations
Fred,

This is being proposed for SPC.

There are multiple types of reservations. In an environment where one node
of a cluster must join later, one of the other types can be used. Either
that or have an existing node in your cluster add the new node. The whole
intent of this Group reservation is to lock out everybody that is not
explicitly specified during the reserve. We have made provisions for
adding members once the reservation exists, but only one of the
reservation holders can add another entity. The new entity cannot add
itself. This is the whole point of reservations (i.e., lock out others
from doing stuff while I think I have exclusive rights).

To put it in other words, to allow somebody to join the reservation of
their own accord without permission is EXACTLY what I am trying to protect
against.
Thanks,
Kevin D. Butt
From: "Knight, Frederick" <Frederick.Knight@netapp.com>
Sent: 12/17/2007 01:38 PM
To: Kevin D Butt/Tucson/IBM@IBMUS
Subject: RE: Persistent Reservation Proposal - Group Reservations
Sorry, you can't require everyone to register before the reserve. That's
like saying my whole cluster can't boot because 1 node is down. You need
to have a way for a "down" initiator to join the fun after the fact.

I helped write a host cluster product that used a shared tape (failover
model). The backup application would write to the tape. If a system
failure ever happened, the backup application would fail over to a
different host. It would skip backwards on the tape for a few records,
recognize where it left off, and then resume operation. BUT, for some
protection, we used reservations to make sure only 1 initiator at a time
could access the tape. The interesting point, however, is that we were in
the process of upgrading from old SCSI-2 RESERVE to using PR, because we
also have multiple HBAs in the host, and we wanted to be able to use more
than 1 of those HBAs (so we needed multiple reservations - aka PR). Having
this idea (group reservations) would have been a real nice addition.

As for the RO/AR differences, it seemed to be timing. Registrants Only was
fairly early on (as I remember), and so was implemented by several O/S
vendors. Later on, some issues were found (which got complicated spec-ese
added to address them), but also, All Registrants was added (which didn't
have those issues). But, since there were implementations, it couldn't be
removed like the other old PR types that no one ever used. Anyway, I
agree, they offer basically the same capabilities, but RO is already out
there, and AR is probably what new implementers are using (it's easier to
understand and implement from the host side). Most of the differences are
already documented, so there wouldn't be that much extra for you to write
to have both types (which I think would be better than a bit somewhere -
do it the same way all the others are done). But, you could also just do
the AR version, and let someone else add the RO version if they want it.

Are you proposing this for tape only? Or SPC in general? I assume SPC in
general.

Fred Knight
From: Kevin D Butt [mailto:kdbutt@us.ibm.com]
Sent: Monday, December 17, 2007 9:51 AM
To: Roger Cummings
Cc: t10@t10.org
Subject: RE: Persistent Reservation Proposal - Group Reservations
Roger,

Thank you for your feedback. I am certainly willing to entertain other
methods for accomplishing the end goal in an easier fashion. I am not sure
I understand how your proposed method makes it more backward compatible.
In my proposal, PR In would show a different type of reservation and hence
the application clients would not try to join the reservation because they
don't know about the type. In your proposal, application clients would not
be allowed to register. This is a deviation from what they can always do
today - unless there is a resource issue. This seems more disruptive to
me. I would assume that there would be a new additional sense code added
for UNABLE TO REGISTER BECAUSE A GROUP RESERVATION IS IN PLACE (or
analogous). This would be a new thing for failure to register, and there
would be pain at the register point. Perhaps that is better than at the
reserve point - but I would think that it would be better handled as a
reservation conflict, since that is what it is, instead of something the
application client does not understand.

As for "all registrants" type vs. "registrants only", I didn't see where
the difference would be interesting, but I am not opposed to providing a
way to switch between which of these two types is done, whether it is
additional types or some bit during registration, etc.

As for some of the corner cases mentioned below, if each I_T nexus that is
supposed to be part of the group reservation is required to be registered
before the reservation is made, and if the reservation is released when
the last group reservation participant is unregistered, then I think we
don't have an issue.
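Those two conditions are simple enough to write down as a Python sketch.
The class and its names are assumptions for illustration; it checks only
the two rules in the paragraph above (members must already be registered,
and the reservation ends with the last participant).

```python
class GroupReservation:
    """Hypothetical group reservation with the two rules described above."""

    def __init__(self, registered, members):
        # Rule 1: every intended member must already be registered.
        missing = set(members) - set(registered)
        if missing:
            raise ValueError(f"not registered: {sorted(missing)}")
        self.participants = set(members)

    @property
    def active(self):
        return bool(self.participants)

    def unregister(self, initiator):
        # Rule 2: the reservation is released when the last participant
        # unregisters. Returns True while the reservation is still held.
        self.participants.discard(initiator)
        return self.active


group = GroupReservation(registered={"A", "B", "C"}, members={"A", "B"})
still_held = group.unregister("A")     # B is still a participant
released = not group.unregister("B")   # last participant gone

try:
    GroupReservation(registered={"A"}, members={"A", "D"})
    accepted = True
except ValueError:
    accepted = False                   # D never registered
```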
I would prefer that we work together to shape a mutually beneficial
proposal as opposed to having "competing" proposals. I am willing to
modify my proposal where it can be made easier and such. I am not sold
that my proposed method is the only way or even the best - it's just the
way I thought of doing it. I admit that I have always been very confused
about the usefulness of RO and AR types. They make absolutely no sense in
the tape world.
Thanks,
Kevin D. Butt
"Roger Cummings"
<roger_cummings@symantec.com> Sent by:
owner-t10@t10.org
Sent: 12/14/2007 12:10 PM
To: Kevin D Butt/Tucson/IBM@IBMUS
Cc: <t10@t10.org>
Subject: RE: Persistent Reservation Proposal - Group Reservations
Kevin,

First of all, let me say that I completely support what you're trying to
do here. I think that providing a method in persistent reservations (PRs)
to support shared access between ONLY a specifically-designated set of
systems is a worthy goal, and something we should do in SPC-4.

Adding a set of Transport IDs to Reserve as per your documents 08-024 &
08-025 is certainly possible, but it's a massive change to the way that
PRs work today, and it throws up a bunch of nasty corner cases and
backwards compatibility issues.

The massive change comes from the fact that now the Target will have to
remember which registrations are in the Reservation, and which are not. It
will probably have to preserve all of the transport information for the
life of the reservation.

The corner cases are things like, what happens if there's no longer a
registration that corresponds to the transport ID in the Reserve? Does the
Reserve succeed? What happens if a registration comes in later, after the
reservation has been established - does that device get access?

Backwards compatibility issues may arise like this: An existing device
registers, and finds it has no access, so it does a PR In and finds out
that a reservation is in place, retries its access and still it has no
access. What does it do next, preempt the reservation because it assumes
the Target is broken?

Reserve also has to be an "atomic" command, and I've always thought that
was why its functionality is as compact as it is today. Most of the
complex operations related to addresses and keys are done at registration
time, and those operations don't have to be atomic.

One more thing: you chose for your new "group" reservations to follow the
"all registrants" approach in terms of the definition of the reservation
holder. While that's fine by me (obviously), I suspect there are also
situations where group reservations that follow the "registrants only"
approach might be useful.

The bottom line from my point of view is this: Your proposal is feasible
and we can probably make it work. But I wonder if there's an easier way to
achieve the same goal that is more compatible with existing practice and
requires less of a change in functionality on the Target side.

What if we didn't add any new reservation types, but instead added some
new functionality to the registration process? What I'm thinking of is a
new Register feature that causes the Target to kill all existing
registrations, create the registrations identified in the transport IDs in
the Register command, and not accept any future registrations. That way,
we don't need any changes to Reserve, and an Initiator with existing
functionality would just not be able to register and therefore would not
be confused.
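Roughly, the Register feature described above could behave like this
Python sketch. The class, the `register_group` name, and the transport ID
strings are all hypothetical; who is allowed to issue the group register,
and how the lock is later lifted, are left open here because the message
does not say.

```python
class LockingTarget:
    """Hypothetical target with a register-and-lock service action."""

    def __init__(self):
        self.registrations = {}   # transport ID -> reservation key
        self.locked = False

    def register(self, transport_id, key):
        if self.locked:
            raise PermissionError("registration refused: group is locked")
        self.registrations[transport_id] = key

    def register_group(self, transport_ids, key):
        # Kill all existing registrations, create the listed ones, and
        # refuse any future registrations.
        self.registrations = {tid: key for tid in transport_ids}
        self.locked = True


target = LockingTarget()
target.register("old-host", 0x01)
target.register_group(["node-1", "node-2"], key=0x99)
try:
    target.register("late-host", 0x02)
    late_ok = True
except PermissionError:
    late_ok = False
```

Reserve itself is untouched in this model; the group is defined entirely
by who holds a registration once the lock is set.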
Does that make sense to you? Is there a chance this is an easier approach?
If so, I'll write up a detailed proposal that's the equivalent of 08-025r0
and we can compare and contrast at the next CAP.

Again, thanks for getting this started. I think it's a worthwhile endeavor
and I'll be glad to put some cycles towards defining this sort of
functionality for SPC-4.

Regards,
Roger
From: owner-t10@t10.org [mailto:owner-t10@t10.org] On Behalf Of Kevin D Butt
Sent: Monday, December 10, 2007 4:18 PM
To: t10@t10.org
Subject: Persistent Reservation Proposal - Group Reservations
I have posted two documents related to an additional Persistent
Reservation Type. The first document is a presentation on where persistent
reservations are today and where they fall short in the scenarios covered
by the proposal. It also covers the intent of the proposal and what will
be proposed. The second is the actual proposal.

Your PDF file will be posted at:
http://www.t10.org/ftp/t10/document.08/08-024r0.pdf
http://www.t10.org/ftp/t10/document.08/08-025r0.pdf

Normally, the posting/archiving process takes about 30 minutes.
Kevin D. Butt