Would it help to base access restrictions on a specific reservation type
rather than on the registration?
Today, a bunch of initiators register, and that basically has no impact on
anyone's access. When one initiator does the reservation, that action
impacts all previous registrations (allowing continued access) and all other
(non-registered) initiators (denying them access). Anyone who registers
after that immediately joins the existing reservation. That is what this
group reservation is trying to deal with (getting free access under the
reservation just by doing a register, which is easy to do).
Could we create reservation types for:
write exclusive - reserved only
exclusive access - reserved only
This would create reservation types that require reservation actions to allow
access. A simple registration all by itself would still have no impact on
access until an initiator also performed the reservation step. Once any
initiator uses this new reservation type, a registered node would lose access
(a reservation conflict status) until that initiator also performed a reserve
function (with type reserved only). This also means there would be multiple
reservation holders (since every initiator does a reserve), so there is no
need to deal with the one reservation holder case (#5 below).
Once this reservation type (reserved only) is in place, an initiator that is
already registered but not reserved could not do I/O or change the
reservation type (reserves with other reservation types would fail). Only a
reserved initiator could change the reservation type (with a new reserve).
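As a rough illustration of the "reserved only" idea above, here is a minimal
Python sketch of the access decision. The names (LogicalUnit, register,
reserve, may_access) are hypothetical and not taken from SPC-4, and the model
ignores keys, preempt, and reservation type changes.

```python
class LogicalUnit:
    """Toy model of a hypothetical 'reserved only' reservation type."""

    def __init__(self):
        self.registered = set()  # initiators that have registered
        self.reserved = set()    # initiators that have also reserved

    def register(self, initiator):
        # Registration alone never changes anyone's access.
        self.registered.add(initiator)

    def reserve(self, initiator):
        # A reserve is only accepted from a registered initiator.
        if initiator not in self.registered:
            raise PermissionError("reservation conflict: not registered")
        self.reserved.add(initiator)

    def may_access(self, initiator):
        # With no reservation in place, nobody is restricted.
        if not self.reserved:
            return True
        # Once a 'reserved only' reservation exists, registration is not
        # enough: the initiator must have performed its own reserve.
        return initiator in self.reserved
```

In this model a registered-but-not-reserved initiator loses access as soon as
any other initiator reserves, and regains it only by doing its own reserve.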
This would cover all cases below (1-6) except for #5. As for #4 (preempt),
the process could be a little more protected. With all the existing
reservation types, the initiator just registers and preempts. With these new
types, the initiator would have to register, reserve, and then preempt. Would
that meet the #4 requirement, or do you feel preempt can't have any changes
at all? My opinion is that a new reservation type could, when it is used,
create new requirements. On the other hand, if you want to have preempt
without reserve, then we could exempt that one function from the reserve
requirement.
The question would be a group ID. Is one needed? Or would the simple change
to require a matching reservation (of type reserved only) be enough? Using a
simple shared value wouldn't work for this idea because of the problem it
would create for preempt (register, reserve with shared value, then preempt):
if you don't know the shared value, you can't preempt. That would make using
a shared value impractical, unless we exempt preempt from the reserve
requirement and just allow register, then preempt (without a reserve). In
that case, this approach could work.
More comments below on the existing proposal.
Fred Knight
Ray,
Please see this font.
Kevin D. Butt
SCSI & Fibre Channel Architect, Tape Firmware
MS 6TYA, 9000 S. Rita Rd., Tucson, AZ 85744
Tel: 520-799-2869 / 520-799-5280
Fax: 520-799-2723 (T/L:321)
Email address: kdbutt@us.ibm.com
http://www-03.ibm.com/servers/storage/
"Raymond Gilson" <raymond_gilson@symantec.com>
12/22/2007 07:06 AM
To: Kevin D Butt/Tucson/IBM@IBMUS
cc: Christine R Knibloe/Tucson/IBM@IBMUS, "Knight, Frederick" <Frederick.Knight@netapp.com>, "Roger Cummings" <roger_cummings@symantec.com>, <t10@t10.org>
Subject: RE: Persistent Reservation Proposal - Group Reservations
Comments in line
From: Kevin D Butt [mailto:kdbutt@us.ibm.com]
Sent: Friday, December 21, 2007 5:33 PM
To: Raymond Gilson
Cc: Christine R Knibloe; Knight, Frederick; Roger Cummings; t10@t10.org
Subject: RE: Persistent Reservation Proposal - Group Reservations
Ray,
Thanks for joining in. Let me
summarize what I think has been said by all parties who have joined in these
discussions.
1) (From Ray) Some applications will have trouble providing a list of Transport IDs.
2) (From Fred) There is a desire to allow members of a cluster that were not active at the creation of the reservation to join in.
3) (From Kevin) Who can join or participate in a group reservation is required to be controlled such that only those initiators that are part of the cluster (i.e. group) can join.
4) PREEMPT must be allowed (i.e., what we do cannot lock out PREEMPT or make it not work correctly).
5) Both Registrants Only and All Registrants types of functionality need to be provided for.
6) There should be an option for all target ports.
If we can't require a list of Transport IDs, then it seems that the
suggested "shared secret" (not cryptographic, just some unique value that is
protected through obfuscation) is probably the best way to do this. This
would be something akin to requiring the same reservation key value.
However, using the reservation key itself does not seem plausible because
of how it is being used today. What we need is some different value
that would not be reportable via a Persistent Reserve In. This would keep
third-party initiators from joining the reservation. If this new Group
Reservation Identifier (GRID) were added, that would take care of #1, #2, and
#3 above.
For #5 above, it seems that we could provide that by adding a bit in the
reserve command that indicates "All Participants are reservation holders". If
set to one, then it acts like an All Registrants. If set to zero, then it
acts like Registrants Only. This includes the behavior on unregistering.
I'm not sure a bit is the right place for this. Right now,
it's specified in the reservation type (registrants only, or all
registrants). Creating a bit creates a place for conflicting information
to be supplied. Are you suggesting this bit would apply to only some
reservation types, and be unused for other reservation types? What would
it mean if you did a registrants only reservation type, but set the
all registrants bit?
Another method would be to use all the existing reservation
types (for #5 above), but add a GRID bit to specify that the reservation applies
to only those that supply a matching GRID (all others get reservation conflict
until they supply a matching GRID). Then it could in fact apply to all
reservation types.
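The GRID idea discussed above can be sketched roughly as follows. This is
only an illustration under assumptions: the class and method names are
hypothetical, the GRID is treated as a plain value compared for equality (the
thread says it is not cryptographic), and the report method merely stands in
for a Persistent Reserve In style query that never returns the GRID.

```python
class GroupReservation:
    """Toy model of a reservation gated by a Group Reservation Identifier."""

    def __init__(self, creator, grid):
        self._grid = grid              # kept private: never reported
        self.participants = {creator}  # the creator is the first participant

    def join(self, initiator, grid):
        # Only an initiator supplying the matching GRID becomes a participant.
        if grid != self._grid:
            return "RESERVATION CONFLICT"
        self.participants.add(initiator)
        return "GOOD"

    def report(self):
        # Stand-in for a PR In style report: participants only, no GRID.
        return {"participants": sorted(self.participants)}
```

A late-booting cluster node that knows the GRID can join without any Transport
ID list, while a third party that only registers cannot.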
For #6 we use the ALL_TGT_PORTS bit the same as other reservations today.
Issues still to be resolved:
a) Some systems won't want to require all initiators to send a Persistent
Reserve Out command. Possible solution is to allow reserving multiple
initiators if a Transport ID list is sent. Additional initiators could join
later if they have the GRID. However this would make it more complicated and
if it is not needed I would rather not add this option.
In a practical sense, I cannot see how this could be avoided (all initiators
sending PRO) -- since PR requires trust and good behavior, each initiator
must make no assumption about what the protection level is currently set at
-- so it must verify that the settings are correct and as expected. If the
settings aren't as expected, it must bail out, or go into error recovery to
attempt to avoid messing up some other application (a fist fight on the SAN
for device control does nobody any good). I see no reason to provide for this
(I do know that the current command allows a registration for multiple ports,
but I cannot imagine using it in the real world).
I'm not sure I understand the issue here. How can a system that doesn't want
to send PR commands take advantage of the features offered by that command?
Are you thinking of multi-path systems (where a single host system has
multiple initiators with access to the same target)? How does this new
proposal make this different than the situation today (where they need to use
the transport ID list and the spec_i_pt bit, or send PR-OUT from every
initiator)? I guess I'm mostly agreeing that good behavior is already
required.
<<kdbutt: I am certainly willing to agree.
All could still be registered by using the all_i_pt bit. However, I
suspect there will be those that will find this unacceptable. Anybody who
needs a way to add all initiators who are currently registered to the group
reservation, please speak up (and comment on a method to accomplish
this).>>
I would suggest we do
not want a way to add all currently registered initiators to the group.
This would tend to have the potential to enlarge the group beyond what is
intended. I'd prefer a method that requires explicit
action.
b) If the first I_T nexus sets the "All Participants
are reservation holders" to zero when it creates the reservation and then a
subsequent I_T nexus sets it to one, what is the behavior? Change the
type? Reject? Also, what is reported in the Report Full Status if
All Participants are reservation holders is set to zero?
I don't think this is a problem -- once a reservation is established it
cannot be changed without a preempt, clear, or removal of the old. If this
isn't true, I would want the attempt to change it to get rejected. I would
expect a change to require a preempt type operation. <<kdbutt: I think
the correct response for a new participant that attempts to change the type
is to reject a command that attempts to change the type.
I don't think you can require a preempt/clear in order to change the type.
The whole point of PR is that a reservation is present at all times; you can
change the type, you can move the owner of the reservation (such as preempt
on a registrants only type), but you never want to lose the protection
provided by the PR (see note 10 in SPC4 - section 5.6 - clearing).
For what to return in Report Full Status if "All Participants are
reservation holders" is set to zero, I am concerned about confusion. In
reality, only the first is a reservation holder and therefore only the first
should set the reservation_holder bit to one. However, there would now be two
groups that cannot be distinguished. The first is not the reservation_holder
but part of the reservation, and the second is not the reservation_holder and
not part of the reservation. I think we should probably add a "group
reservation participant" bit to distinguish the two.>>
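To make the ambiguity concrete, here is a small Python sketch of the
per-registrant status that the suggested extra bit would allow. The field
names reservation_holder and group_participant are hypothetical labels for
the bits discussed above, not an actual descriptor layout.

```python
def full_status(registrants, holder, participants):
    """Return one status row per registrant (toy model, not a real format)."""
    rows = []
    for nexus in registrants:
        rows.append({
            "nexus": nexus,
            # Set only for the single reservation holder.
            "reservation_holder": 1 if nexus == holder else 0,
            # Proposed extra bit: part of the group reservation or not.
            "group_participant": 1 if nexus in participants else 0,
        })
    return rows
```

Without the second field, a participating non-holder and a registrant outside
the reservation would both report reservation_holder as zero and look the same.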
c) If we go to this method of using the GRID to determine who can join, then
the Reservation Key may or may not be different.
c-1) if the Reservation Key is different, then a PREEMPT of a Reservation Key will do what?
c-2) if the Reservation Key is the same, then a PREEMPT will act the same as an AR or RO reservation today.
c-3) Do we require the Reservation Key to be the same?
Preempt is of a reservation, not a key. The keys currently are not compared,
and have no valid use (by the device) except that each initiator has
registered one, and only one at a time. We don't want to change this behavior
-- a key is a random number assigned for some external purpose that the
device records and reports. (My application requires this to operate
properly.)
<<kdbutt: Look at clause 5.6.10.4 of SPC-4r11. This
looks to me like the Reservation Key is used to decide between unregistering I_T
nexuses with the sent reservation key or if the reservation key is that of the
reservation holder, then removing the reservation and registrations of all that
have that reservation key. My intent is not to change the current
behavior.>>
Agreed Kevin. A Preempt
should impact the registration/reservation of all those initiators with
a key that matches the one that is being preempted - the same as current
behavior.
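The preempt behavior described in this exchange can be sketched as below.
This is a simplified model, not the SPC-4 text: registrations is assumed to
be a dict from I_T nexus to reservation key, there is a single reservation
holder, the preemptor always keeps its own registration, and the preemptor
becomes the new holder when the holder's key is preempted (following the
remark above that preempt can move the owner of the reservation).

```python
def preempt(registrations, holder, preemptor, target_key):
    """Return (remaining_registrations, new_holder) after a preempt."""
    if preemptor not in registrations:
        raise PermissionError("preemptor must be registered")
    # Does the preempted key belong to the current reservation holder?
    holder_preempted = (
        holder is not None and registrations.get(holder) == target_key
    )
    # Remove every registration whose key matches the preempted key,
    # except the preemptor's own (a simplification of this sketch).
    remaining = {
        nexus: key
        for nexus, key in registrations.items()
        if key != target_key or nexus == preemptor
    }
    new_holder = preemptor if holder_preempted else holder
    return remaining, new_holder
```

With registrations {"A": 1, "B": 1, "C": 2, "D": 3} and holder "A", a preempt
of key 2 by "D" only unregisters "C", while a preempt of key 1 removes "A" and
"B" and moves the reservation to "D".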
d) Does this approach still have the issues that
Roger was concerned about (e.g., the corner cases)?
I hope the use of a GRID would
not introduce any new issues to SPR -- it only prevents a registrant from
becoming a reservation participant without some external knowledge. It
doesn't prevent a registrant from preempt, clear, or any other error recovery
operations (and MUST not).
I think this is one of the
questions. Error recovery is often one of the cases where you
end up with fist-fights out in the SAN over who owns the device.
Hosts do exactly what you suggested above (host 1 checks with PR-IN, doesn't
like what it sees, and preempts and "fixes" it; then, host 2 does exactly the
same - and the fight is on). It's perfectly
valid to want to leave this working as is. I understand that desire.
I just would like to discuss the possibility of improving the situation.
If we can't or have other requirements not to change it, that's
fine.
Thanks,
Kevin D. Butt
"Raymond Gilson" <raymond_gilson@symantec.com>
12/21/2007 12:45 PM
To: "Roger Cummings" <roger_cummings@symantec.com>, "Knight, Frederick" <Frederick.Knight@netapp.com>, Kevin D Butt/Tucson/IBM@IBMUS
cc: <t10@t10.org>, Christine R Knibloe/Tucson/IBM@IBMUS
Subject: RE: Persistent Reservation Proposal - Group Reservations
Several years ago I
was trying to figure out a way to introduce a "JOIN" function to the SPR.
The initiator would register, but that would not grant it access to a
reservation of the "joined only" type. To join it, the initiator would
have to send a join SPR command -- we could add a "shared secret" field to the
join, so that only those initiators that knew the secret could join.
I think we will
have a great deal of trouble with a "white list" approach -- as an application,
I have no idea what my port ID is (or anything else for that
matter).
Would something like this make sense?
Thanks,
Ray Gilson
From: owner-t10@t10.org [mailto:owner-t10@t10.org] On Behalf Of Roger Cummings
Sent: Tuesday, December 18, 2007 10:24 AM
To: Knight, Frederick; Kevin D Butt
Cc: t10@t10.org; Christine R Knibloe
Subject: RE: Persistent Reservation Proposal - Group Reservations
Fred,
The way you
clean up from a disaster is to Preempt, that's what it's there for. Most of the
applications that I know that will actually issue a Preempt make it a very
special function that doesn't happen in the normal flow, and one app at least
DOES require manual intervention of an operator before kicking off the
preempt.
Yes, today, a Preempt has to be issued through a registered I_T
nexus, but a registration with the SPEC_I_PT bit doesn't have to come from an
already registered initiator - see Table 33 in SPC-4, and I don't believe Kevin
changed that in his proposal.
For the future, however we define a "group" for
the purposes of new reservation types, we will have to make sure that an
Initiator outside of the "group" can issue a Preempt to handle the disaster
recovery case.
Regards,
Roger
From: Knight, Frederick [mailto:Frederick.Knight@netapp.com]
Sent: Tuesday, December 18, 2007 10:56 AM
To: Roger Cummings; Kevin D Butt
Cc: t10@t10.org; Christine R Knibloe
Subject: RE: Persistent Reservation Proposal - Group Reservations
My question has had to do with differentiating the disaster clean up case
from the non-cooperating host case.
How do I clean up from a disaster? If all my "reserved" initiators melt
down, and there aren't any of them left anymore (because of a site disaster,
or whatever), how does some other node come along and clean up so it can gain
access?
Would it require manual intervention? Or, is there a way in the protocol
that I can register and preempt the group reservation (does the use of the
SPEC_I_PT bit allow this, as you have suggested, Roger)? I thought the
SPEC_I_PT had to come from an already registered initiator (and in a
disaster, none exist anymore).
Fred Knight
From: Roger Cummings [mailto:roger_cummings@symantec.com]
Sent: Tuesday, December 18, 2007 10:03 AM
To: Kevin D Butt; Knight, Frederick
Cc: t10@t10.org; Christine R Knibloe
Subject: RE: Persistent Reservation Proposal - Group Reservations
Kevin,
I'm sorry, I don't think it's as cut and dried
as you make out. This gets into some of the corner cases that I listed in my
first response.
The point to be made in response to Fred's case is that a third-party
can create registrations for a downed initiator (via the SPEC_I_PT bit), so that
when it comes up again it will be able to participate in the reservation without
having to register itself.
Also, you say that "We have made provisions for adding
members once the reservation exists, but only one of the reservation holders can
add another entity." Two things in response to that:
1) I didn't see any
specific provision for adding members in your proposal, so I presume you'd just
issue another RESERVE with the same type and the whole list of transport IDs to
be included again, and thus the Target would have a whole lot of work to do
again to set up another reservation.
2) Is that really what you want, that a member
of the existing group can reissue the RESERVE with a whole bunch of different
TransportIDs, perhaps excluding some that were previously there?
Regards,
Roger
From: owner-t10@t10.org [mailto:owner-t10@t10.org] On Behalf Of Kevin D Butt
Sent: Monday, December 17, 2007 3:54 PM
To: Knight, Frederick
Cc: t10@t10.org; Christine R Knibloe
Subject: RE: Persistent Reservation Proposal - Group Reservations
Fred,
This is being proposed for SPC.
There are multiple types of
reservations. In an environment where one node of a cluster must join
later, one of the other types can be used. Either that or have an existing
node in your cluster add the new node. The whole intent of this Group
reservation is to lock out everybody that is not explicitly specified during the
reserve. We have made provisions for adding members once the reservation
exists, but only one of the reservation holders can add another entity.
The new entity cannot add itself. This is the whole point of
reservations (i.e., lock out others from doing stuff while I think I have
exclusive rights).
To put it in other words, to allow somebody to join the
reservation of their own accord without permission is EXACTLY what I am trying
to protect against.
Thanks,
Kevin D. Butt
"Knight, Frederick" <Frederick.Knight@netapp.com>
12/17/2007 01:38 PM
To: Kevin D Butt/Tucson/IBM@IBMUS
cc:
Subject: RE: Persistent Reservation Proposal - Group Reservations
Sorry, you can't require everyone to register before the reserve. That's
like saying my whole cluster can't boot because 1 node is down. You need to
have a way for a "down" initiator to join the fun after the fact.
I helped write a host cluster product that used a shared tape (failover
model). The backup application would write to the tape. If a system failure
ever happened, the backup application would fail over to a different host. It
would skip backwards on the tape for a few records, recognize where it left
off, and then resume operation. BUT, for some protection, we used
reservations to make sure only 1 initiator at a time could access the tape.
The interesting point, however, is that we were in the process of upgrading
from old SCSI-2 RESERVE to using PR, because we also have multiple HBAs in
the host, and we wanted to be able to use more than 1 of those HBAs (so we
needed multiple reservations - aka PR). Having this idea (group reservations)
would have been a real nice addition.
As for the RO/AR differences, it seemed to be timing. Registrants Only was
fairly early on (as I remember), and so implemented by several O/S vendors.
Later on, some issues were found (which got complicated spec-ees added to
address), but also, the All Registrants was added (which didn't have those
issues). But, since there were implementations, it couldn't be removed like
the other old PR types that no one ever used. Anyway, I agree, they offer
basically the same capabilities, but RO is already out there, and AR is
probably what new implementers are using (it's easier to understand and
implement from the host side). Most of the differences are already
documented, so there wouldn't be that much extra for you to write to have
both types (which I think would be better than a bit somewhere - do it the
same way all the others are done). But, you could also just do the AR
version, and let someone else add the RO version if they want it.
Are you proposing this for tape only? Or SPC in general? I assume SPC in
general.
Fred Knight
From: Kevin D Butt [mailto:kdbutt@us.ibm.com]
Sent: Monday, December 17, 2007 9:51 AM
To: Roger Cummings
Cc: t10@t10.org
Subject: RE: Persistent Reservation Proposal - Group Reservations
Roger,
Thank you for your feedback. I am certainly willing to
entertain other methods for accomplishing the end goal in an easier
fashion. I am not sure I understand how your proposed method makes it more
backward compatible. In my proposal PRin would show a different type of
reservation and hence the application clients would not try to join the
reservation because they don't know about the type. In your proposal,
application clients would not be allowed to register. This is a deviation
from what they can always do today - unless there is a resource issue.
This seems more disruptive to me. I would assume that there would be
a new additional sense code added for UNABLE TO REGISTER BECAUSE A GROUP
RESERVATION IS IN PLACE (or analogous). This would be a new thing for
failure to register and there would be pain at the register point. Perhaps
that is better than at the reserve point - but I would think that it would be
better handled as a reservation conflict since that is what it is instead of
something the application client does not understand.
As for "all registrants" type vs. "registrants only", I didn't see where the
difference would be interesting, but I am not opposed to providing a way to
switch between which of these two types is done, whether it is additional
types or some bit during registration etc.
As for some of the corner cases mentioned below, if each I_T nexus that is
supposed to be part of the group reservation is required to be registered
before the reservation is made, and if the reservation is released when the
last group reservation participant is unregistered, then I think we don't
have an issue.
I would prefer that we work together to shape a mutually beneficial proposal
as opposed to having "competing" proposals. I am willing to modify my
proposal where it can be made easier and such. I am not sold that my proposed
method is the only way or even the best - it's just the way I thought of
doing it. I admit that I have always been very confused about the usefulness
of RO and AR types. They make absolutely no sense in the tape world.
Thanks,
Kevin D. Butt
"Roger Cummings"
<roger_cummings@symantec.com> Sent by:
owner-t10@t10.org
12/14/2007 12:10 PM
To: Kevin D Butt/Tucson/IBM@IBMUS
cc: <t10@t10.org>
Subject: RE: Persistent Reservation Proposal - Group Reservations
Kevin,
First of all, let me say that I
completely support what you're trying to do here. I think that providing a
method in persistent reservations (PRs) to support shared access between ONLY a
specifically-designated set of systems is a worthy goal, and something we should
do in SPC-4.
Adding a
set of Transport IDs to Reserve as per your document 08-024 & 08-025 is
certainly possible, but it's a massive change to the way that PRs work today,
and it throws up a bunch of nasty corner cases and backwards compatibility
issues.
The massive
change comes from the fact that now the Target will have to remember which
registrations are in the Reservation, and which are not. It will probably have
to preserve all of the transport information for the life of the
reservation.
The corner cases are things like, what happens if there's no longer a
registration that corresponds to the transport ID in the Reserve? Does the
Reserve succeed? What happens if a registration comes in later, after the
reservation has been established - does that device get access?
Backwards compatibility issues may arise like this: An
existing device registers, and finds it has no access, so it does a PR In and
finds out that a reservation is in place, retries its access and still it has no
access. What does it do next, preempt the reservation because it assumes the
Target is broken?
Reserve also has to be an "atomic" command, and I've always
thought that was why its functionality is as compact as it is today. Most of
the complex operations related to addresses and keys are done at registration
time, and those operations don't have to be atomic.
One more thing: you chose for your new
"group" reservations to follow the "all registrants" approach in terms of the
definition of the reservation holder. While that's fine by me (obviously), I
suspect there are also situations where group reservations that follow the
"registrants only" approach might be useful.
The bottom line from my point of view
is this: Your proposal is feasible and we can probably make it work. But I
wonder if there's an easier way to achieve the same goal that is more compatible
with existing practice and requires less of a change in functionality on the
Target side.
What if we didn't add any new reservation types, but instead added some new
functionality to the registration process? What I'm thinking of is a new
Register feature that causes the Target to kill all existing registrations,
create the registrations identified in the transport IDs in the Register
command, and not accept any future registrations. That way, we don't need any
changes to Reserve, and an Initiator with existing functionality would just
not be able to register and therefore would not be confused.
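A rough Python sketch of the Register feature suggested above follows. The
names (Target, register, register_exclusive_set) are hypothetical, a single
key is applied to every listed transport ID as a simplification, and the
sketch does not address who is allowed to issue the new command or how the
lock is later lifted.

```python
class Target:
    """Toy model of a target with a 'locked' registration set."""

    def __init__(self):
        self.registrations = {}  # transport ID -> reservation key
        self.locked = False      # True once an exclusive set is in place

    def register(self, transport_id, key):
        # Ordinary registration is refused once the set is locked.
        if self.locked:
            return False
        self.registrations[transport_id] = key
        return True

    def register_exclusive_set(self, transport_ids, key):
        # Kill all existing registrations, create the listed ones,
        # and stop accepting any future registrations.
        self.registrations = {tid: key for tid in transport_ids}
        self.locked = True
```

An initiator outside the listed set simply fails at the register step, so it
never reaches a state where it is registered but has no access.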
Does that make sense to you? Is there a chance this is an easier
approach? If so, I'll write up a detailed proposal that's the equivalent of
08-025r0 and we can compare and contrast at the next CAP.
Again, thanks for getting this started, I
think it's a worthwhile endeavor and I'll be glad to put some cycles towards
defining this sort of functionality for SPC-4.
Regards,
Roger
From: owner-t10@t10.org [mailto:owner-t10@t10.org] On Behalf Of Kevin D Butt
Sent: Monday, December 10, 2007 4:18 PM
To: t10@t10.org
Subject: Persistent Reservation Proposal - Group Reservations
I have posted two documents related to an additional Persistent Reservation
Type. The first document is a presentation on where persistent reservations
are today and where they fall short in the scenarios covered by the proposal.
It also covers the intent of the proposal and what will be proposed. The
second is the actual proposal.
Your PDF file will be posted at:
http://www.t10.org/ftp/t10/document.08/08-024r0.pdf
http://www.t10.org/ftp/t10/document.08/08-025r0.pdf
Normally, the posting/archiving process takes about 30 minutes.
Kevin D. Butt