Comments in line
Ray,
Thanks for joining in. Let me summarize what I think has been said
by all parties who have joined in these discussions.
1) (From Ray) Some applications will have trouble
providing a list of Transport IDs.
2) (From Fred) There is a desire to allow members of a cluster that were
not active at the creation of the reservation to join in.
3) (From Kevin) Who can join or participate in a group
reservation is required to be controlled such that only those initiators that
are part of the cluster (i.e. group) can join.
4) PREEMPT must be allowed (i.e., what we do cannot lock
out PREEMPT or make it not work correctly)
5) Both Registrants Only and All Registrants types of functionality need
to be provided for.
6) There should be an option for all target ports.
If we can't require a list of Transport IDs, then it seems that the suggested
"shared secret" (not cryptographic, just some unique value that is protected
through obfuscation) is probably the best way to do this. This would be
something akin to requiring the same reservation key value. However, using
the reservation key does not seem to be plausible because of how it is being
used today. What we need would be some different value that would not be
reportable via a Persistent Reserve In. This would keep third-party
initiators from joining the reservation. If this new Group Reservation
Identifier (GRID) were added, that would take care of #1, #2, and #3
above.
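The GRID idea can be pictured with a small model. This is only a sketch under stated assumptions: the names `GroupReservation`, `join`, and `report` are hypothetical, nothing here is defined by SPC-4, and the constant-time comparison is a convenience rather than a security claim (the GRID is described above as non-cryptographic).

```python
import hmac
from dataclasses import dataclass, field


@dataclass
class GroupReservation:
    """Toy model: a reservation that initiators join by presenting a GRID."""
    grid: bytes                      # shared value; never included in reports
    participants: set = field(default_factory=set)

    def join(self, nexus: str, offered_grid: bytes) -> bool:
        """Admit the I_T nexus only if it presents the matching GRID."""
        if not hmac.compare_digest(self.grid, offered_grid):
            return False             # a real target would return some conflict status
        self.participants.add(nexus)
        return True

    def report(self) -> dict:
        """What a PR In style report might expose: participants, not the GRID."""
        return {"participants": sorted(self.participants)}


res = GroupReservation(grid=b"cluster-42")
created = res.join("initiator-A", b"cluster-42")      # creator of the reservation
joined_late = res.join("initiator-B", b"cluster-42")  # was down at creation time
outsider = res.join("initiator-X", b"wrong-guess")    # does not know the GRID
```

In this model no Transport ID list is needed (#1), a member can join after creation (#2), and only holders of the GRID get in (#3).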
For #5 above, it seems that we could provide that by adding a bit in the reserve
command that indicates "All Participants are reservation holders". If set to one,
the reservation acts like an All Registrants type. If set to zero, it acts like a
Registrants Only type. This includes the behavior on unregistering.
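A rough sketch of what that bit would select, assuming the existing rules carry over: for Registrants Only the I_T nexus that created the reservation is the sole holder, and for All Registrants every participant is a holder. The function names and the `all_participants_are_holders` parameter are hypothetical.

```python
def reservation_holders(creator, participants, all_participants_are_holders):
    """Return the I_T nexuses treated as reservation holders."""
    if all_participants_are_holders:
        return set(participants)   # bit = 1: behaves like All Registrants
    return {creator}               # bit = 0: behaves like Registrants Only


def released_by_unregister(creator, participants, leaving, all_participants_are_holders):
    """Return True if the reservation goes away when `leaving` unregisters."""
    remaining = set(participants) - {leaving}
    if all_participants_are_holders:
        return not remaining       # released only when the last participant leaves
    return leaving == creator      # released when the single holder leaves


group = {"A", "B", "C"}
holders_ar = reservation_holders("A", group, True)
holders_ro = reservation_holders("A", group, False)
```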
For #6 we use the ALL_TG_PT bit the same as other reservations
today.
Issues still to be resolved:
a) Some systems won't want to require all initiators to send a Persistent Reserve Out command. Possible
solution is to allow reserving multiple initiators if a Transport ID list is
sent. Additional initiators could join later if they have the GRID.
However this would make it more complicated and if it is not needed I
would rather not add this option.
In a practical sense, I cannot see how this could be avoided (all initiators sending
PRO) -- since PR requires trust and good behavior, each initiator must
make no assumption about what the protection level is currently set at --
so it must verify that the settings are correct and as expected. If the
settings aren't as expected, it must bail out, or go into error recovery to
attempt to avoid messing up some other application (a fist fight on the
SAN for device control does nobody any good). I see no reason to
provide for this (I do know that the current command allows a registration
for multiple ports, but I cannot imagine using it in the real world).
b) If the first I_T nexus sets the "All Participants are reservation holders" to zero when it
creates the reservation and then a subsequent I_T nexus sets it to one, what is
the behavior? Change the type? Reject? Also, what is reported
in the Report Full Status if All Participants are reservation holders is set to
zero?
I don't think this is a problem -- once a reservation is established it cannot be
changed without a preempt, clear, or removal of the old one. If this
isn't true, I would want the attempt to change it to get rejected.
I would expect a change to require a preempt type operation.
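The "reject unless preempted" answer to (b) could look like the following sketch. The `Target` class, the `ReservationConflict` exception, and the `all_holders` flag are hypothetical names for illustration only; real status codes and preempt semantics are not modeled.

```python
class ReservationConflict(Exception):
    """Raised when a request conflicts with the established reservation."""


class Target:
    def __init__(self):
        self.reservation = None    # None, or {"creator": ..., "all_holders": bool}

    def reserve(self, nexus, all_holders):
        if self.reservation is None:
            self.reservation = {"creator": nexus, "all_holders": all_holders}
        elif self.reservation["all_holders"] != all_holders:
            # Different bit value on an existing reservation: reject, do not change.
            raise ReservationConflict("reservation type differs; preempt required")
        # Matching bit value: nothing to change.

    def preempt(self, nexus, all_holders):
        # Preempt replaces the old reservation, so the type may change here.
        self.reservation = {"creator": nexus, "all_holders": all_holders}


target = Target()
target.reserve("A", all_holders=False)
try:
    target.reserve("B", all_holders=True)
    rejected = False
except ReservationConflict:
    rejected = True
target.preempt("B", all_holders=True)
```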
c) If we go to this method of using the GRID to determine who can join,
then the Reservation Key may or may not be different.
c-1) if the Reservation Key is different, then a PREEMPT
of a Reservation Key will do what?
c-2) if the Reservation Key is the same, then a PREEMPT will act the same as an AR or
RO reservation today.
c-3) Do we require the Reservation Key to be the same?
Preempt is of a reservation, not a key. The keys currently are not
compared, and have no valid use (by the device) except that each initiator
has registered one, and only one at a time. We don't want to change
this behavior -- a key is a random number assigned for some external purpose
that the device records and reports. (My application requires this to
operate properly.)
d) Does this approach still have the issues that Roger was concerned about (e.g.,
the corner cases)?
I hope the use of a GRID would not
introduce any new issues to SPR -- it only prevents a registrant from
becoming a reservation participant without some external knowledge. It
doesn't prevent a registrant from preempt, clear, or any other error
recovery operations (and MUST not).
Thanks,
Kevin D. Butt
SCSI & Fibre Channel Architect, Tape Firmware
MS 6TYA, 9000 S. Rita Rd., Tucson, AZ 85744
Tel: 520-799-2869 / 520-799-5280
Fax: 520-799-2723 (T/L:321)
Email address: kdbutt@us.ibm.com
http://www-03.ibm.com/servers/storage/
"Raymond Gilson" <raymond_gilson@symantec.com>
12/21/2007 12:45 PM
To: "Roger Cummings" <roger_cummings@symantec.com>, "Knight, Frederick" <Frederick.Knight@netapp.com>, Kevin D Butt/Tucson/IBM@IBMUS
Cc: <t10@t10.org>, Christine R Knibloe/Tucson/IBM@IBMUS
Subject: RE: Persistent Reservation Proposal - Group Reservations
Several years ago I was trying to figure out a way to
introduce a "JOIN" function to the SPR. The initiator would register, but
that would not grant it access to a reservation of the "joined only" type.
To join it, the initiator would have to send a join SPR command -- we
could add a "shared secret" field to the join, so that only those initiators
that knew the secret could join.
I think we will have a great deal of trouble with a
"white list" approach -- as an application, I have no idea what my port ID is
(or anything else for that matter).
Would something like this make sense?
Thanks,
Ray Gilson
From: owner-t10@t10.org [mailto:owner-t10@t10.org] On Behalf Of Roger Cummings
Sent: Tuesday, December 18, 2007 10:24 AM
To: Knight, Frederick; Kevin D Butt
Cc: t10@t10.org; Christine R Knibloe
Subject: RE: Persistent Reservation Proposal - Group Reservations
Fred,
The way you clean up
from a disaster is to Preempt, that's what it's there for. Most of the
applications that I know that will actually issue a Preempt make it a very
special function that doesn't happen in the normal flow, and one app at least
DOES require manual intervention of an operator before kicking off the
preempt.
Yes, today, a Preempt has to be issued through a registered I_T nexus,
but a registration with the SPEC_I_PT bit doesn't have to come from an already
registered initiator - see Table 33 in SPC-4, and I don't believe Kevin changed
that in his proposal.
For the future, however we define a "group" for the purposes
of new reservation types, we will have to make sure that an Initiator outside of
the "group" can issue a Preempt to handle the disaster recovery case.
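The disaster recovery requirement can be sketched as a preempt check that looks only at registration, never at group membership. This is a simplified model: the `Target` class and its fields are hypothetical, and what a real preempt removes (it is driven by reservation keys) is reduced here to "the old group members".

```python
class NotRegistered(Exception):
    """Raised when an unregistered I_T nexus attempts a preempt."""


class Target:
    def __init__(self):
        self.registrants = set()   # every registered I_T nexus
        self.group = set()         # registrants that are reservation participants

    def register(self, nexus):
        self.registrants.add(nexus)

    def preempt(self, nexus):
        # Only registration is checked; group membership is deliberately not.
        if nexus not in self.registrants:
            raise NotRegistered(nexus)
        removed = self.group - {nexus}
        self.registrants -= removed
        self.group = {nexus}       # the preemptor takes over the reservation
        return removed


target = Target()
for dead in ("A", "B"):            # the original group, now lost in a disaster
    target.register(dead)
    target.group.add(dead)
target.register("recovery-node")   # outside the group, but registered
removed = target.preempt("recovery-node")
```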
Regards,
Roger
From: Knight, Frederick [mailto:Frederick.Knight@netapp.com]
Sent: Tuesday, December 18, 2007 10:56 AM
To: Roger Cummings; Kevin D Butt
Cc: t10@t10.org; Christine R Knibloe
Subject: RE: Persistent Reservation Proposal - Group Reservations
My question has had to do with differentiating the disaster clean up
case from the non-cooperating host case.
How do I clean up from a disaster? If all my "reserved" initiators
melt down, and there aren't any of them left anymore (because of
a site disaster, or whatever), how does some other node come along
and clean up so it can gain access?
Would it require manual intervention? Or, is there a way in the protocol
that I can register and preempt the group reservation (does the use
of the SPEC_I_PT bit allow this as you have suggested Roger).
I thought the SPEC_I_PT had to come from an already registered
initiator (which in a disaster, none exist anymore).
Fred Knight
From: Roger Cummings [mailto:roger_cummings@symantec.com]
Sent: Tuesday, December 18, 2007 10:03 AM
To: Kevin D Butt; Knight, Frederick
Cc: t10@t10.org; Christine R Knibloe
Subject: RE: Persistent Reservation Proposal - Group Reservations
Kevin,
I'm sorry, I don't think it's as cut and dried as
you make out. This gets into some of the corner cases that I listed in my first
response.
The point to be made in response to Fred's case is that a third-party can
create registrations for a downed initiator (via the SPEC_I_PT bit), so that
when it comes up again it will be able to participate in the reservation without
having to register itself.
Also, you say that "We have made provisions for
adding members once the reservation exists, but only one of the reservation
holders can add another entity." Two things in response to that:
1) I didn't see any specific provision for adding members in your proposal, so I
presume you'd just issue another RESERVE with the same type and the whole list
of transport IDs to be included again, and thus the Target would have a whole
lot of work to do again to set up another reservation.
2) Is that really what you want, that a member of the existing group can reissue
the RESERVE with a whole bunch of different TransportIDs, perhaps excluding some
that were previously there?
Regards,
Roger
From: owner-t10@t10.org [mailto:owner-t10@t10.org] On Behalf Of Kevin D Butt
Sent: Monday, December 17, 2007 3:54 PM
To: Knight, Frederick
Cc: t10@t10.org; Christine R Knibloe
Subject: RE: Persistent Reservation Proposal - Group Reservations
Fred,
This is being proposed for SPC.
There are multiple types of
reservations. In an environment where one node of a cluster must join
later, one of the other types can be used. Either that or have an existing
node in your cluster add the new node. The whole intent of this Group
reservation is to lock out everybody that is not explicitly specified during the
reserve. We have made provisions for adding members once the reservation
exists, but only one of the reservation holders can add another entity.
The new entity cannot add itself. This is the whole point of
reservations (i.e., lock out others from doing stuff while I think I have
exclusive rights).
To put it in other words, to allow somebody to join the reservation
of their own accord without permission is EXACTLY what I am trying to protect
against.
Thanks,
Kevin D. Butt
"Knight, Frederick" <Frederick.Knight@netapp.com>
12/17/2007 01:38 PM
To: Kevin D Butt/Tucson/IBM@IBMUS
Subject: RE: Persistent Reservation Proposal - Group Reservations
Sorry, you can't require everyone to register before the reserve.
That's like saying my whole cluster can't boot because 1 node is down. You need
to have a way for a "down" initiator to join the fun after the fact.
I helped write a host cluster product that used a shared tape (failover model). The
backup application would write to the tape. If a system failure ever happened, the
backup application would fail over to a different host. It would skip backwards on
the tape for a few records, recognize where it left off, and then resume operation.
BUT, for some protection, we used reservations to make sure only 1 initiator at a
time could access the tape. The interesting point, however, is that we were in the
process of upgrading from old SCSI-2 RESERVE to using PR, because we also
have multiple HBAs in the host, and we wanted to be able to use more than 1 of
those HBAs (so we needed multiple reservations - aka PR). Having this idea
(group reservations) would have been a real nice addition.
As for the RO/AR differences. It seemed to be timing. Registrants Only was fairly
early on (as I remember), and so implemented by several O/S vendors. Later on,
some issues were found (which got complicated spec-ees added to address), but also,
the All Registrants was added (which didn't have those issues). But, since there were
implementations, it couldn't be removed like the other old PR types that no one ever
used. Anyway, I agree, they offer basically the same capabilities, but RO is already
out there, and AR is probably what new implementers are using (it's easier to understand
and implement from the host side). Most of the differences are already documented,
so there wouldn't be that much extra for you to write to have both types (which I think
would be better than a bit somewhere - do it the same way all the others are done). But,
you could also just do the AR version, and let someone else add the RO version if they
want it.
Are you proposing this for tape only? or SPC in general? I assume SPC in general.
Fred Knight
From: Kevin D Butt [mailto:kdbutt@us.ibm.com]
Sent: Monday, December 17, 2007 9:51 AM
To: Roger Cummings
Cc: t10@t10.org
Subject: RE: Persistent Reservation Proposal - Group Reservations
Roger,
Thank you for your feedback. I am certainly
willing to entertain other methods for accomplishing the end goal in an
easier fashion. I am not sure I understand how your proposed method makes
it more backward compatible. In my proposal PRin would show a different
type of reservation and hence the application clients would not try to join the
reservation because they don't know about the type. In your proposal,
application clients would not be allowed to register. This is a deviation
from what they can always do today - unless there is a resource issue.
This seems more disruptive to me. I would assume that there would be
a new additional sense code added for UNABLE TO REGISTER BECAUSE A GROUP
RESERVATION IS IN PLACE (or analogous). This would be a new thing for
failure to register and there would be pain at the register point. Perhaps
that is better than at the reserve point - but I would think that it would be
better handled as a reservation conflict since that is what it is instead of
something the application client does not understand.
As for "all registrants" type vs.
"registrants only" I didn't see where the difference would be interesting, but I
am not opposed to providing a way to switch between which of these two types is
done. Whether it is additional types or some bit during registration
etc.
As for some of the corner cases mentioned below, if each I_T nexus that is supposed to be
part of the group reservation is required to be registered before the
reservation is made, and if the reservation is released when the last group
reservation participant is unregistered, then I think we don't have an
issue.
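The two rules in that paragraph (every group member registered before the reserve, and release when the last participant unregisters) can be sketched as follows. The `Target` class and the `reserve_group` method are hypothetical names, and status reporting is reduced to a boolean.

```python
class Target:
    def __init__(self):
        self.registered = set()
        self.group = set()         # empty set means no group reservation

    def register(self, nexus):
        self.registered.add(nexus)

    def reserve_group(self, members):
        """Create the reservation only if every listed member is registered."""
        missing = set(members) - self.registered
        if missing:
            return False
        self.group = set(members)
        return True

    def unregister(self, nexus):
        self.registered.discard(nexus)
        self.group.discard(nexus)  # reservation is released once this is empty

    @property
    def reserved(self):
        return bool(self.group)


t = Target()
t.register("A")
first_try = t.reserve_group(["A", "B"])   # "B" has not registered yet
t.register("B")
second_try = t.reserve_group(["A", "B"])
t.unregister("A")
still_reserved = t.reserved               # "B" is still a participant
t.unregister("B")
after_last = t.reserved
```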
I would prefer that we work together to shape a mutually beneficial proposal as opposed
to having "competing" proposals. I am willing to modify my proposal where it
can be made easier and such. I am not sold that my proposed method is the
only way or even the best - it's just the way I thought of doing it. I
admit that I have always been very confused about the usefulness of the RO and AR
types. They make absolutely no sense in the tape world.
Thanks,
Kevin D. Butt
"Roger Cummings"
<roger_cummings@symantec.com> Sent by:
owner-t10@t10.org
12/14/2007 12:10 PM
To: Kevin D Butt/Tucson/IBM@IBMUS
Cc: <t10@t10.org>
Subject: RE: Persistent Reservation Proposal - Group Reservations
Kevin,
First of all, let me say that I
completely support what you're trying to do here. I think that providing a
method in persistent reservations (PRs) to support shared access between ONLY a
specifically-designated set of systems is a worthy goal, and something we should
do in SPC-4.
Adding a set of Transport IDs to Reserve as per your documents 08-024 & 08-025 is
certainly possible, but it's a massive change to the way that PRs work today,
and it throws up a bunch of nasty corner cases and backwards compatibility
issues.
The massive change comes from the fact that now the Target will have to remember which
registrations are in the Reservation, and which are not. It will probably have
to preserve all of the transport information for the life of the
reservation.
The corner cases are things like, what happens if there's no longer a registration
that corresponds to the transport ID in the Reserve? Does the Reserve succeed?
What happens if a registration comes in later, after the reservation has been
established - does that device get access?
Backwards compatibility issues may arise
like this: An existing device registers, and finds it has no access, so it does
a PR In and finds out that a reservation is in place, retries its access and
still it has no access. What does it do next, preempt the reservation because it
assumes the Target is broken?
Reserve also has to be an "atomic" command, and I've always thought
that was why its functionality is as compact as it is today. Most of the
complex operations related to addresses and keys are done at registration time,
and those operations don't have to be atomic.
One more thing: you chose for your new "group" reservations to follow the
"all registrants" approach in terms of the
definition of the reservation holder. While that's fine by me (obviously), I
suspect there are also situations where group reservations that follow the
"registrants only" approach might be useful.
The bottom line from my point of view
is this: Your proposal is feasible and we can probably make it work. But I
wonder if there's an easier way to achieve the same goal that is more compatible
with existing practice and requires less of a change in functionality on the
Target side.
What if we didn't add any new reservation types, but instead added some new
functionality to the registration process? What I'm thinking of is a new Register
feature that causes the Target to kill all existing registrations, create the
registrations identified in the transport IDs in the Register command, and not
accept any future registrations. That way, we don't need any changes to Reserve,
and an Initiator with existing functionality would just not be able to register
and therefore would not be confused.
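A minimal sketch of that Register feature, under the assumption that "not accept any future registrations" means every later register attempt fails. The names `Target`, `register_exclusive`, and `RegistrationRefused` are hypothetical and do not come from any proposal document.

```python
class RegistrationRefused(Exception):
    """Raised when the target is no longer accepting registrations."""


class Target:
    def __init__(self):
        self.registrations = set()
        self.closed = False

    def register(self, nexus):
        if self.closed:
            raise RegistrationRefused(nexus)
        self.registrations.add(nexus)

    def register_exclusive(self, transport_ids):
        """Replace all registrations with the listed ones and refuse new ones."""
        self.registrations = set(transport_ids)
        self.closed = True


t = Target()
t.register("old-host")
t.register_exclusive(["A", "B"])   # kills "old-host", creates "A" and "B"
try:
    t.register("old-host")
    refused = False
except RegistrationRefused:
    refused = True
```

Reserve itself is untouched in this model; the group is defined entirely by who holds a registration.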
Does that make sense to you? Is there a chance this is an
easier approach? If so, I'll write up a detailed proposal that's the equivalent
of 08-025r0 and we can compare and contrast at the next CAP.
Again, thanks for getting this started, I
think it's a worthwhile endeavor and I'll be glad to put some cycles towards
defining this sort of functionality for SPC-4.
Regards,
Roger
From: owner-t10@t10.org [mailto:owner-t10@t10.org] On Behalf Of Kevin D Butt
Sent: Monday, December 10, 2007 4:18 PM
To: t10@t10.org
Subject: Persistent Reservation Proposal - Group Reservations
I have posted two documents related to an additional
Persistent Reservation Type. The first document is a presentation on where
persistent reservations are today and where they fall short in the scenarios
covered by the proposal. It also covers the intent of the proposal and
what will be proposed. The second is the actual proposal.
Your PDF file will be posted at:
http://www.t10.org/ftp/t10/document.08/08-024r0.pdf
http://www.t10.org/ftp/t10/document.08/08-025r0.pdf
Normally, the posting/archiving process takes about 30 minutes.
Kevin D. Butt