Disks and Reservations

Knight, Frederick Frederick.Knight at netapp.com
Fri Jun 28 18:38:30 PDT 2013


* From the T10 Reflector (t10 at t10.org), posted by:
* "Knight, Frederick" <Frederick.Knight at netapp.com>
*
So here is one example of how it might work (in an oversimplified
description):
4 nodes with 4 initiators each all do a PR registration (and 1 or more does a
reserve to activate it all).  Node A registers all initiators with the key
"1"; node B registers all initiators with the key "2"; node C registers all
initiators with the key "3"; and node X registers all initiators with the key
"4" (there are now 16 registrations).  Then they all go at it, all doing
READs and WRITEs.  All four nodes are also talking to each other via a
network of some kind.
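The registration step above can be pictured as a table kept by the disk, with one entry per I_T nexus and one reservation key per node. The following is a minimal sketch under stated assumptions: the initiator names ("A1" through "X4") and the function register_cluster are hypothetical, and small integers stand in for the 8-byte reservation keys. The "reserve to activate it all" is assumed to be a registrants-only reservation type, so every registered I_T nexus may do I/O; that step is not modeled here.

```python
def register_cluster(nodes, initiators_per_node):
    """Hypothetical model of PR registration: I_T nexus name -> reservation key.

    Every initiator of a node registers with that node's key, so the
    first node gets key 1, the second key 2, and so on.
    """
    registrations = {}
    for key, node in enumerate(nodes, start=1):
        for i in range(1, initiators_per_node + 1):
            registrations[f"{node}{i}"] = key
    return registrations


regs = register_cluster(["A", "B", "C", "X"], 4)
print(len(regs))    # 16
print(regs["X3"])   # 4
```

Keying by node rather than by initiator is what later lets a single PREEMPT name all of a node's I_T nexuses at once.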
Now, something happens to one of the nodes (say node "X").  Nobody (none of
A, B, or C) knows the real state of node X because they can't talk to it
anymore (but nodes A, B, and C are all still communicating with each other).
Node X might still be trying to issue IO, but the other three nodes want to
get real work done (which will have predictable results).  Since they don't
know what is going on with node X, one (or more) of the three surviving nodes
issues a PREEMPT to remove the registrations for node X (PREEMPT key "4").
That aborts all the IOs issued from all of node X's initiators, and
prevents any new IOs (any new ones will now get RESERVATION CONFLICT).  Nodes
A, B, and C are now free to proceed with the work knowing that node X can't
interfere with what they are trying to do.  Nodes A, B, and C are free to
resume the work that node X was attempting to accomplish with the knowledge
that node X will not come along and stomp on it with old, out-of-date
data.
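The effect of PREEMPT key "4" on new I/O can be sketched with a toy logical unit. This is a simplified model, not a device server: the class LogicalUnit and its methods are hypothetical, the reservation is assumed to be registrants-only, and the rules SPC applies when the preempted key also holds the reservation are left out. Aborting commands that are already in flight is likewise not modeled here.

```python
GOOD = "GOOD"
RESERVATION_CONFLICT = "RESERVATION CONFLICT"


class LogicalUnit:
    """Hypothetical LU under a registrants-only persistent reservation."""

    def __init__(self, registrations):
        self.registrations = dict(registrations)  # I_T nexus -> reservation key

    def io(self, nexus):
        # Any registered I_T nexus may do I/O; anything else conflicts.
        return GOOD if nexus in self.registrations else RESERVATION_CONFLICT

    def preempt(self, issuer, victim_key):
        # Only a registered I_T nexus may preempt.
        if issuer not in self.registrations:
            return RESERVATION_CONFLICT
        # One PREEMPT removes every registration holding the victim key,
        # i.e. all of node X's I_T nexuses at once.
        self.registrations = {
            n: k for n, k in self.registrations.items() if k != victim_key
        }
        return GOOD


regs = {f"{node}{i}": key for key, node in enumerate("ABCX", start=1) for i in range(1, 5)}
lu = LogicalUnit(regs)
print(lu.preempt("B2", victim_key=4))  # GOOD
print(lu.io("X3"))                     # RESERVATION CONFLICT
print(lu.io("C1"))                     # GOOD
print(len(lu.registrations))           # 12
```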
At this point, node X starts getting RESERVATION CONFLICT and so it tries to
use the network to validate the operational state of the cluster.  Since it
can't communicate with the other nodes (the majority), it stops (it may even
commit suicide).  Node X is out of quorum and so must stop operation.  Only
nodes A, B, and C are in quorum and therefore proceed with their operation.
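The quorum reasoning in the last paragraph can be sketched as a strict-majority check that a node runs when it starts seeing RESERVATION CONFLICT. The function names and the "continue"/"stop" policy are hypothetical; as the thread notes, real quorum management varies by cluster implementation.

```python
def has_quorum(me, reachable_peers, cluster):
    """Strict majority: this node plus the peers it can still reach
    must be more than half of the configured cluster."""
    members = {me} | (set(reachable_peers) & set(cluster))
    return len(members) > len(cluster) // 2


def on_reservation_conflict(me, reachable_peers, cluster):
    # Hypothetical policy: a RESERVATION CONFLICT suggests this node was
    # preempted, so consult the network before doing anything else.
    return "continue" if has_quorum(me, reachable_peers, cluster) else "stop"


cluster = {"A", "B", "C", "X"}
print(on_reservation_conflict("X", set(), cluster))  # stop
print(has_quorum("A", {"B", "C"}, cluster))          # True
```

With four nodes a 2/2 split leaves neither half with a strict majority, which is one reason clusters often use an additional tie-breaker.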
	Fred
-----Original Message-----
From: owner-t10 at t10.org [mailto:owner-t10 at t10.org] On Behalf Of Ralph Weber
Sent: Wednesday, June 26, 2013 8:22 AM
Cc: T10 Reflector
Subject: Re: Disks and Reservations
* From the T10 Reflector (t10 at t10.org), posted by:
* Ralph Weber <roweber at ieee.org>
*
Ah! Some beef to chew on. ...
David Black wrote:
> There are cluster failure cases (e.g., failure of a quorum leader) in 
> which preemption may be used (e.g., by the new leader). Quorum 
> maintenance/management details vary by cluster implementation.
A key point is that the 'new leader' must be sure that everything about the
old 'quorum leader' is expunged from the disk. A careful look at the PREEMPT
rules will reveal mountains about what this means.
  * Not only is it critical that the old quorum leader not be allowed to
    access the disk, but ...
  * Everything the disk happens to be doing for the old quorum leader
    must be sent to the ashcan.
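Ralph's two bullets describe two separate effects. SPC defines both a PREEMPT and a PREEMPT AND ABORT service action; the latter additionally aborts commands from the preempted I_T nexuses. The toy model below only illustrates why the second effect matters. The class Disk is hypothetical, integer keys stand for nodes, and it assumes that an accepted command stays pending until it runs or is aborted.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """Hypothetical disk: accepted commands stay pending until run or aborted."""
    registered: set = field(default_factory=set)  # registered reservation keys
    pending: list = field(default_factory=list)   # (key, command) pairs

    def submit(self, key, command):
        if key not in self.registered:
            return "RESERVATION CONFLICT"
        self.pending.append((key, command))
        return "ACCEPTED"

    def preempt(self, issuer, victim, abort):
        if issuer not in self.registered:
            return "RESERVATION CONFLICT"
        self.registered.discard(victim)  # bullet 1: no new access for the victim
        if abort:                        # bullet 2: its leftovers go to the ashcan
            self.pending = [p for p in self.pending if p[0] != victim]
        return "GOOD"


def leftovers(abort):
    """Commands from the old leader (key 4) still pending after the new leader preempts."""
    disk = Disk(registered={1, 2, 3, 4})
    disk.submit(4, "WRITE lba=0 from old leader")
    disk.preempt(issuer=1, victim=4, abort=abort)
    return [cmd for key, cmd in disk.pending if key == 4]


print(leftovers(abort=False))  # ['WRITE lba=0 from old leader']
print(leftovers(abort=True))   # []
```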
David's observation that persistent reservations were built to support quorum
clusters is spot on.
All the best,
.Ralph
On 6/25/2013 11:45 PM, Black, David wrote:
>
> Kevin,
>
> > I am struggling to understand Reservations in the disk world. Me
> > being a tape person, my mindset is wrapped around protecting the
> > logical position of the tape. This does not seem to be a concern
> > for disks.
>
> That's correct.
>
> > What are the reasons reservations are used in the disk world? I
> > assume it is to protect a Logical Unit for one application's sole
> > use (at least writing). Are there other conceptual reasons?
>
> Actually, sole usage is atypical for persistent reservations in the 
> disk world. A significant user of persistent reservations in the disk 
> world is high availability server clustering software where shared 
> reservations are used for cluster quorum maintenance/management on 
> shared storage.
>
> > What are the reasons that a PREEMPT would be used? In the tape
> > world, PREEMPT is typically only used to perform a fail-over from a
> > lost I_T nexus to an alternate path. Is this the same in the disk
> > world, or are there other reasons?
>
> No, there are other reasons. In the disk world, concurrent use of
> multiple paths is common, as typical use of the SBC command set has no
> dependence on completion ordering of concurrent commands: if an
> initiator wants disk command A to complete before disk command B, the
> initiator generally has to wait for the status from command A before
> issuing command B.
>
> There are cluster failure cases (e.g., failure of a quorum leader) in 
> which preemption may be used (e.g., by the new leader). Quorum 
> maintenance/management details vary by cluster implementation.
>
> > Also, sometimes in the tape world, Unit Attentions are ignored. Is
> > this the same in the disk world, specifically related to
> > reservations?
>
> Ignoring unit attentions is generally considered poor form in the disk 
> world, but it's not unheard-of behavior.
>
> Thanks,
> --David
>
> From: owner-t10 at t10.org [mailto:owner-t10 at t10.org] On Behalf Of
> Kevin D Butt
> Sent: Tuesday, June 25, 2013 4:30 PM
> To: T10 Reflector
> Subject: Disks and Reservations
>
> I am struggling to understand Reservations in the disk world. Me being
> a tape person, my mindset is wrapped around protecting the logical
> position of the tape. This does not seem to be a concern for disks.
> What are the reasons reservations are used in the disk world? I assume
> it is to protect a Logical Unit for one application's sole use (at
> least writing). Are there other conceptual reasons?
> What are the reasons that a PREEMPT would be used? In the tape world, 
> PREEMPT is typically only used to perform a fail-over from a lost I_T 
> nexus to an alternate path. Is this the same in the disk world, or are 
> there other reasons?
> Also, sometimes in the tape world, Unit Attentions are ignored. Is 
> this the same in the disk world, specifically related to reservations?
>
> Thanks,
>
> Kevin D. Butt
> SCSI & Fibre Channel Architect, Tape Firmware Data Protection & 
> Retention MS 6TYA, 9000 S. Rita Rd., Tucson, AZ 85744
> Tel: 520-799-5280
> Fax: 520-799-2723 (T/L:321)
> Email address: kdbutt at us.ibm.com
> http://www-03.ibm.com/servers/storage/
>
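David's point about shared reservations for clustering rests on the reservation types SPC offers. The sketch below is a simplified access check covering four types, showing why a registrants-only type lets every registered cluster node write while outsiders are fenced. The TYPE codes are as recalled from SPC and should be verified against the standard; the function may_access and the nexus names are hypothetical, and many SPC details (such as which commands are exempt) are omitted.

```python
from enum import IntEnum


class PRType(IntEnum):
    # TYPE field codes as recalled from SPC; verify against the standard.
    WRITE_EXCLUSIVE = 0x1
    EXCLUSIVE_ACCESS = 0x3
    WRITE_EXCLUSIVE_REGISTRANTS_ONLY = 0x5
    EXCLUSIVE_ACCESS_REGISTRANTS_ONLY = 0x6


REGISTRANTS_ONLY = {PRType.WRITE_EXCLUSIVE_REGISTRANTS_ONLY,
                    PRType.EXCLUSIVE_ACCESS_REGISTRANTS_ONLY}
WRITE_EXCLUSIVE_FAMILY = {PRType.WRITE_EXCLUSIVE,
                          PRType.WRITE_EXCLUSIVE_REGISTRANTS_ONLY}


def may_access(pr_type, op, nexus, holder, registrants):
    """Simplified check for op in {'read', 'write'}; True means no conflict."""
    if pr_type in REGISTRANTS_ONLY:
        privileged = nexus in registrants  # every registrant shares access
    else:
        privileged = nexus == holder       # only the reservation holder
    if privileged:
        return True
    # Write Exclusive types still let anyone read.
    return op == "read" and pr_type in WRITE_EXCLUSIVE_FAMILY


registrants = {"A1", "B1", "C1"}
print(may_access(PRType.WRITE_EXCLUSIVE_REGISTRANTS_ONLY, "write", "B1", "A1", registrants))  # True
print(may_access(PRType.WRITE_EXCLUSIVE, "write", "B1", "A1", registrants))                   # False
```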
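David's answer on command ordering (an initiator that needs command A to complete before command B must wait for A's status before issuing B) can be illustrated with asyncio. This is only an analogy: disk_command is a hypothetical stand-in for a SCSI command, and its latency parameter models differences between paths or queues.

```python
import asyncio


async def disk_command(name, latency, log):
    """Hypothetical stand-in for one SCSI command sent down some path.

    Completion is recorded in `log` when the command's status comes back.
    """
    await asyncio.sleep(latency)
    log.append(name)
    return "GOOD"


async def issue_concurrently(log):
    # A and B in flight at the same time (e.g. on two paths):
    # nothing guarantees that A completes first.
    await asyncio.gather(disk_command("A", 0.02, log), disk_command("B", 0.01, log))


async def issue_in_order(log):
    # To make A complete before B, wait for A's status before issuing B.
    if await disk_command("A", 0.02, log) == "GOOD":
        await disk_command("B", 0.01, log)


concurrent_log, ordered_log = [], []
asyncio.run(issue_concurrently(concurrent_log))
asyncio.run(issue_in_order(ordered_log))
print(concurrent_log)  # ['B', 'A']
print(ordered_log)     # ['A', 'B']
```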
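On unit attentions: as we understand SPC, when registrations are preempted the device server establishes a unit attention for the affected I_T nexuses, with the additional sense code REGISTRATIONS PREEMPTED. An initiator that ignores it learns of the loss only through later RESERVATION CONFLICT status. The toy target below is hypothetical; in particular, reporting a pending unit attention before any reservation check is an assumption of this sketch.

```python
from collections import defaultdict, deque


class Target:
    """Hypothetical target that queues unit attentions per I_T nexus."""

    def __init__(self, registrations):
        self.registrations = dict(registrations)    # I_T nexus -> key
        self.unit_attentions = defaultdict(deque)  # I_T nexus -> pending UAs

    def preempt(self, victim_key):
        victims = [n for n, k in self.registrations.items() if k == victim_key]
        for n in victims:
            del self.registrations[n]
            self.unit_attentions[n].append("REGISTRATIONS PREEMPTED")

    def command(self, nexus):
        # Assumption of this sketch: a pending UA is reported (and cleared) first.
        if self.unit_attentions[nexus]:
            return "CHECK CONDITION: UNIT ATTENTION, " + self.unit_attentions[nexus].popleft()
        if nexus not in self.registrations:
            return "RESERVATION CONFLICT"
        return "GOOD"


t = Target({"A1": 1, "X1": 4})
t.preempt(victim_key=4)
print(t.command("X1"))  # CHECK CONDITION: UNIT ATTENTION, REGISTRATIONS PREEMPTED
print(t.command("X1"))  # RESERVATION CONFLICT
print(t.command("A1"))  # GOOD
```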
*
* For T10 Reflector information, send a message with
* 'info t10' (no quotes) in the message body to majordomo at t10.org
*