SBC question about block coherency
Joseph Glider
gliderj at almaden.ibm.com
Tue Nov 15 07:04:32 PST 2011
Formatted message: http://www.t10.org/cgi-bin/ac.pl?t=r&f=r1111152_f.htm
You need to remember that SCSI commands are often processed by
sophisticated storage controllers that are often active-active multi-path
systems where the same volume is exported as multiple LUs (and might be
stitched together by multi-path drivers), and where the same volume can be
accessed by applications running on multiple hosts (e.g. Oracle). A
well-behaved application will not submit I/O in such a way as to create
race conditions, but the SCSI specs do not provide very many guarantees of
atomicity. Fred is perfectly correct that, aside from special commands that
guarantee atomicity, LBA-by-LBA atomicity is the only guarantee that host
applications can in general rely on.
Jody Glider
STSM, Storage Systems and Servers Research, IBM
Tieline: 457-1853
External: 408-927-1853
From: Hugh Curley <HCurley at indra.com>
To: "Knight, Frederick" <Frederick.Knight at netapp.com>
Cc: Kevin D Butt/Tucson/IBM at IBMUS, T10 Reflector <t10 at t10.org>
Date: 11/15/2011 05:00 AM
Subject: Re: SBC question about block coherency
Sent by: owner-t10 at t10.org
Hello Fred and Gerry,
Perhaps the behavior you describe is allowed by the
specification, and has been for over 20 years, but the actual
operation of an HDD would be far less onerous. Each read or write command
must do a sequential operation starting at LBA x and continue for y
blocks. I believe that disk drives do each read and each write in an
atomic operation. I believe there is only one command microprocessor in
each drive and I cannot envision it executing: Read (cmd 1) at LBA 100,
Write (cmd 2) at LBA 101, Read (cmd 1) at LBA 101. I also cannot envision
it executing: Read (cmd 1) at LBA 100, Write (cmd 2) at LBA 10,000, Read
(cmd 1) at LBA 101.
Two exceptions that may not be done as atomic operations: 1) part of the
data is in cache and part on the media, 2) if a head or cylinder seek is
required.
Am I missing something?
Thank you,
Hugh
On 11/14/2011 11:37 PM, Knight, Frederick wrote:
I would also add that your statement about "the LBA" is also correct. If,
in your example, these READ and WRITE commands are operating on an
overlapping multi-LBA range, I'm not aware of any language that requires
serial command completion.
Obviously, as Gerry stated, restricted reordering (see SAM) does add
requirements on WRITEs, but multiple normal SIMPLE tagged commands in the
unrestricted environment may all overlap. If you extend your READ/WRITE
question so that each request involves multiple LBAs, you can see that
"the LBA" is now significant, in that the READ (of multiple LBAs) is no
longer guaranteed to read all old data or all new data: each LBA would
independently be old or new, such that your data-in buffer may contain
some random mix of blocks containing old data and blocks containing new
data.
I would call it Option 3:
There is an inherent race condition so that as the WRITE and the READ
commands are processed, they do atomic operations individually on each LBA
referenced by that command, at the same time that the other command also
does atomic operations individually on each LBA referenced by that
command. No matter which command begins processing first, the data that
is read may contain some whole blocks that contain old data and some whole
blocks that contain new data.
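
To make Option 3 concrete, here is a minimal Python sketch (not taken from any SCSI standard or from a real device) that models a multi-LBA WRITE and a multi-LBA READ as sequences of per-LBA atomic steps and enumerates every interleaving. The names run_interleaving and all_read_results are hypothetical, and the model assumes that both commands cover the same LBA range and that each walks its LBAs in ascending order.

```python
from itertools import combinations

OLD, NEW = "old", "new"

def run_interleaving(num_lbas, write_slots):
    """Simulate one interleaving of a multi-LBA WRITE and READ.

    There are 2 * num_lbas atomic per-LBA steps in total. write_slots is
    the set of step indices (0-based) taken by the WRITE; the READ takes
    the remaining steps. Both commands walk the LBAs in ascending order
    (an assumption of this sketch).
    """
    media = [OLD] * num_lbas   # every block starts out holding old data
    data_in = []               # what the READ returns, one entry per LBA
    next_write = 0
    for step in range(2 * num_lbas):
        if step in write_slots:
            media[next_write] = NEW            # atomic write of one block
            next_write += 1
        else:
            data_in.append(media[len(data_in)])  # atomic read of one block
    return tuple(data_in)

def all_read_results(num_lbas):
    """Collect the READ results over every possible interleaving."""
    steps = range(2 * num_lbas)
    return {run_interleaving(num_lbas, set(slots))
            for slots in combinations(steps, num_lbas)}

if __name__ == "__main__":
    for result in sorted(all_read_results(2)):
        print(result)
```

Under these assumptions, every block in the data-in buffer is wholly old or wholly new, but any mix of old and new blocks can occur across the range.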
Fred
From: Gerry Houlder [mailto:gerry.houlder at seagate.com]
Sent: Monday, November 14, 2011 4:51 PM
To: T10 Reflector
Subject: Re: SBC question about block coherency
Option 1 is the expected behavior. From the point of view of the SCSI
target device, commands are not received simultaneously; there will always
be a mechanism that will cause one of the commands to be logged as being
received ahead of the other.
However there can be other influences on that ordering.
(a) Due to system configuration details (e.g., expanders), commands might
be sent from a host in one order but received by the target in a different
order.
(b) If all commands are SIMPLE task attribute, they may be reordered once
they are received into the target's queue. I would not expect commands
that access the same LBA to be reordered with respect to each other, but
this is allowed if "unrestricted reordering" is set in the Control mode
page. If "restricted reordering" is set in the mode page instead, then
reordering of writes with respect to reads of the same LBA is not allowed.
(c) A command with HEAD OF QUEUE attribute is usually placed ahead of
other commands that are in the queue. There are exceptions for commands
that have already started processing so there is some gray area here.
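
As an illustration of item (b) only, the following Python sketch encodes the rule as described in this message: with unrestricted reordering any two SIMPLE commands may be reordered, while with restricted reordering a WRITE may not be reordered with respect to a READ of the same LBA. The Cmd type and the functions overlaps and may_reorder are hypothetical names for this sketch, which is not a restatement of the SAM text and does not model HEAD OF QUEUE or ORDERED commands.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cmd:
    op: str       # "READ" or "WRITE"
    lba: int      # starting LBA
    blocks: int   # transfer length in blocks

def overlaps(a, b):
    """True if the two commands touch at least one common LBA."""
    return a.lba < b.lba + b.blocks and b.lba < a.lba + a.blocks

def may_reorder(a, b, restricted):
    """Decide whether two SIMPLE commands may be reordered.

    Models only the rule described in item (b); how WRITE/WRITE pairs
    are treated is left as 'allowed' here, which is an assumption.
    """
    if not restricted:
        return True   # unrestricted reordering: no constraint modeled
    read_vs_write = {a.op, b.op} == {"READ", "WRITE"}
    return not (read_vs_write and overlaps(a, b))

if __name__ == "__main__":
    rd = Cmd("READ", 100, 8)
    wr = Cmd("WRITE", 104, 8)
    print(may_reorder(rd, wr, restricted=False))
    print(may_reorder(rd, wr, restricted=True))
```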
On Mon, Nov 14, 2011 at 11:32 AM, Kevin D Butt <kdbutt at us.ibm.com> wrote:
I have a question that seems obvious, but since I come from the tape world
and have not spent much time in the disk world I could be assuming
behaviors that I shouldn't.
In the tape world, if a logical block is overwritten, then the read of
that logical block cannot occur until after the write has been completed.
The tape's command queue is essentially an ordered queue. In the disk
world, as I understand it, many commands can be processed in parallel,
that is, the queue is not necessarily an ordered queue. So, I have a
question about what that parallelism means when the command arrives with
a task attribute of HEAD OF QUEUE.
In the example of a WRITE being issued at the same time as a READ being
issued for the same LBA from multiple application clients (i.e., in
different task sets), what should be expected?
Option 1:
There is an inherent race condition so either the WRITE or the READ
command will arrive first and be processed as an atomic operation on the
LBA, then the other command will be processed on the LBA. If the READ
arrives first the data that is read is the old data. If the WRITE arrives
first the new data is written and then the READ returns the new data.
Option 2:
There is an inherent race condition so either the WRITE or the READ
command arrives first, but both commands are processed simultaneously and
the READ command returns data that contains partially old data and
partially new data.
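
Option 1 amounts to saying that the result always matches one of the two serial orders of the commands. A minimal Python sketch of that reading, for a single LBA, follows; serial_outcomes is a hypothetical name and the block contents are placeholder values. Under Option 2 the READ could additionally return a value that matches neither entry in the resulting set.

```python
from itertools import permutations

def serial_outcomes(old, new):
    """Return the set of values a READ can return under Option 1.

    Each command is applied to the block as a whole, in one of the two
    possible arrival orders.
    """
    results = set()
    for order in permutations(("READ", "WRITE")):
        block = old
        read_value = None
        for op in order:
            if op == "WRITE":
                block = new          # whole-block overwrite
            else:
                read_value = block   # whole-block read
        results.add(read_value)
    return results

if __name__ == "__main__":
    print(serial_outcomes(b"old data", b"new data"))
```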
Thanks,
Kevin D. Butt
SCSI & Fibre Channel Architect, Tape Firmware
Data Protection & Retention
MS 6TYA, 9000 S. Rita Rd., Tucson, AZ 85744
Tel: 520-799-5280
Fax: 520-799-2723 (T/L:321)
Email address: kdbutt at us.ibm.com
http://www-03.ibm.com/servers/storage/