SBC question about block coherency

Joseph Glider gliderj at almaden.ibm.com
Tue Nov 15 07:04:32 PST 2011



You need to remember that SCSI commands are often processed by 
sophisticated storage controllers, frequently active-active multi-path 
systems where the same volume is exported as multiple LUs (and might be 
stitched together by multi-path drivers), and where the same volume can be 
accessed by applications running on multiple hosts (e.g., Oracle). A 
well-behaved application will not submit I/O in a way that creates race 
conditions, but the SCSI specs do not provide very many guarantees of 
atomicity. Fred is perfectly correct that, aside from special commands that 
guarantee atomicity, LBA-by-LBA atomicity is the only guarantee that host 
applications can generally rely on.
Jody Glider
STSM, Storage Systems and Servers Research, IBM
Tieline: 457-1853
External: 408-927-1853
From: Hugh Curley <HCurley at indra.com>
To: "Knight, Frederick" <Frederick.Knight at netapp.com>
Cc: Kevin D Butt/Tucson/IBM at IBMUS, T10 Reflector <t10 at t10.org>
Date: 11/15/2011 05:00 AM
Subject: Re: SBC question about block coherency
Sent by: owner-t10 at t10.org
Hello Fred and Gerry,
Perhaps the way you are defining the operations is allowed by the 
specification, and has been for over 20 years, but the actual 
operation of HDDs would be far less onerous. Each read or write command 
must do a sequential operation starting at LBA x and continuing for y 
blocks. I believe that disk drives do each read and each write as an 
atomic operation. I believe there is only one command microprocessor in 
each drive, and I cannot envision it executing: Read (cmd 1) at LBA 100, 
Write (cmd 2) at LBA 101, Read (cmd 1) at LBA 101. I also cannot envision 
it executing: Read (cmd 1) at LBA 100, Write (cmd 2) at LBA 10,000, Read 
(cmd 1) at LBA 101.
Two exceptions where a command may not be processed as one atomic 
operation: 1) part of the data is in cache and part is on the media; 
2) a head or cylinder seek is required.
Am I missing something?
Thank you,
Hugh
On 11/14/2011 11:37 PM, Knight, Frederick wrote: 
I would also add that your statement about “the LBA” is also correct. If, 
in your example, these READ and WRITE commands operate on an 
overlapping multi-LBA range, I’m not aware of any language that requires 
serial command completion.
Obviously, as Gerry stated, restricted reordering (see SAM) does add 
requirements on WRITEs, but multiple normal SIMPLE tagged commands in the 
unrestricted environment may all overlap. If you extend your READ/WRITE 
question so that each request involves multiple LBAs, you can see that 
“the LBA” is now significant: the READ (of multiple LBAs) is no 
longer guaranteed to read all old data or all new data. Each LBA would 
independently be old or new, so your data-in buffer may contain 
some random mix of blocks containing old data and blocks containing new 
data.
I would call it Option 3:
There is an inherent race condition so that as the WRITE and the READ 
commands are processed, they do atomic operations individually on each LBA 
referenced by that command, at the same time that the other command also 
does atomic operations individually on each LBA referenced by that 
command. No matter which command begins processing first, the data that 
is read may contain some whole blocks that contain old data and some whole 
blocks that contain new data.
Fred
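Fred's Option 3 can be illustrated with a toy model. This is a sketch only, assuming the media is a Python dict of LBA to block contents and that a hypothetical `schedule` string decides whose single-LBA access happens next; each access moves one whole block, so no block is ever torn, yet the READ's buffer can still mix old and new blocks.

```python
def read_write_interleaved(media, read_lbas, write_lbas, write_data, schedule):
    """Interleave one multi-LBA READ with one multi-LBA WRITE (toy model).

    media    -- dict mapping LBA -> block contents
    schedule -- string of 'R'/'W' steps; each step performs the next
                single-LBA access of that command. Each access is atomic
                (a whole block), but the two commands' accesses interleave.
    Returns the READ's data-in buffer.
    """
    data_in = []
    r = w = 0
    for step in schedule:
        if step == "R":
            data_in.append(media[read_lbas[r]])   # atomic read of one whole block
            r += 1
        elif step == "W":
            media[write_lbas[w]] = write_data[w]  # atomic write of one whole block
            w += 1
        else:
            raise ValueError(f"unknown step {step!r}")
    if r != len(read_lbas) or w != len(write_lbas):
        raise ValueError("schedule must complete both commands")
    return data_in


lbas = [100, 101, 102, 103]
media = {lba: f"old{lba}" for lba in lbas}
new_data = [f"new{lba}" for lba in lbas]
buf = read_write_interleaved(media, lbas, lbas, new_data, "WRRWWRWR")
print(buf)  # ['new100', 'old101', 'new102', 'new103']
```

Every block in `buf` is wholly old or wholly new, which is the per-LBA guarantee; which mix the READ sees depends only on how the two commands' accesses happened to interleave.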
From: Gerry Houlder [mailto:gerry.houlder at seagate.com] 
Sent: Monday, November 14, 2011 4:51 PM
To: T10 Reflector
Subject: Re: SBC question about block coherency
Option 1 is the expected behavior. From the point of view of the SCSI 
target device, commands are not received simultaneously; there will always 
be a mechanism that will cause one of the commands to be logged as being 
received ahead of the other.
However there can be other influences on that ordering.
(a) Due to system configuration details (e.g., expanders), commands might 
be sent from a host in one order but received by the target in a different 
order.
(b) If all commands have the SIMPLE task attribute, they may be reordered 
once they are received into the target's queue. I would not expect commands 
that access the same LBA to be reordered with respect to each other, but 
this is allowed if "unrestricted reordering" is set in the Control mode 
page. If "restricted reordering" is set in the mode page instead, then 
reordering of writes with respect to reads of the same LBA is not allowed.
(c) A command with the HEAD OF QUEUE task attribute is usually placed 
ahead of other commands that are in the queue. There are exceptions for 
commands that have already started processing, so there is some gray area 
here.
On Mon, Nov 14, 2011 at 11:32 AM, Kevin D Butt <kdbutt at us.ibm.com> wrote:
I have a question that seems obvious, but since I come from the tape world 
and have not spent much time in the disk world I could be assuming 
behaviors that I shouldn't. 
In the tape world, if a logical block is overwritten, then the read of 
that logical block cannot occur until after the write has been completed. 
The tape's command queue is essentially an ordered queue. In the disk 
world, as I understand it, many commands can be processed in parallel, 
that is, the queue is not necessarily an ordered queue. So, I have a 
question about what that parallelism means when the command arrives with 
a task attribute of HEAD OF QUEUE. 
In the example of a WRITE being issued at the same time as a READ being 
issued for the same LBA from multiple application clients (i.e., in 
different task sets), what should be expected?
Option 1:
There is an inherent race condition so either the WRITE or the READ 
command will arrive first and be processed as an atomic operation on the 
LBA, then the other command will be processed on the LBA. If the READ 
arrives first, the data that is read is the old data. If the WRITE arrives 
first, the new data is written and then the READ reads the new data.
Option 2:
There is an inherent race condition so either the WRITE or the READ 
command arrives first, but both commands are processed simultaneously and 
the READ command returns data that contains partially old data and 
partially new data.
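Option 1 amounts to processing each command to completion in arrival order. A minimal sketch, assuming a single LBA held in a Python dict and a hypothetical `process_serially` helper; it only models the ordering question, not real device behavior.

```python
def process_serially(media, arrivals):
    """Option 1 model: each command runs to completion, in arrival order.

    media    -- dict mapping LBA -> block contents
    arrivals -- list of ("READ", lba) or ("WRITE", lba, data) tuples
    Returns the data returned by each READ, in order.
    """
    results = []
    for cmd in arrivals:
        if cmd[0] == "READ":
            results.append(media[cmd[1]])
        elif cmd[0] == "WRITE":
            media[cmd[1]] = cmd[2]
        else:
            raise ValueError(f"unknown command {cmd[0]!r}")
    return results


print(process_serially({100: "old"}, [("READ", 100), ("WRITE", 100, "new")]))  # ['old']
print(process_serially({100: "old"}, [("WRITE", 100, "new"), ("READ", 100)]))  # ['new']
```

For a single LBA this matches the per-LBA atomicity described in the replies; once the commands span multiple LBAs, Fred's Option 3 applies instead.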
Thanks,
Kevin D. Butt
SCSI & Fibre Channel Architect, Tape Firmware
Data Protection & Retention
MS 6TYA, 9000 S. Rita Rd., Tucson, AZ 85744
Tel: 520-799-5280
Fax: 520-799-2723 (T/L:321)
Email address: kdbutt at us.ibm.com
http://www-03.ibm.com/servers/storage/ 


