SBC question about block coherency

Sheffield, Bob Bob.Sheffield at lsi.com
Fri Nov 18 09:36:05 PST 2011


Formatted message: http://www.t10.org/cgi-bin/ac.pl?t=r&f=r1111181_f.htm

I've wondered about the interpretation of the following paragraph in
subclause 7.5.7 Control mode page from SPC-4r32:

A value of zero in the QUEUE ALGORITHM MODIFIER field specifies that the
device server shall order the processing
sequence of commands having the SIMPLE task attribute such that data
integrity is maintained for that I_T nexus
(i.e., if the transmission of new SCSI transport protocol requests is halted
at any time, the final value of all data
observable on the medium shall be the same as if all the commands had been
processed with the ORDERED task
attribute).

So, the way I read this, for the default value of zero in the QAM field, the
device server is expected to operate on the extent of LBAs specified in the
command as an atomic operation, and make certain overlapping writes are not
reordered. What I wonder is if this requirement perhaps extends beyond just
the range of LBAs specified in a single command. For example, if two writes
are issued to adjacent extents. Say the writes are reordered so that the
second write is processed first, and then "transmission of new SCSI transport
protocol requests is halted". Then a read is issued to read both extents. If
the 2nd processed write (which was issued first) never began processing
(e.g., intervening hard reset, or something like that), then the read would
read valid data for the 2nd issued write extent, but stale data for the 1st
issued write extent. Makes me wonder if the requirement might be that both
writes have to be processed in order even though their extents do not
overlap.
In practice, I think device server implementations only enforce ordering of
writes when the extents overlap. So it would be possible for the device
server to reorder writes accessing adjacent extents so that if I/Os are
aborted after one write is processed, but before the other is processed, the
two extents, taken as a set, would not be coherent.
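
To make the adjacent-extent scenario concrete, here is a minimal Python sketch. It is only a toy model under stated assumptions: the medium is a dictionary of LBA to data, each write covers a contiguous extent, and the function name apply_writes and the halt_after parameter are hypothetical, not taken from any standard or implementation.

```python
# Toy model (assumption): the medium maps LBA -> data, and a write command
# is a (starting LBA, list of block data) pair. All names are illustrative.

def apply_writes(medium, writes, order, halt_after):
    """Process writes in the given order, stopping after halt_after commands."""
    result = dict(medium)
    for index in order[:halt_after]:
        start, blocks = writes[index]
        for offset, data in enumerate(blocks):
            result[start + offset] = data
    return result

medium = {lba: "old" for lba in range(4)}
writes = [
    (0, ["A", "A"]),  # issued first: LBAs 0-1
    (2, ["B", "B"]),  # issued second: LBAs 2-3 (adjacent, not overlapping)
]

# Processed in issue order, then halted after one command.
in_order = apply_writes(medium, writes, order=[0, 1], halt_after=1)

# Reordered, then halted after one command: only the second write lands.
reordered = apply_writes(medium, writes, order=[1, 0], halt_after=1)

print(in_order)
print(reordered)
```

In the in-order case the medium holds a prefix of the issued writes. In the reordered case it holds the later write without the earlier one, which is a state that ORDERED processing could never leave behind, even though the extents do not overlap.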
With QAM=1, the device server is informing the application client explicitly
that it reorders reads and writes indiscriminately, so all bets are off if
the application client doesn't avoid issuing I/O patterns that could result
in incoherent information on media.
Bob
________________________________
From: owner-t10 at t10.org [mailto:owner-t10 at t10.org] On Behalf Of Yoder, Alan
Sent: Tuesday, November 15, 2011 7:43 PM
To: Knight, Frederick; Kevin D Butt; T10 Reflector
Subject: Re: SBC question about block coherency
> my statement was about "any language" in the SCSI family of standards.
Guess my programming languages slip is showing.  ;-)
Alan
On 11/15/11 5:52 PM, "Knight, Frederick" <Frederick.Knight at netapp.com> wrote:
Since the original question was about raw SCSI commands and various task
attributes associated with them, my statement was about "any language" in
the SCSI family of standards.
You are absolutely right; this kind of synchronization is the responsibility of
the APPLICATION LAYER (where application is defined as anything above the H/W
and its driver - such as file systems, volume managers, etc.). At the lowest
level, the protocol only does what you tell it to do (and the protocol is
perfectly happy to do overlapping reads/writes at the same time if you tell
it to).
		Fred
From: Yoder, Alan
Sent: Tuesday, November 15, 2011 1:31 PM
To: Knight, Frederick; Kevin D Butt; T10 Reflector
Subject: Re: SBC question about block coherency
> I'm not aware of any language that requires serial command completion
Maybe not languages proper, but OS primitives and FS interfaces do.  In
CIFS, for example, all I/O is serial unless you specify "overlapped" I/O. And
you'd use locking to control whether reads simultaneous with your writes were
allowed (default is not).  POSIX has similar controls.
My take is that any issue here is handled by higher level constructs; no real
OS is going to allow overlapping reads and writes without the explicit
permission of the programmer.
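
As a rough illustration of what "handled by higher level constructs" can look like, the following Python sketch serializes multi-block reads and writes with a lock in the application layer. It is an assumption-laden toy: the GuardedVolume class and its methods are hypothetical, the "volume" is an in-memory list, and a single lock stands in for the byte-range locking that CIFS or POSIX would actually provide.

```python
import threading

class GuardedVolume:
    """Hypothetical application-layer wrapper; not part of any SCSI API."""

    def __init__(self, num_blocks):
        self._blocks = ["old"] * num_blocks
        self._lock = threading.Lock()

    def write_range(self, start, data):
        # The lock makes the whole multi-block write appear atomic to readers.
        with self._lock:
            for offset, value in enumerate(data):
                self._blocks[start + offset] = value

    def read_range(self, start, count):
        # Readers take the same lock, so they never see a half-applied write.
        with self._lock:
            return list(self._blocks[start:start + count])

volume = GuardedVolume(8)
snapshots = []

def writer():
    volume.write_range(0, ["new"] * 8)

def reader():
    snapshots.append(volume.read_range(0, 8))

threads = [threading.Thread(target=writer)]
threads += [threading.Thread(target=reader) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every snapshot is all "old" or all "new", never a mix.
uniform = all(len(set(snapshot)) == 1 for snapshot in snapshots)
print(uniform)
```

A single lock over the whole volume is the coarsest possible choice; it is used here only to keep the sketch short. Range locks would allow non-overlapping I/O to proceed in parallel.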
Alan
On 11/14/11 10:37 PM, "Knight, Frederick" <Frederick.Knight at netapp.com>
wrote:
I would also add that your statement about "the LBA" is also correct.  If, in
your example, these READ and WRITE commands are operating on an overlapping
multi-LBA range, I'm not aware of any language that requires serial command
completion.
Obviously, as Gerry stated, restricted reordering (see SAM) does add
requirements on WRITES, but multiple normal SIMPLE tagged commands in the
unrestricted environment may all overlap.  If you extend your READ/WRITE
question so that each request involves multiple LBAs, you can see that "the
LBA" is now significant, in that the READ (of multiple LBAs) is no longer
guaranteed to read all old data or all new data - each LBA would
independently be old or new, such that your data-in buffer may contain some
random mix of blocks containing old data, and blocks containing new data.
I would call it Option 3:
There is an inherent race condition so that as the WRITE and the READ
commands are processed, they do atomic operations individually on each LBA
referenced by that command, at the same time that the other command also does
atomic operations individually on each LBA referenced by that command.  No
matter which command begins processing first, the data that is read may
contain some whole blocks that contain old data and some whole blocks that
contain new data.
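
Option 3 can be illustrated with a small Python sketch that enumerates every way the per-LBA steps of one READ and one WRITE over the same range could interleave. This is a simplified model, not a description of any device: it assumes each command walks its LBAs in ascending order and that each per-LBA step is atomic, and the function name possible_reads is hypothetical.

```python
from itertools import combinations

def possible_reads(num_blocks):
    """Enumerate READ results when READ and WRITE interleave per LBA.

    Assumption: both commands cover LBAs 0..num_blocks-1, each processes
    its LBAs in ascending order, and each per-LBA step is atomic.
    """
    results = set()
    total_steps = 2 * num_blocks
    # Choose which of the interleaved step slots belong to the WRITE;
    # the remaining slots belong to the READ.
    for chosen in combinations(range(total_steps), num_blocks):
        write_slots = set(chosen)
        medium = ["old"] * num_blocks
        data_in = []
        next_write = 0
        next_read = 0
        for slot in range(total_steps):
            if slot in write_slots:
                medium[next_write] = "new"
                next_write += 1
            else:
                data_in.append(medium[next_read])
                next_read += 1
        results.add(tuple(data_in))
    return results

for outcome in sorted(possible_reads(2)):
    print(outcome)
```

For a two-block range the model produces all four combinations of old and new blocks, including the mixed ones. Each block is wholly old or wholly new, which matches the "whole blocks" wording above.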
		Fred
From: Gerry Houlder [mailto:gerry.houlder at seagate.com]
Sent: Monday, November 14, 2011 4:51 PM
To: T10 Reflector
Subject: Re: SBC question about block coherency
Option 1 is the expected behavior. From the point of view of the SCSI target
device, commands are not received simultaneously; there will always be a
mechanism that will cause one of the commands to be logged as being received
ahead of the other.
However there can be other influences on that ordering.
(a) Due to system configuration details (e.g., expanders), commands might be
sent from a host in one order but received by the target in a different
order.
(b) If all commands are SIMPLE task attribute, they may be reordered once
they are received into the target's queue. I would not expect commands that
access the same LBA to be reordered with respect to each other, but this is
allowed if "unrestricted reordering" is set in the Control mode page. If
"restricted reordering" is set in the mode page instead, then reordering of
writes with respect to reads of the same LBA is not allowed.
(c) A command with HEAD OF QUEUE attribute is usually placed ahead of other
commands that are in the queue. There are exceptions for commands that have
already started processing so there is some gray area here.
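
The reordering rule in (b) can be sketched as a small Python predicate. This is one simplified reading of the points above, not the text of SAM or SPC: the Command type, the overlaps and may_reorder helpers, and the restricted flag are all hypothetical names, overlapping write/write pairs are treated the same as write/read pairs, and commands with any attribute other than SIMPLE are conservatively treated as not reorderable because their rules are not modeled here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Command:
    op: str              # "READ" or "WRITE"
    lba: int             # first LBA of the extent
    count: int           # number of logical blocks
    attr: str = "SIMPLE"

def overlaps(a, b):
    """True if the two commands' LBA extents share at least one block."""
    return a.lba < b.lba + b.count and b.lba < a.lba + a.count

def may_reorder(a, b, restricted):
    """Sketch of whether two queued commands may swap processing order."""
    # Only SIMPLE commands are modeled; anything else is left in place.
    if a.attr != "SIMPLE" or b.attr != "SIMPLE":
        return False
    # Unrestricted reordering: any two SIMPLE commands may be swapped.
    if not restricted:
        return True
    # Restricted reordering (assumed reading): keep order when the
    # extents overlap and at least one of the commands is a write.
    if overlaps(a, b) and "WRITE" in (a.op, b.op):
        return False
    return True

write_cmd = Command("WRITE", lba=0, count=4)
read_same = Command("READ", lba=2, count=4)   # overlaps the write
read_far = Command("READ", lba=8, count=2)    # does not overlap

print(may_reorder(write_cmd, read_same, restricted=True))
print(may_reorder(write_cmd, read_same, restricted=False))
print(may_reorder(write_cmd, read_far, restricted=True))
```

Note that under this rule two writes to adjacent, non-overlapping extents may still be swapped even with restricted reordering, because the check looks only at overlap.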
On Mon, Nov 14, 2011 at 11:32 AM, Kevin D Butt <kdbutt at us.ibm.com> wrote:
I have a question that seems obvious, but since I come from the tape world
and have not spent much time in the disk world I could be assuming behaviors
that I shouldn't.
In the tape world, if a logical block is overwritten, then the read of that
logical block cannot occur until after the write has been completed. The
tape's command queue is essentially an ordered queue. In the disk world, as I
understand it, many commands can be processed in parallel, that is, the queue
is not necessarily an ordered queue. So, I have a question about what that
parallelism means when the command arrives with a task attribute of HEAD OF
QUEUE.
In the example of a WRITE being issued at the same time as a READ being
issued for the same LBA from multiple application clients (i.e., in different
task sets), what should be expected?
Option 1:
There is an inherent race condition so either the WRITE or the READ command
will arrive first and be processed as an atomic operation on the LBA, then
the other command will be processed on the LBA. If the READ arrives first the
data that is read is the old data. If the write arrives first the new data is
written and then the read reads the new data.
Option 2:
There is an inherent race condition so either the WRITE or the READ command
arrives first, but both commands are processed simultaneously and the READ
command returns data that contains partially old data and partially new data.
Thanks,
Kevin D. Butt
SCSI & Fibre Channel Architect, Tape Firmware
Data Protection & Retention
MS 6TYA, 9000 S. Rita Rd., Tucson, AZ 85744
Tel: 520-799-5280
Fax: 520-799-2723 (T/L:321)
Email address: kdbutt at us.ibm.com
http://www-03.ibm.com/servers/storage/


