SBC question about block coherency

Sheffield, Bob Bob.Sheffield at
Mon Nov 21 07:56:26 PST 2011


Yes, I wasn't very specific about what I meant by "default".
Question, what value would be assumed by an application client for the QAM
field for a device server that does not support the Control mode page?
Regarding ordering, I think you'll find the wide-port rules for SAS are very
carefully crafted to make certain in-order delivery is not compromised.
Granted, certain implementation choices in the initiator or target device can
compromise the order, but that's not an inherent property of the transport.
If a transport cannot guarantee in-order delivery, then what would be the
point of the paragraph I provided below?
Right - there is no ordering with respect to multiple initiator ports, unless
the initiator ports coordinate amongst themselves.
Bob Sheffield
From: Knight, Frederick [mailto:Frederick.Knight at]
Sent: Monday, November 21, 2011 8:46 AM
To: Sheffield, Bob; Yoder, Alan; Kevin D Butt; T10 Reflector
Subject: RE: SBC question about block coherency
First, the use of the word "default" in this context is hard to understand. 
SPC/SBC never specify a "default".  There are saved mode pages, there are
current mode pages, there are default mode pages, and there are changeable
mode pages.
A device manufacturer may initialize the values in the default mode page any
way they want.  That manufacturer can set the "default" to zero, or they can
set that "default" to one.  So default really means whatever the manufacturer
As for ordering, you have made an invalid assumption.  You very specifically
reference "issued first".  Order of issue from the application client is no
guarantee of order of reception by the device server.  A host that issues
COMMAND 1 followed by COMMAND 2, could have those commands received by the
device server as COMMAND 2 followed by COMMAND 1.  So even a device with
QAM=0 would process COMMAND 2 first, then COMMAND 1.  The same is true if
COMMAND 1 and COMMAND 2 were both ORDERED commands.  The device server does
ordering based on the order received, NOT based on the order issued (partly
because there is no way for the device server to determine the issue order).
And add multiple I_T_Nexuses (such as MPIO configurations) and it gets even
more fun.
		Fred Knight
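To illustrate the point about issue order versus reception order, here is a
minimal Python sketch. It is not taken from any standard; the function name
and the swapped delivery are assumptions made purely for the example. The
modeled device server only ever sees the order of reception.

```python
from collections import deque

def process_in_received_order(received):
    """Toy device server: processes commands strictly in the order received."""
    queue = deque(received)
    processed = []
    while queue:
        processed.append(queue.popleft())
    return processed

# Order in which the application client issued the commands.
issued = ["COMMAND 1", "COMMAND 2"]

# Assumption for illustration: something between the application client and
# the device server swaps the two commands before they arrive.
received = list(reversed(issued))

# The device server has no knowledge of `issued`; it only sees `received`.
processed = process_in_received_order(received)
print(processed)
```

Even this strictly in-order model processes COMMAND 2 before COMMAND 1,
because the ordering is applied to what was received.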
From: Sheffield, Bob [mailto:Bob.Sheffield at]
Sent: Friday, November 18, 2011 12:36 PM
To: Yoder, Alan; Knight, Frederick; Kevin D Butt; T10 Reflector
Subject: RE: SBC question about block coherency
I've wondered about the interpretation of the following paragraph in
subclause 7.5.7 Control mode page from SPC-4r32:
A value of zero in the QUEUE ALGORITHM MODIFIER field specifies that the
device server shall order the processing
sequence of commands having the SIMPLE task attribute such that data
integrity is maintained for that I_T nexus
(i.e., if the transmission of new SCSI transport protocol requests is halted
at any time, the final value of all data
observable on the medium shall be the same as if all the commands had been
processed with the ORDERED task
So, the way I read this, for the default value of zero in the QAM field, the
device server is expected to operate on the extent of LBAs specified in the
command as an atomic operation, and make certain overlapping writes are not
reordered. What I wonder is if this requirement perhaps extends beyond just
the range of LBAs specified in a single command. For example, if two writes
are issued to adjacent extents. Say the writes are reordered so that the
second write is processed first, and then "transmission of new SCSI transport
protocol requests is halted". Then a read is issued to read both extents. If
the 2nd processed write (which was issued first) never began processing
(e.g., intervening hard reset, or something like that), then the read would
read valid data for the 2nd issued write extent, but stale data for the 1st
issued write extent. Makes me wonder if the requirement might be that both
writes have to be processed in order even though their extents do not
In practice, I think device server implementations only enforce ordering of
writes when the extents overlap. So it would be possible for the device
server to reorder writes accessing adjacent extents so that if I/Os are
aborted after one write is processed, but before the other is processed, the
two extents, taken as a set, would not be coherent.
With QAM=1, the device server is informing the application client explicitly
that it reorders reads and writes indiscriminately, so all bets are off if
the application client doesn't avoid issuing I/O patterns that could result
in incoherent information on media.
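The adjacent-extent scenario can be sketched in a few lines of Python. This
is a toy model under stated assumptions, not device-server behavior from the
standard: the medium is a list of four blocks, each write covers two LBAs,
and `halt_after` (a hypothetical parameter) stands in for the point at which
processing stops, for example because of a hard reset.

```python
def apply_writes(medium, writes, order, halt_after):
    """Apply writes in the given processing order, stopping after
    `halt_after` of them have been processed."""
    medium = list(medium)  # work on a copy
    for index in order[:halt_after]:
        start, data = writes[index]
        medium[start:start + len(data)] = data
    return medium

medium = ["old"] * 4
writes = [
    (0, ["new-A", "new-A"]),  # issued first: LBAs 0-1
    (2, ["new-B", "new-B"]),  # issued second: LBAs 2-3
]

# Processed in issue order, halted after one write.
in_order = apply_writes(medium, writes, order=[0, 1], halt_after=1)

# Second write processed first, then halted before the other write starts.
reordered = apply_writes(medium, writes, order=[1, 0], halt_after=1)

print(in_order)
print(reordered)
```

In the reordered case a read of LBAs 0-3 returns new data for the extent
that was issued second and stale data for the extent that was issued first,
which is the incoherent pair described above.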
From: owner-t10 at [mailto:owner-t10 at] On Behalf Of Yoder, Alan
Sent: Tuesday, November 15, 2011 7:43 PM
To: Knight, Frederick; Kevin D Butt; T10 Reflector
Subject: Re: SBC question about block coherency
> my statement was about  "any language" in the SCSI family of standards.
Guess my programming languages slip is showing.  ;-)
On 11/15/11 5:52 PM, "Knight, Frederick" <Frederick.Knight at> wrote:
Since the original question was about raw SCSI commands and various task
attributes associated with them, my statement was about  "any language" in
the SCSI family of standards.
You are absolutely right; this kind of synchronization is the responsibility of the
APPLICATION LAYER (where application is defined as anything above the H/W and
its driver - such as file systems, volume managers, etc, etc). At the lowest
level, the protocol only does what you tell it to do (and the protocol is
perfectly happy to do overlapping reads/writes at the same time if you tell
it to).
From: Yoder, Alan
Sent: Tuesday, November 15, 2011 1:31 PM
To: Knight, Frederick; Kevin D Butt; T10 Reflector
Subject: Re: SBC question about block coherency
> I'm not aware of any language that requires serial command completion
Maybe not languages proper, but OS primitives  and FS interfaces do.  In
CIFS, for example, all I/O is serial unless you specify "overlapped" I/O. And
you'd use locking to control whether reads simultaneous with your writes were
allowed (default is not).  POSIX has similar controls.
My take is that any issue here is handled by higher level constructs; no real
OS is going to allow overlapping reads and writes without the explicit
permission of the programmer.
On 11/14/11 10:37 PM, "Knight, Frederick" <Frederick.Knight at>
I would also add, that your statement about "the LBA" is also correct.  If in
your example, these READ and WRITE commands are operating on an overlapping
multi-LBA range, I'm not aware of any language that requires serial command
Obviously, as Gerry stated, restricted reordering (see SAM) does add
requirements on WRITES, but multiple normal SIMPLE tagged commands in the
unrestricted environment may all overlap.  If you extend your READ/WRITE
question so that each request involves multiple LBAs, you can see that "the
LBA" is now significant, in that the READ (of multiple LBAs) is no longer
guaranteed to read all old data or all new data - each LBA would
independently be old or new, such that your data-in buffer may contain some
random mix of blocks containing old data, and blocks containing new data.
I would call it Option 3:
There is an inherent race condition so that as the WRITE and the READ
commands are processed, they do atomic operations individually on each LBA
referenced by that command, at the same time that the other command also does
atomic operations individually on each LBA referenced by that command.  No
matter which command begins processing first, the data that is read may
contain some whole blocks that contain old data and some whole blocks that
contain new data.
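A small Python sketch of Option 3 follows. It is only a model: the
`schedule` list is an assumed interleaving chosen for illustration, and each
step is treated as atomic on a single LBA, as described above.

```python
def interleave_read_write(old, new, schedule):
    """Run a READ and a WRITE over the same LBA range.

    `schedule` is a sequence of ("R", lba) or ("W", lba) steps; each step
    is atomic on one LBA. Returns the READ's data-in buffer."""
    medium = list(old)
    data_in = [None] * len(old)
    for op, lba in schedule:
        if op == "W":
            medium[lba] = new[lba]       # WRITE updates one block
        else:
            data_in[lba] = medium[lba]   # READ captures one block
    return data_in

old = ["old0", "old1", "old2", "old3"]
new = ["new0", "new1", "new2", "new3"]

# One possible interleaving (an assumption; real devices may pick any).
schedule = [("R", 0), ("W", 0), ("W", 1), ("R", 1),
            ("R", 2), ("W", 2), ("W", 3), ("R", 3)]

data_in = interleave_read_write(old, new, schedule)
print(data_in)
```

Every block in the buffer is wholly old or wholly new, but the buffer as a
whole is a mix of the two.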
From: Gerry Houlder [mailto:gerry.houlder at]
Sent: Monday, November 14, 2011 4:51 PM
To: T10 Reflector
Subject: Re: SBC question about block coherency
Option 1 is the expected behavior. From the point of view of the SCSI target
device, commands are not received simultaneously; there will always be a
mechanism that will cause one of the commands to be logged as being received
ahead of the other.
However there can be other influences on that ordering.
(a) Due to system configuration details (e.g., expanders), commands might be
sent from a host in one order but received by the target in a different
(b) If all commands are SIMPLE task attribute, they may be reordered once
they are received into the target's queue. I would not expect commands that
access the same LBA to be reordered with respect to each other, but this is
allowed if "unrestricted reordering" is set in the Control mode page. If
"restricted reordering" is set in the mode page instead, then reordering of
writes with respect to reads of the same LBA is not allowed.
(c) A command with HEAD OF QUEUE attribute is usually placed ahead of other
commands that are in the queue. There are exceptions for commands that have
already started processing so there is some gray area here.
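Point (b) can be written down as a small predicate. This Python sketch is a
simplified reading of the description above, not the normative rule from SAM
or SPC: the dictionary layout and the name `may_reorder` are hypothetical,
and treating an overlapping WRITE/WRITE pair the same as a READ/WRITE pair
is an assumption.

```python
def may_reorder(cmd_a, cmd_b, unrestricted):
    """Return True if two SIMPLE commands may be reordered with respect
    to each other in this simplified model."""
    if unrestricted:
        # Unrestricted reordering: no constraint, even on the same LBA.
        return True
    overlap = bool(cmd_a["lbas"] & cmd_b["lbas"])
    involves_write = "WRITE" in (cmd_a["op"], cmd_b["op"])
    # Restricted reordering: keep the received order when a WRITE and
    # another command touch the same LBA.
    return not (overlap and involves_write)

read_5 = {"op": "READ", "lbas": {5}}
write_5 = {"op": "WRITE", "lbas": {5}}
write_6 = {"op": "WRITE", "lbas": {6}}

print(may_reorder(read_5, write_5, unrestricted=False))
print(may_reorder(read_5, write_5, unrestricted=True))
print(may_reorder(read_5, write_6, unrestricted=False))
```

The sketch covers only commands with the SIMPLE task attribute; ORDERED and
HEAD OF QUEUE commands are outside its scope.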
On Mon, Nov 14, 2011 at 11:32 AM, Kevin D Butt <kdbutt at> wrote:
I have a question that seems obvious, but since I come from the tape world
and have not spent much time in the disk world I could be assuming behaviors
that I shouldn't.
In the tape world, if a logical block is overwritten, then the read of that
logical block cannot occur until after the write has been completed. The
tape's command queue is essentially an ordered queue. In the disk world, as I
understand it, many commands can be processed in parallel, that is, the queue
is not necessarily an ordered queue. So, I have a question about what that
parallelism means when the command arrives with a task attribute of HEAD OF
In the example of a WRITE being issued at the same time as a READ being
issued for the same LBA from multiple application clients (i.e., in different
task sets), what should be expected?
Option 1:
There is an inherent race condition so either the WRITE or the READ command
will arrive first and be processed as an atomic operation on the LBA, then
the other command will be processed on the LBA. If the READ arrives first the
data that is read is the old data. If the write arrives first the new data is
written and then the read reads the new data.
Option 2:
There is an inherent race condition so either the WRITE or the READ command
arrives first, but both commands are processed simultaneously and the READ
command returns data that contains partially old data and partially new data.
Kevin D. Butt
SCSI & Fibre Channel Architect, Tape Firmware
Data Protection & Retention
MS 6TYA, 9000 S. Rita Rd., Tucson, AZ 85744
Tel: 520-799-5280
Fax: 520-799-2723 (T/L:321)
Email address: kdbutt at
