SBC question about block coherency

Knight, Frederick Frederick.Knight at netapp.com
Mon Nov 21 15:20:58 PST 2011


Formatted message: http://www.t10.org/cgi-bin/ac.pl?t=r&f=r1111212_f.htm

What the application assumes is up to the application.
But I would suggest that an application that has an unstated requirement
is broken.  The application should check the bit, set the bit, or it
should state a requirement (such as - this application must only be used
on devices with QAM=xxx).  Otherwise, it just gets what it gets.
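As a rough illustration of "check the bit", here is a minimal sketch of pulling the QUEUE ALGORITHM MODIFIER field out of a raw Control mode page. It assumes the SPC-4 layout (page code 0Ah, QAM in bits 7..4 of byte 3) and that the caller has already stripped the mode parameter header and any block descriptors from the MODE SENSE data. The function names and the example bytes are hypothetical.

```python
# Sketch only: field layout assumed from the SPC-4 Control mode page.
CONTROL_MODE_PAGE_CODE = 0x0A


def queue_algorithm_modifier(page: bytes) -> int:
    """Return the QUEUE ALGORITHM MODIFIER field of a raw Control mode page."""
    if len(page) < 4:
        raise ValueError("Control mode page too short")
    # Bits 5..0 of byte 0 carry the page code (bits 7 and 6 are PS and SPF).
    if page[0] & 0x3F != CONTROL_MODE_PAGE_CODE:
        raise ValueError("not a Control mode page")
    # Assumed layout: QAM occupies bits 7..4 of byte 3.
    return (page[3] >> 4) & 0x0F


def describe_qam(qam: int) -> str:
    """Map a QAM value to the wording used in this thread."""
    if qam == 0:
        return "restricted reordering"
    if qam == 1:
        return "unrestricted reordering allowed"
    return "reserved or vendor specific"


# Hypothetical page contents, not taken from a real device.
example_page = bytes([0x0A, 0x0A, 0x00, 0x10] + [0x00] * 8)
print(describe_qam(queue_algorithm_modifier(example_page)))
```

An application with an ordering requirement could refuse to run, or change its I/O pattern, when the value is not the one it was written for.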
Regarding ordering, SAS is not the only SCSI transport (there are also
FC, SRP, iSCSI, and Parallel SCSI).  Applications tend to be transport
agnostic; therefore, applications that care about ordering must take
action to guarantee it.
Here are some extracted statements from SAM sub-clause 4.4.3
Request/Response ordering:
<...>
The order in which task management requests are processed is not
specified by the SCSI architecture model.
The SCSI architecture model does not require in-order delivery of such
requests or processing by the task manager in the order received. To
guarantee the processing order of task management requests referencing
a specific logical unit, an application client should not have more
than one such request pending to that logical unit.
To simplify the description of behavior, the SCSI architecture model
assumes in-order delivery of requests or responses to be a property of
a service delivery subsystem. This assumption does not constitute a
requirement.
The SCSI architecture model makes no assumption about and places no
requirement on the ordering of requests or responses for different I_T
nexuses.
In-order delivery is not a requirement for a transport layer.  It is
fine that SAS has chosen to have in-order delivery, but that is not
because SCSI requires it for all transports.
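One action an application client can take, in the spirit of the SAM text quoted above, is to keep only one order-sensitive request outstanding at a time. The sketch below is a toy model, not a real transport: `deliver` and its parameters are hypothetical, and the shuffle simply stands in for any transport that makes no in-order promise for requests in flight together.

```python
import random


def deliver(commands, max_outstanding, rng):
    """Model a transport that may deliver in-flight commands in any order.

    The application client issues up to max_outstanding commands, waits for
    all of them to complete, then issues the next group.
    """
    received = []
    pending = list(commands)
    while pending:
        in_flight = pending[:max_outstanding]
        del pending[:max_outstanding]
        rng.shuffle(in_flight)  # no in-order guarantee while in flight
        received.extend(in_flight)
    return received


issued = ["COMMAND 1", "COMMAND 2", "COMMAND 3", "COMMAND 4"]

# With one command outstanding, receive order always equals issue order.
serialized = deliver(issued, max_outstanding=1, rng=random.Random(0))

# With four outstanding, the same commands arrive, but in no promised order.
overlapped = deliver(issued, max_outstanding=4, rng=random.Random(0))
```

The cost of this approach is throughput, which is why it only makes sense for the requests whose relative order actually matters.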
		Fred
From: Sheffield, Bob [mailto:Bob.Sheffield at lsi.com] 
Sent: Monday, November 21, 2011 10:56 AM
To: Knight, Frederick; Yoder, Alan; Kevin D Butt; T10 Reflector
Subject: RE: SBC question about block coherency
Fred,
Yes, I wasn't very specific about what I meant by "default". 
Question, what value would be assumed by an application client for the
QAM field for a device server that does not support the Control mode
page?
Regarding ordering, I think you'll find the wide-port rules for SAS are
very carefully crafted to make certain in-order delivery is not
compromised. Granted, certain implementation choices in the initiator or
target device can compromise the order, but that's not an inherent
property of the transport.
If a transport cannot guarantee in-order delivery, then what would be
the point of the paragraph I provided below?
Right - there is no ordering with respect to multiple initiator ports,
unless the initiator ports coordinate amongst themselves.
Regards,
Bob Sheffield
________________________________
From: Knight, Frederick [mailto:Frederick.Knight at netapp.com] 
Sent: Monday, November 21, 2011 8:46 AM
To: Sheffield, Bob; Yoder, Alan; Kevin D Butt; T10 Reflector
Subject: RE: SBC question about block coherency
First, the use of the word "default" in this context is hard to
understand.  SPC/SBC never specify a "default".  There are saved mode
pages, there are current mode pages, there are default mode pages, and
there are changeable mode pages.
A device manufacturer may initialize the values in the default mode page
any way they want.  That manufacturer can set the "default" to zero, or
they can set that "default" to one.  So default really means whatever
the manufacturer wants.
As for ordering, you have made an invalid assumption.  You very
specifically reference "issued first".  Order of issue from the
application client is no guarantee of order of reception by the device
server.  A host that issues COMMAND 1 followed by COMMAND 2, could have
those commands received by the device server as COMMAND 2 followed by
COMMAND 1.  So even a device with QAM=0 would process COMMAND 2 first,
then COMMAND 1.  The same is true if COMMAND 1 and COMMAND 2 were both
ORDERED commands.  The device server does ordering based on the order
received, NOT based on the order issued (partly because there is no way
for the device server to determine the issue order).
And add multiple I_T_Nexuses (such as MPIO configurations) and it gets
even more fun.
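The COMMAND 1 / COMMAND 2 case can be sketched as a small model. Everything here is illustrative: the dictionaries, the LBA, and the data values are hypothetical, and the "device server" is just a loop that applies writes in the order it receives them, which is the only order it is able to see.

```python
def process_in_received_order(received, medium):
    """Apply writes strictly in arrival order; issue order is not visible."""
    for cmd in received:
        for offset, block in enumerate(cmd["blocks"]):
            medium[cmd["lba"] + offset] = block
    return medium


# The host issues COMMAND 1, then COMMAND 2, both writing LBA 100.
issued = [
    {"name": "COMMAND 1", "lba": 100, "blocks": ["A"]},
    {"name": "COMMAND 2", "lba": 100, "blocks": ["B"]},
]

# Assume the service delivery subsystem swapped them on the way.
received = list(reversed(issued))

final = process_in_received_order(received, {})
# The host expected the data from the last-issued command ("B"),
# but the last-received command was COMMAND 1.
print(final[100])
```

Marking both commands ORDERED would not change the result in this model, because the ordering is applied to the received sequence.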
		Fred Knight
From: Sheffield, Bob [mailto:Bob.Sheffield at lsi.com] 
Sent: Friday, November 18, 2011 12:36 PM
To: Yoder, Alan; Knight, Frederick; Kevin D Butt; T10 Reflector
Subject: RE: SBC question about block coherency
I've wondered about the interpretation of the following paragraph in
subclause 7.5.7 Control mode page from SPC-4r32:
	A value of zero in the QUEUE ALGORITHM MODIFIER field specifies that
	the device server shall order the processing sequence of commands
	having the SIMPLE task attribute such that data integrity is
	maintained for that I_T nexus (i.e., if the transmission of new SCSI
	transport protocol requests is halted at any time, the final value of
	all data observable on the medium shall be the same as if all the
	commands had been processed with the ORDERED task attribute).
So, the way I read this, for the default value of zero in the QAM field,
the device server is expected to operate on the extent of LBAs specified
in the command as an atomic operation, and make certain overlapping
writes are not reordered. What I wonder is if this requirement perhaps
extends beyond just the range of LBAs specified in a single command. For
example, if two writes are issued to adjacent extents. Say the writes
are reordered so that the second write is processed first, and then
"transmission of new SCSI transport protocol requests is halted". Then a
read is issued to read both extents. If the 2nd processed write (which
was issued first) never began processing (e.g., intervening hard reset,
or something like that), then the read would read valid data for the 2nd
issued write extent, but stale data for the 1st issued write extent.
Makes me wonder if the requirement might be that both writes have to be
processed in order even though their extents do not overlap.
In practice, I think device server implementations only enforce ordering
of writes when the extents overlap. So it would be possible for the
device server to reorder writes accessing adjacent extents so that if
I/Os are aborted after one write is processed, but before the other is
processed, the two extents, taken as a set, would not be coherent.
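The adjacent-extent scenario can be walked through with a toy model. This is only a sketch of the behavior described above, assuming a device server that preserves order for overlapping writes and feels free to swap non-overlapping ones; the extents, data labels, and helper names are hypothetical.

```python
def overlaps(a, b):
    """True if two write extents share at least one LBA."""
    return (a["start"] < b["start"] + b["count"]
            and b["start"] < a["start"] + a["count"])


def apply_write(medium, write):
    for lba in range(write["start"], write["start"] + write["count"]):
        medium[lba] = write["data"]


medium = {lba: "old" for lba in range(8)}
first_issued = {"start": 0, "count": 4, "data": "new-1"}
second_issued = {"start": 4, "count": 4, "data": "new-2"}

queue = [first_issued, second_issued]
if not overlaps(queue[0], queue[1]):
    queue.reverse()  # the device server reorders the adjacent writes

apply_write(medium, queue[0])
# Assume a hard reset here: the remaining write never begins processing.

snapshot = [medium[lba] for lba in range(8)]
print(snapshot)
```

A later read of LBAs 0 through 7 sees stale data in the first-issued extent and new data in the second-issued extent, which is a state that processing the two writes in issue order could never have left behind.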
With QAM=1, the device server is informing the application client
explicitly that it reorders reads and writes indiscriminately, so all
bets are off if the application client doesn't avoid issuing I/O
patterns that could result in incoherent information on media.
Bob
________________________________
From: owner-t10 at t10.org [mailto:owner-t10 at t10.org] On Behalf Of Yoder,
Alan
Sent: Tuesday, November 15, 2011 7:43 PM
To: Knight, Frederick; Kevin D Butt; T10 Reflector
Subject: Re: SBC question about block coherency
> my statement was about "any language" in the SCSI family of
> standards.
Guess my programming languages slip is showing.  ;-)
Alan
On 11/15/11 5:52 PM, "Knight, Frederick" <Frederick.Knight at netapp.com>
wrote:
Since the original question was about raw SCSI commands and various task
attributes associated with them, my statement was about "any language"
in the SCSI family of standards.
You are absolutely right; this kind of synchronization is the responsibility
of the APPLICATION LAYER (where application is defined as anything above
the H/W and its driver - such as file systems, volume managers, etc,
etc). At the lowest level, the protocol only does what you tell it to do
(and the protocol is perfectly happy to do overlapping reads/writes at
the same time if you tell it to).
		Fred
From: Yoder, Alan 
Sent: Tuesday, November 15, 2011 1:31 PM
To: Knight, Frederick; Kevin D Butt; T10 Reflector
Subject: Re: SBC question about block coherency
> I'm not aware of any language that requires serial command completion
Maybe not languages proper, but OS primitives and FS interfaces do.  In
CIFS, for example, all I/O is serial unless you specify "overlapped"
I/O. And you'd use locking to control whether reads simultaneous with
your writes were allowed (default is not).  POSIX has similar controls.
My take is that any issue here is handled by higher level constructs; no
real OS is going to allow overlapping reads and writes without the
explicit permission of the programmer. 
Alan
On 11/14/11 10:37 PM, "Knight, Frederick" <Frederick.Knight at netapp.com>
wrote:
I would also add, that your statement about "the LBA" is also correct.
If in your example, these READ and WRITE commands are operating on an
overlapping multi-LBA range, I'm not aware of any language that requires
serial command completion.
Obviously, as Gerry stated, restricted reordering (see SAM) does add
requirements on WRITES, but multiple normal SIMPLE tagged commands in
the unrestricted environment may all overlap.  If you extend your
READ/WRITE question so that each request involves multiple LBAs, you can
see that "the LBA" is now significant, in that the READ (of multiple
LBAs) is no longer guaranteed to read all old data or all new data -
each LBA would independently be old or new, such that your data-in
buffer may contain some random mix of blocks containing old data, and
blocks containing new data.
I would call it Option 3:
There is an inherent race condition so that as the WRITE and the READ
commands are processed, they do atomic operations individually on each
LBA referenced by that command, at the same time that the other command
also does atomic operations individually on each LBA referenced by that
command.  No matter which command begins processing first, the data that
is read may contain some whole blocks that contain old data and some
whole blocks that contain new data.
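Option 3 can be illustrated with two generators that each touch one LBA per step, driven by an explicit interleaving. This is a sketch under the assumption stated above (each LBA is handled atomically, the command as a whole is not); the schedule string and function names are hypothetical and chosen only to show one possible outcome.

```python
def write_cmd(medium, start, blocks):
    """WRITE: update one LBA atomically per step."""
    for offset, data in enumerate(blocks):
        medium[start + offset] = data
        yield


def read_cmd(medium, start, count, data_in):
    """READ: transfer one LBA atomically per step into the data-in buffer."""
    for offset in range(count):
        data_in.append(medium[start + offset])
        yield


medium = {lba: "old" for lba in range(4)}
data_in = []
steps = {
    "W": write_cmd(medium, 0, ["new"] * 4),
    "R": read_cmd(medium, 0, 4, data_in),
}

# One possible interleaving of the two commands; many others are legal.
for who in "WRRWWRRW":
    next(steps[who])

print(data_in)
```

Every block in the data-in buffer is wholly old or wholly new, but the buffer as a whole is a mix, and a different schedule would produce a different mix.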
		Fred
From: Gerry Houlder [mailto:gerry.houlder at seagate.com] 
Sent: Monday, November 14, 2011 4:51 PM
To: T10 Reflector
Subject: Re: SBC question about block coherency
Option 1 is the expected behavior. From the point of view of the SCSI
target device, commands are not received simultaneously; there will
always be a mechanism that will cause one of the commands to be logged
as being received ahead of the other.
However, there can be other influences on that ordering.
(a) Due to system configuration details (e.g., expanders), commands might
be sent from a host in one order but received by the target in a
different order.
(b) If all commands are SIMPLE task attribute, they may be reordered
once they are received into the target's queue. I would not expect
commands that access the same LBA to be reordered with respect to each
other, but this is allowed if "unrestricted reordering" is set in the
Control mode page. If "restricted reordering" is set in the mode page
instead, then reordering of writes with respect to reads of the same LBA
is not allowed.
(c) A command with HEAD OF QUEUE attribute is usually placed ahead of
other commands that are in the queue. There are exceptions for commands
that have already started processing, so there is some gray area here.
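Item (b) can be summarized as a small predicate. This is a sketch of the reading given in this message, not of the standard's text: `may_reorder` and the command dictionaries are hypothetical, and treating overlapping write/write pairs the same as write/read pairs under restricted reordering is an assumption.

```python
def overlaps(a, b):
    """True if two commands reference at least one common LBA."""
    return (a["start"] < b["start"] + b["count"]
            and b["start"] < a["start"] + a["count"])


def may_reorder(a, b, unrestricted):
    """Could two queued SIMPLE commands be swapped, per item (b) above?"""
    if unrestricted:
        return True   # "unrestricted reordering" set in the Control mode page
    if not overlaps(a, b):
        return True   # different LBAs: nothing to keep coherent
    # Restricted reordering: a write keeps its place relative to any
    # command touching the same LBA (assumed to include other writes).
    return a["op"] == "READ" and b["op"] == "READ"


write = {"op": "WRITE", "start": 10, "count": 4}
read_same = {"op": "READ", "start": 12, "count": 2}
read_far = {"op": "READ", "start": 100, "count": 1}

print(may_reorder(write, read_same, unrestricted=False))
print(may_reorder(write, read_same, unrestricted=True))
print(may_reorder(write, read_far, unrestricted=False))
```

Items (a) and (c) are outside this predicate: (a) happens before the commands reach the queue, and HEAD OF QUEUE handling is a separate rule.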
On Mon, Nov 14, 2011 at 11:32 AM, Kevin D Butt <kdbutt at us.ibm.com>
wrote:
I have a question that seems obvious, but since I come from the tape
world and have not spent much time in the disk world I could be assuming
behaviors that I shouldn't. 
In the tape world, if a logical block is overwritten, then the read of
that logical block cannot occur until after the write has been
completed. The tape's command queue is essentially an ordered queue. In
the disk world, as I understand it, many commands can be processed in
parallel, that is, the queue is not necessarily an ordered queue. So, I
have a question about what that parallel'ness means when the command
arrives with a task type of HEAD OF QUEUE. 
In the example of a WRITE being issued at the same time as a READ being
issued for the same LBA from multiple application clients (i.e., in
different task sets), what should be expected?
Option 1:
There is an inherent race condition so either the WRITE or the READ
command will arrive first and be processed as an atomic operation on the
LBA, then the other command will be processed on the LBA. If the READ
arrives first the data that is read is the old data. If the write
arrives first the new data is written and then the read reads the new
data.
Option 2:
There is an inherent race condition so either the WRITE or the READ
command arrives first, but both commands are processed simultaneously
and the READ command returns data that contains partially old data and
partially new data.
Thanks,
Kevin D. Butt
SCSI & Fibre Channel Architect, Tape Firmware
Data Protection & Retention
MS 6TYA, 9000 S. Rita Rd., Tucson, AZ 85744
Tel: 520-799-5280
Fax: 520-799-2723 (T/L:321)
Email address: kdbutt at us.ibm.com
http://www-03.ibm.com/servers/storage/ 


