Question about SCSI consistency model
vst at vlnb.net
Mon Nov 11 13:27:28 PST 2013
* From the T10 Reflector (t10 at t10.org), posted by:
* Vladislav Bolkhovitin <vst at vlnb.net>
I see, thank you for your reply and detailed explanation.
But, apparently, there is demand for such a looser consistency model.
One colleague told me that when he was working at IBM they were
considering such an optimization as well.
Would it make sense for us to submit a formal proposal to add a bit to the
Control mode page saying that, if it is set, then for WRITE commands that
had not completed when power was lost, different I_T nexuses may return a
mix of old and new data for the blocks belonging to those WRITEs?
Such behavior is fully OK for modern journaled database systems, which
on recovery either retry the uncompleted WRITEs right away after the
timeout or, if the device was disconnected or they crashed at the same
time as well, replay the journal, i.e. also retry the uncompleted WRITEs.
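A minimal sketch of this argument, as a toy Python model rather than anything taken from the standard. All names here (RelaxedLun, replay_journal, the journal entry layout) are hypothetical and only for illustration. It assumes one copy of the medium per node, one I_T nexus per node, and a power loss that leaves the WRITE applied on only some nodes. Under those assumptions the paths disagree until the application replays its journal:

```python
# Toy model only: RelaxedLun and replay_journal are hypothetical names,
# not part of any SCSI standard or existing library.

class RelaxedLun:
    """Distributed logical unit: one copy of the medium per node/path."""

    def __init__(self, num_nodes, num_blocks):
        self.copies = [["old"] * num_blocks for _ in range(num_nodes)]

    def write(self, lba, data, reached_nodes=None):
        """Apply a WRITE starting at lba.

        reached_nodes simulates a power loss: only the listed nodes applied
        the data. Returns True only if the command completed on all nodes.
        """
        nodes = range(len(self.copies)) if reached_nodes is None else reached_nodes
        for node in nodes:
            for offset, block in enumerate(data):
                self.copies[node][lba + offset] = block
        return reached_nodes is None

    def read(self, node, lba, count):
        """READ through the path that terminates at the given node."""
        return self.copies[node][lba:lba + count]


def replay_journal(lun, journal):
    """Resend every journaled WRITE that never returned completion status."""
    for entry in journal:
        if not entry["completed"]:
            entry["completed"] = lun.write(entry["lba"], entry["data"])


lun = RelaxedLun(num_nodes=2, num_blocks=8)
journal = [{"lba": 2, "data": ["new"] * 3, "completed": False}]

# Power is lost after only node 0 has applied the WRITE.
journal[0]["completed"] = lun.write(2, ["new"] * 3, reached_nodes=[0])
torn_views = [lun.read(node, 2, 3) for node in range(2)]       # paths disagree

# Recovery: the application replays its journal before trusting any READ.
replay_journal(lun, journal)
recovered_views = [lun.read(node, 2, 3) for node in range(2)]  # paths agree
```

In this model an application that reads before replaying can observe different data per path, which is exactly the behavior the proposed bit would have to permit. An application that always resends uncompleted WRITEs first never observes it.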
Knight, Frederick, on 10/26/2013 09:07 AM wrote:
> I don't believe that interpretation is valid. Notice that the text you
> quote mentions nothing about the path. It only mentions the LBAs (which are
> just addresses), and the data contained at that address.
> If you process a WRITE command to an address (to LBAs), and some data is
> written into those LBAs, but not all of the data is written, then a READ (to
> that same address) may return some of the new data that was written, and some
> of the old data that didn't get replaced yet, or any combination. Remember,
> it is the device server doing this processing, not the path. There is no
> such thing as having multiple device servers. Look at the SAM model, and you
> find lots of target ports, but a single logical unit, a single task router,
> and a single device server.
> Consider a WRITE to addresses 101-110 (LBAs 101-110). Consider a failure
> where that write successfully puts data into LBAs 105-110, but encounters an
> error before any of the other data can be written to persistent storage. For
> this failure case, I would expect a READ of LBA 101-110 to return the
> original old data from LBAs 101-104 and then return the new data from LBAs
> 105-110. But, the text you are quoting allows other behaviors as well.
> What I do NOT find in that text, is permission to return DIFFERENT data for
> different READ commands to the same LBA, just because those different READ
> commands happen to come into the logical unit/task router/device server via
> different target ports (again, look at the SAM model). The LBA is the
> address of the data; and the data is the data (singular). One address (one
> LBA) can't have multiple DIFFERENT data values.
> Fred Knight
> -----Original Message-----
> From: owner-t10 at t10.org [mailto:owner-t10 at t10.org] On Behalf Of Vladislav
> Sent: Thursday, October 24, 2013 1:23 PM
> To: t10 at t10.org
> Subject: Question about SCSI consistency model
> We are working on creating a distributed SCSI device, where several nodes
> are combined to create something that looks like a single multipath
> SCSI device to initiators, with each path leading to a separate node. We
> figured out that fully exploiting the SCSI consistency model would allow us
> to significantly improve performance, but we wonder whether the SCSI
> consistency model goes as far as we need.
> SBC-3 section "Write and unmap failures" says:
> If one or more write commands have not completed when a power loss
> occurs (e.g., resulting in a vendor specific command timeout by the
> application client) or a medium error or hardware error occurs (e.g., because
> a removable medium was incorrectly demounted), then any data in the logical
> blocks referenced by the LBAs specified by any of those commands is
> indeterminate. Before sending a read command or verify command specifying any
> LBAs that were specified by one of the write commands that did not complete,
> the application client should resend that write command. If an application
> client sends a read command or verify command specifying any LBAs that were
> specified by one of the write commands that did not complete before resending
> that write command, then the device server may return old data, new data,
> vendor-specific data, or any combination thereof for the logical blocks
> referenced by the specified LBAs.
> The question is: if the device server, after a failure of a write command
> on block X, starts returning old data on reads of this block from one path
> and new data on reads from another path, would it still be in line with the
> above SCSI consistency model?
> * For T10 Reflector information, send a message with
> * 'info t10' (no quotes) in the message body to majordomo at t10.org