Data Integrity: DIF Stacking and Suggested Mode page change

Rassbach, Walter B walter.b.rassbach at intel.com
Thu Jun 5 15:00:51 PDT 2003


* From the T10 Reflector (t10 at t10.org), posted by:
* "Rassbach, Walter B" <walter.b.rassbach at intel.com>
*
We are currently working on a document that describes various usage models, including the error cases that are addressed -- Most of those error cases are probably the result of a bug in controller firmware, so, in one sense, one might say that it is unnecessary. However, I have yet to see bug-free controller firmware (including my own), and the Data Integrity proposals provide a last-check mechanism to catch such problems before they result in cascaded errors. In other words, the data integrity proposals arise out of a lot of people's experience with intermediate controllers -- A lot of that is RAID controller experience, but I think we need to ensure that the scope is not limited to just RAID -- Eventually, the goal is that the host will be involved and that the range will be closer to end-to-end. We will not have true end-to-end until the actual applications participate, but that cannot occur unless the tools are first made available. What we have to do is to avoid limiting those tools too much while still making them feasible.

The need for something like "stacking" is already embedded in the proposals. After a couple of months of discussion, no alternative was found for the situations where it comes up. There is no "variable" element involved, however: The block sizes at each level of the hierarchy are fixed. DIFs get "stacked" only in situations where there are multiple usage requirements (at different levels of the hierarchy) placed on the DIF sub-fields and a resulting conflict. The concept of "DIF stacking" is simply a method for dealing with that kind of conflict, but there are no alternatives on the table yet.

As an example of a scenario where stacking is required, consider the following system:

1) A host (or hosts) that use the object-oriented command set to talk to controller A (e.g., on a SAN).
2) Controller A stores its objects on a block-architecture basis, using DIFs. It places the object "number" in the META tag and the block-within-object in the REF tag (in fact, it would probably split the object number over the META tag and the high order part of the REF tag). It uses the tag information to cross-check its mapping tables to ensure that the right data is being presented to the host(s).
3) The back-end for controller A is actually a (virtualizing) RAID controller (presumably dedicated).
4) The RAID controller (B) also uses the DIF sub-fields internally and requires that DIFs presented by controller B's host (in this case, controller A) obey certain restrictions (if the DIF is shared), because it uses the REF tag as the virtual LBA (to verify its own mapping functions) and uses the META tag to hold the virtual LUN number (verification) and possibly various control flags (e.g., a flag indicating "parity block" and/or a flag indicating "intentionally bad block").
5) Attaching controller B as the "back-end" of controller A is going to result in a conflict. The only way to handle this conflict would seem to be for controller B to extend the data block with an additional DIF which holds the information it uses internally. In other words, to be able to work with controller A, controller B has to provide DIF stacking or some equivalent.
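
The conflict in step 5 can be sketched in a few lines. This is only an illustration: the function name, the example values, and the simplification that controller B compares the whole REF and META tags against its virtual LBA and virtual LUN are assumptions, not part of the proposal.

```python
def b_accepts_shared_dif(virtual_lba, virtual_lun, ref_tag, meta_tag):
    """Hypothetical, simplified form of controller B's restriction on a
    shared DIF: REF tag must be the virtual LBA, META tag the virtual LUN."""
    return ref_tag == virtual_lba and meta_tag == virtual_lun

# Controller A stores block 5 of object 42 at virtual LBA 9000 on virtual
# LUN 1, and wants the tags to describe the object rather than the location.
a_ref_tag, a_meta_tag = 5, 42

# True means the two usage models cannot share a single DIF.
conflict = not b_accepts_shared_dif(9000, 1, a_ref_tag, a_meta_tag)
```

Whenever `conflict` is true, controller B either rejects controller A's tag usage or keeps its own tags somewhere else, which is what stacking provides.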

There is no "variability" involved here. Controller A takes data in (say) 512-byte blocks and internally adds a DIF to each block before sending it to its "back-end" (i.e., controller B). Controller B accepts 512+8 byte blocks (512+DIF) and stacks an additional DIF, making the physical block size 512+8+8. The disk drives behind controller B are formatted as 520-byte blocks plus DIF, with an EXCL_Bytes count of 2.
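
The fixed sizes at each level work out as follows. This is a minimal arithmetic sketch; the helper name is made up for illustration, and the 8-byte DIF size is taken from the 512+8 figure above.

```python
DIF_SIZE = 8  # bytes per DIF, per the 512+8 figure above

def block_size(data_bytes, dif_count):
    """Size of a block once `dif_count` DIFs have been appended to it."""
    return data_bytes + DIF_SIZE * dif_count

host_to_a = block_size(512, 0)  # plain data block from the host
a_to_b = block_size(512, 1)     # controller A has added its DIF
on_disk = block_size(512, 2)    # controller B has stacked a second DIF
```

Each value is a constant for a given configuration, which is why no "variable" block size is involved.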

If controller B does not provide DIF stacking, then it would simply not be usable as a back-end device for controller A. The need to allow for DIF stacking does not become apparent until one considers such multi-level scenarios, but quickly becomes apparent when considering scenarios like the one above. Note that it is only a controller that serves as an intermediate level (and uses/restricts the DIF) that needs to provide "stacking". A simple disk drive would never need to provide such a capability.

Note that the email I submitted was purely within the context of the proposal, suggesting a better method of accomplishing something that the proposal already provides for, not an attempt to justify the proposal itself or the concept of "stacking".

-----Original Message-----
From: Evans, Mark [mailto:Mark_Evans at maxtor.com]
Sent: Thursday, June 05, 2003 6:39 AM
To: Rassbach, Walter B
Cc: t10 at t10.org
Subject: RE: Data Integrity: DIF Stacking and Suggested Mode page change


Hello Walter,

I appreciate all of the effort that you're putting into this solution.
However, I think that many, including myself, would more appreciate it if
you would put effort into developing material to share with us that would
specifically identify and quantify the issue(s?) your solution is being
designed to address, with particular focus on those issues that are not
currently addressed by other methods.  Requests for this type of
identification and justification have been stated during several T10
meetings and conference calls.  George Penokie has at least attempted to
describe some end-to-end error cases.  Though George has not yet related how
his proposal would address the cases he has described, I expect that he will
do so soon.

I am also surprised that your email recommends to continue with variable
"stacking" after you had received such negative reaction to this concept.
However, until we understand the issues, I think that it's difficult for
many of us to justify the time required to rigorously evaluate your
complicated solution.

Please feel free to call or send an email to me with any questions or
comments that you may have about this.

Regards,

Mark Evans
Maxtor Corporation
408-894-5310

 -----Original Message-----
From: 	Rassbach, Walter B [mailto:walter.b.rassbach at intel.com] 
Sent:	Wednesday, June 04, 2003 11:52 PM
To:	t10 at t10.org
Subject:	Data Integrity: DIF Stacking and Suggested Mode page change

* From the T10 Reflector (t10 at t10.org), posted by:
* "Rassbach, Walter B" <walter.b.rassbach at intel.com>
*
This note describes (as background information) the concept of "DIF
stacking" (already implicit in the Data Integrity proposals) and then
suggests a modification of the mode page controls that should simplify the
handling of this situation.

BACKGROUND

A DIF consists of 3 sub-fields: A Reference tag, a Meta tag, and a Guard
value. A simple device (e.g., a disk) will simply store these fields, as
presented by the host on a write, with each data block and return them
unchanged upon a read -- It is also expected to verify the information in
the DIF according to the appropriate control settings. The host can place
any value in the tag fields as long as it can provide those values during a
Read or has disabled checking of that tag field.

One implementation model of a RAID controller (with virtual LUNs) might use
the tag and guard fields as follows:

REF tag -- Holds the virtual LBA of the block. Note that this will not be
the same as the physical LBA for the block on the backend media.  The
controller uses this to cross-check its mapping functions. The REF tag for
parity blocks will be taken from an independent space,  differentiated
either by a special "marker" in the META tag or by using values larger than
the maximum LBA (e.g., negative values in 2's  complement form).

META tag -- Holds the virtual LUN number plus, possibly, certain special
handling flags, e.g., a flag indicating that the associated block  is a
parity block or that the block has been intentionally marked as "bad".

Guard -- A RAID controller might require the use of the checksum method of
guard calculation because that method can be used to provide a check across
the whole stripe and thus be used to close the "write hole" problem.

If the RAID controller does not present a DIF-aware image to its host(s),
the usage model for the DIF sub-fields is not a problem.  However, if the
RAID controller does present a DIF-image to the host, it may (will) have to
force its hosts to respect certain  restrictions on the DIF sub-fields. In
particular, it might force the REF Method to be 00 (so that the REF Tag is
always the virtual LBA),  it may force META Echo because it cannot guarantee
that the contents of the META tag sub-field will be preserved from write to
read, and  it may force the Guard algorithm to be a checksum since it uses
that as part of its recovery methodology.

However, the host (application) may be such that it cannot accept such
restrictions. For example, the RAID controller's "host" may be an
object-oriented controller that wants to use the META tag to hold the
"object ID (number)" and the REF tag to hold the block-in-object  number.
Or, the host may implement a log-structured file system where the REF tag is
used to hold the nominal LBA (which is seldom the  same as the
actual/physical LBA) and the META tag to hold version information.

Such applications cannot accept the restrictions that the RAID controller
places on the usage of the tags, but the RAID controller's  internal
algorithms are dependent on those restrictions. In order to allow for such
situations, a "friendly" RAID controller  implementation would provide a
method to "stack" DIF fields. It would allow its host to use the tag fields
as it desired and append its  own information internally (Note: RAID
controllers have been adding hidden fields to data blocks for years). The
implementation of this  "stacking" approach is internal to the RAID
controller, but one implementation (specifically allowed for in the Data
Integrity proposal)  would be to add an additional DIF to the block, making
the physical block (on the backend of the RAID controller) consist of the
block  itself, the host's DIF, and the hidden DIF appended by the RAID
controller. The second, hidden DIF would contain the tag values that the
RAID controller uses internally. If the Guard value in the hidden DIF is
calculated using the same method as the original DIF, with an exclusion
covering the host's DIF (i.e., an EXCL_Bytes count that is 2 larger), the
Guard value in the two DIFs will be identical. The second DIF can be easily
built and appended (probably using a hardware assist).
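
A rough sketch of that stacking step follows. Several things here are assumptions made only for illustration: the split of the 8-byte DIF into a 4-byte REF tag, 2-byte META tag, and 2-byte Guard; the byte-sum used as the guard (a placeholder, not either guard method from the proposal); and all function names.

```python
import struct

DIF_FORMAT = ">IHH"  # assumed layout: REF tag, META tag, Guard (8 bytes total)
DIF_SIZE = struct.calcsize(DIF_FORMAT)

def placeholder_guard(data):
    """Stand-in guard calculation: 16-bit sum of the covered bytes."""
    return sum(data) & 0xFFFF

def append_dif(block, ref_tag, meta_tag, excluded_tail=0):
    """Append a DIF whose guard skips the last `excluded_tail` bytes."""
    covered = block[:len(block) - excluded_tail]
    dif = struct.pack(DIF_FORMAT, ref_tag, meta_tag, placeholder_guard(covered))
    return block + dif

data = bytes(range(256)) * 2  # one 512-byte data block

# The host's DIF, using the tags however the host likes.
host_view = append_dif(data, ref_tag=5, meta_tag=42)

# The controller's hidden DIF, with the host's DIF excluded from the guard.
stacked = append_dif(host_view, ref_tag=9000, meta_tag=1, excluded_tail=DIF_SIZE)

host_guard = struct.unpack(DIF_FORMAT, stacked[512:520])[2]
hidden_guard = struct.unpack(DIF_FORMAT, stacked[520:528])[2]
```

Because the exclusion removes the host's DIF from the covered bytes, both guards are computed over the same 512 data bytes and therefore match, whatever guard method is actually used.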

A DIF-aware controller might always use "stacking", but this leads to a
larger physical block size (by 8 bytes) and a consequent reduction  in
capacity. The Data Integrity proposals are structured to allow both the host
and the controller to share a single DIF as long as there  is no conflict
over the usage of the sub-fields. The mode page implicitly indicates the
sub-field usages by the controller and which  sub-field usage changes will
require a reformatting operation to allow "DIF stacking".

Note that a device (controller) does not have to allow stacking. If it
places no requirements on the DIF sub-fields, then there is no  reason to
stack. A controller that does place requirements on one or more sub-fields
and does not allow "stacking" still may conform to  the Data Integrity
extensions but is limited and may not be usable in all contexts.

Also, note that a device/controller may use a second DIF to implement
"stacking" or it may use some other (internal) mechanism. Since the
"stacked DIF" information is not available to the host, it is purely an
internal issue.

DIF-stacking may be controlled/indicated in the form currently proposed
(indicated by the STK_xyz flags) or by an alternate method, as  proposed
below.

PROPOSED CHANGE

Change the Data Integrity mode page controls as follows:

1) Define byte 2, bit 6 of the Data Integrity mode page to be the
"Hidden_DI" bit. A change in this setting generally requires a  formatting
operation and the device will normally return Format Required sense data
until the format operation is performed. The main  exception might be a
change to the mode page that clears STOR_DIF and sets Hidden_DI, or vice
versa, since the device probably does not  require reformatting (the
physical block size probably doesn't change).

2) Eliminate the STK_META, STK_REF, STK_GRD, and DI_AVAIL flags in byte 5 of
the mode page (leaving all of byte 5 reserved).

Note that a device or controller is not required to accept all settings or
combinations of the mode page controls.

If a device/controller would formerly set the DI_AVAIL flag to indicate that
it provides some form of internal Data Integrity protection,  it would
instead set the Hidden_DI control to indicate that Data Integrity
information is kept internally. If the device or controller  always keeps
such information, it would force the Hidden_DI flag to 1 and mark it as
unchangeable. If the device or controller allows the  host to control
whether its internal Data Integrity features are enabled, it would allow the
host to alter the Hidden_DI flag (and  reformat). Note that the
device/controller is not required to allow the STOR_DIF flag to be set
active or to provide the additional CDBs  or DIF support -- Existing
devices/controllers may "back into" the Data Integrity functionality by
implementing the Data Integrity mode  page with the Hidden_DI control
handled appropriately and the STOR_DIF control held inactive.

If a device/controller would formerly have set one or more of the STK_META,
STK_REF, or STK_GRD flags, it will reject those changes to the  associated
controls, when the Hidden_DI flag is not active (and, the STOR_DIF flag is
active), with sense data that indicates that the  Hidden_DI flag must first
be activated (and the device reformatted). When both the STOR_DIF and
Hidden_DI flags are active, the  device/controller will be (internally)
"stacking" the Data Integrity information and thus has no reason to restrict
the usage of the tag  sub-fields (Note: It may still restrict the Guard
calculation method if it only implements a subset of [or a single] guard
calculation  methods).

A re-format operation is only required in cases where either the STOR_DIF or
Hidden_DI control is altered. Moreover, changing one of those controls
generally implies that a format operation is required (the only exceptions
may be cases where the STOR_DIF control is  turned off and the Hidden_DI
control turned on, or vice-versa). This should make things simpler and more
straightforward.
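
As an illustration, the rule could be written as a small predicate. The function name and the tuple representation are hypothetical, and the sketch treats the STOR_DIF/Hidden_DI swap as never needing a format, although the text above only says that it may not.

```python
def format_required(old, new):
    """old and new are (stor_dif, hidden_di) pairs of booleans."""
    if old == new:
        return False  # neither control was altered
    exactly_one_set = old in ((True, False), (False, True))
    swapped = exactly_one_set and new == (old[1], old[0])
    # Assumption: a pure swap keeps the physical block size, so no format.
    return not swapped

needs_format_enable = format_required((False, False), (True, False))
needs_format_swap = format_required((True, False), (False, True))
needs_format_stack = format_required((True, False), (True, True))
```

Under these assumptions, enabling STOR_DIF on an unprotected device and turning on stacking both require a format, while the swap does not.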

A DIF-aware disk device will probably allow either the STOR_DIF or Hidden_DI
control to be set, but not both. If one of these controls is  set, the drive
will be formatted with a DIF attached to each block. If the STOR_DIF control
is set, the drive would provide the additional  CDBs and allow the DIFs to
be accepted from and sent to the host. If the Hidden_DI control is set, the
drive would be formatted with  attached DIFs, but these DIFs would be
created on writes and checked and stripped on reads.
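
The Hidden_DI behavior of such a drive might look roughly like the sketch below. The class, the in-memory block store, the use of the LBA as the REF tag, the 4+2+2 byte split of the DIF, and the byte-sum guard are all assumptions for illustration; the proposal does not prescribe any of them.

```python
import struct

DIF_FORMAT = ">IHH"  # assumed layout: REF tag, META tag, Guard (8 bytes)

def placeholder_guard(data):
    """Stand-in guard calculation: 16-bit sum of the data bytes."""
    return sum(data) & 0xFFFF

class HiddenDiDrive:
    """Toy drive with Hidden_DI set: DIFs never cross the host interface."""

    def __init__(self):
        self._blocks = {}  # lba -> data plus internal DIF

    def write(self, lba, data):
        # Create the DIF internally on every write.
        dif = struct.pack(DIF_FORMAT, lba & 0xFFFFFFFF, 0, placeholder_guard(data))
        self._blocks[lba] = data + dif

    def read(self, lba):
        stored = self._blocks[lba]
        data, dif = stored[:-8], stored[-8:]
        ref_tag, _meta_tag, guard = struct.unpack(DIF_FORMAT, dif)
        # Check the DIF, then strip it before returning data to the host.
        if ref_tag != lba & 0xFFFFFFFF or guard != placeholder_guard(data):
            raise ValueError("DIF check failed at LBA %d" % lba)
        return data

drive = HiddenDiDrive()
drive.write(10, b"\x01" * 512)
returned = drive.read(10)
```

The host sees ordinary 512-byte blocks in both directions, while the stored block is 8 bytes larger.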

A controller that does not make use of the tag fields internally would be
similar. The only device that would allow both controls to be set would be
a controller that places restrictions on the usage of the tag fields due to
internal algorithm requirements.
*
* For T10 Reflector information, send a message with
* 'info t10' (no quotes) in the message body to majordomo at t10.org
*