Data Integrity: DIF Stacking and Suggested Mode page change
Rassbach, Walter B
walter.b.rassbach at intel.com
Wed Jun 4 23:51:39 PDT 2003
* From the T10 Reflector (t10 at t10.org), posted by:
* "Rassbach, Walter B" <walter.b.rassbach at intel.com>
This note describes (as background information) the concept of "DIF stacking" (already implicit in the Data Integrity proposals) and then suggests a modification of the mode page controls that should simplify the handling of this situation.
A DIF consists of 3 sub-fields: a Reference tag, a Meta tag, and a Guard value. A simple device (e.g., a disk) will simply store these fields, as presented by the host on a write, with each data block and return them unchanged upon a read. It is also expected to verify the information in the DIF according to the appropriate control settings. The host can place any value in the tag fields as long as it can provide those values during a Read or has disabled checking of that tag field.
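As a rough illustration of the store-and-return behaviour described above, the sketch below models a DIF kept alongside each block. The 8-byte total matches the size mentioned later in this note, but the split into a 4-byte Reference tag, a 2-byte Meta tag and a 2-byte Guard, the field order, and the names Dif and SimpleDisk are assumptions made for illustration only. Verification against the control settings is left out.

```python
import struct
from dataclasses import dataclass

# Assumed layout: 4-byte REF tag, 2-byte META tag, 2-byte Guard (big-endian).
DIF_FORMAT = ">IHH"
DIF_SIZE = struct.calcsize(DIF_FORMAT)  # 8 bytes under this assumption


@dataclass(frozen=True)
class Dif:
    ref_tag: int
    meta_tag: int
    guard: int

    def pack(self) -> bytes:
        return struct.pack(DIF_FORMAT, self.ref_tag, self.meta_tag, self.guard)

    @classmethod
    def unpack(cls, raw: bytes) -> "Dif":
        return cls(*struct.unpack(DIF_FORMAT, raw))


class SimpleDisk:
    """Hypothetical simple device: stores the host's DIF with each block."""

    def __init__(self):
        self._blocks = {}

    def write(self, lba: int, data: bytes, dif: Dif) -> None:
        # The DIF is stored exactly as presented by the host.
        self._blocks[lba] = data + dif.pack()

    def read(self, lba: int):
        # The data and the DIF are returned unchanged.
        stored = self._blocks[lba]
        return stored[:-DIF_SIZE], Dif.unpack(stored[-DIF_SIZE:])
```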
One implementation model of a RAID controller (with virtual LUNs) might use the tag and guard fields as follows:
REF tag -- Holds the virtual LBA of the block. Note that this will not be the same as the physical LBA for the block on the backend media. The controller uses this to cross-check its mapping functions. The REF tag for parity blocks will be taken from an independent space, differentiated either by a special "marker" in the META tag or by using values larger than the maximum LBA (e.g., negative values in 2's complement form).
META tag -- Holds the virtual LUN number plus, possibly, certain special handling flags, e.g., a flag indicating that the associated block is a parity block or that the block has been intentionally marked as "bad".
Guard -- A RAID controller might require the use of the checksum method of guard calculation because that method can be used to provide a check across the whole stripe and thus be used to close the "write hole" problem.
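The tag usage in this model might be sketched as follows. The bit positions chosen for the META flags, the 32-bit REF width, and every function name are hypothetical; the note only says that the META tag carries the virtual LUN plus optional flags and that parity blocks may use REF values above the maximum LBA (here, negative values in 2's complement form).

```python
REF_MASK = 0xFFFFFFFF    # assumed 32-bit REF tag
META_PARITY = 0x8000     # hypothetical flag bit: parity block
META_BAD = 0x4000        # hypothetical flag bit: block intentionally marked bad
META_LUN_MASK = 0x3FFF   # hypothetical: low 14 bits hold the virtual LUN


def data_block_tags(virtual_lun, virtual_lba, bad=False):
    """Return (REF, META) for a data block: REF is the virtual LBA."""
    meta = virtual_lun & META_LUN_MASK
    if bad:
        meta |= META_BAD
    return virtual_lba & REF_MASK, meta


def parity_block_tags(virtual_lun, parity_index):
    """Return (REF, META) for a parity block, using an independent REF space."""
    # -1, -2, ... in 2's complement, i.e. values above any real LBA.
    ref = (-(parity_index + 1)) & REF_MASK
    return ref, (virtual_lun & META_LUN_MASK) | META_PARITY


def mapping_ok(expected_lun, expected_lba, ref, meta):
    """Cross-check the controller's mapping against the tags read back."""
    return (
        not (meta & META_PARITY)
        and ref == (expected_lba & REF_MASK)
        and (meta & META_LUN_MASK) == expected_lun
    )
```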
If the RAID controller does not present a DIF-aware image to its host(s), the usage model for the DIF sub-fields is not a problem. However, if the RAID controller does present a DIF-aware image to the host, it may (will) have to force its hosts to respect certain restrictions on the DIF sub-fields. In particular, it might force the REF Method to be 00 (so that the REF Tag is always the virtual LBA), it may force META Echo because it cannot guarantee that the contents of the META tag sub-field will be preserved from write to read, and it may force the Guard algorithm to be a checksum since it uses that as part of its recovery methodology.
However, the host (application) may be such that it cannot accept such restrictions. For example, the RAID controller's "host" may be an object-oriented controller that wants to use the META tag to hold the "object ID (number)" and the REF tag to hold the block-in-object number. Or, the host may implement a log-structured file system where the REF tag is used to hold the nominal LBA (which is seldom the same as the actual/physical LBA) and the META tag to hold version information.
Such applications cannot accept the restrictions that the RAID controller places on the usage of the tags, but the RAID controller's internal algorithms are dependent on those restrictions. In order to allow for such situations, a "friendly" RAID controller implementation would provide a method to "stack" DIF fields. It would allow its host to use the tag fields as it desired and append its own information internally (Note: RAID controllers have been adding hidden fields to data blocks for years). The implementation of this "stacking" approach is internal to the RAID controller, but one implementation (specifically allowed for in the Data Integrity proposal) would be to add an additional DIF to the block, making the physical block (on the backend of the RAID controller) consist of the block itself, the host's DIF, and the hidden DIF appended by the RAID controller. The second, hidden DIF would contain the tag values that the RAID controller uses internally. If the Guard value in the hidden DIF is calculated using the same method as the original DIF, with an exclusion covering the host's DIF (i.e., an EXCL_Bytes count that is 2 larger), the Guard value in the two DIFs will be identical. The second DIF can be easily built and appended (probably using a hardware assist).
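The stacked layout described above (data, then the host's DIF, then the hidden DIF) can be sketched as below. The guard function is a stand-in 16-bit byte sum and not one of the proposal's guard methods, the 4/2/2-byte DIF layout is assumed, and the function names are hypothetical. The point of the sketch is only that excluding the host's DIF from the second guard calculation yields the same Guard value in both DIFs.

```python
import struct

DIF_FORMAT = ">IHH"  # assumed: REF (4 bytes), META (2 bytes), Guard (2 bytes)
DIF_SIZE = struct.calcsize(DIF_FORMAT)


def guard(payload: bytes) -> int:
    """Stand-in guard: 16-bit sum of bytes (not the proposal's algorithm)."""
    return sum(payload) & 0xFFFF


def host_block(data: bytes, ref: int, meta: int) -> bytes:
    """Block as written by the host: data followed by the host's DIF."""
    return data + struct.pack(DIF_FORMAT, ref, meta, guard(data))


def stack_hidden_dif(block: bytes, ctrl_ref: int, ctrl_meta: int) -> bytes:
    """Append the controller's hidden DIF behind the host's DIF."""
    covered = block[:-DIF_SIZE]  # exclusion: skip the host's DIF
    hidden = struct.pack(DIF_FORMAT, ctrl_ref, ctrl_meta, guard(covered))
    return block + hidden


def strip_hidden_dif(physical: bytes) -> bytes:
    """Recover what the host sees: data plus the host's own DIF."""
    return physical[:-DIF_SIZE]
```

With a 512-byte data block, the physical block produced by stack_hidden_dif is 528 bytes long, and the Guard sub-fields at the end of each of the two trailing DIFs hold the same value even though the tag sub-fields differ.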
A DIF-aware controller might always use "stacking", but this leads to a larger physical block size (by 8 bytes) and a consequent reduction in capacity. The Data Integrity proposals are structured to allow both the host and the controller to share a single DIF as long as there is no conflict over the usage of the sub-fields. The mode page implicitly indicates the sub-field usages by the controller and which sub-field usage changes will require a reformatting operation to allow "DIF stacking".
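The capacity cost of always stacking can be estimated with simple arithmetic. The sketch below assumes a 512-byte data block purely as an example; only the 8 bytes per DIF comes from this note, and the function names are hypothetical.

```python
DIF_SIZE = 8  # bytes per DIF, as stated in this note


def physical_block_size(data_size: int, dif_count: int) -> int:
    """Size of one block on the media with dif_count DIFs attached."""
    return data_size + dif_count * DIF_SIZE


def user_blocks(raw_capacity: int, data_size: int, dif_count: int) -> int:
    """Number of whole blocks that fit in raw_capacity bytes."""
    return raw_capacity // physical_block_size(data_size, dif_count)
```

Under that assumption a shared DIF gives 520-byte physical blocks and a stacked pair gives 528-byte blocks, so the same media holds roughly 1.5% fewer blocks when stacking.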
Note that a device (controller) does not have to allow stacking. If it places no requirements on the DIF sub-fields, then there is no reason to stack. A controller that does place requirements on one or more sub-fields and does not allow "stacking" still may conform to the Data Integrity extensions but is limited and may not be usable in all contexts.
Also, note that a device/controller may use a second DIF to implement "stacking" or it may use some other (internal) mechanism. Since the "stacked DIF" information is not available to the host, it is purely an internal issue.
DIF-stacking may be controlled/indicated in the form currently proposed (indicated by the STK_xyz flags) or by an alternate method, as proposed below.
Change the Data Integrity mode page controls as follows:
1) Define byte 2, bit 6 of the Data Integrity mode page to be the "Hidden_DI" bit. A change in this setting generally requires a formatting operation and the device will normally return Format Required sense data until the format operation is performed. The main exception might be a change to the mode page that clears STOR_DIF and sets Hidden_DI, or vice versa, since the device probably does not require reformatting (the physical block size probably doesn't change).
2) Eliminate the STK_META, STK_REF, STK_GRD, and DI_AVAIL flags in byte 5 of the mode page (leaving all of byte 5 reserved).
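The two changes above might look like this when applied to a raw mode page image. Only the position of Hidden_DI (byte 2, bit 6) and the clearing of byte 5 come from the proposal; the page length and the helper names are assumptions, and the other control bits (including STOR_DIF) are left untouched because their positions are not given here.

```python
HIDDEN_DI_BYTE = 2
HIDDEN_DI_MASK = 1 << 6  # byte 2, bit 6, as proposed above
FORMER_STK_BYTE = 5      # formerly STK_META, STK_REF, STK_GRD, DI_AVAIL


def hidden_di_active(page: bytes) -> bool:
    """Test the Hidden_DI bit in a mode page image."""
    return bool(page[HIDDEN_DI_BYTE] & HIDDEN_DI_MASK)


def with_hidden_di(page: bytes, active: bool) -> bytes:
    """Return a copy of the page with Hidden_DI set or cleared."""
    updated = bytearray(page)
    if active:
        updated[HIDDEN_DI_BYTE] |= HIDDEN_DI_MASK
    else:
        updated[HIDDEN_DI_BYTE] &= ~HIDDEN_DI_MASK & 0xFF
    updated[FORMER_STK_BYTE] = 0  # all of byte 5 is now reserved
    return bytes(updated)
```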
Note that a device or controller is not required to accept all settings or combinations of the mode page controls.
If a device/controller would formerly set the DI_AVAIL flag to indicate that it provides some form of internal Data Integrity protection, it would instead set the Hidden_DI control to indicate that Data Integrity information is kept internally. If the device or controller always keeps such information, it would force the Hidden_DI flag to 1 and mark it as unchangeable. If the device or controller allows the host to control whether its internal Data Integrity features are enabled, it would allow the host to alter the Hidden_DI flag (and reformat). Note that the device/controller is not required to allow the STOR_DIF flag to be set active or to provide the additional CDBs or DIF support -- Existing devices/controllers may "back into" the Data Integrity functionality by implementing the Data Integrity mode page with the Hidden_DI control handled appropriately and the STOR_DIF control held inactive.
If a device/controller would formerly have set one or more of the STK_META, STK_REF, or STK_GRD flags, it will instead reject changes to the associated controls while the STOR_DIF flag is active and the Hidden_DI flag is not, returning sense data that indicates that the Hidden_DI flag must first be activated (and the device reformatted). When both the STOR_DIF and Hidden_DI flags are active, the device/controller will be (internally) "stacking" the Data Integrity information and thus has no reason to restrict the usage of the tag sub-fields (Note: It may still restrict the Guard calculation method if it only implements a subset of [or a single] guard calculation methods).
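The rejection rule described above can be sketched as a small check. The Controller class, the exception standing in for the sense data, and the sub-field names are hypothetical; the actual sense data is not specified in this note. Guard method restrictions, which may remain even when stacking, are not modelled.

```python
from dataclasses import dataclass


class HiddenDiRequired(Exception):
    """Stands in for sense data saying Hidden_DI must be activated first."""


@dataclass
class Controller:
    stor_dif: bool
    hidden_di: bool
    # Sub-fields the controller uses internally (example values only).
    internal_subfields: frozenset = frozenset({"REF", "META"})

    def change_subfield_control(self, subfield: str) -> bool:
        """Accept or reject a host change to a sub-field's control."""
        sharing_one_dif = self.stor_dif and not self.hidden_di
        if sharing_one_dif and subfield in self.internal_subfields:
            raise HiddenDiRequired(
                f"activate Hidden_DI (and reformat) before changing {subfield}"
            )
        # Stacking, or no internal use of this sub-field: no restriction.
        return True
```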
A re-format operation is only required in cases where either the STOR_DIF or the Hidden_DI control is altered. Moreover, changing one of those controls generally implies that a format operation is required (the only exceptions may be cases where the STOR_DIF control is turned off and the Hidden_DI control turned on, or vice versa). This should make things simpler and more straightforward.
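The reformat rule can be written down as a small decision function. This is a sketch only: the function name and the swap_exempt parameter are hypothetical, and whether the exchange of the two controls really avoids a format is device-dependent, as noted above.

```python
def format_required(old, new, swap_exempt=True):
    """Decide whether a mode page change needs a format operation.

    old and new are (stor_dif, hidden_di) pairs of booleans.
    swap_exempt models a device whose physical block size does not
    change when exactly one control is on and the two are exchanged.
    """
    if old == new:
        return False  # neither control altered: no format needed
    old_stor, old_hidden = old
    exchanged = old_stor != old_hidden and new == (old_hidden, old_stor)
    if exchanged and swap_exempt:
        return False  # STOR_DIF off and Hidden_DI on, or vice versa
    return True
```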
A DIF-aware disk device will probably allow either the STOR_DIF or Hidden_DI control to be set, but not both. If one of these controls is set, the drive will be formatted with a DIF attached to each block. If the STOR_DIF control is set, the drive would provide the additional CDBs and allow the DIFs to be accepted from and sent to the host. If the Hidden_DI control is set, the drive would be formatted with attached DIFs, but these DIFs would be created on writes and checked and stripped on reads.
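For the Hidden_DI case, where the drive creates DIFs on writes and checks and strips them on reads, a minimal sketch might look like the following. The tag values the drive chooses (REF set to the LBA, META set to zero), the stand-in guard, the assumed 4/2/2-byte layout, and all names are assumptions for illustration.

```python
import struct

DIF_FORMAT = ">IHH"  # assumed: REF (4 bytes), META (2 bytes), Guard (2 bytes)
DIF_SIZE = struct.calcsize(DIF_FORMAT)
REF_MASK = 0xFFFFFFFF


def guard(data: bytes) -> int:
    """Stand-in guard: 16-bit sum of bytes (not the proposal's algorithm)."""
    return sum(data) & 0xFFFF


class IntegrityError(Exception):
    """Raised when a stored DIF does not match the block read back."""


class HiddenDiDisk:
    """Hypothetical drive with Hidden_DI set: the host never sees a DIF."""

    def __init__(self):
        self.media = {}

    def write(self, lba: int, data: bytes) -> None:
        # The DIF is created by the drive on the write.
        dif = struct.pack(DIF_FORMAT, lba & REF_MASK, 0, guard(data))
        self.media[lba] = data + dif

    def read(self, lba: int) -> bytes:
        # The DIF is checked and then stripped on the read.
        stored = self.media[lba]
        data, raw = stored[:-DIF_SIZE], stored[-DIF_SIZE:]
        ref, _meta, stored_guard = struct.unpack(DIF_FORMAT, raw)
        if ref != (lba & REF_MASK) or stored_guard != guard(data):
            raise IntegrityError(f"DIF check failed at LBA {lba}")
        return data
```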
A controller that does not make use of the tag fields internally would be similar. The only devices that would allow both controls to be set would be controllers that place restrictions on the usage of the tag fields due to internal algorithm requirements.