Data Integrity telecon, Oct 29 w/agenda

Kevin D Butt kdbutt at us.ibm.com
Wed Oct 29 08:54:28 PST 2003


* From the T10 Reflector (t10 at t10.org), posted by:
* Kevin D Butt <kdbutt at us.ibm.com>
*


Jim, 

I think we need to discuss end-to-end data protection and how it would
affect backup/restore operations. 

First of all, variable length commands are not just a possibility.  They
are reality.  Different block sizes are used during typical backups
(e.g. the last block of a file is usually a different length than the
rest, as well as headers and trailers).  Also, many ISVs use different
block sizes.  We have customers who also use overlength and underlength
transfers as a normal mode of operation.  I'm sure that has not been
considered.  CRC must be seeded by 1's to be able to handle variable
length blocks. 
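
To illustrate the seeding point, here is a minimal sketch and not anything taken from the proposals.  It assumes a 16-bit CRC computed MSB first with no final XOR; the polynomial 0x8BB7 and the names crc16, payload and padded are assumptions chosen only for the example.  With a zero seed, leading zero bytes leave the CRC register at zero, so a block and the same block with extra leading zeros produce the same CRC.  With an all-ones seed the two differ.

```python
# Sketch only: bitwise 16-bit CRC, MSB first, no reflection, no final XOR.
# The polynomial 0x8BB7 is an assumption chosen for illustration.
def crc16(data: bytes, seed: int, poly: int = 0x8BB7) -> int:
    crc = seed & 0xFFFF
    for byte in data:
        crc ^= byte << 8              # fold the next byte into the top of the register
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

payload = b"last block of a file"
padded = b"\x00\x00\x00\x00" + payload   # same data with four leading zero bytes

# Zero seed: the register stays 0 while the leading zeros are shifted in.
zero_seed_same = crc16(payload, 0x0000) == crc16(padded, 0x0000)
# All-ones seed: the leading zeros change the register, so the CRCs differ.
ones_seed_same = crc16(payload, 0xFFFF) == crc16(padded, 0xFFFF)
print(zero_seed_same, ones_seed_same)    # True False
```

The length change goes unnoticed only in the zero-seed case, which is the case that matters when block lengths vary.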

Keep in mind, however, that the current data protection proposals ignore
tape altogether.  In fact, the authors indicated that they took a look
at Extended Copy and determined it was difficult to make work and they
weren't interested in it so they decided to ignore it and let anybody
interested in Extended Copy worry about it.  I have an action from the
rest of the tapeheads to express our concerns and try to bring an
understanding to the tape world as well as to the disk world of how this
data protection will affect backup/restore operations and how
backup/restore operations will affect end-to-end data protection.   

Included herein is a summary of tape issues and pieces of some of the
conversations I've had with tape (that's me + the tape people present at
the last ADI telecon), ISV people from one ISV (guess which one), and
disk people. 

Assumptions: 
a) One of the concerns was what happened when a disk crashed and a
defrag occurred and how that affects the LBA.  I've been informed that the
defrag is really a file system issue and as such is an operating system
concept that is outside the disk drive.  So this will have no bearing on
LBA's. - Is this also true for RAID? 
b) Backup/Restore software currently does the equivalent of a meta-data
transform today.  The software reads the data and the blocks to tape are
different sizes than blocks read from disk.  To use disk block sizes
would severely degrade performance. 
c) Software today uses two different types of operations for
backup/restore.  They are 1) request a file from the OS, and 2) Image
backup/restore where they use the actual LBA's.  These LBA's can be
restored to locations different from those they were read from.
Some examples are restoring to a disk of a different size than the one
that was backed up, or the OS modifying its filesystem. 

These commands are only SBC-2 and any talk of tape commands would need
to be deferred to proposals to be brought in for inclusion in SSC-3 that
would allow tape to play in the same arena.  The issue I am trying to
focus on is making sure that: 
1) a legacy backup/restore to tape can still occur (which I think it can,
with loss of protection across tape), 
2) a legacy backup/restore to tape doesn't break the scheme, so that when
the application client reads the data from the restored disk with the new
data protection it doesn't call the data bad, and 
3) the Extended Copy command can still be used. 

Hopefully I can clear up some confusion.  I can envision three types of
tape backup: 
<<  ===> Is this section between '<<' and '>>' true? <==== 
All of my notes assume and RELY on: 
1) The Application Tag is unused, 
2) The Reference Tag (the one associated with the LBA) is: 

a.	Only associated to the LBA by the first Write command of a file.
b.	Only associated to the LBA by the first Read command of a file. 
c.	The application client does not compare the Reference Tag of the
Read command back to the Reference Tag of the Write command. 

During a backup/restore operation there is potential that the LBA's used
on the Restore will be different from those used on the Backup.  Indeed,
this is the crux of the concerns related to interoperability issues with
Tape or any other Backup/Restore device.  If the Reference to the LBA is
done in such a manner as to tie the data to the same LBA that was used
on the initial Write, then this will not survive through a
Backup/Restore operation.  Indeed, if any one of the above assumptions is
not reality, then I see no guarantee that data could survive through a
Backup/Restore operation. 
>> 
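
The concern in the section above can be made concrete with a small sketch.  Everything in it is an assumption for illustration: the eight check bytes are modeled as a 16-bit guard CRC, a 16-bit Application Tag and a 32-bit Reference Tag holding the low bits of the LBA, and Python's binascii.crc_hqx stands in for whatever CRC the proposals settle on.  The names CheckBytes, generate and verify are hypothetical.

```python
import binascii
from typing import NamedTuple

class CheckBytes(NamedTuple):
    guard: int     # 16-bit CRC over the block data (stand-in CRC)
    app_tag: int   # left unused, as assumption 1 above requires
    ref_tag: int   # low 32 bits of the LBA the block was written to

def generate(data: bytes, lba: int) -> CheckBytes:
    """Build check bytes for a block written at the given LBA."""
    return CheckBytes(binascii.crc_hqx(data, 0), 0, lba & 0xFFFFFFFF)

def verify(data: bytes, check: CheckBytes, lba: int) -> bool:
    """Check the guard and compare the Reference Tag with the LBA being read."""
    return (check.guard == binascii.crc_hqx(data, 0)
            and check.ref_tag == (lba & 0xFFFFFFFF))

data = bytes(512)
written = generate(data, lba=1000)

same_lba_ok = verify(data, written, lba=1000)        # read back in place
moved_ok = verify(data, written, lba=5000)           # restored elsewhere, old check bytes
regenerated_ok = verify(data, generate(data, lba=5000), lba=5000)
print(same_lba_ok, moved_ok, regenerated_ok)         # True False True
```

If the check bytes travel with the data unchanged, a restore to a different LBA fails the Reference Tag comparison even though the data is intact.  Regenerating the tag on restore avoids that, at the cost of a window in which the data is unprotected.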

<<Mode 1 would nominally not have protection to tape.  However,
proprietary data protection schemes that may be available from some tape
vendors may be used to get near end-to-end data protection, but this would
be exposed at the step that converts the disk protection scheme
into the tape protection scheme>> 
1. Mode 1 - where only the block data (typically 512 bytes) is written
to and read from tape.  The software which reads and writes the disk has
to know the disk block size and LBA, and can therefore check the CRC and
Reference Tag  on disk reads and generate CRC and Tag on disk writes.
This is true as long as the Reference Tag Seed is the existing LBA in
today's CDB.  This is one reason IBM does not want a new Reference Tag
Seed in a new 32 byte CDB. 

The issue here is the Application Tag, which would be lost, but that is
true for legacy disk I/O operations as well. 
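
A minimal sketch of Mode 1 follows, under stated assumptions.  The disk is modeled as a dictionary from LBA to (data, check bytes), the check bytes as a (guard, application tag, reference tag) tuple, and binascii.crc_hqx is a stand-in CRC.  The function names backup_mode1 and restore_mode1 are hypothetical.  The software verifies and strips the check bytes on disk reads, writes only the block data to tape as one larger tape block, and regenerates the check bytes from the LBA of each disk write on restore.

```python
import binascii

BLOCK = 512  # assumed disk logical block size

def make_check(data: bytes, lba: int) -> tuple:
    # (guard CRC, application tag, reference tag); the application tag is lost
    return (binascii.crc_hqx(data, 0), 0, lba & 0xFFFFFFFF)

def check_ok(data: bytes, check: tuple, lba: int) -> bool:
    guard, _app, ref = check
    return guard == binascii.crc_hqx(data, 0) and ref == (lba & 0xFFFFFFFF)

def backup_mode1(disk: dict, start_lba: int, count: int) -> bytes:
    """Verify each disk block, then pack only the block data into one tape block."""
    tape_block = bytearray()
    for lba in range(start_lba, start_lba + count):
        data, check = disk[lba]
        if not check_ok(data, check, lba):
            raise ValueError(f"check bytes failed at LBA {lba}")
        tape_block += data
    return bytes(tape_block)

def restore_mode1(tape_block: bytes, disk: dict, start_lba: int) -> None:
    """Split the tape block into disk blocks and regenerate check bytes per LBA."""
    if len(tape_block) % BLOCK:
        raise ValueError("tape block is not a whole number of disk blocks")
    for offset in range(0, len(tape_block), BLOCK):
        lba = start_lba + offset // BLOCK
        data = tape_block[offset:offset + BLOCK]
        disk[lba] = (data, make_check(data, lba))

source = {}
for lba in range(100, 104):
    block = bytes([lba & 0xFF]) * BLOCK
    source[lba] = (block, make_check(block, lba))

tape = backup_mode1(source, 100, 4)      # 4 x 512 bytes, no check bytes on tape
target = {}
restore_mode1(tape, target, 9000)        # restore to different LBAs
print(len(tape), all(check_ok(*target[l], l) for l in range(9000, 9004)))
```

The restore succeeds at the new LBAs only because the Reference Tag is derived from the LBA already present in the write command.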

<<Mode 2 would nominally not have protection to tape.  However,
proprietary data protection schemes that may be available from some tape
vendors may be used to ensure full end-to-end data protection>> 
2. Mode 2 - where both the block data (typically 512 bytes) plus the eight
check bytes are written to and read from tape.  The software (as with
mode 1) could check the CRC and Tag on disk reads.  The data block plus
check bytes could be written to and read from tape (as raw data).  The
disk controller could check the CRC and Tag on the disk writes. 

<<Mode 3 would have to wait for additions to SSC-3 and may need
modifications to the concept listed>> 
3. Mode 3 - where both the block data (typically 512 bytes) plus the eight
check bytes are written to and read from tape and also checked by tape.
The software (as with mode 1) could check
the CRC and Tag on disk reads.  The data block plus check bytes could be
written to and read from tape (as with mode 2).  The disk controller (as
with mode 2) could check the CRC and Tag on the disk writes. 

In order for tape to check the CRC, tape would need to know the disk
logical block size.  This would allow the SSC streaming commands to be
used to transfer multiple disk blocks as a single tape block and
maintain a high data rate.  In order for tape to check the Tag, tape
would need to know the disk LBA.  The software which reads and writes
the disk has to know the disk LBA, but a new mechanism would be needed
to pass this information to tape.  This would be a very complex
operation of checking data in the middle of a block and I believe there
would be resistance to this as well as possible technical limitations. 
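
To show what checking "in the middle of a block" would involve, here is a rough sketch, not a proposal.  It assumes a tape block is a run of [disk data + eight check bytes] sets, with the check bytes laid out as a 16-bit guard, 16-bit Application Tag and 32-bit Reference Tag in big-endian order, and binascii.crc_hqx as a stand-in CRC.  The names pack_set and tape_check are hypothetical.  Note that the checker cannot run without being told both the disk block size and the first LBA.

```python
import binascii
import struct

CHECK_LEN = 8  # eight check bytes per disk block

def pack_set(data: bytes, lba: int) -> bytes:
    """One [data + check bytes] set as it would sit inside a tape block."""
    return data + struct.pack(">HHI", binascii.crc_hqx(data, 0), 0, lba & 0xFFFFFFFF)

def tape_check(tape_block: bytes, disk_block_size: int, first_lba: int) -> list:
    """Return the indexes of the sets whose guard or Reference Tag fails."""
    stride = disk_block_size + CHECK_LEN
    if len(tape_block) % stride:
        raise ValueError("tape block is not a whole number of sets")
    bad = []
    for n, offset in enumerate(range(0, len(tape_block), stride)):
        data = tape_block[offset:offset + disk_block_size]
        guard, _app, ref = struct.unpack(
            ">HHI", tape_block[offset + disk_block_size:offset + stride])
        expected_ref = (first_lba + n) & 0xFFFFFFFF
        if guard != binascii.crc_hqx(data, 0) or ref != expected_ref:
            bad.append(n)
    return bad

blocks = [bytes([i]) * 512 for i in range(3)]
tape_block = b"".join(pack_set(b, 40 + i) for i, b in enumerate(blocks))

clean = tape_check(tape_block, 512, 40)          # correct size and LBA
wrong_lba = tape_check(tape_block, 512, 41)      # tape told the wrong first LBA
damaged = bytearray(tape_block)
damaged[520 + 10] ^= 0x01                        # flip one bit in the second set
corrupt = tape_check(bytes(damaged), 512, 40)
print(clean, wrong_lba, corrupt)                 # [] [0, 1, 2] [1]
```

Every set fails if the tape is given the wrong first LBA, which is why a new mechanism to pass that information would be needed.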

I think the software (or tape device driver) would need to create tape
specific guard data around the tape block.  The user data, from the tape
perspective would be the disk user data plus the disk meta-data.  I
don't know how having multiple [data + CRC] sets would work in
validating the tape block CRC since the value would go to zero several
times throughout a single block.
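
The "goes to zero" effect can be seen in a short sketch.  It assumes a zero-seeded CRC with no final XOR, for which binascii.crc_hqx is used as a stand-in, and a CRC appended big-endian after each data set.  The name running_crc_trace is hypothetical.  Running such a CRC across data followed by its own CRC leaves a zero remainder, so a tape block CRC computed the same way returns to zero at the end of every set.

```python
import binascii

def running_crc_trace(data_sets: list) -> list:
    """Run one CRC across consecutive [data + CRC] sets; record it after each set."""
    state = 0                                   # zero seed for the tape block CRC
    trace = []
    for data in data_sets:
        inner = binascii.crc_hqx(data, 0)       # per-set CRC, also zero seeded
        one_set = data + inner.to_bytes(2, "big")
        state = binascii.crc_hqx(one_set, state)
        trace.append(state)
    return trace

sets = [b"A" * 512, b"B" * 512, b"C" * 512]
trace = running_crc_trace(sets)
print(trace)                                    # [0, 0, 0]
```

Because the running value is zero after two sets and also after three, a CRC built this way would not notice a whole set going missing.  That is one reason the tape specific guard data would probably need a different seed or construction.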

Kevin D. Butt
Fibre Channel & SCSI Architect, IBM Tape Microcode, 
6TYA, 9000 S. Rita Rd., Tucson, AZ  85744
Tie-line 321; Office: 520-799-5280, Lab: 799-5751, Fax: 799-4138, Email:
kdbutt at us.ibm.com 




		Jim.Coomes at seagate.com 
Sent by: owner-t10 at t10.org 


	10/28/2003 05:28 PM 
        
        To:        "T10, Reflector" <T10 at t10.org> 
        cc:         
        Subject:        Data Integrity telecon,  Oct 29 w/agenda 





	* From the T10 Reflector (t10 at t10.org), posted by:
* Jim.Coomes at seagate.com
*

Below are details for a conference call to review updated proposals for
end-to-end data checking.

Wednesday, Oct 29, 2003
2:00-4:00pm EST / 1:00-3:00pm CST / 12:00-2:00pm MST / 11:00am-1:00pm PST

Teleconference Info:
Toll-Free: 1-866-828-0531
Caller Pd: 1-309-229-0103
Participant:  4022856
Host (Jim Coomes/Seagate)

WebEx Info:
Meeting Name:  Data Integrity Proposal
https://seagate.webex.com/seagate
Host:  Jim Coomes
Password: integrity

Proposed Agenda:

03-307r3 - 32 Byte Commands for End-to-End Data Protection
03-360r1 - End-to-End Data Protection Interoperability with Legacy Host
Impact of enable modify data pointers (EMDP) bit on End to End Data
Protection
03-365r0 - SPC-3; SBC-2; End-to-End Data Protection

If you have any other suggested topics for the agenda please
either e-mail them to me or bring them up at the start of the call.








*
* For T10 Reflector information, send a message with
* 'info t10' (no quotes) in the message body to majordomo at t10.org








