SSC-2 Note 48

BANTHER,MICHAEL (HP-UnitedKingdom,ex2) michael_banther at hp.com
Fri Apr 5 07:12:33 PST 2002


* From the T10 Reflector (t10 at t10.org), posted by:
* "BANTHER,MICHAEL (HP-UnitedKingdom,ex2)" <michael_banther at hp.com>
*

Joe,
 
Unless the standard forces the drive to return the data without
decompression, I don't see the value in having the drive establish the
logical position following the un-decompressable logical block.  The
application can't read anything until the drive reaches a position
within the data where it can begin to recognize logical elements again.
This same statement with slight modification applies to un-readable
areas on the media as well.
 
I believe that, after encountering un-decompressable data, the drive
should establish the logical position at the first position on the EOP
side of the un-decompressable data when reading forward (or the BOP side
of the un-decompressable data when reading in reverse) where its
decompression algorithm allows access.  Whether the drive's compression
scheme places a single logical element in a compression unit or multiple
logical elements into a single compression unit, the first position
allowing access equates to the beginning of the next (previous if
reading reverse) compression unit.  The standard should remain silent on
the quantitative relationship between the logical block containing the
un-decompressable data and the position established after encountering
the un-decompressable data.  However, I do think that a note to the
effect that 'an application client should use the Read Position command
after receiving CANNOT DECOMPRESS USING DECLARED ALGORITHM' is in order.
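
Purely as an illustration (not as proposed wording for the standard), the
application-client handling might look like the rough Python sketch below.
The 'dev' object and its read() and read_position() methods are assumed
stand-ins for whatever pass-through interface the application uses; only the
sense values (sense key 03h, ASC 11h, ASCQ 0Eh) are taken from the proposed
note text quoted later in this message.

    # Hypothetical client-side handling of CANNOT DECOMPRESS USING DECLARED
    # ALGORITHM.  The 'dev' wrapper is an assumption, not a real library API.
    SENSE_KEY_MEDIUM_ERROR = 0x03
    ASC_ASCQ_CANNOT_DECOMPRESS = (0x11, 0x0E)

    class CheckCondition(Exception):
        """Raised by the assumed wrapper when a command ends in CHECK CONDITION."""
        def __init__(self, sense_key, asc, ascq, info=None, ili=False):
            self.sense_key, self.asc, self.ascq = sense_key, asc, ascq
            self.info, self.ili = info, ili

    def read_next_element(dev, length):
        """Read one logical element; after a decompression failure, use READ
        POSITION to learn where the drive has established the logical position."""
        try:
            return dev.read(length)
        except CheckCondition as cc:
            if (cc.sense_key == SENSE_KEY_MEDIUM_ERROR and
                    (cc.asc, cc.ascq) == ASC_ASCQ_CANNOT_DECOMPRESS):
                pos = dev.read_position()   # where did the drive leave us?
                # The client now knows the established position and can decide
                # how to resynchronize (skip forward, space to a filemark, ...).
                return None                 # caller treats None as 'element lost'
            raise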
 
Cheers,
Michael
 
 
 -----Original Message-----
From: JoeBre at exabyte.com [mailto:JoeBre at exabyte.com]
Sent: Thursday, March 07, 2002 6:57 PM
To: kdbutt at us.ibm.com; Dennis.W.Painter at seagate.com
Cc: t10 at t10.org
Subject: RE: SSC-2 Note 48





> -----Original Message----- 
> From: Kevin D Butt [mailto:kdbutt at us.ibm.com] 
> Sent: Thursday, March 07, 2002 10:00 AM 
> To: JoeBre at exabyte.com; Dennis.W.Painter at seagate.com 
> Cc: t10 at t10.org 
> Subject: SSC-2 Note 48 
> 
> 
> * From the T10 Reflector (t10 at t10.org), posted by: 
> * "Kevin D Butt" <kdbutt at us.ibm.com> 
> * 
> Joe, 
>       I agree with your concern about Note 48, as it can lead to data 
> integrity problems when used with certain formats.  For 
> example, let's say 
> that there is an undetected compression failure during a 
> write such that an 
> illegal (or incorrect) compression codeword is generated.  
> Let's say the 
> part of the compressed data stream (CDS) which encompasses 
> the bad codeword 
> is after the access point, but before the end of, dataset N.  
> In that case 
> we can decompress dataset N-1, even a record spanning out of 
> dataset N-1 
> into dataset N would be decompressible, because by the 
> definition of an 
> access point that would occur before it. Similarly we could decompress 
> starting at an access point in dataset N+1.  The one thing we 
> would almost 
> certainly fail at is decompressing from the access point in 
> dataset N to 
> the access point in dataset N+1, and we would typically get a 
> CRC failure. 
> There might be many records between these two points. Thus we 
> have some 
> length of CDS (e.g. as little as 8 bytes, or as many as 
> ~806000, if the 
> access points are in sequential datasets) which we cannot decompress 
> (properly).  The problem is the record boundaries are 
> embedded into the CDS 
> and an illegal codeword would typically make it impossible for us to 
> discern where those record boundaries are.  What we do know 
> is how many 
> records (and filemarks) are supposed to be contained in that 
> length of CDS. 
> As an example we might have 600000 bytes of CDS and know that it 
> corresponds to 3 records.  For the sake of argument let's say that the 
> illegal codeword is in the second of these records. In that 
> case we could 
> even decompress and give to the host, without error, the 
> first record. But 
> we would not typically know how many bytes of the CDS are 
> associated with 
> the second record, and consequently we don't know how many 
> are associated 
> with the third record.  Let's consider this case in the 
> context of your 
> proposed rewording of Note 48: 
> 
> 
>    ========================================= 
> 
> 
>    When compressed data is encountered on the medium that the 
> device is 
>    unable to decompress, the device should treat each logical 
> block of the 
>    data similarly to a block that cannot be read due to a 
> permanent read 
>    media error, i.e.: transfer all data to the initiator up to the 
>    beginning of the first non-decompressible block; set a contingent 
>    allegiance indicating the error (0x03, 0x11, 0x0E - CANNOT 
> DECOMPRESS 
>    USING DECLARED ALGORITHM?); set the VALID, ILI, and 
> INFORMATION fields 
>    according to the original (uncompressed) state of the 
> block; and set the 
>    current logical position to the following logical block, whether 
>    decompressible or not. 
> 
> 
>    This will allow the initiator to issue subsequent reads to 
> the device, 
>    each failing, until the non-decompressible region is exited. This 
>    mechanism is directly analogous to the method the 
> initiator may use to 
>    'step' its way through a damaged area of tape, (sequence of logical 
>    blocks with media errors). 
> 
> 
>    ========================================= 
> 
> 
> I agree with your broad strokes -- that is the application 
> should get error 
> codes for the second and third records, but he should be able 
> to traverse 
> these and then continue on reading the next record (which 
> would correspond 
> to the first of data set N+1).  As far as the specifics, 
> given that we were 
> able to decompress the first (let's say this corresponds to 
> 198000 bytes of 
> CDS), how do we apportion the remaining 402000 bytes to the 
> remaining two 
> records so that we can give an ILI? 

I see your point. In the general case, the device may have no way to
report what length each non-decompressible logical element would be in
its uncompressed state. In this case, perhaps we need to report to the
initiator that the length is indeterminate. However, this may also be
the case for permanent read errors. My desire is to allow the initiator
to employ the same defect skipping algorithm.
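
To make the 'stepping' behaviour concrete, a rough sketch (re-using the
assumed CheckCondition and dev wrapper from the earlier fragment in this
message, so again not a real API) might be:

    def step_through_bad_region(dev, max_len, max_skips=1000):
        """Defect-skipping sketch: under the proposed Note 48 wording each
        failed READ still advances the logical position by one element, so
        the initiator simply keeps reading until data comes back."""
        skipped = []
        for _ in range(max_skips):
            try:
                data = dev.read(max_len)    # succeeds once the bad region is exited
                return data, skipped
            except CheckCondition as cc:
                # The uncompressed length of each lost element may be reported
                # via VALID/ILI/INFORMATION, or may be indeterminate as above.
                skipped.append(cc.info if cc.info is not None else 'indeterminate')
        raise RuntimeError('bad region longer than max_skips elements')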

> The key point of the whole Note, from my perspective, is "and set the 
> current logical position to the following logical block, whether 
> decompressible or not".  On this I agree with you completely, 
> it is the 
> only way we would be able to allow an application to traverse an 
> incompressible region without data integrity issues. 
> 
> 
> On your other point, some people think of  'logical block' 
> and 'record' as 
> fully interchangeable, but there are some subtle differences. 
>  A 'logical 
> block' might refer to either a record or a filemark. 

I would prefer to banish the term 'record' altogether. A reading of
SSC-2 section 3.1.x will show that 'record' has no defined meaning.
Consequently, using this term to discuss behaviors leads the
discussion participants to 'talk past each other'. To illustrate this
point, a logical block *cannot* be a mark, as per 3.1.32. Rather, the
object that can be either a logical block or a mark is actually a
logical element, as per 3.1.34. It gets confusing, as logical blocks are
not the objects that have logical block addresses. Unfortunately,
logical *elements* have logical block addresses. This seems to be an
unfortunate consequence of earlier specifications playing fast and loose
with definitions of objects. One way or another, we need to agree on the
terminology before we can make significant headway on defining
behaviors.
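
Read that way, the relationship between the terms could be pictured with a
small illustrative type; this is only one possible reading of 3.1.32 and
3.1.34, not normative text:

    from dataclasses import dataclass
    from enum import Enum
    from typing import Optional

    class MarkType(Enum):
        FILEMARK = 'filemark'
        SETMARK = 'setmark'

    @dataclass
    class LogicalElement:
        """A logical element is either a logical block or a mark, and it is
        the element (not the block) that carries the confusingly named
        logical block address."""
        logical_block_address: int
        data: Optional[bytes] = None     # set when the element is a logical block
        mark: Optional[MarkType] = None  # set when the element is a mark

        def is_logical_block(self):
            return self.data is not None and self.mark is None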

> On the issue of filemarks, let's say the 402000 bytes 
> discussed above were 
> associated with two records and a filemark (instead of just 
> two records as 
> discussed above).  In this case we cannot know whether the 
> filemark was the 
> first, second, or third entity in the 402000 bytes (it could 
> only be the 
> third in this specific case if the access point in DS N+1 was 
> at zero, but 
> that is a side issue).   The question is whether even with 
> the rewording of 
> Note 48 we don't still have a data integrity issue.  As an 
> example it might 
> be the intention of the application to read each block until 
> a filemark is 
> encountered and then grab the next 20 records.  In that case 
> it makes a big 
> difference if the filemark is the first, second, or third 
> entity in the 
> incompressible area. Even if the application's intent were 
> more generic and 
> he would grab all of the records after the filemark (e.g. 
> until the next 
> filemark), it cannot know whether it is missing some 
> incompressible records 
> (e.g. if the filemark were the first or second entity), or 
> if in fact it 
> will get everything it wants (e.g. filemark is third entity). 

I see the point. Perhaps the best possible outcome is to ensure that the
application vendors understand that, in this case, they will need to
discard all logical blocks until they reach a decompressible filemark.
If the filemarks are used to delineate 'files' (as the name may
suggest), then this may not be much of a burden, as the 'file' may be
meaningless if not complete. I understand that there is no requirement
to use filemarks to delineate 'files', however.
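
If application vendors adopted that policy, the recovery step might be as
simple as the sketch below; dev.space_filemarks() is an assumed wrapper
around the SPACE command, not an existing interface:

    def discard_until_filemark(dev):
        """Give up on the current 'file' and restart at the next filemark the
        drive can actually find.  Filemarks buried inside the
        non-decompressible region are invisible to the drive, which is the
        integrity concern discussed above."""
        dev.space_filemarks(1)   # SPACE, filemarks, count = 1
        # Position is now on the EOP side of the next decompressible filemark;
        # subsequent READs return the records of the following 'file'.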

  
> Also, is it possible to post the proper error indicators (set 
> a contingent 
> allegiance indicating the error (0x03, 0x11, 0x0E - CANNOT 
> DECOMPRESS USING 
> DECLARED ALGORITHM?); set the VALID, ILI, and INFORMATION 
> fields according 
> to the original (uncompressed) state of the block) without 
> knowing whether 
> the logical entity was a record or a filemark? 

Perhaps we do not have enough information. Perhaps the best we can hope
for is to find some way to specify to the initiator the number of (each
of) logical blocks, filemarks, and setmarks that we were unable to
decompress. I still feel that the best thing to do is to let the
initiator invoke the same defect skipping as for unreadable regions of
media with non-compressed data (be it logical block or mark). 

This has suddenly got me thinking about such unreadable regions of
media. In this case, how is one to determine the relative positioning of
logical blocks and marks in a region of damaged media? I am unsure if
our formats allow us to determine this. I guess this is a roundabout way
of saying that, even for uncompressed data, we may have the same
situation you pointed out above.

> 
> Kevin D. Butt 
> IBM Tape Products 
> SCSI and Fibre Channel Microcode Development 
> 6TYA, 9032 S. Rita Rd. 
> Tucson, AZ  85744 
> Office:  (520)799-5280, Tie-line 321 
> Lab: (520)799-2869 
> Fax:  (520)799-4062 
> Email:  kdbutt at us.ibm.com 
> 
> * 
> * For T10 Reflector information, send a message with 
> * 'info t10' (no quotes) in the message body to majordomo at t10.org 
> 

