Hewlett-Packard
Computer Peripherals Bristol


Data Compression Control Proposal for Sequential Access devices.

Document : X3T9.2/90-119 R3
Date     : 25th March 1991
Author   : Steve Krupa, HP CPB

Hewlett Packard,
Computer Peripherals Bristol,
Filton Rd,
Stoke Gifford,
Bristol, BS12 6QZ
UK

Tel : + 44 272 799910
Fax : + 44 272 236091


INTRODUCTION
**************

This is the third revision of the document X3T9.2/90-119.
The initial document was dated 16th July 1990.
Revision 1 was dated 29th October 1990.
Revision 2 was dated 14th December 1990.

This document is a proposal for a SCSI MODE SENSE/SELECT page to control and report on the operation of Data Compression in a Sequential access device. It is intended for, but not necessarily limited to, support of devices which make use of lossless compression algorithms which are based on substitution; e.g. those of the Lempel-Ziv family.

There are a number of issues addressed by this document with regard to data compression.

a) How does a host use the available functionality in SCSI-2 to control data compression ?

b) What additional features could a device which supports data compression offer to the host and how should these be made available to the host ?

c) If a device which does not support data compression encounters a piece of media containing compressed data, what action should it take ?

d) What form does the host-device interface take in order to allow the host to perform software decompression if required ? This is an issue for both current functionality in SCSI-2 and for any additional feature set.

Document Structure :

PARTS 1 and 2 are INFORMATIONAL ONLY.
PART 3 is intended for inclusion in the draft SCSI-3 document.
Part 1 : Describes the current level of support for DC control in SCSI-2.
Part 2 : Describes how a host interacts with a DDS device using the Data Compression Mode Page with particular emphasis on software decompression.
Part 3 : Describes the format of the Data Compression Mode Page.


PART 1
**********

DATA COMPRESSION CONTROL using DEVICE CONFIGURATION MODE PAGE
*****************************************************************

This section describes the support available in SCSI-2 for Data Compression control.

1.1 DC support in X3T9.2 SCSI-2 Rev 10
====================================

The only support currently available in the SCSI set is a one-byte field, the SDCA field (byte 14), in the Device Configuration Page (page 10h). This page is specific to Sequential Access devices. The use of this field is in many ways vendor-specific - the QIC manufacturers have reached a separate agreement on its meaning.

1.1.2 X3T9.2 definition
-------------------

Byte value     Description
-------------------------------------------------------------
00h            Disable Compression
01h            Select target's default compression algorithm
02h - 7Fh      Select compression algorithm #
80h - FFh      Vendor Specific

1.1.3 QIC definition
----------------

-------------------------------------------------
|  7  |  6  |  5  |  4  |  3  |  2  |  1  |  0  |
-------------------------------------------------
| DC  | On  |     QIC-Approved DC Algorithm     |
| on  |Drive|                                   |
-------------------------------------------------

Byte Value     Description
------------------------------------------------------------
00h            Disable Compression
01h - 7Fh      Invalid
10xxxxxxb      DC on - host performs algorithm # xxxxxxb
11xxxxxxb      DC on - drive performs algorithm # xxxxxxb

As can be seen from the QIC definition, QIC chooses to ignore the X3T9.2 definition of byte values 01h-7Fh.

Because one byte is not enough to fulfill the functional requirements for data compression control, the SDCA field will not be supported.
The justification for this is :

a) Compression algorithm identifiers are 32 bits wide.

b) The complexity of allowing the host system to retrieve compressed data which the device cannot decompress requires more than 1 byte.

c) Most drivers currently available set the SDCA field to 0 and may therefore inadvertently disable compression.


PART 2
********

APPLICATION NOTE
******************

2.1 SCSI Protocol Issues
======================

On a typical DDS-DC tape, two different data item types may be encountered. A data item is either :

a) An uncompressed record

b) An entity compressed using algorithm L

Entities are written by DDS drives which support on-board data compression. An entity is made up of a number of same-sized records compressed using the device's compression algorithm and prefixed by an entity header which is an uncompressed descriptor containing information about the data within the entity.

      Entity            Entity            Record
        |                 |                 |
        V                 V                 V
-------------------------------------------------------
|                 |                 |                 |
-------------------------------------------------------
      Type N            Type M

A device which supports compression algorithm N can decompress entities of type N when it encounters them on the media, and return the decompressed data to the host transparently. It can also return uncompressed records to the host as it finds them on the media.

If such a device encounters an entity of type M (written by a different device using a different compression algorithm) then it doesn't generally know how to decompress it. Similarly, if a device which doesn't support data compression at all encounters an entity on the media then it cannot decompress it.

Some hosts may support software decompression, where they themselves are capable of decompressing entities. This requires that the device be able to return a compressed entity to the host. The host must also be aware that the data it is receiving is COMPRESSED data and not a normal uncompressed record.
Using the Data Compression Mode page, the host has a choice of 2 different methods for handling compressed data which the device is unable to decompress. These 2 methods are selected using the RED field.

If the host wishes to be able to read any compressed data on the media which the device is unable to decompress, and at the same time it wishes to minimise the number of CHECK CONDITIONS it receives from the device, then it will set the RED field to 1. This will indicate to the device that it should only report DECOMPRESSION EXCEPTION CHECK CONDITIONS at format boundaries; that is, where the type of data it will return to the host changes between uncompressed and compressed data, or where the compressed data that it will return to the host has been processed using a different algorithm from that of the data which was returned before the DECOMPRESSION EXCEPTION. In this mode the host must keep track of whether it is receiving uncompressed or compressed data in response to READ commands. It is important for the host to be aware that any OTHER media-access commands sent to the device will mean that subsequent READ commands will NOT report a DECOMPRESSION EXCEPTION CHECK CONDITION if the device encounters uncompressed data, but WILL report such a CHECK CONDITION upon first encountering compressed data which it cannot decompress. A media-access command is defined as any command which causes logical tape motion.

If the host is not concerned about the number of DECOMPRESSION EXCEPTION CHECK CONDITIONS it receives then it will set the RED field to 0. In this mode, the host does not need to keep track of whether it is receiving uncompressed or compressed data in response to a READ command. The device will issue a DECOMPRESSION EXCEPTION CHECK CONDITION in response to any READ command which encounters compressed data which it cannot decompress.

Note that in both modes, progress is made along the tape towards EOP.
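The two reporting modes above can be sketched as a small model. This is only an illustration under stated assumptions: the class name RedTracker, its methods, and the ("plain"/"raw") labels are hypothetical and do not appear in this proposal. Data items are represented by their algorithm identifier, with 0 standing for an uncompressed record. A device with decompression disabled can be modelled by passing an empty set of supported algorithms.

```python
UNCOMPRESSED = 0  # algorithm identifier 00h identifies uncompressed data


class RedTracker:
    """Hypothetical model of when a READ ends in a DECOMPRESSION EXCEPTION."""

    def __init__(self, red, supported_algorithms):
        if red not in (0, 1):
            raise ValueError("this sketch only models RED values 0 and 1")
        self.red = red
        self.supported = frozenset(supported_algorithms)
        self.reset()

    def reset(self):
        """Return to the initial state (start-up, or any media-access
        command other than READ): directly usable data is expected."""
        self.expected = ("plain", UNCOMPRESSED)

    def _host_view(self, algorithm):
        """Describe what the host receives for one data item on the tape."""
        if algorithm == UNCOMPRESSED or algorithm in self.supported:
            # A record, or an entity the device decompressed transparently.
            return ("plain", UNCOMPRESSED)
        # An entity returned to the host still compressed.
        return ("raw", algorithm)

    def read(self, algorithm):
        """Return True if a READ of this data item reports the exception."""
        view = self._host_view(algorithm)
        if self.red == 0:
            # RED = 0: every item the device cannot decompress is reported.
            return view[0] == "raw"
        # RED = 1: report only when the kind of data returned changes.
        changed = view != self.expected
        self.expected = view
        return changed


# Tape contents as algorithm identifiers; the device supports only 20h.
tape = [0x00, 0x20, 0x10, 0x10, 0x00, 0x10]
for red in (0, 1):
    device = RedTracker(red, supported_algorithms={0x20})
    print(red, [device.read(algorithm) for algorithm in tape])
```

With RED = 0 the second consecutive entity of type 10h is reported again; with RED = 1 it is not, but the return to uncompressed data is reported instead.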
If a host does not wish to perform software decompression, then it can still read all of the uncompressed data on the media.

If the RED field is set to 2 by the initiator, then the device will generate a DECOMPRESSION EXCEPTION CHECK CONDITION every time it encounters data on the tape in the course of a read operation, which has been processed using a different compression algorithm from that which was previously encountered during a prior read operation. This value of the RED field is most useful for test purposes. It informs the initiator of the points in the data on the tape where the device has either appended data processed in a different way, or has automatically switched processing algorithm for optimisation purposes.

2.2 Host/Drive Interaction for Software Decompression
===================================================

2.2.1 DC drive - initial operating mode example
-------------------------------------------

After drive reset, a drive which supports data compression may, for example, power up with compression enabled and algorithm N selected. The Data Compression Characteristics Page contains the following values :

DCE = 1                   : Compression enabled.
Compression Algorithm = N : Compression algorithm N selected.
DDE = 1                   : Decompression enabled.
RED = 1                   : CHECK CONDITION returned on format change only.

When the RED field is first set and decompression is enabled, the device will be in an initial state that is expecting to read either uncompressed data or data compressed with a supported algorithm. If the RED field is set and decompression is disabled, the device will be in an initial state that is expecting to read uncompressed data. If the expected data is not read then a CHECK CONDITION is generated. This corresponds to a format change.
Because the RED field is set to 1 the device will now be in a secondary state where it is expecting to read data in the format of that which has just been encountered (i.e. that which forced the generation of the CHECK CONDITION). Again, if the expected data is not read then a CHECK CONDITION will be generated. Note that any media-access command other than a READ will return the device to its initial state.

2.2.2 Non-DC drive - initial operating mode example.
------------------------------------------------

A drive which doesn't support data compression may, for example, power up with the Data Compression Characteristics Page containing the following values :

DCE = 0                   : Compression disabled.
Compression Algorithm = 0 : No Compression algorithm selected.
DDE = 0                   : Decompression disabled.
RED = 0                   : CHECK CONDITION returned on encountering compressed data.

Because the RED field is set to 0 and the device doesn't support data decompression, any compressed data encountered will generate a CHECK CONDITION.

2.2.3 Example of Software Decompression control
-------------------------------------------

From power-on, the host issues a number of READ commands to the drive, all of which successfully return uncompressed data. On the next READ command, however, the drive detects an entity of type M on the media, where M is an unsupported compression algorithm. The drive cannot decompress the entity; it therefore treats it as a single variable-length record and returns either the number of bytes in one block or the total number of bytes in the entity, whichever is smaller.

At this point, it is necessary for the drive to inform the host that it has encountered a data item on the media which it cannot decompress. It does this by issuing a CHECK CONDITION to the host and setting up the sense data as follows :

Valid = 1
    To indicate that the Information field contains residual information from the failed READ command.
    Note this will only be set if the entity length was different from the requested block length.

Sense Key = NO SENSE (00h) if encountered data is uncompressed
            RECOVERED ERROR (01h) if encountered data is decompressible by device
            MEDIUM ERROR (03h) if encountered data is compressed and not decompressible by device

Information = READ residue
    The READ command failed with a residue as given in this field. Note this will only be set if the entity length was different from the requested block length.

Command-Specific Information = Number of records in data item
    The number of records in the entity is obtained from the entity header which the drive can read. Note that in the case of a compressed-to-uncompressed format change, this field will contain 1 to indicate that 1 uncompressed record was encountered.

Additional Sense Code and Qualifier =

    DECOMPRESSION EXCEPTION SHORT ALGORITHM ID OF NN (70h NNh)
    This ASC indicates the reason for the CHECK CONDITION as being a DECOMPRESSION EXCEPTION, where the encountered data has been processed using an algorithm whose registered identifier is less than or equal to 0FFh. NN specifies the algorithm identifier for the encountered data. Note that in the case of a compressed-to-uncompressed format change, the ASCQ will contain 0 to indicate that uncompressed data was encountered.

    DECOMPRESSION EXCEPTION LONG ALGORITHM ID (71h 00h)
    This ASC indicates the reason for the CHECK CONDITION as being a DECOMPRESSION EXCEPTION, where the encountered data has been processed using an algorithm whose registered identifier is greater than 0FFh.

The drive is now positioned on the EOP side of the entity.

If the host doesn't support software decompression then, if it so wishes, it can continue reading. Note that in most cases, this type of host would clear the RED bit during initial device configuration so that it only ever received a DECOMPRESSION EXCEPTION when it encountered entities. All records would be returned without this type of CHECK CONDITION.
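The sense data described above can be picked apart on the host side roughly as follows. This is a minimal sketch, assuming the fixed-format sense data layout of SCSI-2 (Valid in bit 7 of byte 0, Sense Key in the low four bits of byte 2, Information in bytes 3-6, Command-Specific Information in bytes 8-11, ASC and ASCQ in bytes 12 and 13). The names DecompressionException and parse_decompression_exception are hypothetical.

```python
import struct
from typing import NamedTuple, Optional

NO_SENSE, RECOVERED_ERROR, MEDIUM_ERROR = 0x00, 0x01, 0x03


class DecompressionException(NamedTuple):
    sense_key: int
    residue: Optional[int]    # None when the Valid bit is clear
    record_count: int         # from Command-Specific Information
    algorithm: Optional[int]  # None for a long (> 0FFh) identifier


def parse_decompression_exception(sense: bytes):
    """Return the decoded exception, or None if the sense data reports
    something other than a DECOMPRESSION EXCEPTION."""
    if len(sense) < 14:
        raise ValueError("sense data too short to hold ASC and ASCQ")
    asc, ascq = sense[12], sense[13]
    if asc == 0x70:
        algorithm = ascq      # short identifier carried in the ASCQ
    elif asc == 0x71 and ascq == 0x00:
        algorithm = None      # long identifier: not carried in the sense data
    else:
        return None
    valid = bool(sense[0] & 0x80)
    # The residue is a signed (two's complement) big-endian value.
    residue = struct.unpack(">i", sense[3:7])[0] if valid else None
    record_count = struct.unpack(">I", sense[8:12])[0]
    return DecompressionException(sense[2] & 0x0F, residue, record_count, algorithm)


# Example: an entity of type 20h holding 4 records, 512 bytes longer than
# the requested length, which the device could not decompress.
example = bytearray(18)
example[0] = 0xF0                              # Valid = 1, error code 70h
example[2] = MEDIUM_ERROR
example[3:7] = struct.pack(">i", -512)         # Information = READ residue
example[7] = 10                                # additional sense length
example[8:12] = struct.pack(">I", 4)           # records in the entity
example[12], example[13] = 0x70, 0x20          # SHORT ALGORITHM ID OF 20h
decoded = parse_decompression_exception(bytes(example))
print(decoded)
```

A host that supports software decompression would act on a MEDIUM ERROR sense key here; NO SENSE and RECOVERED ERROR only tell it that the data now being returned is directly usable.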
If the host supports software decompression then it must check the sense data to see if it has received all the data from the entity. If it is reading in variable mode, it does this by looking at the residual count in the Information field. If this field is non-negative then the host has received all the compressed data and will not therefore need to SPACE reverse and reread the entity.

If, however, the Information field is negative, then the requested block length is less than the actual entity length, and the host must SPACE reverse and reread the whole entity in order to successfully perform the software decompression. The host does this by looking in the Command-Specific Information field in order to find the number of records in the entity. It then issues a SPACE reverse with the count field set to the 2's complement of this value. This will position the device at the start of the entity. (Note that by subtracting the Information field - i.e. the residual count - from the requested block length, the host can determine the actual entity size and reserve enough buffer space to receive the data).

As long as the RED bit is set, the host will be able to continue reading entities from the device until either an entity is encountered which has been compressed using a different algorithm, or uncompressed data is encountered.

Note that in fixed mode, the host will not be able to determine the size of the encountered entity from the Information field and the requested block length, as the residual information will be in terms of blocks, not bytes. It is up to the host in this case to take the appropriate action.

Note that whenever the host needs to SPACE reverse over an entity because it has not managed to read all the data the first time around, the device will return to its initial state as far as the RED bit is concerned and will issue a DECOMPRESSION EXCEPTION CHECK CONDITION in response to the following READ command.
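The variable-mode reread arithmetic described in this section can be written out as a short worked example. This is a sketch only: the function name plan_reread is hypothetical, and the 24-bit mask assumes the three-byte Count field of the SCSI-2 SPACE command.

```python
def plan_reread(requested_length, residue, record_count):
    """Decide whether an entity must be reread in variable mode.

    requested_length: byte count asked for in the failed READ
    residue: signed value taken from the Information field
    record_count: value taken from Command-Specific Information

    Returns None when all the compressed data was received; otherwise
    (entity_length, space_count_field).
    """
    if residue >= 0:
        # The whole entity fitted in the requested length.
        return None
    # Requested length minus the (negative) residue gives the entity size.
    entity_length = requested_length - residue
    # SPACE reverse: the count is the 2's complement of the record count,
    # truncated here to the assumed 24-bit Count field.
    space_count_field = (-record_count) & 0xFFFFFF
    return entity_length, space_count_field


# A 1024-byte READ met a 1536-byte entity holding 4 records.
print(plan_reread(1024, -512, 4))
# The entity was shorter than the request: nothing to reread.
print(plan_reread(2048, 512, 4))
```

The first call yields an entity length of 1536 bytes and a Count field of FFFFFCh (that is, -4), which is the buffer size and SPACE count the host would use before reissuing the READ.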
PART 3
********

[ The following section is intended to be included in the draft SCSI-3 document. Anything enclosed by [] in this section is a note for the editor. ]

9.3.3.x Data Compression Page

Table 9-xx [A] : Data Compression Page
==============================================================================
  Bit|   7    |   6    |   5    |   4    |   3    |   2    |   1    |   0    |
Byte |        |        |        |        |        |        |        |        |
==============================================================================
 0   |   PS   |Reserved|                   Page Code (0Fh)                   |
-----|-----------------------------------------------------------------------|
 1   |                           Page Length (0Eh)                           |
-----|-----------------------------------------------------------------------|
 2   |  DCE   |  DCC   |                      Reserved                       |
-----|-----------------------------------------------------------------------|
 3   |  DDE   |       RED       |                  Reserved                  |
-----|-----------------------------------------------------------------------|
 4   | (MSB)                                                                 |
- - -|- -                      Compression Algorithm                      - -|
 7   |                                                                 (LSB) |
-----|-----------------------------------------------------------------------|
 8   | (MSB)                                                                 |
- - -|- -                     Decompression Algorithm                     - -|
 11  |                                                                 (LSB) |
-----|-----------------------------------------------------------------------|
 12  |                               Reserved                                |
-----|-----------------------------------------------------------------------|
 13  |                               Reserved                                |
-----|-----------------------------------------------------------------------|
 14  |                               Reserved                                |
-----|-----------------------------------------------------------------------|
 15  |                               Reserved                                |
==============================================================================

This page (Table 9-xx [A]) specifies the parameters for the control of data compression in a sequential-access device.

A data compression enable (DCE) bit of one indicates that data compression is to be enabled. When this bit is set, data sent to the device by the initiator shall be processed using the selected compression algorithm before being written to the medium. A DCE bit of zero indicates that data compression is to be disabled.
A data compression capable (DCC) bit of one indicates that the device supports data compression and shall process data sent to it for transferral to the medium using the selected compression algorithm when the DCE bit is one. A DCC bit of zero indicates that the device does not support data compression. This shall be a non-changeable field.

A data decompression enable (DDE) bit of one indicates that data decompression is to be enabled. A DDE bit of zero indicates that data decompression is to be disabled.

Implementors Note : When the DDE bit is zero, all decompression algorithms are deemed unsupported by the device.

The report exception on decompression (RED) field indicates the device's response to certain boundaries it detects in the data on the medium. There are a number of boundaries which may occur on the medium between compressed and uncompressed data. These boundaries are shown in table 9-xx [B].

Table 9-xx : [B] Possible boundaries encountered on the medium due to data compression.
=========================================================
        Prior Data         |        Current Data
=========================================================
uncompressed               | compressed
                           | (unsupported algorithm)
---------------------------|-----------------------------
uncompressed               | compressed
                           | (supported algorithm)
---------------------------|-----------------------------
compressed                 | uncompressed
(supported algorithm)      |
---------------------------|-----------------------------
compressed                 | compressed
(supported algorithm)      | (unsupported algorithm)
---------------------------|-----------------------------
compressed                 | compressed
(supported algorithm A)    | (supported algorithm B)
---------------------------|-----------------------------
compressed                 | uncompressed
(unsupported algorithm)    |
---------------------------|-----------------------------
compressed                 | compressed
(unsupported algorithm)    | (supported algorithm)
---------------------------|-----------------------------
compressed                 | compressed
(unsupported algorithm A)  | (unsupported algorithm B)
=========================================================

A RED field of zero indicates that the device shall return a CHECK CONDITION status when data is encountered on the medium during a read operation which the device cannot decompress. This is the case at the boundaries shown in table 9-xx [C].

Table 9-xx [C] : Data boundaries which generate CHECK CONDITION status when RED field is zero.
============================================================================
        Prior Data         |        Current Data        |     Sense Key
============================================================================
uncompressed               | compressed                 | MEDIUM ERROR
                           | (unsupported algorithm)    |
---------------------------|----------------------------|-------------------
compressed                 | compressed                 | MEDIUM ERROR
(supported algorithm)      | (unsupported algorithm)    |
---------------------------|----------------------------|-------------------
compressed                 | compressed                 | MEDIUM ERROR
(unsupported algorithm A)  | (unsupported algorithm B)  |
============================================================================

A RED field of one indicates that the device shall return a CHECK CONDITION status when data is encountered on the medium during a read operation which requires different handling by the initiator than that data most recently encountered during a prior read operation. This is the case at the boundaries shown in table 9-xx [D].

Table 9-xx [D] : Data boundaries which generate CHECK CONDITION status when RED field is one.
============================================================================
        Prior Data         |        Current Data        |     Sense Key
============================================================================
uncompressed               | compressed                 | MEDIUM ERROR
                           | (unsupported algorithm)    |
---------------------------|----------------------------|-------------------
compressed                 | compressed                 | MEDIUM ERROR
(supported algorithm)      | (unsupported algorithm)    |
---------------------------|----------------------------|-------------------
compressed                 | uncompressed               | NO SENSE
(unsupported algorithm)    |                            |
---------------------------|----------------------------|-------------------
compressed                 | compressed                 | RECOVERED ERROR
(unsupported algorithm)    | (supported algorithm)      |
---------------------------|----------------------------|-------------------
compressed                 | compressed                 | MEDIUM ERROR
(unsupported algorithm A)  | (unsupported algorithm B)  |
============================================================================

At each of these boundaries, the data which is sent to the initiator is of a fundamentally different nature from that which was previously sent.

A RED field of two indicates that the device shall return a CHECK CONDITION status when data is encountered on the medium during a read operation which has been processed using a different algorithm from that data most recently encountered during a prior read operation. This is the case at the boundaries shown in table 9-xx [E].

Table 9-xx [E] : Data boundaries which generate CHECK CONDITION status when RED field is two.
============================================================================
        Prior Data         |        Current Data        |     Sense Key
============================================================================
uncompressed               | compressed                 | MEDIUM ERROR
                           | (unsupported algorithm)    |
---------------------------|----------------------------|-------------------
uncompressed               | compressed                 | RECOVERED ERROR
                           | (supported algorithm)      |
---------------------------|----------------------------|-------------------
compressed                 | uncompressed               | NO SENSE
(supported algorithm)      |                            |
---------------------------|----------------------------|-------------------
compressed                 | compressed                 | MEDIUM ERROR
(supported algorithm)      | (unsupported algorithm)    |
---------------------------|----------------------------|-------------------
compressed                 | compressed                 | RECOVERED ERROR
(supported algorithm A)    | (supported algorithm B)    |
---------------------------|----------------------------|-------------------
compressed                 | uncompressed               | NO SENSE
(unsupported algorithm)    |                            |
---------------------------|----------------------------|-------------------
compressed                 | compressed                 | RECOVERED ERROR
(unsupported algorithm)    | (supported algorithm)      |
---------------------------|----------------------------|-------------------
compressed                 | compressed                 | MEDIUM ERROR
(unsupported algorithm A)  | (unsupported algorithm B)  |
============================================================================

On any of the boundary conditions described in tables 9-xx [C] through 9-xx [E] which results in a CHECK CONDITION status, the additional sense code shall be set to either DECOMPRESSION EXCEPTION SHORT ALGORITHM ID OF NN (if the algorithm identifier is less than or equal to 255) or DECOMPRESSION EXCEPTION LONG ALGORITHM ID. The device shall, in both cases, set the decompression algorithm field to the algorithm identifier of the compression algorithm used to process the encountered data.
The device shall be positioned on the EOP side of the encountered data, and the command-specific information field in the sense data shall contain a count of the number of data blocks contained within the encountered data.

Implementors Note : When compressed data is encountered on the medium which the device cannot decompress, the device should treat the data as a single variable-length record. In the sense data, the valid bit, the ILI bit and the information field should be set accordingly.

A RED field of three is undefined and shall result in a CHECK CONDITION status with the sense key set to ILLEGAL REQUEST.

The compression algorithm field indicates the compression algorithm the device shall use to process data sent to it by the initiator when the DCE bit is set to one. If the initiator selects an algorithm which the device does not support then the device shall return a CHECK CONDITION status. The sense key shall be set to ILLEGAL REQUEST. A value of zero shall indicate that no compression algorithm is currently selected. Algorithm identifiers are shown in table 9-xx [F].

For the MODE SELECT command, the decompression algorithm field indicates the decompression algorithm selected by the initiator for use in subsequent decompression of data encountered on the medium. For devices capable of the automatic recognition of the compression algorithm used to process data encountered on the medium, the decompression algorithm selected by the initiator may be ignored, or overridden by the target for a subsequent read operation if the selected value does not match the compression algorithm, detected by the device, which was used to process the data encountered on the medium.

For the MODE SENSE command, the decompression algorithm field reflects the algorithm selected by the initiator.
For some devices, the decompression algorithm value returned in response to a MODE SENSE command may change dynamically to match the compression algorithm, detected by the device, which was used to process the data most recently encountered on the medium, during a read operation. A value of zero shall indicate that the data encountered on the medium during the most recent read operation was uncompressed. Algorithm identifiers are shown in table 9-xx [F].

Table 9-xx [F] shows the compression algorithm identifiers registered under the International Register of Processing Algorithms [to be] established by ISO/IEC JTC1.

Table 9-xx : [F] Compression Algorithm Identifiers
=============================================================================
Algorithm Identifier  | Description
=============================================================================
                      |
00h                   | No Algorithm Selected (identifies uncompressed data)
01h                   | Unused
02h - 0Fh             | Not assigned
10h                   | IBM IDRC Data Compaction Algorithm
11h - 1Fh             | Not assigned
20h                   | DCLZ Data Compaction Algorithm
21h - FDh             | Not assigned
FEh                   | Reserved
FFh                   | Unregistered Algorithm
100h - FFFFFFFFh      | Reserved
=============================================================================

Table 9-xx [G] : ASC and ASCQ Assignments for data compression
=============================================================================
D = DIRECT ACCESS DEVICE
T = SEQUENTIAL ACCESS DEVICE
L = PRINTER DEVICE
P = PROCESSOR DEVICE
W = WRITE ONCE READ MULTIPLE DEVICE
R = READ ONLY (CD-ROM) DEVICE
S = SCANNER DEVICE
O = OPTICAL MEMORY DEVICE
M = MEDIA CHANGER DEVICE
C = COMMUNICATION DEVICE

BYTE
12 13  DTLPWRSOMC  DESCRIPTION                 COMMENTS
-- --              --------------------------  ---------------------
70 NN   T          DECOMPRESSION EXCEPTION     ALGORITHM ID <= 255
                   SHORT ALGORITHM ID OF NN
71 00   T          DECOMPRESSION EXCEPTION     ALGORITHM ID > 255
                   LONG ALGORITHM ID
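
[ Informational example, not part of the proposed page definition. ] The page layout in Table 9-xx [A] could be decoded by an initiator roughly as follows. This is a sketch under assumptions: the names DataCompressionPage and parse_data_compression_page are hypothetical, and the RED field is assumed to occupy bits 6 and 5 of byte 3, which is the narrowest placement that can hold the values zero through three described above.

```python
import struct
from typing import NamedTuple

PAGE_CODE = 0x0F
PAGE_LENGTH = 0x0E


class DataCompressionPage(NamedTuple):
    ps: bool
    dce: bool
    dcc: bool
    dde: bool
    red: int
    compression_algorithm: int
    decompression_algorithm: int


def parse_data_compression_page(page: bytes) -> DataCompressionPage:
    """Decode the 16-byte page (2 header bytes plus 14 parameter bytes)."""
    if len(page) < 2 + PAGE_LENGTH:
        raise ValueError("page is shorter than 16 bytes")
    if page[0] & 0x3F != PAGE_CODE:
        raise ValueError("not a Data Compression Page")
    if page[1] != PAGE_LENGTH:
        raise ValueError("unexpected page length")
    # Both algorithm fields are 32 bits wide, most significant byte first.
    compression, decompression = struct.unpack(">II", page[4:12])
    return DataCompressionPage(
        ps=bool(page[0] & 0x80),
        dce=bool(page[2] & 0x80),
        dcc=bool(page[2] & 0x40),
        dde=bool(page[3] & 0x80),
        red=(page[3] >> 5) & 0x03,  # assumed position: bits 6-5
        compression_algorithm=compression,
        decompression_algorithm=decompression,
    )


# A device with compression and decompression enabled, RED = 1, and
# algorithm 20h selected in both directions.
sample = bytes([0x0F, 0x0E, 0xC0, 0xA0]) + struct.pack(">II", 0x20, 0x20) + bytes(4)
parsed = parse_data_compression_page(sample)
print(parsed)
```

After a DECOMPRESSION EXCEPTION with a long algorithm identifier, an initiator would read this page with MODE SENSE and take the identifier from the decompression algorithm field.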