SCSI-2 (&3) parity errors

Dave, 508-770-6877,-6721 16-Mar-1995 1415 cressman at elwood.enet.qntm.com
Thu Mar 16 12:37:21 PST 1995


From: Dave Cressman, Quantum Corp.
To:   SCSI folks

    In response to the following recent reflector emails:

>        Reply to:   RE>>SCSI-2 Parity Errors -Reply
>
>>In regards to doing tricky things with parity errors - I would recommend
>>that the drive immediately report any and all parity errors.  If there are 
>>parity errors being detected,  then a 2 bit error could also occur and this 
>> would go undetected.  This could cause the drive to go to the wrong address 
>>if it occurred in the CDB phase or have undetectable data corruption if it 
>>occurred in the Data phase.  Why risk it?!!   Any detection of a parity 
>>error should be a RED FLAG  that you have bus integrity problems that need 
>>to be resolved quickly.  The resolution could be as simple as proper
>>termination or termination power.  Some drives will correct a parity error 
>>(primarily in the Message phase) and not tell the host.  Even if it did 
>>successfully detect the error I feel the host should be notified.
>>
>>          Jim Miller
>
>Note that if parity errors are occurring then they should also be visible to
>the host when information is sent by the drive to the host.  While I do
>not object to detection reporting as much as error recovery, it still does
>not seem to be really needed (and thus best kept away from in the best of
>KISS traditions).
>
>Jim [McGrath]

    Interesting & relevant discussion in light of what some customers ask
    us to implement.

    I have to speak up here.  Since SCSI has relatively weak error detection,
    and the repercussions of undetected data corruption can be so bad, I feel
    it's extremely important for both Device and Host to take any parity
    error seriously and report it diligently, whether it's on message,
    command, or data bytes; whether it's recovered or not. Note that for
    tapes, most of the time the data is going from the host to the device, so
    most likely it's the *tape drive* that will detect the PE.
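
    To make the "weak error detection" point concrete, here is a minimal
    sketch (Python, purely illustrative) of the single odd-parity bit that
    SCSI carries per data byte.  The function names and the sample byte are
    my own assumptions, not anything from a real driver or firmware.  It
    shows the case Jim Miller raises: a one-bit error is caught, but a
    two-bit error in the same byte passes the check.

```python
def odd_parity_bit(byte: int) -> int:
    """Return the parity bit that makes the total number of 1 bits odd."""
    ones = bin(byte & 0xFF).count("1")
    return 0 if ones % 2 == 1 else 1


def parity_ok(byte: int, parity: int) -> bool:
    """Check a received byte plus its parity bit against odd parity."""
    ones = bin(byte & 0xFF).count("1") + (parity & 1)
    return ones % 2 == 1


sent = 0x2A                      # hypothetical byte on the bus
p = odd_parity_bit(sent)         # parity bit driven by the sender

one_bit_error = sent ^ 0x01      # a single data bit flipped in transit
two_bit_error = sent ^ 0x03      # two data bits flipped in transit

print(parity_ok(sent, p))            # True:  clean transfer
print(parity_ok(one_bit_error, p))   # False: single-bit error detected
print(parity_ok(two_bit_error, p))   # True:  double-bit error goes undetected
```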
    
    But, as one example of PE reporting not always being encouraged, I've
    found that the Zadian Investigator tool, when it forces a PE on the
    CDB, doesn't like it if the device, after successfully completing the
    request, responds with CheckCondition status (w/ Sense Data of SK=1,
    ASC/ASCQ=PE).  Maybe this is an oversight by Zadian, but customers use
    these de facto standard tools to do testing of drives, and at best we
    have to explain these "failures" to them, and at worst they ask us to
    "fix" the drive.
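
    For readers who want to see what that report looks like on the wire,
    the following is a rough sketch of SCSI-2 fixed-format sense data for a
    recovered parity error: Sense Key 1h (RECOVERED ERROR) with ASC 47h /
    ASCQ 00h (SCSI PARITY ERROR).  The helper name and the 18-byte length
    are assumptions chosen for illustration; a real device may return more
    bytes and fill in further fields.

```python
CHECK_CONDITION = 0x02            # status byte returned for the command
SENSE_KEY_RECOVERED_ERROR = 0x01  # SK=1
ASC_SCSI_PARITY_ERROR = 0x47      # ASC 47h, ASCQ 00h


def build_fixed_sense(sense_key: int, asc: int, ascq: int = 0x00) -> bytes:
    """Build minimal 18-byte fixed-format sense data (hypothetical helper)."""
    sense = bytearray(18)
    sense[0] = 0x70               # error code: current error, fixed format
    sense[2] = sense_key & 0x0F   # sense key lives in the low nibble of byte 2
    sense[7] = len(sense) - 8     # additional sense length (bytes after byte 7)
    sense[12] = asc               # additional sense code
    sense[13] = ascq              # additional sense code qualifier
    return bytes(sense)


sense = build_fixed_sense(SENSE_KEY_RECOVERED_ERROR, ASC_SCSI_PARITY_ERROR)
print(hex(CHECK_CONDITION), sense.hex(" "))
```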
    
    As for keeping bus error recovery simple:

    Personally, I favor the SCSI-2 option of the target going directly
    to Status phase for the simpler cases, and the option of just dropping
    off the bus for the more complicated cases, like on a data transfer,
    and preparing Sense Data which defines the cause.  Unfortunately, not
    many hosts react properly in this case (i.e. they don't issue a Request
    Sense as the next command) so the Sense data is lost, although a failure
    is usually logged.  Still, this approach seems a good balance between an
    extreme KISS tradition and elaborate error recovery/retry in all bus
    error cases.

    What makes a data transfer hard to recover on? Well, on a tape drive, the
    block can be very large, up to 16MB-1, so when a PE occurs, much of the
    block may already have been written to the media.  This is bad enough,
    but tape drives like to have well pipelined data paths into the cache,
    and also they like to have that data compressed (to give a bigger
    effective cache).  It's nice to have the compression engine running in
    parallel with the SCSI DMA (esp. with fast-wide and higher data rates
    where it can be tricky/expensive to support the bus bandwidth all the way
    into the drive's cache.)  In short, for tapes at least, I have found that
    being able to support a bus level retry of data transfers generally means
    less pipelining and therefore a trade-off of some device bandwidth. And
    for what?  Just a rarely used, hard-to-test, BUT aesthetically
    pleasing error retry scheme.  Clearly not worth it, I think.
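
    As a side note on the 16MB-1 figure: it matches the largest value that
    fits in the 24-bit transfer length field of a sequential-access
    WRITE(6) command in variable-block mode, which I am assuming is where
    the limit comes from.  A small sketch, with a hypothetical helper name:

```python
MAX_TRANSFER_LENGTH = (1 << 24) - 1   # 16,777,215 bytes, i.e. 16MB-1


def write6_variable_cdb(transfer_length: int) -> bytes:
    """Build a WRITE(6) CDB for one variable-length block (illustrative only)."""
    if not 0 <= transfer_length <= MAX_TRANSFER_LENGTH:
        raise ValueError("transfer length must fit in 24 bits")
    return bytes([
        0x0A,                             # WRITE(6) operation code
        0x00,                             # Fixed bit = 0: length is in bytes
        (transfer_length >> 16) & 0xFF,   # transfer length, MSB
        (transfer_length >> 8) & 0xFF,
        transfer_length & 0xFF,           # transfer length, LSB
        0x00,                             # control byte
    ])


cdb = write6_variable_cdb(MAX_TRANSFER_LENGTH)
print(MAX_TRANSFER_LENGTH, cdb.hex(" "))
```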


    -Dave C.

    David C. Cressman
    Quantum Corp
    DLT (digital linear tape) development
    cressman at elwood.enet.qntm.com
