Comments on Async phase CRC

gop at us.ibm.com
Fri Apr 30 13:26:37 PDT 1999


* From the T10 Reflector (t10 at symbios.com), posted by:
* gop at us.ibm.com
*
All,

I feel compelled to respond to the recent proposal that has been brought
into the SCSI committee for consideration, requesting that additional
protection be added to the asynchronous phases (i.e., command, status, and
message). This proposal has no technical merit and could very well give
customers the false impression that they are getting something more than
they actually are.

Packetized SCSI has been given as a reason why this additional protection
is needed. But this makes no sense. All that I have heard is that, because
packetized SCSI has CRC on the command and status packets, the normal SCSI
command/status/message phases are somehow deficient. Although that may
sound valid, there is no technical evidence that one provides better
protection than the other.

All of packetized SCSI runs on data phases; all data phases are protected
with CRC. The reason we put the CRC on data phases is because they run at
the maximum synchronous wide data speeds. And, in many cases, there is no
way to determine if the data received is valid except for the CRC check.
However, in normal SCSI, the command/status/message phases do NOT run at
high speeds; they run at basically the same asynchronous speed they always
have. The data protection on those phases has been more than adequate up
to now, and there is no empirical or measured evidence to indicate there
is a problem now.

Packetized SCSI is NOT about better protection of data (in fact the
protection is probably no better or worse than what is currently defined in
SPI-3 for normal SCSI). Packetized SCSI is about improving performance by
reducing SCSI overheads. So choosing not to implement packetized SCSI is a
statement about performance, not protection; nothing more, nothing less.

Hot plugging has been repeatedly used as a reason for adding the CRC.
Yes, hot plugging does have the potential for disturbing the bus and
causing data errors. But does it cause the kind of errors that would be
missed by parity detection during asynchronous data transfers? Without any
data, who can say? But even if some of the errors got through, would all of
them get through (i.e., would only double bit errors occur)? And even if
all of them got through, would they occur only in the few fields not
checked by software?
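
For readers who want the parity point spelled out, here is a minimal
sketch in Python, assuming odd parity over a single 8-bit data byte. The
function names and the sample byte are hypothetical and chosen only for
illustration. It shows that a single bit error flips the parity and is
caught, while a double bit error leaves the parity unchanged and is
missed.

```python
def count_ones(byte: int) -> int:
    """Count the 1 bits in the low 8 bits of a value."""
    return bin(byte & 0xFF).count("1")


def odd_parity_bit(byte: int) -> int:
    """Return the parity bit that makes the total number of 1 bits odd."""
    return 0 if count_ones(byte) % 2 == 1 else 1


def parity_ok(byte: int, parity: int) -> bool:
    """True when data byte plus parity bit carry an odd number of 1 bits."""
    return (count_ones(byte) + parity) % 2 == 1


sent = 0x2A                      # hypothetical byte placed on the bus
parity = odd_parity_bit(sent)    # parity bit driven alongside it

single_bit_error = sent ^ 0b00000001   # one data bit disturbed
double_bit_error = sent ^ 0b00000011   # two data bits disturbed

print(parity_ok(sent, parity))              # clean transfer passes
print(parity_ok(single_bit_error, parity))  # single bit error is caught
print(parity_ok(double_bit_error, parity))  # double bit error slips by
```

Any even number of flipped bits behaves like the double bit case, which
is why the question above is whether hot plugging would produce only
such errors.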

To make a long story short, there was agreement at the last SCSI meeting
that there are only two fields this could happen on. One is the Logical
Block Address field in the CDB, which is checked but only against a maximum
value. The other is the queue tag. But, after thinking about it, I believe
there is only one remote case where the command would be completed
successfully with a corrupted queue tag if the target disconnects. My
reasons are:
-If the target does not disconnect, the command will complete successfully
without error.
-If the corrupted tag is equivalent to an existing tag, there is a
possibility that the command could complete without any error indication
but it would take just the right conditions. The correct way to solve this
problem is for the target to check for duplicate tags.
-If the corrupted tag is unique and the target disconnects, everything will
be OK until the target attempts to reconnect with the bad tag. At that
point the initiator will not recognize the tag and should abort the
command.
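
The two disconnect cases above can be sketched roughly as follows. This
is an illustration only, not text from SPI-3 or any firmware: the class
TargetTagTable, the function initiator_on_reconnect, and the tag values
are all hypothetical. The target refuses a tag that is already
outstanding, and the initiator aborts when the target reconnects with a
tag it never issued.

```python
class TargetTagTable:
    """Hypothetical target-side record of queue tags for one initiator."""

    def __init__(self):
        self._active = {}

    def accept(self, tag: int, command: str) -> bool:
        """Queue a command; refuse it if the tag is already outstanding."""
        if tag in self._active:
            return False          # duplicate tag: report an error instead
        self._active[tag] = command
        return True


def initiator_on_reconnect(outstanding_tags: set, tag: int) -> str:
    """Initiator's decision when a target reconnects with a given tag."""
    return "resume" if tag in outstanding_tags else "abort"


target = TargetTagTable()
target.accept(0x07, "READ")       # an earlier command, still outstanding

# Case 1: the tag sent as 0x05 is corrupted into the existing tag 0x07.
duplicate_accepted = target.accept(0x07, "WRITE")

# Case 2: the tag sent as 0x05 is corrupted into the unused tag 0x09.
unique_accepted = target.accept(0x09, "WRITE")
initiator_outstanding = {0x05, 0x07}   # tags the initiator actually issued
decision = initiator_on_reconnect(initiator_outstanding, 0x09)

print(duplicate_accepted, unique_accepted, decision)
```

In this sketch the duplicate is rejected by the target's own check and
the unique but wrong tag is rejected by the initiator at reconnect time,
so neither case completes silently.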

So let's review: to get an undetected error using today's parity, all of
the following have to occur:
1-A drive is hot plugged.
2-Only double bit errors occur.
3-The error has to occur only in an unchecked field on a read or write
command, of which there is only one.
4-The command and data transfer would have to complete successfully.

What are the odds of all those events happening at the same time? I am not
a statistician, but starting with an unusual event and moving down to
progressively less likely events leads me to suspect it is an extremely
rare event. So much so that it is more likely the detection circuitry will
fail before any error could get through.
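
The shape of that argument is a chain of conditional probabilities
multiplied together. The sketch below is only an illustration of the
arithmetic: the function name is hypothetical, and the four numbers are
arbitrary placeholders, NOT measured or estimated values for any real
SCSI bus. Each entry stands for the chance of one condition in the list
above given that the earlier ones have already occurred.

```python
from math import prod


def chained_probability(conditional_probs):
    """Multiply P(A1) * P(A2|A1) * P(A3|A1,A2) * ... into one probability."""
    for p in conditional_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return prod(conditional_probs)


# Placeholder values only, one per condition 1 through 4 in the list.
# They are invented to show the arithmetic and carry no real-world meaning.
placeholder_probs = [1e-3, 1e-2, 1e-1, 5e-1]

combined = chained_probability(placeholder_probs)
print(combined)
```

Whatever the real values are, the product can never exceed the smallest
factor, so each added condition can only make the combined event rarer.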

In spite of all the above, the supporters of this proposal are pushing hard
on the political front for it to be placed into SPI. The last time
something like this occurred we ended up with SCAM (which at least had some
technical merit), and how much money did we burn on that one? There is an
easy way to stop this, and that is to vote no when this proposal comes up.

Bye for now,
George Penokie

Dept PPV  114-2 N212
E-Mail:    gop at us.ibm.com
Internal:  553-5208
External: 507-253-5208   FAX: 507-253-2880


*
* For T10 Reflector information, send a message with
* 'info t10' (no quotes) in the message body to majordomo at symbios.com
