SCSI, Networks & Sense Data

Charles Monia monia at
Wed Mar 22 11:49:43 PST 1995

Ken Hallm wrote in part:

$Maybe by encouraging good SCSI etiquette by designers of future products we
$can raise the LCD to a point where a useful exchange of error information
$can occur. Yeah, Right.
$The problem is that as we go farther and faster with the new serial
$interface schemes, the possibility of transient errors becomes very real.
$The channel guys are spoiled by the level of system integrity they have
$enjoyed over the years.

The fact is that all high-level distributed applications, which is what we're
talking about here, require channel-quality service. It's simply not practical
to write such applications with soft-error recovery in mind. Just as no
Fortran program is expected to deal with disk ECC error correction, we should
not require disk drivers, or Fortran programs for that matter, to
deal with the error-recovery quirks peculiar to each physical interconnect. In
fact, we want them to be independent of such factors. I believe similar
considerations apply to systems at the target end of the wire as well.

Besides, protecting applications from such soft errors is one thing LLPs
do pretty well, so it's not unreasonable for ULPs to be designed on
that assumption.

$................................One parity error could be considered a direct
$indication that the bus was sick. Check cables and terminators, as
$something has gone seriously wrong. We never get parity errors on a
$properly configured bus! Sure, as long as everyone was in the same room,
$probably sharing the same ground reference and with good solid window
$margins, that was a good bet. Not so with a campus-wide net of peripherals,
$most of which have nothing in common electrically, not to mention the
$questionable quality of the switches and butted-together cables users will
$insist on using.

Obviously, the 'ground rules' that apply to a SPI bus don't extend to a network
interconnect. Why should they? For network interconnects, the lower-layer
protocol, as noted above, is usually the way to compensate for shortcomings in
the electrical environment.

$Preservation of Sense data is not the Link's problem. As you suggest, it
$belongs in the Upper Layer Protocol.

Why? In my opinion, this is analogous to having your feet cut off because
you're too tall to fit the bed. The notion of retaining sense data would
effectively require all future SCSI implementations to address a problem
specific to one type of interconnect. Why not correct transport problems in the
transport layer?

$But having said that, how do we get it accomplished? SAM is still based on
$the channel-centric view of the universe and is not likely to change. Or is
$it? Is there a change in philosophy regarding error recovery out there?
$I doubt it, as the prevailing attitude is that of blunt force. Any error
$indication means invoke the software driver, issue an abort, reset the
$device and try again. An error means it is BROKEN, not a transient. This is
$not only the prevailing attitude, but based on experience, (with terrible
$SCSI error recovery implementations) it is the correct one.

In my opinion, this characterization is incorrect. SAM assumes that the service
provided by the transport layer (LLP plus physical interconnect):

	1) is highly reliable,
	2) has a very low probability of undetected errors, and
	3) is free of correctable errors (such errors get fixed transparently
	    to the ULP, usually by the LLP).

When an LLP can't fix a transport error, it throws up its hands and notifies
the ULP. The ULP then has no choice but to treat the problem as a hard error.
Apparently, in the case of SIP/SPI, many implementations assume that the
physical medium is free of correctable errors; hence any error is a hard error.
There seems to be a large contingent of people who think that assumption is
reasonable for SPI.
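To make the layering concrete, here is a minimal sketch, assuming an LLP that silently retries a frame on a correctable (transient) link error and only escalates to the ULP once a fixed retry budget is exhausted. All class and function names, the retry count, and the "GOOD"/"HARD_ERROR" status strings are hypothetical illustrations, not taken from SAM or from any real SCSI stack.

```python
class TransientLinkError(Exception):
    """Hypothetical correctable transport error, e.g. a frame with a bad CRC."""


class HardTransportError(Exception):
    """Hypothetical error raised to the ULP once the LLP gives up."""


class LowerLayerProtocol:
    """Hypothetical LLP: hides transient errors from the ULP by retrying."""

    def __init__(self, send_frame, max_retries=3):
        self._send_frame = send_frame    # callable that puts a frame on the wire
        self.max_retries = max_retries   # assumed retry budget, not from any spec

    def deliver(self, frame):
        attempts = 1 + self.max_retries
        for _ in range(attempts):
            try:
                return self._send_frame(frame)
            except TransientLinkError:
                continue  # correctable error: retry without telling the ULP
        # The LLP "throws up its hands" and notifies the ULP.
        raise HardTransportError(f"gave up after {attempts} attempts")


def ulp_execute(llp, command):
    """Hypothetical ULP: never sees transient errors, so any error is hard."""
    try:
        return ("GOOD", llp.deliver(command))
    except HardTransportError as exc:
        return ("HARD_ERROR", str(exc))


def make_flaky_link(failures):
    """Simulated link that fails `failures` times before succeeding."""
    remaining = failures

    def send(frame):
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            raise TransientLinkError("CRC error")
        return f"response to {frame}"

    return send


if __name__ == "__main__":
    # Two transient errors are absorbed by the LLP; the ULP sees GOOD status.
    print(ulp_execute(LowerLayerProtocol(make_flaky_link(2)), "READ(10)"))
    # A persistently bad link exhausts the budget; the ULP sees a hard error.
    print(ulp_execute(LowerLayerProtocol(make_flaky_link(10)), "READ(10)"))
```

The ULP code contains no retry logic at all: whether the physical medium is a short SPI cable or a campus-wide serial link only changes how hard the LLP has to work, which is the independence argued for above.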

Charles Monia
