Transport layer retries for targets
dsluiter at exabyte.com
Wed Sep 14 10:14:40 PDT 2005
* From the T10 Reflector (t10 at t10.org), posted by:
* "Sluiter, David" <dsluiter at exabyte.com>
I'm trying to understand the scope of errors intended to be handled
by transport layer retries (vs. errors handled at a higher layer in
the protocol) & I have some questions. These are for cases where
the TRANSPORT LAYER RETRIES bit is set to one in the Protocol-Specific
Logical Unit mode page. I'm reading the SAS 1.1 spec, Rev 9, March 18, 2005.
Section 18.104.22.168.2 Data frame with transport layer retries, states:
"If an SSP target port transmits a read DATA frame and does not
receive an ACK or NAK for that frame (e.g., times out, or the
connection is broken): "
Does "connection broken" cover the case of lost dword sync due
to a pulled cable, i.e., someone briefly pulls the cable and then re-plugs
it? Consider a case where a target transmits a DATA frame and it
is successfully received by the initiator. The initiator sends
the ACK for that frame, but just as the ACK primitive is being
serialized in the initiator's serdes, the cable is yanked and so the
ACK is not received by the target. The spec says "the ST_TTS state
machine retransmits ... all read DATA frames since a previous time
when ACK/NAK balanced occurred" - meaning all frames that haven't yet
been ACK'ed or NAK'ed. So the initiator and the target are out of sync:
the target believes the frame wasn't ACK'ed, while the initiator
believes it was. I don't see how transport layer retries
can recover from this case. Since the link reset sequence will run again
once dword sync is lost, is it up to the application layer to fix up
this read command, exchange pointers, etc., and get it going again after
the link reset sequence?
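
To make the pulled-cable scenario concrete, here is a minimal Python sketch of the bookkeeping as I read it from the quoted text. It is a toy model, not the ST_TTS state machine: the function name simulate_lost_ack, the frame labels, and the assumption that "ACK/NAK balance" simply means every transmitted frame has been answered are all illustrative assumptions of mine.

```python
def simulate_lost_ack(frames, lost_ack_index):
    """Toy model: every frame reaches the initiator, but the ACK for
    frames[lost_ack_index] never reaches the target (cable pulled).

    Returns (frames the initiator holds, frames the target would retransmit).
    """
    initiator_received = []   # initiator's view of delivered frames
    sent = []                 # target's view of transmitted frames
    acks_seen = 0             # ACKs the target actually received
    balance_point = 0         # index in `sent` of the last ACK/NAK balance

    for i, frame in enumerate(frames):
        sent.append(frame)
        initiator_received.append(frame)   # the frame itself arrives intact
        if i == lost_ack_index:
            break                          # ACK lost; link goes down here
        acks_seen += 1
        if acks_seen == len(sent):         # assumed meaning of "balance"
            balance_point = len(sent)

    # Assumed reading of the quoted rule: resend everything since balance.
    to_retransmit = sent[balance_point:]
    return initiator_received, to_retransmit


received, resend = simulate_lost_ack(["F0", "F1", "F2"], lost_ack_index=1)
print("initiator already holds:", received)
print("target will retransmit: ", resend)
```

In this model frame F1 sits in both lists, which is exactly the disagreement described above: the initiator already holds a frame that the target is about to send again.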
What about the case where the target sends 4 frames labeled A, B, C & D?
The initiator receives them all fine and sends 4 ACKs labeled
a, b, c & d respectively. ACK c experiences a single bit error
on the wire, corrupting it into an invalid dword, and so becomes
"lost". The only "accounting method" I can see a target can
implement is a temporal one [maybe I'm wrong here], and so the
target thinks it received ACKs a, b & c, and again the initiator
and target are out of sync. The target will get an ACK/NAK timeout for frame
D and then resend frame D. Is this a correct understanding? Can the initiator
deal with this?
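
The "temporal" accounting in the four-frame case can be sketched as follows. This is only a simplified model under my own assumptions: ACKs carry no frame identifier, so the target credits each ACK it receives to the oldest unacknowledged frame. The function name positional_ack_accounting and the boolean list describing which ACKs survive the wire are hypothetical, not anything taken from the spec.

```python
def positional_ack_accounting(frames, acks_survive):
    """Toy model of a target that matches ACKs to frames by arrival order.

    frames:       frame labels in transmit order, e.g. ["A", "B", "C", "D"]
    acks_survive: one bool per frame; False means that ACK was corrupted
                  into an invalid dword and never counted by the target.

    Returns (frames the target believes were ACKed,
             frames the target believes are still outstanding).
    """
    # The target can only count ACKs; it cannot tell which one went missing.
    acks_counted = sum(1 for ok in acks_survive if ok)
    believed_acked = frames[:acks_counted]
    believed_outstanding = frames[acks_counted:]
    return believed_acked, believed_outstanding


frames = ["A", "B", "C", "D"]
# ACK c (third ACK) is hit by a bit error; a, b and d arrive intact.
acked, outstanding = positional_ack_accounting(frames, [True, True, False, True])
print("target believes ACKed:      ", acked)
print("target believes outstanding:", outstanding)
```

Under these assumptions the target ends up waiting on frame D even though the ACK that was actually lost belonged to frame C, and the initiator has already accepted all four frames.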
Senior ASIC Designer