SAS2 - OPEN TIMEOUT

Larry Chen Larry_Chen at pmc-sierra.com
Wed Jan 9 17:40:41 PST 2008



See comments inline.
________________________________
From: owner-t10 at t10.org [mailto:owner-t10 at t10.org] On Behalf Of Elliott,
Robert (Server Storage)
Sent: Wednesday, January 09, 2008 2:59 PM
To: t10 at t10.org
Subject: RE: SAS2 - OPEN TIMEOUT
The OPEN is not blindly retried - it's retried only until the I_T Nexus
Loss Time timer expires (normally 2 seconds).
[Larry Chen] I meant auto-retry is enabled w/o allowing any host SW/FW
intervention.
The most likely reason for Open Timeout timer expiration is that the
OPEN address frame suffered a single-bit error.  Since there is no
ACK/NAK for address frames, the only indication of a problem is the lack
of an AIP, OPEN_ACCEPT, or OPEN_REJECT.  The originator times out after
1 ms and retries (so a single-bit error doesn't cause a major error).
[Larry Chen] I agree, i.e., there isn't a NAK sent for a bad CRC in the
OPEN address frame.
If the bit error keeps occurring, though, the I_T Nexus Loss Time will
kick in and a major error will be reported (the destination is
unreachable).
[Larry Chen] The problem is that no history is maintained via error
counters, i.e., nothing cumulative. In theory, each open connection
request could go through lengthy auto-retries and the host would never
be notified, since the errors didn't occur consecutively.
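
The point about missing history can be illustrated with a small simulation. This is a rough sketch, not text from the standard: `PortStats`, `request_connection`, and the counter fields are hypothetical names, and the timing is simplified to one 1 ms Open Timeout per failed attempt against the 2 second I_T Nexus Loss Time mentioned above.

```python
from dataclasses import dataclass

OPEN_TIMEOUT_MS = 1      # Open Timeout timer value quoted in this thread
NEXUS_LOSS_MS = 2000     # I_T Nexus Loss Time, "normally 2 seconds"


@dataclass
class PortStats:
    """Hypothetical cumulative counters; not defined by the quoted SAS-2 text."""
    open_timeouts: int = 0   # every Open Timeout, even if a later retry succeeds
    nexus_losses: int = 0    # the only event reported as a major error


def request_connection(attempts, stats):
    """Retry one connection request until it succeeds or nexus loss is declared.

    attempts yields True when the OPEN leads to an established connection
    and False when the Open Timeout timer expires instead.
    """
    elapsed_ms = 0
    for answered in attempts:
        if answered:
            return True                    # connection established
        stats.open_timeouts += 1
        elapsed_ms += OPEN_TIMEOUT_MS
        if elapsed_ms >= NEXUS_LOSS_MS:
            stats.nexus_losses += 1        # major error: destination unreachable
            return False
    raise ValueError("attempts ended before the request was resolved")


stats = PortStats()
# 100 requests that each time out 500 times (about 500 ms) before succeeding.
results = [request_connection(iter([False] * 500 + [True]), stats)
           for _ in range(100)]
print(all(results), stats.nexus_losses, stats.open_timeouts)
```

In this run every request eventually succeeds and no nexus loss is reported, yet 50000 Open Timeouts have accumulated. Only a cumulative counter of the kind sketched here would expose them.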
-- 
Rob Elliott, elliott at hp.com 
Hewlett-Packard Industry Standard Server Storage Advanced Technology 
________________________________
	From: owner-t10 at t10.org [mailto:owner-t10 at t10.org] On Behalf Of
Larry Chen
	Sent: Wednesday, January 09, 2008 3:35 PM
	To: Kevin D Butt
	Cc: t10 at t10.org
	Subject: RE: SAS2 - OPEN TIMEOUT
	IMO, timeouts are more serious than OPEN_REJECT responses (and NAK,
	SCSI Busy, and Full Queue responses).
	If timeouts are _not_ reported to the host driver and/or the diagnostic
	monitoring code, then the problem cannot be detected and rectified via
	a FRU swap.
________________________________
	From: Kevin D Butt [mailto:kdbutt at us.ibm.com] 
	Sent: Wednesday, January 09, 2008 8:15 AM
	To: Larry Chen
	Cc: t10 at t10.org
	Subject: Re: SAS2 - OPEN TIMEOUT
	 I do not see a reason to distinguish an open timeout from the
other errors.  Unless there is a very good reason, I would prefer to
leave the text as is.  It seems to me that we should retry open
timeouts, since it may work the next time.  Also, the point of doing
recovery operations is to mask errors (so that the job can continue), so
that does not seem like a good reason to stop attempting the recovery.
	Kevin D. Butt
	SCSI & Fibre Channel Architect, Tape Firmware
	MS 6TYA, 9000 S. Rita Rd., Tucson, AZ 85744
	Tel: 520-799-2869 / 520-799-5280
	Fax: 520-799-2723 (T/L:321)
	Email address: kdbutt at us.ibm.com
	http://www-03.ibm.com/servers/storage/ 
"Larry Chen" <Larry_Chen at pmc-sierra.com> 
Sent by: owner-t10 at t10.org 
01/08/2008 03:00 PM 
To: <t10 at t10.org>
Subject: SAS2 - OPEN TIMEOUT
	Is there any mechanism in place to _exclude_ OPEN TIMEOUT from being
	retried (see RED font below for details)? I think there is a danger of
	masking out errors if OPEN TIMEOUT is blindly retried.
	--- 
	4.5 I_T nexus loss
	When a SAS port receives OPEN_REJECT (NO DESTINATION), OPEN_REJECT
	(PATHWAY BLOCKED), OPEN_REJECT (RESERVED INITIALIZE 0), OPEN_REJECT
	(RESERVED INITIALIZE 1), OPEN_REJECT (RESERVED STOP 0), OPEN_REJECT
	(RESERVED STOP 1), or an open connection timeout occurs in response to
	a connection request, it shall retry the connection request until:
	a) the connection is established;
	b) for SSP target ports, the time indicated by the I_T NEXUS LOSS TIME
	field in the Protocol-Specific Port mode page (see 10.2.7.4) expires; or
	c) the I_T nexus loss timer, if any, expires (see 4.7.1, 8.2.2.1,
	10.2.7.4, and 10.4.3.17).
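
The retry rule quoted from 4.5 can be read as a small decision function. This is only a sketch of that reading: the function name, the string encoding of results, and the `retry_open_timeout` switch are all hypothetical. The switch models the exclusion mechanism asked about above; nothing in the quoted text defines one.

```python
# OPEN_REJECT reasons that 4.5 (as quoted above) says shall be retried.
RETRIED_REJECT_REASONS = frozenset({
    "NO DESTINATION",
    "PATHWAY BLOCKED",
    "RESERVED INITIALIZE 0",
    "RESERVED INITIALIZE 1",
    "RESERVED STOP 0",
    "RESERVED STOP 1",
})


def should_retry(result, reason=None, nexus_loss_expired=False,
                 retry_open_timeout=True):
    """Decide whether the quoted 4.5 rule calls for reissuing a request.

    result is "OPEN_ACCEPT", "OPEN_REJECT", or "OPEN_TIMEOUT"
    (a hypothetical encoding chosen for this sketch).
    """
    if result == "OPEN_ACCEPT":
        return False                  # a) the connection is established
    if nexus_loss_expired:
        return False                  # b) / c) nexus loss time or timer expired
    if result == "OPEN_TIMEOUT":
        return retry_open_timeout     # hypothetical opt-out; 4.5 always retries
    if result == "OPEN_REJECT":
        # Other reject reasons are outside the quoted 4.5 text.
        return reason in RETRIED_REJECT_REASONS
    raise ValueError(f"unknown result: {result!r}")
```

With the default arguments an open connection timeout is retried exactly like the listed OPEN_REJECT reasons; passing `retry_open_timeout=False` shows what excluding it would look like.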


