Subject: RE: SAS2 - OPEN TIMEOUT
Date: Wed, 9 Jan 2008 17:40:41 -0800
From: "Larry Chen" <Larry_Chen@pmc-sierra.com>
To: "Elliott, Robert (Server Storage)" <Elliott@hp.com>, <t10@t10.org>
X-Message-Number: 8439

See comments inline.

________________________________
From: owner-t10@t10.org [mailto:owner-t10@t10.org] On Behalf Of Elliott, Robert (Server Storage)
Sent: Wednesday, January 09, 2008 2:59 PM
To: t10@t10.org
Subject: RE: SAS2 - OPEN TIMEOUT

The OPEN is not blindly retried - it's retried only until the I_T Nexus Loss Time timer expires (normally 2 seconds).

[Larry Chen] I meant auto-retry is enabled w/o allowing any host SW/FW intervention.

The most likely reason for Open Timeout timer expiration is that the OPEN address frame suffered a single-bit error. Since there is no ACK/NAK for address frames, the only indication of a problem is the lack of an AIP, OPEN_ACCEPT, or OPEN_REJECT. The originator times out after 1 ms and retries (so a single-bit error doesn't cause a major error).

[Larry Chen] I agree, i.e., there isn't a NAK sent for a bad CRC in the OPEN address frame.

If the bit error keeps occurring, though, the I_T Nexus Loss Time will kick in and a major error will be reported (the destination is unreachable).

[Larry Chen] The problem is that there isn't any cumulative history maintained via error counters. In theory, each open connection request could go through lengthy auto-retries and the host would never be notified, since the errors didn't occur consecutively.

--
Rob Elliott, elliott@hp.com
Hewlett-Packard Industry Standard Server Storage Advanced Technology

________________________________
From: owner-t10@t10.org [mailto:owner-t10@t10.org] On Behalf Of Larry Chen
Sent: Wednesday, January 09, 2008 3:35 PM
To: Kevin D Butt
Cc: t10@t10.org
Subject: RE: SAS2 - OPEN TIMEOUT

IMO, Timeouts are more serious than OPEN_REJECTs (and NAK, SCSI Busy and Full Queue) Responses.
If Timeouts are _not_ reported to the host driver and/or the diagnostic monitoring code, then the problem cannot be detected and rectified via a FRU swap.

________________________________
From: Kevin D Butt [mailto:kdbutt@us.ibm.com]
Sent: Wednesday, January 09, 2008 8:15 AM
To: Larry Chen
Cc: t10@t10.org
Subject: Re: SAS2 - OPEN TIMEOUT

I do not see a reason to distinguish an open timeout from the other errors. Unless there is a very good reason, I would prefer to leave the text as is. It seems to me that we should retry open timeouts, since it may work the next time. Also, the point of doing recovery operations is to mask errors (so that the job can continue), so that does not seem like a good reason to stop attempting the recovery.

Kevin D. Butt
SCSI & Fibre Channel Architect, Tape Firmware
MS 6TYA, 9000 S. Rita Rd., Tucson, AZ 85744
Tel: 520-799-2869 / 520-799-5280
Fax: 520-799-2723 (T/L:321)
Email address: kdbutt@us.ibm.com
http://www-03.ibm.com/servers/storage/

"Larry Chen" <Larry_Chen@pmc-sierra.com>
Sent by: owner-t10@t10.org
01/08/2008 03:00 PM

To: <t10@t10.org>
cc:
Subject: SAS2 - OPEN TIMEOUT

Is there any mechanism in place to _exclude_ OPEN TIMEOUT from being retried (see RED font below for details)? I think there is a danger of masking out errors if OPEN TIMEOUT is blindly retried.

---
4.5 I_T nexus loss

When a SAS port receives OPEN_REJECT (NO DESTINATION), OPEN_REJECT (PATHWAY BLOCKED), OPEN_REJECT (RESERVED INITIALIZE 0), OPEN_REJECT (RESERVED INITIALIZE 1), OPEN_REJECT (RESERVED STOP 0), OPEN_REJECT (RESERVED STOP 1), or an open connection timeout occurs in response to a connection request, it shall retry the connection request until:
a) the connection is established;
b) for SSP target ports, the time indicated by the I_T NEXUS LOSS TIME field in the Protocol-Specific Port mode page (see 10.2.7.4) expires; or
c) the I_T nexus loss timer, if any, expires (see 4.7.1, 8.2.2.1, 10.2.7.4, and 10.4.3.17).
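
As a rough illustration of the retry rule quoted above and of the cumulative-counter concern raised in the thread, the Python sketch below models a port that retries a connection request until it is accepted or an I_T nexus loss budget runs out, while also keeping a running count of open timeouts. Everything named here (`OpenResult`, `PortStats`, `open_with_retry`, the simulated clock) is hypothetical and not taken from the SAS-2 draft or any firmware. The 1 ms and 2 s defaults are the figures mentioned in the messages above, and charging every failed attempt one open-timeout interval is a simplifying assumption.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable


class OpenResult(Enum):
    """Simplified outcomes of one connection request (hypothetical model)."""
    ACCEPT = auto()         # OPEN_ACCEPT received
    TIMEOUT = auto()        # no AIP, OPEN_ACCEPT, or OPEN_REJECT before the open timeout
    REJECT_RETRY = auto()   # a retryable OPEN_REJECT, e.g. (NO DESTINATION)


@dataclass
class PortStats:
    """Cumulative counters that survive across connection requests."""
    open_timeouts: int = 0
    nexus_losses: int = 0


def open_with_retry(
    send_open: Callable[[], OpenResult],
    stats: PortStats,
    open_timeout_us: int = 1_000,          # 1 ms, per the thread
    nexus_loss_time_us: int = 2_000_000,   # 2 s, the "normal" value per the thread
) -> bool:
    """Retry until the connection is established or the nexus loss budget expires.

    Time is simulated with integer microseconds; each failed attempt is
    assumed to cost one open-timeout interval.
    """
    elapsed_us = 0
    while True:
        result = send_open()
        if result is OpenResult.ACCEPT:
            return True
        if result is OpenResult.TIMEOUT:
            # Counted even when a later retry succeeds, so the history is kept.
            stats.open_timeouts += 1
        elapsed_us += open_timeout_us
        if elapsed_us >= nexus_loss_time_us:
            stats.nexus_losses += 1
            return False
```

In this model, five connection requests that each time out once and then succeed all return True, so nothing would be reported as an I_T nexus loss, yet `stats.open_timeouts` ends at 5. A counter of this kind is one way the non-consecutive errors described above could be made visible to a host driver or diagnostic code; whether and where to keep such a counter is a design choice outside the quoted text.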