Minutes of X3T10 High Availability Study Group 9/11/95

Steve Sicola...Array Controller Engineering 522-2268 sicola at peaks.ENET.dec.com
Fri Sep 22 10:05:14 PDT 1995



Minutes of X3T10 High Availability Study Group in Bedford, NH   X3T10/95-313r0

                                                  Doc. No.: X3T10/95-313r0
                                                      Date: September 21, 1995
                                                   Project:
                                                 Ref. Doc.:
                                                  Reply to: Steve Sicola

To:       Membership of X3T10

From:     Steve Sicola, Chair HA Study Group

Subject:  Minutes of X3T10 High Availability Study Group
          Bedford, NH -- September 11, 1995


                                       Agenda

1. Review of HA Preliminary Profile for Fault Tolerant Controller
   Configurations

2. Discussion of any specific changes required to SCC to make the
   Profile usable.

3. Discussion of Doug Hagerman's High Availability Disk Drive Specification


Attendees:
	Ms. Nancy Cheng
	Mr. Doug Hagerman
	Mr. Charles Monia
	Mr. Robert Snively
	Mr. Ed Quiet
	Mr. Stephen J. Sicola
	Mr. Rod Dekoning

The meeting opened with a change in agenda, while a number of other attendees 
were still showing up. Doug opened up with an overview of his high 
availability disk drive specification. It covers high availability aspects 
such as how disks react to bus resets, errors, hot plugging, etc. The document
was distributed to those present and it was decided that the document should
be further reviewed at the next HA study group meeting.

The review of the HA Preliminary Profile for Fault Tolerant Controller 
Configurations began with a few slides from Steve Sicola reviewing the 
general theory of operation of fault tolerant controller configurations within
the confines of SCC. This included the single 'logical' controller model
with multiple ports, which could physically be multiple controllers, so long
as all controllers shared the same devices. Setting up attachments and
reporting attachments were discussed as well.

Rod Dekoning of Symbios and Bob Snively of Sun brought up points about
the wording of sharing the same devices. It is safe to say that the 
controllers must share the same path to all devices instead of sharing all
devices. This opens up a few more possibilities for fault tolerant 
configurations. 

Other items brought up early in the discussion were that the profile must
allow for preconfigured systems with attachments already in place and that
the profile must support active/standby configurations to round out the 
profile. The detection mechanisms for controller failure can be host or
controller based. The host can certainly still detect a failure by timing out, 
but the addition of two ASC/ASCQs that describe Failure or Failback (a
controller returning to the configuration) will allow for higher availability
of customer
data from controller configurations. Also, the use of LUN0 in the profile 
should be changed to the 'base address of the controller.'

Steve Sicola noted early on that the SCC attachment command has no way of
identifying the configuration once the attachment is made, and that the
Attach to Component Device command has no fields to identify the WWNs of the
controllers involved in the attachment. This led to a long discussion among
Bob Snively, Rod Dekoning, and other attendees about the merits of
identifying configurations and of using SCC to set up and report them. Those
present decided that registering the configuration, reporting the
configuration, and detecting failures and new members of the configuration
would be of great benefit in generalizing the use of fault tolerant
controllers within many host SCSI drivers. At this point, specific changes to
SCC were discussed.

The changes discussed concerned the Attach to Component Device and Report
Attachments commands. Many options were discussed to cover the possible
configurations, but the issues of initial setup, repair/replacement, and
upgrade of the configuration drove the need for the Attach to Component
Device command to include:
	input: who to attach to, type of attachment
	output: Name of attachment (suggested to follow the FC naming
		convention of 8 bits vendor unique, 24 bits user programmable)
	
The Attach to Component Device command is used for configuration and for ease
of verification of the configuration.
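
A minimal sketch of the suggested attachment name layout follows. The minutes
give only the field widths (8 bits vendor unique, 24 bits user programmable).
Placing the vendor-unique bits in the high-order byte of a 32-bit value is an
assumption, and the function names are hypothetical.

```python
from typing import Tuple


def pack_attachment_name(vendor_unique: int, user_programmed: int) -> int:
    """Pack an 8-bit vendor-unique field and a 24-bit user field into 32 bits.

    Assumption: the vendor-unique field occupies the high-order byte.
    """
    if not 0 <= vendor_unique <= 0xFF:
        raise ValueError("vendor_unique must fit in 8 bits")
    if not 0 <= user_programmed <= 0xFFFFFF:
        raise ValueError("user_programmed must fit in 24 bits")
    return (vendor_unique << 24) | user_programmed


def unpack_attachment_name(name: int) -> Tuple[int, int]:
    """Split a 32-bit attachment name back into (vendor_unique, user_programmed)."""
    if not 0 <= name <= 0xFFFFFFFF:
        raise ValueError("name must fit in 32 bits")
    return (name >> 24) & 0xFF, name & 0xFFFFFF
```

A host could store the packed name returned at attach time and later pass it
as the input to a report request.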
	
The Report Attachments command would need the following:
	input: Name of attachment
	output: WWNs of all attached components, where the first WWN reported
		is that of the component that received the Report Attachments
		command
	   Followed by:
		WWNs of eligible components that could be added to the
		attachment list.

Uses of the Report Attachments command would include recovery from a host
crash, host return, new hosts, a new controller, and a replaced controller.
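
The ordering of the proposed Report Attachments output can be sketched as
follows. This is an illustration only: the class and function names are
hypothetical, WWNs are shown as plain strings, and it assumes the component
receiving the command is itself a member of the attachment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AttachmentReport:
    attached: List[str]  # WWNs of attached components, receiver first
    eligible: List[str]  # WWNs that could be added to the attachment list


def build_report(receiver_wwn: str,
                 attached_wwns: List[str],
                 eligible_wwns: List[str]) -> AttachmentReport:
    """Order the output so the receiving component's WWN is reported first."""
    if receiver_wwn not in attached_wwns:
        # Assumption: the receiver is always part of the attachment.
        raise ValueError("receiver is not part of the attachment")
    others = [w for w in attached_wwns if w != receiver_wwn]
    return AttachmentReport([receiver_wwn] + others, list(eligible_wwns))
```

A host returning after a crash could compare the attached list against the
configuration it last recorded to notice a new or replaced controller.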


The detection of failures is basically an issue of host polling vs.
controller direct reporting (ASC/ASCQs). The addition of the two ASC/ASCQs
will allow both methods to work properly. The HA group defers to the PFA
rules on exception handling to drive solutions beyond this.
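
The two detection methods can coexist in a host driver, roughly as sketched
below. The ASC/ASCQ pairs are placeholders, since no values were assigned at
the meeting, and the function name and timeout default are likewise
hypothetical.

```python
from typing import Optional, Tuple

# Placeholder ASC/ASCQ pairs; these are NOT assigned codes.
FAILURE = (0x80, 0x01)   # controller reports that its partner has failed
FAILBACK = (0x80, 0x02)  # controller reports a partner returning

def detect_event(sense: Optional[Tuple[int, int]],
                 seconds_since_reply: float,
                 timeout: float = 5.0) -> str:
    """Classify controller state from direct reporting or a host timeout."""
    if sense == FAILURE:
        return "failure (controller reported)"
    if sense == FAILBACK:
        return "failback (controller reported)"
    if seconds_since_reply > timeout:
        # Host polling path: no reply within the timeout window.
        return "failure (host timeout)"
    return "ok"
```

Checking the sense data first lets a controller's report take effect without
waiting for the host's timeout to expire.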


The meeting concluded with the decision that the specific changes to the
SCC document should be redlined for the next Working Group meeting in Palm
Springs in November. Steve Sicola will write these up and distribute them
before the meeting for early review and for any modifications needed to make
the redlines complete and accurate.



