Minutes of X3T10 High Availability Study Group Mtg from July 10, 1995

Steve Sicola...Array Controller Engineering 522-2268 sicola at peaks.enet.dec.com
Thu Jul 20 20:38:54 PDT 1995


Minutes of High Availability Study Group meeting in Colorado Springs, CO 
X3T10/95-263r0



Accredited Standards Committee*
X3, Information Technology
                                       Doc. No.: X3T10/95-274r0
                                       Date: July 20, 1995
                                       Project:
                                       Ref. Doc.:
                                       Reply to: Steve Sicola

To:         Membership of X3T10

From:       Steve Sicola, HA Study Group Chair
            

Subject:    Minutes of High Availability Study Group Meeting
            Colorado Springs, CO -- July 10, 1995


                                       Agenda

1. Opening Remarks

2. What/Why/How of Controller Fault Tolerant Configurations

3. Suggestions for how to achieve in SCSI-3. What is there already and what
   is needed? Any assumptions?

4. Conclusions

5. Adjournment until September meeting.




                              Results of Meeting

1. Opening Remarks

Steve Sicola, the HA Study Group chair, opened the meeting at ~1:15 p.m., 
after which everyone introduced themselves.  Attendees from Amdahl, IBM,
Digital, Symbios, Sun, and HP were present.

2. What/Why/How of Controller Fault Tolerant Configurations  
	Steve Sicola presented the slides on Fault Tolerant Controller 
	Configurations for SCSI. These were basically the same slides presented 
	to the Working Group; for this meeting they have the document number 
	X3T10/95-273r0.


3. Suggestions for how to achieve in SCSI-3. What is there already and what
   is needed? Any assumptions?

	Steve Sicola suggested some use of SCC's spare and report commands
	to register the configuration and report on its state. George Penokie
	stated that the Attachment and Report Attachment commands were
	the appropriate place for this, which raised more questions about how
	to use them to 'register' and 'report' configurations with multiple
	controllers sharing access to attached devices.
	
	George then went to the overhead, and we went through in much greater
	detail what it meant to use SCC and what assumptions had to be made
	about the configuration. Bob Snively, Charles Binford, Dave Thiel,
	and others stated that assumptions about the controllers sharing
	configuration were vital for the controller attachment and fault
	tolerant configuration to be 'useful' to hosts. This basically
	meant that under SCC, any controllers associated with other 
	controllers would report the same configuration through LUN0. 
	Furthermore, there were assumptions based on this that all LUNs
	will have consistent naming based on the v.lui's names etc. Other
	issues also came up about the speed of failover, during which it
	was granted that a couple more ASC/ASCQs that denote Failover/Failback
	would be very helpful to host operating systems in cluster 
	environments to speed failover of customer data between controllers.
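
	The shared-configuration assumption above can be illustrated with a
	minimal sketch of a host-side check. Everything in it is hypothetical:
	the controller labels, the LUN-to-name dictionaries, and the function
	names are invented for illustration and are not taken from SCC or from
	the discussion at the meeting.

```python
# Hypothetical sketch: data shapes and names are illustrative only,
# not taken from SCC or from these minutes.

def configurations_match(reports):
    """Return True if every controller reports the same LUN-to-name mapping."""
    views = [frozenset(report.items()) for report in reports.values()]
    return len(set(views)) <= 1


def mismatched_luns(reports):
    """Return the sorted LUNs whose reported name differs between controllers."""
    all_luns = set()
    for report in reports.values():
        all_luns.update(report)
    bad = []
    for lun in sorted(all_luns):
        # A LUN missing from one controller shows up as None and counts as a mismatch.
        names = {report.get(lun) for report in reports.values()}
        if len(names) > 1:
            bad.append(lun)
    return bad


# Two associated controllers reporting the same view (assumed example data).
reports_ok = {
    "ctrl_a": {1: "volume-x", 2: "volume-y"},
    "ctrl_b": {1: "volume-x", 2: "volume-y"},
}

# The same pair, but with an inconsistent name for LUN 2.
reports_bad = {
    "ctrl_a": {1: "volume-x", 2: "volume-y"},
    "ctrl_b": {1: "volume-x", 2: "volume-z"},
}
```

	A host that finds a mismatch of this kind could not safely treat the
	two controllers as alternate paths to the same devices, which is why
	the attendees considered the assumption vital.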
	
	After this we started talking about the name of the attachment, and
	people discussed using concatenated serial numbers for the name.
	According to Charles Binford, Dave Thiel, and others, this would not
	work because of the issue of replacing controllers. Others
	suggested that backplane keyed 'serial numbers' could be used, but
	others said this cannot be assumed. Some other way to name them 
	must be figured out for use in host systems to identify 
	multi-controller configurations sharing the same devices.
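
	The objection to concatenated serial numbers can be shown with a small
	sketch. The serial numbers and the naming function are made up for
	illustration; the minutes do not specify how the concatenation would
	have been done.

```python
# Hypothetical sketch: the serial numbers and the joining rule are assumptions.

def concatenated_name(serials):
    """Build a configuration name by joining the controllers' serial numbers."""
    return "-".join(sorted(serials))


# Name seen by the host while both original controllers are installed.
before = concatenated_name(["SN1001", "SN1002"])

# Controller SN1002 fails and is replaced by a unit with a new serial number.
after = concatenated_name(["SN1001", "SN2007"])

# The devices behind the controllers are unchanged, but the name is not.
name_changed = before != after
```

	Because the name changes even though the attached devices do not, a
	host keyed on the old name would see what looks like a different
	configuration after the replacement.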
	
4. Conclusions 
	After the presentation, George's trip to the overhead, and the
	attendees' review of what was missing from and assumed about the
	controller configuration, the group concluded that the use of SCC,
	Persistent Reserve, and a couple of new ASC/ASCQs would adequately
	cover multi-controller configurations for fault tolerance as well as
	load balancing. The remaining issue was how to name the controller
	configuration without using any of the underlying controllers' serial
	numbers, because a controller may be replaced after a failure and
	that might confuse hosts. This was the reason for another meeting,
	along with the need to see a 'profile' of how to use SCC, Persistent
	Reserve, and the ASC/ASCQs for host support of Fault Tolerant
	Controller Configurations.

5. Adjournment until September meeting
	We ended the meeting at about 3:45 p.m. after agreeing that another
	meeting was necessary to review the profile and to resolve any issues
	with configuration naming.



