Minutes of ESI Working Group Meeting

Bob Snively Bob.Snively at Eng.Sun.COM
Sat Feb 17 23:59:47 PST 1996


* From the SCSI Reflector, posted by:
* Bob.Snively at Eng.Sun.COM (Bob Snively)
*

To:		Distribution

From:		Bob Snively

Date:		February 16, 1996

Subject:	Minutes of ESI adhoc group meeting


The acting chairperson of the ESI adhoc, Bob Snively,
called the meeting to order at 9:00 am.  He announced
that the meeting was an authorized meeting of the
X3T10 committee to study the ESI proposal, 
X3T10/95-324, revision 2.1.

He thanked Adaptec for providing the facilities and amenities.

The members of the group introduced themselves.

The latest revision of the ESI proposal was distributed.

The agenda was established.

	Discussion of background

	Collection of new inputs

	Discussion of features other than slot

	Discussion of slot information

ACTION ITEMS ACCUMULATED DURING THE MEETING

1)	Tom Slaight to provide Unicode document to Bob Snively
	for inclusion of references and appropriate code points
	in language element.

2)	Tom Slaight will consult with Doug Rademacher about the
	best definition for the simple UPS interface status.

3)	Bob Snively to provide updated document, revision 2.2
	about 2/28/96.

4)	John Lohmeyer to please agendize the ESI review in the
	SCSI Working Group meeting in March.


INTRODUCTION AND BACKGROUND

	Bob Snively expressed the desire to make the
	ESI document a simple mechanism for communicating
	with an enclosure.  He further specified a
	desire for a high degree of functional compatibility
	with SAF-TE and a desire that SAF-TE-2 would
	be a profile of this document.  He indicated
	that the goal was a stable document by
	the end of the March meeting.

	Interoperability and a single future model
	were expressed by the group as a critical
	goal.

	Bob Snively indicated that the minutes would be posted as 
	soon as possible after the meeting.  The new revision of
	the document is targeted for completion about 2/28/96.


NEW INPUTS TO THE DOCUMENT
	
1)	Rod Dekoning of Symbios requested the clarification of
	the method for relating the slot id number to the
	device address.  A mechanism will be defined.  The first
	component of the mechanism is a slot ID value from
	0 to 255 in the device element entry.  A second mechanism
	is the new SCSI mechanism for identifying drives, although
	that information must be verified for completeness.

2)	Radek Aster of SGI requested the capability of providing
	more extensive global enclosure information, providing a
	fixed minimum format.  The following fields will be added
	to the global field of the configuration page:

		Global ID (8)
		Vendor ID (8 or 16?) 
		Product ID (16)
		Revision ID (4)

	The last three will use the format from the INQUIRY command.
	This will be useful for a variety of purposes.
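
	A minimal sketch of how a host might split these fields,
	assuming they are laid out in the order listed with no
	padding between them, and that the last three are ASCII,
	left-aligned and space-padded as in INQUIRY data.  The
	names GlobalInfo and parse_global_info are hypothetical,
	and the vendor ID width is a parameter because the minutes
	leave it open.

```python
import struct
from typing import NamedTuple


class GlobalInfo(NamedTuple):
    global_id: bytes
    vendor_id: str
    product_id: str
    revision_id: str


def parse_global_info(data: bytes, vendor_len: int = 8) -> GlobalInfo:
    """Split the fixed-format global field into its four parts.

    vendor_len may be 8 or 16 because the minutes leave that width open.
    The field order and the absence of padding are assumptions.
    """
    if vendor_len not in (8, 16):
        raise ValueError("vendor_len must be 8 or 16")
    fmt = f">8s{vendor_len}s16s4s"
    if len(data) < struct.calcsize(fmt):
        raise ValueError("global field is too short")
    gid, vendor, product, rev = struct.unpack_from(fmt, data)

    def text(raw: bytes) -> str:
        # INQUIRY-style text fields are ASCII, padded with spaces.
        return raw.decode("ascii").rstrip(" ")

    return GlobalInfo(gid, text(vendor), text(product), text(rev))
```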

3)	Gary Watson of Trimm Technologies requested information
	about the LRC controls.  Review showed that it was already
	considered as part of the device definition.  Additional
	configuration switches would be vendor unique elements.

4)	The help text was discussed.  It represents a text string
	that summarizes the present state of the enclosure.  

5)	The use of a bit in the INQUIRY command to define that
	a device that is not an Enclosure Services peripheral type
	supports the Enclosure Services pages was discussed.  The
	group elected to implement such a bit.  This will require
	assignment of a bit by the SPC editor.

6)	Ken Jeffries of Dell Computer requested the ability to read and
	control thresholds on temperature, voltage, current, and
	airflow.  This would be optional, and the information may be ignored. 
	The group elected to assign a separate page for thresholds.
	Any sensor with a settable threshold would be included in
	the configuration page.  Each threshold element would 
	consist of four one-byte values, defining a high critical, high
	warning, low critical, and low warning threshold.  The
	threshold page would be readable (to determine the present
	values) and settable (to adjust the values), although the
	enclosure would have the right to refuse to indicate a
	threshold or refuse to modify a threshold.
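
	The four-byte element can be sketched as follows.  The
	byte order (high critical, high warning, low critical, low
	warning) follows the order in which the values are listed
	above; the unsigned interpretation, the inclusive
	comparisons, and the names Threshold and classify are
	assumptions made for illustration.

```python
import struct
from typing import NamedTuple


class Threshold(NamedTuple):
    high_critical: int
    high_warning: int
    low_critical: int
    low_warning: int

    def pack(self) -> bytes:
        # Four one-byte values, in the order listed in the minutes.
        return struct.pack("4B", *self)

    @classmethod
    def unpack(cls, raw: bytes) -> "Threshold":
        return cls(*struct.unpack("4B", raw[:4]))

    def classify(self, reading: int) -> str:
        # Assumption: the reading uses the same units as the thresholds
        # and a reading equal to a threshold already trips it.
        if reading >= self.high_critical or reading <= self.low_critical:
            return "critical"
        if reading >= self.high_warning or reading <= self.low_warning:
            return "warning"
        return "ok"
```

	Because the enclosure may refuse a change, a host would
	read the page back after setting it rather than assume the
	new values took effect.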

7)	Ken Jeffries of Dell Computer requested that a mechanism be
	created that would allow faster response than periodic polling.
	This would be especially critical for quickly notifying a
	host of a button being pressed on the enclosure.  A timed 
	disconnect capability was developed to meet this requirement
	without using AEN.

	A timed disconnect value is set using MODE SENSE/SELECT using
	a two-byte value with a 100 ms resolution.  That value would
	be the maximum disconnected time a target could wait before 
	reconnecting and providing status page information on the 
	RECEIVE DIAGNOSTIC RESULTS command.  The mode page information
	would identify the capability, enable and disable the capability,
	and provide the value.

	When a target receives a RECEIVE DIAGNOSTIC RESULTS command
	and the timed disconnect is enabled, the enclosure services
	device will accept the command and disconnect. If the
	device is an 8067 disk device (which has no interrupt capability)
	or if the requested page is not a status page, the required
	SCSI information transfer and status is transmitted immediately.
	If the command requests a status page and there are 
	informational, non-critical, critical, or unrecoverable
	indications to be presented, the information is presented
	immediately.  If no information is presently to be transmitted,
	the command remains disconnected no longer than the timed
	disconnect period before presenting the status page and the
	SCSI status.  If information becomes available, it is presented
	immediately.

	The effect is that information can be immediately presented
	without a timing granularity associated with the polling
	frequency.
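
	The rules above can be sketched as a small decision
	function.  The two-byte value and 100 ms resolution come
	from the minutes; big-endian byte order is assumed (the
	usual SCSI convention), and the function and parameter
	names are hypothetical.

```python
def encode_timed_disconnect(seconds: float) -> bytes:
    """Encode a wait limit as a two-byte count of 100 ms units."""
    units = round(seconds * 10)
    if not 0 <= units <= 0xFFFF:
        raise ValueError("value does not fit in two bytes")
    return units.to_bytes(2, "big")  # byte order is an assumption


def decode_timed_disconnect(raw: bytes) -> float:
    """Return the wait limit in seconds."""
    if len(raw) != 2:
        raise ValueError("expected exactly two bytes")
    return int.from_bytes(raw, "big") / 10


def hold_time(enabled: bool, has_interrupt: bool, status_page: bool,
              indication_pending: bool, limit_s: float) -> float:
    """Longest time the target may stay disconnected, in seconds.

    0.0 means the page and SCSI status are presented immediately.
    """
    if not enabled:
        return 0.0
    if not has_interrupt or not status_page:
        # 8067 disk devices and non-status pages answer at once.
        return 0.0
    if indication_pending:
        # Informational through unrecoverable indications are not held.
        return 0.0
    return limit_s
```

	A target holding a command would still answer as soon as
	new information appears, so hold_time is only an upper
	bound on the delay.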
	
8)	Ken Jeffries of Dell Computer asked about the mechanisms for
	determining that the configuration has changed.  Two mechanisms
	were identified:

	a)	Unit Attention is presented with a configuration changed
		indication when a new command comes to the enclosure
		services device, if the enclosure services device
		uses the enclosure services device type model.

	b)	A generation field of one byte will be placed in the
		configuration page and the status page.  The generation
		field is incremented every time a change in the
		configuration page takes place.  That simplifies
		the case where two separate processes are managing different
		aspects of the enclosure behavior from the same
		initiator.
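
	A sketch of how each managing process might use the
	generation field, assuming the one-byte counter wraps from
	255 to 0 (the minutes do not state the wrap behavior).
	GenerationTracker and next_generation are hypothetical
	names.

```python
from typing import Optional


def next_generation(current: int) -> int:
    """Enclosure side: advance the one-byte counter (assumed to wrap)."""
    return (current + 1) & 0xFF


class GenerationTracker:
    """Host side: one instance per managing process."""

    def __init__(self) -> None:
        self._last: Optional[int] = None

    def changed(self, generation: int) -> bool:
        """True if the configuration changed since the previous call."""
        if not 0 <= generation <= 0xFF:
            raise ValueError("generation is a one-byte field")
        is_new = self._last is not None and generation != self._last
        self._last = generation
        return is_new
```

	Because each process keeps its own tracker, one process
	reading the page does not hide the change from the other,
	which a cleared Unit Attention condition could do.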

9)	Yousef Vazir of Adaptec requested the addition of a four bit
	field to set a global status field for the entire enclosure.
	The four bit field presently used in the sense field will be
	applied to the control field, using the same 
	informational/non-critical/critical/unrecoverable bits.
	This can be used to turn on lights, alarms, or other functions.
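
	For illustration, the four severity bits can be modeled as
	flags.  The bit positions below are hypothetical, since
	the minutes name the four bits but do not assign
	positions; Severity and worst are likewise illustrative
	names.

```python
from enum import IntFlag
from typing import Optional


class Severity(IntFlag):
    # Bit positions are assumptions, not taken from the proposal.
    INFORMATIONAL = 0x1
    NON_CRITICAL = 0x2
    CRITICAL = 0x4
    UNRECOVERABLE = 0x8


def worst(flags: Severity) -> Optional[Severity]:
    """Return the most severe bit that is set, or None if none is set."""
    for level in (Severity.UNRECOVERABLE, Severity.CRITICAL,
                  Severity.NON_CRITICAL, Severity.INFORMATIONAL):
        if flags & level:
            return level
    return None
```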

10)	The group further discussed the alarm field.  The "speaker"
	element is changed to the "audible alarm" element.  Four
	bits are provided to invoke a sound according to the severity
	of the error.  The same four bit names are used.  In addition,
	the alarm needs an indication that muting has been requested
	and is accepted and it needs an additional reminder mode.

11)	Gary Watson of Trimm Technologies requested three bits for the
	fan speed code.  There would be 7 speeds plus a stopped
	speed.

12)	The group requested that predicted failure indicators be
	assigned for all relevant elements, including fans, power supplies,
	NVRAM, and others.

13)	The requirement for a predicted failure warning involving
	the number of insertions was discussed.  No requirement
	was identified clearly enough to implement this.

14)	Yousef Vazir of Adaptec indicated that support for 
	international languages is desirable.  A language
	element will be defined to indicate and set what language and what
	display format is used in the descriptive texts that are
	not explicitly required to be ASCII character strings.

15)	Bob Snively of Sun Microsystems indicated that an element to
	indicate the orientation of an enclosure was desirable.
	An orientation element was discussed, but the final decision
	was not to put the element in for now.  The discussion also
	included the possibility of providing some mapping structure
	that could identify the location of components in an
	enclosure.  This was not a well enough defined concept to
	include for now.

16)	Yousef Vazir of Adaptec suggested the possibility of a
	mechanism to re-establish or reset the enclosure to its
	default states, especially with respect to thresholds.
	After discussion, the decision was not to create such a 
	mechanism for now.

DISCUSSION AND RESOLUTION OF ISSUES AND QUESTIONS FOR REVISION 2.1

1)	Should INQUIRY indicate support of ESI?

	It was decided to request an indicator bit in INQUIRY.

2)	Should variable length entries for elements be allowed?

	It was decided that fixed length element entry would be 
	used.

3)	Proposed change in management of diagnostic code page 
	lengths.

	Fixed by modifications between revision 2.0 and 2.1.
	When sending diagnostic information out to the target, 
	the allocation length must be set to the page length + 4.  
	Receive Diagnostic does not have to conform.
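
	The "+ 4" accounts for the four-byte diagnostic page
	header (page code, one further byte, and a two-byte page
	length that counts only the bytes after the header).  A
	minimal sketch, with hypothetical function names and an
	arbitrary page code in the usage below:

```python
import struct

HEADER_LEN = 4  # page code, one further byte, two-byte page length


def build_diagnostic_page(page_code: int, payload: bytes) -> bytes:
    """Prefix a payload with the four-byte diagnostic page header."""
    if not 0 <= page_code <= 0xFF:
        raise ValueError("page code is one byte")
    if len(payload) > 0xFFFF:
        raise ValueError("payload too long for a two-byte page length")
    # The page length field counts only the bytes after the header.
    return struct.pack(">BBH", page_code, 0, len(payload)) + payload


def send_length(page: bytes) -> int:
    """Length to place in the command when sending the page out."""
    (page_length,) = struct.unpack_from(">H", page, 2)
    return page_length + HEADER_LEN
```

	For a page built this way, send_length(page) equals
	len(page), which is the consistency the rule is meant to
	guarantee.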

4)	Device type codes

	The selected device element codes are appropriate.

5)	SCSI slot parameters

	The parameters are modified as indicated later in these
	minutes.

6)	Temperature

	The resolution and range were accepted.  Some further
	study about other possible resolutions and ranges
	may occur.

7)	Power supply indicators

	Additional power supply indicators were accepted
	for over and under voltage and over current.  There
	was not a firm consensus about under current conditions.

8)	Mapping of device indicators to SCC.

	The mapping is modified as indicated later in these minutes.

9)	Examine EFW requirements

	EFW is removed from the SCSI definitions.  There similarly
	appears to be no advantage to retaining the function in
	SFF-8045.

DISCUSSION AND RESOLUTION OF ISSUES RELATED TO DEVICE/SLOT PARAMETERS

Miscellaneous discussion items.

	Red LEDs have special meaning in international applications
	and must be used sparingly and only for those functions.
	This is outside the scope of the standard.

	The enclosure can generally ignore and refuse to store any
	values in an element entry.  It can override any setting,
	either because it does not implement the option or because
	establishing such a setting may cause the machine to operate
	outside its safe margins.

Slot ID

	The elements defined for the status page device/slot
	parameters will use byte 1 as the slot ID.

Separation of host managed functions from array controller managed functions

	After considerable discussion, it was decided that the
	status/control page will be divided into two pairs
	of status/control pages, allowing independent processes
	to manage enclosure physical functions separately from
	array state functions.   This separation constrains
	the read/modify/update atomicity problem to a single thread
	of control for each of the two functions.

	The enclosure physical functions will be managed by the 
	SCSI Device Elements in the enclosure status/control page
	using the present page codes.  I am tentatively assuming
	that SCSI device elements shall be first in the 
	list so that the first part of the page will correspond
	exactly to the new array flag status/control page, which
	does not include any other types of elements.

	A new page code will be assigned for the SCSI device elements
	for the array flag status/control pages.  Only SCSI device
	elements are established in this page, so it is much shorter
	than the previously defined page.

	Some functions will be settable by both pages independently.
	The actual function provided to the enclosure controls is the
	logical OR of the two set conditions. 
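
	A sketch of that combination rule for a few of the bits
	that appear in both pages.  The bit positions and the
	names SharedControl and effective_controls are
	hypothetical.

```python
from enum import IntFlag


class SharedControl(IntFlag):
    # Bit positions are assumptions, chosen only for the example.
    REMOVE = 0x01
    IDENTIFY = 0x02
    DO_NOT_REMOVE = 0x04


def effective_controls(enclosure_page: SharedControl,
                       array_flag_page: SharedControl) -> SharedControl:
    """A function is active if either page has set it."""
    return enclosure_page | array_flag_page
```

	One consequence is that clearing a bit in one page does
	not turn the function off while the other page still has
	it set.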

	Both the enclosure status/control pages and the array flag
	status/control pages use the same definitions of 
	Informational/Non-Critical/Critical/Unrecoverable in 
	byte 0 of the device elements.  A "swapped" status will
	also be provided to allow for quick drive replacements
	between polling cycles.  The swapped status will be reset
	by setting the control value to 0.

	Enclosure Status Bits		Array Flag Bits

	Remove				Remove
	Identify			Identify
	Enable A/B			Enable A/B
	A/B Enabled			A/B Enabled
	Do Not Remove			Do Not Remove
	Predicted Fault			Predicted Fault
	

	Insert				Reserved Drive (Was "Unconfigured")
	Set Fault			Drive OK
	Drive Fault			Hot Spare
	Drive Off			Consistency Check in Progress
					In Critical Array
					In Failed Array
					Rebuild/Remap
					Rebuild/Remap Aborted
					
	
	
NEW FUNCTIONALITY ASSOCIATED WITH OTHER ELEMENTS

1)	New Element Definitions

	Global Element
		Sets global indicators to four levels of warning/failure.

	Language Element
		Language
		Character Encoding

	Voltage Sensor Element
		Over Voltage
		Under Voltage
		Actual Voltage (16 bits, millivolts, 2's complement)

	Current Sensor Element
		Over Current
		Under Current?
			(Note that no well designed box will fail
			in the presence of an undercurrent.)
		Actual Current (16 bits, milliamps, 2's complement)

	SCSI Target Port Element

	SCSI Initiator Port Element
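
	The 16-bit two's complement readings for the voltage and
	current sensors can be decoded as below.  Big-endian byte
	order is assumed (the usual SCSI convention), and the
	function names are hypothetical.  The representable range
	is -32768 to +32767 millivolts or milliamps.

```python
def decode_reading(raw: bytes) -> int:
    """Decode a 16-bit two's complement value (millivolts or milliamps)."""
    if len(raw) != 2:
        raise ValueError("expected exactly two bytes")
    return int.from_bytes(raw, "big", signed=True)  # byte order assumed


def encode_reading(value: int) -> bytes:
    """Encode millivolts or milliamps as 16-bit two's complement."""
    if not -0x8000 <= value <= 0x7FFF:
        raise ValueError("value does not fit in 16 bits")
    return value.to_bytes(2, "big", signed=True)
```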

2)	New element functionality

	Disable function for most sensor elements.  This is necessary
	to allow an inconsistent sensor to be shut off so that it
	will not generate alarms and other problems if it has obviously
	stopped providing correct readings.

	External bit to indicate that the element is outside the
	boundary of the actual enclosure, but is managed by the
	enclosure.  Examples include external JBODs whose information
	is being forwarded by an ESI service device and external 
	power conditioners and UPS devices.

3)	Element name changes

	Device Bay/Slot changed to SCSI Device Element

	Speaker Element changed to Audible Alert Element

	Fan Element changed to Cooling Element type

4)	New element status code

	Not available = Element is installed, does not have any
			known failures, but its operation has not
			been invoked.
	
5)	Power Supply Element function modifications

	Over voltage, under voltage, over current, and predicted
	failure will be added.

6)	Cooling element modifications

	Speed control is increased to 3 bits (7 states plus off)

7)	Temperature Sensors

	Add under temperature failure and warning indications.
	Allow sensor disable.

8)	Audio Alert Element

	A severity scale of reminder/non-critical/critical/unrecoverable
	is provided.  Reminder is both control and status.  Mute
	is provided.  New errors reset reminder and mute status.

9)	Electronic and controller type elements

	A predicted failure indication is provided for each.

10)	UPS Element was combined with UPS Battery element

	The following status and control bits were defined, but
	the UPS definition may be modified as consultation takes
	place with various UPS experts.

		AC line in lo
		AC line in hi
		AC line in quality failure
		AC line in fail
		DC in fail
		UPS fail
		UPS predicted failure
		Loss of power warning
		UPS interface failure
		Battery fail
		Battery predicted failure
		Charging Status of Battery (bits TBD)

11)	Port/Transceiver

	Added laser failure and loss of light bits.	


OTHER DISCUSSION ITEMS

1)	The ASC/ASCQ definitions need to be clarified further.

2)	For multi-channel devices, the devices on each channel 
	will be grouped together.
	The global element entry for each group will identify
	the path ID of the group.  Even on a single channel,
	multiple device types may be included.

3)	Capability of providing element part number and revision

	Reuben Martinez of DEC requested that a mechanism be provided
	to provide revision numbers and part numbers for FRUs.
	A new ESI page will be defined to provide variable length
	fields, one for each element (in the same order as the status
	page) that will contain a vendor unique combination of
	part number, revision level, and other descriptive ASCII text.
	This format is always ASCII and is not modified by the
	language element.  

4)	Drive replacement

	The problem of quick replacement of drives not being detected
	was discussed again.  It was felt that the combination of the
	optional swapped bit, the timed disconnect function, and
	Unit Attention status from the drive would be adequate.


NEXT MEETING:

The document will be provided about February 28.  The document will
be considered again by e-mail and will additionally be considered
at the SCSI working group the week of March 11.  The agenda will be
provided by the chairperson of X3T10.

ATTENDANCE:

Bob Snively	Sun Microsystems	415-786-6694	bob.snively at sun.com
Yousef Vazir	Adaptec			408-957-4803	yvazir at corp.adaptec.com
Norm Harris	Adaptec			408-945-8600	nharris at eng.adaptec.com
Radek Aster	SGI			415-933-1119	raster at sgi.com
Erik Schuchman	Dell			512-728-0803	erik_schuchmann@
							    ccmail.us.dell.com
Ken Jeffries	Dell			512-728-8384	ken_jeffries@
							    ccmail.us.dell.com
Ken Hallam	Unisys			714-380-5115	ken.hallam at mv.unisys.com
Reuben Martinez Digital Equipment	719-548-3467	martinez@
							    genral.enet.dec.com
Al Wilhelm	Adaptec			408-945-2525	awilhelm@
							    corp.adaptec.com
Dan Colegrove	IBM			408-256-1978	colegrove at vnet.ibm.com
Rod Dekoning	Symbios Logic		316-636-8842	rod.dekoning@
							    symbios.com
J. Pat Young	CMD Technology		714-454-0800	young at cmd.com
Dave Towle	Sun Microsystems	415-786-7367	david.towle at eng.sun.com
Larry Hoskinson CMD Technology		714-454-0800	hoskinson at cmd.com
Ed Haske	CMD Technology		714-454-0800	haske at cmd.com
Ajay Malik	Adaptec			408-945-8600	ajay-malik@
							    corp.adaptec.com
Tom Slaight	Intel Corp		503-696-2364	tom_slaight@
							    ccm.hf.intel.com
Gary Watson	Sigma-Trimm Technologies 800-423-2024   trimm at netcom.com




