Minutes of ESI Working Group Meeting
Bob Snively
Bob.Snively at Eng.Sun.COM
Sat Feb 17 23:59:47 PST 1996
* From the SCSI Reflector, posted by:
* Bob.Snively at Eng.Sun.COM (Bob Snively)
*
To: Distribution
From: Bob Snively
Date: February 16, 1996
Subject: Minutes of ESI adhoc group meeting
The acting chairperson of the ESI adhoc, Bob Snively,
called the meeting to order at 9:00 am. He announced
that the meeting was an authorized meeting of the
X3T10 committee to study the ESI proposal,
X3T10/95-324, revision 2.1.
He thanked Adaptec for providing the facilities and amenities.
The members of the group introduced themselves.
The latest revision of the ESI proposal was distributed.
The agenda was established.
Discussion of background
Collection of new inputs
Discussion of features other than slot
Discussion of slot information
ACTION ITEMS ACCUMULATED DURING THE MEETING
1) Tom Slaight to provide Unicode document to Bob Snively
for inclusion of references and appropriate code points
in language element.
2) Tom Slaight will consult with Doug Rademacher about the
best definition for the simple UPS interface status.
3) Bob Snively to provide updated document, revision 2.2
about 2/28/96.
4) John Lohmeyer to please agendize the ESI review in the
SCSI Working Group meeting in March.
INTRODUCTION AND BACKGROUND
Bob Snively expressed the desire to make the
ESI document a simple mechanism for communicating
with an enclosure. He further specified a
desire for a high degree of functional compatibility
with SAF-TE and a desire that SAF-TE-2 would
be a profile of this document. He indicated
that the goal was a stable document by
the end of the March meeting.
Interoperability and a single future model
were expressed by the group as a critical
goal.
Bob Snively indicated that the minutes would be posted as
soon as possible after the meeting. The new revision of
the document is targeted for completion about 2/28/96.
NEW INPUTS TO THE DOCUMENT
1) Rod Dekoning of Symbios requested the clarification of
the method for relating the slot id number to the
device address. A mechanism will be defined. The first
component of the mechanism is a slot ID value from
0 to 255 in the device element entry. A second mechanism
is the new SCSI mechanism for identifying drives, although
that information must be verified for completeness.
2) Radek Aster of SGI requested the capability of providing
more extensive global enclosure information, providing a
fixed minimum format. The following fields will be added
to the global field of the configuration page:
Global ID (8)
Vendor ID (8 or 16?)
Product ID (16)
Revision ID (4)
The last three will use the format from the INQUIRY command.
This will be useful for a variety of uses.
3) Gary Watson of Trimm Technologies requested information
about the LRC controls. Review showed that it was already
considered as part of the device definition. Additional
configuration switches would be vendor unique elements.
4) The help text was discussed. It represents a text string
that summarizes the present state of the enclosure.
5) The use of a bit in the INQUIRY command to define that
a device that is not an Enclosure Services peripheral type
supports the Enclosure Services pages was discussed. The
group elected to implement such a bit. This will require
assignment of a bit by the SPC editor.
6) Ken Jeffries of Dell Computer requested the ability to read and
control thresholds on temperature, voltage, current, and
airflow. This would be optional, and the information may be ignored.
The group elected to assign a separate page for thresholds.
Any sensor with a settable threshold would be included in
the configuration page. Each threshold element would
consist of four one-byte values, defining a high critical, high
warning, low critical, and low warning threshold. The
threshold page would be readable (to determine the present
values) and settable (to adjust the values), although the
enclosure would have the right to refuse to indicate a
threshold or refuse to modify a threshold.
7) Ken Jeffries of Dell Computer requested that a mechanism be
created that would allow faster response than periodic polling.
This would be especially critical for quickly notifying a
host of a button being pressed on the host. A timed
disconnect capability was developed to meet this requirement
without using AEN.
A timed disconnect value is set using MODE SENSE/SELECT using
a two-byte value with a 100 ms resolution. That value would
be the maximum disconnected time a target could wait before
reconnecting and providing status page information on the
RECEIVE DIAGNOSTIC RESULTS command. The mode page information
would identify the capability, enable and disable the capability,
and provide the value.
When a target receives a RECEIVE DIAGNOSTIC RESULTS command
and the timed disconnect is enabled, the enclosure services
device will accept the command and disconnect. If the
device is an 8067 disk device (which has no interrupt capability)
or if the requested page is not a status page, the required
SCSI information transfer and status is transmitted immediately.
If the command requests a status page and there are
informational, non-critical, critical, or unrecoverable
indications to be presented, the information is presented
immediately. If no information is presently to be transmitted,
the command remains disconnected no longer than the timed
disconnect period before presenting the status page and the
SCSI status. If information becomes available, it is presented
immediately.
The effect is that information can be immediately presented
without a timing granularity associated with the polling
frequency.
8) Ken Jeffries of Dell Computer asked about the mechanisms for
determining that the configuration has changed. Two mechanisms
were identified:
a) Unit Attention is presented with a configuration changed
indication when a new command comes to the enclosure
services device, if the enclosure services device
uses the enclosure services device type model.
b) A generation field of one byte will be placed in the
configuration page and the status page. The generation
field is incremented every time a change in the
configuration page takes place. That simplifies
the case where two separate processes are managing different
aspects of the enclosure behavior from the same
initiator.
9) Yousef Vazir of Adaptec requested the addition of a four bit
field to set a global status field for the entire enclosure.
The four bit field presently used in the sense field will be
applied to the control field, using the same
informational/non-critical/critical/unrecoverable bits.
This can be used to turn on lights, alarms, or other functions.
10) The group further discussed the alarm field. The "speaker"
element is changed to the "audible alarm" element. Four
bits are provided to invoke a sound according to the severity
of the error. The same four bit names are used. In addition,
the alarm needs an indication that muting has been requested
and is accepted and it needs an additional reminder mode.
11) Gary Watson of Trimm Technologies requested three bits for the
fan speed code. There would be 7 speeds plus a stopped
speed.
12) The group requested that predicted failure indicators be
assigned for all relevant elements, including fans, power supplies,
NVRAM, and others.
13) The requirement for a predicted failure warning involving
the number of insertions was discussed. No requirement
was identified clearly enough to implement this.
14) Yousef Vazir of Adaptec indicated that support for
international languages is desirable. A language
element will be defined to indicate and set what language and what
display format is used in the descriptive texts that are
not explicitly required to be ASCII character strings.
15) Bob Snively of Sun Microsystems indicated that an element to
indicate the orientation of an enclosure was desirable.
An orientation element was discussed, but the final decision
was not to put the element in for now. The discussion also
included the possibility of providing some mapping structure
that could identify the location of components in an
enclosure. This was not a well enough defined concept to
include for now.
16) Yousef Vazir of Adaptec suggested the possibility of a
mechanism to re-establish or reset the enclosure to its
default states, especially with respect to thresholds.
After discussion, the decision was not to create such a
mechanism for now.
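As an illustration of the timed disconnect rules recorded in item 7 above, the following sketch computes how long a target may stay disconnected before completing a RECEIVE DIAGNOSTIC RESULTS command. It is not taken from the ESI proposal: the function name, the parameter names, and the assumption that a disabled capability means an immediate response are all hypothetical.

```python
TICK_MS = 100        # resolution of the timed disconnect value (from the minutes)
MAX_TICKS = 0xFFFF   # largest value that fits the two-byte field


def max_disconnect_ms(enabled, ticks, is_status_page, is_8067_device, info_pending):
    """Longest time, in ms, the target may stay disconnected before
    completing a RECEIVE DIAGNOSTIC RESULTS command (hypothetical helper)."""
    if not 0 <= ticks <= MAX_TICKS:
        raise ValueError("timed disconnect value must fit in two bytes")
    if not enabled:
        return 0                 # assumed: capability disabled, answer at once
    if is_8067_device or not is_status_page:
        return 0                 # no interrupt capability, or not a status page
    if info_pending:
        return 0                 # something to report: present it immediately
    return ticks * TICK_MS       # nothing to report: wait at most this long
```

With a value of 50, an idle enclosure could hold a status page request for up to 5 seconds, while any pending indication would still be returned immediately.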
DISCUSSION AND RESOLUTION OF ISSUES AND QUESTIONS FOR REVISION 2.1
1) Should INQUIRY indicate support of ESI?
It was decided to request an indicator bit in INQUIRY.
2) Should variable length entries for elements be allowed?
It was decided that fixed length element entry would be
used.
3) Proposed change in management of diagnostic code page
lengths.
Fixed by modifications between revision 2.0 and 2.1.
When sending diagnostic information out to the target,
the allocation length must be set to the page length + 4.
Receive Diagnostic does not have to conform.
4) Device type codes
The selected device element codes are appropriate.
5) SCSI slot parameters
The parameters are modified as indicated later in these
minutes.
6) Temperature
The resolution and range were accepted. Some further
study about other possible resolutions and ranges
may occur.
7) Power supply indicators
Additional power supply indicators were accepted
for over and under voltage and over current. There
was not a firm consensus about under current conditions.
8) Mapping of device indicators to SCC.
The mapping is modified as indicated later in these minutes.
9) Examine EFW requirements
EFW is removed from the SCSI definitions. There similarly
appears to be no advantage to retaining the function in
SFF-8045.
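The "page length + 4" rule in issue 3 above can be illustrated with a short sketch. It assumes the usual SCSI diagnostic page layout (page code in byte 0, one page-specific byte, and a two-byte big-endian page length in bytes 2 and 3 that counts only the bytes following the header). The function names and the page code used in the example are placeholders, not values from the proposal.

```python
import struct

HEADER_LEN = 4  # page code, one page-specific byte, two-byte page length


def build_diagnostic_page(page_code: int, payload: bytes) -> bytes:
    """Prefix the payload with a 4-byte header (assumed layout).
    The page length field counts only the payload bytes."""
    return struct.pack(">BBH", page_code, 0, len(payload)) + payload


def send_length(page: bytes) -> int:
    """Length to use when sending the page to the target: page length + 4."""
    (page_length,) = struct.unpack_from(">H", page, 2)
    return page_length + HEADER_LEN


# Example with a placeholder page code and a 12-byte payload.
page = build_diagnostic_page(0x02, bytes(12))
assert send_length(page) == len(page) == 16
```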
DISCUSSION AND RESOLUTION OF ISSUES RELATED TO DEVICE/SLOT PARAMETERS
Miscellaneous discussion items.
Red LEDs have special meaning in international applications
and must be used sparingly and only for those functions.
This is outside the scope of the standard.
The enclosure can generally ignore and refuse to store any
values in an element entry. It can override any setting,
either because it does not implement the option or because
establishing such a setting may cause the machine to operate
outside its safe margins.
Slot ID
The elements defined for the status page device/slot
parameters will use byte 1 as the slot ID.
Separation of host managed functions from array controller managed functions
After considerable discussion, it was decided that the
status/control page will be divided into two pairs
of status/control pages, allowing independent processes
to manage enclosure physical functions separately from
array state functions. This separation constrains
the read/modify/update atomicity problem to a single thread
of control for each of the two functions.
The enclosure physical functions will be managed by the
SCSI Device Elements in the enclosure status/control page
using the present page codes. I am tentatively assuming
that SCSI device elements shall be first in the
list so that the first part of the page will correspond
exactly to the new array flag status/control page, which
does not include any other types of elements.
A new page code will be assigned for the SCSI device elements
for the array flag status/control pages. Only SCSI device
elements are established in this page, so it is much shorter
than the previously defined page.
Some functions will be settable by both pages independently.
The actual function provided to the enclosure controls is the
logical OR of the two set conditions.
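A minimal sketch of how the two independently set pages might combine follows. The flag names are taken from the functions listed in these minutes, but the bit positions, the class name, and the helper name are placeholders; the minutes do not assign them.

```python
from enum import IntFlag


class SlotControl(IntFlag):
    # Bit positions are placeholders; the minutes do not assign them.
    REMOVE = 0x01
    IDENTIFY = 0x02
    DO_NOT_REMOVE = 0x04
    PREDICTED_FAULT = 0x08


def effective_control(enclosure_page: SlotControl,
                      array_flag_page: SlotControl) -> SlotControl:
    """A function is active if either page has set it (bitwise OR)."""
    return enclosure_page | array_flag_page
```

One consequence of this rule is that a function requested through both pages stays active until both processes have cleared it, so neither process can silently cancel a request made by the other.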
Both the enclosure status/control pages and the array flag
status/control pages use the same definitions of
Informational/Non-Critical/Critical/Unrecoverable in
byte 0 of the device elements. A "swapped" status will
also be provided to allow for quick drive replacements
between polling cycles. The swapped status will be reset
by setting the control value to 0.
Enclosure Status Bits    Array Flag Bits
Remove                   Remove
Identify                 Identify
Enable A/B               Enable A/B
A/B Enabled              A/B Enabled
Do Not Remove            Do Not Remove
Predicted Fault          Predicted Fault
Insert                   Reserved Drive (Was "Unconfigured")
Set Fault                Drive OK
Drive Fault              Hot Spare
Drive Off                Consistency Check in Progress
                         In Critical Array
                         In Failed Array
                         Rebuild/Remap
                         Rebuild/Remap Aborted
NEW FUNCTIONALITY ASSOCIATED WITH OTHER ELEMENTS
1) New Element Definitions
Global Element
Sets global indicators to four levels of warning/failure.
Language Element
Language
Character Encoding
Voltage Sensor Element
Over Voltage
Under Voltage
Actual Voltage (16 bits, millivolts, 2's complement)
Current Sensor Element
Over Current
Under Current?
(Note that no well designed box will fail
in the presence of an undercurrent.)
Actual Current (16 bits, milliamps, 2's complement)
SCSI Target Port Element
SCSI Initiator Port Element
2) New element functionality
Disable function for most sensor elements. This is necessary
to allow an inconsistent sensor to be shut off so that it
will not generate alarms and other problems if it has obviously
stopped providing correct readings.
External bit to indicate that the element is outside the
boundary of the actual enclosure, but is managed by the
enclosure. Examples include external JBODs whose information
is being forwarded by an ESI service device and external
power conditioners and UPS devices.
3) Element name changes
Device Bay/Slot changed to SCSI Device Element
Speaker Element changed to Audible Alert Element
Fan Element changed to Cooling Element type
4) New element status code
Not available = Element is installed, does not have any
known failures, but its operation has not
been invoked.
5) Power Supply Element function modifications
Over voltage, under voltage, over current, and predicted
failure will be added.
6) Cooling element modifications
Speed control is increased to 3 bits (7 states plus off)
7) Temperature Sensors
Add under temperature failure and warning indications.
Allow sensor disable.
8) Audible Alert Element
A severity scale of reminder/non-critical/critical/unrecoverable
is provided. Reminder is both control and status. Mute
is provided. New errors reset reminder and mute status.
9) Electronic and controller type elements
A predicted failure indication is provided for each.
10) UPS Element was combined with UPS Battery element
The following status and control bits were defined, but
the UPS definition may be modified as consultation takes
place with various UPS experts.
AC line in lo
AC line in hi
AC line in quality failure
AC line in fail
DC in fail
UPS fail
UPS predicted failure
Loss of power warning
UPS interface failure
Battery fail
Battery predicted failure
Charging Status of Battery (bits TBD)
11) Port/Transceiver
Added laser failure and loss of light bits.
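The 16-bit two's complement voltage and current fields listed under item 1 above could be decoded as in the following sketch. The big-endian byte order is an assumption based on common SCSI practice, and the function names are hypothetical.

```python
import struct


def decode_reading(raw: bytes) -> int:
    """Decode a 16-bit two's complement reading (millivolts or milliamps).
    Big-endian byte order is assumed."""
    if len(raw) != 2:
        raise ValueError("reading must be exactly two bytes")
    (value,) = struct.unpack(">h", raw)
    return value


def encode_reading(value: int) -> bytes:
    """Encode a reading; the representable range is -32768..32767."""
    return struct.pack(">h", value)


assert decode_reading(b"\x13\x88") == 5000     # +5.000 V
assert decode_reading(b"\xd1\x20") == -12000   # -12.000 V
```

With millivolt units, a signed 16-bit field covers roughly -32.7 V to +32.7 V, which includes the common +5 V, +12 V, and -12 V rails.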
OTHER DISCUSSION ITEMS
1) The ASC/ASCQ definitions need to be clarified further.
2) For multi-channel devices, the devices on each channel
will be grouped together.
The global element entry for each group will identify
the path ID of the group. Even on a single channel,
multiple device types may be included.
3) Capability of providing element part number and revision
Reuben Martinez of DEC requested that a mechanism be provided
to provide revision numbers and part numbers for FRUs.
A new ESI page will be defined to provide variable length
fields, one for each element (in the same order as the status
page) that will contain a vendor unique combination of
part number, revision level, and other descriptive ASCII text.
This format is always ASCII and is not modified by the
language element.
4) Drive replacement
The problem of quick replacement of drives not being detected
was discussed again. It was felt that the combination of the
optional swapped bit, the timed disconnect function, and
Unit Attention status from the drive would be adequate.
NEXT MEETING:
The document will be provided about February 28. The document will
be considered again by e-mail and will additionally be considered
at the SCSI working group the week of March 11. The agenda will be
provided by the chairperson of X3T10.
ATTENDANCE:
Bob Snively      Sun Microsystems          415-786-6694  bob.snively at sun.com
Yousef Vazir     Adaptec                   408-957-4803  yvazir at corp.adaptec.com
Norm Harris      Adaptec                   408-945-8600  nharris at eng.adaptec.com
Radek Aster      SGI                       415-933-1119  raster at sgi.com
Erik Schuchman   Dell                      512-728-0803  erik_schuchmann@ccmail.us.dell.com
Ken Jeffries     Dell                      512-728-8384  ken_jeffries@ccmail.us.dell.com
Ken Hallam       Unisys                    714-380-5115  ken.hallam at mv.unisys.com
Reuben Martinez  Digital Equipment         719-548-3467  martinez@genral.enet.dec.com
Al Wilhelm       Adaptec                   408-945-2525  awilhelm@corp.adaptec.com
Dan Colegrove    IBM                       408-256-1978  colegrove at vnet.ibm.com
Rod Dekoning     Symbios Logic             316-636-8842  rod.dekoning@symbios.com
J. Pat Young     CMD Technology            714-454-0800  young at cmd.com
Dave Towle       Sun Microsystems          415-786-7367  david.towle at eng.sun.com
Larry Hoskinson  CMD Technology            714-454-0800  hoskinson at cmd.com
Ed Haske         CMD Technology            714-454-0800  haske at cmd.com
Ajay Malik       Adaptec                   408-945-8600  ajay-malik@corp.adaptec.com
Tom Slaight      Intel Corp                503-696-2364  tom_slaight@ccm.hf.intel.com
Gary Watson      Sigma-Trimm Technologies  800-423-2024  trimm at netcom.com