To:       John Lohmeyer                          X3T9.2/87-179
          Chairman, X3T9.2

From:     Robert Snively
          Adaptec
          580 Cottonwood
          Milpitas, CA 95037

Date:     September 30, 1987

Subject:  Explicit control of extended contingent allegiance.

For  SCSI devices,  the peer-to-peer nature of the SCSI makes  it 
possible  for  more than one host to operate with more  than  one 
target  controller.   This dynamic sharing capability of SCSI  is 
normally a powerful advantage.  It allows very high multi-tasking 
performance,   the  sharing  of  expensive  peripheral   devices, 
flexible   configurations,    and   communication   between   any 
combination  of host and peripheral devices.   This same  dynamic 
sharing  makes it very difficult to allow a single host to direct 
the   error  recovery  actions  of  a  single   partially-failing 
peripheral.   The  execution  of  sequential steps  of  an  error 
recovery  action may be interleaved with normal  operations  from 
other initiators that interfere with the error recovery steps.  A 
number of mechanisms are supported by SCSI to allow this recovery 
process.

The   primary  mechanism  is  the  intelligence  of  the   target 
SCSI  device.   Most SCSI devices are designed to allow  complete 
recovery  of any internal failures that can be recovered  without 
any  notification to the initiator,  other than logging some data 
to flag the occurrence of the error and recovery process.

A secondary mechanism is the use of contingent allegiance.  After 
the  detection  of an error,  the SCSI device  presents  a  CHECK 
CONDITION  to the initiator.   The error information that must be 
recovered  by  the  initiator to describe the error  is  buffered 
internally by the target controller.  To prevent the execution of 
commands  that  may  cause the information to  be  lost,  no  new 
commands are accepted for the LUN from other initiators until the 
initiator that received the CHECK CONDITION has an opportunity to 
capture  the information by execution of a REQUEST SENSE  command 
or  to  throw  away the information by  execution  of  any  other 
command.

Another  powerful  mechanism is the use of the  RESERVE  command, 
both at the unit and the extent level.    During the execution of 
a  transaction,  the  appropriate resources (unit or extent)  are 
reserved for the processor managing the transaction.   Until  the 
transaction is completed,  the reservation protects the initiator 
and  target from any interference in the  transaction,  including 
interference  in  error  recovery  procedures.   The  reservation 
process  allows the execution of as many error recovery steps  as 
may  be  required to complete the transaction and to recover  any 
error information.

The  sequential-access  device  concept  of  buffer mode  2  aids 
certain  types  of  error  recovery  procedures.    By    tagging
the  buffer  contents,  the  data associated  with  a  particular 
initiator  can be identified.   This function,  together with the 
RECOVER  BUFFERED DATA command, allows data to be re-captured and  
written out to a new reel of tape if a previous reel of tape  has 
been filled before the buffer was emptied.  

Careful   programming  of  the  driving  operating   system   and 
applications  can  eliminate  the effects  of  overlapping  error 
recovery  procedures.   System level recovery mechanisms  ranging 
from check-pointing and transaction journaling to drive mirroring 
and redundant execution make the use of complex sequential  error 
recovery mechanisms unnecessary.   The  error condition is simply 
recorded  and ignored and an alternate method is used to continue 
the operation. 



Each  of  these  mechanisms  has  some  weaknesses.   Intelligent 
peripherals  can only recover errors foreseen by their designers.  
Contingent allegiance only allows information about the error  to 
be obtained.   The reservation process limits the dynamic sharing 
of  data  under some conditions.   The skillful use of  operating 
system  features may increase system cost and still only  recover 
errors foreseen by the system designer.

An  alternate  mechanism called  extended  contingent  allegiance 
(ECA)  is  defined  to  complement  these  other  error  recovery  
capabilities.   ECA assumes that the central operating system can 
make   some  intelligent  decisions  about  the  error   recovery 
requirement  and  that the error recovery will take a  number  of 
sequential steps to complete.   It further assumes that the error 
recovery steps cannot be interleaved with other normal operations 
from  other initiators.   The capability to use ECA is turned  on 
and  off  by  a  special MODE SELECT  page.   The  ECA  state  is 
established  for  any LU-initiator pair when a LU posts  a  CHECK 
CONDITION  accompanied by a special MESSAGE IN to the  initiator.  
The ECA state requires the target to present BUSY status for  any 
new  command received to the LU from any other initiator than the 
one  for  which  the  ECA  state  has  been   established.    The 
dispatching  of  any  new queued commands for the LU  is  halted.  
Some targets may additionally present BUSY status for LU's  other 
than  the failing LU if the target's resources are used up by the 
error  recovery  process.   Some targets may  additionally  cause 
CHECK  CONDITIONS  to  occur  on  other LU's  if  the  error  was 
associated  with  shared  resources.    These  additional   CHECK 
CONDITIONS  may  also cause ECA states to  be  established.   New 
commands  from  the  initiator for which the ECA state  has  been 
established  are  executed in the normal manner.   When  all  the 
commands  necessary to perform the recovery procedure  have  been 
executed,  the  ECA state is ended by the execution of a  special 
MESSAGE OUT to the target LU.   When the ECA state is ended,  all 
normal operation begins again.   Any queued commands remaining in 
the  queue  after the error recovery procedure has been  executed 
will begin being dispatched in the normal manner.   Any  commands 
from other initiators are again handled in the normal manner. 

The Asynchronous Event Notification protocol may also initiate an 
ECA  state if the SEND command containing the Asynchronous  Event 
Notification data also transmits an INITIATE RECOVERY message out 
to   the  host  being  notified.    This  allows  the  conditions 
associated with the asynchronous event to be properly managed  by 
the host system with no interference from other hosts.

It  is  obvious  that ECA also has some weaknesses  as  an  error 
recovery  mechanism.   It requires a great deal of highly  device 
dependent intelligence in the initiator managing the recovery, in 
contrast to the SCSI principle of providing a device  independent 
interface  to  the  managing system.   The  process  is  somewhat 
complex,   providing   a   great  number  of  opportunities   for 
implementation inconsistencies and errors.  However, there may be 
some  systems  which find such a mechanism  desirable.   In  that 
spirit,   the  following  changes  are  proposed  to  the  SCSI-2 
specification.   The changes are documented against the August 1, 
1987 edition of the document X3T9.2/86-109 Revision 2.

The  following  descriptive information is included  in  two  new 
paragraphs to be added to section 6.  The paragraphs describe the 
Contingent   Allegiance  function  and  the  Extended  Contingent 
Allegiance (ECA) function.

     6.?    Contingent Allegiance

     All   targets  shall implement  Contingent   Allegiance   to 
     guarantee  that error information will be made available  to 
     an  initiator  when a CHECK CONDITION status  has  indicated 
     that  an error has occurred.   When a CHECK CONDITION status 
     has been presented by a target to an initiator on behalf  of 
     an  LU,  all  new commands to that LU from other  initiators 
     will  not  be executed.   BUSY status will be  presented  to 
     indicate that such commands must be reattempted at a  future 
     time.   This  temporary  busy  condition is referred  to  as 
     Contingent  Allegiance  by  the LU to  the  initiator  which 
     received the CHECK CONDITION.   Any queued commands from any 
     initiator  not  dispatched on behalf of the failing LU  will 
     not  be  dispatched until the  Contingent  Allegiance  state 
     ends.   A  Contingent Allegiance will not be present if  the 
     error  information was presented by an Autosense  operation.  
     Contingent  Allegiance will be terminated by the  completion 
     of the next successfully executed command from the initiator 
     to the LU.  Contingent Allegiance will also be terminated by 
     the  execution  of  an  ABORT  message  from  the   affected 
     initiator,  by  the execution of a BUS DEVICE RESET  message 
     from  any initiator,  and by the execution of a  hard  RESET 
     condition from any SCSI device.    Normally the next command 
     will  be a REQUEST SENSE command to obtain the stored  error 
     information.   If  the command is any other command  than  a 
     REQUEST  SENSE,  the Contingent Allegiance state shall still 
     be terminated.
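
The Contingent Allegiance rules above can be sketched as a small
state model.  This is only an illustration, not part of the
proposed text.  The class and method names are hypothetical, the
sense data is an opaque placeholder, and every command from the
owning initiator is assumed to complete successfully.

```python
from enum import Enum


class Status(Enum):
    GOOD = "GOOD"
    CHECK_CONDITION = "CHECK CONDITION"
    BUSY = "BUSY"


class LogicalUnit:
    """Toy model of one LU tracking Contingent Allegiance (hypothetical)."""

    def __init__(self):
        self.owner = None   # initiator the allegiance is held for, if any
        self.sense = None   # buffered error information

    def post_check_condition(self, initiator, sense, autosense=False):
        # No allegiance is formed when Autosense already delivered the data.
        if not autosense:
            self.owner = initiator
            self.sense = sense
        return Status.CHECK_CONDITION

    def command(self, initiator, name):
        """Return (status, data) for a new command sent to this LU."""
        if self.owner is not None and initiator != self.owner:
            return Status.BUSY, None          # other initiators must retry
        data = self.sense if name == "REQUEST SENSE" else None
        # Assumption: the command succeeds, so the allegiance ends here,
        # whether or not the command was REQUEST SENSE.
        self.owner = None
        self.sense = None
        return Status.GOOD, data

    def abort(self, initiator):
        # ABORT ends the allegiance only when sent by the affected initiator.
        if initiator == self.owner:
            self.reset()

    def reset(self):
        # BUS DEVICE RESET from any initiator, or a hard RESET condition.
        self.owner = None
        self.sense = None
```

In this sketch a second initiator sees BUSY until the first
initiator issues its next command, after which normal operation
resumes.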
     
     6.??   Extended Contingent Allegiance (ECA)

     Extended  Contingent  Allegiance (ECA) allows  the  optional 
     extension of the Contingent Allegiance state across a number 
     of commands from the initiator receiving the CHECK CONDITION 
     and  to  the  LU  sending  the  CHECK  CONDITION.   The  ECA 
     capability is enabled or disabled by the execution of a MODE 
     SELECT  command  using  page code  6  [or  other  convenient 
     number].   Once enabled, the ECA state is established by the 
     LU  when  it  sends an INITIATE RECOVERY  message  into  the 
     initiator  immediately  after  sending  in  CHECK  CONDITION 
     status  and  before sending in a COMMAND  COMPLETE  message.  
     The  ECA  state remains between the affected  initiator  and 
     LU  until the initiator transmits a RELEASE RECOVERY message 
     out to the target.  The RELEASE RECOVERY message may be sent 
     from  the  initiator  to the target LU  at  any  time  after 
     complete identification and before COMMAND COMPLETE during a 
     command  execution.   The  ECA state terminates  immediately 
     upon  receiving  the message.   The ECA state will  also  be 
     terminated by an ABORT message from the affected  initiator, 
     by  a BUS DEVICE RESET message from any initiator,  or by  a 
     hard RESET condition from any SCSI Device.
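
As an illustration of how the ECA state is established and ended
by the messages described above, a minimal sketch follows.  The
names are hypothetical and the messages are modeled as plain
strings.  The handling of a RELEASE RECOVERY from an initiator
that does not hold the ECA state is an assumption (it is ignored
here).

```python
class EcaState:
    """Toy model of the ECA state for one LU (hypothetical names)."""

    def __init__(self, enabled=False):
        self.enabled = enabled   # set or cleared through MODE SELECT
        self.owner = None        # initiator the ECA state is held for

    def check_condition(self, initiator, wants_eca, rejected=False):
        """Return the messages the LU sends in after CHECK CONDITION status."""
        messages_in = []
        if self.enabled and wants_eca:
            messages_in.append("INITIATE RECOVERY")
            # A MESSAGE REJECT from the initiator means no ECA state for
            # this command; the enabled setting itself is left unchanged.
            if not rejected:
                self.owner = initiator
        messages_in.append("COMMAND COMPLETE")
        return messages_in

    def message_out(self, initiator, message):
        """Apply a message sent out by an initiator to the target."""
        if message in ("RELEASE RECOVERY", "ABORT"):
            if initiator == self.owner:
                self.owner = None
        elif message == "BUS DEVICE RESET":
            self.owner = None    # effective from any initiator

    def hard_reset(self):
        self.owner = None
```

For example, with ECA enabled, check_condition("A", True) yields
INITIATE RECOVERY followed by COMMAND COMPLETE, and the state then
persists until initiator A sends RELEASE RECOVERY or ABORT, or
until a reset occurs.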

     The  ECA state may also be invoked by the Asynchronous Event 
     Notification  protocol.    When  an  asynchronous  event  is 
     discovered,  the  LU associated with the event  transmits  a 
     SEND command to selected initiators.   The SEND command will 
     have  the AEN bit set and will transmit the same data that a 
     REQUEST SENSE command would have transmitted to identify the 
     same failure.  If the detected event requires the ECA  state 
     to  be established to execute a multi-step recovery process, 
     the  LU  shall place ATTENTION on the interface  in  such  a 
     manner  that  a  MESSAGE OUT can be  transmitted  after  the 
     command  transfer  but  before  the  status  transfer.   The 
     MESSAGE  OUT  shall contain the INITIATE  RECOVERY  message.  
     The ECA state will begin at that time and continue until the 
     selected  initiator  terminates  the state  with  a  RELEASE 
     RECOVERY message out.   If the ECA state is required, the LU 
     should be sure that only one initiator is notified at a time 
     so   that   each   initiator  can  complete   the   required 
     recovery steps without encountering an ECA state for another 
     initiator.

     During  the ECA state,  the target shall present BUSY status 
     to  any  initiator other than the  affected  initiator  that 
     attempts  to  access  the affected  LU.   The  target  shall 
     additionally  ensure  that no commands in the command  queue 
     will  be dispatched to begin execution on the  affected  LU. 
     Only  newly  received  untagged commands from  the  affected 
     initiator will be accepted.    Any commands remaining on the 
     queue  after the execution of commands during the ECA  state 
     will again be dispatched in the normal manner after the  ECA 
     state is terminated.
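
The queue handling during the ECA state can be sketched as
follows.  This is an illustration under stated assumptions only:
the names are hypothetical, commands are plain strings, and the
"NOT ACCEPTED" result for a tagged command from the affected
initiator is a placeholder, since the text above does not say how
such a command is refused.

```python
from collections import deque


class QueuedLU:
    """Toy model of command acceptance and dispatch under ECA."""

    def __init__(self):
        self.queue = deque()    # commands accepted but not yet dispatched
        self.eca_owner = None   # initiator holding the ECA state, if any
        self.executed = []      # (initiator, command) in execution order

    def receive(self, initiator, command, tagged=False):
        if self.eca_owner is None:
            self.queue.append((initiator, command))
            return "QUEUED"
        if initiator != self.eca_owner:
            return "BUSY"               # all other initiators must retry
        if tagged:
            return "NOT ACCEPTED"       # placeholder, see lead-in
        # Untagged commands from the affected initiator run at once.
        self.executed.append((initiator, command))
        return "EXECUTED"

    def dispatch(self):
        """Dispatch queued commands; returns how many were started."""
        if self.eca_owner is not None:
            return 0                    # dispatching is halted during ECA
        count = 0
        while self.queue:
            self.executed.append(self.queue.popleft())
            count += 1
        return count
```

Commands that were queued before the error stay in the queue
while the recovery commands run, and are dispatched in their
original order once the ECA state has ended.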

     It  is not required that all CHECK  CONDITION  presentations 
     request  an ECA state.   Some simple errors have no possible 
     recovery  and  are sufficiently served by normal  Contingent 
     Allegiance protocols.

     Implementors Note:   During the ECA state, appropriate error 
     recovery  sequences  can  be executed.   Such  commands  can 
     correct  data,  modify  or delete  future  queued  commands, 
     perform    logging   operations   and   obtain    diagnostic 
     information.   This  option  is recommended only  for  those 
     LU's  and initiators that require the execution of multiple-
     step  error-recovery  procedures that  cannot  tolerate  the 
     interleaving of other operations in the procedure.


The  following  information is installed in table  5-2:   Message 
Codes on page 5-16.

Description:             INITIATE RECOVERY
Initiator Support:       O
Target Support:          O
Direction:               In/Out
Code:                    10h

Description:             RELEASE RECOVERY
Initiator Support:       O
Target Support:          O
Direction:               Out
Code:                    11h
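
For illustration, the two table entries above could be carried in
an implementation as a small lookup keyed by message code.  The
function and table names are hypothetical; the codes and
directions are the ones proposed in the entries above.

```python
# Proposed message codes and the directions in which each may be sent.
ECA_MESSAGES = {
    0x10: ("INITIATE RECOVERY", {"in", "out"}),
    0x11: ("RELEASE RECOVERY", {"out"}),
}


def decode_eca_message(code, direction):
    """Return the message name, or None if the code or direction is invalid."""
    entry = ECA_MESSAGES.get(code)
    if entry is None:
        return None
    name, directions = entry
    return name if direction in directions else None
```

A RELEASE RECOVERY code arriving as a message in would return
None here, since that message is defined only in the outbound
direction.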

The  following text will be installed in section 5.5.2   Messages 
on  page 5-18 in a location defined by the order of the  selected 
message codes.


     INITIATE  RECOVERY  [code to be determined]    This  inbound 
     message is sent  immediately after the presentation of CHECK 
     CONDITION status if the LU requires formation of an Extended 
     Contingent  Allegiance (ECA) as a result of  the  particular 
     error causing the CHECK CONDITION status.   The ECA state is 
     established immediately after successful transmission of the 
     message and remains until terminated as described in section 
     6.??.

     The   outbound  message  is  sent  only  as  part   of   the 
     Asynchronous  Event  Notification protocol if the ECA  state 
     will  be  required for recovery of the  asynchronous  event.  
     The message is sent after the LU has completely selected and 
     identified  the initiator to be notified.   The  message  is 
     sent before the transmission of status.   The message may be 
     sent before or after the COMMAND or DATA OUT phase.  The ECA 
     state    is   established   immediately   after   successful 
     transmission of the message and remains until terminated  by 
     the identified initiator as described in section 6.??.

     A  MESSAGE  REJECT response to an INITIATE RECOVERY  message 
     indicates  that ECA state shall not be established  for  the 
     particular command.  The enabled or disabled state of ECA is 
     not  changed  by  the  rejection  of  an  INITIATE  RECOVERY 
     message.

     RELEASE  RECOVERY  [code to be determined]    This  outbound 
     message  is  sent  to  terminate  an  ECA  state  previously 
     established  by  an  INITIATE  RECOVERY  message  when   the 
     initiator  has determined that all required  error  recovery 
     actions  have  been taken.   The ECA state ends  immediately 
     after  successful  transmission  of  the  RELEASE   RECOVERY 
     message.   The  opportunity  to send  the  RELEASE  RECOVERY 
     message   may  be  requested  at  any  time  during  command 
     execution after the connection has been fully identified and 
     before the command complete is generated.   The transmission 
     of a RELEASE RECOVERY message when no ECA state is active is 
     not an error.  In that case, the message shall be ignored.

     MESSAGE  REJECT  is not an expected response  to  a  RELEASE 
     RECOVERY message.

The  following  text  should  be  added  to  the  ABORT   message 
definition on page 5-17 of section 5.5.2.

     Transmission  of this message shall terminate any ECA  state 
     that  may  exist between the selected LU and  the  selecting 
     initiator.

The  following  text  should  be added to the  BUS  DEVICE  RESET 
message definition on page 5-18 of section 5.5.2.

     Transmission  of this message shall terminate any ECA  state 
     that  may exist between LU's on the selected SCSI Bus Device 
     and any initiator.

The  following  text  should be added to the definition  of  MODE 
SELECT for all device types that support page format MODE SELECT.  
The  text  should be added after the last line of page  8-45  for 
Direct-Access  Devices and immediately before table 9-13 on  page 
9-19 for Sequential-Access Devices.   When other devices  support 
paged mode select, they too should have this page supported.


           Table ?-??  Inter-command Protocol Control
     
======================================================================
  Bit|   7   |   6   |   5   |   4   |   3   |   2   |   1   |   0   |           
Byte |       |       |       |       |       |       |       |       |
======================================================================
 0   |Resrvd |Resrvd |    Page Code    (06h)                         |
_____|_______|_______|_______________________________________________|
 1   |                Parameter Length     (02h)                     |
_____|_______________________________________________________________|
 2   |                       Reserved                                |
_____|_______________________________________________________________|
 3   |                       Reserved        | EAEN  | EASN  | EECA  |
     |                                       |       |       |       |
======================================================================

     
     This  page  allows  the enabling of special  protocols  that 
     change  the ordering or execution rules among a sequence  of 
     commands.

     The EAEN (Enable Asynchronous Event Notification) bit,  when 
     set to zero,  indicates that Asynchronous Event Notification 
     shall  not  be executed.   The EAEN bit,  when set  to  one, 
     indicates  that the Asynchronous Event Notification protocol 
     shall be executed when required.  When the EAEN bit is first 
     set  to  one or when a power-on or reset  state  establishes 
     that   the  EAEN  bit  is  one,   the   Asynchronous   Event 
     Notification initialization protocol shall be executed.  


     The  EASN  (Enable  Auto-sense)  bit,   when  set  to  zero, 
     indicates   that  the  Auto-sense  protocol  shall  not   be 
     executed.  The EASN bit, when set to one, indicates that the 
     Auto-sense  protocol  shall  be executed  to  present  sense 
     information to the initiator.


     The  EECA (Enable Extended Contingent Allegiance) bit,  when 
     set   to   zero,   indicates  that  ECA  shall   not    be 
     established.   The  attempt to generate an INITIATE RECOVERY 
     or  a  RELEASE RECOVERY message will meet a  MESSAGE  REJECT 
     response.  The EECA bit, when set to one, indicates that ECA 
     may be established.  In that case,  the entire  ECA protocol 
     will be allowed by both the initiator and the selected LU.

     The default value of all three of the protocol control bits, 
     EAEN, EASN, and EECA, shall be zero.  The bits may be saved.  
     In  that  case,  power-on and "hard" RESET  conditions  will 
     restore the devices to operate according to the saved values 
     of the bits.  If the changeable value of any of the bits  is 
     one,  the function is supported and may be enabled.   If the 
     changeable value of the control bit is zero, the function is 
     not supported.
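
The page layout in the table above can be illustrated with a
short encode and decode sketch.  The function names are
hypothetical.  The bit positions follow the table: EAEN is bit 2,
EASN is bit 1, and EECA is bit 0 of byte 3, with the page code in
the low six bits of byte 0.

```python
PAGE_CODE = 0x06          # Inter-command Protocol Control page
PARAMETER_LENGTH = 0x02   # number of bytes following byte 1
EAEN = 0x04               # byte 3, bit 2
EASN = 0x02               # byte 3, bit 1
EECA = 0x01               # byte 3, bit 0


def build_page(eaen=False, easn=False, eeca=False):
    """Build the four-byte page; reserved bits and bytes are zero."""
    flags = (EAEN if eaen else 0) | (EASN if easn else 0) | (EECA if eeca else 0)
    return bytes([PAGE_CODE, PARAMETER_LENGTH, 0x00, flags])


def parse_page(page):
    """Decode the three control bits from a four-byte page."""
    if len(page) != 4:
        raise ValueError("page must be exactly four bytes")
    if (page[0] & 0x3F) != PAGE_CODE or page[1] != PARAMETER_LENGTH:
        raise ValueError("not an Inter-command Protocol Control page")
    flags = page[3]
    return {
        "EAEN": bool(flags & EAEN),
        "EASN": bool(flags & EASN),
        "EECA": bool(flags & EECA),
    }
```

The same parse can be applied to the changeable values of the
page: a bit that reads as one there marks a supported function,
and a bit that reads as zero marks an unsupported one.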



The  tables of Page Codes on pages 8-22,  8-48,  and 9-12 must be 
altered to include a reference to the above page, coded 06h.


I  believe  that  the above changes completely  document  an  ECA 
proposal  that would meet the requirements discussed at the  last 
working  group  meeting.   Please  consider  them  carefully  for 
possible inclusion in the SCSI-2 draft document.