3/95 Direct Attach Disk Adhoc Minutes

Kurt Chan kc at core.rose.hp.com
Thu Mar 9 12:13:43 PST 1995

From:  Kurt Chan
To:    FC, SCSI Reflectors
Subj:  March '95 FC-AL Direct Attach Disk Ad Hoc Meeting Minutes
Date:  Thu Mar  9 12:07:52 PST 1995


           FC-AL Direct Attach Disk Ad Hoc Meeting Minutes
                           Newport Beach

  Dal Allan, ENDL                     dal_allan at mcimail.com
  Radek Aster, SGI                    raster at sgi.com
  Stuart Berman, Emulex               sberman at emulex.com
  Paul Boulay, Hitachi                pboulay at hitachi.com
  Kurt Chan, HP                       kc at core.rose.hp.com
  Jim Coomes, Seagate                 jim_coomes at notes.seagate.com
  Rod DeKoning, Symbios Logic         rdekoning at wichita.ks.ncr.com
  Edward Fong, Amdahl                 esfl0 at amail.amdahl.com
  David Ford, Cambex                  dford at cambex.com
  Giles Frazier, IBM                  gfrazier at ausvm6.vnet.ibm.com
  Gene Freeman, HMPD (NCR)            gene.freeman at colospgs.ncr.com
  Ed Gardner, Quantum                 gardner at acm.org
  Stillman Gates, Adaptec             stillman at eng.adaptec.com
  Eric Griffith, Western Digital      griffith at dt.wdc.com
  Doug Hagerman, DEC                  hagerman at start.enet.dec.com
  Norm Harris, Adaptec                nharris at eng.adaptec.com
  Gerald Maurer, QLogic               g_maurer at qlc.com
  James McGrath, Quantum              jmcgrath at qntm.com
  Charles Monia, DEC                  monia at sar.dec.com
  Greg Scherer, Emulex                g-scherer at emulex.com
  Gary Schultz, QLogic                g_schultz at qlc.com
  Brian Smith, Infinity Software      isi_info at io.com
  Bob Snively, Sun Micro              bob.snively at sun.com
  Jeff Stai, Western Digital          stai at dt.wdc.com
  Horst Truestedt, IBM                truested at vnet.ibm.com
  Jim Whitworth, Conner               james.whitworth at conner.com
  Stewart Wyatt, HP                   stewart at hpdmd48.boi.hp.com


   Now that we are synchronizing with ANSI meetings, we have the
   following slots reserved:

               Day/Date       Location       ANSI week
              ----------   --------------    ---------
              Fri Apr 7    Monterey, CA        X3T11
              Mon May 8    Harrisburg, PA      X3T10
              Fri Jun 16   Rochester, MN       X3T11

   The advance notice is intended as a courtesy to the hosts.
   However, since we've moved the electromechanical issues to SFF and
   X3T11, I do NOT plan on using all of these days for Direct Attach
   Disk adhocs.  Indeed, unless any new issues come up between now and
   April, I will propose that we transition into using this meeting
   time primarily for FC-AL2 discussions, starting in Monterey.
   Please plan on at least a half-day Fri 4/7 for combined
   adhoc/FC-AL2 discussions.


   Despite the transition of connector issues to SFF (document 8045),
   a preliminary discussion was deemed necessary before the real
   fireworks started on Weds 3/8 at the SFF meeting.  

   Sun and Seagate proposed that no changes be made to the number of
   connector contacts (40) but that some pin reassignments be made.
   After it became obvious that there was a strong contingent which
   will be moving forward to increase the pin count, three possible
   courses of action were envisioned:

   1. A 50 pin connector would be defined.               
   2. An 80 pin connector would be defined.
   3. A unitized connector consisting of the current 40 pins with 
      auxiliary pins for power and options would be defined.

                  O\_______/O   <----- 40 pins (current)
                 O\_________/O   <---- 50 pins
              O\_______________/O  <-- 80 pins
        \  ____   _________   ____  /
         \ \__/  O\_______/O  \__/ /  40 + aux pins

   While Option 3 was suggested as a means to provide compatibility
   with implementations using 40 pins, it was criticized for problems
   with defining "keep out" areas and for lack of backward
   compatibility with retention schemes.  The idea of having two
   different connectors for FC-AL drives was also repugnant to some.

   Since 3.3V power is one of the suggested reasons for needing more
   pins, it was singled out as the whipping boy by the 40-pin
   sympathizers.  In the absence of knowledge regarding which drives
   would use which supplies, cabinet makers would have to design power
   supplies for "worst-case" current on all voltages.  Obviously, as
   the number of voltages provided increases, this adds cost to the
   supply.  On the other hand, "down converting" from 5V or 12V to
   3.3V on each drive is also expensive.  A claim of 1.5 to 2 watts
   per drive savings was stated if 3.3V logic could be used.


   Horst described a problem that only occurs if an NL_Port
   arbitrates, wins arbitration, and then closes without using the
   loop.  FC-AL(2?) will be changed so that an NL_Port which has won
   arbitration but wishes to relinquish the loop without sending any
   data must OPN itself before closing, allowing the fairness window
   to be reset properly.


   Horst described a means by which a LIP may be transmitted with
   minimum impact on any active Sequences by transmitting ARB(F7) to
   win arbitration before transmitting LIP.  Since this does not
   affect interoperability with existing ports, it will be made an
   optional behavior in FC-AL(2?).


   FC-AL uses a new term ("available BB_Credit") which has confused
   some readers and is absent from the FC-AL glossary.  A definition
   will be added to FC-AL(2?)  for the term "Available_BB_Credit"
   which allows the reader to easily distinguish it from the FC-PH
   term "BB_Credit".  I will work with Horst on wordsmithing the
   definition (Horst - see my proposed definition in Annex B).

   Dal also mentioned that the term "circuit" has new meaning in
   FC-PH2, and that FC-AL(2) should synchronize with this new meaning.

   A typo was noted in Table 6 - full and half duplex opens are now
   both Allowed with the following restrictions governing the choice:

   - The OPN Initiator shall not open full duplex if it cannot
     guarantee (2 x Login_BB_Credit) on its inbound path,

   - The OPN Recipient shall always accept both half and full duplex
     OPNs, but if opened full duplex is not required to transmit data
     back to the OPN Initiator.


   The issue was raised regarding the implementation of a dual-ported,
   class 3 loop which has a single FC-2 engine for both loops.  If the
   NL_Port on Loop 1 has the FC-2 engine busy and the NL_Port on Loop
   2 receives an OPN having advertised BB_Credit > 0, a new
   CLS-DISCARD functionality was suggested as a means for indicating
   that the shared FC-2 engine was busy.

   Dal reiterated that neither the profile nor the FC-AL standard
   attempts to modify the meaning of BB_Credit (or "Login_BB_Credit"
   as referred to by the profile).  In ALL cases, the value of
   BB_Credit agreed upon in Login is the number of receive buffers
   GUARANTEED to be available when credit is balanced.  The only
   difference between the fabric model and the FC-AL model is that
   credit is IMPLICITLY balanced whenever a circuit is established on
   FC-AL, whereas there is no circuit establishment on class 2/3
   fabrics.  The fact that there is an OPN-CLS bounding the transfer
   of data should not alter the fundamental definition of BB_Credit
   (i.e., one should not be able to invalidate BB_Credit rules using a
   special type of CLS).

   In the example above, without CLS-DISCARD, either:

   a) Loop 2 must run with Login_BB_Credit = 0

   b) Loops 1 and 2 must have independent FC-2 engines if the loops
      are truly being used for performance, not just redundancy (i.e.,
      one loop is not being bypassed while the other is active).

   It was maintained that (a) is not as great a penalty as some might
   think, and that dual port configurations with a shared protocol
   engine are intended more for their hot swap or redundancy benefits
   where the link is viewed as a common point of failure, but are not
   expected to enhance performance.

     [Note:  Radek's simulations show that for the I/O per second
      benchmark, configurations with Login_BB_Credit = 0 at both
      Initiator and Target are virtually INDISTINGUISHABLE in
      performance from any other combination of Login_BB_Credit > 0
      for loops with less than 70-80 drives (where the best
      combination peaks at 100 drives).  In the throughput benchmark,
      Login_BB_Credit = 0 is virtually IDENTICAL to the performance of
      Login_BB_Credit > 0 if Available_BB_Credit is 2 or 3.]


   It was agreed that FC-AL 4.4 was overzealous in requiring that
   "Public Loop NL_Ports shall implicitly logout with all Ports..."
   when a LIP with the L_Bit set is received (page 57).  Since traffic
   with private loop devices may not be affected if a Fabric goes
   offline and comes back online with the same address, there is no
   need to implicitly logout (and therefore implicitly abort all open
   Exchanges) with all Ports.  Therefore, the above sentence should be
   modified to exclude references to implicit NL_Port logout:

    "If one of the following occurred (see ANSI X3.230, FC-PH, 23.3.1),
     Public Loop NL_Ports shall attempt a Fabric Login to the well-known
     address hex 'FFFFFE' through AL_PA hex '00':"

   Note:  coexistence of public loop initiators and private loop
   targets in the presence of an FL_Port works as follows:  

   - Private loop targets may not respond to frames with nonzero

   - AL_PAs are administered per the FC-AL initialization protocol,
     regardless of the Area+Domain addresses (i.e., public and private
     loop devices must still share the Port address space per FC-AL

   - In Exchanges with public loop SCSI initiators, the private loop
     target is obligated to use the nonzero Area+Domain of the public
     initiator in the D_ID of frames being returned to that initiator.
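
   The addressing in the last item can be sketched as follows.  This
   is a rough illustration only: it assumes the usual 24-bit address
   layout of Domain, Area, and AL_PA (8 bits each), and the function
   names are hypothetical.

```python
def split_address(addr: int) -> tuple:
    """Split a 24-bit address into (Domain, Area, AL_PA)."""
    return (addr >> 16) & 0xFF, (addr >> 8) & 0xFF, addr & 0xFF


def is_private_address(addr: int) -> bool:
    """True when Domain and Area are both zero (private loop address)."""
    return (addr & 0xFFFF00) == 0


def response_d_id(initiator_s_id: int) -> int:
    """D_ID a private loop target places in frames returned to an initiator.

    The full 24 bits of the initiator's S_ID are echoed, so a nonzero
    Area+Domain from a public loop initiator is preserved.
    """
    return initiator_s_id & 0xFFFFFF
```

   For a public initiator at hex '0102EF', the returned frames carry
   D_ID hex '0102EF' rather than the AL_PA alone.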

8) ACA

   An education courtesy of Charles Monia took place as we learned
   that NACA=1 cannot assist in guaranteeing Request Sense data
   integrity in the presence of a protocol which allows Status to be
   lost without notification to the sender (Target).  The principal
   benefit of ACA is to prevent commands which are queued behind a
   failed command from being executed.  Reliable transmission of sense
   data for the failed command(s) is up to lower layer protocols.

   In order to make the transfer of sense data reliable in FCP, three
   things must occur:

   a) a new FCP IU is needed which acknowledges FCP_RSP frames (at
      least the ones which contain Status other than GOOD).

   b) Sequence retransmission in the Target must be implemented to
      attempt to retry FCP_RSP should it time out on the
      acknowledgement IU from the host.

   c) Existing ABTS policies in both the initiator and target must be
      reexamined to prevent aborting an Exchange for which FCP_RSP
      sequences are being retried.

   Since all of this is far too ugly to consider in light of the
   benefits associated with preserving sense data, the profile will
   remove any attempts to simplify ACA=1, and simply reference SAM.

      [EDITORIAL NOTE:  As an example of Charles' concern about a lack
      of FCP_RSP confirmation, FCP and the associated industry
      profiles have an exposure for Sequential Access applications
      which do not synchronize state (i.e., checkpoint).  For example,
      unlike direct access devices, a Sequential Access WRITE command
      is unaddressed (does not contain a logical block address) - it
      simply writes to the next logical block on the medium.  If READ
      POSITION is not used between WRITE operations and a WRITE
      command fails, one of two things will happen on FCP with the
      current profiles if FCP_RSP fails to get reported to the host:

      1) The device driver will abort the Exchange, retry the WRITE,
         and will corrupt the backup if the WRITE was really
         successful and FCP_RSP was merely lost.

      2) The application must rewind to the last known point of
         synchronization (BOM or BOP!?)  which means "overnight"
         backup jobs may never finish.

      Despite the fact that we all know this is poor programming
      practice, the fact remains that not all "legacy" system SW for
      tapes checkpoint their media access operations, and this SW
      would have to be rewritten for FC.  FCP implementors should
      expect some backlash from tape and system vendors on this issue
      in the future.]
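
   Case 1 in the editorial note can be illustrated with a toy
   simulation.  Everything here is hypothetical (the Tape class, the
   retry policy, the record names); it only shows why retrying an
   unaddressed WRITE after a lost FCP_RSP duplicates a block.

```python
class Tape:
    """Toy sequential-access medium: each WRITE lands on the next block."""

    def __init__(self):
        self.blocks = []

    def write(self, record):
        # Unaddressed write: no logical block address, just append.
        self.blocks.append(record)
        return "GOOD"


def run_backup(tape, records, lost_responses):
    """Write records in order; retry any WRITE whose response is lost.

    lost_responses is the set of record indexes for which the status
    never reaches the host even though the WRITE itself succeeded.
    """
    for index, record in enumerate(records):
        status = tape.write(record)
        if index in lost_responses:
            status = None  # FCP_RSP lost in transit
        if status is None:
            # Driver aborts the Exchange and blindly retries the WRITE.
            tape.write(record)
    return tape.blocks
```

   Losing the response for the second of three records leaves that
   record on the medium twice, which is the corruption described in
   case 1.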


    It was agreed upon that the inability to receive frames of less
    than "maximum" size until the last frame of a Sequence was not a
    desired feature and does not warrant standardization.  Note 5 on
    page 9 and the associated table entry will be deleted.  A note
    regarding early Seagate implementations will be mentioned.  An
    informal poll of silicon manufacturers did not indicate there
    would be interoperability problems.


    It was agreed upon that T1/I4 for FCP Task Management should
    supersede T5.  The first function that actually requires this is
    Abort Task Other Initiator.  The PRLI option to do either T1/I4 or
    T5 will therefore be removed from HP's public review comment.


    A more general purpose method of communicating VU information and
    supported standards was proposed by HP.  This information would
    convey which versions of standards and profiles were supported
    (e.g., FC-AL, FCP, FCSI Profiles, etc).  The consensus was that
    something like the following would be proposed to be appended to
    the PLOGI/PDISC/ACC payloads:

                              Item                          Bytes
         | Vendor Identification (per X3T10/995D Annex C) |   8  |
         |                  Reserved                      |   1  |
         |    VU/Supported Standards Info Length (Bytes)  |   1  |
         |      VU/Supported Standards Information        |   N  |

    The Vendor Identification would be the ASCII string per the
    INQUIRY command defined in X3T10/995D (SCSI-3 Primary Commands).
    Who the long-term registration authority should be for this field
    is an open issue for X3T10/11.  'N' is the number of bytes
    represented by the VU/Supported Standards Info Length field (up
    to 255 bytes).

    The format of the VU/Supported Standards Information would be as
    follows:

               VU/SUPPORTED STANDARDS INFORMATION           Bytes
         |       Supported Profile/Standard Triplets      |  3*M  |
         |                 Vendor-Specific                | N-3*M |

    where 'M' is the number of supported profiles/standards
    advertised.  Each triplet consists of:

                 SUPPORTED PROFILE/STANDARD TRIPLET         Bytes
         |       Profile/Standard Identification          |   1  |
         |    Lowest Profile/Standard Revision Supported  |   1  |
         |   Highest Profile/Standard Revision Supported  |   1  |
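
    The layout above could be packed and parsed roughly as follows.
    This is a sketch under assumptions, not part of the proposal: the
    function names and identification codes are hypothetical, the
    vendor string is assumed to be space-padded to 8 bytes, and since
    the tables carry no explicit count of triplets, the parser takes
    'M' as an argument.

```python
def pack_vu_info(vendor_id, triplets, vendor_specific=b""):
    """Build the proposed extension: 8 + 1 + 1 + N bytes."""
    vendor = vendor_id.encode("ascii")
    if len(vendor) > 8:
        raise ValueError("vendor identification is limited to 8 bytes")
    vendor = vendor.ljust(8, b" ")  # assumed space padding
    for triplet in triplets:
        if len(triplet) != 3:
            raise ValueError("each triplet is (id, lowest rev, highest rev)")
    info = b"".join(bytes(t) for t in triplets) + vendor_specific
    if len(info) > 255:
        raise ValueError("N must fit in the one-byte length field")
    reserved = 0
    return vendor + bytes([reserved, len(info)]) + info


def parse_vu_info(payload, m):
    """Split the extension into (vendor, triplets, vendor-specific bytes)."""
    vendor = payload[:8].decode("ascii").rstrip()
    n = payload[9]
    info = payload[10:10 + n]
    if len(info) != n or 3 * m > n:
        raise ValueError("length field does not match payload")
    triplets = [tuple(info[i:i + 3]) for i in range(0, 3 * m, 3)]
    return vendor, triplets, info[3 * m:]
```

    With no triplets and no vendor-specific bytes, the packed result
    is 10 bytes long; with N = 255 it is 265 bytes.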

    HP expressed some concern that, while at the minimum this proposal
    would only add 10 bytes to login payloads, at the maximum it
    could cause PLOGI/ACC to exceed the ability of some fabrics to
    transmit the ELS in a single frame (particularly those with
    connectionless frame size limits of 128 bytes).  The response was
    less than empathetic, but this issue may affect FC-PH acceptance.
    While class 1 login should not be affected, the effect of
    multi-frame connectionless PLOGI Sequences is a study item.
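
    The single-frame concern is simple arithmetic, sketched below.
    The 116-byte base PLOGI payload used in the example is an
    assumption for illustration, as is treating the 128-byte limit as
    a limit on payload bytes per frame; the function name is
    hypothetical.

```python
def frames_needed(payload_len: int, max_frame_payload: int) -> int:
    """Number of frames required to carry payload_len bytes (ceiling division)."""
    if payload_len <= 0 or max_frame_payload <= 0:
        raise ValueError("lengths must be positive")
    return (payload_len + max_frame_payload - 1) // max_frame_payload


ASSUMED_BASE_PLOGI = 116   # assumed size of the existing payload, in bytes
MIN_EXTENSION = 8 + 1 + 1  # vendor id + reserved + length, N = 0
MAX_EXTENSION = MIN_EXTENSION + 255

min_frames = frames_needed(ASSUMED_BASE_PLOGI + MIN_EXTENSION, 128)
max_frames = frames_needed(ASSUMED_BASE_PLOGI + MAX_EXTENSION, 128)
```

    Under these assumptions the minimum extension still fits in one
    128-byte frame (126 bytes), while the maximum (381 bytes) would
    need three.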

    A registration authority for a) numbering profiles and standards,
    and b) encoding version numbers of the standards/profiles is
    needed.  (Could this be documented simply in an X3T11 contributed
    document which is updated as needed?)

    The VU/Supported Standards information should be FC-specific, and
    should not overlap with information which can be discovered at the
    ULP level (e.g., SCSI INQUIRY data).

    Truly VU information (e.g., areas of non-compliance with existing
    profiles/standards, product identification, product revision
    level, etc.)  would be in the Vendor-Specific field, equal in
    length to N-(3*M) bytes.  The format of this information is beyond
    the scope of FC-PH, and must be found in the product specification
    provided by the manufacturer.


   There is a reflector administered by Western Digital which is
   intended to be used for discussions specific to the FC-AL adhoc
   group.  The subscribe address is majordomo at dt.wdc.com and the
   broadcast address is disk_attach at dt.wdc.com.

   To subscribe, send a message to majordomo at dt.wdc.com with a blank
   subject line and a line in the message body of the following form:

         subscribe disk_attach at dt.wdc.com your_email_address

   To UNsubscribe, send a message to majordomo at dt.wdc.com with a blank
   subject line and a line in the message body of the following form:

         unsubscribe disk_attach at dt.wdc.com your_email_address

   To broadcast a message to the reflector, email to:

         disk_attach at dt.wdc.com
