Feb '95 FC-AL Direct Attach Disk Adhoc minutes

Kurt Chan core.rose.hp.com!kc
Mon Feb 6 17:16:52 PST 1995

From: Kurt Chan
To:   FC, SCSI Reflectors
Subj: FC-AL Ad Hoc meeting
Date: 2/6/95

           FC-AL Direct Attach Disk Ad Hoc Meeting Minutes
                           2/2/95 - 2/3/95

  Dal Allan, ENDL                     dal_allan at mcimail.com
  Radek Aster, SGI                    raster at sgi.com
  Jon Buck, Amp                       jon.buck at amp.com
  Vince Cavanna, HP                   vvc at epcot.rose.hp.com
  Kurt Chan, HP                       kc at core.rose.hp.com
  Howey Chin, Vitesse                 howey at vitsemi.com
  Jim Coomes, Seagate                 jim_coomes at notes.seagate.com
  Jan Dedek, Ancot                    dedek at ancot.com
  David Ford, Cambex                  dford at cambex.com
  Giles Frazier, IBM                  gfrazier at ausvm6.vnet.ibm.com
  Gene Freeman, HMPD (NCR)            gene.freeman at colospgs.ncr.com
  Ed Frymoyer, EMF                    70523-3010 at compuserve.com 
  Stillman Gates, Adaptec             stillman at eng.adaptec.com
  Bill Gintz, Conner                  bill.gintz at conner.com
  Doug Hagerman, DEC                  hagerman at starch.enet.dec.com
  Norm Harris, Adaptec                nharris at eng.adaptec.com
  Jean Kodama, QLogic                 j-kodama at qlc.com
  James McGrath, Quantum              jmcgrath at qntm.com
  Bill McMaken, Trimm Technology      trimmeng at netcom.com
  Margaret Nakamoto, HP               margaret_nakamoto at sj.hp.com
  Allison Parsons, Conner             allison.parsons at conner.com
  Subhash R. Patel, Emulex            s-patel at emulex.com
  Ron Petersen, W.L. Gore             ron4849 at delphi.com
  Richard Rolls, IBM, San Jose        rrolls at vnet.ibm.com
  Greg Scherer, Emulex                g-scherer at emulex.com
  Bob Snively, Sun Micro              bob.snively at sun.com
  Craig Theorin, W.L. Gore            craig at wlgore.com
  Peter Walford, Demografx            walford at btr.com
  Gary Watson, Trimm Technology       trimm at netcom.com
  Jim Whitworth, Conner               james.whitworth at conner.com
  Neill Wood, SGI                     neill at asd.sgi.com
  Stewart Wyatt, HP                   stewart at hpdmd48.boi.hp.com


   There is a reflector administered by Western Digital which is
   intended to be used for discussions specific to the FC-AL adhoc
   group.  The subscribe address is majordomo at dt.wdc.com and the
   broadcast address is disk_attach at dt.wdc.com.

   To subscribe, send a message to majordomo at dt.wdc.com with a blank
   subject line and a line in the message body of the following form:

         subscribe disk_attach at dt.wdc.com your_email_address

   To UNsubscribe, send a message to majordomo at dt.wdc.com with a blank
   subject line and a line in the message body of the following form:

         unsubscribe disk_attach at dt.wdc.com your_email_address


   External Cables:       X3T11, Tues 4/4 @ Monterey
   Internal connectors:   SFF,   Wed 3/8 @ Newport Beach
   FC-AL adhoc meeting:   TBD by X3T10 (Newport likely, week of 3/6)


   Jim McGrath reviewed his requirements for the internal connectors of
   next generation FC-AL drives.

   - SCA-2 rather than SCA-1, due to the advanced features of SCA-2
     (designed for hot plugging, ESD characteristics, blind insertion
     with short and long row contacts)

   - Pinouts should be a superset of SSA, to ensure equivalent
     baseline functionality.  Currently missing are:

      o  3.3V power.  There was some discussion over whether 5V-to-3.3V
         conversion is more practical on the drive or in the cabinet.
         The per-drive cost is estimated to be about $2, while the
         per-cabinet cost was stated by Trimm to be around $5.  SSA is
         missing 3.3V precharge.  All Quantum drives use 3.3V.  Trimm
         noted that the more voltages provided, the more difficult it
         is to size cabinet power supplies.

      o  External Fault, to warn a drive of events such as external
         environmental failures (so that a write safe condition could
         be established).  

      o  Reprogramming pins, to allow a set of other codes to be
         defined, including a set of sense values for the environment.

   - 50 pins, to provide more functions and the 3.3 volt capability.

   - Consider adding 24 volts to reduce the net power in high
     performance (e.g., 10,000 rpm) drives.  There are safety
     implications for backplane voltages over 42 V.

   - Consider adding an unshielded cabled connector option.  Bob Snively
     emphasized that, for Sun, cabled solutions are almost always more
     expensive and less reliable than backplane designs.  Jim is
     targeting the embedded drive marketplace, which is accustomed to
     cabled solutions.  Is a multi-bay connector required?  One option
     is a separate, optional cable connector for power and controls,
     with a standard SCA connector centered on the drive.  Power
     distribution is a concern, since some early IDC designs did not
     provide a sufficiently robust power connection for drives.  EMI
     from unshielded cables and connectors at gigabit rates can be
     troublesome, even within a shielded cabinet.  The same is true of
     stubs created by cabled topologies.

   - Jim does not believe that 2.5" drives will be able to use the
     same connectors as 3.5" drives, and believes that attempting to
     define an identical connector would compromise 3.5" functionality.

   - Consider a 3-bay design (one bay for basic functions, one for
     power, and one for options) instead of a unitized connector, if
     this is less costly.
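   As a rough illustration of the 3.3V placement trade-off, the sketch
   below compares parts cost only, using the estimates quoted above
   (about $2 per drive, about $5 per cabinet).  It assumes,
   hypothetically, that the cabinet cost is flat no matter how many
   drives it feeds, and it ignores Trimm's point about sizing cabinet
   power supplies.  The function names are illustrative.

```python
# Parts-cost sketch for 5V-to-3.3V conversion placement.
# Cost figures are the rough meeting estimates; the flat per-cabinet
# cost is an assumption.
PER_DRIVE_COST = 2.0    # USD, on-drive regulator (estimate)
PER_CABINET_COST = 5.0  # USD, cabinet-level supply (Trimm estimate)


def on_drive_cost(drives: int) -> float:
    """Total conversion cost if every drive regulates its own 3.3V."""
    return drives * PER_DRIVE_COST


def cabinet_cost(drives: int) -> float:
    """Total conversion cost if the cabinet supplies 3.3V (assumed flat)."""
    return PER_CABINET_COST if drives > 0 else 0.0


def cheaper_option(drives: int) -> str:
    """Return which placement has the lower parts cost for one cabinet."""
    return "cabinet" if cabinet_cost(drives) < on_drive_cost(drives) else "drive"


for n in (1, 2, 3, 8):
    print(n, on_drive_cost(n), cabinet_cost(n), cheaper_option(n))
```

   Under these assumptions, on-drive conversion is cheaper only for
   cabinets holding one or two drives.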


   Vince Cavanna (HP) presented EMI measurements of several cable and
   connector combinations.  Belden 9209 (double-shielded) cable with
   BNC connectors performed best, with essentially zero emissions over
   ambient.  Belden 9116 was not as good, and twinax with a Hirose
   connector was the worst, even after the connector was sent back to
   the manufacturer to have supplemental copper-foil shielding added.

   The results of the testing will also be reported in the Sarasota
   X3T11 FC-0 copper meeting.

   SGI asked about CATV "F" connectors.  Vince responded that because
   "F" connectors are not specified for this use, he was reluctant to
   test them: even if they measured well, they might not meet HP
   quality specifications.  "F" connectors are also very close to BNC
   connectors in cost, and not as robust mechanically.  The digital
   television industry is using BNC and coax.  SMA and SMB connectors
   also appear to work, with adapters to BNC widely available.

   The DB-9 connector has not yet been tested, but will be once
   available; it has potential for good EMI performance.  Gore also
   noted that twinax designs should in general perform as well as or
   better than coax for a given shielding, due to their balanced
   design.  Moral:  the EMI performance of a cable plant is only as
   good as its weakest link, which is often the connector.

   Equalizers may not be necessary for some environments.  A fixed
   equalizer (say, optimized for 20 meters) may be adequate if cable
   lengths can be restricted to within the range of 1-30 meters.  With
   DB-9 connectors, equalization can be built into the cable
   assemblies, making cabling and termination much simpler for FC-AL
   than for SCSI.  If cable lengths can be limited to 10 feet or less,
   no equalization is needed.
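   The length rules of thumb above can be captured in a small
   hypothetical helper.  The thresholds come from the discussion (10
   feet or less: no equalization; 1-30 meters: a fixed equalizer
   optimized for about 20 meters); the function name and the handling
   of lengths beyond 30 meters are illustrative assumptions.

```python
FEET_TO_METERS = 0.3048
NO_EQ_LIMIT_M = 10 * FEET_TO_METERS  # "10 feet or less": no equalization needed
FIXED_EQ_MAX_M = 30.0                # fixed equalizer adequate up to ~30 m


def equalization_for(length_m: float) -> str:
    """Suggest an equalization approach for a cable of the given length."""
    if length_m <= 0:
        raise ValueError("cable length must be positive")
    if length_m <= NO_EQ_LIMIT_M:
        return "none"
    if length_m <= FIXED_EQ_MAX_M:
        # Lengths here exceed 3 m, so they fall inside the quoted 1-30 m range.
        return "fixed (optimized for ~20 m)"
    return "not covered by these rules"
```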

   Gore showed some very small internal twinax jumper cables.  The
   shielded version demonstrated better performance, but the
   unshielded version was 40% less expensive.  Seagate is evaluating
   them.  Lengths of up to 50 meters have been achieved unequalized.
   Full-duplex cables exhibit about 1% crosstalk.  More findings will
   be presented in Sarasota.

7) ACA 

   It was agreed that the following text would be added to the profile
   in section 11.5:

    "A target shall never change its current configuration or logical
     block size without receiving an explicit command or task
     management function.  An initiator shall choose its current
     operating parameters in such a manner that those parameters
     associated with correctly accessing and storing data, including
     logical block length and logical unit configuration, shall be the
     same as those stored or default parameters that a target will
     have after a target reset or power on reset.  Devices that do not
     return to a saved configuration after any of the above resets
     shall respond with CHECK CONDITION status to any attempt to
     create a saved configuration."

   The SCC command set document needs a proposal to place a similar
   restriction on reconfiguration of things like RAID configurations.


   Neill Wood (SGI) introduced the problem of loop disruption when
   adding or removing a device, and Radek Aster elaborated on the same
   subject on Friday.  Loop initialization may require several
   MILLIseconds per spindle, which means several HUNDRED milliseconds
   would be required to initialize a maximally configured loop.  In
   video or other "guaranteed write" data streams, very large buffers
   may be required to cover this period; normal systems can buffer at
   most about 40-80 msec.
   SGI recommends that an X3T10 mode page proposal be presented which
   allows the Private Loop Profile to:

    a) Require a mode where Private Loop Targets must receive a Loop
       Port Enable (LPE) primitive before coming on line (i.e., they do
       not enable themselves onto the loop).

    b) Require a mode where Private Loop Targets only respond to LIPs,
       but do not generate them.  The assumption is that these Targets
       will have fixed addressing specified by cabinet position.

    c) Allow private loop devices to log in before doing a LIP.

    These would allow an Initiator to choose when to "reconfigure" or
    recognize reconfiguration of a private loop.
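    To put SGI's numbers in perspective, the following sketch does the
    stall-versus-buffer arithmetic.  The 3 ms per-port cost, the
    126-port private loop, and the 20 MB/s stream are hypothetical
    inputs chosen to match "several milliseconds per spindle" and
    "several hundred milliseconds"; they are not measured values.

```python
# Rough stall-vs-buffer arithmetic for loop initialization.
# Hypothetical parameters: ms_per_port stands in for "several ms per
# spindle"; 126 is assumed as the largest private-loop population.
MAX_PRIVATE_LOOP_PORTS = 126


def init_time_ms(ports: int, ms_per_port: float) -> float:
    """Estimated loop-initialization time if each port adds a fixed cost."""
    return ports * ms_per_port


def buffer_needed_bytes(stall_ms: float, stream_mb_per_s: float) -> float:
    """Bytes a constant-rate stream accumulates during a stall (1 MB = 10**6 bytes)."""
    return stall_ms / 1000.0 * stream_mb_per_s * 1e6


def stall_fits(stall_ms: float, buffer_ms: float) -> bool:
    """True if the host's buffering (in ms of stream) covers the stall."""
    return stall_ms <= buffer_ms


stall = init_time_ms(MAX_PRIVATE_LOOP_PORTS, 3.0)
print(stall, stall_fits(stall, 80.0), buffer_needed_bytes(stall, 20.0))
```

    With these inputs the stall is 378 ms, well beyond the 40-80 ms
    that normal systems can buffer, and a 20 MB/s stream would need
    about 7.6 MB of buffering to ride through it.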

    One method for avoiding duplicate addresses is for a device to
    open itself (send an OPN addressed to its own AL_PA).  If the OPN
    does not come back around the loop, the device knows another port
    has taken its preferred address, and it needs to take a soft
    address.
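    The self-addressing check can be modeled in a few lines.  This is
    a simplified sketch that assumes a port whose AL_PA matches an OPN
    claims it instead of retransmitting it; it ignores arbitration,
    timeouts, and the rest of loop initialization, and the function
    names are hypothetical.

```python
from typing import List


def opn_returns_to_sender(loop: List[int], sender: int) -> bool:
    """Simulate an OPN that the port at index `sender` addresses to its own AL_PA.

    `loop` lists each port's AL_PA in ring order.  Every downstream port
    either claims the OPN (its AL_PA matches) or retransmits it.  Returns
    True if the OPN travels all the way around the loop to the sender.
    """
    target = loop[sender]
    n = len(loop)
    for hop in range(1, n):
        if loop[(sender + hop) % n] == target:
            return False  # another port claimed the OPN
    return True


def needs_soft_address(loop: List[int], sender: int) -> bool:
    """A port whose self-OPN never returns should fall back to a soft address."""
    return not opn_returns_to_sender(loop, sender)
```

    In the example loop [0x01, 0x02, 0x04, 0x02], the OPN from the port
    at index 0 returns to its sender, while each port using 0x02 sees
    its OPN claimed by the other.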

    Also, we should specify a SHORTER AL_Time for the Private Loop
    profile, or propose a dynamic discovery algorithm which allows
    a device to determine AL_Time by addressing itself and timing
    the round trip.
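    A dynamic AL_Time discovery of this kind might look like the sketch
    below.  The callback that sends an OPN to the port's own address
    and returns when it comes back is hypothetical, and so is the
    safety factor of 2 applied to the worst observed round trip; the
    meeting did not specify either.  Times are in the units of the
    supplied clock (seconds for time.perf_counter).

```python
import time
from typing import Callable, List, Sequence


def measure_round_trips(send_self_opn: Callable[[], None], samples: int,
                        clock: Callable[[], float] = time.perf_counter) -> List[float]:
    """Time `samples` self-addressed OPNs.

    send_self_opn is a hypothetical hook that returns once the OPN has
    come back around the loop.
    """
    rtts = []
    for _ in range(samples):
        start = clock()
        send_self_opn()
        rtts.append(clock() - start)
    return rtts


def al_time_from(rtts: Sequence[float], safety_factor: float = 2.0) -> float:
    """Derive an AL_Time from the worst observed round trip (factor is assumed)."""
    if not rtts:
        raise ValueError("need at least one round-trip measurement")
    return max(rtts) * safety_factor
```

    Injecting the clock keeps the measurement logic testable without
    real loop hardware.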


   On a related note regarding initialization performance, SGI would
   like drives to spin up in groups on a modulo-N basis rather than
   individually (where N = 8, 9, 10, etc.).  This would allow a system
   to spin up in 1/Nth the time.  This will be an SFF-8045 proposal.
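   The spin-up arithmetic can be sketched as follows, assuming drives
   start in groups of N and each group waits for the previous one to
   finish.  The group-assignment rule and the 15-second spin-up time in
   the usage note are hypothetical.

```python
import math


def sequential_spinup_s(drives: int, spinup_s: float) -> float:
    """All drives spin up one after another."""
    return drives * spinup_s


def grouped_spinup_s(drives: int, n: int, spinup_s: float) -> float:
    """Drives spin up N at a time; a partial final group still costs a full slot."""
    if n < 1:
        raise ValueError("group size must be at least 1")
    return math.ceil(drives / n) * spinup_s


def spinup_group(slot: int, n: int) -> int:
    """Hypothetical assignment: slots 0..N-1 start first, then N..2N-1, and so on."""
    return slot // n
```

   For 64 drives at 15 s each, sequential spin-up takes 960 s and
   groups of 8 take 120 s, exactly 1/8 of the time.  With 65 drives the
   partial ninth group pushes the grouped time to 135 s, so the 1/N
   figure holds exactly only when the drive count is a multiple of N.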


    LALP and LARP will be made optional, since support is discovered
    dynamically during initialization.


    My document will be submitted to X3T11 in Sarasota to fix:

    a) Class 3 streamed and open sequence definitions
    b) FLOGI requirement for ports which do not support F_Ports


Task Management functions will now be transmitted in T1, and responded
to using I4.  Response codes are 00, 04, and 05 corresponding to

- Function Complete, 
- Not Performed (not supported), and 
- Not Performed (target or service delivery failure) 

respectively.  Absence of an FCP_RSP will indicate a target or service
delivery failure that is unreportable (corrupted frame, internal HW
error, etc.).  HP will modify its public review comment accordingly.
No PRLI bits are necessary; this will be mandatory, not optional,
behavior.
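An initiator's interpretation of the agreed task management responses
could be sketched as follows.  It assumes the caller has already
extracted the response code from the FCP_RSP (the field layout is not
shown) and passes None when no FCP_RSP arrived before some
initiator-chosen timeout; the enum and function names are hypothetical.

```python
from enum import Enum
from typing import Optional


class TmfOutcome(Enum):
    FUNCTION_COMPLETE = "function complete"
    NOT_SUPPORTED = "not performed (not supported)"
    DELIVERY_FAILURE = "not performed (target or service delivery failure)"
    UNREPORTABLE_FAILURE = "unreportable target or service delivery failure"


# Response codes agreed at the meeting.
_RESPONSE_CODES = {
    0x00: TmfOutcome.FUNCTION_COMPLETE,
    0x04: TmfOutcome.NOT_SUPPORTED,
    0x05: TmfOutcome.DELIVERY_FAILURE,
}


def interpret_tmf_response(rsp_code: Optional[int]) -> TmfOutcome:
    """Map a task management response code to its meaning.

    Pass None when no FCP_RSP arrived (the initiator's timeout expired),
    which indicates an unreportable failure.
    """
    if rsp_code is None:
        return TmfOutcome.UNREPORTABLE_FAILURE
    try:
        return _RESPONSE_CODES[rsp_code]
    except KeyError:
        raise ValueError(f"unexpected response code {rsp_code:#04x}") from None
```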


     Radek shared some simulations which showed the effect of
     Login_BB_Credit and Available_BB_Credit on throughput and IOs per
     second.  Simulation parameters:

     - Barracuda-class drives, 72 IOs/sec for each disk
     - 4096 cylinders, 16 heads, 128 sectors/track
     - Max seek time = 15ms, min seek time = 500us
     - Head switch time = 1ms
     - Head-sector spiral skew = 16, cylinder-sector spiral skew = 8
     - Queue depth = 1
     - Burst len = 128 blocks, read/write FIFO = 256 blocks
     - 2048-byte frames
     - Offbus command overhead = 5us (host), 30us (disk decode)
     - Offbus response overhead = 5us (host), 30us (disk)
     - Host onbus context switch overhead = 1us
     - Onbus pre-CLS overhead = 1us (host), 1us (disk)
     - Host onbus FC-FIFO move overhead = 0us
     - Disk offbus re-instruct overhead = 10us

     I/Os per second observations (2kbyte random reads and writes):

     - I/Os per second remain independent of Login_BB_Credit for
       configurations under 80 disks (with Login_BB_Credit = 0 at both
       disk and host, the rate peaks at 5300 IOs/sec with 80 disks).

     - Nonzero Login_BB_Credit lets the system make effective use of
       about 20 more disks (peak at 100 disks, with 7000 IOs/sec).
       This is with Login_BB_Credit = 1 or 2 at both host and disk.

     - With faster disks, the curves will peak out with fewer drives.

     Throughput observations (512kbyte random reads and writes in 64k

     - Login_BB_Credit is not a major factor, since it is amortized
       over 32 frames.  However, AVAILABLE credit does have an effect.

     - Little difference between Available credit = 2 or 3, regardless
       of whether Login_BB_Credit is 0, 1, 2, or 3.  Curves flatten
       out at about 95MB/s with 20 drives. Adding more drives does not
       improve throughput.

     - If Available Credit is 1, curves peak at 15 drives, 78MB/s and
       DECREASE as drives are added.

     Summary:  Nonzero Login_BB_Credit is needed to improve latency
     and IOs/sec.  Having "fast drain" architectures and large
     link-rate buffers improves throughput.


    Annex B was discussed, and nobody had a problem with changing the
    profile to require that all ports be able to be opened half
    duplex (in order to accommodate the Annex B proposal for nonzero
    Login_BB_Credit).

    However, Dal pointed out that there are other methods of
    supporting nonzero Login_BB_Credit without using half-duplex.  For
    example, if a port can guarantee that its AVAILABLE BB_Credit is
    always equal to TWICE its Login_BB_Credit, it can accommodate the
    situation where data frames arrive along with a forwarded CLS,
    followed by an immediate OPN from another port.
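    Dal's buffering argument can be expressed as a worst-case count.
    This is a simplified model under a stated assumption: when the CLS
    is forwarded, up to Login_BB_Credit frames from the closing tenancy
    may still occupy buffers, and the next opener may immediately send
    another Login_BB_Credit frames without waiting for an R_RDY.  The
    function names are hypothetical.

```python
def worst_case_occupancy(login_bb_credit: int) -> int:
    """Buffers that may be needed at once under the assumed scenario."""
    # Assumption: up to Login_BB_Credit frames from the tenancy being
    # closed are still buffered when the forwarded CLS arrives...
    leftover_frames = login_bb_credit
    # ...and the next opener may immediately send Login_BB_Credit frames
    # without waiting for an R_RDY.
    new_frames = login_bb_credit
    return leftover_frames + new_frames


def can_skip_half_duplex(available_bb_credit: int, login_bb_credit: int) -> bool:
    """True if guaranteed buffering covers the worst case without half-duplex opens."""
    return available_bb_credit >= worst_case_occupancy(login_bb_credit)
```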

    An error was also pointed out in the last paragraph (bullet 4) of
    Annex B.  It is not necessary to discard R_RDYs.  If nonzero
    Login_BB_Credit is N, a port must actually have N+1 available
    buffers, since FC-AL states that ONE OR MORE R_RDYs must be
    returned on receipt of an OPN.  A port is only required to return
    a minimum of one R_RDY, which is ADDED to the Login_BB_Credit.
    Therefore the minimum buffering requirements of such a port would
    be Available_BB_Credit = Login_BB_Credit + 1.  On Login, such
    ports must advertise Login_BB_Credit equal to 1 less than the
    Available_BB_Credit that they can guarantee on OPN.
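    The N+1 rule above reduces to two one-line relationships, sketched
    here with hypothetical function names: the minimum
    Available_BB_Credit for a given Login_BB_Credit, and the
    Login_BB_Credit a port should advertise for the Available_BB_Credit
    it can guarantee on OPN.

```python
def min_available_bb_credit(login_bb_credit: int) -> int:
    """Minimum buffers a port needs for a given Login_BB_Credit.

    On OPN the port must return at least one R_RDY, and that R_RDY is
    added to the Login_BB_Credit the opener may already use.
    """
    if login_bb_credit < 0:
        raise ValueError("Login_BB_Credit cannot be negative")
    return login_bb_credit + 1


def advertised_login_bb_credit(guaranteed_available: int) -> int:
    """Login_BB_Credit to advertise: one less than the buffers guaranteed on OPN."""
    if guaranteed_available < 1:
        raise ValueError("a port must guarantee at least one buffer on OPN")
    return guaranteed_available - 1
```

    The two functions are inverses: a port that advertises
    Available_BB_Credit - 1 always has exactly the minimum buffering
    the rule requires.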

    Annex B will be rewritten to accommodate these changes and offer
    both methods of supporting nonzero Login_BB_Credit (conservative
    reporting of Login_BB_Credit or half-duplex opens).


    Attached is my note to the reflector re-evaluating the need to
    accommodate the following requirement.  Seagate requires that all
    intermediate frames of an inbound multi-frame Sequence be equal in
    length to the Receive

    > From:   Kurt Chan
    > To:     Direct attach disk reflector (disk_attach at dt.wdc.com)
    > Subj:   TX Buffer Size Negotiation 
    > Date:   Mon Feb  6 10:51:51 PST 1995
    > This note is related to Seagate's requirement that all inbound
    > Sequences to the disk contain "maximum-sized" frames except for
    > the last frame of the Sequence.
    > After thinking about this proposal to X3T11, I've come to the
    > conclusion that negotiating a minimum TX buffer size would be
    > more complicated than previously thought.  Also, this issue
    > appears to be quite topology-specific (in-order topologies only)
    > and will not have a very universal appeal to the X3T11 group.
    > Therefore, I'm suggesting we talk more about whether we want this
    > to be a vendor unique behavior, discoverable through probing VU
    > information in a standard manner, or whether this feature should
    > be part of FC-PH login.
    > Here are the difficulties:
    > - I assume we want ports to be able to be asymmetric.  For
    >   example, a port may require all but the last RECEIVED frame of
    >   an MFS (multi-frame Sequence) to be of a minimum size, but it
    >   might be capable of TRANSMITTING "runt" frames in the middle of
    >   an MFS outbound.  Conversely, a port may NOT require inbound
    >   frames to be of a minimum size, but may be capable of
    >   accommodating those that do.
    > - Having asymmetry creates the need for two bits.  Let's assume
    >   we would use the upper two reserved bits in the Max Rx
    >   Data_Field Size parameter:
    >   1) one bit that is a request to the other port to negotiate a
    >      minimum transmit buffer size,
    >   2) another (validity) bit that states whether or not bits 11-0
    >      represent just the maximum RX Data_Field size (per normal
    >      FC-PH), or whether that field represents the SMALLER of
    >      either the min TX size or the max RX size.
    > - Then the negotiation rules and relogin procedures must be
    >   defined if:  a) the request bit is set in PLOGI b) the validity
    >   bit is NOT set in PLOGI but the ACC contains the request bit c)
    >   the validity bit is set in PLOGI but bits 11-0 are too large
    >   for Login Responder to accept
    >    etc...
    > Over the next couple of weeks I'll think about proposing an X3T11
    > standard method for communicating VU information, and we can
    > discuss over this reflector.  We should be able to make some
    > simplifying assumptions with the VU proposal.
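    The sketch below illustrates both halves of this discussion: a
    check for Seagate's rule that every frame but the last in an
    inbound multi-frame Sequence carries exactly the Receive
    Data_Field Size, and one possible packing of the two proposed bits
    into the Max Rx Data_Field Size parameter.  The bit positions (15
    for the request bit, 14 for the validity bit) are an assumption
    about which "upper two reserved bits" would be used; nothing here
    was agreed, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

SIZE_MASK = 0x0FFF      # bits 11-0: Data_Field size in bytes
REQUEST_BIT = 1 << 15   # hypothetical: "please negotiate a minimum TX size"
VALIDITY_BIT = 1 << 14  # hypothetical: bits 11-0 hold min(min TX, max RX)


def encode_rx_size(size: int, request: bool = False, valid: bool = False) -> int:
    """Pack a Data_Field size plus the two proposed flag bits into 16 bits."""
    if not 0 < size <= SIZE_MASK:
        raise ValueError("size must fit in bits 11-0")
    word = size
    if request:
        word |= REQUEST_BIT
    if valid:
        word |= VALIDITY_BIT
    return word


def decode_rx_size(word: int) -> Tuple[int, bool, bool]:
    """Return (size, request_bit, validity_bit) from a 16-bit parameter."""
    return word & SIZE_MASK, bool(word & REQUEST_BIT), bool(word & VALIDITY_BIT)


def mfs_frames_ok(payload_lengths: Sequence[int], required_size: int) -> bool:
    """Seagate's rule: every frame except the last carries exactly
    required_size bytes, and the last carries no more than that."""
    if not payload_lengths:
        return False
    *middle, last = payload_lengths
    return all(n == required_size for n in middle) and last <= required_size
```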


    The profile will specify that, for direct disk attach, the FCP_DL
    value must be set equal to the block size times the number of
    blocks.
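    For a READ(10) or WRITE(10) command, the FCP_DL rule works out as
    sketched below.  The sketch relies on the standard 10-byte CDB
    layout (operation code 0x28 or 0x2A, transfer length in bytes 7-8,
    big-endian); the block size is supplied by the caller, and the
    function name is hypothetical.

```python
RW10_OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def fcp_dl_for_rw10(cdb: bytes, block_size: int) -> int:
    """FCP_DL = logical block size x number of blocks for READ(10)/WRITE(10)."""
    if len(cdb) != 10 or cdb[0] not in RW10_OPCODES:
        raise ValueError("expected a 10-byte READ(10) or WRITE(10) CDB")
    # Transfer length: CDB bytes 7-8, big-endian, counted in logical blocks.
    transfer_length = int.from_bytes(cdb[7:9], "big")
    return transfer_length * block_size
```

    A READ(10) of 8 blocks on a disk with 512-byte blocks therefore
    carries FCP_DL = 4096.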


    FCP will be changed to make PRLO type-specific, so that Process
    Logout does not affect all protocol types.
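    The intended effect of a type-specific PRLO can be sketched as
    per-FC-4 process login state.  The class is hypothetical; 0x05 (IP)
    and 0x08 (SCSI-FCP) are FC-4 TYPE codes used for illustration.

```python
FC4_TYPE_IP = 0x05   # FC-4 TYPE for IP (LLC/SNAP)
FC4_TYPE_FCP = 0x08  # FC-4 TYPE for SCSI-FCP


class ProcessLogins:
    """Hypothetical per-remote-port record of active process logins."""

    def __init__(self) -> None:
        self._active = set()

    def prli(self, fc4_type: int) -> None:
        """Record a successful PRLI for one FC-4 type."""
        self._active.add(fc4_type)

    def prlo(self, fc4_type: int) -> None:
        """Type-specific PRLO: remove only the named FC-4 type."""
        self._active.discard(fc4_type)

    def is_logged_in(self, fc4_type: int) -> bool:
        return fc4_type in self._active
```

    With this behavior, logging out of SCSI-FCP leaves an IP process
    login with the same port intact.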

Many thanks to Norm Harris from Adaptec for hosting, and to Bob
Snively for supplementing my note-taking.
