Feb '95 FC-AL Direct Attach Disk Adhoc minutes
Kurt Chan
core.rose.hp.com!kc
Mon Feb 6 17:16:52 PST 1995
From: Kurt Chan
To: FC, SCSI Reflectors
Subj: FC-AL Ad Hoc meeting
Date: 2/6/95
FC-AL Direct Attach Disk Ad Hoc Meeting Minutes
2/2/95 - 2/3/95
Milpitas
Dal Allan, ENDL dal_allan at mcimail.com
Radek Aster, SGI raster at sgi.com
Jon Buck, Amp jon.buck at amp.com
Vince Cavanna, HP vvc at epcot.rose.hp.com
Kurt Chan, HP kc at core.rose.hp.com
Howey Chin, Vitesse howey at vitsemi.com
Jim Coomes, Seagate jim_coomes at notes.seagate.com
Jan Dedek, Ancot dedek at ancot.com
David Ford, Cambex dford at cambex.com
Giles Frazier, IBM gfrazier at ausvm6.vnet.ibm.com
Gene Freeman, HMPD (NCR) gene.freeman at colospgs.ncr.com
Ed Frymoyer, EMF 70523-3010 at compuserve.com
Stillman Gates, Adaptec stillman at eng.adaptec.com
Bill Gintz, Conner bill.gintz at conner.com
Doug Hagerman, DEC hagerman at starch.enet.dec.com
Norm Harris, Adaptec nharris at eng.adaptec.com
Jean Kodama, QLogic j-kodama at qlc.com
James McGrath, Quantum jmcgrath at qntm.com
Bill McMaken, Trimm Technology trimmeng at netcom.com
Margaret Nakamoto, HP margaret_nakamoto at sj.hp.com
Allison Parsons, Conner allison.parsons at conner.com
Subhash R. Patel, Emulex s-patel at emulex.com
Ron Petersen, W.L. Gore ron4849 at delphi.com
Richard Rolls, IBM, San Jose rrolls at vnet.ibm.com
Greg Scherer, Emulex g-scherer at emulex.com
Bob Snively, Sun Micro bob.snively at sun.com
Craig Theorin, W.L. Gore craig at wlgore.com
Peter Walford, Demografx walford at btr.com
Gary Watson, Trimm Technology trimm at netcom.com
Jim Whitworth, Conner james.whitworth at conner.com
Neill Wood, SGI neill at asd.sgi.com
Stewart Wyatt, HP stewart at hpdmd48.boi.hp.com
1) REFLECTOR:
There is a reflector administered by Western Digital which is
intended to be used for discussions specific to the FC-AL adhoc
group. The subscribe address is majordomo at dt.wdc.com and the
broadcast address is disk_attach at dt.wdc.com.
To subscribe, send a message to majordomo at dt.wdc.com with a blank
subject line and a line in the message body of the following
format:
subscribe disk_attach at dt.wdc.com your_email_address
To UNsubscribe, send a message to majordomo at dt.wdc.com with a blank
subject line and a line in the message body of the following
format:
unsubscribe disk_attach at dt.wdc.com your_email_address
2) FUTURE MEETINGS:
External Cables: X3T11, Tues 4/4 @ Monterey
Internal connectors: SFF, Wed 3/8 @ Newport Beach
FC-AL adhoc meeting: TBD by X3T10 (Newport likely, week of 3/6)
3) INTERNAL CABLES/CONNECTORS
Jim McGrath reviewed his requirements for internal connectors on
next-generation FC-AL drives.
- SCA-2 vs SCA-1, due to advanced features of SCA-2 (designed for
hot plugging, ESD characteristics, blind insertion with short and
long row contacts)
- Pinouts should be a superset of SSA, to ensure equivalent
baseline functionality. Currently missing are:
o 3.3V power. Some discussion over whether 5V-3.3V conversion
is more practical on the drive vs in a cabinet. The
per-drive cost is estimated to be about $2, while the
per-cabinet cost was stated by Trimm to be around $5. SSA is
missing 3.3V precharge. All Quantum drives use 3.3V. Trimm
noted that the more voltages provided, the more difficult it
is to size cabinet power supplies.
o External Fault, to warn a drive of events such as external
environmental failures (so that a write safe condition could
be established).
o Reprogramming pins, to allow a set of other codes to be
defined, including a set of sense values for the environment.
- 50 pins for more function and to provide the 3.3 volt capability.
- Consider adding 24 volts to reduce the net power in high
performance (e.g., 10,000 rpm) drives. There are safety
implications for backplane voltages over 42V.
- Consider adding unshielded cabled connector option. Bob Snively
emphasized that, for Sun, cabled solutions are almost always more
expensive and less reliable than backplane designs. Jim is
targeting the embedded drive marketplace, which is accustomed to
cabled solutions. Is a multi-bay connector required? You could
create an optional separate cable connector for power and
controls with a standard SCA connector centered on the drive.
Power distribution is a concern, since some early IDC designs did
not allow for a sufficiently robust power connection to support
drives. EMI on unshielded cables/connectors at a Gbit, even
within a shielded cabinet, can be troublesome. Same with stubs
created by cabled topologies.
- Jim does not believe that the 2.5" drives will be able to use the
same connectors as the 3.5" and that attempting to define an
identical connector would compromise 3.5" functionality.
- Consider having a 3-bay design vs a unitized connector (one for
basic functions, one for power, and one for options) if this is
less costly.
4) EXTERNAL CABLES/CONNECTORS
Vince Cavanna (HP) performed some measurements, with the Belden
9209 (double-shielded) cable with BNC connectors performing the
best with essentially zero emissions over ambient. The Belden 9116
was not as good, and the twinax + Hirose connector was the worst,
even after the connector was sent back to the manufacturer to have
supplemental shielding put into place using copper foil.
The results of the testing will also be reported in the Sarasota
X3T11 FC-0 copper meeting.
SGI asked about CATV "F" connectors. Vince responded that since they
are not specified to work in this application, he was reluctant to
test them: even if they measured well, they would not be up to
HP quality specifications. The "F" connectors are also very close
to BNC connectors in cost, and not as robust mechanically. The digital
television industry is using BNC and coax. SMA and SMB connectors
also appear to work, with adapters to BNC widely available.
DB-9 connector not yet tested, but will be once available. Has
potential for good EMI performance. Also, Gore noted that twinax
designs in general should perform as well or better than coax for a
given shielding, due to their balanced design. Moral: EMI
performance of a cable plant is only as good as its weakest link.
Often, this is the connector.
Equalizers may not be necessary for some environments. A fixed
equalizer (say, optimized for 20 meters) may be adequate if cable
lengths can be restricted to within the range of 1-30 meters. With
DB-9 connectors, equalization can be built into the cable
assemblies, making cabling and termination much simpler for FC-AL
than for SCSI. If cable lengths can be limited to 10' or less,
no equalization is needed.
Gore showed some very miniature internal twinax jumper cables. The
shielded version demonstrated better performance, but the
unshielded version was 40% less expensive. Seagate is evaluating.
Up to 50 meter lengths have been achieved unequalized. Full
duplex cables exhibit about 1% crosstalk. More findings will be
presented in Sarasota.
7) ACA
It was agreed that the following text would be added to the profile
in section 11.5:
"A target shall never change its current configuration or logical
block size without receiving an explicit command or task
management function. An initiator shall choose its current
operating parameters in such a manner that those parameters
associated with correctly accessing and storing data, including
logical block length and logical unit configuration, shall be the
same as those stored or default parameters that a target will
have after a target reset or power on reset. Devices that do not
return to a saved configuration after any of the above resets
shall respond with CHECK CONDITION status to any attempt to
create a saved configuration."
The SCC command set document needs a proposal to place a similar
restriction on reconfiguration of things like RAID configurations.
8) LOOP PERFORMANCE AND INITIALIZATION
Neill Wood (SGI) introduced the problem of loop disruption when adding
or removing a device. Radek Aster also elaborated on the same
subject on Friday. Several MILLIseconds per spindle may be
required for loop initialization, which means several HUNDRED
milliseconds would be required to initialize a maximally configured
loop. Also, in video or other "guaranteed write" data streams,
very large buffers may be required during this period. Normal systems
can buffer at most 40-80 msec of data.
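
The arithmetic behind this concern can be sketched as follows. This is
only a rough illustration: the 3 ms per-device figure (standing in for
"several milliseconds"), the 126-device loop, and the 100 MB/s stream
rate are all assumptions chosen for the example, not numbers presented
at the meeting.

```python
# Rough loop-initialization arithmetic. All three inputs are assumptions.
MS_PER_DEVICE = 3.0      # stand-in for "several milliseconds" per spindle
LOOP_DEVICES = 126       # assumed size of a maximally configured loop
STREAM_MB_PER_S = 100.0  # assumed rate of the stream that must be buffered


def init_time_ms(devices, ms_per_device=MS_PER_DEVICE):
    """Total loop stall if every device adds ms_per_device."""
    return devices * ms_per_device


def buffer_bytes(stall_ms, rate_mb_per_s=STREAM_MB_PER_S):
    """Bytes arriving during the stall at the given rate (1 MB = 10**6 bytes)."""
    return round(stall_ms * rate_mb_per_s * 1000)


stall = init_time_ms(LOOP_DEVICES)
print(f"loop init stall: {stall:.0f} ms")
print(f"buffer needed:   {buffer_bytes(stall) / 1e6:.1f} MB")
print(f"exceeds an 80 ms buffering budget: {stall > 80}")
```

Under these assumptions the stall is several times longer than the
40-80 msec that a system can absorb.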
SGI recommends that an X3T10 mode page proposal be presented which
allows the Private Loop Profile to:
a) Require a mode where Private Loop Targets must receive a Loop
Port Enable (LPE) primitive before coming on line (don't enable
themselves onto the loop).
b) Require a mode where Private Loop Targets only respond to LIPs,
but do not generate them. The assumption is that these Targets
will have fixed addressing specified by cabinet position.
c) Allow private loop devices to log in before doing LIP.
These would allow an Initiator to choose when to "reconfigure" or
recognize reconfiguration of a private loop.
One method for ensuring duplicate addresses don't occur is that a
device could open itself. If it doesn't get the request back, it
knows another port has taken its preferred address, and it needs
to take a soft address.
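
A toy model of this self-open check is sketched below. It is not taken
from FC-AL: the loop is reduced to a list of addresses in transmission
order, and the function name and address values are hypothetical.

```python
def opn_returns_to_sender(loop_addresses, sender_index):
    """Return True if a self-addressed OPN travels the whole loop.

    loop_addresses lists each port's address in loop order (a toy model).
    The OPN is absorbed by the first downstream port claiming the same
    address, in which case the sender never sees it come back.
    """
    target = loop_addresses[sender_index]
    n = len(loop_addresses)
    for step in range(1, n):
        port = (sender_index + step) % n
        if loop_addresses[port] == target:
            return False  # another port has taken the preferred address
    return True


# Port 1 prefers address 0x02. In the second loop, port 2 already holds it.
print(opn_returns_to_sender([0x01, 0x02, 0x04], 1))
print(opn_returns_to_sender([0x01, 0x02, 0x02], 1))
```

When the function returns False, the device would fall back to a soft
address as described above.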
Also, we should specify a SHORTER AL_Time for the Private Loop
profile, or propose a dynamic discovery algorithm which allows
a device to determine AL_Time by addressing itself and timing
the round trip.
9) DRIVE SPINUP
On a related note to initialization performance, SGI would like to
spin drives up on a modulo-N basis rather than individually (where
N = 8,9,10, etc). This will allow a system to spin up in 1/N'th
the time. This will be a SFF-8045 proposal.
10) POSITION SUPPORT OPTIONAL
LALP and LARP will be made optional, since support is discovered
dynamically during initialization.
11) FC-PH2 CHANGES
My document will be submitted to X3T11 in Sarasota to fix:
a) Class 3 streamed and open sequence definitions
b) FLOGI requirement for ports which do not support F_Ports
12) FCP TASK MANAGEMENT RESPONSE
Task Management functions will now be transmitted in T1, and responded
to using I4. Response codes are 00, 04, and 05 corresponding to
- Function Complete,
- Not Performed (not supported), and
- Not Performed (target or service delivery failure)
respectively. Absence of an FCP_RSP will indicate a Target or Service
delivery failure which is unreportable (corrupted frame, internal HW
error, etc). HP will modify their public review comment
appropriately. No PRLI bits are necessary; this will be a mandatory, not
an optional, behavior.
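
As an illustration only, an initiator might interpret these responses
along the following lines. The enum and function names are hypothetical;
only the three code values and their meanings come from the agreement
above.

```python
from enum import IntEnum


class TaskMgmtRsp(IntEnum):
    """Response codes agreed for task management functions."""
    FUNCTION_COMPLETE = 0x00
    NOT_PERFORMED_NOT_SUPPORTED = 0x04
    NOT_PERFORMED_FAILURE = 0x05  # target or service delivery failure


def describe(rsp_code):
    """Map a response code, or None when no FCP_RSP arrived, to text."""
    if rsp_code is None:
        return "no FCP_RSP: unreportable target or service delivery failure"
    try:
        return TaskMgmtRsp(rsp_code).name
    except ValueError:
        return f"unexpected response code {rsp_code:#04x}"


for code in (0x00, 0x04, 0x05, None):
    print(describe(code))
```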
13) PERFORMANCE AND BB_CREDIT
Radek shared some simulations which showed the effect of
Login_BB_Credit and Available_BB_Credit on throughput and IOs per
second.
Assumptions:
- Barracuda-class drives     - Max seek time = 15ms
- Queue depth = 1            - Min seek time = 500us
- 4096 cylinders             - Head-sector spiral skew = 16
- 16 heads                   - Cylinder-sector spiral skew = 8
- 128 sectors/track          - Head switch time = 1ms
- Burst len = 128 blocks     - Read/write FIFO = 256 blocks
- 2048-byte frames           - 72 IOs/sec for each disk
Offbus command overhead = 5us (host), 30us (disk decode)
Offbus response overhead = 5us (host), 30us (disk)
Host Onbus context switch overhead = 1us
Onbus pre-CLS overhead = 1us (host), 1us (disk)
Host Onbus FC-FIFO move overhead = 0us
Disk Offbus re-instruct overhead = 10us
I/Os per second Observations (2kbyte Random reads and writes):
- I/Os per second remain independent of Login_BB_Credit for
configurations under 80 disks (Login_BB_Credit = 0 at both
disk and host, peaks at 5300 IOs/sec with 80 disks)
- Login_BB_Credit allows the system to get more effective use of
about 20 more disks (peak at 100 disks, with 7000 IOs/sec). This
is with Login_BB_Credit = 1 or 2 at host and 1 or 2 at disk.
- With faster disks, the curves will peak out with fewer drives
Throughput observations (512kbyte random reads and writes in 64k
bursts):
- Login_BB_credit is not a major problem, since it is amortized
over 32 frames. However, AVAILABLE credit does have an effect.
- Little difference between Available credit = 2 or 3, regardless
of whether Login_BB_Credit is 0, 1, 2, or 3. Curves flatten
out at about 95MB/s with 20 drives. Adding more drives does not
improve throughput.
- If Available Credit is 1, curves peak at 15 drives, 78MB/s and
DECREASE as drives are added.
Summary: Nonzero Login BB_Credit is needed to improve latency
and IOs/sec. Having "fast drain" architectures and large
link-rate buffers improves throughput.
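
The "amortized over 32 frames" figure follows from the burst and frame
sizes in the assumptions. The sketch below works it through, assuming
that "k" means 1024 bytes. The last line reflects one reading of why
small transfers are more sensitive to Login_BB_Credit: a 2-kbyte I/O
fits in a single frame, so any wait for credit is paid on every I/O.

```python
KBYTE = 1024        # assumption: "k" in the minutes means 1024 bytes
FRAME_BYTES = 2048  # frame size from the simulation assumptions


def frames_for(nbytes, frame_bytes=FRAME_BYTES):
    """Number of frames needed to carry nbytes (ceiling division)."""
    return -(-nbytes // frame_bytes)


frames_per_burst = frames_for(64 * KBYTE)
bursts_per_io = (512 * KBYTE) // (64 * KBYTE)
frames_per_small_io = frames_for(2 * KBYTE)

print("frames per 64k burst:  ", frames_per_burst)
print("bursts per 512k I/O:   ", bursts_per_io)
print("frames per 2k I/O:     ", frames_per_small_io)
```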
14) NONZERO BB_CREDIT
Annex B was discussed, and nobody had a problem with changing the
profile to require that all ports must be able to be opened half
duplex (in order to accommodate the Annex B proposal for nonzero
Login_BB_Credit).
However, Dal pointed out that there are other methods of
supporting nonzero Login_BB_Credit without using half-duplex. For
example, if a port can guarantee that its AVAILABLE BB_Credit is
always equal to TWICE its Login_BB_Credit, it can accommodate the
situation where data frames arrive along with a forwarded CLS,
followed by an immediate OPN from another port.
Also, an error with the last paragraph (bullet 4) of Annex B was
pointed out. It is not necessary to discard R_RDYs. If nonzero
Login_BB_Credit is N, a port must actually have N+1 available
buffers, since FC-AL states that ONE OR MORE R_RDYs must be
returned on receipt of an OPN. A port is only required to return
a minimum of one R_RDY, which is ADDED to the Login_BB_Credit.
Therefore the minimum buffering requirements of such a port would
be Available_BB_Credit = Login_BB_Credit + 1. On Login, such
ports must advertise Login_BB_Credit equal to 1 less than the
Available_BB_Credit that they can guarantee on OPN.
Annex B will be rewritten to accommodate these changes and offer
both methods of supporting nonzero Login_BB_Credit (conservative
reporting of Login_BB_Credit or half-duplex opens).
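
The two buffer relations discussed in this section can be compared with
a small sketch. The function names are hypothetical, and the code only
restates the relations as given in these minutes; it does not anticipate
the rewritten Annex B.

```python
def max_login_credit_plus_one(guaranteed_buffers):
    """Largest Login_BB_Credit when Available = Login + 1.

    One buffer is held back for the R_RDY that must be returned on OPN.
    """
    if guaranteed_buffers < 1:
        raise ValueError("a port needs at least one buffer to return an R_RDY")
    return guaranteed_buffers - 1


def max_login_credit_doubled(guaranteed_buffers):
    """Largest Login_BB_Credit when Available must be twice Login."""
    if guaranteed_buffers < 0:
        raise ValueError("buffer count cannot be negative")
    return guaranteed_buffers // 2


for buffers in (1, 2, 4, 8):
    print(buffers,
          max_login_credit_plus_one(buffers),
          max_login_credit_doubled(buffers))
```

For a port that can guarantee 4 buffers on OPN, the first relation
allows advertising Login_BB_Credit = 3 and the second allows 2.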
15) FIXED LENGTH INTERMEDIATE FRAMES WITHIN A MULTI-FRAME SEQUENCE
Attached is my note to the reflector re-evaluating the need to
accommodate this. Seagate requires that all intermediate frames of
an inbound multi-frame Sequence be equal in length to the Receive
Data_Field Size communicated in login by the Host (not a reliable
mechanism).
> From: Kurt Chan
> To: Direct attach disk reflector (disk_attach at dt.wdc.com)
> Subj: TX Buffer Size Negotiation
> Date: Mon Feb 6 10:51:51 PST 1995
>
> This note is related to Seagate's requirement that all inbound
> Sequences to the disk contain "maximum-sized" frames except for
> the last frame of the Sequence.
>
> After thinking about this proposal to X3T11, I've come to the
> conclusion that negotiating a minimum TX buffer size would be
> more complicated than previously thought. Also, this issue
> appears to be quite topology-specific (in-order topologies only)
> and will not have a very universal appeal to the X3T11 group.
>
> Therefore, I'm suggesting we talk more about whether we want this
> to be a vendor unique behavior, discoverable through probing VU
> information in a standard manner, or whether this feature should
> be part of FC-PH login.
>
> Here are the difficulties:
>
> - I assume we want ports to be able to be asymmetric. For
> example, a port may require all but the last RECEIVED frame of
> an MFS (multi-frame Sequence) to be of a minimum size, but it
> might be capable of TRANSMITTING "runt" frames in the middle of
> an MFS outbound. Conversely, a port may NOT require inbound
> frames to be of a minimum size, but may be capable of
> accommodating those that do.
>
> - Having asymmetry creates the need for two bits. Let's assume
> we would use the upper two reserved bits in the Max Rx
> Data_Field Size parameter:
>
> 1) one bit that is a request to the other port to negotiate a
> minimum transmit buffer size,
>
> 2) another (validity) bit that states whether or not bits 11-0
> represent just the maximum RX Data_Field size (per normal
> FC-PH), or whether that field represents the SMALLER of
> either the min TX size or the max RX size.
>
> - Then the negotiation rules and relogin procedures must be
> defined if: a) the request bit is set in PLOGI b) the validity
> bit is NOT set in PLOGI but the ACC contains the request bit c)
> the validity bit is set in PLOGI but bits 11-0 are too large
> for Login Responder to accept
>
> etc...
>
> Over the next couple of weeks I'll think about proposing an X3T11
> standard method for communicating VU information, and we can
> discuss over this reflector. We should be able to make some
> simplifying assumptions with the VU proposal.
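
To make the two-bit idea in the note concrete, here is a minimal sketch
of how such a field could be packed and unpacked. It assumes a 16-bit
word with the request bit at bit 15 and the validity bit at bit 14;
those positions and all names are assumptions for illustration, since
the note only says "the upper two reserved bits". Only the size field
in bits 11-0 comes from the note.

```python
SIZE_MASK = 0x0FFF     # bits 11-0: Data_Field size, per the note
REQUEST_BIT = 1 << 15  # assumed position of the "negotiate min TX size" request
VALIDITY_BIT = 1 << 14 # assumed position of the validity bit


def pack(size, request=False, validity=False):
    """Build the 16-bit word from a size and the two hypothetical flags."""
    if not 0 <= size <= SIZE_MASK:
        raise ValueError("size must fit in bits 11-0")
    word = size
    if request:
        word |= REQUEST_BIT
    if validity:
        word |= VALIDITY_BIT
    return word


def unpack(word):
    """Return (size, request, validity) from the 16-bit word."""
    return word & SIZE_MASK, bool(word & REQUEST_BIT), bool(word & VALIDITY_BIT)


word = pack(2048, request=True)
print(hex(word), unpack(word))
```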
16) FCP_DL SPECIFICATION
The profile will specify that the FCP_DL value must be set equal to
the block size times the number of blocks for direct disk attach.
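
As a worked example of this rule (the function name and the sample
values are illustrative only):

```python
def fcp_dl(block_size_bytes, num_blocks):
    """FCP_DL for direct disk attach: block size times number of blocks."""
    if block_size_bytes <= 0 or num_blocks < 0:
        raise ValueError("block size must be positive and block count non-negative")
    return block_size_bytes * num_blocks


# A 128-block transfer with 512-byte blocks.
print(fcp_dl(512, 128))
```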
17) TYPE-SPECIFIC PRLO
FCP will be changed to make PRLO type-specific, so that Process
Logout does not affect all protocol types.
Many thanks to Norm Harris from Adaptec for hosting, and to Bob
Snively for supplementing my note-taking.