Time to write routing tables

Fairchild, Steve Steve.Fairchild at hp.com
Thu Feb 27 13:14:36 PST 2003


* From the T10 Reflector (t10 at t10.org), posted by:
* "Fairchild, Steve" <Steve.Fairchild at hp.com>
*

Hugh,

My answers are below.

Thanks,

Steve

-----Original Message-----
From: Hugh Curley [mailto:hcurley at indra.com]
Sent: Thursday, February 27, 2003 1:52 PM
To: Fairchild, Steve; t10 at t10.org
Subject: Re: Time to write routing tables


Steve,

Thanks for the quick response.

Are you saying that a single initiator can do 999 Discoveries and 1000
SMP Configure Route Information operations in 5 ms?

[Fairchild, Steve] No, I'm saying that the multiple requests to
configure a single device for routing through multiple levels would take
less than 5 ms.  I picked that number to be very generous with the time;
I believe implementations will be well below it.  Time will tell.

Actually the number of discoveries (and I believe the number of route
table entries) would be much greater than 1000.  I must first discover
the edge expander to which I am directly attached (level 1), the edge
expander to which it is attached (part of an edge expander device
set) (level 2), the fanout expander (level 3), the edge expander on the
other end (level 4) and finally the end device (level 5).  So for each
phy (end device, I assume) on that far-end edge expander, I must do 3,
4, 5 or 6 discoveries.  So the number of discoveries could run to many
thousands for 1000 end devices.

[Fairchild, Steve] I agree, but the time to do those multiple requests
should fit within the 5 ms per device.  In a topology as large as you
are describing, I would also suspect that the fanout expander itself
would be more intelligent (i.e., self-configuring).  If that is the
case, then there would be no need for an initiator to configure any
expander on the other side of the fanout device.  It would necessarily
attempt to configure the expanders within the edge that it is connected
to, but it would not have to compete with a "more intelligent" fanout
expander to configure the entire domain.  If the initiators were
connected to the fanout directly, then only the fanout would have to
configure the domain; the initiators would see the fanout, see that it
did not need configuration and stop there.

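A back-of-the-envelope sketch of the counts in this exchange may help.
It assumes a flat cost of 3 to 6 discoveries per end device (the range
quoted above) and the 5 ms per-device budget; the function names are
illustrative only and do not come from the SAS draft.

```python
def total_discoveries(end_devices, discoveries_per_device):
    """Total discoveries if every end device costs the same number."""
    return end_devices * discoveries_per_device


def per_request_budget_ms(per_device_budget_ms, requests_per_device):
    """Average time each request may take to stay inside the per-device budget."""
    return per_device_budget_ms / requests_per_device


# 1000 end devices at 3 and at 6 discoveries each (assumed flat cost).
low = total_discoveries(1000, 3)
high = total_discoveries(1000, 6)

# With 6 requests per device, each request must average under 1 ms
# for the device to stay within the 5 ms budget.
budget = per_request_budget_ms(5.0, 6)
```

Under these assumptions the total lands between 3000 and 6000
discoveries, which matches "many thousands", and the 5 ms figure only
holds if individual requests complete in well under a millisecond.
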
Either I do not understand something in this setup, or I believe SAS
will not scale well.

[Fairchild, Steve] See the comment above.  The wording in the spec today
is trying to address initial implementations, with primary emphasis on
domains without "intelligent" fanout devices.  I would expect SAS2 to
say more about fanout devices and larger domains.

I am actively soliciting input to help me understand this.

I am well aware of the 1/4 second wait for each empty address on the
parallel bus, so that the minimum working bus (2 devices, 14 empty
addresses) would take 3.5 seconds.  The parallel bus has many other
limitations, such as number of devices and distances, that SAS fixes
very well.  And if we keep SAS configurations small like parallel SCSI,
there will be no problem.  But we cannot have a standard that allows 16K
devices and "hope" that no one tries that many.

[Fairchild, Steve] FC has an address range that is significantly greater
than 16K, but no one builds a topology much larger than a few hundred
drives.  The addressability should not be construed as saying you should
connect that many devices.  I would expect initial implementations to be
more concerned with providing a reasonable number of devices within an
edge.  HP requested an increase from 64 devices to 128, not because we
expect to have 128 devices in an edge, but because we want to provide at
least 64 devices along with access to management ports within the edge.

100 initiators performing the discovery and configuration would not
equal 1 initiator times 100, but the activity will not be without
increase.  100 initiators arbitrating for the same resources are going
to cause a lot of ARB Lost.

[Fairchild, Steve] I agree, but during an initial power up this would be
a problem in almost any environment.  With the change counts in the
expanders, the discovery process should be reduced significantly once
the domain is discovered.  The spec does not describe the possible
optimizations for the discovery, only the layout to get everything
right.  If the domain is fairly static then the impact of discovery on a
running topology should be minimal.  Even swapping out enclosures should
have an isolated effect and not require a full discovery.

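One way to picture the change-count optimization mentioned above is the
sketch below.  It assumes the initiator keeps the last change count it
saw for each expander and has some way to read the current one; the
dictionary layout and the function name are hypothetical, since the spec
is described here as not covering such optimizations.

```python
def expanders_to_rediscover(cached_counts, current_counts):
    """Return the expanders whose change count is new or has moved.

    Both arguments map an expander identifier to its change count.
    An expander missing from the cache is treated as changed.
    """
    return sorted(
        expander
        for expander, count in current_counts.items()
        if cached_counts.get(expander) != count
    )


# Hypothetical identifiers and counts, for illustration only.
cached = {"edge_a": 3, "edge_b": 7}
current = {"edge_a": 3, "edge_b": 8, "edge_c": 1}
changed = expanders_to_rediscover(cached, current)
```

Here only edge_b (count moved from 7 to 8) and edge_c (not seen before)
would be rediscovered, while edge_a is skipped, which is the isolated
effect described for a fairly static domain.
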
Thanks,

Hugh Curley
hcurley at indra.com


----- Original Message -----
From: Fairchild, Steve
To: Hugh Curley; t10 at t10.org
Sent: Thursday, February 27, 2003 9:35 AM
Subject: RE: Time to write routing tables


Hugh,

I don't think there is a way to pick who does the configuration, mainly
because an initiator cannot find out who else is in the domain
(especially one as large as you describe) without first configuring the
tables to be able to reach the other devices.  So it is simpler to say
that all initiators "shall" go through the exercise.  If the first
initiator to configure the expander marked the configuration as
complete, when would it do it?  How are CHANGEs managed?

As to your question about how long it takes to configure something with
1000 end devices: if you allow a very generous delay of 5 ms per device,
then it would be 5 seconds for a single initiator to configure the
entire topology.  Multiple initiators would be configuring the topology
in parallel, so the time should not be additive.

For a more reasonable topology of around 128 end devices and a couple of
initiators it would be less than 1 second.
Current parallel SCSI designs allow 250 ms to time out selections for
attached devices on a single bus of 16 devices, which would mean an
empty bus takes at least 3.75 seconds to indicate no devices are
attached.

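The arithmetic behind these figures can be sketched as follows.  The
5 ms per device and the 250 ms selection timeout are the numbers quoted
in this thread; the function names, and the reading that 15 of the 16
bus addresses are empty when only the initiator is present, are
assumptions made for illustration.

```python
def sas_config_time_s(end_devices, ms_per_device=5):
    """Time for one initiator to configure the topology, in seconds."""
    return end_devices * ms_per_device / 1000


def selection_timeout_total_s(empty_addresses, timeout_ms=250):
    """Time a parallel SCSI initiator spends timing out empty addresses."""
    return empty_addresses * timeout_ms / 1000


large_domain = sas_config_time_s(1000)        # 1000 end devices
modest_domain = sas_config_time_s(128)        # 128 end devices
empty_bus = selection_timeout_total_s(15)     # 16 IDs minus the initiator
two_device_bus = selection_timeout_total_s(14)  # 2 devices, 14 empty
```

With these inputs the 1000-device domain comes to 5 seconds and the
128-device domain to 0.64 seconds, against 3.75 seconds for an empty
16-address parallel bus and 3.5 seconds for one with two devices.
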
So I don't think the configuration by each initiator is an overwhelming
burden.

Thanks,

Steven Fairchild
Senior Member Technical Staff
Hewlett-Packard Corporation
MS150901
20555 SH 249
Houston, TX 77070
281 514 6448
steve.fairchild at hp.com


-----Original Message-----
From: Hugh Curley [mailto:hcurley at indra.com]
Sent: Thursday, February 27, 2003 4:46 AM
To: t10 at t10.org
Subject: Time to write routing tables



If I understand the protocol correctly, when the domain powers on (or
when one or more devices are added), all the initiators will discover
the entire topology by using Discovery one phy at a time.  All
initiators will then write the complete routing table for each expander
that has a configurable routing table.

Changing the standard from "all initiators shall discover and write the
routing table" to "all initiators should discover and write the routing
table" simply means that some configurations will have no initiators
that do this, while in other configurations all initiators will still do
it.  When I purchase the equipment for my new SAS domain, I will
probably buy all the initiators from the same vendor.  If brand X writes
the routing table, then all my initiators will attempt to do so.  If
brand Y does not write the routing table, then none of my initiators
will attempt to do so.

Let us imagine a domain that uses only 8% of the total possible
connections, consisting of 100 initiators and 900 targets.  How long
will it take to do discovery and update the routing tables?

Would it not be quicker if a single initiator were selected to create
the tables?  It could be the one with the highest SAS address or the
lowest SAS address, or the first one there could mark the table
unconfigurable or place a reserve (similar to the SCSI Reserve) on the
SMP target.

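The address-based selection suggested above could look like the sketch
below.  It assumes every initiator already holds the same list of
initiator SAS addresses, which is a strong assumption: as the reply in
this thread points out, an initiator cannot learn who else is in the
domain before the routing tables are configured.  The function name and
the example addresses are made up for illustration.

```python
def pick_configurator(initiator_addresses, prefer="lowest"):
    """Pick the single initiator that would write the routing tables.

    initiator_addresses: iterable of 64-bit SAS addresses as integers.
    prefer: "lowest" or "highest", the two rules proposed above.
    """
    addresses = list(initiator_addresses)
    if not addresses:
        raise ValueError("no initiators known")
    if prefer not in ("lowest", "highest"):
        raise ValueError("prefer must be 'lowest' or 'highest'")
    return min(addresses) if prefer == "lowest" else max(addresses)


# Made-up example addresses; every initiator must see the same list
# for all of them to agree on the winner.
known = [0x5000000000000003, 0x5000000000000001, 0x5000000000000002]
winner_low = pick_configurator(known, "lowest")
winner_high = pick_configurator(known, "highest")
```

Because min and max are deterministic, every initiator holding the same
list reaches the same answer without exchanging messages; the hard part
is obtaining that list in the first place.
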
Am I on target?

Thanks,

Hugh Curley
hcurley at indra.com

