FCP-2 recovery problem

Binford, Charles cbinford at lsil.com
Mon Jun 26 10:51:37 PDT 2000


* From the T10 Reflector (t10 at t10.org), posted by:
* "Binford, Charles" <cbinford at lsil.com>
*

255 would be large enough if the LUN was part of the REC.  32 bits is
large enough without LUN in the REC.  REC is *already* defined without
LUN, therefore I'm supporting Dave Baldwin's idea that meets the
following goals:

- no change in defined ELS payload lengths 
- enough information to cover the hole 
- simple to implement 
- doesn't have compatibility problems 

Charles Binford 
LSI Logic Storage Systems 
(316) 636-8566 


-----Original Message----- 
From: David A. Peterson [mailto:dap at storage.network.com]
Sent: Friday, June 23, 2000 7:21 PM 
To: Binford, Charles 
Cc: T10 at t10.org; 'FC Reflector' 
Subject: Re: FCP-2 recovery problem 



I'd rather qualify the REC/SRR using OX_ID, CRN, LUN. 
A max of 255 outstanding commands to a queueing tape lun works for me. 
Are we trying to make the error recovery work/bullet-proof for disk? 

> "Binford, Charles" wrote: 
> 
> Oops.  I forgot to consider I_T vs. I_T_L behavior on this.  I don't 
> think the CRN (at 8 bits) is large enough to close the hole (it wraps 
> too soon). 
> 
> Summary:  Back to Dave's proposal.  I'll retract my layering argument
> in favor of something that will work. :-)  I consider my layering
> argument less important than the compatibility issues of enlarging the
> payload.  There are not enough reserved bytes to keep the payload size
> unchanged and still have enough bits to close the hole.  Therefore, I'm
> back to Dave's proposal to change the parameter data.
> 
> ******* 
> Details for those interested in why my CRN suggestion won't work: 
> 
> Suppose I have two non-queuing tape devices behind a bridge,
> represented as two LUNs behind a single target.  Take Dave's original
> scenario, add my CRN field in the REC and we still come up short.
> 
> **** from Dave's original email with CRN info added ***** 
> Initiator                                        Target 
> 
> CMD ----------------------------> 
> 
> 1. A command (e.g. Test Unit Ready) is sent to the target with 
> OX_ID=1. (CRN=5) 
> 
>            <---------------------------     Response 
> 
> 2. A "good" response is sent back to the initiator. The initiator gets
> the response and knows the TUR command has been completed, so the
> exchange resources are freed. The target has sent the response, so it 
> saves the exchange information just in case the initiator needs to 
> recover a dropped response with REC/SRR. 
> 
> CMD ---------------------------->  X (dropped frame) 
> 
> 3. A new command (e.g. SPACE forward 1 block) is sent to the target
> with OX_ID = 1 (AND CRN=6). This OX_ID reuse can occur for many
> reasons in various systems. The command never makes it to the target
> because of a bit error.
> 
> REC ------------------------------> 
> 
> 4. The initiator sends an REC ELS command to the target to make sure
> all is well with OX_ID 1 (and CRN=6).
> 
> ******* end cut and paste from Dave's email ****** 
> 
> At this point my plan is working - single LUN.  But consider
> Multi-LUN.  If you change the scenario only slightly we break:
> 
> CMD ------OX_ID=1, CRN=6, LUN=1 ----------------------> 
> 
>            <---------------------------     Response 
> 
> CMD -----OX_ID=1, CRN=6, LUN=2 ----->  X (dropped frame) 
> 
> REC -----OX_ID=1, CRN=6  --------------------> 
> 
> The target can't tell if host wants LUN=1 data (lost response) or if 
> the host is really talking about the lost CMD to LUN=2 
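
The multi-LUN ambiguity can be illustrated with a small sketch. It assumes, purely for illustration, that the target keeps a list of the exchanges it has seen and answers an REC by searching that list on whatever fields the REC carries. The names `Exchange` and `handle_rec`, the field layout, and the reply strings are hypothetical and not taken from FCP-2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exchange:
    ox_id: int
    crn: int
    lun: int
    state: str  # what the target remembers about this exchange


def handle_rec(saved, ox_id, crn, lun=None):
    """Answer an REC by matching on OX_ID and CRN, and on LUN only if supplied."""
    for ex in saved:
        if ex.ox_id == ox_id and ex.crn == crn and (lun is None or ex.lun == lun):
            return f"ACC: LUN {ex.lun} {ex.state}"
    return "RJT: exchange unknown"


# The target saw the command to LUN 1 and answered it. The command to LUN 2
# (same OX_ID, same CRN) was dropped, so the target never recorded it.
saved = [Exchange(ox_id=1, crn=6, lun=1, state="response sent")]

without_lun = handle_rec(saved, ox_id=1, crn=6)         # REC as defined today
with_lun = handle_rec(saved, ox_id=1, crn=6, lun=2)     # REC qualified by LUN

print(without_lun)  # ACC: LUN 1 response sent
print(with_lun)     # RJT: exchange unknown
```

Without the LUN, the target reports on the LUN 1 exchange even though the initiator is asking about the lost LUN 2 command; with the LUN, the lookup correctly finds nothing.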
> 
> With Dave's proposal an initiator can increment the parameter field of
> the FCP_CMD frame every time he sends it out.  He only needs to track
> it for the open exchanges.  Because this is a 32-bit field the wrap
> time is so long we don't have to worry about it.  That is why it
> works!
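
A minimal sketch of the bookkeeping described in this paragraph, assuming the initiator keeps one 32-bit counter and remembers the value only for exchanges that are still open. The class `Initiator`, its method names, and the rate of 10,000 commands per second are hypothetical choices made for illustration; the 8-bit figure is included only to compare wrap times.

```python
class Initiator:
    """Tags every outgoing command with a 32-bit value that increments per command."""

    MASK = 0xFFFFFFFF

    def __init__(self):
        self._next_tag = 1
        self.open_exchanges = {}  # OX_ID -> tag of the command now using it

    def send_cmd(self, ox_id):
        tag = self._next_tag
        self._next_tag = (self._next_tag + 1) & self.MASK  # wrap at 32 bits
        self.open_exchanges[ox_id] = tag
        return tag

    def complete(self, ox_id):
        # Once the exchange closes, its tag no longer needs to be tracked.
        self.open_exchanges.pop(ox_id, None)


def seconds_to_wrap(bits, commands_per_second):
    """Time for a counter of the given width to wrap at a steady command rate."""
    return (2 ** bits) / commands_per_second


ini = Initiator()
first = ini.send_cmd(ox_id=1)
ini.complete(ox_id=1)
second = ini.send_cmd(ox_id=1)  # same OX_ID reused, but a different tag

wrap_8 = seconds_to_wrap(8, 10_000)           # well under a second
wrap_32_days = seconds_to_wrap(32, 10_000) / 86_400  # several days
```

Under these assumed numbers the 8-bit counter wraps in about 26 milliseconds, while the 32-bit counter takes just under five days, far longer than any exchange information is likely to be held.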
> 
> Charles Binford 
> LSI Logic Storage Systems 
> (316) 636-8566 
> 
> -----Original Message----- 
> From: David A. Peterson [mailto:dap at storage.network.com]
> Sent: Friday, June 23, 2000 12:29 PM 
> To: Binford, Charles 
> Cc: T10 at t10.org; 'FC Reflector' 
> Subject: Re: FCP-2 recovery problem 
> 
> Howdy All, 
> Finally catching up on emails. 
> Charles' proposal is exactly what I was thinking also. In my reading of
> the CRN text in FCP-2 today, it does not explicitly state the CRN is
> based on the I_T_L nexus, but the EPDC bit is contained in the LUN
> control mode page, so I guess it is implied. Would be nice to see some
> text stating this.
>
> Anyways, I think the proposal would work if the CRN is based on the
> I_T_L nexus (i.e. not I_T nexus).
> 
> Dave 
> 
> "Binford, Charles" wrote: 
> > 
> > * 
> > * From the fc reflector, posted by: 
> > * "Binford, Charles" <cbinford at lsil.com> 
> > * 
> > Carl, I fail to see how adding one bit does any more than extend the
> > OX_ID field by one bit.  Now instead of roll over at 64k, you roll
> > over at 128k.  However, in many implementations I'm familiar with the
> > OX_ID range used is much smaller.  This is because people are using it
> > as designed by the FC committee - to be an HW lookup index to the
> > resources associated with the exchange.  Most FC chips don't have
> > resources to support 64K exchanges per d_id and thus the real range
> > for OX_ID is much smaller.  (We'd be guilty of designing a very
> > limiting architecture if the typical implementation was actually
> > bumping up against the limitations of the standard.)
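
To illustrate why OX_ID reuse can be nearly immediate when the OX_ID doubles as a hardware resource index, here is a sketch that assumes a chip with a small pool of exchange slots handed out from a LIFO free list. `OxIdPool` and the pool size of 4 are illustrative assumptions, not a description of any particular FC chip or driver.

```python
class OxIdPool:
    """Hands out OX_IDs that are really indexes into a small resource table."""

    def __init__(self, size):
        # Free slots, stored so that pop() returns the lowest index first.
        self._free = list(range(size - 1, -1, -1))

    def alloc(self):
        return self._free.pop()

    def release(self, ox_id):
        # The most recently freed slot is the next one handed out.
        self._free.append(ox_id)


pool = OxIdPool(size=4)
a = pool.alloc()
pool.release(a)   # exchange completes, slot goes back on the free list
b = pool.alloc()  # the very next command gets the same OX_ID

rollover_16 = 2 ** 16  # 65536: rollover point of the 16-bit OX_ID
rollover_17 = 2 ** 17  # 131072: rollover point with one extra bit
```

With this kind of allocator, back-to-back commands share an OX_ID no matter how wide the field is, which is why widening it by one bit does not help.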
> > 
> > The other point (which is more important to this discussion) I want
> > to make about OX_ID assignments is that because it is an index into
> > chip resources the management of it is often in a very low layer of
> > the driver that has no knowledge of the payload.  The layer building
> > the FCP_CMD payload has no clue what OX_ID is going to be assigned,
> > the layer assigning the OX_ID has no clue about any 'Hermann' bit
> > that may or may not need to be set.
> > 
> > While I believe Dave Baldwin's solution will work, I also have some
> > reservations about it.  Again it is the layering thing.  FC driver
> > interfaces would have to be changed to allow the upper layer to
> > specify a value to be placed in the header (built by a lower layer).
> > I'd rather place the new data in the payload that is built by the
> > same layer that understands what is going on.
> > 
> > ********* here is my proposal ********** 
> > For any command that can't be simply aborted and retried (i.e. not
> > Inquiry, etc.) use a non-zero CRN value.  Define the reserved byte in
> > the REC payload to hold the CRN value if non-zero.  A target
> > receiving an REC with a non-zero "CRN" value shall match it and the
> > OX_ID before determining whether to ACC or RJT the command.
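
The matching rule in this proposal can be sketched as follows. The sketch assumes the target keeps the CRN alongside each saved exchange; `rec_response` and the dictionary layout are hypothetical, and treating a zero CRN as "match on OX_ID alone" is one reading of "if non-zero" rather than text from any draft.

```python
def rec_response(saved_exchanges, ox_id, rec_crn):
    """Decide ACC or RJT for an REC.

    saved_exchanges maps OX_ID -> CRN of the exchange the target remembers.
    """
    if ox_id not in saved_exchanges:
        return "RJT"
    if rec_crn != 0 and saved_exchanges[ox_id] != rec_crn:
        return "RJT"  # same OX_ID, but the REC refers to a different command
    return "ACC"


# Single-LUN scenario from this thread: the target saved the TUR
# (OX_ID=1, CRN=5); the SPACE command (OX_ID=1, CRN=6) never arrived.
target = {1: 5}

lost_cmd = rec_response(target, ox_id=1, rec_crn=6)  # initiator asks about SPACE
lost_rsp = rec_response(target, ox_id=1, rec_crn=5)  # initiator asks about TUR
no_crn = rec_response(target, ox_id=1, rec_crn=0)    # REC without a CRN

print(lost_cmd, lost_rsp, no_crn)  # RJT ACC ACC
```

The RJT for CRN=6 is what tells the initiator that the SPACE command was lost and must be resent, instead of the target reporting on the stale TUR exchange.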
> > 
> > This is basically the same thing as Dave's proposal with the
> > following modifications:
> > - new data in payload instead of header
> > - only 8 bits instead of 32
> > 
> > I'd argue that 8 bits is sufficient for the same reasons it is large
> > enough for command delivery ordering.  There is not a need to queue
> > larger than 255 for sequential devices.
> > 
> > Comments?? 
> > 
> > Charles Binford 
> > LSI Logic Storage Systems 
> > (316) 636-8566 
> > 





