[T11.3] Re: [Fwd: FCP-2: Lost FCP_CMND, Unacknowledged classes.]

John Tyndall jtyndall at Crossroads.com
Thu Jul 11 11:06:44 PDT 2002


INCITS T11.3 Mail Reflector
********************************
Doesn't Table 34 in FCP-2 Section 11.1 already extend RR_TOV to be at
least 3 times REC_TOV? Does this help, or is there still a problem?

John Tyndall
Architect
Crossroads Systems Inc.
Email  : jtyndall at crossroads.com
Phone : 512-928-7282


-----Original Message-----
From: Santosh Rao [mailto:santoshr at cup.hp.com]
Sent: Thursday, July 11, 2002 1:09 PM
To: RogerR at exabyte.com
Cc: t10 at t10.org; t11_3 at mail.t11.org
Subject: [T11.3] Re: [Fwd: FCP-2: Lost FCP_CMND, Unacknowledged classes.]


Roger,

Thanks for your response. A couple of comments below.

> PPS. Due to buffering issues, the case you describe with the lost
> FCP_RSP is a potential problem even if the command does involve data
> transfers.

The reason I didn't bring up the above case is that it is usually
possible for the initiator HBA driver to track (based on its SCSI
exchange state) whether some bytes were transferred as part of this
exchange for those I/Os that involve a data transfer. Thus, a lost
FCP_RSP for an exchange that involved data transfer can be distinguished
from a lost FCP_CMND case based on the fact that the SCSI exchange state
within the driver indicates some bytes were transferred.

However, if all the FCP_DATA IUs and the FCP_RSP IU were lost, then the
scenario discussed below does apply.
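
A minimal sketch of the distinction described above, written in Python for
brevity. The names ExchangeState, RejectCause and classify_rec_reject are
hypothetical and do not come from FCP-2 or from any real driver; the only
input assumed is the per-exchange byte count that the initiator driver
already tracks.

```python
from dataclasses import dataclass
from enum import Enum, auto


class RejectCause(Enum):
    """What the initiator can infer when REC is rejected for an unknown exchange."""
    RESPONSE_LOST = auto()  # target must have received FCP_CMND
    AMBIGUOUS = auto()      # lost FCP_CMND or lost FCP_RSP; cannot tell which


@dataclass
class ExchangeState:
    """Hypothetical per-exchange bookkeeping kept by the initiator driver."""
    expects_data: bool      # command involves a data transfer
    bytes_transferred: int  # FCP_DATA bytes seen on this exchange so far


def classify_rec_reject(state: ExchangeState) -> RejectCause:
    # Any data movement proves the target saw the command, so the
    # missing piece can only be the response.
    if state.bytes_transferred > 0:
        return RejectCause.RESPONSE_LOST
    # No-data commands, or data commands whose FCP_DATA IUs were all
    # lost along with FCP_RSP, look identical to a lost FCP_CMND.
    return RejectCause.AMBIGUOUS
```

Neither result makes a blind re-issue safe for a tape device: in the first
case the command is known to have run, and in the second it may have.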

> The weird catch is that even if the two conditions you describe could
> be distinguished, some cases where FCP_RSP is lost would still require
> a READ POSITION and initiator knowledge of the expected location (or
> some other way of determining/establishing position).  The reason for
> this is that some tape commands can be partially completed, and the
> information on how much of the command actually completed is in the
> lost FCP_RSP.

If we could distinguish between a lost FCP_CMND and a lost FCP_RSP (by
increasing RR_TOV or using another timer for the discard of exchange
state, such that the target would not discard state information), then
the REC response should indicate the exchange state, giving information
equivalent to that conveyed in FCP_RSP.

In the case where FCP-2 SLER is unsuccessful, the driver returns an
error on that I/O to the SCSI ULP, and the error is propagated to the
tape application, which may choose to issue READ POSITION or other
commands to determine the state of the tape device.
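
As a rough illustration of that application-level check, the sketch below
compares a READ POSITION result against what the application expected.
Everything here is an assumption made for illustration: positions are
reduced to a single logical block number, and the names Outcome and
infer_outcome are hypothetical. Real READ POSITION data carries more
fields, and buffered writes can make the reported position misleading.

```python
from enum import Enum, auto


class Outcome(Enum):
    NOT_EXECUTED = auto()        # tape is still where it was before the command
    COMPLETED = auto()           # tape is where a full completion would leave it
    PARTIAL_OR_UNKNOWN = auto()  # anything else; needs application recovery


def infer_outcome(position_before: int, expected_after: int,
                  reported: int) -> Outcome:
    """Guess what happened to a command whose FCP_RSP never arrived.

    All three arguments are logical block numbers tracked or read by
    the application (an assumption; the driver does not supply them).
    """
    if position_before == expected_after:
        # The command does not move the tape, so position proves nothing.
        return Outcome.PARTIAL_OR_UNKNOWN
    if reported == expected_after:
        return Outcome.COMPLETED
    if reported == position_before:
        return Outcome.NOT_EXECUTED
    return Outcome.PARTIAL_OR_UNKNOWN
```

This only works if the application has exclusive control of the device, as
Roger notes in the quoted message below.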

Thanks,
Santosh


> RogerR at exabyte.com wrote:
>
> In addition to what Dave Peterson already answered, I might provide a
> little help with the tape case.  This probably isn't the answer you
> would prefer to hear, but it is what we have to work with these days.
> (BTW, I'll retain the T10 reflector on this reply, but I don't
> subscribe to that mailing list and won't see replies limited to that
> side.)
>
> Currently one way the two issues you describe below are distinguished
> is to issue a READ POSITION and compare the results to what is
> expected based upon the command.  The problem with this is that
> somebody on the initiator side of things needs to keep track of what
> the position is.  Whoever is keeping track of the current location
> pretty much needs to have exclusive control of the tape device.  This
> means not only device reservations, but reservations on the logical
> representation of the device within the host (e.g. an "exclusive open"
> from the OS layer).
>
> The weird catch is that even if the two conditions you describe could
> be distinguished, some cases where FCP_RSP is lost would still require
> a READ POSITION and initiator knowledge of the expected location (or
> some other way of determining/establishing position).  The reason for
> this is that some tape commands can be partially completed, and the
> information on how much of the command actually completed is in the
> lost FCP_RSP.  (A facility to request the status/sense to be resent
> would be helpful here, but properly implementing such a facility
> probably cascades up into the SCSI SAM layer and might get costly to
> implement, unless the number of commands to retain can be
> significantly limited.)
>
> It's not unusual for a lot of this tape error recovery stuff to bubble
> up to the higher application levels, simply because it avoids tracking
> the state information within the driver.  Tracking in the driver is a
> bit more complicated, since the driver has to handle the general case,
> whereas an application only has to track the state information it
> cares about.  (Application-specific drivers can help here, but
> application-specific drivers frequently break other applications by
> changing the driver semantics and refusing to get out of the way when
> other applications are using the device.)
>
> Another common method for recovery on tape is a checkpoint/restart
> mechanism.  Every once in a while, the application requests unbuffered
> filemarks or setmarks, then saves away checkpoint state information.
> Whenever a failure occurs, the application backs up and recovers from
> the last successful checkpoint.  This is almost invariably implemented
> at application level, since the application state information is
> needed for restarting.  (Some OS's provide tools to assist
> checkpointing.)
>
> As I said, these probably aren't the answers you want, but it's kinda
> where things are today.
>
> -roger
>
> PS.  I'll apologize now if I messed up the quoting levels below, but I
> wanted to trim out the stuff I wasn't replying to, and I was actually
> working off of Dave's reply email.
>
> PPS. Due to buffering issues, the case you describe with the lost
> FCP_RSP is a potential problem even if the command does involve data
> transfers.
>
> > Santosh Rao wrote:
> ...
> > Issue 2
> > =======
> > How should the initiator differentiate between a FCP_RJT from a
> > target due to a lost FCP_CMND (the scenario described in Annex C
> > Fig C.2) and the case where the target has discarded exchange state
> > due to the expiration of RR_TOV after sending FCP_RSP?
> >
> > In both the above cases, our interpretation of FCP-2 is that the
> > initiator will see a FCP_RJT response to the REC with:
> >     rsn_code  = "Logical Error" or "Unable to perform command request"
> >     rsn_expln = "Invalid OXID-RXID combination"
> >
> > In this case, the initiator cannot apply the same error recovery for
> > the 2 cases. In the lost FCP_CMND case, the initiator may safely
> > re-issue the command. The latter case could occur in the following
> > manner:
> >
> > - Initiator issues a command which does not involve data xfer.
> > - Target sends FCP_RSP, FCP_RSP is lost.
> > - Initiator REC_TOV timer pops and initiator sends REC.
> > - REC times out after RA_TOVels (which is > RR_TOV, for fabric).
> > - Initiator aborts REC and issues another REC.
> > - Target sends FCP_RJT response since it has discarded the exchange
> >   state.
> >
> > In the above case, the initiator MUST NOT re-issue the FCP_CMND,
> > since this can potentially cause data corruption with tape devices
> > (e.g. re-issuing a SCSI command like SPACE or WRITE FILEMARKS after
> > it had previously been executed successfully can cause tape data
> > corruption).
> >
> > Can someone clarify how FCP-2 differentiates these 2 cases? Without
> > the ability to differentiate between these 2 cases, the use of SLER
> > in a lost FCP_CMND scenario can result in potential data corruption
> > with tape devices.

--
Education is when you read the fine print.
Experience is what you get if you don't.

To Unsubscribe: mailto:t11_3-request at mail.t11.org?subject=unsubscribe


