Thanks for the comments. With
my tape hat on, and looking at a theoretical tape device that has a queue
depth of 10, the worst case queue timeout is 10 * Timeout for Erase command
(e.g. 10 * 2 hours = 20 hours). A number like that would not be useful, I
think.
You could take a set of timeout values
for this theoretical tape device:
Erase = 2 hrs
Write = 5 min/15 min
Write Filemark = 5 min/10 min
Read = 5 min/14 min
Load = 2 min / 14 min
Unload = 2 min / 15 min
Others = 1 min
If you were to try to take the max (minimum
value) for everything except erase, you would choose 5 min as your multiplier.
But if an erase were in the queue, you would fail. I do not
think this type of approach will work well.
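To make the arithmetic above concrete, here is a minimal Python sketch. The numbers come from the list above. Reading each pair as (minimum, maximum) minutes, and giving Erase and Others the same value for both, are assumptions on my part, as are all the names.

```python
# Timeouts in minutes for the theoretical tape device, as (minimum, maximum).
# Assumption: Erase and Others list a single value, so it is used for both.
TIMEOUTS_MIN = {
    "erase": (120, 120),
    "write": (5, 15),
    "write_filemark": (5, 10),
    "read": (5, 14),
    "load": (2, 14),
    "unload": (2, 15),
    "others": (1, 1),
}
QUEUE_DEPTH = 10


def worst_case_queue_timeout(timeouts, depth):
    """Queue depth times the longest single-command timeout."""
    return depth * max(hi for _, hi in timeouts.values())


def multiplier_without_erase(timeouts):
    """Largest of the minimum values, ignoring erase."""
    return max(lo for name, (lo, _) in timeouts.items() if name != "erase")


worst = worst_case_queue_timeout(TIMEOUTS_MIN, QUEUE_DEPTH)  # 1200 min = 20 hours
multiplier = multiplier_without_erase(TIMEOUTS_MIN)          # 5 min
budget = QUEUE_DEPTH * multiplier                            # 50 min
erase_fits = TIMEOUTS_MIN["erase"][0] <= budget              # False: 120 > 50
print(worst, multiplier, budget, erase_fits)
```

Either the budget is sized for erase and is uselessly long, or it is sized without erase and a single queued erase exceeds it.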
Additionally, the tape devices I am aware
of only use Simple task attributes (and Untagged). Other devices
use Head of Queue, which shouldn't be a problem because those commands jump to the
head of the queue and bypass everything already queued. They just have to worry
about the currently processing command.
I think we can limit the queue delay
issue to Simple, and Ordered task attributes.
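A rough Python sketch of that distinction follows. The enum, the function name, and the depth-times-timeout bound are illustrative assumptions, not taken from SAM or from the proposal, and the bound ignores reordering among Simple tasks and later Head of Queue arrivals.

```python
from enum import Enum


class TaskAttribute(Enum):
    SIMPLE = "simple"
    ORDERED = "ordered"
    HEAD_OF_QUEUE = "head of queue"


def queue_delay_bound(attribute, commands_ahead, worst_command_timeout):
    """Rough bound on the time spent waiting before the command starts.

    Head of Queue waits only for the command currently being processed;
    Simple and Ordered may wait for every command already queued.
    """
    if attribute is TaskAttribute.HEAD_OF_QUEUE:
        return worst_command_timeout if commands_ahead > 0 else 0
    return commands_ahead * worst_command_timeout


# Nine commands ahead, each bounded by a 5 minute timeout.
simple_delay = queue_delay_bound(TaskAttribute.SIMPLE, 9, 5)
head_delay = queue_delay_bound(TaskAttribute.HEAD_OF_QUEUE, 9, 5)
print(simple_delay, head_delay)
```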
Thanks,
Kevin D. Butt
SCSI & Fibre Channel Architect, Tape Firmware
MS 6TYA, 9000 S. Rita Rd., Tucson, AZ 85744
Tel: 520-799-2869 / 520-799-5280
Fax: 520-799-2723 (T/L:321)
Email address: kdbutt@us.ibm.com
http://www-03.ibm.com/servers/storage/
"Knight, Frederick"
<Frederick.Knight@netapp.com> Sent by: owner-t10@t10.org
In my former life as a host driver writer, this is exactly the kind of feature we wanted.
We couldn't just send commands and wait forever. We had to invent maximum times we were
willing to wait, and pray that they were right (or determine the worst case value through
testing, which was then pretty bad for "well behaved" devices).
So, in that regard, I think this is something the host people will welcome. However, I agree
that queues are an issue. With intelligent HBAs, the host puts the request (command) in a
queue, and the HBA sends it whenever IT decides to. That queueing is not something you can
deal with in the device; it must be dealt with by the hosts outside of this proposal.
The queues within the device, however, are important, I think. The host wants to know how
long to wait. If it takes longer than that, then the host will typically do recovery, which
may be an abort and a retry (for disks), or maybe even failing the operation outright, or
tape repositioning, data validation, and retry.
So, if device queues are important, does the task attribute become important? If the request
is HEAD OF QUEUE, does it have a shorter timeout than a request that is SIMPLE QUEUE? If it's
SIMPLE QUEUE, what impact does the current queue depth have on the timeout value? Since this
command is likely to be issued once by the host, the value should really account for the
worst case queue depth.
Should the host be forced to ask multiple times for different task attributes? I don't think
so. Should we add more timeouts to the command timeout descriptor? From the host side, I'd be
interested in knowing the maximum (when the queue is full). From the device side, I'd like to
tell the host about timing events such as ALUA state transitions (what my maximum time will
be in the TPGS "transitioning" state). But does that get too long a list of timeouts? Which
ones will the host really use?
+----------------------------------------------------+-----------+
| Field                                              | Byte(s)   |
+----------------------------------------------------+-----------+
| timeouts length                                    | (0-1)     |
+----------------------------------------------------+-----------+
| reserved                                           | (2)       |
+----------------------------------------------------+-----------+
| restricted                                         | (3)       |
+----------------------------------------------------+-----------+
| minimum command timeout                            | (4-7)     |
+----------------------------------------------------+-----------+
| error recovery timeout                             | (8-11)    |
+----------------------------------------------------+-----------+
| timeout when queue is full                         | (12-15)   |
+----------------------------------------------------+-----------+
| maximum time in TPGS transitioning state           | (16-19)   |
+----------------------------------------------------+-----------+
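For anyone who wants to experiment with this layout, here is a minimal Python sketch that parses a 20-byte buffer laid out as in the table above. It is only an illustration of the suggested fields, not the descriptor as defined in any SPC-4 draft. The function and key names are hypothetical, the big-endian byte order follows normal SCSI convention, and both the units (seconds) and the sample length value of 18 (bytes following the length field) are assumptions.

```python
import struct

# Big-endian, 20 bytes: length (0-1), reserved (2), restricted (3),
# then four 32-bit timeouts at bytes 4-7, 8-11, 12-15 and 16-19.
DESCRIPTOR = struct.Struct(">HBBIIII")


def parse_timeouts_descriptor(data):
    """Unpack the suggested descriptor into a dict (timeouts assumed in seconds)."""
    (length, _reserved, _restricted, minimum, error_recovery,
     queue_full, tpgs_transitioning) = DESCRIPTOR.unpack_from(data)
    return {
        "timeouts_length": length,
        "minimum_command_timeout": minimum,
        "error_recovery_timeout": error_recovery,
        "queue_full_timeout": queue_full,
        "tpgs_transitioning_max": tpgs_transitioning,
    }


# Made-up sample values: 5 min, 15 min, 2 hours, 30 s.
sample = DESCRIPTOR.pack(18, 0, 0, 300, 900, 7200, 30)
parsed = parse_timeouts_descriptor(sample)
print(parsed)
```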
Tapes could use the queue full timeout location to specify the worst case when they have to
flush their buffers.
I hope we'll be getting some comments from some host side folks, and that they will really
use this. I think it's a really good idea!
Fred Knight
From: Kevin D Butt [mailto:kdbutt@us.ibm.com]
Sent: Sunday, August 06, 2006 2:24 PM
To: Pat LaVarre
Cc: t10@t10.org
Subject: RE: SPC-4: Self Describing Command Timeouts (05-284r2)
Pat,
Thanks for the response. It sounds like, for your interests, the
main delays are at the host side and not in the device queue(s). In
SCSI terms, the host queue is still considered part of the application
client (as I understand it) and therefore not something that I can address.
So I agree that from the host perspective the time from Command Out
to Status In is what it has to work with. On the other hand, target
devices only have the time from receipt of the command to the time the
status is sent. The difference in these times is whatever bus delays
there are. However, since the Command Timeout values are in units
of seconds, I believe that the bus/fabric delay time is negligible.
With this in mind, I am trying to concentrate my efforts on providing a
Command Timeout value from receipt of the command to the sending of the
status. My proposal currently does not take into consideration any
time that a command might sit in the target device queue prior to entering
the enabled task state.
My proposal covers the issues from when the command enters the enabled
task state to when it enters the task ended state. The sense that
I had when the previous version was discussed in CAP is that I will have
a difficult time getting this passed without somehow addressing the time
spent in the queue waiting to enter the enabled task state (i.e. when the
command is in the dormant task state).
The only way I have thought of that might work is to use a Task Management
function like Query Task and attach a timeout to the return status (if that
is even possible in the SCSI architecture). However, this does not
meet the goal of the proposal. The goal of the proposal is to have
a method that allows an application to call an API from the device driver
and provide a timeout value for the completion of that command. This
requires a priori knowledge of how long that command will take.
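As a sketch of what that a priori knowledge could look like on the host side, assume the driver has asked the device once and cached a per-opcode table. Everything here is hypothetical (the function name, the default, and the table contents). The opcodes are the 6-byte ERASE, WRITE and READ opcodes, and the values are the maximums from the theoretical tape device discussed in this thread, converted to seconds.

```python
# Hypothetical fallback for opcodes the device did not report ("Others = 1 min").
DEFAULT_TIMEOUT_S = 60

# Hypothetical cached table: opcode -> timeout in seconds.
REPORTED_TIMEOUTS_S = {
    0x19: 2 * 60 * 60,  # ERASE(6): 2 hours
    0x0A: 15 * 60,      # WRITE(6): 15 minutes
    0x08: 14 * 60,      # READ(6): 14 minutes
}


def timeout_for(opcode, reported=REPORTED_TIMEOUTS_S, default=DEFAULT_TIMEOUT_S):
    """Return the timeout the application should pass along with this command."""
    return reported.get(opcode, default)


erase_timeout = timeout_for(0x19)
inquiry_timeout = timeout_for(0x12)  # INQUIRY is not in the table, so the default applies
print(erase_timeout, inquiry_timeout)
```

Note that this covers only the time from the enabled task state to the task ended state; it says nothing about time spent waiting in the device queue.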
Thanks,
Kevin D. Butt
Clear explanation of how Reservations help Tape devices, thank you.
> The delay injected by ... the command(s) in the queue
> prior to this one) do not often have an effect on the host doing
> data I/O.
Yes. For Disk and all the more for DVD/CD devices, in my low-end
commodity peripheral world, the cache & queues are mostly in the
host, not in the device. The write cache in the host can be huge,
even as large as the device, and not aggressively flushed. If an
early write request stumbles across a difficult to write area, then
that time delays all the remaining requests. The only measurable time
that reliably fits within limits is the time from Command Out to
Status In measured at the bus, not as measured at a level above the
queue.
-----Original Message-----
From: owner-t10@t10.org on behalf of Kevin D Butt
Sent: Sat 8/5/2006 10:05 PM
To: t10@t10.org
Subject: SPC-4: Self Describing Command Timeouts (05-284r2)
A new version of my "self-describing" command time-outs has been posted.
This is a major revision from the one posted last November. I have a few
issues to solve that I would appreciate help with. The main one is how
to sufficiently address or skirt the delay injected by the time in the
queue.
My thoughts and experience are in the tape realm, and I don't have a good
feel for disk or enclosure or MMC. In the tape realm, reservations are
often used to ensure that only one host is doing data I/O (or time
intensive activities) at a time. While multiple hosts may be talking to
the drive, most are just polling to see if it is there or if it is
available (i.e. doesn't have an active reservation). In this scenario,
the command time-outs as I have described them will solve a high
percentage of the issues related to unknown command time-outs. The delay
injected by the queue (or the command(s) in the queue prior to this one)
do not often have an effect on the host doing data I/O.
Anyway, I need to understand better the issues seen by the other device
types. I would also appreciate any suggestions.
2006/08/05 22:47:18
Your request to upload a file or files to the T10 site has been
accepted.