Add Immediate bit to XPWRITE and XDWRITE Extended commands
Stephen Holmstead
stephen at hpdmd48.boi.hp.com
Tue Jul 9 13:08:38 PDT 1996
* From the SCSI Reflector, posted by:
* Stephen Holmstead <stephen at hpdmd48.boi.hp.com>
*
In response to Gerry Houlder's post:
>The need for this feature was identified by Chris Burns (Maximum Strategy) in
>reflector messages and follow up phone calls with me. He is concerned about
>poor performance in situations where several (and particularly if all) data
>drives that have the same parity drive need to be updated at the same time. The
>parity drive will be a bottleneck in this situation.
Welcome to RAID-4. By definition, the RAID-4 parity drive is the bottleneck.
That is one of the reasons for RAID-5 (distributed parity).
>As an example, consider a 4 drives plus parity situation. If all 4 drives need
>to be written using XDWRITE command the command sequence would be as follows:
>(1) An XDWRITE command is issued to a data drive.
>(2) An XDREAD command is issued to return the xor data to the initiator.
>(3) An XPWRITE command is issued to write the xor data to the parity drive.
>(4) Steps 1 through 3 are repeated for each of the data drives.
A better example:
(1) An XDWRITE is sent to each of the 4 drives.
(2) An XDREAD is sent to each of the 4 drives.
(3) The initiator sends a SINGLE XPWRITE to the parity drive containing the
data from the 4 XDREADs.
This example does assume a sequential operation in a RAID-4 environment.
These assumptions were extracted from reading Gerry's text on his example
of how to make the performance better.
My bottom line opinion (if anyone cares, and I doubt they do) is that I am
STRONGLY opposed to adding an immediate bit to the XOR commands. I
am similarly opposed to using WCE bit to control XOR command flow. I
feel that by doing so opens that ugly mess about DEFERRED ERRORS and
system data integrity.
If you have a problem with RAID-4 performance, try something else (like
RAID-5 perhaps).
--
____ ____
| / /_ __\ | Disk Stephen Holmstead All comments (c)1996
| | / / /_/ | | Memory stephen at mail.boi.hp.com My opinions should
|___\ / /___| Division Fax: 208/396-6858 be held by everyone
----------
From: Gerry Houlder[SMTP:Gerry_Houlder at notes.seagate.com]
Sent: Tuesday, July 09, 1996 12:12 PM
To: scsi
Subject: Add Immediate bit to XPWRITE and XDWRITE Extended commands
* From the SCSI Reflector, posted by:
* Gerry Houlder <Gerry_Houlder at notes.seagate.com>
*
This proposal (document 96-194) will be introduced at the Colorado Springs SCSI
Working Group meeting (July 16-17). Comments are welcome at the meeting or on
this reflector.
Date: July 9, 1996
To: X3T10 Committee
From: Gerry Houlder, Seagate Technology
Subj: Add Immediate bit to XPWRITE and XDWRITE Extended commands
The need for this feature was identified by Chris Burns (Maximum Strategy) in
reflector messages and follow up phone calls with me. He is concerned about
poor performance in situations where several (and particularly if all) data
drives that have the same parity drive need to be updated at the same time. The
parity drive will be a bottleneck in this situation.
As an example, consider a 4 drives plus parity situation. If all 4 drives need
to be written using XDWRITE command the command sequence would be as follows:
(1) An XDWRITE command is issued to a data drive.
(2) An XDREAD command is issued to return the xor data to the initiator.
(3) An XPWRITE command is issued to write the xor data to the parity drive.
(4) Steps 1 through 3 are repeated for each of the data drives.
This sequence results in 2 commands (one XDWRITE and one XDREAD) being issued
to each data drive and 4 XPWRITE commands to the parity drive. If each XPWRITE
command has to write the xor result to the drive before doing the next XPWRITE
command, at least one extra disk revolution will be lost on each XPWRITE
command. Performance would be better if the first 3 XPWRITE commands left the
resulting parity data in cache and returned GOOD status without writing the
data to disk. The last XPWRITE would write the data to disk when it is
completed.
Chris suggests using a bit in the command block to indicate that the data
shouldnt be written to disk yet. If the bit is one, the XPWRITE command returns
GOOD status as soon as the xor operation is complete and doesnt attempt to
write the data to disk. If the bit is zero, then GOOD status cannot be returned
until the data has been written to disk. This conforms to the existing
requirements on this command.
This feature must also work with the XDWRITE Extended command. A system that
uses XDWRITE Extended would use the following command sequence:
(1) An XDWRITE extended command is issued to a data drive. The data drive sends
XPWRITE command to parity drive. When XPWRITE returns status, data drive
returns status to the controller.
(2) Controller repeats step 1 for each of the data drives.
In order to make use of the XPWRITE immediate feature, a bit must also be added
to the XDWRITE Extended command. That way the bit can be carried over to the
XPWRITE command that is issued by the data drive. This would be used in the
same way: the first 3 XDWRITE Extended commands would set the immediate bit and
the last command would have the bit cleared so it would cause the data to be
written to disk.
There is an alternative that must also be discussed. We could generalize the
use of the WCE bit in Mode Page 8 so that it applies to XPWRITE commands as
well as regular write commands. This would be a reasonable extension because
the xor data that is left in cache after completion of the xor operation is the
same as the data written to disk. Therefore it is safe to let it be used to
satisfy a read request for that LBA (as a cache hit) and otherwise has the same
retention and safety requirements as regular write data.
An advantage for the WCE bit alternative is that the controller wouldnt have to
be concerned with which of the xor commands executes first or last. The
disadvantage is that there is no assurance that the parity drive wont try to
write the data to disk before the last command is received. There is also no
assurance the data will be written to disk as soon as the last command has
completed. RAID controllers like to know (and exert control over) exactly when
data is written to disk. That is why Chris Burns prefers adding a bit to the
xor commands -- it provides explicit control over the write operation.
The WCE bit option should only be persued if the RAID controller companies feel
comfortable with it. Even if we decide that the XPWRITE command should make use
of the WCE bit, some changes to the standard will probably be needed. The model
for xor commands will need to be updated to describe how write caching can be
used to help certain xor operations and cannot be used for others.
More information about the T10
mailing list