Write those again please, encoded how

Pat LaVarre p.lavarre at IEEE.org
Wed Aug 25 16:33:24 PDT 2010


* From the T10 Reflector (t10 at t10.org), posted by:
* Pat LaVarre <p.lavarre at ieee.org>
*
> > > there is no existing answer for a drive to tell an initiator
> > > that there are more LBAs than just the ones in the command that
> > > failed that also are bad. Once that many LBAs go bad, the host has
> > > to switch to a backup copy in order to restore the data anyway or
> > > else maybe the data is gone forever (no backup copy).
Then our last few years' progress in SCSI Background Scan and SMART and
Self-Test has actually not already given us a mechanism to keep Write
service available when one die of thousands fails and so millions of
Read requests fail.
> unless device architected to deal with whole die failures ...
> nothing can be trusted
Aye.
Gigabytes used to be a high percentage of capacity.
I mean now to talk of a device under test that initially survives the
simple test of opening its 'warranty void if broken' seal & cutting the
wire that connects one die of flash.
True, some devices crash quickly when I cut that single point of failure.
All the same, intriguingly enough, some devices don't crash.
Those devices report precisely & accurately that I cut away one die, by
refusing my every request to read any LBA whose data or metadata existed
only on that lost die, while still honouring my request to read other
LBAs and honouring my request to write any LBA.
I then see only the host pointlessly confused by my simple test.
Like I see a RAID host flail away discovering the lost sectors
one by one, rather than guessing that rewriting the RAID stripe, or even
rewriting the whole drive, could succeed a whole lot faster. And then,
like as not, I next see the human operators on scene correctly conclude
that the churn and the catastrophic automatic warning are practically
meaningless, shut the warning off, force a quick rewrite, and restore service.
I mean then to ask here how an algorithm in the host could make such a
reasonable choice on behalf of the human operator, such that the RAID
cries out that the sky has fallen only after the last die fails, instead
of immediately when the first die fails.
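To make that concrete, here is a minimal sketch in C of one such host-side
heuristic. The names and thresholds (next_action, STRIPE_ERROR_LIMIT,
STRIPE_REWRITE_LIMIT) are my own illustration, not anything standard or
shipping: once unrecovered reads in one stripe pass a small limit, stop
probing sector by sector and rewrite the stripe from redundancy; once
enough stripes need that, schedule a whole-drive rewrite instead of
declaring the sky fallen.

/* Illustrative sketch only -- not any standard or shipping RAID code.
 * Escalate from per-sector retries to stripe rewrites to a whole-drive
 * rewrite as unrecovered read errors pile up.  Thresholds are assumed,
 * not measured; tune per array geometry and die size. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum repair_action { RETRY_SECTOR, REWRITE_STRIPE, REWRITE_DRIVE };

struct drive_health {
    uint32_t bad_sectors_in_stripe;   /* errors seen in the current stripe */
    uint32_t stripes_rewritten;       /* stripes already repaired this way  */
};

#define STRIPE_ERROR_LIMIT   4u       /* hypothetical threshold */
#define STRIPE_REWRITE_LIMIT 64u      /* hypothetical threshold */

static enum repair_action next_action(struct drive_health *h, bool read_failed)
{
    if (!read_failed)
        return RETRY_SECTOR;          /* nothing to repair */

    if (++h->bad_sectors_in_stripe < STRIPE_ERROR_LIMIT)
        return RETRY_SECTOR;          /* still looks like an isolated sector */

    h->bad_sectors_in_stripe = 0;
    if (++h->stripes_rewritten < STRIPE_REWRITE_LIMIT)
        return REWRITE_STRIPE;        /* guess: a larger region (e.g. a die) is gone */

    return REWRITE_DRIVE;             /* guess: enough is gone to redo the whole drive */
}

int main(void)
{
    struct drive_health h = { 0, 0 };
    /* Simulate a burst of failed reads landing in one stripe. */
    for (int i = 0; i < 5; i++)
        printf("failed read %d -> action %d\n", i, next_action(&h, true));
    return 0;
}

The point of the sketch is only the shape of the policy: the host keeps
crude counters and guesses at the larger failure, rather than waiting for
the drive to say anything it has no vocabulary to say today.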
I agree we all see that more lost bits follow the first lost bit, and we
should guess that more lost sectors will follow the first lost sector,
and we should guess that more lost dies will follow the first lost die.
Thus I conclude:
Our last few years' progress in SCSI Background Scan and SMART and
Self-Test has actually not already given us a mechanism to keep Write
service available when one die of thousands fails and so millions of
Read requests fail.
Said, confirmed, explained, said again, ...
I believe we have a complete consensus so far as this e-mail reflector
medium can measure.
Thanks again all around for this education here & offline,
----
* From the T10 Reflector, posted by Marty W Czekalski
* Date: 08/24/2010 06:05:57 PM
The relationship between LBAs and the physical die that has failed is
not known to the host. The mapping is dynamic and continuously
changing. Every time you rewrite a sector, chances are it is going to a
different die. There is no way for a host to figure out which LBAs are
affected.
If the device cannot recover the data due to a lost die on its own, the
device should be considered dead since there is no guarantee as to the
behavior of the device. Metadata, tables, etc. are also scattered
throughout the dies, and unless the device has been specifically
architected to deal with whole die failures on its own, nothing can be
trusted.
----
* From the T10 Reflector, posted by Gerry Houlder
* Date: 08/24/2010 03:06:48 PM
Yes, you are correct that there is no existing answer for a drive to
tell an initiator that there are more LBAs than just the ones in the
command that failed that also are bad. Once that many LBAs go bad, the
host has to switch to a backup copy in order to restore the data anyway
or else maybe the data is gone forever (no backup copy). I suspect that
most hosts would treat the loss of several gigabytes of data as a
catastrophic error, worthy of never trusting that drive to store data
ever again.
----
* From the T10 Reflector, posted by Pat LaVarre
* Date: 08/24/2010 01:57:05 PM
Given this paradox:
a) Often SSDs include spare capacity to raise the apparent 'endurance'
of how many times the host can rewrite the sector at an LBA.
b) Such SSDs can suddenly, unexpectedly lose all the gigabytes of a
whole chip die & still keep on running if the host rewrites all those
lost LBAs, by remapping those LBAs to other dies.
c) T10 SBC requires such SSDs to refuse to READ any lost sector,
instead returning a KCQ such as:
311C3h MEDIUM ERROR/ UNRECOVERED READ ERROR - VENDOR-SPECIFIC 311C3H
3110Ch MEDIUM ERROR/ UNRECOVERED READ ERROR - RECOMMEND REWRITE THE DATA
31103h MEDIUM ERROR/ MULTIPLE READ ERRORS
31100h MEDIUM ERROR/ UNRECOVERED READ ERROR
Then I ask:
Q: What signal(s) should the drive send to suggest the host rewrite all
the LBAs of the lost die?
Merely refusing to read the lost sectors, or even all sectors, doesn't
just work. Many hosts don't then efficiently find this middle ground of
rewriting just the lost die; instead, hosts waste much time by trying
each lost sector individually or by actually returning the drive for
service.
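For what it's worth, the host-side half of that middle ground is easy to
sketch. Assuming fixed-format sense data as SPC defines it, something
like the following C (my own illustration, not any shipping driver)
recognizes the 3/11/xx codes listed above and flags the failed read as a
candidate for rewriting more than the one LBA named in the command:

/* Sketch of host-side sense decoding, assuming fixed-format sense data
 * (response code 70h or 71h).  The decision to rewrite wider than the
 * failed command is the host's own policy; the drive signals nothing
 * beyond the KCQ today. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* True when the sense data says MEDIUM ERROR / UNRECOVERED READ ERROR
 * (sense key 3h, ASC 11h), i.e. one of the 3/11/xx codes quoted above. */
static bool is_unrecovered_read(const uint8_t *sense, size_t len)
{
    if (len < 14)
        return false;
    uint8_t resp = sense[0] & 0x7F;
    if (resp != 0x70 && resp != 0x71)   /* fixed-format sense only */
        return false;
    uint8_t key = sense[2] & 0x0F;      /* sense key */
    uint8_t asc = sense[12];            /* additional sense code */
    return key == 0x03 && asc == 0x11;  /* MEDIUM ERROR, unrecovered read */
}

int main(void)
{
    /* Example buffer: 3/11/0Ch, "RECOMMEND REWRITE THE DATA". */
    uint8_t sense[18] = { 0x70, 0, 0x03, 0, 0, 0, 0, 10,
                          0, 0, 0, 0, 0x11, 0x0C, 0, 0, 0, 0 };
    if (is_unrecovered_read(sense, sizeof sense))
        printf("unrecovered read: consider rewriting more than one LBA\n");
    return 0;
}

Decoding the KCQ is the easy part; what the host still cannot learn from
it is how far beyond the failed command the damage extends, which is
exactly the missing signal asked about above.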
Is there no existing answer? Inventing an entirely new answer of course
wastes time in waiting for the hosts to catch up and implement it.
Thanks in advance,
*
* For T10 Reflector information, send a message with
* 'info t10' (no quotes) in the message body to majordomo at t10.org


