Write those again please, encoded how

Marty W Czekalski marty.w.czekalski at seagate.com
Tue Aug 24 18:05:57 PDT 2010


* From the T10 Reflector (t10 at t10.org), posted by:
* Marty W Czekalski <marty.w.czekalski at seagate.com>
*
The relationship between LBAs and the physical die that has failed is not
known to the host. The mapping is dynamic and continuously changing. Every
time you rewrite a sector, chances are it is going to a different die. There
is no way for a host to figure out which LBAs are affected. If the device
cannot, on its own, recover the data lost with a die, the device should be
considered dead, since there is no guarantee as to the behavior of the
device. Metadata, tables, etc. are also scattered throughout the die, and
unless the device has been specifically architected to deal with whole-die
failures on its own, nothing can be trusted.
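
To illustrate why the host cannot work out which LBAs sit on a failed die,
here is a toy model of a dynamic LBA-to-die mapping. It is only a sketch
under stated assumptions: the ToyFTL class, the die count, and the random
placement (standing in for wear leveling) are all hypothetical and do not
describe any real drive's firmware.

```python
import random


class ToyFTL:
    """Hypothetical flash translation layer that maps each LBA to a die."""

    def __init__(self, num_dies, seed=0):
        self.num_dies = num_dies
        self.lba_to_die = {}  # internal map, never exposed to the host
        self.rng = random.Random(seed)

    def write(self, lba):
        # Assumption: the device picks the die for every write (random here);
        # the previous location of the LBA is simply abandoned.
        self.lba_to_die[lba] = self.rng.randrange(self.num_dies)

    def lbas_on_die(self, die):
        # Only the device can answer this, and only for the current instant.
        return sorted(l for l, d in self.lba_to_die.items() if d == die)


ftl = ToyFTL(num_dies=4)
for lba in range(16):
    ftl.write(lba)
before = ftl.lbas_on_die(0)

for lba in range(16):
    ftl.write(lba)  # the host rewrites the same LBAs
after = ftl.lbas_on_die(0)

print("LBAs on die 0 before rewrite:", before)
print("LBAs on die 0 after rewrite: ", after)
```

In this model the set of LBAs that a failure of die 0 would take out is
visible only through the device's internal map, and it can change with
every rewrite.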

On Tue, Aug 24, 2010 at 6:06 PM, Gerry Houlder <gerry.houlder at seagate.com> wrote:
> Yes, you are correct that there is no existing answer for a drive to tell
> an initiator that there are more LBAs than just the ones in the command that
> failed that also are bad. Once that many LBAs go bad, the host has to switch
> to a backup copy in order to restore the data anyway, or else maybe the data
> is gone forever (no backup copy). I suspect that most hosts would treat the
> loss of several gigabytes of data as a catastrophic error, worthy of never
> trusting that drive to store data ever again.
>
>
> On Tue, Aug 24, 2010 at 12:42 PM, Pat LaVarre <p.lavarre at ieee.org> wrote:
>
>> * From the T10 Reflector (t10 at t10.org), posted by:
>> * Pat LaVarre <p.lavarre at ieee.org>
>> *
>> Given this paradox:
>>
>> a) Often SSD's include spare capacity to raise the apparent 'endurance'
>> of how many times the host can rewrite the sector at an LBA.
>>
>> b) Such SSD's can suddenly unexpectedly lose all the gigabytes of a
>> whole chip die & still keep on running if the host rewrites all those
>> lost LBA's by remapping those LBA's to other dies.
>>
>> c) T10 SBC requires such SSD's to refuse to READ any lost sector,
>> instead returning a KCQ such as:
>>
>> 311C3h MEDIUM ERROR/ UNRECOVERED READ ERROR - VENDOR-SPECIFIC 311C3H
>> 3110Ch MEDIUM ERROR/ UNRECOVERED READ ERROR - RECOMMEND REWRITE THE DATA
>> 31103h MEDIUM ERROR/ MULTIPLE READ ERRORS
>> 31100h MEDIUM ERROR/ UNRECOVERED READ ERROR
>>
>> Then I ask:
>>
>> Q: What signal(s) should the drive send to suggest the host rewrite all
>> the LBA's of the lost die?
>>
>> Merely refusing to read the lost sectors, or even all sectors, doesn't
>> just work. Many hosts don't then efficiently find this middle ground of
>> rewriting just the lost die, instead hosts waste much time by trying
>> each lost sector individually or by actually returning the drive for
>> service.
>>
>> Is there no existing answer? Inventing an entirely new answer of course
>> wastes time in waiting for the hosts to catch up and implement it.
>>
>> Thanks in advance,
>>
>>
>> *
>> * For T10 Reflector information, send a message with
>> * 'info t10' (no quotes) in the message body to majordomo at t10.org
>>
>
>
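
The KCQ values listed in Pat's message pack three fields into five hex
digits: one digit of sense key, two of additional sense code (ASC), and two
of additional sense code qualifier (ASCQ). A minimal sketch of splitting
them apart follows; the helper name split_kcq is hypothetical, and it
assumes the five-digit, optionally "h"-suffixed notation used above.

```python
def split_kcq(kcq):
    """Split a five-hex-digit KCQ string such as '3110Ch' into
    (sense key, ASC, ASCQ) integers."""
    digits = kcq.strip().rstrip("hH")  # drop the trailing hex marker, if any
    if len(digits) != 5:
        raise ValueError(f"expected 5 hex digits, got {kcq!r}")
    return int(digits[0], 16), int(digits[1:3], 16), int(digits[3:5], 16)


# The four values quoted in the message above.
for kcq in ("311C3h", "3110Ch", "31103h", "31100h"):
    key, asc, ascq = split_kcq(kcq)
    print(f"{kcq}: sense key {key:X}h, ASC {asc:02X}h, ASCQ {ascq:02X}h")
```

All four share sense key 3h (MEDIUM ERROR) and ASC 11h; only the ASCQ
differs, and none of them identifies LBAs beyond those in the failed
command.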


