Poster: Seaware
Date: Oct 3, 2012 9:49pm
Forum: petabox
Subject: How long does the data last?
I would like to know whether there are estimates of how many years the data lasts before it needs to be replicated to a new drive. Also, is ECC used, so that if some data is lost during that period it is still likely to be recoverable? It would be interesting to consider whether this archive would still count as an archive from the perspective of 1000 years from now.
Poster: Coderjo
Date: Oct 8, 2012 1:49pm
Forum: petabox
Subject: Re: How long does the data last?
The data is stored on two separate hardware nodes as soon as it is uploaded to archive.org. As far as I know, the system does not do extra ECC (beyond what the hard drive does internally). However, one of the item's XML files stores a list of the item's files along with their checksums, which can be used to verify the files on each node.
Poster: Seaware
Date: Oct 9, 2012 12:47am
Forum: petabox
Subject: Re: How long does the data last?
Thanks. So if the half-life of the data on the disk is 100 years (for example), would the drive be powered on and the data checked at least once during that period, with the first failing checksum triggering a replication to a fresh drive? Also, I hope you are using a CRC rather than a pure checksum, since a CRC is more likely to detect multi-bit errors.
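To illustrate the CRC point, here is a minimal Python sketch, not anything taken from the petabox software. The `additive_checksum` function is a hypothetical stand-in for a "pure" checksum (a byte sum modulo 2**16), and the sample data is made up. Swapping two adjacent bytes changes several bits but leaves the byte sum unchanged, while CRC-32 detects it because the damage is a short burst.

```python
import zlib


def additive_checksum(data: bytes) -> int:
    """Hypothetical 'pure' checksum: sum of all bytes modulo 2**16."""
    return sum(data) % 65536


original = b"archive data block"

# Swap the first two bytes: a multi-bit corruption that keeps the byte sum the same.
swapped = bytearray(original)
swapped[0], swapped[1] = swapped[1], swapped[0]
corrupted = bytes(swapped)

# True means the check noticed that the data changed.
sum_detects = additive_checksum(original) != additive_checksum(corrupted)
crc_detects = zlib.crc32(original) != zlib.crc32(corrupted)

print("additive checksum detects the swap:", sum_detects)
print("CRC-32 detects the swap:", crc_detects)
```

A CRC still only detects damage; recovering the data needs either ECC or a second good copy to replicate from.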
Poster: Coderjo
Date: Oct 10, 2012 11:04pm
Forum: petabox
Subject: Re: How long does the data last?
I don't know the low-level details, so I don't know whether the data is scrubbed regularly. I also don't know what procedures are followed when a drive fails and needs to be replaced.
Currently, looking at the files.xml file for a random item, the system records sha1, md5, and crc32 values. It also stores the file size and mtime.
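For anyone who wants to check a downloaded file against those recorded values, a rough Python sketch follows. The XML layout and element names (`file`, `size`, `md5`, `sha1`, `crc32`), the hex formatting of the CRC, and the helper names `digests` and `verify` are assumptions made for illustration; they are not confirmed details of the real files.xml format, so adjust them to what the item's file actually contains. The mtime is left out because it does not depend on the file contents.

```python
import hashlib
import xml.etree.ElementTree as ET
import zlib


def digests(data: bytes) -> dict:
    """Compute the size and the three digests for a block of file data."""
    return {
        "size": str(len(data)),
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        # Assumed representation: 8 lowercase hex digits.
        "crc32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
    }


def verify(file_elem, data: bytes) -> list:
    """Return the names of recorded fields that do not match the data."""
    mismatches = []
    for field, value in digests(data).items():
        recorded = file_elem.findtext(field)  # None if the element is absent
        if recorded is not None and recorded.strip().lower() != value:
            mismatches.append(field)
    return mismatches


# Build a made-up record in the assumed layout, then check two copies against it.
data = b"hello petabox"
d = digests(data)
record_xml = (
    '<files><file name="example.txt">'
    f'<size>{d["size"]}</size><md5>{d["md5"]}</md5>'
    f'<sha1>{d["sha1"]}</sha1><crc32>{d["crc32"]}</crc32>'
    "</file></files>"
)
file_elem = ET.fromstring(record_xml).find("file")

good = verify(file_elem, data)              # intact copy
bad = verify(file_elem, b"hellO petabox")   # same length, one byte changed

print("intact copy mismatches:", good)
print("damaged copy mismatches:", bad)
```

Comparing the size first is a cheap sanity check; the digests are what catch same-length corruption like the one-byte change above.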