In Part 1 I said that I would look at storage solutions, and here is a summary of what I have found.
Choosing the right storage solution for archiving depends on a number of factors, including but not limited to data significance, access requirements, security and budget.
By data significance I mean the value of the data to the individual, the organisation and indeed the world. For example, the digital copies of the original Hollywood silent films have more significance to the world than my family holiday snaps. My company's tax records, which have to be kept for several years, have more significance for legal reasons than my customer sales records for the past 10 years.
This is all very subjective, but hopefully you get the point that some data, if lost, would have more significance than other data.
It is important to archive all data safely and securely, but some data is judged to be more important than other data.
Also, the more significant the data, the higher the reliability you will want. This may involve more expensive hardware, multiple distributed copies, and ongoing monitoring of the data to check that it is still accurate and still in a viable format.
The way I look at data significance is by using the following categories:
|1||National treasures such as rare films, rare recordings, photographs|
|2||Research data, Legal archives, Compliance|
|3||Personal data, non legal records|
For some archive data, such as research data, you may want regular, fast access. In that case archives stored on disk are faster to access than archives stored on tape. In other cases, where preservation is the priority, you may want to deep archive the data, and quick access is not a requirement.
Almost all archive data needs to be secure. In general, security is provided by the infrastructure in which the archives are stored: user access controls, firewalls, etc.
Encryption
People commonly want to encrypt archive data. I personally believe that encryption only makes sense for the most critical security needs, and that security is otherwise best provided by building a secure infrastructure. Remember that encryption requires keys to be stored independently of the data, so you need assurance that the keys will still be available many years later.
The options for archiving solutions are often limited by budget. Setting up a 100-tape library will cost a very large amount of money, whereas a subscription to a cloud storage service will cost relatively little in comparison.
The more data you want to store, the more expensive it will be, so if your budget is limited you may need to take a low-cost option.
|Medium||Pros||Cons|
|Tape||Long life, media cost is low, multiple copies, long term support from industry||Expensive hardware|
|Optical Discs||Long life, multiple copies, long term support from industry||Expensive hardware|
|Disk||Low cost, fast access||Comparatively short life, multiple copies require supporting infrastructure|
|Cloud||Low cost||Third party, multiple copies difficult to manage|
Based on the above categories and the pros and cons of the storage options, I believe the best storage media by category are:
|1||Tape, Optical Discs|
|2||Tape, Optical Discs, Disk, Cloud|
Tapes have been used for many years for digital backups and archiving and are still commonly used today.
There are a number of tape formats. The current popular formats are LTO and DLT, whose drives offer excellent performance, are highly reliable and have a long life. The technologies are continually developing, adding higher capacity to keep up with the larger amounts of data to be stored, and there is active support for moving from older versions to newer versions, usually with higher capacity.
Nowadays, though, the volume of data that we want to store has increased so much that implementing a tape solution with enough storage is generally quite expensive, as it often requires tape library hardware with tape management software.
Additionally, if you have a large amount of data on tape, restoring and verifying archives can be quite complex.
Digital Linear Tape (DLT)
Linear Tape Open (LTO)
Optical discs are an alternative to tape storage. They are highly reliable, have reasonably high capacity and provide fast access to archive data. Their life is often quoted at over 50 years. Built into a library of drives, they can scale to accommodate large data archives.
More: Optical Disc
Example of Optical Disk Drives
As an alternative to tapes we can archive to low-cost spinning disks. They have the advantage of better read/write performance, and clusters of them are relatively cheap to build and therefore to expand. They are generally less reliable, which is a concern for any archive, so steps need to be in place to ensure the data is regularly verified, with closer monitoring and maintenance and some level of fault tolerance.
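As a rough illustration of what regular verification on disk could look like, here is a minimal Python sketch that records a SHA-256 checksum for every file in an archive directory and later reports any file that is missing or has changed. The function names (`sha256_of`, `build_manifest`, `verify`) and the idea of keeping the manifest as a simple dictionary are my own assumptions for the example, not part of any particular archiving product.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or no longer match."""
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

In practice the manifest would be saved somewhere other than the disk being checked, and `verify` would be run on a schedule so that a failing drive is noticed while another good copy still exists.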
Tapes are relatively easy to distribute to multiple safe locations. Disk drives are generally not portable enough to distribute unless you use externally mounted drives, which are generally more expensive, so the infrastructure required to maintain multiple distributed copies is harder to implement.
Storing your archives in the cloud is becoming more popular as relatively cheap storage is available due to economies of scale.
These solutions are generally highly reliable and by nature distributed and highly fault tolerant. This means that you do not have to worry about maintaining and monitoring the storage, provided you trust the cloud provider.
Cloud storage is a good solution for all but the most important data.
Most of the above hardware has specific software for making archives. If you are using spinning disks there are a number of disk storage solutions and file systems.
There are also a number of specific archiving solutions on the market which operate either as a service or a standalone solution.
In a later article I may do a review of the current software solutions.
There are a large number of archiving formats; see Archive formats.
This list includes archive-only, compression-only and combined archive-and-compress formats. It generally makes sense to use a combined archive and compression format. The two most popular are .tar.gz and .zip, with .zip probably the most popular and certainly the easiest to use across all operating systems.
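To make the two formats concrete, here is a small sketch that packs the same directory into both a .tar.gz and a .zip file using only Python's standard `tarfile` and `zipfile` modules. The helper names `make_tar_gz` and `make_zip` are hypothetical, and the sketch assumes the archive file is written outside the directory being packed.

```python
import tarfile
import zipfile
from pathlib import Path


def make_tar_gz(source_dir: Path, archive_path: Path) -> None:
    """Pack source_dir into a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps only the directory name, not the full path on disk.
        tar.add(source_dir, arcname=source_dir.name)


def make_zip(source_dir: Path, archive_path: Path) -> None:
    """Pack source_dir into a deflate-compressed zip archive."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the parent so the top-level
                # directory name is preserved inside the archive.
                arcname = path.relative_to(source_dir.parent).as_posix()
                zf.write(str(path), arcname=arcname)
```

Both functions produce an archive whose entries sit under the original directory name, so extracting either one recreates the same folder layout.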
The files you store will be in whatever file formats were used when the data was created. You must consider that these formats will eventually become obsolete, meaning you are unlikely to be able to read the data in 25 years, so you need a strategy to account for this. For example, if you have a set of Microsoft Word documents, you may consider keeping multiple copies in different formats, e.g. docx, pdf and plain text. As part of your strategy you may also review format versions on a regular basis and upgrade the file formats when required.
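A regular format review needs to start from a list of which formats the archive actually contains. The following sketch counts files by extension and picks out those on a watch list. The names `format_inventory` and `formats_to_review` are hypothetical, and which extensions go on the watch list is an assumption you would supply yourself. A file extension is also only a rough proxy for the real format.

```python
from collections import Counter
from pathlib import Path


def format_inventory(root: Path) -> Counter:
    """Count files under root by lower-cased extension ('' if none)."""
    return Counter(
        p.suffix.lower() for p in root.rglob("*") if p.is_file()
    )


def formats_to_review(inventory: Counter, watch_list: set[str]) -> dict[str, int]:
    """Return the counts for extensions that appear on the watch list."""
    return {ext: n for ext, n in inventory.items() if ext in watch_list}
```

For example, `formats_to_review(format_inventory(Path("archive")), {".doc"})` would report how many files in a hypothetical `archive` directory use the `.doc` extension, which gives a starting point for deciding what to convert.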
In Part 3 I will define the requirements for my own archiving solution, which will be the basis of my development.