Archiving - Part 3

My Requirements

In Part 1 and Part 2 I discussed some archiving principles. Now I want to look at my specific requirements.

Use

Ease of access

No matter where the data is stored, it should be easy to retrieve and easy to use with the relevant applications.

Secure

Data should be kept securely, with access limited to authorized users. Encryption is an option, but it should be avoided where possible and reserved for highly sensitive files.

Data compression

Data should be compressed when it makes sense, i.e. during transfer for performance and at the final destination to reduce storage cost.
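
A minimal Python sketch of "compress when it makes sense": the data is gzip-compressed, and the compressed form is kept only if it is meaningfully smaller. The function name compress_if_worthwhile and the 10% threshold are assumptions made for illustration, not part of any existing design.

```python
import gzip
from typing import Tuple


def compress_if_worthwhile(data: bytes, min_saving: float = 0.10) -> Tuple[bytes, bool]:
    """Return (payload, compressed_flag).

    The compressed form is kept only if it is at least `min_saving`
    (a fraction of the original size) smaller than the input.
    """
    if not data:
        return data, False
    packed = gzip.compress(data)
    if len(packed) <= len(data) * (1.0 - min_saving):
        return packed, True
    # Already-compressed formats (JPEG, MP3, ZIP) usually end up here.
    return data, False
```

The flag would need to be recorded alongside the file so that a reader knows whether to decompress it.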

Ability to move to a new file system

If I move from one storage solution to another, I should be able to move the data simply. For example, all of my music is currently on Amazon storage, but I may want to move it to my Google Drive.

Ability to upgrade formats

Data formats need to be kept up to date with the most current version. For example, PDF files should be updated to the latest version, possibly keeping the original version as well.

Search meta data

Each file should have a set of keywords stored in its meta data. For example, a photograph of the family on holiday in Wales would have the keywords 'family', 'holiday', 'Wales' and 'Photograph' added to its meta data.
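
As an illustration of keyword search, the Python sketch below assumes the keywords are held in a simple mapping from file path to keyword set. The function name find_by_keywords, the file paths and the in-memory index are all hypothetical.

```python
from typing import Dict, Iterable, List, Set


def find_by_keywords(index: Dict[str, Set[str]], wanted: Iterable[str]) -> List[str]:
    """Return the paths whose keywords include every wanted keyword.

    Matching is case-insensitive, so 'wales' finds files tagged 'Wales'.
    """
    wanted_lower = {word.lower() for word in wanted}
    return sorted(
        path
        for path, keywords in index.items()
        if wanted_lower <= {keyword.lower() for keyword in keywords}
    )


# Hypothetical sample data based on the holiday photograph example.
index = {
    "photos/wales-2016.jpg": {"family", "holiday", "Wales", "Photograph"},
    "docs/wales-itinerary.pdf": {"holiday", "Wales"},
}
```

Calling find_by_keywords(index, ["wales", "family"]) would return only the photograph, since the itinerary lacks the 'family' keyword.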

Reliability

Data must be stored reliably.

Highly reliable replication

Data must be replicated reliably.

Longevity

The data must be available indefinitely, and it must be possible to pass data on to later generations.

Verifiable copies

It must be possible to verify that data is intact. The meta data should contain checksums that can be used to verify each file, and files should be checked on a regular basis.
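
A rough Python sketch of checksum verification follows. SHA-256 is an assumption here, since the requirement does not name an algorithm, and the helper names sha256_of and is_intact are made up for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, expected_sha256: str) -> bool:
    """True if the file exists and its checksum matches the recorded one."""
    return path.is_file() and sha256_of(path) == expected_sha256.lower()
```

A regular check could walk the stored meta data and call is_intact for every copy, flagging any file that fails so it can be restored from another copy.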

Development

Open source

The development will be based on open source software and itself be made available as open source.

Any architecture

The solution must be deployable on Intel or Arm, and on Linux, BSD or OSX. The development world nowadays seems to be full of different build systems, which often have complex dependencies and poor support for non-Intel platforms, and are generally of poor quality.

Keep it simple

When looking for an already built solution, I found many that had grown out of multiple projects and were very complex, with different technologies, complicated build processes and many dependencies. I want to avoid this with this solution.

Infrastructure

Meta data with multiple copies

Meta data is to be stored for each file, including the data used for verification, the location of copies, the number of copies and search data.
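
One possible shape for such a record is sketched below in Python. The class name FileMetadata, its field names and the choice of JSON for storage are all assumptions. The number of copies is derived from the list of locations rather than stored separately, so the two cannot drift apart.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class FileMetadata:
    name: str
    sha256: str                 # data used for verification
    size_bytes: int
    locations: List[str] = field(default_factory=list)   # where the copies live
    keywords: List[str] = field(default_factory=list)    # search data

    @property
    def copy_count(self) -> int:
        """Number of copies, derived from the recorded locations."""
        return len(self.locations)

    def to_json(self) -> str:
        """Serialise to plain JSON text, readable without special tools."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "FileMetadata":
        return cls(**json.loads(text))
```

A plain text format such as JSON keeps the meta data itself easy to copy to several locations and to read in the future.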

Minimum number of copies

Keep multiple copies in multiple locations. Reliability relies on the fact that if one copy of a file is lost, another copy is available to replace it.
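
A small Python sketch of how a minimum could be enforced: given the known locations of each file, report which files have too few copies. The function name under_replicated and the default minimum of 3 are assumptions, as no number has been decided here.

```python
from typing import Dict, List


def under_replicated(copies: Dict[str, List[str]], minimum: int = 3) -> Dict[str, int]:
    """Map each file with too few copies to the number of copies still needed.

    A location listed twice counts once, since two copies in the same
    place do not protect against losing that place.
    """
    shortfall = {}
    for name, locations in copies.items():
        missing = minimum - len(set(locations))
        if missing > 0:
            shortfall[name] = missing
    return shortfall
```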

Multi tier replication broad and high

It must be possible for a file system to replicate itself to multiple locations, and for those locations to replicate to others in turn. For example, we could have a single replication to the cloud, with the cloud then replicating elsewhere to make multiple copies. It would be preferable to have simple rules that define how replication is configured.
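
To show what "simple rules" might look like, the Python sketch below describes replication as a mapping from each location to the locations it replicates to, and walks that mapping to list every place a file would end up. The location names and the function reachable_copies are hypothetical.

```python
from collections import deque
from typing import Dict, List

# Hypothetical rules: one replication to the cloud, which then fans out.
RULES = {
    "home-nas": ["cloud-a"],
    "cloud-a": ["cloud-b", "offsite-disk"],
}


def reachable_copies(rules: Dict[str, List[str]], source: str) -> List[str]:
    """List every location that receives a copy, nearest tier first.

    A breadth-first walk is used, and visited locations are remembered so
    that rules which loop back on themselves do not cause an endless walk.
    """
    seen = {source}
    order = []
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for target in rules.get(current, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order
```

With the rules above, a file stored on "home-nas" reaches "cloud-a" first and then "cloud-b" and "offsite-disk", giving four copies in total from a single outbound replication.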

Pluggable file systems

Use standard local, network and remote file systems, including distributed systems, to provide high-availability storage.
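
One way to make file systems pluggable is a small common interface that each storage type implements. The Python sketch below is only an assumption about how that might look: the names StorageBackend and LocalBackend and the three methods are invented for the example, and a real version would also need to validate keys.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class StorageBackend(ABC):
    """Minimal interface every pluggable file system would implement."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def exists(self, key: str) -> bool: ...


class LocalBackend(StorageBackend):
    """Stores each key as a file under a root directory."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def put(self, key: str, data: bytes) -> None:
        target = self.root / key
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()

    def exists(self, key: str) -> bool:
        return (self.root / key).is_file()
```

A network or cloud backend would implement the same three methods, so moving data from one storage solution to another becomes a loop of get on the old backend and put on the new one.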

Pluggable replication

Different replication solutions can be used.

Pluggable file version handlers

The ability to read multiple file types, to convert files to a newer version, and to determine the file type by examining the content.
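
Determining the file type from content usually means checking the first few bytes against known signatures. The Python sketch below covers only a handful of well-known signatures; the table SIGNATURES and the function detect_type are illustrative names, and a real handler would need a far larger table.

```python
from typing import Optional

# Leading bytes ("magic numbers") for a few common formats.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"PK\x03\x04", "zip"),  # also the container for .docx and .xlsx
]


def detect_type(header: bytes) -> Optional[str]:
    """Return a type name based on the leading bytes, or None if unknown."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None
```

Each detected type could then be routed to the handler responsible for reading or converting that format, regardless of the file's name or extension.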

Multiple copies of the same file in different formats

To insure against obsolete file formats, keep files in several formats. For example, keep a Word document also as a text version, keep a spreadsheet as a CSV file, keep a PNG file as a JPEG, etc.

Other

Low cost

The cost of storage, hardware, networking etc. should be affordable.

Versioning

It is expected that the backing file systems will be read-only. However, if we want to allow files to be replaced or deleted, then the solution should support version control.
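
One way to reconcile read-only storage with replacing and deleting is to make every change a new version and never overwrite an old one. The in-memory Python sketch below illustrates the idea only; the class VersionedStore and its methods are hypothetical, and a real store would write each version to the backing file system.

```python
from typing import Dict, List, Optional


class VersionedStore:
    """Append-only store: existing versions are never modified."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Optional[bytes]]] = {}

    def put(self, name: str, data: Optional[bytes]) -> int:
        """Add a new version and return its 1-based version number."""
        history = self._versions.setdefault(name, [])
        history.append(data)
        return len(history)

    def delete(self, name: str) -> int:
        """Record a deletion as a new version holding None (a tombstone)."""
        return self.put(name, None)

    def get(self, name: str, version: Optional[int] = None) -> Optional[bytes]:
        """Return the requested version, or the latest if none is given."""
        history = self._versions[name]
        index = len(history) if version is None else version
        if not 1 <= index <= len(history):
            raise KeyError(f"{name} has no version {index}")
        return history[index - 1]
```

Because a delete only adds a tombstone, earlier versions remain retrievable by number.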

Next Part

Over the next couple of months I will look at the various storage solutions that I could possibly use and in Part 4 I will summarise the various storage solutions and file systems.

© Jeremy Smith 2017, 2018