Archiving - Part 4

Development starts

In part 1 and part 2 I discussed some archiving principles, and in part 3 I looked at my specific requirements. Now I am going to start the development process. I appreciate the series has been a bit sketchy so far; it has really been part of my own thought process in determining what I want to build as an archiving system.

Storage solutions

I have spent quite a bit of time looking at various storage solutions. I looked at several of the popular cloud-based solutions and at creating my own based on various technologies, including union file systems. I also looked at a number of FUSE-based file systems, both off-the-shelf solutions and ones I built myself.

The original aim of using such file systems was to provide a solution that would appear as a normal file system, allowing you to copy into and out of the archive file system at will while maintaining permissions and access rights.

This is doable, and over the past year I have built some prototypes that I currently have running. After much thought, however, I have decided not to go down this route for now. Every solution I have looked at has had issues: complex to implement, not cross-platform, expensive or difficult to maintain.

Developing a solution of my own that is totally resilient and feature complete is not something I have the time or resources to do at this stage. However, I may develop it further as an additional storage solution later.

While looking at possible solutions I also revisited iRODS, a system I had really enjoyed working on previously and had always thought would make a good basis for an archiving system.

Unfortunately iRODS, as fantastic as it is, does not meet my specific needs. It does not provide a serverless option (for obvious reasons), it requires a reasonably complex setup, and extending it is not well documented. I have experience writing extensions for iRODS in C, and although I achieved good results it was a little painful.

So I have decided to write a rules-based archiving system of my own that will meet my requirements using commonly available file storage solutions from Amazon, Google, Microsoft etc. as well as local and network-based file storage solutions.

My Requirements revisited

Here I go over my requirements again with the intention of first creating a command line client as an initial solution. Each requirement is illustrated with 'pseudo' command line examples.

Requirements such as data compression and upgrading data formats will be taken care of by the application.

archive copy -z test.pdf mystor:/testfiles

archive upgrade mystor:/testfiles/test.pdf
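As a rough idea of what the -z option might do internally, here is a minimal Go sketch that compresses data with gzip from the standard library and checks that it round-trips. The function names compress and decompress are hypothetical, and gzip is only an assumption; I have not settled on a compression format.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// compress gzips data in memory and returns the compressed bytes.
func compress(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	// Close flushes any buffered data and writes the gzip footer.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decompress reverses compress.
func decompress(data []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	original := bytes.Repeat([]byte("archive me "), 100)

	packed, err := compress(original)
	if err != nil {
		panic(err)
	}
	restored, err := decompress(packed)
	if err != nil {
		panic(err)
	}

	fmt.Printf("original %d bytes, compressed %d bytes, round trip ok: %v\n",
		len(original), len(packed), bytes.Equal(original, restored))
}
```

A real implementation would stream from the source file to the store rather than hold everything in memory.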

I plan to use network drives on my local LAN as well as Google Drive for storing my data, giving me ease of access whenever required, network and cloud security, and the reliability that comes with Google storage.

archive makestor mystor e:/datastor

archive makestor -t google mystor e:/datastor
authorise somehow ....
google drive attached ....

Subsequent connects:

archive connect mystor
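The examples address files as store:path, for instance mystor:/testfiles/test.pdf. One possible way to split such a reference in Go is sketched here. Ref and ParseRef are hypothetical names, and the rule that a single-letter prefix such as e: is a Windows drive rather than a store name is purely my assumption at this stage.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Ref identifies a file or directory inside a named store (hypothetical type).
type Ref struct {
	Store string // store name, e.g. "mystor"
	Path  string // cleaned, absolute path inside the store
}

// ParseRef splits "store:path" into its two parts.
func ParseRef(s string) (Ref, error) {
	i := strings.Index(s, ":")
	if i <= 0 {
		return Ref{}, fmt.Errorf("%q is not of the form store:path", s)
	}
	// Assumption: a one-letter prefix is a Windows drive such as "e:",
	// so it is rejected as a store name.
	if i == 1 {
		return Ref{}, fmt.Errorf("%q looks like a local drive path, not a store", s)
	}
	// Normalise so that "mystor:testfiles" and "mystor:/testfiles" agree.
	p := path.Clean("/" + s[i+1:])
	return Ref{Store: s[:i], Path: p}, nil
}

func main() {
	examples := []string{
		"mystor:/testfiles/test.pdf",
		"otherstor:testfiles",
		"e:/datastor",
	}
	for _, s := range examples {
		ref, err := ParseRef(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("store=%s path=%s\n", ref.Store, ref.Path)
	}
}
```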

Replication will be taken care of by the application, which will have its own rule-based command language to carry out replication, fixity checking, enforcement of a minimum number of copies and multi-tier replication. The application will have the option of running in different roles, including client, storage, replicator etc.

archive check mystor:/testfiles/test.pdf
archive set mystor:/testfiles/test.pdf -m"user:jeremy subject:testing"
archive set mystor:/testfiles/test.pdf -min 2
archive replicate mystor:/testfiles otherstor:testfiles
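To make fixity checking and the minimum-copies rule a little more concrete, here is a minimal Go sketch. It assumes fixity means comparing a SHA-256 digest recorded at ingest against each stored copy; the choice of SHA-256, the in-memory replicas map and the names digest, healthyCopies and needsReplication are all hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// digest returns the hex-encoded SHA-256 of data.
func digest(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// healthyCopies returns the names of the stores whose copy still matches
// the digest recorded when the file was archived, in sorted order.
func healthyCopies(want string, replicas map[string][]byte) []string {
	var ok []string
	for store, data := range replicas {
		if digest(data) == want {
			ok = append(ok, store)
		}
	}
	sort.Strings(ok)
	return ok
}

// needsReplication reports whether fewer than min intact copies remain.
func needsReplication(want string, replicas map[string][]byte, min int) bool {
	return len(healthyCopies(want, replicas)) < min
}

func main() {
	original := []byte("contents of test.pdf")
	recorded := digest(original)

	// Stand-in for the same file read back from two stores;
	// the copy on otherstor has been corrupted.
	replicas := map[string][]byte{
		"mystor":    original,
		"otherstor": []byte("contents of test.pdX"),
	}

	fmt.Println("intact copies:", healthyCopies(recorded, replicas))
	fmt.Println("needs replication (min 2):", needsReplication(recorded, replicas, 2))
}
```

In this sketch a failed check simply lowers the count of intact copies, so the same rule that enforces -min 2 would also trigger repair of a damaged copy.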

The solution will be open source and will run on any architecture or operating system. I plan to do my development in Golang as, from my experience, it is a good fit for the capabilities I plan to provide.

The application will be easy to use, having both a command line interface and, eventually, a full browser-based UI.

There will be an archive database containing local configuration, storage definitions, file information and metadata. It will save its transactions to multiple locations, allowing other instances to process those transactions and keep a replica of the database. The database will also offer optional versioning of data.
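The idea of keeping a replica by processing saved transactions can be sketched in Go as follows. This is only an illustration under my own assumptions: the Txn and DB types, the "set" and "delete" operations and the in-memory log are hypothetical, and a real version would write the log to several storage locations.

```go
package main

import "fmt"

// Txn is one recorded change to the archive database (hypothetical shape).
type Txn struct {
	Seq   int    // strictly increasing sequence number, starting at 1
	Op    string // "set" or "delete"
	Key   string
	Value string
}

// DB is a toy in-memory stand-in for the archive database.
type DB struct {
	Data    map[string]string
	Applied int   // sequence number of the last transaction applied
	Log     []Txn // every transaction this instance has applied
}

// NewDB returns an empty database.
func NewDB() *DB {
	return &DB{Data: make(map[string]string)}
}

// Apply replays one transaction. Transactions must arrive in order;
// a gap or a repeat is rejected so a replica never silently diverges.
func (db *DB) Apply(t Txn) error {
	if t.Seq != db.Applied+1 {
		return fmt.Errorf("expected transaction %d, got %d", db.Applied+1, t.Seq)
	}
	switch t.Op {
	case "set":
		db.Data[t.Key] = t.Value
	case "delete":
		delete(db.Data, t.Key)
	default:
		return fmt.Errorf("unknown operation %q", t.Op)
	}
	db.Applied = t.Seq
	db.Log = append(db.Log, t)
	return nil
}

// Set records and applies a change on the instance it is called on.
func (db *DB) Set(key, value string) error {
	return db.Apply(Txn{Seq: db.Applied + 1, Op: "set", Key: key, Value: value})
}

func main() {
	primary := NewDB()
	if err := primary.Set("mystor:/testfiles/test.pdf", "user:jeremy subject:testing"); err != nil {
		panic(err)
	}

	// Another instance rebuilds the same state purely from the saved log.
	replica := NewDB()
	for _, t := range primary.Log {
		if err := replica.Apply(t); err != nil {
			panic(err)
		}
	}
	fmt.Println("replica entries:", len(replica.Data), "last transaction:", replica.Applied)
}
```

Because the log keeps every change, it could also serve as the basis for the optional versioning.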

Longevity, i.e. keeping the data for many years, will initially be achieved by regular use of the storage and multiple replicated copies. At some point I will still need to look at something that will ensure my data persists beyond my own demise.

Low cost will be achieved with simple hardware requirements.

Next Part

So I am now quite excited, as I feel I really know what I want to do and have made a start. I will be developing the command line solution over the next few months and plan to release the initial command line version on GitHub around the middle of 2018. I will of course provide updates on this site.

© Jeremy Smith 2017, 2018