Friday, July 17, 2009


Yet Another Versioning File System....

I've been reading about versioning and snapshotting file systems, and here is a mini-dump of what I think so far. Please correct me if I'm mistaken...

I looked at a few file systems with slightly more than a cursory glance, and here are my comments:

  1. ext3cow: This is very close to what I have in mind for a versioning and continuous snapshotting file system. However, it does not meet some of my "requirements of a versioning and continuous snapshotting file system" (below).

  2. Elephant File System: This is probably one of the earliest snapshotting/versioning file systems, and it has been implemented on BSD. It is modeled on BSD's FFS. It does a great job of doing what it says it does. I like everything about it except that I can't use it on Linux.

  3. NetApp's WAFL: This again is mainly solving a different problem, that of providing very fast and safe storage on a storage appliance. To the best of my knowledge, you need to trigger a snapshot (though the process itself does not take more than a few seconds), and the number of snapshots is limited to a fixed number (32 or 256). The triggering can be automated using hourly cron jobs, etc. The interface for getting at previous versions of files is a little clumsy, though.

Requirements of a Versioning and Continuous Snapshotting File System:

  1. It should be able to store a large number of versions of files and directories alike

  2. It should be able to do so very fast and without any user intervention, i.e., snapshotting should not be user-triggered but should happen automatically on every close()

  3. You should not need to re-compile the kernel to get this functionality

  4. You should be able to apply this to a specific directory root, i.e., you should be able to snapshot only parts of your directory tree. Accessing files outside this region of interest should incur no performance penalty

  5. You should have policies that control which files to snapshot and version, based on file attributes such as size, name, MIME type, etc.

  6. The interface for accessing continuous-in-time file versions should be very clean and fairly intuitive, and it should be fairly easy to write tools around it so that GUI/web-based access can be enabled

  7. You should be able to enable this functionality on any file system (without having to reformat the partition on which you wish to enable it)

  8. You should be able to disable this at any point in time and go back to accessing the most recent versions of the versioned files using the earlier access methods, without any performance penalty. That is, users should be able to turn this functionality on and off at will, so that they can try it out on real workloads without making any drastic changes to their environments

  9. The operations performed should be safe, and should not result in data corruption

  10. The system should be easy to install and set up
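To make requirements 2, 4, 5 and 6 a bit more concrete, here is a minimal user-space sketch in Python. It is only an illustration under stated assumptions, not how ext3cow, Elephant or WAFL work: a real implementation would hook close() in the kernel or through something like FUSE, whereas here callers must go through a wrapper. All names and values in it (`VersionPolicy`, `VersionStore`, `versioned_open`, the hidden `.versions` directory, the exclusion patterns and the 10 MiB size limit) are hypothetical choices made for this example.

```python
import fnmatch
import mimetypes
import os
import shutil
from contextlib import contextmanager
from dataclasses import dataclass

STORE_NAME = ".versions"  # hypothetical name of the hidden version directory


@dataclass
class VersionPolicy:
    """Hypothetical policy deciding which files under `root` get versioned."""
    root: str
    max_size: int = 10 * 1024 * 1024          # skip files larger than 10 MiB
    exclude: tuple = ("*.tmp", "*.o", "*~")   # name patterns never versioned
    mime_prefixes: tuple = ()                 # e.g. ("text/",); empty = any type

    def inside_root(self, path: str) -> bool:
        """Requirement 4: only paths under the chosen directory root qualify."""
        root = os.path.abspath(self.root)
        path = os.path.abspath(path)
        try:
            if os.path.commonpath([path, root]) != root:
                return False
        except ValueError:  # e.g. paths on different drives on Windows
            return False
        # Never version the version store itself.
        return STORE_NAME not in os.path.relpath(path, root).split(os.sep)

    def applies_to(self, path: str) -> bool:
        """Requirement 5: filter by name pattern, MIME type and size."""
        if not self.inside_root(path):
            return False
        name = os.path.basename(path)
        if any(fnmatch.fnmatch(name, pat) for pat in self.exclude):
            return False
        if self.mime_prefixes:
            mime = mimetypes.guess_type(path)[0] or ""
            if not mime.startswith(self.mime_prefixes):
                return False
        return os.path.getsize(path) <= self.max_size


class VersionStore:
    """Keeps numbered copies of each versioned file under <root>/.versions/."""

    def __init__(self, policy: VersionPolicy):
        self.policy = policy
        self.root = os.path.abspath(policy.root)

    def _dir_for(self, path: str) -> str:
        rel = os.path.relpath(os.path.abspath(path), self.root)
        return os.path.join(self.root, STORE_NAME, rel)

    def snapshot(self, path: str):
        """Copy the file's current contents as its next version, if allowed."""
        if not self.policy.applies_to(path):
            return None
        vdir = self._dir_for(path)
        os.makedirs(vdir, exist_ok=True)
        dest = os.path.join(vdir, f"v{len(os.listdir(vdir)):06d}")
        shutil.copy2(path, dest)  # copies data plus timestamps/permissions
        return dest

    def versions(self, path: str) -> list:
        """Requirement 6: oldest-first list of stored versions of `path`."""
        if not self.policy.inside_root(path):
            return []
        vdir = self._dir_for(path)
        if not os.path.isdir(vdir):
            return []
        return [os.path.join(vdir, n) for n in sorted(os.listdir(vdir))]

    def read_version(self, path: str, index: int = -1) -> bytes:
        with open(self.versions(path)[index], "rb") as f:
            return f.read()


@contextmanager
def versioned_open(store: VersionStore, path: str, mode: str = "w"):
    """Requirement 2: open `path` normally and snapshot it as soon as it closes."""
    f = open(path, mode)
    try:
        yield f
    finally:
        f.close()
        store.snapshot(path)
```

Because the live file is never moved or rewritten (versions are only copied aside), the newest version is always just the ordinary file, so dropping the wrapper or deleting `.versions` returns you to plain access, in the spirit of requirements 7 and 8.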

I have in mind something that satisfies all of the requirements above...
