
One Solution

I was very impressed by the paper "The Google File System" by Sanjay Ghemawat, Howard Gobioff and Shun-Tak Leung. The paper describes a solution to a problem that is bigger than the problem above: a file system with a sensible subset of the POSIX functionality.

I found the paper here.

drbs introduces 3 components:

The blobs are validated with an MD5 checksum, so that failing disks and mistakes by humans are detected. The blobmaster keeps all its data in RAM (it is not very large, since it is only the metadata on the blobs).
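The checksum validation can be illustrated with a minimal Python sketch. This is not the drbs code; the function names `blob_checksum` and `verify_blob` are made up for this example, and only the use of MD5 comes from the text above.

```python
import hashlib


def blob_checksum(data: bytes) -> str:
    """Return the MD5 checksum of a blob as a hex string."""
    return hashlib.md5(data).hexdigest()


def verify_blob(data: bytes, expected_hex: str) -> bool:
    """Check a blob against the checksum recorded when it was stored.

    A mismatch means the bytes changed after storing, for example
    through a failing disk or a file edited by hand.
    """
    return blob_checksum(data) == expected_hex


if __name__ == "__main__":
    blob = b"some blob content"
    stored = blob_checksum(blob)               # recorded at store time
    print(verify_blob(blob, stored))           # blob is unchanged
    print(verify_blob(blob + b"x", stored))    # blob was altered
```

MD5 is enough to catch accidental corruption of this kind; it is not meant as protection against deliberate tampering.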

The blobserver keeps all the metadata in RAM and stores the blobs as files in the ordinary file system. The blobserver logs all changes in a logfile, so the server can be restarted quickly: on startup it reads the logfile and replays the actions, reaching the old state again. Since the logfile is simply mmap'ed, it can be read and interpreted quickly.
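The log-and-replay idea can be sketched in a few lines of Python. The log format used here (one text line per action, `PUT <blob_id> <checksum>` or `DEL <blob_id>`) and the names `append_action` and `replay_log` are assumptions made for this example; the real drbs logfile format is not described on this page.

```python
import mmap
import os


def append_action(log_path: str, action: str, blob_id: str, checksum: str = "") -> None:
    """Append one action to the logfile and force it to disk."""
    line = f"{action} {blob_id} {checksum}".rstrip()
    with open(log_path, "ab") as log:
        log.write(line.encode() + b"\n")
        log.flush()
        os.fsync(log.fileno())


def replay_log(log_path: str) -> dict:
    """Rebuild the in-RAM metadata (blob_id -> checksum) from the logfile."""
    state = {}
    # An empty file cannot be mmap'ed, so treat it like a missing log.
    if not os.path.exists(log_path) or os.path.getsize(log_path) == 0:
        return state
    with open(log_path, "rb") as log:
        with mmap.mmap(log.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # readline() returns b"" at the end of the mapping.
            for raw in iter(view.readline, b""):
                fields = raw.decode().split()
                if len(fields) == 3 and fields[0] == "PUT":
                    state[fields[1]] = fields[2]
                elif len(fields) == 2 and fields[0] == "DEL":
                    state.pop(fields[1], None)
    return state
```

Replaying the actions in order gives the same state the server had before the restart: a blob that was stored and later deleted does not reappear, because its `DEL` line comes after its `PUT` line.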

Of course it would be possible to implement such a solution on top of an ordinary database, but I follow "The Google File System" paper, which claims that all this can be done with much lower overhead.

This solution is cheaper: do the math yourself and calculate what a fileserver and this el-cheapo solution would cost you. This software assumes that hardware will fail, so cheaper hardware that will fail can be chosen.

While this blob server works on a single machine, it is intended to scale up to store larger sets of blobs on many machines. The Google paper talks of hundreds of machines.


Copyright © 2004 by Jörg Beyer. All rights reserved.