One Solution

I was very impressed by the paper "The Google File System", by Sanjay Ghemawat, Howard Gobioff and Shun-Tak Leung. The paper describes a solution to a problem that is bigger than the problem above: it describes a file system with a sensible subset of the POSIX functionality.

I found the paper here.

drbs introduces 3 components:

The blobclient. It is the client library used to access the blobs.
A number of blobservers. They actually store the blobs in a file system. Blobs are stored there and downloaded from there. Each blob is stored on several blobservers (e.g. 3), so the failure of one blobserver can be compensated: the remaining blobservers can replicate the blob again until it reaches the degree of redundancy that you want. A sensible setup needs at least 10 blobservers, but they could all run on the same host. For more redundancy I would spread them across more hardware, but for a test a single machine works well. The Google people speak of hundreds of these server processes and machines.
A single blobmaster. It coordinates where the blobs are stored and, on a blob lookup, tells the blobclient where it can get each blob. The blobmaster never sees the actual blob, only the meta information.
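
To make the division of labour concrete, here is a minimal Python sketch of the bookkeeping a blobmaster could do. It is not the drbs implementation: the class name BlobMaster, its methods, and the random placement policy are all assumptions made for illustration. It only shows the idea described above, that the master tracks where blobs live, answers lookups, and plans re-replication when a blobserver fails.

```python
import random


class BlobMaster:
    """Hypothetical in-memory coordinator: tracks metadata only, never blob bytes."""

    def __init__(self, servers, replicas=3):
        self.servers = set(servers)   # names of the live blobservers
        self.replicas = replicas      # desired number of copies per blob
        self.locations = {}           # blob id -> set of blobserver names

    def place(self, blob_id):
        """Choose blobservers for a new blob and remember the decision."""
        chosen = set(random.sample(sorted(self.servers), self.replicas))
        self.locations[blob_id] = chosen
        return chosen

    def lookup(self, blob_id):
        """Tell a blobclient which blobservers hold the blob."""
        return set(self.locations[blob_id])

    def server_failed(self, name):
        """Forget a dead blobserver; return {blob_id: new_server} copies to make."""
        self.servers.discard(name)
        plan = {}
        for blob_id, holders in self.locations.items():
            if name not in holders:
                continue
            holders.discard(name)
            # Pick any live server that does not already hold this blob.
            candidates = sorted(self.servers - holders)
            if candidates:
                target = random.choice(candidates)
                holders.add(target)
                plan[blob_id] = target
        return plan
```

A real master would pick targets by free space and load rather than at random, and a surviving blobserver would perform the actual copy; the sketch only records the plan.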

The blobs are validated with an (MD5) checksum. This makes sure that failing disks and/or mistakes by humans are detected. The blobmaster keeps all its data in RAM (it is not very large, since it is only the meta data on the blobs).
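
The checksum validation can be sketched in a few lines of Python using the standard hashlib module. The function names md5_hex and verify_blob are hypothetical and not part of drbs; the sketch assumes the expected checksum is kept as a hex string in the meta data.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 checksum of a blob as a hex string."""
    return hashlib.md5(data).hexdigest()


def verify_blob(data: bytes, expected_md5: str) -> bool:
    """True if the blob still matches the checksum recorded when it was stored."""
    return md5_hex(data) == expected_md5


# Usage: record the checksum at store time, compare again at read time.
blob = b"some blob content"
recorded = md5_hex(blob)
assert verify_blob(blob, recorded)
assert not verify_blob(blob + b"!", recorded)   # a damaged copy is detected
```

MD5 is adequate here for catching accidental corruption; it is not meant to protect against deliberate tampering.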

The blobserver keeps all the meta data in RAM and has the blobs as files in the ordinary file system. The blobserver logs all changes in a logfile, so the server can be restarted fast: on startup it reads the logfile and replays the actions, reaching the old state again. Since the logfile is just mmap'ed, it can be read and interpreted fast.
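
The restart-by-replay idea could look roughly like the following Python sketch. The log format is an assumption made up for this example (one text line per change, either "PUT <blob_id> <md5>" or "DEL <blob_id>"); the real drbs logfile may look quite different. The function name replay_log is hypothetical as well.

```python
import mmap


def replay_log(path):
    """Rebuild the in-memory meta data table from an append-only logfile."""
    state = {}                        # blob id -> md5 checksum
    with open(path, "rb") as f:
        f.seek(0, 2)                  # jump to the end to learn the size
        if f.tell() == 0:
            return state              # an empty file cannot be mmap'ed
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # readline() returns b"" at the end of the mapping
            for raw in iter(m.readline, b""):
                parts = raw.decode("ascii").split()
                if len(parts) == 3 and parts[0] == "PUT":
                    state[parts[1]] = parts[2]
                elif len(parts) == 2 and parts[0] == "DEL":
                    state.pop(parts[1], None)
    return state
```

Because every change is appended in order, replaying the lines from top to bottom always ends in the same state the server had before it stopped.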

Of course it would be possible to implement such a solution on top of an ordinary database, but I follow the "The Google File System" paper, which claims that all this can be done with much lower overhead.

This solution is cheaper: do the math yourself and calculate what a fileserver and this el-cheapo solution would cost you. This software assumes that hardware will fail, so cheaper hardware that will fail can be chosen.

While this blob server works on a single machine, it is intended to scale up to store larger sets of blobs on many machines. The Google paper talks of hundreds of machines.