Record Keeper

This document is current. The ideas have been neither implemented nor fully worked out yet. See the mailing list for discussion.

CVS revision: $Id: record_keeper.html,v 1.123 2004/07/17 11:33:25 arnowa Exp $

desiderata

  1. replace the current content trackers
  2. unlimited fields for metadata
  3. redundant copies of metadata
  4. minimal dead links in metadata (files that are listed but cannot be downloaded)

design

the basic idea

The complete metadata about a file is stored in the DHT once for every keyword that a user might use to search for this file.

more details

Mnet/Mojo Nation has always had the primitive of the "block". The block had a block id, which was the sha1 of the contents of the block. Blocks were only allowed to be certain fixed sizes (16 KiB up to some power of two). The block is always encrypted so that the computer serving the block can't discover what the contents of the block are.

Record Keeper adds a new primitive, called a "record" (this name suggested by Artimage). Like the block it will have a range of fixed sizes, but smaller than the block (512 B up to 16 KiB). It will also be encrypted. The key difference is that the record id cannot be derived from the encrypted record. That is, a record is not self-authenticating.
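
The difference between the two primitives can be sketched in a few lines of Python. This is only an illustration, not Mnet code: the function names are made up for this example, and the UTF-8 encoding of the keyword is an assumption. The record id here is the sha1 of a search keyword, as described in the publishing steps later in this document.

```python
import hashlib


def block_id(encrypted_block: bytes) -> str:
    """Block id as described above: the sha1 of the block contents."""
    return hashlib.sha1(encrypted_block).hexdigest()


def verify_block(claimed_id: str, encrypted_block: bytes) -> bool:
    """A block is self-authenticating: whoever holds it can recompute the id."""
    return block_id(encrypted_block) == claimed_id


def record_id(keyword: str) -> str:
    """Record id: sha1 of a search keyword, NOT of the record contents.

    Encoding the keyword as UTF-8 is an assumption of this sketch.
    """
    return hashlib.sha1(keyword.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    blk = b"\x00" * 16 * 1024          # stand-in for an encrypted 16 KiB block
    print(verify_block(block_id(blk), blk))            # block checks out
    print(verify_block(block_id(blk), blk + b"x"))     # tampered block does not
    # The record id depends only on the keyword, so a node storing a record
    # has nothing in the encrypted bytes to check the id against.
    print(record_id("lisa"))
```

A node serving a block can reject data that does not hash to the requested id. A node serving a record has no such check available.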

Let's see what happens when we publish the metadata about a file to the Record Keeper system:

  1. We have the Mnet URI of the file (e.g. mnet:ynm5macprw399bgxo4dhimeepcx7r79xin4knuzsyamj51t1mkiutdwc45), and we also know some other bits about it: that it's a song by Lisa Reins called "Shake All Over", offered under the Creative Commons license "Attribution-NonCommercial 1.0" (here is the song I'm talking about: http://creativecommons.org/works/view/1449). We also know a lot more details, like that the file has the bitprint CWRXLWCZZDOAL7PHJNMWHAOQH6HNJETJ.SBWGQLP4VXZ22MSXZL5VI3ZLLACXWZRVMLOWBZQ and the musicbrainz track id 25d6bd0c-1a37-4e26-bacd-31dc316515c8, plus lots and lots of other details that we can find out by looking at the file or by using a lookup service like bitzi or musicbrainz.
  2. We take all these bits of information and turn them into Triples (for a tutorial about triples, see Introduction to the Semantic Web and RDF). The idea behind using Triples is that they are becoming a metadata standard used in lots of other applications. One of the problems we face now is how much of the information we found out should go into this triple list. The answer for now is "I don't know".
    Subject  Property  Object
    mnet:ynm5macprw399bgxo4dhimeepcx7r79xin4knuzsyamj51t1mkiutdwc45 http://purl.org/dc/elements/1.1/title Shake All Over
    mnet:ynm5macprw399bgxo4dhimeepcx7r79xin4knuzsyamj51t1mkiutdwc45 http://purl.org/dc/elements/1.1/creator Lisa Reins
    mnet:ynm5macprw399bgxo4dhimeepcx7r79xin4knuzsyamj51t1mkiutdwc45 http://web.resource.org/cc/license http://creativecommons.org/licenses/by-nc/1.0
    and on and on...
  3. We store this triple list in N-Triples format (because it's less verbose than XML-based RDF) and gzip it to save space (all those repeating mnet URIs).
  4. We pick certain fields that we want people to search by and create a record for each. Which ones? I don't know. The title, creator, bitprint, and license (if present) should be mandatory, I think (but then who am I, the metadata Nazi? "No metadata for YOU!"). We take a keyword from the field we want people to be able to search by, say for example "Lisa". If the field is not case-sensitive we lowercase the word, so now we have "lisa". We take the sha1 of the word, c4ed14e2020dd45edb57b5fba2f40dd93952505e, and that is the record id.
  5. But wait, the record is unencrypted. How can we encrypt it so that someone looking for the keyword "lisa" can find it, but to everyone else it's garbage? Take the same keyword, get the binary representation of it, add 1 to that and that is your encryption key. Now encrypt the record data with AES (zooko: what mode would be good for this? I was thinking CCM).
  6. Now we use a DHT (like Kademlia, or maybe Coral) to put this record into the system.
  7. Repeat #4 - #6 for all keywords that we want people to be able to search for this file by
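
The publishing steps above can be sketched roughly as follows. This is a minimal Python sketch under several assumptions, none of which the design settles: keywords are encoded as UTF-8, "add 1 to the binary representation" is read as a big-endian integer increment, a plain dict stands in for the DHT, and all function names and the tuple layout of the triples are made up for this example. The AES step is left out because the mode and the mapping from key material to an AES key size are still open.

```python
import gzip
import hashlib

MNET_URI = "mnet:ynm5macprw399bgxo4dhimeepcx7r79xin4knuzsyamj51t1mkiutdwc45"

# (subject, property, object, object_is_uri); hypothetical layout for this sketch.
TRIPLES = [
    (MNET_URI, "http://purl.org/dc/elements/1.1/title", "Shake All Over", False),
    (MNET_URI, "http://purl.org/dc/elements/1.1/creator", "Lisa Reins", False),
    (MNET_URI, "http://web.resource.org/cc/license",
     "http://creativecommons.org/licenses/by-nc/1.0", True),
]


def to_ntriples(triples):
    """Step 3a: serialize the triple list, one N-Triples statement per line."""
    lines = []
    for subject, prop, obj, obj_is_uri in triples:
        if obj_is_uri:
            term = "<" + obj + ">"
        else:
            term = '"' + obj.replace("\\", "\\\\").replace('"', '\\"') + '"'
        lines.append("<%s> <%s> %s ." % (subject, prop, term))
    return "\n".join(lines) + "\n"


def pack_metadata(triples):
    """Step 3b: gzip the N-Triples text to save space."""
    return gzip.compress(to_ntriples(triples).encode("utf-8"))


def normalize_keyword(keyword, case_sensitive=False):
    """Step 4: lowercase keywords from fields where case does not matter."""
    return keyword if case_sensitive else keyword.lower()


def record_id(keyword):
    """Step 4: the record id is the sha1 of the (normalized) keyword."""
    return hashlib.sha1(keyword.encode("utf-8")).hexdigest()


def derive_key_material(keyword):
    """Step 5: keyword bytes read as a big-endian integer, plus one.

    This is one possible reading of "add 1 to the binary representation".
    The result is NOT a valid AES key length in general.
    """
    raw = keyword.encode("utf-8")
    n = int.from_bytes(raw, "big") + 1
    length = max(len(raw), (n.bit_length() + 7) // 8)
    return n.to_bytes(length, "big")


def publish(dht, triples, keywords):
    """Steps 4-7: store the packed metadata once per keyword.

    `dht` is a plain dict used as a stand-in for Kademlia or Coral.
    """
    payload = pack_metadata(triples)
    ids = []
    for keyword in keywords:
        word = normalize_keyword(keyword)
        # Step 5 would go here: encrypt `payload` with AES under a key
        # built from derive_key_material(word). Omitted in this sketch.
        rid = record_id(word)
        dht.setdefault(rid, []).append(payload)
        ids.append(rid)
    return ids


if __name__ == "__main__":
    dht = {}
    for rid in publish(dht, TRIPLES, ["Lisa", "Shake"]):
        print(rid, len(dht[rid][0]), "bytes")
```

Keeping a list of payloads per record id is also an assumption: many files will share a keyword like "lisa", and the design does not yet say how several records under one id are handled.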

open issues


icepick
